The intention is to get rid of the l2cap_sk_list usage inside
l2cap_core.c. l2cap_sk_list will soon be replaced by a list that does not
depend on socket usage.
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
First, make callers pass on-stack flowi4 to ip_route_output_gre()
so they can get at the fully resolved flow key.
Next, use that in ipgre_tunnel_xmit() to avoid the need to use
rt->rt_{dst,src}.
Signed-off-by: David S. Miller <davem@davemloft.net>
To more accurately reflect that it is purely a routing
cache lookup key and is used in no other context.
Signed-off-by: David S. Miller <davem@davemloft.net>
Four years ago, Patrick made a change to hold rtnl mutex during netlink
dump callbacks.
I believe it was a wrong move. This slows down concurrent dumps, making
good old /proc/net/ files faster than rtnetlink in some situations.
This occurred to me because one "ip link show dev ..." was _very_ slow
on a workload adding/removing network devices in background.
All dump callbacks are able to use RCU locking now, so this patch does
roughly a revert of commits :
1c2d670f36 : [RTNETLINK]: Hold rtnl_mutex during netlink dump callbacks
6313c1e099 : [RTNETLINK]: Remove unnecessary locking in dump callbacks
This let writers fight for rtnl mutex and readers going full speed.
It also takes care of phonet : phonet_route_get() is now called from rcu
read section. I renamed it to phonet_route_get_rcu()
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Remi Denis-Courmont <remi.denis-courmont@nokia.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that output route lookups update the flow with
source address et al. selections, the fl4->{saddr,daddr}
assignments here are no longer necessary.
Signed-off-by: David S. Miller <davem@davemloft.net>
We lack proper synchronization to manipulate inet->opt ip_options
Problem is ip_make_skb() calls ip_setup_cork() and
ip_setup_cork() possibly makes a copy of ipc->opt (struct ip_options),
without any protection against another thread manipulating inet->opt.
Another thread can change inet->opt pointer and free old one under us.
Use RCU to protect inet->opt (changed to inet->inet_opt).
Instead of handling atomic refcounts, just copy ip_options when
necessary, to avoid cache line dirtying.
We cant insert an rcu_head in struct ip_options since its included in
skb->cb[], so this patch is large because I had to introduce a new
ip_options_rcu structure.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Even for keys that shouldn't be stored some use cases require the
knowledge of a new key having been created so that the conclusion of a
successful pairing can be made. Therefore, always send the mgmt_new_key
event but add a store_hint parameter to it to indicate to user space
whether the key should be stored or not.
Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
User space shouldn't have any need for the old key type so remove it
from the corresponding Management interface event.
Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Link keys should only be stored if very specific criteria of the
authentication process are fulfilled. This patch essentially copies the
criteria that user space has so far been using to the kernel side so
that the management interface works properly.
Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
When accepting a pairing request which fulfills the SSP auto-accept
criteria we need to push the request all the way to the user for
confirmation. This patch adds a new hint to the user_confirm_request
management event so user space can know when to show a numeric
comparison dialog and when to show a simple yes/no confirmation dialog.
Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Some test systems require an arbitrary delay to the auto-accept test
cases for Secure Simple Pairing in order for the tests to pass.
Previously when this was handled in user space it was worked around by
code modifications and recompilation, but now that it's on the kernel
side it's more convenient if there's a debugfs interface for it.
Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
This patch adds a new API for setting a TX rate mask in
drivers that have rate control in either the firmware or hardware.
This can be used for various purposes, for example, masking out the
11b rates in P2P operation.
Signed-off-by: Sujith Manoharan <Sujith.Manoharan@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Add API that allows low level drivers to notify mac80211 about TX
packet loss. This is useful when there are FW triggers to notify the
low level driver about these events.
Signed-off-by: Arik Nemtsov <arik@wizery.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Extend the mgmt_pin_code_request interface to require secure
pin code (16 digit) for authentication.
This is a kernel part of the secure pin code requirement notification
to user space agent.
Code styling fix by Johan Hedberg.
Signed-off-by: Waldemar Rymarkiewicz <waldemar.rymarkiewicz@tieto.com>
Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Keep the link key type together with connection and use it to
map security level to link key requirements. Authenticate and/or
encrypt connection if the link is insufficiently secure.
Signed-off-by: Waldemar Rymarkiewicz <waldemar.rymarkiewicz@tieto.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Introduce the link key types defs and use them instead of magic numbers.
Signed-off-by: Waldemar Rymarkiewicz <waldemar.rymarkiewicz@tieto.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
This patch adds a new event to the Management interface to track when
local adapters are discovering remote devices. For now this only tracks
BR/EDR discovery procedures.
Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com>
Signed-off-by: Anderson Briglia <anderson.briglia@openbossa.org>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
This patch adds start_discovery and stop_discovery commands to the
management interface. Right now their implementation is fairly
simplistic and the parameters are fixed to what user space has
defaulted to so far.
This is the very initial phase for discovery implementation into
the kernel. Next steps include name resolution, LE scanning and
bdaddr type handling.
Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com>
Signed-off-by: Anderson Briglia <anderson.briglia@openbossa.org>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
There is no need to the socket deal directly with the channel, most of the
time it cares about the channel only.
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
In this commit, omtu, imtu, flush_to, mode and sport. It also remove the
pi var from l2cap_sock_sendmsg().
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
l2cap_chan_connect() is a much better name and reflects what this
functions is doing (or will do once socket dependence is removed from the
core).
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
If the allocation happens at l2cap_sock_create() will be able to use the
struct l2cap_chan to store channel info that comes from the user via
setsockopt.
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
It's not used by anything in the kernel, and defined in net/route.h so
never exported to userspace.
Therefore we can safely remove it.
Signed-off-by: David S. Miller <davem@davemloft.net>
These functions are used together as a unit for route resolution
during connect(). They address the chicken-and-egg problem that
exists when ports need to be allocated during connect() processing,
yet such port allocations require addressing information from the
routing code.
It's currently more heavy handed than it needs to be, and in
particular we allocate and initialize a flow object twice.
Let the callers provide the on-stack flow object. That way we only
need to initialize it once in the ip_route_connect() call.
Later, if ip_route_newports() needs to do anything, it re-uses that
flow object as-is except for the ports which it updates before the
route re-lookup.
Also, describe why this set of facilities are needed and how it works
in a big comment.
Signed-off-by: David S. Miller <davem@davemloft.net>
Reviewed-by: Eric Dumazet <eric.dumazet@gmail.com>
Change the call to take the transport parameter and set the
cached 'dst' appropriately inside the get_dst() function calls.
This will allow us in the future to clean up source address
storage as well.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is no point in passing a destination address to
a get_saddr() call.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The ipv6 routing lookup does give us a source address,
but instead of filling it into the dst, it's stored in
the flowi. We can use that instead of going through the
entire source address selection again.
Also the useless ->dst_saddr member of sctp_pf is removed.
And sctp_v6_dst_saddr() is removed, instead by introduce
sctp_v6_to_addr(), which can be reused to cleanup some dup
code.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
These header files are never installed to user consumption, so any
__KERNEL__ cpp checks are superfluous.
Projects should also not copy these files into their userland utility
sources and try to use them there. If they insist on doing so, the
onus is on them to sanitize the headers as needed.
Signed-off-by: David S. Miller <davem@davemloft.net>
Add const qualifiers to structs iphdr, ipv6hdr and in6_addr pointers
where possible, to make code intention more obvious.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch implement event notification SCTP_SENDER_DRY_EVENT.
SCTP Socket API Extensions:
6.1.9. SCTP_SENDER_DRY_EVENT
When the SCTP stack has no more user data to send or retransmit, this
notification is given to the user. Also, at the time when a user app
subscribes to this event, if there is no data to be sent or
retransmit, the stack will immediately send up this notification.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch change the auth event type name to SCTP_AUTHENTICATION_EVENT,
which is based on API extension compliance.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch Implement socket option SCTP_GET_ASSOC_ID_LIST.
SCTP Socket API Extension:
8.2.6. Get the Current Identifiers of Associations
(SCTP_GET_ASSOC_ID_LIST)
This option gets the current list of SCTP association identifiers of
the SCTP associations handled by a one-to-many style socket.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make heartbeat information in sctp_make_heartbeat() instead
of make it in sctp_sf_heartbeat() directly for common using.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
SCTP does not check whether the source address of COOKIE-ECHO
chunk is the original address of INIT chunk or part of the any
address parameters saved in COOKIE in CLOSED state. So even if
the COOKIE-ECHO chunk is from any address but with correct COOKIE,
the COOKIE-ECHO chunk still be accepted. If the COOKIE is not from
a valid address, the assoc should not be established.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
SCTP does not SCTP_STATE_EMPTY and we can never be in
that state. Remove useless code.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When pos.v <= (void *)chunk + end - ntohs(pos.p->length) and
ntohs(pos.p->length) >= sizeof(sctp_paramhdr_t) these two expressions are all true,
pos.v <= (void *)chunk + end - sizeof(sctp_paramhdr_t) *must* be true.
This patch removes this kind of redundant check.
It's same to _sctp_walk_errors macro.
Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove SCTP_CMD_TRANSMIT command as it never be used.
Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The macro never be used.
And if needed, can use !sctp_chunk_is_data instead of.
Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This allows a driver to buffer frames for a PS station and tell mac80211
to wake it up even though mac80211 does not have any buffered frames for
it.
This is necessary for properly handling aggregation related buffering,
in ath9k, because the driver needs to keep its frames in order to keep
track of the Block-ACK window.
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
[ipv6] Add support for RTA_PREFSRC
This patch allows a user to select the preferred source address
for a specific IPv6-Route. It can be set via a netlink message
setting RTA_PREFSRC to a valid IPv6 address which must be
up on the device the route will be bound to.
Signed-off-by: Daniel Walter <dwalter@barracuda.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Harsh Prateek Bora <harsh@linux.vnet.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric VAn Hensbergen <ericvh@gmail.com>
Now that we use write_inode to flush server
cache related to fid, we don't need tsyncfs either fort dotl or dotu
protocols. For dotu this helps to do a more efficient server flush.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
fib_select_default() is a complete NOP, and completely pointless
to invoke, when we have no more than 1 default route installed.
And this is far and away the common case.
So remember how many prefixlen==0 routes we have in the routing
table, and elide the call when we have no more than one of those.
This cuts output route creation time by 157 cycles on Niagara2+.
In order to add the new int to fib_table, we have to correct the type
of ->tb_data[] to unsigned long, otherwise the private area will be
unaligned on 64-bit systems.
Signed-off-by: David S. Miller <davem@davemloft.net>
Reviewed-by: Eric Dumazet <eric.dumazet@gmail.com>
Add station connected time in debugfs. This will be helpful to get a
measure of stability of the connection and for debugging stress issues
Cc: Senthilkumar Balasubramanian <Senthilkumar.Balasubramanian@Atheros.com>
Signed-off-by: Mohammed Shafi Shajakhan <mshajakhan@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Notify userspace when a beacon/presp is received from a suitable mesh
peer candidate for whom no sta information exists. Userspace can then
decide to create a sta info for the candidate. If userspace is not
ready to authenticate the peer right away, it can create the sta info
with the authenticated flag unset and set it later.
Signed-off-by: Javier Cardona <javier@cozybit.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
During mesh setup, use NL80211_MESH_SETUP_USERSPACE_AUTH flag to create
a secure mesh and route management frames to userspace.
Also, NL80211_CMD_GET_WIPHY now returns a flag NL80211_SUPPORT_MESH_AUTH
if the wiphy's mesh implementation supports routing of mesh auth frames
to userspace. This is useful for forward compatibility between old
kernels and new userspace tools.
Signed-off-by: Javier Cardona <javier@cozybit.com>
Signed-off-by: Thomas Pedersen <thomas@cozybit.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
To NL80211_MESH_SETUP_IE. This reflects our ability to insert any ie
into a mesh beacon, not simply path selection ies.
Signed-off-by: Javier Cardona <javier@cozybit.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
In a highly noisy environment, the tx rate of the driver drops and
the application slows down since it has not yet received ACKs for
the frames already queued in the hardware. Since this ACK may take
more than 100ms, stopping the dev queues for entering PS at this
stage breaks applications, WMM test cases in my testing.
If there are frames already pending in the tx queue, postponing the
PS logic helps to avoid redundant queue stops. When power save is
enabled by default and in a noisy environment, this API certainly
helps in improving the average throughput.
Signed-off-by: Vivek Natarajan <vnatarajan@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Cleanup of new CAIF code.
* make local functions static
* remove code that is never used
* expand get_caif_conf() since wrapper is no longer needed
* make args to comparison functions const
* rename connect_req_to_link_param to keep exported names
consistent
Compile tested only.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (34 commits)
net: Add support for SMSC LAN9530, LAN9730 and LAN89530
mlx4_en: Restoring RX buffer pointer in case of failure
mlx4: Sensing link type at device initialization
ipv4: Fix "Set rt->rt_iif more sanely on output routes."
MAINTAINERS: add entry for Xen network backend
be2net: Fix suspend/resume operation
be2net: Rename some struct members for clarity
pppoe: drop PPPOX_ZOMBIEs in pppoe_flush_dev
dsa/mv88e6131: add support for mv88e6085 switch
ipv6: Enable RFS sk_rxhash tracking for ipv6 sockets (v2)
be2net: Fix a potential crash during shutdown.
bna: Fix for handling firmware heartbeat failure
can: mcp251x: Allow pass IRQ flags through platform data.
smsc911x: fix mac_lock acquision before calling smsc911x_mac_read
iwlwifi: accept EEPROM version 0x423 for iwl6000
rt2x00: fix cancelling uninitialized work
rtlwifi: Fix some warnings/bugs
p54usb: IDs for two new devices
wl12xx: fix potential buffer overflow in testmode nvs push
zd1211rw: reset rx idle timer from tasklet
...
The reverse path filter interferes with IPsec subnet-to-subnet tunnels,
especially when the link to the IPsec peer is on an interface other than
the one hosting the default route.
With dynamic routing, where the peer might be reachable through eth0
today and eth1 tomorrow, it's difficult to keep rp_filter enabled unless
fake routes to the remote subnets are configured on the interface
currently used to reach the peer.
IPsec provides a much stronger anti-spoofing policy than rp_filter, so
this patch disables the rp_filter for packets with a security path.
Signed-off-by: Michael Smith <msmith@cbnco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This makes sk_buff available for other use in fib_validate_source().
Signed-off-by: Michael Smith <msmith@cbnco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is part of "moving things to l2cap_chan". As one the first move it
triggered a big number of changes in the funcions parameters, basically
changing the struct sock param to struct l2cap_chan.
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
As we use struct list_head to keep L2CAP channels list the workaround with
del_list is not needed anymore.
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Use a well known Kernel API is always a good idea than implement your own
list.
In the future we might use RCU on this list.
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
struct l2cap_chan cames to create a clear separation between what
properties and data belongs to the L2CAP channel and what belongs to the
socket. By now we just fold the struct sock * in struct l2cap_chan as all
the channel info is struct l2cap_pinfo today.
In the next commits we will see a move of channel stuff to struct
l2cap_chan.
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Commit 1018b5c016 ("Set rt->rt_iif more
sanely on output routes.") breaks rt_is_{output,input}_route.
This became the cause to return "IP_PKTINFO's ->ipi_ifindex == 0".
To fix it, this does:
1) Add "int rt_route_iif;" to struct rtable
2) For input routes, always set rt_route_iif to same value as rt_iif
3) For output routes, always set rt_route_iif to zero. Set rt_iif
as it is done currently.
4) Change rt_is_{output,input}_route() to test rt_route_iif
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
This allows user-space monitoring of BSS parameters for the associated
station. This is useful for debugging and verifying that the paramaters
are as expected.
[Exactly the same as before but bundled into a single message]
Signed-off-by: Paul Stewart <pstew@chromium.org>
Cc: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
commit c6e1a0d12c broken the calc
(net: Allow no-cache copy from user on transmit)
of checksum, which may cause some tcp packets be dropped because
incorrect checksum. ssh does not work under today's net-next-2.6
tree.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Acked-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch uses __copy_from_user_nocache on transmit to bypass data
cache for a performance improvement. skb_add_data_nocache and
skb_copy_to_page_nocache can be called by sendmsg functions to use
this feature, initial support is in tcp_sendmsg. This functionality is
configurable per device using ethtool.
Presumably, this feature would only be useful when the driver does
not touch the data. The feature is turned on by default if a device
indicates that it does some form of checksum offload; it is off by
default for devices that do no checksum offload or indicate no checksum
is necessary. For the former case copy-checksum is probably done
anyway, in the latter case the device is likely loopback in which case
the no cache copy is probably not beneficial.
This patch was tested using 200 instances of netperf TCP_RR with
1400 byte request and one byte reply. Platform is 16 core AMD x86.
No-cache copy disabled:
672703 tps, 97.13% utilization
50/90/99% latency:244.31 484.205 1028.41
No-cache copy enabled:
702113 tps, 96.16% utilization,
50/90/99% latency 238.56 467.56 956.955
Using 14000 byte request and response sizes demonstrate the
effects more dramatically:
No-cache copy disabled:
79571 tps, 34.34 %utlization
50/90/95% latency 1584.46 2319.59 5001.76
No-cache copy enabled:
83856 tps, 34.81% utilization
50/90/95% latency 2508.42 2622.62 2735.88
Note especially the effect on latency tail (95th percentile).
This seems to provide a nice performance improvement and is
consistent in the tests I ran. Presumably, this would provide
the greatest benfits in the presence of an application workload
stressing the cache and a lot of transmit data happening.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds a new remote_name event to the Management interface
which is sent every time the name of a remote device is resolved (over
BR/EDR).
Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
This patch adds a device_found event to the Management interface. For
now the event only maps to BR/EDR inquiry result HCI events, but in the
future the plan is to also use it for the LE device discovery process.
Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
The description for buf_size was misleading and
just said you couldn't TX larger aggregates, but
of course you can't TX aggregates in a way that
would exceed the window either, which is possible
even if the aggregates are shorter than that.
Expand the description, thanks to Emmanuel for
explaining this to me.
Cc: Emmanuel Grumbach <egrumbach@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
ipvsadm -ln --daemon will trigger a Null pointer exception because
ip_vs_genl_dump_daemons() uses skb_net() instead of skb_sknet().
To prevent others from NULL ptr a check is made in ip_vs.h skb_net().
Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Patrick McHardy <kaber@trash.net>
auth_hmacs field of struct sctp_cookie is used for store
Requested HMAC Algorithm Parameter, and each HMAC Identifier
is 2 bytes, so the length should be:
SCTP_AUTH_NUM_HMACS * sizeof(__u16) + 2
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We can't send new commands before a cmd_complete for the HCI_RESET command
shows up.
Reported-by: Mikko Vinni <mmvinni@yahoo.com>
Reported-by: Justin P. Mattock <justinmattock@gmail.com>
Reported-by: Ed Tomlinson <edt@aei.ca>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Tested-by: Justin P. Mattock <justinmattock@gmail.com>
Tested-by: Mikko Vinni <mmvinni@yahoo.com>
Tested-by: Ed Tomlinson <edt@aei.ca>
This patch adds automated creation of the local EIR data based on what
16-bit UUIDs are registered and what the device name is. This should
cover the majority use cases, however things like 32/128-bit UUIDs, TX
power and Device ID will need to be added later to be on par with what
bluetoothd is capable of doing (without the Management interface).
Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
This patch adds commands to add and remove remote OOB data to the managment
interface. Remote data is stored in kernel and can be used by corresponding
HCI commands and events when needed.
Signed-off-by: Szymon Janc <szymon.janc@tieto.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
This patch adds a command to read local OOB data to the managment interface.
The command maps directly to the Read Local OOB Data HCI command.
Signed-off-by: Szymon Janc <szymon.janc@tieto.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
This patch adds a new set_local_name management command as well as a
local_name_changed management event. With these user space can both
change the local name as well as monitor changes to it by others.
Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
This patch adds the name of the adapter to the reply of the read_info
management command.
The management messages reserve 249 bytes for the name instead of 248
(like in the HCI spec) so that there is always a guarantee that it is
nul-terminated. That way it can safely be passed onto string
manipulation functions.
Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
This patch adds a clear define for the maximum device name length in HCI
messages and thereby avoids magic numbers in the code.
Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
On-stack initialization via assignment of flow structures are
expensive because GCC emits a memset() to clear the entire
structure out no matter what.
Add a helper for ipv4 output flow key setup which we can use to avoid
the memset.
Signed-off-by: David S. Miller <davem@davemloft.net>
Indicate an NL80211_CMD_DEL_STATION event when a station entry in
mac80211 is deleted to match with the NL80211_CMD_NEW_STATION event
that is used when the entry was added. This is needed, e.g., to allow
user space to remove a peer from RSN IBSS Authenticator state machine
to avoid re-authentication and re-keying delays when the peer is not
reachable anymore.
Signed-off-by: Jouni Malinen <jouni.malinen@atheros.com>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
My commit 6d55cb91a0 (gre: fix hard header destination
address checking) broke multicast.
The reason is that ip_gre used to get ipgre_header() calls with
zero destination if we have NOARP or multicast destination. Instead
the actual target was decided at ipgre_tunnel_xmit() time based on
per-protocol dissection.
Instead of allowing the "abuse" of ->header() calls with invalid
destination, this creates multicast mappings for ip_gre. This also
fixes "ip neigh show nud noarp" to display the proper multicast
mappings used by the gre device.
Reported-by: Doug Kehn <rdkehn@yahoo.com>
Signed-off-by: Timo Teräs <timo.teras@iki.fi>
Acked-by: Doug Kehn <rdkehn@yahoo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When we clone a xfrm state we have to assign the replay_esn
and the preplay_esn pointers to the state if we use the
new replay detection method. To this end, we add a
xfrm_replay_clone() function that allocates memory for
the replay detection and takes over the necessary values
from the original state.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Define some constant offsets for CALL_REQUEST based on the description
at <http://www.techfest.com/networking/wan/x25plp.htm> and the
definition of ROSE as using 10-digit (5-byte) addresses. Use them
consistently. Validate all implicit and explicit facilities lengths.
Validate the address length byte rather than either trusting or
assuming its value.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
We clone the child entry in skb_dst_pop before we call
skb_dst_drop(). Otherwise we might kill the child right
before we return it to the caller.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When we set up the flow informations in ip_route_newports(), we take
the address informations from the the rt_key_src and rt_key_dst fields
of the rtable. They appear to be empty. So take the address
informations from rt_src and rt_dst instead. This issue was introduced
by commit 5e2b61f784 ("ipv4: Remove
flowi from struct rtable.")
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move the scope value out of the fib alias entries and into fib_info,
so that we always use the correct scope when recomputing the nexthop
cached source address.
Reported-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
Any operation that:
1) Brings up an interface
2) Adds an IP address to an interface
3) Deletes an IP address from an interface
can potentially invalidate the nh_saddr value, requiring
it to be recomputed.
Perform the recomputation lazily using a generation ID.
Reported-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
We can't send new commands before a cmd_complete for the HCI_RESET command
shows up.
Reported-by: Mikko Vinni <mmvinni@yahoo.com>
Reported-by: Justin P. Mattock <justinmattock@gmail.com>
Reported-by: Ed Tomlinson <edt@aei.ca>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Tested-by: Justin P. Mattock <justinmattock@gmail.com>
Tested-by: Mikko Vinni <mmvinni@yahoo.com>
Tested-by: Ed Tomlinson <edt@aei.ca>
commit fd245a4adb (net_sched: move TCQ_F_THROTTLED flag)
added a race.
qdisc_watchdog() is run from softirq, so special care should be taken or
we can lose one state transition (THROTTLED/RUNNING)
Prior to fd245a4adb, we were manipulating q->flags (qdisc->flags &=
~TCQ_F_THROTTLED;) and this manipulation could only race with
qdisc_warn_nonwc().
Since we want to avoid atomic ops in qdisc fast path - it was the
meaning of commit 3711210576 (QDISC_STATE_RUNNING dont need atomic
bit ops) - fix is to move THROTTLE bit into 'state' field, this one
being manipulated with SMP and IRQ safe operations.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This avoids explicit cast to avoid 'discards qualifiers'
compiler warning in a netfilter patch that i've been working on.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alex Sidorenko reported for problems with local
routes left after IP addresses are deleted. It happens
when same IPs are used in more than one subnet for the
device.
Fix fib_del_ifaddr to restrict the checks for duplicate
local and broadcast addresses only to the IFAs that use
our primary IFA or another primary IFA with same address.
And we expect the prefsrc to be matched when the routes
are deleted because it is possible they to differ only by
prefsrc. This patch prevents local and broadcast routes
to be leaked until their primary IP is deleted finally
from the box.
As the secondary address promotion needs to delete
the routes for all secondaries that used the old primary IFA,
add option to ignore these secondaries from the checks and
to assume they are already deleted, so that we can safely
delete the route while these IFAs are still on the device list.
Reported-by: Alex Sidorenko <alexandre.sidorenko@hp.com>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
We dont need to test if we run from softirq context, we definitely are.
This saves few instructions in ip_rcv() & ip_rcv_finish()
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix many of each of these warnings:
Warning(include/net/cfg80211.h:519): No description found for parameter 'rxrate'
Warning(include/net/mac80211.h:1163): bad line:
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1480 commits)
bonding: enable netpoll without checking link status
xfrm: Refcount destination entry on xfrm_lookup
net: introduce rx_handler results and logic around that
bonding: get rid of IFF_SLAVE_INACTIVE netdev->priv_flag
bonding: wrap slave state work
net: get rid of multiple bond-related netdevice->priv_flags
bonding: register slave pointer for rx_handler
be2net: Bump up the version number
be2net: Copyright notice change. Update to Emulex instead of ServerEngines
e1000e: fix kconfig for crc32 dependency
netfilter ebtables: fix xt_AUDIT to work with ebtables
xen network backend driver
bonding: Improve syslog message at device creation time
bonding: Call netif_carrier_off after register_netdevice
bonding: Incorrect TX queue offset
net_sched: fix ip_tos2prio
xfrm: fix __xfrm_route_forward()
be2net: Fix UDP packet detected status in RX compl
Phonet: fix aligned-mode pipe socket buffer header reserve
netxen: support for GbE port settings
...
Fix up conflicts in drivers/staging/brcm80211/brcmsmac/wl_mac80211.c
with the staging updates.
* 'tty-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty-2.6: (76 commits)
pch_uart: reference clock on CM-iTC
pch_phub: add new device ML7213
n_gsm: fix UIH control byte : P bit should be 0
n_gsm: add a documentation
serial: msm_serial_hs: Add MSM high speed UART driver
tty_audit: fix tty_audit_add_data live lock on audit disabled
tty: move cd1865.h to drivers/staging/tty/
Staging: tty: fix build with epca.c driver
pcmcia: synclink_cs: fix prototype for mgslpc_ioctl()
Staging: generic_serial: fix double locking bug
nozomi: don't use flush_scheduled_work()
tty/serial: Relax the device_type restriction from of_serial
MAINTAINERS: Update HVC file patterns
tty: phase out of ioctl file pointer for tty3270 as well
tty: forgot to remove ipwireless from drivers/char/pcmcia/Makefile
pch_uart: Fix DMA channel miss-setting issue.
pch_uart: fix exclusive access issue
pch_uart: fix auto flow control miss-setting issue
pch_uart: fix uart clock setting issue
pch_uart : Use dev_xxx not pr_xxx
...
Fix up trivial conflicts in drivers/misc/pch_phub.c (same patch applied
twice, then changes to the same area in one branch)
If a transport prefers payload to be sent separate from the PDU
(P9_TRANS_PREF_PAYLOAD_SEP), there is no need to allocate msize
PDU buffers(struct p9_fcall).
This patch allocates only upto 4k buffers for this kind of transports
and there won't be any change to the legacy transports.
Hence, this patch on top of zero copy changes allows user to
specify higher msizes through the mount option
without hogging the kernel heap.
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
This patch adds preferences field to the p9_trans_module.
Through this, now transport layer can express its preference about the
payload. i.e if payload neds to be part of the PDU or it prefers it
to be sent sepearetly so that the transport layer can handle it in
a better way.
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
This patch prepares p9_fcall structure for zero copy. Added
fields send the payload buffer information to the transport layer.
In addition it adds a 'private' field for the transport layer to
store mapped/pinned page information so that it can be freed/unpinned
during req_done.
This patch also creates trans_common.[ch] to house helper functions.
It adds the following helper functions.
p9_release_req_pages - Release pages after the transaction.
p9_nr_pages - Return number of pages needed to accomodate the payload.
payload_gup - Translates user buffer into kernel pages.
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
ip_vs_conntrack_enabled() becomes a noop when CONFIG_SYSCTL is undefined.
In preparation for not including sysctl_conntrack in
struct netns_ipvs when CONFIG_SYCTL is not defined.
Signed-off-by: Simon Horman <horms@verge.net.au>
In preparation for not including sysctl_sync_ver in
struct netns_ipvs when CONFIG_SYCTL is not defined.
Signed-off-by: Simon Horman <horms@verge.net.au>
In preparation for not including sysctl_sync_threshold in
struct netns_ipvs when CONFIG_SYCTL is not defined.
Signed-off-by: Simon Horman <horms@verge.net.au>
Rename ip_vs_new_estimator to ip_vs_start_estimator
and ip_vs_kill_estimator to ip_vs_stop_estimator to better
match their logic.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Move the estimator reading from estimation_timer to user
context. ip_vs_read_estimator() will be used to decode the rate
values. As the decoded rates are not set by estimation timer
there is no need to reset them in ip_vs_zero_stats.
There is no need ip_vs_new_estimator() to encode stats
to rates, if the destination is in trash both the stats and the
rates are inactive.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Remove ustats_seq, IPVS_STAT_INC and IPVS_STAT_ADD
because they are not used. They were replaced with u64_stats.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Currently, the new percpu counters are not zeroed and
the zero commands do not work as expected, we still show the old
sum of percpu values. OTOH, we can not reset the percpu counters
from user context without causing the incrementing to use old
and bogus values.
So, as Eric Dumazet suggested fix that by moving all overhead
to stats reading in user context. Do not introduce overhead in
timer context (estimator) and incrementing (packet handling in
softirqs).
The new ustats0 field holds the zero point for all
counter values, the rates always use 0 as base value as before.
When showing the values to user space just give the difference
between counters and the base values. The only drawback is that
percpu stats are not zeroed, they are accessible only from /proc
and are new interface, so it should not be a compatibility problem
as long as the sum stats are correct after zeroing.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
The global tot_stats contains cpustats field just like the
stats for dest and svc, so better use it to simplify the usage
in estimation_timer. As tot_stats is registered as estimator
we can remove the special ip_vs_read_cpu_stats call for
tot_stats. Fix ip_vs_read_cpu_stats to be called under
stats lock because it is still used as synchronization between
estimation timer and user context (the stats readers).
Also, make sure ip_vs_stats_percpu_show reads properly
the u64 stats from user context.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Remove include/net/netns/ip_vs.h because it depends on
structures from include/net/ip_vs.h. As ipvs is pointer in
struct net it is better to move struct netns_ipvs into
include/net/ip_vs.h, so that we can easily use other structures
in struct netns_ipvs.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
There's no sense to 'ct = ct = ' in ip_vs_notrack(). Just assign
nf_ct_get()'s return value directly to the pointer variable 'ct' once.
Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: Simon Horman <horms@verge.net.au>
This patch adds support for IPsec extended sequence numbers (esn)
as defined in RFC 4303. The bits to manage the anti-replay window
are based on a patch from Alex Badea.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
To support multiple versions of replay detection, we move the replay
detection functions to a separate file and make them accessible
via function pointers contained in the struct xfrm_replay.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
To support IPsec extended sequence numbers, we split the
output sequence numbers of xfrm_skb_cb in low and high order 32 bits
and we add the high order 32 bits to the input sequence numbers.
All users are updated accordingly.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the struct xfrm_replay_state_esn which will be
used to support IPsec extended sequence numbers and anti replay windows
bigger than 32 packets. Also we add a function that returns the actual
size of the xfrm_replay_state_esn, a xfrm netlink atribute and a xfrm state
flag for the use of extended sequence numbers.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
To start doing these conversions, we need to add some temporary
flow4_* macros which will eventually go away when all the protocol
code paths are changed to work on AF specific flowi objects.
Signed-off-by: David S. Miller <davem@davemloft.net>
This is just a shorthand which will help in passing around AF
specific flow structures as generic ones.
Signed-off-by: David S. Miller <davem@davemloft.net>
Now we have struct flowi4, flowi6, and flowidn for each address
family. And struct flowi is just a union of them all.
It might have been troublesome to convert flow_cache_uli_match() but
as it turns out this function is completely unused and therefore can
be simply removed.
Signed-off-by: David S. Miller <davem@davemloft.net>
Create two sets of port member accessors, one set prefixed by fl4_*
and the other prefixed by fl6_*
This will let us to create AF optimal flow instances.
It will work because every context in which we access the ports,
we have to be fully aware of which AF the flowi is anyways.
Signed-off-by: David S. Miller <davem@davemloft.net>
I intend to turn struct flowi into a union of AF specific flowi
structs. There will be a common structure that each variant includes
first, much like struct sock_common.
This is the first step to move in that direction.
Signed-off-by: David S. Miller <davem@davemloft.net>
The idea here is this minimizes the number of places one has to edit
in order to make changes to how flows are defined and used.
Signed-off-by: David S. Miller <davem@davemloft.net>
This moves most of the accept logic to process context like other
socket stacks do. Then we can use a few more common socket helpers
and simplify a bit.
Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We have to use cfg->fc_scope not the final nh_scope value.
Reported-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
When doing output route lookups, we have to select the source address
if the user has not specified an explicit one.
First, if the route has an explicit preferred source address
specified, then we use that.
Otherwise we search the route's outgoing interface for a suitable
address.
This search can be precomputed and cached at route insertion time.
The only missing part is that we have to refresh this precomputed
value any time addresses are added or removed from the interface, and
this is accomplished by fib_update_nh_saddrs().
Signed-off-by: David S. Miller <davem@davemloft.net>
The only necessary parts are the src/dst addresses, the
interface indexes, the TOS, and the mark.
The rest is unnecessary bloat, which amounts to nearly
50 bytes on 64-bit.
Signed-off-by: David S. Miller <davem@davemloft.net>
Because of various alignements [SLUB / qdisc], we use 512 bytes of
memory for one {p|b}fifo qdisc, instead of 256 bytes on 64bit arches and
192 bytes on 32bit ones.
Move the "u32 limit" inside "struct Qdisc" (no impact on other qdiscs)
Change qdisc_alloc(), first trying a regular allocation before an
oversized one.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the support for retrieving the remote or peer DCBX
configuration via dcbnl for embedded DCBX stacks supporting the CEE DCBX
standard.
Signed-off-by: Shmulik Ravid <shmulikr@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
These 2 patches add the support for retrieving the remote or peer DCBX
configuration via dcbnl for embedded DCBX stacks. The peer configuration
is part of the DCBX MIB and is useful for debugging and diagnostics of
the overall DCB configuration. The first patch add this support for IEEE
802.1Qaz standard the second patch add the same support for the older
CEE standard. Diff for v2 - the peer-app-info is CEE specific.
Signed-off-by: Shmulik Ravid <shmulikr@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Return a dst pointer which is potentitally error encoded.
Don't pass original dst pointer by reference, pass a struct net
instead of a socket, and elide the flow argument since it is
unnecessary.
Signed-off-by: David S. Miller <davem@davemloft.net>
Route lookups follow a general pattern in the ipv6 code wherein
we first find the non-IPSEC route, potentially override the
flow destination address due to ipv6 options settings, and then
finally make an IPSEC search using either xfrm_lookup() or
__xfrm_lookup().
__xfrm_lookup() is used when we want to generate a blackhole route
if the key manager needs to resolve the IPSEC rules (in this case
-EREMOTE is returned and the original 'dst' is left unchanged).
Otherwise plain xfrm_lookup() is used and when asynchronous IPSEC
resolution is necessary, we simply fail the lookup completely.
All of these cases are encapsulated into two routines,
ip6_dst_lookup_flow and ip6_sk_dst_lookup_flow. The latter of which
handles unconnected UDP datagram sockets.
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch converts UDP to use the new ip_finish_skb API. This
would then allows us to more easily use ip_make_skb which allows
UDP to run without a socket lock.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the helper ip_make_skb which is like ip_append_data
and ip_push_pending_frames all rolled into one, except that it does
not send the skb produced. The sending part is carried out by
ip_send_skb, which the transport protocol can call after it has
tweaked the skb.
It is meant to be called in cases where corking is not used should
have a one-to-one correspondence to sendmsg.
This patch also adds the helper ip_finish_skb which is meant to
be replace ip_push_pending_frames when corking is required.
Previously the protocol stack would peek at the socket write
queue and add its header to the first packet. With ip_finish_skb,
the protocol stack can directly operate on the final skb instead,
just like the non-corking case with ip_make_skb.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In order to allow simultaneous calls to ip_append_data on the same
socket, it must not modify any shared state in sk or inet (other
than those that are designed to allow that such as atomic counters).
This patch abstracts out write references to sk and inet_sk in
ip_append_data and its friends so that we may use the underlying
code in parallel.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Also fix a typo in the STATION_INFO_TX_BITRATE description
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Most mgmt commands and event are related to hci adapter. Moving index to
common header allow to easily use it in command status while reporting errors.
For those not related to adapter use MGMT_INDEX_NONE (0xFFFF) as index.
Signed-off-by: Szymon Janc <szymon.janc@tieto.com>
Acked-by: Johan Hedberg <johan.hedberg@nokia.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
For devices supported by iwlwifi sometimes
off-channel transmissions need to be handled
by the device completely. To support this
mac80211 needs to pass the frame directly
to the driver and not through the TX path
as the driver needs the frame and channel
information at the same time.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The return value of the tx operation is commonly
misused by drivers, leading to errors. All drivers
will drop frames if they fail to TX the frame, and
they must also properly manage the queues (if they
didn't, mac80211 would already warn).
Removing the ability for drivers to return a BUSY
value also allows significant cleanups of the TX
TX handling code in mac80211.
Note that this also fixes a bug in ath9k_htc, the
old "return -1" there was wrong.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Tested-by: Sedat Dilek <sedat.dilek@googlemail.com> [ath5k]
Acked-by: Gertjan van Wingerde <gwingerde@gmail.com> [rt2x00]
Acked-by: Larry Finger <Larry.Finger@lwfinger.net> [b43, rtl8187, rtlwifi]
Acked-by: Luciano Coelho <coelho@ti.com> [wl12xx]
Signed-off-by: John W. Linville <linville@tuxdriver.com>
* Do not fail if the peer supports more or less than 3 algorithms.
* Ignore unknown congestion control algorithms instead of failing.
* Simplify congestion algorithm negotiation (largest is best).
* Do not use a static buffer.
* Fix off-by-two read overflow.
* Avoid extra memory copy (in addition to skb_copy_bits()).
The previous code really made no sense.
Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
sk->sk_state already contains the pipe state.
Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
lc and wlc use the same formula, but lblc and lblcr use another one. There
is no reason for using two different formulas for the lc variants.
The formula used by lc is used by all the lc variants in this patch.
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Acked-by: Wensong Zhang <wensong@linux-vs.org>
Signed-off-by: Simon Horman <horms@verge.net.au>
ip_route_newports() is the only place in the entire kernel that
cares about the port members in the routing cache entry's lookup
flow key.
Therefore the only reason we store an entire flow inside of the
struct rtentry is for this one special case.
Rewrite ip_route_newports() such that:
1) The caller passes in the original port values, so we don't need
to use the rth->fl.fl_ip_{s,d}port values to remember them.
2) The lookup flow is constructed by hand instead of being copied
from the routing cache entry's flow.
Signed-off-by: David S. Miller <davem@davemloft.net>
netem_skb_cb() does :
return (struct netem_skb_cb *)qdisc_skb_cb(skb)->data;
Unfortunatly struct qdisc_skb_cb data is not long word aligned, so
access to psched_time_t time_to_send uses a non aligned access.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The flag isn't very descriptive -- the intention
is that the driver provides a TSF timestamp at
the beginning of the MPDU -- make that clearer
by renaming the flag to RX_FLAG_MACTIME_MPDU.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Add proper RCU annotations/verbs to sk_wq and wq members
Fix __sctp_write_space() sk_sleep() abuse (and sock->wq access)
Fix sunrpc sk_sleep() abuse too
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This commit adds the missing IPv6 multicast address flag defines to
complement the already existing multicast address scope defines and to
be able to check these flags nicely in the future.
Signed-off-by: Linus Lüssing <linus.luessing@web.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
To properly track bonding completion an event to indicate authentication
failure is needed. This event will be sent whenever an authentication
complete HCI event with a non-zero status comes. It will also be sent
when we're acting in acceptor role for SSP authentication in which case
the controller will send a Simple Pairing Complete event.
Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
The command complete event for mgmt_pin_code_reply &
mgmt_pin_code_neg_reply should have the adapter index, Bluetooth address
as well as the status.
Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
This patch adds support for the user confirmation (numeric comparison)
Secure Simple Pairing authentication method.
Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
This patch adds a new mgmt_pair_device which can be used to initiate a
dedicated bonding procedure. Some extra callbacks are added to the
hci_conn struct so that the pairing code can get notified of the
completion of the procedure.
Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Now, TCP_CHECK_TIMER is not used for debuging, it does nothing.
And, it has been there for several years, maybe 6 years.
Remove it to keep code clearer.
Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The only troublesome bit here is __mkroute_output which wants
to override res->fi and res->type, compute those in local
variables instead.
Signed-off-by: David S. Miller <davem@davemloft.net>
This allows avoiding multiple writes to the initial __refcnt.
The most simplest cases of wanting an initial reference of "1"
in ipv4 and ipv6 have been converted, the rest have been left
along and kept at the existing "0".
Signed-off-by: David S. Miller <davem@davemloft.net>
Only oddities here are a couple of drivers that bogusly called the ldisc
helpers instead of returning -ENOIOCTLCMD. Fix the bug and the rest goes
away.
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Doing tiocmget was such fun we should do tiocmset as well for the same
reasons
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
We don't actually need this and it causes problems for internal use of
this functionality. Currently there is a single use of the FILE * pointer.
That is the serial core which uses it to check tty_hung_up_p. However if
that is true then IO_ERROR is also already set so the check may be removed.
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Assigning a socket in timewait state to skb->sk can trigger
kernel oops, e.g. in nfnetlink_log, which does:
if (skb->sk) {
read_lock_bh(&skb->sk->sk_callback_lock);
if (skb->sk->sk_socket && skb->sk->sk_socket->file) ...
in the timewait case, accessing sk->sk_callback_lock and sk->sk_socket
is invalid.
Either all of these spots will need to add a test for sk->sk_state != TCP_TIME_WAIT,
or xt_TPROXY must not assign a timewait socket to skb->sk.
This does the latter.
If a TW socket is found, assign the tproxy nfmark, but skip the skb->sk assignment,
thus mimicking behaviour of a '-m socket .. -j MARK/ACCEPT' re-routing rule.
The 'SYN to TW socket' case is left unchanged -- we try to redirect to the
listener socket.
Cc: Balazs Scheidler <bazsi@balabit.hu>
Cc: KOVACS Krisztian <hidden@balabit.hu>
Signed-off-by: Florian Westphal <fwestphal@astaro.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
If the new connection update parameter are accepted, the LE master
host sends the LE Connection Update Command to its controller informing
the new requested parameters.
Signed-off-by: Claudio Takahasi <claudio.takahasi@openbossa.org>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Use proper timer instead of hci command flow control to timeout
failed hci commands. Otherwise stack ends up sending commands
when flow control is used to block new commands.
2010-09-01 18:29:41.592132 < HCI Command: Remote Name Request (0x01|0x0019) plen 10
bdaddr 00:16:CF:E1:C7:D7 mode 2 clkoffset 0x0000
2010-09-01 18:29:41.592681 > HCI Event: Command Status (0x0f) plen 4
Remote Name Request (0x01|0x0019) status 0x00 ncmd 0
2010-09-01 18:29:51.022033 < HCI Command: Remote Name Request Cancel (0x01|0x001a) plen 6
bdaddr 00:16:CF:E1:C7:D7
Signed-off-by: Ville Tervo <ville.tervo@nokia.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Implements L2CAP Connection Parameter Update Response defined in
the Bluetooth Core Specification, Volume 3, Part A, section 4.21.
Address the LE Connection Parameter Procedure initiated by the slave.
Connection Interval Minimum and Maximum have the same range: 6 to
3200. Time = N * 1.25ms. Minimum shall be less or equal to Maximum.
The Slave Latency field shall have a value in the range of 0 to
((connSupervisionTimeout / connIntervalMax) - 1). Latency field shall
be less than 500. connSupervisionTimeout = Timeout Multiplier * 10 ms.
Multiplier field shall have a value in the range of 10 to 3200.
Signed-off-by: Claudio Takahasi <claudio.takahasi@openbossa.org>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
This patch splits the L2CAP command handling function in order to
have a clear separation between the commands related to BR/EDR and
LE. Commands and responses in the LE signaling channel are not being
handled yet, command reject is sent to all received requests. Bluetooth
Core Specification, Volume 3, Part A, section 4 defines the signaling
packets formats and allowed commands/responses over the LE signaling
channel.
Signed-off-by: Claudio Takahasi <claudio.takahasi@openbossa.org>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Add support for LE server sockets.
Signed-off-by: Ville Tervo <ville.tervo@nokia.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Add basic LE connection support to L2CAP. LE
connection can be created by specifying cid
in struct sockaddr_l2
Signed-off-by: Ville Tervo <ville.tervo@nokia.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Bluetooth chips may have separate buffers for LE traffic.
This patch add support to use LE buffers provided by the chip.
Signed-off-by: Ville Tervo <ville.tervo@nokia.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Bluetooth V4.0 adds support for Low Energy (LE) connections.
Specification introduces new set of hci commands to control LE
connection. This patch adds logic to create, cancel and disconnect
LE connections.
Signed-off-by: Ville Tervo <ville.tervo@nokia.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Add needed HCI command and event structs to
create LE connections.
Signed-off-by: Ville Tervo <ville.tervo@nokia.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
When IP_VS schedulers do not find a destination, they output a terse
"WLC: no destination available" message through kernel syslog, which I
can not only make sense of because syslog puts them in a logfile
together with keepalived checker results.
This patch makes the output a bit more informative, by telling you which
virtual service failed to find a destination.
Example output:
kernel: [1539214.552233] IPVS: wlc: TCP 192.168.8.30:22 - no destination available
kernel: [1539299.674418] IPVS: wlc: FWM 22 0x00000016 - no destination available
I have tested the code for IPv4 and FWM services, as you can see from
the example; I do not have an IPv6 setup to test the third code path
with.
To avoid code duplication, I put a new function ip_vs_scheduler_err()
into ip_vs_sched.c, and use that from the schedulers instead of calling
IP_VS_ERR_RL directly.
Signed-off-by: Patrick Schaaf <netdev@bof.de>
Signed-off-by: Simon Horman <horms@verge.net.au>
l2cap_load() was added to trigger l2cap.ko module loading from the RFCOMM
and BNEP modules. Now that L2CAP module is gone, we don't need it anymore.
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Actually doesn't make sense have these modules built separately.
The L2CAP layer is needed by almost all Bluetooth protocols and profiles.
There isn't any real use case without having L2CAP loaded.
SCO is only essential for Audio transfers, but it is so small that we can
have it loaded always in bluetooth.ko without problems.
If you really doesn't want it you can disable SCO in the kernel config.
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
If we didn't have a routing cache, we would not be able to properly
propagate certain kinds of dynamic path attributes, for example
PMTU information and redirects.
The reason is that if we didn't have a routing cache, then there would
be no way to lookup all of the active cached routes hanging off of
sockets, tunnels, IPSEC bundles, etc.
Consider the case where we created a cached route, but no inetpeer
entry existed and also we were not asked to pre-COW the route metrics
and therefore did not force the creation a new inetpeer entry.
If we later get a PMTU message, or a redirect, and store this
information in a new inetpeer entry, there is no way to teach that
cached route about the newly existing inetpeer entry.
The facilities implemented here handle this problem.
First we create a generation ID. When we create a cached route of any
kind, we remember the generation ID at the time of attachment. Any
time we force-create an inetpeer entry in response to new path
information, we bump that generation ID.
The dst_ops->check() callback is where the knowledge of this event
is propagated. If the global generation ID does not equal the one
stored in the cached route, and the cached route has not attached
to an inetpeer yet, we look it up and attach if one is found. Now
that we've updated the cached route's information, we update the
route's generation ID too.
This clears the way for implementing PMTU and redirects directly in
the inetpeer cache. There is absolutely no need to consult cached
route information in order to maintain this information.
At this point nothing bumps the inetpeer genids, that comes in the
later changes which handle PMTUs and redirects using inetpeers.
Signed-off-by: David S. Miller <davem@davemloft.net>
Validity of the cached PMTU information is indicated by it's
expiration value being non-zero, just as per dst->expires.
The scheme we will use is that we will remember the pre-ICMP value
held in the metrics or route entry, and then at expiration time
we will restore that value.
In this way PMTU expiration does not kill off the cached route as is
done currently.
Redirect information is permanent, or at least until another redirect
is received.
Signed-off-by: David S. Miller <davem@davemloft.net>
Future changes will add caching information, and some of
these new elements will be addresses.
Since the family is implicit via the ->daddr.family member,
replicating the family in ever address we store is entirely
redundant.
Signed-off-by: David S. Miller <davem@davemloft.net>
The Linux IPv4 AH stack aligns the AH header on a 64 bit boundary
(like in IPv6). This is not RFC compliant (see RFC4302, Section
3.3.3.2.1), it should be aligned on 32 bits.
For most of the authentication algorithms, the ICV size is 96 bits.
The AH header alignment on 32 or 64 bits gives the same results.
However for SHA-256-128 for instance, the wrong 64 bit alignment results
in adding useless padding in IPv4 AH, which is forbidden by the RFC.
To avoid breaking backward compatibility, we use a new flag
(XFRM_STATE_ALIGN4) do change original behavior.
Initial patch from Dang Hongwu <hongwu.dang@6wind.com> and
Christophe Gouault <christophe.gouault@6wind.com>.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It causes the move of the declaration of 3 functions to l2cap.h:
l2cap_get_ident(), l2cap_send_cmd(), l2cap_build_conf_req()
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
This patch tries to do the minimal to move l2cap_sock_create() and its
dependencies to l2cap_sock.c. It create a API to initialize and cleanup
the L2CAP sockets from l2cap_core.c through l2cap_init_sockets() and
l2cap_cleanup_sockets().
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
This patch adds a new set_io_capability management command which is used
to set the IO capability for Secure Simple Pairing (SSP) as well as the
Security Manager Protocol (SMP). The value is per hci_dev and each
hci_conn object inherits it upon creation.
Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
This patch adds the necessary commands and events needed to communicate
PIN code related actions between the kernel and userspace. This includes
a pin_code_request event as well as pin_code_reply and
pin_code_negative_reply commands.
Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
This patch adds a get_connections command to the management interface.
With this command userspace can get the current list of connected
devices. Typically this command would only be used once when enumerating
existing adapters. After that the connected and disconnected events are
used to track connections.
Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
This patch add a new connect failed management event to track failures
in connecting to remote devices. It is particularly useful for security
mode 3 scenarios when we don't have a connected state while pairing but
still need to detect when the connect attempt failed.
Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
This patch adds a disconnect command to the managment interface. Using
this command user space is able to force the disconnection of connected
devices. The command maps directly to the Disconnect HCI command.
Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
This patch adds connected and disconnected managment events to track the
connection status to remote devices. The events map directly to
successful connection complete and disconnection complete HCI events for
ACL links.
Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
This patch adds a management commands to feed the kernel with all stored
link keys as well as remove specific ones or all of them. Once the
load_keys command has been called the kernel takes over link key
replies. A new_key event is also added to inform userspace of newly
created link keys that should be stored permanently.
Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
This patch adds the possibility for user space to fully control the
Class of Device value of local adapters. To control the service class
bits each UUID that's added comes with a service class "hint" which acts
as a mask of bits that the UUID needs to have enabled. The
set_service_cache management command is used to make sure we queue up
all UUID changes as user space initializes its drivers and then send a
single HCI_Write_Class_of_Device command when initialization is
complete.
Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Using the managment interface means that user space doesn't need to do
any HCI command sending at all. This patch moves the remaining
initialization commands from user space to the kernel side. The patch
makes use of the new feature of __hci_request which allows the request
to be dynamically modified while it is ongoing (something that is needed
to react appropriately to the local features and the version of the
adapter).
Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
The controller may have link keys in its own memory and these keys could
be used for secure connections. However, since the interface to access
these keys doesn't provide information about the key types (which would
be needed to infer the level of security each key provides) using these
keys is rather useless. Therefore, simply clear the controller side list
in the initialization procedure.
Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
To support a more dynamic HCI initialization sequence the __hci_request
behavior requires some more changes. Particularly, the init sequence
should be able to have conditionals in it (sending some HCI commands
depending on the outcome of a previous command) instead of being a fixed
list as it is right now.
The reasons for these additional requirements are the moving all
previously user space driven initialization commands to the kernel side
as well as the support the Low Energy controllers.
To fulfull these requirements the init sequence is made the only special
case for multi-command requests and req_last_cmd is renamed to
init_last_cmd. The hci_send_cmd function is changed to update
init_last_cmd as long as the HCI_INIT flag is set.
Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
This patch adds the necessary logic to act accordingly when the
HCI_PAIRABLE flag is not set. In that case PIN code replies as well as
Secure Simple Pairing requests without a NoBonding requirement need to
be rejected.
Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
This patch adds methods to the management interface for userspace to
notify the kernel of which services have been registered for specific
adapters. This information is needed for setting the appropriate Class
of Device value as well as the Extended Inquiry Response value. This
patch doesn't actually implement setting of these values but just
provides the storage of the UUIDs so the needed functionality can be
built on top of it.
Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
This patch implements a new set_pairable management command to control
the pairable state of local adapters. The state is represented using a
new HCI_PAIRABLE flag in the hci_dev struct.
For backwards compatibility with older user space versions the
HCI_PAIRABLE flag gets automatically set when the existence of an
adapter is reported to user space through legacy methods and the
HCI_MGMT flag is not set.
Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
This patch adds a HCI_MGMT flag to track adapters which are under the
control of the management interface. This is needed to make sure that
new kernels will work with old user space versions. I.e. behaviour which
could break old user space versions (but is needed by the management
interface) should not be exhibited when the HCI_MGMT flag is not set.
Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
The powered, connectable and discoverable messages all have the same
format. By using a single struct for all of them a lot of code can be
simplified and reused.
Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
This patch adds a set_connectable command as well as a corresponding
event to the management interface. It's mainly useful for setting an
adapter as connectable from a non-initialized state as well as setting
an already initialized adapter as non-connectable (mostly useful for
qualification purposes).
Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
This patch adds a set_discoverable command to the management interface
as well as the corresponding event. The command is used to control the
discoverable state of adapters.
Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
This patch adds a set_powered command to the management interface
through which the powered state of local adapters can be controlled.
Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
This patch adds support for the powered event that's used to indicate to
userspace when the powered state of a local adapter changes.
Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
This patch implements automatic initialization of basic information
about newly registered Bluetooth adapters. E.g. the address and features
are always needed so it makes sense for the kernel to automatically
power on adapters and read this information. A new HCI_SETUP flag is
added to track this state.
In order to not consume unnecessary amounts of power if there isn't a
user space available that could switch the adapter back off, a timer is
added to do this automatically as long as no Bluetooth user space seems
to be present. A new HCI_AUTO_OFF flag is added that user space needs to
clear to avoid the automatic power off.
Additionally, the management interface index_added event is moved to the
end of the HCI_SETUP stage so a user space supporting the managment
inteface has all the necessary information available for fetching when
it gets notified of a new adapter. The HCI_DEV_REG event is kept in the
same place as before since existing HCI raw socket based user space
versions depend on seeing the kernels initialization sequence
(hci_init_req) to determine when the adapter is ready for use.
Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Modification of Nick Pelly <npelly@google.com> patch.
With Bluetooth 2.1 ACL packets can be flushable or non-flushable. This commit
makes ACL data packets non-flushable by default on compatible chipsets, and
adds the BT_FLUSHABLE socket option to explicitly request flushable ACL
data packets for a given L2CAP socket. This is useful for A2DP data which can
be safely discarded if it can not be delivered within a short time (while
other ACL data should not be discarded).
Note that making ACL data flushable has no effect unless the automatic flush
timeout for that ACL link is changed from its default of 0 (infinite).
Default packet types (for compatible chipsets):
Frame 34: 13 bytes on wire (104 bits), 13 bytes captured (104 bits)
Bluetooth HCI H4
Bluetooth HCI ACL Packet
.... 0000 0000 0010 = Connection Handle: 0x0002
..00 .... .... .... = PB Flag: First Non-automatically Flushable Packet (0)
00.. .... .... .... = BC Flag: Point-To-Point (0)
Data Total Length: 8
Bluetooth L2CAP Packet
After setting BT_FLUSHABLE
(sock.setsockopt(274 /*SOL_BLUETOOTH*/, 8 /* BT_FLUSHABLE */, 1 /* flush */))
Frame 34: 13 bytes on wire (104 bits), 13 bytes captured (104 bits)
Bluetooth HCI H4
Bluetooth HCI ACL Packet
.... 0000 0000 0010 = Connection Handle: 0x0002
..10 .... .... .... = PB Flag: First Automatically Flushable Packet (2)
00.. .... .... .... = BC Flag: Point-To-Point (0)
Data Total Length: 8
Bluetooth L2CAP Packet
Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@nokia.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Like metrics, the ICMP rate limiting bits are cached state about
a destination. So move it into the inet_peer entries.
If an inet_peer cannot be bound (the reason is memory allocation
failure or similar), the policy is to allow.
Signed-off-by: David S. Miller <davem@davemloft.net>
nlmsg_cancel can accept NULL as its second argument, so for similarity,
this patch extends genlmsg_cancel to be able to accept a NULL second
argument as well.
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
TKIP countermeasures depend on devices being able to detect Michael
MIC failures on received frames and for stations to report errors to
the AP. In order to test that behavior, it is useful to be able to
send out TKIP frames with incorrect Michael MIC. This testing behavior
has minimal effect on the TX path, so it can be added to mac80211 for
convenient use.
The interface for using this functionality is a file in mac80211
netdev debugfs (tkip_mic_test). Writing a MAC address to the file
makes mac80211 generate a dummy data frame that will be sent out using
invalid Michael MIC value. In AP mode, the address needs to be for one
of the associated stations or ff:ff:ff:ff:ff:ff to use a broadcast
frame. In station mode, the address can be anything, e.g., the current
BSSID. It should be noted that this functionality works correctly only
when associated and using TKIP.
Signed-off-by: Jouni Malinen <jouni.malinen@atheros.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
When operating in AP mode the wl1271 hardware filters out null-data
packets as well as management packets. This makes it impossible for
mac80211 to monitor the PS mode by using the PM bit of incoming frames.
Implement a HW flag to indicate that mac80211 should ignore the PM bit.
In addition, expose ieee80211_sta_ps_transition() to make low-level
drivers capable of controlling PS-mode.
Signed-off-by: Arik Nemtsov <arik@wizery.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
These variables are unused as a result of the recent netns work.
Signed-off-by: Simon Horman <horms@verge.net.au>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
Tested-by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
For the following rule:
iptables -I PREROUTING -t raw -j CT --ctevents assured
The event delivered looks like the following:
[UPDATE] tcp 6 src=192.168.0.2 dst=192.168.1.2 sport=37041 dport=80 src=192.168.1.2 dst=192.168.1.100 sport=80 dport=37041 [ASSURED]
Note that the TCP protocol state is not included. For that reason
the CT event filtering is not very useful for conntrackd.
To resolve this issue, instead of conditionally setting the CT events
bits based on the ctmask, we always set them and perform the filtering
in the late stage, just before the delivery.
Thus, the event delivered looks like the following:
[UPDATE] tcp 6 432000 ESTABLISHED src=192.168.0.2 dst=192.168.1.2 sport=37041 dport=80 src=192.168.1.2 dst=192.168.1.100 sport=80 dport=37041 [ASSURED]
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
The patch adds the NFNL_SUBSYS_IPSET id and NLA_PUT_NET* macros to the
vanilla kernel.
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Both fib_trie and fib_hash have a local implementation of
fib_table_select_default(). This is completely unnecessary
code duplication.
Since we now remember the fib_table and the head of the fib
alias list of the default route, we can implement one single
generic version of this routine.
Looking at the fib_hash implementation you may get the impression
that it's possible for there to be multiple top-level routes in
the table for the default route. The truth is, it isn't, the
insert code will only allow one entry to exist in the zero
prefix hash table, because all keys evaluate to zero and all
keys in a hash table must be unique.
Signed-off-by: David S. Miller <davem@davemloft.net>
This will be used later to implement fib_select_default() in a
completely generic manner, instead of the current situation where the
default route is re-looked up in the TRIE/HASH table and then the
available aliases are analyzed.
Signed-off-by: David S. Miller <davem@davemloft.net>
SIOCGETSGCNT is not a unique ioctl value as it it maps tio SIOCPROTOPRIVATE +1,
which unfortunately means the existing infrastructure for compat networking
ioctls is insufficient. A trivial compact ioctl implementation would conflict
with:
SIOCAX25ADDUID
SIOCAIPXPRISLT
SIOCGETSGCNT_IN6
SIOCGETSGCNT
SIOCRSSCAUSE
SIOCX25SSUBSCRIP
SIOCX25SDTEFACILITIES
To make this work I have updated the compat_ioctl decode path to mirror the
the normal ioctl decode path. I have added an ipv4 inet_compat_ioctl function
so that I can have ipv4 specific compat ioctls. I have added a compat_ioctl
function into struct proto so I can break out ioctls by which kind of ip socket
I am using. I have added a compat_raw_ioctl function because SIOCGETSGCNT only
works on raw sockets. I have added a ipmr_compat_ioctl that mirrors the normal
ipmr_ioctl.
This was necessary because unfortunately the struct layout for the SIOCGETSGCNT
has unsigned longs in it so changes between 32bit and 64bit kernels.
This change was sufficient to run a 32bit ip multicast routing daemon on a
64bit kernel.
Reported-by: Bill Fenner <fenner@aristanetworks.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If there are no explicit metrics attached to a route, hook
fi->fib_info up to dst_default_metrics.
Signed-off-by: David S. Miller <davem@davemloft.net>
This is the initial gateway towards super-sharing metrics
if they are all set to zero for a route.
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds the MCS information we currently get
from the drivers into radiotap.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
TCP is going to record metrics for the connection,
so pre-COW the route metrics at route cache entry
creation time.
This avoids several atomic operations that have to
occur if we COW the metrics after the entry reaches
global visibility.
Signed-off-by: David S. Miller <davem@davemloft.net>
Set the RTAX_LOCKED metric to INETPEER_METRICS_NEW (basically,
all ones) on fresh inetpeer entries.
This way code can determine if default metrics have been loaded
in from a routing table entry already.
Signed-off-by: David S. Miller <davem@davemloft.net>
Routing metrics are now copy-on-write.
Initially a route entry points it's metrics at a read-only location.
If a routing table entry exists, it will point there. Else it will
point at the all zero metric place-holder called 'dst_default_metrics'.
The writeability state of the metrics is stored in the low bits of the
metrics pointer, we have two bits left to spare if we want to store
more states.
For the initial implementation, COW is implemented simply via kmalloc.
However future enhancements will change this to place the writable
metrics somewhere else, in order to increase sharing. Very likely
this "somewhere else" will be the inetpeer cache.
Note also that this means that metrics updates may transiently fail
if we cannot COW the metrics successfully.
But even by itself, this patch should decrease memory usage and
increase cache locality especially for routing workloads. In those
cases the read-only metric copies stay in place and never get written
to.
TCP workloads where metrics get updated, and those rare cases where
PMTU triggers occur, will take a very slight performance hit. But
that hit will be alleviated when the long-term writable metrics
move to a more sharable location.
Since the metrics storage went from a u32 array of RTAX_MAX entries to
what is essentially a pointer, some retooling of the dst_entry layout
was necessary.
Most importantly, we need to preserve the alignment of the reference
count so that it doesn't share cache lines with the read-mostly state,
as per Eric Dumazet's alignment assertion checks.
The only non-trivial bit here is the move of the 'flags' member into
the writeable cacheline. This is OK since we are always accessing the
flags around the same moment when we made a modification to the
reference count.
Signed-off-by: David S. Miller <davem@davemloft.net>
Quoting Ben Hutchings: we presumably won't be defining features that
can only be enabled on 64-bit architectures.
Occurences found by `grep -r` on net/, drivers/net, include/
[ Move features and vlan_features next to each other in
struct netdev, as per Eric Dumazet's suggestion -DaveM ]
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
Extend channel to frequency mapping for 802.11j Japan 4.9GHz band, according to
IEEE802.11 section 17.3.8.3.2 and Annex J. Because there are now overlapping
channel numbers in the 2GHz and 5GHz band we can't map from channel to
frequency without knowing the band. This is no problem as in most contexts we
know the band. In places where we don't know the band (and WEXT compatibility)
we assume the 2GHz band for channels below 14.
This patch does not implement all channel to frequency mappings defined in
802.11, it's just an extension for 802.11j 20MHz channels. 5MHz and 10MHz
channels as well as 802.11y channels have been omitted.
The following drivers have been updated to reflect the API changes:
iwl-3945, iwl-agn, iwmc3200wifi, libertas, mwl8k, rt2x00, wl1251, wl12xx.
The drivers have been compile-tested only.
Signed-off-by: Bruno Randolf <br1@einfach.org>
Signed-off-by: Brian Prodoehl <bprodoehl@gmail.com>
Acked-by: Luciano Coelho <coelho@ti.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
In commit 44b8288308 (net_sched: pfifo_head_drop problem), we fixed
a problem with pfifo_head drops that incorrectly decreased
sch->bstats.bytes and sch->bstats.packets
Several qdiscs (CHOKe, SFQ, pfifo_head, ...) are able to drop a
previously enqueued packet, and bstats cannot be changed, so
bstats/rates are not accurate (over estimated)
This patch changes the qdisc_bstats updates to be done at dequeue() time
instead of enqueue() time. bstats counters no longer account for dropped
frames, and rates are more correct, since enqueue() bursts dont have
effect on dequeue() rate.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch converts stab qdisc management to RCU, so that we can perform
the qdisc_calculate_pkt_len() call before getting qdisc lock.
This shortens the lock's held time in __dev_xmit_skb().
This permits more qdiscs to get TCQ_F_CAN_BYPASS status, avoiding lot of
cache misses and so reducing latencies.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Patrick McHardy <kaber@trash.net>
CC: Jesper Dangaard Brouer <hawk@diku.dk>
CC: Jarek Poplawski <jarkao2@gmail.com>
CC: Jamal Hadi Salim <hadi@cyberus.ca>
CC: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In commit 3711210576 (net: QDISC_STATE_RUNNING dont need atomic bit
ops) I moved QDISC_STATE_RUNNING flag to __state container, located in
the cache line containing qdisc lock and often dirtied fields.
I now move TCQ_F_THROTTLED bit too, so that we let first cache line read
mostly, and shared by all cpus. This should speedup HTB/CBQ for example.
Not using test_bit()/__clear_bit()/__test_and_set_bit allows to use an
"unsigned int" for __state container, reducing by 8 bytes Qdisc size.
Introduce helpers to hide implementation details.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Patrick McHardy <kaber@trash.net>
CC: Jesper Dangaard Brouer <hawk@diku.dk>
CC: Jarek Poplawski <jarkao2@gmail.com>
CC: Jamal Hadi Salim <hadi@cyberus.ca>
CC: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
net/built-in.o: In function `nf_conntrack_init_net':
net/netfilter/nf_conntrack_core.c:1521:
undefined reference to `nf_conntrack_tstamp_init'
net/netfilter/nf_conntrack_core.c:1531:
undefined reference to `nf_conntrack_tstamp_fini'
Add dummy inline functions for the =n case to fix this.
Reported-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (41 commits)
sctp: user perfect name for Delayed SACK Timer option
net: fix can_checksum_protocol() arguments swap
Revert "netlink: test for all flags of the NLM_F_DUMP composite"
gianfar: Fix misleading indentation in startup_gfar()
net/irda/sh_irda: return to RX mode when TX error
net offloading: Do not mask out NETIF_F_HW_VLAN_TX for vlan.
USB CDC NCM: tx_fixup() race condition fix
ns83820: Avoid bad pointer deref in ns83820_init_one().
ipv6: Silence privacy extensions initialization
bnx2x: Update bnx2x version to 1.62.00-4
bnx2x: Fix AER setting for BCM57712
bnx2x: Fix BCM84823 LED behavior
bnx2x: Mark full duplex on some external PHYs
bnx2x: Fix BCM8073/BCM8727 microcode loading
bnx2x: LED fix for BCM8727 over BCM57712
bnx2x: Common init will be executed only once after POR
bnx2x: Swap BCM8073 PHY polarity if required
iwlwifi: fix valid chain reading from EEPROM
ath5k: fix locking in tx_complete_poll_work
ath9k_hw: do PA offset calibration only on longcal interval
...
The option name of Delayed SACK Timer should be SCTP_DELAYED_SACK,
not SCTP_DELAYED_ACK.
Left SCTP_DELAYED_ACK be concomitant with SCTP_DELAYED_SACK,
for making compatibility with existing applications.
Reference:
8.1.19. Get or Set Delayed SACK Timer (SCTP_DELAYED_SACK)
(http://tools.ietf.org/html/draft-ietf-tsvwg-sctpsocket-25)
Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
Acked-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The conn->sec_level value is supposed to represent the current level of
security that the connection has. However, by assigning to it before
requesting authentication it will have the wrong value during the
authentication procedure. To fix this a pending_sec_level variable is
added which is used to track the desired security level while making
sure that sec_level always represents the current level of security.
Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Currently, mac80211 always advertises that it may send
up to 64 subframes in an aggregate. This is fine, since
it's the max, but might as well be set to zero instead
since it doesn't have any information.
However, drivers might have that information, so allow
them to set a variable giving it, which will then be
used. The default of zero will be fine since to the
peer that means we don't know and it will just use its
own limit for the buffer size.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The aggregation code currently doesn't implement the
buffer size negotiation. It will always request a max
buffer size (which is fine, if a little pointless, as
the mac80211 code doesn't know and might just use 0
instead), but if the peer requests a smaller size it
isn't possible to honour this request.
In order to fix this, look at the buffer size in the
addBA response frame, keep track of it and pass it to
the driver in the ampdu_action callback when called
with the IEEE80211_AMPDU_TX_OPERATIONAL action. That
way the driver can limit the number of subframes in
aggregates appropriately.
Note that this doesn't fix any drivers apart from the
addition of the new argument -- they all need to be
updated separately to use this variable!
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Some devices don't support the maximum AMDPU buffer size of 64, so we
need to add an option to configure this in the hardware configuration.
This value will be used in the ADDBA response instead of the value
suggested in the request, if the latter is greater than the max
supported.
Signed-off-by: Luciano Coelho <coelho@ti.com>
Tested-by: Juuso Oikarinen <juuso.oikarinen@nokia.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch adds flow-based timestamping for conntracks. This
conntrack extension is disabled by default. Basically, we use
two 64-bits variables to store the creation timestamp once the
conntrack has been confirmed and the other to store the deletion
time. This extension is disabled by default, to enable it, you
have to:
echo 1 > /proc/sys/net/netfilter/nf_conntrack_timestamp
This patch allows to save memory for user-space flow-based
loogers such as ulogd2. In short, ulogd2 does not need to
keep a hashtable with the conntrack in user-space to know
when they were created and destroyed, instead we use the
kernel timestamp. If we want to have a sane IPFIX implementation
in user-space, this nanosecs resolution timestamps are also
useful. Other custom user-space applications can benefit from
this via libnetfilter_conntrack.
This patch modifies the /proc output to display the delta time
in seconds since the flow start. You can also obtain the
flow-start date by means of the conntrack-tools.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Packet filter (BPF) doesnt need to disable softirqs, being fully
re-entrant and lock-less.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adding support for SNMP broadcast connection tracking. The SNMP
broadcast requests are now paired with the SNMP responses.
Thus allowing using SNMP broadcasts with firewall enabled.
Please refer to the following conversation:
http://marc.info/?l=netfilter-devel&m=125992205006600&w=2
Patrick McHardy wrote:
> > The best solution would be to add generic broadcast tracking, the
> > use of expectations for this is a bit of abuse.
> > The second best choice I guess would be to move the help() function
> > to a shared module and generalize it so it can be used for both.
This patch implements the "second best choice".
Since the netbios-ns conntrack module uses the same helper
functionality as the snmp, only one helper function is added
for both snmp and netbios-ns modules into the new object -
nf_conntrack_broadcast.
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
My previous patch (netfilter: nf_nat: don't use atomic bit operation)
made a mistake when converting atomic_set to a normal bit 'or'.
IPS_*_BIT should be replaced with IPS_*.
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Cc: Tim Gardner <tim.gardner@canonical.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (47 commits)
GRETH: resolve SMP issues and other problems
GRETH: handle frame error interrupts
GRETH: avoid writing bad speed/duplex when setting transfer mode
GRETH: fixed skb buffer memory leak on frame errors
GRETH: GBit transmit descriptor handling optimization
GRETH: fix opening/closing
GRETH: added raw AMBA vendor/device number to match against.
cassini: Fix build bustage on x86.
e1000e: consistent use of Rx/Tx vs. RX/TX/rx/tx in comments/logs
e1000e: update Copyright for 2011
e1000: Avoid unhandled IRQ
r8169: keep firmware in memory.
netdev: tilepro: Use is_unicast_ether_addr helper
etherdevice.h: Add is_unicast_ether_addr function
ks8695net: Use default implementation of ethtool_ops::get_link
ks8695net: Disable non-working ethtool operations
USB CDC NCM: Don't deref NULL in cdc_ncm_rx_fixup() and don't use uninitialized variable.
vxge: Remember to release firmware after upgrading firmware
netdev: bfin_mac: Remove is_multicast_ether_addr use in netdev_for_each_mc_addr
ipsec: update MAX_AH_AUTH_LEN to support sha512
...
Use is_vmalloc_addr() in nf_ct_free_hashtable() and get rid of
the vmalloc flags to indicate that a hash table has been allocated
using vmalloc().
Signed-off-by: Patrick McHardy <kaber@trash.net>
Fix dependencies of netfilter realm match: it depends on NET_CLS_ROUTE,
which itself depends on NET_SCHED; this dependency is missing from netfilter.
Since matching on realms is also useful without having NET_SCHED enabled and
the option really only controls whether the tclassid member is included in
route and dst entries, rename the config option to IP_ROUTE_CLASSID and move
it outside of traffic scheduling context to get rid of the NET_SCHED dependeny.
Reported-by: Vladis Kletnieks <Valdis.Kletnieks@vt.edu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
icv_truncbits is set to 256 for sha512, so update
MAX_AH_AUTH_LEN to 64.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The RED statistics structure includes backlog field which is not
set or used by any code.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Last two global vars to be moved,
ip_vs_ftpsvc_counter and ip_vs_nullsvc_counter.
[horms@verge.net.au: removed whitespace-change-only hunk]
Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
trash list per namspace,
and reordering of some params in dst struct.
[ horms@verge.net.au: Use cancel_delayed_work_sync() instead of
cancel_rearming_delayed_work(). Found during
merge conflict resoliution ]
Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
This patch makes defense work timer per name-space,
A net ptr had to be added to the ipvs struct,
since it's needed by defense_work_handler.
[ horms@verge.net.au: Use cancel_delayed_work_sync() instead of
cancel_rearming_delayed_work(). Found during
merge conflict resoliution ]
Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Moving global vars to ipvs struct, except for svc table lock.
Next patch for ctl will be drop-rate handling.
*v3
__ip_vs_mutex remains global
ip_vs_conntrack_enabled(struct netns_ipvs *ipvs)
Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Connection hash table is now name space aware.
i.e. net ptr >> 8 is xor:ed to the hash,
and this is the first param to be compared.
The net struct is 0xa40 in size ( a little bit smaller for 32 bit arch:s)
and cache-line aligned, so a ptr >> 5 might be a more clever solution ?
All lookups where net is compared uses net_eq() which returns 1 when netns
is disabled, and the compiler seems to do something clever in that case.
ip_vs_conn_fill_param() have *net as first param now.
Three new inlines added to keep conn struct smaller
when names space is disabled.
- ip_vs_conn_net()
- ip_vs_conn_net_set()
- ip_vs_conn_net_eq()
*v3
moved net compare to the end in "fast path"
Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
The statistic counter locks for every packet are now removed,
and that statistic is now per CPU, i.e. no locks needed.
However summing is made in ip_vs_est into ip_vs_stats struct
which is moved to ipvs struc.
procfs, ip_vs_stats now have a "per cpu" count and a grand total.
A new function seq_file_single_net() in ip_vs.h created for handling of
single_open_net() since it does not place net ptr in a struct, like others.
/var/lib/lxc # cat /proc/net/ip_vs_stats_percpu
Total Incoming Outgoing Incoming Outgoing
CPU Conns Packets Packets Bytes Bytes
0 0 3 1 9D 34
1 0 1 2 49 70
2 0 1 2 34 76
3 1 2 2 70 74
~ 1 7 7 18A 18E
Conns/s Pkts/s Pkts/s Bytes/s Bytes/s
0 0 0 0 0
*v3
ip_vs_stats reamains as before, instead ip_vs_stats_percpu is added.
u64 seq lock added
*v4
Bug correction inbytes and outbytes as own vars..
per_cpu counter for all stats now as suggested by Julian.
[horms@verge.net.au: removed whitespace-change-only hunk]
Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
All global variables moved to struct ipvs,
most external changes fixed (i.e. init_net removed)
in sync_buf create + 4 replaced by sizeof(struct..)
Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
All variables moved to struct ipvs,
most external changes fixed (i.e. init_net removed)
*v3
timer per ns instead of a common timer in estimator.
Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
All variables moved to struct ipvs,
most external changes fixed (i.e. init_net removed)
in ip_vs_protocol param struct net *net added to:
- register_app()
- unregister_app()
This affected almost all proto_xxx.c files
Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
appcnt and timeout_table moved from struct ip_vs_protocol to
ip_vs proto_data.
struct net *net added as first param to
- register_app()
- unregister_app()
- app_conn_bind()
- ip_vs_conn_new()
[horms@verge.net.au: removed cosmetic-change-only hunk]
Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
ip_vs_protocol *pp is replaced by ip_vs_proto_data *pd in
function call in ip_vs_protocol struct i.e. :,
- timeout_change()
- state_transition()
ip_vs_protocol_timeout_change() got ipvs as param, due to above
and a upcoming patch - defence work
Most of this changes are triggered by Julians comment:
"tcp_timeout_change should work with the new struct ip_vs_proto_data
so that tcp_state_table will go to pd->state_table
and set_tcp_state will get pd instead of pp"
*v3
Mostly comments from Julian
The pp -> pd conversion should start from functions like
ip_vs_out() that use pp = ip_vs_proto_get(iph.protocol),
now they should use ip_vs_proto_data_get(net, iph.protocol).
conn_in_get() and conn_out_get() unused param *pp, removed.
*v4
ip_vs_protocol_timeout_change() walk the proto_data path.
Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
In this phase (one), all local vars will be moved to ipvs struct.
Remaining work, add param struct net *net to a couple of
functions that is common for all protos and use ip_vs_proto_data
*v3
Removed unuset function set_state_timeout()
Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
In this phase (one), all local vars will be moved to ipvs struct.
Remaining work, add param struct net *net to a couple of
functions that is common for all protos and use ip_vs_proto_data
*v3
Removed unused function set_state_timeout()
Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
In this phase (one), all local vars will be moved to ipvs struct.
Remaining work, add param struct net *net to a couple of
functions that is common for all protos and use all
ip_vs_proto_data
*v3
Removed unused function as sugested by Simon
Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Add support for protocol data per name-space.
in struct ip_vs_protocol, appcnt will be removed when all protos
are modified for network name-space.
This patch causes warnings of unused functions, they will be used
when next patch will be applied.
Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
var sysctl_ip_vs_lblc_expiration moved to ipvs struct as
sysctl_lblc_expiration
procfs updated to handle this.
Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
var sysctl_ip_vs_lblcr_expiration moved to ipvs struct as
sysctl_lblcr_expiration
procfs updated to handle this.
Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Services hash tables got netns ptr a hash arg,
While Real Servers (rs) has been moved to ipvs struct.
Two new inline functions added to get net ptr from skb.
Since ip_vs is called from different contexts there is two
places to dig for the net ptr skb->dev or skb->sk
this is handled in skb_net() and skb_sknet()
Global functions, ip_vs_service_get() ip_vs_lookup_real_service()
etc have got struct net *net as first param.
If possible get net ptr skb etc,
- if not &init_net is used at this early stage of patching.
ip_vs_ctl.c procfs not ready for netns yet.
*v3
Comments by Julian
- __ip_vs_service_find and __ip_vs_svc_fwm_find are fast path,
net_eq(svc->net, net) so the check is at the end now.
- net = skb_net(skb) in ip_vs_out moved after check for skb_dst.
Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Preparation for network name-space init, in this stage
some empty functions exists.
In most files there is a check if it is root ns i.e. init_net
if (!net_eq(net, &init_net))
return ...
this will be removed by the last patch, when enabling name-space.
*v3
ip_vs_conn.c merge error corrected.
net_ipvs #ifdef removed as sugested by Jan Engelhardt
[ horms@verge.net.au: Removed whitespace-change-only hunks ]
Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
The IPv6 tproxy patches split IPv6 defragmentation off of conntrack, but
failed to update the #ifdef stanzas guarding the defragmentation related
fields and code in skbuff and conntrack related code in nf_defrag_ipv6.c.
This patch adds the required #ifdefs so that IPv6 tproxy can truly be used
without connection tracking.
Original report:
http://marc.info/?l=linux-netdev&m=129010118516341&w=2
Reported-by: Randy Dunlap <randy.dunlap@oracle.com>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: KOVACS Krisztian <hidden@balabit.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
For SHA256, RFC4868 requires to truncate ICV length to 128 bits,
hence MAX_AH_AUTH_LEN should be updated to 16.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
IPv4 over firewire needs to be able to remove ARP entries
from the ARP cache that belong to nodes that are removed, because
IPv4 over firewire uses ARP packets for private information
about nodes.
This information becomes invalid as soon as node drops
off the bus and when it reconnects, its only possible
to start talking to it after it responded to an ARP packet.
But ARP cache prevents such packets from being sent.
Signed-off-by: Maxim Levitsky <maximlevitsky@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
HTB takes into account skb is segmented in stats updates.
Generalize this to all schedulers.
They should use qdisc_bstats_update() helper instead of manipulating
bstats.bytes and bstats.packets
Add bstats_update() helper too for classes that use
gnet_stats_basic_packed fields.
Note : Right now, TCQ_F_CAN_BYPASS shortcurt can be taken only if no
stab is setup on qdisc.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Rosenberg pointed out that there were some signed comparison bugs
in the phonet protocol.
http://marc.info/?l=full-disclosure&m=129424528425330&w=2
The problem is that we check for array overflows but "protocol" is
signed and we don't check for array underflows. If you have already
have CAP_SYS_ADMIN then you could use the bugs to get root, or someone
could cause an oops by mistake.
Signed-off-by: Dan Carpenter <error27@gmail.com>
Acked-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When I made the patch to add mesh join/leave I
didn't pay attention to docs because it was a
proof of concept, and then when we actually did
merge it I forgot -- add docs now.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The flag is IEEE80211_TX_CTL_TX_OFFCHAN and I had
added that in a previous patch but forgotten docs.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Add documentation for the new callbacks that I
forgot in the patch adding the callbacks.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Fix new kernel-doc notation warning in sock.h by annotating skc_dontcopy_*
as private fields.
Warning(include/net/sock.h:163): No description found for parameter 'skc_dontcopy_end[0]'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since nf_ct_expect_dst_hash() may be called without nf_conntrack_lock
locked, nf_ct_expect_hash_rnd should be initialized in the atomic way.
In this patch, we use nf_conntrack_hash_rnd instead of
nf_ct_expect_hash_rnd.
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This allows drivers to support remain-on-channel
offload if they implement smarter timing or need
to use a device implementation like iwlwifi.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Adding a pair of set-get routines to dcbnl for setting the negotiation
flags of the various DCB features. Conforms to the CEE flavor of DCBX
The user sets these flags (enable, advertise, willing) for each feature
to be used by the DCBX engine. The 'get' routine returns which of the
features is enabled after the negotiation.
This patch is dependent on the following patches:
[net-next-2.6 PATCH 1/3] dcbnl: add support for ieee8021Qaz attributes
[net-next-2.6 PATCH 2/3] dcbnl: add appliction tlv handlers
[net-next-2.6 PATCH 3/3] net_dcb: add application notifiers
Signed-off-by: Shmulik Ravid <shmulikr@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adding an optional DCBX capability and a pair for get-set routines for
setting the device DCBX mode. The DCBX capability is a bit field of
supported attributes. The user is expected to set the DCBX mode with a
subset of the advertised attributes.
This patch is dependent on the following patches:
[net-next-2.6 PATCH 1/3] dcbnl: add support for ieee8021Qaz attributes
[net-next-2.6 PATCH 2/3] dcbnl: add appliction tlv handlers
[net-next-2.6 PATCH 3/3] net_dcb: add application notifiers
Signed-off-by: Shmulik Ravid <shmulikr@broadcom.com>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
DCBx applications priorities can be changed dynamically. If
application stacks are expected to keep the skb priority
consistent with the dcbx priority the stack will need to
be notified when these changes occur.
This patch adds application notifiers for the stack to register
with.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds application tlv handlers. Networking stacks
may use the application priority to set the skb priority of
their stack using the negoatiated dcbx priority.
This patch provides the dcb_{get|set}app() routines for the
stack to query these parameters. Notice lower layer drivers
can use the dcbnl_ops routines if additional handling is
needed. Perhaps in the firmware case for example
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: Shmulik Ravid <shmulikr@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The IEEE8021Qaz is the IEEE standard version of CEE. The
standard has had enough significant changes from the CEE
version that many of the CEE attributes have no meaning
in the new spec or do not easily map to IEEE standards.
Rather then attempt to create a complicated mapping
between CEE and IEEE standards this patch adds a nested
IEEE attribute to the list of DCB attributes. The policy
is,
[DCB_ATTR_IFNAME]
[DCB_ATTR_STATE]
...
[DCB_ATTR_IEEE]
[DCB_ATTR_IEEE_ETS]
[DCB_ATTR_IEEE_PFC]
[DCB_ATTR_IEEE_APP_TABLE]
[DCB_ATTR_IEEE_APP]
...
The following dcbnl_rtnl_ops routines were added to handle
the IEEE standard,
int (*ieee_getets) (struct net_device *, struct ieee_ets *);
int (*ieee_setets) (struct net_device *, struct ieee_ets *);
int (*ieee_getpfc) (struct net_device *, struct ieee_pfc *);
int (*ieee_setpfc) (struct net_device *, struct ieee_pfc *);
int (*ieee_getapp) (struct net_device *, struct dcb_app *);
int (*ieee_setapp) (struct net_device *, struct dcb_app *);
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit 4465b46900.
Conflicts:
net/ipv4/fib_frontend.c
As reported by Ben Greear, this causes regressions:
> Change 4465b46900 caused rules
> to stop matching the input device properly because the
> FLOWI_FLAG_MATCH_ANY_IIF is always defined in ip_dev_find().
>
> This breaks rules such as:
>
> ip rule add pref 512 lookup local
> ip rule del pref 0 lookup local
> ip link set eth2 up
> ip -4 addr add 172.16.0.102/24 broadcast 172.16.0.255 dev eth2
> ip rule add to 172.16.0.102 iif eth2 lookup local pref 10
> ip rule add iif eth2 lookup 10001 pref 20
> ip route add 172.16.0.0/24 dev eth2 table 10001
> ip route add unreachable 0/0 table 10001
>
> If you had a second interface 'eth0' that was on a different
> subnet, pinging a system on that interface would fail:
>
> [root@ct503-60 ~]# ping 192.168.100.1
> connect: Invalid argument
Reported-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The initialization function used by hci_open_dev (hci_init_req) sends
many different HCI commands. The __hci_request function should only
return when all of these commands have completed (or a timeout occurs).
Several of these commands cause hci_req_complete to be called which
causes __hci_request to return prematurely.
This patch fixes the issue by adding a new hdev->req_last_cmd variable
which is set during the initialization procedure. The hci_req_complete
function will no longer mark the request as complete until the command
matching hdev->req_last_cmd completes.
Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
This patch adds Bluetooth Management interface events for controller
addition and removal. The events correspond to the existing HCI_DEV_REG
and HCI_DEV_UNREG stack internal events.
Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
This patch implements the read_info command which is used to fetch basic
info about an adapter.
Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
This patch implements the read_index_list command through which
userspace can get a list of current adapter indices.
Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
This patch implements the initial read_version command that userspace
will use before any other management interface operations.
Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
The throughput LED trigger was always active when
the radio was enabled. In most cases that's likely
the desired behaviour, but iwlwifi requires it to
be only active when one of the virtual interfaces
is actually "connected" in some way.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
iwlwifi and other drivers like to blink their LED
based on throughput. Implement this generically in
mac80211, based on a throughput table the driver
specifies. That way, drivers can set the blink
frequencies depending on their desired behaviour
and max throughput.
All the drivers need to do is provide an LED class
device, best with blink hardware offload.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Conflicts:
MAINTAINERS
arch/arm/mach-omap2/pm24xx.c
drivers/scsi/bfa/bfa_fcpim.c
Needed to update to apply fixes for which the old branch was too
outdated.
This function has three bugs:
1) The offset should be valid most of the time, this is just
a sanity check, therefore we should use "likely" not "unlikely"
2) This is the only place where we can check for arithmetic overflow
of the pointer plus the length.
3) The existing range checks are off by one, the valid range is
skb->head to skb_tail_pointer(), inclusive.
Based almost entirely upon a patch by Ralph Loader.
Reported-by: Ralph Loader <suckfish@ihug.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch changes the default initial receive window to 10 mss
(defined constant). The default window is limited to the maximum
of 10*1460 and 2*mss (when mss > 1460).
draft-ietf-tcpm-initcwnd-00 is a proposal to the IETF that recommends
increasing TCP's initial congestion window to 10 mss or about 15KB.
Leading up to this proposal were several large-scale live Internet
experiments with an initial congestion window of 10 mss (IW10), where
we showed that the average latency of HTTP responses improved by
approximately 10%. This was accompanied by a slight increase in
retransmission rate (0.5%), most of which is coming from applications
opening multiple simultaneous connections. To understand the extreme
worst case scenarios, and fairness issues (IW10 versus IW3), we further
conducted controlled testbed experiments. We came away finding minimal
negative impact even under low link bandwidths (dial-ups) and small
buffers. These results are extremely encouraging to adopting IW10.
However, an initial congestion window of 10 mss is useless unless a TCP
receiver advertises an initial receive window of at least 10 mss.
Fortunately, in the large-scale Internet experiments we found that most
widely used operating systems advertised large initial receive windows
of 64KB, allowing us to experiment with a wide range of initial
congestion windows. Linux systems were among the few exceptions that
advertised a small receive window of 6KB. The purpose of this patch is
to fix this shortcoming.
References:
1. A comprehensive list of all IW10 references to date.
http://code.google.com/speed/protocols/tcpm-IW10.html
2. Paper describing results from large-scale Internet experiments with IW10.
http://ccr.sigcomm.org/drupal/?q=node/621
3. Controlled testbed experiments under worst case scenarios and a
fairness study.
http://www.ietf.org/proceedings/79/slides/tcpm-0.pdf
4. Raw test data from testbed experiments (Linux senders/receivers)
with initial congestion and receive windows of both 10 mss.
http://research.csc.ncsu.edu/netsrv/?q=content/iw10
5. Internet-Draft. Increasing TCP's Initial Window.
https://datatracker.ietf.org/doc/draft-ietf-tcpm-initcwnd/
Signed-off-by: Nandita Dukkipati <nanditad@google.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As has been pointed out by Daniel Halperin some devices (e.g. Intel IWL5100)
can only TX from a subset of RX antennas, so use separate availability masks
for RX and TX.
Signed-off-by: Bruno Randolf <br1@einfach.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Userspace will now be allowed to toggle between the default path
selection algorithm (HWMP, implemented in the kernel), and a vendor
specific alternative. Also in the same patch, allow userspace to add
information elements to mesh beacons. This is accordance with the
Extensible Path Selection Framework specified in version 7.0 of the
802.11s draft.
Signed-off-by: Javier Cardona <javier@cozybit.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Mesh parameters can be to setup a mesh or to configure it.
This patch renames the ambiguous name mesh_params to mesh_config
in preparation for mesh_setup.
Signed-off-by: Javier Cardona <javier@cozybit.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
All rt2x00 drivers except rt2800pci call ieee80211_tx_status() from
a workqueue, which causes "NOHZ: local_softirq_pending 08" messages.
To fix it, add ieee80211_tx_status_ni() similar to ieee80211_rx_ni()
which can be called from process context, and call it from
rt2x00lib_txdone(). For the rt2800pci special case a driver
flag is introduced.
https://bugzilla.kernel.org/show_bug.cgi?id=24892
Signed-off-by: Johannes Stezenbach <js@sig21.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Flush the routing cache only of entries that match the
network namespace in which the purge event occurred.
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Pawel reported a panic related to handling shared skbs in ixgbe
incorrectly. So we need to revert my previous patch to work around
this bug. Instead of reverting the patch completely, I just revert
the essential lines, so we can add the previous optimization
back more easily in future.
commit 3511c9132f
Author: Changli Gao <xiaosuo@gmail.com>
Date: Sat Oct 16 13:04:08 2010 +0000
net_sched: remove the unused parameter of qdisc_create_dflt()
Reported-by: Pawel Staszewski <pstaszewski@itcare.pl>
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch modifies IPsec6 to fragment IPv6 packets that are
locally generated as needed.
This version of the patch only fragments in tunnel mode, so that fragment
headers will not be obscured by ESP in transport mode.
Signed-off-by: David L Stevens <dlstevens@us.ibm.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Special care is taken inside sk_port_alloc to avoid overwriting
skc_node/skc_nulls_node. We should also avoid overwriting
skc_bind_node/skc_portaddr_node.
The patch fixes the following crash:
BUG: unable to handle kernel paging request at fffffffffffffff0
IP: [<ffffffff812ec6dd>] udp4_lib_lookup2+0xad/0x370
[<ffffffff812ecc22>] __udp4_lib_lookup+0x282/0x360
[<ffffffff812ed63e>] __udp4_lib_rcv+0x31e/0x700
[<ffffffff812bba45>] ? ip_local_deliver_finish+0x65/0x190
[<ffffffff812bbbf8>] ? ip_local_deliver+0x88/0xa0
[<ffffffff812eda35>] udp_rcv+0x15/0x20
[<ffffffff812bba45>] ip_local_deliver_finish+0x65/0x190
[<ffffffff812bbbf8>] ip_local_deliver+0x88/0xa0
[<ffffffff812bb2cd>] ip_rcv_finish+0x32d/0x6f0
[<ffffffff8128c14c>] ? netif_receive_skb+0x99c/0x11c0
[<ffffffff812bb94b>] ip_rcv+0x2bb/0x350
[<ffffffff8128c14c>] netif_receive_skb+0x99c/0x11c0
Signed-off-by: Leonard Crestez <lcrestez@ixiacom.com>
Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some windows versions have wrong RFC1323 implementations, with SYN and
SYNACKS messages containing zero tcp timestamps.
We relaxed in commit fc1ad92dfc the passive connection case
(Windows connects to a linux machine), but the reverse case (linux
connects to a Windows machine) has an analogue problem when tsvals from
windows machine are 'negative' (high order bit set) : PAWS triggers and
we drops incoming messages.
Fix this by making zero ts_recent value special, allowing frame to be
processed.
Based on a report and initial patch from Dmitiy Balakin
Bugzilla reference : https://bugzilla.kernel.org/show_bug.cgi?id=24842
Reported-by: dmitriy.balakin@nicneiron.ru
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add dev_close_many and dev_deactivate_many to factorize another
sync-rcu operation on the netdevice unregister path.
$ modprobe dummy numdummies=10000
$ ip link set dev dummy* up
$ time rmmod dummy
Without the patch With the patch
real 0m 24.63s real 0m 5.15s
user 0m 0.00s user 0m 0.00s
sys 0m 6.05s sys 0m 5.14s
Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a new notification to indicate that a received, unprotected
Deauthentication or Disassociation frame was dropped due to
management frame protection being in use. This notification is
needed to allow user space (e.g., wpa_supplicant) to implement
SA Query procedure to recover from association state mismatch
between an AP and STA.
This is needed to avoid getting stuck in non-working state when MFP
(IEEE 802.11w) is used and a protected Deauthentication or
Disassociation frame is dropped for any reason. After that, the
station would silently discard any unprotected Deauthentication or
Disassociation frame that could be indicating that the AP does not
have association for the STA (when the Reason Code would be 6 or 7).
IEEE Std 802.11w-2009, 11.13 describes this recovery mechanism.
Signed-off-by: Jouni Malinen <j@w1.fi>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The IPv6 tproxy patches split IPv6 defragmentation off of conntrack, but
failed to update the #ifdef stanzas guarding the defragmentation related
fields and code in skbuff and conntrack related code in nf_defrag_ipv6.c.
This patch adds the required #ifdefs so that IPv6 tproxy can truly be used
without connection tracking.
Original report:
http://marc.info/?l=linux-netdev&m=129010118516341&w=2
Reported-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: KOVACS Krisztian <hidden@balabit.hu>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Allow drivers or rate control algorithms to specify BlockAck session
timeout when initiating an ADDBA transaction. This is useful in cases
where maintaining persistent BA sessions does not incur any overhead.
The current timeout value of 5000 TUs is retained for all non ath9k/ath9k_htc
drivers.
Signed-off-by: Sujith Manoharan <Sujith.Manoharan@atheros.com>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
With the upcoming hardware offload implementation,
some devices will have a different maximum duration
for the remain-on-channel command. Advertise the
maximum duration in mac80211, and make mac80211 set
it.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Like RTAX_ADVMSS, make the default calculation go through a dst_ops
method rather than caching the computation in the routing cache
entries.
Now dst metrics are pretty much left as-is when new entries are
created, thus optimizing metric sharing becomes a real possibility.
Signed-off-by: David S. Miller <davem@davemloft.net>
Make all RTAX_ADVMSS metric accesses go through a new helper function,
dst_metric_advmss().
Leave the actual default metric as "zero" in the real metric slot,
and compute the actual default value dynamically via a new dst_ops
AF specific callback.
For stacked IPSEC routes, we use the advmss of the path which
preserves existing behavior.
Unlike ipv4/ipv6, DecNET ties the advmss to the mtu and thus updates
advmss on pmtu updates. This inconsistency in advmss handling
results in more raw metric accesses than I wish we ended up with.
Signed-off-by: David S. Miller <davem@davemloft.net>
Allow userspace to specify that a given key
is default only for unicast and/or multicast
transmissions. Only WEP keys are for both,
WPA/RSN keys set here are GTKs for multicast
only. For more future flexibility, allow to
specify all combiations.
Wireless extensions can only set both so use
nl80211; WEP keys (connect keys) must be set
as default for both (but 802.1X WEP is still
possible).
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Add a field to wiphy for the hardware to report the availble antennas for
configuration. Only if this is set to something bigger than zero, will the
anntenna configuration ops be executed.
Allthough this could be a simple number of antennas, I defined it as a bitmap
of antennas which are available for configuration, since it's more consistent
with the rest of the antenna API and there could be cases where the
hardware allows only configuration of certain antennas. As it does not make
much of a difference in size or normal usage, I think it's better to be able to
support this, in case the need arises.
The antenna configuration is now also checked against the availabe antennas and
rejected if it does not match.
Signed-off-by: Bruno Randolf <br1@einfach.org>
--
v3: always apply available antenna mask (for "all" antennas case).
v2: reject antenna configurations which don't match the available antennas
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Always go through a new ip4_dst_hoplimit() helper, just like ipv6.
This allowed several simplifications:
1) The interim dst_metric_hoplimit() can go as it's no longer
userd.
2) The sysctl_ip_default_ttl entry no longer needs to use
ipv4_doint_and_flush, since the sysctl is not cached in
routing cache metrics any longer.
3) ipv4_doint_and_flush no longer needs to be exported and
therefore can be marked static.
When ipv4_doint_and_flush_strategy was removed some time ago,
the external declaration in ip.h was mistakenly left around
so kill that off too.
We have to move the sysctl_ip_default_ttl declaration into
ipv4's route cache definition header net/route.h, because
currently net/ip.h (where the declaration lives now) has
a back dependency on net/route.h
Signed-off-by: David S. Miller <davem@davemloft.net>
The XFRMA_TFCPAD attribute for XFRM state installation configures
Traffic Flow Confidentiality by padding ESP packets to a specified
length.
Signed-off-by: Martin Willi <martin@strongswan.org>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Followup of commit b178bb3dfc (net: reorder struct sock fields)
Optimize INET input path a bit further, by :
1) moving sk_refcnt close to sk_lock.
This reduces number of dirtied cache lines by one on 64bit arches (and
64 bytes cache line size).
2) moving inet_daddr & inet_rcv_saddr at the beginning of sk
(same cache line than hash / family / bound_dev_if / nulls_node)
This reduces number of accessed cache lines in lookups by one, and dont
increase size of inet and timewait socks.
inet and tw sockets now share same place-holder for these fields.
Before patch :
offsetof(struct sock, sk_refcnt) = 0x10
offsetof(struct sock, sk_lock) = 0x40
offsetof(struct sock, sk_receive_queue) = 0x60
offsetof(struct inet_sock, inet_daddr) = 0x270
offsetof(struct inet_sock, inet_rcv_saddr) = 0x274
After patch :
offsetof(struct sock, sk_refcnt) = 0x44
offsetof(struct sock, sk_lock) = 0x48
offsetof(struct sock, sk_receive_queue) = 0x68
offsetof(struct inet_sock, inet_daddr) = 0x0
offsetof(struct inet_sock, inet_rcv_saddr) = 0x4
compute_score() (udp or tcp) now use a single cache line per ignored
item, instead of two.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use helper functions to hide all direct accesses, especially writes,
to dst_entry metrics values.
This will allow us to:
1) More easily change how the metrics are stored.
2) Implement COW for metrics.
In particular this will help us put metrics into the inetpeer
cache if that is what we end up doing. We can make the _metrics
member a pointer instead of an array, initially have it point
at the read-only metrics in the FIB, and then on the first set
grab an inetpeer entry and point the _metrics member there.
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Add a new BSS attribute to allow hostapd to set the current HT opmode.
Otherwise drivers won't be able to set up protection for HT rates in
AP mode.
Cc: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Helmut Schaa <helmut.schaa@googlemail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
In order to send data to management control sockets the function should:
- skip checks intended for raw HCI data and stack internal events
- make sure RAW HCI data or stack internal events don't go to
management control sockets
In order to accomplish this the patch adds a new member to the bluetooth
skb private data to flag skb's that are destined for management control
sockets.
Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Add initial definitions for the new Bluetooth Management interface to
the bluetooth headers.
Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Extend nl80211 to report an exponential weighted moving average (EWMA) of the
signal value. Since the signal value usually fluctuates between different
packets, an average can be more useful than the value of the last packet.
This uses the recently added generic EWMA library function.
--
v2: fix ABI breakage and change factor to be a power of 2.
Signed-off-by: Bruno Randolf <br1@einfach.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Instead of tying mesh activity to interface up,
add join and leave commands for mesh. Since we
must be backward compatible, let cfg80211 handle
joining a mesh if a mesh ID was pre-configured
when the device goes up.
Note that this therefore must modify mac80211 as
well since mac80211 needs to lose the logic to
start the mesh on interface up.
We now allow querying mesh parameters before the
mesh is connected, which simply returns defaults.
Setting them (internally renamed to "update") is
only allowed while connected. Specify them with
the new mesh join command instead where needed.
In mac80211, beaconing must now also follow the
mesh enabled/not enabled state, which is done
by testing the mesh ID.
Signed-off-by: Javier Cardona <javier@cozybit.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
cfg80211 used to do all its bookkeeping in
the notifier, but some new stuff will have
to use local variables so make the callback
return the netdev pointer.
Tested-by: Javier Cardona <javier@cozybit.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The TTL in path selection information elements is different from
the mesh ttl used in mesh data frames. Version 7.03 of the 11s
draft calls this ttl 'Element TTL'.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Pavel Emelyanov tried to fix a race between sk_filter_(de|at)tach and
sk_clone() in commit 47e958eac2
Problem is we can have several clones sharing a common sk_filter, and
these clones might want to sk_filter_attach() their own filters at the
same time, and can overwrite old_filter->rcu, corrupting RCU queues.
We can not use filter->rcu without being sure no other thread could do
the same thing.
Switch code to a more conventional ref-counting technique : Do the
atomic decrement immediately and queue one rcu call back when last
reference is released.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As part of the removal of TIPC's native API support it is no longer
necessary for TIPC to export symbols for routines that can be called
by kernel-based applications, nor for it to have header files that
kernel-based applications can include to access the declarations for
those routines. This commit eliminates the exporting of symbols by
TIPC and migrates the contents of each obsolete native API include
file into its corresponding non-native API equivalent.
The code which was migrated in this commit was migrated intact, in
that there are no technical changes combined with the relocation.
Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
These macros have been defined for several years since v2.6.12-rc2(tracing by git),
but never be used. So remove them.
Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
__ICMP_MIB_MAX is equal to the total number of icmp mib,
So no need to add 1. This wastes 4/8 bytes memory.
Change it to be same as ICMP6_MIB_MAX, TCP_MIB_MAX, UDP_MIB_MAX.
Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The last patch with the same title was for mac80211 ops, accidentally.
Signed-off-by: Bruno Randolf <br1@einfach.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The only thing AF-specific about remembering the timestamp
for a time-wait TCP socket is getting the peer.
Abstract that behind a new timewait_sock_ops vector.
Support for real IPV6 sockets is not filled in yet, but
curiously this makes timewait recycling start to work
for v4-mapped ipv6 sockets.
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove extra spaces from legal text so that legal stuff looks
the same for all bluetooth code.
Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@nokia.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Do not use assignment in IF condition, remove extra spaces,
fixing typos, simplify code.
Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@nokia.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Do not initialize static vars to zero, macros with complex values
shall be enclosed with (), remove unneeded braces.
Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@nokia.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Remove extra spaces, assignments in if statement, zeroing static
variables, extra braces. Fix includes.
Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@nokia.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Do not use assignments in IF condition, remove extra spaces
Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@nokia.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Then we can make a completely generic tcp_remember_stamp()
that uses ->get_peer() as a helper, minimizing the AF specific
code and minimizing the eventual code duplication when we implement
the ipv6 side of TW recycling.
Signed-off-by: David S. Miller <davem@davemloft.net>
All rt2x00 drivers except rt2800pci call ieee80211_tx_status() from
a workqueue, which causes "NOHZ: local_softirq_pending 08" messages.
To fix it, add ieee80211_tx_status_ni() similar to ieee80211_rx_ni()
which can be called from process context, and call it from
rt2x00lib_txdone(). For the rt2800pci special case a driver
flag is introduced.
Signed-off-by: Johannes Stezenbach <js@sig21.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
With p2p, it is sometimes necessary to transmit
a frame (typically an action frame) on another
channel than the current channel. Enable this
through the CMD_FRAME API, and allow it to wait
for a response. A new command allows that wait
to be aborted.
However, allow userspace to specify whether or
not it wants to allow off-channel TX, it may
actually want to use the same channel only.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Its easy to eat all kernel memory and trigger NMI watchdog, using an
exploit program that queues unix sockets on top of others.
lkml ref : http://lkml.org/lkml/2010/11/25/8
This mechanism is used in applications, one choice we have is to have a
recursion limit.
Other limits might be needed as well (if we queue other types of files),
since the passfd mechanism is currently limited by socket receive queue
sizes only.
Add a recursion_level to unix socket, allowing up to 4 levels.
Each time we send an unix socket through sendfd mechanism, we copy its
recursion level (plus one) to receiver. This recursion level is cleared
when socket receive queue is emptied.
Reported-by: Марк Коренберг <socketpair@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
1. SCTP_CMD_NUM_VERBS,SCTP_CMD_MAX
These two macros have never been used for several years since v2.6.12-rc2.
2.sctp_port_rover,sctp_port_alloc_lock
The commit 063930 abandoned global variables of port_rover and port_alloc_lock,
but still keep two macros to refer to them.
So, remove them now.
commit 0639300900
Author: Stephen Hemminger <shemminger@linux-foundation.org>
Date: Wed Oct 10 17:30:18 2007 -0700
[SCTP]: port randomization
Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
fl->fl_gre_key is network byte order contrary to fl->fl_icmp_*.
Make xfrm_flowi_{s|d}port return network byte order values for gre
key too.
Signed-off-by: Timo Teräs <timo.teras@iki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
These macros have been existed for several years since v2.6.12-rc2.
But they never be used. So remove them now.
Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As David pointed out correctly, updates to af-specific attributes
are currently not atomic. If multiple changes are requested and
one of them fails, previous updates may have been applied already
leaving the link behind in a undefined state.
This patch splits the function parse_link_af() into two functions
validate_link_af() and set_link_at(). validate_link_af() is placed
to validate_linkmsg() check for errors as early as possible before
any changes to the link have been made. set_link_af() is called to
commit the changes later.
This method is not fail proof, while it is currently sufficient
to make set_link_af() inerrable and thus 100% atomic, the
validation function method will not be able to detect all error
scenarios in the future, there will likely always be errors
depending on states which are f.e. not protected by rtnl_mutex
and thus may change between validation and setting.
Also, instead of silently ignoring unknown address families and
config blocks for address families which did not register a set
function the errors EAFNOSUPPORT respectively EOPNOSUPPORT are
returned to avoid comitting 4 out of 5 update requests without
notifying the user.
Signed-off-by: Thomas Graf <tgraf@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds a sysclt net.ipv4.vs.sync_version
that can be used to send sync msg in version 0 or 1 format.
sync_version value is logical,
Value 1 (default) New version
0 Plain old version
Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Enable sending and removal of version 0 sending
Affected functions,
ip_vs_sync_buff_create()
ip_vs_sync_conn()
ip_vs_core.c removal of IPv4 check.
*v5
Just check cp->pe_data_len in ip_vs_sync_conn
Check if padding needed before adding a new sync_conn
to the buffer, i.e. avoid sending padding at the end.
*v4
moved sanity check and pe_name_len after sloop.
use cp->pe instead of cp->dest->svc->pe
real length in each sync_conn, not padded length
however total size of a sync_msg includes padding.
*v3
Sending ip_vs_sync_conn_options in network order.
Sending Templates for ONE_PACKET conn.
Renaming of ip_vs_sync_mesg to ip_vs_sync_mesg_v0
Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Functionality improvements
* flags changed from 16 to 32 bits
* fwmark added (32 bits)
* timeout in sec. added (32 bits)
* pe data added (Variable length)
* IPv6 capabilities (3x16 bytes for addr.)
* Version and type in every conn msg.
ip_vs_process_message() now handles Version 1 messages
and will call ip_vs_process_message_v0() for version 0 messages.
ip_vs_proc_conn() is common for both version, and handles the update of
connection hash.
ip_vs_conn_fill_param_sync() - Version 1 messages only
ip_vs_conn_fill_param_sync_v0() - Version 0 messages only
Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
One struct will have fwmark added:
* ip_vs_conn
ip_vs_conn_new() and ip_vs_find_dest()
will have an extra param - fwmark
The effects of that, is in this patch.
Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
This adds the ability for drivers to use CQM events
to notify about packet loss for specific stations
(which could be the AP for the managed mode case).
Since the threshold might be determined by the
driver (it isn't passed in right now) it will be
passed out of the driver to userspace in the event.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
- store the multicast rate as an index instead of the rate value
(reduces cpu overhead in a hotpath)
- validate the rate values (must match a bitrate in at least one sband)
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Lower SCM_MAX_FD from 255 to 253 so that allocations for scm_fp_list are
halved. (commit f8d570a4 added two pointers in this structure)
scm_fp_dup() should not copy whole structure (and trigger kmemcheck
warnings), but only the used part. While we are at it, only allocate
needed size.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ipv6_sk_mc_lock rwlock becomes a spinlock.
readers (inet6_mc_check()) now takes rcu_read_lock() instead of read
lock. Writers dont need to disable BH anymore.
struct ipv6_mc_socklist objects are reclaimed after one RCU grace
period.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When two cards are connected with the same regulatory domain
if CRDA had a delayed response then cfg80211's own set regulatory
domain would still be the world regulatory domain. There was a bug
on cfg80211's logic such that it assumed that once you pegged a
request as the last request it was already the currently set
regulatory domain. This would mean we would race setting a stale
regulatory domain to secondary cards which had the same regulatory
domain since the alpha2 would match.
We fix this by processing each regulatory request atomically,
and only move on to the next one once we get it fully processed.
In the case CRDA is not present we will simply world roam.
This issue is only present when you have a slow system and the
CRDA processing is delayed. Because of this it is not a known
regression.
Without this fix when a delay is present with CRDA the second card
would end up with an intersected regulatory domain and not allow it
to use the channels it really is designed for. When two cards with
two different regulatory domains were inserted you'd end up
rejecting the second card's regulatory domain request.
This fails with mac80211_hswim's regtest=2 (two requests, same alpha2)
and regtest=3 (two requests, different alpha2) module parameter
options.
This was reproduced and tested against mac80211_hwsim using this
CRDA delayer:
#!/bin/bash
echo $COUNTRY >> /tmp/log
sleep 2
/sbin/crda.orig
And these regulatory tests:
modprobe mac80211_hwsim regtest=2
modprobe mac80211_hwsim regtest=3
Reported-by: Mark Mentovai <mark@moxienet.com>
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Tested-by: Mark Mentovai <mark@moxienet.com>
Tested-by: Bruno Randolf <br1@einfach.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This commit is same in nature as v2.6.37-rc1-755-g3654654; the network
namespace itself is not modified when calling net_generic, so the
parameter can be const.
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Extend nl80211 to report an exponential weighted moving average (EWMA) of the
signal value. Since the signal value usually fluctuates between different
packets, an average can be more useful than the value of the last packet.
This uses the recently added generic EWMA library function.
Signed-off-by: Bruno Randolf <br1@einfach.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
jiffies is defined as "volatile".
extern unsigned long volatile __jiffy_data jiffies;
ACCESS_ONCE() uses "volatile".
As a result, some compilers warn duplicate `volatile' for ACCESS_ONCE(jiffies).
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
In many places we've just hardcoded the
AC numbers -- which is a relic from the
original mac80211 (d80211). Add constants
for them so we know what we're talking
about.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Use the macros defined for the members of flowi to clean the code up.
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Each net_device contains address family specific data such as
per device settings and statistics. We already expose this data
via procfs/sysfs and partially netlink.
The netlink method requires the requester to send one RTM_GETLINK
request for each address family it wishes to receive data of
and then merge this data itself.
This patch implements a new API which combines all address family
specific link data in a new netlink attribute IFLA_AF_SPEC.
IFLA_AF_SPEC contains a sequence of nested attributes, one for each
address family which in turn defines the structure of its own
attribute. Example:
[IFLA_AF_SPEC] = {
[AF_INET] = {
[IFLA_INET_CONF] = ...,
},
[AF_INET6] = {
[IFLA_INET6_FLAGS] = ...,
[IFLA_INET6_CONF] = ...,
}
}
The API also allows for address families to implement a function
which parses the IFLA_AF_SPEC attribute sent by userspace to
implement address family specific link options.
Signed-off-by: Thomas Graf <tgraf@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Chipsets with hardware based connection monitoring need to autonomically
send directed probe-request frames to the AP (in the event of beacon loss,
for example.)
For the hardware to be able to do this, it requires a template for the frame
to transmit to the AP, filled in with the BSSID and SSID of the AP, but also
the supported rate IE's.
This patch adds a function to mac80211, which allows the hardware driver to
fetch this template after association, so it can be configured to the hardware.
Signed-off-by: Juuso Oikarinen <juuso.oikarinen@nokia.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Allow antenna configuration by calling driver's function for it.
We disallow antenna configuration if the wiphy is already running, mainly to
make life easier for 802.11n drivers which need to recalculate HT capabilites.
Signed-off-by: Bruno Randolf <br1@einfach.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Allow setting of TX and RX antennas configuration via nl80211.
The antenna configuration is defined as a bitmap of allowed antennas to use.
This API can be used to mask out antennas which are not attached or should not
be used for other reasons like regulatory concerns or special setups.
Separate bitmaps are used for RX and TX to allow configuring different antennas
for receiving and transmitting. Each bitmap is 32 bit long, each bit
representing one antenna, starting with antenna 1 at the first bit. If an
antenna bit is set, this means the driver is allowed to use this antenna for RX
or TX respectively; if the bit is not set the hardware is not allowed to use
this antenna.
Using bitmaps has the benefit of allowing for a flexible configuration
interface which can support many different configurations and which can be used
for 802.11n as well as non-802.11n devices. Instead of relying on some hardware
specific assumptions, drivers can use this information to know which antennas
are actually attached to the system and derive their capabilities based on
that.
802.11n devices should enable or disable chains, based on which antennas are
present (If all antennas belonging to a particular chain are disabled, the
entire chain should be disabled). HT capabilities (like STBC, TX Beamforming,
Antenna selection) should be calculated based on the available chains after
applying the antenna masks. Should a 802.11n device have diversity antennas
attached to one of their chains, diversity can be enabled or disabled based on
the antenna information.
Non-802.11n drivers can use the antenna masks to select RX and TX antennas and
to enable or disable antenna diversity.
While covering chainmasks for 802.11n and the standard "legacy" modes "fixed
antenna 1", "fixed antenna 2" and "diversity" this API also allows more rare,
but useful configurations as follows:
1) Send on antenna 1, receive on antenna 2 (or vice versa). This can be used to
have a low gain antenna for TX in order to keep within the regulatory
constraints and a high gain antenna for RX in order to receive weaker signals
("speak softly, but listen harder"). This can be useful for building long-shot
outdoor links. Another usage of this setup is having a low-noise pre-amplifier
on antenna 1 and a power amplifier on the other antenna. This way transmit
noise is mostly kept out of the low noise receive channel.
(This would be bitmaps: tx 1 rx 2).
2) Another similar setup is: Use RX diversity on both antennas, but always send
on antenna 1. Again that would allow us to benefit from a higher gain RX
antenna, while staying within the legal limits.
(This would be: tx 0 rx 3).
3) And finally there can be special experimental setups in research and
development even with pre 802.11n hardware where more than 2 antennas are
available. It's good to keep the API simple, yet flexible.
Signed-off-by: Bruno Randolf <br1@einfach.org>
--
v7: Made bitmasks 32 bit wide and rebased to latest wireless-testing.
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The lower driver is notified when the fragmentation threshold changes
and upon a reconfig of the interface.
If the driver supports hardware TX fragmentation, don't fragment
packets in the stack.
Signed-off-by: Arik Nemtsov <arik@wizery.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Right now, fields in struct sock are not optimally ordered, because each
path (RX softirq, TX completion, RX user, TX user) has to touch fields
that are contained in many different cache lines.
The really critical thing is to shrink number of cache lines that are
used at RX softirq time : CPU handling softirqs for a device can receive
many frames per second for many sockets. If load is too big, we can drop
frames at NIC level. RPS or multiqueue cards can help, but better reduce
latency if possible.
This patch starts with UDP protocol, then additional patches will try to
reduce latencies of other ones as well.
At RX softirq time, fields of interest for UDP protocol are :
(not counting ones in inet struct for the lookup)
Read/Written:
sk_refcnt (atomic increment/decrement)
sk_rmem_alloc & sk_backlog.len (to check if there is room in queues)
sk_receive_queue
sk_backlog (if socket locked by user program)
sk_rxhash
sk_forward_alloc
sk_drops
Read only:
sk_rcvbuf (sk_rcvqueues_full())
sk_filter
sk_wq
sk_policy[0]
sk_flags
Additional notes :
- sk_backlog has one hole on 64bit arches. We can fill it to save 8
bytes.
- sk_backlog is used only if RX sofirq handler finds the socket while
locked by user.
- sk_rxhash is written only once per flow.
- sk_drops is written only if queues are full
Final layout :
[1] One section grouping all read/write fields, but placing rxhash and
sk_backlog at the end of this section.
[2] One section grouping all read fields in RX handler
(sk_filter, sk_rcv_buf, sk_wq)
[3] Section used by other paths
I'll post a patch on its own to put sk_refcnt at the end of struct
sock_common so that it shares same cache line than section [1]
New offsets on 64bit arch :
sizeof(struct sock)=0x268
offsetof(struct sock, sk_refcnt) =0x10
offsetof(struct sock, sk_lock) =0x48
offsetof(struct sock, sk_receive_queue)=0x68
offsetof(struct sock, sk_backlog)=0x80
offsetof(struct sock, sk_rmem_alloc)=0x80
offsetof(struct sock, sk_forward_alloc)=0x98
offsetof(struct sock, sk_rxhash)=0x9c
offsetof(struct sock, sk_rcvbuf)=0xa4
offsetof(struct sock, sk_drops) =0xa0
offsetof(struct sock, sk_filter)=0xa8
offsetof(struct sock, sk_wq)=0xb0
offsetof(struct sock, sk_policy)=0xd0
offsetof(struct sock, sk_flags) =0xe0
Instead of :
sizeof(struct sock)=0x270
offsetof(struct sock, sk_refcnt) =0x10
offsetof(struct sock, sk_lock) =0x50
offsetof(struct sock, sk_receive_queue)=0xc0
offsetof(struct sock, sk_backlog)=0x70
offsetof(struct sock, sk_rmem_alloc)=0xac
offsetof(struct sock, sk_forward_alloc)=0x10c
offsetof(struct sock, sk_rxhash)=0x128
offsetof(struct sock, sk_rcvbuf)=0x4c
offsetof(struct sock, sk_drops) =0x16c
offsetof(struct sock, sk_filter)=0x198
offsetof(struct sock, sk_wq)=0x88
offsetof(struct sock, sk_policy)=0x98
offsetof(struct sock, sk_flags) =0x130
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
UDP sockets refcount is usually 2, unless an incoming frame is going to
be queued in receive or backlog queue.
Using atomic_inc_not_zero_hint() permits to reduce latency, because
processor issues less memory transactions.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The changed functions do not modify the NL messages and/or attributes
at all. They should use const (similar to strchr), so that callers
which have a const nlmsg/nlattr around can make use of them without
casting.
While at it, constify a data array.
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
The dest of a connection may not exist if it has been created as the result
of connection synchronisation. But in order for connection entries for
templates with persistence engine data created through connection
synchronisation to be valid access to the persistence engine pointer is
required. So add the persistence engine to the connection itself.
Signed-off-by: Simon Horman <horms@verge.net.au>
WIPHY_FLAG_IBSS_RSN is BIT(7) as is WIPHY_FLAG_CONTROL_PORT_PROTOCOL. Change
to BIT(8).
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
When b43legacy is compiled on the arm platform, the following errors are seen:
CC [M] drivers/net/wireless/b43legacy/xmit.o
In file included from include/net/dst.h:11,
from drivers/net/wireless/b43legacy/xmit.c:31:
include/net/dst_ops.h:28: error: expected ':', ',', ';', '}' or '__attribute__'
before '____cacheline_aligned_in_smp'
include/net/dst_ops.h: In function 'dst_entries_get_fast':
include/net/dst_ops.h:33: error: 'struct dst_ops' has no member named
'pcpuc_entries'
include/net/dst_ops.h: In function 'dst_entries_get_slow':
include/net/dst_ops.h:41: error: 'struct dst_ops' has no member named
'pcpuc_entries'
include/net/dst_ops.h: In function 'dst_entries_add':
include/net/dst_ops.h:49: error: 'struct dst_ops' has no member named
'pcpuc_entries'
include/net/dst_ops.h: In function 'dst_entries_init':
include/net/dst_ops.h:55: error: 'struct dst_ops' has no member named
'pcpuc_entries'
include/net/dst_ops.h: In function 'dst_entries_destroy':
include/net/dst_ops.h:60: error: 'struct dst_ops' has no member named
'pcpuc_entries'
make[4]: *** [drivers/net/wireless/b43legacy/xmit.o] Error 1
make[3]: *** [drivers/net/wireless/b43legacy] Error 2
make[2]: *** [drivers/net/wireless] Error 2
make[1]: *** [drivers/net] Error 2
make: *** [drivers] Error 2
The cause is a missing include of <linux/cache.h>, which is present for
i386 and x86_64 architectures, but not for arm.
Signed-off-by: Arnd Hannemann <arnd@arndnet.de>
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Cc: Stable <stable@kernel.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Sjur Braendeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The GRE Key field is intended to be used for identifying an individual
traffic flow within a tunnel. It is useful to be able to have XFRM
policy selector matches to have different policies for different
GRE tunnels.
Signed-off-by: Timo Teräs <timo.teras@iki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
We should be enabling country IE hints for WIPHY_FLAG_STRICT_REGULATORY
even if we haven't yet recieved regulatory domain hint for the driver
if it needed one. Without this Country IEs are not passed on to drivers
that have set WIPHY_FLAG_STRICT_REGULATORY, today this is just all
Atheros chipset drivers: ath5k, ath9k, ar9170, carl9170.
This was part of the original design, however it was completely
overlooked...
Cc: Easwar Krishnan <easwar.krishnan@atheros.com>
Cc: stable@kernel.org
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Add some __rcu annotations and use helpers to reduce number of sparse
warnings (CONFIG_SPARSE_RCU_POINTER=y)
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
As we own the conntrack and the others can't see it until we confirm it,
we don't need to use atomic bit operation on ct->status.
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (66 commits)
can-bcm: fix minor heap overflow
gianfar: Do not call device_set_wakeup_enable() under a spinlock
ipv6: Warn users if maximum number of routes is reached.
docs: Add neigh/gc_thresh3 and route/max_size documentation.
axnet_cs: fix resume problem for some Ax88790 chip
ipv6: addrconf: don't remove address state on ifdown if the address is being kept
tcp: Don't change unlocked socket state in tcp_v4_err().
x25: Prevent crashing when parsing bad X.25 facilities
cxgb4vf: add call to Firmware to reset VF State.
cxgb4vf: Fail open if link_start() fails.
cxgb4vf: flesh out PCI Device ID Table ...
cxgb4vf: fix some errors in Gather List to skb conversion
cxgb4vf: fix bug in Generic Receive Offload
cxgb4vf: don't implement trivial (and incorrect) ndo_select_queue()
ixgbe: Look inside vlan when determining offload protocol.
bnx2x: Look inside vlan when determining checksum proto.
vlan: Add function to retrieve EtherType from vlan packets.
virtio-net: init link state correctly
ucc_geth: Fix deadlock
ucc_geth: Do not bring the whole IF down when TX failure.
...
in_dev->mc_list is protected by one rwlock (in_dev->mc_list_lock).
This can easily be converted to a RCU protection.
Writers hold RTNL, so mc_list_lock is removed, not replaced by a
spinlock.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Cypher Wu <cypher.w@gmail.com>
Cc: Américo Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ct->proto is big(60 bytes) due to structure ip_ct_tcp, and we don't need
to initialize the whole for all the other protocols. This patch moves
proto to the end of structure nf_conn, and pushes the initialization down
to the individual protocols.
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
When we test rt->fl.iif against zero, we're seeing if it's
an output or an input route.
Make that explicit with some helper functions.
Signed-off-by: David S. Miller <davem@davemloft.net>
It seems idev field in struct rtable has no special purpose, but adding
extra atomic ops.
We hold refcounts on the device itself (using percpu data, so pretty
cheap in current kernel).
infiniband case is solved using dst.dev instead of idev->dev
Removal of this field means routing without route cache is now using
shared data, percpu data, and only potential contention is a pair of
atomic ops on struct neighbour per forwarded packet.
About 5% speedup on routing test.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Roland Dreier <rolandd@cisco.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is important to move nud_state outside of the often modified cache
line (because of refcnt), to reduce false sharing in neigh_event_send()
This is a followup of commit 0ed8ddf404 (neigh: Protect neigh->ha[]
with a seqlock)
This gives a 7% speedup on routing test with IP route cache disabled.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Robin Holt tried to boot a 16TB machine and found some limits were
reached : sysctl_tcp_mem[2], sysctl_udp_mem[2]
We can switch infrastructure to use long "instead" of "int", now
atomic_long_t primitives are available for free.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Reported-by: Robin Holt <holt@sgi.com>
Reviewed-by: Robin Holt <holt@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
While tracking dev_base_lock users, I found decnet used it in
dnet_select_source(), but for a wrong purpose:
Writers only hold RTNL, not dev_base_lock, so readers must use RCU if
they cannot use RTNL.
Adds an rcu_head in struct dn_ifaddr and handle proper RCU management.
Adds __rcu annotation in dn_route as well.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Presently the b43legacy build fails on an sh randconfig:
In file included from include/net/dst.h:12,
from drivers/net/wireless/b43legacy/xmit.c:32:
include/net/dst_ops.h:28: error: expected ':', ',', ';', '}' or '__attribute__' before '____cacheline_aligned_in_smp'
include/net/dst_ops.h: In function 'dst_entries_get_fast':
include/net/dst_ops.h:33: error: 'struct dst_ops' has no member named 'pcpuc_entries'
include/net/dst_ops.h: In function 'dst_entries_get_slow':
include/net/dst_ops.h:41: error: 'struct dst_ops' has no member named 'pcpuc_entries'
include/net/dst_ops.h: In function 'dst_entries_add':
include/net/dst_ops.h:49: error: 'struct dst_ops' has no member named 'pcpuc_entries'
include/net/dst_ops.h: In function 'dst_entries_init':
include/net/dst_ops.h:55: error: 'struct dst_ops' has no member named 'pcpuc_entries'
include/net/dst_ops.h: In function 'dst_entries_destroy':
include/net/dst_ops.h:60: error: 'struct dst_ops' has no member named 'pcpuc_entries'
make[5]: *** [drivers/net/wireless/b43legacy/xmit.o] Error 1
make[5]: *** Waiting for unfinished jobs....
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (41 commits)
inet_diag: Make sure we actually run the same bytecode we audited.
netlink: Make nlmsg_find_attr take a const nlmsghdr*.
fib: fib_result_assign() should not change fib refcounts
netfilter: ip6_tables: fix information leak to userspace
cls_cgroup: Fix crash on module unload
memory corruption in X.25 facilities parsing
net dst: fix percpu_counter list corruption and poison overwritten
rds: Remove kfreed tcp conn from list
rds: Lost locking in loop connection freeing
de2104x: fix panic on load
atl1 : fix panic on load
netxen: remove unused firmware exports
caif: Remove noisy printout when disconnecting caif socket
caif: SPI-driver bugfix - incorrect padding.
caif: Bugfix for socket priority, bindtodev and dbg channel.
smsc911x: Set Ethernet EEPROM size to supported device's size
ipv4: netfilter: ip_tables: fix information leak to userland
ipv4: netfilter: arp_tables: fix information leak to userland
cxgb4vf: remove call to stop TX queues at load time.
cxgb4: remove call to stop TX queues at load time.
...
This will let us use it on a nlmsghdr stored inside a netlink_callback.
Signed-off-by: Nelson Elhage <nelhage@ksplice.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Changes:
o Bugfix: SO_PRIORITY for SOL_SOCKET could not be handled
in caif's setsockopt, using the struct sock attribute priority instead.
o Bugfix: SO_BINDTODEVICE for SOL_SOCKET could not be handled
in caif's setsockopt, using the struct sock attribute ifindex instead.
o Wrong assert statement for RFM layer segmentation.
o CAIF Debug channels was not working over SPI, caif_payload_info
containing padding info must be initialized.
o Check on pointer before dereferencing when unregister dev in caif_dev.c
Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (34 commits)
b43: Fix warning at drivers/mmc/core/core.c:237 in mmc_wait_for_cmd
mac80211: fix failure to check kmalloc return value in key_key_read
libertas: Fix sd8686 firmware reload
ath9k: Fix incorrect access of rate flags in RC
netfilter: xt_socket: Make tproto signed in socket_mt6_v1().
stmmac: enable/disable rx/tx in the core with a single write.
net: atarilance - flags should be unsigned long
netxen: fix kdump
pktgen: Limit how much data we copy onto the stack.
net: Limit socket I/O iovec total length to INT_MAX.
USB: gadget: fix ethernet gadget crash in gether_setup
fib: Fix fib zone and its hash leak on namespace stop
cxgb3: Fix panic in free_tx_desc()
cxgb3: fix crash due to manipulating queues before registration
8390: Don't oops on starting dev queue
dccp ccid-2: Stop polling
dccp: Refine the wait-for-ccid mechanism
dccp: Extend CCID packet dequeueing interface
dccp: Return-value convention of hc_tx_send_packet()
igbvf: fix panic on load
...
When we stop a namespace we flush the table and free one, but the
added fn_zone-s (and their hashes if grown) are leaked. Need to free.
Tries releases all its stuff in the flushing code.
Shame on us - this bug exists since the very first make-fib-per-net
patches in 2.6.27 :(
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
SYNOPSIS
size[4] Tfsync tag[2] fid[4] datasync[4]
size[4] Rfsync tag[2]
DESCRIPTION
The Tfsync transaction transfers ("flushes") all modified in-core data of
file identified by fid to the disk device (or other permanent storage
device) where that file resides.
If datasync flag is specified data will be fleshed but does not flush
modified metadata unless that metadata is needed in order to allow a
subsequent data retrieval to be correctly handled.
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Synopsis
size[4] TReadlink tag[2] fid[4]
size[4] RReadlink tag[2] target[s]
Description
Readlink is used to return the contents of the symoblic link
referred by fid. Contents of symboic link is returned as a
response.
target[s] - Contents of the symbolic link referred by fid.
Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Synopsis
size[4] TGetlock tag[2] fid[4] getlock[n]
size[4] RGetlock tag[2] getlock[n]
Description
TGetlock is used to test for the existence of byte range posix locks on a file
identified by given fid. The reply contains getlock structure. If the lock could
be placed it returns F_UNLCK in type field of getlock structure. Otherwise it
returns the details of the conflicting locks in the getlock structure
getlock structure:
type[1] - Type of lock: F_RDLCK, F_WRLCK
start[8] - Starting offset for lock
length[8] - Number of bytes to check for the lock
If length is 0, check for lock in all bytes starting at the location
'start' through to the end of file
pid[4] - PID of the process that wants to take lock/owns the task
in case of reply
client[4] - Client id of the system that owns the process which
has the conflicting lock
Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Synopsis
size[4] TLock tag[2] fid[4] flock[n]
size[4] RLock tag[2] status[1]
Description
Tlock is used to acquire/release byte range posix locks on a file
identified by given fid. The reply contains status of the lock request
flock structure:
type[1] - Type of lock: F_RDLCK, F_WRLCK, F_UNLCK
flags[4] - Flags could be either of
P9_LOCK_FLAGS_BLOCK - Blocked lock request, if there is a
conflicting lock exists, wait for that lock to be released.
P9_LOCK_FLAGS_RECLAIM - Reclaim lock request, used when client is
trying to reclaim a lock after a server restrart (due to crash)
start[8] - Starting offset for lock
length[8] - Number of bytes to lock
If length is 0, lock all bytes starting at the location 'start'
through to the end of file
pid[4] - PID of the process that wants to take lock
client_id[4] - Unique client id
status[1] - Status of the lock request, can be
P9_LOCK_SUCCESS(0), P9_LOCK_BLOCKED(1), P9_LOCK_ERROR(2) or
P9_LOCK_GRACE(3)
P9_LOCK_SUCCESS - Request was successful
P9_LOCK_BLOCKED - A conflicting lock is held by another process
P9_LOCK_ERROR - Error while processing the lock request
P9_LOCK_GRACE - Server is in grace period, it can't accept new lock
requests in this period (except locks with
P9_LOCK_FLAGS_RECLAIM flag set)
Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
SYNOPSIS
size[4] Tfsync tag[2] fid[4]
size[4] Rfsync tag[2]
DESCRIPTION
The Tfsync transaction transfers ("flushes") all modified in-core data of
file identified by fid to the disk device (or other permanent storage
device) where that file resides.
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Signed-off-by: Arun R Bharadwaj <arun@linux.vnet.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Adds __rcu annotations to inetpeer
(struct inet_peer)->avl_left
(struct inet_peer)->avl_right
This is a tedious cleanup, but removes one smp_wmb() from link_to_pool()
since we now use more self documenting rcu_assign_pointer().
Note the use of RCU_INIT_POINTER() instead of rcu_assign_pointer() in
all cases we dont need a memory barrier.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adds __rcu annotation to (struct fib_rule)->ctarget
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add __rcu annotations to :
(struct ip_tunnel)->prl
(struct ip_tunnel_prl_entry)->next
(struct xfrm_tunnel)->next
struct xfrm_tunnel *tunnel4_handlers
struct xfrm_tunnel *tunnel64_handlers
And use appropriate rcu primitives to reduce sparse warnings if
CONFIG_SPARSE_RCU_POINTER=y
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add __rcu annotations to :
struct net_protocol *inet_protos
struct net_protocol *inet6_protos
And use appropriate casts to reduce sparse warnings if
CONFIG_SPARSE_RCU_POINTER=y
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add __rcu annotations to :
(struct dst_entry)->rt_next
(struct rt_hash_bucket)->chain
And use appropriate rcu primitives to reduce sparse warnings if
CONFIG_SPARSE_RCU_POINTER=y
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add __rcu annotations to :
(struct ip_ra_chain)->next
struct ip_ra_chain *ip_ra_chain;
And use appropriate rcu primitives.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add __rcu annotation to :
(struct sock)->sk_filter
And use appropriate rcu primitives to reduce sparse warnings if
CONFIG_SPARSE_RCU_POINTER=y
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
add __rcu annotation to (struct net)->gen, and use
rcu_dereference_protected() in net_assign_generic()
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(struct ip6_tnl)->next is rcu protected :
(struct ip_tunnel)->next is rcu protected :
(struct xfrm6_tunnel)->next is rcu protected :
add __rcu annotation and proper rcu primitives.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(struct net_device)->garp_port is rcu protected :
(struct garp_port)->applicants is rcu protected :
add __rcu annotation and proper rcu primitives.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1699 commits)
bnx2/bnx2x: Unsupported Ethtool operations should return -EINVAL.
vlan: Calling vlan_hwaccel_do_receive() is always valid.
tproxy: use the interface primary IP address as a default value for --on-ip
tproxy: added IPv6 support to the socket match
cxgb3: function namespace cleanup
tproxy: added IPv6 support to the TPROXY target
tproxy: added IPv6 socket lookup function to nf_tproxy_core
be2net: Changes to use only priority codes allowed by f/w
tproxy: allow non-local binds of IPv6 sockets if IP_TRANSPARENT is enabled
tproxy: added tproxy sockopt interface in the IPV6 layer
tproxy: added udp6_lib_lookup function
tproxy: added const specifiers to udp lookup functions
tproxy: split off ipv6 defragmentation to a separate module
l2tp: small cleanup
nf_nat: restrict ICMP translation for embedded header
can: mcp251x: fix generation of error frames
can: mcp251x: fix endless loop in interrupt handler if CANINTF_MERRF is set
can-raw: add msg_flags to distinguish local traffic
9p: client code cleanup
rds: make local functions/variables static
...
Fix up conflicts in net/core/dev.c, drivers/net/pcmcia/smc91c92_cs.c and
drivers/net/wireless/ath/ath9k/debug.c as per David
Just like with IPv4, we need access to the UDP hash table to look up local
sockets, but instead of exporting the global udp_table, export a lookup
function.
Signed-off-by: Balazs Scheidler <bazsi@balabit.hu>
Signed-off-by: KOVACS Krisztian <hidden@balabit.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Like with IPv4, TProxy needs IPv6 defragmentation but does not
require connection tracking. Since defragmentation was coupled
with conntrack, I split off the two, creating an nf_defrag_ipv6 module,
similar to the already existing nf_defrag_ipv4.
Signed-off-by: Balazs Scheidler <bazsi@balabit.hu>
Signed-off-by: KOVACS Krisztian <hidden@balabit.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Make p9_client_version static since only used in one file.
Remove p9_client_auth because it is defined but never used.
Compile tested only.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When __inet_inherit_port() is called on a tproxy connection the wrong locks are
held for the inet_bind_bucket it is added to. __inet_inherit_port() made an
implicit assumption that the listener's port number (and thus its bind bucket).
Unfortunately, if you're using the TPROXY target to redirect skbs to a
transparent proxy that assumption is not true anymore and things break.
This patch adds code to __inet_inherit_port() so that it can handle this case
by looking up or creating a new bind bucket for the child socket and updates
callers of __inet_inherit_port() to gracefully handle __inet_inherit_port()
failing.
Reported by and original patch from Stephen Buck <stephen.buck@exinda.com>.
See http://marc.info/?t=128169268200001&r=1&w=2 for the original discussion.
Signed-off-by: KOVACS Krisztian <hidden@balabit.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Also, inline this function as the lookup_type is always a literal
and inlining removes branches performed at runtime.
Signed-off-by: Balazs Scheidler <bazsi@balabit.hu>
Signed-off-by: KOVACS Krisztian <hidden@balabit.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Without tproxy redirections an incoming SYN kicks out conflicting
TIME_WAIT sockets, in order to handle clients that reuse ports
within the TIME_WAIT period.
The same mechanism didn't work in case TProxy is involved in finding
the proper socket, as the time_wait processing code looked up the
listening socket assuming that the listener addr/port matches those
of the established connection.
This is not the case with TProxy as the listener addr/port is possibly
changed with the tproxy rule.
Signed-off-by: Balazs Scheidler <bazsi@balabit.hu>
Signed-off-by: KOVACS Krisztian <hidden@balabit.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
The first parameter dev isn't in use in qdisc_create_dflt().
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Acked-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
The function rtnl_kill_links is defined but never used.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Function only defined and used in one file.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As skb->protocol is not valid in LOCAL_OUT add
parameter for address family in packet debugging functions.
Even if ports are not present in AH and ESP change them to
use ip_vs_tcpudp_debug_packet to show at least valid addresses
as before. This patch removes the last user of skb->protocol
in IPVS.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
This patch deals with local real servers:
- Add support for DNAT to local address (different real server port).
It needs ip_vs_out hook in LOCAL_OUT for both families because
skb->protocol is not set for locally generated packets and can not
be used to set 'af'.
- Skip packets in ip_vs_in marked with skb->ipvs_property because
ip_vs_out processing can be executed in LOCAL_OUT but we still
have the conn_out_get check in ip_vs_in.
- Ignore packets with inet->nodefrag from local stack
- Require skb_dst(skb) != NULL because we use it to get struct net
- Add support for changing the route to local IPv4 stack after DNAT
depending on the source address type. Local client sets output
route and the remote client sets input route. It looks like
IPv6 does not need such rerouting because the replies use
addresses from initial incoming header, not from skb route.
- All transmitters now have strict checks for the destination
address type: redirect from non-local address to local real
server requires NAT method, local address can not be used as
source address when talking to remote real server.
- Now LOCALNODE is not set explicitly as forwarding
method in real server to allow the connections to provide
correct forwarding method to the backup server. Not sure if
this breaks tools that expect to see 'Local' real server type.
If needed, this can be supported with new flag IP_VS_DEST_F_LOCAL.
Now it should be possible connections in backup that lost
their fwmark information during sync to be forwarded properly
to their daddr, even if it is local address in the backup server.
By this way backup could be used as real server for DR or TUN,
for NAT there are some restrictions because tuple collisions
in conntracks can create problems for the traffic.
- Call ip_vs_dst_reset when destination is updated in case
some real server IP type is changed between local and remote.
[ horms@verge.net.au: removed trailing whitespace ]
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
This patch is needed to avoid scheduling of
packets from local real server when we add ip_vs_in
in LOCAL_OUT hook to support local client.
Currently, when ip_vs_in can not find existing
connection it tries to create new one by calling ip_vs_schedule.
The default indication from ip_vs_schedule was if
connection was scheduled to real server. If real server is
not available we try to use the bypass forwarding method
or to send ICMP error. But in some cases we do not want to use
the bypass feature. So, add flag 'ignored' to indicate if
the scheduler ignores this packet.
Make sure we do not create new connections from replies.
We can hit this problem for persistent services and local real
server when ip_vs_in is added to LOCAL_OUT hook to handle
local clients.
Also, make sure ip_vs_schedule ignores SYN packets
for Active FTP DATA from local real server. The FTP DATA
connection should be created on SYN+ACK from client to assign
correct connection daddr.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Change skb->ipvs_property semantic. This is preparation
to support ip_vs_out processing in LOCAL_OUT. ipvs_property=1
will be used to avoid expensive lookups for traffic sent by
transmitters. Now when conntrack support is not used we call
ip_vs_notrack method to avoid problems in OUTPUT and
POST_ROUTING hooks instead of exiting POST_ROUTING as before.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Avoid full checksum calculation for apps that can provide
info whether csum was broken after payload mangling. For now only
ip_vs_ftp mangles payload and it updates the csum, so the full
recalculation is avoided for all packets.
Add CHECKSUM_UNNECESSARY for snat_handler (TCP and UDP).
It is needed to support SNAT from local address for the case
when csum is fully recalculated.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
IPv6 encapsulation uses a bad source address for the tunnel.
i.e. VIP will be used as local-addr and encap. dst addr.
Decapsulation will not accept this.
Example
LVS (eth1 2003::2:0:1/96, VIP 2003::2:0:100)
(eth0 2003::1:0:1/96)
RS (ethX 2003::1:0:5/96)
tcpdump
2003::2:0:100 > 2003::1:0:5: IP6 (hlim 63, next-header TCP (6) payload length: 40) 2003::3:0:10.50991 > 2003::2:0:100.http: Flags [S], cksum 0x7312 (correct), seq 3006460279, win 5760, options [mss 1440,sackOK,TS val 1904932 ecr 0,nop,wscale 3], length 0
In Linux IPv6 impl. you can't have a tunnel with an any cast address
receiving packets (I have not tried to interpret RFC 2473)
To have receive capabilities the tunnel must have:
- Local address set as multicast addr or an unicast addr
- Remote address set as an unicast addr.
- Loop back addres or Link local address are not allowed.
This causes us to setup a tunnel in the Real Server with the
LVS as the remote address, here you can't use the VIP address since it's
used inside the tunnel.
Solution
Use outgoing interface IPv6 address (match against the destination).
i.e. use ip6_route_output() to look up the route cache and
then use ipv6_dev_get_saddr(...) to set the source address of the
encapsulated packet.
Additionally, cache the results in new destination
fields: dst_cookie and dst_saddr and properly check the
returned dst from ip6_route_output. We now add xfrm_lookup
call only for the tunneling method where the source address
is a local one.
Signed-off-by:Hans Schillstrom <hans.schillstrom@ericsson.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>