linux

Author	SHA1	Message	Date
Jon Paul Maloy	ec8a2e5621	tipc: same receive code path for connection protocol and data messages As a preparation to eliminate port_lock we need to bring reception of connection protocol messages under proper protection of bh_lock_sock or socket owner. We fix this by letting those messages follow the same code path as incoming data messages. As a side effect of this change, the last reference to the function net_route_msg() disappears, and we can eliminate that function. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-27 12:50:56 -07:00
Jon Paul Maloy	b786e2b0fa	tipc: let port protocol senders use new link send function Several functions in port.c, related to the port protocol and connection shutdown, need to send messages. We now convert them to use the new link send function. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-27 12:50:55 -07:00
Jon Paul Maloy	4ccfe5e041	tipc: connection oriented transport uses new send functions We move the message sending across established connections to use the message preparation and send functions introduced earlier in this series. We now do the message preparation and call to the link send function directly from the socket, instead of going via the port layer. As a consequence of this change, the functions tipc_send(), tipc_port_iovec_rcv(), tipc_port_iovec_reject() and tipc_reject_msg() become unreferenced and can be eliminated from port.c. For the same reason, the functions tipc_link_xmit_fast(), tipc_link_iovec_xmit_long() and tipc_link_iovec_fast() can be eliminated from link.c. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-27 12:50:55 -07:00
Jon Paul Maloy	e2dafe87d3	tipc: RDM/DGRAM transport uses new fragmenting and sending functions We merge the code for sending port name and port identity addressed messages into the corresponding send functions in socket.c, and start using the new fragmenting and transmit functions we just have introduced. This saves a call level and quite a few code lines, as well as making this part of the code easier to follow. As a consequence, the functions tipc_send2name() and tipc_send2port() in port.c can be removed. For practical reasons, we break out the code for sending multicast messages from tipc_sendmsg() and move it into a separate function, tipc_sendmcast(), but we do not yet convert it into using the new build/send functions. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-27 12:50:55 -07:00
Jon Paul Maloy	5a379074a7	tipc: introduce message evaluation function When a message arrives in a node and finds no destination socket, we may need to drop it, reject it, or forward it after a secondary destination lookup. The latter two cases currently results in a code path that is perceived as complex, because it follows a deep call chain via obscure functions such as net_route_named_msg() and net_route_msg(). We now introduce a function, tipc_msg_eval(), that takes the decision about whether such a message should be rejected or forwarded, but leaves it to the caller to actually perform the indicated action. If the decision is 'reject', it is still the task of the recently introduced function tipc_msg_reverse() to take the final decision about whether the message is rejectable or not. In the latter case it drops the message. As a result of this change, we can finally eliminate the function net_route_named_msg(), and hence become independent of net_route_msg(). Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-27 12:50:55 -07:00
Jon Paul Maloy	8db1bae30b	tipc: separate building and sending of rejected messages The way we build and send rejected message is currenty perceived as hard to follow, partly because we let the transmission go via deep call chains through functions such as tipc_reject_msg() and net_route_msg(). We want to remove those functions, and make the call sequences shallower and simpler. For this purpose, we separate building and sending of rejected messages. We build the reject message using the new function tipc_msg_reverse(), and let the transmission go via the newly introduced tipc_link_xmit2() function, as all transmission eventually will do. We also ensure that all calls to tipc_link_xmit2() are made outside port_lock/bh_lock_sock. Finally, we replace all calls to tipc_reject_msg() with the two new calls at all locations in the code that we want to keep. The remaining calls are made from code that we are planning to remove, along with tipc_reject_msg() itself. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-27 12:50:55 -07:00
Jon Paul Maloy	067608e9d0	tipc: introduce direct iovec to buffer chain fragmentation function Fragmentation at message sending is currently performed in two places in link.c, depending on whether data to be transmitted is delivered in the form of an iovec or as a big sk_buff. Those functions are also tightly entangled with the send functions that are using them. We now introduce a re-entrant, standalone function, tipc_msg_build2(), that builds a packet chain directly from an iovec. Each fragment is sized according to the MTU value given by the caller, and is prepended with a correctly built fragment header, when needed. The function is independent from who is calling and where the chain will be delivered, as long as the caller is able to indicate a correct MTU. The function is tested, but not called by anybody yet. Since it is incompatible with the existing tipc_msg_build(), and we cannot yet remove that function, we have given it a temporary name. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-27 12:50:55 -07:00
Jon Paul Maloy	16e166b88c	tipc: make link mtu easily accessible from socket Message fragmentation is currently performed at link level, inside the protection of node_lock. This potentially binds up the sending link structure for a long time, instead of letting it do other tasks, such as handle reception of new packets. In this commit, we make the MTUs of each active link become easily accessible from the socket level, i.e., without taking any spinlock or dereferencing the target link pointer. This way, we make it possible to perform fragmentation in the sending socket, before sending the whole fragment chain to the link for transport. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-27 12:50:55 -07:00
Jon Paul Maloy	4f1688b2c6	tipc: introduce send functions for chained buffers in link The current link implementation provides several different transmit functions, depending on the characteristics of the message to be sent: if it is an iovec or an sk_buff, if it needs fragmentation or not, if the caller holds the node_lock or not. The permutation of these options gives us an unwanted amount of unnecessarily complex code. As a first step towards simplifying the send path for all messages, we introduce two new send functions at link level, tipc_link_xmit2() and __tipc_link_xmit2(). The former looks up a link to the message destination, and if one is found, it grabs the node lock and calls the second function, which works exclusively inside the node lock protection. If no link is found, and the destination is on the same node, it delivers the message directly to the local destination socket. The new functions take a buffer chain where all packet headers are already prepared, and the correct MTU has been used. These two functions will later replace all other link-level transmit functions. The functions are not backwards compatible, so we have added them as new functions with temporary names. They are tested, but have no users yet. Those will be added later in this series. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-27 12:50:54 -07:00
Jon Paul Maloy	e4de5fab80	tipc: use negative error return values in functions In some places, TIPC functions returns positive integers as return codes. This goes against standard Linux coding practice, and may even cause problems in some cases. We now change the return values of the functions filter_rcv() and filter_connect() to become signed integers, and return negative error codes when needed. The codes we use in these particular cases are still TIPC specific, since they are both part of the TIPC API and have no correspondence in errno.h Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-27 12:50:54 -07:00
Jon Paul Maloy	3d09fc4244	tipc: eliminate case of writing to freed memory In the function tipc_nodesub_notify() we call a function pointer aggregated into the object to be notified, whereafter we set the function pointer to NULL. However, in some cases the function pointed to will free the struct containing the function pointer, resulting in a write to already freed memory. This bug seems to always have been there, without causing any notable harm. In this commit we fix the problem by inverting the order of the zeroing and the function call. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-27 12:50:54 -07:00
Michael S. Tsirkin	ac5ccdba3a	iovec: move memcpy_from/toiovecend to lib/iovec.c ERROR: "memcpy_fromiovecend" [drivers/vhost/vhost_scsi.ko] undefined! commit `9f977ef7b6` vhost-scsi: Include prot_bytes into expected data transfer length in target-pending makes drivers/vhost/scsi.c call memcpy_fromiovecend(). This function is not available when CONFIG_NET is not enabled. socket.h already includes uio.h, so no callers need updating. Reported-by: Randy Dunlap <rdunlap@infradead.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>	2014-06-27 11:47:58 -07:00
John W. Linville	f9fa39e9ac	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem	2014-06-27 13:35:56 -04:00
Pablo Neira Ayuso	09d27b88f1	netfilter: nft_log: complete logging support Use the unified nf_log_packet() interface that allows us explicit logger selection through the nf_loginfo structure. If you specify the group attribute, this means you want to receive logging messages through nfnetlink_log. In that case, the snaplen and qthreshold attributes allows you to tune internal aspects of the netlink logging infrastructure. On the other hand, if the level is specified, then the plain text format through the kernel logging ring is used instead, which is also used by default if neither group nor level are indicated. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-06-27 13:21:37 +02:00
Pablo Neira Ayuso	85d30e2416	netfilter: nft_log: request explicit logger when loading rules This includes the special handling for NFPROTO_INET. There is no real inet logger since we don't see packets of this family. However, rules are loaded using this special family type. So let's just request both IPV4 and IPV6 loggers. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-06-27 13:21:30 +02:00
Pablo Neira Ayuso	960649d192	netfilter: bridge: add generic packet logger This adds the generic plain text packet loggger for bridged packets. It routes the logging message to the real protocol packet logger. I decided not to refactor the ebt_log code for two reasons: 1) The ebt_log output is not consistent with the IPv4 and IPv6 Netfilter packet loggers. The output is different for no good reason and it adds redundant code to handle packet logging. 2) To avoid breaking backward compatibility for applications outthere that are parsing the specific ebt_log output, the ebt_log output has been left as is. So only nftables will use the new consistent logging format for logged bridged packets. More decisions coming in this patch: 1) This also removes ebt_log as default logger for bridged packets. Thus, nf_log_packet() routes packet to this new packet logger instead. This doesn't break backward compatibility since nf_log_packet() is not used to log packets in plain text format from anywhere in the ebtables/netfilter bridge code. 2) The new bridge packet logger also performs a lazy request to register the real IPv4, ARP and IPv6 netfilter packet loggers. If the real protocol logger is no available (not compiled or the module is not available in the system, not packet logging happens. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-06-27 13:20:47 +02:00
Pablo Neira Ayuso	35b9395104	netfilter: add generic ARP packet logger This adds the generic plain text packet loggger for ARP packets. It is based on the ebt_log code. Nevertheless, the output has been modified to make it consistent with the original xt_LOG output. This is an example output: IN=wlan0 OUT= ARP HTYPE=1 PTYPE=0x0800 OPCODE=2 MACSRC=00🆎12:34:55:63 IPSRC=192.168.10.1 MACDST=80:09:12:70:4f:50 IPDST=192.168.10.150 This patch enables packet logging from ARP chains, eg. nft add rule arp filter input log prefix "input: " Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-06-27 13:20:38 +02:00
Pablo Neira Ayuso	fab4085f4e	netfilter: log: nf_log_packet() as real unified interface Before this patch, the nf_loginfo parameter specified the logging configuration in case the specified default logger was loaded. This patch updates the semantics of the nf_loginfo parameter in nf_log_packet() which now indicates the logger that you explicitly want to use. Thus, nf_log_packet() is exposed as an unified interface which internally routes the log message to the corresponding logger type by family. The module dependencies are expressed by the new nf_logger_find_get() and nf_logger_put() functions which bump the logger module refcount. Thus, you can not remove logger modules that are used by rules anymore. Another important effect of this change is that the family specific module is only loaded when required. Therefore, xt_LOG and nft_log will just trigger the autoload of the nf_log_{ip,ip6} modules according to the family. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-06-27 13:20:13 +02:00
Pablo Neira Ayuso	83e96d443b	netfilter: log: split family specific code to nf_log_{ip,ip6,common}.c files The plain text logging is currently embedded into the xt_LOG target. In order to be able to use the plain text logging from nft_log, as a first step, this patch moves the family specific code to the following files and Kconfig symbols: 1) net/ipv4/netfilter/nf_log_ip.c: CONFIG_NF_LOG_IPV4 2) net/ipv6/netfilter/nf_log_ip6.c: CONFIG_NF_LOG_IPV6 3) net/netfilter/nf_log_common.c: CONFIG_NF_LOG_COMMON These new modules will be required by xt_LOG and nft_log. This patch is based on original patch from Arturo Borrero Gonzalez. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-06-27 13:19:59 +02:00
Hangbin Liu	e940f5d6ba	ipv6: Fix MLD Query message check Based on RFC3810 6.2, we also need to check the hop limit and router alert option besides source address. Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-27 00:21:50 -07:00
James M Leddy	3e215c8d1b	udp: Add MIB counters for rcvbuferrors Add MIB counters for rcvbuferrors in UDP to help diagnose problems. Signed-off-by: James M Leddy <james.leddy@redhat.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-27 00:20:55 -07:00
Mathias Krause	1990e4f883	vti: Simplify error handling in module init and exit The error handling in the module init and exit functions can be shortened to safe us some code. 1/ Remove the code duplications in the init function, jump straight to the existing cleanup code by adding some labels. Also give the error message some more value by telling the reason why loading the module has failed. Furthermore fix the "IPSec" typo -- it should be "IPsec" instead. 2/ Remove the error handling in the exit function as the only legitimate reason xfrm4_protocol_deregister() might fail is inet_del_protocol() returning -1. That, in turn, means some other protocol handler had been registered for this very protocol in the meantime. But that essentially means we haven't been handling that protocol any more, anyway. What it definitely means not is that we "can't deregister tunnel". Therefore just get rid of that bogus warning. It's plain wrong. Signed-off-by: Mathias Krause <minipli@googlemail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2014-06-26 08:21:57 +02:00
Mathias Krause	e59d82fd33	vti6: Simplify error handling in module init and exit The error handling in the module init and exit functions can be shortened to safe us some code. 1/ Remove the code duplications in the init function, jump straight to the existing cleanup code by adding some labels. Also give the error message some more value by telling the reason why loading the module has failed. 2/ Remove the error handling in the exit function as the only legitimate reason xfrm6_protocol_deregister() might fail is inet6_del_protocol() returning -1. That, in turn, means some other protocol handler had been registered for this very protocol in the meantime. But that essentially means we haven't been handling that protocol any more, anyway. What it definitely means not is that we "can't deregister protocol". Therefore just get rid of that bogus warning. It's plain wrong. Signed-off-by: Mathias Krause <minipli@googlemail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2014-06-26 08:21:57 +02:00
Steffen Klassert	b7eea4545e	xfrm: Fix refcount imbalance in xfrm_lookup xfrm_lookup must return a dst_entry with a refcount for the caller. Git commit `1a1ccc96ab` ("xfrm: Remove caching of xfrm_policy_sk_bundles") removed this refcount for the socket policy case accidentally. This patch restores it and sets DST_NOCACHE flag to make sure that the dst_entry is freed when the refcount becomes null. Fixes: `1a1ccc96ab` ("xfrm: Remove caching of xfrm_policy_sk_bundles") Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2014-06-26 07:52:42 +02:00
David S. Miller	9b8d90b963	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	2014-06-25 22:40:43 -07:00
Linus Torvalds	f40ede392d	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Pull networking fixes from David Miller: 1) Fix crash in ipvs tot_stats estimator, from Julian Anastasov. 2) Fix OOPS in nf_nat on netns removal, from Florian Westphal. 3) Really really really fix locking issues in slip and slcan tty write wakeups, from Tyler Hall. 4) Fix checksum offloading in fec driver, from Fugang Duan. 5) Off by one in BPF instruction limit test, from Kees Cook. 6) Need to clear all TSO capability flags when doing software TSO in tg3 driver, from Prashant Sreedharan. 7) Fix memory leak in vlan_reorder_header() error path, from Li RongQing. 8) Fix various bugs in xen-netfront and xen-netback multiqueue support, from David Vrabel and Wei Liu. 9) Fix deadlock in cxgb4 driver, from Li RongQing. 10) Prevent double free of no-cache DST entries, from Eric Dumazet. 11) Bad csum_start handling in skb_segment() leads to crashes when forwarding, from Tom Herbert. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (76 commits) net: fix setting csum_start in skb_segment() ipv4: fix dst race in sk_dst_get() net: filter: Use kcalloc/kmalloc_array to allocate arrays trivial: net: filter: Change kerneldoc parameter order trivial: net: filter: Fix typo in comment net: allwinner: emac: Add missing free_irq cxgb4: use dev_port to identify ports xen-netback: bookkeep number of active queues in our own module tg3: Change nvram command timeout value to 50ms cxgb4: Not need to hold the adap_rcu_lock lock when read adap_rcu_list be2net: fix qnq mode detection on VFs of: mdio: fixup of_phy_register_fixed_link parsing of new bindings at86rf230: fix irq setup net: phy: at803x: fix coccinelle warnings net/mlx4_core: Fix the error flow when probing with invalid VF configuration tulip: Poll link status more frequently for Comet chips net: huawei_cdc_ncm: increase command buffer size drivers: net: cpsw: fix dual EMAC stall when connected to same switch xen-netfront: recreate queues correctly when reconnecting xen-netfront: fix oops when disconnected from backend ...	2014-06-25 21:08:24 -07:00
Tom Herbert	de843723f9	net: fix setting csum_start in skb_segment() Dave Jones reported that a crash is occurring in csum_partial tcp_gso_segment inet_gso_segment ? update_dl_migration skb_mac_gso_segment __skb_gso_segment dev_hard_start_xmit sch_direct_xmit __dev_queue_xmit ? dev_hard_start_xmit dev_queue_xmit ip_finish_output ? ip_output ip_output ip_forward_finish ip_forward ip_rcv_finish ip_rcv __netif_receive_skb_core ? __netif_receive_skb_core ? trace_hardirqs_on __netif_receive_skb netif_receive_skb_internal napi_gro_complete ? napi_gro_complete dev_gro_receive ? dev_gro_receive napi_gro_receive It looks like a likely culprit is that SKB_GSO_CB()->csum_start is not set correctly when doing non-scatter gather. We are using offset as opposed to doffset. Reported-by: Dave Jones <davej@redhat.com> Tested-by: Dave Jones <davej@redhat.com> Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Fixes: `7e2b10c1e5` ("net: Support for multiple checksums with gso") Acked-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-25 20:45:54 -07:00
Eric Dumazet	f886497212	ipv4: fix dst race in sk_dst_get() When IP route cache had been removed in linux-3.6, we broke assumption that dst entries were all freed after rcu grace period. DST_NOCACHE dst were supposed to be freed from dst_release(). But it appears we want to keep such dst around, either in UDP sockets or tunnels. In sk_dst_get() we need to make sure dst refcount is not 0 before incrementing it, or else we might end up freeing a dst twice. DST_NOCACHE set on a dst does not mean this dst can not be attached to a socket or a tunnel. Then, before actual freeing, we need to observe a rcu grace period to make sure all other cpus can catch the fact the dst is no longer usable. Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Dormando <dormando@rydia.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-25 17:41:44 -07:00
Tobias Klauser	99e72a0fed	net: filter: Use kcalloc/kmalloc_array to allocate arrays Use kcalloc/kmalloc_array to make it clear we're allocating arrays. No integer overflow can actually happen here, since len/flen is guaranteed to be less than BPF_MAXINSNS (4096). However, this changed makes sure we're not going to get one if BPF_MAXINSNS were ever increased. Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Acked-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-25 16:40:02 -07:00
Tobias Klauser	677a9fd3e6	trivial: net: filter: Change kerneldoc parameter order Change the order of the parameters to sk_unattached_filter_create() in the kerneldoc to reflect the order they appear in the actual function. This fix is only cosmetic, in the generated doc they still appear in the correct order without the fix. Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Acked-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-25 16:38:54 -07:00
Tobias Klauser	285276e72c	trivial: net: filter: Fix typo in comment Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Acked-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-25 16:38:54 -07:00
Eric Dumazet	f6d8cb2eed	inet: reduce TLB pressure for listeners It seems overkill to use vmalloc() for typical listeners with less than 2048 hash buckets. Try kmalloc() and fallback to vmalloc() to reduce TLB pressure. Use kvfree() helper as it is now available. Use ilog2() instead of a loop. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-25 16:37:24 -07:00
Fabian Frederick	1f74714f1e	net/dsa/dsa.c: remove unnecessary null test before kfree Fix checkpatch warning: WARNING: kfree(NULL) is safe this check is probably not required Cc: "David S. Miller" <davem@davemloft.net> Cc: Grant Likely <grant.likely@linaro.org> Cc: netdev@vger.kernel.org Cc: Joe Perches <joe@perches.com> Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-25 16:15:16 -07:00
John W. Linville	8b87efba61	Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth	2014-06-25 14:22:35 -04:00
John W. Linville	12307e4208	Merge git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211	2014-06-25 14:17:50 -04:00
Pablo Neira Ayuso	27fd8d90c9	netfilter: nf_log: move log buffering to core logging This patch moves Eric Dumazet's log buffer implementation from the xt_log.h header file to the core net/netfilter/nf_log.c. This also includes the renaming of the structure and functions to avoid possible undesired namespace clashes. This change allows us to use it from the arp and bridge packet logging implementation in follow up patches.	2014-06-25 19:28:43 +02:00
Pablo Neira Ayuso	5962815a6a	netfilter: nf_log: use an array of loggers instead of list Now that legacy ulog targets are not available anymore in the tree, we can have up to two possible loggers: 1) The plain text logging via kernel logging ring. 2) The nfnetlink_log infrastructure which delivers log messages to userspace. This patch replaces the list of loggers by an array of two pointers per family for each possible logger and it also introduces a new field to the nf_logger structure which indicates the position in the logger array (based on the logger type). This prepares a follow up patch that consolidates the nf_log_packet() interface by allowing to specify the logger as parameter. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-06-25 19:28:43 +02:00
Pablo Neira Ayuso	7200135bc1	netfilter: kill ulog targets This has been marked as deprecated for quite some time and the NFLOG target replacement has been also available since 2006. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-06-25 19:28:43 +02:00
Florian Westphal	9500507c61	netfilter: conntrack: remove timer from ecache extension This brings the (per-conntrack) ecache extension back to 24 bytes in size (was 152 byte on x86_64 with lockdep on). When event delivery fails, re-delivery is attempted via work queue. Redelivery is attempted at least every 0.1 seconds, but can happen more frequently if userspace is not congested. The nf_ct_release_dying_list() function is removed. With this patch, ownership of the to-be-redelivered conntracks (on-dying-list-with-DYING-bit not yet set) is with the work queue, which will release the references once event is out. Joint work with Pablo Neira Ayuso. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-06-25 19:15:38 +02:00
Michal Kazior	97dc94f1d9	cfg80211: remove channel_switch combination check Driver is now responsible for veryfing if the switch is possible. Since this is inherently tricky driver may decide to disconnect an interface later with cfg80211_stop_iface(). This doesn't mean driver can accept everything. It should do it's best to verify requests and reject them as soon as possible. Signed-off-by: Michal Kazior <michal.kazior@tieto.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-06-25 18:06:20 +02:00
Michal Kazior	4c3ebc56d7	mac80211: use chanctx reservation for STA CSA Channel switch finalization is now 2-step. First step is when driver calls chswitch_done(), the other is when reservation is actually finalized (which be defered for in-place reservation). It is now safe to call ieee80211_chswitch_done() more than once. Also remove the ieee80211_vif_change_channel() because it is no longer used. Signed-off-by: Michal Kazior <michal.kazior@tieto.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-06-25 18:06:20 +02:00
Michal Kazior	03078de4f9	mac80211: use chanctx reservation for AP CSA Channel switch finalization is now 2-step. First step is when driver calls csa_finish(), the other is when reservation is actually finalized (which can be deferred for in-place reservation). It is now safe to call ieee80211_csa_finish() more than once. Signed-off-by: Michal Kazior <michal.kazior@tieto.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-06-25 18:06:20 +02:00
Michal Kazior	71e6195ed2	mac80211: make check_combinations() aware of chanctx reservation The ieee80211_check_combinations() computes radar_detect accordingly depending on chanctx reservation status. This makes it possible to use the function for channel_switch validation. Signed-off-by: Michal Kazior <michal.kazior@tieto.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-06-25 18:06:20 +02:00
Michal Kazior	5bcae31d9c	mac80211: implement multi-vif in-place reservations Multi-vif in-place reservations happen when it is impossible to allocate more channel contexts as indicated by interface combinations. Such reservations are not finalized until all assigned interfaces are ready. This still doesn't handle all possible cases (i.e. degradation of number of channels) properly. Signed-off-by: Michal Kazior <michal.kazior@tieto.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-06-25 18:06:20 +02:00
Eric Dumazet	f6b50824f7	netfilter: x_tables: xt_free_table_info() cleanup kvfree() helper can make xt_free_table_info() much cleaner. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-06-25 14:52:16 +02:00
Fabian Frederick	397304b52d	netfilter: ctnetlink: remove null test before kfree Fix checkpatch warning: WARNING: kfree(NULL) is safe this check is probably not required Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-06-25 14:49:38 +02:00
David Spinadel	633e271326	mac80211: split sched scan IEs Split sched scan IEs to band specific and not band specific blocks. Common IEs blocks may be sent to the FW once per command, instead of per band. This allows optimization of size of the command, which may be required by some drivers (eg. iwlmvm with newer firmware version). As this changes the mac80211 API, update all drivers to use the new version correctly, even if they don't (yet) make use of the split data. Signed-off-by: David Spinadel <david.spinadel@intel.com> Reviewed-by: Alexander Bondar <alexander.bondar@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-06-25 09:10:43 +02:00
David Spinadel	c56ef67250	mac80211: support more than one band in scan request Some drivers (such as iwlmvm) can handle multiple bands in a single HW scan request. Add a HW flag to indicate that the driver support this. To hold the required data, create a separate structure for HW scan request that holds cfg scan request and data about different parts of the scan IEs. As this changes the mac80211 API, update all drivers using it to use the correct new function type/argument. Signed-off-by: David Spinadel <david.spinadel@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-06-25 09:10:42 +02:00
Andy Adamson	66b0686049	NFSv4: test SECINFO RPC_AUTH_GSS pseudoflavors for support Fix nfs4_negotiate_security to create an rpc_clnt used to test each SECINFO returned pseudoflavor. Check credential creation (and gss_context creation) which is important for RPC_AUTH_GSS pseudoflavors which can fail for multiple reasons including mis-configuration. Don't call nfs4_negotiate in nfs4_submount as it was just called by nfs4_proc_lookup_mountpoint (nfs4_proc_lookup_common) Signed-off-by: Andy Adamson <andros@netapp.com> [Trond: fix corrupt return value from nfs_find_best_sec()] Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2014-06-24 18:46:58 -04:00
Johannes Berg	02df00eb00	nl80211: move set_qos_map command into split state The non-split wiphy state shouldn't be increased in size so move the new set_qos_map command into the split if statement. Cc: stable@vger.kernel.org (3.14+) Fixes: `fa9ffc7456` ("cfg80211: Add support for QoS mapping") Reviewed-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-06-24 16:13:10 +02:00
Rasmus Villemoes	79631c89ed	trivial: net/irda/irlmp.c: Fix closing brace followed by if Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-23 15:04:33 -07:00
Govindarajulu Varadarajan	e0f31d8498	flow_keys: Record IP layer protocol in skb_flow_dissect() skb_flow_dissect() dissects only transport header type in ip_proto. It dose not give any information about IPv4 or IPv6. This patch adds new member, n_proto, to struct flow_keys. Which records the IP layer type. i.e IPv4 or IPv6. This can be used in netdev->ndo_rx_flow_steer driver function to dissect flow. Adding new member to flow_keys increases the struct size by around 4 bytes. This causes BUILD_BUG_ON(sizeof(qcb->data) < sz); to fail in qdisc_cb_private_validate() So increase data size by 4 Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-23 14:32:19 -07:00
Kinglong Mee	f15a5cf912	SUNRPC/NFSD: Change to type of bool for rq_usedeferral and rq_splice_ok rq_usedeferral and rq_splice_ok are used as 0 and 1, just defined to bool. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-06-23 11:31:36 -04:00
Arik Nemtsov	ee10f2c779	mac80211: protect TDLS discovery session After sending a TDLS discovery-request, we expect a reply to arrive on the AP's channel. We must stay on the channel (no PSM, scan, etc.), since a TDLS setup-response is a direct packet not buffered by the AP. Add a new mac80211 driver callback to allow discovery session protection. Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com> Reviewed-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-06-23 14:28:19 +02:00
Arik Nemtsov	7adc3e4664	mac80211: make sure TDLS peer STA exists during setup Make sure userspace added a TDLS peer station before invoking the transmission of the first setup frame. This ensures packets to the peer won't go through the AP path. Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-06-23 14:28:18 +02:00
Arik Nemtsov	c887f0d3a0	mac80211: add API to request TDLS operation from userspace Write a mac80211 to the cfg80211 API for requesting a userspace TDLS operation. Define TDLS specific reason codes that can be used here. Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-06-23 14:28:17 +02:00
Arik Nemtsov	db67d661e8	mac80211: implement proper Tx path flushing for TDLS As the spec mandates, flush data in the AP path before transmitting the first setup frame. Data packets transmitted during setup are already dropped in the Tx path. For the teardown flow, flush all packets in the direct path before transmitting the teardown frame. Un-authorize the peer sta after teardown is sent, forcing all subsequent Tx to the peer through the AP. Make sure to flush the queues when disabling the link to get the teardown packet out. Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com> [adjust to Luca's new quuee API and stop only vif queues] Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-06-23 14:27:58 +02:00
Arik Nemtsov	191dd46905	mac80211: split tdls_mgmt function There are setup/teardown specific actions to be done that accompany the sending of a TDLS management packet. Split the main function to simplify future additions. Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-06-23 14:24:55 +02:00
Arik Nemtsov	2fb6b9b8e5	mac80211: use TDLS initiator in tdls_mgmt operations The TDLS initiator is set once during link setup. If determines the address ordering in the link identifier IE. Use the value from userspace in order to have a correct teardown packet. With the current code, a teardown from the responder side fails the TDLS MIC check because of a bad link identifier IE. Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-06-23 14:24:55 +02:00
Arik Nemtsov	31fa97c5de	cfg80211: pass TDLS initiator in tdls_mgmt operations The TDLS initiator is set once during link setup. If determines the address ordering in the link identifier IE. Fix dependent drivers - mwifiex and mac80211. Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-06-23 14:24:55 +02:00
Arik Nemtsov	17e6a59a36	mac80211: cleanup TDLS state during failed setup When setting up a TDLS session, register a delayed work to remove the peer if setup times out. Prevent concurrent setups to support this capacity. Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-06-23 14:24:55 +02:00
Arik Nemtsov	68885a54cd	mac80211: set auth flags after other station info For TDLS, the AUTHORIZED flag arrives with all other important station info (supported rates, HT/VHT caps, ...). Make sure to set the station state in the low-level driver after transferring this information to the mac80211 STA entry. This aligns the STA information during sta_state callbacks with the non-TDLS case. Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com> Reviewed-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-06-23 14:24:55 +02:00
Arik Nemtsov	9deba04d0f	mac80211: clarify TDLS Tx handling Rename the flags used in the Tx path and add an explanation for the reasons to drop, send directly or through the AP. Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-06-23 14:24:54 +02:00
Luciano Coelho	a46992b441	mac80211: stop only the queues assigned to the vif during channel switch Instead of stopping all the hardware queues during channel switch, which is especially bad when we have large CSA counts, stop only the queues that are assigned to the vif that is performing the channel switch. Additionally, check for (sdata->csa_block_tx) instead of calling ieee80211_csa_needs_block_tx(), which can now be removed. Signed-off-by: Luciano Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-06-23 14:22:29 +02:00
Luciano Coelho	26da23b695	mac80211: add functions to stop and wake all queues assigned to a vif In some cases we may want to stop the queues of a single vif (for instance during a channel-switch). Add a function that stops all the queues that are assigned to a vif. If a queue is assigned to more than one vif, the corresponding netdev subqueue of the other vif(s) will also be stopped. If the HW doesn't set the IEEE80211_HW_QUEUE_CONTROL flag, then all queues are stopped. Also add a corresponding function to wake the queues of a vif back. Signed-off-by: Luciano Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-06-23 14:22:27 +02:00
Luciano Coelho	cca07b00a5	mac80211: introduce refcount for queue_stop_reasons Sometimes different vifs may be stopping the queues for the same reason (e.g. when several interfaces are performing a channel switch). Instead of using a bitmask for the reasons, use an integer that holds a refcount instead. In order to keep it backwards compatible, introduce a boolean in some functions that tell us whether the queue stopping should be refcounted or not. For now, use not refcounted for all calls to keep it functionally the same as before. Signed-off-by: Luciano Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-06-23 14:22:25 +02:00
Luciano Coelho	59f48fe22f	mac80211: don't stop all queues when flushing There is no need to stop all queues when we want to flush specific queues, so stop only the queues that will be flushed. Signed-off-by: Luciano Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-06-23 14:22:24 +02:00
Thomas Gleixner	8d7b70fb7b	net: Mac80211: Remove silly timespec dance Converting time from one format to another seems to give coders a warm and fuzzy feeling. Use the proper interfaces. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: John Stultz <john.stultz@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: John W. Linville <linville@tuxdriver.com> [fix compile error] Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-06-23 14:22:21 +02:00
Thomas Gleixner	181715203b	mac80211: Use ktime_get_ts() do_posix_clock_monotonic_gettime() is a leftover from the initial posix timer implementation which maps to ktime_get_ts(). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-06-23 14:22:18 +02:00
Michal Kazior	10d78f2782	mac80211: use csa counter offsets instead of csa_active vif->csa_active is protected by mutexes only. This means it is unreliable to depend on it on codeflow in non-sleepable beacon and CSA code. There was no guarantee to have vif->csa_active update be visible before beacons are updated on SMP systems. Using csa counter offsets which are embedded in beacon struct (and thus are protected with single RCU assignment) is much safer. Signed-off-by: Michal Kazior <michal.kazior@tieto.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-06-23 14:22:16 +02:00
Michal Kazior	af296bdb8d	mac80211: move csa counters from sdata to beacon/presp Having csa counters part of beacon and probe_resp structures makes it easier to get rid of possible races between setting a beacon and updating counters on SMP systems by guaranteeing counters are always consistent against given beacon struct. While at it relax WARN_ON into WARN_ON_ONCE to prevent spamming logs and racing. Signed-off-by: Michal Kazior <michal.kazior@tieto.com> [remove pointless array check] Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-06-23 14:22:06 +02:00
Eliad Peller	0ce12026d6	cfg80211: fix elapsed_jiffies calculation MAX_JIFFY_OFFSET has no meaning when calculating the elapsed jiffies, as jiffies run out until ULONG_MAX. This miscalculation results in erroneous values in case of a wrap-around. Signed-off-by: Eliad Peller <eliad@wizery.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-06-23 11:29:25 +02:00
Johannes Berg	e33e2241e2	Revert "cfg80211: Use 5MHz bandwidth by default when checking usable channels" This reverts commit `8eca1fb692`. Felix notes that this broke regulatory, leaving channel 12 open for AP operation in the US regulatory domain where it isn't permitted. Link: http://mid.gmane.org/53A6C0FF.9090104@openwrt.org Reported-by: Felix Fietkau <nbd@openwrt.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-06-23 11:06:28 +02:00
Janusz Dziedzic	b49328361b	mac80211: allow tx via monitor iface when DFS Allow send frames using monitor interface when DFS chandef and we pass CAC (beaconing allowed). This fix problem when old kernel and new backports used, in such case hostapd create/use also monitor interface. Before this patch all frames hostapd send using monitor iface were dropped when AP was configured on DFS channel. Signed-off-by: Janusz Dziedzic <janusz.dziedzic@tieto.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-06-23 11:05:35 +02:00
Johannes Berg	b7ffbd7ef6	cfg80211: make ethtool the driver's responsibility Currently, cfg80211 tries to implement ethtool, but that doesn't really scale well, with all the different operations. Make the lower-level driver responsible for it, which currently only has an effect on mac80211. It will similarly not scale well at that level though, since mac80211 also has many drivers. To cleanly implement this in mac80211, introduce a new file and move some code to appropriate places. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-06-23 11:05:33 +02:00
Johannes Berg	ba9030c20a	mac80211: remove weak WEP IV accounting Since WEP is practically dead, there seems very little point in keeping WEP weak IV accounting. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-06-23 11:05:31 +02:00
Antonio Ospite	b314c66990	trivial: net/mac80211/mesh.c: fix typo s/Substract/Subtract/ Signed-off-by: Antonio Ospite <ao2@ao2.it> Cc: Luis Carlos Cobo <luisca@cozybit.com> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: linux-wireless@vger.kernel.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-06-23 11:05:29 +02:00
Bob Copeland	2b470c39e8	mac80211: remove ignore_plink_timer flag The mesh_plink code is doing some interesting things with the ignore_plink_timer flag. It seems the original intent was to handle this race: cpu 0 cpu 1 ----- ----- start timer handler for state X acquire sta_lock change state from X to Y mod_timer() / del_timer() release sta_lock acquire sta_lock execute state Y timer too soon However, using the mod_timer()/del_timer() return values to detect these cases is broken. As a result, timers get ignored unnecessarily, and stations can get stuck in the peering state machine. Instead, we can detect the case by looking at the timer expiration. In the case of del_timer, just ignore the timers in the following (LISTEN/ESTAB) states since they won't have timers anyway. Signed-off-by: Bob Copeland <me@bobcopeland.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-06-23 11:05:27 +02:00
Johannes Berg	5ac2e35030	mac80211: fix station/driver powersave race It is currently possible to have a race due to the station PS unblock work like this: * station goes to sleep with frames buffered in the driver * driver blocks wakeup * station wakes up again * driver flushes/returns frames, and unblocks, which schedules the unblock work * unblock work starts to run, and checks that the station is awake (i.e. that the WLAN_STA_PS_STA flag isn't set) * we process a received frame with PM=1, setting the flag again * ieee80211_sta_ps_deliver_wakeup() runs, delivering all frames to the driver, and then clearing the WLAN_STA_PS_DRIVER and WLAN_STA_PS_STA flags In this scenario, mac80211 will think that the station is awake, while it really is asleep, and any TX'ed frames should be filtered by the device (it will know that the station is sleeping) but then passed to mac80211 again, which will not buffer it either as it thinks the station is awake, and eventually the packets will be dropped. Fix this by moving the clearing of the flags to exactly where we learn about the situation. This creates a problem of reordering, so introduce another flag indicating that delivery is being done, this new flag also queues frames and is cleared only while the spinlock is held (which the queuing code also holds) so that any concurrent delivery/TX is handled correctly. Reported-by: Andrei Otcheretianski <andrei.otcheretianski@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-06-23 11:05:25 +02:00
John W. Linville	20edb50e59	mac80211: remove PID rate control Minstrel has long since proven its worth. Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-06-23 11:05:23 +02:00
Max Stepanov	744462a91e	mac80211: WEP extra head/tail room in ieee80211_send_auth After skb allocation and call to ieee80211_wep_encrypt in ieee80211_send_auth the flow fails with a warning in ieee80211_wep_add_iv on verification of available head/tailroom needed for WEP_IV and WEP_ICV. Signed-off-by: Max Stepanov <Max.Stepanov@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-06-23 10:45:14 +02:00
Duan Jiong	2b74e2caec	net: em_canid: remove useless statements from em_canid_change tcf_ematch is allocated by kzalloc in function tcf_em_tree_validate(), so cm_old is always NULL. Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-21 15:40:22 -07:00
Li RongQing	a3f5ee71cd	bridge: use list_for_each_entry_continue_reverse use list_for_each_entry_continue_reverse to rollback in fdb_add_hw when add address failed Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-21 15:33:22 -07:00
Li RongQing	916c1689a0	8021q: fix a potential memory leak skb_cow called in vlan_reorder_header does not free the skb when it failed, and vlan_reorder_header returns NULL to reset original skb when it is called in vlan_untag, lead to a memory leak. Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-21 15:12:13 -07:00
Lukasz Rymanowski	1d56dc4f5f	Bluetooth: Fix for ACL disconnect when pairing fails When pairing fails hci_conn refcnt drops below zero. This cause that ACL link is not disconnected when disconnect timeout fires. Probably this is because l2cap_conn_del calls l2cap_chan_del for each channel, and inside l2cap_chan_del conn is dropped. After that loop hci_chan_del is called which also drops conn. Anyway, as it is desrcibed in hci_core.h, it is known that refcnt drops below 0 sometimes and it should be fine. If so, let disconnect link when hci_conn_timeout fires and refcnt is 0 or below. This patch does it. This affects PTS test SM_TC_JW_BV_05_C Logs from scenario: [69713.706227] [6515] pair_device: [69713.706230] [6515] hci_conn_add: hci0 dst 00:1b:dc:06:06:22 [69713.706233] [6515] hci_dev_hold: hci0 orig refcnt 8 [69713.706235] [6515] hci_conn_init_sysfs: conn ffff88021f65a000 [69713.706239] [6515] hci_req_add_ev: hci0 opcode 0x200d plen 25 [69713.706242] [6515] hci_prepare_cmd: skb len 28 [69713.706243] [6515] hci_req_run: length 1 [69713.706248] [6515] hci_conn_hold: hcon ffff88021f65a000 orig refcnt 0 [69713.706251] [6515] hci_dev_put: hci0 orig refcnt 9 [69713.706281] [8909] hci_cmd_work: hci0 cmd_cnt 1 cmd queued 1 [69713.706288] [8909] hci_send_frame: hci0 type 1 len 28 [69713.706290] [8909] hci_send_to_monitor: hdev ffff88021f0c7000 len 28 [69713.706316] [5949] hci_sock_recvmsg: sock ffff8800941a9680, sk ffff88012bf4d000 [69713.706382] [5949] hci_sock_recvmsg: sock ffff8800941a9680, sk ffff88012bf4d000 [69713.711664] [8909] hci_rx_work: hci0 [69713.711668] [8909] hci_send_to_monitor: hdev ffff88021f0c7000 len 6 [69713.711680] [8909] hci_rx_work: hci0 Event packet [69713.711683] [8909] hci_cs_le_create_conn: hci0 status 0x00 [69713.711685] [8909] hci_sent_cmd_data: hci0 opcode 0x200d [69713.711688] [8909] hci_req_cmd_complete: opcode 0x200d status 0x00 [69713.711690] [8909] hci_sent_cmd_data: hci0 opcode 0x200d [69713.711695] [5949] hci_sock_recvmsg: sock ffff8800941a9680, sk ffff88012bf4d000 [69713.711744] [5949] hci_sock_recvmsg: sock ffff8800941a9680, sk ffff88012bf4d000 [69713.818875] [8909] hci_rx_work: hci0 [69713.818889] [8909] hci_send_to_monitor: hdev ffff88021f0c7000 len 21 [69713.818913] [8909] hci_rx_work: hci0 Event packet [69713.818917] [8909] hci_le_conn_complete_evt: hci0 status 0x00 [69713.818922] [8909] hci_send_to_control: len 19 [69713.818927] [5949] hci_sock_recvmsg: sock ffff8800941a9680, sk ffff88012bf4d000 [69713.818938] [8909] hci_conn_add_sysfs: conn ffff88021f65a000 [69713.818975] [6450] bt_sock_poll: sock ffff88005e758500, sk ffff88010323b800 [69713.818981] [6515] hci_sock_recvmsg: sock ffff88005e75a080, sk ffff88010323ac00 ... [69713.819021] [8909] hci_dev_hold: hci0 orig refcnt 10 [69713.819025] [8909] l2cap_connect_cfm: hcon ffff88021f65a000 bdaddr 00:1b:dc:06:06:22 status 0 [69713.819028] [8909] hci_chan_create: hci0 hcon ffff88021f65a000 [69713.819031] [8909] l2cap_conn_add: hcon ffff88021f65a000 conn ffff880221005c00 hchan ffff88020d60b1c0 [69713.819034] [8909] l2cap_conn_ready: conn ffff880221005c00 [69713.819036] [5949] hci_sock_recvmsg: sock ffff8800941a9680, sk ffff88012bf4d000 [69713.819037] [8909] smp_conn_security: conn ffff880221005c00 hcon ffff88021f65a000 level 0x02 [69713.819039] [8909] smp_chan_create: [69713.819041] [8909] hci_conn_hold: hcon ffff88021f65a000 orig refcnt 1 [69713.819043] [8909] smp_send_cmd: code 0x01 [69713.819045] [8909] hci_send_acl: hci0 chan ffff88020d60b1c0 flags 0x0000 [69713.819046] [5949] hci_sock_recvmsg: sock ffff8800941a9900, sk ffff88012bf4e800 [69713.819049] [8909] hci_queue_acl: hci0 nonfrag skb ffff88005157c100 len 15 [69713.819055] [5949] hci_sock_recvmsg: sock ffff8800941a9900, sk ffff88012bf4e800 [69713.819057] [8909] l2cap_le_conn_ready: [69713.819064] [8909] l2cap_chan_create: chan ffff88005ede2c00 [69713.819066] [8909] l2cap_chan_hold: chan ffff88005ede2c00 orig refcnt 1 [69713.819069] [8909] l2cap_sock_init: sk ffff88005ede5800 [69713.819072] [8909] bt_accept_enqueue: parent ffff880160356000, sk ffff88005ede5800 [69713.819074] [8909] __l2cap_chan_add: conn ffff880221005c00, psm 0x00, dcid 0x0004 [69713.819076] [8909] l2cap_chan_hold: chan ffff88005ede2c00 orig refcnt 2 [69713.819078] [8909] hci_conn_hold: hcon ffff88021f65a000 orig refcnt 2 [69713.819080] [8909] smp_conn_security: conn ffff880221005c00 hcon ffff88021f65a000 level 0x01 [69713.819082] [8909] l2cap_sock_ready_cb: sk ffff88005ede5800, parent ffff880160356000 [69713.819086] [8909] le_pairing_complete_cb: status 0 [69713.819091] [8909] hci_tx_work: hci0 acl 10 sco 8 le 0 [69713.819093] [8909] hci_sched_acl: hci0 [69713.819094] [8909] hci_sched_sco: hci0 [69713.819096] [8909] hci_sched_esco: hci0 [69713.819098] [8909] hci_sched_le: hci0 [69713.819099] [8909] hci_chan_sent: hci0 [69713.819101] [8909] hci_chan_sent: chan ffff88020d60b1c0 quote 10 [69713.819104] [8909] hci_sched_le: chan ffff88020d60b1c0 skb ffff88005157c100 len 15 priority 7 [69713.819106] [8909] hci_send_frame: hci0 type 2 len 15 [69713.819108] [8909] hci_send_to_monitor: hdev ffff88021f0c7000 len 15 [69713.819119] [8909] hci_chan_sent: hci0 [69713.819121] [8909] hci_prio_recalculate: hci0 [69713.819123] [8909] process_pending_rx: [69713.819226] [6450] hci_sock_recvmsg: sock ffff88005e758780, sk ffff88010323d400 ... [69713.822022] [6450] l2cap_sock_accept: sk ffff880160356000 timeo 0 [69713.822024] [6450] bt_accept_dequeue: parent ffff880160356000 [69713.822026] [6450] bt_accept_unlink: sk ffff88005ede5800 state 1 [69713.822028] [6450] l2cap_sock_accept: new socket ffff88005ede5800 [69713.822368] [6450] l2cap_sock_getname: sock ffff8800941ab700, sk ffff88005ede5800 [69713.822375] [6450] l2cap_sock_getsockopt: sk ffff88005ede5800 [69713.822383] [6450] l2cap_sock_getname: sock ffff8800941ab700, sk ffff88005ede5800 [69713.822414] [6450] bt_sock_poll: sock ffff8800941ab700, sk ffff88005ede5800 ... [69713.823255] [6450] l2cap_sock_getname: sock ffff8800941ab700, sk ffff88005ede5800 [69713.823259] [6450] l2cap_sock_getsockopt: sk ffff88005ede5800 [69713.824322] [6450] l2cap_sock_getname: sock ffff8800941ab700, sk ffff88005ede5800 [69713.824330] [6450] l2cap_sock_getsockopt: sk ffff88005ede5800 [69713.825029] [6450] bt_sock_poll: sock ffff88005e758500, sk ffff88010323b800 ... [69713.825187] [6450] l2cap_sock_sendmsg: sock ffff8800941ab700, sk ffff88005ede5800 [69713.825189] [6450] bt_sock_wait_ready: sk ffff88005ede5800 [69713.825192] [6450] l2cap_create_basic_pdu: chan ffff88005ede2c00 len 3 [69713.825196] [6450] l2cap_do_send: chan ffff88005ede2c00, skb ffff880160b0b500 len 7 priority 0 [69713.825199] [6450] hci_send_acl: hci0 chan ffff88020d60b1c0 flags 0x0000 [69713.825201] [6450] hci_queue_acl: hci0 nonfrag skb ffff880160b0b500 len 11 [69713.825210] [8909] hci_tx_work: hci0 acl 9 sco 8 le 0 [69713.825213] [8909] hci_sched_acl: hci0 [69713.825214] [8909] hci_sched_sco: hci0 [69713.825216] [8909] hci_sched_esco: hci0 [69713.825217] [8909] hci_sched_le: hci0 [69713.825219] [8909] hci_chan_sent: hci0 [69713.825221] [8909] hci_chan_sent: chan ffff88020d60b1c0 quote 9 [69713.825223] [8909] hci_sched_le: chan ffff88020d60b1c0 skb ffff880160b0b500 len 11 priority 0 [69713.825225] [8909] hci_send_frame: hci0 type 2 len 11 [69713.825227] [8909] hci_send_to_monitor: hdev ffff88021f0c7000 len 11 [69713.825242] [8909] hci_chan_sent: hci0 [69713.825253] [5949] hci_sock_recvmsg: sock ffff8800941a9680, sk ffff88012bf4d000 [69713.825253] [8909] hci_prio_recalculate: hci0 [69713.825292] [5949] hci_sock_recvmsg: sock ffff8800941a9680, sk ffff88012bf4d000 [69713.825768] [6450] bt_sock_poll: sock ffff88005e758500, sk ffff88010323b800 ... [69713.866902] [8909] hci_rx_work: hci0 [69713.866921] [8909] hci_send_to_monitor: hdev ffff88021f0c7000 len 7 [69713.866928] [8909] hci_rx_work: hci0 Event packet [69713.866931] [8909] hci_num_comp_pkts_evt: hci0 num_hndl 1 [69713.866937] [8909] hci_tx_work: hci0 acl 9 sco 8 le 0 [69713.866939] [5949] hci_sock_recvmsg: sock ffff8800941a9680, sk ffff88012bf4d000 [69713.866940] [8909] hci_sched_acl: hci0 ... [69713.866944] [8909] hci_sched_le: hci0 [69713.866953] [8909] hci_chan_sent: hci0 [69713.866997] [5949] hci_sock_recvmsg: sock ffff8800941a9680, sk ffff88012bf4d000 [69713.867840] [28074] hci_rx_work: hci0 [69713.867844] [28074] hci_send_to_monitor: hdev ffff88021f0c7000 len 7 [69713.867850] [28074] hci_rx_work: hci0 Event packet [69713.867853] [28074] hci_num_comp_pkts_evt: hci0 num_hndl 1 [69713.867857] [5949] hci_sock_recvmsg: sock ffff8800941a9680, sk ffff88012bf4d000 [69713.867858] [28074] hci_tx_work: hci0 acl 10 sco 8 le 0 [69713.867860] [28074] hci_sched_acl: hci0 [69713.867861] [28074] hci_sched_sco: hci0 [69713.867862] [28074] hci_sched_esco: hci0 [69713.867863] [28074] hci_sched_le: hci0 [69713.867865] [28074] hci_chan_sent: hci0 [69713.867888] [5949] hci_sock_recvmsg: sock ffff8800941a9680, sk ffff88012bf4d000 [69714.145661] [8909] hci_rx_work: hci0 [69714.145666] [8909] hci_send_to_monitor: hdev ffff88021f0c7000 len 10 [69714.145676] [8909] hci_rx_work: hci0 ACL data packet [69714.145679] [8909] hci_acldata_packet: hci0 len 6 handle 0x002d flags 0x0002 [69714.145681] [8909] hci_conn_enter_active_mode: hcon ffff88021f65a000 mode 0 [69714.145683] [8909] l2cap_recv_acldata: conn ffff880221005c00 len 6 flags 0x2 [69714.145693] [8909] l2cap_recv_frame: len 2, cid 0x0006 [69714.145696] [8909] hci_send_to_control: len 14 [69714.145710] [8909] smp_chan_destroy: [69714.145713] [8909] pairing_complete: status 3 [69714.145714] [8909] cmd_complete: sock ffff88010323ac00 [69714.145717] [8909] hci_conn_drop: hcon ffff88021f65a000 orig refcnt 3 [69714.145719] [5949] hci_sock_recvmsg: sock ffff8800941a9680, sk ffff88012bf4d000 [69714.145720] [6450] bt_sock_poll: sock ffff88005e758500, sk ffff88010323b800 [69714.145722] [6515] hci_sock_recvmsg: sock ffff88005e75a080, sk ffff88010323ac00 [69714.145724] [6450] bt_sock_poll: sock ffff8801db6b4f00, sk ffff880160351c00 ... [69714.145735] [6515] hci_sock_recvmsg: sock ffff88005e75a080, sk ffff88010323ac00 [69714.145737] [8909] hci_conn_drop: hcon ffff88021f65a000 orig refcnt 2 [69714.145739] [8909] l2cap_conn_del: hcon ffff88021f65a000 conn ffff880221005c00, err 13 [69714.145740] [6450] bt_sock_poll: sock ffff8801db6b5400, sk ffff88021e775000 [69714.145743] [6450] bt_sock_poll: sock ffff8801db6b5e00, sk ffff880160356000 [69714.145744] [8909] l2cap_chan_hold: chan ffff88005ede2c00 orig refcnt 3 [69714.145746] [6450] bt_sock_poll: sock ffff8800941ab700, sk ffff88005ede5800 [69714.145748] [8909] l2cap_chan_del: chan ffff88005ede2c00, conn ffff880221005c00, err 13 [69714.145749] [8909] l2cap_chan_put: chan ffff88005ede2c00 orig refcnt 4 [69714.145751] [8909] hci_conn_drop: hcon ffff88021f65a000 orig refcnt 1 [69714.145754] [6450] bt_sock_poll: sock ffff8800941ab700, sk ffff88005ede5800 [69714.145756] [8909] l2cap_chan_put: chan ffff88005ede2c00 orig refcnt 3 [69714.145759] [8909] hci_chan_del: hci0 hcon ffff88021f65a000 chan ffff88020d60b1c0 [69714.145766] [5949] hci_sock_recvmsg: sock ffff8800941a9680, sk ffff88012bf4d000 [69714.145787] [6515] hci_sock_release: sock ffff88005e75a080 sk ffff88010323ac00 [69714.146002] [6450] hci_sock_recvmsg: sock ffff88005e758780, sk ffff88010323d400 [69714.150795] [6450] l2cap_sock_release: sock ffff8800941ab700, sk ffff88005ede5800 [69714.150799] [6450] l2cap_sock_shutdown: sock ffff8800941ab700, sk ffff88005ede5800 [69714.150802] [6450] l2cap_chan_close: chan ffff88005ede2c00 state BT_CLOSED [69714.150805] [6450] l2cap_sock_kill: sk ffff88005ede5800 state BT_CLOSED [69714.150806] [6450] l2cap_chan_put: chan ffff88005ede2c00 orig refcnt 2 [69714.150808] [6450] l2cap_sock_destruct: sk ffff88005ede5800 [69714.150809] [6450] l2cap_chan_put: chan ffff88005ede2c00 orig refcnt 1 [69714.150811] [6450] l2cap_chan_destroy: chan ffff88005ede2c00 [69714.150970] [6450] bt_sock_poll: sock ffff88005e758500, sk ffff88010323b800 ... [69714.151991] [8909] hci_conn_drop: hcon ffff88021f65a000 orig refcnt 0 [69716.150339] [8909] hci_conn_timeout: hcon ffff88021f65a000 state BT_CONNECTED, refcnt -1 Signed-off-by: Lukasz Rymanowski <lukasz.rymanowski@tieto.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-06-20 13:53:54 +02:00
Johan Hedberg	2ed8f65ca2	Bluetooth: Fix rejecting pairing in case of insufficient capabilities If we need an MITM protected connection but the local and remote IO capabilities cannot provide it we should reject the pairing attempt in the appropriate way. This patch adds the missing checks for such a situation to the smp_cmd_pairing_req() and smp_cmd_pairing_rsp() functions. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-06-20 13:53:48 +02:00
Johan Hedberg	581370cc74	Bluetooth: Refactor authentication method lookup into its own function We'll need to do authentication method lookups from more than one place, so refactor the lookup into its own function. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-06-20 13:53:42 +02:00
Johan Hedberg	c7262e711a	Bluetooth: Fix overriding higher security level in SMP When we receive a pairing request or an internal request to start pairing we shouldn't blindly overwrite the existing pending_sec_level value as that may actually be higher than the new one. This patch fixes the SMP code to only overwrite the value in case the new one is higher than the old. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-06-20 13:53:38 +02:00
David S. Miller	1b0608fd9b	Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless John W. Linville says: ==================== pull request: wireless 2014-06-18 Please pull this batch of fixes intended for the 3.16 stream! For the Bluetooth bits, Gustavo says: "This is our first batch of fixes for 3.16. Be aware that two patches here are not exactly bugfixes: * 71f28af57066 Bluetooth: Add clarifying comment for conn->auth_type This commit just add some important security comments to the code, we found it important enough to include it here for 3.16 since it is security related. * 9f7ec8871132 Bluetooth: Refactor discovery stopping into its own function This commit is just a refactor in a preparation for a fix in the next commit (`f8680f128b`). All the other patches are fixes for deadlocks and for the Bluetooth protocols, most of them related to authentication and encryption." On top of that... Chin-Ran Lo fixes a problems with overlapping DMA areas in mwifiex. Michael Braun corrects a couple of issues in order to enable a new device in rt2800usb. Rafał Miłecki reverts a b43 patch that caused a regression, fixes a Kconfig typo, and corrects a frequency reporting error with the G-PHY. Stanislaw Grsuzka fixes an rfkill regression for rt2500pci, and avoids a rt2x00 scheduling while atomic BUG. Please let me know if there are problems! ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-19 21:32:27 -07:00
Daniel Borkmann	24599e61b7	net: sctp: check proc_dointvec result in proc_sctp_do_auth When writing to the sysctl field net.sctp.auth_enable, it can well be that the user buffer we handed over to proc_dointvec() via proc_sctp_do_auth() handler contains something other than integers. In that case, we would set an uninitialized 4-byte value from the stack to net->sctp.auth_enable that can be leaked back when reading the sysctl variable, and it can unintentionally turn auth_enable on/off based on the stack content since auth_enable is interpreted as a boolean. Fix it up by making sure proc_dointvec() returned sucessfully. Fixes: `b14878ccb7` ("net: sctp: cache auth_enable per endpoint") Reported-by: Florian Westphal <fwestpha@redhat.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-19 21:30:19 -07:00
Neal Cardwell	2cd0d743b0	tcp: fix tcp_match_skb_to_sack() for unaligned SACK at end of an skb If there is an MSS change (or misbehaving receiver) that causes a SACK to arrive that covers the end of an skb but is less than one MSS, then tcp_match_skb_to_sack() was rounding up pkt_len to the full length of the skb ("Round if necessary..."), then chopping all bytes off the skb and creating a zero-byte skb in the write queue. This was visible now because the recently simplified TLP logic in `bef1909ee3` ("tcp: fixing TLP's FIN recovery") could find that 0-byte skb at the end of the write queue, and now that we do not check that skb's length we could send it as a TLP probe. Consider the following example scenario: mss: 1000 skb: seq: 0 end_seq: 4000 len: 4000 SACK: start_seq: 3999 end_seq: 4000 The tcp_match_skb_to_sack() code will compute: in_sack = false pkt_len = start_seq - TCP_SKB_CB(skb)->seq = 3999 - 0 = 3999 new_len = (pkt_len / mss) * mss = (3999/1000)*1000 = 3000 new_len += mss = 4000 Previously we would find the new_len > skb->len check failing, so we would fall through and set pkt_len = new_len = 4000 and chop off pkt_len of 4000 from the 4000-byte skb, leaving a 0-byte segment afterward in the write queue. With this new commit, we notice that the new new_len >= skb->len check succeeds, so that we return without trying to fragment. Fixes: `adb92db857` ("tcp: Make SACK code to split only at mss boundaries") Reported-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Ilpo Jarvinen <ilpo.jarvinen@helsinki.fi> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-19 20:50:49 -07:00
David S. Miller	8e4946ccdc	Revert "net: return actual error on register_queue_kobjects" This reverts commit `d36a4f4b47`. Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-19 18:12:15 -07:00
Kees Cook	6f9a093b66	net: filter: fix upper BPF instruction limit The original checks (via sk_chk_filter) for instruction count uses ">", not ">=", so changing this in sk_convert_filter has the potential to break existing seccomp filters that used exactly BPF_MAXINSNS many instructions. Fixes: `bd4cf0ed33` ("net: filter: rework/optimize internal BPF interpreter's instruction set") Signed-off-by: Kees Cook <keescook@chromium.org> Cc: stable@vger.kernel.org # v3.15+ Acked-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-18 17:04:15 -07:00
Daniel Borkmann	ff5e92c1af	net: sctp: propagate sysctl errors from proc_do* properly sysctl handler proc_sctp_do_hmac_alg(), proc_sctp_do_rto_min() and proc_sctp_do_rto_max() do not properly reflect some error cases when writing values via sysctl from internal proc functions such as proc_dointvec() and proc_dostring(). In all these cases we pass the test for write != 0 and partially do additional work just to notice that additional sanity checks fail and we return with hard-coded -EINVAL while proc_do* functions might also return different errors. So fix this up by simply testing a successful return of proc_do* right after calling it. This also allows to propagate its return value onwards to the user. While touching this, also fix up some minor style issues. Fixes: `4f3fdf3bc5` ("sctp: add check rto_min and rto_max in sysctl") Fixes: `3c68198e75` ("sctp: Make hmac algorithm selection for cookie generation dynamic") Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-18 17:03:07 -07:00
Jie Liu	d36a4f4b47	net: return actual error on register_queue_kobjects Return the actual error code if call kset_create_and_add() failed Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Jie Liu <jeff.liu@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-18 16:58:40 -07:00
David S. Miller	3a3ec1b2ba	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf Pablo Neira Ayuso says: ==================== netfilter fixes for net The following patchset contains netfilter updates for your net tree, they are: 1) Fix refcount leak when dumping the dying/unconfirmed conntrack lists, from Florian Westphal. 2) Fix crash in NAT when removing a netnamespace, also from Florian. 3) Fix a crash in IPVS when trying to remove an estimator out of the sysctl scope, from Julian Anastasov. 4) Add zone attribute to the routing to calculate the message size in ctnetlink events, from Ken-ichirou MATSUZAWA. 5) Another fix for the dying/unconfirmed list which was preventing to dump more than one memory page of entries (~17 entries in x86_64). 6) Fix missing RCU-safe list insertion in the rule replacement code in nf_tables. 7) Since the new transaction infrastructure is in place, we have to upgrade the chain use counter from u16 to u32 to avoid overflow after more than 2^16 rules are added. 8) Fix refcount leak when replacing rule in nf_tables. This problem was also introduced in new transaction. 9) Call the ->destroy() callback when releasing nft-xt rules to fix module refcount leaks. 10) Set the family in the netlink messages that contain set elements in nf_tables to make it consistent with other object types. 11) Don't dump NAT port information if it is unset in nft_nat. 12) Update the MAINTAINERS file, I have merged the ebtables entry into netfilter. While at it, also removed the netfilter users mailing list, the development list should be enough. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-18 16:08:40 -07:00
John W. Linville	2ee3f63d39	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem	2014-06-18 14:39:25 -04:00
Octavian Purdila	e0f802fbca	tcp: move ir_mark initialization to tcp_openreq_init ir_mark initialization is done for both TCP v4 and v6, move it in the common tcp_openreq_init function. Signed-off-by: Octavian Purdila <octavian.purdila@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-17 15:30:54 -07:00
Peter Pan(潘卫平)	d215d10f2d	net: delete duplicate dev_set_rx_mode() call In __dev_open(), it already calls dev_set_rx_mode(). and dev_set_rx_mode() has no effect for a net device which does not have IFF_UP flag set. So the call of dev_set_rx_mode() is duplicate in __dev_change_flags(). Signed-off-by: Weiping Pan <panweiping3@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-17 15:30:54 -07:00
John W. Linville	7f4dbaa3ae	Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth	2014-06-17 14:08:47 -04:00
Dave Jones	17846376f2	tcp: remove unnecessary tcp_sk assignment. This variable is overwritten by the child socket assignment before it ever gets used. Signed-off-by: Dave Jones <davej@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-16 21:35:00 -07:00
Florian Westphal	945b2b2d25	netfilter: nf_nat: fix oops on netns removal Quoting Samu Kallio: Basically what's happening is, during netns cleanup, nf_nat_net_exit gets called before ipv4_net_exit. As I understand it, nf_nat_net_exit is supposed to kill any conntrack entries which have NAT context (through nf_ct_iterate_cleanup), but for some reason this doesn't happen (perhaps something else is still holding refs to those entries?). When ipv4_net_exit is called, conntrack entries (including those with NAT context) are cleaned up, but the nat_bysource hashtable is long gone - freed in nf_nat_net_exit. The bug happens when attempting to free a conntrack entry whose NAT hash 'prev' field points to a slot in the freed hash table (head for that bin). We ignore conntracks with null nat bindings. But this is wrong, as these are in bysource hash table as well. Restore nat-cleaning for the netns-is-being-removed case. bug: https://bugzilla.kernel.org/show_bug.cgi?id=65191 Fixes: `c2d421e171` ('netfilter: nf_nat: fix race when unloading protocol modules') Reported-by: Samu Kallio <samu.kallio@aberdeencloud.com> Debugged-by: Samu Kallio <samu.kallio@aberdeencloud.com> Signed-off-by: Florian Westphal <fw@strlen.de> Tested-by: Samu Kallio <samu.kallio@aberdeencloud.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-06-16 13:58:54 +02:00
Ken-ichirou MATSUZAWA	4a001068d7	netfilter: ctnetlink: add zone size to length Signed-off-by: Ken-ichirou MATSUZAWA <chamas@h4.dion.ne.jp> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-06-16 13:53:03 +02:00
Pablo Neira Ayuso	98ca74f4d5	Merge branch 'ipvs' Simon Horman says: ==================== Fix for panic due use of tot_stats estimator outside of CONFIG_SYSCTL It has been present since v3.6.39. ==================== Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-06-16 13:22:33 +02:00
Pablo Neira Ayuso	915136065b	netfilter: nft_nat: don't dump port information if unset Don't include port information attributes if they are unset. Reported-by: Ana Rey <anarey@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-06-16 13:08:14 +02:00
Pablo Neira Ayuso	6403d96254	netfilter: nf_tables: indicate family when dumping set elements Set the nfnetlink header that indicates the family of this element. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-06-16 13:08:09 +02:00
Pablo Neira Ayuso	3d9b142131	netfilter: nft_compat: call {target, match}->destroy() to cleanup entry Otherwise, the reference to external objects (eg. modules) are not released when the rules are removed. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-06-16 13:08:04 +02:00
Pablo Neira Ayuso	ac904ac835	netfilter: nf_tables: fix wrong type in transaction when replacing rules In `b380e5c` ("netfilter: nf_tables: add message type to transactions"), I used the wrong message type in the rule replacement case. The rule that is replaced needs to be handled as a deleted rule. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-06-16 13:07:58 +02:00
Pablo Neira Ayuso	ac34b86197	netfilter: nf_tables: decrement chain use counter when replacing rules Thus, the chain use counter remains with the same value after the rule replacement. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-06-16 13:07:50 +02:00
Pablo Neira Ayuso	a0a7379e16	netfilter: nf_tables: use u32 for chain use counter Since `4fefee5` ("netfilter: nf_tables: allow to delete several objects from a batch"), every new rule bumps the chain use counter. However, this is limited to 16 bits, which means that it will overrun after 2^16 rules. Use a u32 chain counter and check for overflows (just like we do for table objects). Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-06-16 13:07:44 +02:00
Pablo Neira Ayuso	5bc5c30765	netfilter: nf_tables: use RCU-safe list insertion when replacing rules The patch `5e94846` ("netfilter: nf_tables: add insert operation") did not include RCU-safe list insertion when replacing rules. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-06-16 13:07:29 +02:00
Florian Westphal	cd5f336f17	netfilter: ctnetlink: fix refcnt leak in dying/unconfirmed list dumper 'last' keeps track of the ct that had its refcnt bumped during previous dump cycle. Thus it must not be overwritten until end-of-function. Another (unrelated, theoretical) issue: Don't attempt to bump refcnt of a conntrack whose reference count is already 0. Such conntrack is being destroyed right now, its memory is freed once we release the percpu dying spinlock. Fixes: `b7779d06` ('netfilter: conntrack: spinlock per cpu to protect special lists.') Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-06-16 12:51:36 +02:00
Pablo Neira Ayuso	266155b2de	netfilter: ctnetlink: fix dumping of dying/unconfirmed conntracks The dumping prematurely stops, it seems the callback argument that indicates that all entries have been dumped is set after iterating on the first cpu list. The dumping also may stop before the entire per-cpu list content is also dumped. With this patch, conntrack -L dying now shows the dying list content again. Fixes: `b7779d06` ("netfilter: conntrack: spinlock per cpu to protect special lists.") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-06-16 12:51:35 +02:00
Linus Torvalds	a9be22425e	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Pull networking fixes from David Miller: 1) Fix checksumming regressions, from Tom Herbert. 2) Undo unintentional permissions changes for SCTP rto_alpha and rto_beta sysfs knobs, from Denial Borkmann. 3) VXLAN, like other IP tunnels, should advertize it's encapsulation size using dev->needed_headroom instead of dev->hard_header_len. From Cong Wang. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: net: sctp: fix permissions for rto_alpha and rto_beta knobs vxlan: Checksum fixes net: add skb_pop_rcv_encapsulation udp: call __skb_checksum_complete when doing full checksum net: Fix save software checksum complete net: Fix GSO constants to match NETIF flags udp: ipv4: do not waste time in __udp4_lib_mcast_demux_lookup vxlan: use dev->needed_headroom instead of dev->hard_header_len MAINTAINERS: update cxgb4 maintainer	2014-06-15 16:37:03 -10:00
Daniel Borkmann	b58537a1f5	net: sctp: fix permissions for rto_alpha and rto_beta knobs Commit `3fd091e73b` ("[SCTP]: Remove multiple levels of msecs to jiffies conversions.") has silently changed permissions for rto_alpha and rto_beta knobs from 0644 to 0444. The purpose of this was to discourage users from tweaking rto_alpha and rto_beta knobs in production environments since they are key to correctly compute rtt/srtt. RFC4960 under section 6.3.1. RTO Calculation says regarding rto_alpha and rto_beta under rule C3 and C4: [...] C3) When a new RTT measurement R' is made, set RTTVAR <- (1 - RTO.Beta) * RTTVAR + RTO.Beta * \|SRTT - R'\| and SRTT <- (1 - RTO.Alpha) * SRTT + RTO.Alpha * R' Note: The value of SRTT used in the update to RTTVAR is its value before updating SRTT itself using the second assignment. After the computation, update RTO <- SRTT + 4 * RTTVAR. C4) When data is in flight and when allowed by rule C5 below, a new RTT measurement MUST be made each round trip. Furthermore, new RTT measurements SHOULD be made no more than once per round trip for a given destination transport address. There are two reasons for this recommendation: First, it appears that measuring more frequently often does not in practice yield any significant benefit [ALLMAN99]; second, if measurements are made more often, then the values of RTO.Alpha and RTO.Beta in rule C3 above should be adjusted so that SRTT and RTTVAR still adjust to changes at roughly the same rate (in terms of how many round trips it takes them to reflect new values) as they would if making only one measurement per round-trip and using RTO.Alpha and RTO.Beta as given in rule C3. However, the exact nature of these adjustments remains a research issue. [...] While it is discouraged to adjust rto_alpha and rto_beta and not further specified how to adjust them, the RFC also doesn't explicitly forbid it, but rather gives a RECOMMENDED default value (rto_alpha=3, rto_beta=2). We have a couple of users relying on the old permissions before they got changed. That said, if someone really has the urge to adjust them, we could allow it with a warning in the log. Fixes: `3fd091e73b` ("[SCTP]: Remove multiple levels of msecs to jiffies conversions.") Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-15 01:17:32 -07:00
Tom Herbert	46fb51eb96	net: Fix save software checksum complete Geert reported issues regarding checksum complete and UDP. The logic introduced in commit `7e3cead517` ("net: Save software checksum complete") is not correct. This patch: 1) Restores code in __skb_checksum_complete_header except for setting CHECKSUM_UNNECESSARY. This function may be calculating checksum on something less than skb->len. 2) Adds saving checksum to __skb_checksum_complete. The full packet checksum 0..skb->len is calculated without adding in pseudo header. This value is saved in skb->csum and then the pseudo header is added to that to derive the checksum for validation. 3) In both __skb_checksum_complete_header and __skb_checksum_complete, set skb->csum_valid to whether checksum of zero was computed. This allows skb_csum_unnecessary to return true without changing to CHECKSUM_UNNECESSARY which was done previously. 4) Copy new csum related bits in __copy_skb_header. Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-15 01:00:49 -07:00
Eric Dumazet	63c6f81cdd	udp: ipv4: do not waste time in __udp4_lib_mcast_demux_lookup Its too easy to add thousand of UDP sockets on a particular bucket, and slow down an innocent multicast receiver. Early demux is supposed to be an optimization, we should avoid spending too much time in it. It is interesting to note __udp4_lib_demux_lookup() only tries to match first socket in the chain. 10 is the threshold we already have in __udp4_lib_lookup() to switch to secondary hash. Fixes: `421b3885bf` ("udp: ipv4: Add udp early demux") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: David Held <drheld@google.com> Cc: Shawn Bohrer <sbohrer@rgmadvisors.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-13 15:39:24 -07:00
Marcin Kraglak	92d1372e1a	Bluetooth: Allow change security level on ATT_CID in slave role Kernel supports SMP Security Request so don't block increasing security when we are slave. Signed-off-by: Marcin Kraglak <marcin.kraglak@tieto.com> Acked-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Cc: stable@vger.kernel.org	2014-06-13 14:36:39 +02:00
Johan Hedberg	c73f94b8c0	Bluetooth: Fix locking of hdev when calling into SMP code The SMP code expects hdev to be unlocked since e.g. crypto functions will try to (re)lock it. Therefore, we need to release the lock before calling into smp.c from mgmt.c. Without this we risk a deadlock whenever the smp_user_confirm_reply() function is called. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Tested-by: Lukasz Rymanowski <lukasz.rymanowski@tieto.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Cc: stable@vger.kernel.org	2014-06-13 13:32:29 +02:00
Jukka Taimisto	7ab56c3a6e	Bluetooth: Fix deadlock in l2cap_conn_del() A deadlock occurs when PDU containing invalid SMP opcode is received on Security Manager Channel over LE link and conn->pending_rx_work worker has not run yet. When LE link is created l2cap_conn_ready() is called and before returning it schedules conn->pending_rx_work worker to hdev->workqueue. Incoming data to SMP fixed channel is handled by l2cap_recv_frame() which calls smp_sig_channel() to handle the SMP PDU. If smp_sig_channel() indicates failure l2cap_conn_del() is called to delete the connection. When deleting the connection, l2cap_conn_del() purges the pending_rx queue and calls flush_work() to wait for the pending_rx_work worker to complete. Since incoming data is handled by a worker running from the same workqueue as the pending_rx_work is being scheduled on, we will deadlock on waiting for pending_rx_work to complete. This patch fixes the deadlock by calling cancel_work_sync() instead of flush_work(). Signed-off-by: Jukka Taimisto <jtt@codenomicon.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Cc: stable@vger.kernel.org	2014-06-13 13:32:26 +02:00
Johan Hedberg	f8680f128b	Bluetooth: Reuse hci_stop_discovery function when cleaning up HCI state When cleaning up the HCI state as part of the power-off procedure we can reuse the hci_stop_discovery() function instead of explicitly sending HCI command related to discovery. The added benefit of this is that it takes care of canceling name resolving and inquiry which were not previously covered by the code. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Cc: stable@vger.kernel.org	2014-06-13 13:32:23 +02:00
Johan Hedberg	21a60d307d	Bluetooth: Refactor discovery stopping into its own function We'll need to reuse the same logic for stopping discovery also when cleaning up HCI state when powering off. This patch refactors the code out to its own function that can later (in a subsequent patch) be used also for the power off case. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Cc: stable@vger.kernel.org	2014-06-13 13:32:20 +02:00
Johan Hedberg	50143a433b	Bluetooth: Fix indicating discovery state when canceling inquiry When inquiry is canceled through the HCI_Cancel_Inquiry command there is no Inquiry Complete event generated. Instead, all we get is the command complete for the HCI_Inquiry_Cancel command. This means that we must call the hci_discovery_set_state() function from the respective command complete handler in order to ensure that user space knows the correct discovery state. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Cc: stable@vger.kernel.org	2014-06-13 13:32:16 +02:00
Johan Hedberg	fff3490f47	Bluetooth: Fix setting correct authentication information for SMP STK When we store the STK in slave role we should set the correct authentication information for it. If the pairing is producing a HIGH security level the STK is considered authenticated, and otherwise it's considered unauthenticated. This patch fixes the value passed to the hci_add_ltk() function when adding the STK on the slave side. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Tested-by: Marcin Kraglak <marcin.kraglak@tieto.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Cc: stable@vger.kernel.org	2014-06-13 13:30:48 +02:00
Johan Hedberg	4ad51a75c7	Bluetooth: Add clarifying comment for conn->auth_type When responding to an IO capability request when we're the initiators of the pairing we will not yet have the remote IO capability information. Since the conn->auth_type variable is treated as an "absolute" requirement instead of a hint of what's needed later in the user confirmation request handler it's important that it doesn't have the MITM bit set if there's any chance that the remote device doesn't have the necessary IO capabilities. This patch adds a clarifying comment so that conn->auth_type is left untouched in this scenario. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-06-13 13:30:45 +02:00
Johan Hedberg	ba15a58b17	Bluetooth: Fix SSP acceptor just-works confirmation without MITM From the Bluetooth Core Specification 4.1 page 1958: "if both devices have set the Authentication_Requirements parameter to one of the MITM Protection Not Required options, authentication stage 1 shall function as if both devices set their IO capabilities to DisplayOnly (e.g., Numeric comparison with automatic confirmation on both devices)" So far our implementation has done user confirmation for all just-works cases regardless of the MITM requirements, however following the specification to the word means that we should not be doing confirmation when neither side has the MITM flag set. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Tested-by: Szymon Janc <szymon.janc@tieto.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Cc: stable@vger.kernel.org	2014-06-13 13:30:42 +02:00
Johan Hedberg	e694788d73	Bluetooth: Fix check for connection encryption The conn->link_key variable tracks the type of link key in use. It is set whenever we respond to a link key request as well as when we get a link key notification event. These two events do not however always guarantee that encryption is enabled: getting a link key request and responding to it may only mean that the remote side has requested authentication but not encryption. On the other hand, the encrypt change event is a certain guarantee that encryption is enabled. The real encryption state is already tracked in the conn->link_mode variable through the HCI_LM_ENCRYPT bit. This patch fixes a check for encryption in the hci_conn_auth function to use the proper conn->link_mode value and thereby eliminates the chance of a false positive result. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Cc: stable@vger.kernel.org	2014-06-13 13:30:39 +02:00
Johan Hedberg	b62b65055b	Bluetooth: Fix incorrectly overriding conn->src_type The src_type member of struct hci_conn should always reflect the address type of the src_member. It should never be overridden. There is already code in place in the command status handler of HCI_LE_Create_Connection to copy the right initiator address into conn->init_addr_type. Without this patch, if privacy is enabled, we will send the wrong address type in the SMP identity address information PDU (it'll e.g. contain our public address but a random address type). Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Cc: stable@vger.kernel.org	2014-06-13 13:30:37 +02:00
Julian Anastasov	9802d21e7a	ipvs: stop tot_stats estimator only under CONFIG_SYSCTL The tot_stats estimator is started only when CONFIG_SYSCTL is defined. But it is stopped without checking CONFIG_SYSCTL. Fix the crash by moving ip_vs_stop_estimator into ip_vs_control_net_cleanup_sysctl. The change is needed after commit `14e405461e` ("IPVS: Add __ip_vs_control_{init,cleanup}_sysctl()") from 2.6.39. Reported-by: Jet Chen <jet.chen@intel.com> Tested-by: Jet Chen <jet.chen@intel.com> Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2014-06-13 16:22:25 +09:00
Linus Torvalds	6d87c225f5	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull Ceph updates from Sage Weil: "This has a mix of bug fixes and cleanups. Alex's patch fixes a rare race in RBD. Ilya's patches fix an ENOENT check when a second rbd image is mapped and a couple memory leaks. Zheng fixes several issues with fragmented directories and multiple MDSs. Josh fixes a spin/sleep issue, and Josh and Guangliang's patches fix setting and unsetting RBD images read-only. Naturally there are several other cleanups mixed in for good measure" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (23 commits) rbd: only set disk to read-only once rbd: move calls that may sleep out of spin lock range rbd: add ioctl for rbd ceph: use truncate_pagecache() instead of truncate_inode_pages() ceph: include time stamp in every MDS request rbd: fix ida/idr memory leak rbd: use reference counts for image requests rbd: fix osd_request memory leak in __rbd_dev_header_watch_sync() rbd: make sure we have latest osdmap on 'rbd map' libceph: add ceph_monc_wait_osdmap() libceph: mon_get_version request infrastructure libceph: recognize poolop requests in debugfs ceph: refactor readpage_nounlock() to make the logic clearer mds: check cap ID when handling cap export message ceph: remember subtree root dirfrag's auth MDS ceph: introduce ceph_fill_fragtree() ceph: handle cap import atomically ceph: pre-allocate ceph_cap struct for ceph_add_cap() ceph: update inode fields according to issued caps rbd: replace IS_ERR and PTR_ERR with PTR_ERR_OR_ZERO ...	2014-06-12 23:06:23 -07:00
Linus Torvalds	f9da455b93	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next Pull networking updates from David Miller: 1) Seccomp BPF filters can now be JIT'd, from Alexei Starovoitov. 2) Multiqueue support in xen-netback and xen-netfront, from Andrew J Benniston. 3) Allow tweaking of aggregation settings in cdc_ncm driver, from Bjørn Mork. 4) BPF now has a "random" opcode, from Chema Gonzalez. 5) Add more BPF documentation and improve test framework, from Daniel Borkmann. 6) Support TCP fastopen over ipv6, from Daniel Lee. 7) Add software TSO helper functions and use them to support software TSO in mvneta and mv643xx_eth drivers. From Ezequiel Garcia. 8) Support software TSO in fec driver too, from Nimrod Andy. 9) Add Broadcom SYSTEMPORT driver, from Florian Fainelli. 10) Handle broadcasts more gracefully over macvlan when there are large numbers of interfaces configured, from Herbert Xu. 11) Allow more control over fwmark used for non-socket based responses, from Lorenzo Colitti. 12) Do TCP congestion window limiting based upon measurements, from Neal Cardwell. 13) Support busy polling in SCTP, from Neal Horman. 14) Allow RSS key to be configured via ethtool, from Venkata Duvvuru. 15) Bridge promisc mode handling improvements from Vlad Yasevich. 16) Don't use inetpeer entries to implement ID generation any more, it performs poorly, from Eric Dumazet. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1522 commits) rtnetlink: fix userspace API breakage for iproute2 < v3.9.0 tcp: fixing TLP's FIN recovery net: fec: Add software TSO support net: fec: Add Scatter/gather support net: fec: Increase buffer descriptor entry number net: fec: Factorize feature setting net: fec: Enable IP header hardware checksum net: fec: Factorize the .xmit transmit function bridge: fix compile error when compiling without IPv6 support bridge: fix smatch warning / potential null pointer dereference via-rhine: fix full-duplex with autoneg disable bnx2x: Enlarge the dorq threshold for VFs bnx2x: Check for UNDI in uncommon branch bnx2x: Fix 1G-baseT link bnx2x: Fix link for KR with swapped polarity lane sctp: Fix sk_ack_backlog wrap-around problem net/core: Add VF link state control policy net/fsl: xgmac_mdio is dependent on OF_MDIO net/fsl: Make xgmac_mdio read error message useful net_sched: drr: warn when qdisc is not work conserving ...	2014-06-12 14:27:40 -07:00
Michal Schmidt	e5eca6d41f	rtnetlink: fix userspace API breakage for iproute2 < v3.9.0 When running RHEL6 userspace on a current upstream kernel, "ip link" fails to show VF information. The reason is a kernel<->userspace API change introduced by commit `88c5b5ce5c` ("rtnetlink: Call nlmsg_parse() with correct header length"), after which the kernel does not see iproute2's IFLA_EXT_MASK attribute in the netlink request. iproute2 adjusted for the API change in its commit 63338dca4513 ("libnetlink: Use ifinfomsg instead of rtgenmsg in rtnl_wilddump_req_filter"). The problem has been noticed before: http://marc.info/?l=linux-netdev&m=136692296022182&w=2 (Subject: Re: getting VF link info seems to be broken in 3.9-rc8) We can do better than tell those with old userspace to upgrade. We can recognize the old iproute2 in the kernel by checking the netlink message length. Even when including the IFLA_EXT_MASK attribute, its netlink message is shorter than struct ifinfomsg. With this patch "ip link" shows VF information in both old and new iproute2 versions. Signed-off-by: Michal Schmidt <mschmidt@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-12 11:07:42 -07:00
Per Hurtig	bef1909ee3	tcp: fixing TLP's FIN recovery Fix to a problem observed when losing a FIN segment that does not contain data. In such situations, TLP is unable to recover from any tail loss and instead adds at least PTO ms to the retransmission process, i.e., RTO = RTO + PTO. Signed-off-by: Per Hurtig <per.hurtig@kau.se> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Nandita Dukkipati <nanditad@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-12 11:05:51 -07:00
Linus Lüssing	3993c4e159	bridge: fix compile error when compiling without IPv6 support Some fields in "struct net_bridge" aren't available when compiling the kernel without IPv6 support. Therefore adding a check/macro to skip the complaining code sections in that case. Introduced by `2cd4143192` ("bridge: memorize and export selected IGMP/MLD querier port") Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Linus Lüssing <linus.luessing@web.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-12 11:00:24 -07:00
Linus Lüssing	6c03ee8bda	bridge: fix smatch warning / potential null pointer dereference "New smatch warnings: net/bridge/br_multicast.c:1368 br_ip6_multicast_query() error: we previously assumed 'group' could be null (see line 1349)" In the rare (sort of broken) case of a query having a Maximum Response Delay of zero, we could create a potential null pointer dereference. Fixing this by skipping the multicast specific MLD Query parsing again if no multicast group address is available. Introduced by `dc4eb53a99` ("bridge: adhere to querier election mechanism specified by RFCs") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Linus Lüssing <linus.luessing@web.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-12 11:00:24 -07:00
Linus Torvalds	16b9057804	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs updates from Al Viro: "This the bunch that sat in -next + lock_parent() fix. This is the minimal set; there's more pending stuff. In particular, I really hope to get acct.c fixes merged this cycle - we need that to deal sanely with delayed-mntput stuff. In the next pile, hopefully - that series is fairly short and localized (kernel/acct.c, fs/super.c and fs/namespace.c). In this pile: more iov_iter work. Most of prereqs for ->splice_write with sane locking order are there and Kent's dio rewrite would also fit nicely on top of this pile" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (70 commits) lock_parent: don't step on stale ->d_parent of all-but-freed one kill generic_file_splice_write() ceph: switch to iter_file_splice_write() shmem: switch to iter_file_splice_write() nfs: switch to iter_splice_write_file() fs/splice.c: remove unneeded exports ocfs2: switch to iter_file_splice_write() ->splice_write() via ->write_iter() bio_vec-backed iov_iter optimize copy_page_{to,from}_iter() bury generic_file_aio_{read,write} lustre: get rid of messing with iovecs ceph: switch to ->write_iter() ceph_sync_direct_write: stop poking into iov_iter guts ceph_sync_read: stop poking into iov_iter guts new helper: copy_page_from_iter() fuse: switch to ->write_iter() btrfs: switch to ->write_iter() ocfs2: switch to ->write_iter() xfs: switch to ->write_iter() ...	2014-06-12 10:30:18 -07:00
Xufeng Zhang	d3217b15a1	sctp: Fix sk_ack_backlog wrap-around problem Consider the scenario: For a TCP-style socket, while processing the COOKIE_ECHO chunk in sctp_sf_do_5_1D_ce(), after it has passed a series of sanity check, a new association would be created in sctp_unpack_cookie(), but afterwards, some processing maybe failed, and sctp_association_free() will be called to free the previously allocated association, in sctp_association_free(), sk_ack_backlog value is decremented for this socket, since the initial value for sk_ack_backlog is 0, after the decrement, it will be 65535, a wrap-around problem happens, and if we want to establish new associations afterward in the same socket, ABORT would be triggered since sctp deem the accept queue as full. Fix this issue by only decrementing sk_ack_backlog for associations in the endpoint's list. Fix-suggested-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Xufeng Zhang <xufeng.zhang@windriver.com> Acked-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-12 10:27:14 -07:00
Al Viro	9c1d5284c7	Merge commit '9f12600fe425bc28f0ccba034a77783c09c15af4' into for-linus Backmerge of dcache.c changes from mainline. It's that, or complete rebase... Conflicts: fs/splice.c Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-06-12 00:28:09 -04:00
David S. Miller	902455e007	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: net/core/rtnetlink.c net/core/skbuff.c Both conflicts were very simple overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-11 16:02:55 -07:00
Doug Ledford	c5b4616087	net/core: Add VF link state control policy Commit `1d8faf48c7` (net/core: Add VF link state control) added VF link state control to the netlink VF nested structure, but failed to add a proper entry for the new structure into the VF policy table. Add the missing entry so the table and the actual data copied into the netlink nested struct are in sync. Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-11 15:51:37 -07:00
Florian Westphal	6e765a009a	net_sched: drr: warn when qdisc is not work conserving The DRR scheduler requires that items on the active list are work conserving, i.e. do not hold on to skbs for throttling purposes, etc. Attaching e.g. tbf renders DRR useless because all other classes on the active list are delayed as well. So, warn users that this configuration won't work as expected; we already do this in couple of other qdiscs, see e.g. commit `b00355db3f` ('pkt_sched: sch_hfsc: sch_htb: Add non-work-conserving warning handler') The 'const' change is needed to avoid compiler warning ("discards 'const' qualifier from pointer target type"). tested with: drr_hier() { parent=$1 classes=$2 for i in $(seq 1 $classes); do classid=$parent$(printf %x $i) tc class add dev eth0 parent $parent classid $classid drr tc qdisc add dev eth0 parent $classid tbf rate 64kbit burst 256kbit limit 64kbit done } tc qdisc add dev eth0 root handle 1: drr drr_hier 1: 32 tc filter add dev eth0 protocol all pref 1 parent 1: handle 1 flow hash keys dst perturb 1 divisor 32 Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-11 15:50:59 -07:00
Tom Herbert	6bae1d4cc3	net: Add skb_gro_postpull_rcsum to udp and vxlan Need to gro_postpull_rcsum for GRO to work with checksum complete. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-11 15:46:13 -07:00
Tom Herbert	7e3cead517	net: Save software checksum complete In skb_checksum complete, if we need to compute the checksum for the packet (via skb_checksum) save the result as CHECKSUM_COMPLETE. Subsequent checksum verification can use this. Also, added csum_complete_sw flag to distinguish between software and hardware generated checksum complete, we should always be able to trust the software computation. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-11 15:46:13 -07:00
stephen hemminger	f647944995	ceph: remove bogus extern Sparse complained about this bogus extern on definition of a function. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-11 15:39:19 -07:00
Eric Dumazet	9709674e68	ipv4: fix a race in ip4_datagram_release_cb() Alexey gave a AddressSanitizer[1] report that finally gave a good hint at where was the origin of various problems already reported by Dormando in the past [2] Problem comes from the fact that UDP can have a lockless TX path, and concurrent threads can manipulate sk_dst_cache, while another thread, is holding socket lock and calls __sk_dst_set() in ip4_datagram_release_cb() (this was added in linux-3.8) It seems that all we need to do is to use sk_dst_check() and sk_dst_set() so that all the writers hold same spinlock (sk->sk_dst_lock) to prevent corruptions. TCP stack do not need this protection, as all sk_dst_cache writers hold the socket lock. [1] https://code.google.com/p/address-sanitizer/wiki/AddressSanitizerForKernel AddressSanitizer: heap-use-after-free in ipv4_dst_check Read of size 2 by thread T15453: [<ffffffff817daa3a>] ipv4_dst_check+0x1a/0x90 ./net/ipv4/route.c:1116 [<ffffffff8175b789>] __sk_dst_check+0x89/0xe0 ./net/core/sock.c:531 [<ffffffff81830a36>] ip4_datagram_release_cb+0x46/0x390 ??:0 [<ffffffff8175eaea>] release_sock+0x17a/0x230 ./net/core/sock.c:2413 [<ffffffff81830882>] ip4_datagram_connect+0x462/0x5d0 ??:0 [<ffffffff81846d06>] inet_dgram_connect+0x76/0xd0 ./net/ipv4/af_inet.c:534 [<ffffffff817580ac>] SYSC_connect+0x15c/0x1c0 ./net/socket.c:1701 [<ffffffff817596ce>] SyS_connect+0xe/0x10 ./net/socket.c:1682 [<ffffffff818b0a29>] system_call_fastpath+0x16/0x1b ./arch/x86/kernel/entry_64.S:629 Freed by thread T15455: [<ffffffff8178d9b8>] dst_destroy+0xa8/0x160 ./net/core/dst.c:251 [<ffffffff8178de25>] dst_release+0x45/0x80 ./net/core/dst.c:280 [<ffffffff818304c1>] ip4_datagram_connect+0xa1/0x5d0 ??:0 [<ffffffff81846d06>] inet_dgram_connect+0x76/0xd0 ./net/ipv4/af_inet.c:534 [<ffffffff817580ac>] SYSC_connect+0x15c/0x1c0 ./net/socket.c:1701 [<ffffffff817596ce>] SyS_connect+0xe/0x10 ./net/socket.c:1682 [<ffffffff818b0a29>] system_call_fastpath+0x16/0x1b ./arch/x86/kernel/entry_64.S:629 Allocated by thread T15453: [<ffffffff8178d291>] dst_alloc+0x81/0x2b0 ./net/core/dst.c:171 [<ffffffff817db3b7>] rt_dst_alloc+0x47/0x50 ./net/ipv4/route.c:1406 [< inlined >] __ip_route_output_key+0x3e8/0xf70 __mkroute_output ./net/ipv4/route.c:1939 [<ffffffff817dde08>] __ip_route_output_key+0x3e8/0xf70 ./net/ipv4/route.c:2161 [<ffffffff817deb34>] ip_route_output_flow+0x14/0x30 ./net/ipv4/route.c:2249 [<ffffffff81830737>] ip4_datagram_connect+0x317/0x5d0 ??:0 [<ffffffff81846d06>] inet_dgram_connect+0x76/0xd0 ./net/ipv4/af_inet.c:534 [<ffffffff817580ac>] SYSC_connect+0x15c/0x1c0 ./net/socket.c:1701 [<ffffffff817596ce>] SyS_connect+0xe/0x10 ./net/socket.c:1682 [<ffffffff818b0a29>] system_call_fastpath+0x16/0x1b ./arch/x86/kernel/entry_64.S:629 [2] <4>[196727.311203] general protection fault: 0000 [#1] SMP <4>[196727.311224] Modules linked in: xt_TEE xt_dscp xt_DSCP macvlan bridge coretemp crc32_pclmul ghash_clmulni_intel gpio_ich microcode ipmi_watchdog ipmi_devintf sb_edac edac_core lpc_ich mfd_core tpm_tis tpm tpm_bios ipmi_si ipmi_msghandler isci igb libsas i2c_algo_bit ixgbe ptp pps_core mdio <4>[196727.311333] CPU: 17 PID: 0 Comm: swapper/17 Not tainted 3.10.26 #1 <4>[196727.311344] Hardware name: Supermicro X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.0 07/05/2013 <4>[196727.311364] task: ffff885e6f069700 ti: ffff885e6f072000 task.ti: ffff885e6f072000 <4>[196727.311377] RIP: 0010:[<ffffffff815f8c7f>] [<ffffffff815f8c7f>] ipv4_dst_destroy+0x4f/0x80 <4>[196727.311399] RSP: 0018:ffff885effd23a70 EFLAGS: 00010282 <4>[196727.311409] RAX: dead000000200200 RBX: ffff8854c398ecc0 RCX: 0000000000000040 <4>[196727.311423] RDX: dead000000100100 RSI: dead000000100100 RDI: dead000000200200 <4>[196727.311437] RBP: ffff885effd23a80 R08: ffffffff815fd9e0 R09: ffff885d5a590800 <4>[196727.311451] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 <4>[196727.311464] R13: ffffffff81c8c280 R14: 0000000000000000 R15: ffff880e85ee16ce <4>[196727.311510] FS: 0000000000000000(0000) GS:ffff885effd20000(0000) knlGS:0000000000000000 <4>[196727.311554] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 <4>[196727.311581] CR2: 00007a46751eb000 CR3: 0000005e65688000 CR4: 00000000000407e0 <4>[196727.311625] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 <4>[196727.311669] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 <4>[196727.311713] Stack: <4>[196727.311733] ffff8854c398ecc0 ffff8854c398ecc0 ffff885effd23ab0 ffffffff815b7f42 <4>[196727.311784] ffff88be6595bc00 ffff8854c398ecc0 0000000000000000 ffff8854c398ecc0 <4>[196727.311834] ffff885effd23ad0 ffffffff815b86c6 ffff885d5a590800 ffff8816827821c0 <4>[196727.311885] Call Trace: <4>[196727.311907] <IRQ> <4>[196727.311912] [<ffffffff815b7f42>] dst_destroy+0x32/0xe0 <4>[196727.311959] [<ffffffff815b86c6>] dst_release+0x56/0x80 <4>[196727.311986] [<ffffffff81620bd5>] tcp_v4_do_rcv+0x2a5/0x4a0 <4>[196727.312013] [<ffffffff81622b5a>] tcp_v4_rcv+0x7da/0x820 <4>[196727.312041] [<ffffffff815fd9e0>] ? ip_rcv_finish+0x360/0x360 <4>[196727.312070] [<ffffffff815de02d>] ? nf_hook_slow+0x7d/0x150 <4>[196727.312097] [<ffffffff815fd9e0>] ? ip_rcv_finish+0x360/0x360 <4>[196727.312125] [<ffffffff815fda92>] ip_local_deliver_finish+0xb2/0x230 <4>[196727.312154] [<ffffffff815fdd9a>] ip_local_deliver+0x4a/0x90 <4>[196727.312183] [<ffffffff815fd799>] ip_rcv_finish+0x119/0x360 <4>[196727.312212] [<ffffffff815fe00b>] ip_rcv+0x22b/0x340 <4>[196727.312242] [<ffffffffa0339680>] ? macvlan_broadcast+0x160/0x160 [macvlan] <4>[196727.312275] [<ffffffff815b0c62>] __netif_receive_skb_core+0x512/0x640 <4>[196727.312308] [<ffffffff811427fb>] ? kmem_cache_alloc+0x13b/0x150 <4>[196727.312338] [<ffffffff815b0db1>] __netif_receive_skb+0x21/0x70 <4>[196727.312368] [<ffffffff815b0fa1>] netif_receive_skb+0x31/0xa0 <4>[196727.312397] [<ffffffff815b1ae8>] napi_gro_receive+0xe8/0x140 <4>[196727.312433] [<ffffffffa00274f1>] ixgbe_poll+0x551/0x11f0 [ixgbe] <4>[196727.312463] [<ffffffff815fe00b>] ? ip_rcv+0x22b/0x340 <4>[196727.312491] [<ffffffff815b1691>] net_rx_action+0x111/0x210 <4>[196727.312521] [<ffffffff815b0db1>] ? __netif_receive_skb+0x21/0x70 <4>[196727.312552] [<ffffffff810519d0>] __do_softirq+0xd0/0x270 <4>[196727.312583] [<ffffffff816cef3c>] call_softirq+0x1c/0x30 <4>[196727.312613] [<ffffffff81004205>] do_softirq+0x55/0x90 <4>[196727.312640] [<ffffffff81051c85>] irq_exit+0x55/0x60 <4>[196727.312668] [<ffffffff816cf5c3>] do_IRQ+0x63/0xe0 <4>[196727.312696] [<ffffffff816c5aaa>] common_interrupt+0x6a/0x6a <4>[196727.312722] <EOI> <1>[196727.313071] RIP [<ffffffff815f8c7f>] ipv4_dst_destroy+0x4f/0x80 <4>[196727.313100] RSP <ffff885effd23a70> <4>[196727.313377] ---[ end trace 64b3f14fae0f2e29 ]--- <0>[196727.380908] Kernel panic - not syncing: Fatal exception in interrupt Reported-by: Alexey Preobrazhensky <preobr@google.com> Reported-by: dormando <dormando@rydia.ne> Signed-off-by: Eric Dumazet <edumazet@google.com> Fixes: `8141ed9fce` ("ipv4: Add a socket release callback for datagram sockets") Cc: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-11 15:39:18 -07:00
Octavian Purdila	bad93e9d4e	net: add __pskb_copy_fclone and pskb_copy_for_clone There are several instances where a pskb_copy or __pskb_copy is immediately followed by an skb_clone. Add a couple of new functions to allow the copy skb to be allocated from the fclone cache and thus speed up subsequent skb_clone calls. Cc: Alexander Smirnov <alex.bluesman.smirnov@gmail.com> Cc: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com> Cc: Marek Lindner <mareklindner@neomailbox.ch> Cc: Simon Wunderlich <sw@simonwunderlich.de> Cc: Antonio Quartulli <antonio@meshcoding.com> Cc: Marcel Holtmann <marcel@holtmann.org> Cc: Gustavo Padovan <gustavo@padovan.org> Cc: Johan Hedberg <johan.hedberg@gmail.com> Cc: Arvid Brodin <arvid.brodin@alten.se> Cc: Patrick McHardy <kaber@trash.net> Cc: Pablo Neira Ayuso <pablo@netfilter.org> Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Cc: Lauro Ramos Venancio <lauro.venancio@openbossa.org> Cc: Aloisio Almeida Jr <aloisio.almeida@openbossa.org> Cc: Samuel Ortiz <sameo@linux.intel.com> Cc: Jon Maloy <jon.maloy@ericsson.com> Cc: Allan Stephens <allan.stephens@windriver.com> Cc: Andrew Hendry <andrew.hendry@gmail.com> Cc: Eric Dumazet <edumazet@google.com> Reviewed-by: Christoph Paasch <christoph.paasch@uclouvain.be> Signed-off-by: Octavian Purdila <octavian.purdila@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-11 15:38:02 -07:00
Toshiaki Makita	204177f3f3	bridge: Support 802.1ad vlan filtering This enables us to change the vlan protocol for vlan filtering. We come to be able to filter frames on the basis of 802.1ad vlan tags through a bridge. This also changes br->group_addr if it has not been set by user. This is needed for an 802.1ad bridge. (See IEEE 802.1Q-2011 8.13.5.) Furthermore, this sets br->group_fwd_mask_required so that an 802.1ad bridge can forward the Nearest Customer Bridge group addresses except for br->group_addr, which should be passed to higher layer. To change the vlan protocol, write a protocol in sysfs: # echo 0x88a8 > /sys/class/net/br0/bridge/vlan_protocol Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-11 15:22:53 -07:00
Toshiaki Makita	f2808d226f	bridge: Prepare for forwarding another bridge group addresses If a bridge is an 802.1ad bridge, it must forward another bridge group addresses (the Nearest Customer Bridge group addresses). (For details, see IEEE 802.1Q-2011 8.6.3.) As user might not want group_fwd_mask to be modified by enabling 802.1ad, introduce a new mask, group_fwd_mask_required, which indicates addresses the bridge wants to forward. This will be set by enabling 802.1ad. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-11 15:22:53 -07:00
Toshiaki Makita	8580e2117c	bridge: Prepare for 802.1ad vlan filtering support This enables a bridge to have vlan protocol informantion and allows vlan tag manipulation (retrieve, insert and remove tags) according to the vlan protocol. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-11 15:22:53 -07:00
Toshiaki Makita	1c5abb6c77	bridge: Add 802.1ad tx vlan acceleration Bridge device doesn't need to embed S-tag into skb->data. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-11 15:22:53 -07:00
Alexei Starovoitov	61f83d0d57	net: filter: fix warning on 32-bit arch fix compiler warning on 32-bit architectures: net/core/filter.c: In function '__sk_run_filter': net/core/filter.c:540:22: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast] net/core/filter.c:550:22: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast] net/core/filter.c:560:22: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast] Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-11 15:12:27 -07:00
Jon Paul Maloy	02c00c2ab0	tipc: fix potential bug in function tipc_backlog_rcv In commit `4f4482dcd9` ("tipc: compensate for double accounting in socket rcv buffer") we access 'truesize' of a received buffer after it might have been released by the function filter_rcv(). In this commit we correct this by reading the value of 'truesize' to the stack before delivering the buffer to filter_rcv(). Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-11 15:01:30 -07:00
Daniel Borkmann	9b87d46510	net: sctp: fix incorrect type in gfp initializer This fixes the following sparse warning: net/sctp/associola.c:1556:29: warning: incorrect type in initializer (different base types) net/sctp/associola.c:1556:29: expected bool [unsigned] [usertype] preload net/sctp/associola.c:1556:29: got restricted gfp_t Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-11 12:23:17 -07:00
Daniel Borkmann	a7288c4dd5	net: sctp: improve sctp_select_active_and_retran_path selection In function sctp_select_active_and_retran_path(), we walk the transport list in order to look for the two most recently used ACTIVE transports (trans_pri, trans_sec). In case we didn't find anything ACTIVE, we currently just camp on a possibly PF or INACTIVE transport that is primary path; this behavior actually dates back to linux-history tree of the very early days of lksctp, and can yield a behavior that chooses suboptimal transport paths. Instead, be a bit more clever by reusing and extending the recently introduced sctp_trans_elect_best() handler. In case both transports are evaluated to have the same score resulting from their states, break the tie by looking at: 1) transport patch error count 2) last_time_heard value from each transport. This is analogous to Nishida's Quick Failover draft [1], section 5.1, 3: The sender SHOULD avoid data transmission to PF destinations. When all destinations are in either PF or Inactive state, the sender MAY either move the destination from PF to active state (and transmit data to the active destination) or the sender MAY transmit data to a PF destination. In the former scenario, (i) the sender MUST NOT notify the ULP about the state transition, and (ii) MUST NOT clear the destination's error counter. It is recommended that the sender picks the PF destination with least error count (fewest consecutive timeouts) for data transmission. In case of a tie (multiple PF destinations with same error count), the sender MAY choose the last active destination. Thus for sctp_select_active_and_retran_path(), we keep track of the best, if any, transport that is in PF state and in case no ACTIVE transport has been found (hence trans_{pri,sec} is NULL), we select the best out of the three: current primary_path and retran_path as well as a possible PF transport. The secondary may still camp on the original primary_path as before. The change in sctp_trans_elect_best() with a more fine grained tie selection also improves at the same time path selection for sctp_assoc_update_retran_path() in case of non-ACTIVE states. [1] http://tools.ietf.org/html/draft-nishida-tsvwg-sctp-failover-05 Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-11 12:23:17 -07:00
Daniel Borkmann	e575235fc6	net: sctp: migrate most recently used transport to ktime Be more precise in transport path selection and use ktime helpers instead of jiffies to compare and pick the better primary and secondary recently used transports. This also avoids any side-effects during a possible roll-over, and could lead to better path decision-making. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-11 12:23:17 -07:00
Daniel Borkmann	b82e8f31ac	net: sctp: refactor active path selection This patch just refactors and moves the code for the active path selection into its own helper function outside of sctp_assoc_control_transport() which is already big enough. No functional changes here. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-11 12:23:17 -07:00
Daniel Borkmann	67cb9366ff	ktime: add ktime_after and ktime_before helper Add two minimal helper functions analogous to time_before() and time_after() that will later on both be needed by SCTP code. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-11 12:23:17 -07:00
Phoebe Buckheister	2d3b5b0a90	mac802154: don't deliver packets to devices that are down Only one WPAN devices can be active at any given time, so only deliver packets to that one interface that is actually up. Multiple monitors may be up at any given time, but we don't have to deliver to monitors that are down either. Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-11 12:10:19 -07:00
Phoebe Buckheister	a374eeb5e5	mac802154: properly free incoming skbs on decryption failure mac802154 RX did not free skbs on decryption failure, assuming that the caller would when the local rx handler returned _DROP. This was false. Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-11 12:10:18 -07:00
Wei-Chun Chao	5882a07c72	net: fix UDP tunnel GSO of frag_list GRO packets This patch fixes a kernel BUG_ON in skb_segment. It is hit when testing two VMs on openvswitch with one VM acting as VXLAN gateway. During VXLAN packet GSO, skb_segment is called with skb->data pointing to inner TCP payload. skb_segment calls skb_network_protocol to retrieve the inner protocol. skb_network_protocol actually expects skb->data to point to MAC and it calls pskb_may_pull with ETH_HLEN. This ends up pulling in ETH_HLEN data from header tail. As a result, pskb_trim logic is skipped and BUG_ON is hit later. Move skb_push in front of skb_network_protocol so that skb->data lines up properly. kernel BUG at net/core/skbuff.c:2999! Call Trace: [<ffffffff816ac412>] tcp_gso_segment+0x122/0x410 [<ffffffff816bc74c>] inet_gso_segment+0x13c/0x390 [<ffffffff8164b39b>] skb_mac_gso_segment+0x9b/0x170 [<ffffffff816b3658>] skb_udp_tunnel_segment+0xd8/0x390 [<ffffffff816b3c00>] udp4_ufo_fragment+0x120/0x140 [<ffffffff816bc74c>] inet_gso_segment+0x13c/0x390 [<ffffffff8109d742>] ? default_wake_function+0x12/0x20 [<ffffffff8164b39b>] skb_mac_gso_segment+0x9b/0x170 [<ffffffff8164b4d0>] __skb_gso_segment+0x60/0xc0 [<ffffffff8164b6b3>] dev_hard_start_xmit+0x183/0x550 [<ffffffff8166c91e>] sch_direct_xmit+0xfe/0x1d0 [<ffffffff8164bc94>] __dev_queue_xmit+0x214/0x4f0 [<ffffffff8164bf90>] dev_queue_xmit+0x10/0x20 [<ffffffff81687edb>] ip_finish_output+0x66b/0x890 [<ffffffff81688a58>] ip_output+0x58/0x90 [<ffffffff816c628f>] ? fib_table_lookup+0x29f/0x350 [<ffffffff816881c9>] ip_local_out_sk+0x39/0x50 [<ffffffff816cbfad>] iptunnel_xmit+0x10d/0x130 [<ffffffffa0212200>] vxlan_xmit_skb+0x1d0/0x330 [vxlan] [<ffffffffa02a3919>] vxlan_tnl_send+0x129/0x1a0 [openvswitch] [<ffffffffa02a2cd6>] ovs_vport_send+0x26/0xa0 [openvswitch] [<ffffffffa029931e>] do_output+0x2e/0x50 [openvswitch] Signed-off-by: Wei-Chun Chao <weichunc@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-11 00:48:47 -07:00
huizhang	f6c20c596f	net: ipv6: Fixed up ipsec packet be re-routing issue Bug report on https://bugzilla.kernel.org/show_bug.cgi?id=75781 When a local output ipsec packet match the mangle table rule, and be set mark value, the packet will be route again in route_me_harder -> _session_decoder6 In this case, the nhoff in CB of skb was still the default value 0. So the protocal match can't success and the packet can't match correct SA rule,and then the packet be send out in plaintext. To fixed up the issue. The CB->nhoff must be set. Signed-off-by: Hui Zhang <huizhang@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-11 00:47:31 -07:00
Dmitry Popov	5ce54af1fc	ip_tunnel: fix i_key matching in ip_tunnel_find Some tunnels (though only vti as for now) can use i_key just for internal use: for example vti uses it for fwmark'ing incoming packets. So raw i_key value shouldn't be treated as a distinguisher for them. ip_tunnel_key_match exists for cases when we want to compare two ip_tunnel_parms' i_keys. Example bug: ip link add type vti ikey 1 local 1.0.0.1 remote 2.0.0.2 ip link add type vti ikey 2 local 1.0.0.1 remote 2.0.0.2 spawned two tunnels, although it doesn't make sense. Signed-off-by: Dmitry Popov <ixaphire@qrator.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-11 00:43:37 -07:00
Dmitry Popov	7c8e6b9c28	ip_vti: Fix 'ip tunnel add' with 'key' parameters ip tunnel add remote 10.2.2.1 local 10.2.2.2 mode vti ikey 1 okey 2 translates to p->iflags = VTI_ISVTI\|GRE_KEY and p->i_key = 1, but GRE_KEY != TUNNEL_KEY, so ip_tunnel_ioctl would set i_key to 0 (same story with o_key) making us unable to create vti tunnels with [io]key via ip tunnel. We cannot simply translate GRE_KEY to TUNNEL_KEY (as GRE module does) because vti_tunnels with same local/remote addresses but different ikeys will be treated as different then. So, imo the best option here is to move p->i_flags & *_KEY check for vti tunnels from ip_tunnel.c to ip_vti.c and to think about [io]_mark field for ip_tunnel_parm in the future. Signed-off-by: Dmitry Popov <ixaphire@qrator.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-11 00:30:52 -07:00
Alexei Starovoitov	e430f34ee5	net: filter: cleanup A/X name usage The macro 'A' used in internal BPF interpreter: #define A regs[insn->a_reg] was easily confused with the name of classic BPF register 'A', since 'A' would mean two different things depending on context. This patch is trying to clean up the naming and clarify its usage in the following way: - A and X are names of two classic BPF registers - BPF_REG_A denotes internal BPF register R0 used to map classic register A in internal BPF programs generated from classic - BPF_REG_X denotes internal BPF register R7 used to map classic register X in internal BPF programs generated from classic - internal BPF instruction format: struct sock_filter_int { __u8 code; /* opcode / __u8 dst_reg:4; / dest register / __u8 src_reg:4; / source register / __s16 off; / signed offset / __s32 imm; / signed immediate constant */ }; - BPF_X/BPF_K is 1 bit used to encode source operand of instruction In classic: BPF_X - means use register X as source operand BPF_K - means use 32-bit immediate as source operand In internal: BPF_X - means use 'src_reg' register as source operand BPF_K - means use 32-bit immediate as source operand Suggested-by: Chema Gonzalez <chema@google.com> Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Chema Gonzalez <chema@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-11 00:13:16 -07:00
Manuel Schölling	84a7c0b1db	dns_resolver: assure that dns_query() result is null-terminated dns_query() credulously assumes that keys are null-terminated and returns a copy of a memory block that is off by one. Signed-off-by: Manuel Schölling <manuel.schoelling@gmx.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-11 00:12:04 -07:00
Linus Lüssing	2cd4143192	bridge: memorize and export selected IGMP/MLD querier port Adding bridge support to the batman-adv multicast optimization requires batman-adv knowing about the existence of bridged-in IGMP/MLD queriers to be able to reliably serve any multicast listener behind this same bridge. Signed-off-by: Linus Lüssing <linus.luessing@web.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-10 23:50:47 -07:00
Linus Lüssing	07f8ac4a1e	bridge: add export of multicast database adjacent to net_dev With this new, exported function br_multicast_list_adjacent(net_dev) a list of IPv4/6 addresses is returned. This list contains all multicast addresses sensed by the bridge multicast snooping feature on all bridge ports of the bridge interface of net_dev, excluding addresses from the specified net_device itself. Adding bridge support to the batman-adv multicast optimization requires batman-adv knowing about the existence of bridged-in multicast listeners to be able to reliably serve them with multicast packets. Signed-off-by: Linus Lüssing <linus.luessing@web.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-10 23:50:47 -07:00
Linus Lüssing	dc4eb53a99	bridge: adhere to querier election mechanism specified by RFCs MLDv1 (RFC2710 section 6), MLDv2 (RFC3810 section 7.6.2), IGMPv2 (RFC2236 section 3) and IGMPv3 (RFC3376 section 6.6.2) specify that the querier with lowest source address shall become the selected querier. So far the bridge stopped its querier as soon as it heard another querier regardless of its source address. This results in the "wrong" querier potentially becoming the active querier or a potential, unnecessary querying delay. With this patch the bridge memorizes the source address of the currently selected querier and ignores queries from queriers with a higher source address than the currently selected one. This slight optimization is supposed to make it more RFC compliant (but is rather uncritical and therefore probably not necessary to be queued for stable kernels). Signed-off-by: Linus Lüssing <linus.luessing@web.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-10 23:50:47 -07:00
Linus Lüssing	90010b36eb	bridge: rename struct bridge_mcast_query/querier The current naming of these two structs is very random, in that reversing their naming would not make any semantical difference. This patch tries to make the naming less confusing by giving them a more specific, distinguishable naming. This is also useful for the upcoming patches reintroducing the "struct bridge_mcast_querier" but for storing information about the selected querier (no matter if our own or a foreign querier). Signed-off-by: Linus Lüssing <linus.luessing@web.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-10 23:50:46 -07:00
Dmitry Popov	2346829e64	ipip, sit: fix ipv4_{update_pmtu,redirect} calls ipv4_{update_pmtu,redirect} were called with tunnel's ifindex (t->dev is a tunnel netdevice). It caused wrong route lookup and failure of pmtu update or redirect. We should use the same ifindex that we use in ip_route_output_* in *tunnel_xmit code. It is t->parms.link . Signed-off-by: Dmitry Popov <ixaphire@qrator.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-10 23:35:52 -07:00
stephen hemminger	f8c1b7ce00	gre: allow changing mac address when device is up There is no need to require forcing device down on a Ethernet GRE (gretap) tunnel to change the MAC address. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-10 22:46:42 -07:00
Octavian Purdila	6cc55e096f	tcp: add gfp parameter to tcp_fragment tcp_fragment can be called from process context (from tso_fragment). Add a new gfp parameter to allow it to preserve atomic memory if possible. Signed-off-by: Octavian Purdila <octavian.purdila@intel.com> Reviewed-by: Christoph Paasch <christoph.paasch@uclouvain.be> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-10 22:30:58 -07:00
Linus Torvalds	d1e1cda862	NFS client updates for Linux 3.16 Highlights include: - Massive cleanup of the NFS read/write code by Anna and Dros - Support multiple NFS read/write requests per page in order to deal with non-page aligned pNFS striping. Also cleans up the r/wsize < page size code nicely. - stable fix for ensuring inode is declared uptodate only after all the attributes have been checked. - stable fix for a kernel Oops when remounting - NFS over RDMA client fixes - move the pNFS files layout driver into its own subdirectory -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJTl3pmAAoJEGcL54qWCgDyraIP/08ZbbDowVTP9572bxl+VR2i zNbrflBtl1R05D4Imi/IEySK0w6xj1CLsncNpXAT2bxTlyKPW70tpiiPlRKMPuO8 JW+iPiepR2t0mol6MEd46yuV8btXVk8I+7IYjPXANiMJG8O5dJzNQ8NiCQOERBNt FQ7rzTCFO0ESGXnT6vYrT4I0bwqYVklBiJRTT4PQVzhhhDq9qUdq21BlQjQJFXP4 9aBLurxKptlHBvE6A2Quja6ObEC0s31CxcijqHIJ+Ue4GbKcFbMG1tgjY7ESE/AD rqzDeF0jvWHT+frmvFEUUXWqzF1ReZ4x9pfDoOgeG6T9/K6DT91O0yMOgG8jvlbF 8DSATNYGDX5sSjpvaG5JokGG+cGCk9srVDx+itn7HlwzalRwn0PjKtIYwOJ7TJIr o/j20nOsPrRGF0OqLf9phyocgRrlbMKOzj1IXldHHfAbNkRcISTK08lxvsz96Ddn zRyDmbsbY6QFXdB3AVSeQmg5R0OOLtzNIcsFPmNdvy5eiy67qU0lsGg8UGNnoz8k PHN1pcGejkctLhQ32ee3w/W6zkrgpJZcNC9JSoG8Dc3SeXus0c3IgumRknFCmiep ssN+1jEITAGeS5a2aBxwLQLVI2JAr2lxs5e+R4D5EsQlFkCl6Mrgtzh/aToWTuFl Qt7l2zI3r3VieKT9u7Bh =OyXR -----END PGP SIGNATURE----- Merge tag 'nfs-for-3.16-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs Pull NFS client updates from Trond Myklebust: "Highlights include: - massive cleanup of the NFS read/write code by Anna and Dros - support multiple NFS read/write requests per page in order to deal with non-page aligned pNFS striping. Also cleans up the r/wsize < page size code nicely. - stable fix for ensuring inode is declared uptodate only after all the attributes have been checked. - stable fix for a kernel Oops when remounting - NFS over RDMA client fixes - move the pNFS files layout driver into its own subdirectory" * tag 'nfs-for-3.16-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (79 commits) NFS: populate ->net in mount data when remounting pnfs: fix lockup caused by pnfs_generic_pg_test NFSv4.1: Fix typo in dprintk NFSv4.1: Comment is now wrong and redundant to code NFS: Use raw_write_seqcount_begin/end int nfs4_reclaim_open_state xprtrdma: Disconnect on registration failure xprtrdma: Remove BUG_ON() call sites xprtrdma: Avoid deadlock when credit window is reset SUNRPC: Move congestion window constants to header file xprtrdma: Reset connection timeout after successful reconnect xprtrdma: Use macros for reconnection timeout constants xprtrdma: Allocate missing pagelist xprtrdma: Remove Tavor MTU setting xprtrdma: Ensure ia->ri_id->qp is not NULL when reconnecting xprtrdma: Reduce the number of hardway buffer allocations xprtrdma: Limit work done by completion handler xprtrmda: Reduce calls to ib_poll_cq() in completion handlers xprtrmda: Reduce lock contention in completion handlers xprtrdma: Split the completion queue xprtrdma: Make rpcrdma_ep_destroy() return void ...	2014-06-10 15:02:42 -07:00
Linus Torvalds	5b174fd647	Merge branch 'for-3.16' of git://linux-nfs.org/~bfields/linux Pull nfsd updates from Bruce Fields: "The largest piece is a long-overdue rewrite of the xdr code to remove some annoying limitations: for example, there was no way to return ACLs larger than 4K, and readdir results were returned only in 4k chunks, limiting performance on large directories. Also: - part of Neil Brown's work to make NFS work reliably over the loopback interface (so client and server can run on the same machine without deadlocks). The rest of it is coming through other trees. - cleanup and bugfixes for some of the server RDMA code, from Steve Wise. - Various cleanup of NFSv4 state code in preparation for an overhaul of the locking, from Jeff, Trond, and Benny. - smaller bugfixes and cleanup from Christoph Hellwig and Kinglong Mee. Thanks to everyone! This summer looks likely to be busier than usual for knfsd. Hopefully we won't break it too badly; testing definitely welcomed" * 'for-3.16' of git://linux-nfs.org/~bfields/linux: (100 commits) nfsd4: fix FREE_STATEID lockowner leak svcrdma: Fence LOCAL_INV work requests svcrdma: refactor marshalling logic nfsd: don't halt scanning the DRC LRU list when there's an RC_INPROG entry nfs4: remove unused CHANGE_SECURITY_LABEL nfsd4: kill READ64 nfsd4: kill READ32 nfsd4: simplify server xdr->next_page use nfsd4: hash deleg stateid only on successful nfs4_set_delegation nfsd4: rename recall_lock to state_lock nfsd: remove unneeded zeroing of fields in nfsd4_proc_compound nfsd: fix setting of NFS4_OO_CONFIRMED in nfsd4_open nfsd4: use recall_lock for delegation hashing nfsd: fix laundromat next-run-time calculation nfsd: make nfsd4_encode_fattr static SUNRPC/NFSD: Remove using of dprintk with KERN_WARNING nfsd: remove unused function nfsd_read_file nfsd: getattr for FATTR4_WORD0_FILES_AVAIL needs the statfs buffer NFSD: Error out when getting more than one fsloc/secinfo/uuid NFSD: Using type of uint32_t for ex_nflavors instead of int ...	2014-06-10 11:50:57 -07:00
Linus Torvalds	14208b0ec5	Merge branch 'for-3.16' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup updates from Tejun Heo: "A lot of activities on cgroup side. Heavy restructuring including locking simplification took place to improve the code base and enable implementation of the unified hierarchy, which currently exists behind a __DEVEL__ mount option. The core support is mostly complete but individual controllers need further work. To explain the design and rationales of the the unified hierarchy Documentation/cgroups/unified-hierarchy.txt is added. Another notable change is css (cgroup_subsys_state - what each controller uses to identify and interact with a cgroup) iteration update. This is part of continuing updates on css object lifetime and visibility. cgroup started with reference count draining on removal way back and is now reaching a point where csses behave and are iterated like normal refcnted objects albeit with some complexities to allow distinguishing the state where they're being deleted. The css iteration update isn't taken advantage of yet but is planned to be used to simplify memcg significantly" * 'for-3.16' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (77 commits) cgroup: disallow disabled controllers on the default hierarchy cgroup: don't destroy the default root cgroup: disallow debug controller on the default hierarchy cgroup: clean up MAINTAINERS entries cgroup: implement css_tryget() device_cgroup: use css_has_online_children() instead of has_children() cgroup: convert cgroup_has_live_children() into css_has_online_children() cgroup: use CSS_ONLINE instead of CGRP_DEAD cgroup: iterate cgroup_subsys_states directly cgroup: introduce CSS_RELEASED and reduce css iteration fallback window cgroup: move cgroup->serial_nr into cgroup_subsys_state cgroup: link all cgroup_subsys_states in their sibling lists cgroup: move cgroup->sibling and ->children into cgroup_subsys_state cgroup: remove cgroup->parent device_cgroup: remove direct access to cgroup->children memcg: update memcg_has_children() to use css_next_child() memcg: remove tasks/children test from mem_cgroup_force_empty() cgroup: remove css_parent() cgroup: skip refcnting on normal root csses and cgrp_dfl_root self css cgroup: use cgroup->self.refcnt for cgroup refcnting ...	2014-06-09 15:03:33 -07:00
David S. Miller	b78370c021	Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next John W. Linville says: ==================== pull request: wireless-next 2014-06-06 Please accept this batch of fixes intended for the 3.16 stream. For the bluetooth bits, Gustavo says: "Here some more patches for 3.16. We know that Linus already opened the merge window, but this is fix only pull request, and most of the patches here are also tagged for stable." Along with that, Andrea Merello provides a fix for the broken scanning in the venerable at76c50x driver... ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-08 14:17:39 -07:00
Eric Dumazet	87757a917b	net: force a list_del() in unregister_netdevice_many() unregister_netdevice_many() API is error prone and we had too many bugs because of dangling LIST_HEAD on stacks. See commit `f87e6f4793` ("net: dont leave active on stack LIST_HEAD") In fact, instead of making sure no caller leaves an active list_head, just force a list_del() in the callee. No one seems to need to access the list after unregister_netdevice_many() Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-08 14:15:14 -07:00
Linus Torvalds	b20dcab9d4	LLVMLinux patches for v3.16 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iEYEABECAAYFAlOTY+wACgkQuseO5dulBZXrIgCdFZyXRojufLLKikWEvHjZ3/k5 KsQAnimtcge+62/IX7YwDjWS+xg9Wt3m =yPrI -----END PGP SIGNATURE----- Merge tag 'llvmlinux-for-v3.16' of git://git.linuxfoundation.org/llvmlinux/kernel Pull LLVM patches from Behan Webster: "Next set of patches to support compiling the kernel with clang. They've been soaking in linux-next since the last merge window. More still in the works for the next merge window..." * tag 'llvmlinux-for-v3.16' of git://git.linuxfoundation.org/llvmlinux/kernel: arm, unwind, LLVMLinux: Enable clang to be used for unwinding the stack ARM: LLVMLinux: Change "extern inline" to "static inline" in glue-cache.h all: LLVMLinux: Change DWARF flag to support gcc and clang net: netfilter: LLVMLinux: vlais-netfilter crypto: LLVMLinux: aligned-attribute.patch	2014-06-08 12:27:44 -07:00
Linus Torvalds	3f17ea6dea	Merge branch 'next' (accumulated 3.16 merge window patches) into master Now that 3.15 is released, this merges the 'next' branch into 'master', bringing us to the normal situation where my 'master' branch is the merge window. * accumulated work in next: (6809 commits) ufs: sb mutex merge + mutex_destroy powerpc: update comments for generic idle conversion cris: update comments for generic idle conversion idle: remove cpu_idle() forward declarations nbd: zero from and len fields in NBD_CMD_DISCONNECT. mm: convert some level-less printks to pr_* MAINTAINERS: adi-buildroot-devel is moderated MAINTAINERS: add linux-api for review of API/ABI changes mm/kmemleak-test.c: use pr_fmt for logging fs/dlm/debug_fs.c: replace seq_printf by seq_puts fs/dlm/lockspace.c: convert simple_str to kstr fs/dlm/config.c: convert simple_str to kstr mm: mark remap_file_pages() syscall as deprecated mm: memcontrol: remove unnecessary memcg argument from soft limit functions mm: memcontrol: clean up memcg zoneinfo lookup mm/memblock.c: call kmemleak directly from memblock_(alloc\|free) mm/mempool.c: update the kmemleak stack trace for mempool allocations lib/radix-tree.c: update the kmemleak stack trace for radix tree allocations mm: introduce kmemleak_update_trace() mm/kmemleak.c: use %u to print ->checksum ...	2014-06-08 11:31:16 -07:00
Mark Charlebois	066c6807f7	net: netfilter: LLVMLinux: vlais-netfilter Replaced non-standard C use of Variable Length Arrays In Structs (VLAIS) in xt_repldata.h with a C99 compliant flexible array member and then calculated offsets to the other struct members. These other members aren't referenced by name in this code, however this patch maintains the same memory layout and padding as was previously accomplished using VLAIS. Had the original structure been ordered differently, with the entries VLA at the end, then it could have been a flexible member, and this patch would have been a lot simpler. However since the data stored in this structure is ultimately exported to userspace, the order of this structure can't be changed. This patch makes no attempt to change the existing behavior, merely the way in which the current layout is accomplished using standard C99 constructs. As such the code can now be compiled with either gcc or clang. This version of the patch removes the trailing alignment that the VLAIS structure would allocate in order to simplify the patch. Author: Mark Charlebois <charlebm@gmail.com> Signed-off-by: Mark Charlebois <charlebm@gmail.com> Signed-off-by: Behan Webster <behanw@converseincode.com> Signed-off-by: Vinícius Tinti <viniciustinti@gmail.com>	2014-06-07 11:44:39 -07:00
Phoebe Buckheister	fff1f59b17	mac802154: llsec: add forgotten list_del_rcu in key removal During key removal, the key object is freed, but not taken out of the llsec key list properly. Fix that. Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-06 16:25:37 -07:00
Steve Wise	83710fc753	svcrdma: Fence LOCAL_INV work requests Fencing forces the invalidate to only happen after all prior send work requests have been completed. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Reported by : Devesh Sharma <Devesh.Sharma@Emulex.Com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-06-06 19:22:51 -04:00
Steve Wise	0bf4828983	svcrdma: refactor marshalling logic This patch refactors the NFSRDMA server marshalling logic to remove the intermediary map structures. It also fixes an existing bug where the NFSRDMA server was not minding the device fast register page list length limitations. Signed-off-by: Tom Tucker <tom@opengridcomputing.com> Signed-off-by: Steve Wise <swise@opengridcomputing.com>	2014-06-06 19:22:50 -04:00
J. Bruce Fields	05638dc73a	nfsd4: simplify server xdr->next_page use The rpc code makes available to the NFS server an array of pages to encod into. The server represents its reply as an xdr buf, with the head pointing into the first page in that array, the pages ** array starting just after that, and the tail (if any) sharing any leftover space in the page used by the head. While encoding, we use xdr_stream->page_ptr to keep track of which page we're currently using. Currently we set xdr_stream->page_ptr to buf->pages, which makes the head a weird exception to the rule that page_ptr always points to the page we're currently encoding into. So, instead set it to buf->pages - 1 (the page actually containing the head), and remove the need for a little unintuitive logic in xdr_get_next_encode_buffer() and xdr_truncate_encode. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-06-06 19:22:46 -04:00
John W. Linville	c6ac68a612	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem	2014-06-06 11:59:11 -04:00
Dmitry Popov	586d5fc867	ip_tunnel: fix possible rtable leak ip_rt_put(rt) is always called in "error" branches above, but was missed in skb_cow_head branch. As rt is not yet bound to skb here we have to release it by hand. Signed-off-by: Dmitry Popov <ixaphire@qrator.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-05 18:44:44 -07:00
Ilya Dryomov	6044cde6f2	libceph: add ceph_monc_wait_osdmap() Add ceph_monc_wait_osdmap(), which will block until the osdmap with the specified epoch is received or timeout occurs. Export both of these as they are going to be needed by rbd. Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>	2014-06-06 09:29:57 +08:00
Ilya Dryomov	513a8243d6	libceph: mon_get_version request infrastructure Add support for mon_get_version requests to libceph. This reuses much of the ceph_mon_generic_request infrastructure, with one exception. Older OSDs don't set mon_get_version reply hdr->tid even if the original request had a non-zero tid, which makes it impossible to lookup ceph_mon_generic_request contexts by tid in get_generic_reply() for such replies. As a workaround, we allocate a reply message on the reply path. This can probably interfere with revoke, but I don't see a better way. Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>	2014-06-06 09:29:57 +08:00
Ilya Dryomov	002b36ba5e	libceph: recognize poolop requests in debugfs Recognize poolop requests in debugfs monc dump, fix prink format specifiers - tid is unsigned. Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>	2014-06-06 09:29:56 +08:00
Sven Wegener	9e89fd8b7d	ipv6: Shrink udp_v6_mcast_next() to one socket variable To avoid the confusion of having two variables, shrink the function to only use the parameter variable for looping. Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Sven Wegener <sven.wegener@stealer.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-05 16:23:08 -07:00
David S. Miller	f666f87b94	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/xen-netback/netback.c net/core/filter.c A filter bug fix overlapped some cleanups and a conversion over to some new insn generation macros. A xen-netback bug fix overlapped the addition of multi-queue support. Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-05 16:22:02 -07:00
Alexei Starovoitov	0dcceabb0c	net: filter: fix SKF_AD_PKTTYPE extension on big-endian BPF classic->internal converter broke SKF_AD_PKTTYPE extension, since pkt_type_offset() was failing to find skb->pkt_type field which is defined as: __u8 pkt_type:3, fclone:2, ipvs_property:1, peeked:1, nf_trace:1; Fix it by searching for 3 most significant bits and shift them by 5 at run-time Fixes: `bd4cf0ed33` ("net: filter: rework/optimize internal BPF interpreter's instruction set") Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Daniel Borkmann <dborkman@redhat.com> Tested-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-05 15:40:38 -07:00
David S. Miller	6934e79ed1	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next Pablo Neira Ayuso says: ==================== Netfilter/nf_tables fixes for net-next This patchset contains fixes for recent updates available in your net-next, they are: 1) Fix double memory allocation for accounting objects that results in a leak, this slipped through with the new quota extension, patch from Mathieu Poirier. 2) Fix broken ordering when adding set element transactions. 3) Make sure that objects are released in reverse order in the abort path, to avoid possible use-after-free when accessing dependencies. 4) Allow to delete several objects (as long as dependencies are fulfilled) by using one batch. This includes changes in the use counter semantics of the nf_tables objects. 5) Fix illegal sleeping allocation from rcu callback. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-05 15:35:04 -07:00
Toshiaki Makita	e0a47d1f78	bridge: Fix incorrect judgment of promisc br_manage_promisc() incorrectly expects br_auto_port() to return only 0 or 1, while it actually returns flags, i.e., a subset of BR_AUTO_MASK. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-05 15:20:31 -07:00
Simon Horman	3b392ddba2	MPLS: Use mpls_features to activate software MPLS GSO segmentation If an MPLS packet requires segmentation then use mpls_features to determine if the software implementation should be used. As no driver advertises MPLS GSO segmentation this will always be the case. I had not noticed that this was necessary before as software MPLS GSO segmentation was already being used in my test environment. I believe that the reason for that is the skbs in question always had fragments and the driver I used does not advertise NETIF_F_FRAGLIST (which seems to be the case for most drivers). Thus software segmentation was activated by skb_gso_ok(). This introduces the overhead of an extra call to skb_network_protocol() in the case where where CONFIG_NET_MPLS_GSO is set and skb->ip_summed == CHECKSUM_NONE. Thanks to Jesse Gross for prompting me to investigate this. Signed-off-by: Simon Horman <horms@verge.net.au> Acked-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-05 15:05:09 -07:00
John W. Linville	67be1e4f4b	Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next	2014-06-05 14:10:07 -04:00
WANG Cong	ebbe495f19	ipv4: use skb frags api in udp4_hwcsum() Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-05 00:51:47 -07:00
WANG Cong	4cb28970a2	net: use the new API kvfree() It is available since v3.15-rc5. Cc: Pablo Neira Ayuso <pablo@netfilter.org> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-05 00:49:51 -07:00
Manuel Schölling	9638f6713f	dns_resolver: Do not accept domain names longer than 255 chars According to RFC1035 "[...] the total length of a domain name (i.e., label octets and label length octets) is restricted to 255 octets or less." Signed-off-by: Manuel Schölling <manuel.schoelling@gmx.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-05 00:05:53 -07:00
Tom Herbert	359a0ea987	vxlan: Add support for UDP checksums (v4 sending, v6 zero csums) Added VXLAN link configuration for sending UDP checksums, and allowing TX and RX of UDP6 checksums. Also, call common iptunnel_handle_offloads and added GSO support for checksums. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-04 22:46:39 -07:00
Tom Herbert	4749c09c37	gre: Call gso_make_checksum Call gso_make_checksum. This should have the benefit of using a checksum that may have been previously computed for the packet. This also adds NETIF_F_GSO_GRE_CSUM to differentiate devices that offload GRE GSO with and without the GRE checksum offloaed. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-04 22:46:38 -07:00
Tom Herbert	0f4f4ffa7b	net: Add GSO support for UDP tunnels with checksum Added a new netif feature for GSO_UDP_TUNNEL_CSUM. This indicates that a device is capable of computing the UDP checksum in the encapsulating header of a UDP tunnel. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-04 22:46:38 -07:00
Tom Herbert	e9c3a24b3a	tcp: Call gso_make_checksum Call common gso_make_checksum when calculating checksum for a TCP GSO segment. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-04 22:46:38 -07:00
Tom Herbert	7e2b10c1e5	net: Support for multiple checksums with gso When creating a GSO packet segment we may need to set more than one checksum in the packet (for instance a TCP checksum and UDP checksum for VXLAN encapsulation). To be efficient, we want to do checksum calculation for any part of the packet at most once. This patch adds csum_start offset to skb_gso_cb. This tracks the starting offset for skb->csum which is initially set in skb_segment. When a protocol needs to compute a transport checksum it calls gso_make_checksum which computes the checksum value from the start of transport header to csum_start and then adds in skb->csum to get the full checksum. skb->csum and csum_start are then updated to reflect the checksum of the resultant packet starting from the transport header. This patch also adds a flag to skbuff, encap_hdr_csum, which is set in *gso_segment fucntions to indicate that a tunnel protocol needs checksum calculation Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-04 22:46:38 -07:00
Tom Herbert	77157e1973	l2tp: call udp{6}_set_csum Call common functions to set checksum for UDP tunnel. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-04 22:46:38 -07:00
Tom Herbert	af5fcba7f3	udp: Generic functions to set checksum Added udp_set_csum and udp6_set_csum functions to set UDP checksums in packets. These are for simple UDP packets such as those that might be created in UDP tunnels. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-04 22:46:38 -07:00
Sven Wegener	3bfdc59a6c	ipv6: Fix regression caused by `efe4208` in udp_v6_mcast_next() Commit `efe4208` ("ipv6: make lookups simpler and faster") introduced a regression in udp_v6_mcast_next(), resulting in multicast packets not reaching the destination sockets under certain conditions. The packet's IPv6 addresses are wrongly compared to the IPv6 addresses from the function's socket argument, which indicates the starting point for looping, instead of the loop variable. If the addresses from the first socket do not match the packet's addresses, no socket in the list will match. Signed-off-by: Sven Wegener <sven.wegener@stealer.net> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-04 15:42:01 -07:00
Sasha Levin	f830b0223c	net: Revert "fib_trie: use seq_file_net rather than seq->private" This reverts commit `30f38d2fdd`. fib_triestat is surrounded by a big lie: while it claims that it's a seq_file (fib_triestat_seq_open, fib_triestat_seq_show), it isn't: static const struct file_operations fib_triestat_fops = { .owner = THIS_MODULE, .open = fib_triestat_seq_open, .read = seq_read, .llseek = seq_lseek, .release = single_release_net, }; Yes, fib_triestat is just a regular file. A small detail (assuming CONFIG_NET_NS=y) is that while for seq_files you could do seq_file_net() to get the net ptr, doing so for a regular file would be wrong and would dereference an invalid pointer. The fib_triestat lie claimed a victim, and trying to show the file would be bad for the kernel. This patch just reverts the issue and fixes fib_triestat, which still needs a rewrite to either be a seq_file or stop claiming it is. Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-04 15:11:41 -07:00
Chuck Lever	c93c62231c	xprtrdma: Disconnect on registration failure If rpcrdma_register_external() fails during request marshaling, the current RPC request is killed. Instead, this RPC should be retried after reconnecting the transport instance. The most likely reason for registration failure with FRMR is a failed post_send, which would be due to a remote transport disconnect or memory exhaustion. These issues can be recovered by a retry. Problems encountered in the marshaling logic itself will not be corrected by trying again, so these should still kill a request. Now that we've added a clean exit for marshaling errors, take the opportunity to defang some BUG_ON's. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2014-06-04 08:56:53 -04:00
Chuck Lever	c977dea227	xprtrdma: Remove BUG_ON() call sites If an error occurs in the marshaling logic, fail the RPC request being processed, but leave the client running. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2014-06-04 08:56:53 -04:00
Chuck Lever	e7ce710a88	xprtrdma: Avoid deadlock when credit window is reset Update the cwnd while processing the server's reply. Otherwise the next task on the xprt_sending queue is still subject to the old credit window. Currently, no task is awoken if the old congestion window is still exceeded, even if the new window is larger, and a deadlock results. This is an issue during a transport reconnect. Servers don't normally shrink the credit window, but the client does reset it to 1 when reconnecting so the server can safely grow it again. As a minor optimization, remove the hack of grabbing the initial cwnd size (which happens to be RPC_CWNDSCALE) and using that value as the congestion scaling factor. The scaling value is invariant, and we are better off without the multiplication operation. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2014-06-04 08:56:52 -04:00
Chuck Lever	4f4cf5ad6f	SUNRPC: Move congestion window constants to header file I would like to use one of the RPC client's congestion algorithm constants in transport-specific code. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2014-06-04 08:56:51 -04:00
Chuck Lever	18906972aa	xprtrdma: Reset connection timeout after successful reconnect If the new connection is able to make forward progress, reset the re-establish timeout. Otherwise it keeps growing even if disconnect events are rare. The same behavior as TCP is adopted: reconnect immediately if the transport instance has been able to make some forward progress. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2014-06-04 08:56:51 -04:00
Chuck Lever	bfaee096de	xprtrdma: Use macros for reconnection timeout constants Clean up: Ensure the same max and min constant values are used everywhere when setting reconnect timeouts. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2014-06-04 08:56:50 -04:00
Shirley Ma	196c69989d	xprtrdma: Allocate missing pagelist GETACL relies on transport layer to alloc memory for reply buffer. However xprtrdma assumes that the reply buffer (pagelist) has been pre-allocated in upper layer. This problem was reported by IOL OFA lab test on PPC. Signed-off-by: Shirley Ma <shirley.ma@oracle.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Edward Mossman <emossman@iol.unh.edu> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2014-06-04 08:56:49 -04:00
Chuck Lever	5bc4bc7292	xprtrdma: Remove Tavor MTU setting Clean up. Remove HCA-specific clutter in xprtrdma, which is supposed to be device-independent. Hal Rosenstock <hal@dev.mellanox.co.il> observes: > Note that there is OpenSM option (enable_quirks) to return 1K MTU > in SA PathRecord responses for Tavor so that can be used for this. > The default setting for enable_quirks is FALSE so that would need > changing. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2014-06-04 08:56:48 -04:00
Chuck Lever	ec62f40d35	xprtrdma: Ensure ia->ri_id->qp is not NULL when reconnecting Devesh Sharma <Devesh.Sharma@Emulex.Com> reports that after a disconnect, his HCA is failing to create a fresh QP, leaving ia_ri->ri_id->qp set to NULL. But xprtrdma still allows RPCs to wake up and post LOCAL_INV as they exit, causing an oops. rpcrdma_ep_connect() is allowing the wake-up by leaking the QP creation error code (-EPERM in this case) to the RPC client's generic layer. xprt_connect_status() does not recognize -EPERM, so it kills pending RPC tasks immediately rather than retrying the connect. Re-arrange the QP creation logic so that when it fails on reconnect, it leaves ->qp with the old QP rather than NULL. If pending RPC tasks wake and exit, LOCAL_INV work requests will flush rather than oops. On initial connect, leaving ->qp == NULL is OK, since there are no pending RPCs that might use ->qp. But be sure not to try to destroy a NULL QP when rpcrdma_ep_connect() is retried. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2014-06-04 08:56:47 -04:00
Chuck Lever	65866f8259	xprtrdma: Reduce the number of hardway buffer allocations While marshaling an RPC/RDMA request, the inline_{rsize,wsize} settings determine whether an inline request is used, or whether read or write chunks lists are built. The current default value of these settings is 1024. Any RPC request smaller than 1024 bytes is sent to the NFS server completely inline. rpcrdma_buffer_create() allocates and pre-registers a set of RPC buffers for each transport instance, also based on the inline rsize and wsize settings. RPC/RDMA requests and replies are built in these buffers. However, if an RPC/RDMA request is expected to be larger than 1024, a buffer has to be allocated and registered for that RPC, and deregistered and released when the RPC is complete. This is known has a "hardway allocation." Since the introduction of NFSv4, the size of RPC requests has become larger, and hardway allocations are thus more frequent. Hardway allocations are significant overhead, and they waste the existing RPC buffers pre-allocated by rpcrdma_buffer_create(). We'd like fewer hardway allocations. Increasing the size of the pre-registered buffers is the most direct way to do this. However, a blanket increase of the inline thresholds has interoperability consequences. On my 64-bit system, rpcrdma_buffer_create() requests roughly 7000 bytes for each RPC request buffer, using kmalloc(). Due to internal fragmentation, this wastes nearly 1200 bytes because kmalloc() already returns an 8192-byte piece of memory for a 7000-byte allocation request, though the extra space remains unused. So let's round up the size of the pre-allocated buffers, and make use of the unused space in the kmalloc'd memory. This change reduces the amount of hardway allocated memory for an NFSv4 general connectathon run from 1322092 to 9472 bytes (99%). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2014-06-04 08:56:46 -04:00
Chuck Lever	8301a2c047	xprtrdma: Limit work done by completion handler Sagi Grimberg <sagig@dev.mellanox.co.il> points out that a steady stream of CQ events could starve other work because of the boundless loop pooling in rpcrdma_{send,recv}_poll(). Instead of a (potentially infinite) while loop, return after collecting a budgeted number of completions. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Acked-by: Sagi Grimberg <sagig@dev.mellanox.co.il> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2014-06-04 08:56:45 -04:00
Chuck Lever	1c00dd0776	xprtrmda: Reduce calls to ib_poll_cq() in completion handlers Change the completion handlers to grab up to 16 items per ib_poll_cq() call. No extra ib_poll_cq() is needed if fewer than 16 items are returned. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2014-06-04 08:56:44 -04:00
Chuck Lever	7f23f6f6e3	xprtrmda: Reduce lock contention in completion handlers Skip the ib_poll_cq() after re-arming, if the provider knows there are no additional items waiting. (Have a look at commit `ed23a727` for more details). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2014-06-04 08:56:43 -04:00
Chuck Lever	fc66448549	xprtrdma: Split the completion queue The current CQ handler uses the ib_wc.opcode field to distinguish between event types. However, the contents of that field are not reliable if the completion status is not IB_WC_SUCCESS. When an error completion occurs on a send event, the CQ handler schedules a tasklet with something that is not a struct rpcrdma_rep. This is never correct behavior, and sometimes it results in a panic. To resolve this issue, split the completion queue into a send CQ and a receive CQ. The send CQ handler now handles only struct rpcrdma_mw wr_id's, and the receive CQ handler now handles only struct rpcrdma_rep wr_id's. Fix suggested by Shirley Ma <shirley.ma@oracle.com> Reported-by: Rafael Reiter <rafael.reiter@ims.co.at> Fixes: `5c635e09ce` BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=73211 Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Klemens Senn <klemens.senn@ims.co.at> Tested-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2014-06-04 08:56:42 -04:00
Chuck Lever	7f1d54191e	xprtrdma: Make rpcrdma_ep_destroy() return void Clean up: rpcrdma_ep_destroy() returns a value that is used only to print a debugging message. rpcrdma_ep_destroy() already prints debugging messages in all error cases. Make rpcrdma_ep_destroy() return void instead. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2014-06-04 08:56:41 -04:00
Chuck Lever	13c9ff8f67	xprtrdma: Simplify rpcrdma_deregister_external() synopsis Clean up: All remaining callers of rpcrdma_deregister_external() pass NULL as the last argument, so remove that argument. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2014-06-04 08:56:40 -04:00
Chuck Lever	cdd9ade711	xprtrdma: mount reports "Invalid mount option" if memreg mode not supported If the selected memory registration mode is not supported by the underlying provider/HCA, the NFS mount command reports that there was an invalid mount option, and fails. This is misleading. Reporting a problem allocating memory is a lot closer to the truth. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2014-06-04 08:56:39 -04:00
Chuck Lever	f10eafd3a6	xprtrdma: Fall back to MTHCAFMR when FRMR is not supported An audit of in-kernel RDMA providers that do not support the FRMR memory registration shows that several of them support MTHCAFMR. Prefer MTHCAFMR when FRMR is not supported. If MTHCAFMR is not supported, only then choose ALLPHYSICAL. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2014-06-04 08:56:39 -04:00
Chuck Lever	0ac531c183	xprtrdma: Remove REGISTER memory registration mode All kernel RDMA providers except amso1100 support either MTHCAFMR or FRMR, both of which are faster than REGISTER. amso1100 can continue to use ALLPHYSICAL. The only other ULP consumer in the kernel that uses the reg_phys_mr verb is Lustre. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2014-06-04 08:56:38 -04:00
Chuck Lever	b45ccfd25d	xprtrdma: Remove MEMWINDOWS registration modes The MEMWINDOWS and MEMWINDOWS_ASYNC memory registration modes were intended as stop-gap modes before the introduction of FRMR. They are now considered obsolete. MEMWINDOWS_ASYNC is also considered unsafe because it can leave client memory registered and exposed for an indeterminant time after each I/O. At this point, the MEMWINDOWS modes add needless complexity, so remove them. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2014-06-04 08:56:37 -04:00
Chuck Lever	03ff8821eb	xprtrdma: Remove BOUNCEBUFFERS memory registration mode Clean up: This memory registration mode is slow and was never meant for use in production environments. Remove it to reduce implementation complexity. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2014-06-04 08:56:37 -04:00
Chuck Lever	254f91e2fa	xprtrdma: RPC/RDMA must invoke xprt_wake_pending_tasks() in process context An IB provider can invoke rpcrdma_conn_func() in an IRQ context, thus rpcrdma_conn_func() cannot be allowed to directly invoke generic RPC functions like xprt_wake_pending_tasks(). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2014-06-04 08:56:36 -04:00
Allen Andrews	4034ba0423	nfs-rdma: Fix for FMR leaks Two memory region leaks were found during testing: 1. rpcrdma_buffer_create: While allocating RPCRDMA_FRMR's ib_alloc_fast_reg_mr is called and then ib_alloc_fast_reg_page_list is called. If ib_alloc_fast_reg_page_list returns an error it bails out of the routine dropping the last ib_alloc_fast_reg_mr frmr region creating a memory leak. Added code to dereg the last frmr if ib_alloc_fast_reg_page_list fails. 2. rpcrdma_buffer_destroy: While cleaning up, the routine will only free the MR's on the rb_mws list if there are rb_send_bufs present. However, in rpcrdma_buffer_create while the rb_mws list is being built if one of the MR allocation requests fail after some MR's have been allocated on the rb_mws list the routine never gets to create any rb_send_bufs but instead jumps to the rpcrdma_buffer_destroy routine which will never free the MR's on rb_mws list because the rb_send_bufs were never created. This leaks all the MR's on the rb_mws list that were created prior to one of the MR allocations failing. Issue(2) was seen during testing. Our adapter had a finite number of MR's available and we created enough connections to where we saw an MR allocation failure on our Nth NFS connection request. After the kernel cleaned up the resources it had allocated for the Nth connection we noticed that FMR's had been leaked due to the coding error described above. Issue(1) was seen during a code review while debugging issue(2). Signed-off-by: Allen Andrews <allen.andrews@emulex.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2014-06-04 08:56:35 -04:00
Steve Wise	0fc6c4e7bb	xprtrdma: mind the device's max fast register page list depth Some rdma devices don't support a fast register page list depth of at least RPCRDMA_MAX_DATA_SEGS. So xprtrdma needs to chunk its fast register regions according to the minimum of the device max supported depth or RPCRDMA_MAX_DATA_SEGS. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2014-06-04 08:56:33 -04:00
David S. Miller	c99f7abf0e	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: include/net/inetpeer.h net/ipv6/output_core.c Changes in net were fixing bugs in code removed in net-next. Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-03 23:32:12 -07:00
WANG Cong	92ff71b8fe	net: remove some unless free on failure in alloc_netdev_mqs() When we jump to free_pcpu on failure in alloc_netdev_mqs() rx and tx queues are not yet allocated, so no need to free them. Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-03 19:18:58 -07:00
Cong Wang	e51fb15231	rtnetlink: fix a memory leak when ->newlink fails It is possible that ->newlink() fails before registering the device, in this case we should just free it, it's safe to call free_netdev(). Fixes: commit `0e0eee2465` (net: correct error path in rtnl_newlink()) Cc: David S. Miller <davem@davemloft.net> Cc: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Cong Wang <cwang@twopensource.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-03 19:16:10 -07:00
Michal Kubecek	21ee543edc	xfrm: fix race between netns cleanup and state expire notification The xfrm_user module registers its pernet init/exit after xfrm itself so that its net exit function xfrm_user_net_exit() is executed before xfrm_net_exit() which calls xfrm_state_fini() to cleanup the SA's (xfrm states). This opens a window between zeroing net->xfrm.nlsk pointer and deleting all xfrm_state instances which may access it (via the timer). If an xfrm state expires in this window, xfrm_exp_state_notify() will pass null pointer as socket to nlmsg_multicast(). As the notifications are called inside rcu_read_lock() block, it is sufficient to retrieve the nlsk socket with rcu_dereference() and check the it for null. Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-03 16:07:44 -07:00
Linus Torvalds	776edb5931	Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into next Pull core locking updates from Ingo Molnar: "The main changes in this cycle were: - reduced/streamlined smp_mb__() interface that allows more usecases and makes the existing ones less buggy, especially in rarer architectures - add rwsem implementation comments - bump up lockdep limits" 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (33 commits) rwsem: Add comments to explain the meaning of the rwsem's count field lockdep: Increase static allocations arch: Mass conversion of smp_mb__() arch,doc: Convert smp_mb__() arch,xtensa: Convert smp_mb__() arch,x86: Convert smp_mb__() arch,tile: Convert smp_mb__() arch,sparc: Convert smp_mb__() arch,sh: Convert smp_mb__() arch,score: Convert smp_mb__() arch,s390: Convert smp_mb__() arch,powerpc: Convert smp_mb__() arch,parisc: Convert smp_mb__() arch,openrisc: Convert smp_mb__() arch,mn10300: Convert smp_mb__() arch,mips: Convert smp_mb__() arch,metag: Convert smp_mb__() arch,m68k: Convert smp_mb__() arch,m32r: Convert smp_mb__() arch,ia64: Convert smp_mb__() ...	2014-06-03 12:57:53 -07:00
David S. Miller	014b20133b	Merge branch 'ethtool-rssh-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bwh/net-next Ben Hutchings says: ==================== Pull request: Fixes for new ethtool RSS commands This addresses several problems I previously identified with the new ETHTOOL_{G,S}RSSH commands: 1. Missing validation of reserved parameters 2. Vague documentation 3. Use of unnamed magic number 4. No consolidation with existing driver operations I don't currently have access to suitable network hardware, but have tested these changes with a dummy driver that can support various combinations of operations and sizes, together with (a) Debian's ethtool 3.13 (b) ethtool 3.14 with the submitted patch to use ETHTOOL_{G,S}RSSH and minor adjustment for fixes 1 and 3. v2: Update RSS operations in vmxnet3 too ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-02 23:07:02 -07:00
Ben Hutchings	f062a38448	ethtool: Check that reserved fields of struct ethtool_rxfh are 0 We should fail rather than silently ignoring use of these extensions. Signed-off-by: Ben Hutchings <ben@decadent.org.uk>	2014-06-03 02:43:16 +01:00
Ben Hutchings	fe62d00137	ethtool: Replace ethtool_ops::{get,set}_rxfh_indir() with {get,set}_rxfh() ETHTOOL_{G,S}RXFHINDIR and ETHTOOL_{G,S}RSSH should work for drivers regardless of whether they expose the hash key, unless you try to set a hash key for a driver that doesn't expose it. Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2014-06-03 02:42:44 +01:00
Roopa Prabhu	41c389d72c	bridge: Add bridge ifindex to bridge fdb notify msgs (This patch was previously posted as RFC at http://patchwork.ozlabs.org/patch/352677/) This patch adds NDA_MASTER attribute to neighbour attributes enum for bridge/master ifindex. And adds NDA_MASTER to bridge fdb notify msgs. Today bridge fdb notifications dont contain bridge information. Userspace can derive it from the port information in the fdb notification. However this is tricky in some scenarious. Example, bridge port delete notification comes before bridge fdb delete notifications. And we have seen problems in userspace when using libnl where, the bridge fdb delete notification handling code does not understand which bridge this fdb entry is part of because the bridge and port association has already been deleted. And these notifications (port membership and fdb) are generated on separate rtnl groups. Fixing the order of notifications could possibly solve the problem for some cases (I can submit a separate patch for that). This patch chooses to add NDA_MASTER to bridge fdb notify msgs because it not only solves the problem described above, but also helps userspace avoid another lookup into link msgs to derive the master index. Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-02 17:58:55 -07:00
Leon Yu	418c96ac15	net: filter: fix possible memory leak in __sk_prepare_filter() __sk_prepare_filter() was reworked in commit `bd4cf0ed3` (net: filter: rework/optimize internal BPF interpreter's instruction set) so that it should have uncharged memory once things went wrong. However that work isn't complete. Error is handled only in __sk_migrate_filter() while memory can still leak in the error path right after sk_chk_filter(). Fixes: `bd4cf0ed33` ("net: filter: rework/optimize internal BPF interpreter's instruction set") Signed-off-by: Leon Yu <chianglungyu@gmail.com> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Tested-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-02 17:49:45 -07:00
Yuchung Cheng	0cfa5c07d6	tcp: fix cwnd undo on DSACK in F-RTO This bug is discovered by an recent F-RTO issue on tcpm list https://www.ietf.org/mail-archive/web/tcpm/current/msg08794.html The bug is that currently F-RTO does not use DSACK to undo cwnd in certain cases: upon receiving an ACK after the RTO retransmission in F-RTO, and the ACK has DSACK indicating the retransmission is spurious, the sender only calls tcp_try_undo_loss() if some never retransmisted data is sacked (FLAG_ORIG_DATA_SACKED). The correct behavior is to unconditionally call tcp_try_undo_loss so the DSACK information is used properly to undo the cwnd reduction. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-02 16:50:49 -07:00
David Ahern	30f38d2fdd	fib_trie: use seq_file_net rather than seq->private Make fib_triestat_seq_show consistent with other /proc/net files and use seq_file_net. Signed-off-by: David Ahern <dsahern@gmail.com> Cc: David S. Miller <davem@davemloft.net> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: James Morris <jmorris@namei.org> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-02 16:41:38 -07:00
Eric W. Biederman	2d7a85f4b0	netlink: Only check file credentials for implicit destinations It was possible to get a setuid root or setcap executable to write to it's stdout or stderr (which has been set made a netlink socket) and inadvertently reconfigure the networking stack. To prevent this we check that both the creator of the socket and the currentl applications has permission to reconfigure the network stack. Unfortunately this breaks Zebra which always uses sendto/sendmsg and creates it's socket without any privileges. To keep Zebra working don't bother checking if the creator of the socket has privilege when a destination address is specified. Instead rely exclusively on the privileges of the sender of the socket. Note from Andy: This is exactly Eric's code except for some comment clarifications and formatting fixes. Neither I nor, I think, anyone else is thrilled with this approach, but I'm hesitant to wait on a better fix since 3.15 is almost here. Note to stable maintainers: This is a mess. An earlier series of patches in 3.15 fix a rather serious security issue (CVE-2014-0181), but they did so in a way that breaks Zebra. The offending series includes: commit `aa4cf9452f` Author: Eric W. Biederman <ebiederm@xmission.com> Date: Wed Apr 23 14:28:03 2014 -0700 net: Add variants of capable for use on netlink messages If a given kernel version is missing that series of fixes, it's probably worth backporting it and this patch. if that series is present, then this fix is critical if you care about Zebra. Cc: stable@vger.kernel.org Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-02 16:34:09 -07:00
Eric Dumazet	39c36094d7	net: fix inet_getid() and ipv6_select_ident() bugs I noticed we were sending wrong IPv4 ID in TCP flows when MTU discovery is disabled. Note how GSO/TSO packets do not have monotonically incrementing ID. 06:37:41.575531 IP (id 14227, proto: TCP (6), length: 4396) 06:37:41.575534 IP (id 14272, proto: TCP (6), length: 65212) 06:37:41.575544 IP (id 14312, proto: TCP (6), length: 57972) 06:37:41.575678 IP (id 14317, proto: TCP (6), length: 7292) 06:37:41.575683 IP (id 14361, proto: TCP (6), length: 63764) It appears I introduced this bug in linux-3.1. inet_getid() must return the old value of peer->ip_id_count, not the new one. Lets revert this part, and remove the prevention of a null identification field in IPv6 Fragment Extension Header, which is dubious and not even done properly. Fixes: `87c48fa3b4` ("ipv6: make fragment identifications less predictable") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-02 14:09:28 -07:00
Toshiaki Makita	e0d7968ab6	bridge: Prevent insertion of FDB entry with disallowed vlan br_handle_local_finish() is allowing us to insert an FDB entry with disallowed vlan. For example, when port 1 and 2 are communicating in vlan 10, and even if vlan 10 is disallowed on port 3, port 3 can interfere with their communication by spoofed src mac address with vlan id 10. Note: Even if it is judged that a frame should not be learned, it should not be dropped because it is destined for not forwarding layer but higher layer. See IEEE 802.1Q-2011 8.13.10. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Acked-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-02 13:38:23 -07:00
David S. Miller	31595de219	Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next John W. Linville says: ==================== pull request: wireless-next 2014-06-02 Please pull this remaining batch of updates intended for the 3.16 stream... For the mac80211 bits, Johannes says: "The remainder for -next right now is mostly fixes, and a handful of small new things like some CSA infrastructure, the regdb script mW/dBm conversion change and sending wiphy notifications." For the bluetooth bits, Gustavo says: "Some more patches for 3.16. There is nothing really special here, just a bunch of clean ups, fixes plus some small improvements. Please pull." For the nfc bits, Samuel says: "We have: - Felica (Type3) tags support for trf7970a - Type 4b tags support for port100 - st21nfca DTS typo fix - A few sparse warning fixes" For the atheros bits, Kalle says: "Ben added support for setting antenna configurations. Michal improved warm reset so that we would not need to fall back to cold reset that often, an issue where ath10k stripped protected flag while in monitor mode and made module initialisation asynchronous to fix the problems with firmware loading when the driver is linked to the kernel. Luca removed unused channel_switch_beacon callbacks both from ath9k and ath10k. Marek fixed Protected Management Frames (PMF) when using Action Frames. Also we had other small fixes everywhere in the driver." Along with that, there are a handful of updates to a variety of drivers. This includes updates to at76c50x-usb, ath9k, b43, brcmfmac, mwifiex, rsi, rtlwifi, and wil6210. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-02 11:17:35 -07:00
Eric Dumazet	73f156a6e8	inetpeer: get rid of ip_id_count Ideally, we would need to generate IP ID using a per destination IP generator. linux kernels used inet_peer cache for this purpose, but this had a huge cost on servers disabling MTU discovery. 1) each inet_peer struct consumes 192 bytes 2) inetpeer cache uses a binary tree of inet_peer structs, with a nominal size of ~66000 elements under load. 3) lookups in this tree are hitting a lot of cache lines, as tree depth is about 20. 4) If server deals with many tcp flows, we have a high probability of not finding the inet_peer, allocating a fresh one, inserting it in the tree with same initial ip_id_count, (cf secure_ip_id()) 5) We garbage collect inet_peer aggressively. IP ID generation do not have to be 'perfect' Goal is trying to avoid duplicates in a short period of time, so that reassembly units have a chance to complete reassembly of fragments belonging to one message before receiving other fragments with a recycled ID. We simply use an array of generators, and a Jenkin hash using the dst IP as a key. ipv6_select_ident() is put back into net/ipv6/ip6_output.c where it belongs (it is only used from this file) secure_ip_id() and secure_ipv6_id() no longer are needed. Rename ip_select_ident_more() to ip_select_ident_segs() to avoid unnecessary decrement/increment of the number of segments. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-02 11:00:41 -07:00
Alexander Duyck	670e5b8eaf	net: Add support for device specific address syncing This change provides a function to be used in order to break the ndo_set_rx_mode call into a set of address add and remove calls. The code is based on the implementation of dev_uc_sync/dev_mc_sync. Since they essentially do the same thing but with only one dev I simply named my functions __dev_uc_sync/__dev_mc_sync. I also implemented an unsync version of the functions as well to allow for cleanup on close. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-02 10:40:54 -07:00
Alexander Aring	eb06481d69	6lowpan_rtnl: fix off by one while fragmentation This patch fix a off by one error while fragmentation. If the frag_cap value is equal to skb_unprocessed value we need to stop the fragmentation loop because the last fragment which has a size of skb_unprocessed fits into the frag capability size. This issue was introduced by commit `d4b2816d67` ("6lowpan: fix fragmentation"). Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-02 10:39:42 -07:00
Alexander Aring	51263fffad	6lowpan_rtnl: fix fragmentation with two fragments This patch fix the 6LoWPAN fragmentation for the case if we have exactly two fragments. The problem is that the (skb_unprocessed >= frag_cap) condition is always false on the second fragment after sending the first fragment. A fragmentation with only one fragment doesn't make any sense. The solution is that we use a do while loop here, that ensures we sending always a minimum of two fragments if we need a fragmentation. This issue was introduced by commit `d4b2816d67` ("6lowpan: fix fragmentation"). Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-02 10:39:42 -07:00
Denis ChengRq	2f91abd451	genetlink: remove superfluous assignment the local variable ops and n_ops were just read out from family, and not changed, hence no need to assign back. Validation functions should operate on const parameters and not change anything. Signed-off-by: Cheng Renquan <crquan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-02 10:36:18 -07:00
John W. Linville	fcb2c0d6cf	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem	2014-06-02 11:20:17 -04:00
Jukka Taimisto	8a96f3cd22	Bluetooth: Fix L2CAP deadlock -[0x01 Introduction We have found a programming error causing a deadlock in Bluetooth subsystem of Linux kernel. The problem is caused by missing release_sock() call when L2CAP connection creation fails due full accept queue. The issue can be reproduced with 3.15-rc5 kernel and is also present in earlier kernels. -[0x02 Details The problem occurs when multiple L2CAP connections are created to a PSM which contains listening socket (like SDP) and left pending, for example, configuration (the underlying ACL link is not disconnected between connections). When L2CAP connection request is received and listening socket is found the l2cap_sock_new_connection_cb() function (net/bluetooth/l2cap_sock.c) is called. This function locks the 'parent' socket and then checks if the accept queue is full. 1178 lock_sock(parent); 1179 1180 /* Check for backlog size */ 1181 if (sk_acceptq_is_full(parent)) { 1182 BT_DBG("backlog full %d", parent->sk_ack_backlog); 1183 return NULL; 1184 } If case the accept queue is full NULL is returned, but the 'parent' socket is not released. Thus when next L2CAP connection request is received the code blocks on lock_sock() since the parent is still locked. Also note that for connections already established and waiting for configuration to complete a timeout will occur and l2cap_chan_timeout() (net/bluetooth/l2cap_core.c) will be called. All threads calling this function will also be blocked waiting for the channel mutex since the thread which is waiting on lock_sock() alread holds the channel mutex. We were able to reproduce this by sending continuously L2CAP connection request followed by disconnection request containing invalid CID. This left the created connections pending configuration. After the deadlock occurs it is impossible to kill bluetoothd, btmon will not get any more data etc. requiring reboot to recover. -[0x03 Fix Releasing the 'parent' socket when l2cap_sock_new_connection_cb() returns NULL seems to fix the issue. Signed-off-by: Jukka Taimisto <jtt@codenomicon.com> Reported-by: Tommi Mäkilä <tmakila@codenomicon.com> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Cc: stable@vger.kernel.org	2014-06-02 13:38:19 +03:00
Pablo Neira Ayuso	31f8441c32	netfilter: nf_tables: atomic allocation in set notifications from rcu callback Use GFP_ATOMIC allocations when sending removal notifications of anonymous sets from rcu callback context. Sleeping in that context is illegal. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-06-02 10:54:38 +02:00
Pablo Neira Ayuso	4fefee570d	netfilter: nf_tables: allow to delete several objects from a batch Three changes to allow the deletion of several objects with dependencies in one transaction, they are: 1) Introduce speculative counter increment/decrement that is undone in the abort path if required, thus we avoid hitting -EBUSY when deleting the chain. The counter updates are reverted in the abort path. 2) Increment/decrement table/chain use counter for each set/rule. We need this to fully rely on the use counters instead of the list content, eg. !list_empty(&chain->rules) which evaluate true in the middle of the transaction. 3) Decrement table use counter when an anonymous set is bound to the rule in the commit path. This avoids hitting -EBUSY when deleting the table that contains anonymous sets. The anonymous sets are released in the nf_tables_rule_destroy path. This should not be a problem since the rule already bumped the use counter of the chain, so the bound anonymous set reflects dependencies through the rule object, which already increases the chain use counter. So the general assumption after this patch is that the use counters are bumped by direct object dependencies. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-06-02 10:54:35 +02:00
Pablo Neira Ayuso	7632667d26	netfilter: nft_rbtree: introduce locking There's no rbtree rcu version yet, so let's fall back on the spinlock to protect the concurrent access of this structure both from user (to update the set content) and kernel-space (in the packet path). Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-06-02 10:54:31 +02:00
Pablo Neira Ayuso	a1cee076f4	netfilter: nf_tables: release objects in reverse order in the abort path The patch `c7c32e7` ("netfilter: nf_tables: defer all object release via rcu") indicates that we always release deleted objects in the reverse order, but that is only needed in the abort path. These are the two possible scenarios when releasing objects: 1) Deletion scenario in the commit path: no need to release objects in the reverse order since userspace already ensures that dependencies are fulfilled), ie. userspace tells us to delete rule -> ... -> rule -> chain -> table. In this case, we have to release the objects in the same order as userspace provided. 2) Deletion scenario in the abort path: we have to iterate in the reverse order to undo what it cannot be added, ie. userspace sent us a batch that includes: table -> chain -> rule -> ... -> rule, and that needs to be partially undone. In this case, we have to release objects in the reverse order to ensure that the set and chain objects point to valid rule and table objects. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-06-02 10:54:28 +02:00
Pablo Neira Ayuso	46bbafceb2	netfilter: nf_tables: fix wrong transaction ordering in set elements The transaction needs to be placed at the end of the commit list, otherwise event notifications are reordered and we may crash when releasing object via call_rcu. This problem was introduced in `60319eb` ("netfilter: nf_tables: use new transaction infrastructure to handle elements"). Reported-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-06-02 10:54:25 +02:00
Mathieu Poirier	4c552a64df	netfilter: nfnetlink_acct: Fix memory leak Allocation of memory need only to happen once, that is after the proper checks on the NFACCT_FLAGS have been done. Otherwise the code can return without freeing already allocated memory. Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-06-02 10:46:52 +02:00
Johan Hedberg	f3fb0b58c8	Bluetooth: Fix missing check for FIPS security level When checking whether a legacy link key provides at least HIGH security level we also need to check for FIPS level which is one step above HIGH. This patch fixes a missing check in the hci_link_key_request_evt() function. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Cc: stable@vger.kernel.org	2014-06-02 00:34:36 -07:00
Daniel Borkmann	f8f6d679aa	net: filter: improve filter block macros Commit `9739eef13c` ("net: filter: make BPF conversion more readable") started to introduce helper macros similar to BPF_STMT()/BPF_JUMP() macros from classic BPF. However, quite some statements in the filter conversion functions remained in the old style which gives a mixture of block macros and non block macros in the code. This patch makes the block macros itself more readable by using explicit member initialization, and converts the remaining ones where possible to remain in a more consistent state. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-01 22:16:58 -07:00
Daniel Borkmann	3480593131	net: filter: get rid of BPF_S_* enum This patch finally allows us to get rid of the BPF_S_* enum. Currently, the code performs unnecessary encode and decode workarounds in seccomp and filter migration itself when a filter is being attached in order to overcome BPF_S_* encoding which is not used anymore by the new interpreter resp. JIT compilers. Keeping it around would mean that also in future we would need to extend and maintain this enum and related encoders/decoders. We can get rid of all that and save us these operations during filter attaching. Naturally, also JIT compilers need to be updated by this. Before JIT conversion is being done, each compiler checks if A is being loaded at startup to obtain information if it needs to emit instructions to clear A first. Since BPF extensions are a subset of BPF_LD \| BPF_{W,H,B} \| BPF_ABS variants, case statements for extensions can be removed at that point. To ease and minimalize code changes in the classic JITs, we have introduced bpf_anc_helper(). Tested with test_bpf on x86_64 (JIT, int), s390x (JIT, int), arm (JIT, int), i368 (int), ppc64 (JIT, int); for sparc we unfortunately didn't have access, but changes are analogous to the rest. Joint work with Alexei Starovoitov. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Mircea Gherzan <mgherzan@gmail.com> Cc: Kees Cook <keescook@chromium.org> Acked-by: Chema Gonzalez <chemag@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-01 22:16:58 -07:00
Jon Maxwell	c65c7a3066	bridge: notify user space after fdb update There has been a number incidents recently where customers running KVM have reported that VM hosts on different Hypervisors are unreachable. Based on pcap traces we found that the bridge was broadcasting the ARP request out onto the network. However some NICs have an inbuilt switch which on occasions were broadcasting the VMs ARP request back through the physical NIC on the Hypervisor. This resulted in the bridge changing ports and incorrectly learning that the VMs mac address was external. As a result the ARP reply was directed back onto the external network and VM never updated it's ARP cache. This patch will notify the bridge command, after a fdb has been updated to identify such port toggling. Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com> Reviewed-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Acked-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-01 22:14:50 -07:00
wangweidong	019ee792d7	bridge: fix the unbalanced promiscuous count when add_if failed As commit `2796d0c648` ("bridge: Automatically manage port promiscuous mode."), make the add_if use dev_set_allmulti instead of dev_set_promiscuous, so when add_if failed, we should do dev_set_allmulti(dev, -1). Signed-off-by: Wang Weidong <wangweidong1@huawei.com> Reviewed-by: Amos Kong <akong@redhat.com> Acked-by: Vlad Yasevich <vyasevic@redhat.com> Acked-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-01 22:05:16 -07:00
Nikolay Aleksandrov	4b9b1cdf83	net: fix wrong mac_len calculation for vlans After `1e785f48d2` ("net: Start with correct mac_len in skb_network_protocol") skb->mac_len is used as a start of the calculation in skb_network_protocol() but that is not always correct. If skb->protocol == 8021Q/AD, usually the vlan header is already inserted in the skb (i.e. vlan reorder hdr == 0). Usually when the packet enters dev_hard_xmit it has mac_len == 0 so we take 2 bytes from the destination mac address (skb->data + VLAN_HLEN) as a type in skb_network_protocol() and return vlan_depth == 4. In the case where TSO is off, then the mac_len is set but it's == 18 (ETH_HLEN + VLAN_HLEN), so skb_network_protocol() returns a type from inside the packet and offset == 22. Also make vlan_depth unsigned as suggested before. As suggested by Eric Dumazet, move the while() loop in the if() so we can avoid additional testing in fast path. Here are few netperf tests + debug printk's to illustrate: cat netperf.tso-on.reorder-on.bugged - Vlan -> device (reorder on, default, this case is okay) MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.3.1 () port 0 AF_INET Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 87380 16384 16384 10.00 7111.54 [ 81.605435] skb->len 65226 skb->gso_size 1448 skb->proto 0x800 skb->mac_len 0 vlan_depth 0 type 0x800 - Vlan -> device (reorder off, bad) cat netperf.tso-on.reorder-off.bugged MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.3.1 () port 0 AF_INET Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 87380 16384 16384 10.00 241.35 [ 204.578332] skb->len 1518 skb->gso_size 0 skb->proto 0x8100 skb->mac_len 0 vlan_depth 4 type 0x5301 0x5301 are the last two bytes of the destination mac. And if we stop TSO, we may get even the following: [ 83.343156] skb->len 2966 skb->gso_size 1448 skb->proto 0x8100 skb->mac_len 18 vlan_depth 22 type 0xb84 Because mac_len already accounts for VLAN_HLEN. After the fix: cat netperf.tso-on.reorder-off.fixed MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.3.1 () port 0 AF_INET Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 87380 16384 16384 10.01 5001.46 [ 81.888489] skb->len 65230 skb->gso_size 1448 skb->proto 0x8100 skb->mac_len 0 vlan_depth 18 type 0x800 CC: Vlad Yasevich <vyasevic@redhat.com> CC: Eric Dumazet <eric.dumazet@gmail.com> CC: Daniel Borkman <dborkman@redhat.com> CC: David S. Miller <davem@davemloft.net> Fixes:1e785f48d29a ("net: Start with correct mac_len in skb_network_protocol") Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-01 19:39:13 -07:00
Johan Hedberg	79897d2097	Bluetooth: Fix requiring SMP MITM for outgoing connections Due to recent changes to the way that the MITM requirement is set for outgoing pairing attempts we can no longer rely on the hcon->auth_type variable (which is actually good since it was formed from BR/EDR concepts that don't really exist for SMP). To match the logic that BR/EDR now uses simply rely on the local IO capability and/or needed security level to set the MITM requirement for outgoing pairing requests. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Cc: stable@vger.kernel.org	2014-05-31 23:51:12 -07:00
David S. Miller	6ce995c6f4	Included changes: - prevent NULL dereference in multicast code -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQIcBAABCAAGBQJTiZ58AAoJEJgn97Bh2u9e2x4QAIjWLDIo6feo4jH8l6q6R5iO cH/EXCtqk9GHNwvfZDNt+pF19ejzVk/TPnmmXTZ4QcElS9GuXe5WdWxiGcS5KEwa 0UNDRp8fgcBSV1Kqc/vbyKiQ4j69QtC1PPfLWUtxj/GYE0qHX/A1OzB9zROvoHJ7 sa3l8O5XRWiaxBYDkT0RfhHH0jeDdvm3I9yt8B+4B6c71094VIsfGXBVPp4tPrdg nkuzBdwF0HFPiFrlsfboJDLcXPLpRR93H1GsmfELYd5jQ4rtUhlcuEESq6573tvB TV93tkm/zmbwtInMoPI29qKL8t2478cJH7SvKvM4NiqMsB1zOhknhUXzElh9TPGA xyNivxJraYJzL53XguBFO8A8fP1k/E8Z6UQXJbgry4lu+6qZ60e0/J8zGxGpSamP i1JX0MAVPX6T4MAlZ70LMxfmzJ5sSNkkYyXobG+aBa/AgzRsXVvG4So1qi364COx btCxgBXK1Z20ZuNclY8/J06D8EbTXI5y5MCSDvMCOHQlb5mjBl34RtFVw+5/QXkg v2suc7T/YLOPNtZktZC2506caPHoOlwEVvkyA55p+qdkcD/Dd5Iv4Hndi+g+C5gv O2ja7gUQco1R8ElormKW9rE7OvjiUlowNJmguXWAdzc9FC0yISpP66BAGjBqwhF9 6YibEebXMQICjxAVTEAM =fvfu -----END PGP SIGNATURE----- Merge tag 'batman-adv-fix-for-davem' of git://git.open-mesh.org/linux-merge Included changes: - prevent NULL dereference in multicast code Antonion Quartulli says: ==================== pull request net: batman-adv 20140527 here you have another very small fix intended for net/linux-3.15. It prevents some multicast functions from dereferencing a NULL pointer. (Actually it was nothing more than a typo) I hope it is not too late for such a small patch. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-31 20:01:47 -07:00
Marek Lindner	af0a171c07	batman-adv: fix NULL pointer dereferences Was introduced with `4c8755d69c` ("batman-adv: Send multicast packets to nodes with a WANT_ALL flag") Reported-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Acked-by: Antonio Quartulli <antonio@meshcoding.com> Signed-off-by: Linus Lüssing <linus.luessing@web.de> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>	2014-05-31 10:07:14 +02:00
Jukka Rissanen	6a5e81650a	Bluetooth: l2cap: Set more channel defaults Default values for various channel settings were missing. This way channel users do not need to set default values themselves. Signed-off-by: Jukka Rissanen <jukka.rissanen@linux.intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-05-30 21:38:37 -07:00
Jukka Rissanen	62bbd5b359	Bluetooth: 6LoWPAN: Fix MAC address universal/local bit handling The universal/local bit handling was incorrectly done in the code. So when setting EUI address from BD address we do this: - If BD address type is PUBLIC, then we clear the universal bit in EUI address. If the address type is RANDOM, then the universal bit is set (BT 6lowpan draft chapter 3.2.2) - After this we invert the universal/local bit according to RFC 2464 When figuring out BD address we do the reverse: - Take EUI address from stateless IPv6 address, invert the universal/local bit according to RFC 2464 - If universal bit is 1 in this modified EUI address, then address type is set to RANDOM, otherwise it is PUBLIC Note that 6lowpan_iphc.[ch] does the final toggling of U/L bit before sending or receiving the network packet. Signed-off-by: Jukka Rissanen <jukka.rissanen@linux.intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Cc: stable@vger.kernel.org	2014-05-30 21:28:21 -07:00
Johan Hedberg	7e3691e13a	Bluetooth: Fix authentication check for FIPS security level When checking whether we need to request authentication or not we should include HCI_SECURITY_FIPS to the levels that always need authentication. This patch fixes check for it in the hci_outgoing_auth_needed() function. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Cc: stable@vger.kernel.org	2014-05-30 21:25:01 -07:00
Johan Hedberg	61b433579b	Bluetooth: Fix properly ignoring LTKs of unknown types In case there are new LTK types in the future we shouldn't just blindly assume that != MGMT_LTK_UNAUTHENTICATED means that the key is authenticated. This patch adds explicit checks for each allowed key type in the form of a switch statement and skips any key which has an unknown value. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Cc: stable@vger.kernel.org	2014-05-30 21:23:29 -07:00
David S. Miller	dbfc4b698a	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf Pablo Neira Ayuso says: ==================== The following patchset contains a late fix for IPVS: * Fix crash when trying to remove the transport header with non-linear skbuffs, this was introduced in 3.6-rc. Patch from Peter Christensen via the IPVS folks. I'll pass this to -stable once this hits mainstream. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-30 17:56:09 -07:00
David S. Miller	90d0e08e57	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next Pablo Neira Ayuso says: ==================== Netfilter/IPVS updates for net-next This small patchset contains three accumulated Netfilter/IPVS updates, they are: 1) Refactorize common NAT code by encapsulating it into a helper function, similarly to what we do in other conntrack extensions, from Florian Westphal. 2) A minor format string mismatch fix for IPVS, from Masanari Iida. 3) Add quota support to the netfilter accounting infrastructure, now you can add quotas to accounting objects via the nfnetlink interface and use them from iptables. You can also listen to quota notifications from userspace. This enhancement from Mathieu Poirier. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-30 17:54:47 -07:00
Himangi Saraogi	47162c0b7e	af_key: Replace comma with semicolon This patch replaces a comma between expression statements by a semicolon. A simplified version of the semantic patch that performs this transformation is as follows: // <smpl> @r@ expression e1,e2,e; type T; identifier i; @@ e1 -, +; e2; // </smpl> Signed-off-by: Himangi Saraogi <himangi774@gmail.com> Acked-by: Julia Lawall <julia.lawall@lip6.fr> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-30 17:48:58 -07:00
Himangi Saraogi	01728371dc	rds/tcp_listen: Replace comma with semicolon This patch replaces a comma between expression statements by a semicolon. A simplified version of the semantic patch that performs this transformation is as follows: // <smpl> @r@ expression e1,e2,e; type T; identifier i; @@ e1 -, +; e2; // </smpl> Signed-off-by: Himangi Saraogi <himangi774@gmail.com> Acked-by: Julia Lawall <julia.lawall@lip6.fr> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-30 17:48:58 -07:00
Himangi Saraogi	cc2afe9fe2	RDS/RDMA: Replace comma with semicolon This patch replaces a comma between expression statements by a semicolon. A simplified version of the semantic patch that performs this transformation is as follows: // <smpl> @r@ expression e1,e2,e; type T; identifier i; @@ e1 -, +; e2; // </smpl> Signed-off-by: Himangi Saraogi <himangi774@gmail.com> Acked-by: Julia Lawall <julia.lawall@lip6.fr> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-30 17:48:58 -07:00
Himangi Saraogi	70cb4a4526	ipmr: Replace comma with semicolon This patch replaces a comma between expression statements by a semicolon. A simplified version of the semantic patch that performs this transformation is as follows: // <smpl> @r@ expression e1,e2,e; type T; identifier i; @@ e1 -, +; e2; // </smpl> Signed-off-by: Himangi Saraogi <himangi774@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-30 17:48:57 -07:00
Ursula Braun	4d520f62e0	af_iucv: correct cleanup if listen backlog is full In case of transport HIPER a sock struct is allocated for an incoming connect request. If the backlog queue is full this socket is not needed, but is left in the list of af_iucv sockets. Final socket release posts console message "Attempt to release alive iucv socket". This patch makes sure the new created socket is cleaned up correctly if the backlog queue is full. Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com> Signed-off-by: Frank Blaschka <blaschka@linux.vnet.ibm.com> Reported-by: Philipp Hachtmann <phacht@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-30 17:35:23 -07:00
Philipp Hachtmann	53a4b4995e	af_iucv: Add automatic (source) iucv_name to bind If a socket is bound to an address using before calling connect it is usual to leave it to the network system to choose an appropriate outgoing application name respective port address. af_iucv on VM uses a counter and uses simple numbers as unique identifiers. This behaviour was missing when af_iucv is used with HiperSockets. This patch contains a simple approach to harmonize af_iucv's behaviour. Signed-off-by: Philipp Hachtmann <phacht@linux.vnet.ibm.com> Signed-off-by: Frank Blaschka <blaschka@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-30 17:35:23 -07:00
Kinglong Mee	a48fd0f9f7	SUNRPC/NFSD: Remove using of dprintk with KERN_WARNING When debugging, rpc prints messages from dprintk(KERN_WARNING ...) with "^A4" prefixed, [ 2780.339988] ^A4nfsd: connect from unprivileged port: 127.0.0.1, port=35316 Trond tells, > dprintk != printk. We have NEVER supported dprintk(KERN_WARNING...) This patch removes using of dprintk with KERN_WARNING. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-05-30 20:25:28 -04:00
David S. Miller	4d1cdf1db6	Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next John W. Linville says: ==================== Please pull this batch of updates intended for 3.16... For the mac80211 bits, Johannes says: "Here I just have Heikki's rfkill GPIO cleanups. The ARM/tegra patch is OK with the maintainer (Stephen). Let me know of any problems." and; "We have a whole bunch of work on CSA by Andrei, Luca and Michal, but unfortunately it doesn't seem quite complete yet so it's still disabled. There's some TDLS work from Arik, and the rest is mostly minor fixes and cleanups." For the NFC bits, Samuel says: "This is the NFC pull request for 3.16. We have: - STMicroeectronics st21nfca support. The st21nfca is an HCI chipset and thus relies on the HCI stack. This submission provides support for tag redaer/writer mode (including Type 5) and device tree bindings. - PM runtime support and a bunch of bug fixes for TI's trf7970a. - Device tree support for NXP's pn544. Legacy platform data support is obviously kept intact. - NFC Tag type 4B support to the NFC Digital stack. - SOCK_RAW type support to the raw NFC socket, and allow NCI sniffing from that. This can be extended to report HCI frames and also proprietarry ones like e.g. the pn533 ones." For the iwlwifi bits, Emmanuel says: "Eran continues to work on new devices, Eyal is still digging in the rate control stuff, and Johannes added new functionality to the debug system we have in place now along with a few cleanups he made on the way. That's pretty much it." and; "Avri continues to work on the power code and Eran is improving the NVM handling as a preparations for new devices on which he works with Liad. Luca cleans up a bit the code while working on CSA. I have the regular BT Coex stuff and a small lockdep fix. Johannes has his regular amount of clean ups and improvements, the main one is the ability to leave 2 chains open to improve diversity and hence the throughput in high attenuation scenarios." and; "The regular amount of housekeeping here. I merged iwlwifi-fixes.git to be able to add the patch you didn't want in wireless.git at that stage of the -rc cycle. Luca has a few preparations for CSA implementation and also what seems to be a bugfix for P2P but hasn't caused issues we could notice." For the Atheros bits, Kalle says: "For ath10k Michal did various small fixes on how we handle hardware/firmware problems and he also fixed two memory leaks." Also included are a couple of pulls from the wireless tree to avoid/resolve merge issues... ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-30 17:18:46 -07:00
Paul Bolle	391296c90c	atm: remove commented out check This preprocessor check is commented out ever since this file was added during the v2.3 development cycle. It is unclear what it purpose might have been. Whatever it was, it can safely be removed now. Signed-off-by: Paul Bolle <pebolle@tiscali.nl> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-30 17:17:04 -07:00
Sachin Kamat	484611e530	net: tso: Export symbols for modular build Export the symbols to fix the below errors when built as modules: ERROR: "tso_build_data" [drivers/net/ethernet/marvell/mvneta.ko] undefined! ERROR: "tso_build_hdr" [drivers/net/ethernet/marvell/mvneta.ko] undefined! ERROR: "tso_start" [drivers/net/ethernet/marvell/mvneta.ko] undefined! ERROR: "tso_count_descs" [drivers/net/ethernet/marvell/mvneta.ko] undefined! ERROR: "tso_build_data" [drivers/net/ethernet/marvell/mv643xx_eth.ko] undefined! ERROR: "tso_build_hdr" [drivers/net/ethernet/marvell/mv643xx_eth.ko] undefined! ERROR: "tso_start" [drivers/net/ethernet/marvell/mv643xx_eth.ko] undefined! ERROR: "tso_count_descs" [drivers/net/ethernet/marvell/mv643xx_eth.ko] undefined! Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org> Acked-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-30 15:52:03 -07:00
J. Bruce Fields	a5cddc885b	nfsd4: better reservation of head space for krb5 RPC_MAX_AUTH_SIZE is scattered around several places. Better to set it once in the auth code, where this kind of estimate should be made. And while we're at it we can leave it zero when we're not using krb5i or krb5p. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-05-30 17:32:17 -04:00
J. Bruce Fields	db3f58a95b	rpc: define xdr_restrict_buflen With this xdr_reserve_space can help us enforce various limits. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-05-30 17:32:01 -04:00
J. Bruce Fields	2825a7f907	nfsd4: allow encoding across page boundaries After this we can handle for example getattr of very large ACLs. Read, readdir, readlink are still special cases with their own limits. Also we can't handle a new operation starting close to the end of a page. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-05-30 17:31:54 -04:00
J. Bruce Fields	3e19ce762b	rpc: xdr_truncate_encode This will be used in the server side in a few cases: - when certain operations (read, readdir, readlink) fail after encoding a partial response. - when we run out of space after encoding a partial response. - in readlink, where we initially reserve PAGE_SIZE bytes for data, then truncate to the actual size. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-05-30 17:31:47 -04:00
John W. Linville	57afc62e94	NFC: 3.16: Second pull request This is the 2nd NFC pull request for 3.16. We have: - Felica (Type3) tags support for trf7970a - Type 4b tags support for port100 - st21nfca DTS typo fix - A few sparse warning check fixes -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJTh7q/AAoJEIqAPN1PVmxKKAMP/1mJP88CXQv6SZUpmWMXP/K2 7LqdG6nSnuPwm8k43qbNTdCbZRxfRVTdmyBjdAsxHYVOj2S3hGMkYXcCW6phD+AJ I4OPi3quC+y+4Tjl34fWIpEPTgvmAqMxuyLXiKwMTwuzdwNkDF3JzYiRyxm2QvqM qFevVEUdWqj0YywJGfokQLFfWNJbu7ghpBei4eIK53QX63dIQVPi63Lih5jBI4ig gJg7CHfPzaduYuCysU7rRss93p4CJ45Mc8b9CZn59KWW2nRw98wp867083Rbr9F7 zwaH0hc/L1kwFLLEXMYPx2a/1CoEya54amu8oKaBEg90OUvYPxjPQlPKvmy1hKXB cNwW7snuAH+10IBmD3dcoEqZ50pTXkMZw5czdNmgnUUxrOyS4wzR/n1X10+FqH3O 1E6G8MWVZuIU9l/FBSRvhX0jFK2upHgGrD93nu1qAg7giAZvqDHUSKdGVmMfI32D Tm+j6cS0/AouePssWChQtPwbAJus2kgeBO/w8gu2HaFN8C13E/nPSg77tONlRWQ6 rEkXum1P2jE9QTGQfzGwbCITxhEiMpHxtXV80lD5THkfHVVtQV6zkL2Lj9QDzoxQ d80Xk2DOScKnDcVCOiHX1NrnST3sFH1TsRS9XCKvmDX02VMl+KbYZzzJJaQ8gDLj NCVNv3BvuclwsG3VVqFn =t52d -----END PGP SIGNATURE----- Merge tag 'nfc-next-3.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/nfc-next Samuel Ortiz <sameo@linux.intel.com> says: "NFC: 3.16: Second pull request This is the 2nd NFC pull request for 3.16. We have: - Felica (Type3) tags support for trf7970a - Type 4b tags support for port100 - st21nfca DTS typo fix - A few sparse warning check fixes" Signed-off-by: John W. Linville <linville@tuxdriver.com>	2014-05-30 13:41:40 -04:00
John W. Linville	a5eb1aeb25	Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next Conflicts: drivers/bluetooth/btusb.c	2014-05-29 13:03:47 -04:00
John W. Linville	737be10d8c	Merge git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next	2014-05-29 12:55:38 -04:00
David Rientjes	c6c8fe79a8	net, sunrpc: suppress allocation warning in rpc_malloc() rpc_malloc() allocates with GFP_NOWAIT without making any attempt at reclaim so it easily fails when low on memory. This ends up spamming the kernel log: SLAB: Unable to allocate memory on node 0 (gfp=0x4000) cache: kmalloc-8192, object size: 8192, order: 1 node 0: slabs: 207/207, objs: 207/207, free: 0 rekonq: page allocation failure: order:1, mode:0x204000 CPU: 2 PID: 14321 Comm: rekonq Tainted: G O 3.15.0-rc3-12.gfc9498b-desktop+ #6 Hardware name: System manufacturer System Product Name/M4A785TD-V EVO, BIOS 2105 07/23/2010 0000000000000000 ffff880010ff17d0 ffffffff815e693c 0000000000204000 ffff880010ff1858 ffffffff81137bd2 0000000000000000 0000001000000000 ffff88011ffebc38 0000000000000001 0000000000204000 ffff88011ffea000 Call Trace: [<ffffffff815e693c>] dump_stack+0x4d/0x6f [<ffffffff81137bd2>] warn_alloc_failed+0xd2/0x140 [<ffffffff8113be19>] __alloc_pages_nodemask+0x7e9/0xa30 [<ffffffff811824a8>] kmem_getpages+0x58/0x140 [<ffffffff81183de6>] fallback_alloc+0x1d6/0x210 [<ffffffff81183be3>] ____cache_alloc_node+0x123/0x150 [<ffffffff81185953>] __kmalloc+0x203/0x490 [<ffffffffa06b0ee2>] rpc_malloc+0x32/0xa0 [sunrpc] [<ffffffffa06a6999>] call_allocate+0xb9/0x170 [sunrpc] [<ffffffffa06b19d8>] __rpc_execute+0x88/0x460 [sunrpc] [<ffffffffa06b2da9>] rpc_execute+0x59/0xc0 [sunrpc] [<ffffffffa06a932b>] rpc_run_task+0x6b/0x90 [sunrpc] [<ffffffffa077b5c1>] nfs4_call_sync_sequence+0x51/0x80 [nfsv4] [<ffffffffa077d45d>] _nfs4_do_setattr+0x1ed/0x280 [nfsv4] [<ffffffffa0782a72>] nfs4_do_setattr+0x72/0x180 [nfsv4] [<ffffffffa078334c>] nfs4_proc_setattr+0xbc/0x140 [nfsv4] [<ffffffffa074a7e8>] nfs_setattr+0xd8/0x240 [nfs] [<ffffffff811baa71>] notify_change+0x231/0x380 [<ffffffff8119cf5c>] chmod_common+0xfc/0x120 [<ffffffff8119df80>] SyS_chmod+0x40/0x90 [<ffffffff815f4cfd>] system_call_fastpath+0x1a/0x1f ... If the allocation fails, simply return NULL and avoid spamming the kernel log. Reported-by: Marc Dietrich <marvin24@gmx.de> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2014-05-29 11:11:51 -04:00
Avraham Stern	d3a58df87a	mac80211: set new interfaces as idle upon init Mark new interfaces as idle to allow operations that require that interfaces are idle to take place. Interface types that are always not idle (like AP interfaces) will be set as not idle when they are assigned a channel context. Signed-off-by: Avraham Stern <avraham.stern@intel.com> Signed-off-by: Emmanuel Grumbach<emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-05-28 16:22:49 +02:00
Felix Fietkau	abd43a6a68	mac80211: reduce packet loss notifications under load During strong signal fluctuations under high throughput, few consecutive failed A-MPDU transmissions can easily trigger packet loss notification, and thus (in AP mode) client disconnection. Reduce the number of false positives by checking the A-MPDU status flag and treating a failed A-MPDU as a single packet. Signed-off-by: Felix Fietkau <nbd@openwrt.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-05-28 16:22:48 +02:00
Arik Nemtsov	923eaf3672	mac80211: don't check netdev state for debugfs read/write Doing so will lead to an oops for a p2p-dev interface, since it has no netdev. Cc: stable@vger.kernel.org Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-05-28 16:22:48 +02:00
Felix Fietkau	53d045258e	mac80211: fix a memory leak on sta rate selection table If the rate control algorithm uses a selection table, it is leaked when the station is destroyed - fix that. Signed-off-by: Felix Fietkau <nbd@openwrt.org> Reported-by: Christophe Prévotaux <cprevotaux@nltinc.com> Fixes: `0d528d85c5` ("mac80211: improve the rate control API") Cc: stable@vger.kernel.org # v3.10+ [add commit log entry, remove pointless NULL check] Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-05-28 16:22:41 +02:00
John W. Linville	9db7cb6901	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem	2014-05-27 13:51:31 -04:00
John W. Linville	03c4444650	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless	2014-05-27 13:47:27 -04:00
chaitanya.mgit@gmail.com	a9fb54169b	regdb: Generalize the mW to dBm power conversion Generalize the power conversion from mW to dBm using log. This should fix the below compilation error for country NO which adds a new power value 2000mW which is not handled earlier. CC [M] net/wireless/wext-sme.o CC [M] net/wireless/regdb.o net/wireless/regdb.c:1130:1: error: Unknown undeclared here (not in a function) net/wireless/regdb.c:1130:9: error: expected } before power make[2]: * [net/wireless/regdb.o] Error 1 make[1]: * [net/wireless] Error 2 make: *** [net] Error 2 Reported-By: John Walker <john@x109.net> Signed-off-by: Chaitanya T K <chaitanya.mgit@gmail.com> Acked-by: John W. Linville <linville@tuxdriver.com> [remove unneeded parentheses, fix rounding by using %.0f] Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-05-27 17:58:58 +02:00
Krzysztof Hałasa	c7d37a66e3	mac80211: fix IBSS join by initializing last_scan_completed Without this fix, freshly rebooted Linux creates a new IBSS instead of joining an existing one. Only when jiffies counter overflows after 5 minutes the IBSS can be successfully joined. Signed-off-by: Krzysztof Hałasa <khalasa@piap.pl> [edit commit message slightly] Cc: stable@vger.kernel.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-05-27 08:54:01 +02:00
Johannes Berg	3bb2055672	cfg80211: send events when devices are added/removed We're currently sending NEW_WIPHY events for renames (which is a bit odd, but now can't be changed), but also send them for really new devices that register. Also send DEL_WIPHY events when a device is removed, the event ID for this was already reserved. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-05-26 13:52:25 +02:00
Emmanuel Grumbach	34171dc0d6	mac80211: fix virtual monitor interface addition Since the commit below, cfg80211_chandef_dfs_required() will warn if it gets a an NL80211_IFTYPE_UNSPECIFIED iftype as explicitely written in the commit log. When an virtual monitor interface is added, its type is set in ieee80211_sub_if_data.vif.type, but not in ieee80211_sub_if_data.wdev.iftype which is passed to cfg80211_chandef_dfs_required() hence resulting in the following warning: WARNING: CPU: 1 PID: 21265 at net/wireless/chan.c:376 cfg80211_chandef_dfs_required+0xbc/0x130 [cfg80211]() Modules linked in: [...] CPU: 1 PID: 21265 Comm: ifconfig Tainted: G W O 3.13.11+ #12 Hardware name: Dell Inc. Latitude E6410/0667CC, BIOS A01 03/05/2010 0000000000000009 ffff88008f5fdb08 ffffffff817d4219 ffff88008f5fdb50 ffff88008f5fdb40 ffffffff8106f57d 0000000000000000 0000000000000000 ffff880081062fb8 ffff8800810604e0 0000000000000001 ffff88008f5fdba0 Call Trace: [<ffffffff817d4219>] dump_stack+0x4d/0x66 [<ffffffff8106f57d>] warn_slowpath_common+0x7d/0xa0 [<ffffffff8106f5ec>] warn_slowpath_fmt+0x4c/0x50 [<ffffffffa04ea4ec>] cfg80211_chandef_dfs_required+0xbc/0x130 [cfg80211] [<ffffffffa06b1024>] ieee80211_vif_use_channel+0x94/0x500 [mac80211] [<ffffffffa0684e6b>] ieee80211_add_virtual_monitor+0x1ab/0x5c0 [mac80211] [<ffffffffa0686ae5>] ieee80211_do_open+0xe75/0x1580 [mac80211] [<ffffffffa0687259>] ieee80211_open+0x69/0x70 [mac80211] [snip] Fixes: `00ec75fc5a` ("cfg80211: pass the actual iftype when calling cfg80211_chandef_dfs_required()") Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Acked-by: Luciano Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-05-26 11:04:42 +02:00
Luciano Coelho	1a5f0c13d1	mac80211: add a single-transaction driver op to switch contexts In some cases, when the driver is already using all the channel contexts it can handle at once, we have to do an in-place switch (ie. we cannot afford using an extra context temporarily for the transaction). But some drivers may not support switching the channel context assigned to a vif on the fly (ie. without unassigning and assigning it) while others may only work if the context is changed on the fly, without unassigning it first. To allow these different scenarios, add a new driver operation that let's the driver decide how to handle an in-place switch. Signed-off-by: Luciano Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-05-26 11:04:41 +02:00
Pablo Neira	1708803ef2	netfilter: bridge: fix Kconfig unmet dependencies Before `f5efc69` ("netfilter: nf_tables: Add meta expression key for bridge interface name"), the entire net/bridge/netfilter/ directory depended on BRIDGE_NF_EBTABLES, ie. on ebtables. However, that directory already contained the nf_tables bridge extension that we should allow to compile separately. In `f5efc69`, we tried to generalize this by using CONFIG_BRIDGE_NETFILTER which was not a good idea since this option already existed and it is dedicated to enable the Netfilter bridge IP/ARP filtering. Let's try to fix this mess by: 1) making net/bridge/netfilter/ dependent on the toplevel CONFIG_NETFILTER option, just like we do with the net/netfilter and net/ipv{4,6}/netfilter/ directories. 2) Changing 'selects' to 'depends on' NETFILTER_XTABLES for BRIDGE_NF_EBTABLES. I believe this problem was already before `f5efc69`: warning: (BRIDGE_NF_EBTABLES) selects NETFILTER_XTABLES which has unmet direct dependencies (NET && INET && NETFILTER) 3) Fix ebtables/nf_tables bridge dependencies by making NF_TABLES_BRIDGE and BRIDGE_NF_EBTABLES dependent on BRIDGE and NETFILTER: warning: (NF_TABLES_BRIDGE && BRIDGE_NF_EBTABLES) selects BRIDGE_NETFILTER which has unmet direct dependencies (NET && BRIDGE && NETFILTER && INET && NETFILTER_ADVANCED) net/built-in.o: In function `br_parse_ip_options': br_netfilter.c:(.text+0x4a5ba): undefined reference to `ip_options_compile' br_netfilter.c:(.text+0x4a5ed): undefined reference to `ip_options_rcv_srr' net/built-in.o: In function `br_nf_pre_routing_finish': br_netfilter.c:(.text+0x4a8a4): undefined reference to `ip_route_input_noref' br_netfilter.c:(.text+0x4a987): undefined reference to `ip_route_output_flow' make: *** [vmlinux] Error 1 Reported-by: Jim Davis <jim.epost@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-26 00:42:30 -04:00
Peter Christensen	f44a5f45f5	ipvs: Fix panic due to non-linear skb Receiving a ICMP response to an IPIP packet in a non-linear skb could cause a kernel panic in __skb_pull. The problem was introduced in commit `f2edb9f770` ("ipvs: implement passive PMTUD for IPIP packets"). Signed-off-by: Peter Christensen <pch@ordbogen.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2014-05-26 10:22:46 +09:00
Fengguang Wu	db3287da34	NFC: nfc_sock_link() can be static CC: Hiren Tandel <hirent@marvell.com> CC: Samuel Ortiz <sameo@linux.intel.com> Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2014-05-26 00:53:10 +02:00
Fengguang Wu	cb30caf027	NFC: digital: digital_in_send_attrib_req() can be static CC: "Mark A. Greer" <mgreer@animalcreek.com> CC: Samuel Ortiz <sameo@linux.intel.com> Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2014-05-26 00:52:15 +02:00
Thierry Escande	9dc33705b2	NFC: digital: Randomize poll cycles This change adds some entropy to polling cycles, choosing the next polling rf technology randomly. This reflects the change done in the pn533 driver, avoiding possible infinite loop for devices that export 2 targets on 2 different modulations. If the first target is not readable, we will stay in an error loop for ever. Signed-off-by: Thierry Escande <thierry.escande@linux.intel.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2014-05-26 00:42:02 +02:00
Thierry Escande	00e625df3e	NFC: digital: Return proper error code when sending ATR_REQ The error code returned by digital_in_send_cmd() was not returned by digital_in_send_atr_req(). Signed-off-by: Thierry Escande <thierry.escande@linux.intel.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2014-05-26 00:42:02 +02:00
Arnaldo Carvalho de Melo	85d3fc9418	tipc: Don't reset the timeout when restarting As it may then take longer than what the user specified using setsockopt(SO_RCVTIMEO). Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-24 14:11:41 -04:00
David S. Miller	8646224cdb	Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless John W. Linville says: ==================== pull request: wireless 2014-05-23 I have two more fixes intended for the 3.15 stream... For the iwlwifi one, Emmanuel says: "A race has been discovered in the beacon filtering code. Since the fix is too big for 3.15, I disable here the feature." For the bluetooth one, Gustavo says: "This pull request contains a very important fix for 3.15. Here we fix the permissions of a debugfs file that would otherwise allow unauthorized users to write content to it." Please let me know if there are problems! ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-24 14:06:19 -04:00
David S. Miller	54e5c4def0	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/bonding/bond_alb.c drivers/net/ethernet/altera/altera_msgdma.c drivers/net/ethernet/altera/altera_sgdma.c net/ipv6/xfrm6_output.c Several cases of overlapping changes. The xfrm6_output.c has a bug fix which overlaps the renaming of skb->local_df to skb->ignore_df. In the Altera TSE driver cases, the register access cleanups in net-next overlapped with bug fixes done in net. Similarly a bug fix to send ALB packets in the bonding driver using the right source address overlaps with cleanups in net-next. Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-24 00:32:30 -04:00
Linus Torvalds	5fa6a683c0	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Pull networking fixes from David Miller: "It looks like a sizeble collection but this is nearly 3 weeks of bug fixing while you were away. 1) Fix crashes over IPSEC tunnels with NAT, the latter can reroute the packet through a non-IPSEC protected path and the code has to be able to handle SKBs attached to routes lacking an attached xfrm state. From Steffen Klassert. 2) Fix OOPSs in ipv4 and ipv6 ipsec layers for unsupported sub-protocols, also from Steffen Klassert. 3) Set local_df on fragmented netfilter skbs otherwise we won't be able to forward successfully, from Florian Westphal. 4) cdc_mbim ipv6 neighbour code does __vlan_find_dev_deep without holding RCU lock, from Bjorn Mork. 5) local_df test in ip_may_fragment is inverted, from Florian Westphal. 6) jme driver doesn't check for DMA mapping failures, from Neil Horman. 7) qlogic driver doesn't calculate number of TX queues properly, from Shahed Shaikh. 8) fib_info_cnt can drift irreversibly positive if we fail to allocate the fi->fib_metrics array, from Sergey Popovich. 9) Fix use after free in ip6_route_me_harder(), also from Sergey Popovich. 10) When SYSCTL is disabled, we don't handle local_port_range and ping_group_range defaults properly at all, from Cong Wang. 11) Unaccelerated VLAN tagged frames improperly handled by cdc_mbim driver, fix from Bjorn Mork. 12) cassini driver needs nested lock annotations for TX locking, from Emil Goode. 13) On init error ipv6 VTI driver can unregister pernet ops twice, oops. Fix from Mahtias Krause. 14) If macvlan device is down, don't propagate IFF_ALLMULTI changes, from Peter Christensen. 15) Missing NULL pointer check while parsing netlink config options in ip6_tnl_validate(). From Susant Sahani. 16) Fix handling of neighbour entries during ipv6 router reachability probing, from Duan Jiong. 17) x86 and s390 JIT address randomization has some address calculation bugs leading to crashes, from Alexei Starovoitov and Heiko Carstens. 18) Clear up those uglies with nop patching and net_get_random_once(), from Hannes Frederic Sowa. 19) Option length miscalculated in ip6_append_data(), fix also from Hannes Frederic Sowa. 20) A while ago we fixed a race during device unregistry when a namespace went down, turns out there is a second place that needs similar protection. From Cong Wang. 21) In the new Altera TSE driver multicast filtering isn't working, disable it and just use promisc mode until the cause is found. From Vince Bridgers. 22) When we disable router enabling in ipv6 we have to flush the cached routes explicitly, from Duan Jiong. 23) NBMA tunnels should not cache routes on the tunnel object because the key is variable, from Timo Teräs. 24) With stacked devices GRO information in skb->cb[] can be not setup properly, make sure it is in all code paths. From Eric Dumazet. 25) Really fix stacked vlan locking, multiple levels of nesting with intervening non-vlan devices are possible. From Vlad Yasevich. 26) Fallback ipip tunnel device's mtu is not setup properly, from Steffen Klassert. 27) The packet scheduler's tcindex filter can crash because we structure copy objects with list_head's inside, oops. From Cong Wang. 28) Fix CHECKSUM_COMPLETE handling for ipv6 GRE tunnels, from Eric Dumazet. 29) In some configurations 'itag' in __mkroute_input() can end up being used uninitialized because of how fib_validate_source() works. Fix it by explitly initializing itag to zero like all the other fib_validate_source() callers do, from Li RongQing" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (116 commits) batman: fix a bogus warning from batadv_is_on_batman_iface() ipv4: initialise the itag variable in __mkroute_input bonding: Send ALB learning packets using the right source bonding: Don't assume 802.1Q when sending alb learning packets. net: doc: Update references to skb->rxhash stmmac: Remove unbalanced clk_disable call ipv6: gro: fix CHECKSUM_COMPLETE support net_sched: fix an oops in tcindex filter can: peak_pci: prevent use after free at netdev removal ip_tunnel: Initialize the fallback device properly vlan: Fix build error wth vlan_get_encap_level() can: c_can: remove obsolete STRICT_FRAME_ORDERING Kconfig option MAINTAINERS: Pravin Shelar is Open vSwitch maintainer. bnx2x: Convert return 0 to return rc bonding: Fix alb mode to only use first level vlans. bonding: Fix stacked device detection in arp monitoring macvlan: Fix lockdep warnings with stacked macvlan devices vlan: Fix lockdep warning with stacked vlan devices. net: Allow for more then a single subclass for netif_addr_lock net: Find the nesting level of a given device by type. ...	2014-05-23 15:29:43 -07:00
Daniel Borkmann	b1fcd35cf5	net: filter: let unattached filters use sock_fprog_kern The sk_unattached_filter_create() API is used by BPF filters that are not directly attached or related to sockets, and are used in team, ptp, xt_bpf, cls_bpf, etc. As such all users do their own internal managment of obtaining filter blocks and thus already have them in kernel memory and set up before calling into sk_unattached_filter_create(). As a result, due to __user annotation in sock_fprog, sparse triggers false positives (incorrect type in assignment [different address space]) when filters are set up before passing them to sk_unattached_filter_create(). Therefore, let sk_unattached_filter_create() API use sock_fprog_kern to overcome this issue. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-23 16:48:05 -04:00
Daniel Borkmann	8556ce79d5	net: filter: remove DL macro Lets get rid of this macro. After commit `5bcfedf06f` ("net: filter: simplify label names from jump-table"), labels have become more readable due to omission of BPF_ prefix but at the same time more generic, so that things like `git grep -n` would not find them. As a middle path, lets get rid of the DL macro as it's not strictly needed and would otherwise just hide the full name. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-23 16:48:05 -04:00
Tom Herbert	6b649feafe	l2tp: Add support for zero IPv6 checksums Added new L2TP configuration options to allow TX and RX of zero checksums in IPv6. Default is not to use them. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-23 16:28:53 -04:00
Tom Herbert	1c19448c9b	net: Make enabling of zero UDP6 csums more restrictive RFC 6935 permits zero checksums to be used in IPv6 however this is recommended only for certain tunnel protocols, it does not make checksums completely optional like they are in IPv4. This patch restricts the use of IPv6 zero checksums that was previously intoduced. no_check6_tx and no_check6_rx have been added to control the use of checksums in UDP6 RX and TX path. The normal sk_no_check_{rx,tx} settings are not used (this avoids ambiguity when dealing with a dual stack socket). A helper function has been added (udp_set_no_check6) which can be called by tunnel impelmentations to all zero checksums (send on the socket, and accept them as valid). Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-23 16:28:53 -04:00
Tom Herbert	28448b8045	net: Split sk_no_check into sk_no_check_{rx,tx} Define separate fields in the sock structure for configuring disabling checksums in both TX and RX-- sk_no_check_tx and sk_no_check_rx. The SO_NO_CHECK socket option only affects sk_no_check_tx. Also, removed UDP_CSUM_* defines since they are no longer necessary. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-23 16:28:53 -04:00
Tom Herbert	b26ba202e0	net: Eliminate no_check from protosw It doesn't seem like an protocols are setting anything other than the default, and allowing to arbitrarily disable checksums for a whole protocol seems dangerous. This can be done on a per socket basis. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-23 16:28:53 -04:00
Tom Herbert	0f8066bd48	sunrpc: Remove sk_no_check setting Setting sk_no_check to UDP_CSUM_NORCV seems to have no effect. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-23 16:28:53 -04:00
Sucheta Chakraborty	ed616689a3	net-next:v4: Add support to configure SR-IOV VF minimum and maximum Tx rate through ip tool. o min_tx_rate puts lower limit on the VF bandwidth. VF is guaranteed to have a bandwidth of at least this value. max_tx_rate puts cap on the VF bandwidth. VF can have a bandwidth of up to this value. o A new handler set_vf_rate for attr IFLA_VF_RATE has been introduced which takes 4 arguments: netdev, VF number, min_tx_rate, max_tx_rate o ndo_set_vf_rate replaces ndo_set_vf_tx_rate handler. o Drivers that currently implement ndo_set_vf_tx_rate should now call ndo_set_vf_rate instead and reject attempt to set a minimum bandwidth greater than 0 for IFLA_VF_TX_RATE when IFLA_VF_RATE is not yet implemented by driver. o If user enters only one of either min_tx_rate or max_tx_rate, then, userland should read back the other value from driver and set both for IFLA_VF_RATE. Drivers that have not yet implemented IFLA_VF_RATE should always return min_tx_rate as 0 when read from ip tool. o If both IFLA_VF_TX_RATE and IFLA_VF_RATE options are specified, then IFLA_VF_RATE should override. o Idea is to have consistent display of rate values to user. o Usage example: - ./ip link set p4p1 vf 0 rate 900 ./ip link show p4p1 32: p4p1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT qlen 1000 link/ether 00:0e:1e:08:b0:f0 brd ff:ff:ff:ff:ff:ff vf 0 MAC 3e:a0:ca:bd:ae:5a, tx rate 900 (Mbps), max_tx_rate 900Mbps vf 1 MAC f6:c6:7c:3f:3d:6c vf 2 MAC 56:32:43:98:d7:71 vf 3 MAC d6:be:c3:b5:85:ff vf 4 MAC ee:a9:9a:1e:19:14 vf 5 MAC 4a:d0:4c:07:52:18 vf 6 MAC 3a:76:44:93:62:f9 vf 7 MAC 82:e9:e7:e3:15:1a ./ip link set p4p1 vf 0 max_tx_rate 300 min_tx_rate 200 ./ip link show p4p1 32: p4p1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT qlen 1000 link/ether 00:0e:1e:08:b0:f0 brd ff:ff:ff:ff:ff:ff vf 0 MAC 3e:a0:ca:bd:ae:5a, tx rate 300 (Mbps), max_tx_rate 300Mbps, min_tx_rate 200Mbps vf 1 MAC f6:c6:7c:3f:3d:6c vf 2 MAC 56:32:43:98:d7:71 vf 3 MAC d6:be:c3:b5:85:ff vf 4 MAC ee:a9:9a:1e:19:14 vf 5 MAC 4a:d0:4c:07:52:18 vf 6 MAC 3a:76:44:93:62:f9 vf 7 MAC 82:e9:e7:e3:15:1a ./ip link set p4p1 vf 0 max_tx_rate 600 rate 300 ./ip link show p4p1 32: p4p1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT qlen 1000 link/ether 00:0e:1e:08:b0:f brd ff:ff:ff:ff:ff:ff vf 0 MAC 3e:a0:ca:bd:ae:5, tx rate 600 (Mbps), max_tx_rate 600Mbps, min_tx_rate 200Mbps vf 1 MAC f6:c6:7c:3f:3d:6c vf 2 MAC 56:32:43:98:d7:71 vf 3 MAC d6:be:c3:b5:85:ff vf 4 MAC ee:a9:9a:1e:19:14 vf 5 MAC 4a:d0:4c:07:52:18 vf 6 MAC 3a:76:44:93:62:f9 vf 7 MAC 82:e9:e7:e3:15:1a Signed-off-by: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-23 15:04:02 -04:00
Johan Hedberg	d7b2545023	Bluetooth: Clearly distinguish mgmt LTK type from authenticated property On the mgmt level we have a key type parameter which currently accepts two possible values: 0x00 for unauthenticated and 0x01 for authenticated. However, in the internal struct smp_ltk representation we have an explicit "authenticated" boolean value. To make this distinction clear, add defines for the possible mgmt values and do conversion to and from the internal authenticated value. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-05-23 11:24:04 -07:00
John W. Linville	5ca2504ea3	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem	2014-05-23 10:55:58 -04:00
Pravin B Shelar	0c200ef94c	openvswitch: Simplify genetlink code. Following patch get rid of struct genl_family_and_ops which is redundant due to changes to struct genl_family. Signed-off-by: Kyle Mestery <mestery@noironetworks.com> Acked-by: Kyle Mestery <mestery@noironetworks.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com>	2014-05-22 16:27:37 -07:00
Jarno Rajahalme	893f139b9a	openvswitch: Minimize ovs_flow_cmd_new\|set critical sections. Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com>	2014-05-22 16:27:36 -07:00
Jarno Rajahalme	37bdc87ba0	openvswitch: Split ovs_flow_cmd_new_or_set(). Following patch will be easier to reason about with separate ovs_flow_cmd_new() and ovs_flow_cmd_set() functions. Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com>	2014-05-22 16:27:36 -07:00
Jarno Rajahalme	aed067783e	openvswitch: Minimize ovs_flow_cmd_del critical section. ovs_flow_cmd_del() now allocates reply (if needed) after the flow has already been removed from the flow table. If the reply allocation fails, a netlink error is signaled with netlink_set_err(), as is already done in ovs_flow_cmd_new_or_set() in the similar situation. Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com>	2014-05-22 16:27:36 -07:00
Jarno Rajahalme	0e9796b4af	openvswitch: Reduce locking requirements. Reduce and clarify locking requirements for ovs_flow_cmd_alloc_info(), ovs_flow_cmd_fill_info() and ovs_flow_cmd_build_info(). A datapath pointer is available only when holding a lock. Change ovs_flow_cmd_fill_info() and ovs_flow_cmd_build_info() to take a dp_ifindex directly, rather than a datapath pointer that is then (only) used to get the dp_ifindex. This is useful, since the dp_ifindex is available even when the datapath pointer is not, both before and after taking a lock, which makes further critical section reduction possible. Make ovs_flow_cmd_alloc_info() take an 'acts' argument instead a 'flow' pointer. This allows some future patches to do the allocation before acquiring the flow pointer. The locking requirements after this patch are: ovs_flow_cmd_alloc_info(): May be called without locking, must not be called while holding the RCU read lock (due to memory allocation). If 'acts' belong to a flow in the flow table, however, then the caller must hold ovs_mutex. ovs_flow_cmd_fill_info(): Either ovs_mutex or RCU read lock must be held. ovs_flow_cmd_build_info(): This calls both of the above, so the caller must hold ovs_mutex. Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com>	2014-05-22 16:27:36 -07:00
Jarno Rajahalme	86ec8dbae2	openvswitch: Fix ovs_flow_stats_get/clear RCU dereference. For ovs_flow_stats_get() using ovsl_dereference() was wrong, since flow dumps call this with RCU read lock. ovs_flow_stats_clear() is always called with ovs_mutex, so can use ovsl_dereference(). Also, make the ovs_flow_stats_get() 'flow' argument const to make later patches cleaner. Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com>	2014-05-22 16:27:35 -07:00
Jarno Rajahalme	eb07265904	openvswitch: Fix typo. Incorrect struct name was confusing, even though otherwise inconsequental. Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com>	2014-05-22 16:27:35 -07:00
Jarno Rajahalme	6093ae9aba	openvswitch: Minimize dp and vport critical sections. Move most memory allocations away from the ovs_mutex critical sections. vport allocations still happen while the lock is taken, as changing that would require major refactoring. Also, vports are created very rarely so it should not matter. Change ovs_dp_cmd_get() now only takes the rcu_read_lock(), rather than ovs_lock(), as nothing need to be changed. This was done by ovs_vport_cmd_get() already. Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com>	2014-05-22 16:27:35 -07:00
Jarno Rajahalme	56c19868e1	openvswitch: Make flow mask removal symmetric. Masks are inserted when flows are inserted to the table, so it is logical to correspondingly remove masks when flows are removed from the table, in ovs_flow_table_remove(). This allows ovs_flow_free() to be called without locking, which will be used by later patches. Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com>	2014-05-22 16:27:35 -07:00
Jarno Rajahalme	fb5d1e9e12	openvswitch: Build flow cmd netlink reply only if needed. Use netlink_has_listeners() and NLM_F_ECHO flag to determine if a reply is needed or not for OVS_FLOW_CMD_NEW, OVS_FLOW_CMD_SET, or OVS_FLOW_CMD_DEL. Currently, OVS userspace does not request a reply for OVS_FLOW_CMD_NEW, but usually does for OVS_FLOW_CMD_DEL, as stats may have changed. Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com>	2014-05-22 16:27:34 -07:00
Jarno Rajahalme	bb6f9a708d	openvswitch: Clarify locking. Remove unnecessary locking from functions that are always called with appropriate locking. Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com> Signed-off-by: Thomas Graf <tgraf@redhat.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com>	2014-05-22 16:27:34 -07:00
Jarno Rajahalme	be52c9e96a	openvswitch: Avoid assigning a NULL pointer to flow actions. Flow SET can accept an empty set of actions, with the intended semantics of leaving existing actions unmodified. This seems to have been brokin after OVS 1.7, as we have assigned the flow's actions pointer to NULL in this case, but we never check for the NULL pointer later on. This patch restores the intended behavior and documents it in the include/linux/openvswitch.h. Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com>	2014-05-22 16:27:34 -07:00
Jarno Rajahalme	1139e241ec	openvswitch: Compact sw_flow_key. Minimize padding in sw_flow_key and move 'tp' top the main struct. These changes simplify code when accessing the transport port numbers and the tcp flags, and makes the sw_flow_key 8 bytes smaller on 64-bit systems (128->120 bytes). These changes also make the keys for IPv4 packets to fit in one cache line. There is a valid concern for safety of packing the struct ovs_key_ipv4_tunnel, as it would be possible to take the address of the tun_id member as a __be64 * which could result in unaligned access in some systems. However: - sw_flow_key itself is 64-bit aligned, so the tun_id within is always 64-bit aligned. - We never make arrays of ovs_key_ipv4_tunnel (which would force every second tun_key to be misaligned). - We never take the address of the tun_id in to a __be64 *. - Whereever we use struct ovs_key_ipv4_tunnel outside the sw_flow_key, it is in stack (on tunnel input functions), where compiler has full control of the alignment. Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com>	2014-05-22 16:27:34 -07:00
Cong Wang	b6ed549860	batman: fix a bogus warning from batadv_is_on_batman_iface() batman tries to search dev->iflink to check if it's a batman interface, but ->iflink could be 0, which is not a valid ifindex. It should just avoid iflink == 0 case. Reported-by: Jet Chen <jet.chen@intel.com> Tested-by: Jet Chen <jet.chen@intel.com> Cc: David S. Miller <davem@davemloft.net> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Antonio Quartulli <antonio@open-mesh.com> Cc: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Cong Wang <cwang@twopensource.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-22 17:23:00 -04:00
David S. Miller	65db611a5c	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== pull request (net-next): ipsec-next 2014-05-22 This is the last ipsec pull request before I leave for a three weeks vacation tomorrow. David, can you please take urgent ipsec patches directly into net/net-next during this time? I'll continue to run the ipsec/ipsec-next trees as soon as I'm back. 1) Simplify the xfrm audit handling, from Tetsuo Handa. 2) Codingstyle cleanup for xfrm_output, from abian Frederick. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-22 16:00:00 -04:00
NeilBrown	ef11ce2487	SUNRPC: track whether a request is coming from a loop-back interface. If an incoming NFS request is coming from the local host, then nfsd will need to perform some special handling. So detect that possibility and make the source visible in rq_local. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-05-22 15:59:18 -04:00
Li RongQing	fbdc0ad095	ipv4: initialise the itag variable in __mkroute_input the value of itag is a random value from stack, and may not be initiated by fib_validate_source, which called fib_combine_itag if CONFIG_IP_ROUTE_CLASSID is not set This will make the cached dst uncertainty Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-22 15:57:36 -04:00
Trond Myklebust	c789102c20	SUNRPC: Fix a module reference leak in svc_handle_xprt If the accept() call fails, we need to put the module reference. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-05-22 15:57:22 -04:00
Chuck Lever	16e4d93f6d	NFSD: Ignore client's source port on RDMA transports An NFS/RDMA client's source port is meaningless for RDMA transports. The transport layer typically sets the source port value on the connection to a random ephemeral port. Currently, NFS server administrators must specify the "insecure" export option to enable clients to access exports via RDMA. But this means NFS clients can access such an export via IP using an ephemeral port, which may not be desirable. This patch eliminates the need to specify the "insecure" export option to allow NFS/RDMA clients access to an export. BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=250 Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-05-22 15:55:48 -04:00
Dan Carpenter	b3f7a7b48f	ieee802154: missing put_dev() on error We should call put_dev() on the error path here. Fixes: `3e9c156e2c` ('ieee802154: add netlink interfaces for llsec') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-22 15:54:45 -04:00
Cong Wang	b1282726d5	bridge: make br_device_notifier static Merge net/bridge/br_notify.c into net/bridge/br.c, since it has only br_device_event() and br.c is small. Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-22 15:33:47 -04:00
Chen Gang	5c4a43b024	net/dccp/timer.c: use 'u64' instead of 's64' to avoid compiler's warning 'dccp_timestamp_seed' is initialized once by ktime_get_real() in dccp_timestamping_init(). It is always less than ktime_get_real() in dccp_timestamp(). Then, ktime_us_delta() in dccp_timestamp() will always return positive number. So can use manual type cast to let compiler and do_div() know about it to avoid warning. The related warning (with allmodconfig under unicore32): CC [M] net/dccp/timer.o net/dccp/timer.c: In function ‘dccp_timestamp’: net/dccp/timer.c:285: warning: comparison of distinct pointer types lacks a cast Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-22 15:31:45 -04:00
Phoebe Buckheister	53819a6ced	mac802154: llsec: correctly lookup implicit-indexed keys Key id comparison for type 1 keys (implicit source, with index) should return true if mode and id are equal, not false. Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-22 15:27:32 -04:00
Phoebe Buckheister	62e9c117ee	mac802154: llsec: fold useless return value check llsec_do_encrypt will never return a positive value, so the restriction to 0-or-negative on return is useless. Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-22 15:24:13 -04:00
Phoebe Buckheister	6f3eabcd04	mac802154: llsec: fix incorrect lock pairing In encrypt, sec->lock is taken with read_lock_bh, so in the error path, we must read_unlock_bh. Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-22 15:24:13 -04:00
Michal Kubeček	da08143b85	vlan: more careful checksum features handling When combining real_dev's features and vlan_features, simple bitwise AND is used. This doesn't work well for checksum offloading features as if one set has NETIF_F_HW_CSUM and the other NETIF_F_IP_CSUM and/or NETIF_F_IPV6_CSUM, we end up with no checksum offloading. However, from the logical point of view (how can_checksum_protocol() works), NETIF_F_HW_CSUM contains the functionality of NETIF_F_IP_CSUM and NETIF_F_IPV6_CSUM so that the result should be IP/IPV6. Add helper function netdev_intersect_features() implementing this logic and use it in vlan_dev_fix_features(). Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-22 15:07:23 -04:00
Ezequiel Garcia	e876f208af	net: Add a software TSO helper API Although the implementation probably needs a lot of work, this initial API allows to implement software TSO in mvneta and mv643xx_eth drivers in a not so intrusive way. Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-22 14:57:15 -04:00
John W. Linville	40a10fd740	Merge git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next	2014-05-22 13:58:36 -04:00
John W. Linville	99abe65ff1	NFC: 3.16: First pull request This is the NFC pull request for 3.16. We have: - STMicroeectronics st21nfca support. The st21nfca is an HCI chipset and thus relies on the HCI stack. This submission provides support for tag redaer/writer mode (including Type 5) and device tree bindings. - PM runtime support and a bunch of bug fixes for TI's trf7970a. - Device tree support for NXP's pn544. Legacy platform data support is obviously kept intact. - NFC Tag type 4B support to the NFC Digital stack. - SOCK_RAW type support to the raw NFC socket, and allow NCI sniffing from that. This can be extended to report HCI frames and also proprietarry ones like e.g. the pn533 ones. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJTepRlAAoJEIqAPN1PVmxKnF0P/RvfrZs6CbGNJC+dkEbk90p1 nsngy4+4MmPwJYVzObnLz4Br0k1kmFKiOKske6drjMpgzDWeuQelw3B7bd3FYfxD YkQsc5RC984xrDoDH5pn8mA6VJqmn7whrmcibTYAixrDqTvo8gw6uja4ryAnSdZm n7cRbh/A5F/sa7O4mPA0bCTdp4jAS/vOP9rGFDOth/b5yJVs99XmC+AZp/Ad9BUx +/osWGmBV5jshtX7aPTSxIQB4BUaP/lP1DW8yF5whKDjsHC9QyJcAtw9HfZ4tv2h YNteZZ8yjM+rSjnDw/LvDc2Gp8Z8P1GYf8D3QN3cWhw1ZvXi7CnqKjEnm41sbfaH L5esIfsRBUdmk6Ika7zALqmOQFI3PzH+ag96punl29qb2gyBDRSnXKVLirv3xxFG h7vYtQL43Rosn/4pSilRbYReRwyKbSCxW3un/tUJy0Faafs6q+9oMC2aWbIfTT6l 40n4H9EmzYy2OaaXSFckiIIYYgVDAji8GLXTf+dPHb+NrH3QQOR3m27WzHc4rmYk kUrv0lKoFswA+VLlIcJTrSKNF21FDjwuImzIWiPz6Fx/+rWJ0b4GlQyIynD72LpR 2LkUhTrxuRuRtxVCtvTdkPlL6Bdp3HO7t4qZ0EirgnpmGK6NScBgABoqFJSbz9uS UUvZbHVIjLrDU9zzoyz8 =cSl+ -----END PGP SIGNATURE----- Merge tag 'nfc-next-3.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/nfc-next Samuel Ortiz <sameo@linux.intel.com> says: "NFC: 3.16: First pull request This is the NFC pull request for 3.16. We have: - STMicroeectronics st21nfca support. The st21nfca is an HCI chipset and thus relies on the HCI stack. This submission provides support for tag redaer/writer mode (including Type 5) and device tree bindings. - PM runtime support and a bunch of bug fixes for TI's trf7970a. - Device tree support for NXP's pn544. Legacy platform data support is obviously kept intact. - NFC Tag type 4B support to the NFC Digital stack. - SOCK_RAW type support to the raw NFC socket, and allow NCI sniffing from that. This can be extended to report HCI frames and also proprietarry ones like e.g. the pn533 ones." Signed-off-by: John W. Linville <linville@tuxdriver.com>	2014-05-22 13:56:46 -04:00
David S. Miller	8af750d739	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nftables Pablo Neira Ayuso says: ==================== Netfilter/nftables updates for net-next The following patchset contains Netfilter/nftables updates for net-next, most relevantly they are: 1) Add set element update notification via netlink, from Arturo Borrero. 2) Put all object updates in one single message batch that is sent to kernel-space. Before this patch only rules where included in the batch. This series also introduces the generic transaction infrastructure so updates to all objects (tables, chains, rules and sets) are applied in an all-or-nothing fashion, these series from me. 3) Defer release of objects via call_rcu to reduce the time required to commit changes. The assumption is that all objects are destroyed in reverse order to ensure that dependencies betweem them are fulfilled (ie. rules and sets are destroyed first, then chains, and finally tables). 4) Allow to match by bridge port name, from Tomasz Bursztyka. This series include two patches to prepare this new feature. 5) Implement the proper set selection based on the characteristics of the data. The new infrastructure also allows you to specify your preferences in terms of memory and computational complexity so the underlying set type is also selected according to your needs, from Patrick McHardy. 6) Several cleanup patches for nft expressions, including one minor possible compilation breakage due to missing mark support, also from Patrick. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-22 12:06:23 -04:00
Neal Cardwell	ca8a226343	tcp: make cwnd-limited checks measurement-based, and gentler Experience with the recent `e114a710aa` ("tcp: fix cwnd limited checking to improve congestion control") has shown that there are common cases where that commit can cause cwnd to be much larger than necessary. This leads to TSO autosizing cooking skbs that are too large, among other things. The main problems seemed to be: (1) That commit attempted to predict the future behavior of the connection by looking at the write queue (if TSO or TSQ limit sending). That prediction sometimes overestimated future outstanding packets. (2) That commit always allowed cwnd to grow to twice the number of outstanding packets (even in congestion avoidance, where this is not needed). This commit improves both of these, by: (1) Switching to a measurement-based approach where we explicitly track the largest number of packets in flight during the past window ("max_packets_out"), and remember whether we were cwnd-limited at the moment we finished sending that flight. (2) Only allowing cwnd to grow to twice the number of outstanding packets ("max_packets_out") in slow start. In congestion avoidance mode we now only allow cwnd to grow if it was fully utilized. Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-22 12:04:49 -04:00
Emmanuel Grumbach	67af981153	cfg80211: allow RSSI compensation Channels in 2.4GHz band overlap, this means that if we send a probe request on channel 1 and then move to channel 2, we will hear the probe response on channel 2. In this case, the RSSI will be lower than if we had heard it on the channel on which it was sent (1 in this case). The firmware / low level driver can parse the channel in the DS IE or HT IE and compensate the RSSI so that it will still have a valid value even if we heard the frame on an adjacent channel. This can be done up to a certain offset. Add this offset as a configuration for the low level driver. A low level driver that can compensate the low RSSI in this case should assign the maximal offset for which the RSSI value is still valid. Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-05-22 09:58:49 +02:00
Eric Dumazet	4de462ab63	ipv6: gro: fix CHECKSUM_COMPLETE support When GRE support was added in linux-3.14, CHECKSUM_COMPLETE handling broke on GRE+IPv6 because we did not update/use the appropriate csum : GRO layer is supposed to use/update NAPI_GRO_CB(skb)->csum instead of skb->csum Tested using a GRE tunnel and IPv6 traffic. GRO aggregation now happens at the first level (ethernet device) instead of being done in gre tunnel. Native IPv6+TCP is still properly aggregated. Fixes: `bf5a755f5e` ("net-gre-gro: Add GRE support to the GRO stack") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jerry Chu <hkchu@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-21 17:18:47 -04:00
Alexei Starovoitov	5fe821a9de	net: filter: cleanup invocation of internal BPF Kernel API for classic BPF socket filters is: sk_unattached_filter_create() - validate classic BPF, convert, JIT SK_RUN_FILTER() - run it sk_unattached_filter_destroy() - destroy socket filter Cleanup internal BPF kernel API as following: sk_filter_select_runtime() - final step of internal BPF creation. Try to JIT internal BPF program, if JIT is not available select interpreter SK_RUN_FILTER() - run it sk_filter_free() - free internal BPF program Disallow direct calls to BPF interpreter. Execution of the BPF program should be done with SK_RUN_FILTER() macro. Example of internal BPF create, run, destroy: struct sk_filter fp; fp = kzalloc(sk_filter_size(prog_len), GFP_KERNEL); memcpy(fp->insni, prog, prog_len sizeof(fp->insni[0])); fp->len = prog_len; sk_filter_select_runtime(fp); SK_RUN_FILTER(fp, ctx); sk_filter_free(fp); Sockets, seccomp, testsuite, tracing are using different ways to populate sk_filter, so first steps of program creation are not common. Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-21 17:07:17 -04:00
Cong Wang	bf63ac73b3	net_sched: fix an oops in tcindex filter Kelly reported the following crash: IP: [<ffffffff817a993d>] tcf_action_exec+0x46/0x90 PGD 3009067 PUD `300c067` PMD 11ff30067 PTE 800000011634b060 Oops: 0000 [#1] SMP DEBUG_PAGEALLOC CPU: 1 PID: 639 Comm: dhclient Not tainted 3.15.0-rc4+ #342 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 task: ffff8801169ecd00 ti: ffff8800d21b8000 task.ti: ffff8800d21b8000 RIP: 0010:[<ffffffff817a993d>] [<ffffffff817a993d>] tcf_action_exec+0x46/0x90 RSP: 0018:ffff8800d21b9b90 EFLAGS: 00010283 RAX: 00000000ffffffff RBX: ffff88011634b8e8 RCX: ffff8800cf7133d8 RDX: ffff88011634b900 RSI: ffff8800cf7133e0 RDI: ffff8800d210f840 RBP: ffff8800d21b9bb0 R08: ffffffff8287bf60 R09: 0000000000000001 R10: ffff8800d2b22b24 R11: 0000000000000001 R12: ffff8800d210f840 R13: ffff8800d21b9c50 R14: ffff8800cf7133e0 R15: ffff8800cad433d8 FS: 00007f49723e1840(0000) GS:ffff88011a800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffff88011634b8f0 CR3: 00000000ce469000 CR4: 00000000000006e0 Stack: ffff8800d2170188 ffff8800d210f840 ffff8800d2171b90 0000000000000000 ffff8800d21b9be8 ffffffff817c55bb ffff8800d21b9c50 ffff8800d2171b90 ffff8800d210f840 ffff8800d21b0300 ffff8800d21b9c50 ffff8800d21b9c18 Call Trace: [<ffffffff817c55bb>] tcindex_classify+0x88/0x9b [<ffffffff817a7f7d>] tc_classify_compat+0x3e/0x7b [<ffffffff817a7fdf>] tc_classify+0x25/0x9f [<ffffffff817b0e68>] htb_enqueue+0x55/0x27a [<ffffffff817b6c2e>] dsmark_enqueue+0x165/0x1a4 [<ffffffff81775642>] __dev_queue_xmit+0x35e/0x536 [<ffffffff8177582a>] dev_queue_xmit+0x10/0x12 [<ffffffff818f8ecd>] packet_sendmsg+0xb26/0xb9a [<ffffffff810b1507>] ? __lock_acquire+0x3ae/0xdf3 [<ffffffff8175cf08>] __sock_sendmsg_nosec+0x25/0x27 [<ffffffff8175d916>] sock_aio_write+0xd0/0xe7 [<ffffffff8117d6b8>] do_sync_write+0x59/0x78 [<ffffffff8117d84d>] vfs_write+0xb5/0x10a [<ffffffff8117d96a>] SyS_write+0x49/0x7f [<ffffffff8198e212>] system_call_fastpath+0x16/0x1b This is because we memcpy struct tcindex_filter_result which contains struct tcf_exts, obviously struct list_head can not be simply copied. This is a regression introduced by commit `33be627159` (net_sched: act: use standard struct list_head). It's not very easy to fix it as the code is a mess: if (old_r) memcpy(&cr, r, sizeof(cr)); else { memset(&cr, 0, sizeof(cr)); tcf_exts_init(&cr.exts, TCA_TCINDEX_ACT, TCA_TCINDEX_POLICE); } ... tcf_exts_change(tp, &cr.exts, &e); ... memcpy(r, &cr, sizeof(cr)); the above code should equal to: tcindex_filter_result_init(&cr); if (old_r) cr.res = r->res; ... if (old_r) tcf_exts_change(tp, &r->exts, &e); else tcf_exts_change(tp, &cr.exts, &e); ... r->res = cr.res; after this change, since there is no need to copy struct tcf_exts. And it also fixes other places zero'ing struct's contains struct tcf_exts. Fixes: commit `33be627159` (net_sched: act: use standard struct list_head) Reported-by: Kelly Anderson <kelly@xilka.com> Tested-by: Kelly Anderson <kelly@xilka.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-21 16:47:13 -04:00
Li RongQing	1495664355	ipv6: slight optimization in ip6_dst_gc entries is always greater than rt_max_size here, since if entries is less than rt_max_size, the fib6_run_gc function will be skipped Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-21 15:52:23 -04:00
Tom Gundersen	f98f89a010	net: tunnels - enable module autoloading Enable the module alias hookup to allow tunnel modules to be autoloaded on demand. This is in line with how most other netdev kinds work, and will allow userspace to create tunnels without having CAP_SYS_MODULE. Signed-off-by: Tom Gundersen <teg@jklm.no> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-21 15:46:52 -04:00
Arik Nemtsov	4d3df547e8	cfg80211: don't set reg timeout for user-handled hint Otherwise every "indoor" setting by usermode will cause a regdomain reset. Acked-by: Luis R. Rodriguez <mcgrof@suse.com> Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-05-21 09:15:18 +02:00
Antonio Quartulli	7406353d43	cfg80211: implement cfg80211_get_station cfg80211 API Implement and export the new cfg80211_get_station() API. This utility can be used by other kernel modules to obtain detailed information about a given wireless station. It will be in particular useful to batman-adv which will implement a wireless rate based metric. Signed-off-by: Antonio Quartulli <antonio@open-mesh.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-05-21 09:15:17 +02:00
Antonio Quartulli	cca674d47e	mac80211: export the expected throughput Add get_expected_throughput() API to mac80211 so that each driver can implement its own version based on the RC algorithm they are using (might be using an HW RC algo). The API returns a value expressed in Kbps. Also, add the new get_expected_throughput() member to the rate_control_ops structure in order to be able to query the RC algorithm (this patch provides an implementation of this API for both minstrel and minstrel_ht). The related member in the station_info object is now filled accordingly when dumping a station. Cc: Felix Fietkau <nbd@openwrt.org> Signed-off-by: Antonio Quartulli <antonio@open-mesh.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-05-21 09:15:16 +02:00
Steffen Klassert	78ff4be45a	ip_tunnel: Initialize the fallback device properly We need to initialize the fallback device to have a correct mtu set on this device. Otherwise the mtu is set to null and the device is unusable. Fixes: `fd58156e45` ("IPIP: Use ip-tunneling code.") Cc: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-21 02:08:32 -04:00
David S. Miller	d050de607f	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf Pablo Neira Ayuso says: ==================== Netfilter/nftables fixes for net The following patchset contains nftables fixes for your net tree, they are: 1) Fix crash when using the goto action in a rule by making sure that we always fall back on the base chain. Otherwise, this may try to access the counter memory area of non-base chains, which does not exists. 2) Fix several aspects of the rule tracing that are currently broken: * Reset rule number counter after goto/jump action, otherwise the tracing reports a bogus rule number. * Fix tracing of the goto action. * Fix bogus rule number counter after goto. * Fix missing return trace after finishing the walk through the non-base chain. * Fix missing trace when matching non-terminal rule. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-21 01:24:19 -04:00
Johan Hedberg	1cc6114402	Bluetooth: Update smp_confirm to return a response code Now that smp_confirm() is called "inline" we can have it return a response code and have the sending of it be done in the shared place for command handlers. One exception is when we're entering smp.c from mgmt.c when user space responds to authentication, in which case we still need our own code to call smp_failure(). Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-05-20 08:44:14 -07:00
Johan Hedberg	861580a970	Bluetooth: Update smp_random to return a response code Since we're now calling smp_random() "inline" we can have it directly return a response code and have the shared command handler send the response. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-05-20 08:44:14 -07:00
Johan Hedberg	4a74d65868	Bluetooth: Rename smp->smp_flags to smp->flags There's no reason to have "smp" in this variable name since it is already part of the SMP struct which provides sufficient context. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-05-20 08:44:14 -07:00
Johan Hedberg	9dd4dd275f	Bluetooth: Remove unnecessary work structs from SMP code When the SMP code was initially created (mid-2011) parts of the Bluetooth subsystem were still not converted to use workqueues. This meant that the crypto calls, which could sleep, couldn't be called directly. Because of this the "confirm" and "random" work structs were introduced. These days the entire Bluetooth subsystem runs through workqueues which makes these structs unnecessary. This patch removes them and converts the calls to queue them to use direct function calls instead. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-05-20 08:44:13 -07:00
Johan Hedberg	1ef35827a9	Bluetooth: Fix setting initial local auth_req value There is no reason to have the initial local value conditional to whether the remote value has bonding set or not. We can either way start off with the value we received. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-05-20 08:44:12 -07:00
Johan Hedberg	4bc58f51e1	Bluetooth: Make SMP context private to smp.c There are no users of the smp_chan struct outside of smp.c so move it away from smp.h. The addition of the l2cap.h include to hci_core.c, hci_conn.c and mgmt.c is something that should have been there already previously to avoid warnings of undeclared struct l2cap_conn, but the compiler warning was apparently shadowed away by the mention of l2cap_conn in the struct smp_chan definition. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-05-20 08:44:11 -07:00
Antonio Quartulli	867d849fc8	cfg80211: export expected throughput through get_station() Users may need information about the expected throughput towards a given peer. This value is supposed to consider the size overhead generated by the 802.11 header. This value is exported in kbps through the get_station() API by including it into the station_info object. Moreover, it is sent to user space when replying to the nl80211 GET_STATION command. This information will be useful to the batman-adv module which will use it for its new metric computation. Signed-off-by: Antonio Quartulli <antonio@open-mesh.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-05-20 15:13:32 +02:00
Hiren Tandel	0515829642	NFC: NCI: Send all NCI frames to raw sockets So that anyone listening on SOCKPROTO_RAW for raw frames will get all NCI frames, in both directions. This actually implements userspace NFC NCI sniffing. It's now up to userspace to decode those frames. Signed-off-by: Hiren Tandel <hirent@marvell.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2014-05-20 00:23:59 +02:00
Hiren Tandel	57be1f3f3e	NFC: Add RAW socket type support for SOCKPROTO_RAW This allows for a more generic NFC sniffing by using SOCKPROTO_RAW SOCK_RAW to read RAW NFC frames. This is for sniffing anything but LLCP (HCI, NCI, etc...). Signed-off-by: Hiren Tandel <hirent@marvell.com> Signed-off-by: Rahul Tank <rahult@marvell.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2014-05-20 00:06:04 +02:00
Hiren Tandel	c79d9f9ef8	NFC: NCI: No need to reverse ATR_RES Response ATR_RES response received within Activation Parameters is already in correct order. Reversing it fails LLCP magic number check and so P2P functionality fails. Signed-off-by: Hiren Tandel <hirent@marvell.com> Signed-off-by: Rahul Tank <rahult@marvell.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2014-05-19 23:58:08 +02:00
Mark A. Greer	4b8b6267be	NFC: digital: Handle multiple SENSF_REQ frames According to section 5.15.1.3 of the NFC Activity Specification, multiple SENSF_REQ commands can be received by a target before it receives an ATR_REQ command. To handle this, add a routine that checks whether a SENSF_REQ or ATR_REQ has been recieved. If its a SENSF_REQ, respond appropriately and continue waiting for a ATR_REQ. If its an ATR_REQ, handle it as before. CC: Thierry Escande <thierry.escande@linux.intel.com> Signed-off-by: Mark A. Greer <mgreer@animalcreek.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2014-05-19 23:52:40 +02:00
Mark A. Greer	96e829b433	NFC: digital: SENSF_RES excludes RD when SENSF_REQ RC is zero The check in digital_tg_send_sensf_res() that excludes the 'RD' field from the SENSF_RES is inverted. The 'RD' field should be excluded when the SENSF_REQ 'RC' field is equal to DIGITAL_SENSF_REQ_RC_NONE instead of when its not equal. This is described in section 6.6.2.11 of the NFC Digital Specification. CC: Thierry Escande <thierry.escande@linux.intel.com> Signed-off-by: Mark A. Greer <mgreer@animalcreek.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2014-05-19 23:52:37 +02:00
John W. Linville	20b4f9c73f	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth	2014-05-19 16:34:27 -04:00
Johannes Berg	922bd80fc3	cfg80211: constify wowlan/coalesce mask/pattern pointers This requires changing the nl80211 parsing code a bit to use intermediate pointers for the allocation, but clarifies the API towards the drivers. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-05-19 18:06:50 +02:00
Johannes Berg	c1e5f4714d	cfg80211: constify more pointers in the cfg80211 API This also propagates through the drivers. The orinoco driver uses the cfg80211 API structs for internal bookkeeping, and so needs a (void *) cast that removes the const - but that's OK because it allocates those pointers. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-05-19 17:53:16 +02:00
Johannes Berg	3b3a0162fa	cfg80211: constify MAC addresses in cfg80211 ops This propagates through all the drivers and mac80211. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-05-19 17:34:42 +02:00
Johannes Berg	00591cea31	mac80211: minstrel-ht: small clarifications Antonio and I were looking over this code and some things didn't immediately make sense, so we came up with two small clarifications. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-05-19 14:30:37 +02:00
Pablo Neira Ayuso	c7c32e72cb	netfilter: nf_tables: defer all object release via rcu Now that all objects are released in the reverse order via the transaction infrastructure, we can enqueue the release via call_rcu to save one synchronize_rcu. For small rule-sets loaded via nft -f, it now takes around 50ms less here. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-05-19 12:06:13 +02:00
Pablo Neira Ayuso	128ad3322b	netfilter: nf_tables: remove skb and nlh from context structure Instead of caching the original skbuff that contains the netlink messages, this stores the netlink message sequence number, the netlink portID and the report flag. This helps to prepare the introduction of the object release via call_rcu. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-05-19 12:06:13 +02:00
Pablo Neira Ayuso	35151d840c	netfilter: nf_tables: simplify nf_tables__notify Now that all these function are called from the commit path, we can pass the context structure to reduce the amount of parameters in all of the nf_tables__notify functions. This patch also removes unneeded branches to check for skb, nlh and net that should be always set in the context structure. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-05-19 12:06:12 +02:00
Pablo Neira Ayuso	60319eb1ca	netfilter: nf_tables: use new transaction infrastructure to handle elements Leave the set content in consistent state if we fail to load the batch. Use the new generic transaction infrastructure to achieve this. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-05-19 12:06:12 +02:00
Pablo Neira Ayuso	55dd6f9307	netfilter: nf_tables: use new transaction infrastructure to handle table This patch speeds up rule-set updates and it also provides a way to revert updates and leave things in consistent state in case that the batch needs to be aborted. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-05-19 12:06:12 +02:00
Pablo Neira Ayuso	e1aaca93ee	netfilter: nf_tables: pass context to nf_tables_updtable() So nf_tables_uptable() only takes one single parameter. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-05-19 12:06:11 +02:00
Pablo Neira Ayuso	f75edf5e9c	netfilter: nf_tables: disabling table hooks always succeeds nf_tables_table_disable() always succeeds, make this function void. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-05-19 12:06:11 +02:00
Pablo Neira Ayuso	91c7b38dc9	netfilter: nf_tables: use new transaction infrastructure to handle chain This patch speeds up rule-set updates and it also introduces a way to revert chain updates if the batch is aborted. The idea is to store the changes in the transaction to apply that in the commit step. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-05-19 12:06:11 +02:00
Pablo Neira Ayuso	ff3cd7b3c9	netfilter: nf_tables: refactor chain statistic routines Add new routines to encapsulate chain statistics allocation and replacement. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-05-19 12:06:11 +02:00
Pablo Neira Ayuso	958bee14d0	netfilter: nf_tables: use new transaction infrastructure to handle sets This patch reworks the nf_tables API so set updates are included in the same batch that contains rule updates. This speeds up rule-set updates since we skip a dialog of four messages between kernel and user-space (two on each direction), from: 1) create the set and send netlink message to the kernel 2) process the response from the kernel that contains the allocated name. 3) add the set elements and send netlink message to the kernel. 4) process the response from the kernel (to check for errors). To: 1) add the set to the batch. 2) add the set elements to the batch. 3) add the rule that points to the set. 4) send batch to the kernel. This also introduces an internal set ID (NFTA_SET_ID) that is unique in the batch so set elements and rules can refer to new sets. Backward compatibility has been only retained in userspace, this means that new nft versions can talk to the kernel both in the new and the old fashion. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-05-19 12:06:10 +02:00
Pablo Neira Ayuso	b380e5c733	netfilter: nf_tables: add message type to transactions The patch adds message type to the transaction to simplify the commit the and abort routines. Yet another step forward in the generalisation of the transaction infrastructure. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-05-19 12:06:10 +02:00
Pablo Neira Ayuso	37082f930b	netfilter: nf_tables: relocate commit and abort routines in the source file Move the commit and abort routines to the bottom of the source code file. This change is required by the follow up patches that add the set, chain and table transaction support. This patch is just a cleanup to access several functions without having to declare their prototypes. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-05-19 12:06:10 +02:00
Pablo Neira Ayuso	1081d11b08	netfilter: nf_tables: generalise transaction infrastructure This patch generalises the existing rule transaction infrastructure so it can be used to handle set, table and chain object transactions as well. The transaction provides a data area that stores private information depending on the transaction type. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-05-19 12:06:10 +02:00
Pablo Neira Ayuso	7c95f6d866	netfilter: nf_tables: deconstify table and chain in context structure The new transaction infrastructure updates the family, table and chain objects in the context structure, so let's deconstify them. While at it, move the context structure initialization routine to the top of the source file as it will be also used from the table and chain routines. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-05-19 12:06:09 +02:00
Oliver Hartkopp	45c700291a	can: add hash based access to single EFF frame filters In contrast to the direct access to the single SFF frame filters (which are indexed by the SFF CAN ID itself) the single EFF frame filters are arranged in a single linked hlist. To reduce the hlist traversal in the case of many filter subscriptions a hash based access is introduced for single EFF filters. Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2014-05-19 09:38:24 +02:00
Oliver Hartkopp	e3d3917f3d	can: proc: make array printing function indenpendent from sff frames The can_rcvlist_sff_proc_show_one() function which prints the array of filters for the single SFF CAN identifiers is prepared to be used by a second caller. Therefore it is also renamed to properly describe its future functionality. Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2014-05-19 09:38:24 +02:00
David S. Miller	b6052af61a	Included changes: - fix codestyle to respect new checkpatch warnings - increase internal version number -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQIcBAABCAAGBQJTeLMvAAoJEEKTMo6mOh1VKtgP/RuR34USuUbY/xMZ9/Rn2/E7 z1qn6hh8hlw+Hd+Vn+9BvDJzwn+Baneu1c3SMP08kE+pAst0n788y/f/pVzfToJk Gll0sOVHiSm05M0QQ0Vq57H+rxoFv2KACM1t2+NMW+pB+PsSYG5y87b6I+0hR4Pv lbBCNmgIxY2alxM8qab2Zlt+cCUdkKUnI67P0LtVnMh91JuKwsheOdR+Smxz2+2g J+2Bzcz+NIHhJP9c+QmJipV+gtIRjFr7+bebaXDm/eEBq/3f6cEhFtwa76CmCpI/ cAIMDFORCHB27qNMgKSuzFDdhF1qQJnZh8FX0dfRBXvH8NwxBOkjFh1CBJ3iwjm1 T7GBTLTKiv/JqdNjqrWJ9OxChl8I2jppevZdimq1VUjhv9117Jc73TnzazjULTST xr5PpZ1gRfruUVXl362otrtzm0N/hdqez+mYlkZEx/ERTDedLCZZAnjTsx5PPMG+ GXlbc1BWuQZuHpvs8uWMcnXDaWtNyNKKpvfRPuvLIST80F1Bw/KRd2FDH/AiO2tL 2eACn9ughC5XO9E+/iyfWm1MQMEwo/w9+EfWpnRWV9HtDuHepVGy59x3mCYH/bN0 7FP23lbaFw05i/UpsRRneqkzMJLk/16qLCiNoC8u2hEiqKzu0/celPwl7B16Fs4Z CU65LSN/QNU9q+AXVQOd =tdAQ -----END PGP SIGNATURE----- Merge tag 'batman-adv-for-davem' of git://git.open-mesh.org/linux-merge Included changes: - fix codestyle to respect new checkpatch warnings - increase internal version number	2014-05-18 21:27:09 -04:00
Manuel Schölling	71fd762f2e	net: rds: Use time_after() for time comparison To be future-proof and for better readability the time comparisons are modified to use time_after() instead of raw math. Signed-off-by: Manuel Schölling <manuel.schoelling@gmx.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-18 21:24:52 -04:00
stephen hemminger	614d056c8e	ipv4: minor spelling fix Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-18 21:10:29 -04:00
stephen hemminger	025559eec8	bridge: fix spelling of promiscuous Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-18 21:10:08 -04:00
Ben Hutchings	61d88c6811	ethtool: Disallow ETHTOOL_SRSSH with both indir table and hash key unchanged This would be a no-op, so there is no reason to request it. This also allows conversion of the current implementations of ethtool_ops::{get,set}_rxfh_indir to ethtool_ops::{get,set}_rxfh with no change other than their parameters. Signed-off-by: Ben Hutchings <ben@decadent.org.uk>	2014-05-19 01:29:42 +01:00
Ben Hutchings	7455fa2422	ethtool: Name the 'no change' value for setting RSS hash key but not indir table We usually allocate special values of u32 fields starting from the top down, so also change the value to 0xffffffff. As these operations haven't been included in a stable release yet, it's not too late to change. Signed-off-by: Ben Hutchings <ben@decadent.org.uk>	2014-05-19 01:18:19 +01:00
Ben Hutchings	fb95cd8d14	ethtool: Return immediately on error in ethtool_copy_validate_indir() We must return -EFAULT immediately rather than continuing into the loop. Similarly, we may as well return -EINVAL directly. Signed-off-by: Ben Hutchings <ben@decadent.org.uk>	2014-05-19 01:17:32 +01:00
Alexei Starovoitov	d4f0e0958d	net: bridge: fix build fix build when BRIDGE_VLAN_FILTERING is not set Fixes: `2796d0c648` ("bridge: Automatically manage port promiscuous mode") Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-18 20:09:50 -04:00
Trond Myklebust	7a9a7b774f	SUNRPC: Fix a module reference issue in rpcsec_gss We're not taking a reference in the case where _gss_mech_get_by_pseudoflavor loops without finding the correct rpcsec_gss flavour, so why are we releasing it? Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2014-05-18 13:47:14 -04:00
Simon Wunderlich	871d3d9fdf	batman-adv: Start new development cycle Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>	2014-05-18 15:04:00 +02:00
Antonio Quartulli	2b64df2058	batman-adv: remove semi-colon after macro definition Reported by checkpatch with the following warning: "WARNING: macros should not use a trailing semicolon" Signed-off-by: Antonio Quartulli <antonio@meshcoding.com> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>	2014-05-18 15:04:00 +02:00
Antonio Quartulli	f138694b15	batman-adv: add blank line between declarations and the rest of the code Reported by checkpatch with the following message: "WARNING: Missing a blank line after declarations" Signed-off-by: Antonio Quartulli <antonio@meshcoding.com> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>	2014-05-18 15:03:52 +02:00
Vlad Yasevich	44a4085538	bonding: Fix stacked device detection in arp monitoring Prior to commit `fbd929f2dc` bonding: support QinQ for bond arp interval the arp monitoring code allowed for proper detection of devices stacked on top of vlans. Since the above commit, the code can still detect a device stacked on top of single vlan, but not a device stacked on top of Q-in-Q configuration. The search will only set the inner vlan tag if the route device is the vlan device. However, this is not always the case, as it is possible to extend the stacked configuration. With this patch it is possible to provision devices on top Q-in-Q vlan configuration that should be used as a source of ARP monitoring information. For example: ip link add link bond0 vlan10 type vlan proto 802.1q id 10 ip link add link vlan10 vlan100 type vlan proto 802.1q id 100 ip link add link vlan100 type macvlan Note: This patch limites the number of stacked VLANs to 2, just like before. The original, however had another issue in that if we had more then 2 levels of VLANs, we would end up generating incorrectly tagged traffic. This is no longer possible. Fixes: `fbd929f2dc` (bonding: support QinQ for bond arp interval) CC: Jay Vosburgh <j.vosburgh@gmail.com> CC: Veaceslav Falico <vfalico@redhat.com> CC: Andy Gospodarek <andy@greyhouse.net> CC: Ding Tianhong <dingtianhong@huawei.com> CC: Patric McHardy <kaber@trash.net> Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-16 22:29:05 -04:00
Vlad Yasevich	d38569ab2b	vlan: Fix lockdep warning with stacked vlan devices. This reverts commit `dc8eaaa006`. vlan: Fix lockdep warning when vlan dev handle notification Instead we use the new new API to find the lock subclass of our vlan device. This way we can support configurations where vlans are interspersed with other devices: bond -> vlan -> macvlan -> vlan Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-16 22:14:49 -04:00
Vlad Yasevich	4085ebe8c3	net: Find the nesting level of a given device by type. Multiple devices in the kernel can be stacked/nested and they need to know their nesting level for the purposes of lockdep. This patch provides a generic function that determines a nesting level of a particular device by its type (ex: vlan, macvlan, etc). We only care about nesting of the same type of devices. For example: eth0 <- vlan0.10 <- macvlan0 <- vlan1.20 The nesting level of vlan1.20 would be 1, since there is another vlan in the stack under it. Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-16 22:14:49 -04:00
Thomas Graf	97dc48e220	pktgen: Use seq_puts() where seq_printf() is not needed Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-16 17:30:30 -04:00
Eric Dumazet	29e9824278	net: gro: make sure skb->cb[] initial content has not to be zero Starting from linux-3.13, GRO attempts to build full size skbs. Problem is the commit assumed one particular field in skb->cb[] was clean, but it is not the case on some stacked devices. Timo reported a crash in case traffic is decrypted before reaching a GRE device. Fix this by initializing NAPI_GRO_CB(skb)->last at the right place, this also removes one conditional. Thanks a lot to Timo for providing full reports and bisecting this. Fixes: `8a29111c7c` ("net: gro: allow to build full sized skb") Bisected-by: Timo Teras <timo.teras@iki.fi> Signed-off-by: Eric Dumazet <edumazet@google.com> Tested-by: Timo Teräs <timo.teras@iki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-16 17:24:54 -04:00
Phoebe Buckheister	f0f77dc6be	ieee802154, mac802154: implement devkey record option The 802.15.4-2011 standard states that for each key, a list of devices that use this key shall be kept. Previous patches have only considered two options: * a device "uses" (or may use) all keys, rendering the list useless * a device is restricted to a certain set of keys Another option would be that a device may use all keys, but need not do so, and we are interested in the actual set of keys the device uses. Recording keys used by any given device may have a noticable performance impact and might not be needed as often. The common case, in which a device will not switch keys too often, should still perform well. Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-16 17:23:42 -04:00
Phoebe Buckheister	3e9c156e2c	ieee802154: add netlink interfaces for llsec This patch adds user-visible interfaces for the llsec infrastructure. For the added methods, the only major difference between all add/remove implementation lies in how the specific object is parsed, and for dump requests, how objects are written into netlink messages. To save on boilerplate code, table dumps are routed through a helper function that handles netlink dump state, leaving the actual dumping code to care only about iterating over the table to be dumped and filling netlink messages. For add/remove methods, the boilerplate required to work is not quite as large, but still enough to also move into a local helper. Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-16 17:23:41 -04:00
Phoebe Buckheister	9b0bb4a83f	mac802154: propagate device address changes to llsec Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-16 17:23:41 -04:00
Phoebe Buckheister	29e023746a	mac802154: add llsec configuration functions Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-16 17:23:41 -04:00
Phoebe Buckheister	af9eed5bbf	ieee802154: add dgram sockopts for security control Allow datagram sockets to override the security settings of the device they send from on a per-socket basis. Requires CAP_NET_ADMIN or CAP_NET_RAW, since raw sockets can send arbitrary packets anyway. Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-16 17:23:41 -04:00
Phoebe Buckheister	f30be4d53c	mac802154: integrate llsec with wpan devices Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-16 17:23:41 -04:00
Phoebe Buckheister	4c14a2fb5d	mac802154: add llsec decryption method Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-16 17:23:41 -04:00
Phoebe Buckheister	03556e4d0d	mac802154: add llsec encryption method Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-16 17:23:40 -04:00
Phoebe Buckheister	5d637d5aab	mac802154: add llsec structures and mutators This patch adds containers and mutators for the major ieee802154_llsec structures to mac802154. Most of the (rather simple) ieee802154_llsec structs are wrapped only to provide an rcu_head for orderly disposal, but some structs - llsec keys notably - require more complex bookkeeping. Since each llsec key may be referenced by a number of llsec key table entries (with differing key ids, but the same actual key), we want to save memory and not allocate crypto transforms for each entry in the table. Thus, the mac802154 llsec key is reference-counted instead. Further, each key will have four associated crypto transforms - three CCM transforms for the authsizes 4/8/16 and one CTR transform for unauthenticated encryption. If we had a CCM* transform that allowed authsize 0, and authsize as part of requests instead of transforms, this would not be necessary. Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-16 17:23:40 -04:00
Phoebe Buckheister	87de726c9b	mac802154: update Kconfig Link-layer security requires AES CCM for authenticated modes and AES CTR for the unauthenticated encryption mode. Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-16 17:23:40 -04:00
David S. Miller	e54740e6d7	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jesse/openvswitch Jesse Gross says: ==================== A set of OVS changes for net-next/3.16. The major change here is a switch from per-CPU to per-NUMA flow statistics. This improves scalability by reducing kernel overhead in flow setup and maintenance. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-16 17:21:51 -04:00
Vlad Yasevich	2796d0c648	bridge: Automatically manage port promiscuous mode. There exist configurations where the administrator or another management entity has the foreknowledge of all the mac addresses of end systems that are being bridged together. In these environments, the administrator can statically configure known addresses in the bridge FDB and disable flooding and learning on ports. This makes it possible to turn off promiscuous mode on the interfaces connected to the bridge. Here is why disabling flooding and learning allows us to control promiscuity: Consider port X. All traffic coming into this port from outside the bridge (ingress) will be either forwarded through other ports of the bridge (egress) or dropped. Forwarding (egress) is defined by FDB entries and by flooding in the event that no FDB entry exists. In the event that flooding is disabled, only FDB entries define the egress. Once learning is disabled, only static FDB entries provided by a management entity define the egress. If we provide information from these static FDBs to the ingress port X, then we'll be able to accept all traffic that can be successfully forwarded and drop all the other traffic sooner without spending CPU cycles to process it. Another way to define the above is as following equations: ingress = egress + drop expanding egress ingress = static FDB + learned FDB + flooding + drop disabling flooding and learning we a left with ingress = static FDB + drop By adding addresses from the static FDB entries to the MAC address filter of an ingress port X, we fully define what the bridge can process without dropping and can thus turn off promiscuous mode, thus dropping packets sooner. There have been suggestions that we may want to allow learning and update the filters with learned addresses as well. This would require mac-level authentication similar to 802.1x to prevent attacks against the hw filters as they are limited resource. Additionally, if the user places the bridge device in promiscuous mode, all ports are placed in promiscuous mode regardless of the changes to flooding and learning. Since the above functionality depends on full static configuration, we have also require that vlan filtering be enabled to take advantage of this. The reason is that the bridge has to be able to receive and process VLAN-tagged frames and the there are only 2 ways to accomplish this right now: promiscuous mode or vlan filtering. Suggested-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-16 17:06:33 -04:00
Vlad Yasevich	145beee8d6	bridge: Add addresses from static fdbs to non-promisc ports When a static fdb entry is created, add the mac address from this fdb entry to any ports that are currently running in non-promiscuous mode. These ports need this data so that they can receive traffic destined to these addresses. By default ports start in promiscuous mode, so this feature is disabled. Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-16 17:06:33 -04:00
Vlad Yasevich	f3a6ddf152	bridge: Introduce BR_PROMISC flag Introduce a BR_PROMISC per-port flag that will help us track if the current port is supposed to be in promiscuous mode or not. For now, always start in promiscuous mode. Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-16 17:06:33 -04:00
Vlad Yasevich	8db24af71b	bridge: Add functionality to sync static fdb entries to hw Add code that allows static fdb entires to be synced to the hw list for a specified port. This will be used later to program ports that can function in non-promiscuous mode. Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-16 17:06:33 -04:00
Vlad Yasevich	e028e4b8dc	bridge: Keep track of ports capable of automatic discovery. By default, ports on the bridge are capable of automatic discovery of nodes located behind the port. This is accomplished via flooding of unknown traffic (BR_FLOOD) and learning the mac addresses from these packets (BR_LEARNING). If the above functionality is disabled by turning off these flags, the port requires static configuration in the form of static FDB entries to function properly. This patch adds functionality to keep track of all ports capable of automatic discovery. This will later be used to control promiscuity settings. Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-16 17:06:33 -04:00
Vlad Yasevich	63c3a622dd	bridge: Turn flag change macro into a function. Turn the flag change macro into a function to allow easier updates and to reduce space. Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-16 17:06:32 -04:00
Timo Teräs	22fb22eaeb	ipv4: ip_tunnels: disable cache for nbma gre tunnels The connected check fails to check for ip_gre nbma mode tunnels properly. ip_gre creates temporary tnl_params with daddr specified to pass-in the actual target on per-packet basis from neighbor layer. Detect these tunnels by inspecting the actual tunnel configuration. Minimal test case: ip route add 192.168.1.1/32 via 10.0.0.1 ip route add 192.168.1.2/32 via 10.0.0.2 ip tunnel add nbma0 mode gre key 1 tos c0 ip addr add 172.17.0.0/16 dev nbma0 ip link set nbma0 up ip neigh add 172.17.0.1 lladdr 192.168.1.1 dev nbma0 ip neigh add 172.17.0.2 lladdr 192.168.1.2 dev nbma0 ping 172.17.0.1 ping 172.17.0.2 The second ping should be going to 192.168.1.2 and head 10.0.0.2; but cached gre tunnel level route is used and it's actually going to 192.168.1.1 via 10.0.0.1. The lladdr's need to go to separate dst for the bug to trigger. Test case uses separate route entries, but this can also happen when the route entry is same: if there is a nexthop exception or the GRE tunnel is IPsec'ed in which case the dst points to xfrm bundle unique to the gre lladdr. Fixes: `7d442fab0a` ("ipv4: Cache dst in tunnels") Signed-off-by: Timo Teräs <timo.teras@iki.fi> Cc: Tom Herbert <therbert@google.com> Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-16 16:58:41 -04:00
Duan Jiong	ee30ef4d45	ip_tunnel: don't add tunnel twice When using command "ip tunnel add" to add a tunnel, the tunnel will be added twice, through ip_tunnel_create() and ip_tunnel_update(). Because the second is unnecessary, so we can just break after adding tunnel through ip_tunnel_create(). Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-16 16:57:44 -04:00
Fabian Godehardt	d1c0b471b3	net/dsa/dsa.c: increment chip_index during of_node handling on dsa_of_probe() Adding more than one chip on device-tree currently causes the probing routine to always use the first chips data pointer. Signed-off-by: Fabian Godehardt <fg@emlix.com> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-16 16:56:33 -04:00
Lorenzo Colitti	2e47b29195	net: ipv6: make "ip -6 route get mark xyz" work. Currently, "ip -6 route get mark xyz" ignores the mark passed in by userspace. Make it honour the mark, just like IPv4 does. Signed-off-by: Lorenzo Colitti <lorenzo@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-16 16:50:30 -04:00
Monam Agarwal	944df8ae84	net/openvswitch: Use with RCU_INIT_POINTER(x, NULL) in vport-gre.c This patch replaces rcu_assign_pointer(x, NULL) with RCU_INIT_POINTER(x, NULL) The rcu_assign_pointer() ensures that the initialization of a structure is carried out before storing a pointer to that structure. And in the case of the NULL pointer, there is no structure to initialize. So, rcu_assign_pointer(p, NULL) can be safely converted to RCU_INIT_POINTER(p, NULL) Signed-off-by: Monam Agarwal <monamagarwal123@gmail.com> Signed-off-by: Jesse Gross <jesse@nicira.com>	2014-05-16 13:40:29 -07:00
Jarno Rajahalme	88d73f6c41	openvswitch: Use TCP flags in the flow key for stats. We already extract the TCP flags for the key, might as well use that for stats. Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: Jesse Gross <jesse@nicira.com>	2014-05-16 13:40:29 -07:00
Jarno Rajahalme	d92ab13558	openvswitch: Fix output of SCTP mask. The 'output' argument of the ovs_nla_put_flow() is the one from which the bits are written to the netlink attributes. For SCTP we accidentally used the bits from the 'swkey' instead. This caused the mask attributes to include the bits from the actual flow key instead of the mask. Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: Jesse Gross <jesse@nicira.com>	2014-05-16 13:40:29 -07:00
Jarno Rajahalme	63e7959c4b	openvswitch: Per NUMA node flow stats. Keep kernel flow stats for each NUMA node rather than each (logical) CPU. This avoids using the per-CPU allocator and removes most of the kernel-side OVS locking overhead otherwise on the top of perf reports and allows OVS to scale better with higher number of threads. With 9 handlers and 4 revalidators netperf TCP_CRR test flow setup rate doubles on a server with two hyper-threaded physical CPUs (16 logical cores each) compared to the current OVS master. Tested with non-trivial flow table with a TCP port match rule forcing all new connections with unique port numbers to OVS userspace. The IP addresses are still wildcarded, so the kernel flows are not considered as exact match 5-tuple flows. This type of flows can be expected to appear in large numbers as the result of more effective wildcarding made possible by improvements in OVS userspace flow classifier. Perf results for this test (master): Events: 305K cycles + 8.43% ovs-vswitchd [kernel.kallsyms] [k] mutex_spin_on_owner + 5.64% ovs-vswitchd [kernel.kallsyms] [k] __ticket_spin_lock + 4.75% ovs-vswitchd ovs-vswitchd [.] find_match_wc + 3.32% ovs-vswitchd libpthread-2.15.so [.] pthread_mutex_lock + 2.61% ovs-vswitchd [kernel.kallsyms] [k] pcpu_alloc_area + 2.19% ovs-vswitchd ovs-vswitchd [.] flow_hash_in_minimask_range + 2.03% swapper [kernel.kallsyms] [k] intel_idle + 1.84% ovs-vswitchd libpthread-2.15.so [.] pthread_mutex_unlock + 1.64% ovs-vswitchd ovs-vswitchd [.] classifier_lookup + 1.58% ovs-vswitchd libc-2.15.so [.] 0x7f4e6 + 1.07% ovs-vswitchd [kernel.kallsyms] [k] memset + 1.03% netperf [kernel.kallsyms] [k] __ticket_spin_lock + 0.92% swapper [kernel.kallsyms] [k] __ticket_spin_lock ... And after this patch: Events: 356K cycles + 6.85% ovs-vswitchd ovs-vswitchd [.] find_match_wc + 4.63% ovs-vswitchd libpthread-2.15.so [.] pthread_mutex_lock + 3.06% ovs-vswitchd [kernel.kallsyms] [k] __ticket_spin_lock + 2.81% ovs-vswitchd ovs-vswitchd [.] flow_hash_in_minimask_range + 2.51% ovs-vswitchd libpthread-2.15.so [.] pthread_mutex_unlock + 2.27% ovs-vswitchd ovs-vswitchd [.] classifier_lookup + 1.84% ovs-vswitchd libc-2.15.so [.] 0x15d30f + 1.74% ovs-vswitchd [kernel.kallsyms] [k] mutex_spin_on_owner + 1.47% swapper [kernel.kallsyms] [k] intel_idle + 1.34% ovs-vswitchd ovs-vswitchd [.] flow_hash_in_minimask + 1.33% ovs-vswitchd ovs-vswitchd [.] rule_actions_unref + 1.16% ovs-vswitchd ovs-vswitchd [.] hindex_node_with_hash + 1.16% ovs-vswitchd ovs-vswitchd [.] do_xlate_actions + 1.09% ovs-vswitchd ovs-vswitchd [.] ofproto_rule_ref + 1.01% netperf [kernel.kallsyms] [k] __ticket_spin_lock ... There is a small increase in kernel spinlock overhead due to the same spinlock being shared between multiple cores of the same physical CPU, but that is barely visible in the netperf TCP_CRR test performance (maybe ~1% performance drop, hard to tell exactly due to variance in the test results), when testing for kernel module throughput (with no userspace activity, handful of kernel flows). On flow setup, a single stats instance is allocated (for the NUMA node 0). As CPUs from multiple NUMA nodes start updating stats, new NUMA-node specific stats instances are allocated. This allocation on the packet processing code path is made to never block or look for emergency memory pools, minimizing the allocation latency. If the allocation fails, the existing preallocated stats instance is used. Also, if only CPUs from one NUMA-node are updating the preallocated stats instance, no additional stats instances are allocated. This eliminates the need to pre-allocate stats instances that will not be used, also relieving the stats reader from the burden of reading stats that are never used. Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: Jesse Gross <jesse@nicira.com>	2014-05-16 13:40:29 -07:00
Jarno Rajahalme	23dabf88ab	openvswitch: Remove 5-tuple optimization. The 5-tuple optimization becomes unnecessary with a later per-NUMA node stats patch. Remove it first to make the changes easier to grasp. Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com> Signed-off-by: Jesse Gross <jesse@nicira.com>	2014-05-16 13:40:29 -07:00
Joe Perches	8c63ff09bd	openvswitch: Use ether_addr_copy It's slightly smaller/faster for some architectures. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Jesse Gross <jesse@nicira.com>	2014-05-16 13:40:29 -07:00
Joe Perches	2235ad1c3a	openvswitch: flow_netlink: Use pr_fmt to OVS_NLERR output Add "openvswitch: " prefix to OVS_NLERR output to match the other OVS_NLERR output of datapath.c Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Jesse Gross <jesse@nicira.com>	2014-05-16 13:40:28 -07:00
Joe Perches	1815a8831f	openvswitch: Use net_ratelimit in OVS_NLERR Each use of pr_<level>_once has a per-site flag. Some of the OVS_NLERR messages look as if seeing them multiple times could be useful, so use net_ratelimit() instead of pr_info_once. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Jesse Gross <jesse@nicira.com>	2014-05-16 13:40:28 -07:00
Daniele Di Proietto	cc23ebf3bb	openvswitch: Added (unsigned long long) cast in printf This is necessary, since u64 is not unsigned long long in all architectures: u64 could be also uint64_t. Signed-off-by: Daniele Di Proietto <daniele.di.proietto@gmail.com> Signed-off-by: Jesse Gross <jesse@nicira.com>	2014-05-16 13:40:28 -07:00
Daniele Di Proietto	07dc0602c5	openvswitch: avoid cast-qual warning in vport_priv This function must cast a const value to a non const value. By adding an uintptr_t cast the warning is suppressed. To avoid the cast (proper solution) several function signatures must be changed. Signed-off-by: Daniele Di Proietto <daniele.di.proietto@gmail.com> Signed-off-by: Jesse Gross <jesse@nicira.com>	2014-05-16 13:40:28 -07:00
Daniele Di Proietto	d0b4da1375	openvswitch: avoid warnings in vport_from_priv This change, firstly, avoids declaring the formal parameter const, since it is treated as non const. (to avoid -Wcast-qual) Secondly, it cast the pointer from void* to u8*, since it is used in arithmetic (to avoid -Wpointer-arith) Signed-off-by: Daniele Di Proietto <daniele.di.proietto@gmail.com> Signed-off-by: Jesse Gross <jesse@nicira.com>	2014-05-16 13:40:28 -07:00
Daniele Di Proietto	7085130bab	openvswitch: use const in some local vars and casts In few functions, const formal parameters are assigned or cast to non-const. These changes suppress warnings if compiled with -Wcast-qual. Signed-off-by: Daniele Di Proietto <daniele.di.proietto@gmail.com> Signed-off-by: Jesse Gross <jesse@nicira.com>	2014-05-16 13:40:28 -07:00
David S. Miller	2f67cc87d6	Include changes: - fix NULL dereference in batadv_orig_hardif_seq_print_text() - fix reference counting imbalance when using fragmentation - avoid access to orig_node objects after they have been free'd - fix local TT check for outgoing arp requests in DAT -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQIcBAABCAAGBQJTdQmwAAoJEEKTMo6mOh1VGLAP/A1nHDzPMUOcDttR49Cs38w0 oD2Ox66xSJh2Yn8qRg9k7CshG65pU70J77bQjkPvMtTlwsgwgLFcHP1b/RJQl7Cz aFSJY5tKvLL41TwqxLSAmvUyPMfvagXvxH65bLBIQ9+dLNkDiHNH/IjdnYKWHYi9 0tqUi7/pLaCfWXMkDVeWn0P2M8baDyU1HUTuRX3ctE4l9PKF9ZVgxsxaPrhTYlXY J61KT+VXs19rdAnYQlFiaDk64Q6meMjuNjxuLkViTmqKi6pSDGi9skeKWZXaKOjT UmLLygVyf9Sh36TWDKinSV09r/s+TeU35o6bCgrmshZebSmFEUkEDA7oxNJ5JW+Q Lh2Y2SrX/+F0+9yhxhDd0fHP3PAwt2XNKjIQjurE85Gw84ZoMyBsVIpF8LD3IS+I T5CSAB0fEyeS0ZFyChbgWSLZzFjcowRHwK1iO8SJC5LHRtYerEqnvgP/V3ej0dt9 A4nq8eO8N9AorQc1G9qMosLNLheMCmFenU2nb8MbC5yDvq2X9jxsmgYm0fvr/y47 f667bowPr0afhsLvTqy6ezYma9EV40F8jW2/OovyBRUuytavJ4xcbCz/FUlWfNRU xx68e15t49iOFJynGXt62LJnEmBzRaE2uUagZaMNms18gmsL10y5pECAmi9zhQWK smkfqmsVWU8nB9UsDIT7 =+DKS -----END PGP SIGNATURE----- Merge tag 'batman-adv-fix-for-davem' of git://git.open-mesh.org/linux-merge Include changes: - fix NULL dereference in batadv_orig_hardif_seq_print_text() - fix reference counting imbalance when using fragmentation - avoid access to orig_node objects after they have been free'd - fix local TT check for outgoing arp requests in DAT	2014-05-16 16:28:53 -04:00
Kirill Tkhai	31ff6aa5c8	net: unix: Align send data_len up to PAGE_SIZE Using whole of allocated pages reduces requested skb->data size. This is just a little more thriftily allocation. netperf does not show difference with the current performance. Signed-off-by: Kirill Tkhai <ktkhai@parallels.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-16 16:04:03 -04:00
David S. Miller	202630b445	Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless John W. Linville says: ==================== pull request: wireless 2014-05-15 Please pull this batch of fixes for the 3.15 stream... For the mac80211 bits, Johannes says: "One fix is to get better VHT performance and the other fixes tracing garbage or other potential issues with the interface name tracing." And... "This has a fix from Emmanuel for a problem I failed to fix - when association is in progress then it needs to be cancelled while suspending (I had fixed the same for authentication). Also included a fix from myself for a userspace API problem that hit the iw tool and a fix to the remain-on-channel framework." For the iwlwifi bits, Emmanuel says: "Alex fixes the scan by disabling the fragmented scan. David prevents scan offload while associated, the firmware seems not to like it. I fix a stupid bug I made in BT Coex, and fix a bad #ifdef clause in rate scaling. Along with that there is a fix for a NULL pointer exception that can happen if we load the driver and our ISR gets called because the interrupt line is shared. The fix has been tested by the reporter." And... "We have here a fix from David Spinadel that makes a previous fix more complete, and an off-by-one issue fixed by Eliad in the same area. I fix the monitor that broke on the way." Beyond that... Daniel Kim's one-liner fixes a brcmfmac regression caused by a typo in an earlier commit.. Rajkumar Manoharan fixes an ath9k oops reported by David Herrmann. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-16 15:45:56 -04:00
Nathaniel W Filardo	fde0133b9c	af_rxrpc: Fix XDR length check in rxrpc key demarshalling. There may be padding on the ticket contained in the key payload, so just ensure that the claimed token length is large enough, rather than exactly the right size. Signed-off-by: Nathaniel Wesley Filardo <nwf@cs.jhu.edu> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-16 15:24:47 -04:00
Ilya Dryomov	f140662f35	crush: decode and initialize chooseleaf_vary_r Commit `e2b149cc4b` ("crush: add chooseleaf_vary_r tunable") added the crush_map::chooseleaf_vary_r field but missed the decode part. This lead to misdirected requests caused by incorrect raw crush mapping sets. Fixes: http://tracker.ceph.com/issues/8226 Reported-and-Tested-by: Dmitry Smirnov <onlyjob@member.fsf.org> Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>	2014-05-16 21:29:55 +04:00
Chunwei Chen	178eda29ca	libceph: fix corruption when using page_count 0 page in rbd It has been reported that using ZFSonLinux on rbd will result in memory corruption. The bug report can be found here: https://github.com/zfsonlinux/spl/issues/241 http://tracker.ceph.com/issues/7790 The reason is that ZFS will send pages with page_count 0 into rbd, which in turns send them to tcp_sendpage. However, tcp_sendpage cannot deal with page_count 0, as it will do get_page and put_page, and erroneously free the page. This type of issue has been noted before, and handled in iscsi, drbd, etc. So, rbd should also handle this. This fix address this issue by fall back to slower sendmsg when page_count 0 detected. Cc: Sage Weil <sage@inktank.com> Cc: Yehuda Sadeh <yehuda@inktank.com> Cc: stable@vger.kernel.org Signed-off-by: Chunwei Chen <tuxoko@gmail.com> Reviewed-by: Ilya Dryomov <ilya.dryomov@inktank.com>	2014-05-16 21:29:26 +04:00
Tejun Heo	5c9d535b89	cgroup: remove css_parent() cgroup in general is moving towards using cgroup_subsys_state as the fundamental structural component and css_parent() was introduced to convert from using cgroup->parent to css->parent. It was quite some time ago and we're moving forward with making css more prominent. This patch drops the trivial wrapper css_parent() and let the users dereference css->parent. While at it, explicitly mark fields of css which are public and immutable. v2: New usage from device_cgroup.c converted. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Michal Hocko <mhocko@suse.cz> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: "David S. Miller" <davem@davemloft.net> Acked-by: Li Zefan <lizefan@huawei.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Johannes Weiner <hannes@cmpxchg.org>	2014-05-16 13:22:48 -04:00
Andrzej Kaczmarek	f4e2dd53d5	Bluetooth: Add missing msecs to jiffies conversion conn_info_age value is calculated in ms, so need to be converted to jiffies. Signed-off-by: Andrzej Kaczmarek <andrzej.kaczmarek@tieto.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-05-16 08:23:01 -07:00
Andrzej Kaczmarek	eed5daf318	Bluetooth: Add support for max_tx_power in Get Conn Info This patch adds support for max_tx_power in Get Connection Information request. Value is read only once for given connection and then always returned in response as parameter. Signed-off-by: Andrzej Kaczmarek <andrzej.kaczmarek@tieto.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-05-15 21:48:07 -07:00
Andrzej Kaczmarek	d0455ed996	Bluetooth: Store max TX power level for connection This patch adds support to store local maximum TX power level for connection when reply for HCI_Read_Transmit_Power_Level is received. Signed-off-by: Andrzej Kaczmarek <andrzej.kaczmarek@tieto.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-05-15 21:48:07 -07:00
Andrzej Kaczmarek	f7faab0c9d	Bluetooth: Avoid polling TX power for LE links TX power for LE links is immutable thus we do not need to query for it if already have value. Signed-off-by: Andrzej Kaczmarek <andrzej.kaczmarek@tieto.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-05-15 21:48:06 -07:00
Andrzej Kaczmarek	dd9838087b	Bluetooth: Add support to get connection information This patch adds support for Get Connection Information mgmt command which can be used to query for information about connection, i.e. RSSI and local TX power level. In general values cached in hci_conn are returned as long as they are considered valid, i.e. do not exceed age limit set in hdev. This limit is calculated as random value between min/max values to avoid client trying to guess when to poll for updated information. Signed-off-by: Andrzej Kaczmarek <andrzej.kaczmarek@tieto.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-05-15 21:48:06 -07:00
Andrzej Kaczmarek	31ad169148	Bluetooth: Add conn info lifetime parameters to debugfs This patch adds conn_info_min_age and conn_info_max_age parameters to debugfs which determine lifetime of connection information. Actual lifetime will be random value between min and max age. Default values for min and max age are 1000ms and 3000ms respectively. Signed-off-by: Andrzej Kaczmarek <andrzej.kaczmarek@tieto.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-05-15 21:48:05 -07:00
Duan Jiong	be7a010d6f	ipv6: update Destination Cache entries when gateway turn into host RFC 4861 states in 7.2.5: The IsRouter flag in the cache entry MUST be set based on the Router flag in the received advertisement. In those cases where the IsRouter flag changes from TRUE to FALSE as a result of this update, the node MUST remove that router from the Default Router List and update the Destination Cache entries for all destinations using that neighbor as a router as specified in Section 7.3.3. This is needed to detect when a node that is used as a router stops forwarding packets due to being configured as a host. Currently, when dealing with NA Message which IsRouter flag changes from TRUE to FALSE, the kernel only removes router from the Default Router List, and don't update the Destination Cache entries. Now in order to update those Destination Cache entries, i introduce function rt6_clean_tohost(). Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-15 23:26:27 -04:00
David S. Miller	f895f0cfbb	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Conflicts: net/ipv4/ip_vti.c Steffen Klassert says: ==================== pull request (net): ipsec 2014-05-15 This pull request has a merge conflict in net/ipv4/ip_vti.c between commit `8d89dcdf80` ("vti: don't allow to add the same tunnel twice") and commit `a32452366b` ("vti4:Don't count header length twice"). It can be solved like it is done in linux-next. 1) Fix a ipv6 xfrm output crash when a packet is rerouted by netfilter to not use IPsec. 2) vti4 counts some header lengths twice leading to an incorrect device mtu. Fix this by counting these headers only once. 3) We don't catch the case if an unsupported protocol is submitted to the xfrm protocol handlers, this can lead to NULL pointer dereferences. Fix this by adding the appropriate checks. 4) vti6 may unregister pernet ops twice on init errors. Fix this by removing one of the calls to do it only once. From Mathias Krause. 5) Set the vti tunnel mark before doing a lookup in the error handlers. Otherwise we don't find the correct xfrm state. ==================== The conflict in ip_vti.c was simple, 'net' had a commit removing a line from vti_tunnel_init() and this tree being merged had a commit adding a line to the same location. Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-15 23:23:48 -04:00
Julia Lawall	112a3513b5	vti6: delete unneeded call to netdev_priv Netdev_priv is an accessor function, and has no purpose if its result is not used. A simplified version of the semantic match that fixes this problem is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ local idexpression x; @@ -x = netdev_priv(...); ... when != x // </smpl> Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-15 16:57:47 -04:00
Julia Lawall	4929fd8cb0	ip_tunnel: delete unneeded call to netdev_priv Netdev_priv is an accessor function, and has no purpose if its result is not used. A simplified version of the semantic match that fixes this problem is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ local idexpression x; @@ -x = netdev_priv(...); ... when != x // </smpl> Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-15 16:57:47 -04:00
Alexei Starovoitov	622582786c	net: filter: x86: internal BPF JIT Maps all internal BPF instructions into x86_64 instructions. This patch replaces original BPF x64 JIT with internal BPF x64 JIT. sysctl net.core.bpf_jit_enable is reused as on/off switch. Performance: 1. old BPF JIT and internal BPF JIT generate equivalent x86_64 code. No performance difference is observed for filters that were JIT-able before Example assembler code for BPF filter "tcpdump port 22" original BPF -> old JIT: original BPF -> internal BPF -> new JIT: 0: push %rbp 0: push %rbp 1: mov %rsp,%rbp 1: mov %rsp,%rbp 4: sub $0x60,%rsp 4: sub $0x228,%rsp 8: mov %rbx,-0x8(%rbp) b: mov %rbx,-0x228(%rbp) // prologue 12: mov %r13,-0x220(%rbp) 19: mov %r14,-0x218(%rbp) 20: mov %r15,-0x210(%rbp) 27: xor %eax,%eax // clear A c: xor %ebx,%ebx 29: xor %r13,%r13 // clear X e: mov 0x68(%rdi),%r9d 2c: mov 0x68(%rdi),%r9d 12: sub 0x6c(%rdi),%r9d 30: sub 0x6c(%rdi),%r9d 16: mov 0xd8(%rdi),%r8 34: mov 0xd8(%rdi),%r10 3b: mov %rdi,%rbx 1d: mov $0xc,%esi 3e: mov $0xc,%esi 22: callq 0xffffffffe1021e15 43: callq 0xffffffffe102bd75 27: cmp $0x86dd,%eax 48: cmp $0x86dd,%rax 2c: jne 0x0000000000000069 4f: jne 0x000000000000009a 2e: mov $0x14,%esi 51: mov $0x14,%esi 33: callq 0xffffffffe1021e31 56: callq 0xffffffffe102bd91 38: cmp $0x84,%eax 5b: cmp $0x84,%rax 3d: je 0x0000000000000049 62: je 0x0000000000000074 3f: cmp $0x6,%eax 64: cmp $0x6,%rax 42: je 0x0000000000000049 68: je 0x0000000000000074 44: cmp $0x11,%eax 6a: cmp $0x11,%rax 47: jne 0x00000000000000c6 6e: jne 0x0000000000000117 49: mov $0x36,%esi 74: mov $0x36,%esi 4e: callq 0xffffffffe1021e15 79: callq 0xffffffffe102bd75 53: cmp $0x16,%eax 7e: cmp $0x16,%rax 56: je 0x00000000000000bf 82: je 0x0000000000000110 58: mov $0x38,%esi 88: mov $0x38,%esi 5d: callq 0xffffffffe1021e15 8d: callq 0xffffffffe102bd75 62: cmp $0x16,%eax 92: cmp $0x16,%rax 65: je 0x00000000000000bf 96: je 0x0000000000000110 67: jmp 0x00000000000000c6 98: jmp 0x0000000000000117 69: cmp $0x800,%eax 9a: cmp $0x800,%rax 6e: jne 0x00000000000000c6 a1: jne 0x0000000000000117 70: mov $0x17,%esi a3: mov $0x17,%esi 75: callq 0xffffffffe1021e31 a8: callq 0xffffffffe102bd91 7a: cmp $0x84,%eax ad: cmp $0x84,%rax 7f: je 0x000000000000008b b4: je 0x00000000000000c2 81: cmp $0x6,%eax b6: cmp $0x6,%rax 84: je 0x000000000000008b ba: je 0x00000000000000c2 86: cmp $0x11,%eax bc: cmp $0x11,%rax 89: jne 0x00000000000000c6 c0: jne 0x0000000000000117 8b: mov $0x14,%esi c2: mov $0x14,%esi 90: callq 0xffffffffe1021e15 c7: callq 0xffffffffe102bd75 95: test $0x1fff,%ax cc: test $0x1fff,%rax 99: jne 0x00000000000000c6 d3: jne 0x0000000000000117 d5: mov %rax,%r14 9b: mov $0xe,%esi d8: mov $0xe,%esi a0: callq 0xffffffffe1021e44 dd: callq 0xffffffffe102bd91 // MSH e2: and $0xf,%eax e5: shl $0x2,%eax e8: mov %rax,%r13 eb: mov %r14,%rax ee: mov %r13,%rsi a5: lea 0xe(%rbx),%esi f1: add $0xe,%esi a8: callq 0xffffffffe1021e0d f4: callq 0xffffffffe102bd6d ad: cmp $0x16,%eax f9: cmp $0x16,%rax b0: je 0x00000000000000bf fd: je 0x0000000000000110 ff: mov %r13,%rsi b2: lea 0x10(%rbx),%esi 102: add $0x10,%esi b5: callq 0xffffffffe1021e0d 105: callq 0xffffffffe102bd6d ba: cmp $0x16,%eax 10a: cmp $0x16,%rax bd: jne 0x00000000000000c6 10e: jne 0x0000000000000117 bf: mov $0xffff,%eax 110: mov $0xffff,%eax c4: jmp 0x00000000000000c8 115: jmp 0x000000000000011c c6: xor %eax,%eax 117: mov $0x0,%eax c8: mov -0x8(%rbp),%rbx 11c: mov -0x228(%rbp),%rbx // epilogue cc: leaveq 123: mov -0x220(%rbp),%r13 cd: retq 12a: mov -0x218(%rbp),%r14 131: mov -0x210(%rbp),%r15 138: leaveq 139: retq On fully cached SKBs both JITed functions take 12 nsec to execute. BPF interpreter executes the program in 30 nsec. The difference in generated assembler is due to the following: Old BPF imlements LDX_MSH instruction via sk_load_byte_msh() helper function inside bpf_jit.S. New JIT removes the helper and does it explicitly, so ldx_msh cost is the same for both JITs, but generated code looks longer. New JIT has 4 registers to save, so prologue/epilogue are larger, but the cost is within noise on x64. Old JIT checks whether first insn clears A and if not emits 'xor %eax,%eax'. New JIT clears %rax unconditionally. 2. old BPF JIT doesn't support ANC_NLATTR, ANC_PAY_OFFSET, ANC_RANDOM extensions. New JIT supports all BPF extensions. Performance of such filters improves 2-4 times depending on a filter. The longer the filter the higher performance gain. Synthetic benchmarks with many ancillary loads see 20x speedup which seems to be the maximum gain from JIT Notes: . net.core.bpf_jit_enable=2 + tools/net/bpf_jit_disasm is still functional and can be used to see generated assembler . there are two jit_compile() functions and code flow for classic filters is: sk_attach_filter() - load classic BPF bpf_jit_compile() - try to JIT from classic BPF sk_convert_filter() - convert classic to internal bpf_int_jit_compile() - JIT from internal BPF seccomp and tracing filters will just call bpf_int_jit_compile() Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-15 16:31:30 -04:00
Phoebe Buckheister	6ef0023a2e	mac802154: make mac802154_wpan_open static This function is only used within the same translation unit, so mark it static. Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-15 15:51:43 -04:00
Phoebe Buckheister	1cc76e3654	ieee802154: fix dgram socket sendmsg() 802.15.4 datagram sockets do not currently have a compliant sendmsg(). The destination address supplied is always ignored, and in unconnected mode, packets are broadcast instead of dropped with -EDESTADDRREQ. This patch fixes 802.15.4 dgram sockets to be compliant, i.e. !conn && !msg_name => -EDESTADDRREQ !conn && msg_name => send to msg_name conn && !msg_name => send to connected conn && msg_name => -EISCONN Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-15 15:51:43 -04:00
Phoebe Buckheister	d4b2816d67	6lowpan: fix fragmentation Currently, 6lowpan creates one 802.15.4 MAC header for the original packet the device was given by upper layers and reuses this header for all fragments, if fragmentation is required. This also reuses frame sequence numbers, which must not happen. 6lowpan also has issues with fragmentation in the presence of security headers, since those may imply the presence of trailing fields that are not accounted for by the fragmentation code right now. Fix both of these issues by properly allocating fragment skbs with headromm and tailroom as specified by the underlying device, create one header for each skb instead of reusing the original header, let the underlying device do the rest. Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-15 15:51:43 -04:00
Phoebe Buckheister	32edc40ae6	ieee802154: change _cb handling slightly The current mac_cb handling of ieee802154 is rather awkward and limited. Decompose the single flags field into multiple fields with the meanings of each subfield of the flags field to make future extensions (for example, link-layer security) easier. Also don't set the frame sequence number in upper layers, since that's a thing the MAC is supposed to set on frame transmit - we set it on header creation, but assuming that upper layers do not blindly duplicate our headers, this is fine. Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-15 15:51:42 -04:00
Phoebe Buckheister	8c84296fd2	mac802154: account for all header parts during wpan header creationg The current WPAN header creation code checks for EMSGSIZE conditions, but does not account for the MIC field that link layer security may add at the end of the frame. Now that we can accurately calculate the maximum payload size of packets, use that to check for EMSGSIZE conditions. Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-15 15:51:42 -04:00
Phoebe Buckheister	c3a6114f31	ieee802154: add definitions for link-layer security and header functions When dealing with 802.15.4, one often has to know the maximum payload size for a given packet. This depends on many factors, one of which is whether or not a security header is present in the frame. These definitions and functions provide an easy way for any upper layer to calculate the maximum payload size for a packet. The first obvious user for this is 6lowpan, which duplicates this calculation and gets it partially wrong because it ignores security headers. Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-15 15:51:42 -04:00
Cong Wang	200b916f35	rtnetlink: wait for unregistering devices in rtnl_link_unregister() From: Cong Wang <cwang@twopensource.com> commit `50624c934d` (net: Delay default_device_exit_batch until no devices are unregistering) introduced rtnl_lock_unregistering() for default_device_exit_batch(). Same race could happen we when rmmod a driver which calls rtnl_link_unregister() as we call dev->destructor without rtnl lock. For long term, I think we should clean up the mess of netdev_run_todo() and net namespce exit code. Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Cong Wang <cwang@twopensource.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-15 15:30:33 -04:00
Antonio Quartulli	cc2f33860c	batman-adv: fix local TT check for outgoing arp requests in DAT Change introduced by `88e48d7b33` ("batman-adv: make DAT drop ARP requests targeting local clients") implements a check that prevents DAT from using the caching mechanism when the client that is supposed to provide a reply to an arp request is local. However change brought by `be1db4f661` ("batman-adv: make the Distributed ARP Table vlan aware") has not converted the above check into its vlan aware version thus making it useless when the local client is behind a vlan. Fix the behaviour by properly specifying the vlan when checking for a client being local or not. Reported-by: Simon Wunderlich <simon@open-mesh.com> Signed-off-by: Antonio Quartulli <antonio@open-mesh.com> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>	2014-05-15 20:23:47 +02:00
Antonio Quartulli	377fe0f968	batman-adv: increase orig refcount when storing ref in gw_node A pointer to the orig_node representing a bat-gateway is stored in the gw_node->orig_node member, but the refcount for such orig_node is never increased. This leads to memory faults when gw_node->orig_node is accessed and the originator has already been freed. Fix this by increasing the refcount on gw_node creation and decreasing it on gw_node free. Signed-off-by: Antonio Quartulli <antonio@open-mesh.com> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>	2014-05-15 20:03:17 +02:00
Antonio Quartulli	be181015a1	batman-adv: fix reference counting imbalance while sending fragment In the new fragmentation code the batadv_frag_send_packet() function obtains a reference to the primary_if, but it does not release it upon return. This reference imbalance prevents the primary_if (and then the related netdevice) to be properly released on shut down. Fix this by releasing the primary_if in batadv_frag_send_packet(). Introduced by `ee75ed8887` ("batman-adv: Fragment and send skbs larger than mtu") Cc: Martin Hundebøll <martin@hundeboll.net> Signed-off-by: Antonio Quartulli <antonio@open-mesh.com> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Acked-by: Martin Hundebøll <martin@hundeboll.net>	2014-05-15 20:03:17 +02:00
Marek Lindner	16a4142363	batman-adv: fix indirect hard_iface NULL dereference If hard_iface is NULL and goto out is made batadv_hardif_free_ref() doesn't check for NULL before dereferencing it to get to refcount. Introduced in `cb1c92ec37` ("batman-adv: add debugfs support to view multiif tables"). Reported-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Acked-by: Antonio Quartulli <antonio@meshcoding.com> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>	2014-05-15 20:03:16 +02:00
Pablo Neira Ayuso	3b084e99a3	netfilter: nf_tables: fix trace of matching non-terminal rule Add the corresponding trace if we have a full match in a non-terminal rule. Note that the traces will look slightly different than in x_tables since the log message after all expressions have been evaluated (contrary to x_tables, that emits it before the target action). This manifests in two differences in nf_tables wrt. x_tables: 1) The rule that enables the tracing is included in the trace. 2) If the rule emits some log message, that is shown before the trace log message. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-05-15 19:44:20 +02:00
John W. Linville	025a58fd9d	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem	2014-05-15 10:24:28 -04:00
Andrei Otcheretianski	1af586c911	mac80211: Handle the CSA counters correctly Make the beacon CSA counters part of ieee80211_mutable_offsets and don't decrement CSA counters when generating a beacon template. This permits the driver to offload the CSA counters handling. Since mac80211 updates the probe responses with the correct counter, the driver should sync the counter's value with mac80211 using ieee80211_csa_update_counter function. Signed-off-by: Andrei Otcheretianski <andrei.otcheretianski@intel.com> Signed-off-by: Luciano Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-05-15 15:01:00 +02:00
Andrei Otcheretianski	6ec8c332a0	mac80211: Provide ieee80211_beacon_get_template API Add a new API ieee80211_beacon_get_template, which doesn't affect DTIM counter and should be used if the device generates beacon frames, and new beacon template is needed. In addition set the offsets to TIM IE for MESH interface. Signed-off-by: Andrei Otcheretianski <andrei.otcheretianski@intel.com> Signed-off-by: Luciano Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-05-15 15:01:00 +02:00
Andrei Otcheretianski	0d06d9ba93	mac80211: Support multiple CSA counters Support up to IEEE80211_MAX_CSA_COUNTERS_NUM csa counters. This is defined to be 2 now, to support both CSA and eCSA counters. Signed-off-by: Andrei Otcheretianski <andrei.otcheretianski@intel.com> Signed-off-by: Luciano Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-05-15 15:00:58 +02:00
Andrei Otcheretianski	9a774c78e2	cfg80211: Support multiple CSA counters Change the type of NL80211_ATTR_CSA_C_OFF_BEACON and NL80211_ATTR_CSA_C_OFF_PRESP to be NLA_BINARY which allows userspace to use beacons and probe responses with multiple CSA counters. This isn't breaking the API since userspace can continue to use nla_put_u16 for this attributes, which is equivalent to a single element u16 array. In addition advertise max number of supported CSA counters. This is needed when using CSA and eCSA IEs together. Signed-off-by: Andrei Otcheretianski <andrei.otcheretianski@intel.com> Signed-off-by: Luciano Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-05-15 15:00:42 +02:00
Andrei Otcheretianski	387910cc79	mac80211: Update CSA counters in mgmt frames Track current csa counter value and use it to update mgmt frames at the provided offsets. Signed-off-by: Andrei Otcheretianski <andrei.otcheretianski@intel.com> Signed-off-by: Luciano Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-05-15 14:54:32 +02:00
Andrei Otcheretianski	34d22ce22b	cfg80211: Add API to update CSA counters in mgmt frames Add NL80211_ATTR_CSA_C_OFFSETS_TX which holds an array of offsets to the CSA counters which should be updated when sending a management frames with NL80211_CMD_FRAME. This API should be used by the drivers that wish to keep the CSA counter updated in probe responses, but do not implement probe response offloading and so, do not use ieee80211_proberesp_get function. Signed-off-by: Andrei Otcheretianski <andrei.otcheretianski@intel.com> Signed-off-by: Luciano Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-05-15 14:52:44 +02:00
Luciano Coelho	00ec75fc5a	cfg80211: pass the actual iftype when calling cfg80211_chandef_dfs_required() There is no need to pass NL80211_IFTYPE_UNSPECIFIED when calling cfg80211_chandef_dfs_required() since we always already have the interface type. So, pass the actual interface type instead. Additionally, have cfg80211_chandef_dfs_required() WARN if the passed interface type is NL80211_IFTYPE_UNSPECIFIED, so we can detect problems more easily. Tested-by: Janusz Dziedzic <janusz.dziedzic@tieto.com> Reported-by: Eliad Peller <eliad@wizery.com> Signed-off-by: Luciano Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-05-15 14:50:34 +02:00
Joe Perches	c722831744	net: Use a more standard macro for INET_ADDR_COOKIE Missing a colon on definition use is a bit odd so change the macro for the 32 bit case to declare an __attribute__((unused)) and __deprecated variable. The __deprecated attribute will cause gcc to emit an error if the variable is actually used. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-14 16:07:23 -04:00
John W. Linville	eac94da8b4	Merge git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211	2014-05-14 15:39:45 -04:00
Ursula Braun	f5738e2ef8	af_iucv: wrong mapping of sent and confirmed skbs When sending data through IUCV a MESSAGE COMPLETE interrupt signals that sent data memory can be freed or reused again. With commit `f9c41a62bb` "af_iucv: fix recvmsg by replacing skb_pull() function" the MESSAGE COMPLETE callback iucv_callback_txdone() identifies the wrong skb as being confirmed, which leads to data corruption. This patch fixes the skb mapping logic in iucv_callback_txdone(). Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com> Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com> Cc: <stable@vger.kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-14 15:38:39 -04:00
wangweidong	8ba7e7bfc3	dccp: make the request_retries minimum is 1 In Documentation/networking/dccp.txt points that request_retries should be greater than 0. So make the extra1 to be &one instead of &zero. Signed-off-by: Wang Weidong <wangweidong1@huawei.com> Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-14 15:34:16 -04:00
WANG Cong	c9f2dba61b	snmp: fix some left over of snmp stats Fengguang reported the following sparse warning: >> net/ipv6/proc.c:198:41: sparse: incorrect type in argument 1 (different address spaces) net/ipv6/proc.c:198:41: expected void [noderef] <asn:3>mib net/ipv6/proc.c:198:41: got void [noderef] <asn:3>*pcpumib Fixes: commit `698365fa18` (net: clean up snmp stats code) Reported-by: Fengguang Wu <fengguang.wu@intel.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-14 15:33:47 -04:00
WANG Cong	122ff243f5	ipv4: make ip_local_reserved_ports per netns ip_local_port_range is already per netns, so should ip_local_reserved_ports be. And since it is none by default we don't actually need it when we don't enable CONFIG_SYSCTL. By the way, rename inet_is_reserved_local_port() to inet_is_local_reserved_port() Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-14 15:31:45 -04:00
Jon Paul Maloy	9816f0615d	tipc: merge port message reception into socket reception function In order to reduce complexity and save a call level during message reception at port/socket level, we remove the function tipc_port_rcv() and merge its functionality into tipc_sk_rcv(). Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-14 15:19:48 -04:00
Jon Paul Maloy	c82910e2a8	tipc: clean up neigbor discovery message reception The function tipc_disc_rcv(), which is handling received neighbor discovery messages, is perceived as messy, and it is hard to verify its correctness by code inspection. The fact that the task it is set to resolve is fairly complex does not make the situation better. In this commit we try to take a more systematic approach to the problem. We define a decision machine which takes three state flags as input, and produces three action flags as output. We then walk through all permutations of the state flags, and for each of them we describe verbally what is going on, plus that we set zero or more of the action flags. The action flags indicate what should be done once the decision machine has finished its job, while the last part of the function deals with performing those actions. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-14 15:19:48 -04:00
Jon Paul Maloy	38504c28a2	tipc: improve and extend media address conversion functions TIPC currently handles two media specific addresses: Ethernet MAC addresses and InfiniBand addresses. Those are kept in three different formats: 1) A "raw" format as obtained from the device. This format is known only by the media specific adapter code in eth_media.c and ib_media.c. 2) A "generic" internal format, in the form of struct tipc_media_addr, which can be referenced and passed around by the generic media- unaware code. 3) A serialized version of the latter, to be conveyed in neighbor discovery messages. Conversion between the three formats can only be done by the media specific code, so we have function pointers for this purpose in struct tipc_media. Here, the media adapters can install their own conversion functions at startup. We now introduce a new such function, 'raw2addr()', whose purpose is to convert from format 1 to format 2 above. We also try to as far as possible uniform commenting, variable names and usage of these functions, with the purpose of making them more comprehensible. We can now also remove the function tipc_l2_media_addr_set(), whose job is done better by the new function. Finally, we expand the field for serialized addresses (format 3) in discovery messages from 20 to 32 bytes. This is permitted according to the spec, and reduces the risk of problems when we add new media in the future. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-14 15:19:48 -04:00
Jon Paul Maloy	37e22164a8	tipc: rename and move message reassembly function The function tipc_link_frag_rcv() is in reality a re-entrant generic message reassemby function that has nothing in particular to do with the link, where it is defined now. This becomes obvious when we see the need to call the function from other places in the code. In this commit rename it to tipc_buf_append() and move it to the file msg.c. We also simplify its signature by moving the tail pointer to the control block of the head buffer, hence making the head buffer self-contained. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-14 15:19:48 -04:00
Jon Paul Maloy	5074ab89c5	tipc: mark head of reassembly buffer as non-linear The message reassembly function does not update the 'len' and 'data_len' fields of the head skbuff correctly when fragments are chained to it. This may sometimes lead to obsure errors, such as fragment reordering when we receive fragments which are cloned buffers. This commit fixes this, by ensuring that the two fields are updated correctly. Suggested-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-14 15:19:48 -04:00
Jon Paul Maloy	ec37dcd382	tipc: don't record link RESET or ACTIVATE messages as traffic In the current code, all incoming LINK_PROTOCOL messages, irrespective of type, nudge the "last message received" checkpoint, informing the link state machine that a message was received from the peer since last supervision timeout event. This inhibits the link from starting probing the peer unnecessarily. However, not only STATE messages are recorded as legitimate incoming traffic this way, but even RESET and ACTIVATE messages, which in reality are there to inform the link that the peer endpoint has been reset. At the same time, some RESET messages may be dropped instead of causing a link reset. This happens when the link endpoint thinks it is fully up and working, and the session number of the RESET is lower than or equal to the current link session. In such cases the RESET is perceived as a delayed remnant from an earlier session, or the current one, and dropped. Now, if a TIPC module is removed and then immediately reinserted, e.g. when using a script, RESET messages may arrive at the peer link endpoint before this one has had time to discover the failure. The RESET may be dropped because of the session number, but only after it has been recorded as a legitimate traffic event. Hence, the receiving link will not start probing, and not discover that the peer endpoint is down, at the same time ignoring the periodic RESET messages coming from that endpoint. We have ended up in a stale state where a failed link cannot be re-established. In this commit, we remedy this by nudging the checkpoint only for received STATE messages, not for RESET or ACTIVATE messages. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-14 15:19:48 -04:00
Jon Paul Maloy	4f4482dcd9	tipc: compensate for double accounting in socket rcv buffer The function net/core/sock.c::__release_sock() runs a tight loop to move buffers from the socket backlog queue to the receive queue. As a security measure, sk_backlog.len of the receiving socket is not set to zero until after the loop is finished, i.e., until the whole backlog queue has been transferred to the receive queue. During this transfer, the data that has already been moved is counted both in the backlog queue and the receive queue, hence giving an incorrect picture of the available queue space for new arriving buffers. This leads to unnecessary rejection of buffers by sk_add_backlog(), which in TIPC leads to unnecessarily broken connections. In this commit, we compensate for this double accounting by adding a counter that keeps track of it. The function socket.c::backlog_rcv() receives buffers one by one from __release_sock(), and adds them to the socket receive queue. If the transfer is successful, it increases a new atomic counter 'tipc_sock::dupl_rcvcnt' with 'truesize' of the transferred buffer. If a new buffer arrives during this transfer and finds the socket busy (owned), we attempt to add it to the backlog. However, when sk_add_backlog() is called, we adjust the 'limit' parameter with the value of the new counter, so that the risk of inadvertent rejection is eliminated. It should be noted that this change does not invalidate the original purpose of zeroing 'sk_backlog.len' after the full transfer. We set an upper limit for dupl_rcvcnt, so that if a 'wild' sender (i.e., one that doesn't respect the send window) keeps pumping in buffers to sk_add_backlog(), he will eventually reach an upper limit, (2 x TIPC_CONN_OVERLOAD_LIMIT). After that, no messages can be added to the backlog, and the connection will be broken. Ordinary, well- behaved senders will never reach this buffer limit at all. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-14 15:19:47 -04:00
Jon Paul Maloy	6163a194e0	tipc: decrease connection flow control window Memory overhead when allocating big buffers for data transfer may be quite significant. E.g., truesize of a 64 KB buffer turns out to be 132 KB, 2 x the requested size. This invalidates the "worst case" calculation we have been using to determine the default socket receive buffer limit, which is based on the assumption that 1024x64KB = 67MB buffers may be queued up on a socket. Since TIPC connections cannot survive hitting the buffer limit, we have to compensate for this overhead. We do that in this commit by dividing the fix connection flow control window from 1024 (2512) messages to 512 (2256). Since older version nodes send out acks at 512 message intervals, compatibility with such nodes is guaranteed, although performance may be non-optimal in such cases. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-14 15:19:47 -04:00
Samuel Ortiz	40b9397a1a	Bluetooth: Fix L2CAP LE debugfs entries permissions 0466 was probably meant to be 0644, there's no reason why everyone except root could write there. Signed-off-by: Samuel Ortiz <sameo@linux.intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Cc: stable@vger.kernel.org	2014-05-14 09:07:07 -07:00
Janusz Dziedzic	67ae07a109	cfg80211: fix start_radar_detection issue After patch: cfg80211/mac80211: refactor cfg80211_chandef_dfs_required() start_radar_detection always fail with -EINVAL. Acked-by: Luciano Coelho <luciano.coelho@intel.com> Signed-off-by: Janusz Dziedzic <janusz.dziedzic@tieto.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-05-14 16:42:22 +02:00
Johannes Berg	b4b177a555	mac80211: fix on-channel remain-on-channel Jouni reported that if a remain-on-channel was active on the same channel as the current operating channel, then the ROC would start, but any frames transmitted using mgmt-tx on the same channel would get delayed until after the ROC. The reason for this is that the ROC starts, but doesn't have any handling for "remain on the same channel", so it stops the interface queues. The later mgmt-tx then puts the frame on the interface queues (since it's on the current operating channel) and thus they get delayed until after the ROC. To fix this, add some logic to handle remaining on the same channel specially and not stop the queues etc. in this case. This not only fixes the bug but also improves behaviour in this case as data frames etc. can continue to flow. Cc: stable@vger.kernel.org Reported-by: Jouni Malinen <j@w1.fi> Tested-by: Jouni Malinen <j@w1.fi> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-05-14 15:48:38 +02:00
Hannes Frederic Sowa	3a1cebe7e0	ipv6: fix calculation of option len in ip6_append_data tot_len does specify the size of struct ipv6_txoptions. We need opt_flen + opt_nflen to calculate the overall length of additional ipv6 extensions. I found this while auditing the ipv6 output path for a memory corruption reported by Alexey Preobrazhensky while he fuzzed an instrumented AddressSanitizer kernel with trinity. This may or may not be the cause of the original bug. Fixes: `4df98e76cd` ("ipv6: pmtudisc setting not respected with UFO/CORK") Reported-by: Alexey Preobrazhensky <preobr@google.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-14 00:40:27 -04:00
Hannes Frederic Sowa	3d4405226d	net: avoid dependency of net_get_random_once on nop patching net_get_random_once depends on the static keys infrastructure to patch up the branch to the slow path during boot. This was realized by abusing the static keys api and defining a new initializer to not enable the call site while still indicating that the branch point should get patched up. This was needed to have the fast path considered likely by gcc. The static key initialization during boot up normally walks through all the registered keys and either patches in ideal nops or enables the jump site but omitted that step on x86 if ideal nops where already placed at static_key branch points. Thus net_get_random_once branches not always became active. This patch switches net_get_random_once to the ordinary static_key api and thus places the kernel fast path in the - by gcc considered - unlikely path. Microbenchmarks on Intel and AMD x86-64 showed that the unlikely path actually beats the likely path in terms of cycle cost and that different nop patterns did not make much difference, thus this switch should not be noticeable. Fixes: `a48e42920f` ("net: introduce new macro net_get_random_once") Reported-by: Tuomas Räsänen <tuomasjjrasanen@tjjr.fi> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-14 00:37:34 -04:00
Lorenzo Colitti	84f39b08d7	net: support marking accepting TCP sockets When using mark-based routing, sockets returned from accept() may need to be marked differently depending on the incoming connection request. This is the case, for example, if different socket marks identify different networks: a listening socket may want to accept connections from all networks, but each connection should be marked with the network that the request came in on, so that subsequent packets are sent on the correct network. This patch adds a sysctl to mark TCP sockets based on the fwmark of the incoming SYN packet. If enabled, and an unmarked socket receives a SYN, then the SYN packet's fwmark is written to the connection's inet_request_sock, and later written back to the accepted socket when the connection is established. If the socket already has a nonzero mark, then the behaviour is the same as it is today, i.e., the listening socket's fwmark is used. Black-box tested using user-mode linux: - IPv4/IPv6 SYN+ACK, FIN, etc. packets are routed based on the mark of the incoming SYN packet. - The socket returned by accept() is marked with the mark of the incoming SYN packet. - Tested with syncookies=1 and syncookies=2. Signed-off-by: Lorenzo Colitti <lorenzo@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-13 18:35:09 -04:00
Lorenzo Colitti	1b3c61dc1a	net: Use fwmark reflection in PMTU discovery. Currently, routing lookups used for Path PMTU Discovery in absence of a socket or on unmarked sockets use a mark of 0. This causes PMTUD not to work when using routing based on netfilter fwmark mangling and fwmark ip rules, such as: iptables -j MARK --set-mark 17 ip rule add fwmark 17 lookup 100 This patch causes these route lookups to use the fwmark from the received ICMP error when the fwmark_reflect sysctl is enabled. This allows the administrator to make PMTUD work by configuring appropriate fwmark rules to mark the inbound ICMP packets. Black-box tested using user-mode linux by pointing different fwmarks at routing tables egressing on different interfaces, and using iptables mangling to mark packets inbound on each interface with the interface's fwmark. ICMPv4 and ICMPv6 PMTU discovery work as expected when mark reflection is enabled and fail when it is disabled. Signed-off-by: Lorenzo Colitti <lorenzo@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-13 18:35:09 -04:00
Lorenzo Colitti	e110861f86	net: add a sysctl to reflect the fwmark on replies Kernel-originated IP packets that have no user socket associated with them (e.g., ICMP errors and echo replies, TCP RSTs, etc.) are emitted with a mark of zero. Add a sysctl to make them have the same mark as the packet they are replying to. This allows an administrator that wishes to do so to use mark-based routing, firewalling, etc. for these replies by marking the original packets inbound. Tested using user-mode linux: - ICMP/ICMPv6 echo replies and errors. - TCP RST packets (IPv4 and IPv6). Signed-off-by: Lorenzo Colitti <lorenzo@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-13 18:35:08 -04:00
Daniel Lee	3a19ce0eec	tcp: IPv6 support for fastopen server After all the preparatory works, supporting IPv6 in Fast Open is now easy. We pretty much just mirror v4 code. The only difference is how we generate the Fast Open cookie for IPv6 sockets. Since Fast Open cookie is 128 bits and we use AES 128, we use CBC-MAC to encrypt both the source and destination IPv6 addresses since the cookie is a MAC tag. Signed-off-by: Daniel Lee <longinus00@gmail.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Jerry Chu <hkchu@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-13 17:53:03 -04:00
Yuchung Cheng	0a672f7413	tcp: improve fastopen icmp handling If a fast open socket is already accepted by the user, it should be treated like a connected socket to record the ICMP error in sk_softerr, so the user can fetch it. Do that in both tcp_v4_err and tcp_v6_err. Also refactor the sequence window check to improve readability (e.g., there were two local variables named 'req'). Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Daniel Lee <longinus00@gmail.com> Signed-off-by: Jerry Chu <hkchu@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-13 17:53:03 -04:00
Yuchung Cheng	843f4a55e3	tcp: use tcp_v4_send_synack on first SYN-ACK To avoid large code duplication in IPv6, we need to first simplify the complicate SYN-ACK sending code in tcp_v4_conn_request(). To use tcp_v4(6)_send_synack() to send all SYN-ACKs, we need to initialize the mini socket's receive window before trying to create the child socket and/or building the SYN-ACK packet. So we move that initialization from tcp_make_synack() to tcp_v4_conn_request() as a new function tcp_openreq_init_req_rwin(). After this refactoring the SYN-ACK sending code is simpler and easier to implement Fast Open for IPv6. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Daniel Lee <longinus00@gmail.com> Signed-off-by: Jerry Chu <hkchu@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-13 17:53:02 -04:00
Yuchung Cheng	89278c9dc9	tcp: simplify fast open cookie processing Consolidate various cookie checking and generation code to simplify the fast open processing. The main goal is to reduce code duplication in tcp_v4_conn_request() for IPv6 support. Removes two experimental sysctl flags TFO_SERVER_ALWAYS and TFO_SERVER_COOKIE_NOT_CHKD used primarily for developmental debugging purposes. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Daniel Lee <longinus00@gmail.com> Signed-off-by: Jerry Chu <hkchu@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-13 17:53:02 -04:00
Yuchung Cheng	5b7ed0892f	tcp: move fastopen functions to tcp_fastopen.c Move common TFO functions that will be used by both v4 and v6 to tcp_fastopen.c. Create a helper tcp_fastopen_queue_check(). Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Daniel Lee <longinus00@gmail.com> Signed-off-by: Jerry Chu <hkchu@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-13 17:53:02 -04:00
Wilfried Klaebe	7ad24ea4bf	net: get rid of SET_ETHTOOL_OPS net: get rid of SET_ETHTOOL_OPS Dave Miller mentioned he'd like to see SET_ETHTOOL_OPS gone. This does that. Mostly done via coccinelle script: @@ struct ethtool_ops ops; struct net_device dev; @@ - SET_ETHTOOL_OPS(dev, ops); + dev->ethtool_ops = ops; Compile tested only, but I'd seriously wonder if this broke anything. Suggested-by: Dave Miller <davem@davemloft.net> Signed-off-by: Wilfried Klaebe <w-lkml@lebenslange-mailadresse.de> Acked-by: Felipe Balbi <balbi@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-13 17:43:20 -04:00
John W. Linville	3231d65ffe	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless	2014-05-13 15:27:44 -04:00
Mathias Krause	0f49ff0702	net: ptp: mark filter as __initdata sk_unattached_filter_create() will copy the filter's instructions so we don't need to have the master copy hanging around after initialization. Signed-off-by: Mathias Krause <minipli@googlemail.com> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-13 13:17:24 -04:00
David S. Miller	1268e253a8	net: filter: Fix redefinition warnings on x86-64. Do not collide with the x86-64 PTRACE user API namespace. net/core/filter.c:57:0: warning: "R8" redefined [enabled by default] arch/x86/include/uapi/asm/ptrace-abi.h:38:0: note: this is the location of the previous definition Fix by adding a BPF_ prefix to the register macros. Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-13 13:13:33 -04:00
David S. Miller	6262971a8a	Included changes: - properly release neigh_ifinfo in batadv_iv_ogm_process_per_outif() - properly release orig_ifinfo->router when freeing orig_ifinfo - properly release neigh_node objects during periodic check - properly release neigh_info objects when the related hard_iface is free'd These changes are all very important because they fix some reference counting imbalances that lead to the impossibility of releasing the netdev object used by batman-adv on shutdown. The consequence is that such object cannot be destroyed by the networking stack (the refcounter does not reach zero) thus bringing the system in hanging state during a normal reboot operation or a network reconfiguration. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQIcBAABCAAGBQJTbyNlAAoJEEKTMo6mOh1VzeUQAJmcP73MMwdFDGfI+3DUD43Z ziaWzHK1/NAkERIJMYu/Nj9BPhFJ/JgYNoYGd4eZ+0IVzIidBKffpGvZYLKJaBBb kVzDt8sHgm7T+bmJdGK5zBCkCrQ66T1/7jF7evzWCtdmzAj9Ld+cJha6sZ6OLY4v WusFFHH2yQgzOGML52HdM99lIfZJu53sdQtYrMI7FpmObwmoBw1VQsmLsJbbFj0A XbFWYNOtQ0s8JvuHPnHB2gsczMXG6AdDuYdG1douOUryjsdg4AsKVWbPWaSuIyS9 ED6TiNsxtRt3A2YDgKrYmcGWHIc7CR4TE97DpdaB1xOEe/h0JPy8NEXaTiXifVi0 yWXaDZAl0J1gEKxda5foqIJZEScQyqWnAGFIIMVsxWxMpv9V3C+XaMgpgC5yQdoQ hgs6lv8U/w7Qevu4oaU2oq64C5ipyzheLuL+l9Ykwig9brJ9pqvBhEr34VDyyLnK l1VVQP5Y94gsPX2FuBaFgQ6oN3xjAkzFWDVKPtdYhMW7l93ER31KWgyJ53zK0Avk wl/h5Xvep7vgA1pvyiu7Lom47QX2SVY3Xt6vsJ42qrR9bp1sLZ+piZaSBPTSuNmo YySwgku6QlQfCFThh09zjuQ8+zwlq5Enjp+fvy/NtzEhTzK1gmknrQo0QF+Fj1Fj 5yz30/XWjUTn1dtBNeBw =GsPT -----END PGP SIGNATURE----- Merge tag 'batman-adv-fix-for-davem' of git://git.open-mesh.org/linux-merge Included changes: - properly release neigh_ifinfo in batadv_iv_ogm_process_per_outif() - properly release orig_ifinfo->router when freeing orig_ifinfo - properly release neigh_node objects during periodic check - properly release neigh_info objects when the related hard_iface is free'd These changes are all very important because they fix some reference counting imbalances that lead to the impossibility of releasing the netdev object used by batman-adv on shutdown. The consequence is that such object cannot be destroyed by the networking stack (the refcounter does not reach zero) thus bringing the system in hanging state during a normal reboot operation or a network reconfiguration.	2014-05-13 12:53:36 -04:00
Duan Jiong	2176d5d418	neigh: set nud_state to NUD_INCOMPLETE when probing router reachability Since commit 7e98056964("ipv6: router reachability probing"), a router falls into NUD_FAILED will be probed. Now if function rt6_select() selects a router which neighbour state is NUD_FAILED, and at the same time function rt6_probe() changes the neighbour state to NUD_PROBE, then function dst_neigh_output() can directly send packets, but actually the neighbour still is unreachable. If we set nud_state to NUD_INCOMPLETE instead NUD_PROBE, packets will not be sent out until the neihbour is reachable. In addition, because the route should be probes with a single NS, so we must set neigh->probes to neigh_max_probes(), then the neigh timer timeout and function neigh_timer_handler() will not send other NS Messages. Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-13 12:43:05 -04:00
Tejun Heo	6770c64e5c	cgroup: replace cftype->trigger() with cftype->write() cftype->trigger() is pointless. It's trivial to ignore the input buffer from a regular ->write() operation. Convert all ->trigger() users to ->write() and remove ->trigger(). This patch doesn't introduce any visible behavior changes. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz>	2014-05-13 12:16:21 -04:00
Tejun Heo	451af504df	cgroup: replace cftype->write_string() with cftype->write() Convert all cftype->write_string() users to the new cftype->write() which maps directly to kernfs write operation and has full access to kernfs and cgroup contexts. The conversions are mostly mechanical. * @css and @cft are accessed using of_css() and of_cft() accessors respectively instead of being specified as arguments. * Should return @nbytes on success instead of 0. * @buf is not trimmed automatically. Trim if necessary. Note that blkcg and netprio don't need this as the parsers already handle whitespaces. cftype->write_string() has no user left after the conversions and removed. While at it, remove unnecessary local variable @p in cgroup_subtree_control_write() and stale comment about CGROUP_LOCAL_BUFFER_SIZE in cgroup_freezer.c. This patch doesn't introduce any visible behavior changes. v2: netprio was missing from conversion. Converted. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Aristeu Rozanski <arozansk@redhat.com> Acked-by: Vivek Goyal <vgoyal@redhat.com> Acked-by: Li Zefan <lizefan@huawei.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: Neil Horman <nhorman@tuxdriver.com> Cc: "David S. Miller" <davem@davemloft.net>	2014-05-13 12:16:21 -04:00
Felix Fietkau	8c48b50a1a	cfg80211: allow restricting supported dfs regions At the moment, the ath9k/ath10k DFS module only supports detecting ETSI radar patterns. Add a bitmap in the interface combinations, indicating which DFS regions are supported by the detector. If unset, support for all regions is assumed. Signed-off-by: Felix Fietkau <nbd@openwrt.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-05-13 15:50:06 +02:00
Emmanuel Grumbach	c52666aef9	mac80211: fix suspend vs. association race If the association is in progress while we suspend, the stack will be in a messed up state. Clean it before we suspend. This patch completes Johannes's patch: `1a1cb744de` Author: Johannes Berg <johannes.berg@intel.com> mac80211: fix suspend vs. authentication race Cc: <stable@vger.kernel.org> Fixes: `12e7f51702` ("mac80211: cleanup generic suspend/resume procedures") Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-05-13 13:58:16 +02:00
Fabian Frederick	fc68086ce8	net/xfrm/xfrm_output.c: move EXPORT_SYMBOL Fix checkpatch warning: "WARNING: EXPORT_SYMBOL(foo); should immediately follow its function/variable" Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2014-05-13 12:44:28 +02:00
Susant Sahani	c8965932a2	ip6_tunnel: fix potential NULL pointer dereference The function ip6_tnl_validate assumes that the rtnl attribute IFLA_IPTUN_PROTO always be filled . If this attribute is not filled by the userspace application kernel get crashed with NULL pointer dereference. This patch fixes the potential kernel crash when IFLA_IPTUN_PROTO is missing . Signed-off-by: Susant Sahani <susant@redhat.com> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-13 00:27:19 -04:00
Yang Yingliang	b2ce49e737	sch_hhf: fix comparison of qlen and limit When I use the following command, eth0 cannot send any packets. #tc qdisc add dev eth0 root handle 1: hhf limit 1 Because qlen need be smaller than limit, all packets were dropped. Fix this by qlen <= limit. Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-12 14:55:21 -04:00
dingtianhong	f06c7f9f92	vlan: rename __vlan_find_dev_deep() to __vlan_find_dev_deep_rcu() The __vlan_find_dev_deep should always called in RCU, according David's suggestion, rename to __vlan_find_dev_deep_rcu looks more reasonable. Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-12 14:39:13 -04:00
John W. Linville	c5e64d6b70	Merge git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211	2014-05-12 14:12:19 -04:00
WANG Cong	60ff746739	net: rename local_df to ignore_df As suggested by several people, rename local_df to ignore_df, since it means "ignore df bit if it is set". Cc: Maciej Żenczykowski <maze@google.com> Cc: Florian Westphal <fw@strlen.de> Cc: David S. Miller <davem@davemloft.net> Cc: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Maciej Żenczykowski <maze@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-12 14:03:41 -04:00
David S. Miller	5f013c9bc7	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/ethernet/altera/altera_sgdma.c net/netlink/af_netlink.c net/sched/cls_api.c net/sched/sch_api.c The netlink conflict dealt with moving to netlink_capable() and netlink_ns_capable() in the 'net' tree vs. supporting 'tc' operations in non-init namespaces. These were simple transformations from netlink_capable to netlink_ns_capable. The Altera driver conflict was simply code removal overlapping some void pointer cast cleanups in net-next. Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-12 13:19:14 -04:00
Pablo Neira Ayuso	7e9bc10db2	netfilter: nf_tables: fix missing return trace at the end of non-base chain Display "return" for implicit rule at the end of a non-base chain, instead of when popping chain from the stack. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-05-12 16:33:11 +02:00
Pablo Neira Ayuso	f7e7e39b21	netfilter: nf_tables: fix bogus rulenum after goto action After returning from the chain that we just went to with no matchings, we get a bogus rule number in the trace. To fix this, we would need to iterate over the list of remaining rules in the chain to update the rule number counter. Patrick suggested to set this to the maximum value since the default base chain policy is the very last action when the processing the base chain is over. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-05-12 16:33:10 +02:00
Pablo Neira Ayuso	7b9d5ef932	netfilter: nf_tables: fix tracing of the goto action Add missing code to trace goto actions. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-05-12 16:33:08 +02:00
Pablo Neira Ayuso	5467a51221	netfilter: nf_tables: fix goto action This patch fixes a crash when trying to access the counters and the default chain policy from the non-base chain that we have reached via the goto chain. Fix this by falling back on the original base chain after returning from the custom chain. While fixing this, kill the inline function to account chain statistics to improve source code readability. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-05-12 16:32:41 +02:00
Steffen Klassert	6d004d6cc7	vti: Use the tunnel mark for lookup in the error handlers. We need to use the mark we get from the tunnels o_key to lookup the right vti state in the error handlers. This patch ensures that. Fixes: `df3893c1` ("vti: Update the ipv4 side to use it's own receive hook.") Fixes: `fa9ad96d` ("vti6: Update the ipv6 side to use its own receive hook.") Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2014-05-12 09:36:03 +02:00
Mathias Krause	fd71143645	vti6: Don't unregister pernet ops twice on init errors If we fail to register one of the xfrm protocol handlers we will unregister the pernet ops twice on the error exit path. This will probably lead to a kernel panic as the double deregistration leads to a double kfree(). Fix this by removing one of the calls to do it only once. Fixes: `fa9ad96d49` ("vti6: Update the ipv6 side to use its own...") Signed-off-by: Mathias Krause <minipli@googlemail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2014-05-12 07:43:21 +02:00
Duan Jiong	163cd4e817	ipv6: remove parameter rt from fib6_prune_clones() the parameter rt will be assigned to c.arg in function fib6_clean_tree(), but function fib6_prune_clone() doesn't use c.arg, so we can remove it safely. Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-12 01:06:42 -04:00
Alexei Starovoitov	9739eef13c	net: filter: make BPF conversion more readable Introduce BPF helper macros to define instructions (similar to old BPF_STMT/BPF_JUMP macros) Use them while converting classic BPF to internal and in BPF testsuite later. Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-12 00:23:55 -04:00
Simon Wunderlich	709de13f0c	batman-adv: fix removing neigh_ifinfo When an interface is removed separately, all neighbors need to be checked if they have a neigh_ifinfo structure for that particular interface. If that is the case, remove that ifinfo so any references to a hard interface can be freed. This is a regression introduced by `89652331c0` ("batman-adv: split tq information in neigh_node struct") Reported-by: Antonio Quartulli <antonio@open-mesh.com> Signed-off-by: Simon Wunderlich <simon@open-mesh.com> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>	2014-05-11 09:10:58 +02:00
Pablo Neira Ayuso	d088be8042	netfilter: nf_tables: reset rule number counter after jump and goto Otherwise we start incrementing the rule number counter from the previous chain iteration. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-05-10 19:12:04 +02:00
Simon Wunderlich	7b955a9fc1	batman-adv: always run purge_orig_neighbors The current code will not execute batadv_purge_orig_neighbors() when an orig_ifinfo has already been purged. However we need to run it in any case. Fix that. This is a regression introduced by `7351a4822d` ("batman-adv: split out router from orig_node") Signed-off-by: Simon Wunderlich <simon@open-mesh.com> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>	2014-05-10 10:58:58 +02:00
Simon Wunderlich	000c8dff97	batman-adv: fix neigh reference imbalance When an interface is removed from batman-adv, the orig_ifinfo of a orig_node may be removed without releasing the router first. This will prevent the reference for the neighbor pointed at by the orig_ifinfo->router to be released, and this leak may result in reference leaks for the interface used by this neighbor. Fix that. This is a regression introduced by `7351a4822d` ("batman-adv: split out router from orig_node"). Reported-by: Antonio Quartulli <antonio@open-mesh.com> Signed-off-by: Simon Wunderlich <simon@open-mesh.com> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>	2014-05-10 10:58:45 +02:00
Simon Wunderlich	c1e517fbbc	batman-adv: fix neigh_ifinfo imbalance The neigh_ifinfo object must be freed if it has been used in batadv_iv_ogm_process_per_outif(). This is a regression introduced by `89652331c0` ("batman-adv: split tq information in neigh_node struct") Reported-by: Antonio Quartulli <antonio@open-mesh.com> Signed-off-by: Simon Wunderlich <simon@open-mesh.com> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>	2014-05-10 10:58:42 +02:00
Andrzej Kaczmarek	5a134faeef	Bluetooth: Store TX power level for connection This patch adds support to store local TX power level for connection when reply for HCI_Read_Transmit_Power_Level is received. Signed-off-by: Andrzej Kaczmarek <andrzej.kaczmarek@tieto.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-05-09 14:16:42 -07:00
David S. Miller	1448eb5669	Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless John W. Linville says: ==================== pull request: wireless 2014-05-08 This one is all from Johannes: "Here are a few small fixes for the current cycle: radiotap TX flags were wrong (fix by Bob), Chun-Yeow fixes an SMPS issue with mesh interfaces, Eliad fixes a locking bug and a cfg80211 state problem and finally Henning sent me a fix for IBSS rate information." Please let me know if there are problems! ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-09 16:46:53 -04:00
wangweidong	f66138c847	sctp: add a checking for sctp_sysctl_net_register When register_net_sysctl failed, we should free the sysctl_table. Signed-off-by: Wang Weidong <wangweidong1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-09 16:41:09 -04:00
wangweidong	eb9f37053d	Revert "sctp: optimize the sctp_sysctl_net_register" This revert commit efb842c45("sctp: optimize the sctp_sysctl_net_register"), Since it doesn't kmemdup a sysctl_table for init_net, so the init_net->sctp.sysctl_header->ctl_table_arg points to sctp_net_table which is a static array pointer. So when doing sctp_sysctl_net_unregister, it will free sctp_net_table, then we will get a NULL pointer dereference like that: [ 262.948220] BUG: unable to handle kernel NULL pointer dereference at 000000000000006c [ 262.948232] IP: [<ffffffff81144b70>] kfree+0x80/0x420 [ 262.948260] PGD db80a067 PUD dae12067 PMD 0 [ 262.948268] Oops: 0000 [#1] SMP [ 262.948273] Modules linked in: sctp(-) crc32c_generic libcrc32c ... [ 262.948338] task: ffff8800db830190 ti: ffff8800dad00000 task.ti: ffff8800dad00000 [ 262.948344] RIP: 0010:[<ffffffff81144b70>] [<ffffffff81144b70>] kfree+0x80/0x420 [ 262.948353] RSP: 0018:ffff8800dad01d88 EFLAGS: 00010046 [ 262.948358] RAX: 0100000000000000 RBX: ffffffffa0227940 RCX: ffffea0000707888 [ 262.948363] RDX: ffffea0000707888 RSI: 0000000000000001 RDI: ffffffffa0227940 [ 262.948369] RBP: ffff8800dad01de8 R08: 0000000000000000 R09: ffff8800d9e983a9 [ 262.948374] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffffa0227940 [ 262.948380] R13: ffffffff8187cfc0 R14: 0000000000000000 R15: ffffffff8187da10 [ 262.948386] FS: 00007fa2a2658700(0000) GS:ffff880112800000(0000) knlGS:0000000000000000 [ 262.948394] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 262.948400] CR2: 000000000000006c CR3: 00000000cddc0000 CR4: 00000000000006e0 [ 262.948410] Stack: [ 262.948413] ffff8800dad01da8 0000000000000286 0000000020227940 ffffffffa0227940 [ 262.948422] ffff8800dad01dd8 ffffffff811b7fa1 ffffffffa0227940 ffffffffa0227940 [ 262.948431] ffffffff8187d960 ffffffff8187cfc0 ffffffff8187d960 ffffffff8187da10 [ 262.948440] Call Trace: [ 262.948457] [<ffffffff811b7fa1>] ? unregister_sysctl_table+0x51/0xa0 [ 262.948476] [<ffffffffa020d1a1>] sctp_sysctl_net_unregister+0x21/0x30 [sctp] [ 262.948490] [<ffffffffa020ef6d>] sctp_net_exit+0x12d/0x150 [sctp] [ 262.948512] [<ffffffff81394f49>] ops_exit_list+0x39/0x60 [ 262.948522] [<ffffffff813951ed>] unregister_pernet_operations+0x3d/0x70 [ 262.948530] [<ffffffff81395292>] unregister_pernet_subsys+0x22/0x40 [ 262.948544] [<ffffffffa020efcc>] sctp_exit+0x3c/0x12d [sctp] [ 262.948562] [<ffffffff810c5e04>] SyS_delete_module+0x194/0x210 [ 262.948577] [<ffffffff81240fde>] ? trace_hardirqs_on_thunk+0x3a/0x3f [ 262.948587] [<ffffffff815217a2>] system_call_fastpath+0x16/0x1b With this revert, it won't occur the Oops. Signed-off-by: Wang Weidong <wangweidong1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-09 16:41:08 -04:00
wangweidong	be7faf7168	rds: remove the unneed NULL checking unregister_net_sysctl_table will check the ctl_table_header, so remove the unneed checking Signed-off-by: Wang Weidong <wangweidong1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-09 15:59:45 -04:00
David S. Miller	b3d4056632	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following batch contains netfilter fixes for your net tree, they are: 1) Fix use after free in nfnetlink when sending a batch for some unsupported subsystem, from Denys Fedoryshchenko. 2) Skip autoload of the nat module if no binding is specified via ctnetlink, from Florian Westphal. 3) Set local_df after netfilter defragmentation to avoid a bogus ICMP fragmentation needed in the forwarding path, also from Florian. 4) Fix potential user after free in ip6_route_me_harder() when returning the error code to the upper layers, from Sergey Popovich. 5) Skip possible bogus ICMP time exceeded emitted from the router (not valid according to RFC) if conntrack zones are used, from Vasily Averin. 6) Fix fragment handling when nf_defrag_ipv4 is loaded but nf_conntrack is not present, also from Vasily. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-09 13:17:30 -04:00
Eliad Peller	f9ac71bfcc	mac80211: fix vif name tracing If sdata doesn't have a valid dev (e.g. in case of monitor vif), the vif_name field was initialized with (a length of) some short string, but later was set to a different, potentially larger one. This resulted in out-of-bounds write, which usually appeared as garbage in the trace log. Simply trace sdata->name, as it should always have the correct name for both cases. Signed-off-by: Eliad Peller <eliadx.peller@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-05-09 14:35:40 +02:00
Marcel Holtmann	b75cf9cd16	Bluetooth: Increment management interface revision This patch increments the management interface revision due to the changes with the Device Found management event and other fixes. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2014-05-09 15:05:57 +03:00
Johannes Berg	f6837ba8c9	mac80211: handle failed restart/resume better When the driver fails during HW restart or resume, the whole stack goes into a very confused state with interfaces being up while the hardware is down etc. Address this by shutting down everything; we'll run into a lot of warnings in the process but that's better than having the whole stack get messed up. Reviewed-by: Arik Nemtsov <arik@wizery.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-05-09 12:21:34 +02:00
Johannes Berg	4a817aa78f	mac80211: allow VHT with peers not capable of 40MHz There are two (related) issues with this. One case, reported by Michal, is related to hostap: it unsets the 20/40 capability bit for stations that associate when it's in 20 MHz mode. The other case, reported by Eyal, is that some APs like Netgear R6300v2 and probably others based on the BCM4360 chipset can be configured for doing VHT at 20Mhz. In this case the beacon has a VHT IE but the HT cap indicates transmitter only support 20Mhz. In both of these cases, we currently avoid VHT and use only HT this means we can't use the highest rates (MCS8), so fixing this leads to throughput improvements. Reported-by: Michal Kazior <michal.kazior@tieto.com> Reported-by: Eyal Shapira <eyal@wizery.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-05-09 09:56:53 +02:00
Ying Xue	ca9cf06a06	tipc: don't directly overwrite node action_flags Each node action flag should be set or cleared separately, instead we now set the whole flags variable in one shot, and it's turned out to be hard to see which other flags are affected. Therefore, for instance, we explicitly clear TIPC_WAIT_OWN_LINKS_DOWN bit in node_lost_contact(). Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-09 01:41:01 -04:00
Ying Xue	aecb9bb89c	tipc: rename enum names of node flags Rename node flags to action_flags as well as its enum names so that they can reflect its real meanings. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-09 01:41:01 -04:00
Tom Herbert	58d6085c14	l2tp: Remove UDP checksum verification Validating the UDP checksum is now done in UDP before handing packets to the encapsulation layer. Note that this also eliminates the "feature" where L2TP can ignore a non-zero UDP checksum (doing this was contrary to RFC 1122). Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-08 23:47:50 -04:00
Tom Herbert	0a80966b10	net: Verify UDP checksum before handoff to encap Moving validation of UDP checksum to be done in UDP not encap layer. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-08 23:47:50 -04:00
Tom Herbert	39471ac8dd	icmp6: Call skb_checksum_validate Use skb_checksum_validate to verify checksum. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-08 23:47:50 -04:00
Tom Herbert	29a96e1f36	icmp: Call skb_checksum_simple_validate Use skb_checksum_simple_validate to verify checksum. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-08 23:47:50 -04:00
Tom Herbert	de08dc1a8e	igmp: Call skb_checksum_simple_validate Use skb_checksum_simple_validate to verify checksum. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-08 23:47:50 -04:00
Tom Herbert	81249bea1f	gre6: Call skb_checksum_simple_validate Use skb_checksum_simple_validate to verify checksum. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-08 23:47:50 -04:00
Tom Herbert	b1036c6a47	gre: Call skb_checksum_simple_validate Use skb_checksum_simple_validate to verify checksum. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-08 23:47:50 -04:00
WANG Cong	32a4be4890	ipv4: remove inet_addr_hash_lock in devinet.c All the callers hold RTNL lock, so there is no need to use inet_addr_hash_lock to protect the hash list. Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-08 22:56:56 -04:00
Cong Wang	ba6b918ab2	ping: move ping_group_range out of CONFIG_SYSCTL Similarly, when CONFIG_SYSCTL is not set, ping_group_range should still work, just that no one can change it. Therefore we should move it out of sysctl_net_ipv4.c. And, it should not share the same seqlock with ip_local_port_range. BTW, rename it to ->ping_group_range instead. Cc: David S. Miller <davem@davemloft.net> Cc: Francois Romieu <romieu@fr.zoreil.com> Reported-by: Stefan de Konink <stefan@konink.de> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-08 22:50:47 -04:00
Cong Wang	c9d8f1a642	ipv4: move local_port_range out of CONFIG_SYSCTL When CONFIG_SYSCTL is not set, ip_local_port_range should still work, just that no one can change it. Therefore we should move it out of sysctl_inet.c. Also, rename it to ->ip_local_ports instead. Cc: David S. Miller <davem@davemloft.net> Cc: Francois Romieu <romieu@fr.zoreil.com> Reported-by: Stefan de Konink <stefan@konink.de> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-08 22:50:47 -04:00
Sergey Popovich	a8951d5814	netfilter: Fix potential use after free in ip6_route_me_harder() Dst is released one line before we access it again with dst->error. Fixes: `58e35d1471` netfilter: ipv6: propagate routing errors from ip6_route_me_harder() Signed-off-by: Sergey Popovich <popovich_sergei@mail.ru> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-05-09 02:36:39 +02:00
Kinglong Mee	ecca063b31	SUNRPC: Fix printk that is not only for nfsd Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-05-08 14:59:51 -04:00
John W. Linville	6153871f77	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem	2014-05-08 11:13:41 -04:00
Andrzej Kaczmarek	5ae76a9415	Bluetooth: Store RSSI for connection This patch adds support to store RSSI for connection when reply for HCI_Read_RSSI is received. Signed-off-by: Andrzej Kaczmarek <andrzej.kaczmarek@tieto.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-05-08 08:01:57 -07:00
Johan Hedberg	38e4a91566	Bluetooth: Add support for SMP Invalid Parameters error code The Invalid Parameters error code is used to indicate that the command length is invalid or that a parameter is outside of the specified range. This error code wasn't clearly specified in the Bluetooth 4.0 specification but since 4.1 this has been fixed. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-05-08 07:50:02 -07:00
Luciano Coelho	f29f58a9e5	mac80211: fix sparse warning caused by __ieee80211_channel_switch() Commit `59af6928` (mac80211: fix CSA tx queue stopping) introduced a sparse warning: net/mac80211/cfg.c:3274:5: warning: symbol '__ieee80211_channel_switch' was not declared. Should it be static? Fix it by declaring the function static. Signed-off-by: Luciano Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-05-08 12:00:12 +02:00
Michal Kazior	cf8767dd76	mac80211: disconnect iface if CSA unexpectedly fails It doesn't make much sense to leave a crippled interface running. As a side effect this will unblock tx queues with CSA reason immediately after failure instead of until after userspace requests interface to stop. This also gives userspace an opportunity to indirectly see CSA failure. Signed-off-by: Michal Kazior <michal.kazior@tieto.com> [small code cleanup] Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-05-08 11:57:27 +02:00
Sergey Popovich	aeefa1ecfc	ipv4: fib_semantics: increment fib_info_cnt after fib_info allocation Increment fib_info_cnt in fib_create_info() right after successfuly alllocating fib_info structure, overwise fib_metrics allocation failure leads to fib_info_cnt incorrectly decremented in free_fib_info(), called on error path from fib_create_info(). Signed-off-by: Sergey Popovich <popovich_sergei@mail.ru> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-07 17:14:32 -04:00
WANG Cong	698365fa18	net: clean up snmp stats code commit `8f0ea0fe3a` (snmp: reduce percpu needs by 50%) reduced snmp array size to 1, so technically it doesn't have to be an array any more. What's more, after the following commit: commit `933393f58f` Date: Thu Dec 22 11:58:51 2011 -0600 percpu: Remove irqsafe_cpu_xxx variants We simply say that regular this_cpu use must be safe regardless of preemption and interrupt state. That has no material change for x86 and s390 implementations of this_cpu operations. However, arches that do not provide their own implementation for this_cpu operations will now get code generated that disables interrupts instead of preemption. probably no arch wants to have SNMP_ARRAY_SZ == 2. At least after almost 3 years, no one complains. So, just convert the array to a single pointer and remove snmp_mib_init() and snmp_mib_free() as well. Cc: Christoph Lameter <cl@linux.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-07 16:06:05 -04:00
Florian Westphal	c1e756bfcb	Revert "net: core: introduce netif_skb_dev_features" This reverts commit `d206940319`, there are no more callers. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-07 15:49:07 -04:00
Florian Westphal	c7ba65d7b6	net: ip: push gso skb forwarding handling down the stack Doing the segmentation in the forward path has one major drawback: When using virtio, we may process gso udp packets coming from host network stack. In that case, netfilter POSTROUTING will see one packet with udp header followed by multiple ip fragments. Delay the segmentation and do it after POSTROUTING invocation to avoid this. Fixes: `fe6cc55f3a` ("net: ip, ipv6: handle gso skbs in forwarding path") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-07 15:49:07 -04:00
Florian Westphal	418a31561d	net: ipv6: send pkttoobig immediately if orig frag size > mtu If conntrack defragments incoming ipv6 frags it stores largest original frag size in ip6cb and sets ->local_df. We must thus first test the largest original frag size vs. mtu, and not vice versa. Without this patch PKTTOOBIG is still generated in ip6_fragment() later in the stack, but 1) IPSTATS_MIB_INTOOBIGERRORS won't increment 2) packet did (needlessly) traverse netfilter postrouting hook. Fixes: `fe6cc55f3a` ("net: ip, ipv6: handle gso skbs in forwarding path") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-07 15:27:59 -04:00
Florian Westphal	ca6c5d4ad2	net: ipv4: ip_forward: fix inverted local_df test local_df means 'ignore DF bit if set', so if its set we're allowed to perform ip fragmentation. This wasn't noticed earlier because the output path also drops such skbs (and emits needed icmp error) and because netfilter ip defrag did not set local_df until couple of days ago. Only difference is that DF-packets-larger-than MTU now discarded earlier (f.e. we avoid pointless netfilter postrouting trip). While at it, drop the repeated test ip_exceeds_mtu, checking it once is enough... Fixes: `fe6cc55f3a` ("net: ip, ipv6: handle gso skbs in forwarding path") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-07 15:26:09 -04:00
Loic Poulain	771bb3ddc2	rfkill-gpio: Use gpio cansleep version If gpio controller requires waiting for read and write GPIO values, then we have to use the gpio cansleep api. Fix the rfkill_gpio_set_power which calls only the nonsleep version (causing kernel warning). There is no problem to use the cansleep version here because we are not in IRQ handler or similar context (cf rfkill_set_block). Signed-off-by: Loic Poulain <loic.poulain@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-05-07 13:20:59 +02:00
Michal Kazior	e5593f56eb	mac80211: ignore cqm during csa It is not guaranteed that multi-vif channel switching is tightly synchronized. It makes sense to ignore cqm (missing beacons, et al) while csa is progressing and re-check it after it completes. Signed-off-by: Michal Kazior <michal.kazior@tieto.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-05-07 11:04:51 +02:00
Al Viro	2b777c9dd9	ceph_sync_read: stop poking into iov_iter guts Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-05-06 17:39:42 -04:00
John W. Linville	68b422db3d	Merge branch 'rfkill-gpio-cleanups' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next	2014-05-06 14:43:34 -04:00
John W. Linville	cabae81103	Merge git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211	2014-05-06 14:05:51 -04:00
Michal Kazior	f04c22033c	cfg80211: export interface stopping function This exports a new cfg80211_stop_iface() function. This is intended for driver internal interface combination management and channel switching. Due to locking issues (it re-enters driver) the call is asynchronous and uses cfg80211 event list/worker. Signed-off-by: Michal Kazior <michal.kazior@tieto.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-05-06 15:16:34 +02:00
Michal Kazior	66199506fb	mac80211: split CSA finalize function Improves readability and modularity. Signed-off-by: Michal Kazior <michal.kazior@tieto.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-05-06 15:10:00 +02:00
Michal Kazior	59af6928d2	mac80211: fix CSA tx queue stopping It was possible for tx queues to be stuck stopped if AP CSA finalization failed. In that case neither stop_ap nor do_stop woke the queues up. This means it was impossible to perform tx at all until driver was reloaded or a successful CSA was performed later. It was possible to solve this in a simpler manner however this is more robust and future proof (having multi-vif CSA in mind). New sdata->csa_block_tx is introduced to keep track of which interfaces requested tx to be blocked for CSA. This is required because mac80211 stops all tx queues for that purpose. This means queues must be awoken only when last tx-blocking CSA interface is finished. It is still possible to have tx queues stopped after CSA failure but as soon as offending interfaces are stopped from userspace (stop_ap or ifdown) tx queues are woken up properly. Signed-off-by: Michal Kazior <michal.kazior@tieto.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-05-06 15:10:00 +02:00
Steffen Klassert	edb666f07e	xfrm6: Properly handle unsupported protocols We don't catch the case if an unsupported protocol is submitted to the xfrm6 protocol handlers, this can lead to NULL pointer dereferences. Fix this by adding the appropriate checks. Fixes: `7e14ea15` ("xfrm6: Add IPsec protocol multiplexer") Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2014-05-06 07:08:38 +02:00
Tom Herbert	79e0f1c9f2	ipv6: Need to sock_put on csum error Commit `4068579e1e` ("net: Implmement RFC 6936 (zero RX csums for UDP/IPv6)") introduced zero checksums being allowed for IPv6, but in the case that a socket disallows a zero checksum on RX we need to sock_put. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-05 23:17:16 -04:00
Libor Pechacek	86aae6c7b5	Bluetooth: Convert RFCOMM spinlocks into mutexes Enabling CONFIG_DEBUG_ATOMIC_SLEEP has shown that some rfcomm functions acquiring spinlocks call sleeping locks further in the chain. Converting the offending spinlocks into mutexes makes sleeping safe. Signed-off-by: Libor Pechacek <lpechacek@suse.cz> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-05-05 19:25:06 -07:00
Linus Torvalds	2080cee435	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Pull networking fixes from David Miller: 1) e1000e computes header length incorrectly wrt vlans, fix from Vlad Yasevich. 2) ns_capable() check in sock_diag netlink code, from Andrew Lutomirski. 3) Fix invalid queue pairs handling in virtio_net, from Amos Kong. 4) Checksum offloading busted in sxgbe driver due to incorrect descriptor layout, fix from Byungho An. 5) Fix build failure with SMC_DEBUG set to 2 or larger, from Zi Shen Lim. 6) Fix uninitialized A and X registers in BPF interpreter, from Alexei Starovoitov. 7) Fix arch dependencies of candence driver. 8) Fix netlink capabilities checking tree-wide, from Eric W Biederman. 9) Don't dump IFLA_VF_PORTS if netlink request didn't ask for it in IFLA_EXT_MASK, from David Gibson. 10) IPV6 FIB dump restart doesn't handle table changes that happen meanwhile, causing the code to loop forever or emit dups, fix from Kumar Sandararajan. 11) Memory leak on VF removal in bnx2x, from Yuval Mintz. 12) Bug fixes for new Altera TSE driver from Vince Bridgers. 13) Fix route lookup key in SCTP, from Xugeng Zhang. 14) Use BH blocking spinlocks in SLIP, as per a similar fix to CAN/SLCAN driver. From Oliver Hartkopp. 15) TCP doesn't bump retransmit counters in some code paths, fix from Eric Dumazet. 16) Clamp delayed_ack in tcp_cubic to prevent theoretical divides by zero. Fix from Liu Yu. 17) Fix locking imbalance in error paths of HHF packet scheduler, from John Fastabend. 18) Properly reference the transport module when vsock_core_init() runs, from Andy King. 19) Fix buffer overflow in cdc_ncm driver, from Bjørn Mork. 20) IP_ECN_decapsulate() doesn't see a correct SKB network header in ip_tunnel_rcv(), fix from Ying Cai. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (132 commits) net: macb: Fix race between HW and driver net: macb: Remove 'unlikely' optimization net: macb: Re-enable RX interrupt only when RX is done net: macb: Clear interrupt flags net: macb: Pass same size to DMA_UNMAP as used for DMA_MAP ip_tunnel: Set network header properly for IP_ECN_decapsulate() e1000e: Restrict MDIO Slow Mode workaround to relevant parts e1000e: Fix issue with link flap on 82579 e1000e: Expand workaround for 10Mb HD throughput bug e1000e: Workaround for dropped packets in Gig/100 speeds on 82579 net/mlx4_core: Don't issue PCIe speed/width checks for VFs net/mlx4_core: Load the Eth driver first net/mlx4_core: Fix slave id computation for single port VF net/mlx4_core: Adjust port number in qp_attach wrapper when detaching net: cdc_ncm: fix buffer overflow Altera TSE: ALTERA_TSE should depend on HAS_DMA vsock: Make transport the proto owner net: sched: lock imbalance in hhf qdisc net: mvmdio: Check for a valid interrupt instead of an error net phy: Check for aneg completion before setting state to PHY_RUNNING ...	2014-05-05 15:59:46 -07:00
Linus Torvalds	5575eeb7b9	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull Ceph fixes from Sage Weil: "First, there is a critical fix for the new primary-affinity function that went into -rc1. The second batch of patches from Zheng fix a range of problems with directory fragmentation, readdir, and a few odds and ends for cephfs" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: ceph: reserve caps for file layout/lock MDS requests ceph: avoid releasing caps that are being used ceph: clear directory's completeness when creating file libceph: fix non-default values check in apply_primary_affinity() ceph: use fpos_cmp() to compare dentry positions ceph: check directory's completeness before emitting directory entry	2014-05-05 15:17:02 -07:00
Ying Xue	52ff872055	tipc: purge signal handler infrastructure In the previous commits of this series, we removed all asynchronous actions which were based on the tasklet handler - "tipc_k_signal()". So the moment has now come when we can completely remove the tasklet handler infrastructure. That is done with this commit. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-05 17:26:45 -04:00
Ying Xue	3f5a12bd9f	tipc: avoid to asynchronously reset all links Postpone the actions of resetting all links until after bclink lock is released, avoiding to asynchronously reset all links. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-05 17:26:45 -04:00
Ying Xue	eb8b00f5f2	tipc: convert allocations of global variables associated with bclink Convert allocations of global variables associated with bclink from static way to dynamical way for the convenience of bclink instance initialisation. Meanwhile, this also helps TIPC support name space in the future easily. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-05 17:26:45 -04:00
Ying Xue	d69afc90b8	tipc: define new functions to operate bc_lock As we are going to do more jobs when bc_lock is released, the two operations of holding/releasing the lock should be encapsulated with functions. In addition, we move bc_lock spin lock into tipc_bclink structure avoiding to define the global variable. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-05 17:26:44 -04:00
Ying Xue	ca0c42732c	tipc: avoid to asynchronously deliver name tables to peer node Postpone the actions of delivering name tables until after node lock is released, avoiding to do it under asynchronous context. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-05 17:26:44 -04:00
Ying Xue	9d56194968	tipc: remove TIPC_NAMES_GONE node flag Since previously what all publications pertaining to the lost node were removed from name table was finished in tasklet context asynchronously, we need to TIPC_NAMES_GONE flag indicating whether the node cleanup work is finished or not. But now as the cleanup work has been finished when node lock is released, the flag becomes meaningless for us. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-05 17:26:44 -04:00
Ying Xue	9db9fdd198	tipc: avoid to asynchronously notify subscriptions Postpone the actions of notifying subscriptions until after node lock is released, avoiding to asynchronously execute registered handlers when node is lost. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-05 17:26:44 -04:00
Ying Xue	10f465c496	tipc: rename setup_blocked variable of node struct to flags Rename setup_blocked variable of node struct to a more common name called "flags", which will be used to represent kinds of node states. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-05 17:26:44 -04:00
Ying Xue	486f930ac5	tipc: adjust order of variables in tipc_node structure Move more frequently used variables up to the head of tipc_node structure, hopefully improving a bit performance. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-05 17:26:44 -04:00
Ying Xue	5356f3d7d4	tipc: always use tipc_node_lock() to hold node lock Although we obtain node lock with tipc_node_lock() in most time, there are still places where we directly use native spin lock interface to grab node lock. But as we will do more jobs in the future when node lock is released, we should ensure that tipc_node_lock() is always called when node lock is taken. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-05 17:26:43 -04:00
Ying Cai	e96f2e7c43	ip_tunnel: Set network header properly for IP_ECN_decapsulate() In ip_tunnel_rcv(), set skb->network_header to inner IP header before IP_ECN_decapsulate(). Without the fix, IP_ECN_decapsulate() takes outer IP header as inner IP header, possibly causing error messages or packet drops. Note that this skb_reset_network_header() call was in this spot when the original feature for checking consistency of ECN bits through tunnels was added in `eccc1bb8d4` ("tunnel: drop packet if ECN present with not-ECT"). It was only removed from this spot in `3d7b46cd20` ("ip_tunnel: push generic protocol handling to ip_tunnel module."). Fixes: `3d7b46cd20` ("ip_tunnel: push generic protocol handling to ip_tunnel module.") Reported-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Ying Cai <ycai@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-05 16:32:17 -04:00
WANG Cong	07c8e35a38	ipv6: remove unused function ipv6_inherit_linklocal() It is no longer used after commit `e837735ec4` (ip6_tunnel: ensure to always have a link local address). Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-05 15:30:20 -04:00
Tom Herbert	4068579e1e	net: Implmement RFC 6936 (zero RX csums for UDP/IPv6) RFC 6936 relaxes the requirement of RFC 2460 that UDP/IPv6 packets which are received with a zero UDP checksum value must be dropped. RFC 6936 allows zero checksums to support tunnels over UDP. When sk_no_check is set we allow on a socket we allow a zero IPv6 UDP checksum. This is for both sending zero checksum and accepting a zero checksum on receive. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-05 15:26:30 -04:00
Tom Herbert	e4f45b7f40	net: Call skb_checksum_init in IPv6 Call skb_checksum_init instead of private functions. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-05 15:26:30 -04:00
Tom Herbert	ed70fcfcee	net: Call skb_checksum_init in IPv4 Call skb_checksum_init instead of private functions. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-05 15:26:30 -04:00
David S. Miller	2ad0649687	Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next John W. Linville says: ==================== pull request: wireless-next 2014-05-02 Please pull this batch of updates intended for the 3.16 stream... For the mac80211 bits, Johannes says: "In this round we have a large number of small features and improvements from people too numerous to list here. The only really bit thing is Michał and Luca's CSA work (including changing how interface combination verification is done)." For the Bluetooth bits, Gustavo says: "Here goes some patches for the -next release. There is nothing really special for this pull request, just a bunch of refactors, fixes and clean ups." For the ath10k/ath6kl bits, Kalle says: "For ath6kl Kalle fixed a bunch of checkpatch warnings. In ath10k we had more changes, major ones being: * fix memory allocation failures after a firmware crash (Michal) * some rework of DFS configuration to enable it correctly in all cases (Michal) * add a new firmware crash option to make it possible to crash 10.1 firmware for testing purposes (Marek P) * fix RTS/CTS protection in certain cases (Marek K) * fix wrong RSSI and rate reporting in some cases (Janusz) * fix firmware stats reporting (Chun, Ben & Bartosz)" For the iwlwifi bits, Emmanuel says: "I have here a bunch of unrelated things. I disabled support for -7.ucode which means that I can removed a lot of code. Eliad has a brand new feature: we reduce the Tx power when the link allows - this reduces our power consumption. The regular changes in power and scan area. One interesting thing though is the patches from Johannes, we have now GRO which allows to increase our throughput in TCP Rx. The main advantage is that it reduces the number of TCP Acks - these TCP Acks are completely useless when we are using A-MPDU since the first packet of the A-MPDU generates a TCP Ack which is made obsolete by the next packets." Along with that, there are a variety of updates to b43, mwifiex, rtl8180 and wil6210 drivers and a handful of other updates here and there. Please let me know if there are problems! ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-05 13:36:26 -04:00
Andy King	2c4a336e0a	vsock: Make transport the proto owner Right now the core vsock module is the owner of the proto family. This means there's nothing preventing the transport module from unloading if there are open sockets, which results in a panic. Fix that by allowing the transport to be the owner, which will refcount it properly. Includes version bump to 1.0.1.0-k Passes checkpatch this time, I swear... Acked-by: Dmitry Torokhov <dtor@vmware.com> Signed-off-by: Andy King <acking@vmware.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-05 13:13:50 -04:00
Roopa Prabhu	56bfa7ee7c	unregister_netdevice : move RTM_DELLINK to until after ndo_uninit This patch fixes ordering of rtnl notifications during unregister_netdevice by moving RTM_DELLINK notification to until after ndo_uninit. The problem was seen with unregistering bond netdevices. bond ndo_uninit callback generates a few RTM_NEWLINK notifications for NETDEV_CHANGEADDR and NETDEV_FEAT_CHANGE. This is seen mostly when the bond is deleted with slaves still enslaved to the bond. During unregister netdevice (rollback_registered_many to be specific) bond ndo_uninit is called after RTM_DELLINK notification goes out. This results in userspace seeing RTM_DELLINK followed by a couple of RTM_NEWLINK's. In userspace problem was seen with libnl. libnl cache deletes the bond when it sees RTM_DELLINK and re-adds the bond with the following RTM_NEWLINK. Resulting in a stale bond entry in libnl cache when the kernel has already deleted the bond. This patch has been tested for bond, bridges and vlan devices. Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-05 13:11:36 -04:00
David S. Miller	b8dff4e60c	Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless John W. Linville says: ==================== pull request: wireless 2014-05-01 Please pull the following batch of fixes intended for the 3.15 stream! For the Bluetooth bits, Gustavo says: "Some fixes for 3.15. There is a revert for the intel driver, a new device id, and two important SSP fixes from Johan." On top of that... Ben Hutchings gives us a fix for an unbalanced irq enable in an rtl8192cu error path. Colin Ian King provides an rtlwifi fix for an uninitialized variable. Felix Fietkau brings a pair of ath9k fixes, one that corrects a hardware initialization value and another that removes an (unnecessary) flag that was being used in a way that led to a software tx queue hang in ath9k. Gertjan van Wingerde pushes a MAINTAINERS change to remove himself from the rt2x00 maintainer team. Hans de Goede fixes a brcmfmac firmware load hang. Larry Finger changes rtlwifi to use the correct queue for V0 traffic on rtl8192se. Rajkumar Manoharan corrects a race in ath9k driver initialization. Stanislaw Gruszka fixes an rt2x00 bug in which disabling beaconing once on USB devices led to permanently disabling beaconing for those devices. Tim Harvey provides fixes for a pair of ath9k issues that can lead to soft lockups in that driver. Please let me know if there are problems! ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-05 13:06:01 -04:00
Vasily Averin	aff09ce303	bridge: superfluous skb->nfct check in br_nf_dev_queue_xmit Currently bridge can silently drop ipv4 fragments. If node have loaded nf_defrag_ipv4 module but have no nf_conntrack_ipv4, br_nf_pre_routing defragments incoming ipv4 fragments but nfct check in br_nf_dev_queue_xmit does not allow re-fragment combined packet back, and therefore it is dropped in br_dev_queue_push_xmit without incrementing of any failcounters It seems the only way to hit the ip_fragment code in the bridge xmit path is to have a fragment list whose reassembled fragments go over the mtu. This only happens if nf_defrag is enabled. Thanks to Florian Westphal for providing feedback to clarify this. Defragmentation ipv4 is required not only in conntracks but at least in TPROXY target and socket match, therefore #ifdef is changed from NF_CONNTRACK_IPV4 to NF_DEFRAG_IPV4 Signed-off-by: Vasily Averin <vvs@openvz.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-05-05 16:05:43 +02:00
Johannes Berg	33926eb778	mac80211: mark local variable __maybe_unused The 'local' variable in __ieee80211_vif_copy_chanctx_to_vlans() is only used/needed when lockdep is compiled in, mark it as such to avoid compile warnings in the other case. While at it, fix some indentation where it's used. Reviewed-by: Luciano Coelho <luciano.coelho@intel.com> Reviewed-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-05-05 16:03:42 +02:00
Vasily Averin	7c3d5ab1f3	ipv4: fix "conntrack zones" support for defrag user check in ip_expire Defrag user check in ip_expire was not updated after adding support for "conntrack zones". This bug manifests as a RFC violation, since the router will send the icmp time exceeeded message when using conntrack zones. Signed-off-by: Vasily Averin <vvs@openvz.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-05-05 16:02:59 +02:00
Arik Nemtsov	95224fe83e	mac80211: move TDLS code to another file With new additions planned, this code is getting too big for cfg.c. Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-05-05 15:56:15 +02:00
Arik Nemtsov	0c4972ccaa	mac80211: set an external flag for TDLS stations Expose a new tdls flag for the public ieee80211_sta struct. This can be used in some rate control decisions. Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-05-05 15:56:02 +02:00
Eliad Peller	e669ba2d06	mac80211: fix nested rtnl locking on ieee80211_reconfig ieee80211_reconfig already holds rtnl, so calling cfg80211_sched_scan_stopped results in deadlock. Use the rtnl-version of this function instead. Fixes: `d43c6b6` ("mac80211: reschedule sched scan after HW restart") Cc: stable@vger.kernel.org (3.14+) Signed-off-by: Eliad Peller <eliadx.peller@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-05-05 15:14:58 +02:00
Eliad Peller	792e6aa7a1	cfg80211: add cfg80211_sched_scan_stopped_rtnl Add locked-version for cfg80211_sched_scan_stopped. This is used for some users that might want to call it when rtnl is already locked. Fixes: `d43c6b6` ("mac80211: reschedule sched scan after HW restart") Cc: stable@vger.kernel.org (3.14+) Signed-off-by: Eliad Peller <eliadx.peller@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-05-05 15:14:57 +02:00
Eliad Peller	c1fbb25884	cfg80211: free sme on connection failures cfg80211 is notified about connection failures by __cfg80211_connect_result() call. However, this function currently does not free cfg80211 sme. This results in hanging connection attempts in some cases e.g. when mac80211 authentication attempt is denied, we have this function call: ieee80211_rx_mgmt_auth() -> cfg80211_rx_mlme_mgmt() -> cfg80211_process_auth() -> cfg80211_sme_rx_auth() -> __cfg80211_connect_result() but cfg80211_sme_free() is never get called. Fixes: `ceca7b712` ("cfg80211: separate internal SME implementation") Cc: stable@vger.kernel.org (3.10+) Signed-off-by: Eliad Peller <eliadx.peller@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-05-05 14:59:00 +02:00
Henning Rogge	f4ebddf9ab	mac80211: Fix mac80211 station info rx bitrate for IBSS mode Filter out incoming multicast packages before applying their bitrate to the rx bitrate station info field to prevent them from setting the rx bitrate to the basic multicast rate. Signed-off-by: Henning Rogge <hrogge@gmail.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-05-05 14:52:03 +02:00
Daniel Borkmann	eb9672f4a1	net: filter: misc/various cleanups This contains only some minor misc cleanpus. We can spare us the extra variable declaration in __skb_get_pay_offset(), the cast in __get_random_u32() is rather unnecessary and in __sk_migrate_realloc() we can remove the memcpy() and do a direct assignment of the structs. Latter was suggested by Fengguang Wu found with coccinelle. Also, remaining pointer casts of long should be unsigned long instead. Suggested-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-04 19:46:31 -04:00
Daniel Borkmann	30743837dd	net: filter: make register naming more comprehensible The current code is a bit hard to parse on which registers can be used, how they are mapped and all play together. It makes much more sense to define this a bit more clearly so that the code is a bit more intuitive. This patch cleans this up, and makes naming a bit more consistent among the code. This also allows for moving some of the defines into the header file. Clearing of A and X registers in __sk_run_filter() do not get a particular register name assigned as they have not an 'official' function, but rather just result from the concrete initial mapping of old BPF programs. Since for BPF helper functions for BPF_CALL we already use small letters, so be consistent here as well. No functional changes. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-04 19:46:31 -04:00
Daniel Borkmann	5bcfedf06f	net: filter: simplify label names from jump-table This patch simplifies label naming for the BPF jump-table. When we define labels via DL(), we just concatenate/textify the combination of instruction opcode which consists of the class, subclass, word size, target register and so on. Each time we leave BPF_ prefix intact, so that e.g. the preprocessor generates a label BPF_ALU_BPF_ADD_BPF_X for DL(BPF_ALU, BPF_ADD, BPF_X) whereas a label name of ALU_ADD_X is much more easy to grasp. Pure cleanup only. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-04 19:46:31 -04:00
John Fastabend	f6a082fed1	net: sched: lock imbalance in hhf qdisc hhf_change() takes the sch_tree_lock and releases it but misses the error cases. Fix the missed case here. To reproduce try a command like this, # tc qdisc change dev p3p2 root hhf quantum 40960 non_hh_weight 300000 Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-04 19:41:45 -04:00
Denys Fedoryshchenko	ecd15dd7e4	netfilter: nfnetlink: Fix use after free when it fails to process batch This bug manifests when calling the nft command line tool without nf_tables kernel support. kernel message: [ 44.071555] Netfilter messages via NETLINK v0.30. [ 44.072253] BUG: unable to handle kernel NULL pointer dereference at 0000000000000119 [ 44.072264] IP: [<ffffffff8171db1f>] netlink_getsockbyportid+0xf/0x70 [ 44.072272] PGD 7f2b74067 PUD 7f2b73067 PMD 0 [ 44.072277] Oops: 0000 [#1] SMP [...] [ 44.072369] Call Trace: [ 44.072373] [<ffffffff8171fd81>] netlink_unicast+0x91/0x200 [ 44.072377] [<ffffffff817206c9>] netlink_ack+0x99/0x110 [ 44.072381] [<ffffffffa004b951>] nfnetlink_rcv+0x3c1/0x408 [nfnetlink] [ 44.072385] [<ffffffff8171fde3>] netlink_unicast+0xf3/0x200 [ 44.072389] [<ffffffff817201ef>] netlink_sendmsg+0x2ff/0x740 [ 44.072394] [<ffffffff81044752>] ? __mmdrop+0x62/0x90 [ 44.072398] [<ffffffff816dafdb>] sock_sendmsg+0x8b/0xc0 [ 44.072403] [<ffffffff812f1af5>] ? copy_user_enhanced_fast_string+0x5/0x10 [ 44.072406] [<ffffffff816dbb6c>] ? move_addr_to_kernel+0x2c/0x50 [ 44.072410] [<ffffffff816db423>] ___sys_sendmsg+0x3c3/0x3d0 [ 44.072415] [<ffffffff811301ba>] ? handle_mm_fault+0xa9a/0xc60 [ 44.072420] [<ffffffff811362d6>] ? mmap_region+0x166/0x5a0 [ 44.072424] [<ffffffff817da84c>] ? __do_page_fault+0x1dc/0x510 [ 44.072428] [<ffffffff812b8b2c>] ? apparmor_capable+0x1c/0x60 [ 44.072435] [<ffffffff817d6e9a>] ? _raw_spin_unlock_bh+0x1a/0x20 [ 44.072439] [<ffffffff816dfc86>] ? release_sock+0x106/0x150 [ 44.072443] [<ffffffff816dc212>] __sys_sendmsg+0x42/0x80 [ 44.072446] [<ffffffff816dc262>] SyS_sendmsg+0x12/0x20 [ 44.072450] [<ffffffff817df616>] system_call_fastpath+0x1a/0x1f Signed-off-by: Denys Fedoryshchenko <nuclearcat@nuclearcat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-05-04 15:14:08 +02:00
Florian Westphal	895162b110	netfilter: ipv4: defrag: set local_df flag on defragmented skb else we may fail to forward skb even if original fragments do fit outgoing link mtu: 1. remote sends 2k packets in two 1000 byte frags, DF set 2. we want to forward but only see '2k > mtu and DF set' 3. we then send icmp error saying that outgoing link is 1500 But original sender never sent a packet that would not fit the outgoing link. Setting local_df makes outgoing path test size vs. IPCB(skb)->frag_max_size, so we will still send the correct error in case the largest original size did not fit outgoing link mtu. Reported-by: Maxime Bizon <mbizon@freebox.fr> Suggested-by: Maxime Bizon <mbizon@freebox.fr> Fixes: `5f2d04f1f9` (ipv4: fix path MTU discovery with connection tracking) Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-05-04 13:23:28 +02:00
Eric Dumazet	249015515f	tcp: remove in_flight parameter from cong_avoid() methods Commit `e114a710aa` ("tcp: fix cwnd limited checking to improve congestion control") obsoleted in_flight parameter from tcp_is_cwnd_limited() and its callers. This patch does the removal as promised. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-03 19:23:07 -04:00
Eric Dumazet	e114a710aa	tcp: fix cwnd limited checking to improve congestion control Yuchung discovered tcp_is_cwnd_limited() was returning false in slow start phase even if the application filled the socket write queue. All congestion modules take into account tcp_is_cwnd_limited() before increasing cwnd, so this behavior limits slow start from probing the bandwidth at full speed. The problem is that even if write queue is full (aka we are _not_ application limited), cwnd can be under utilized if TSO should auto defer or TCP Small queues decided to hold packets. So the in_flight can be kept to smaller value, and we can get to the point tcp_is_cwnd_limited() returns false. With TCP Small Queues and FQ/pacing, this issue is more visible. We fix this by having tcp_cwnd_validate(), which is supposed to track such things, take into account unsent_segs, the number of segs that we are not sending at the moment due to TSO or TSQ, but intend to send real soon. Then when we are cwnd-limited, remember this fact while we are processing the window of ACKs that comes back. For example, suppose we have a brand new connection with cwnd=10; we are in slow start, and we send a flight of 9 packets. By the time we have received ACKs for all 9 packets we want our cwnd to be 18. We implement this by setting tp->lsnd_pending to 9, and considering ourselves to be cwnd-limited while cwnd is less than twice tp->lsnd_pending (2*9 -> 18). This makes tcp_is_cwnd_limited() more understandable, by removing the GSO/TSO kludge, that tried to work around the issue. Note the in_flight parameter can be removed in a followup cleanup patch. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-02 17:54:35 -04:00
Stéphane Graber	4e8bbb819d	net: Allow tc changes in user namespaces This switches a few remaining capable(CAP_NET_ADMIN) to ns_capable so that root in a user namespace may set tc rules inside that namespace. Signed-off-by: Stéphane Graber <stgraber@ubuntu.com> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: "David S. Miller" <davem@davemloft.net> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-02 17:43:25 -04:00
John W. Linville	406a94d7fa	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem	2014-05-02 13:47:50 -04:00
John W. Linville	812e4dafa4	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem	2014-05-01 11:23:21 -04:00
Liu Yu	0cda345d1b	tcp_cubic: fix the range of delayed_ack commit `b9f47a3aae` (tcp_cubic: limit delayed_ack ratio to prevent divide error) try to prevent divide error, but there is still a little chance that delayed_ack can reach zero. In case the param cnt get negative value, then ratio+cnt would overflow and may happen to be zero. As a result, min(ratio, ACK_RATIO_LIMIT) will calculate to be zero. In some old kernels, such as 2.6.32, there is a bug that would pass negative param, which then ultimately leads to this divide error. commit `5b35e1e6e9` (tcp: fix tcp_trim_head() to adjust segment count with skb MSS) fixed the negative param issue. However, it's safe that we fix the range of delayed_ack as well, to make sure we do not hit a divide by zero. CC: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: Liu Yu <allanyuliu@tencent.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-30 16:12:22 -04:00
Eric Dumazet	fc9f350106	tcp: increment retransmit counters in tlp and fast open Both TLP and Fast Open call __tcp_retransmit_skb() instead of tcp_retransmit_skb() to avoid changing tp->retrans_out. This has the side effect of missing SNMP counters increments as well as tcp_info tcpi_total_retrans updates. Fix this by moving the stats increments of into __tcp_retransmit_skb() Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Nandita Dukkipati <nanditad@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-30 16:12:22 -04:00
Ying Xue	1621b94d2a	tipc: fix memory leak of publications Commit `1bb8dce57f` ("tipc: fix memory leak during module removal") introduced a memory leak issue: when name table is stopped, it's forgotten that publication instances are freed properly. Additionally the useless "continue" statement in tipc_nametbl_stop() is removed as well. Reported-by: Jason <huzhijiang@gmail.com> Signed-off-by: Ying Xue <ying.xue@windriver.com> Acked-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-30 13:31:26 -04:00
Lorenzo Colitti	5c98631cca	net: ipv6: Introduce ip6_sk_dst_hoplimit. This replaces 6 identical code snippets with a call to a new static inline function. Signed-off-by: Lorenzo Colitti <lorenzo@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-30 13:31:26 -04:00
John W. Linville	f6595444c1	Merge branch 'for-john' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next Conflicts: net/mac80211/chan.c	2014-04-30 12:04:27 -04:00
John W. Linville	0006433a5b	Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next	2014-04-30 11:56:43 -04:00
Florian Westphal	f768e5bdef	netfilter: add helper for adding nat extension Reduce copy-past a bit by adding a common helper. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-04-29 20:56:22 +02:00
Florian Westphal	fe337ac283	netfilter: ctnetlink: don't add null bindings if no nat requested commit `0eba801b64` tried to fix a race where nat initialisation can happen after ctnetlink-created conntrack has been created. However, it causes the nat module(s) to be loaded needlessly on systems that are not using NAT. Fortunately, we do not have to create null bindings in that case. conntracks injected via ctnetlink always have the CONFIRMED bit set, which prevents addition of the nat extension in nf_nat_ipv4/6_fn(). We only need to make sure that either no nat extension is added or that we've created both src and dst manips. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-04-29 20:49:08 +02:00
Mathieu Poirier	683399eddb	netfilter: nfnetlink_acct: Adding quota support to accounting framework nfacct objects already support accounting at the byte and packet level. As such it is a natural extension to add the possiblity to define a ceiling limit for both metrics. All the support for quotas itself is added to nfnetlink acctounting framework to stay coherent with current accounting object management. Quota limit checks are implemented in xt_nfacct filter where statistic collection is already done. Pablo Neira Ayuso has also contributed to this feature. Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-04-29 18:25:14 +02:00
Johannes Berg	8c5bb1fad0	mac80211: remove BUG_ON usage These BUG_ON statements should never trigger, but in the unlikely event that somebody does manage don't stop everything but simply exit the code path with an error. Leave the one BUG_ON where changing it would result in a NULL pointer dereference. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-04-29 17:59:27 +02:00
Johannes Berg	2fd0511556	cfg80211: remove BUG_ON usage These really can't trigger unless somebody messes up the code, but don't make debugging it needlessly complicated, WARN and return instead of BUG_ON(). Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-04-29 17:59:19 +02:00
Steffen Klassert	61622cc6f2	xfrm4: Properly handle unsupported protocols We don't catch the case if an unsupported protocol is submitted to the xfrm4 protocol handlers, this can lead to NULL pointer dereferences. Fix this by adding the appropriate checks. Fixes: `3328715e` ("xfrm4: Add IPsec protocol multiplexer") Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2014-04-29 08:41:12 +02:00
Ilya Dryomov	92b2e75158	libceph: fix non-default values check in apply_primary_affinity() osd_primary_affinity array is indexed into incorrectly when checking for non-default primary-affinity values. This nullifies the impact of the rest of the apply_primary_affinity() and results in misdirected requests. if (osds[i] != CRUSH_ITEM_NONE && osdmap->osd_primary_affinity[i] != ^^^ CEPH_OSD_DEFAULT_PRIMARY_AFFINITY) { For a pool with size 2, this always ends up checking osd0 and osd1 primary_affinity values, instead of the values that correspond to the osds in question. E.g., given a [2,3] up set and a [max,max,0,max] primary affinity vector, requests are still sent to osd2, because both osd0 and osd1 happen to have max primary_affinity values and therefore we return from apply_primary_affinity() early on the premise that all osds in the given set have max (default) values. Fix it. Fixes: http://tracker.ceph.com/issues/7954 Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>	2014-04-28 12:54:10 -07:00
Ying Xue	eab8c04573	tipc: move the delivery of named messages out of nametbl lock Commit `a89778d8ba` ("tipc: add support for link state subscriptions") introduced below possible deadlock scenario: CPU0 CPU1 T0: tipc_publish() link_timeout() T1: tipc_nametbl_publish() [grab node lock]* T2: [grab nametbl write lock]* link_state_event() T3: named_cluster_distribute() link_activate() T4: [grab node lock]* tipc_node_link_up() T5: tipc_nametbl_publish() T6: [grab nametble write lock]* The opposite order of holding nametbl write lock and node lock on above two different paths may result in a deadlock. If we move the the delivery of named messages via link out of name nametbl lock, the reverse order of holding locks will be eliminated, as a result, the deadlock will be killed as well. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-28 14:49:54 -04:00
Julian Anastasov	e374c618b1	net: ipv6: more places need LOOPBACK_IFINDEX for flowi6_iif To properly match iif in ip rules we have to provide LOOPBACK_IFINDEX in flowi6_iif, not 0. Some ip6mr_fib_lookup and fib6_rule_lookup callers need such fix. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-28 14:47:03 -04:00
Erik Hugne	d7bb74c38c	tipc: fix out of bounds indexing Commit `78acb1f9b8` ("tipc: add ioctl to fetch link names") introduced a buffer overflow bug where specially crafted ioctl requests could cause out-of-bounds indexing of the node->links array. This was caused by an incorrect check vs MAX_BEARERS, and the static code checker complaint is: net/tipc/node.c:459 tipc_node_get_linkname() error: buffer overflow 'node->links' 2 <= 2 Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-28 14:43:35 -04:00
Hisao Tanabe	5a2b646ffe	ipv4: Use predefined value for readability Signed-off-by: Hisao Tanabe <xtanabe@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-28 13:28:43 -04:00
Jean Sacren	266a164684	ethtool: exit the loop when invalid index occurs The commit `3de0b59239` ("ethtool: Support for configurable RSS hash key") introduced a new function ethtool_copy_validate_indir() with full iteration of the loop to validate the ring indices, which could be an overkill. To minimize the impact, we ought to exit the loop as soon as the invalid index occurs for the very first time. The remaining loop simply doesn't serve any more purpose. Signed-off-by: Jean Sacren <sakiwit@gmail.com> Cc: Venkata Duvvuru <VenkatKumar.Duvvuru@Emulex.Com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-28 13:28:43 -04:00
Jouni Malinen	3b1700bde4	mac80211: Support dynamic AP mode channel width changes Implement the new cfg80211 capability to enable mac80211-based drivers to support for dynamic channel bandwidth changes (e.g., HT 20/40 MHz changes). Signed-off-by: Jouni Malinen <jouni@qca.qualcomm.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-04-28 18:11:55 +02:00
Jouni Malinen	e16821bcfb	cfg80211: Dynamic channel bandwidth changes in AP mode This extends NL80211_CMD_SET_CHANNEL to allow dynamic channel bandwidth changes in AP mode (including P2P GO) during a lifetime of the BSS. This can be used to implement, e.g., HT 20/40 MHz co-existence rules on the 2.4 GHz band. Signed-off-by: Jouni Malinen <jouni@qca.qualcomm.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-04-28 18:09:59 +02:00
Zhao, Gang	b205786e38	mac80211: remove unnecessary assignment P2P_DEVICE doesn't support ieee80211_bss_info_change_notify() for now, so it's not needed to set changed flags for P2P_DEVICE. Signed-off-by: Zhao, Gang <gamerh2o@gmail.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-04-28 17:53:19 +02:00
Zhao, Gang	7df180f7f1	mac80211: avoid calling useless channel context code ieee80211_assign_chanctx() checks if local->use_chanctx is true, so the two code block related to ieee80211_assign_chanctx() can be moved into above if clause, emphasize that these code are executed only if local->use_chanctx is true. Signed-off-by: Zhao, Gang <gamerh2o@gmail.com> [change subject] Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-04-28 17:52:32 +02:00
Pablo Neira	4c1f7818e4	netfilter: nf_tables: relax string validation of NFTA_CHAIN_TYPE Use NLA_STRING for consistency with other string attributes in nf_tables. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-04-28 16:54:15 +02:00
Cong Wang	a49eb42a34	sched, act: allow to clear all actions as well When we change the list of action on a given filter, currently we don't change it to empty. This is a bug, we should allow to change to whatever users given. Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Cong Wang <cwang@twopensource.com> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-27 23:42:39 -04:00
Cong Wang	2f7ef2f879	sched, cls: check if we could overwrite actions when changing a filter When actions are attached to a filter, they are a part of the filter itself, so when changing a filter we should allow to overwrite the actions inside as well. In my specific case, when I tried to _append_ a new action to an existing filter which already has an action, I got EEXIST since kernel refused to overwrite the existing one in kernel. This patch checks if we are changing the filter checking NLM_F_CREATE flag (Sigh, filters don't use NLM_F_REPLACE...) and then passes the boolean down to actions. This fixes the problem above. Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Cong Wang <cwang@twopensource.com> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-27 23:42:39 -04:00
Karl Heiss	8c2eab9097	net: sctp: Don't transition to PF state when transport has exhausted 'Path.Max.Retrans'. Don't transition to the PF state on every strike after 'Path.Max.Retrans'. Per draft-ietf-tsvwg-sctp-failover-03 Section 5.1.6: Additional (PMR - PFMR) consecutive timeouts on a PF destination confirm the path failure, upon which the destination transitions to the Inactive state. As described in [RFC4960], the sender (i) SHOULD notify ULP about this state transition, and (ii) transmit heartbeats to the Inactive destination at a lower frequency as described in Section 8.3 of [RFC4960]. This also prevents sending SCTP_ADDR_UNREACHABLE to the user as the state bounces between SCTP_INACTIVE and SCTP_PF for each subsequent strike. Signed-off-by: Karl Heiss <kheiss@gmail.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-27 23:41:14 -04:00
Xufeng Zhang	8535087131	sctp: reset flowi4_oif parameter on route lookup commit `813b3b5db8` (ipv4: Use caller's on-stack flowi as-is in output route lookups.) introduces another regression which is very similar to the problem of commit `e6b45241c` (ipv4: reset flowi parameters on route connect) wants to fix: Before we call ip_route_output_key() in sctp_v4_get_dst() to get a dst that matches a bind address as the source address, we have already called this function previously and the flowi parameters have been initialized including flowi4_oif, so when we call this function again, the process in __ip_route_output_key() will be different because of the setting of flowi4_oif, and we'll get a networking device which corresponds to the inputted flowi4_oif as the output device, this is wrong because we'll never hit this place if the previously returned source address of dst match one of the bound addresses. To reproduce this problem, a vlan setting is enough: # ifconfig eth0 up # route del default # vconfig add eth0 2 # vconfig add eth0 3 # ifconfig eth0.2 10.0.1.14 netmask 255.255.255.0 # route add default gw 10.0.1.254 dev eth0.2 # ifconfig eth0.3 10.0.0.14 netmask 255.255.255.0 # ip rule add from 10.0.0.14 table 4 # ip route add table 4 default via 10.0.0.254 src 10.0.0.14 dev eth0.3 # sctp_darn -H 10.0.0.14 -P 36422 -h 10.1.4.134 -p 36422 -s -I You'll detect that all the flow are routed to eth0.2(10.0.1.254). Signed-off-by: Xufeng Zhang <xufeng.zhang@windriver.com> Signed-off-by: Julian Anastasov <ja@ssi.bg> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-27 19:46:17 -04:00
Toshiaki Makita	30313a3d57	bridge: Handle IFLA_ADDRESS correctly when creating bridge device When bridge device is created with IFLA_ADDRESS, we are not calling br_stp_change_bridge_id(), which leads to incorrect local fdb management and bridge id calculation, and prevents us from receiving frames on the bridge device. Reported-by: Tom Gundersen <teg@jklm.no> Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-27 19:46:17 -04:00
Ying Xue	22e7987ae7	tipc: fix a possible memory leak The commit `a8b9b96e95` ("tipc: fix race in disc create/delete") leads to the following static checker warning: net/tipc/discover.c:352 tipc_disc_create() warn: possible memory leak of 'req' The risk of memory leak really exists in practice. Especially when it's failed to allocate memory for "req->buf", tipc_disc_create() doesn't free its allocated memory, instead just directly returns with ENOMEM error code. In this situation, memory leak, of course, happens. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-27 19:08:06 -04:00
xiao jin	851bdd11ca	inetpeer_gc_worker: trivial cleanup Do not initialize list twice. list_replace_init() already takes care of initializing list. We don't need to initialize it with LIST_HEAD() beforehand. Signed-off-by: xiao jin <jin.xiao@intel.com> Reviewed-by: David Cohen <david.a.cohen@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-26 12:52:28 -04:00
xiao jin	1818ce4dc5	net_namespace: trivial cleanup Do not initialize net_kill_list twice. list_replace_init() already takes care of initializing net_kill_list. We don't need to initialize it with LIST_HEAD() beforehand. Signed-off-by: xiao jin <jin.xiao@intel.com> Reviewed-by: David Cohen <david.a.cohen@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-26 12:50:23 -04:00
Erik Hugne	78acb1f9b8	tipc: add ioctl to fetch link names We add a new ioctl for AF_TIPC that can be used to fetch the logical name for a link to a remote node on a given bearer. This should be used in combination with link state subscriptions. The logical name size limit definitions are moved to tipc.h, as they are now also needed by the new ioctl. Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-26 12:13:24 -04:00
Erik Hugne	a89778d8ba	tipc: add support for link state subscriptions When links are established over a bearer plane, we create a node local publication containing information about the peer node and bearer plane. This allows TIPC applications to use the standard TIPC topology server subscription mechanism to get notifications when a link goes up or down. Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-26 12:13:24 -04:00
Rostislav Lisovy	8eca1fb692	cfg80211: Use 5MHz bandwidth by default when checking usable channels Current code checks if the 20MHz bandwidth is allowed for particular channel -- if it is not, the channel is disabled. Since we need to use 5/10 MHz channels, this code is modified in the way that the default bandwidth to check is 5MHz. If the maximum bandwidth allowed by the channel is smaller than 5MHz, the channel is disabled. Otherwise the channel is used and the flags are set according to the bandwidth allowed by the channel. Signed-off-by: Rostislav Lisovy <rostislav.lisovy@fel.cvut.cz> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-04-25 17:39:32 +02:00
Rostislav Lisovy	ea077c1cea	cfg80211: Add attributes describing prohibited channel bandwidth Since there are frequency bands (e.g. 5.9GHz) allowing channels with only 10 or 5 MHz bandwidth, this patch adds attributes that allow keeping track about this information. When channel attributes are reported to user-space, make sure to not break old tools, i.e. if the 'split wiphy dump' is enabled, report the extra attributes (if present) describing the bandwidth restrictions. If the 'split wiphy dump' is not enabled, completely omit those channels that have flags set to either IEEE80211_CHAN_NO_10MHZ or IEEE80211_CHAN_NO_20MHZ. Add the check for new bandwidth restriction flags in cfg80211_chandef_usable() to comply with the restrictions. Signed-off-by: Rostislav Lisovy <rostislav.lisovy@fel.cvut.cz> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-04-25 17:38:23 +02:00
Zhao, Gang	8bd811aa6c	mac80211: change return value of notifier function Return NOTIFY_DONE if we don't care this time's notification, return NOTIFY_OK if we successfully handled this time's notification. That's the formal way to do it. Signed-off-by: Zhao, Gang <gamerh2o@gmail.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-04-25 17:34:25 +02:00
Zhao, Gang	6784c7db8d	cfg80211: change return value of notifier function Return NOTIFY_DONE if we don't care this time's notification, return NOTIFY_OK if we successfully handled this time's notification. That's the formal way to do it. Signed-off-by: Zhao, Gang <gamerh2o@gmail.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-04-25 17:33:44 +02:00
Zhao, Gang	f26cbf401b	cfg80211: change wiphy_to_dev function name Name wiphy_to_rdev is more accurate to describe what the function does, i.e., return a pointer pointing to struct cfg80211_registered_device. Signed-off-by: Zhao, Gang <gamerh2o@gmail.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-04-25 17:33:04 +02:00
Zhao, Gang	1b8ec87aa0	cfg80211: change registered device pointer name Name "dev" is too common and ambiguous, let all the pointer name pointing to struct cfg80211_registered_device be "rdev". This can improve code readability and consistency(since other places have already called it rdev). Signed-off-by: Zhao, Gang <gamerh2o@gmail.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-04-25 17:32:56 +02:00
Zhao, Gang	308f7fcfdb	mac80211: remove unnecessary BUG_ON() The BUG_ON(!err) can't be triggered in the code path, so remove it. Signed-off-by: Zhao, Gang <gamerh2o@gmail.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-04-25 17:31:00 +02:00
Zhao, Gang	6b59db7d4c	mac80211: return bool instead of numbers in yes/no function And some code style changes in the function, and correct a typo in comment. Signed-off-by: Zhao, Gang <gamerh2o@gmail.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-04-25 17:30:28 +02:00
Marek Kwaczynski	17d38fa8c2	mac80211: add option to generate CCMP IVs only for mgmt frames Some chips can encrypt managment frames in HW, but require generated IV in the frame. Add a key flag that allows us to achieve this. Signed-off-by: Marek Kwaczynski <marek.kwaczynski@tieto.com> [use BIT(0) to fill that spot, fix indentation] Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-04-25 17:26:15 +02:00
Michal Kazior	c0166da9fe	mac80211: compute chanctx refcount on-the-fly It doesn't make much sense to store refcount in the chanctx structure. One still needs to hold chanctx_mtx to get the value safely. Besides, refcount isn't on performance critical paths. This will make implementing chanctx reservation refcounting a little easier. Signed-off-by: Michal Kazior <michal.kazior@tieto.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-04-25 17:08:30 +02:00
Michal Kazior	2b32713d72	mac80211: fix racy usage of chanctx->refcount Channel context refcount is protected by chanctx_mtx. Accessing the value without holding the mutex is racy. RCU section didn't guarantee anything here. Theoretically ieee80211_channel_switch() could fail to see refcount change and read "1" instead of, e.g. "2". This means mac80211 could accept CSA even though it shouldn't have. Signed-off-by: Michal Kazior <michal.kazior@tieto.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-04-25 17:08:30 +02:00
Michal Kazior	1f0d54cdcf	mac80211: split ieee80211_free_chanctx() The function did a little too much. Split it up so the code can be easily reused in the future. Signed-off-by: Michal Kazior <michal.kazior@tieto.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-04-25 17:08:30 +02:00
Michal Kazior	ed68ebcaf9	mac80211: split ieee80211_new_chanctx() The function did a little too much. Split it up so the code can be easily reused in the future. Signed-off-by: Michal Kazior <michal.kazior@tieto.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-04-25 17:08:30 +02:00
Michal Kazior	13f348a814	mac80211: improve chanctx reservation lookup Use a separate function to look for reservation chanctx. For multi-interface/channel reservation search sematics differ slightly. The new routine allows reservations to be merged with chanctx that are already reserved by other interface(s). Signed-off-by: Michal Kazior <michal.kazior@tieto.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-04-25 17:08:30 +02:00
Michal Kazior	0288157b2a	mac80211: improve find_chanctx() for reservations This allows new vifs to be assigned to a chanctx as long as chanctx's reservation chandefs (if any) and chanctx's current chandef (implied by assigned vifs at the time, if any) and the new vif chandef are all compatible. This implies it is impossible to assign a new vif to an in-place reservation chanctx. This gives no advantages for single-channel hardware. It makes sense for multi-channel hardware only. Signed-off-by: Michal Kazior <michal.kazior@tieto.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-04-25 17:08:30 +02:00
Michal Kazior	e3afb92022	mac80211: track reserved vifs in chanctx This can be useful. Provides a more straghtforward way to iterate over interfaces taking part in chanctx reservation and allows tracking chanctx usage explicitly. The structure is protected by local->chanctx_mtx. Signed-off-by: Michal Kazior <michal.kazior@tieto.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-04-25 17:08:29 +02:00
Michal Kazior	484298ad1a	mac80211: track assigned vifs in chanctx This can be useful. Provides a more straghtforward way to iterate over interfaces bound to a given chanctx and allows tracking chanctx usage explicitly. The structure is protected by local->chanctx_mtx. Signed-off-by: Michal Kazior <michal.kazior@tieto.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-04-25 17:08:29 +02:00
Michal Kazior	093324816b	mac80211: add support for radar detection for reservations Initial chanctx reservation code wasn't aware of radar detection requirements. This is necessary for chanctx reservations to be used for channel switching in the future. Signed-off-by: Michal Kazior <michal.kazior@tieto.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-04-25 17:08:15 +02:00
Michal Kazior	c2b90ad880	mac80211: prevent chanctx overcommit Do not allocate more channel contexts than a driver is capable for currently matching interface combination. This allows the ieee80211_vif_reserve_chanctx() to act as a guard against breaking interface combinations. Signed-off-by: Michal Kazior <michal.kazior@tieto.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-04-25 17:08:15 +02:00
Michal Kazior	6fa001bc7e	mac80211: add max channel calculation utility function The utility function has no uses yet. It is aimed at future chanctx reservation management and channel switching. Signed-off-by: Michal Kazior <michal.kazior@tieto.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-04-25 17:08:15 +02:00
Michal Kazior	65a124dd71	cfg80211: allow drivers to iterate over matching combinations The patch splits cfg80211_check_combinations() into an iterator function and a simple iteration user. This makes it possible for drivers to asses how many channels can use given iftype setup. This in turn can be used for future multi-interface/multi-channel channel switching. Signed-off-by: Michal Kazior <michal.kazior@tieto.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-04-25 17:08:14 +02:00
Ilan Peer	46d537245d	cfg80211: Fix GO Concurrent relaxation on UNII-3 At some locations, channels 149-165 are considered a single bundle, while at some other locations, e.g., Indonesia, channels 149-161 are considered a single bundle, while channel 165 belongs to a different bundle. This means that: 1. A station interface connection to an AP on channel 165 allows the instantiation of a P2P GO on channels 149-165. 2. A station interface connection to an AP on channels 149-161 does NOT allow the instantiation of a P2P GO on channel 165. Fix this. Signed-off-by: Ilan Peer <ilan.peer@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-04-25 16:47:33 +02:00
Johan Hedberg	09da1f3463	Bluetooth: Fix redundant encryption request for reauthentication When we're performing reauthentication (in order to elevate the security level from an unauthenticated key to an authenticated one) we do not need to issue any encryption command once authentication completes. Since the trigger for the encryption HCI command is the ENCRYPT_PEND flag this flag should not be set in this scenario. Instead, the REAUTH_PEND flag takes care of all necessary steps for reauthentication. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Cc: stable@vger.kernel.org	2014-04-25 09:47:15 +03:00
Johan Hedberg	9eb1fbfa0a	Bluetooth: Fix triggering BR/EDR L2CAP Connect too early Commit `1c2e004183` introduced an event handler for the encryption key refresh complete event with the intent of fixing some LE/SMP cases. However, this event is shared with BR/EDR and there we actually want to act only on the auth_complete event (which comes after the key refresh). If we do not do this we may trigger an L2CAP Connect Request too early and cause the remote side to return a security block error. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Cc: stable@vger.kernel.org	2014-04-25 09:47:15 +03:00
Kumar Sundararajan	1c26585458	ipv6: fib: fix fib dump restart When the ipv6 fib changes during a table dump, the walk is restarted and the number of nodes dumped are skipped. But the existing code doesn't advance to the next node after a node is skipped. This can cause the dump to loop or produce lots of duplicates when the fib is modified during the dump. This change advances the walk to the next node if the current node is skipped after a restart. Signed-off-by: Kumar Sundararajan <kumar@fb.com> Signed-off-by: Chris Mason <clm@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-24 17:19:25 -04:00
Nicolas Dichtel	f01ec1c017	vxlan: add x-netns support This patch allows to switch the netns when packet is encapsulated or decapsulated. The vxlan socket is openned into the i/o netns, ie into the netns where encapsulated packets are received. The socket lookup is done into this netns to find the corresponding vxlan tunnel. After decapsulation, the packet is injecting into the corresponding interface which may stand to another netns. When one of the two netns is removed, the tunnel is destroyed. Configuration example: ip netns add netns1 ip netns exec netns1 ip link set lo up ip link add vxlan10 type vxlan id 10 group 239.0.0.10 dev eth0 dstport 0 ip link set vxlan10 netns netns1 ip netns exec netns1 ip addr add 192.168.0.249/24 broadcast 192.168.0.255 dev vxlan10 ip netns exec netns1 ip link set vxlan10 up Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-24 16:18:26 -04:00
David Gibson	c53864fd60	rtnetlink: Only supply IFLA_VF_PORTS information when RTEXT_FILTER_VF is set Since `115c9b8192` (rtnetlink: Fix problem with buffer allocation), RTM_NEWLINK messages only contain the IFLA_VFINFO_LIST attribute if they were solicited by a GETLINK message containing an IFLA_EXT_MASK attribute with the RTEXT_FILTER_VF flag. That was done because some user programs broke when they received more data than expected - because IFLA_VFINFO_LIST contains information for each VF it can become large if there are many VFs. However, the IFLA_VF_PORTS attribute, supplied for devices which implement ndo_get_vf_port (currently the 'enic' driver only), has the same problem. It supplies per-VF information and can therefore become large, but it is not currently conditional on the IFLA_EXT_MASK value. Worse, it interacts badly with the existing EXT_MASK handling. When IFLA_EXT_MASK is not supplied, the buffer for netlink replies is fixed at NLMSG_GOODSIZE. If the information for IFLA_VF_PORTS exceeds this, then rtnl_fill_ifinfo() returns -EMSGSIZE on the first message in a packet. netlink_dump() will misinterpret this as having finished the listing and omit data for this interface and all subsequent ones. That can cause getifaddrs(3) to enter an infinite loop. This patch addresses the problem by only supplying IFLA_VF_PORTS when IFLA_EXT_MASK is supplied with the RTEXT_FILTER_VF flag set. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-24 13:52:54 -04:00
David Gibson	973462bbde	rtnetlink: Warn when interface's information won't fit in our packet Without IFLA_EXT_MASK specified, the information reported for a single interface in response to RTM_GETLINK is expected to fit within a netlink packet of NLMSG_GOODSIZE. If it doesn't, however, things will go badly wrong, When listing all interfaces, netlink_dump() will incorrectly treat -EMSGSIZE on the first message in a packet as the end of the listing and omit information for that interface and all subsequent ones. This can cause getifaddrs(3) to enter an infinite loop. This patch won't fix the problem, but it will WARN_ON() making it easier to track down what's going wrong. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-24 13:52:54 -04:00
David S. Miller	a64d90fd96	netfilter: Fix warning in nfnetlink_receive(). net/netfilter/nfnetlink.c: In function ‘nfnetlink_rcv’: net/netfilter/nfnetlink.c:371:14: warning: unused variable ‘net’ [-Wunused-variable] Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-24 13:51:29 -04:00
Eric W. Biederman	90f62cf30a	net: Use netlink_ns_capable to verify the permisions of netlink messages It is possible by passing a netlink socket to a more privileged executable and then to fool that executable into writing to the socket data that happens to be valid netlink message to do something that privileged executable did not intend to do. To keep this from happening replace bare capable and ns_capable calls with netlink_capable, netlink_net_calls and netlink_ns_capable calls. Which act the same as the previous calls except they verify that the opener of the socket had the desired permissions as well. Reported-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-24 13:44:54 -04:00
Eric W. Biederman	aa4cf9452f	net: Add variants of capable for use on netlink messages netlink_net_capable - The common case use, for operations that are safe on a network namespace netlink_capable - For operations that are only known to be safe for the global root netlink_ns_capable - The general case of capable used to handle special cases __netlink_ns_capable - Same as netlink_ns_capable except taking a netlink_skb_parms instead of the skbuff of a netlink message. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-24 13:44:54 -04:00
Eric W. Biederman	a3b299da86	net: Add variants of capable for use on on sockets sk_net_capable - The common case, operations that are safe in a network namespace. sk_capable - Operations that are not known to be safe in a network namespace sk_ns_capable - The general case for special cases. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-24 13:44:53 -04:00
Eric W. Biederman	a53b72c83a	net: Move the permission check in sock_diag_put_filterinfo to packet_diag_dump The permission check in sock_diag_put_filterinfo is wrong, and it is so removed from it's sources it is not clear why it is wrong. Move the computation into packet_diag_dump and pass a bool of the result into sock_diag_filterinfo. This does not yet correct the capability check but instead simply moves it to make it clear what is going on. Reported-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-24 13:44:53 -04:00
Eric W. Biederman	5187cd055b	netlink: Rename netlink_capable netlink_allowed netlink_capable is a static internal function in af_netlink.c and we have better uses for the name netlink_capable. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-24 13:44:53 -04:00
David S. Miller	4366004d77	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/ethernet/intel/igb/e1000_mac.c net/core/filter.c Both conflicts were simple overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-24 13:19:00 -04:00
Marcel Holtmann	db5966816c	Bluetooth: Return EOPNOTSUPP for HCISETRAW ioctl command The HCISETRAW ioctl command is not really useful. To utilize raw and direct access to the HCI controller, the HCI User Channel feature has been introduced. Return EOPNOTSUPP to indicate missing support for this command. For legacy reasons hcidump used to use HCISETRAW for permission check to return proper error codes to users. To keep backwards compability return EPERM in case the caller does not have CAP_NET_ADMIN capability. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2014-04-24 11:55:25 -03:00
Tomasz Bursztyka	f5efc696cc	netfilter: nf_tables: Add meta expression key for bridge interface name NFT_META_BRI_IIFNAME to get packet input bridge interface name NFT_META_BRI_OIFNAME to get packet output bridge interface name Such meta key are accessible only through NFPROTO_BRIDGE family, on a dedicated nft meta module: nft_meta_bridge. Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Tomasz Bursztyka <tomasz.bursztyka@linux.intel.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-04-24 10:37:28 +02:00
Alexei Starovoitov	83d5b7ef99	net: filter: initialize A and X registers exisiting BPF verifier allows uninitialized access to registers, 'ret A' is considered to be a valid filter. So initialize A and X to zero to prevent leaking kernel memory In the future BPF verifier will be rejecting such filters Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Cc: Daniel Borkmann <dborkman@redhat.com> Acked-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-23 15:34:41 -04:00
Nicolas Dichtel	22f08069e8	ip6gre: add x-netns support This patch allows to switch the netns when packet is encapsulated or decapsulated. In other word, the encapsulated packet is received in a netns, where the lookup is done to find the tunnel. Once the tunnel is found, the packet is decapsulated and injecting into the corresponding interface which stands to another netns. When one of the two netns is removed, the tunnel is destroyed. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-23 14:53:36 -04:00
Nicolas Dichtel	b57708add3	gre: add x-netns support This patch allows to switch the netns when packet is encapsulated or decapsulated. In other word, the encapsulated packet is received in a netns, where the lookup is done to find the tunnel. Once the tunnel is found, the packet is decapsulated and injecting into the corresponding interface which stands to another netns. When one of the two netns is removed, the tunnel is destroyed. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-23 14:53:36 -04:00
Masanari Iida	1404c3ab98	netfilter: Fix format string mismatch in ip_vs_proto_name() Fix string mismatch in ip_vs_proto_name() Signed-off-by: Masanari Iida <standby24x7@gmail.com> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-04-23 14:14:14 +02:00
Tomasz Bursztyka	aa45660c6b	netfilter: nf_tables: Make meta expression core functions public This will be useful to create network family dedicated META expression as for NFPROTO_BRIDGE for instance. Signed-off-by: Tomasz Bursztyka <tomasz.bursztyka@linux.intel.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-04-23 13:55:30 +02:00
Tomasz Bursztyka	758dbcecf1	netfilter: nf_tables: Stack expression type depending on their family To ensure family tight expression gets selected in priority to family agnostic ones. Signed-off-by: Tomasz Bursztyka <tomasz.bursztyka@linux.intel.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-04-23 13:51:05 +02:00
Tetsuo Handa	2e71029e2c	xfrm: Remove useless xfrm_audit struct. Commit `f1370cc4` "xfrm: Remove useless secid field from xfrm_audit." changed "struct xfrm_audit" to have either { audit_get_loginuid(current) / audit_get_sessionid(current) } or { INVALID_UID / -1 } pair. This means that we can represent "struct xfrm_audit" as "bool". This patch replaces "struct xfrm_audit" argument with "bool". Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2014-04-23 08:21:04 +02:00
Richard Guy Briggs	7774d5e03f	netlink: implement unbind to netlink_setsockopt NETLINK_DROP_MEMBERSHIP Call the per-protocol unbind function rather than bind function on NETLINK_DROP_MEMBERSHIP in netlink_setsockopt(). Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-22 21:42:26 -04:00
Richard Guy Briggs	4f52090052	netlink: have netlink per-protocol bind function return an error code. Have the netlink per-protocol optional bind function return an int error code rather than void to signal a failure. This will enable netlink protocols to perform extra checks including capabilities and permissions verifications when updating memberships in multicast groups. In netlink_bind() and netlink_setsockopt() the call to the per-protocol bind function was moved above the multicast group update to prevent any access to the multicast socket groups before checking with the per-protocol bind function. This will enable the per-protocol bind function to be used to check permissions which could be denied before making them available, and to avoid the messy job of undoing the addition should the per-protocol bind function fail. The netfilter subsystem seems to be the only one currently using the per-protocol bind function. Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-22 21:42:26 -04:00
Richard Guy Briggs	bfe4bc71c6	netlink: simplify nfnetlink_bind Remove duplicity and simplify code flow by moving the rcu_read_unlock() above the condition and let the flow control exit naturally at the end of the function. Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-22 21:42:26 -04:00
Chema Gonzalez	4cd3675ebf	filter: added BPF random opcode Added a new ancillary load (bpf call in eBPF parlance) that produces a 32-bit random number. We are implementing it as an ancillary load (instead of an ISA opcode) because (a) it is simpler, (b) allows easy JITing, and (c) seems more in line with generic ISAs that do not have "get a random number" as a instruction, but as an OS call. The main use for this ancillary load is to perform random packet sampling. Signed-off-by: Chema Gonzalez <chema@google.com> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-22 21:27:57 -04:00
Li RongQing	5a4ae5f6e7	vlan: unnecessary to check if vlan_pcpu_stats is NULL if allocating memory for vlan_pcpu_stats failed, the device can not be operated Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-22 21:27:57 -04:00
Venkata Duvvuru	3de0b59239	ethtool: Support for configurable RSS hash key This ethtool patch primarily copies the ioctl command data structures from/to the User space and invokes the driver hook. Signed-off-by: Venkat Duvvuru <VenkatKumar.Duvvuru@Emulex.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-22 21:27:57 -04:00
Eric Dumazet	1f3279ae0c	tcp: avoid retransmits of TCP packets hanging in host queues In commit `0e280af026` ("tcp: introduce TCPSpuriousRtxHostQueues SNMP counter") we added a logic to detect when a packet was retransmitted while the prior clone was still in a qdisc or driver queue. We are now confident we can do better, and catch the problem before we fragment a TSO packet before retransmit, or in TLP path. This patch fully exploits the logic by simply canceling the spurious retransmit. Original packet is in a queue and will eventually leave the host. This helps to avoid network collapses when some events make the RTO estimations very wrong, particularly when dealing with huge number of sockets with synchronized blast. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-22 21:27:57 -04:00
Heiner Kallweit	6046d5b4e4	ipv6: support IFA_F_MANAGETEMPADDR for address deletion too Userspace applications can use IFA_F_MANAGETEMPADDR with RTM_NEWADDR already to indicate that the kernel should take care of temporary address management. This patch adds related functionality to RTM_DELADDR. By setting IFA_F_MANAGETEMPADDR a userspace application can indicate that the kernel should delete all related temporary addresses as well. A corresponding patch for the "ip addr del" command has been applied to iproute2 already. Signed-off-by: Heiner Kallweit <heiner.kallweit@web.de> Reviewed-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-22 21:27:57 -04:00
Ying Xue	a8b9b96e95	tipc: fix race in disc create/delete Commit `a21a584d67` (tipc: fix neighbor detection problem after hw address change) introduces a race condition involving tipc_disc_delete() and tipc_disc_add/remove_dest that can cause TIPC to dereference the pointer to the bearer discovery request structure after it has been freed since a stray pointer is left in the bearer structure. In order to fix the issue, the process of resetting the discovery request handler is optimized: the discovery request handler and request buffer are just reset instead of being freed, allocated and initialized. As the request point is always valid and the request's lock is taken while the request handler is reset, the race doesn't happen any more. Reported-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Tested-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-22 21:17:53 -04:00
Ying Xue	28dd94187a	tipc: use bc_lock to protect node map in bearer structure The node map variable - 'nodes' in bearer structure is only used by bclink. When bclink accesses it, bc_lock is held. But when change it, for instance, in tipc_bearer_add_dest() or tipc_bearer_remove_dest() the bc_lock is not taken at all. To avoid any inconsistent data, we should always grab bc_lock while accessing node map variable. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Tested-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-22 21:17:53 -04:00
Ying Xue	4ae88c94d3	tipc: use bearer_disable to disable bearer in tipc_l2_device_event As bearer pointer is known in tipc_l2_device_event(), it's unnecessary to search it again in tipc_disable_bearer(). If tipc_disable_bearer() is replaced with bearer_disable() in tipc_l2_device_event(), this will help us save a bit time when bearer is disabled. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Tested-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-22 21:17:53 -04:00
Ying Xue	f1c8d8cb82	tipc: make media_ptr pointed netdevice valid The 'media_ptr' pointer in bearer structure which points to network device, is protected by RCU. So, before netdevice is released, synchronize_net() should be involved to prevent no any user of the netdevice on read side from accessing it after it is freed. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Tested-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-22 21:17:53 -04:00
Ying Xue	7216cd949c	tipc: purge tipc_net_lock lock Now tipc routing hierarchy comprises the structures 'node', 'link'and 'bearer'. The whole hierarchy is protected by a big read/write lock, tipc_net_lock, to ensure that nothing is added or removed while code is accessing any of these structures. Obviously the locking policy makes node, link and bearer components closely bound together so that their relationship becomes unnecessarily complex. In the worst case, such locking policy not only has a negative influence on performance, but also it's prone to lead to deadlock occasionally. In order o decouple the complex relationship between bearer and node as well as link, the locking policy is adjusted as follows: - Bearer level RTNL lock is used on update side, and RCU is used on read side. Meanwhile, all bearer instances including broadcast bearer are saved into bearer_list array. - Node and link level All node instances are saved into two tipc_node_list and node_htable lists. The two lists are protected by node_list_lock on write side, and they are guarded with RCU lock on read side. All members in node structure including link instances are protected by node spin lock. - The relationship between bearer and node When link accesses bearer, it first needs to find the bearer with its bearer identity from the bearer_list array. When bearer accesses node, it can iterate the node_htable hash list with the node address to find the corresponding node. In the new locking policy, every component has its private locking solution and the relationship between bearer and node is very simple, that is, they can find each other with node address or bearer identity from node_htable hash list or bearer_list array. Until now above all changes have been done, so tipc_net_lock can be removed safely. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Tested-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-22 21:17:53 -04:00
Ying Xue	2231c5af45	tipc: use RCU to protect media_ptr pointer Now the media_ptr pointer is protected with tipc_net_lock write lock on write side; tipc_net_lock read lock is used to read side. As the part of effort of eliminating tipc_net_lock, we decide to adjust the locking policy of media_ptr pointer protection: on write side, RTNL lock is use while on read side RCU read lock is applied. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Tested-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-22 21:17:53 -04:00
Ying Xue	7a2f7d18e7	tipc: decouple the relationship between bearer and link Currently on both paths of message transmission and reception, the read lock of tipc_net_lock must be held before bearer is accessed, while the write lock of tipc_net_lock has to be taken before bearer is configured. Although it can ensure that bearer is always valid on the two data paths, link and bearer is closely bound together. So as the part of effort of removing tipc_net_lock, the locking policy of bearer protection will be adjusted as below: on the two data paths, RCU is used, and on the configuration path of bearer, RTNL lock is applied. Now RCU just covers the path of message reception. To make it possible to protect the path of message transmission with RCU, link should not use its stored bearer pointer to access bearer, but it should use the bearer identity of its attached bearer as index to get bearer instance from bearer_list array, which can help us decouple the relationship between bearer and link. As a result, bearer on the path of message transmission can be safely protected by RCU when we access bearer_list array within RCU lock protection. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Tested-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-22 21:17:53 -04:00
Ying Xue	f8322dfce5	tipc: convert bearer_list to RCU list Convert bearer_list to RCU list. It's protected by RTNL lock on update side, and RCU read lock is applied to read side. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Tested-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-22 21:17:52 -04:00
Ying Xue	f97e455abf	tipc: use RTNL lock to protect tipc_net_stop routine As the tipc network initialization(ie, tipc_net_start routine) is under RTNL protection, its corresponding deinitialization part(ie, tipc_net_stop routine) should be protected by RTNL too. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Tested-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-22 21:17:52 -04:00
Ying Xue	ca07fb07c9	tipc: adjust locking policy of protecting tipc_ptr pointer of net_device Currently the 'tipc_ptr' pointer is protected by tipc_net_lock write lock on write side, and RCU read lock is applied to read side. In addition, there have two paths on write side where we may change variables pointed by the 'tipc_ptr' pointer: one is to configure bearer by tipc-config tool and another one is that bearer status is changed by notification events of its attached interface. But on the latter path, we improperly deem that accessing 'tipc_ptr' pointer happens on read side with rcu_read_lock() although some variables pointed by the 'tipc_ptr' pointer are changed possibly. Moreover, as now the both paths are guarded by RTNL lock, it's better to adjust the locking policy of 'tipc_ptr' pointer protection, allowing RTNL instead of tipc_net_lock write lock to protect it on write side, which will help us purge tipc_net_lock in the future. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Tested-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-22 21:17:52 -04:00
Ying Xue	ef13a262c3	tipc: replace config_mutex lock with RTNL lock There have two paths where we can configure or change bearer status: one is that bearer is configured from user space with tipc-config tool; another one is that bearer is changed by notification events from its attached interface. On the first path, one dedicated config_mutex lock is guarded; on the latter path, RTNL lock has been placed to serialize the process of dealing with interface events. So, if RTNL lock is also used to protect the first path, this will not only extremely help us simplify current locking policy, but also config_mutex lock can be deleted as well. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Tested-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-22 21:17:52 -04:00
Andrew Lutomirski	78541c1dc6	net: Fix ns_capable check in sock_diag_put_filterinfo The caller needs capabilities on the namespace being queried, not on their own namespace. This is a security bug, although it likely has only a minor impact. Cc: stable@vger.kernel.org Signed-off-by: Andy Lutomirski <luto@amacapital.net> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-22 12:49:39 -04:00
Bob Copeland	bc3ce0b0be	mac80211: mesh: always use the latest target_sn When a path target responds to a path request, its response always contains the most up-to-date information; accordingly, it should use the latest target_sn, regardless of net_traversal_jiffies(). Otherwise, only the first path response is considered when constructing a path, as it will have the highest target_sn of all replies during that period. Signed-off-by: Bob Copeland <bob@cozybit.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-04-22 17:24:50 +02:00
Bob Copeland	a40a8c17b2	mac80211: fix mesh_add_rsn_ie IE finding loop Previously, the code to copy the RSN IE from the mesh config would increment its pointer by one in the loop instead of by the element length, so there was the potential for mistaking another IE's data fields as the RSN IE. cfg80211_find_ie() exists, so just use that. Signed-off-by: Bob Copeland <me@bobcopeland.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-04-22 17:24:49 +02:00
Bob Copeland	aee6499c8c	mac80211: mesh: use u16 return type for u16 getter u16_field_get() is a simple wrapper around get_unaligned_le16(), and it is being assigned to a u16, so there's no need to promote to u32 in the middle. Signed-off-by: Bob Copeland <me@bobcopeland.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-04-22 17:24:49 +02:00
Jouni Malinen	9a07bf507d	mac80211: Allow HT capa override to add 40 MHz intolerant This can be useful for testing purposes to confirm valid AP behavior on HT 20/40 co-existence functionality. Signed-off-by: Jouni Malinen <jouni@qca.qualcomm.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-04-22 17:23:19 +02:00
Luis R. Rodriguez	96cce12ff6	cfg80211: fix processing world regdomain when non modular This allows processing of the last regulatory request when we determine its still pending. Without this if a regulatory request failed to get processed by userspace we wouldn't be able to re-process it later. An example situation that can lead to an unprocessed last_request is enabling cfg80211 to be built-in to the kernel, not enabling CFG80211_INTERNAL_REGDB and the CRDA binary not being available at the time the udev rule that kicks of CRDA triggers. In such a situation we want to let some cfg80211 triggers eventually kick CRDA for us again. Without this if the first cycle attempt to kick off CRDA failed we'd be stuck without the ability to change process any further regulatory domains. cfg80211 will trigger re-processing of the regulatory queue whenever schedule_work(&reg_work) is called, currently this happens when: * suspend / resume * disconnect * a beacon hint gets triggered (non DFS 5 GHz AP found) * a regulatory request gets added to the queue We don't have any specific opportunistic late boot triggers to address a late mount of where CRDA resides though, adding that should be done separately through another patch. Without an opportunistic fix then this fix relies at least one of the triggeres above to happen. Reported-by: Sander Eikelenboom <linux@eikelenboom.it> Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-04-22 17:17:56 +02:00
Arik Nemtsov	c888393b74	cfg80211: avoid freeing last_request while in flight Avoid freeing the last request while it is being processed. This can happen in some cases if reg_work is kicked for some reason while the currently pending request is in flight. Cc: Sander Eikelenboom <linux@eikelenboom.it> Tested-by: Eliad Peller <eliad@wizery.com> Tested-by: Colleen Twitty <colleen@cozybit.com> Signed-off-by: Arik Nemtsov <arik@wizery.com> Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-04-22 17:12:55 +02:00
Bob Copeland	311ab8e1c2	mac80211: fixup radiotap tx flags for RTS/CTS When using RTS/CTS, the CTS-to-Self bit in radiotap TX flags is getting set instead of the RTS bit. Set the correct one. Reported-by: Larry Maxwell <larrymaxwell@agilemesh.com> Signed-off-by: Bob Copeland <bob@cozybit.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-04-22 17:08:30 +02:00
Chun-Yeow Yeoh	062f1d6de0	mac80211: avoid handling of SMPS for mesh The patch "mac80211: implement SMPS for AP" has caused kernel oops at mesh STA if the peer mesh STA operates in sleep mode and then becomes active mode. It can be easily reproduced by setting the following commands at peer mesh STA: iw mesh0 station set aa:bb:cc:dd:ee:ff mesh_power_mode deep iw mesh0 station set aa:bb:cc:dd:ee:ff mesh_power_mode active Kernel oops will happen at mesh STA aa:bb:cc:dd:ee:ff. Fix this by avoiding SMPS for mesh mode. Signed-off-by: Chun-Yeow Yeoh <yeohchunyeow@gmail.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-04-22 16:38:42 +02:00
Tetsuo Handa	f1370cc4a0	xfrm: Remove useless secid field from xfrm_audit. It seems to me that commit `ab5f5e8b` "[XFRM]: xfrm audit calls" is doing something strange at xfrm_audit_helper_usrinfo(). If secid != 0 && security_secid_to_secctx(secid) != 0, the caller calls audit_log_task_context() which basically does secid != 0 && security_secid_to_secctx(secid) == 0 case except that secid is obtained from current thread's context. Oh, what happens if secid passed to xfrm_audit_helper_usrinfo() was obtained from other thread's context? It might audit current thread's context rather than other thread's context if security_secid_to_secctx() in xfrm_audit_helper_usrinfo() failed for some reason. Then, are all the caller of xfrm_audit_helper_usrinfo() passing either secid obtained from current thread's context or secid == 0? It seems to me that they are. If I didn't miss something, we don't need to pass secid to xfrm_audit_helper_usrinfo() because audit_log_task_context() will obtain secid from current thread's context. Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2014-04-22 10:47:53 +02:00
Christophe Ricard	74157ef54c	NFC: hci: Fix sparse: cast to restricted __be16 Fixing "sparse: cast to restricted __be16" message when building with make C=1 CF=-D__CHECK_ENDIAN__ Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2014-04-22 00:37:29 +02:00
Mark A. Greer	2473460735	NFC: digital: Add support for ISO/IEC 14443-B Protocol Add support for the ISO/IEC 14443-B protocol and Type 4B tags. It is expected that there will be only one tag within range so the full anticollision scheme is not implemented. Only the SENSB_REQ/SENSB_RES and ATTRIB_REQ/ATTRIB_RES are implemented. CC: Thierry Escande <thierry.escande@linux.intel.com> Signed-off-by: Mark A. Greer <mgreer@animalcreek.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2014-04-22 00:37:28 +02:00
Christophe Ricard	e240bc3612	NFC: hci: Add load_session HCI operand load_session allows a CLF to restore the gate <-> pipe table from some proprietary location. The main advantage to add this function is to reduce the memory wear by running pipe creation (and storing) only once. Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2014-04-22 00:37:26 +02:00
Christophe Ricard	d330905db6	NFC: hci: Extend command execution delay Extend it up to the maximum FWI value 4949 ms defined by the ISO14443-3 specification. Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2014-04-22 00:37:26 +02:00
Weiping Pan	86fd14ad1e	tcp: make tcp_cwnd_application_limited() static Make tcp_cwnd_application_limited() static and move it from tcp_input.c to tcp_output.c Signed-off-by: Weiping Pan <wpan@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-20 18:18:56 -04:00
Luis R. Rodriguez	94716d2fb1	6lowpan: make lowpan_cb static CC: Alexander Smirnov <alex.bluesman.smirnov@gmail.com> Cc: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com> Cc: linux-zigbee-devel@lists.sourceforge.net Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-20 18:18:55 -04:00
Luis R. Rodriguez	599018a710	6lowpan: add helper to get 6lowpan namespace This will simplify the new reassembly backport with no code changes being required. CC: Alexander Smirnov <alex.bluesman.smirnov@gmail.com> Cc: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com> Cc: linux-zigbee-devel@lists.sourceforge.net Cc: David S. Miller" <davem@davemloft.net> Cc: netdev@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-20 18:18:55 -04:00
Neil Horman	8465a5fcd1	sctp: add support for busy polling to sctp protocol The busy polling socket option adds support for sockets to busy wait on data arriving on the napi queue from which they have most recently received a frame. Currently only tcp and udp support this feature, but theres no reason sctp can't do so as well. Add it in so appliations can take advantage of it Signed-off-by: Neil Horman <nhorman@tuxdriver.com> CC: Vlad Yasevich <vyasevich@gmail.com> CC: "David S. Miller" <davem@davemloft.net> CC: Daniel Borkmann <dborkman@redhat.com> CC: netdev@vger.kernel.org Acked-by: Vlad Yasevich <vyasevich@gmail.com> Acked-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-20 18:18:55 -04:00
Herbert Xu	a0265d28b3	net: Add __dev_forward_skb This patch adds the helper __dev_forward_skb which is identical to dev_forward_skb except that it doesn't actually inject the skb into the stack. This is useful where we wish to have finer control over how the packet is injected, e.g., via netif_rx_ni or netif_receive_skb. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-20 18:18:55 -04:00
Kenjiro Nakayama	1536e2857b	tcp: Add a TCP_FASTOPEN socket option to get a max backlog on its listner This patch adds a TCP_FASTOPEN socket option to get a max backlog on its listener to getsockopt(). Signed-off-by: Kenjiro Nakayama <nakayamakenjiro@gmail.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-20 18:18:54 -04:00
Vlad Yasevich	b14878ccb7	net: sctp: cache auth_enable per endpoint Currently, it is possible to create an SCTP socket, then switch auth_enable via sysctl setting to 1 and crash the system on connect: Oops[#1]: CPU: 0 PID: 0 Comm: swapper Not tainted 3.14.1-mipsgit-20140415 #1 task: ffffffff8056ce80 ti: ffffffff8055c000 task.ti: ffffffff8055c000 [...] Call Trace: [<ffffffff8043c4e8>] sctp_auth_asoc_set_default_hmac+0x68/0x80 [<ffffffff8042b300>] sctp_process_init+0x5e0/0x8a4 [<ffffffff8042188c>] sctp_sf_do_5_1B_init+0x234/0x34c [<ffffffff804228c8>] sctp_do_sm+0xb4/0x1e8 [<ffffffff80425a08>] sctp_endpoint_bh_rcv+0x1c4/0x214 [<ffffffff8043af68>] sctp_rcv+0x588/0x630 [<ffffffff8043e8e8>] sctp6_rcv+0x10/0x24 [<ffffffff803acb50>] ip6_input+0x2c0/0x440 [<ffffffff8030fc00>] __netif_receive_skb_core+0x4a8/0x564 [<ffffffff80310650>] process_backlog+0xb4/0x18c [<ffffffff80313cbc>] net_rx_action+0x12c/0x210 [<ffffffff80034254>] __do_softirq+0x17c/0x2ac [<ffffffff800345e0>] irq_exit+0x54/0xb0 [<ffffffff800075a4>] ret_from_irq+0x0/0x4 [<ffffffff800090ec>] rm7k_wait_irqoff+0x24/0x48 [<ffffffff8005e388>] cpu_startup_entry+0xc0/0x148 [<ffffffff805a88b0>] start_kernel+0x37c/0x398 Code: dd0900b8 000330f8 0126302d <dcc60000> 50c0fff1 0047182a a48306a0 03e00008 00000000 ---[ end trace b530b0551467f2fd ]--- Kernel panic - not syncing: Fatal exception in interrupt What happens while auth_enable=0 in that case is, that ep->auth_hmacs is initialized to NULL in sctp_auth_init_hmacs() when endpoint is being created. After that point, if an admin switches over to auth_enable=1, the machine can crash due to NULL pointer dereference during reception of an INIT chunk. When we enter sctp_process_init() via sctp_sf_do_5_1B_init() in order to respond to an INIT chunk, the INIT verification succeeds and while we walk and process all INIT params via sctp_process_param() we find that net->sctp.auth_enable is set, therefore do not fall through, but invoke sctp_auth_asoc_set_default_hmac() instead, and thus, dereference what we have set to NULL during endpoint initialization phase. The fix is to make auth_enable immutable by caching its value during endpoint initialization, so that its original value is being carried along until destruction. The bug seems to originate from the very first days. Fix in joint work with Daniel Borkmann. Reported-by: Joshua Kinard <kumba@gentoo.org> Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Tested-by: Joshua Kinard <kumba@gentoo.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-18 18:32:00 -04:00
David S. Miller	5a292f7bc6	Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless John W. Linville says: ==================== pull request: wireless 2014-04-17 Please pull this batch of fixes intended for the 3.15 stream... For the mac80211 bits, Johannes says: "We have a fix from Chun-Yeow to not look at management frame bitrates that are typically really low, two fixes from Felix for AP_VLAN interfaces, a fix from Ido to disable SMPS settings when a monitor interface is enabled, a radar detection fix from Michał and a fix from myself for a very old remain-on-channel bug." For the iwlwifi bits, Emmanuel says: "I have new device IDs and a new firmware API. These are the trivial ones. The less trivial ones are Johannes's fix that delays the enablement of an interrupt coalescing hardware until after association - this fixes a few connection problems seen in the field. Eyal has a bunch of rate control fixes. I decided to add these for 3.15 because they fix some disconnection and packet loss scenarios which were reported by the field. I also have a fix for a memory leak that happens only with a very new NIC." Along with those... Amitkumar Karwar fixes a couple of problems relating to driver/firmware interactions in mwifiex. Christian Engelmayer avoids a couple of potential memory leaks in the new rsi driver. Eliad Peller provides a wl18xx mailbox alignment fix for problems when using new firmware. Frederic Danis adds a couple of missing debugging strings to the cw1200 driver. Geert Uytterhoeven adds a variable initialization inside of the rsi driver. Luciano Coelho patches the wlcore code to ignore dummy packet events in PLT mode in order to work around a firmware bug. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-18 18:29:44 -04:00
dingtianhong	dc8eaaa006	vlan: Fix lockdep warning when vlan dev handle notification When I open the LOCKDEP config and run these steps: modprobe 8021q vconfig add eth2 20 vconfig add eth2.20 30 ifconfig eth2 xx.xx.xx.xx then the Call Trace happened: [32524.386288] ============================================= [32524.386293] [ INFO: possible recursive locking detected ] [32524.386298] 3.14.0-rc2-0.7-default+ #35 Tainted: G O [32524.386302] --------------------------------------------- [32524.386306] ifconfig/3103 is trying to acquire lock: [32524.386310] (&vlan_netdev_addr_lock_key/1){+.....}, at: [<ffffffff814275f4>] dev_mc_sync+0x64/0xb0 [32524.386326] [32524.386326] but task is already holding lock: [32524.386330] (&vlan_netdev_addr_lock_key/1){+.....}, at: [<ffffffff8141af83>] dev_set_rx_mode+0x23/0x40 [32524.386341] [32524.386341] other info that might help us debug this: [32524.386345] Possible unsafe locking scenario: [32524.386345] [32524.386350] CPU0 [32524.386352] ---- [32524.386354] lock(&vlan_netdev_addr_lock_key/1); [32524.386359] lock(&vlan_netdev_addr_lock_key/1); [32524.386364] [32524.386364] * DEADLOCK * [32524.386364] [32524.386368] May be due to missing lock nesting notation [32524.386368] [32524.386373] 2 locks held by ifconfig/3103: [32524.386376] #0: (rtnl_mutex){+.+.+.}, at: [<ffffffff81431d42>] rtnl_lock+0x12/0x20 [32524.386387] #1: (&vlan_netdev_addr_lock_key/1){+.....}, at: [<ffffffff8141af83>] dev_set_rx_mode+0x23/0x40 [32524.386398] [32524.386398] stack backtrace: [32524.386403] CPU: 1 PID: 3103 Comm: ifconfig Tainted: G O 3.14.0-rc2-0.7-default+ #35 [32524.386409] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007 [32524.386414] ffffffff81ffae40 ffff8800d9625ae8 ffffffff814f68a2 ffff8800d9625bc8 [32524.386421] ffffffff810a35fb ffff8800d8a8d9d0 00000000d9625b28 ffff8800d8a8e5d0 [32524.386428] 000003cc00000000 0000000000000002 ffff8800d8a8e5f8 0000000000000000 [32524.386435] Call Trace: [32524.386441] [<ffffffff814f68a2>] dump_stack+0x6a/0x78 [32524.386448] [<ffffffff810a35fb>] __lock_acquire+0x7ab/0x1940 [32524.386454] [<ffffffff810a323a>] ? __lock_acquire+0x3ea/0x1940 [32524.386459] [<ffffffff810a4874>] lock_acquire+0xe4/0x110 [32524.386464] [<ffffffff814275f4>] ? dev_mc_sync+0x64/0xb0 [32524.386471] [<ffffffff814fc07a>] _raw_spin_lock_nested+0x2a/0x40 [32524.386476] [<ffffffff814275f4>] ? dev_mc_sync+0x64/0xb0 [32524.386481] [<ffffffff814275f4>] dev_mc_sync+0x64/0xb0 [32524.386489] [<ffffffffa0500cab>] vlan_dev_set_rx_mode+0x2b/0x50 [8021q] [32524.386495] [<ffffffff8141addf>] __dev_set_rx_mode+0x5f/0xb0 [32524.386500] [<ffffffff8141af8b>] dev_set_rx_mode+0x2b/0x40 [32524.386506] [<ffffffff8141b3cf>] __dev_open+0xef/0x150 [32524.386511] [<ffffffff8141b177>] __dev_change_flags+0xa7/0x190 [32524.386516] [<ffffffff8141b292>] dev_change_flags+0x32/0x80 [32524.386524] [<ffffffff8149ca56>] devinet_ioctl+0x7d6/0x830 [32524.386532] [<ffffffff81437b0b>] ? dev_ioctl+0x34b/0x660 [32524.386540] [<ffffffff814a05b0>] inet_ioctl+0x80/0xa0 [32524.386550] [<ffffffff8140199d>] sock_do_ioctl+0x2d/0x60 [32524.386558] [<ffffffff81401a52>] sock_ioctl+0x82/0x2a0 [32524.386568] [<ffffffff811a7123>] do_vfs_ioctl+0x93/0x590 [32524.386578] [<ffffffff811b2705>] ? rcu_read_lock_held+0x45/0x50 [32524.386586] [<ffffffff811b39e5>] ? __fget_light+0x105/0x110 [32524.386594] [<ffffffff811a76b1>] SyS_ioctl+0x91/0xb0 [32524.386604] [<ffffffff815057e2>] system_call_fastpath+0x16/0x1b ======================================================================== The reason is that all of the addr_lock_key for vlan dev have the same class, so if we change the status for vlan dev, the vlan dev and its real dev will hold the same class of addr_lock_key together, so the warning happened. we should distinguish the lock depth for vlan dev and its real dev. v1->v2: Convert the vlan_netdev_addr_lock_key to an array of eight elements, which could support to add 8 vlan id on a same vlan dev, I think it is enough for current scene, because a netdev's name is limited to IFNAMSIZ which could not hold 8 vlan id, and the vlan dev would not meet the same class key with its real dev. The new function vlan_dev_get_lockdep_subkey() will return the subkey and make the vlan dev could get a suitable class key. v2->v3: According David's suggestion, I use the subclass to distinguish the lock key for vlan dev and its real dev, but it make no sense, because the difference for subclass in the lock_class_key doesn't mean that the difference class for lock_key, so I use lock_depth to distinguish the different depth for every vlan dev, the same depth of the vlan dev could have the same lock_class_key, I import the MAX_LOCK_DEPTH from the include/linux/sched.h, I think it is enough here, the lockdep should never exceed that value. v3->v4: Add a huge array of locking keys will waste static kernel memory and is not a appropriate method, we could use _nested() variants to fix the problem, calculate the depth for every vlan dev, and use the depth as the subclass for addr_lock_key. Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-18 17:48:30 -04:00
Peter Zijlstra	4e857c58ef	arch: Mass conversion of smp_mb__() Mostly scripted conversion of the smp_mb__ barriers. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Link: http://lkml.kernel.org/n/tip-55dhyhocezdw1dg7u19hmh1u@git.kernel.org Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: linux-arch@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-04-18 14:20:48 +02:00
John W. Linville	4a0c3d9fd1	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem	2014-04-17 10:34:22 -04:00
Nicolas Dichtel	74462f0d4a	ip6_tunnel: use the right netns in ioctl handler Because the netdevice may be in another netns than the i/o netns, we should use the i/o netns instead of dev_net(dev). Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-16 15:16:02 -04:00
Nicolas Dichtel	9aad77c3b5	sit: use the right netns in ioctl handler Because the netdevice may be in another netns than the i/o netns, we should use the i/o netns instead of dev_net(dev). Note that netdev_priv(dev) cannot bu NULL, hence we can remove these useless checks. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-16 15:16:02 -04:00
Nicolas Dichtel	8c923ce219	ip_tunnel: use the right netns in ioctl handler Because the netdevice may be in another netns than the i/o netns, we should use the i/o netns instead of dev_net(dev). The variable 'tunnel' was used only to get 'itn', hence to simplify code I remove it and use 't' instead. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-16 15:16:02 -04:00
Jan Glauber	b7c0ddf5f2	net: use SYSCALL_DEFINEx for sys_recv Make sys_recv a first class citizen by using the SYSCALL_DEFINEx macro. Besides being cleaner this will also generate meta data for the system call so tracing tools like ftrace or LTTng can resolve this system call. Signed-off-by: Jan Glauber <jan.glauber@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-16 15:15:05 -04:00
Cong Wang	0d5edc6873	ipv4, route: pass 0 instead of LOOPBACK_IFINDEX to fib_validate_source() In my special case, when a packet is redirected from veth0 to lo, its skb->dev->ifindex would be LOOPBACK_IFINDEX. Meanwhile we pass the hard-coded LOOPBACK_IFINDEX to fib_validate_source() in ip_route_input_slow(). This would cause the following check in fib_validate_source() fail: (dev->ifindex != oif \|\| !IN_DEV_TX_REDIRECTS(idev)) when rp_filter is disabeld on loopback. As suggested by Julian, the caller should pass 0 here so that we will not end up by calling __fib_validate_source(). Cc: Eric Biederman <ebiederm@xmission.com> Cc: Julian Anastasov <ja@ssi.bg> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Cong Wang <cwang@twopensource.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-16 15:05:12 -04:00
Cong Wang	6a662719c9	ipv4, fib: pass LOOPBACK_IFINDEX instead of 0 to flowi4_iif As suggested by Julian: Simply, flowi4_iif must not contain 0, it does not look logical to ignore all ip rules with specified iif. because in fib_rule_match() we do: if (rule->iifindex && (rule->iifindex != fl->flowi_iif)) goto out; flowi4_iif should be LOOPBACK_IFINDEX by default. We need to move LOOPBACK_IFINDEX to include/net/flow.h: 1) It is mostly used by flowi_iif 2) Fix the following compile error if we use it in flow.h by the patches latter: In file included from include/linux/netfilter.h:277:0, from include/net/netns/netfilter.h:5, from include/net/net_namespace.h:21, from include/linux/netdevice.h:43, from include/linux/icmpv6.h:12, from include/linux/ipv6.h:61, from include/net/ipv6.h:16, from include/linux/sunrpc/clnt.h:27, from include/linux/nfs_fs.h:30, from init/do_mounts.c:32: include/net/flow.h: In function ‘flowi4_init_output’: include/net/flow.h:84:32: error: ‘LOOPBACK_IFINDEX’ undeclared (first use in this function) Cc: Eric Biederman <ebiederm@xmission.com> Cc: Julian Anastasov <ja@ssi.bg> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Cong Wang <cwang@twopensource.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-16 15:05:11 -04:00
Steffen Klassert	a32452366b	vti4: Don't count header length twice. We currently count the size of LL_MAX_HEADER and struct iphdr twice for vti4 devices, this leads to a wrong device mtu. The size of LL_MAX_HEADER and struct iphdr is already counted in ip_tunnel_bind_dev(), so don't do it again in vti_tunnel_init(). Fixes: `b9959fd3` ("vti: switch to new ip tunnel code") Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2014-04-16 09:01:03 +02:00
Nicolas Dichtel	54d63f787b	ip6_gre: don't allow to remove the fb_tunnel_dev It's possible to remove the FB tunnel with the command 'ip link del ip6gre0' but this is unsafe, the module always supposes that this device exists. For example, ip6gre_tunnel_lookup() may use it unconditionally. Let's add a rtnl handler for dellink, which will never remove the FB tunnel (we let ip6gre_destroy_tunnels() do the job). Introduced by commit `c12b395a46` ("gre: Support GRE over IPv6"). CC: Dmitry Kozlov <xeb@mail.ru> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-15 14:56:19 -04:00
Eric Dumazet	aad88724c9	ipv4: add a sock pointer to dst->output() path. In the dst->output() path for ipv4, the code assumes the skb it has to transmit is attached to an inet socket, specifically via ip_mc_output() : The sk_mc_loop() test triggers a WARN_ON() when the provider of the packet is an AF_PACKET socket. The dst->output() method gets an additional 'struct sock *sk' parameter. This needs a cascade of changes so that this parameter can be propagated from vxlan to final consumer. Fixes: `8f646c922d` ("vxlan: keep original skb ownership") Reported-by: lucien xin <lucien.xin@gmail.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-15 13:47:15 -04:00
Eric Dumazet	b0270e9101	ipv4: add a sock pointer to ip_queue_xmit() ip_queue_xmit() assumes the skb it has to transmit is attached to an inet socket. Commit `31c70d5956` ("l2tp: keep original skb ownership") changed l2tp to not change skb ownership and thus broke this assumption. One fix is to add a new 'struct sock *sk' parameter to ip_queue_xmit(), so that we do not assume skb->sk points to the socket used by l2tp tunnel. Fixes: `31c70d5956` ("l2tp: keep original skb ownership") Reported-by: Zhan Jianyu <nasa4836@gmail.com> Tested-by: Zhan Jianyu <nasa4836@gmail.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-15 12:58:34 -04:00
David S. Miller	00cbc3dcd1	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains three Netfilter fixes for your net tree, they are: * Fix missing generation sequence initialization which results in a splat if lockdep is enabled, it was introduced in the recent works to improve nf_conntrack scalability, from Andrey Vagin. * Don't flush the GRE keymap list in nf_conntrack when the pptp helper is disabled otherwise this crashes due to a double release, from Andrey Vagin. * Fix nf_tables cmp fast in big endian, from Patrick McHardy. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-14 19:00:10 -04:00
Vlad Yasevich	1e785f48d2	net: Start with correct mac_len in skb_network_protocol Sometimes, when the packet arrives at skb_mac_gso_segment() its skb->mac_len already accounts for some of the mac lenght headers in the packet. This seems to happen when forwarding through and OpenSSL tunnel. When we start looking for any vlan headers in skb_network_protocol() we seem to ignore any of the already known mac headers and start with an ETH_HLEN. This results in an incorrect offset, dropped TSO frames and general slowness of the connection. We can start counting from the known skb->mac_len and return at least that much if all mac level headers are known and accounted for. Fixes: `53d6471cef` (net: Account for all vlan headers in skb_mac_gso_segment) CC: Eric Dumazet <eric.dumazet@gmail.com> CC: Daniel Borkman <dborkman@redhat.com> Tested-by: Martin Filip <nexus+kernel@smoula.net> Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-14 18:58:58 -04:00
Daniel Borkmann	362d52040c	Revert "net: sctp: Fix a_rwnd/rwnd management to reflect real state of the receiver's buffer" This reverts commit `ef2820a735` ("net: sctp: Fix a_rwnd/rwnd management to reflect real state of the receiver's buffer") as it introduced a serious performance regression on SCTP over IPv4 and IPv6, though a not as dramatic on the latter. Measurements are on 10Gbit/s with ixgbe NICs. Current state: [root@Lab200slot2 ~]# iperf3 --sctp -4 -c 192.168.241.3 -V -l 1452 -t 60 iperf version 3.0.1 (10 January 2014) Linux Lab200slot2 3.14.0 #1 SMP Thu Apr 3 23:18:29 EDT 2014 x86_64 Time: Fri, 11 Apr 2014 17:56:21 GMT Connecting to host 192.168.241.3, port 5201 Cookie: Lab200slot2.1397238981.812898.548918 [ 4] local 192.168.241.2 port 38616 connected to 192.168.241.3 port 5201 Starting Test: protocol: SCTP, 1 streams, 1452 byte blocks, omitting 0 seconds, 60 second test [ ID] Interval Transfer Bandwidth [ 4] 0.00-1.09 sec 20.8 MBytes 161 Mbits/sec [ 4] 1.09-2.13 sec 10.8 MBytes 86.8 Mbits/sec [ 4] 2.13-3.15 sec 3.57 MBytes 29.5 Mbits/sec [ 4] 3.15-4.16 sec 4.33 MBytes 35.7 Mbits/sec [ 4] 4.16-6.21 sec 10.4 MBytes 42.7 Mbits/sec [ 4] 6.21-6.21 sec 0.00 Bytes 0.00 bits/sec [ 4] 6.21-7.35 sec 34.6 MBytes 253 Mbits/sec [ 4] 7.35-11.45 sec 22.0 MBytes 45.0 Mbits/sec [ 4] 11.45-11.45 sec 0.00 Bytes 0.00 bits/sec [ 4] 11.45-11.45 sec 0.00 Bytes 0.00 bits/sec [ 4] 11.45-11.45 sec 0.00 Bytes 0.00 bits/sec [ 4] 11.45-12.51 sec 16.0 MBytes 126 Mbits/sec [ 4] 12.51-13.59 sec 20.3 MBytes 158 Mbits/sec [ 4] 13.59-14.65 sec 13.4 MBytes 107 Mbits/sec [ 4] 14.65-16.79 sec 33.3 MBytes 130 Mbits/sec [ 4] 16.79-16.79 sec 0.00 Bytes 0.00 bits/sec [ 4] 16.79-17.82 sec 5.94 MBytes 48.7 Mbits/sec (etc) [root@Lab200slot2 ~]# iperf3 --sctp -6 -c 2001:db8:0:f101::1 -V -l 1400 -t 60 iperf version 3.0.1 (10 January 2014) Linux Lab200slot2 3.14.0 #1 SMP Thu Apr 3 23:18:29 EDT 2014 x86_64 Time: Fri, 11 Apr 2014 19:08:41 GMT Connecting to host 2001:db8:0:f101::1, port 5201 Cookie: Lab200slot2.1397243321.714295.2b3f7c [ 4] local 2001:db8:0:f101::2 port 55804 connected to 2001:db8:0:f101::1 port 5201 Starting Test: protocol: SCTP, 1 streams, 1400 byte blocks, omitting 0 seconds, 60 second test [ ID] Interval Transfer Bandwidth [ 4] 0.00-1.00 sec 169 MBytes 1.42 Gbits/sec [ 4] 1.00-2.00 sec 201 MBytes 1.69 Gbits/sec [ 4] 2.00-3.00 sec 188 MBytes 1.58 Gbits/sec [ 4] 3.00-4.00 sec 174 MBytes 1.46 Gbits/sec [ 4] 4.00-5.00 sec 165 MBytes 1.39 Gbits/sec [ 4] 5.00-6.00 sec 199 MBytes 1.67 Gbits/sec [ 4] 6.00-7.00 sec 163 MBytes 1.36 Gbits/sec [ 4] 7.00-8.00 sec 174 MBytes 1.46 Gbits/sec [ 4] 8.00-9.00 sec 193 MBytes 1.62 Gbits/sec [ 4] 9.00-10.00 sec 196 MBytes 1.65 Gbits/sec [ 4] 10.00-11.00 sec 157 MBytes 1.31 Gbits/sec [ 4] 11.00-12.00 sec 175 MBytes 1.47 Gbits/sec [ 4] 12.00-13.00 sec 192 MBytes 1.61 Gbits/sec [ 4] 13.00-14.00 sec 199 MBytes 1.67 Gbits/sec (etc) After patch: [root@Lab200slot2 ~]# iperf3 --sctp -4 -c 192.168.240.3 -V -l 1452 -t 60 iperf version 3.0.1 (10 January 2014) Linux Lab200slot2 3.14.0+ #1 SMP Mon Apr 14 12:06:40 EDT 2014 x86_64 Time: Mon, 14 Apr 2014 16:40:48 GMT Connecting to host 192.168.240.3, port 5201 Cookie: Lab200slot2.1397493648.413274.65e131 [ 4] local 192.168.240.2 port 50548 connected to 192.168.240.3 port 5201 Starting Test: protocol: SCTP, 1 streams, 1452 byte blocks, omitting 0 seconds, 60 second test [ ID] Interval Transfer Bandwidth [ 4] 0.00-1.00 sec 240 MBytes 2.02 Gbits/sec [ 4] 1.00-2.00 sec 239 MBytes 2.01 Gbits/sec [ 4] 2.00-3.00 sec 240 MBytes 2.01 Gbits/sec [ 4] 3.00-4.00 sec 239 MBytes 2.00 Gbits/sec [ 4] 4.00-5.00 sec 245 MBytes 2.05 Gbits/sec [ 4] 5.00-6.00 sec 240 MBytes 2.01 Gbits/sec [ 4] 6.00-7.00 sec 240 MBytes 2.02 Gbits/sec [ 4] 7.00-8.00 sec 239 MBytes 2.01 Gbits/sec With the reverted patch applied, the SCTP/IPv4 performance is back to normal on latest upstream for IPv4 and IPv6 and has same throughput as 3.4.2 test kernel, steady and interval reports are smooth again. Fixes: `ef2820a735` ("net: sctp: Fix a_rwnd/rwnd management to reflect real state of the receiver's buffer") Reported-by: Peter Butler <pbutler@sonusnet.com> Reported-by: Dongsheng Song <dongsheng.song@gmail.com> Reported-by: Fengguang Wu <fengguang.wu@intel.com> Tested-by: Peter Butler <pbutler@sonusnet.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Matija Glavinic Pecotic <matija.glavinic-pecotic.ext@nsn.com> Cc: Alexander Sverdlin <alexander.sverdlin@nsn.com> Cc: Vlad Yasevich <vyasevich@gmail.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-14 16:26:48 -04:00
Daniel Borkmann	8c482cdc35	net: filter: seccomp: fix wrong decoding of BPF_S_ANC_SECCOMP_LD_W While reviewing seccomp code, we found that BPF_S_ANC_SECCOMP_LD_W has been wrongly decoded by commit `a8fc927780` ("sk-filter: Add ability to get socket filter program (v2)") into the opcode BPF_LD\|BPF_B\|BPF_ABS although it should have been decoded as BPF_LD\|BPF_W\|BPF_ABS. In practice, this should not have much side-effect though, as such conversion is/was being done through prctl(2) PR_SET_SECCOMP. Reverse operation PR_GET_SECCOMP will only return the current seccomp mode, but not the filter itself. Since the transition to the new BPF infrastructure, it's also not used anymore, so we can simply remove this as it's unreachable. Fixes: `a8fc927780` ("sk-filter: Add ability to get socket filter program (v2)") Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Cc: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-14 16:26:47 -04:00
John W. Linville	08e22e193a	Merge branch 'for-john' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211	2014-04-14 13:47:01 -04:00
Eric Dumazet	30f78d8ebf	ipv6: Limit mtu to 65575 bytes Francois reported that setting big mtu on loopback device could prevent tcp sessions making progress. We do not support (yet ?) IPv6 Jumbograms and cook corrupted packets. We must limit the IPv6 MTU to (65535 + 40) bytes in theory. Tested: ifconfig lo mtu 70000 netperf -H ::1 Before patch : Throughput : 0.05 Mbits After patch : Throughput : 35484 Mbits Reported-by: Francois WELLENREITER <f.wellenreiter@gmail.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-14 12:39:59 -04:00
Patrick McHardy	b855d416dc	netfilter: nf_tables: fix nft_cmp_fast failure on big endian for size < 4 nft_cmp_fast is used for equality comparisions of size <= 4. For comparisions of size < 4 byte a mask is calculated that is applied to both the data from userspace (during initialization) and the register value (during runtime). Both values are stored using (in effect) memcpy to a memory area that is then interpreted as u32 by nft_cmp_fast. This works fine on little endian since smaller types have the same base address, however on big endian this is not true and the smaller types are interpreted as a big number with trailing zero bytes. The mask therefore must not include the lower bytes, but the higher bytes on big endian. Add a helper function that does a cpu_to_le32 to switch the bytes on big endian. Since we're dealing with a mask of just consequitive bits, this works out fine. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-04-14 10:38:02 +02:00
Andrey Vagin	ee214d54bf	netfilter: nf_conntrack: initialize net.ct.generation [ 251.920788] INFO: trying to register non-static key. [ 251.921386] the code is fine but needs lockdep annotation. [ 251.921386] turning off the locking correctness validator. [ 251.921386] CPU: 2 PID: 15715 Comm: socket_listen Not tainted 3.14.0+ #294 [ 251.921386] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [ 251.921386] 0000000000000000 000000009d18c210 ffff880075f039b8 ffffffff816b7ecd [ 251.921386] ffffffff822c3b10 ffff880075f039c8 ffffffff816b36f4 ffff880075f03aa0 [ 251.921386] ffffffff810c65ff ffffffff810c4a85 00000000fffffe01 ffffffffa0075172 [ 251.921386] Call Trace: [ 251.921386] [<ffffffff816b7ecd>] dump_stack+0x45/0x56 [ 251.921386] [<ffffffff816b36f4>] register_lock_class.part.24+0x38/0x3c [ 251.921386] [<ffffffff810c65ff>] __lock_acquire+0x168f/0x1b40 [ 251.921386] [<ffffffff810c4a85>] ? trace_hardirqs_on_caller+0x105/0x1d0 [ 251.921386] [<ffffffffa0075172>] ? nf_nat_setup_info+0x252/0x3a0 [nf_nat] [ 251.921386] [<ffffffff816c1215>] ? _raw_spin_unlock_bh+0x35/0x40 [ 251.921386] [<ffffffffa0075172>] ? nf_nat_setup_info+0x252/0x3a0 [nf_nat] [ 251.921386] [<ffffffff810c7272>] lock_acquire+0xa2/0x120 [ 251.921386] [<ffffffffa008ab90>] ? ipv4_confirm+0x90/0xf0 [nf_conntrack_ipv4] [ 251.921386] [<ffffffffa0055989>] __nf_conntrack_confirm+0x129/0x410 [nf_conntrack] [ 251.921386] [<ffffffffa008ab90>] ? ipv4_confirm+0x90/0xf0 [nf_conntrack_ipv4] [ 251.921386] [<ffffffffa008ab90>] ipv4_confirm+0x90/0xf0 [nf_conntrack_ipv4] [ 251.921386] [<ffffffff815e7b00>] ? ip_fragment+0x9f0/0x9f0 [ 251.921386] [<ffffffff815d8c5a>] nf_iterate+0xaa/0xc0 [ 251.921386] [<ffffffff815e7b00>] ? ip_fragment+0x9f0/0x9f0 [ 251.921386] [<ffffffff815d8d14>] nf_hook_slow+0xa4/0x190 [ 251.921386] [<ffffffff815e7b00>] ? ip_fragment+0x9f0/0x9f0 [ 251.921386] [<ffffffff815e98f2>] ip_output+0x92/0x100 [ 251.921386] [<ffffffff815e8df9>] ip_local_out+0x29/0x90 [ 251.921386] [<ffffffff815e9240>] ip_queue_xmit+0x170/0x4c0 [ 251.921386] [<ffffffff815e90d5>] ? ip_queue_xmit+0x5/0x4c0 [ 251.921386] [<ffffffff81601208>] tcp_transmit_skb+0x498/0x960 [ 251.921386] [<ffffffff81602d82>] tcp_connect+0x812/0x960 [ 251.921386] [<ffffffff810e3dc5>] ? ktime_get_real+0x25/0x70 [ 251.921386] [<ffffffff8159ea2a>] ? secure_tcp_sequence_number+0x6a/0xc0 [ 251.921386] [<ffffffff81606f57>] tcp_v4_connect+0x317/0x470 [ 251.921386] [<ffffffff8161f645>] __inet_stream_connect+0xb5/0x330 [ 251.921386] [<ffffffff8158dfc3>] ? lock_sock_nested+0x33/0xa0 [ 251.921386] [<ffffffff810c4b5d>] ? trace_hardirqs_on+0xd/0x10 [ 251.921386] [<ffffffff81078885>] ? __local_bh_enable_ip+0x75/0xe0 [ 251.921386] [<ffffffff8161f8f8>] inet_stream_connect+0x38/0x50 [ 251.921386] [<ffffffff8158b157>] SYSC_connect+0xe7/0x120 [ 251.921386] [<ffffffff810e3789>] ? current_kernel_time+0x69/0xd0 [ 251.921386] [<ffffffff810c4a85>] ? trace_hardirqs_on_caller+0x105/0x1d0 [ 251.921386] [<ffffffff810c4b5d>] ? trace_hardirqs_on+0xd/0x10 [ 251.921386] [<ffffffff8158c36e>] SyS_connect+0xe/0x10 [ 251.921386] [<ffffffff816caf69>] system_call_fastpath+0x16/0x1b [ 312.014104] INFO: rcu_sched detected stalls on CPUs/tasks: {} (detected by 0, t=60003 jiffies, g=42359, c=42358, q=333) [ 312.015097] INFO: Stall ended before state dump start Fixes: `93bb0ceb75` ("netfilter: conntrack: remove central spinlock nf_conntrack_lock") Cc: Jesper Dangaard Brouer <brouer@redhat.com> Cc: Pablo Neira Ayuso <pablo@netfilter.org> Cc: Patrick McHardy <kaber@trash.net> Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-04-14 10:35:28 +02:00
Patrick McHardy	60eb18943b	netfilter: nf_tables: handle more than 8 * PAGE_SIZE set name allocations We currently have a limit of 8 * PAGE_SIZE anonymous sets. Lift that limit by continuing the scan if the entire page is exhausted. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-04-14 10:31:20 +02:00
Mathias Krause	05ab8f2647	filter: prevent nla extensions to peek beyond the end of the message The BPF_S_ANC_NLATTR and BPF_S_ANC_NLATTR_NEST extensions fail to check for a minimal message length before testing the supplied offset to be within the bounds of the message. This allows the subtraction of the nla header to underflow and therefore -- as the data type is unsigned -- allowing far to big offset and length values for the search of the netlink attribute. The remainder calculation for the BPF_S_ANC_NLATTR_NEST extension is also wrong. It has the minuend and subtrahend mixed up, therefore calculates a huge length value, allowing to overrun the end of the message while looking for the netlink attribute. The following three BPF snippets will trigger the bugs when attached to a UNIX datagram socket and parsing a message with length 1, 2 or 3. ,-[ PoC for missing size check in BPF_S_ANC_NLATTR ]-- \| ld #0x87654321 \| ldx #42 \| ld #nla \| ret a `--- ,-[ PoC for the same bug in BPF_S_ANC_NLATTR_NEST ]-- \| ld #0x87654321 \| ldx #42 \| ld #nlan \| ret a `--- ,-[ PoC for wrong remainder calculation in BPF_S_ANC_NLATTR_NEST ]-- \| ; (needs a fake netlink header at offset 0) \| ld #0 \| ldx #42 \| ld #nlan \| ret a `--- Fix the first issue by ensuring the message length fulfills the minimal size constrains of a nla header. Fix the second bug by getting the math for the remainder calculation right. Fixes: `4738c1db15` ("[SKFILTER]: Add SKF_ADF_NLATTR instruction") Fixes: `d214c7537b` ("filter: add SKF_AD_NLATTR_NEST to look for nested..") Cc: Patrick McHardy <kaber@trash.net> Cc: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Mathias Krause <minipli@googlemail.com> Acked-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-13 23:31:55 -04:00
Julian Anastasov	91146153da	ipv4: return valid RTA_IIF on ip route get Extend commit `13378cad02` ("ipv4: Change rt->rt_iif encoding.") from 3.6 to return valid RTA_IIF on 'ip route get ... iif DEVICE' instead of rt_iif 0 which is displayed as 'iif *'. inet_iif is not appropriate to use because skb_iif is not set. Use the skb->dev->ifindex instead. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-13 23:23:41 -04:00
Wang, Xiaoming	b04c461902	net: ipv4: current group_info should be put after using. Plug a group_info refcount leak in ping_init. group_info is only needed during initialization and the code failed to release the reference on exit. While here move grabbing the reference to a place where it is actually needed. Signed-off-by: Chuansheng Liu <chuansheng.liu@intel.com> Signed-off-by: Zhang Dongxing <dongxing.zhang@intel.com> Signed-off-by: xiaoming wang <xiaoming.wang@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-13 22:52:58 -04:00
Linus Torvalds	454fd351f2	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Pull yet more networking updates from David Miller: 1) Various fixes to the new Redpine Signals wireless driver, from Fariya Fatima. 2) L2TP PPP connect code takes PMTU from the wrong socket, fix from Dmitry Petukhov. 3) UFO and TSO packets differ in whether they include the protocol header in gso_size, account for that in skb_gso_transport_seglen(). From Florian Westphal. 4) If VLAN untagging fails, we double free the SKB in the bridging output path. From Toshiaki Makita. 5) Several call sites of sk->sk_data_ready() were referencing an SKB just added to the socket receive queue in order to calculate the second argument via skb->len. This is dangerous because the moment the skb is added to the receive queue it can be consumed in another context and freed up. It turns out also that none of the sk->sk_data_ready() implementations even care about this second argument. So just kill it off and thus fix all these use-after-free bugs as a side effect. 6) Fix inverted test in tcp_v6_send_response(), from Lorenzo Colitti. 7) pktgen needs to do locking properly for LLTX devices, from Daniel Borkmann. 8) xen-netfront driver initializes TX array entries in RX loop :-) From Vincenzo Maffione. 9) After refactoring, some tunnel drivers allow a tunnel to be configured on top itself. Fix from Nicolas Dichtel. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (46 commits) vti: don't allow to add the same tunnel twice gre: don't allow to add the same tunnel twice drivers: net: xen-netfront: fix array initialization bug pktgen: be friendly to LLTX devices r8152: check RTL8152_UNPLUG net: sun4i-emac: add promiscuous support net/apne: replace IS_ERR and PTR_ERR with PTR_ERR_OR_ZERO net: ipv6: Fix oif in TCP SYN+ACK route lookup. drivers: net: cpsw: enable interrupts after napi enable and clearing previous interrupts drivers: net: cpsw: discard all packets received when interface is down net: Fix use after free by removing length arg from sk_data_ready callbacks. Drivers: net: hyperv: Address UDP checksum issues Drivers: net: hyperv: Negotiate suitable ndis version for offload support Drivers: net: hyperv: Allocate memory for all possible per-pecket information bridge: Fix double free and memory leak around br_allowed_ingress bonding: Remove debug_fs files when module init fails i40evf: program RSS LUT correctly i40evf: remove open-coded skb_cow_head ixgb: remove open-coded skb_cow_head igbvf: remove open-coded skb_cow_head ...	2014-04-12 17:31:22 -07:00
Nicolas Dichtel	8d89dcdf80	vti: don't allow to add the same tunnel twice Before the patch, it was possible to add two times the same tunnel: ip l a vti1 type vti remote 10.16.0.121 local 10.16.0.249 key 41 ip l a vti2 type vti remote 10.16.0.121 local 10.16.0.249 key 41 It was possible, because ip_tunnel_newlink() calls ip_tunnel_find() with the argument dev->type, which was set only later (when calling ndo_init handler in register_netdevice()). Let's set this type in the setup handler, which is called before newlink handler. Introduced by commit `b9959fd3b0` ("vti: switch to new ip tunnel code"). CC: Cong Wang <amwang@redhat.com> CC: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-12 17:03:11 -04:00
Nicolas Dichtel	5a4552752d	gre: don't allow to add the same tunnel twice Before the patch, it was possible to add two times the same tunnel: ip l a gre1 type gre remote 10.16.0.121 local 10.16.0.249 ip l a gre2 type gre remote 10.16.0.121 local 10.16.0.249 It was possible, because ip_tunnel_newlink() calls ip_tunnel_find() with the argument dev->type, which was set only later (when calling ndo_init handler in register_netdevice()). Let's set this type in the setup handler, which is called before newlink handler. Introduced by commit `c544193214` ("GRE: Refactor GRE tunneling code."). CC: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-12 17:03:11 -04:00
Daniel Borkmann	0f2eea4b7e	pktgen: be friendly to LLTX devices Similarly to commit `43279500de` ("packet: respect devices with LLTX flag in direct xmit"), we can basically apply the very same to pktgen. This will help testing against LLTX devices such as dummy driver (or others), which only have a single netdevice txq and would otherwise require locking their txq from pktgen side while e.g. in dummy case, we would not need any locking. Fix this by making use of HARD_TX_{UN,}LOCK API, so that NETIF_F_LLTX will be respected. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-12 01:59:38 -04:00
Linus Torvalds	582076ab16	A bunch of updates and cleanup within the transport layer, particularly with a focus on RDMA. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.14 (GNU/Linux) Comment: GPGTools - http://gpgtools.org iQIcBAABAgAGBQJTRXV1AAoJEDZk62b0Tg6xYWAP/0Kfp9/8Z05SQ1fnJveySw1O nKK//1/GBmquntqFAVHI6yJRLzPJ+Z/Y4u4a0qriwfgpPQvUJQrL77tY/VfEBUPB VoJG1tj7lpdLrO3p4YyPkPjymyC7YOoFNjGEstWFg7HetwnnqqZL2LB+5yJzyqjx y9nv1HzsrbAE7j8C4hQ1Nmds5muUb5VhnTtPhjrx4tP1sWWh8XTVJbsVDiEqx6cu uJXFFTbkONr9jKfv+Ki3H2pZej2yD7w4tU4lkdcGNyij/Q4Xn1iERqroW2/GT6Cl AXxlIKN24ASjWo4VqW0Wf8gO8vbUtHRChoiZ69DvzNTkbWmAIWSFHyQJ4cinwuyr UbOQZuccO59QtpNVpBvG/vjnbI54rg+VGLy+xE0vcrBDlyoptc56IAFSg8zJY5UN ysbyHCGME//9VZ1zeeZvkMjm8z5Enp6x4zmtnUHmufO7DVMTFUePED6U1u9WIyP5 FFy5EboXMSh97yB8REvbIlY2MgBJWYdnyzLKFMeRpzC8fOXJqBoCX8i3Z5SkMYWJ 1FS/pGr7ec/VX1iHXSYi9hhzTJ6o9mEmOIhaO4UcqAuK8Rk2jbCp1Lx0iDhmJtdT zjofGDe57ro7nOZf8A/TI5z+6BZq8KYVfZrXtYaPqC4rXOzJAox/yHlz9FmAJYgv ssOIKXX0ujLwqrMBVatj =yzy/ -----END PGP SIGNATURE----- Merge tag 'for-linus-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs Pull 9p changes from Eric Van Hensbergen: "A bunch of updates and cleanup within the transport layer, particularly with a focus on RDMA" * tag 'for-linus-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs: 9pnet_rdma: check token type before int conversion 9pnet: trans_fd : allocate struct p9_trans_fd and struct p9_conn together. 9pnet: p9_client->conn field is unused. Remove it. 9P: Get rid of REQ_STATUS_FLSH 9pnet_rdma: add cancelled() 9pnet_rdma: update request status during send 9P: Add cancelled() to the transport functions. net: Mark function as static in 9p/client.c 9P: Add memory barriers to protect request fields over cb/rpc threads handoff	2014-04-11 14:14:57 -07:00
Lorenzo Colitti	a36dbdb28e	net: ipv6: Fix oif in TCP SYN+ACK route lookup. net-next commit `9c76a11`, ipv6: tcp_ipv6 policy route issue, had a boolean logic error that caused incorrect behaviour for TCP SYN+ACK when oif-based rules are in use. Specifically: 1. If a SYN comes in from a global address, and sk_bound_dev_if is not set, the routing lookup has oif set to the interface the SYN came in on. Instead, it should have oif unset, because for global addresses, the incoming interface doesn't necessarily have any bearing on the interface the SYN+ACK is sent out on. 2. If a SYN comes in from a link-local address, and sk_bound_dev_if is set, the routing lookup has oif set to the interface the SYN came in on. Instead, it should have oif set to sk_bound_dev_if, because that's what the application requested. Signed-off-by: Lorenzo Colitti <lorenzo@google.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-11 16:43:47 -04:00
David S. Miller	676d23690f	net: Fix use after free by removing length arg from sk_data_ready callbacks. Several spots in the kernel perform a sequence like: skb_queue_tail(&sk->s_receive_queue, skb); sk->sk_data_ready(sk, skb->len); But at the moment we place the SKB onto the socket receive queue it can be consumed and freed up. So this skb->len access is potentially to freed up memory. Furthermore, the skb->len can be modified by the consumer so it is possible that the value isn't accurate. And finally, no actual implementation of this callback actually uses the length argument. And since nobody actually cared about it's value, lots of call sites pass arbitrary values in such as '0' and even '1'. So just remove the length argument from the callback, that way there is no confusion whatsoever and all of these use-after-free cases get fixed as a side effect. Based upon a patch by Eric Dumazet and his suggestion to audit this issue tree-wide. Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-11 16:15:36 -04:00
Toshiaki Makita	eb7076182d	bridge: Fix double free and memory leak around br_allowed_ingress br_allowed_ingress() has two problems. 1. If br_allowed_ingress() is called by br_handle_frame_finish() and vlan_untag() in br_allowed_ingress() fails, skb will be freed by both vlan_untag() and br_handle_frame_finish(). 2. If br_allowed_ingress() is called by br_dev_xmit() and br_allowed_ingress() fails, the skb will not be freed. Fix these two problems by freeing the skb in br_allowed_ingress() if it fails. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-11 15:12:47 -04:00
Mikel Astiz	b16c660488	Bluetooth: Request MITM Protection when initiator The GAP Specification gives the flexibility to decide whether MITM Protection is requested or not (Bluetooth Core Specification v4.0 Volume 3, part C, section 6.5.3) when replying to an HCI_EV_IO_CAPA_REQUEST event. The recommendation is not to set this flag "unless the security policy of an available local service requires MITM Protection" (regardless of the bonding type). However, the kernel doesn't necessarily have this information and therefore the safest choice is to always use MITM Protection, also for General Bonding. This patch changes the behavior for the General Bonding initiator role, always requesting MITM Protection even if no high security level is used. Depending on the remote capabilities, the protection might not be actually used, and we will accept this locally unless of course a high security level was originally required. Note that this was already done for Dedicated Bonding. No-Bonding is left unmodified because MITM Protection is normally not desired in these cases. Signed-off-by: Mikel Astiz <mikel.astiz@bmw-carit.de> Signed-off-by: Timo Mueller <timo.mueller@bmw-carit.de> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2014-04-11 10:33:08 -07:00
Timo Mueller	7e74170af1	Bluetooth: Use MITM Protection when IO caps allow it When responding to a remotely-initiated pairing procedure, a MITM protected SSP associaton model can be used for pairing if both local and remote IO capabilities are set to something other than NoInputNoOutput, regardless of the bonding type (Dedicated or General). This was already done for Dedicated Bonding but this patch proposes to use the same policy for General Bonding as well. The GAP Specification gives the flexibility to decide whether MITM Protection is used ot not (Bluetooth Core Specification v4.0 Volume 3, part C, section 6.5.3). Note however that the recommendation is not to set this flag "unless the security policy of an available local service requires MITM Protection" (for both Dedicated and General Bonding). However, as we are already requiring MITM for Dedicated Bonding, we will follow this behaviour also for General Bonding. Signed-off-by: Timo Mueller <timo.mueller@bmw-carit.de> Signed-off-by: Mikel Astiz <mikel.astiz@bmw-carit.de> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2014-04-11 10:33:08 -07:00
Mikel Astiz	6fd6b915bd	Bluetooth: Refactor code for outgoing dedicated bonding Do not always set the MITM protection requirement by default in the field conn->auth_type, since this will be added later in hci_io_capa_request_evt(), as part of the requirements specified in HCI_OP_IO_CAPABILITY_REPLY. This avoids a hackish exception for the auto-reject case, but doesn't change the behavior of the code at all. Signed-off-by: Mikel Astiz <mikel.astiz@bmw-carit.de> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2014-04-11 10:33:08 -07:00
Mikel Astiz	b7f94c8808	Bluetooth: Refactor hci_get_auth_req() Refactor the code without changing its behavior by handling the no-bonding cases first followed by General Bonding. Signed-off-by: Mikel Astiz <mikel.astiz@bmw-carit.de> Signed-off-by: Timo Mueller <timo.mueller@bmw-carit.de> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2014-04-11 10:33:08 -07:00
Janusz Dziedzic	4f267c1198	cfg80211: reg: set DFS CAC time in case of custom regd Set DFS CAC time also in case of using custom and strict regulatory from drivers. In other case we could have unset DFS CAC time directly after driver loaded and before issue regulatory set from user mode. Signed-off-by: Janusz Dziedzic <janusz.dziedzic@tieto.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-04-11 10:00:10 +02:00
Felix Fietkau	764152ff66	mac80211: exclude AP_VLAN interfaces from tx power calculation Their power value is initialized to zero. This patch fixes an issue where the configured power drops to the minimum value when AP_VLAN interfaces are created/removed. Cc: stable@vger.kernel.org Signed-off-by: Felix Fietkau <nbd@openwrt.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-04-11 09:37:41 +02:00
Felix Fietkau	990baec022	mac80211: suppress BSS info change notifications for AP_VLAN Fixes warnings on tx power changes Signed-off-by: Felix Fietkau <nbd@openwrt.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-04-11 09:37:01 +02:00
Heikki Krogerus	781d4e0fb7	net: rfkill: gpio: hard-code the gpio names There is no need to dynamically generate the names. This will fix a static checker warning.. net/rfkill/rfkill-gpio.c:144 rfkill_gpio_probe() warn: variable dereferenced before check 'rfkill->name' Acked-by: Alexandre Courbot <acourbot@nvidia.com> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-04-11 09:29:22 +02:00
Heikki Krogerus	41d09df1e0	net: rfkill: gpio: add ACPI IDs for a Broadcom bluetooth chip This adds ACPI IDs for Broadcom bluetooth chip BCM43241 used on various Baytrail based boards such as Lenovo Miix 2 and Asus Transformer Book T100TA. Acked-by: Alexandre Courbot <acourbot@nvidia.com> Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-04-11 09:29:21 +02:00
Heikki Krogerus	04f1b47cd7	net: rfkill: gpio: add ACPI ID for GPS module on Lenovo Miix2 On Lenovo Miix 2 8", BCM4752 is renamed LNV4752. Acked-by: Alexandre Courbot <acourbot@nvidia.com> Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-04-11 09:29:21 +02:00
Heikki Krogerus	848ef58695	net: rfkill: gpio: remove unused and obsolete platform parameters After upgrading to descriptor based gpios, the gpio numbers are not used anymore. The power_clk_name and the platform specific setup and close hooks are not used by anybody, and we should not encourage use of such things, so removing them. Acked-by: Alexandre Courbot <acourbot@nvidia.com> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-04-11 09:29:16 +02:00
Florian Westphal	6d39d589bb	net: core: don't account for udp header size when computing seglen In case of tcp, gso_size contains the tcpmss. For UFO (udp fragmentation offloading) skbs, gso_size is the fragment payload size, i.e. we must not account for udp header size. Otherwise, when using virtio drivers, a to-be-forwarded UFO GSO packet will be needlessly fragmented in the forward path, because we think its individual segments are too large for the outgoing link. Fixes: `fe6cc55f3a` ("net: ip, ipv6: handle gso skbs in forwarding path") Cc: Eric Dumazet <eric.dumazet@gmail.com> Reported-by: Tobias Brunner <tobias@strongswan.org> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-10 21:41:25 -04:00
Johannes Berg	c14a74007f	cfg80211: ignore invalid BSSIDs when looking for BSSes When looking for a BSS matching given parameters, ignore invalid BSSIDs. This avoids, for example, trying to join an IBSS that has a multicast BSSID, which isn't supported by all drivers nor is it a valid configuration of the IBSS so better create a new one with a correctly chosen random BSSID. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-04-10 10:09:18 +02:00
Johannes Berg	74f8274103	cfg80211: reject invalid IBSS BSSIDs in wext compat code Don't allow using a multicast address as the BSSID, that isn't a valid configuration. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-04-10 10:09:16 +02:00
Zhao, Gang	96998e3a2f	cfg80211: remove unused wiphy argument from cfg80211_wext_freq() cfg80211_wext_freq() is declared in wext-compat.h, but its parameter struct wiphy's declaration is not included there. As the parameter isn't used, just remove it. Signed-off-by: Zhao, Gang <gamerh2o@gmail.com> [remove parameter instead of changing to netdev] Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-04-10 10:06:19 +02:00
Dmitry Petukhov	f34c4a35d8	l2tp: take PMTU from tunnel UDP socket When l2tp driver tries to get PMTU for the tunnel destination, it uses the pointer to struct sock that represents PPPoX socket, while it should use the pointer that represents UDP socket of the tunnel. Signed-off-by: Dmitry Petukhov <dmgenp@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-09 15:24:38 -04:00
Daniel Borkmann	1e1cdf8ac7	net: sctp: test if association is dead in sctp_wake_up_waiters In function sctp_wake_up_waiters(), we need to involve a test if the association is declared dead. If so, we don't have any reference to a possible sibling association anymore and need to invoke sctp_write_space() instead, and normally walk the socket's associations and notify them of new wmem space. The reason for special casing is that otherwise, we could run into the following issue when a sctp_primitive_SEND() call from sctp_sendmsg() fails, and tries to flush an association's outq, i.e. in the following way: sctp_association_free() `-> list_del(&asoc->asocs) <-- poisons list pointer asoc->base.dead = true sctp_outq_free(&asoc->outqueue) `-> __sctp_outq_teardown() `-> sctp_chunk_free() `-> consume_skb() `-> sctp_wfree() `-> sctp_wake_up_waiters() <-- dereferences poisoned pointers if asoc->ep->sndbuf_policy=0 Therefore, only walk the list in an 'optimized' way if we find that the current association is still active. We could also use list_del_init() in addition when we call sctp_association_free(), but as Vlad suggests, we want to trap such bugs and thus leave it poisoned as is. Why is it safe to resolve the issue by testing for asoc->base.dead? Parallel calls to sctp_sendmsg() are protected under socket lock, that is lock_sock()/release_sock(). Only within that path under lock held, we're setting skb/chunk owner via sctp_set_owner_w(). Eventually, chunks are freed directly by an association still under that lock. So when traversing association list on destruction time from sctp_wake_up_waiters() via sctp_wfree(), a different CPU can't be running sctp_wfree() while another one calls sctp_association_free() as both happens under the same lock. Therefore, this can also not race with setting/testing against asoc->base.dead as we are guaranteed for this to happen in order, under lock. Further, Vlad says: the times we check asoc->base.dead is when we've cached an association pointer for later processing. In between cache and processing, the association may have been freed and is simply still around due to reference counts. We check asoc->base.dead under a lock, so it should always be safe to check and not race against sctp_association_free(). Stress-testing seems fine now, too. Fixes: cd253f9f357d ("net: sctp: wake up all assocs if sndbuf policy is per socket") Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Vlad Yasevich <vyasevic@redhat.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-09 12:37:49 -04:00

... 14 15 16 17 18 ...

33976 Commits