make build fail if structure no longer fits into ->cb storage.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
If an over-MTU UDP datagram is sent through a SOCK_RAW socket to a
UFO-capable device, ip_ufo_append_data() sets skb->ip_summed to
CHECKSUM_PARTIAL unconditionally as all GSO code assumes transport layer
checksum is to be computed on segmentation. However, in this case,
skb->csum_start and skb->csum_offset are never set as raw socket
transmit path bypasses udp_send_skb() where they are usually set. As a
result, driver may access invalid memory when trying to calculate the
checksum and store the result (as observed in virtio_net driver).
Moreover, the very idea of modifying the userspace provided UDP header
is IMHO against raw socket semantics (I wasn't able to find a document
clearly stating this or the opposite, though). And while allowing
CHECKSUM_NONE in the UFO case would be more efficient, it would be a bit
too intrusive change just to handle a corner case like this. Therefore
disallowing UFO for packets from SOCK_DGRAM seems to be the best option.
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
rxrpc_resend_timeout has an initial value of 4 * HZ; use it as-is.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Typo, 'stop' is never set to true.
Seems intent is to not attempt to retransmit more packets after sendmsg
returns an error.
This change is based on code inspection only.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
To repeat:
$ sudo ip link del hsr0
BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
IP: [<ffffffff8187f495>] hsr_del_port+0x15/0xa0
etc...
Bug description:
As part of the hsr master device destruction, hsr_del_port() is called for each of
the hsr ports. At each such call, the master device is updated regarding features
and mtu. When the master device is freed before the slave interfaces, master will
be NULL in hsr_del_port(), which led to a NULL pointer dereference.
Additionally, dev_put() was called on the master device itself in hsr_del_port(),
causing a refcnt error.
A third bug in the same code path was that the rtnl lock was not taken before
hsr_del_port() was called as part of hsr_dev_destroy().
The reporter (Nicolas Dichtel) also said: "hsr_netdev_notify() supposes that the
port will always be available when the notification is for an hsr interface. It's
wrong. For example, netdev_wait_allrefs() may resend NETDEV_UNREGISTER.". As a
precaution against this, a check for port == NULL was added in hsr_dev_notify().
Reported-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Fixes: 51f3c60531 ("net/hsr: Move slave init to hsr_slave.c.")
Signed-off-by: Arvid Brodin <arvid.brodin@alten.se>
Signed-off-by: David S. Miller <davem@davemloft.net>
We did a failed attempt in the past to only use rcu in rtnl dump
operations (commit e67f88dd12 "net: dont hold rtnl mutex during
netlink dump callbacks")
Now that dumps are holding RTNL anyway, there is no need to also
use rcu locking, as it forbids any scheduling ability, like
GFP_KERNEL allocations that controlling path should use instead
of GFP_ATOMIC whenever possible.
This should fix following splat Cong Wang reported :
[ INFO: suspicious RCU usage. ]
3.19.0+ #805 Tainted: G W
include/linux/rcupdate.h:538 Illegal context switch in RCU read-side critical section!
other info that might help us debug this:
rcu_scheduler_active = 1, debug_locks = 0
2 locks held by ip/771:
#0: (rtnl_mutex){+.+.+.}, at: [<ffffffff8182b8f4>] netlink_dump+0x21/0x26c
#1: (rcu_read_lock){......}, at: [<ffffffff817d785b>] rcu_read_lock+0x0/0x6e
stack backtrace:
CPU: 3 PID: 771 Comm: ip Tainted: G W 3.19.0+ #805
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
0000000000000001 ffff8800d51e7718 ffffffff81a27457 0000000029e729e6
ffff8800d6108000 ffff8800d51e7748 ffffffff810b539b ffffffff820013dd
00000000000001c8 0000000000000000 ffff8800d7448088 ffff8800d51e7758
Call Trace:
[<ffffffff81a27457>] dump_stack+0x4c/0x65
[<ffffffff810b539b>] lockdep_rcu_suspicious+0x107/0x110
[<ffffffff8109796f>] rcu_preempt_sleep_check+0x45/0x47
[<ffffffff8109e457>] ___might_sleep+0x1d/0x1cb
[<ffffffff8109e67d>] __might_sleep+0x78/0x80
[<ffffffff814b9b1f>] idr_alloc+0x45/0xd1
[<ffffffff810cb7ab>] ? rcu_read_lock_held+0x3b/0x3d
[<ffffffff814b9f9d>] ? idr_for_each+0x53/0x101
[<ffffffff817c1383>] alloc_netid+0x61/0x69
[<ffffffff817c14c3>] __peernet2id+0x79/0x8d
[<ffffffff817c1ab7>] peernet2id+0x13/0x1f
[<ffffffff817d8673>] rtnl_fill_ifinfo+0xa8d/0xc20
[<ffffffff810b17d9>] ? __lock_is_held+0x39/0x52
[<ffffffff817d894f>] rtnl_dump_ifinfo+0x149/0x213
[<ffffffff8182b9c2>] netlink_dump+0xef/0x26c
[<ffffffff8182bcba>] netlink_recvmsg+0x17b/0x2c5
[<ffffffff817b0adc>] __sock_recvmsg+0x4e/0x59
[<ffffffff817b1b40>] sock_recvmsg+0x3f/0x51
[<ffffffff817b1f9a>] ___sys_recvmsg+0xf6/0x1d9
[<ffffffff8115dc67>] ? handle_pte_fault+0x6e1/0xd3d
[<ffffffff8100a3a0>] ? native_sched_clock+0x35/0x37
[<ffffffff8109f45b>] ? sched_clock_local+0x12/0x72
[<ffffffff8109f6ac>] ? sched_clock_cpu+0x9e/0xb7
[<ffffffff810cb7ab>] ? rcu_read_lock_held+0x3b/0x3d
[<ffffffff811abde8>] ? __fcheck_files+0x4c/0x58
[<ffffffff811ac556>] ? __fget_light+0x2d/0x52
[<ffffffff817b376f>] __sys_recvmsg+0x42/0x60
[<ffffffff817b379f>] SyS_recvmsg+0x12/0x1c
Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: 0c7aecd4bd ("netns: add rtnl cmd to add and get peer netns ids")
Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Reported-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
four-way-handshake problem. The others are various small fixes
for problems all over, nothing really stands out.
-----BEGIN PGP SIGNATURE-----
iQIcBAABCAAGBQJU8HQ5AAoJEDBSmw7B7bqrGeUQAK7O/UliGPxPNSMs1Mmb+Fj/
GT51sGhZLYXUj2nMMFTTPuhR5qXdgSsBaonitNQaAmJtMtmO6R21rcd74NDB29i7
1xHgtdxJDyDYh6zWUX6+IETV/eRHuS7vlfiiBY/PHAymAQZVa1doC5nN0yCfPHkd
XanD4HRXEZppiw+OVh3VkgvxQZkhqmExQDdYZYWGVm8b3cspqa0l/e9WpLrU/Z19
15liupuhINcDAl5KAsR0oaBy/vUsETA6+H9K4fzZG6bemlaD5T+nkNqjlEPbw806
sMxiWZ7jrI1n5v2KiNHkKTJakzHsz08zb9nooA9swusSImflr8BUm3lNz+IVsqtL
zBvtMEiGwNlGkSLdHieICBktXw7/Lw2VUSKgWlRj1LZSXfd3kISXrwdR46NOr93e
cBzaGn7H8YrVM2Sl/EraRimYL9womICPr6+flE+YAeRnZzsAFrBd9qegc2N3LTt+
NtTqS/JbGAKyOS84JCge+1omGOKPIuQGBOtEIy1U0zlxibZzEzCulrKSeGjQ8o3Q
dfrMYEwpicK4ADh7wxeslD6QlgkcQ/Ft7agkP+ps4m5ngFXo3swup+hxNMDoyyi3
BYiOGDum6SBzrOK8bN4hv3jEHK7e3fo68NeLhJ34Ap4K7jY9kOi8sk7WY8NerZBI
19oVt0+0yyzKVlaI7IQE
=H3+7
-----END PGP SIGNATURE-----
Merge tag 'mac80211-for-davem-2015-02-27' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
Johannes Berg says:
====================
A few patches have accumulated, among them the fix for Linus's
four-way-handshake problem. The others are various small fixes
for problems all over, nothing really stands out.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
When applicable verify that the caller has permisson to the underlying
network namespace for a newly created network device.
Similary checks exist for the network namespace a network device will
be created in.
Fixes: 317f4810e4 ("rtnl: allow to create device with IFLA_LINK_NETNSID set")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When applicable verify that the caller has permision to create a
network device in another network namespace. This check is already
present when moving a network device between network namespaces in
setlink so all that is needed is to duplicate that check in newlink.
This change almost backports cleanly, but there are context conflicts
as the code that follows was added in v4.0-rc1
Fixes: b51642f6d7 net: Enable a userns root rtnl calls that are safe for unprivilged users
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, all real users of rhashtable default their grow and shrink
decision functions to rht_grow_above_75() and rht_shrink_below_30(),
so that there's currently no need to have this explicitly selectable.
It can/should be generic and private inside rhashtable until a real
use case pops up. Since we can make this private, we'll save us this
additional indirection layer and can improve insertion/deletion time
as well.
Reference: http://patchwork.ozlabs.org/patch/443040/
Suggested-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current minstrel_ht rate control behavior is somewhat optimistic in
trying to find optimum TX rate. While this is usually fine for normal
Data frames, there are cases where a more conservative set of retry
parameters would be beneficial to make the connection more robust.
EAPOL frames are critical to the authentication and especially the
EAPOL-Key message 4/4 (the last message in the 4-way handshake) is
important to get through to the AP. If that message is lost, the only
recovery mechanism in many cases is to reassociate with the AP and start
from scratch. This can often be avoided by trying to send the frame with
more conservative rate and/or with more link layer retries.
In most cases, minstrel_ht is currently using the initial EAPOL-Key
frames for probing higher rates and this results in only five link layer
transmission attempts (one at high(ish) MCS and four at MCS0). While
this works with most APs, it looks like there are some deployed APs that
may have issues with the EAPOL frames using HT MCS immediately after
association. Similarly, there may be issues in cases where the signal
strength or radio environment is not good enough to be able to get
frames through even at couple of MCS 0 tries.
The best approach for this would likely to be to reduce the TX rate for
the last rate (3rd rate parameter in the set) to a low basic rate (say,
6 Mbps on 5 GHz and 2 or 5.5 Mbps on 2.4 GHz), but doing that cleanly
requires some more effort. For now, we can start with a simple one-liner
that forces the minimum rate to be used for EAPOL frames similarly how
the TX rate is selected for the IEEE 802.11 Management frames. This does
result in a small extra latency added to the cases where the AP would be
able to receive the higher rate, but taken into account how small number
of EAPOL frames are used, this is likely to be insignificant. A future
optimization in the minstrel_ht design can also allow this patch to be
reverted to get back to the more optimized initial TX rate.
It should also be noted that many drivers that do not use minstrel as
the rate control algorithm are already doing similar workarounds by
forcing the lowest TX rate to be used for EAPOL frames.
Cc: stable@vger.kernel.org
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Tested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Jouni Malinen <jouni@qca.qualcomm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Before da413eec72 ("packet: Fixed TPACKET V3 to signal poll when block is
closed rather than every packet") poll listening for an af_packet socket was
not signaled if there was no packets to process. After the patch poll is
signaled evety time when block retire timer expires. That happens because
af_packet closes the current block on timeout even if the block is empty.
Passing empty blocks to the user not only wastes CPU but also wastes ring
buffer space increasing probability of packets dropping on small timeouts.
Signed-off-by: Alexander Drozdov <al.drozdov@gmail.com>
Cc: Dan Collins <dan@dcollins.co.nz>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Guy Harris <guy@alum.mit.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arrays (when not in a struct) "shall have a value greater than zero".
GCC complains when it's not the case here.
Fixes: ba7d49b1f0 ("rtnetlink: provide api for getting and setting slave info")
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 06d961a8e2 ("mac80211/minstrel: use the new rate control API")
inverted the condition 'if (msr->sample_limit != 0)' to
'if (!msr->sample_limit != 0)'. But it is confusing both to people and
compilers (gcc5):
net/mac80211/rc80211_minstrel.c: In function 'minstrel_get_rate':
net/mac80211/rc80211_minstrel.c:376:26: warning: logical not is only applied to the left hand side of comparison
if (!msr->sample_limit != 0)
^
Let there be only 'if (!msr->sample_limit)'.
Fixes: 06d961a8e2 ("mac80211/minstrel: use the new rate control API")
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
nl80211_exit should be called in cfg80211_init if nl80211_init succeeds
but regulatory_init or create_singlethread_workqueue fails.
Signed-off-by: Junjie Mao <junjie_mao@yeah.net>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
There are currently 8 rules in the world_regdom, but only the first 6
are applied due to an incorrect value for n_reg_rules. This causes
channels 149-165 and 60GHz to be disabled.
Signed-off-by: Jason Abele <jason@aether.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
If monitor flags parsing results in active monitor but that
isn't supported, the already allocated message is leaked.
Fix this by moving the allocation after this check.
Reported-by: Christian Engelmayer <cengelma@gmx.at>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
We currently add nested members of the NL80211_ATTR_SCAN_FREQUENCIES
as NLA_U32 attributes of type NL80211_ATTR_WIPHY_FREQ in
cfg80211_net_detect_results. However, since there can be an arbitrary number of
frequency results, we should use the loop index of the loop used to add the
frequency results to NL80211_ATTR_SCAN_FREQUENCIES as the type (i.e. nla_type)
for each result attribute, rather than a fixed type.
This change is in line with how nested members are added to
NL80211_ATTR_SCAN_FREQUENCIES in the functions nl80211_send_wowlan_nd and
nl80211_add_scan_req.
Signed-off-by: Samuel Tan <samueltan@chromium.org>
Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
If ieee80211_vif_use_channel() fails, we have to clear
sdata->radar_required (which we might have just set).
Failing to do it results in stale radar_required field
which prevents starting new scan requests.
Reported-by: Jouni Malinen <j@w1.fi>
Signed-off-by: Eliad Peller <eliad@wizery.com>
[use false instead of 0]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Currently we don't check if the new MTU is valid or not and this allows
one to configure a smaller than minimum allowed by RFCs or even bigger
than interface own MTU, which is a problem as it may lead to packet
drops.
If you have a daemon like NetworkManager running, this may be exploited
by remote attackers by forging RA packets with an invalid MTU, possibly
leading to a DoS. (NetworkManager currently only validates for values
too small, but not for too big ones.)
The fix is just to make sure the new value is valid. That is, between
IPV6_MIN_MTU and interface's MTU.
Note that similar check is already performed at
ndisc_router_discovery(), for when kernel itself parses the RA.
Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
With commit a7526eb5d0 (net: Unbreak compat_sys_{send,recv}msg), the
MSG_CMSG_COMPAT flag is blocked at the compat syscall entry points,
changing the kernel compat behaviour from the one before the commit it
was trying to fix (1be374a051, net: Block MSG_CMSG_COMPAT in
send(m)msg and recv(m)msg).
On 32-bit kernels (!CONFIG_COMPAT), MSG_CMSG_COMPAT is 0 and the native
32-bit sys_sendmsg() allows flag 0x80000000 to be set (it is ignored by
the kernel). However, on a 64-bit kernel, the compat ABI is different
with commit a7526eb5d0.
This patch changes the compat_sys_{send,recv}msg behaviour to the one
prior to commit 1be374a051.
The problem was found running 32-bit LTP (sendmsg01) binary on an arm64
kernel. Arguably, LTP should not pass 0xffffffff as flags to sendmsg()
but the general rule is not to break user ABI (even when the user
behaviour is not entirely sane).
Fixes: a7526eb5d0 (net: Unbreak compat_sys_{send,recv}msg)
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use helper functions to access current->state.
Direct assignments are prone to races and therefore buggy.
current->state = TASK_RUNNING can be replaced by __set_current_state()
Thanks to Peter Zijlstra for the exact definition of the problem.
Suggested-By: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
tcp_should_expand_sndbuf() does not expand the send buffer if we have
filled the congestion window.
However, it should use tcp_packets_in_flight() instead of
tp->packets_out to make this check.
Testing has established that the difference matters a lot if there are
many SACKed packets, causing a needless performance shortfall.
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Nandita Dukkipati <nanditad@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Trying to use burst capability (aka xmit_more) on a virtual device
like bonding is not supported.
For example, skb might be queued multiple times on a qdisc, with
various list corruptions.
Fixes: 38b2cf2982 ("net: pktgen: packet bursting via skb->xmit_more")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Packets defragmentation was introduced for PACKET_FANOUT_HASH only,
see 7736d33f42 ("packet: Add pre-defragmentation support for ipv4
fanouts")
It may be useful to have defragmentation enabled regardless of
fanout type. Without that, the AF_PACKET user may have to:
1. Collect fragments from different rings
2. Defragment by itself
Signed-off-by: Alexander Drozdov <al.drozdov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
colons are used as a separator in netdev device lookup in dev_ioctl.c
Specific functions are SIOCGIFTXQLEN SIOCETHTOOL SIOCSIFNAME
Signed-off-by: Matthew Thode <mthode@mthode.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
Netfilter/IPVS fixes for net
The following patchset contains updates for your net tree, they are:
1) Fix removal of destination in IPVS when the new mixed family support
is used, from Alexey Andriyanov via Simon Horman.
2) Fix module refcount undeflow in nft_compat when reusing a match /
target.
3) Fix iptables-restore when the recent match is used with a new hitcount
that exceeds threshold, from Florian Westphal.
4) Fix stack corruption in xt_socket due to using stack storage to save
the inner IPv6 header, from Eric Dumazet.
I'll follow up soon with another batch with more fixes that are still
cooking.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The cfpkt_iterate() function can return -EPROTO on error, but the
function is a u16 so the negative value gets truncated to a positive
unsigned short. This causes a static checker warning.
The only caller which might care is cffrml_receive(), when it's checking
the frame checksum. I modified cffrml_receive() so that it never says
-EPROTO is a valid checksum.
Also this isn't ever going to be inlined so I removed the "inline".
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
commit aafb3e98b2 (netdev: introduce new NETIF_F_HW_SWITCH_OFFLOAD feature
flag for switch device offloads) add a new feature without adding it to
netdev_features_strings array; this patch fixes this.
Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Non NAPI drivers can call skb_tstamp_tx() and then sock_queue_err_skb()
from hard IRQ context.
Therefore, sock_dequeue_err_skb() needs to block hard irq or
corruptions or hangs can happen.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: 364a9e9324 ("sock: deduplicate errqueue dequeue")
Fixes: cb820f8e4b ("net: Provide a generic socket error queue delivery method for Tx time stamps.")
Signed-off-by: David S. Miller <davem@davemloft.net>
Open vSwitch allows moving internal vport to different namespace
while still connected to the bridge. But when namespace deleted
OVS does not detach these vports, that results in dangling
pointer to netdevice which causes kernel panic as follows.
This issue is fixed by detaching all ovs ports from the deleted
namespace at net-exit.
BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
IP: [<ffffffffa0aadaa5>] ovs_vport_locate+0x35/0x80 [openvswitch]
Oops: 0000 [#1] SMP
Call Trace:
[<ffffffffa0aa6391>] lookup_vport+0x21/0xd0 [openvswitch]
[<ffffffffa0aa65f9>] ovs_vport_cmd_get+0x59/0xf0 [openvswitch]
[<ffffffff8167e07c>] genl_family_rcv_msg+0x1bc/0x3e0
[<ffffffff8167e319>] genl_rcv_msg+0x79/0xc0
[<ffffffff8167d919>] netlink_rcv_skb+0xb9/0xe0
[<ffffffff8167deac>] genl_rcv+0x2c/0x40
[<ffffffff8167cffd>] netlink_unicast+0x12d/0x1c0
[<ffffffff8167d3da>] netlink_sendmsg+0x34a/0x6b0
[<ffffffff8162e140>] sock_sendmsg+0xa0/0xe0
[<ffffffff8162e5e8>] ___sys_sendmsg+0x408/0x420
[<ffffffff8162f541>] __sys_sendmsg+0x51/0x90
[<ffffffff8162f592>] SyS_sendmsg+0x12/0x20
[<ffffffff81764ee9>] system_call_fastpath+0x12/0x17
Reported-by: Assaf Muller <amuller@redhat.com>
Fixes: 46df7b81454("openvswitch: Add support for network namespaces.")
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Reviewed-by: Thomas Graf <tgraf@noironetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In tcf_em_validate(), after calling request_module() to load the
kind-specific module, set em->ops to NULL before returning -EAGAIN, so
that module_put() is not called again by tcf_em_tree_destroy().
Signed-off-by: Ignacy Gawędzki <ignacy.gawedzki@green-communications.fr>
Acked-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
skb_copy_bits() returns zero on success and negative value on error,
so it is needed to invert the condition in ip_check_defrag().
Fixes: 1bf3751ec9 ("ipv4: ip_check_defrag must not modify skb before unsharing")
Signed-off-by: Alexander Drozdov <al.drozdov@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The gnet_stats_copy_app() function gets called, more often than not, with its
second argument a pointer to an automatic variable in the caller's stack.
Therefore, to avoid copying garbage afterwards when calling
gnet_stats_finish_copy(), this data is better copied to a dynamically allocated
memory that gets freed after use.
[xiyou.wangcong@gmail.com: remove a useless kfree()]
Signed-off-by: Ignacy Gawędzki <ignacy.gawedzki@green-communications.fr>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking updates from David Miller:
1) Missing netlink attribute validation in nft_lookup, from Patrick
McHardy.
2) Restrict ipv6 partial checksum handling to UDP, since that's the
only case it works for. From Vlad Yasevich.
3) Clear out silly device table sentinal macros used by SSB and BCMA
drivers. From Joe Perches.
4) Make sure the remote checksum code never creates a situation where
the remote checksum is applied yet the tunneling metadata describing
the remote checksum transformation is still present. Otherwise an
external entity might see this and apply the checksum again. From
Tom Herbert.
5) Use msecs_to_jiffies() where applicable, from Nicholas Mc Guire.
6) Don't explicitly initialize timer struct fields, use setup_timer()
and mod_timer() instead. From Vaishali Thakkar.
7) Don't invoke tg3_halt() without the tp->lock held, from Jun'ichi
Nomura.
8) Missing __percpu annotation in ipvlan driver, from Eric Dumazet.
9) Don't potentially perform skb_get() on shared skbs, also from Eric
Dumazet.
10) Fix COW'ing of metrics for non-DST_HOST routes in ipv6, from Martin
KaFai Lau.
11) Fix merge resolution error between the iov_iter changes in vhost and
some bug fixes that occurred at the same time. From Jason Wang.
12) If rtnl_configure_link() fails we have to perform a call to
->dellink() before unregistering the device. From WANG Cong.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (39 commits)
net: dsa: Set valid phy interface type
rtnetlink: call ->dellink on failure when ->newlink exists
com20020-pci: add support for eae single card
vhost_net: fix wrong iter offset when setting number of buffers
net: spelling fixes
net/core: Fix warning while make xmldocs caused by dev.c
net: phy: micrel: disable NAND-tree for KSZ8021, KSZ8031, KSZ8051, KSZ8081
ipv6: fix ipv6_cow_metrics for non DST_HOST case
openvswitch: Fix key serialization.
r8152: restore hw settings
hso: fix rx parsing logic when skb allocation fails
tcp: make sure skb is not shared before using skb_get()
bridge: netfilter: Move sysctl-specific error code inside #ifdef
ipv6: fix possible deadlock in ip6_fl_purge / ip6_fl_gc
ipvlan: add a missing __percpu pcpu_stats
tg3: Hold tp->lock before calling tg3_halt() from tg3_init_one()
bgmac: fix device initialization on Northstar SoCs (condition typo)
qlcnic: Delete existing multicast MAC list before adding new
net/mlx5_core: Fix configuration of log_uar_page_sz
sunvnet: don't change gso data on clones
...
If the phy interface mode is not found in devicetree, or if devicetree
is not configured, of_get_phy_mode returns -ENODEV. The current code
sets the phy interface mode to the return value from of_get_phy_mode
without checking if it is valid.
This invalid phy interface mode is passed as parameter to of_phy_connect
or to phy_connect_direct. This sets the phy interface mode to the invalid
value, which in turn causes problems for any code using phydev->interface.
Fixes: b31f65fb43 ("net: dsa: slave: Fix autoneg for phys on switch MDIO bus")
Fixes: 0d8bcdd383 ("net: dsa: allow for more complex PHY setups")
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As soon as extract_icmp6_fields() returns, its local storage (automatic
variables) is deallocated and can be overwritten.
Lets add an additional parameter to make sure storage is valid long
enough.
While we are at it, adds some const qualifiers.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: b64c9256a9 ("tproxy: added IPv6 support to the socket match")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
given:
-A INPUT -m recent --update --seconds 30 --hitcount 4
and
iptables-save > foo
then
iptables-restore < foo
will fail with:
kernel: xt_recent: hitcount (4) is larger than packets to be remembered (4) for table DEFAULT
Even when the check is fixed, the restore won't work if the hitcount is
increased to e.g. 6, since by the time checkentry runs it will find the
'old' incarnation of the table.
We can avoid this by increasing the maximum threshold silently; we only
have to rm all the current entries of the table (these entries would
not have enough room to handle the increased hitcount).
This even makes (not-very-useful)
-A INPUT -m recent --update --seconds 30 --hitcount 4
-A INPUT -m recent --update --seconds 30 --hitcount 42
work.
Fixes: abc86d0f99 (netfilter: xt_recent: relax ip_pkt_list_tot restrictions)
Tracked-down-by: Chris Vine <chris@cvine.freeserve.co.uk>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Ignacy reported that when eth0 is down and add a vlan device
on top of it like:
ip link add link eth0 name eth0.1 up type vlan id 1
We will get a refcount leak:
unregister_netdevice: waiting for eth0.1 to become free. Usage count = 2
The problem is when rtnl_configure_link() fails in rtnl_newlink(),
we simply call unregister_device(), but for stacked device like vlan,
we almost do nothing when we unregister the upper device, more work
is done when we unregister the lower device, so call its ->dellink().
Reported-by: Ignacy Gawedzki <ignacy.gawedzki@green-communications.fr>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Spelling errors caught by codespell.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fix following warning wile make xmldocs.
Warning(.//net/core/dev.c:5345): No description found
for parameter 'bonding_info'
Warning(.//net/core/dev.c:5345): Excess function parameter
'netdev_bonding_info' description in 'netdev_bonding_info_change'
This warning starts to appear after following patch was added
into Linus's tree during merger period.
commit 61bd3857ff
net/core: Add event for a change in slave state
Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ipv6_cow_metrics() currently assumes only DST_HOST routes require
dynamic metrics allocation from inetpeer. The assumption breaks
when ndisc discovered router with RTAX_MTU and RTAX_HOPLIMIT metric.
Refer to ndisc_router_discovery() in ndisc.c and note that dst_metric_set()
is called after the route is created.
This patch creates the metrics array (by calling dst_cow_metrics_generic) in
ipv6_cow_metrics().
Test:
radvd.conf:
interface qemubr0
{
AdvLinkMTU 1300;
AdvCurHopLimit 30;
prefix fd00:face:face:face::/64
{
AdvOnLink on;
AdvAutonomous on;
AdvRouterAddr off;
};
};
Before:
[root@qemu1 ~]# ip -6 r show | egrep -v unreachable
fd00:face:face:face::/64 dev eth0 proto kernel metric 256 expires 27sec
fe80::/64 dev eth0 proto kernel metric 256
default via fe80::74df:d0ff:fe23:8ef2 dev eth0 proto ra metric 1024 expires 27sec
After:
[root@qemu1 ~]# ip -6 r show | egrep -v unreachable
fd00:face:face:face::/64 dev eth0 proto kernel metric 256 expires 27sec mtu 1300
fe80::/64 dev eth0 proto kernel metric 256 mtu 1300
default via fe80::74df:d0ff:fe23:8ef2 dev eth0 proto ra metric 1024 expires 27sec mtu 1300 hoplimit 30
Fixes: 8e2ec63917 (ipv6: don't use inetpeer to store metrics for routes.)
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix typo where mask is used rather than key.
Fixes: 74ed7ab9264("openvswitch: Add support for unique flow IDs.")
Reported-by: Joe Stringer <joestringer@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Joe Stringer <joestringer@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
printk and friends can now format bitmaps using '%*pb[l]'. cpumask
and nodemask also provide cpumask_pr_args() and nodemask_pr_args()
respectively which can be used to generate the two printf arguments
necessary to format the specified cpu/nodemask.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
IPv6 can keep a copy of SYN message using skb_get() in
tcp_v6_conn_request() so that caller wont free the skb when calling
kfree_skb() later.
Therefore TCP fast open has to clone the skb it is queuing in
child->sk_receive_queue, as all skbs consumed from receive_queue are
freed using __kfree_skb() (ie assuming skb->users == 1)
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Fixes: 5b7ed0892f ("tcp: move fastopen functions to tcp_fastopen.c")
Signed-off-by: David S. Miller <davem@davemloft.net>
Move memcg_socket_limit_enabled decrement to tcp_destroy_cgroup (called
from memcg_destroy_kmem -> mem_cgroup_sockets_destroy) and zap a bunch of
wrapper functions.
Although this patch moves static keys decrement from __mem_cgroup_free to
mem_cgroup_css_free, it does not introduce any functional changes, because
the keys are incremented on setting the limit (tcp or kmem), which can
only happen after successful mem_cgroup_css_online.
Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Cc: Glauber Costa <glommer@parallels.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujtisu.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull nfsd updates from Bruce Fields:
"The main change is the pNFS block server support from Christoph, which
allows an NFS client connected to shared disk to do block IO to the
shared disk in place of NFS reads and writes. This also requires xfs
patches, which should arrive soon through the xfs tree, barring
unexpected problems. Support for other filesystems is also possible
if there's interest.
Thanks also to Chuck Lever for continuing work to get NFS/RDMA into
shape"
* 'for-3.20' of git://linux-nfs.org/~bfields/linux: (32 commits)
nfsd: default NFSv4.2 to on
nfsd: pNFS block layout driver
exportfs: add methods for block layout exports
nfsd: add trace events
nfsd: update documentation for pNFS support
nfsd: implement pNFS layout recalls
nfsd: implement pNFS operations
nfsd: make find_any_file available outside nfs4state.c
nfsd: make find/get/put file available outside nfs4state.c
nfsd: make lookup/alloc/unhash_stid available outside nfs4state.c
nfsd: add fh_fsid_match helper
nfsd: move nfsd_fh_match to nfsfh.h
fs: add FL_LAYOUT lease type
fs: track fl_owner for leases
nfs: add LAYOUT_TYPE_MAX enum value
nfsd: factor out a helper to decode nfstime4 values
sunrpc/lockd: fix references to the BKL
nfsd: fix year-2038 nfs4 state problem
svcrdma: Handle additional inline content
svcrdma: Move read list XDR round-up logic
...
If CONFIG_SYSCTL=n:
net/bridge/br_netfilter.c: In function ‘br_netfilter_init’:
net/bridge/br_netfilter.c:996: warning: label ‘err1’ defined but not used
Move the label and the code after it inside the existing #ifdef to get
rid of the warning.
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: David S. Miller <davem@davemloft.net>