when the verifier log is enabled the print_bpf_insn() is doing
bpf_alu_string[BPF_OP(insn->code) >> 4]
and
bpf_jmp_string[BPF_OP(insn->code) >> 4]
where BPF_OP is a 4-bit instruction opcode.
Malformed insns can cause out of bounds access.
Fix it by sizing arrays appropriately.
The bug was found by clang address sanitizer with libfuzzer.
Reported-by: Yonghong Song <yhs@plumgrid.com>
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Problem:
The ecmp route replace support for ipv6 in the kernel, deletes the
existing ecmp route too early, ie when it installs the first nexthop.
If there is an error in installing the subsequent nexthops, its too late
to recover the already deleted existing route leaving the fib
in an inconsistent state.
This patch reduces the possibility of this by doing the following:
a) Changes the existing multipath route add code to a two stage process:
build rt6_infos + insert them
ip6_route_add rt6_info creation code is moved into
ip6_route_info_create.
b) This ensures that most errors are caught during building rt6_infos
and we fail early
c) Separates multipath add and del code. Because add needs the special
two stage mode in a) and delete essentially does not care.
d) In any event if the code fails during inserting a route again, a
warning is printed (This should be unlikely)
Before the patch:
$ip -6 route show
3000:1000:1000:1000::2 via fe80::202:ff:fe00:b dev swp49s0 metric 1024
3000:1000:1000:1000::2 via fe80::202:ff:fe00:d dev swp49s1 metric 1024
3000:1000:1000:1000::2 via fe80::202:ff:fe00:f dev swp49s2 metric 1024
/* Try replacing the route with a duplicate nexthop */
$ip -6 route change 3000:1000:1000:1000::2/128 nexthop via
fe80::202:ff:fe00:b dev swp49s0 nexthop via fe80::202:ff:fe00:d dev
swp49s1 nexthop via fe80::202:ff:fe00:d dev swp49s1
RTNETLINK answers: File exists
$ip -6 route show
/* previously added ecmp route 3000:1000:1000:1000::2 dissappears from
* kernel */
After the patch:
$ip -6 route show
3000:1000:1000:1000::2 via fe80::202:ff:fe00:b dev swp49s0 metric 1024
3000:1000:1000:1000::2 via fe80::202:ff:fe00:d dev swp49s1 metric 1024
3000:1000:1000:1000::2 via fe80::202:ff:fe00:f dev swp49s2 metric 1024
/* Try replacing the route with a duplicate nexthop */
$ip -6 route change 3000:1000:1000:1000::2/128 nexthop via
fe80::202:ff:fe00:b dev swp49s0 nexthop via fe80::202:ff:fe00:d dev
swp49s1 nexthop via fe80::202:ff:fe00:d dev swp49s1
RTNETLINK answers: File exists
$ip -6 route show
3000:1000:1000:1000::2 via fe80::202:ff:fe00:b dev swp49s0 metric 1024
3000:1000:1000:1000::2 via fe80::202:ff:fe00:d dev swp49s1 metric 1024
3000:1000:1000:1000::2 via fe80::202:ff:fe00:f dev swp49s2 metric 1024
Fixes: 2759647247 ("ipv6: fix ECMP route replacement")
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We may already have gotten a proper fd struct through fdget(), so
whenever we return at the end of an map operation, we need to call
fdput(). However, each map operation from syscall side first probes
CHECK_ATTR() to verify that unused fields in the bpf_attr union are
zero.
In case of malformed input, we return with error, but the lookup to
the map_fd was already performed at that time, so that we return
without an corresponding fdput(). Fix it by performing an fdget()
only right before bpf_map_get(). The fdget() invocation on maps in
the verifier is not affected.
Fixes: db20fd2b01 ("bpf: add lookup/update/delete/iterate methods to BPF maps")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit f48da8b14d (xen-netback: fix
unlimited guest Rx internal queue and carrier flapping) introduced a
regression.
The PV frontend in IPXE only places 4 requests on the guest Rx ring.
Since netback required at least (MAX_SKB_FRAGS + 1) slots, IPXE could
not receive any packets.
a) If GSO is not enabled on the VIF, fewer guest Rx slots are required
for the largest possible packet. Calculate the required slots
based on the maximum GSO size or the MTU.
This calculation of the number of required slots relies on
1650d5455b (xen-netback: always fully coalesce guest Rx packets)
which present in 4.0-rc1 and later.
b) Reduce the Rx stall detection to checking for at least one
available Rx request. This is fine since we're predominately
concerned with detecting interfaces which are down and thus have
zero available Rx requests.
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hariprasad Shenai says:
====================
cxgb4: Fix tx flit calculation and wc stat configuration
This patch series fixes the following:
Patch 1/2 fixes tx flit calculation, which if wrong can lead to
stall, hang, data corrpution, write combining failure. Patch 2/2 fixes
PCI-E write combining stats configuration.
This patch series has been created against net tree and includes
patches on cxgb4 driver.
We have included all the maintainers of respective drivers. Kindly review
the change and let us know in case of any review comments.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The write-combining configuration register SGE_STAT_CFG_A needs to
be configured after FW initializes the adapter, else FW will reset
the configuration
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In commit 0aac3f56d4 ("cxgb4: Add comment for calculate tx flits
and sge length code") introduced a regression where tx flit calculation
is going wrong, which can lead to data corruption, hang, stall and
write-combining failure. Fixing it.
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Call netif_carrier_off() prior to register_netdev(), otherwise
userspace can see incorrect link state.
Signed-off-by: Atsushi Nemoto <nemoto@toshiba-tops.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
If an attempt to wake up users of broadcast link is made when there is
no enough place in send queue than it may hang up inside the
tipc_sk_rcv() function since the loop breaks only after the wake up
queue becomes empty. This can lead to complete CPU stall with the
following message generated by RCU:
INFO: rcu_sched self-detected stall on CPU { 0} (t=2101 jiffies
g=54225 c=54224 q=11465)
Task dump for CPU 0:
tpch R running task 0 39949 39948 0x0000000a
ffffffff818536c0 ffff88181fa037a0 ffffffff8106a4be 0000000000000000
ffffffff818536c0 ffff88181fa037c0 ffffffff8106d8a8 ffff88181fa03800
0000000000000001 ffff88181fa037f0 ffffffff81094a50 ffff88181fa15680
Call Trace:
<IRQ> [<ffffffff8106a4be>] sched_show_task+0xae/0x120
[<ffffffff8106d8a8>] dump_cpu_task+0x38/0x40
[<ffffffff81094a50>] rcu_dump_cpu_stacks+0x90/0xd0
[<ffffffff81097c3b>] rcu_check_callbacks+0x3eb/0x6e0
[<ffffffff8106e53f>] ? account_system_time+0x7f/0x170
[<ffffffff81099e64>] update_process_times+0x34/0x60
[<ffffffff810a84d1>] tick_sched_handle.isra.18+0x31/0x40
[<ffffffff810a851c>] tick_sched_timer+0x3c/0x70
[<ffffffff8109a43d>] __run_hrtimer.isra.34+0x3d/0xc0
[<ffffffff8109aa95>] hrtimer_interrupt+0xc5/0x1e0
[<ffffffff81030d52>] ? native_smp_send_reschedule+0x42/0x60
[<ffffffff81032f04>] local_apic_timer_interrupt+0x34/0x60
[<ffffffff810335bc>] smp_apic_timer_interrupt+0x3c/0x60
[<ffffffff8165a3fb>] apic_timer_interrupt+0x6b/0x70
[<ffffffff81659129>] ? _raw_spin_unlock_irqrestore+0x9/0x10
[<ffffffff8107eb9f>] __wake_up_sync_key+0x4f/0x60
[<ffffffffa313ddd1>] tipc_write_space+0x31/0x40 [tipc]
[<ffffffffa313dadf>] filter_rcv+0x31f/0x520 [tipc]
[<ffffffffa313d699>] ? tipc_sk_lookup+0xc9/0x110 [tipc]
[<ffffffff81659259>] ? _raw_spin_lock_bh+0x19/0x30
[<ffffffffa314122c>] tipc_sk_rcv+0x2dc/0x3e0 [tipc]
[<ffffffffa312e7ff>] tipc_bclink_wakeup_users+0x2f/0x40 [tipc]
[<ffffffffa313ce26>] tipc_node_unlock+0x186/0x190 [tipc]
[<ffffffff81597c1c>] ? kfree_skb+0x2c/0x40
[<ffffffffa313475c>] tipc_rcv+0x2ac/0x8c0 [tipc]
[<ffffffffa312ff58>] tipc_l2_rcv_msg+0x38/0x50 [tipc]
[<ffffffff815a76d3>] __netif_receive_skb_core+0x5a3/0x950
[<ffffffff815a98d3>] __netif_receive_skb+0x13/0x60
[<ffffffff815a993e>] netif_receive_skb_internal+0x1e/0x90
[<ffffffff815aa138>] napi_gro_receive+0x78/0xa0
[<ffffffffa07f93f4>] tg3_poll_work+0xc54/0xf40 [tg3]
[<ffffffff81597c8c>] ? consume_skb+0x2c/0x40
[<ffffffffa07f9721>] tg3_poll_msix+0x41/0x160 [tg3]
[<ffffffff815ab0f2>] net_rx_action+0xe2/0x290
[<ffffffff8104b92a>] __do_softirq+0xda/0x1f0
[<ffffffff8104bc26>] irq_exit+0x76/0xa0
[<ffffffff81004355>] do_IRQ+0x55/0xf0
[<ffffffff8165a12b>] common_interrupt+0x6b/0x6b
<EOI>
The issue occurs only when tipc_sk_rcv() is used to wake up postponed
senders:
tipc_bclink_wakeup_users()
// wakeupq - is a queue which consists of special
// messages with SOCK_WAKEUP type.
tipc_sk_rcv(wakeupq)
...
while (skb_queue_len(inputq)) {
filter_rcv(skb)
// Here the type of message is checked
// and if it is SOCK_WAKEUP then
// it tries to wake up a sender.
tipc_write_space(sk)
wake_up_interruptible_sync_poll()
}
After the sender thread is woke up it can gather control and perform
an attempt to send a message. But if there is no enough place in send
queue it will call link_schedule_user() function which puts a message
of type SOCK_WAKEUP to the wakeup queue and put the sender to sleep.
Thus the size of the queue actually is not changed and the while()
loop never exits.
The approach I proposed is to wake up only senders for which there is
enough place in send queue so the described issue can't occur.
Moreover the same approach is already used to wake up senders on
unicast links.
I have got into the issue on our product code but to reproduce the
issue I changed a benchmark test application (from
tipcutils/demos/benchmark) to perform the following scenario:
1. Run 64 instances of test application (nodes). It can be done
on the one physical machine.
2. Each application connects to all other using TIPC sockets in
RDM mode.
3. When setup is done all nodes start simultaneously send
broadcast messages.
4. Everything hangs up.
The issue is reproducible only when a congestion on broadcast link
occurs. For example, when there are only 8 nodes it works fine since
congestion doesn't occur. Send queue limit is 40 in my case (I use a
critical importance level) and when 64 nodes send a message at the
same moment a congestion occurs every time.
Signed-off-by: Dmitry S Kolmakov <kolmakov.dmitriy@huawei.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove the unnecessary switchdev.h include from br_netlink.c.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since __vlan_del can return an error code, change its inner function
__vlan_vid_del to return an eventual error from switchdev_port_obj_del.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
The comparison check between cur_hw_state and hw_state is currently
invalid because cur_hw_state is right shifted by G_MISTP_SHIFT, while
hw_state is not, so we end-up comparing bits 2:0 with bits 7:5, which is
going to cause an additional aging to occur. Fix this by not shifting
cur_hw_state while reading it, but instead, mask the value with the
appropriately shitfted bitmask.
The other problem with the fast-ageing process is that we did not set
the EN_AGE_DYNAMIC bit to request the ageing to occur for dynamically
learned MAC addresses. Finally, write back 0 to the FAST_AGE_CTRL
register to avoid leaving spurious bits sets from one operation to the
other.
Fixes: 12f460f234 ("net: dsa: bcm_sf2: add HW bridging support")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The function device_get_mac_address is trying different property names
in order to get the mac address. To check the return value, the variable
addr (which contain the buffer pass by the caller) will be re-used. This
means that if the previous property is not found, the next property will
be read using a NULL buffer.
Therefore it's only possible to retrieve the mac if node contains a
property "mac-address". Fix it by using a temporary buffer for the
return value.
This has been introduced by commit 4c96b7dc0d
"Add a matching set of device_ functions for determining mac/phy"
Signed-off-by: Julien Grall <julien.grall@citrix.com>
Cc: Jeremy Linton <jeremy.linton@arm.com>
Cc: David S. Miller <davem@davemloft.net>
Reviewed-by: Jeremy Linton <jeremy.linton@arm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The race may happen when a device (e.g. YOTA 4G LTE Modem) is
unplugged while the system is downloading a large file from the Net.
Hardware breakpoints and Kprobes with delays were used to confirm that
the race does actually happen.
The race is on skb_queue ('next' pointer) between usbnet_stop()
and rx_complete(), which, in turn, calls usbnet_bh().
Here is a part of the call stack with the code where the changes to the
queue happen. The line numbers are for the kernel 4.1.0:
*0 __skb_unlink (skbuff.h:1517)
prev->next = next;
*1 defer_bh (usbnet.c:430)
spin_lock_irqsave(&list->lock, flags);
old_state = entry->state;
entry->state = state;
__skb_unlink(skb, list);
spin_unlock(&list->lock);
spin_lock(&dev->done.lock);
__skb_queue_tail(&dev->done, skb);
if (dev->done.qlen == 1)
tasklet_schedule(&dev->bh);
spin_unlock_irqrestore(&dev->done.lock, flags);
*2 rx_complete (usbnet.c:640)
state = defer_bh(dev, skb, &dev->rxq, state);
At the same time, the following code repeatedly checks if the queue is
empty and reads these values concurrently with the above changes:
*0 usbnet_terminate_urbs (usbnet.c:765)
/* maybe wait for deletions to finish. */
while (!skb_queue_empty(&dev->rxq)
&& !skb_queue_empty(&dev->txq)
&& !skb_queue_empty(&dev->done)) {
schedule_timeout(msecs_to_jiffies(UNLINK_TIMEOUT_MS));
set_current_state(TASK_UNINTERRUPTIBLE);
netif_dbg(dev, ifdown, dev->net,
"waited for %d urb completions\n", temp);
}
*1 usbnet_stop (usbnet.c:806)
if (!(info->flags & FLAG_AVOID_UNLINK_URBS))
usbnet_terminate_urbs(dev);
As a result, it is possible, for example, that the skb is removed from
dev->rxq by __skb_unlink() before the check
"!skb_queue_empty(&dev->rxq)" in usbnet_terminate_urbs() is made. It is
also possible in this case that the skb is added to dev->done queue
after "!skb_queue_empty(&dev->done)" is checked. So
usbnet_terminate_urbs() may stop waiting and return while dev->done
queue still has an item.
Locking in defer_bh() and usbnet_terminate_urbs() was revisited to avoid
this race.
Signed-off-by: Eugene Shatokhin <eugene.shatokhin@rosalab.ru>
Reviewed-by: Bjørn Mork <bjorn@mork.no>
Acked-by: Oliver Neukum <oneukum@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c: In function ‘init_one’:
drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c:4579:8: warning: ‘chip’ may be used uninitialized in this function [-Wmaybe-uninitialized]
chip |= CHELSIO_CHIP_CODE(CHELSIO_T4, pl_rev);
^
drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c:4571:11: note: ‘chip’ was declared here
int ver, chip;
^
Fixes: d86bd29e0b ("cxgb4/cxgb4vf: read the correct bits of PL Who Am I register")
Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
Cc: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
I've noticed that fixed_phy_register() ignores its 'irq' parameter instead of
passing it to fixed_phy_add(). Luckily, fixed_phy_register() seems to always
be called with PHY_POLL for 'irq'... :-)
Fixes: a759512174 ("net: phy: extend fixed driver with fixed_phy_register()")
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There's no particular desire to have conntrack action support in Open
vSwitch as an independently configurable bit, rather just to ensure
there is not a hard dependency. This exposed option doesn't accurately
reflect the conntrack dependency when enabled, so simplify this by
removing the option. Compile the support if NF_CONNTRACK is enabled.
Fixes: 7f8a436eaa ("openvswitch: Add conntrack action")
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The Marvell 88E6171 switch is in the 88E6351 family, which supports
802.1Q, thus add support from the generic mv88e6xxx functions.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
* fix for the sizeof() pointer type issue
* a fix for regulatory getting into a restore loop
* a fix for rfkill global 'all' state, it needs to be stored
everywhere to apply correctly to new rfkill instances
* properly refuse CQM RSSI when it cannot actually be used
* protect HT TDLS traffic properly in non-HT networks
* don't incorrectly advertise 80 MHz support when not allowed
-----BEGIN PGP SIGNATURE-----
iQIcBAABCAAGBQJV6av+AAoJEDBSmw7B7bqrNYYP/1kZjdk9s4wPHpQndfSWK/sj
yTjysh/VXZ5Jyzkcmg337AjEuuJ/SntDXjMXEKTxHHrt5fmHFBvmPErPUY7p7I/t
MJqxVnYGypGW4LImyXYRJbejabmPrxtHC+jcWibw24yBCzJGCXF/MeKCotksnvIn
IoJDvUHrVFZhfzFRepdozJi1Qy2Y04Q7zXywplW6XFYWxdfDWaWyno/xh+rM8oO7
xSgpkuj4FDfimDBjzJOHwBH3xOlt/uHoB97G0TJbJ6k0R6a8aTz8eK5OWB4rFcn+
BsuQfaWcrGg9fJsoqz4mYlWf/F4Qj/gU9Gr5sBUWWa0xQ7HctcgJ/t27XjUWP7oV
VnQ9YQt/SukG3BVf8GuNqaSo7/7oprWHdDYRH0tnvImXCWsUlHM8gv8MrG0UOHGC
4I4qgPLHon6Ui3lNHfivl8iSDFT+jfsMDMWiB6kKZRpUOdEicRrepMoDARVbTYck
h6uY4Tx6OQ39evLSXjyrCWb0aOr2WoDCypxEkpE+fgDsWr8yTV0Ti0QZfVqatO3d
d4GQb6U5Zkd14ymKtzULa5qjOw1TmnKpxVzSSdrONaPBVdUDsl4QH38gIppFjGsV
61Gn3PXRRgy+VAib/YyIYqilvdhZ2TdXaSVAEyKv6WjKavOFA/8KsDPmJsyKoCSn
1dDOwaiqgfu2bK1PTU9A
=xw8r
-----END PGP SIGNATURE-----
Merge tag 'mac80211-for-davem-2015-09-04' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
Johannes Berg says:
====================
For the first round of fixes, we have this:
* fix for the sizeof() pointer type issue
* a fix for regulatory getting into a restore loop
* a fix for rfkill global 'all' state, it needs to be stored
everywhere to apply correctly to new rfkill instances
* properly refuse CQM RSSI when it cannot actually be used
* protect HT TDLS traffic properly in non-HT networks
* don't incorrectly advertise 80 MHz support when not allowed
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
drivers/net/vxlan.c: In function ‘vxlan_udp_encap_recv’:
drivers/net/vxlan.c:1226: warning: ‘info’ may be used uninitialized in this function
While this warning is a false positive, it can be killed easily by
getting rid of the pointer intermediary and referring directly to the
ip_tunnel_info structure.
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
net/usb/lan78xx.c: In function ‘lan78xx_link_reset’:
net/usb/lan78xx.c:1107: warning: comparison is always false due to limited range of data type
net/usb/lan78xx.c:1111: warning: comparison is always false due to limited range of data type
Assigning return values that can be negative error codes to "u16"
variables makes them positive, ignoring the errors. Hence use "int"
instead.
Drop the "unlikely"s (unlikely considered harmful) and propagate the
actual error values instead of overriding them to -EIO while we're at
it.
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Only return a conn if the rds_conn_net(conn) matches the struct
net passed to rds_conn_lookup().
Fixes: 467fa15356 ("RDS-TCP: Support multiple RDS-TCP listen endpoints,
one per netns.")
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If fec MDIO write method succeeds its return value comes from
call to pm_runtime_get_sync().
But pm_runtime_get_sync() can also return 1.
In case of Micrel KSZ9031 PHY this value will then
be returned along the call chain of phy_write() ->
ksz9031_extended_write() -> ksz9031_center_flp_timing() ->
ksz9031_config_init() -> phy_init_hw() -> phy_attach_direct() ->
phy_connect_direct().
Then phy_connect() will cast it into a pointer using ERR_PTR(),
which then fec_enet_mii_probe() will try to dereference
resulting in an oops.
Fix it by normalizing return value of pm_runtime_get_sync()
to be zero if positive in MDIO write method.
Fixes: 8fff755e9f ("net: fec: Ensure clocks are enabled while using mdio bus")
Signed-off-by: Maciej Szmigiero <mail@maciej.szmigiero.name>
Acked-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
switchdev_port_fdb_dump is used as .ndo_fdb_dump. Its return value is
idx, so we cannot return errval.
Fixes: 45d4122ca7 ("switchdev: add support for fdb add/del/dump via switchdev_port_obj ops.")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Acked-by: Scott Feldman<sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The be_cmd_rx_filter() routine sends a non-embedded cmd to the FW and used
a pre-allocated dma memory to hold the cmd payload. This worked fine when
this cmd was synchronous. This cmd was changed to asynchronous mode by the
commit 8af65c2f4("make the RX_FILTER command asynchronous"). So now when
there are two quick invocations of this cmd, the 2nd request may end up
overwriting the first request, causing FW cmd corruption.
This patch reverts the offending commit and hence fixes the regression.
Fixes: 8af65c2f4("be2net: make the RX_FILTER command asynchronous")
Signed-off-by: Sathya Perla <sathya.perla@avagotech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
include/net/netfilter/nf_conntrack.h
The conflict was an overlap between changing the type of the zone
argument to nf_ct_tmpl_alloc() whilst exporting nf_ct_tmpl_free.
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for net, they are:
1) Oneliner to restore maps in nf_tables since we support addressing registers
at 32 bits level.
2) Restore previous default behaviour in bridge netfilter when CONFIG_IPV6=n,
oneliner from Bernhard Thaler.
3) Out of bound access in ipset hash:net* set types, reported by Dave Jones'
KASan utility, patch from Jozsef Kadlecsik.
4) Fix ipset compilation with gcc 4.4.7 related to C99 initialization of
unnamed unions, patch from Elad Raz.
5) Add a workaround to address inconsistent endianess in the res_id field of
nfnetlink batch messages, reported by Florian Westphal.
6) Fix error paths of CT/synproxy since the conntrack template was moved to use
kmalloc, patch from Daniel Borkmann.
All of them look good to me to reach 4.2, I can route this to -stable myself
too, just let me know what you prefer.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
dev_get_by_name() will increment the usage count if the matching device
is found. But we were not decrementing the count if we have got the
device and the device is non-active.
Signed-off-by: Sudip Mukherjee <sudip@vectorindia.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
In the IPv6 multicast routing code the mrt_lock was not being released
correctly in the MFC iterator, as a result adding or deleting a MIF would
cause a hang because the mrt_lock could not be acquired.
This fix is a copy of the code for the IPv4 case and ensures that the lock
is released correctly.
Signed-off-by: Richard Laing <richard.laing@alliedtelesis.co.nz>
Acked-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When beacon filtering is enabled the mac80211 software implementation
for RSSI CQM cannot work as beacons will not be available. Rather than
accepting such a configuration without proper effect, reject it.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Currently if 80MHz channels are not allowed for use, the VHT IE is not
included in the probe request for an AP. This is not good enough if the
AP is configured with the wrong regulatory and supports VHT even where
prohibited or in TDLS scenarios.
Mark the ifmgd with the DISABLE_VHT flag for the misbehaving-AP case, and
unset VHT support from the peer-station entry for the TDLS case.
Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
When switching the state of all RFKill switches of type all we need to
replicate the RFKILL_TYPE_ALL global state to all the other types global
state, so it is used to initialize persistent RFKill switches on
register.
Signed-off-by: João Paulo Rechi Vita <jprvita@endlessm.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
HT TDLS traffic should be protected in a non-HT BSS to avoid
collisions. Therefore, when TDLS peers join/leave, check if
protection is (now) needed and set the ht_operation_mode of
the virtual interface according to the HT capabilities of the
TDLS peer(s).
This works because a non-HT BSS connection never sets (or
otherwise uses) the ht_operation_mode; it just means that
drivers must be aware that this field applies to all HT
traffic for this virtual interface, not just the traffic
within the BSS. Document that.
Signed-off-by: Avri Altman <avri.altman@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The rate_control_cap_mask() function takes a parameter mcs_mask, which
GCC will take to be u8 * even though it was declared with a fixed size.
This causes the following warning:
net/mac80211/rate.c: In function 'rate_control_cap_mask':
net/mac80211/rate.c:719:25: warning: 'sizeof' on array function parameter 'mcs_mask' will return size of 'u8 * {aka unsigned char *}' [-Wsizeof-array-argument]
for (i = 0; i < sizeof(mcs_mask); i++)
^
net/mac80211/rate.c:684:10: note: declared here
u8 mcs_mask[IEEE80211_HT_MCS_MASK_LEN],
^
This can be easily fixed by using the IEEE80211_HT_MCS_MASK_LEN directly
within the loop condition.
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Marcelo Ricardo Leitner says:
====================
couple of sctp fixes for 0ca50d12fe
These are two fixes for sctp after my patch on 0ca50d12fe ("sctp: fix
src address selection if using secondary addresses")
The first, fix a dst leak on those it decided to skip.
The second, adds the fallback on src selection that Vlad had asked
about. Unfortunatelly a lot of ipvs setups relies on the old behavior
and I don't see a better fix for it.
Please consider both to -stable tree.
====================
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 0ca50d12fe added a restriction that the address must belong to
the output interface, so that sctp will use the right interface even
when using secondary addresses.
But it breaks IPVS setups, on which people is used to attach VIP
addresses to loopback interface on real servers. It's preferred to
attach to the interface actually in use, but it's a very common setup
and that used to work.
This patch then saves the first routing good result, even if it would be
going out through an interface that doesn't have that address. If no
better hit found, it's then used. This effectively restores the original
behavior if no better interface could be found.
Fixes: 0ca50d12fe ("sctp: fix src address selection if using secondary addresses")
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 0ca50d12fe failed to release the reference to dst entries that
it decided to skip.
Fixes: 0ca50d12fe ("sctp: fix src address selection if using secondary addresses")
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
tse_poll() calls __napi_complete() with irq enabled. This leads napi
poll_list corruption and may stop all napi drivers working.
Use napi_complete() instead of __napi_complete().
Signed-off-by: Atsushi Nemoto <nemoto@toshiba-tops.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking updates from David Miller:
"Another merge window, another set of networking changes. I've heard
rumblings that the lightweight tunnels infrastructure has been voted
networking change of the year. But what do I know?
1) Add conntrack support to openvswitch, from Joe Stringer.
2) Initial support for VRF (Virtual Routing and Forwarding), which
allows the segmentation of routing paths without using multiple
devices. There are some semantic kinks to work out still, but
this is a reasonably strong foundation. From David Ahern.
3) Remove spinlock fro act_bpf fast path, from Alexei Starovoitov.
4) Ignore route nexthops with a link down state in ipv6, just like
ipv4. From Andy Gospodarek.
5) Remove spinlock from fast path of act_gact and act_mirred, from
Eric Dumazet.
6) Document the DSA layer, from Florian Fainelli.
7) Add netconsole support to bcmgenet, systemport, and DSA. Also
from Florian Fainelli.
8) Add Mellanox Switch Driver and core infrastructure, from Jiri
Pirko.
9) Add support for "light weight tunnels", which allow for
encapsulation and decapsulation without bearing the overhead of a
full blown netdevice. From Thomas Graf, Jiri Benc, and a cast of
others.
10) Add Identifier Locator Addressing support for ipv6, from Tom
Herbert.
11) Support fragmented SKBs in iwlwifi, from Johannes Berg.
12) Allow perf PMUs to be accessed from eBPF programs, from Kaixu Xia.
13) Add BQL support to 3c59x driver, from Loganaden Velvindron.
14) Stop using a zero TX queue length to mean that a device shouldn't
have a qdisc attached, use an explicit flag instead. From Phil
Sutter.
15) Use generic geneve netdevice infrastructure in openvswitch, from
Pravin B Shelar.
16) Add infrastructure to avoid re-forwarding a packet in software
that was already forwarded by a hardware switch. From Scott
Feldman.
17) Allow AF_PACKET fanout function to be implemented in a bpf
program, from Willem de Bruijn"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1458 commits)
netfilter: nf_conntrack: make nf_ct_zone_dflt built-in
netfilter: nf_dup{4, 6}: fix build error when nf_conntrack disabled
net: fec: clear receive interrupts before processing a packet
ipv6: fix exthdrs offload registration in out_rt path
xen-netback: add support for multicast control
bgmac: Update fixed_phy_register()
sock, diag: fix panic in sock_diag_put_filterinfo
flow_dissector: Use 'const' where possible.
flow_dissector: Fix function argument ordering dependency
ixgbe: Resolve "initialized field overwritten" warnings
ixgbe: Remove bimodal SR-IOV disabling
ixgbe: Add support for reporting 2.5G link speed
ixgbe: fix bounds checking in ixgbe_setup_tc for 82598
ixgbe: support for ethtool set_rxfh
ixgbe: Avoid needless PHY access on copper phys
ixgbe: cleanup to use cached mask value
ixgbe: Remove second instance of lan_id variable
ixgbe: use kzalloc for allocating one thing
flow: Move __get_hash_from_flowi{4,6} into flow_dissector.c
ixgbe: Remove unused PCI bus types
...
Pull device mapper update from Mike Snitzer:
- a couple small cleanups in dm-cache, dm-verity, persistent-data's
dm-btree, and DM core.
- a 4.1-stable fix for dm-cache that fixes the leaking of deferred bio
prison cells
- a 4.2-stable fix that adds feature reporting for the dm-stats
features added in 4.2
- improve DM-snapshot to not invalidate the on-disk snapshot if
snapshot device write overflow occurs; but a write overflow triggered
through the origin device will still invalidate the snapshot.
- optimize DM-thinp's async discard submission a bit now that late bio
splitting has been included in block core.
- switch DM-cache's SMQ policy lock from using a mutex to a spinlock;
improves performance on very low latency devices (eg. NVMe SSD).
- document DM RAID 4/5/6's discard support
[ I did not pull the slab changes, which weren't appropriate for this
tree, and weren't obviously the right thing to do anyway. At the very
least they need some discussion and explanation before getting merged.
Because not pulling the actual tagged commit but doing a partial pull
instead, this merge commit thus also obviously is missing the git
signature from the original tag ]
* tag 'dm-4.3-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm cache: fix use after freeing migrations
dm cache: small cleanups related to deferred prison cell cleanup
dm cache: fix leaking of deferred bio prison cells
dm raid: document RAID 4/5/6 discard support
dm stats: report precise_timestamps and histogram in @stats_list output
dm thin: optimize async discard submission
dm snapshot: don't invalidate on-disk image on snapshot write overflow
dm: remove unlikely() before IS_ERR()
dm: do not override error code returned from dm_get_device()
dm: test return value for DM_MAPIO_SUBMITTED
dm verity: remove unused mempool
dm cache: move wake_waker() from free_migrations() to where it is needed
dm btree remove: remove unused function get_nr_entries()
dm btree: remove unused "dm_block_t root" parameter in btree_split_sibling()
dm cache policy smq: change the mutex to a spinlock
Fengguang reported, that some randconfig generated the following linker
issue with nf_ct_zone_dflt object involved:
[...]
CC init/version.o
LD init/built-in.o
net/built-in.o: In function `ipv4_conntrack_defrag':
nf_defrag_ipv4.c:(.text+0x93e95): undefined reference to `nf_ct_zone_dflt'
net/built-in.o: In function `ipv6_defrag':
nf_defrag_ipv6_hooks.c:(.text+0xe3ffe): undefined reference to `nf_ct_zone_dflt'
make: *** [vmlinux] Error 1
Given that configurations exist where we have a built-in part, which is
accessing nf_ct_zone_dflt such as the two handlers nf_ct_defrag_user()
and nf_ct6_defrag_user(), and a part that configures nf_conntrack as a
module, we must move nf_ct_zone_dflt into a fixed, guaranteed built-in
area when netfilter is configured in general.
Therefore, split the more generic parts into a common header under
include/linux/netfilter/ and move nf_ct_zone_dflt into the built-in
section that already holds parts related to CONFIG_NF_CONNTRACK in the
netfilter core. This fixes the issue on my side.
Fixes: 308ac9143e ("netfilter: nf_conntrack: push zone object into functions")
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
While testing various Kconfig options on another issue, I found that
the following one triggers as well on allmodconfig and nf_conntrack
disabled:
net/ipv4/netfilter/nf_dup_ipv4.c: In function ‘nf_dup_ipv4’:
net/ipv4/netfilter/nf_dup_ipv4.c:72:20: error: ‘nf_skb_duplicated’ undeclared (first use in this function)
if (this_cpu_read(nf_skb_duplicated))
[...]
net/ipv6/netfilter/nf_dup_ipv6.c: In function ‘nf_dup_ipv6’:
net/ipv6/netfilter/nf_dup_ipv6.c:66:20: error: ‘nf_skb_duplicated’ undeclared (first use in this function)
if (this_cpu_read(nf_skb_duplicated))
Fix it by including directly the header where it is defined.
Fixes: bbde9fc182 ("netfilter: factor out packet duplication for IPv4/IPv6")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The patch just to re-submit the patch "db3421c114cfa6326" because the
patch "4d494cdc92b3b9a0" remove the change.
Clear any pending receive interrupt before we process a pending packet.
This helps to avoid any spurious interrupts being raised after we have
fully cleaned the receive ring, while still allowing an interrupt to be
raised if we receive another packet.
The position of this is critical: we must do this prior to reading the
next packet status to avoid potentially dropping an interrupt when a
packet is still pending.
Acked-by: Fugang Duan <B38611@freescale.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
We previously register IPPROTO_ROUTING offload under inet6_add_offload(),
but in error path, we try to unregister it with inet_del_offload(). This
doesn't seem correct, it should actually be inet6_del_offload(), also
ipv6_exthdrs_offload_exit() from that commit seems rather incorrect (it
also uses rthdr_offload twice), but it got removed entirely later on.
Fixes: 3336288a9f ("ipv6: Switch to using new offload infrastructure.")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull SG updates from Jens Axboe:
"This contains a set of scatter-gather related changes/fixes for 4.3:
- Add support for limited chaining of sg tables even for
architectures that do not set ARCH_HAS_SG_CHAIN. From Christoph.
- Add sg chain support to target_rd. From Christoph.
- Fixup open coded sg->page_link in crypto/omap-sham. From
Christoph.
- Fixup open coded crypto ->page_link manipulation. From Dan.
- Also from Dan, automated fixup of manual sg_unmark_end()
manipulations.
- Also from Dan, automated fixup of open coded sg_phys()
implementations.
- From Robert Jarzmik, addition of an sg table splitting helper that
drivers can use"
* 'for-4.3/sg' of git://git.kernel.dk/linux-block:
lib: scatterlist: add sg splitting function
scatterlist: use sg_phys()
crypto/omap-sham: remove an open coded access to ->page_link
scatterlist: remove open coded sg_unmark_end instances
crypto: replace scatterwalk_sg_chain with sg_chain
target/rd: always chain S/G list
scatterlist: allow limited chaining without ARCH_HAS_SG_CHAIN
Pull block driver updates from Jens Axboe:
"On top of the 4.3 core block IO changes, here are the driver related
changes for 4.3. Basically just NVMe and nbd this time around:
- NVMe:
- PRACT PI improvement from Alok Pandey.
- Cleanups and improvements on submission queue doorbell and
writing, using CMB if available. From Jon Derrick.
- From Keith, support for setting queue maximum segments, and
reset support.
- Also from Jon, fixup of u64 division issue on 32-bit archs and
wiring up of the reset support through and ioctl.
- Two small cleanups from Matias and Sunad
- Various code cleanups and fixes from Markus Pargmann"
* 'for-4.3/drivers' of git://git.kernel.dk/linux-block:
NVMe: Using PRACT bit to generate and verify PI by controller
NVMe:Remove unreachable code in nvme_abort_req
NVMe: Add nvme subsystem reset IOCTL
NVMe: Add nvme subsystem reset support
NVMe: removed unused nn var from nvme_dev_add
NVMe: Set queue max segments
nbd: flags is a u32 variable
nbd: Rename functions for clearness of recv/send path
nbd: Change 'disconnect' to be boolean
nbd: Add debugfs entries
nbd: Remove variable 'pid'
nbd: Move clear queue debug message
nbd: Remove 'harderror' and propagate error properly
nbd: restructure sock_shutdown
nbd: sock_shutdown, remove conditional lock
nbd: Fix timeout detection
nvme: Fixes u64 division which breaks i386 builds
NVMe: Use CMB for the IO SQes if available
NVMe: Unify SQ entry writing and doorbell ringing
Pull core block updates from Jens Axboe:
"This first core part of the block IO changes contains:
- Cleanup of the bio IO error signaling from Christoph. We used to
rely on the uptodate bit and passing around of an error, now we
store the error in the bio itself.
- Improvement of the above from myself, by shrinking the bio size
down again to fit in two cachelines on x86-64.
- Revert of the max_hw_sectors cap removal from a revision again,
from Jeff Moyer. This caused performance regressions in various
tests. Reinstate the limit, bump it to a more reasonable size
instead.
- Make /sys/block/<dev>/queue/discard_max_bytes writeable, by me.
Most devices have huge trim limits, which can cause nasty latencies
when deleting files. Enable the admin to configure the size down.
We will look into having a more sane default instead of UINT_MAX
sectors.
- Improvement of the SGP gaps logic from Keith Busch.
- Enable the block core to handle arbitrarily sized bios, which
enables a nice simplification of bio_add_page() (which is an IO hot
path). From Kent.
- Improvements to the partition io stats accounting, making it
faster. From Ming Lei.
- Also from Ming Lei, a basic fixup for overflow of the sysfs pending
file in blk-mq, as well as a fix for a blk-mq timeout race
condition.
- Ming Lin has been carrying Kents above mentioned patches forward
for a while, and testing them. Ming also did a few fixes around
that.
- Sasha Levin found and fixed a use-after-free problem introduced by
the bio->bi_error changes from Christoph.
- Small blk cgroup cleanup from Viresh Kumar"
* 'for-4.3/core' of git://git.kernel.dk/linux-block: (26 commits)
blk: Fix bio_io_vec index when checking bvec gaps
block: Replace SG_GAPS with new queue limits mask
block: bump BLK_DEF_MAX_SECTORS to 2560
Revert "block: remove artifical max_hw_sectors cap"
blk-mq: fix race between timeout and freeing request
blk-mq: fix buffer overflow when reading sysfs file of 'pending'
Documentation: update notes in biovecs about arbitrarily sized bios
block: remove bio_get_nr_vecs()
fs: use helper bio_add_page() instead of open coding on bi_io_vec
block: kill merge_bvec_fn() completely
md/raid5: get rid of bio_fits_rdev()
md/raid5: split bio for chunk_aligned_read
block: remove split code in blkdev_issue_{discard,write_same}
btrfs: remove bio splitting and merge_bvec_fn() calls
bcache: remove driver private bio splitting code
block: simplify bio_add_page()
block: make generic_make_request handle arbitrarily sized bios
blk-cgroup: Drop unlikely before IS_ERR(_OR_NULL)
block: don't access bio->bi_error after bio_put()
block: shrink struct bio down to 2 cache lines again
...