GRO/GSO layers can be enabled on a node, even if said
node is only forwarding packets.
This patch permits GSO (and upcoming GRO) support for GRE
encapsulated packets, even if the host has no GRE tunnel setup.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: H.K. Jerry Chu <hkchu@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This fixes some typos found by Sergei.
Reported-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jesse Gross says:
====================
[GIT net-next] Open vSwitch
Open vSwitch changes for net-next/3.14. Highlights are:
* Performance improvements in the mechanism to get packets to userspace
using memory mapped netlink and skb zero copy where appropriate.
* Per-cpu flow stats in situations where flows are likely to be shared
across CPUs. Standard flow stats are used in other situations to save
memory and allocation time.
* A handful of code cleanups and rationalization.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Several functions and datastructures could be local
Found with 'make namespacecheck'
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Jesse Gross <jesse@nicira.com>
The copy & csum optimization is no longer present with zerocopy
enabled. Compute the checksum in skb_gso_segment() directly by
dropping the HW CSUM capability from the features passed in.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Use of skb_zerocopy() can avoid the expensive call to memcpy()
when copying the packet data into the Netlink skb. Completes
checksum through skb_checksum_help() if not already done in
GSO segmentation.
Zerocopy is only performed if user space supported unaligned
Netlink messages. memory mapped netlink i/o is preferred over
zerocopy if it is set up.
Cost of upcall is significantly reduced from:
+ 7.48% vhost-8471 [k] memcpy
+ 5.57% ovs-vswitchd [k] memcpy
+ 2.81% vhost-8471 [k] csum_partial_copy_generic
to:
+ 5.72% ovs-vswitchd [k] memcpy
+ 3.32% vhost-5153 [k] memcpy
+ 0.68% vhost-5153 [k] skb_zerocopy
(megaflows disabled)
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Allows removing the net and dp_ifindex argument and simplify the
code.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Drop user features if an outdated user space instance that does not
understand the concept of user_features attempted to create a new
datapath.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Make the skb zerocopy logic written for nfnetlink queue available for
use by other modules.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Reviewed-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jesse Gross <jesse@nicira.com>
As we're only doing a kfree() anyway in the RCU callback, we can
simply use kfree_rcu, which does the same job, and remove the
function rcu_free_sw_flow_mask_cb() and rcu_free_acts_callback().
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
With mega flow implementation ovs flow can be shared between
multiple CPUs which makes stats updates highly contended
operation. This patch uses per-CPU stats in cases where a flow
is likely to be shared (if there is a wildcard in the 5-tuple
and therefore likely to be spread by RSS). In other situations,
it uses the current strategy, saving memory and allocation time.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
An insufficent ring frame size configuration can lead to an
unnecessary skb allocation for every Netlink message. Check frame
size before taking the queue lock and allocating the skb and
re-check with lock to be safe.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Reviewed-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Allocates a new sk_buff large enough to cover the specified payload
plus required Netlink headers. Will check receiving socket for
memory mapped i/o capability and use it if enabled. Will fall back
to non-mapped skb if message size exceeds the frame size of the ring.
Signed-of-by: Thomas Graf <tgraf@suug.ch>
Reviewed-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Flow lookup can happen either in packet processing context or userspace
context but it was annotated as requiring RCU read lock to be held. This
also allows OVS mutex to be held without causing warnings.
Reported-by: Justin Pettit <jpettit@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Reviewed-by: Thomas Graf <tgraf@redhat.com>
API changes only for code readability. No functional chnages.
This patch removes the underscored version. Added a new API
ovs_flow_tbl_lookup_stats() that returns the n_mask_hits.
Reported by: Ben Pfaff <blp@nicira.com>
Reviewed-by: Thomas Graf <tgraf@redhat.com>
Signed-off-by: Andy Zhou <azhou@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
We won't normally have a ton of flow masks but using a size_t to store
values no bigger than sizeof(struct sw_flow_key) seems excessive.
This reduces sw_flow_key_range and sw_flow_mask by 4 bytes on 32-bit
systems. On 64-bit systems it shrinks sw_flow_key_range by 12 bytes but
sw_flow_mask only by 8 bytes due to padding.
Compile tested only.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Conflicts:
drivers/net/ethernet/qlogic/qlcnic/qlcnic_sriov_pf.c
net/ipv6/ip6_tunnel.c
net/ipv6/ip6_vti.c
ipv6 tunnel statistic bug fixes conflicting with consolidation into
generic sw per-cpu net stats.
qlogic conflict between queue counting bug fix and the addition
of multiple MAC address support.
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove the rfcomm_carrier_raised() definition as that function isn't
used anymore.
Signed-off-by: Gianluca Anzolin <gianluca@sottospazio.it>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch fixes two regressions introduced with the recent rfcomm tty
rework.
The current code uses the carrier_raised() method to wait for the
bluetooth connection when a process opens the tty.
However processes may open the port with the O_NONBLOCK flag or set the
CLOCAL termios flag: in these cases the open() syscall returns
immediately without waiting for the bluetooth connection to
complete.
This behaviour confuses userspace which expects an established bluetooth
connection.
The patch restores the old behaviour by waiting for the connection in
rfcomm_dev_activate() and removes carrier_raised() from the tty_port ops.
As a side effect the new code also fixes the case in which the rfcomm
tty device is created with the flag RFCOMM_REUSE_DLC: the old code
didn't call device_move() and ModemManager skipped the detection
probe. Now device_move() is always called inside rfcomm_dev_activate().
Signed-off-by: Gianluca Anzolin <gianluca@sottospazio.it>
Reported-by: Andrey Vihrov <andrey.vihrov@gmail.com>
Reported-by: Beson Chow <blc+bluez@mail.vanade.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This is a preparatory patch which moves the rfcomm_get_device()
definition before rfcomm_dev_activate() where it will be used.
Signed-off-by: Gianluca Anzolin <gianluca@sottospazio.it>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch fixes a userspace regression introduced by the commit
29cd718b.
If the rfcomm device was created with the flag RFCOMM_RELEASE_ONHUP the
user space expects that the tty_port is released as soon as the last
process closes the tty.
The current code attempts to release the port in the function
rfcomm_dev_state_change(). However it won't get a reference to the
relevant tty to send a HUP: at that point the tty is already destroyed
and therefore NULL.
This patch fixes the regression by taking over the tty refcount in the
tty install method(). This way the tty_port is automatically released as
soon as the tty is destroyed.
As a consequence the check for RFCOMM_RELEASE_ONHUP flag in the hangup()
method is now redundant. Instead we have to be careful with the reference
counting in the rfcomm_release_dev() function.
Signed-off-by: Gianluca Anzolin <gianluca@sottospazio.it>
Reported-by: Alexander Holler <holler@ahsoftware.de>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
action flushing missaccounting
Account only for deleted actions
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove unnecessary checks for act->ops
(suggested by Eric Dumazet).
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use DEVICE_ATTR_RO/RW macros to simplify bridge sysfs attribute definitions.
Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
br_multicast_set_hash_max() is called from process context in
net/bridge/br_sysfs_br.c by the sysfs store_hash_max() function.
br_multicast_set_hash_max() calls spin_lock(&br->multicast_lock),
which can deadlock the CPU if a softirq that also tries to take the
same lock interrupts br_multicast_set_hash_max() while the lock is
held . This can happen quite easily when any of the bridge multicast
timers expire, which try to take the same lock.
The fix here is to use spin_lock_bh(), preventing other softirqs from
executing on this CPU.
Steps to reproduce:
1. Create a bridge with several interfaces (I used 4).
2. Set the "multicast query interval" to a low number, like 2.
3. Enable the bridge as a multicast querier.
4. Repeatedly set the bridge hash_max parameter via sysfs.
# brctl addbr br0
# brctl addif br0 eth1 eth2 eth3 eth4
# brctl setmcqi br0 2
# brctl setmcquerier br0 1
# while true ; do echo 4096 > /sys/class/net/br0/bridge/hash_max; done
Signed-off-by: Curt Brune <curt@cumulusnetworks.com>
Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sathya Perla posted a patch trying to address following problem :
<quote>
The vxlan driver sets itself as the socket owner for all the TX flows
it encapsulates (using vxlan_set_owner()) and assigns it's own skb
destructor. This causes all tunneled traffic to land up on only one TXQ
as all encapsulated skbs refer to the vxlan socket and not the original
socket. Also, the vxlan skb destructor breaks some functionality for
tunneled traffic like wmem accounting and as TCP small queues and
FQ/pacing packet scheduler.
</quote>
I reworked Sathya patch and added some explanations.
vxlan_xmit() can avoid one skb_clone()/dev_kfree_skb() pair
and gain better drop monitor accuracy, by calling kfree_skb() when
appropriate.
The UDP socket used by vxlan to perform encapsulation of xmit packets
do not need to be alive while packets leave vxlan code. Its better
to keep original socket ownership to get proper feedback from qdisc and
NIC layers.
We use skb->sk to
A) control amount of bytes/packets queued on behalf of a socket, but
prior vxlan code did the skb->sk transfert without any limit/control
on vxlan socket sk_sndbuf.
B) security purposes (as selinux) or netfilter uses, and I do not think
anything is prepared to handle vxlan stacked case in this area.
By not changing ownership, vxlan tunnels behave like other tunnels.
As Stephen mentioned, we might do the same change in L2TP.
Reported-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
TCP out_of_order_queue lock is not used, as queue manipulation
happens with socket lock held and we therefore use the lockless
skb queue routines (as __skb_queue_head())
We can use __skb_queue_head_init() instead of skb_queue_head_init()
to make this more consistent.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It does not make sense to create an anycast address for an /128-prefix.
Suppress it.
As 32019e651c ("ipv6: Do not leave router anycast address for /127
prefixes.") shows we also may not leave them, because we could accidentally
remove an anycast address the user has allocated or got added via another
prefix.
Cc: François-Xavier Le Bail <fx.lebail@yahoo.com>
Cc: Thomas Haller <thaller@redhat.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The existing serial state notification handling expected older Option
devices, having a hardcoded assumption that the Modem port was always
USB interface #2. That isn't true for devices from the past few years.
hso_serial_state_notification is a local cache of a USB Communications
Interface Class SERIAL_STATE notification from the device, and the
USB CDC specification (section 6.3, table 67 "Class-Specific Notifications")
defines wIndex as the USB interface the event applies to. For hso
devices this will always be the Modem port, as the Modem port is the
only port which is set up to receive them by the driver.
So instead of always expecting USB interface #2, instead validate the
notification with the actual USB interface number of the Modem port.
Signed-off-by: Dan Williams <dcbw@redhat.com>
Tested-by: H. Nikolaus Schaller <hns@goldelico.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It returns 0 in case of success, !0 error otherwise. Fix the improper error
verification.
Fixes: 2c9839c143 ("bonding: add num_grat_arp attribute netlink support")
CC: sfeldma@cumulusnetworks.com
CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Acked-by: Scott Feldman <sfeldma@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Replace the boolean value with the error code for the return value
of the rtl_ops_init().
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some information of the device may be used in other functions. Move
the relative code to make sure it would be initialzed correctly
before using it.
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Replace the tabs of the variables declaration with the spaces.
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With arm:allmodconfig, building the Teles PCI driver fails with
telespci.c:294:2: error: #error "not running on big endian machines now"
Similar, building the driver for HFC PCI-Bus cards fails with
hfc_pci.c:1647:2: error: #error "not running on big endian machines now"
Remove the big endian cpp check from both drivers to fix the build errors.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The SDIO identifier for Broadcom WLAN devices were defined in the
brcmfmac SDIO driver. Moving the definitions in MMC header file
seems common sense.
Reviewed-by: Hante Meuleman <meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieterpg@broadcom.com>
Signed-off-by: Arend van Spriel <arend@broadcom.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The destructor for net devices was set to free_netdev() to get rid
of it and the private data. The private data refers to a brcmf_if
instance, but indirectly it also refers to brcmf_cfg80211_vif which
holds the wdev. This is freed as well by using a new custom destructor
called brcmf_cfg80211_free_netdev().
Reviewed-by: Hante Meuleman <meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieterpg@broadcom.com>
Signed-off-by: Arend van Spriel <arend@broadcom.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Instead of calling brcmf_cfg80211_detach() in brcmf_del_if() when
deleting the primary interface, call it in brcmf_detach() after
deleting all interfaces.
Reviewed-by: Hante Meuleman <meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieterpg@broadcom.com>
Signed-off-by: Arend van Spriel <arend@broadcom.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The wiphy_unregister() call was done in brcmf_free_vif() when the
last interface was being removed. This is not the obvious place to
do that. This patch moves it to the brcmf_cfg80211_detach(). This
removes the need to keep count of interfaces.
Reviewed-by: Hante Meuleman <meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieterpg@broadcom.com>
Signed-off-by: Arend van Spriel <arend@broadcom.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Upon unload of the brcmfmac driver it gave a kernel warning because
cfg80211 still believed to be connected to an AP. The brcmfmac had
already transitioned to disconnected state during unload. This patch
adds informing cfg80211 about this transition. This will get rid of
warning from cfg80211 seen upon module unload:
------------[ cut here ]------------
WARNING: CPU: 3 PID: 24303 at net/wireless/core.c:952
cfg80211_netdev_notifier_call+0x193/0x640 [cfg80211]()
Modules linked in: brcmfmac(O-) brcmutil(O) cfg80211(O) ... [last unloaded: bcma]
CPU: 3 PID: 24303 Comm: rmmod Tainted: G W O 3.13.0-rc4-wl-testing-x64-00002-gb472b6d-dirty #1
Hardware name: Dell Inc. Latitude E6410/07XJP9, BIOS A07 02/15/2011
00000000000003b8 ffff8800b211faf8 ffffffff815a7fcd 0000000000000007
0000000000000000 ffff8800b211fb38 ffffffff8104819c ffff880000000000
ffff8800c889d008 ffff8800b2000220 ffff8800c889a000 ffff8800c889d018
Call Trace:
[<ffffffff815a7fcd>] dump_stack+0x46/0x58
[<ffffffff8104819c>] warn_slowpath_common+0x8c/0xc0
[<ffffffff810481ea>] warn_slowpath_null+0x1a/0x20
[<ffffffffa173fd83>] cfg80211_netdev_notifier_call+0x193/0x640 [cfg80211]
[<ffffffff81521ca8>] ? arp_ifdown+0x18/0x20
[<ffffffff8152d75a>] ? fib_disable_ip+0x3a/0x50
[<ffffffff815b143d>] notifier_call_chain+0x4d/0x70
[<ffffffff8106d6e6>] raw_notifier_call_chain+0x16/0x20
[<ffffffff814b9ae0>] call_netdevice_notifiers_info+0x40/0x70
[<ffffffff814b9b26>] call_netdevice_notifiers+0x16/0x20
[<ffffffff814bb59d>] rollback_registered_many+0x17d/0x280
[<ffffffff814bb74d>] rollback_registered+0x2d/0x40
[<ffffffff814bb7c8>] unregister_netdevice_queue+0x68/0xd0
[<ffffffff814bb9c0>] unregister_netdev+0x20/0x30
[<ffffffffa180069e>] brcmf_del_if+0xce/0x180 [brcmfmac]
[<ffffffffa1800b3c>] brcmf_detach+0x6c/0xe0 [brcmfmac]
Reviewed-by: Hante Meuleman <meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieterpg@broadcom.com>
Signed-off-by: Arend van Spriel <arend@broadcom.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The condition to disable the clock at the end of brcmf_sdio_bus_init()
was wrong as the bus state is updated by the calling function. Hence,
the clock was always disabled after brcmf_sdio_bus_init() which was
not the intended behaviour.
Reviewed-by: Hante Meuleman <meuleman@broadcom.com>
Reviewed-by: Franky Lin <frankyl@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieterpg@broadcom.com>
Signed-off-by: Arend van Spriel <arend@broadcom.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Change condition in brcmf_sdio_wd_timer() function to program
watchdog only when in BRCMF_BUS_DATA state. This avoids watchdog
being active during initialization. During initialization the
SDIO save&restore capability is determined which affect the
bus sleep mechanism used in watchdog thread.
Reviewed-by: Hante Meuleman <meuleman@broadcom.com>
Reviewed-by: Franky Lin <frankyl@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieterpg@broadcom.com>
Signed-off-by: Arend van Spriel <arend@broadcom.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The detection of the save&restore capability in brcmf_sdio_sr_capable()
is only valid for certain chipsets. This patch should cover it for all
chipsets currently supported.
Reviewed-by: Hante Meuleman <meuleman@broadcom.com>
Reviewed-by: Franky Lin <frankyl@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieterpg@broadcom.com>
Signed-off-by: Arend van Spriel <arend@broadcom.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The table for BCM4334 SDIO drive strength programming was missing
from the driver. Adding it with this patch set.
Reviewed-by: Franky Lin <frankyl@broadcom.com>
Reviewed-by: Hante Meuleman <meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieterpg@broadcom.com>
Signed-off-by: Arend van Spriel <arend@broadcom.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Using 'iw phy' only showed HT20 support in the HT capabilities info.
This patch determines support for HT40 using a firmware query that
is supposed to work for all supported devices.
Reviewed-by: Hante Meuleman <meuleman@broadcom.com>
Reviewed-by: Franky Lin <frankyl@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieterpg@broadcom.com>
Signed-off-by: Arend van Spriel <arend@broadcom.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Moving code from helper functions to the calling function
as it makes code easier to read.
Reviewed-by: Hante Meuleman <meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieterpg@broadcom.com>
Signed-off-by: Arend van Spriel <arend@broadcom.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Two helper functions in the sdio remove path were very thin and
only used once. So its code is moved to the calling function.
Reviewed-by: Hante Meuleman <meuleman@broadcom.com>
Signed-off-by: Arend van Spriel <arend@broadcom.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>