Commit Graph

198 Commits

Author SHA1 Message Date
Jonathan Lemon
b54c9d5bd6 net: Use skb_frag_off accessors
Use accessor functions for skb fragment's page_offset instead
of direct references, in preparation for bvec conversion.

Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-30 14:21:32 -07:00
Matthew Wilcox (Oracle)
d7840976e3 net: Use skb accessors in network drivers
In preparation for unifying the skb_frag and bio_vec, use the fine
accessors which already exist and use skb_frag_t instead of
struct skb_frag_struct.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-22 20:47:56 -07:00
Fuqian Huang
c5ec23bb19 vmxnet3: Remove call to memset after dma_alloc_coherent
In commit 518a2f1925
("dma-mapping: zero memory returned from dma_alloc_*"),
dma_alloc_coherent has already zeroed the memory.
So memset is not needed.

Signed-off-by: Fuqian Huang <huangfq.daxian@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-15 11:06:27 -07:00
Ronak Doshi
3dd7400b41 vmxnet3: turn off lro when rxcsum is disabled
Currently, when rx csum is disabled, vmxnet3 driver does not turn
off lro, which can cause performance issues if user does not turn off
lro explicitly. This patch adds fix_features support which is used to
turn off LRO whenever RXCSUM is disabled.

Signed-off-by: Ronak Doshi <doshir@vmware.com>
Acked-by: Rishi Mehta <rmehta@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-04 20:05:56 -07:00
Florian Westphal
2638eb8b50 net: ipv4: provide __rcu annotation for ifa_list
ifa_list is protected by rcu, yet code doesn't reflect this.

Add the __rcu annotations and fix up all places that are now reported by
sparse.

I've done this in the same commit to not add intermediate patches that
result in new warnings.

Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-02 18:08:36 -07:00
Luis Chamberlain
750afb08ca cross-tree: phase out dma_zalloc_coherent()
We already need to zero out memory for dma_alloc_coherent(), as such
using dma_zalloc_coherent() is superflous. Phase it out.

This change was generated with the following Coccinelle SmPL patch:

@ replace_dma_zalloc_coherent @
expression dev, size, data, handle, flags;
@@

-dma_zalloc_coherent(dev, size, handle, flags)
+dma_alloc_coherent(dev, size, handle, flags)

Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
[hch: re-ran the script on the latest tree]
Signed-off-by: Christoph Hellwig <hch@lst.de>
2019-01-08 07:58:37 -05:00
David S. Miller
6f6e434aa2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
S390 bpf_jit.S is removed in net-next and had changes in 'net',
since that code isn't used any more take the removal.

TLS data structures split the TX and RX components in 'net-next',
put the new struct members from the bug fix in 'net' into the RX
part.

The 'net-next' tree had some reworking of how the ERSPAN code works in
the GRE tunneling code, overlapping with a one-line headroom
calculation fix in 'net'.

Overlapping changes in __sock_map_ctx_update_elem(), keep the bits
that read the prog members via READ_ONCE() into local variables
before using them.

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-21 16:01:54 -04:00
YueHaibing
93c65d13d8 vmxnet3: Replace msleep(1) with usleep_range()
As documented in Documentation/timers/timers-howto.txt,
replace msleep(1) with usleep_range().

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-17 15:55:38 -04:00
hpreg@vmware.com
f3002c1374 vmxnet3: use DMA memory barriers where required
The gen bits must be read first from (resp. written last to) DMA memory.
The proper way to enforce this on Linux is to call dma_rmb() (resp.
dma_wmb()).

Signed-off-by: Regis Duchesne <hpreg@vmware.com>
Acked-by: Ronak Doshi <doshir@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-14 22:43:57 -04:00
hpreg@vmware.com
61aeecea40 vmxnet3: set the DMA mask before the first DMA map operation
The DMA mask must be set before, not after, the first DMA map operation, or
the first DMA map operation could in theory fail on some systems.

Fixes: b0eb57cb97 ("VMXNET3: Add support for virtual IOMMU")
Signed-off-by: Regis Duchesne <hpreg@vmware.com>
Acked-by: Ronak Doshi <doshir@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-14 22:43:57 -04:00
Ronak Doshi
65ec0bd1c7 vmxnet3: fix incorrect dereference when rxvlan is disabled
vmxnet3_get_hdr_len() is used to calculate the header length which in
turn is used to calculate the gso_size for skb. When rxvlan offload is
disabled, vlan tag is present in the header and the function references
ip header from sizeof(ethhdr) and leads to incorrect pointer reference.

This patch fixes this issue by taking sizeof(vlan_ethhdr) into account
if vlan tag is present and correctly references the ip hdr.

Signed-off-by: Ronak Doshi <doshir@vmware.com>
Acked-by: Guolin Yang <gyang@vmware.com>
Acked-by: Louis Luo <llouis@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-19 13:59:05 -04:00
Igor Pylypiv
8137a8e219 vmxnet3: remove unused flag "rxcsum" from struct vmxnet3_adapter
Signed-off-by: Igor Pylypiv <ipylypiv@silver-peak.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-20 10:56:25 -04:00
Ronak Doshi
034f405793 vmxnet3: use correct flag to indicate LRO feature
'Commit 45dac1d6ea ("vmxnet3: Changes for vmxnet3 adapter version 2
(fwd)")' introduced a flag "lro" in structure vmxnet3_adapter which is
used to indicate whether LRO is enabled or not. However, the patch
did not set the flag and hence it was never exercised.

So, when LRO is enabled, it resulted in poor TCP performance due to
delayed acks. This issue is seen with packets which are larger than
the mss getting a delayed ack rather than an immediate ack, thus
resulting in high latency.

This patch removes the lro flag and directly uses device features
against NETIF_F_LRO to check if lro is enabled.

Fixes: 45dac1d6ea ("vmxnet3: Changes for vmxnet3 adapter version 2 (fwd)")
Reported-by: Rachel Lunnon <rachel_lunnon@stormagic.com>
Signed-off-by: Ronak Doshi <doshir@vmware.com>
Acked-by: Shrikrishna Khare <skhare@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-17 20:03:53 -04:00
Ronak Doshi
7a4c003d69 vmxnet3: avoid xmit reset due to a race in vmxnet3
The field txNumDeferred is used by the driver to keep track of the number
of packets it has pushed to the emulation. The driver increments it on
pushing the packet to the emulation and the emulation resets it to 0 at
the end of the transmit.

There is a possibility of a race either when (a) ESX is under heavy load or
(b) workload inside VM is of low packet rate.

This race results in xmit hangs when network coalescing is disabled. This
change creates a local copy of txNumDeferred and uses it to perform ring
arithmetic.

Reported-by: Noriho Tanaka <ntanaka@vmware.com>
Signed-off-by: Ronak Doshi <doshir@vmware.com>
Acked-by: Shrikrishna Khare <skhare@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-17 20:03:53 -04:00
Colin Ian King
5e264e2b53 vmxnet3: remove redundant initialization of pointer 'rq'
Pointer rq is being initialized but this value is never read, it
is being updated inside a for-loop. Remove the initialization and
move it into the scope of the for-loop.

Cleans up clang warning:
drivers/net/vmxnet3/vmxnet3_drv.c:2763:27: warning: Value stored
to 'rq' during its initialization is never read

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Shrikrishna Khare <skhare@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-01 14:54:28 -05:00
David S. Miller
955bd1d216 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-24 23:44:15 -05:00
Neil Horman
848b159835 vmxnet3: repair memory leak
with the introduction of commit
b0eb57cb97, it appears that rq->buf_info
is improperly handled.  While it is heap allocated when an rx queue is
setup, and freed when torn down, an old line of code in
vmxnet3_rq_destroy was not properly removed, leading to rq->buf_info[0]
being set to NULL prior to its being freed, causing a memory leak, which
eventually exhausts the system on repeated create/destroy operations
(for example, when  the mtu of a vmxnet3 interface is changed
frequently.

Fix is pretty straight forward, just move the NULL set to after the
free.

Tested by myself with successful results

Applies to net, and should likely be queued for stable, please

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Reported-By: boyang@redhat.com
CC: boyang@redhat.com
CC: Shrikrishna Khare <skhare@vmware.com>
CC: "VMware, Inc." <pv-drivers@vmware.com>
CC: David S. Miller <davem@davemloft.net>
Acked-by: Shrikrishna Khare <skhare@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-23 19:57:52 -05:00
Shrikrishna Khare
7475908fbe vmxnet3: increase default rx ring sizes
There are several reasons for increasing the receive ring sizes:

1. The original ring size of 256 was chosen about 10 years ago when
vmxnet3 was first created. At that time, 10Gbps Ethernet was not prevalent
and servers were dominated by 1Gbps Ethernet. Now 10Gbps is common place,
and higher bandwidth links -- 25Gbps, 40Gbps, 50Gbps -- are starting
to appear. 256 Rx ring entries are simply not enough to keep up with
higher link speed when there is a burst of network frames coming from
these high speed links. Even with full MTU size frames, they are gone
in a short time. It is also more common to have a mix of frame sizes,
and more likely bi-modal distribution of frame sizes so the average frame
size is not close to full MTU. If we consider average frame size of 800B,
1024 frames that come in a burst takes ~0.65 ms to arrive at 10Gbps. With
256 entires, it takes ~0.16 ms to arrive at 10Gbps.  At 25Gbps or 40Gbps,
this time is reduced accordingly.

2. On a hypervisor where there are many VMs and CPU is over committed,
i.e. the number of VCPUs is more than the number of VCPUs, each PCPU is
in effect time shared between multiple VMs/VCPUs. The time granularity at
which this multiplexing occurs is typically coarser than between processes
on a guest OS. Trying to time slice more finely is not efficient, for
example, if memory cache is barely warmed up when switching from one VM
to another occurs. This CPU overcommit adds delay to when the driver
in a VM can service incoming packets. Whether CPU is over committed
really depends on customer workloads. For certain situations, it is very
common. For example, workloads of desktop VMs and product testing setups.
Consolidation and sharing is what drives efficiency of a customer setup
for such workloads. In these situations, the raw network bandwidth may
not be very high, but the delays between when a VM is running or not
running can also be relatively long.

Signed-off-by: Shrikrishna Khare <skhare@vmware.com>
Acked-by: Jin Heo <heoj@vmware.com>
Acked-by: Guolin Yang <gyang@vmware.com>
Acked-by: Boon Ang <bang@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-30 14:06:58 -05:00
Arnd Bergmann
c7673e4dea vmxnet3: avoid format strint overflow warning
gcc-7 notices that "-event-%d" could be more than 11 characters long
if we had larger 'vector' numbers:

drivers/net/vmxnet3/vmxnet3_drv.c: In function 'vmxnet3_activate_dev':
drivers/net/vmxnet3/vmxnet3_drv.c:2095:40: error: 'sprintf' may write a terminating nul past the end of the destination [-Werror=format-overflow=]
sprintf(intr->event_msi_vector_name, "%s-event-%d",
                                     ^~~~~~~~~~~~~
drivers/net/vmxnet3/vmxnet3_drv.c:2095:3: note: 'sprintf' output between 9 and 33 bytes into a destination of size 32

The current code is safe, but making the string a little longer
is harmless and lets gcc see that it's ok.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-14 09:03:11 -07:00
Neil Horman
1c4d5f51a8 vmxnet3: ensure that adapter is in proper state during force_close
There are several paths in vmxnet3, where settings changes cause the
adapter to be brought down and back up (vmxnet3_set_ringparam among
them).  Should part of the reset operation fail, these paths call
vmxnet3_force_close, which enables all napi instances prior to calling
dev_close (with the expectation that vmxnet3_close will then properly
disable them again).  However, vmxnet3_force_close neglects to clear
VMXNET3_STATE_BIT_QUIESCED prior to calling dev_close.  As a result
vmxnet3_quiesce_dev (called from vmxnet3_close), returns early, and
leaves all the napi instances in a enabled state while the device itself
is closed.  If a device in this state is activated again, napi_enable
will be called on already enabled napi_instances, leading to a BUG halt.

The fix is to simply enausre that the QUIESCED bit is cleared in
vmxnet3_force_close to allow quesence to be completed properly on close.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: Shrikrishna Khare <skhare@vmware.com>
CC: "VMware, Inc." <pv-drivers@vmware.com>
CC: "David S. Miller" <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-12 12:23:52 -04:00
Philippe Reynes
3426bd7277 net: vmxnet3: use new api ethtool_{get|set}_link_ksettings
The ethtool api {get|set}_settings is deprecated.
We move this driver to new api {get|set}_link_ksettings.

Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-22 19:26:52 -07:00
Eric Dumazet
6ad20165d3 drivers: net: generalize napi_complete_done()
napi_complete_done() allows to opt-in for gro_flush_timeout,
added back in linux-3.19, commit 3b47d30396
("net: gro: add a per device gro flush timer")

This allows for more efficient GRO aggregation without
sacrifying latencies.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-30 15:10:42 -05:00
stephen hemminger
bc1f44709c net: make ndo_get_stats64 a void function
The network device operation for reading statistics is only called
in one place, and it ignores the return value. Having a structure
return value is potentially confusing because some future driver could
incorrectly assume that the return value was used.

Fix all drivers with ndo_get_stats64 to have a void function.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-08 17:51:44 -05:00
Linus Torvalds
4d5b57e05a Updates for 4.10 kernel merge window
- Shared mlx5 updates with net stack (will drop out on merge if Dave's
   tree has already been merged)
 - Driver updates: cxgb4, hfi1, hns-roce, i40iw, mlx4, mlx5, qedr, rxe
 - Debug cleanups
 - New connection rejection helpers
 - SRP updates
 - Various misc fixes
 - New paravirt driver from vmware
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJYUbAPAAoJELgmozMOVy/dMXcP/iuG5MNzfN8Ny1JftyBQGWg3
 cqoQ2OLj9CsXjwVB+5EqbcZHRZY852lKONaLoDKkIOx4YAXO2YuIKOp944vN7EQx
 96wfqzT1F5jzAcy5mYZXgLaStGFDAwejKMqeHd0LfJj3OEtemGnVPWYzyqSQmSKo
 dzJraS1Z9GIRppzU5WaRpB9PtRBkqIqGJ5vZ0EKLGhed5hYY5r0iMJB0GfriMRDO
 lJ4UUVfpsAoLPnqDBFH6IMn2V2UeAw9IR5zNa1mrM1RBfvt/uYTxrw1w3p9WoaNs
 GRodhk4DCeAfeyqzVPNBLyXZ4Zq4FzGe3UWM4qysJ1RR4oFNw9Cuw0Fqk8mrfznr
 7hv5TpGIckRZiKf8l6e+qLirF0qGtXJg29j2vPVQI9i5nSj95g1agA81PnLQlLLb
 flWyxeMj81my7lfMHN1xcV6pqPEKMCOysZmfcvVfJd2XxpjuVD7ekl/YXWp8o8kU
 YPdQMqPD626XsD8VpPdMszb9FPmx0JD0HEv+Y1rIFX8JegEI+c3H2X0dqC27T/Ou
 FEPWOy025EgHm0Fh/7eIzkG6tjZ4JHoCugJAcxNZGj2XW4eB6r5vY8UwJ8iQRv+n
 PVYHiy0UoIRePh0mrdOSSphGZMi/GO/DsqKwCtAMEK43WqZQju6wR7QSIGkh66mp
 4uSHJqpf3YEYylxGMhk3
 =QeGy
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma

Pull rdma updates from Doug Ledford:
 "This is the complete update for the rdma stack for this release cycle.

  Most of it is typical driver and core updates, but there is the
  entirely new VMWare pvrdma driver. You may have noticed that there
  were changes in DaveM's pull request to the bnxt Ethernet driver to
  support a RoCE RDMA driver. The bnxt_re driver was tentatively set to
  be pulled in this release cycle, but it simply wasn't ready in time
  and was dropped (a few review comments still to address, and some
  multi-arch build issues like prefetch() not working across all
  arches).

  Summary:

   - shared mlx5 updates with net stack (will drop out on merge if
     Dave's tree has already been merged)

   - driver updates: cxgb4, hfi1, hns-roce, i40iw, mlx4, mlx5, qedr, rxe

   - debug cleanups

   - new connection rejection helpers

   - SRP updates

   - various misc fixes

   - new paravirt driver from vmware"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (210 commits)
  IB: Add vmw_pvrdma driver
  IB/mlx4: fix improper return value
  IB/ocrdma: fix bad initialization
  infiniband: nes: return value of skb_linearize should be handled
  MAINTAINERS: Update Intel RDMA RNIC driver maintainers
  MAINTAINERS: Remove Mitesh Ahuja from emulex maintainers
  IB/core: fix unmap_sg argument
  qede: fix general protection fault may occur on probe
  IB/mthca: Replace pci_pool_alloc by pci_pool_zalloc
  mlx5, calc_sq_size(): Make a debug message more informative
  mlx5: Remove a set-but-not-used variable
  mlx5: Use { } instead of { 0 } to init struct
  IB/srp: Make writing the add_target sysfs attr interruptible
  IB/srp: Make mapping failures easier to debug
  IB/srp: Make login failures easier to debug
  IB/srp: Introduce a local variable in srp_add_one()
  IB/srp: Fix CONFIG_DYNAMIC_DEBUG=n build
  IB/multicast: Check ib_find_pkey() return value
  IPoIB: Avoid reading an uninitialized member variable
  IB/mad: Fix an array index check
  ...
2016-12-15 12:03:32 -08:00
Adit Ranadive
b1226c7db1 vmxnet3: Move PCI Id to pci_ids.h
The VMXNet3 PCI Id will be shared with our paravirtual RDMA driver.
Moved it to the shared location in pci_ids.h.

Suggested-by: Leon Romanovsky <leon@kernel.org>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Adit Ranadive <aditr@vmware.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-03 15:38:24 -05:00
David S. Miller
27058af401 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Mostly simple overlapping changes.

For example, David Ahern's adjacency list revamp in 'net-next'
conflicted with an adjacency list traversal bug fix in 'net'.

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-30 12:42:58 -04:00
Jarod Wilson
d0c2c9973e net: use core MTU range checking in virt drivers
hyperv_net:
- set min/max_mtu, per Haiyang, after rndis_filter_device_add

virtio_net:
- set min/max_mtu
- remove virtnet_change_mtu

vmxnet3:
- set min/max_mtu

xen-netback:
- min_mtu = 0, max_mtu = 65517

xen-netfront:
- min_mtu = 0, max_mtu = 65535

unisys/visor:
- clean up defines a little to not clash with network core or add
  redundat definitions

CC: netdev@vger.kernel.org
CC: virtualization@lists.linux-foundation.org
CC: "K. Y. Srinivasan" <kys@microsoft.com>
CC: Haiyang Zhang <haiyangz@microsoft.com>
CC: "Michael S. Tsirkin" <mst@redhat.com>
CC: Shrikrishna Khare <skhare@vmware.com>
CC: "VMware, Inc." <pv-drivers@vmware.com>
CC: Wei Liu <wei.liu2@citrix.com>
CC: Paul Durrant <paul.durrant@citrix.com>
CC: David Kershner <david.kershner@unisys.com>
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-20 14:51:09 -04:00
Alexey Khoroshilov
fb5c6cfaec vmxnet3: avoid assumption about invalid dma_pa in vmxnet3_set_mc()
vmxnet3_set_mc() checks new_table_pa returned by dma_map_single()
with dma_mapping_error(), but even there it assumes zero is invalid pa
(it assumes dma_mapping_error(...,0) returns true if new_table is NULL).

The patch adds an explicit variable to track status of new_table_pa.

Found by Linux Driver Verification project (linuxtesting.org).

v2: use "bool" and "true"/"false" for boolean variables.
Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-15 17:47:32 -04:00
Benjamin Poirier
277964e19e vmxnet3: Wake queue from reset work
vmxnet3_reset_work() expects tx queues to be stopped (via
vmxnet3_quiesce_dev -> netif_tx_disable). However, this races with the
netif_wake_queue() call in netif_tx_timeout() such that the driver's
start_xmit routine may be called unexpectedly, triggering one of the BUG_ON
in vmxnet3_map_pkt with a stack trace like this:

RIP: 0010:[<ffffffffa00cf4bc>] vmxnet3_map_pkt+0x3ac/0x4c0 [vmxnet3]
 [<ffffffffa00cf7e0>] vmxnet3_tq_xmit+0x210/0x4e0 [vmxnet3]
 [<ffffffff813ab144>] dev_hard_start_xmit+0x2e4/0x4c0
 [<ffffffff813c956e>] sch_direct_xmit+0x17e/0x1e0
 [<ffffffff813c96a7>] __qdisc_run+0xd7/0x130
 [<ffffffff813a6a7a>] net_tx_action+0x10a/0x200
 [<ffffffff810691df>] __do_softirq+0x11f/0x260
 [<ffffffff81472fdc>] call_softirq+0x1c/0x30
 [<ffffffff81004695>] do_softirq+0x65/0xa0
 [<ffffffff81069b89>] local_bh_enable_ip+0x99/0xa0
 [<ffffffffa031ff36>] destroy_conntrack+0x96/0x110 [nf_conntrack]
 [<ffffffff813d65e2>] nf_conntrack_destroy+0x12/0x20
 [<ffffffff8139c6d5>] skb_release_head_state+0xb5/0xf0
 [<ffffffff8139d299>] skb_release_all+0x9/0x20
 [<ffffffff8139cfe9>] __kfree_skb+0x9/0x90
 [<ffffffffa00d0069>] vmxnet3_quiesce_dev+0x209/0x340 [vmxnet3]
 [<ffffffffa00d020a>] vmxnet3_reset_work+0x6a/0xa0 [vmxnet3]
 [<ffffffff8107d7cc>] process_one_work+0x16c/0x350
 [<ffffffff810804fa>] worker_thread+0x17a/0x410
 [<ffffffff810848c6>] kthread+0x96/0xa0
 [<ffffffff81472ee4>] kernel_thread_helper+0x4/0x10

Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-04 00:36:13 -04:00
David S. Miller
6abdd5f593 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
All three conflicts were cases of simple overlapping
changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-30 00:54:02 -04:00
Wei Yongjun
bb40aca7cf vmxnet3: fix non static symbol warning
Fixes the following sparse warning:

drivers/net/vmxnet3/vmxnet3_drv.c:1645:1: warning:
 symbol 'vmxnet3_rq_destroy_all_rxdataring' was not declared. Should it be static?

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Shrikrishna Khare <skhare@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-25 16:42:22 -07:00
Shrikrishna Khare
ff2e7d5d51 vmxnet3: fix tx data ring copy for variable size
'Commit 3c8b3efc06 ("vmxnet3: allow variable length transmit data ring
buffer")' changed the size of the buffers in the tx data ring from a
fixed size of 128 bytes to a variable size.

However, while copying data to the data ring, vmxnet3_copy_hdr continues
to carry the old code that assumes fixed buffer size of 128. This patch
fixes it by adding correct offset based on the actual data ring buffer
size.

Signed-off-by: Guolin Yang <gyang@vmware.com>
Signed-off-by: Shrikrishna Khare <skhare@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-19 22:44:22 -07:00
Shrikrishna Khare
6af9d78745 vmxnet3: update to version 3
With all vmxnet3 version 3 changes incorporated in the vmxnet3 driver,
the driver can configure emulation to run at vmxnet3 version 3, provided
the emulation advertises support for version 3.

Signed-off-by: Shrikrishna Khare <skhare@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-16 22:37:05 -07:00
Shrikrishna Khare
474432229f vmxnet3: introduce command to register memory region
In vmxnet3 version 3, the emulation added support for the vmxnet3 driver
to communicate information about the memory regions the driver will use
for rx/tx buffers. The driver can also indicate which rx/tx queue the
memory region is applicable for. If this information is communicated
to the emulation, the emulation will always keep these memory regions
mapped, thereby avoiding the mapping/unmapping overhead for every packet.

Currently, Linux vmxnet3 driver does not leverage this capability. The
feasibility of using this approach for the Linux vmxnet3 driver will be
investigated independently and if possible, will be part of a different
patch. This patch only exposes the emulation capability to the driver
(vmxnet3_defs.h is identical between the driver and the emulation).

Signed-off-by: Guolin Yang <gyang@vmware.com>
Signed-off-by: Shrikrishna Khare <skhare@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-16 22:37:05 -07:00
Shrikrishna Khare
4edef40ef5 vmxnet3: add support for get_coalesce, set_coalesce ethtool operations
The emulation supports a variety of coalescing modes viz. disabled
(no coalescing), adaptive, static (number of packets to batch before
raising an interrupt), rate based (number of interrupts per second).

This patch implements get_coalesce and set_coalesce methods to allow
querying and configuring different coalescing modes.

Signed-off-by: Keyong Sun <sunk@vmware.com>
Signed-off-by: Manoj Tammali <tammalim@vmware.com>
Signed-off-by: Shrikrishna Khare <skhare@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-16 22:37:04 -07:00
Shrikrishna Khare
50a5ce3e71 vmxnet3: add receive data ring support
vmxnet3 driver preallocates buffers for receiving packets and posts the
buffers to the emulation. In order to deliver a received packet to the
guest, the emulation must map buffer(s) and copy the packet into it.

To avoid this memory mapping overhead, this patch introduces the receive
data ring - a set of small sized buffers that are always mapped by
the emulation. If a packet fits into the receive data ring buffer, the
emulation delivers the packet via the receive data ring (which must be
copied by the guest driver), or else the usual receive path is used.

Receive Data Ring buffer length is configurable via ethtool -G ethX rx-mini

Signed-off-by: Shrikrishna Khare <skhare@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-16 22:37:04 -07:00
Shrikrishna Khare
3c8b3efc06 vmxnet3: allow variable length transmit data ring buffer
vmxnet3 driver supports transmit data ring viz. a set of fixed size
buffers used by the driver to copy packet headers. Small packets that
fit these buffers are copied into these buffers entirely.

Currently this buffer size of fixed at 128 bytes. This patch extends
transmit data ring implementation to allow variable length transmit
data ring buffers. The length of the buffer is read from the emulation
during initialization.

Signed-off-by: Sriram Rangarajan <rangarajans@vmware.com>
Signed-off-by: Shrikrishna Khare <skhare@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-16 22:37:04 -07:00
Shrikrishna Khare
f35c7480f8 vmxnet3: introduce generalized command interface to configure the device
Shared memory is used to exchange information between the vmxnet3 driver
and the emulation. In order to request emulation to perform a task, the
driver first populates specific fields in this shared memory and then
issues corresponding command by writing to the command register(CMD). The
layout of the shared memory was defined by vmxnet3 version 1 and cannot
be extended for every new command without breaking backward compatibility.

To address this problem, in vmxnet3 version 3, the emulation repurposed
a reserved field in the shared memory to represent command information
instead. For new commands, the driver first populates the command
information field in the shared memory and then issues the command. The
emulation interprets the data written to the command information depending
on the type of the command. This patch exposes this capability to the driver.

Signed-off-by: Guolin Yang <gyang@vmware.com>
Signed-off-by: Shrikrishna Khare <skhare@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-16 22:37:04 -07:00
Shrikrishna Khare
190af10f0b vmxnet3: prepare for version 3 changes
vmxnet3 is currently at version 2, but some command definitions from
previous vmxnet3 versions are missing. Add those definitions before
moving to version 3.

Also, introduce utility macros for vmxnet3 version comparison and update
Copyright information and Maintained by.

Signed-off-by: Shrikrishna Khare <skhare@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-16 22:37:04 -07:00
Shrikrishna Khare
50219538ff vmxnet3: segCnt can be 1 for LRO packets
The device emulation may send segCnt of 1 for LRO packets.

Signed-off-by: Shrikrishna Khare <skhare@vmware.com>
Signed-off-by: Jin Heo <heoj@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-10 00:15:11 -07:00
Shrikrishna Khare
f0d437809d Driver: Vmxnet3: set CHECKSUM_UNNECESSARY for IPv6 packets
For IPv6, if the device indicates that the checksum is correct, set
CHECKSUM_UNNECESSARY.

Reported-by: Subbarao Narahari <snarahari@vmware.com>
Signed-off-by: Shrikrishna Khare <skhare@vmware.com>
Signed-off-by: Jin Heo <heoj@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-21 15:28:05 -04:00
Arnd Bergmann
efc21d9506 vmxnet3: fix lock imbalance in vmxnet3_tq_xmit()
A recent bug fix rearranged the code in vmxnet3_tq_xmit() in a
way that left the error handling for oversized headers unlock
a lock that had not been taken yet. Gcc warns about the incorrect
use of the 'flags' variable because of that:

drivers/net/vmxnet3/vmxnet3_drv.c: In function 'vmxnet3_tq_xmit.constprop':
include/linux/spinlock.h:246:3: error: 'flags' may be used uninitialized in this function [-Werror=maybe-uninitialized]

This changes the error handling path to 'goto' the end of the function
beyond the lock/unlock pair.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: cec05562fb ("vmxnet3: avoid calling pskb_may_pull with interrupts disabled")
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-14 13:10:29 -04:00
Neil Horman
cec05562fb vmxnet3: avoid calling pskb_may_pull with interrupts disabled
vmxnet3 has a function vmxnet3_parse_and_copy_hdr which, among other operations,
uses pskb_may_pull to linearize the header portion of an skb.  That operation
eventually uses local_bh_disable/enable to ensure that it doesn't race with the
drivers bottom half handler.  Unfortunately, vmxnet3 preforms this
parse_and_copy operation with a spinlock held and interrupts disabled.  This
causes us to run afoul of the WARN_ON_ONCE(irqs_disabled()) warning in
local_bh_enable, resulting in this:

WARNING: at kernel/softirq.c:159 local_bh_enable+0x59/0x90() (Not tainted)
Hardware name: VMware Virtual Platform
Modules linked in: ipv6 ppdev parport_pc parport microcode e1000 vmware_balloon
vmxnet3 i2c_piix4 sg ext4 jbd2 mbcache sd_mod crc_t10dif sr_mod cdrom mptspi
mptscsih mptbase scsi_transport_spi pata_acpi ata_generic ata_piix vmwgfx ttm
drm_kms_helper drm i2c_core dm_mirror dm_region_hash dm_log dm_mod [last
unloaded: mperf]
Pid: 6229, comm: sshd Not tainted 2.6.32-616.el6.i686 #1
Call Trace:
 [<c04624d9>] ? warn_slowpath_common+0x89/0xe0
 [<c0469e99>] ? local_bh_enable+0x59/0x90
 [<c046254b>] ? warn_slowpath_null+0x1b/0x20
 [<c0469e99>] ? local_bh_enable+0x59/0x90
 [<c07bb936>] ? skb_copy_bits+0x126/0x210
 [<f8d1d9fe>] ? ext4_ext_find_extent+0x24e/0x2d0 [ext4]
 [<c07bc49e>] ? __pskb_pull_tail+0x6e/0x2b0
 [<f95a6164>] ? vmxnet3_xmit_frame+0xba4/0xef0 [vmxnet3]
 [<c05d15a6>] ? selinux_ip_postroute+0x56/0x320
 [<c0615988>] ? cfq_add_rq_rb+0x98/0x110
 [<c0852df8>] ? packet_rcv+0x48/0x350
 [<c07c5839>] ? dev_queue_xmit_nit+0xc9/0x140
...

Fix it by splitting vmxnet3_parse_and_copy_hdr into two functions:

vmxnet3_parse_hdr, which sets up the internal/on stack ctx datastructure, and
pulls the skb (both of which can be done without holding the spinlock with irqs
disabled

and

vmxnet3_copy_header, which just copies the skb to the tx ring under the lock
safely.

tested and shown to correct the described problem.  Applies cleanly to the head
of the net tree

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: Shrikrishna Khare <skhare@vmware.com>
CC: "VMware, Inc." <pv-drivers@vmware.com>
CC: "David S. Miller" <davem@davemloft.net>
Acked-by: Shrikrishna Khare <skhare@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-07 15:15:24 -05:00
Shrikrishna Khare
14112ca562 Driver: Vmxnet3: Update Rx ring 2 max size
Device emulation supports max size of 4096.

Signed-off-by: Shrikrishna Khare <skhare@vmware.com>
Signed-off-by: Bhavesh Davda <bhavesh@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-21 22:04:15 -05:00
Shrikrishna Khare
58caf63736 Driver: Vmxnet3: Fix regression caused by 5738a09
Reported-by: Bingkuo Liu <bingkuol@vmware.com>
Signed-off-by: Shrikrishna Khare <skhare@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-06 16:20:13 -05:00
Alexey Khoroshilov
5738a09d58 vmxnet3: fix checks for dma mapping errors
vmxnet3_drv does not check dma_addr with dma_mapping_error()
after mapping dma memory. The patch adds the checks and
tries to handle failures.

Found by Linux Driver Verification project (linuxtesting.org).

Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Acked-by: Shrikrishna Khare <skhare@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-01 15:19:16 -05:00
Shrikrishna Khare
d37d5ec861 Driver: Vmxnet3: Fix use of mfTableLen for big endian architectures
Signed-off-by: Shrikrishna Khare <skhare@vmware.com>
Reported-by: Masao Uebayashi <uebayasi@gmail.com>
Signed-off-by: Bhavesh Davda <bhavesh@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-16 15:06:47 -05:00
Ivan Vecera
47ea032533 drivers/net: get rid of unnecessary initializations in .get_drvinfo()
Many drivers initialize uselessly n_priv_flags, n_stats, testinfo_len,
eedump_len & regdump_len fields in their .get_drvinfo() ethtool op.
It's not necessary as these fields is filled in ethtool_get_drvinfo().

v2: removed unused variable
v3: removed another unused variable

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-16 00:24:10 -07:00
Shrikrishna Khare
b6bd9b5448 Driver: Vmxnet3: Extend register dump support
Signed-off-by: Shrikrishna Khare <skhare@vmware.com>
Signed-off-by: Bhavesh Davda <bhavesh@vmware.com>
Acked-by: Srividya Murali <smurali@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-23 15:06:27 -07:00
Neil Horman
0769636cb5 vmxnet3: prevent receive getting out of sequence on napi poll
vmxnet3's current napi path is built to count every rx descriptor we recieve,
and use that as a count of the napi budget.  That means its possible to return
from a napi poll halfway through recieving a fragmented packet accross multiple
dma descriptors.  If that happens, the next napi poll will start with the
descriptor ring in an improper state (e.g. the first descriptor we look at may
have the end-of-packet bit set), which will cause a BUG halt in the driver.

Fix the issue by only counting whole received packets in the napi poll and
returning that value, rather than the descriptor count.

Tested by the reporter and myself, successfully

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: Shreyas Bhatewara <sbhatewara@vmware.com>
CC: "David S. Miller" <davem@davemloft.net>
Acked-by: Andy Gospodarek <gospo@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-08 23:36:11 -07:00