Commit Graph

2552 Commits

Author SHA1 Message Date
Sathya Perla
cc7d723adb be2net: fixup malloc/free of adapter->pmac_id
Free was missing and kcalloc() is better placed in be_ctrl_init()

Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-30 13:27:05 -04:00
Vasundhara Volam
8a046d3bfd be2net: fix FW default for VF tx-rate
BE3 FW initializes VF tx-rate to 100Mbps. Fix this to 10Gbps.

Signed-off-by: Vasundhara Volam <vasundhara.volam@emulex.com>
Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-30 13:27:05 -04:00
Vasundhara Volam
7c5a524221 be2net: fix max VFs reported by HW
BE3 FW allocates VF resources for upto 30 VFs per PF while a max value of 32
may be reported via PCI config space. Fix this in the driver.

Signed-off-by: Vasundhara Volam <vasundhara.volam@emulex.com>
Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-30 13:27:05 -04:00
Sathya Perla
4cbdaf6d52 be2net: create RSS rings even in multi-channel configs
Changes from commit df505e were incorrectly over-written by commit 10ef9ab.
Fixing the same.

Change log of the original fix:
    Currently RSS rings are not created in a multi-channel config.
    RSS rings can be created on one (out of four) interfaces per port in a
    multi-channel config. Doing this insulates the driver from a FW bug wherin
    multi-channel config is wrongly reported even when not enabled. This also
    helps performance in a multi-channel config, as one interface per port gets
    RSS rings.

Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-30 13:27:05 -04:00
Julia Lawall
e8c75e2c1c drivers/net/ethernet/tundra/tsi108_eth.c: delete double assignment
Delete successive assignments to the same location.

A simplified version of the semantic match that finds this problem is as
follows: (http://coccinelle.lip6.fr/)

// <smpl>
@@
expression i;
@@

*i = ...;
 i = ...;
// </smpl>

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-30 13:08:37 -04:00
david decotigny
3f0a1b58ae forcedeth: prevent TX timeouts after reboot
This complements patch "net-forcedeth: fix TX timeout caused by TX
pause on down link" which ensures that a lock-up sequence is not sent
to the NIC. Present patch ensures that if a NIC is already locked-up,
the driver will recover from it when initializing the device.

It does the equivalent of the following recovery sequence:
 - write NVREG_TX_PAUSEFRAME_ENABLE_V1 to eth1's register
   NvRegTxPauseFrame
 - write NVREG_XMITCTL_START to eth1's register
   NvRegTransmitterControl
 - write 0 to eth1's register NvRegTransmitterControl
(this is at the heart of the "unbricking" sequence mentioned in patch
 "net-forcedeth: fix TX timeout caused by TX pause on down link")

Tested:
 - hardware is MCP55 device id 10de:0373 (rev a3), dual-port
 - reboot a kernel without any of patches mentioned
 - freeze the NIC (details on description for commit "net-forcedeth:
   fix TX timeout caused by TX pause on down link")
 - wait 5mn until ping hangs & TX timeout in dmesg
 - reboot on kernel with present patch
 - host is immediatly operational, no TX timeout

Signed-off-by: David Decotigny <decot@googlers.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-30 13:04:57 -04:00
david decotigny
1ff39eb66b forcedeth: fix TX timeout caused by TX pause on down link
On some dual-port forcedeth devices such as MCP55 10de:0373 (rev a3),
when autoneg & TX pause are enabled while port is connected but
interface is down, the NIC will eventually freeze (TX timeouts,
network unreachable).

This patch ensures that TX pause is not configured in hardware when
interface is down. The TX pause request will be honored when interface
is later configured.

Tested:
 - hardware is MCP55 device id 10de:0373 (rev a3), dual-port
 - eth0 connected and UP, eth1 connected but DOWN
 - without this patch, following sequence would brick NIC:
      ifconfig eth0 down
      ifconfig eth1 up
      ifconfig eth1 down
      ethtool -A eth1 autoneg off rx on tx off
      ifconfig eth1 up
      ifconfig eth1 down
      ethtool -A eth1 autoneg on rx on tx on
      ifconfig eth1 up
      ifconfig eth1 down
      ifup eth0
      sleep 120  # or longer
      ethtool eth1
   Just in case, sequence to un-brick:
      ifconfig eth0 down
      ethtool -A eth1 autoneg off rx on tx off
      ifconfig eth1 up
      ifconfig eth1 down
      ifup eth0
 - with this patch: no TX timeout after "bricking" sequence above

Details:
 - The following register accesses have been identified as the ones
   causing the NIC to freeze in "bricking" sequence above:
    - write NVREG_TX_PAUSEFRAME_ENABLE_V1 to eth1's register NvRegTxPauseFrame
    - write NVREG_MISC1_PAUSE_TX | NVREG_MISC1_FORCE to eth1's register NvRegMisc1
    - write 0 to eth1's register NvRegTransmitterControl
   This is what this patch avoids.

Signed-off-by: David Decotigny <decot@googlers.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-30 13:04:27 -04:00
david decotigny
ba9aa13428 forcedeth: fix buffer overflow
Found by manual code inspection.

Tested: compile, reboot, ethtool -d ethX

Signed-off-by: David Decotigny <decot@googlers.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-30 13:04:27 -04:00
David S. Miller
255e87657a Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/bwh/sfc-next
Ben Hutchings says:

====================
1. Change the TX path to stop queues earlier and avoid returning
NETDEV_TX_BUSY.
2. Remove some inefficiencies in soft-TSO.
3. Fix various bugs involving device state transitions and/or reset
scheduling by error handlers.
4. Take advantage of my previous change to operstate initialisation.
5. Miscellaneous cleanup.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-24 16:35:43 -04:00
Ben Hutchings
8f8b3d5189 sfc: Fix the initial device operstate
Following commit 8f4cccb ('net: Set device operstate at registration
time') it is now correct and preferable to set the carrier off before
registering a device.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
2012-08-24 20:10:24 +01:00
Ben Hutchings
adeb15aa1c sfc: Assign efx and efx->type as early as possible in efx_pci_probe()
We also stop clearing *efx in efx_init_struct().  This is safe because
alloc_etherdev_mq() already clears it for us.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
2012-08-24 20:10:23 +01:00
Ben Hutchings
3f65ea5b2a sfc: Remove bogus comment about MTU change and RX buffer overrun
RX DMA is limited by the length specified in each descriptor and not
by the MAC.  Over-length frames may get into the RX FIFO regardless of
the MAC settings, due to a hardware bug, but they will be truncated by
the packet DMA engine and reported as such in the completion event.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
2012-08-24 20:10:23 +01:00
Ben Hutchings
7bde852afc sfc: Remove overly paranoid locking assertions from netdev operations
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
2012-08-24 20:10:22 +01:00
Ben Hutchings
7153f623ea sfc: Fix reset vs probe/remove/PM races involving efx_nic::state
We try to defer resets while the device is not READY, but we're not
doing this quite correctly.  In particular, changes to efx_nic::state
are documented as serialised by the RTNL lock, but they aren't.

1. We check whether a reset was requested during probe (suggesting
broken hardware) before we allow requested resets to be scheduled.
This leaves a window where a requested reset would be deferred
indefinitely.

2. Although we cancel the reset work item during device removal,
there are still later operations that can cause it to be scheduled
again.  We need to check the state before scheduling it.

3. Since the state can change between scheduling and running of
the work item, we still need to check it there, and we need to
do so *after* acquiring the RTNL lock which serialises state
changes.

4. We must cancel the reset work item during device removal, if the
state could ever have been READY.  This wasn't done in some of the
failure paths from efx_pci_probe().  Move the cancellation to
efx_pci_remove_main().

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
2012-08-24 20:10:22 +01:00
Ben Hutchings
b812f8b7a9 sfc: Improve log messages in case we abort probe due to a pending reset
The current informational message doesn't properly explain what
happens, and could also appear if we defer a reset during
suspend/resume.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
2012-08-24 20:10:21 +01:00
Ben Hutchings
8b7325b4e2 sfc: Never try to stop and start a NIC that is disabled
efx_change_mtu() and efx_realloc_channels() each stop and start much
of the NIC, even if it has been disabled.  Since efx_start_all() is a
no-op when the NIC is disabled, this is probably harmless in the case
of efx_change_mtu(), but efx_realloc_channels() also reenables
interrupts which could be a bad thing to do.

Change efx_start_all() and efx_start_interrupts() to assert that the
NIC is not disabled, but make efx_stop_interrupts() do nothing if the
NIC is disabled (since it is already stopped), consistent with
efx_stop_all().

Update comments for efx_start_all() and efx_stop_all() to describe
their purpose and preconditions more accurately.

Add a common function to check and log if the NIC is disabled, and use
it in efx_net_open(), efx_change_mtu() and efx_realloc_channels().

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
2012-08-24 20:10:20 +01:00
Ben Hutchings
5642ceef46 sfc: Hold RTNL lock (only) when calling efx_stop_interrupts()
Interrupt state should be consistently guarded by the RTNL lock once
the net device is registered.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
2012-08-24 20:10:20 +01:00
Ben Hutchings
6032fb56c5 sfc: Keep disabled NICs quiescent during suspend/resume
Currently we ignore and clear the disabled state.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
2012-08-24 20:10:19 +01:00
Ben Hutchings
61da026d86 sfc: Hold the RTNL lock for more of the suspend/resume cycle
I don't think these PM functions can race with userland net device
operations, but it's much easier to reason about locking if state is
consistently guarded by the same lock.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
2012-08-24 20:10:19 +01:00
Ben Hutchings
f16aeea0e6 sfc: Change state names to be clearer, and comment them
STATE_INIT and STATE_FINI are equivalent and represent incompletely
initialised states; combine them as STATE_UNINIT.

Rename STATE_RUNNING to STATE_READY, to avoid confusion with
netif_running() and IFF_RUNNING.

The comments do not quite match current usage, but this will be
corrected in subsequent fixes.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
2012-08-24 20:10:18 +01:00
Ben Hutchings
9714284f83 sfc: Stash header offsets for TSO in struct tso_state
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
2012-08-24 20:10:17 +01:00
Ben Hutchings
53cb13c680 sfc: Replace tso_state::full_packet_space with ip_base_len
We only use tso_state::full_packet_space to calculate the IPv4 tot_len
or IPv6 payload_len, not to set tso_state::packet_space.  Replace it
with an ip_base_len field holding the value of tot_len or payload_len
before including the TCP payload, which is much more useful when
constructing the new headers.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
2012-08-24 20:10:17 +01:00
Ben Hutchings
f7251a9ce9 sfc: Simplify TSO header buffer allocation
TSO header buffers contain a control structure immediately followed by
the packet headers, and are kept on a free list when not in use.  This
complicates buffer management and tends to result in cache read misses
when we recycle such buffers (particularly if DMA-coherent memory
requires caches to be disabled).

Replace the free list with a simple mapping by descriptor index.  We
know that there is always a payload descriptor between any two
descriptors with TSO header buffers, so we can allocate only one
such buffer for each two descriptors.

While we're at it, use a standard error code for allocation failure,
not -1.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
2012-08-24 20:10:11 +01:00
Ben Hutchings
14bf718fb9 sfc: Stop TX queues before they fill up
We now have a definite upper bound on the number of descriptors per
skb; use that to stop the queue when the next packet might not fit.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
2012-08-24 19:00:27 +01:00
Ben Hutchings
7668ff9c2a sfc: Refactor struct efx_tx_buffer to use a flags field
Add a flags field to struct efx_tx_buffer, replacing the
continuation and map_single booleans.

Since a single descriptor cannot be both a TSO header and the last
descriptor for an skb, unionise efx_tx_buffer::{skb,tsoh} and add
flags for validity of these fields.

Clear all flags in free buffers (whereas previously the continuation
flag would be set).

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
2012-08-24 19:00:26 +01:00
Wei Yongjun
1f7c9ae7a0 w5300: using eth_hw_addr_random() for random MAC and set device flag
Using eth_hw_addr_random() to generate a random Ethernet address
(MAC) to be used by a net device and set addr_assign_type.
Not need to duplicating its implementation.

spatch with a semantic match is used to found this problem.
(http://coccinelle.lip6.fr/)

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-24 13:30:27 -04:00
Wei Yongjun
d68bb7e1a6 w5100: using eth_hw_addr_random() for random MAC and set device flag
Using eth_hw_addr_random() to generate a random Ethernet address
(MAC) to be used by a net device and set addr_assign_type.
Not need to duplicating its implementation.

spatch with a semantic match is used to found this problem.
(http://coccinelle.lip6.fr/)

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-24 13:30:27 -04:00
Timur Tabi
9f35a7342c net/fsl: introduce Freescale 10G MDIO driver
Similar to fsl_pq_mdio.c, this driver is for the 10G MDIO controller on
Freescale Frame Manager Ethernet controllers.

Signed-off-by: Timur Tabi <timur@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-24 12:42:42 -04:00
David S. Miller
bba6ec7e49 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next
Jeff Kirsher says:

====================
This series contains updates to ethtool.h, e1000, e1000e, and igb to
implement MDI/MDIx control.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-22 14:23:43 -07:00
David S. Miller
1304a7343b Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2012-08-22 14:21:38 -07:00
Linus Torvalds
8f8ba75ee2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking update from David Miller:
 "A couple weeks of bug fixing in there.  The largest chunk is all the
  broken crap Amerigo Wang found in the netpoll layer."

 1) netpoll and it's users has several serious bugs:
    a) uses GFP_KERNEL with locks held
    b) interfaces requiring interrupts disabled are called with them
       enabled
    c) and vice versa
    d) VLAN tag demuxing, as per all other RX packet input paths, is not
       applied

    All from Amerigo Wang.

 2) Hopefully cure the ipv4 mapped ipv6 address TCP early demux bugs for
    good, from Neal Cardwell.

 3) Unlike AF_UNIX, AF_PACKET sockets don't set a default credentials
    when the user doesn't specify one explicitly during sendmsg().
    Instead we attach an empty (zero) SCM credential block which is
    definitely not what we want.  Fix from Eric Dumazet.

 4) IPv6 illegally invokes netdevice notifiers with RCU lock held, fix
    from Ben Hutchings.

 5) inet_csk_route_child_sock() checks wrong inet options pointer, fix
    from Christoph Paasch.

 6) When AF_PACKET is used for transmit, packet loopback doesn't behave
    properly when a socket fanout is enabled, from Eric Leblond.

 7) On bluetooth l2cap channel create failure, we leak the socket, from
    Jaganath Kanakkassery.

 8) Fix all the netprio file handling bugs found by Al Viro, from John
    Fastabend.

 9) Several error return and NULL deref bug fixes in networking drivers
    from Julia Lawall.

10) A large smattering of struct padding et al.  kernel memory leaks to
    userspace found of Mathias Krause.

11) Conntrack expections in netfilter can access an uninitialized timer,
    fix from Pablo Neira Ayuso.

12) Several netfilter SIP tracker bug fixes from Patrick McHardy.

13) IPSEC ipv6 routes are not initialized correctly all the time,
    resulting in an OOPS in inet_putpeer().  Also from Patrick McHardy.

14) Bridging does rcu_dereference() outside of RCU protected area, from
    Stephen Hemminger.

15) Fix routing cache removal performance regression when looking up
    output routes that have a local destination.  From Zheng Yan.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (87 commits)
  af_netlink: force credentials passing [CVE-2012-3520]
  ipv4: fix ip header ident selection in __ip_make_skb()
  ipv4: Use newinet->inet_opt in inet_csk_route_child_sock()
  tcp: fix possible socket refcount problem
  net: tcp: move sk_rx_dst_set call after tcp_create_openreq_child()
  net/core/dev.c: fix kernel-doc warning
  netconsole: remove a redundant netconsole_target_put()
  net: ipv6: fix oops in inet_putpeer()
  net/stmmac: fix issue of clk_get for Loongson1B.
  caif: Do not dereference NULL in chnl_recv_cb()
  af_packet: don't emit packet on orig fanout group
  drivers/net/irda: fix error return code
  drivers/net/wan/dscc4.c: fix error return code
  drivers/net/wimax/i2400m/fw.c: fix error return code
  smsc75xx: add missing entry to MAINTAINERS
  net: qmi_wwan: new devices: UML290 and K5006-Z
  net: sh_eth: Add eth support for R8A7779 device
  netdev/phy: skip disabled mdio-mux nodes
  dt: introduce for_each_available_child_of_node, of_get_next_available_child
  net: netprio: fix cgrp create and write priomap race
  ...
2012-08-21 16:46:08 -07:00
Jesse Brandeburg
8376dad0c8 igb: update to allow reading/setting MDI state
This is the implementation for igb to allow forcing MDI state
via ethtool, allowing users to work around some improperly
behaving switches.

Forcing in this driver is for now only allowed when auto-neg is
enabled.

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
CC: Carolyn Wyborny <carolyn.wyborny@intel.com>
Tested-by: Aaron Brown aaron.f.brown@intel.com
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2012-08-21 01:27:48 -07:00
Jesse Brandeburg
4e8186b68f e1000e: implement MDI/MDI-X control
Some users report issues with link failing when connected to certain
switches.  This gives the user the ability to control the MDI state
from the driver, allowing users to work around some improperly
behaving switches.

Forcing in this driver is for now only allowed when auto-neg is
enabled.

This is in regards to the related ethtool app patch and
bugzilla.kernel.org bug 11998

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
CC: bruce.w.allan@intel.com
CC: n.poppelier@xs4all.nl
CC: bastien@durel.org
CC: jsveiga@it.eng.br
Tested-by: Aaron Brown aaron.f.brown@intel.com
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2012-08-21 01:26:45 -07:00
Jesse Brandeburg
c819bbd5ec e1000: configure and read MDI settings
This is the implementation in e1000 to allow ethtool to force
MDI state, allowing users to work around some improperly
behaving switches.

Forcing in this driver is for now only allowed when auto-neg is enabled.

To use must have the matching version of ethtool app that supports
this functionality.

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
CC: Tushar Dave <tushar.n.dave@intel.com>
Tested-by: Aaron Brown aaron.f.brown@intel.com
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2012-08-21 01:25:34 -07:00
Jesse Brandeburg
1b556783ab igb: implement 580 MDI setting support
In order for igb to support MDI setting support via
ethtool this code is needed to allow setting the MDI state
via software.

This is in regards to the related ethtool patch

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Aaron Brown aaron.f.brown@intel.com
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2012-08-21 01:24:05 -07:00
Bruce W Allan
e86fd89188 e1000e: implement 82577/579 MDI setting support
In order for e1000e to support MDI setting support via
ethtool this code is needed to allow setting the MDI state
via software.

This is in regards to the related ethtool patch and
fixes bugzilla.kernel.org bug 11998

Signed-off-by: Bruce W Allan <bruce.w.allan@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Aaron Brown aaron.f.brown@intel.com
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2012-08-21 01:22:56 -07:00
Kelvin Cheung
ae4d8cf299 net/stmmac: fix issue of clk_get for Loongson1B.
When getting clock, give a chance to the CPUs without DT support,
which use Common Clock Framework, such as Loongson1B.

Signed-off-by: Kelvin Cheung <keguang.zhang@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-20 02:52:20 -07:00
Phil Edworthy
d0418bb712 net: sh_eth: Add eth support for R8A7779 device
Signed-off-by: Phil Edworthy <phil.edworthy@renesas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-20 02:16:54 -07:00
Linus Torvalds
846b99964a Grab bag of InfiniBand/RDMA fixes:
- IPoIB fixes for regressions introduced by path database conversion
  - mlx4 fixes for bugs with large memory systems and regressions from SR-IOV patches
  - RDMA CM fix for passing bad event up to userspace
  - Other minor fixes
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABCAAGBQJQLo9cAAoJEENa44ZhAt0hPeUP/Ag1i164UE2e64sTfbVUfbEM
 7a+acsBo2ByQhVaXd65PZqm6+aYjosSwkStl0u4cb9Q3qQvYiZeLJj/qmInVV1hv
 qHuNytbnXl6W2f6vyz391TKx+X6uastlH7O4f4v0zS3zkrlkN9kCrN5dI1Fxm9ba
 5hluyggs7pVmHPX7KDGnQNZEjCsKoJF2kzB0hXwqGo1JxB2sEGq3DV/uMTHPgKKM
 HlrTJEEb7pOeHHf2Zt8MQiHMC0YBJ6MmpVIidR02TdQdCgZQcpoYnn20odmNLK5D
 HTT4jbCSsbOiHPpdAn0YkgzXzL58a2/b1Tt/pNq+Rud7VKwFVuFhNElPyAxR1/1K
 tPGUbDA6eOfosU++QbDJ0QHxDTgBTyXwh4yMkwPjmdWJ2jwTQev74iiDXgDf3QbS
 tMmWSwfC38hBxvfomNA6QTPvY1c3NNvj3uO+xq4/q6HloGqSlnIrPBL9hF0BiVQ6
 KteZ6gF+R1rHEIQraEnbNmh5Rxvi2RdgN0Xa0U9w/yr0LCowmUbFK35XoyTFyo/8
 rF7xp85s1wH5OuaA0NQeivQEKbuBxl7nvJlC2RBdRT/B9U5now+Y2XKOX0+FfRsg
 Qm0By/K8ss2XftX+lNfip+ScSmu9p+kCBME0X8Ra590VDt0IcPiJsC9ozhKyywRy
 KfSGj3iZG/nDaec6i1aN
 =hN/H
 -----END PGP SIGNATURE-----

Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband

Pull infiniband/rdma fixes from Roland Dreier:
 "Grab bag of InfiniBand/RDMA fixes:
   - IPoIB fixes for regressions introduced by path database conversion
   - mlx4 fixes for bugs with large memory systems and regressions from
     SR-IOV patches
   - RDMA CM fix for passing bad event up to userspace
   - Other minor fixes"

* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
  IB/mlx4: Check iboe netdev pointer before dereferencing it
  mlx4_core: Clean up buddy bitmap allocation
  mlx4_core: Fix integer overflow issues around MTT table
  mlx4_core: Allow large mlx4_buddy bitmaps
  IB/srp: Fix a race condition
  IB/qib: Fix error return code in qib_init_7322_variables()
  IB: Fix typos in infiniband drivers
  IB/ipoib: Fix RCU pointer dereference of wrong object
  IB/ipoib: Add missing locking when CM object is deleted
  RDMA/ucma.c: Fix for events with wrong context on iWARP
  RDMA/ocrdma: Don't call vlan_dev_real_dev() for non-VLAN netdevs
  IB/mlx4: Fix possible deadlock on sm_lock spinlock
2012-08-17 11:45:58 -07:00
Alexander Duyck
62748b7bde ixgbe: Rewrite code related to configuring IFCS bit in Tx descriptor
This change updates the code related to configuring the transmit frame
checksum.  Specifically I have updated the code so that we can only skip
inserting the checksum in the case that we are not performing some other
offload that will modify the frame data.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
2012-08-16 15:44:47 -07:00
Alexander Duyck
5a02cbd10d ixgbe: Roll RSC code into non-EOP code
This change moves the RSC code into the non-EOP descriptor handling
function. The main motivation behind this change is to help reduce the
overhead in the non-RSC case. Previously the non-RSC path code would
always be checking for append count even if RSC had been disabled. Now
this code is completely skipped in a single conditional check instead of
having to make two separate checks.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
2012-08-16 15:44:45 -07:00
Alexander Duyck
18806c9ea2 ixgbe: Make allocating skb and placing data in it a separate function
This patch creates a function named ixgbe_fetch_rx_buffer. The sole
purpose of this function is to retrieve a single buffer off of the ring and
to place it in an skb.

The advantage to doing this is that it helps improve the readability since
I can decrease the indentation and for the code in this section.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
2012-08-16 15:44:42 -07:00
Alexander Duyck
cf3fe7aca0 ixgbe: Copybreak sooner to avoid get_page/put_page and offset change overhead
This change makes it so that if only the first 256 bytes of a buffer are
used we just copy the data out and leave the offset and page count
unchanged. There are multiple advantages to this. First it allows us to
reuse the page much more in the case of pages larger than 4K. It also
allows us to avoid some expensive atomic operations in the form of
get_page/put_page. In perf I have seen CPU utilization for put_page drop
from 3.5% to 1.8% as a result of this patch when doing small packet routing,
and packet rates increased by about 3%.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
2012-08-16 15:44:39 -07:00
Alexander Duyck
19861ce24f ixgbe: Make pull tail function separate from rest of cleanup_headers
This change creates a separate function for functionality similar to
pskb_pull_tail.  The main motivation for moving it to a separate function
is so that later I can just skip this function in the case where we have
already copied the buffer into skb->head.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
2012-08-16 15:44:35 -07:00
Alexander Duyck
42073d91a2 ixgbe: Have the CPU take ownership of the buffers sooner
This patch makes it so that we will always have ownership of the buffers by
the time we get to ixgbe_add_rx_frag. This is necessary as I am planning to
add a copy-break to ixgbe_add_rx_frag and in order for that to function
correctly we need the CPU to have ownership of the buffer.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
2012-08-16 15:44:32 -07:00
Alexander Duyck
09816fbea9 ixgbe: Only use double buffering if page size is less than 8K
This change makes it so that we do not use double buffering if the page
size is larger than 4K.  Instead we will simply walk through the page using
up to 3K per receive, and if we receive less than we only move the offset
by that amount.  We will free the page when there is no longer any space
left that we can use instead of checking the page count to see if we can
cycle back to the start.

The main motivation behind this is to avoid the unnecessary truesize cost
for using a half page when most packets are 2K or smaller. With this new
approach the largest possible truesize for a page fragment will be 3K when
PAGE_SIZE is larger than 4K.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
2012-08-16 15:44:28 -07:00
Alexander Duyck
0549ae20b7 ixgbe: combine ixgbe_add_rx_frag and ixgbe_can_reuse_page
This patch combines ixgbe_add_rx_frag and ixgbe_can_reuse_page into a
single function. The main motivation behind this is to make better use of
the values so that we don't have to load them from memory and into
registers twice.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
2012-08-16 15:44:25 -07:00
Alexander Duyck
afaa9459de ixgbe: Remove code that was initializing Rx page offset
This change reverts an earlier patch that introduced
ixgbe_init_rx_page_offset. The idea behind the function was to provide
some variation in the starting offset for the page in order to reduce
hot-spots in the cache. However it doesn't appear to provide any
significant benefit in the testing I have done. It has however been a
source of several bugs, and it blocks us from being able to use 2K
fragments on larger page sizes. So the decision I made was to remove it.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
2012-08-16 15:44:00 -07:00
Roland Dreier
96f17d5900 mlx4_core: Clean up buddy bitmap allocation
- Use kcalloc() / vzalloc() instead of an extra bitmap_zero().
 - Add __GFP_NOWARN to kcalloc() since we'll try vzalloc() if it fails.

Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-08-15 21:05:27 -07:00
Yishai Hadas
3de819e6b6 mlx4_core: Fix integer overflow issues around MTT table
Fix some issues around int variables used in data structures related
to memory registration.

Handle int overflow in mlx4_init_icm_table by using a u64 intermediate
variable and changing struct mlx4_icm_table num_obj field to be u32.

Change some more fields/variables to use u32 instead of int to prevent
a case where the variable becomes negative when bit 31 is set.

Also subtract log_mtts_per_seg from the exponent when computing
num_mtt, since its added later on in that very same code area.

This and the previous commit fixes some issues which actually prevent
commit db5a7a65c0 ("mlx4_core: Scale size of MTT table with system
RAM") from working.  Now, when the number of MTTs is scaled with the
size of the RAM we can map up to 8TB.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-08-15 21:05:26 -07:00