Always enable large page support; didn't seem to cause problems for anyone.
Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This patch enables IPoIB to use 4K UD messages (when the underlying
device and fabrics support a 4K MTU) by using two scatter buffers when
PAGE_SIZE is less than or equal to thhe HCA IB MTU size. The first
buffer is for IPoIB header + GRH header, and the second buffer is the
IPoIB payload, which is 4K-4.
Signed-off-by: Shirley Ma <xma@us.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
After PXE boot, the iw_nes driver does a full reset to ensure the card
is in a clean state. However, it doesn't wait for firmware to
complete its work before issuing a port reset to enable the ports,
which leads to problems bringing up the ports.
The solution is to wait for firmware to complete its work before
proceeding with port reset.
This bug was flagged by Roland Dreier <rolandd@cisco.com>.
Cc: <stable@kernel.org>
Signed-off-by: Chien Tung <ctung@neteffect.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Use NIPQUAD_FMT instead of printing raw 32-bit hex quantities in
debugging output.
Acked-by: Glenn Streiff <gstreiff@neteffect.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The PCI MSI interface is stubbed out properly so that all the
functions just return failure if PCI_MSI=n, so there's no reason to
have "#ifdef CONFIG_PCI_MSI" blocks in ipath_iba7220.c.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Before IBA7220 support was added, the ipath driver didn't support any
hardware unless PCI_MSI and/or HT_IRQ was enabled. However, the
IBA7220 can generate INTx interrupts, so it makes sense to allow the
driver to be build even if PCI_MSI=n and HT_IRQ=n.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The new IBA7220 code added a call to ipath_init_iba7220_funcs() that
is compiled unconditionally, but only built the IBA7220 code if
PCI_MSI is enabled. Fix this by building the IBA7220 file
unconditonally.
This fixes build breakage when PCI_MSI=n, HT_IRQ=y and
INFINIBAND_IPATH=y reported by Ingo Molnar <mingo@elte.hu>:
drivers/built-in.o: In function `ipath_init_one':
ipath_driver.c:(.devinit.text+0x1e5bc): undefined reference to `ipath_init_iba7220_funcs'
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Commit 124b4dcb ("IB/ipath: add calls to new 7220 code and enable in
build") inadvertently added core to set dev->class_dev.dev back into
ib_ipath. This is completely redundant since commit 1912ffbb ("IB: Set
class_dev->dev in core for nice device symlink"), which removed
class_dev setting from low-level drivers, and also will break the build
when class_dev is removed completely from struct ib_device.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Describe disable_sma parameter with its name rather than the internal
ib_ipath_disable_sma variable name, so that the description shows up
properly in modinfo.
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Acked-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Remove redundant static declarations of functions that are defined
before they are used in the source.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-2.6: (36 commits)
SCSI: convert struct class_device to struct device
DRM: remove unused dev_class
IB: rename "dev" to "srp_dev" in srp_host structure
IB: convert struct class_device to struct device
memstick: convert struct class_device to struct device
driver core: replace remaining __FUNCTION__ occurrences
sysfs: refill attribute buffer when reading from offset 0
PM: Remove destroy_suspended_device()
Firmware: add iSCSI iBFT Support
PM: Remove legacy PM (fix)
Kobject: Replace list_for_each() with list_for_each_entry().
SYSFS: Explicitly include required header file slab.h.
Driver core: make device_is_registered() work for class devices
PM: Convert wakeup flag accessors to inline functions
PM: Make wakeup flags available whenever CONFIG_PM is set
PM: Fix misuse of wakeup flag accessors in serial core
Driver core: Call device_pm_add() after bus_add_device() in device_add()
PM: Handle device registrations during suspend/resume
block: send disk "change" event for rescan_partitions()
sysdev: detect multiple driver registrations
...
Fixed trivial conflict in include/linux/memory.h due to semaphore header
file change (made irrelevant by the change to mutex).
It's big, but there doesn't seem to be a way to split it up smaller...
Signed-off-by: Tony Jones <tonyj@suse.de>
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Cc: Roland Dreier <rolandd@cisco.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This sets us up to be able to convert the srp_host to use a struct
device instead of a class_device.
Based on a original patch from Tony Jones, but split up into this piece
by Greg.
Signed-off-by: Tony Jones <tonyj@suse.de>
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Reviewed-by: Roland Dreier <rolandd@cisco.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This converts the main ib_device to use struct device instead of struct
class_device as class_device is going away.
Signed-off-by: Tony Jones <tonyj@suse.de>
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Cc: Roland Dreier <rolandd@cisco.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
None of these files use any of the functionality promised by
asm/semaphore.h. It's possible that they rely on it dragging in some
unrelated header file, but I can't build all these files, so we'll have
fix any build failures as they come up.
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
The itt field in struct iscsi_data is not defined with any particular
endianness. open-iscsi should use it as-is without byte-swapping it.
This fixes sparse warnings coming from doing ntohl(hdr->itt).
Signed-off-by: Erez Zilber <erezz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The mlx4_ib driver is stable enough for production use, so bump the
version number to 1.0 to indicate this.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
If a P_Key is deleted and then re-added at the same index, then IPoIB
gets confused because __ipoib_ib_dev_flush() only checks whether the
index is the same without checking whether the P_Key was present, so
the interface is stopped when the P_Key is deleted, but the event when
the P_Key is re-added gets ignored and the interface never gets
restarted.
Also, switch to using ib_find_pkey() instead of ib_find_cached_pkey()
everywhere in IPoIB, since none of the places that look for P_Keys are
in a fast path or in non-sleeping context, and in general we want to
kill off the whole caching infrastructure eventually. This also fixes
consistency problems caused because some IPoIB queries were cached and
some were uncached during the window where the cache was not updated.
Thanks to Venkata Subramonyam <vsubramo@cisco.com> for debugging this
problem and testing this fix.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
When a RDMA_CM_EVENT_DEVICE_REMOVAL event is raised, iSER should
release the connection resources.
This is necessary when the IB HCA module is unloaded while open-iscsi
is still running. Currently, iSER just BUG()s.
Signed-off-by: Erez Zilber <erezz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Also, introduce a few inline helper functions to make the code more readable.
Signed-off-by: Stefan Roscher <stefan.roscher@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Move the free_irq() call in nes_remove() to before the tasklet_kill();
otherwise there is a window after tasklet_kill() where a new interrupt
can be handled and reschedule the tasklet, leading to a use-after-free
crash.
Cc: <stable@kernel.org>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The ib_mthca driver has been stable for a while, so bump the version
number to 1.0 to indicate this.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
If the QP was moved to another state (such as SQE) by the hardware,
then after this change the user won't have to set the IBV_QP_CUR_STATE
mask in order to execute modify QP in order to recover from this state.
Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
If the QP was moved to another state (such as SQE) by the hardware,
then after this change the user won't have to set the IBV_QP_CUR_STATE
mask in order to execute modify QP in order to recover from this state.
Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Fix a place where we might dereference a NULL pointer; this fixes
Coverity CID 1392. On inspection I also found a place where we could
attempt to kmem_cache_free() a NULL pointer, so fix this too.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This can be used to tune at run time the parameters controlling the
event (interrupt) generation rate and thus reduce the overhead
incurred by handling interrupts resulting in better throughput. Since
IPoIB uses a single CQ for both RX and TX, RX is chosen to dictate
configuration for both RX and TX.
Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Add support for modifying CQ parameters for controlling event
generation moderation.
Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Just add the infrastructure so we can add functionality later.
Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Handle IB_WR_SEND_WITH_INV work requests.
This resurrects a patch sent long ago by Mikkel Hagen <mhagen@iol.unh.edu>.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Add a new IB_WR_SEND_WITH_INV send opcode that can be used to mark a
"send with invalidate" work request as defined in the iWARP verbs and
the InfiniBand base memory management extensions. Also put "imm_data"
and a new "invalidate_rkey" member in a new "ex" union in struct
ib_send_wr. The invalidate_rkey member can be used to pass in an
R_Key/STag to be invalidated. Add this new union to struct
ib_uverbs_send_wr. Add code to copy the invalidate_rkey field in
ib_uverbs_post_send().
Fix up low-level drivers to deal with the change to struct ib_send_wr,
and just remove the imm_data initialization from net/sunrpc/xprtrdma/,
since that code never does any send with immediate operations.
Also, move the existing IB_DEVICE_SEND_W_INV flag to a new bit, since
the iWARP drivers currently in the tree set the bit. The amso1100
driver at least will silently fail to honor the IB_SEND_INVALIDATE bit
if passed in as part of userspace send requests (since it does not
implement kernel bypass work request queueing). Remove the flag from
all existing drivers that set it until we know which ones are OK.
The values chosen for the new flag is not consecutive to avoid clashing
with flags defined in the XRC patches, which are not merged yet but
which are already in use and are likely to be merged soon.
This resurrects a patch sent long ago by Mikkel Hagen <mhagen@iol.unh.edu>.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This patch adds the initialization calls into the new 7220 HCA files,
changes the Makefile to compile and link the new files, and code to
handle send DMA.
Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The patch adds a number of minor changes to support newer HCAs:
- New send buffer control bits
- New error condition bits
- Locking and initialization changes
- More send buffers
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
A new file which allows the IBA7220 send DMA engine to be used from
userland. The routines here are not linked in yet, that will happen in
a follow-on patch...
Signed-off-by: Arthur Jones <arthur.jones@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
A new header file which allows the IBA7220 send DMA engine to be used
from userland. The definitions here are not used yet, that will happen
in a follow-on patch...
Signed-off-by: Arthur Jones <arthur.jones@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The IBA7220 HCA has a new feature to DMA data to the on chip send
buffers instead of or in addition to the host CPU doing the data
transfer. This patch adds code to support the send DMA queue.
Signed-off-by: John Gregor <john.gregor@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This patch adds binary data to initialize the IB SERDES.
Signed-off-by: Michael Albaugh <Michael.Albaugh@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The control and initialization of the SerDes blocks of the IBA7220 is
sufficiently complex to merit a separate file.
Signed-off-by: Michael Albaugh <Michael.Albaugh@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This patch adds the HCA-specific code for the IBA7220 HCA.
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This patch adds a new ASIC-specific header file for the HCAs using the IBA7220.
Signed-off-by: Michael Albaugh <Michael.Albaugh@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This is part of a patch series to add support for a new HCA. This patch
adds new fields to the header files.
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This patch makes chip reset more robust and reduces lock contention
between user and kernel TID register updates.
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Newer HCAs support MSI interrupts and also INTx interrupts. Fix the
code so that INTx can be reliably enabled if MSI interrupts are not
working.
Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Newer HCAs have a threshold counter to reduce the number of DMAs the
chip makes to update the PIO buffer availability status bits. This
patch enables the feature.
Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Whenever the LID is set, notify the HCA specific code so that the
appropriate HW registers can be updated. Also log the info on the
console at low priority.
Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This patch adds code to enable/disable the IBTA 1.2 heartbeat for testing
if the HCA supports it.
Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The hardware-based recovery doesn't need any intervention, and in a few
cases we can get a bit confused about state and skip steps such as
turning off the link state LED when we consider recovery to be "down".
So ignore this transition, and either we recover in hardware, or we
transition to down, and will handle it then.
Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Newer HCAs have a HW option to write a sequence number to each receive
queue entry and avoid a separate DMA of the tail register to memory.
This patch adds support for these changes.
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This patch makes some white space changes and minor non-functional
changes to more closely match the code in OFED-1.3.
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This patch checks for old and new format writes to send a packet via the
diagnostic interface.
Signed-off-by: Michael Albaugh <Michael.Albaugh@Qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Make sure that a device implements the modify_srq and reg_phys_mr
optional methods before calling them.
Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Raw comparison against jiffies will fail if jiffies wraps, although
since ipath currently only supports 64-bit architectures, this is rather
far-fetched. Still, it's better to use time_after_eq().
Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Rather than have build_mlx_header() return a negative value on failure
and the length of the segments it builds on success, add a pointer
parameter to return the length and return 0 on success. This matches
the calling convention used for build_lso_seg() and generates slightly
smaller code -- eg, on 64-bit x86:
add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-22 (-22)
function old new delta
mlx4_ib_post_send 2023 2001 -22
Signed-off-by: Roland Dreier <rolandd@cisco.com>
For HCAs that support TCP segmentation offload (IB_DEVICE_UD_TSO), set
NETIF_F_TSO and use HW LSO to offload TCP segmentation.
Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Add a create_flags member to struct ib_qp_init_attr that will allow a
kernel verbs consumer to create a pass special flags when creating a QP.
Add a flag value for telling low-level drivers that a QP will be used
for IPoIB UD LSO. The create_flags member will also be useful for XRC
and ehca low-latency QP support.
Since no create_flags handling is implemented yet, add code to all
low-level drivers to return -EINVAL if create_flags is non-zero.
Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Add support for reading newer card's EEPROMs while continuing to support
older EEPROMs.
Also, add support for the temperature sensor if present.
Signed-off-by: Michael Albaugh <Michael.Albaugh@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This reduces the latency for RC ACKs when a PIO buffer is available.
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
A fixed partitioning of send buffers is determined at driver load time
for user processes and kernel use. Since send buffers are a scarce
resource, it makes sense to allow the kernel to use the buffers if they
are not in use by a user process.
Also, eliminate code duplication for ipath_force_pio_avail_update().
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The link can be put in LINKDOWN_DISABLE state either locally or via a
MAD. However, the link-recovery code will take it out of that state as
a side-effect of attempts to clear SerDes/XGXS issues.
We add a flag to indicate "link is down on purpose, leave it alone."
Signed-off-by: Michael Albaugh <michael.albaugh@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Update the module author to the current email address.
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
In list iteration code, you normally wouldn't be calling
"container_of()" directly anyway, you'd be invoking "list_entry()".
But you don't even need that here, "list_for_each_entry()" is fine.
Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Acked-by: Glenn Streiff <gstreiff@neteffect.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Convert list_splice() + INIT_LIST_HEAD() to the equivalent list_splice_init()
Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The function rdma_create_id() always returns either a valid pointer or
a value made with ERR_PTR, so its result should be tested with IS_ERR,
not with a test for 0.
The problem was found using the following semantic match.
(http://www.emn.fr/x-info/coccinelle/)
//<smpl>
@a@
expression E, E1;
statement S,S1;
position p;
@@
E = rdma_create_id(...)
... when != E = E1
if@p (E) S else S1
@n@
position a.p;
expression E,E1;
statement S,S1;
@@
E = NULL
... when != E = E1
if@p (E) S else S1
@depends on !n@
expression E;
statement S,S1;
position a.p;
@@
* if@p (E)
S else S1
//</smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The session_id members of struct nes_cm_listener and struct
nes_cm_node are write-only, so remove them. This allows the
session_id member of struct nes_cm_core to be removed as well, since
it is only used to write those other session_id values.
This removes the use of current->tgid (which will be deprecated)
pointed out by Pavel Emelyanov <xemul@openvz.org>.
Acked-by: Glenn Streiff <gstreiff@neteffect.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
In slave_or_pri_blk(), pci_write_config_byte() is used to write a
16-bit quantity to clear linkctrl CRC error bits. This is clearly a
bug and also causes the warning
drivers/infiniband/hw/ipath/ipath_iba6110.c: In function 'slave_or_pri_blk':
drivers/infiniband/hw/ipath/ipath_iba6110.c:849: warning: overflow in implicit constant conversion
Fix this by using pci_write_config_word() instead.
Acked-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The receive queue number of WRs and SGEs shouldn't be checked if a
SRQ is specified.
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Remove useless comment about list removal since locks are held and
the code checks that the QP is on the list before removing it.
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Workaround a QLE7140 problem that in rare cases causes flow control
problems after link recovery by forcing a link retrain after recovery.
A module parameter is provided to control the behavior in case it causes
problems.
Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This patch adds code to get/set portinfo to support multiple link speeds
and widths.
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
There's a conflict between our need to quiesce PSM-based applications
to avoid HoL blocking when the IB link goes down and the apps' desire
to remain running so that their quiescence timout mechanism can keep
running.
The compromise is to STOP the processes for a fixed period of time and
then alternate between CONT and STOP until the link is again active.
If there are poor interactions with subnet manager configuration at a
given site, the interval can be adjusted via a module paramter.
Signed-off-by: John Gregor <john.gregor@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Don't try to handle freeze mode HW errors if the driver is in diagnostic
mode since some tests can cause errors that shouldn't be processed.
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The check for link up was incorrect, thus setting the LED display
inconsistently with the link state.
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The error recovery code for updating the driver's cached status information
for which send buffers are busy or free wasn't updated for IBA7220.
It should be similar to the initialization code in enable_chip().
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Fix byte order of value assigned to pioavailshadow. This bug was
detected by sparse endianness warnings.
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
In mthca_alloc_icm_table(), the number of entries to allocate for the
table->icm array is computed by calculating obj_size * nobj and then
dividing by MTHCA_TABLE_CHUNK_SIZE. If nobj is really large, then
obj_size * nobj may overflow and the division may get the wrong value
(even a negative value). Fix this by calculating the number of
objects per chunk and then dividing nobj by this value instead.
This patch allows crazy configurations such as loading ib_mthca with
the module parameter num_mtt=33554432 to work properly.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
mthca_make_profile() returns the size in bytes of the HCA context
layout it creates, or a negative value if an error occurs. However,
the return value is declared as u64 and the memfree initialization
path casts this value to int to test if it is negative. This makes it
think incorrectly than an error has occurred if the context size
happens to be bigger than 2GB, since this turns into a negative int.
Fix this by having mthca_make_profile() return an s64 and testing
for an error by checking whether this 64-bit value itself is negative.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Pavel Emelyanov <xemul@openvz.org> mentioned in <http://lkml.org/lkml/2008/3/17/131>
that the task_struct->tgid field is about to become deprecated, so the
uses in the ehca driver need to be fixed up.
However, all the uses in ehca are for some object ownership checking
that is not really needed, and anyway is implementing a policy that
should be in common code rather than a low-level driver. So just
remove all the checks.
Signed-off-by: Hoang-Nam Nguyen <hnguyen@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The current SRP initiator will allow unlimited s/g entries in the
indirect descriptors lists, but the entry count field in the SRP_CMD
request is 8 bits, so setting srp_sg_tablesize too large will open the
possibility of wrapping the count and generating invalid requests.
Clamp srp_sg_tablesize to the protocol limits to prevent surprises.
Reported by Martin W. Schlining III <mschlining@datadirectnet.com>.
Signed-off-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Enable use of 4KB MTU. Since the driver uses more pinned memory for
receive buffers when the 4KB MTU is enabled, whether or not the fabric
supports that MTU, add a "mtu4096" module parameter that can be used to
limit the MTU to 2KB when it is known that 4KB MTUs can't be used
anyway.
Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The code was checking if units are present, but not that present units
were usable (link up, etc.)
Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Modern I/O buses like PCIe and HT can be configured for multiple speeds
and widths. When an ipath HCA seems to have lower than expected
performance, it is very useful to be able to display what the driver
thinks the bus speed is.
Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This patch makes some constants chip-specific, and makes some related
changes to prepare for supporting another HCA.
Signed-off-by: Dave Olson <dave.olson@qlogic.com
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Recent sparse versions and kernel cleanups knock down the false positive
rate of the ipath driver code to a point where having it be sparse clean
is worthwhile. Here we fixup the sparse warnings. Some of these warnings
(and the impetus to run sparse again) are due to work by Roland Dreier.
Signed-off-by: Arthur Jones <arthur.jones@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Arbel and Sinai devices support checksum generation and verification
of TCP and UDP packets for UD IPoIB messages. This patch checks if
the HCA supports this and sets the IB_DEVICE_UD_IP_CSUM capability
flag if it does. It implements support for handling the IB_SEND_IP_CSUM
send flag and setting the csum_ok field in receive work completions.
Signed-off-by: Eli Cohen <eli@mellnaox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
ConnectX devices support checksum generation and verification of TCP
and UDP packets for UD IPoIB messages. This patch checks if the HCA
supports this and sets the IB_DEVICE_UD_IP_CSUM capability flag if it
does. It implements support for handling the IB_SEND_IP_CSUM send
flag and setting the csum_ok field in receive work completions.
Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Ali Ayub <ali@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
For HCAs that support checksum offload (ie that set IB_DEVICE_UD_IP_CSUM
in the device capabilities flags), have IPoIB set NETIF_F_IP_CSUM and
use the HCA to generate and verify IP checksums.
Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
__FUNCTION__ is gcc-specific, use __func__ instead.
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Allow the compiler to optimize better and generate smaller code:
add/remove: 0/6 grow/shrink: 2/0 up/down: 1528/-1864 (-336)
function old new delta
.ehca_set_pagebuf 1344 2172 +828
.ehca_probe 2312 3012 +700
ehca_set_pagebuf_phys 24 - -24
ehca_set_pagebuf_fmr 24 - -24
ehca_init_device 24 - -24
.ehca_set_pagebuf_fmr 480 - -480
.ehca_set_pagebuf_phys 512 - -512
.ehca_init_device 800 - -800
Also this fixes warnings like:
drivers/infiniband/hw/ehca/ehca_mrmw.c:2015:5: warning: symbol 'ehca_set_pagebuf_fmr' was not declared. Should it be static?
Signed-off-by: Roland Dreier <rolandd@cisco.com>
On some platforms, eg sparc64, dma_addr_t is not the same size as a
pointer, so printing dma_addr_t values by casting to void * and using
a %p format generates warnings. Fix this by casting to unsigned long
and using %lx instead. This fixes the warnings:
drivers/infiniband/hw/nes/nes_verbs.c: In function 'nes_setup_virt_qp':
drivers/infiniband/hw/nes/nes_verbs.c:1047: warning: cast to pointer from integer of different size
drivers/infiniband/hw/nes/nes_verbs.c:1078: warning: cast to pointer from integer of different size
drivers/infiniband/hw/nes/nes_verbs.c:1078: warning: cast to pointer from integer of different size
drivers/infiniband/hw/nes/nes_verbs.c: In function 'nes_reg_user_mr':
drivers/infiniband/hw/nes/nes_verbs.c:2657: warning: cast to pointer from integer of different size
Reported by Andrew Morton <akpm@linux-foundation.org>.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
nes_unregister_ofa_device() dereferences the nesibdev pointer before
testing if it's NULL. Also, the test is doubly redundant because the
only caller of nes_unregister_ofa_device() is nes_destroy_ofa_device(),
which already tests if nesibdev is NULL. Remove the unnecessary test.
This was spotted by the Coverity checker (CID 2190).
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Christoph Hellwig wants to unexport get_empty_filp(), which is an ugly
internal interface. Change the modular user in ib_uverbs_alloc_event_file()
to use the better alloc_file() interface; this makes the code cleaner too.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The file member of struct ib_uverbs_event_file was only used to keep
track of whether the file had been closed or not. The only thing we
ever did with the value was check if it was NULL or not. Simplify the
code and get rid of the need to keep track of the struct file * we
allocate by replacing the file member with an is_closed member.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The struct mlx4_interface.event() method was supposed to get an enum
mlx4_dev_event, but the driver code was actually passing in the
hardware enum mlx4_event values. Fix up the callers of
mlx4_dispatch_event() so that they pass in the right type of value,
and fix up the event method in mlx4_ib so that it can handle the enum
mlx4_dev_event values.
This eliminates the need for the subtype parameter to the event
method, so remove it.
This also fixes the sparse warning
drivers/net/mlx4/intf.c:127:48: warning: mixing different enum types
drivers/net/mlx4/intf.c:127:48: int enum mlx4_event versus
drivers/net/mlx4/intf.c:127:48: int enum mlx4_dev_event
Signed-off-by: Roland Dreier <rolandd@cisco.com>
None of the cqp_reqs_XXX counters were ever used anywhere, and neither
was the nics_per_function variable.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Add __force cast of node_guid to __u64, since we are sticking it into a
structure whose definition is shared with userspace.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Mostly update the RB tree comparisons to force __be types to normal
integers, but the change to cm_format_sidr_req() is a real fix:
param->path->pkey is already __be16.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Fix
drivers/infiniband/hw/ipath/ipath_init_chip.c:526:10: warning: symbol 'val' shadows an earlier one
drivers/infiniband/hw/ipath/ipath_init_chip.c:473:6: originally declared here
by giving the second val a different name.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Acked-by: Arthur Jones <arthur.jones@qlogic.com>
There's no reason for the third parameter of ipath_count_units() to be
a u32 *, so change it to be an int * instead. This fixes the sparse
warning:
drivers/infiniband/hw/ipath/ipath_file_ops.c:1654:47: warning: incorrect type in argument 3 (different signedness)
drivers/infiniband/hw/ipath/ipath_file_ops.c:1654:47: expected unsigned int [usertype] *maxportsp
drivers/infiniband/hw/ipath/ipath_file_ops.c:1654:47: got int *<noident>
Signed-off-by: Arthur Jones <arthur.jones@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Fix sparse warnings about pointer signedness by using a signed int when
calling idr_get_new_above().
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Write tests for NULL pointers as
if (!ptr)
instead of
if (ptr == 0UL)
to fix sparse warnings.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Because of a typo in iwch_accept_cr(), the cxgb3 connection handling
code programs the hardware IRD (incoming RDMA read queue depth) with
the value that is passed in for the ORD (outgoing RDMA read queue
depth). In particular this means that if an application passes in IRD
> 0 and ORD = 0 (which is a completely sane and valid thing to do for
an app that expects only incoming RDMA read requests), then the
hardware will end up programmed with IRD = 0 and the app will fail in
a mysterious way.
Fix this by using "ep->ird" instead of "ep->ord" in the intended place.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix the calculation of the MSS for RDMA connections: we need to
allow space in frames for a VLAN tag too.
Signed-off-by: Chien Tung <ctung@neteffect.com>
Signed-off-by: Glenn Streiff <gstreiff@neteffect.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Commit 7143740d ("IPoIB: Add send gather support") made struct
ipoib_tx_buf significantly larger, since the mapping member changed
from a single u64 to an array with MAX_SKB_FRAGS + 1 entries. This
means that allocating tx_rings with kzalloc() may fail because there
is not enough contiguous memory for the new, much bigger size. Fix
this regression by allocating the rings with vmalloc() instead.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Commit 7143740d ("IPoIB: Add send gather support") made it possible
for tx_wr.num_sge to be != 1 -- this happens if send gather support is
enabled. However, the code in the connected mode post_send() function
assumes the old invariant, namely that tx_wr.num_sge is always 1. Fix
this by explicitly setting tx_wr.num_sge to 1 in the CM post_send().
Signed-off-by: Roland Dreier <rolandd@cisco.com>
When set_multicast_list() is called the multicast task is restarted
and the IPOIB_MCAST_STARTED bit is cleared. As a result for some
window of time, multicast packets are not transmitted nor queued but
rather dropped by ipoib_mcast_send(). These dropped packets are
painful in two cases:
- bonding fail-over which both calls set_multicast_list() on the new
active slave and sends Gratuitous ARP through that slave.
- IP_DROP_MEMBERSHIP code which both calls set_multicast_list() on the
device and issues IGMP leave.
In both these cases, depending on the scheduling of the IPoIB
multicast task, the packets would be dropped. As a result, in the
bonding case, the failover would not be detected by the peers until
their neighbour is renewed the neighbour (which takes a few tens of
seconds). In the IGMP case, the IP router doesn't get an IGMP leave
and would only learn on that from further probes on the group (also a
delay of at least a few tens of seconds).
Fix this by allowing transmission (or queuing) depending on the
IPOIB_FLAG_OPER_UP flag instead of the IPOIB_MCAST_STARTED flag.
Signed-off-by: Olga Shern <olgas@voltaire.com>
Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Reset the retry counter when we get a good RDMA_READ_RESPONSE_MIDDLE
packet. This fix will prevent the requester from reporting a retry
exceeded error too early.
Signed-off-by: Patrick Marchand Latifi <patrick.latifi@qlogic.com>
A work completion entry could be placed on the wrong completion
queue when an RC QP is placed in the error state.
Signed-off-by: Patrick Marchand Latifi <patrick.latifi@qlogic.com>
Acked-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This patch fixes the initialization of RC QPs, since we would rely on
the queue pair type (ibqp->qp_type) being set, but this field is only
initialized when we return from ipath_create_qp (it is initialized by
the user-level verbs library).
The fix is to not depend on this field to initialize the send and
the receive state of the RC QP.
Signed-off-by: Patrick Marchand Latifi <patrick.latifi@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
There can be a case where the requester's rnr retry counter
(s_rnr_retry) is less than the number of rnr retries allowed per QP
(s_rnr_retry_cnt). This can happen if the s_rnr_retry counter is being
decremented and an ipath_query_qp call is issued during that time frame.
The fix is to always return the number of rnr retries allowed per QP
instead of the requester's rnr counter.
Found by code review.
Signed-off-by: Patrick Marchand Latifi <patrick.latifi@qlogic.com>
Acked-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Subnet manager SetPortinfo messages distingush between changing the link
state (DOWN, ARM, ACTIVE) and the link physical state (POLL, SLEEP,
DISABLED). These are somewhat independent commands and affect when link
width and speed changes take effect. Without this patch, a link DOWN
physical state NOP command was causing the link width and speed settings
to take effect which should only happen when the link physical state is
goes down (either by a SMP or some link physical error like link errors
exceeding the threshold).
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
cm_work_handler() can access cm_id_priv after it drops its reference
by calling iwch_deref_id(), which might cause it to be freed. The fix
is to look at whether IWCM_F_CALLBACK_DESTROY is set _before_ dropping
the reference. Then if it was set, free the cm_id on this thread.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
"iser_device" allocation failure is "handled" with a BUG_ON() right
before dereferencing the NULL-pointer - fix this!
Signed-off-by: Arne Redlich <arne.redlich@xiranet.com>
Signed-off-by: Erez Zilber <erezz@voltaire.com>
The iteration through the list of "iser_device"s during device
lookup/creation is broken -- it might result in an infinite loop if
more than one HCA is used with iSER. Fix this by using
list_for_each_entry() instead of the open-coded flawed list iteration
code.
Signed-off-by: Arne Redlich <arne.redlich@xiranet.com>
Signed-off-by: Erez Zilber <erezz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The cxbg3 driver is unnecessarily decreasing the number of CQ entries by
one when creating a CQ. This will cause the CQ not to have as many
entries as requested by the user if the user requests a power of 2 size.
Signed-off-by: Jon Mason <jon@opengridcomputing.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Set cap.max_inline_data to the actual max inline data that the adapter
support, so that userspace apps see the right value returned.
Signed-off-by: Jon Mason <jon@opengridcomputing.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Commit a3cd7d90 ("IB/fmr_pool: ib_fmr_pool_flush() should flush all
dirty FMRs") caused a regression for iSER and was reverted in
e5507736.
This change attempts to redo the original patch so that all used FMR
entries are flushed when ib_flush_fmr_pool() is called without
affecting the normal FMR pool cleaning thread. Simply move used
entries from the clean list onto the dirty list in ib_flush_fmr_pool()
before letting the cleanup thread do its job.
Signed-off-by: Pete Wyckoff <pw@osc.edu>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This reverts commit a3cd7d9070.
The original commit breaks iSER reliably, making it complain:
iser: iser_reg_page_vec:ib_fmr_pool_map_phys failed: -11
The FMR cleanup thread runs ib_fmr_batch_release() as dirty entries
build up. This commit causes clean but used FMR entries also to be
purged. During that process, another thread can see that there are no
free FMRs and fail, even though there should always have been enough
available.
Signed-off-by: Pete Wyckoff <pw@osc.edu>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
When a CM MAD is received, it is queued to a CM workqueue for
processing. The queued work item references the port and device on
which the MAD was received. If that device is removed from the system
before the work item can execute, the work item will reference freed
memory.
To fix this, flush the workqueue after unregistering to receive MAD,
and before the device is be freed.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Interrupt moderation low threshold value was incorrectly triggering,
indicating that the threshold should be lowered.
The impact was the timer was likely to become 40usecs and get stuck
there. The biggest side effect was too many interrupts and nonoptimal
performance.
Signed-off-by: John Lacombe <jlacombe@neteffect.com>
Signed-off-by: Glenn Streiff <gstreiff@neteffect.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
With commit ef19454b ("[LIB] crc32c: Keep intermediate crc state in
cpu order"), the behavior of crc32c changes on big-endian platforms.
Our algorithm expects the previous behavior; otherwise we have RDMA
connection establishment failure on big-endian platforms like powerpc.
Apply cpu_to_le32() to value returned by crc32c() to get the previous
behavior.
Signed-off-by: Faisal Latif <flatif@neteffect.com>
Signed-off-by: Glenn Streiff <gstreiff@neteffect.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Just delete the debugging statement so we don't use cqp_request after
freeing it. Adrian Bunk flagged this use-after-free issue spotted by
the Coverity checker.
Signed-off-by: Glenn Streiff <gstreiff@neteffect.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Fix a check-after-use spotted by the Coverity checker.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Glenn Streiff <gstreiff@neteffect.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Fix a memory leak spotted by the Coverity checker.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Glenn Streiff <gstreiff@neteffect.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Fix an off-by-one spotted by the Coverity checker.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Glenn Streiff <gstreiff@neteffect.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Adrian Bunk pointed out that a Coverity scan found some apparently
dead code in nes_verbs.c that really shouldn't have been dead.
The function nes_create_cq() was missing the assignment
err = 1;
just prior to an iteration that conditionally set err = 0 if a PBL was
found for a given virtual CQ. I also noticed we should have been
returning -EFAULT on a couple related error paths.
Signed-off-by: Chien Tung <ctung@neteffect.com>
Signed-off-by: Glenn Streiff <gstreiff@neteffect.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
A single entry (addr 0x10001000, size 0x2000) will get converted to
page address 0x10000000 with a page size of 0x4000. The code as it
stands doesn't address the single buffer case, but in fact it allows
the subsequent single-buffer special case to be eliminated entirely.
Because the mask now includes the (page adjusted) starting and ending
addresses, the general case works for the single buffer case as well.
Signed-off-by: Bryan Rosenburg <rosnbrg@us.ibm.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
When mthca_fmr_alloc() returns an error, it should free the MPT at the
index key, not mr->ibmr.lkey, since the lkey has been mangled by
hw_index_to_key() and no longer is the real index. This bug causes
corruption of the MPT table free bitmap when mthca_fmr_alloc() fails.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Commit efcd9971 ("IPoIB/cm: Factor out ipoib_cm_free_rx_reap_list()")
introduced a bug in ipoib_cm_dev_stop() when the receive drain times
out. In that case, the function moves all the pending rx stuff into a
private list but then calls ipoib_cm_free_rx_reap_list(), which
handles a different list.
Fix this by moving everything to the rx_reap_list that will actually
get freed up.
This fixes <https://bugs.openfabrics.org/show_bug.cgi?id=906>.
Signed-off-by: Pradeep Satyanarayana <pradeeps@linux.vnet.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
In nes_create_qp(), the test
if (nesqp->mmap_sq_db_index > NES_MAX_USER_WQ_REGIONS) {
is used to error out if the db_index is too large; however, if the
test doesn't trigger, then the index is used as
nes_ucontext->mmap_nesqp[nesqp->mmap_sq_db_index] = nesqp;
and mmap_nesqp is declared as
struct nes_qp *mmap_nesqp[NES_MAX_USER_WQ_REGIONS];
which leads to an array overrun if the index is exactly equal to
NES_MAX_USER_WQ_REGIONS. Fix this by bailing out if the index is
greater than or equal to NES_MAX_USER_WQ_REGIONS.
This was spotted by the Coverity checker (CID 2162).
Acked-by: Glenn Streiff <gstreiff@neteffect.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
We need to account for the VLAN header size in nes_netdev_change_mtu()
and nes_netdev_init(). Also, add spin lock/unlock during VLAN RX
registration so only one process can assign VLAN group for a given
interface at a time.
Signed-off-by: Chien Tung <ctung@neteffect.com>
Signed-off-by: Glenn Streiff <gstreiff@neteffect.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Only mask out MAC interrupt if necessary and re-enable on ifup. There
could be multiple netdevs going through the same MAC. MAC interrupts
should not be masked off until the last netdev is downed.
Signed-off-by: Chien Tung <ctung@neteffect.com>
Signed-off-by: Glenn Streiff <gstreiff@neteffect.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
If kobject_create_and_add() fails and returns NULL, the current code
in ib_device_register_sysfs() does not set ret and hence returns 0.
Set ret to -ENOMEM for this failure, so that the caller knows that
ib_device_register_sysfs() actually failed.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
There's an undesirable interaction with issuing MRA requests to
increase connection timeouts and the listen backlog.
When the rdma_cm receives a connection request, it queues an MRA with
the ib_cm. (The ib_cm will send an MRA if it receives a duplicate
REQ.) The rdma_cm will then create a new rdma_cm_id and give that to
the user, which in this case is the rdma_user_cm.
If the listen backlog maintained in the rdma_user_cm is full, it
destroys the rdma_cm_id, which in turns destroys the ib_cm_id. The
ib_cm_id generates a REJ because the state of the ib_cm_id has changed
to MRA sent, versus REQ received. When the backlog is full, we just
want to drop the REQ so that it is retried later.
Fix this by deferring queuing the MRA until after the user of the
rdma_cm has examined the connection request.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Currently mlx4_ib_fmr_alloc() calls mlx4_mr_enable() instead of
mlx4_fmr_enable(). The two functions are equivalent at the moment, but
this is not really correct (and the change is needed to fix a bug).
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
struct ipoib_cm_tx.ibwc is unused since commit 1b524963 ("IPoIB/cm:
Use common CQ for CM send completions"), so remove it.
Signed-off-by: Eli Cohen <eli@mellanox.co.il>
In P_Key event handling, if the old P_Key is no longer available, the
driver must call ipoib_ib_dev_stop() -- just as it does when the P_Key
is still available (see procedure __ipoib_ib_dev_flush()).
When a P_Key becomes available, the driver will perform ipoib_open(),
which assumes that the QP is in RESET, the cm_id has been
destroyed/deleted, etc. If ipoib_ib_dev_stop() is not called as
described above, then these assumptions will be false, and the attempt
to bring the interface up will fail.
Found by Mellanox QA.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
replace:
big_endian_variable = cpu_to_beX(beX_to_cpu(big_endian_variable) +
expression_in_cpu_byteorder);
with:
beX_add_cpu(&big_endian_variable, expression_in_cpu_byteorder);
Generated with a semantic patch.
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The cxgb3 HW and driver don't support loopback RDMA connections. So
fail any connection attempt where the destination address is local.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Commit 9af57b7a ("IB/cm: Add basic performance counters") introduced a
bug in how the reference count for cm_class.subsys.kobj was handled:
the path that released a device did a kobject_put() on that kobject, but
there was no kobject_get() in the path the handles adding a device. So
the reference count ended up too low, which leads to bad things. Fix up
and simplify the reference counting to avoid this.
(Actually, I introduced the bug when fixing the patch up to match some
of Greg's kobject changes, but who's counting)
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Usually harmless, since the scatterlist is always hard-coded to a length
of 1, but it triggers a BUG() if CONFIG_DEBUG_SG=y, so we better fix it.
This fixes <http://bugzilla.kernel.org/show_bug.cgi?id=9934>.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This patch acts as a preparation for using checksum offload for IB
devices capable of inserting/verifying checksum in IP packets. The
patch does not actaully turn on NETIF_F_SG - we defer that to the
patches adding checksum offload capabilities.
We only add support for send gathers for datagram mode, since existing
HW does not support checksum offload on connected QPs.
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
All current InfiniBand devices can handle all DMA addresses, and it's
hard to imagine anyone would be silly enough to build a new device
that couldn't. Therefore, enable the NETIF_F_HIGHDMA feature for IPoIB.
This has no effect for no, but is needed when we enable gather/scatter
support and checksum stateless offloads.
Signed-off-by: Eli Cohen <eli@mellnaox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
ConnectX HCA supports shrinking WQEs, so that a single work request
can be made of multiple units of wqe_shift. This way, WRs can differ
in size, and do not have to be a power of 2 in size, saving memory and
speeding up send WR posting. Unfortunately, if we do this then the
wqe_index field in CQEs can't be used to look up the WR ID anymore, so
our implementation does this only if selective signaling is off.
Further, on 32-bit platforms, we can't use vmap() to make the QP
buffer virtually contigious. Thus we have to use constant-sized WRs to
make sure a WR is always fully within a single page-sized chunk.
Finally, we use WRs with the NOP opcode to avoid wrapping around the
queue buffer in the middle of posting a WR, and we set the
NoErrorCompletion bit to avoid getting completions with error for NOP
WRs. However, NEC is only supported starting with firmware 2.2.232,
so we use constant-sized WRs for older firmware. And, since MLX QPs
only support SEND, we use constant-sized WRs in this case.
When stamping during NOP posting, do stamping following setting of the
NOP WQE valid bit.
Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
We use struct mlx4_buf for kernel QP, CQ and SRQ buffers, and the code
to look up an entry is duplicated in get_cqe_from_buf() and the QP and
SRQ versions of get_wqe(). Factor this out into mlx4_buf_offset().
This will also make it easier to switch over to using vmap() for buffers.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Add a standard NIC and RDMA/iWARP driver for NetEffect 1/10Gb ethernet adapters.
Signed-off-by: Glenn Streiff <gstreiff@neteffect.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
If the allocation of the MTT or the mailbox failed, mthca_fmr_alloc()
would return 0 (success) no matter what. This leads to crashes a
little down the road, when we try to dereference eg mr->mtt, which was
really ERR_PTR(-Ewhatever).
Signed-off-by: Olaf Kirch <olaf.kirch@oracle.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The string mlx4_ib_version was defined, but never used. Print out the
version once when the first device is initialized.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
We have recently discovered that Tavor mode requires each WQE in a
posted list of receive WQEs to have a valid NDA field at all times.
This requirement holds true for regular QPs as well as for SRQs. This
patch prelinks the receive queue in a regular QP and keeps the free
list in SRQ always properly linked.
Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Reviewed-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The SRQ receive posting functions make sure that srq->first_free never
becomes negative, so we can remove tests of whether it is negative.
Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Allocate memory for the page_list field of struct ib_pool_fmr only
when caching is enabled for the FMR pool, since the field is not used
otherwise. This can save significant amounts of memory for large
pools with caching turned off.
Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
When a host just goes away (crash, power loss, etc.) without tearing
down its IB connections, it can get stale connection errors when it
tries to reconnect to targets upon rebooting. Retrying the connection
a few times will prevent sysadmins from playing the "which disk(s)
went missing?" game.
This would have made things slightly quicker when tracking down some
of the recent bugs, but it also helps quite a bit when you've got a
large number of targets hanging off a wedged server.
Signed-off-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The firmware QUERY_ADAPTER command does not return vendor_id,
device_id, and revision_id; eliminate these fields from the query.
Initialize the rev_id field of the mlx4 device via init_node_data (MAD
IFC query), as is done in the query_device verb implementation.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
For memfree devices, the firmware QUERY_ADAPTER command does not
return vendor_id, device_id, and revision_id; do not return these
fields in the QUERY_ADAPTER function for memfree devices.
Instead, for memfree devices, initialize the rev_id field of the mthca
device via init_node_data (MAD IFC query), as is done in the
query_device verb implementation.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Commit 732a2170 ("IB/ipoib: Bound the net device to the ipoib_neigh
structue") left a misleading debug print (n->dev would be a bond
device only if boding is used). Clean it up.
Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Move up the code that checks for a situation where the remote GID
stored in the ipoib_neigh is different than the one present in the
neighbour (handle gratuitous ARP) or that a bonding fail over has
happened but the neighbour still has a pointer to an ipoib_neigh
created by a different device than the current slave. This will cause
the driver to apply the check also for connected mode neighbours.
Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
In mthca_reg_phys_mr(), we calculate the page size for the HCA
hardware to use to map the buffer list passed in by the consumer.
For example, if the consumer passes in
[0] addr 0x1000, size 0x1000
[1] addr 0x2000, size 0x1000
then the algorithm would come up with a page size of 0x2000 and a list
of two pages, at 0x0000 and 0x2000. Usually, this would work fine
since the memory region would start at an offset of 0x1000 and have a
length of 0x2000.
However, the old code did not take into account the alignment of the
IO virtual address passed in. For example, if the consumer passed in
a virtual address of 0x6000 for the above, then the offset of 0x1000
would not be used correctly because the page mask of 0x1fff would
result in an offset of 0.
We can fix this quite neatly by making sure that the page shift we use
is no bigger than the first bit where the start of the first buffer
and the IO virtual address differ. Also, we can further simplify the
code by removing the special case for a single buffer by noticing that
it doesn't matter if we use a page size that is too big. This allows
the loop to compute the page shift to be replaced with __ffs().
Thanks to Bryan S Rosenburg <rosnbrg@us.ibm.com> for pointing out the
original bug and suggesting several ways to improve this patch.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This patch enables ehca to redirect any PMA queries to the
actual PMA QP.
Signed-off-by: Hoang-Nam Nguyen <hnguyen@de.ibm.com>
Reviewed-by: Joachim Fenkes <fenkes@de.ibm.com>
Reviewed-by: Christoph Raisch <raisch@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The IB spec doesn't allow packets to QP0 sent on any other VL than VL15.
Hardware doesn't filter those packets on the send side, so we need to do
this in the driver and firmware.
As eHCA doesn't support QP0, we can just filter out all traffic going to
QP0, regardless of SL or VL.
Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Paths with hop_limit > 1 indicate that the connection will be routed
between IB subnets. Update the subnet local field in the CM REQ based
on the hop_limit value. In addition, if the path is routed, then set
the LIDs in the REQ to the permissive LIDs. This is used to indicate
to the passive side that it should use the LIDs in the received local
route header (LRH) associated with the REQ when programming the QP.
This is a temporary work-around to the IB CM to support IB router
development until the IB router specification is completed. It is not
anticipated that this work-around will cause any interoperability
issues with existing stacks or future stacks that will properly
support IB routers when defined.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
With the sg table code, every SCSI driver is now either chain capable
or broken (or has sg_tablesize set so chaining is never activated), so
there's no need to have a check in the host template.
Also tidy up the code by moving the scatterlist size defines into the
SCSI includes and permit the last entry of the scatterlist pools not
to be a power of two.
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6.25: (1470 commits)
[IPV6] ADDRLABEL: Fix double free on label deletion.
[PPP]: Sparse warning fixes.
[IPV4] fib_trie: remove unneeded NULL check
[IPV4] fib_trie: More whitespace cleanup.
[NET_SCHED]: Use nla_policy for attribute validation in ematches
[NET_SCHED]: Use nla_policy for attribute validation in actions
[NET_SCHED]: Use nla_policy for attribute validation in classifiers
[NET_SCHED]: Use nla_policy for attribute validation in packet schedulers
[NET_SCHED]: sch_api: introduce constant for rate table size
[NET_SCHED]: Use typeful attribute parsing helpers
[NET_SCHED]: Use typeful attribute construction helpers
[NET_SCHED]: Use NLA_PUT_STRING for string dumping
[NET_SCHED]: Use nla_nest_start/nla_nest_end
[NET_SCHED]: Propagate nla_parse return value
[NET_SCHED]: act_api: use PTR_ERR in tcf_action_init/tcf_action_get
[NET_SCHED]: act_api: use nlmsg_parse
[NET_SCHED]: act_api: fix netlink API conversion bug
[NET_SCHED]: sch_netem: use nla_parse_nested_compat
[NET_SCHED]: sch_atm: fix format string warning
[NETNS]: Add namespace for ICMP replying code.
...
Needed to propagate it down to the ip_route_output_flow.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Needed to propagate it down to the __ip_route_output_key.
Signed_off_by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
in_dev_find() need a namespace to pass it to fib_get_table(), so add
an argument.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch removes TOPDIR from infiniband Makefile and delete
one include statement pointing to a non-existing directory
Cc: Roland Dreier <rolandd@cisco.com>
Cc: Sean Hefty <mshefty@ichips.intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (200 commits)
[SCSI] usbstorage: use last_sector_bug flag universally
[SCSI] libsas: abstract STP task status into a function
[SCSI] ultrastor: clean up inline asm warnings
[SCSI] aic7xxx: fix firmware build
[SCSI] aacraid: fib context lock for management ioctls
[SCSI] ch: remove forward declarations
[SCSI] ch: fix device minor number management bug
[SCSI] ch: handle class_device_create failure properly
[SCSI] NCR5380: fix section mismatch
[SCSI] sg: fix /proc/scsi/sg/devices when no SCSI devices
[SCSI] IB/iSER: add logical unit reset support
[SCSI] don't use __GFP_DMA for sense buffers if not required
[SCSI] use dynamically allocated sense buffer
[SCSI] scsi.h: add macro for enclosure bit of inquiry data
[SCSI] sd: add fix for devices with last sector access problems
[SCSI] fix pcmcia compile problem
[SCSI] aacraid: add Voodoo Lite class of cards.
[SCSI] aacraid: add new driver features flags
[SCSI] qla2xxx: Update version number to 8.02.00-k7.
[SCSI] qla2xxx: Issue correct MBC_INITIALIZE_FIRMWARE command.
...
Correctly work around T3A issues by checking "hwtype != T3A" instead of
"hwtype == T3B". This will be needed for new hardware types.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The existing logic incorrectly maps this buffer list:
0: addr 0x10001000, size 0x1000
1: addr 0x10002000, size 0x1000
To this bogus page list:
0: 0x10000000
1: 0x10002000
The shift calculation must also take into account the address of the
first entry masked by the page_mask as well as the last address+size
rounded up to the next page size.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
- for kernel mode cqs, call event notification handler when flushing.
- flush QP when moving from RTS -> CLOSING.
- fix logic to identify a kernel mode qp.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Move the increment of s_hdrwords into the existing if block that tests
if we're doing a send with immediate, to save one test of the opcode.
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
qdisc_run() now tests for queue_stopped() before calling
__qdisc_run(), and the same check is done in every iteration of
__qdisc_run(), so another check is not required in the driver xmit.
This means that ipoib_start_xmit() no longer needs to test
netif_queue_stopped(); the test was added to fix earlier kernels,
where the networking stack did not guarantee that the xmit method of
an LLTX driver would not be called after the queue was stopped, but
current kernels do provide this guarantee.
To validate, I put a debug in the TX_BUSY path which never hit with 64
threads running overnight exercising this code a few 100 million
times.
Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Add new mappings from port physical state (a HW register value) to the
IB SubnGet(PortInfo) port physical state.
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The IBA7220 uses a count-based triggering mechanism, and therefore
can't use the same bandwidth verification mechanism as older chips.
To support the 7220, allow enabling and disabling armlaunch errors on
application request. Minor robustness improvements as well.
Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Clean up some unused header fields, minor related cleanup.
Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
IBA7220 includes many more configurable IB settings. Getting/setting
these is now grouped into a pair of chip specific functions accessed via
function pointers. Provide sysfs access to these settings.
Signed-off-by: Michael Albaugh <michael.albaugh@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This adds the new (sometimes empty) chip-specific functions to the older
chips, and makes the initialization and related functions consistent across
all 3 chips.
Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This code has been unused for some time, but still had leftovers
from when it was used.
Signed-off-by: Dave Olson <dave.olson@qlogic.com
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Some HW revisions of eHCA2 may cause an RC connection to break if they
received RDMA Reads over that connection before. This can be
prevented by assuring that, after the first RDMA Read, the QP receives
a new RDMA Read every few million link packets.
Include code into the driver that inserts an empty (size 0) RDMA Read
into the message stream every now and then if the consumer doesn't
post them frequently enough.
Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This patch enhances ehca with a capability to "autodetect" the ports
being connected physically. In order to utilize that function the
module option nr_ports must be set to -1 (default is 2 - two
ports). This feature is experimental and will made the default later.
More detail:
If the user connects only one port to the switch, current code requires
1) port one to be connected and
2) module option nr_ports=1 to be given.
If autodetect is enabled, ehca will not wait at creation of the GSI QP
for the respective port to become active. Since firmware does not
accept modify_qp() while the port is down at initialization, we need
to cache all calls to modify_qp() for the SMI/GSI QP and just return a
good return code.
When a port is activated and we get a PORT_ACTIVE event, we replay the
cached modify-qp() parms and re-trigger any posted recv WRs. Only then
do we forward the PORT_ACTIVE event to registered clients.
The result of this autodetect patch is that all ports will be
accessible by the users. Depending on their respective cabling only
those ports that are connected properly will become operable. If a
user tries to modify a regular QP of a non-connected port, modify_qp()
will fail. Furthermore, ibv_devinfo should show the port state
accordingly.
Note that this patch primarily improves the loading behaviour of
ehca. If the cable is removed while the driver is operating and
plugged in again, firmware will handle that properly by sending an
appropriate async event.
Signed-off-by: Hoang-Nam Nguyen <hnguyen@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Add a .change_queue_depth handler to the scsi_host_template in the
iSER driver. iscsi_change_queue_depth was added to iscsi_tcp in order
to solve the problem of queue depth which was too high for some
targets. It is also applicable for iSER.
Signed-off-by: Erez Zilber <erezz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Some RDMA CM events are not supported or not handled in iSER.
This patch adds some info (printk) for the user about them.
Signed-off-by: Erez Zilber <erezz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
When a FMR is released via ib_fmr_pool_unmap(), the FMR usually ends
up on the free_list rather than the dirty_list (because we allow a
certain number of remappings before actually requiring a flush).
However, ib_fmr_batch_release() only looks at dirty_list when flushing
out old mappings. This means that when ib_fmr_pool_flush() is used to
force a flush of the FMR pool, some dirty FMRs that have not reached
their maximum remap count will not actually be flushed.
Fix this by flushing all FMRs that have been used at least once in
ib_fmr_batch_release().
Signed-off-by: Olaf Kirch <olaf.kirch@oracle.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Normally, the serial numbers for flush requests and flushes executed
for an FMR pool should be in sync.
However, if the FMR pool flushes dirty FMRs because the
dirty_watermark was reached, we wake up the cleanup thread and let it
do its stuff. As a side effect, the cleanup thread increments
pool->flush_ser, which leaves it one higher than pool->req_ser. The
next time the user calls ib_flush_fmr_pool(), the cleanup thread will
be woken up, but ib_flush_fmr_pool() won't wait for the flush to
complete because flush_ser is already past req_ser. This means the
FMRs that the user expects to be flushed may not have all been flushed
when the function returns.
Fix this by telling the cleanup thread to do work exclusively by
incrementing req_ser, and by moving the comparison of dirty_len and
dirty_watermark into ib_fmr_pool_unmap().
Signed-off-by: Olaf Kirch <olaf.kirch@oracle.com>
In addition to being overly complex, the locking in user_mad.c is
broken: there were multiple reports of deadlocks and lockdep warnings.
In particular it seems that a single thread may end up trying to take
the same rwsem for reading more than once, which is explicitly
forbidden in the comments in <linux/rwsem.h>.
To solve this, we change the locking to use plain mutexes instead of
rwsems. There is one mutex per open file, which protects the contents
of the struct ib_umad_file, including the array of agents and list of
queued packets; and there is one mutex per struct ib_umad_port, which
protects the contents, including the list of open files. We never
hold the file mutex across calls to functions like ib_unregister_mad_agent(),
which can call back into other ib_umad code to queue a packet, and we
always hold the port mutex as long as we need to make sure that a
device is not hot-unplugged from under us.
This even makes things nicer for users of the -rt patch, since we
remove calls to downgrade_write() (which is not implemented in -rt).
Signed-off-by: Roland Dreier <rolandd@cisco.com>
There are a few places in the ipath driver where a variable is
re-declared within a block where it is already in scope. Most of these
extra declarations can simply be removed, since the variable from the
outer scope is used in a way so that it does not need to keep its
variable across the block with the re-declaration.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Use round_jiffies() to align ehca's 1-second timer with other timers
and potentially save power by sleeping cores for longer.
Signed-off-by: Anton Blanchard <anton@samba.org>
Acked-by: Hoang-Nam Nguyen <hnguyen@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
By default, the responder_resources parameter is set to that received
in a connection request. The passive side may override this value
when accepting the connection. Use the value provided by the passive
side when transitioning the QP to RTR state, rather than the value
given in the connect request. Without this change, the RTR transition
may fail if the passive side supports fewer responder_resources than
that in the request.
For code consistency and to protect against QP destruction, restructure
overriding initiator_depth to match how responder_resources is set.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The original QHT7040 had significant performance issues so there was an
additional check in the driver for a newer serial number. Support for
the small quantities of that board shipped has been dropped, so this
patch removes the special checks to simplify the code.
Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Different chips have different width interrupt status registers, so add
a flag and accessor function to decide which width register read to use.
Signed-off-by: Arthur Jones <arthur.jones@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The 6110 had a bug that caused some registers to be swapped; it was
fixed for the 7220 (and didn't affect the 6120 because it had fewer
registers). This adds a flag and related code to handle that, and
includes some minor cleanups in the same area.
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
The number of configured ports for the 7220 changes the number of eager
TIDs available per port, for all but port 0 (kernel port) which remains
constant, so add a field to give port0 count separate from the portdata
structure.
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
User registers have different alignments on different chips (4KB on
older, 64KB on 7220). Allow mapping the user registers on kernels with
page sizes up to 64K.
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Various hardware counters are exported via the ipath file system (since
it is binary data). The old file format was very dependent on the HW
offsets for these registers. Newer HCA chips can have different
counters at different offsets. This patch adds a level of indirection
to make the file format consistent across HCAs.
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Add support for QLogic HCAs which have hardware performance sampling
registers for PortSamplesControl and PortSamplesResult MADs.
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
When you have multiple targets, it gets really confusing when you try
to track down who did a reset when there is no identifying information
in the log message, especially when the same extension ID is mapped
through two different local IB ports. So, add an identifier that can
be used to track back to which local IB port/remote target pair is the
one having problems.
Signed-off-by: David Dillow <dillowda@ornl.gov>
Acked-by: Pete Wyckoff <pw@osc.edu>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Some HCAs (such as ehca2) support SRQ, but only support fewer than 16 SG
entries for SRQs. Currently IPoIB/CM implicitly assumes all HCAs will
support 16 SG entries for SRQs (to handle a 64K MTU with 4K pages). This
patch removes that restriction by limiting the maximum MTU in connected
mode to what the maximum number of SRQ SG entries allows.
This patch addresses <https://bugs.openfabrics.org/show_bug.cgi?id=728>
Signed-off-by: Pradeep Satyanarayana <pradeeps@linux.vnet.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
By default, the SCSI mid-layer seems to send down 512KB requests
(sg_tablesize = 256), with some requests occasionally combined. By
allowing the mid-layer to chain requests, we can easily grow to 1024KB
or larger -- I've tested 4096KB I/O requests with no problems.
I looked through the DMA paths on the hardware drivers to ensure they
could take advantage of the SG chaining, and it seems that every one
except ipath uses the system's DMA routines, which have been converted
to handle chaining. ipath looks like it should be OK, but I have no
way to test it.
Signed-off-by: David Dillow <dillowda@ornl.gov>
[ Tested on ipath. - Roland ]
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The current SRP initiator will send requests even if it has no credits
available. The results of sending extra requests are vendor specific,
but on some devices, overrunning credits will cost 85% of peak
performance -- e.g. 100 MB/s vs 720 MB/s. Other devices may just drop
the requests.
This patch will tell the SCSI midlayer to queue requests if there are
fewer than two credits remaining, and will not issue a task management
request if there are no credits remaining. The mid-layer will retry
the queued command once an outstanding command completes.
The patch also removes the unlikely() in __srp_get_tx_iu(), as it is
not at all unlikely to hit this limit under heavy load.
Signed-off-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
An IPoIB subnet on an IB fabric that spans multiple IB subnets can't
use link-local scope in multicast GIDs. The existing routines that
map IP/IPv6 multicast addresses into IB link-level addresses hard-code
the scope to link-local, and they also leave the partition key field
uninitialised. This patch adds a parameter (the link-level broadcast
address) to the mapping routines, allowing them to initialise both the
scope and the P_Key appropriately, and fixes up the call sites.
The next step will be to add a way to configure the scope for an IPoIB
interface.
Signed-off-by: Rolf Manderscheid <rvm@obsidianresearch.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This patch moves some arrays that were defined per-device to be
variables defined in the per context data structure, thus avoiding extra
kzalloc() calls.
Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
In preparation for upcoming chips that have different values for
INFINIPATH_R_PORTENABLE_SHIFT, INFINIPATH_R_INTRAVAIL_SHIFT,
INFINIPATH_R_TAILUPD_SHIFT, and portcfg_shift, remove the shared
#defines and use device-specific variables instead.
Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
kreceive is now portdata * instead of devdata * and other kreceive
related cleanups....
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Remove an unused parameter and fix up the comment.
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This patch fixes a couple of minor problems with RNR NAK handling:
- The insertion sort was causing extra delay when inserting ahead
vs. behind an existing entry on the list.
- A resend of a first packet of a message which is still not ready,
needs another RNR NAK (i.e., it was suppressed when it shouldn't).
- Also, the resend tasklet doesn't need to be woken up unless the
ACK/NAK actually indicates progress has been made.
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This patch allows ehca to forward event client-reregister-required to
registered clients. One such event is generated by a switch eg. after
its reboot.
Signed-off-by: Hoang-Nam Nguyen <hnguyen@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Rather than byte-swapping cqe->g_mlpath_rqpn each time we extract a
field from it, byte-swap it once into a temporary variable. This
results in smaller, better code -- eg, on 32-bit x86:
add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-5 (-5)
function old new delta
mlx4_ib_poll_cq 1188 1183 -5
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Remove MSI support from the mthca driver, as scheduled. There is no
reason to use MSI instead of MSI-X, since MSI-X performs better. No
one has spoken up since MSI support was deprecated in commit f6be6fbe
("IB/mthca: Schedule MSI support for removal"), so apparently the MSI
support is unused.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This is based on user feedback from Doug Ledford at RedHat:
Events that occur on an rdma_cm_id are reported to userspace through an
event channel. Connection request events are reported on the event
channel associated with the listen. When the connection is accepted, a
new rdma_cm_id is created and automatically uses the listen event
channel. This is suboptimal where the user only wants listen events on
that channel.
Additionally, it may be desirable to have events related to connection
establishment use a different event channel than those related to
already established connections.
Allow the user to migrate an rdma_cm_id between event channels. All
pending events associated with the rdma_cm_id are moved to the new event
channel.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Enable conn_id remove on the passive side after connection
establishment. This corrects an issue where the IB driver can't be
unloaded after running applications over RDS. The 'dev_remove' counter
does not reach 0 for established connections on the passive side.
This problem is limited to device removal, and only occurs on the
passive side if there are established connections.
Signed-off-by: Vladimir Sokolovsky <vlad@mellanox.co.il>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
In cancel_mads(), MADs are moved from the wait_list and local_list
to a cancel_list for processing. However, the structures on these two
lists are not the same. The wait_list references struct
ib_mad_send_wr_private, but local_list references struct
ib_mad_local_private. Cancel_mads() treats all items moved to the
cancel_list as struct ib_mad_send_wr_private. This leads to a system
crash when requests are moved from the local_list to the cancel_list.
Fix this by leaving local_list alone. All requests on the local_list
have completed are just awaiting processing by a queued worker thread.
Bug (crash) reported by Dotan Barak <dotanb@dev.mellanox.co.il>.
Problem with local_list access reported by Robert Reynolds
<rreynolds@opengridcomputing.com>.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Add performance/debug counters to track sent/received messages, retries,
and duplicates. Counters are tracked per CM message type, per port.
The counters are always enabled, so intrusive state tracking is not done.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
To allow ULPs to tune timeout values and capture retry statistics,
report the number of times that a mad send operation was retried.
For RMPP mads, report the total number of times that the any portion
(send window) of the send operation was retried.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
P_key changes can invalidate multicast groups. Report errors on all
multicast groups affected by a pkey change.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>