This patch makes it so that simple_fill_super and get_sb_pseudo assign their
root inodes to be number 1. It also fixes up a couple of callers of
simple_fill_super that were passing in files arrays that had an index at
number 1, and adds a warning for any caller that sends in such an array.
It would have been nice to have made it so that it wasn't possible to make
such a collision, but some callers need to be able to control what inode
number their entries get, so I think this is the best that can be done.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Remove includes of <linux/smp_lock.h> where it is not used/needed.
Suggested by Al Viro.
Builds cleanly on x86_64, i386, alpha, ia64, powerpc, sparc,
sparc64, and arm (all 59 defconfigs).
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband:
IPoIB: Convert to NAPI
IB: Return "maybe missed event" hint from ib_req_notify_cq()
IB: Add CQ comp_vector support
IB/ipath: Fix a race condition when generating ACKs
IB/ipath: Fix two more spin lock problems
IB/fmr_pool: Add prefix to all printks
IB/srp: Set proc_name
IB/srp: Add orig_dgid sysfs attribute to scsi_host
IPoIB/cm: Don't crash if remote side uses one QP for both directions
RDMA/cxgb3: Support for new abort logic
RDMA/cxgb3: Initialize cpu_idx field in cpl_close_listserv_req message
RDMA/cxgb3: Fail qp creation if the requested max_inline is too large
RDMA/cxgb3: Fix TERM codes
IPoIB/cm: Fix error handling in ipoib_cm_dev_open()
IB/ipath: Don't corrupt pending mmap list when unmapped objects are freed
IB/mthca: Work around kernel QP starvation
IB/ipath: Don't put QP in timeout queue if waiting to send
IB/ipath: Don't call spin_lock_irq() from interrupt context
Convert the IP-over-InfiniBand network device driver over to using
NAPI to handle completions for the main CQ. This covers all receives
as well as datagram mode sends; send completions for connected mode
connections are still handled from interrupt context.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The semantics defined by the InfiniBand specification say that
completion events are only generated when a completions is added to a
completion queue (CQ) after completion notification is requested. In
other words, this means that the following race is possible:
while (CQ is not empty)
ib_poll_cq(CQ);
// new completion is added after while loop is exited
ib_req_notify_cq(CQ);
// no event is generated for the existing completion
To close this race, the IB spec recommends doing another poll of the
CQ after requesting notification.
However, it is not always possible to arrange code this way (for
example, we have found that NAPI for IPoIB cannot poll after
requesting notification). Also, some hardware (eg Mellanox HCAs)
actually will generate an event for completions added before the call
to ib_req_notify_cq() -- which is allowed by the spec, since there's
no way for any upper-layer consumer to know exactly when a completion
was really added -- so the extra poll of the CQ is just a waste.
Motivated by this, we add a new flag "IB_CQ_REPORT_MISSED_EVENTS" for
ib_req_notify_cq() so that it can return a hint about whether the a
completion may have been added before the request for notification.
The return value of ib_req_notify_cq() is extended so:
< 0 means an error occurred while requesting notification
== 0 means notification was requested successfully, and if
IB_CQ_REPORT_MISSED_EVENTS was passed in, then no
events were missed and it is safe to wait for another
event.
> 0 is only returned if IB_CQ_REPORT_MISSED_EVENTS was
passed in. It means that the consumer must poll the
CQ again to make sure it is empty to avoid the race
described above.
We add a flag to enable this behavior rather than turning it on
unconditionally, because checking for missed events may incur
significant overhead for some low-level drivers, and consumers that
don't care about the results of this test shouldn't be forced to pay
for the test.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Add a num_comp_vectors member to struct ib_device and extend
ib_create_cq() to pass in a comp_vector parameter -- this parallels
the userspace libibverbs API. Update all hardware drivers to set
num_comp_vectors to 1 and have all ULPs pass 0 for the comp_vector
value. Pass the value of num_comp_vectors to userspace rather than
hard-coding a value of 1.
We want multiple CQ event vector support (via MSI-X or similar for
adapters that can generate multiple interrupts), but it's not clear
how many vectors we want, or how we want to deal with policy issues
such as how to decide which vector to use or how to set up interrupt
affinity. This patch is useful for experimenting, since no core
changes will be necessary when updating a driver to support multiple
vectors, and we know that we want to make at least these changes
anyway.
Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Fix a problem where simple ACKs can be sent ahead of RDMA read
responses thus implicitly NAKing the RDMA read.
Signed-off-by: Ralph Campbell <ralph.cambpell@qlogic.com>
Signed-off-by: Robert Walsh <robert.walsh@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Fix a missing unlock in ipath_rc_rcv_resp() and remove an extra unlock
from ipath_rc_rcv_error().
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Add an orig_dgid attribute in sysfs for SRP scsi_hosts, so that
userspace can tell what the original dgid value written to the
add_target file was, even if the connection is redirected to a
different port while connecting.
Signed-off-by: Ishai Rabinovitz <ishai@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The IPoIB CM spec allows the use of a single connection in both
active->passive and passive->active directions. The current Linux
code uses one connection for both directions, but if another node only
uses one connection for both directions, we oops when we try to look
up the passive connection. Fix by checking that qp_context is
non-NULL before dereferencing it.
Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
The HW now posts 2 ABORT_RPL and/or PEER_ABORT_REQ messages. We need
to handle them by silenty dropping the 1st but mark that we're ready
for the final message. This plugs some close races between the uP and
HW. Also update the minimum required firmware version.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
I noticed that many source files include <linux/pci.h> while they do
not appear to need it. Here is an attempt to clean it all up.
In order to find all possibly affected files, I searched for all
files including <linux/pci.h> but without any other occurence of "pci"
or "PCI". I removed the include statement from all of these, then I
compiled an allmodconfig kernel on both i386 and x86_64 and fixed the
false positives manually.
My tests covered 66% of the affected files, so there could be false
positives remaining. Untested files are:
arch/alpha/kernel/err_common.c
arch/alpha/kernel/err_ev6.c
arch/alpha/kernel/err_ev7.c
arch/ia64/sn/kernel/huberror.c
arch/ia64/sn/kernel/xpnet.c
arch/m68knommu/kernel/dma.c
arch/mips/lib/iomap.c
arch/powerpc/platforms/pseries/ras.c
arch/ppc/8260_io/enet.c
arch/ppc/8260_io/fcc_enet.c
arch/ppc/8xx_io/enet.c
arch/ppc/syslib/ppc4xx_sgdma.c
arch/sh64/mach-cayman/iomap.c
arch/xtensa/kernel/xtensa_ksyms.c
arch/xtensa/platform-iss/setup.c
drivers/i2c/busses/i2c-at91.c
drivers/i2c/busses/i2c-mpc.c
drivers/media/video/saa711x.c
drivers/misc/hdpuftrs/hdpu_cpustate.c
drivers/misc/hdpuftrs/hdpu_nexus.c
drivers/net/au1000_eth.c
drivers/net/fec_8xx/fec_main.c
drivers/net/fec_8xx/fec_mii.c
drivers/net/fs_enet/fs_enet-main.c
drivers/net/fs_enet/mac-fcc.c
drivers/net/fs_enet/mac-fec.c
drivers/net/fs_enet/mac-scc.c
drivers/net/fs_enet/mii-bitbang.c
drivers/net/fs_enet/mii-fec.c
drivers/net/ibm_emac/ibm_emac_core.c
drivers/net/lasi_82596.c
drivers/parisc/hppb.c
drivers/sbus/sbus.c
drivers/video/g364fb.c
drivers/video/platinumfb.c
drivers/video/stifb.c
drivers/video/valkyriefb.c
include/asm-arm/arch-ixp4xx/dma.h
sound/oss/au1550_ac97.c
I would welcome test reports for these files. I am fine with removing
the untested files from the patch if the general opinion is that these
changes aren't safe. The tested part would still be nice to have.
Note that this patch depends on another header fixup patch I submitted
to LKML yesterday:
[PATCH] scatterlist.h needs types.h
http://lkml.org/lkml/2007/3/01/141
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
These are all the remaining instances of get_property. Simple rename of
get_property to of_get_property.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Fix TERMINATE layer, type, and ecode values based on
conformance testing.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
If skb allocation fails when we start the device, we call
ipoib_cm_dev_stop() even though ipoib_cm_dev_open() did not run to
completion, so we pass an invalid pointer to ib_destroy_cm_id and get
an oops.
Fix by clearing cm.id on error, and testing it in cm_dev_stop().
This fixes <https://bugs.openfabrics.org/show_bug.cgi?id=561>
Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Fix the pending mmap code so it doesn't corrupt the list of pending
mmaps and crash the machine when pending mmaps are destroyed without
first being mapped. Also, remove an unused variable, and use standard
kernel lists instead of our own homebrewed linked list implementation
to keep the pending mmap list.
Signed-off-by: Robert Walsh <robert.walsh@qlogic.com>
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
With mthca, RC QPs can starve each other and even UD QPs on the same
hardware schedule queue. As a result, userspace MPI can starve
e.g. IPoIB traffic, with netdev watchdog warnings getting printed out,
and TCP connections getting stuck or failing.
Reduce the chance of this happening by using three separate hardware
schedule queues: one for userspace RC QPs, one for kernel RC QPs, and
one for all other QPs.
Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This fixes a problem which causes too many RC timeouts and
retransmits.
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This patch fixes the problem reported by Bernd Schubert <bs@q-leap.de>
with kernel debug options enabled:
BUG: at kernel/lockdep.c:1860 trace_hardirqs_on()
This was caused by using spin_lock_irq()/spin_unlock_irq() from
interrupt context. Fix all the places that might be called from
interrupts to use spin_lock_irqsave()/spin_unlock_irqrestore().
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
* 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband: (49 commits)
IB: Set class_dev->dev in core for nice device symlink
IB/ehca: Implement modify_port
IB/umad: Clarify documentation of transaction ID
IPoIB/cm: spin_lock_irqsave() -> spin_lock_irq() replacements
IB/mad: Change SMI to use enums rather than magic return codes
IB/umad: Implement GRH handling for sent/received MADs
IB/ipoib: Use ib_init_ah_from_path to initialize ah_attr
IB/sa: Set src_path_bits correctly in ib_init_ah_from_path()
IB/ucm: Simplify ib_ucm_event()
RDMA/ucma: Simplify ucma_get_event()
IB/mthca: Simplify CQ cleaning in mthca_free_qp()
IB/mthca: Fix mthca_write_mtt() on HCAs with hidden memory
IB/mthca: Update HCA firmware revisions
IB/ipath: Fix WC format drift between user and kernel space
IB/ipath: Check that a UD work request's address handle is valid
IB/ipath: Remove duplicate stuff from ipath_verbs.h
IB/ipath: Check reserved memory keys
IB/ipath: Fix unit selection when all CPU affinity bits set
IB/ipath: Don't allow QPs 0 and 1 to be opened multiple times
IB/ipath: Disable IB link earlier in shutdown sequence
...
To clearly state the intent of copying from linear sk_buffs, _offset being a
overly long variant but interesting for the sake of saving some bytes.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Now to convert the last one, skb->data, that will allow many simplifications
and removal of some of the offset helpers.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
So that it is also an offset from skb->head, reduces its size from 8 to 4 bytes
on 64bit architectures, allowing us to combine the 4 bytes hole left by the
layer headers conversion, reducing struct sk_buff size to 256 bytes, i.e. 4
64byte cachelines, and since the sk_buff slab cache is SLAB_HWCACHE_ALIGN...
:-)
Many calculations that previously required that skb->{transport,network,
mac}_header be first converted to a pointer now can be done directly, being
meaningful as offsets or pointers.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For the common, open coded 'skb->h.raw = skb->data' operation, so that we can
later turn skb->h.raw into a offset, reducing the size of struct sk_buff in
64bit land while possibly keeping it as a pointer on 32bit.
This one touches just the most simple cases:
skb->h.raw = skb->data;
skb->h.raw = {skb_push|[__]skb_pull}()
The next ones will handle the slightly more "complex" cases.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For the common, open coded 'skb->mac.raw = skb->data' operation, so that we can
later turn skb->mac.raw into a offset, reducing the size of struct sk_buff in
64bit land while possibly keeping it as a pointer on 32bit.
This one touches just the most simple case, next will handle the slightly more
"complex" cases.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
One less thing for drivers writers to worry about.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
All RDMA drivers except ehca set class_dev->dev to their dma_device
value (ehca leaves this unset). dma_device is the only value that
makes any sense, so move this assignment to core/sysfs.c. This reduce
the duplicated code in the rest of the drivers and gives ehca a nice
/sys/class/infiniband/ehcaX/device symlink.
Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Add "Modify Port" verb support to eHCA driver. The IB communication
manager needs this to set the IsCM port capability bit when
initializing.
Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
There are quite a few places in ipoib_cm.c where we know IRQs are
enabled because we do something that sleeps in the same function, so
we can convert several occurrences of spin_lock_irqsave() to a plain
spin_lock_irq(). This cleans up the source a little and makes the
code smaller too:
add/remove: 0/0 grow/shrink: 1/5 up/down: 3/-51 (-48)
function old new delta
ipoib_cm_tx_reap 403 406 +3
ipoib_cm_stale_task 146 145 -1
ipoib_cm_dev_stop 173 172 -1
ipoib_cm_tx_handler 964 956 -8
ipoib_cm_rx_handler 956 937 -19
ipoib_cm_skb_reap 212 190 -22
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Clarify code by changing return values from SMI functions to named
enum values instead of magic 0/1 values.
Signed-off-by: Hal Rosenstock <halr@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
We need to set the SGID index for routed MADs and pass received
GRH information to userspace when a MAD is received.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
To support destinations that are not on the local IB subnet, IPoIB
should include the GRH information when constructing an address
handle. Using the existing ib_init_ah_from_path() call will do this
for us.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
mthca_free_qp() already has local variables to hold the QP's send_cq
and recv_cq, so we can slightly clean up the calls to mthca_cq_clean()
by using those local variables instead of expressions like
to_mcq(qp->ibqp.send_cq).
Also, by cleaning the recv_cq first, we can avoid worrying about
whether the QP is attached to an SRQ for the second call, because we
would only clean send_cq if send_cq is not equal to recv_cq, and that
means send_cq cannot have any receive completions from the QP being
destroyed.
All this work even improves the generated code a bit:
add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-5 (-5)
function old new delta
mthca_free_qp 510 505 -5
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Commit b2875d4c ("IB/mthca: Always fill MTTs from CPU") causes a crash
in mthca_write_mtt() with non-memfree HCAs that have their memory
hidden (that is, have only two PCI BARs instead of having a third BAR
that allows access to the RAM attached to the HCA) on 64-bit
architectures. This is because the commit just before, c20e20ab
("IB/mthca: Merge MR and FMR space on 64-bit systems") makes
dev->mr_table.fmr_mtt_buddy equal to &dev->mr_table.mtt_buddy and
hence mthca_write_mtt() tries to write directly into the HCA's MTT
table. However, since that table is in the HCA's memory, this is
impossible without the PCI BAR that gives access to that memory.
This causes a crash because mthca_tavor_write_mtt_seg() basically
tries to dereference some offset of a NULL pointer. Fix this by
adding a test of MTHCA_FLAG_FMR in mthca_write_mtt() so that we always
use the WRITE_MTT firmware command rather than writing directly if
FMRs are not enabled.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The kernel ib_wc structure now uses a QP pointer, but the user space
equivalent uses a QP number instead. This means we can no longer use
a simple structure copy to copy stuff into user space.
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Don't let userspace use the direct-physical-map L_key or R_key.
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
At some point things changed so that all the affinity bits can be set,
but cpus_full() macro is not true. This caused problems with the unit
selection logic on multi-unit (board) configurations.
Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Move the code that shuts down the IB link earlier in the unload
process, to be sure no new packets can arrive while we are unloading.
Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
To prevent random utility reads and writes of the diag interface to the
chip, we first require a handshake of reading from offset 0 and writing
to offset 0 before any other reads or writes can be done through the
diags device. Otherwise chip errors can be triggered.
Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
If the chip is no longer usable, LEDs should be turned off so system
can be found easily in the cluster.
Also some minor reorganizing so both chips print hardware error
message at same point and only if there were unrecovered errors
Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Re-init of the kernel structures after a chip reset was leaving the
portdata structure for port zero in an inconsistent state, and a
pointer to it either stale (in re-init code) or NULL (in devdata)
Fixing the order of operations on this struct, and the condition for
interrupt access, prevents the crashes.
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Due to a chip bug, the PIOAvail register is not always updated to
memory. This patch allows userspace to force an update.
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
In initialization, if we bailed at chip specific initialization, we
forgot to clean up the irq we had requested.
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This patch fixes a bug where multicast packets without a GRH were not
being dropped as per the IB spec.
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
If the module parameter "kpiobufs" is set too high, the calculation to
reset it to a sane value was incorrect.
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Fix RDMA read response length checking for RDMA_READ_RESPONSE_ONLY to
allow a zero length response. RDMA read responses which don't match
the expected length or occur in response to some other operation
should generate a completion queue error (see table 56, ch. 9.9.2.3 in
the IB spec).
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Improve port-sharing performance by allowing any process to receive
packets from the shared hardware port under a spin lock for mutual
exclusion. Previously, one process was nominated as the master and
that process was responsible for receiving all packets from the shared
hardware port and either consuming them or forwarding them to their
destination. This led to starvation problems for other processes when
the master process was busy in computation phases.
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The port sharing feature mixed kernel virtual addresses as well as
physical addresses for the offset used to describe the mmap address to
map the InfiniPath hardware into user space. This had a conflict on
powerpc. The new scheme converts it to a physical address so it
doesn't conflict with chip addresses and yet still fits in 40/44 bits
so it isn't truncated by 32-bit applications calling mmap64().
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
If a receive work request has been removed from the queue but has not
had a CQ entry generated for it and the QP is modified to the error
state, the completion entry generated is incorrect. This patch fixes
the problem.
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Code was converted from a &= ~mask to clear_bit, but the bit was left
shifted instead of being used directly, so we were either trashing
memory several pages away, or sometimes taking a kernel page fault on
an invalid page.
Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Some types of packet errors are moderately common with longer IB
cables and large clusters, and are not reported with prints by other
IB HCA drivers. This suppresses those messages unless the new
__IPATH_ERRPKTDBG bit is set in ipath_debug. Reporting of temporarily
disabled frequent error interrupts was also made clearer
We also distinguish between chip errors, and bad packets sent or
received in the wording of the messages.
Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This patch fixes a number of bugs with updating the PSN for retries of
RC requests.
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
When switching to the QP error state, the completion queue entries
(error or flush) were not being generated correctly.
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
ipath_dbg doesn't need the same prefixes that printk does.
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This patch adds support for multiple RDMA reads and atomics to be sent
before an ACK is required to be seen by the requester.
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
If a post send is done in loopback and there is no receive queue
entry, the sending QP is put on a timeout list for a while so the
receiver has a chance to post a receive buffer. If the another post
send is done, the code incorrectly tried to put the QP on the timeout
list again an corrupted the timeout list. This eventually leads to a
spin lock deadlock NMI due to the timer function looping forever with
the lock held.
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
A silly programming error causes a CQ entry to not be generated if a
SRQ limit event is generated.
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
A recent change was made to allocate memory for a port after CPU
affinity is set. That change didn't account for subports and was
trying to allocate memory for the port twice.
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The chip documentation on the expected TID vs eager TID parity error
bits was reversed from what was implemented in the RTL, for both
chips. This corrects the definitions.
Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The loop which initializes the user memory region from an array of
pages was using the wrong limit for the array. This worked OK when
dma_map_sg() returned the same number as the number of pages. This
patch fixes the problem.
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This is a sticky state. It is useful for diagnosing problems with
boards versus cable/switch problems.
Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
There's no point in printing the opcode field in the completion
handling debugging output, since the type of completion is already
printed at the beginning of the line. In fact the opcode field is not
even defined for completions with a status other than success.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The current ib_umad code never accesses bits past IB_UMAD_MAX_PORTS in
dev_map[]. We shouldn't declare it to be twice as big.
Pointed-out-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Hal Rosenstock <halr@voltaire.com>
In mthca_arbel_fmr_unmap(), the high bits of the key are masked off.
This gets rid of the effect of adjust_key(), which makes sure that
bits 3 and 23 of the key are equal when the Sinai throughput
optimization is enabled, and so it may happen that an FMR will end up
with bits 3 and 23 in the key being different. This causes data
corruption, because when enabling the throughput optimization, the
driver promises the HCA firmware that bits 3 and 23 of all memory keys
will always be equal.
Fix by re-applying adjust_key() after masking the key.
Thanks to Or Gerlitz for reproducing the problem, and Ariel Shahar for
help in debug.
Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
As of commit 6cdbd77e ("cxgb3 - missing CPL hanler and register
setting."), the cxgb3 ethernet NIC driver no longer handles SET_TCB
replies, so we need to do it in the iWARP driver.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Acked-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Receive buffers need to be mapped with DMA_FROM_DEVICE. Incorrectly
mapping with DMA_TO_DEVICE causes a hard lock on ppc64 machines with
an IOMMU.
This fixes <https://bugs.openfabrics.org/show_bug.cgi?id=431>
Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
When a connection is terminated asynchronously from the iSCSI layer's
perspective, iSER needs to notify the iSCSI layer that the connection
has failed. This is done using a workqueue (switched to from the iSER
tasklet context). Meanwhile, the connection object (that holds the
work struct) is released. If the workqueue function wasn't called
yet, it will be called later with a NULL pointer, which will crash the
kernel.
The context switch (tasklet to workqueue) is not required, and
everything can be done from the iSER tasklet. This eliminates the NULL
work struct bug (and simplifies the code).
Signed-off-by: Erez Zilber <erezz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
* 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband:
IB/iser: Handle aborting a command after it is sent
IB/mthca: Fix thinko in init_mr_table()
RDMA/cxgb3: Fix resource leak in cxio_hal_init_ctrl_qp()
The SCSI midlayer may abort a command that was already sent. If the
initiator is still trying to send the command (or data-out PDUs for
that command), the QP may time out after the midlayer times
out. Therefore, when aborting the command, iSER may still have
references for the command's buffers. When sending these PDUs, the
sends will complete with an error and their resources will be released
then.
Signed-off-by: Erez Zilber <erezz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Commit c20e20ab ("IB/mthca: Merge MR and FMR space on 64-bit systems")
swapped the number of MTTs and MPTs when initializing the MR table. As
a result, we get a kernel oops when the number of MTT segments
allocated exceeds 0x20000.
Noted by Troy Benjegerdes <troy@scl.ameslab.gov>, and reproduced by
Dotan Barak <dotanb@mellanox.co.il>. This fixes
https://bugs.openfabrics.org/show_bug.cgi?id=490
Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This was spotted by the Coverity checker (CID 1554).
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
->neigh_destructor() is killed (not used), replaced with
->neigh_cleanup(), which is called when neighbor entry goes to dead
state. At this point everything is still valid: neigh->dev,
neigh->parms etc.
The device should guarantee that dead neighbor entries (neigh->dead !=
0) do not get private part initialized, otherwise nobody will cleanup
it.
I think this is enough for ipoib which is the only user of this thing.
Initialization private part of neighbor entries happens in ipib
start_xmit routine, which is not reached when device is down. But it
would be better to add explicit test for neigh->dead in any case.
Signed-off-by: David S. Miller <davem@davemloft.net>
The packet length checks in ipoib are broken: we add 4 bytes (IPoIB
encapsulation header) when sending a packet, not 20 bytes (hardware
address length) to each packet. Therefore, if connected mode is
enabled so that the interface MTU is larger than the multicast MTU,
IPoIB may end up trying to send too-long multicast packets. For
example, multicast is broken if a message of size 2048 bytes is sent
on an interface with UD MTU 2048, because 2048 is bigger than the real
limit of 2044 but the code tests against the wrong limit of 2060.
This patch fixes <https://bugs.openfabrics.org/show_bug.cgi?id=418>,
submitted by Scott Weitzenkamp <sweitzen@cisco.com>.
Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The connected mode code added the possibility that an neigh struct
gets freed in the list_for_each_entry() loop in path_rec_completion(),
which causes a use-after-free. Fix this by changing to the _safe
variant of the list walking macro.
This was spotted by the Coverity checker (CID 1567).
Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
eHCA scaling code must not depend on register_cpu_notifier() if
CONFIG_HOTPLUG_CPU is not set, so put all related code into #ifdefs.
Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
There's a race between ipoib_mcast_leave() and ipoib_mcast_join_finish()
where we can try to detach from a multicast group before we've
attached to it. Fix this by reordering the code in ipoib_mcast_leave
to free the multicast group first, which waits for the multicast
callback thread (which calls ipoib_mcast_join_finish()) to complete
before detaching from the group.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The sense of the time_after_eq() test in ipoib_cm_stale_task() is
reversed so that only non-stale connections are reaped. Fix this by
changing to time_before_eq().
Noticed by Pradeep Satyanarayana <pradeep@us.ibm.com>.
Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>