Commit Graph

2676 Commits

Author SHA1 Message Date
Mike Christie
2261ec3d68 [SCSI] iser: handle iscsi_cmd_task rename
This handles the iscsi_cmd_task rename and renames
the iser cmd task to iser task.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2008-07-12 08:22:20 -05:00
Mike Christie
2747fdb257 [SCSI] iser: convert ib_iser to support merged tasks
Convert ib_iser to support merged tasks.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2008-07-12 08:22:19 -05:00
Mike Christie
0af967f5d4 [SCSI] libiscsi, iscsi_tcp, iser: add session cmds array accessor
Currently to get a ctask from the session cmd array, you have to
know to use the itt modifier. To make this easier on LLDs and
so in the future we can easilly kill the session array and use
the host shared map instead, this patch adds a nice wrapper
to strip the itt into a session->cmds index and return a ctask.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2008-07-12 08:22:18 -05:00
Mike Christie
b40977d95f [SCSI] iser: fix handling of scsi cmnds during recovery.
After the stop_conn callback has returned the LLD should not
touch the scsi cmds. iscsi_tcp and libiscsi use the
conn->recv_lock and suspend_rx field to halt recv path
processing, but iser does not have any protection.

This patch modifies iser so that userspace can just
call the ep_disconnect callback, which will halt
all recv IO, before calling the stop_conn callback so
we do not have to worry about the conn->recv_lock and
suspend rx field. iser just needs to stop the send side
from accessing the ib conn.

Fixup to handle when the ep poll fails and ep disconnect
is called from Erez.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2008-07-12 08:22:17 -05:00
Mike Christie
5d91e209fb [SCSI] iscsi: remove session/conn_data_size from iscsi_transport
This removes the session and conn data_size fields from the iscsi_transport.
Just pass in the value like with host allocation. This patch also makes
it so the LLD iscsi_conn data is allocated with the iscsi_cls_conn.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2008-07-12 08:22:16 -05:00
Mike Christie
a4804cd6eb [SCSI] iscsi: add iscsi host helpers
This finishes the host/session unbinding, by adding some helpers
to add and remove hosts and the session they manage.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2008-07-12 08:22:16 -05:00
Mike Christie
756135215e [SCSI] iscsi: remove session and host binding in libiscsi
bnx2i allocates a host per netdevice but will use libiscsi,
so this unbinds the session from the host in that code.

This will also be useful for the iser parent device dma settings
fixes.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2008-07-12 08:22:16 -05:00
Mike Christie
d3826721b1 [SCSI] iscsi class, iscsi drivers: remove unused iscsi_transport attrs
max_cmd_len and max_conn are not really used. max_cmd_len is
always 16 and can be set by the LLD. max_conn is always one
since we do not support MCS.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2008-07-12 08:22:15 -05:00
Mike Christie
40753caa36 [SCSI] iscsi class, iscsi_tcp/iser: add host arg to session creation
iscsi offload (bnx2i and qla4xx) allocate a scsi host per hba,
so the session creation path needs a shost/host_no argument.
Software iscsi/iser will follow the same behabior as before
where it allcoates a host per session, but in the future iser
will probably look more like bnx2i where the host's parent is
the hardware (rnic for iser and for bnx2i it is the nic), because
it does not use a socket layer like how iscsi_tcp does.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2008-07-12 08:22:15 -05:00
Roland Dreier
feae1ef116 IB/umad: BKL is not needed for ib_umad_open()
Remove explicit lock_kernel() calls and document why the code is safe.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2008-07-11 16:40:58 -06:00
Ingo Molnar
0c81b2a144 Merge branch 'linus' into core/rcu
Conflicts:

	include/linux/rculist.h
	kernel/rcupreempt.c

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-11 10:46:50 +02:00
Steve Wise
5e19cf663b RDMA/cxgb3: Fix regression caused by class_device -> device conversion
The change to iwch_provider.c in commit f4e91eb4 ("IB: convert struct
class_device to struct device") undid the fix done in commit 7f049f2f
("RDMA/cxgb3: Hold rtnl_lock() around ethtool get_drvinfo call").  It
removed the calls to rtnl_lock() that serialized the iw_cxgb3 ethtool
ops calls into the cxgb3 driver.  This locking is needed to avoid
messing up the internal state of the cxgb3 driver.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-08 14:40:05 -07:00
Ingo Molnar
68083e05d7 Merge commit 'v2.6.26-rc9' into cpus4096 2008-07-06 14:23:39 +02:00
Roland Dreier
5b2d281acb IB/uverbs: BKL is not needed for ib_uverbs_open()
Remove explicit lock_kernel() calls and document why the code is safe.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2008-07-04 10:32:28 -06:00
Ingo Molnar
9a13150109 Merge commit 'v2.6.26-rc8' into core/rcu 2008-06-26 09:24:23 +02:00
Eli Cohen
87afd448b1 IB/mthca: Clear ICM pages before handing to FW
Current memfree FW has a bug which in some cases, assumes that ICM
pages passed to it are cleared.  This patch uses __GFP_ZERO to
allocate all ICM pages passed to the FW.  Once firmware with a fix is
released, we can make the workaround conditional on firmware version.

This fixes the bug reported by Arthur Kepner <akepner@sgi.com> here:
http://lists.openfabrics.org/pipermail/general/2008-May/050026.html

Cc: <stable@kernel.org>
Signed-off-by: Eli Cohen <eli@mellanox.co.il>

[ Rewritten to be a one-liner using __GFP_ZERO instead of vmap()ing
  ICM memory and memset()ing it to 0. - Roland ]

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-06-23 09:29:58 -07:00
Ingo Molnar
1e74f9cbbb Merge branch 'linus' into core/rcu 2008-06-23 11:29:11 +02:00
Arnd Bergmann
6b0ee363b2 infiniband-ucma: BKL pushdown
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2008-06-20 14:05:57 -06:00
Jonathan Corbet
f2b9857eee Add a bunch of cycle_kernel_lock() calls
All of the open() functions which don't need the BKL on their face may
still depend on its acquisition to serialize opens against driver
initialization.  So make those functions acquire then release the BKL to be
on the safe side.

Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2008-06-20 14:05:53 -06:00
Jonathan Corbet
057e7c7ff9 infiniband: more BKL pushdown
Be extra-cautious and protect the remaining open() functions.

Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2008-06-20 14:05:51 -06:00
Jonathan Corbet
d21c95c569 Add "no BKL needed" comments to several drivers
This documents the fact that somebody looked at the relevant open()
functions and concluded that, due to their trivial nature, no locking was
needed.

Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2008-06-20 14:05:50 -06:00
Jack Morgenstein
fb77bcef9f IB/uverbs: Fix check of is_closed flag check in ib_uverbs_async_handler()
Commit 1ae5c187 ("IB/uverbs: Don't store struct file * for event
files") changed the way that closed files are handled in the uverbs
code.  However, after the conversion, is_closed flag is checked
incorrectly in ib_uverbs_async_handler().  As a result, no async
events are ever passed to applications.

Found by: Ronni Zimmerman <ronniz@mellanox.co.il>

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-06-18 15:36:38 -07:00
Ingo Molnar
766d02786e Merge branch 'linus' into core/rcu 2008-06-16 11:23:36 +02:00
Roland Dreier
24797a3442 RDMA/nes: Fix off-by-one in nes_reg_user_mr() error path
nes_reg_user_mr() should fail if page_count becomes >= 1024 * 512
rather than just testing for strict >, because page_count is
essentially used as an index into an array with 1024 * 512 entries, so
allowing the loop to continue with page_count == 1024 * 512 means that
memory after the end of the array is corrupted.  This leads to a crash
triggerable by a userspace application that requests registration of a
too-big region.

Also get rid of the call to pci_free_consistent() here to avoid
corrupting state with a double free, since the same memory will be
freed in the code jumped to at reg_user_mr_err.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-06-10 12:29:49 -07:00
Roland Dreier
4c0283fc56 IB/core: Remove IB_DEVICE_SEND_W_INV capability flag
In 2.6.26, we added some support for send with invalidate work
requests, including a device capability flag to indicate whether a
device supports such requests.  However, the support was incomplete:
the completion structure was not extended with a field for the key
contained in incoming send with invalidate requests.

Full support for memory management extensions (send with invalidate,
local invalidate, fast register through a send queue, etc) is planned
for 2.6.27.  Since send with invalidate is not very useful by itself,
just remove the IB_DEVICE_SEND_W_INV bit before the 2.6.26 final
release; we will add an IB_DEVICE_MEM_MGT_EXTENSIONS bit in 2.6.27,
which makes things simpler for applications, since they will not have
quite as confusing an array of fine-grained bits to check.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-06-09 09:58:42 -07:00
Roland Dreier
8079ffa0e1 IB/umem: Avoid sign problems when demoting npages to integer
On a 64-bit architecture, if ib_umem_get() is called with a size value
that is so big that npages is negative when cast to int, then the
length of the page list passed to get_user_pages(), namely

	min_t(int, npages, PAGE_SIZE / sizeof (struct page *))

will be negative, and get_user_pages() will immediately return 0 (at
least since 900cf086, "Be more robust about bad arguments in
get_user_pages()").  This leads to an infinite loop in ib_umem_get(),
since the code boils down to:

	while (npages) {
		ret = get_user_pages(...);
		npages -= ret;
	}

Fix this by taking the minimum as unsigned longs, so that the value of
npages is never truncated.

The impact of this bug isn't too severe, since the value of npages is
checked against RLIMIT_MEMLOCK, so a process would need to have an
astronomical limit or have CAP_IPC_LOCK to be able to trigger this,
and such a process could already cause lots of mischief.  But it does
let buggy userspace code cause a kernel lock-up; for example I hit
this with code that passes a negative value into a memory registartion
function where it is promoted to a huge u64 value.

Cc: <stable@kernel.org>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-06-06 21:38:37 -07:00
Ralph Campbell
27676a3e16 IB/ipath: Fix SM trap forwarding
SM/SMA traps received by the ipath driver should be forwarded to the
SM if it is running on the host.  The ib_ipath driver was incorrectly
replying with "bad method."

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-06-06 11:23:29 -07:00
Joachim Fenkes
088af1543c IB/ehca: Reject send WRs only for RESET, INIT and RTR state
Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-06-06 11:21:33 -07:00
Ralph Campbell
03031f71c7 IB/ipath: Fix device capability flags
The driver supports a few features (RNR NAK, port active event, SRQ
resize) that were not reported in the device capability flags.  This
patch fixes that.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-05-26 15:22:17 -07:00
Roland Dreier
e8ffef73c8 IB/ipath: Avoid test_bit() on u64 SDMA status value
Gabriel C <nix.or.die@googlemail.com> pointed out that when the x86
bitops are updated to operate on unsigned long, the code in
sdma_abort_task() will produce warnings:

    drivers/infiniband/hw/ipath/ipath_sdma.c: In function 'sdma_abort_task':
    drivers/infiniband/hw/ipath/ipath_sdma.c:267: warning: passing argument 2 of 'constant_test_bit' from incompatible pointer type

and so on, because it uses test_bit() to operation on a u64 value
(returned by ipath_read_kref64() for a hardware register).

Fix up these warnings by converting the test_bit() operations to &ing
with appropriate symbolic defines of the bits within the hardware
register.  This has the benign side-effect of making the code more
self-documenting as well.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-05-26 15:20:34 -07:00
Linus Torvalds
c2448278e3 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
  IB/mad: Fix kernel crash when .process_mad() returns SUCCESS|CONSUMED
  IPoIB: Test for NULL broadcast object in ipiob_mcast_join_finish()
  MAINTAINERS: Add cxgb3 and iw_cxgb3 NIC and iWARP driver entries
  IB/mlx4: Fix creation of kernel QP with max number of send s/g entries
  IB/mthca: Fix max_sge value returned by query_device
  RDMA/cxgb3: Fix uninitialized variable warning in iwch_post_send()
  IB/mlx4: Fix uninitialized-var warning in mlx4_ib_post_send()
  IB/ipath: Fix UC receive completion opcode for RDMA WRITE with immediate
  IB/ipath: Fix printk format for ipath_sdma_status
2008-05-23 11:11:44 -07:00
Dave Olson
5a4f2b6752 IB/mad: Fix kernel crash when .process_mad() returns SUCCESS|CONSUMED
If a low-level driver returns IB_MAD_RESULT_SUCCESS | IB_MAD_RESULT_CONSUMED,
handle_outgoing_dr_smp() doesn't clean up properly.  The fix is to
kfree the local data and break, rather than falling through.  This was
observed with the ipath driver, but could happen with any driver.

This fixes <https://bugs.openfabrics.org/show_bug.cgi?id=1027>.

Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-05-23 10:52:59 -07:00
Mike Travis
5d7bfd0c4d infiniband: use performance variant for_each_cpu_mask_nr
Change references from for_each_cpu_mask to for_each_cpu_mask_nr
where appropriate

Reviewed-by: Paul Jackson <pj@sgi.com>
Reviewed-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-23 18:39:06 +02:00
Jack Morgenstein
e1d50dce5a IPoIB: Test for NULL broadcast object in ipiob_mcast_join_finish()
We saw a kernel oops in our regression testing when a multicast "join
finish" occurred just after the interface was -- this is
<https://bugs.openfabrics.org/show_bug.cgi?id=1040>.  The test
randomly causes the HCA physical port to go down then up.

The cause of this is that ipoib_mcast_join_finish() processing happen
just after ipoib_mcast_dev_flush() was invoked (in which case the
broadcast pointer is NULL).  This patch tests for and handles the case
where priv->broadcast is NULL.

Cc: <stable@kernel.org>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-05-20 15:41:09 -07:00
Roland Dreier
cd155c1c7c IB/mlx4: Fix creation of kernel QP with max number of send s/g entries
When creating a kernel QP where the consumer asked for a send queue
with lots of scatter/gater entries, set_kernel_sq_size() incorrectly
returned an error if the send queue stride is larger than the
hardware's maximum send work request descriptor size.  This is not a
problem; the only issue is to make sure that the actual descriptors
used do not overflow the maximum descriptor size, so check this instead.

Clamp the returned max_send_sge value to be no bigger than what
query_device returns for the max_sge to avoid confusing hapless users,
even if the hardware is capable of handling a few more s/g entries.

This bug caused NFS/RDMA mounts to fail when the server adapter used
the mlx4 driver.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-05-20 14:00:02 -07:00
Greg Kroah-Hartman
6c06aec248 IB: fix race in device_create
There is a race from when a device is created with device_create() and
then the drvdata is set with a call to dev_set_drvdata() in which a
sysfs file could be open, yet the drvdata will be NULL, causing all
sorts of bad things to happen.

This patch fixes the problem by using the new function,
device_create_drvdata().

Cc: Kay Sievers <kay.sievers@vrfy.org>
Reviewed-by: Roland Dreier <rolandd@cisco.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-05-20 13:31:55 -07:00
Franck Bui-Huu
82524746c2 rcu: split list.h and move rcu-protected lists into rculist.h
Move rcu-protected lists from list.h into a new header file rculist.h.

This is done because list are a very used primitive structure all over the
kernel and it's currently impossible to include other header files in this
list.h without creating some circular dependencies.

For example, list.h implements rcu-protected list and uses rcu_dereference()
without including rcupdate.h.  It actually compiles because users of
rcu_dereference() are macros.  Others RCU functions could be used too but
aren't probably because of this.

Therefore this patch creates rculist.h which includes rcupdates without to
many changes/troubles.

Signed-off-by: Franck Bui-Huu <fbuihuu@gmail.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Josh Triplett <josh@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-19 10:01:37 +02:00
Roland Dreier
12103dca52 IB/mthca: Fix max_sge value returned by query_device
The mthca driver returns the maximum number of scatter/gather entries
returned by the firmware as the max_sge value when device properties
are queried.  However, the firmware also reports a limit on the
maximum descriptor size allowed, and because mthca takes into account
the worst case send request overhead when checking whether to allow a
QP to be created, the largest number of scatter/gather entries that
can be used with mthca may be limited by the maximum descriptor size
rather than just by the actual s/g entry limit.

This means that applications cannot actually create QPs with
max_send_sge equal to the limit returned by ib_query_device().  Fix
this by checking if the maximum descriptor size imposes a lower limit
and if so returning that lower limit.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-05-16 14:58:44 -07:00
Roland Dreier
21609ae3ef RDMA/cxgb3: Fix uninitialized variable warning in iwch_post_send()
drivers/infiniband/hw/cxgb3/iwch_qp.c: In function 'iwch_post_send':
    drivers/infiniband/hw/cxgb3/iwch_qp.c:232: warning: 't3_wr_flit_cnt' may be used uninitialized in this function

This is what akpm describes as "the dopey
gcc-doesn't-know-that-foo(&var)-writes-to-var problem."

Signed-off-by: Roland Dreier <rolandd@cisco.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
2008-05-16 14:58:40 -07:00
Andrew Morton
a3d8e1591d IB/mlx4: Fix uninitialized-var warning in mlx4_ib_post_send()
drivers/infiniband/hw/mlx4/qp.c: In function 'mlx4_ib_post_send':
    drivers/infiniband/hw/mlx4/qp.c:1460: warning: 'seglen' may be used uninitialized in this function

This is the dopey gcc-doesn't-know-that-foo(&var)-writes-to-var problem.

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-05-16 14:28:30 -07:00
Ralph Campbell
df3f0da8db IB/ipath: Fix UC receive completion opcode for RDMA WRITE with immediate
When I fixed the RC receive completion opcode in 2bfc8e9e ("IB/ipath:
Return the correct opcode for RDMA WRITE with immediate"), I forgot to
fix UC, which had the same problem for RDMA write with immediate
returning the wrong opcode.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-05-15 16:37:25 -07:00
Roland Dreier
cd80ec6f81 IB/ipath: Fix printk format for ipath_sdma_status
Commit f018c7e1 ("IB/ipath: Change ipath_devdata.ipath_sdma_status to be
unsigned long") changed ipath_sdma_status to be unsigned long, but left
a few debug messages that printed it out with a %016llx format, which
generates the warnings

    drivers/infiniband/hw/ipath/ipath_sdma.c:348: warning: format '%016llx' expects type 'long long unsigned int', but argument  3 has type 'long unsigned int'
    drivers/infiniband/hw/ipath/ipath_sdma.c:618: warning: format '%016llx' expects type 'long long unsigned int', but argument  3 has type 'long unsigned int'

Fix this by changing the format used to print out the value to %08lx
(8 hex digits are now sufficient, because the highest bit used is 31).

Warnings reported by Randy Dunlap <randy.dunlap@oracle.com>.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-05-15 15:28:55 -07:00
Steve Wise
a58e58fafd RDMA/cxgb3: Wrap the software send queue pointer as needed on flush
cxio_flush_sq() was failing to wrap around the software send queue
causing garbage completion entries on a flush operation.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-05-13 11:52:55 -07:00
Roland Dreier
f018c7e177 IB/ipath: Change ipath_devdata.ipath_sdma_status to be unsigned long
Andrew Morton <akpm@linux-foundation.org> pointed out that bitops
should take an unsigned long * arg.  However, the ipath driver was
doing bitops on struct ipath_devdata.ipath_sdma_status, which is u64.
Change this member to unsigned long to avoid tons of warnings when x86
fixes the bitops to take unsigned long * instead of void *.

Also, change the IPATH_SDMA_RUNNING and IPATH_SDMA_SHUTDOWN bit
numbers to 30 and 31 (instead of 62 and 63) so that we're not setting
another booby trap for someone who tries to make ipath work on a
32-bit architecture.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-05-13 11:51:23 -07:00
Pavel Emelyanov
40d97692fb IB/ipath: Make ipath_portdata work with struct pid * not pid_t
The official reason is "with the presence of pid namespaces in the
kernel using pid_t-s inside one is no longer safe."

But the reason I fix this right now is the following:

About a month ago (when 2.6.25 was not yet released) there still was a
one last caller of a to-be-deprecated-soon function find_pid() - the
kill_proc() function, which in turn was only used by nfs callback
code.

During the last merge window, this last caller was finally eliminated
by some NFS patch(es) and I was about to finally kill this kill_proc()
and find_pid(), but found, that I was late and the kill_proc is now
called from the ipath driver since commit 58411d1c ("IB/ipath: Head of
Line blocking vs forward progress of user apps").

So here's a patch that fixes this code to use struct pid * and (!)
the kill_pid routine.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-05-13 11:45:32 -07:00
Ralph Campbell
74116f580b IB/ipath: Fix RDMA read response sequence checking
If an out of sequence RDMA read response middle or last packet is
received, we should only resend the RDMA read request on the first
out of sequence packet and drop subsequent out of sequence packets
otherwise, we get "too many retries".

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-05-13 11:42:20 -07:00
Ralph Campbell
e509be898d IB/ipath: Fix many locking issues when switching to error state
The send DMA hardware queue voided a number of prior assumptions about
when a send is complete which led to completions being generated out of
order.  There were also a number of locking issues when switching the QP
to the error or reset states, and we implement the IB_QPS_SQD state.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-05-13 11:41:29 -07:00
Ralph Campbell
53dc1ca194 IB/ipath: Fix RC and UC error handling
When errors are detected in RC, the QP should transition to the
IB_QPS_ERR state, not the IB_QPS_SQE state. Also, when the error is on
the responder side, the receive work completion error was incorrect
(remote vs. local).

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-05-13 11:40:25 -07:00
Roland Dreier
dd37818dbd RDMA/nes: Fix up nes_lro_max_aggr module parameter
Fix some bugs with the max_aggr module parameter added with LRO support:

 - The module parameter value ignored and not actually used to set
   lro_mgr.max_aggr.
 - MODULE_PARM_DESC had a typo "_mro_" instead of "_lro_" so it didn't
   end up describing the actual module parameter.
 - The nes_lro_max_aggr variable was declared as unsigned, but the
   module_param line said "int" instead of "uint" for the type.
 - The default value for the parameter was stuck in the permissions
   field of module_param, which led to nonsensical permissions for the
   file under /sys/module/iw_nes/param.
 - The parameter was used in only one file but defined in another, which
   led to the variable being global for no good reason.  Move everything
   related to the parameter to the file nes_hw.c where it is actually
   used.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-05-13 11:27:25 -07:00
Stefan Roscher
12137c593d IB/ehca: Wait for async events to finish before destroying QP
This is necessary because, in a multicore environment, a race between
uverbs async handler and destroy QP could occur.

Signed-off-by: Stefan Roscher <stefan.roscher at de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-05-07 11:35:06 -07:00
John Gregor
ab69b3cf12 IB/ipath: Fix SDMA error recovery in absence of link status change
What's fixed:

    in ipath_cancel_sends()

        We need to unconditionally set ABORTING.  So, swap the tests
        so the set_bit() isn't shadowed by the &&.

        If we've disarmed the piobufs, then we need to unconditionally
        set DISARMED.  So, move it out from the overly protective if
        at the bottom.

    in sdma_abort_task()

        Abort_task was written knowing that the SDMA engine would always
        be reset (and restarted) on error.  A recent change broke that
        fundamental assumption by taking the restart portion and making
        it conditional on a link status change.  But, SDMA can go boom
        without a link status change in some conditions.

Signed-off-by: John Gregor <john.gregor@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-05-07 11:01:10 -07:00
Dave Olson
e2ab41cae4 IB/ipath: Need to always request and handle PIO avail interrupts
Now that we always use PIO for vl15 on 7220, we could get stuck forever
if we happened to run out of PIO buffers from the verbs code, because
the setup code wouldn't run; the interrupt was also ignored if SDMA was
supported.  We also have to reduce the pio update threshold if we have
fewer kernel buffers than the existing threshold.

Clean up the initialization a bit to get ordering safer and more
sensible, and use the existing ipath_chg_kernavail call to do init,
rather than doing it separately.

Drop unnecessary clearing of pio buffer on pio parity error.

Drop incorrect updating of pioavailshadow when exitting freeze mode
(software state may not match chip state if buffer has been allocated
and not yet written).

If we couldn't get a kernel buffer for a while, make sure we are
in sync with hardware, mainly to handle the exitting freeze case.

Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-05-07 11:00:15 -07:00
Michael Albaugh
2889d1ef12 IB/ipath: Fix count of packets received by kernel
The loop in ipath_kreceive() that processes packets increments the
loop-index 'i' once too often, because the exit condition does not
depend on it, and is checked after the increment. By adding a check for
!last to the iterator in the for loop, we correct that in a way that is
not so likely to be re-broken by changes in the loop body.

Signed-off-by: Michael Albaugh <micheal.albaugh@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-05-07 10:59:23 -07:00
Ralph Campbell
2bfc8e9edf IB/ipath: Return the correct opcode for RDMA WRITE with immediate
This patch fixes a bug in the RC responder which generates a completion
entry with the wrong opcode when an RDMA WRITE with immediate is received.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-05-07 10:58:50 -07:00
Dave Olson
b4d390d8d2 IB/ipath: Fix bug that can leave sends disabled after freeze recovery
The semantics of cancel_sends changed, but the code using it was missed.
Don't leave sends and pioavail updates disabled, and add a comment as to
why the force update is needed.

Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-05-07 10:57:48 -07:00
Ralph Campbell
6e87d15007 IB/ipath: Only increment SSN if WQE is put on send queue
If a send work request has immediate errors and is not put on the
send queue, we shouldn't update any of the QP state.

The increment of the SSN wasn't obeying this.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-05-07 10:57:14 -07:00
Michael Albaugh
5f51efc195 IB/ipath: Only warn about prototype chip during init
We warn about prototype chips, but the function that checks for
support is also called as a result of a get_portinfo request, which
can clutter the logs.

Restrict warning to only appear during initialization.

Signed-off-by: Michael Albaugh <michael.albaugh@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-05-07 10:56:47 -07:00
Roland Dreier
273748cc90 RDMA/cxgb3: Fix severe limit on userspace memory registration size
Currently, iw_cxgb3 is severely limited on the amount of userspace
memory that can be registered in in a single memory region, which
causes big problems for applications that expect to be able to
register 100s of MB.

The problem is that the driver uses a single kmalloc()ed buffer to
hold the physical buffer list (PBL) for the entire memory region
during registration, which means that 8 bytes of contiguous memory are
required for each page of memory being registered.  For example, a 64
MB registration will require 128 KB of contiguous memory with 4 KB
pages, and it unlikely that such an allocation will succeed on a busy
system.

This is purely a driver problem: the temporary page list buffer is not
needed by the hardware, so we can fix this by writing the PBL to the
hardware in page-sized chunks rather than all at once.  We do this by
splitting the memory registration operation up into several steps:

 - Allocate PBL space in adapter memory for the full registration
 - Copy PBL to adapter memory in chunks
 - Allocate STag and enable memory region

This also allows several other cleanups to the __cxio_tpt_op()
interface and related parts of the driver.

This change leaves the reregister memory region and memory window
operations broken, but they already didn't work due to other
longstanding bugs, so fixing them will be left to a later patch.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-05-06 15:56:22 -07:00
Roland Dreier
0e9913362a RDMA/cxgb3: Don't add PBL memory to gen_pool in chunks
Current iw_cxgb3 code adds PBL memory to the driver's gen_pool in 2 MB
chunks.  This limits the largest single allocation that can be done to
the same size, which means that with 4 KB pages, each of which takes 8
bytes of PBL memory, the largest memory region that can be allocated
is 1 GB (256K PBL entries * 4 KB/entry).

Remove this limit by adding all the PBL memory in a single gen_pool
chunk, if possible.  Add code that falls back to smaller chunks if
gen_pool_add() fails, which can happen if there is not sufficient
contiguous lowmem for the internal gen_pool bitmap.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-05-06 15:03:38 -07:00
Stefan Roscher
cf04690885 IB/ehca: Fix function return types
Also remove duplicate assignment of local_ca_ack_delay and change
min_t check for local_ca_ack_delay to u8 instead of int.

Signed-off-by: Stefan Roscher <stefan.roscher at de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-05-05 15:51:49 -07:00
Steve Wise
77a8d5741f RDMA/cxgb3: Bump up the MPA connection setup timeout.
Testing on large clusters shows its way too short at 10 secs.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-05-02 10:57:09 -07:00
Steve Wise
c4d49776e8 RDMA/cxgb3: Silently ignore close reply after abort.
Remove bad BUG_ON() that can trigger in correct operation from
close_con_rpl().  It is possible to get a close_rpl message on a dead
connection.  The sequence is:

	- host refs ep for close exchange
	- host posts close_req
	- hw posts PEER_ABORT from incoming RST
	- host marks ep DEAD
	- host posts ABORT_RPL and releases ep resources
	- hw posts CLOSE_RPL
	- host derefs ep and ep freed.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-05-02 10:57:09 -07:00
Steve Wise
c8286944b8 RDMA/cxgb3: QP flush fixes
- Flush the QP only after the HW disables the connection.  Currently
  we flush the QP when transitioning to CLOSING.  This exposes a race
  condition where the HW can complete a RECV WR, for instance, -and-
  the SW can flush that same WR.

- Only call CQ event handlers on flush IFF we actually flushed something.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-05-02 10:56:57 -07:00
Eli Cohen
57ce41d1d1 IB/ipoib: Fix transmit queue stalling forever
Commit f56bcd80 ("IPoIB: Use separate CQ for UD send completions")
introduced a bug where the transmit queue could get stopped and never
woken up.  The problem is that send completions are only polled at the
end of the xmit function, so if the send queue fills up and the xmit
path stops the queue, then there is no way for send completions to
ever get polled, and so the transmit queue stays stopped forever.

Fix this by arming the send CQ just before posting the last send
request that fills the send queue.  Then, when the completion event
handler is called, drain the send CQ.  Since it is possible that not
enough send completions are in the CQ, verify that the the net queue
has been woken up after draining the send CQ, and if not arm a timer
and drain again at the timer function.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-30 20:02:45 -07:00
Roland Dreier
3ae15e1623 IB/mlx4: Fix off-by-one errors in calls to mlx4_ib_free_cq_buf()
When I merged bbf8eed1 ("IB/mlx4: Add support for resizing CQs") I
changed things around so that mlx4_ib_alloc_cq_buf() and
mlx4_ib_free_cq_buf() were used everywhere they could be.  However, I
screwed up the number of entries passed into mlx4_ib_alloc_cq_buf()
in a couple places -- the function bumps the number of entries
internally, so the caller shouldn't add 1 as well.

Passing a too-big value for the number of entries to mlx4_ib_free_cq_buf()
can cause the cleanup to go off the end of an array and corrupt
allocator state in interesting ways.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-30 19:52:55 -07:00
Glenn Streiff
7495ab6837 RDMA/nes: Formatting cleanup
Various cleanups:
	- Change // to /* .. */
	- Place whitespace around binary operators.
	- Trim down a few long lines.
	- Some minor alignment formatting for better readability.
	- Remove some silly tabs.

Signed-off-by: Glenn Streiff <gstreiff@neteffect.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-29 13:46:54 -07:00
Eric Schneider
0e1de5d62e RDMA/nes: Add support for SFP+ PHY
This patch enables the iw_nes module for NetEffect RNICs to support
additional PHYs including SFP+ (referred to as ARGUS in the code).

Signed-off-by: Eric Schneider <eric.schneider@neteffect.com>
Signed-off-by: Glenn Streiff <gstreiff@neteffect.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-29 13:46:54 -07:00
Faisal Latif
37dab4112d RDMA/nes: Use LRO
Signed-off-by: Faisal Latif <flatif@neteffect.com.
Signed-off-by: Glenn Streiff <gstreiff@neteffect.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-29 13:46:54 -07:00
Eli Cohen
b4132efa1a IPoIB: Copy child MTU from parent
When creating a child interface, copy the MTU information from the
parent.  Otherwise when the child's multicast join completes, the MTU
will not be updated since the code does

	dev->mtu = min(priv->mcast_mtu, priv->admin_mtu);

and priv->admin_mtu will be set to 0.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-29 13:46:53 -07:00
Roland Dreier
baaad380c0 IB/mthca: Avoid changing userspace ABI to handle DMA write barrier attribute
Commit cb9fbc5c ("IB: expand ib_umem_get() prototype") changed the
mthca userspace ABI to provide a way for userspace to indicate which
memory regions need the DMA write barrier attribute.  However, it is
possible to handle this without breaking existing userspace, by having
the mthca kernel driver recognize whether it is talking to old or new
userspace, depending on the size of the register MR structure passed in.

The only potential drawback of this is that is allows old userspace
(which has a bug with DMA ordering on large SGI Altix systems) to
continue to run on new kernels, but the advantage of allowing old
userspace to continue to work on unaffected systems seems to outweigh
this, and we can print a warning to push people to upgrade their
userspace.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-29 13:46:53 -07:00
Olaf Kirch
0bfe151cc4 IB/mthca: Avoid recycling old FMR R_Keys too soon
When a FMR is unmapped, mthca resets the map count to 0, and clears
the upper part of the R_Key which is used as the sequence counter.

This poses a problem for RDS, which uses ib_fmr_unmap as a fence
operation.  RDS assumes that after issuing an unmap, the old R_Keys
will be invalid for a "reasonable" period of time. For instance,
Oracle processes uses shared memory buffers allocated from a pool of
buffers.  When a process dies, we want to reclaim these buffers -- but
we must make sure there are no pending RDMA operations to/from those
buffers.  The only way to achieve that is by using unmap and sync the
TPT.

However, when the sequence count is reset on unmap, there is a high
likelihood that a new mapping will be given the same R_Key that was
issued a few milliseconds ago.

To prevent this, don't reset the sequence count when unmapping a FMR.

Signed-off-by: Olaf Kirch <olaf.kirch@oracle.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-29 13:46:53 -07:00
Stefan Roscher
d227fa7288 IB/ehca: Allocate event queue size depending on max number of CQs and QPs
If a lot of QPs fall into Error state at once and the EQ of the
respective HCA is too small, it might overrun, causing the eHCA driver
to stop processing completion events and calling the application's
completion handlers, effectively causing traffic to stop.

Fix this by limiting available QPs and CQs to a customizable max
count, and determining EQ size based on these counts and a worst-case
assumption.

Signed-off-by: Stefan Roscher <stefan.roscher@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-29 13:46:53 -07:00
Eli Cohen
f56bcd8013 IPoIB: Use separate CQ for UD send completions
Use a dedicated CQ for UD send completions. Also, do not arm the UD
send CQ, which reduces the number of interrupts generated.  This patch
farther reduces overhead by not calling poll CQ for every posted send
WR -- it does polls only when there 16 or more outstanding work requests.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-29 13:46:53 -07:00
Eli Dorfman
87528227df IB/iser: Count FMR alignment violations per session
Count FMR alignment violations per session as part of the iscsi
statistics.

Signed-off-by: Eli Dorfman <elid@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-29 13:46:52 -07:00
Eli Dorfman
6f735e36ba IB/iser: Move high-volume debug output to higher debug level
Add another level for debug.

Signed-off-by: Eli Dorfman <elid@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-29 13:46:52 -07:00
Hoang-Nam Nguyen
7df109d917 IB/ehca: handle negative return value from ibmebus_request_irq() properly
ehca_create_eq() was assigning a signed return value to an unsiged
local variable and then checking if the variable was < 0, which meant
that errors were always ignored.  Fix this by using one variable for
signed integer return values and another for u64 hcall return values.

Bug originally found by Roel Kluin <12o3l@tiscali.nl>.

Signed-off-by: Hoang-Nam Nguyen <hnguyen@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-29 13:46:52 -07:00
Steve Wise
f8b0dfd152 RDMA/cxgb3: Support peer-2-peer connection setup
Open MPI, Intel MPI and other applications don't respect the iWARP
requirement that the client (active) side of the connection send the
first RDMA message.  This class of application connection setup is
called peer-to-peer.  Typically once the connection is setup, _both_
sides want to send data.

This patch enables supporting peer-to-peer over the chelsio RNIC by
enforcing this iWARP requirement in the driver itself as part of RDMA
connection setup.

Connection setup is extended, when the peer2peer module option is 1,
such that the MPA initiator will send a 0B Read (the RTR) just after
connection setup.  The MPA responder will suspend SQ processing until
the RTR message is received and reply-to.

In the longer term, this will be handled in a standardized way by
enhancing the MPA negotiation so peers can indicate whether they
want/need the RTR and what type of RTR (0B read, 0B write, or 0B send)
should be sent.  This will be done by standardizing a few bits of the
private data in order to negotiate all this.  However this patch
enables peer-to-peer applications now and allows most of the required
firmware and driver changes to be done and tested now.

Design:

 - Add a module option, peer2peer, to enable this mode.

 - New firmware support for peer-to-peer mode:

	- a new bit in the rdma_init WR to tell it to do peer-2-peer
	  and what form of RTR message to send or expect.

	- process _all_ preposted recvs before moving the connection
	  into rdma mode.

	- passive side: defer completing the rdma_init WR until all
	  pre-posted recvs are processed.  Suspend SQ processing until
	  the RTR is received.

	- active side: expect and process the 0B read WR on offload TX
	  queue. Defer completing the rdma_init WR until all
	  pre-posted recvs are processed.  Suspend SQ processing until
	  the 0B read WR is processed from the offload TX queue.

 - If peer2peer is set, driver posts 0B read request on offload TX
   queue just after posting the rdma_init WR to the offload TX queue.

 - Add CQ poll logic to ignore unsolicitied read responses.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-29 13:46:52 -07:00
Steve Wise
ccaf10d0ad RDMA/cxgb3: Set the max_mr_size device attribute correctly
cxgb3 only supports 4GB memory regions.  The lustre RDMA code uses
this attribute and currently has to code around our bad setting.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-29 13:46:52 -07:00
Steve Wise
989a178069 RDMA/cxgb3: Correctly serialize peer abort path
Open MPI and other stress testing exposed a few bad bugs in handling
aborts in the middle of a normal close.  Fix these by:

 - serializing abort reply and peer abort processing with disconnect
   processing

 - warning (and ignoring) if ep timer is stopped when it wasn't running

 - cleaning up disconnect path to correctly deal with aborting and
   dead endpoints

 - in iwch_modify_qp(), taking a ref on the ep before releasing the qp
   lock if iwch_ep_disconnect() will be called.  The ref is dropped
   after calling disconnect.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-29 13:46:51 -07:00
Yevgeny Petrilin
e463c7b197 mlx4_core: Add a way to set the "collapsed" CQ flag
Extend the mlx4_cq_resize() API with a way to set the "collapsed" flag
for the CQ being created.

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-29 13:46:50 -07:00
Arthur Kepner
cb9fbc5c37 IB: expand ib_umem_get() prototype
Add a new parameter, dmasync, to the ib_umem_get() prototype.  Use dmasync = 1
when mapping user-allocated CQs with ib_umem_get().

Signed-off-by: Arthur Kepner <akepner@sgi.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: Jes Sorensen <jes@sgi.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Roland Dreier <rdreier@cisco.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: David Miller <davem@davemloft.net>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Grant Grundler <grundler@parisc-linux.org>
Cc: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:12 -07:00
Roland Dreier
31d1e340f0 RDMA/nes: Remove volatile qualifier from struct nes_hw_cq.cq_vbase
Remove the volatile qualifier from the cq_vbase member of struct
nes_hw_cq, and add an rmb() in the one place where it looks like
access order might make a difference.  As usual, removing a volatile
qualifier in a declaration is actually a bug fix, since a volatile
qualifier is not sufficient to make sure that aggressively
out-of-order CPUs don't reorder things and cause incorrect results.

For example, a CPU might speculatively execute reads of other cqe
fields before the NIC hardware has written those fields and before it
has set the NES_CQE_VALID bit (even though those reads come after the
test of the NES_CQE_VALID bit in program order), but then when the CPU
actually executes the conditional test of the NES_CQE_VALID, the bit
has been set, and the CPU will proceed with the results of the earlier
speculative execution and end up using bogus data.

This also gets rid of the warning:

    drivers/infiniband/hw/nes/nes_verbs.c: In function 'nes_destroy_cq':
    drivers/infiniband/hw/nes/nes_verbs.c:1978: warning: passing argument 3 of 'pci_free_consistent' discards qualifiers from pointer target type

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-23 11:55:45 -07:00
Yevgeny Petrilin
6296883ca4 mlx4_core: Move kernel doorbell management into core
In addition to mlx4_ib, there will be ethernet and FC consumers of
mlx4_core, so move the code for managing kernel doorbells into the
core module to avoid having to duplicate this multiple times.

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-23 11:55:45 -07:00
Joachim Fenkes
14fb05b349 IB/ehca: Bump version number to 0026
Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-23 11:55:45 -07:00
Joachim Fenkes
0455e36d81 IB/ehca: Make some module parameters bool, update descriptions
Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-23 11:55:45 -07:00
Joachim Fenkes
a7607c9b11 IB/ehca: Remove mr_largepage parameter
Always enable large page support; didn't seem to cause problems for anyone.

Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-23 11:55:45 -07:00
Joachim Fenkes
4da27d6d5b IB/ehca: Move high-volume debug output to higher debug levels
Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-23 11:55:45 -07:00
Joachim Fenkes
863fb09fbf IB/ehca: Prevent posting of SQ WQEs if QP not in RTS
...as required by IB Spec, C10-29.

Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-23 11:55:45 -07:00
Shirley Ma
bc7b3a36ba IPoIB: Handle 4K IB MTU for UD (datagram) mode
This patch enables IPoIB to use 4K UD messages (when the underlying
device and fabrics support a 4K MTU) by using two scatter buffers when
PAGE_SIZE is less than or equal to thhe HCA IB MTU size.  The first
buffer is for IPoIB header + GRH header, and the second buffer is the
IPoIB payload, which is 4K-4.

Signed-off-by: Shirley Ma <xma@us.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-23 11:55:45 -07:00
Chien Tung
bc5698f3ec RDMA/nes: Fix adapter reset after PXE boot
After PXE boot, the iw_nes driver does a full reset to ensure the card
is in a clean state.  However, it doesn't wait for firmware to
complete its work before issuing a port reset to enable the ports,
which leads to problems bringing up the ports.

The solution is to wait for firmware to complete its work before
proceeding with port reset.

This bug was flagged by Roland Dreier <rolandd@cisco.com>.

Cc: <stable@kernel.org>
Signed-off-by: Chien Tung <ctung@neteffect.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-23 11:55:45 -07:00
Roland Dreier
e447703123 RDMA/nes: Print IPv4 addresses in a readable format
Use NIPQUAD_FMT instead of printing raw 32-bit hex quantities in
debugging output.

Acked-by: Glenn Streiff <gstreiff@neteffect.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-23 11:55:43 -07:00
Roland Dreier
2bd01c5d2e RDMA/nes: Use print_mac() to format ethernet addresses for printing
Removing open-coded MAC formats shrinks the source and the generated
code too, eg on x86-64:

add/remove: 0/0 grow/shrink: 0/4 up/down: 0/-103 (-103)
function                                     old     new   delta
make_cm_node                                 932     912     -20
nes_netdev_set_mac_address                   427     406     -21
nes_netdev_set_multicast_list               1148    1124     -24
nes_probe                                   2349    2311     -38

Acked-by: Glenn Streiff <gstreiff@neteffect.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-23 11:52:18 -07:00
Roland Dreier
bc751fe6ff IB/ipath: Correct capitalization "IntX" -> "INTx"
Match what the PCI specification uses.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-21 18:19:15 -07:00
Roland Dreier
44957572cc IB/ipath: Remove tests of PCI_MSI in ipath_iba7220.c
The PCI MSI interface is stubbed out properly so that all the
functions just return failure if PCI_MSI=n, so there's no reason to
have "#ifdef CONFIG_PCI_MSI" blocks in ipath_iba7220.c.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-21 18:19:15 -07:00
Roland Dreier
480f58e614 IB/ipath: Remove dependency on PCI_MSI || HT_IRQ
Before IBA7220 support was added, the ipath driver didn't support any
hardware unless PCI_MSI and/or HT_IRQ was enabled.  However, the
IBA7220 can generate INTx interrupts, so it makes sense to allow the
driver to be build even if PCI_MSI=n and HT_IRQ=n.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-21 18:19:14 -07:00
Roland Dreier
37a6ab5227 IB/ipath: Build IBA7220 code unconditionally
The new IBA7220 code added a call to ipath_init_iba7220_funcs() that
is compiled unconditionally, but only built the IBA7220 code if
PCI_MSI is enabled.  Fix this by building the IBA7220 file
unconditonally.

This fixes build breakage when PCI_MSI=n, HT_IRQ=y and
INFINIBAND_IPATH=y reported by Ingo Molnar <mingo@elte.hu>:

 drivers/built-in.o: In function `ipath_init_one':
 ipath_driver.c:(.devinit.text+0x1e5bc): undefined reference to `ipath_init_iba7220_funcs'

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-21 18:19:14 -07:00
Roland Dreier
88a8317bcd IB/ipath: Remove reference to dev->class_dev
Commit 124b4dcb ("IB/ipath: add calls to new 7220 code and enable in                                
build") inadvertently added core to set dev->class_dev.dev back into                                
ib_ipath.  This is completely redundant since commit 1912ffbb ("IB: Set                             
class_dev->dev in core for nice device symlink"), which removed                                     
class_dev setting from low-level drivers, and also will break the build
when class_dev is removed completely from struct ib_device.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-21 18:19:14 -07:00
Paul Bolle
9862874d21 IB/ipath: Fix module parameter description for disable_sma
Describe disable_sma parameter with its name rather than the internal
ib_ipath_disable_sma variable name, so that the description shows up
properly in modinfo.

Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Acked-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-21 18:19:13 -07:00
Roland Dreier
6a5546e76c RDMA/nes: Remove unneeded function declarations
Remove redundant static declarations of functions that are defined
before they are used in the source.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-21 18:19:12 -07:00
Linus Torvalds
e80ab411e5 Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-2.6: (36 commits)
  SCSI: convert struct class_device to struct device
  DRM: remove unused dev_class
  IB: rename "dev" to "srp_dev" in srp_host structure
  IB: convert struct class_device to struct device
  memstick: convert struct class_device to struct device
  driver core: replace remaining __FUNCTION__ occurrences
  sysfs: refill attribute buffer when reading from offset 0
  PM: Remove destroy_suspended_device()
  Firmware: add iSCSI iBFT Support
  PM: Remove legacy PM (fix)
  Kobject: Replace list_for_each() with list_for_each_entry().
  SYSFS: Explicitly include required header file slab.h.
  Driver core: make device_is_registered() work for class devices
  PM: Convert wakeup flag accessors to inline functions
  PM: Make wakeup flags available whenever CONFIG_PM is set
  PM: Fix misuse of wakeup flag accessors in serial core
  Driver core: Call device_pm_add() after bus_add_device() in device_add()
  PM: Handle device registrations during suspend/resume
  block: send disk "change" event for rescan_partitions()
  sysdev: detect multiple driver registrations
  ...

Fixed trivial conflict in include/linux/memory.h due to semaphore header
file change (made irrelevant by the change to mutex).
2008-04-21 15:49:58 -07:00
Tony Jones
ee959b00c3 SCSI: convert struct class_device to struct device
It's big, but there doesn't seem to be a way to split it up smaller...

Signed-off-by: Tony Jones <tonyj@suse.de>
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Cc: Roland Dreier <rolandd@cisco.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-04-19 19:10:33 -07:00
Greg Kroah-Hartman
0532193746 IB: rename "dev" to "srp_dev" in srp_host structure
This sets us up to be able to convert the srp_host to use a struct
device instead of a class_device.

Based on a original patch from Tony Jones, but split up into this piece
by Greg.

Signed-off-by: Tony Jones <tonyj@suse.de>
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Reviewed-by: Roland Dreier <rolandd@cisco.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-04-19 19:10:30 -07:00
Tony Jones
f4e91eb4a8 IB: convert struct class_device to struct device
This converts the main ib_device to use struct device instead of struct
class_device as class_device is going away.

Signed-off-by: Tony Jones <tonyj@suse.de>
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Cc: Roland Dreier <rolandd@cisco.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-04-19 19:10:30 -07:00
Matthew Wilcox
6188e10d38 Convert asm/semaphore.h users to linux/semaphore.h
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
2008-04-18 22:22:54 -04:00
Matthew Wilcox
d3135846f6 drivers: Remove unnecessary inclusions of asm/semaphore.h
None of these files use any of the functionality promised by
asm/semaphore.h.  It's possible that they rely on it dragging in some
unrelated header file, but I can't build all these files, so we'll have
fix any build failures as they come up.

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
2008-04-18 22:16:32 -04:00
Erez Zilber
0a22ab92f5 IB/iser: Don't change itt endianness
The itt field in struct iscsi_data is not defined with any particular
endianness.  open-iscsi should use it as-is without byte-swapping it.
This fixes sparse warnings coming from doing ntohl(hdr->itt).

Signed-off-by: Erez Zilber <erezz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:35 -07:00
Jack Morgenstein
068c4ea1bb IB/mlx4: Update module version and release date
The mlx4_ib driver is stable enough for production use, so bump the
version number to 1.0 to indicate this.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:35 -07:00
Roland Dreier
9fdd5e5bf6 IPoIB: Handle case when P_Key is deleted and re-added at same index
If a P_Key is deleted and then re-added at the same index, then IPoIB
gets confused because __ipoib_ib_dev_flush() only checks whether the
index is the same without checking whether the P_Key was present, so
the interface is stopped when the P_Key is deleted, but the event when
the P_Key is re-added gets ignored and the interface never gets
restarted.

Also, switch to using ib_find_pkey() instead of ib_find_cached_pkey()
everywhere in IPoIB, since none of the places that look for P_Keys are
in a fast path or in non-sleeping context, and in general we want to
kill off the whole caching infrastructure eventually.  This also fixes
consistency problems caused because some IPoIB queries were cached and
some were uncached during the window where the cache was not updated.

Thanks to Venkata Subramonyam <vsubramo@cisco.com> for debugging this
problem and testing this fix.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:35 -07:00
Erez Zilber
d97c51707d IB/iser: Release connection resources on RDMA_CM_EVENT_DEVICE_REMOVAL event
When a RDMA_CM_EVENT_DEVICE_REMOVAL event is raised, iSER should
release the connection resources.

This is necessary when the IB HCA module is unloaded while open-iscsi
is still running.  Currently, iSER just BUG()s.

Signed-off-by: Erez Zilber <erezz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:35 -07:00
Stefan Roscher
c83b5b1cb2 IB/ehca: Support all ibv_devinfo values in query_device() and query_port()
Also, introduce a few inline helper functions to make the code more readable.

Signed-off-by: Stefan Roscher <stefan.roscher@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:35 -07:00
Roland Dreier
4cd1e5eb3c RDMA/nes: Free IRQ before killing tasklet
Move the free_irq() call in nes_remove() to before the tasklet_kill();
otherwise there is a window after tasklet_kill() where a new interrupt
can be handled and reschedule the tasklet, leading to a use-after-free
crash.

Cc: <stable@kernel.org>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:34 -07:00
Jack Morgenstein
940801b27e IB/mthca: Update module version and release date
The ib_mthca driver has been stable for a while, so bump the version
number to 1.0 to indicate this.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:34 -07:00
Dotan Barak
0df6703095 IB/mlx4: Update QP state if query QP succeeds
If the QP was moved to another state (such as SQE) by the hardware,
then after this change the user won't have to set the IBV_QP_CUR_STATE
mask in order to execute modify QP in order to recover from this state.

Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:34 -07:00
Dotan Barak
5121df3ae4 IB/mthca: Update QP state if query QP succeeds
If the QP was moved to another state (such as SQE) by the hardware,
then after this change the user won't have to set the IBV_QP_CUR_STATE
mask in order to execute modify QP in order to recover from this state.

Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:34 -07:00
Tom Tucker
9285faa1e7 RDMA/amso1100: Add check for NULL reply_msg in c2_intr()
Fix a place where we might dereference a NULL pointer; this fixes
Coverity CID 1392.  On inspection I also found a place where we could
attempt to kmem_cache_free() a NULL pointer, so fix this too.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:34 -07:00
Vladimir Sokolovsky
bbf8eed1a0 IB/mlx4: Add support for resizing CQs
Signed-off-by: Vladimir Sokolovsky <vlad@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:33 -07:00
Eli Cohen
3fdcb97f0b IB/mlx4: Add support for modifying CQ moderation parameters
Signed-off-by: Eli Cohen <eli@mellnaox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:33 -07:00
Eli Cohen
28d52b3cd8 IPoIB: Support modifying IPoIB CQ event moderation
This can be used to tune at run time the parameters controlling the
event (interrupt) generation rate and thus reduce the overhead
incurred by handling interrupts resulting in better throughput.  Since
IPoIB uses a single CQ for both RX and TX, RX is chosen to dictate
configuration for both RX and TX.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:33 -07:00
Eli Cohen
2dd5716227 IB/core: Add support for modify CQ
Add support for modifying CQ parameters for controlling event
generation moderation.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:33 -07:00
Eli Cohen
82c24c18af IPoIB: Add basic ethtool support
Just add the infrastructure so we can add functionality later.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:32 -07:00
Roland Dreier
139b2db795 RDMA/amso1100: Add support for "send with invalidate" work requests
Handle IB_WR_SEND_WITH_INV work requests.

This resurrects a patch sent long ago by Mikkel Hagen <mhagen@iol.unh.edu>.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:32 -07:00
Roland Dreier
0f39cf3d54 IB/core: Add support for "send with invalidate" work requests
Add a new IB_WR_SEND_WITH_INV send opcode that can be used to mark a
"send with invalidate" work request as defined in the iWARP verbs and
the InfiniBand base memory management extensions.  Also put "imm_data"
and a new "invalidate_rkey" member in a new "ex" union in struct
ib_send_wr. The invalidate_rkey member can be used to pass in an
R_Key/STag to be invalidated.  Add this new union to struct
ib_uverbs_send_wr.  Add code to copy the invalidate_rkey field in
ib_uverbs_post_send().

Fix up low-level drivers to deal with the change to struct ib_send_wr,
and just remove the imm_data initialization from net/sunrpc/xprtrdma/,
since that code never does any send with immediate operations.

Also, move the existing IB_DEVICE_SEND_W_INV flag to a new bit, since
the iWARP drivers currently in the tree set the bit.  The amso1100
driver at least will silently fail to honor the IB_SEND_INVALIDATE bit
if passed in as part of userspace send requests (since it does not
implement kernel bypass work request queueing).  Remove the flag from
all existing drivers that set it until we know which ones are OK.

The values chosen for the new flag is not consecutive to avoid clashing
with flags defined in the XRC patches, which are not merged yet but
which are already in use and are likely to be merged soon.

This resurrects a patch sent long ago by Mikkel Hagen <mhagen@iol.unh.edu>.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:32 -07:00
Ralph Campbell
e7eacd3686 IB/ipath: Update copyright dates for files changed in 2008
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:32 -07:00
Dave Olson
124b4dcb1d IB/ipath: add calls to new 7220 code and enable in build
This patch adds the initialization calls into the new 7220 HCA files,
changes the Makefile to compile and link the new files, and code to
handle send DMA.

Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:32 -07:00
Arthur Jones
bb9171448d IB/ipath: Misc changes to prepare for IB7220 introduction
The patch adds a number of minor changes to support newer HCAs:
 - New send buffer control bits
 - New error condition bits
 - Locking and initialization changes
 - More send buffers

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:31 -07:00
Arthur Jones
8babfa4fb9 IB/ipath: User mode send DMA
A new file which allows the IBA7220 send DMA engine to be used from
userland.  The routines here are not linked in yet, that will happen in
a follow-on patch...

Signed-off-by: Arthur Jones <arthur.jones@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:31 -07:00
Arthur Jones
909c0faa8f IB/ipath: User mode send DMA header file
A new header file which allows the IBA7220 send DMA engine to be used
from userland.  The definitions here are not used yet, that will happen
in a follow-on patch...

Signed-off-by: Arthur Jones <arthur.jones@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:31 -07:00
John Gregor
f7a60d71af IB/ipath: Add code for IBA7220 send DMA
The IBA7220 HCA has a new feature to DMA data to the on chip send
buffers instead of or in addition to the host CPU doing the data
transfer.  This patch adds code to support the send DMA queue.

Signed-off-by: John Gregor <john.gregor@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:31 -07:00
Ralph Campbell
2c19643563 IB/ipath: Add IBA7220-specific SERDES initialization data
This patch adds binary data to initialize the IB SERDES.

Signed-off-by: Michael Albaugh <Michael.Albaugh@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:31 -07:00
Michael Albaugh
ab0fb2e049 IB/ipath: Support for SerDes portion of IBA7220
The control and initialization of the SerDes blocks of the IBA7220 is
sufficiently complex to merit a separate file.

Signed-off-by: Michael Albaugh <Michael.Albaugh@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:31 -07:00
Ralph Campbell
843e6ab489 IB/ipath: HCA-specific code to support IBA7220
This patch adds the HCA-specific code for the IBA7220 HCA.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:30 -07:00
Michael Albaugh
dd042d59c1 IB/ipath: Isolate 7220-specific content
This patch adds a new ASIC-specific header file for the HCAs using the IBA7220.

Signed-off-by: Michael Albaugh <Michael.Albaugh@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:30 -07:00
Ralph Campbell
afce688ba9 IB/ipath: Header file changes to support IBA7220
This is part of a patch series to add support for a new HCA.  This patch
adds new fields to the header files.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:30 -07:00
Ralph Campbell
6bb68835d3 IB/ipath: Fix up error handling
This patch makes chip reset more robust and reduces lock contention
between user and kernel TID register updates.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:30 -07:00
Dave Olson
9b436eb4f8 IB/ipath: Fix check for no interrupts to reliably fallback to INTx
Newer HCAs support MSI interrupts and also INTx interrupts.  Fix the
code so that INTx can be reliably enabled if MSI interrupts are not
working.

Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:30 -07:00
Dave Olson
1d7c2e529f IB/ipath: Enable reduced PIO update for HCAs that support it.
Newer HCAs have a threshold counter to reduce the number of DMAs the
chip makes to update the PIO buffer availability status bits.  This
patch enables the feature.

Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:30 -07:00
Dave Olson
0ab6b2b9ab IB/ipath: Set LID filtering for HCAs that support it.
Whenever the LID is set, notify the HCA specific code so that the
appropriate HW registers can be updated. Also log the info on the
console at low priority.

Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:29 -07:00
Dave Olson
b3e8f54107 IB/ipath: Add support for IBTA 1.2 Heartbeat
This patch adds code to enable/disable the IBTA 1.2 heartbeat for testing
if the HCA supports it.

Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:29 -07:00
Dave Olson
555b203e48 IB/ipath: Make link state transition code ignore (transient) link recovery
The hardware-based recovery doesn't need any intervention, and in a few
cases we can get a bit confused about state and skip steps such as
turning off the link state LED when we consider recovery to be "down".
So ignore this transition, and either we recover in hardware, or we
transition to down, and will handle it then.

Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:29 -07:00
Ralph Campbell
9355fb6a06 IB/ipath: Add support for 7220 receive queue changes
Newer HCAs have a HW option to write a sequence number to each receive
queue entry and avoid a separate DMA of the tail register to memory.
This patch adds support for these changes.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:29 -07:00
Ralph Campbell
2ba3f56eb4 IB/ipath: Fix some white space and code style issues
This patch makes some white space changes and minor non-functional
changes to more closely match the code in OFED-1.3.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:29 -07:00
Michael Albaugh
afd9970f95 IB/ipath: Allow old and new diagnostic packet formats
This patch checks for old and new format writes to send a packet via the
diagnostic interface.

Signed-off-by: Michael Albaugh <Michael.Albaugh@Qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:28 -07:00
Dotan Barak
7ce5eacb45 IB/core: Check optional verbs before using them
Make sure that a device implements the modify_srq and reg_phys_mr
optional methods before calling them.

Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:28 -07:00
Robert P. J. Day
b3b8128fd3 IB/ipath: Fix time comparison to use time_after_eq()
Raw comparison against jiffies will fail if jiffies wraps, although
since ipath currently only supports 64-bit architectures, this is rather
far-fetched.  Still, it's better to use time_after_eq().

Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:28 -07:00
Roland Dreier
f438000f7a IB/mlx4: Micro-optimize mlx4_ib_post_send()
Rather than have build_mlx_header() return a negative value on failure
and the length of the segments it builds on success, add a pointer
parameter to return the length and return 0 on success.  This matches
the calling convention used for build_lso_seg() and generates slightly
smaller code -- eg, on 64-bit x86:

add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-22 (-22)
function                                     old     new   delta
mlx4_ib_post_send                           2023    2001     -22

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:28 -07:00
Eli Cohen
b832be1e40 IB/mlx4: Add IPoIB LSO support
Add TSO support to the mlx4_ib driver.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:27 -07:00
Eli Cohen
40ca1988e0 IPoIB: Add LSO support
For HCAs that support TCP segmentation offload (IB_DEVICE_UD_TSO), set
NETIF_F_TSO and use HW LSO to offload TCP segmentation.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:27 -07:00
Eli Cohen
b846f25aa2 IB/core: Add creation flags to struct ib_qp_init_attr
Add a create_flags member to struct ib_qp_init_attr that will allow a
kernel verbs consumer to create a pass special flags when creating a QP.
Add a flag value for telling low-level drivers that a QP will be used
for IPoIB UD LSO.  The create_flags member will also be useful for XRC
and ehca low-latency QP support.

Since no create_flags handling is implemented yet, add code to all
low-level drivers to return -EINVAL if create_flags is non-zero.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:27 -07:00
Michael Albaugh
d84e0b28d3 IB/ipath: EEPROM support for 7220 devices, robustness improvements, cleanup
Add support for reading newer card's EEPROMs while continuing to support
older EEPROMs.

Also, add support for the temperature sensor if present.

Signed-off-by: Michael Albaugh <Michael.Albaugh@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:27 -07:00
Ralph Campbell
d98b193776 IB/ipath: Use PIO buffer for RC ACKs
This reduces the latency for RC ACKs when a PIO buffer is available.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:27 -07:00
Ralph Campbell
c4b4d16e09 IB/ipath: Make send buffers available for kernel if not allocated to user
A fixed partitioning of send buffers is determined at driver load time
for user processes and kernel use.  Since send buffers are a scarce
resource, it makes sense to allow the kernel to use the buffers if they
are not in use by a user process.

Also, eliminate code duplication for ipath_force_pio_avail_update().

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:26 -07:00
Michael Albaugh
4330e4dad7 IB/ipath: Prevent link-recovery code from negating admin disable
The link can be put in LINKDOWN_DISABLE state either locally or via a
MAD.  However, the link-recovery code will take it out of that state as
a side-effect of attempts to clear SerDes/XGXS issues.

We add a flag to indicate "link is down on purpose, leave it alone."

Signed-off-by: Michael Albaugh <michael.albaugh@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:26 -07:00
Ralph Campbell
8c641d4b5f IB/ipath: Remove some useless (void) casts
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:26 -07:00
Ralph Campbell
928e3e4bb9 IB/ipath: Change the module author
Update the module author to the current email address.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:26 -07:00
Robert P. J. Day
4e96a77440 RDMA/nes: Use more concise list_for_each_entry()
In list iteration code, you normally wouldn't be calling
"container_of()" directly anyway, you'd be invoking "list_entry()".
But you don't even need that here, "list_for_each_entry()" is fine.

Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Acked-by: Glenn Streiff <gstreiff@neteffect.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:26 -07:00
Robert P. J. Day
157de22946 IB: Use shorter list_splice_init() for brevity
Convert list_splice() + INIT_LIST_HEAD() to the equivalent list_splice_init()

Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:26 -07:00
Julia Lawall
10f32065a2 RDMA/iwcm: Test rdma_create_id() for IS_ERR rather than 0
The function rdma_create_id() always returns either a valid pointer or
a value made with ERR_PTR, so its result should be tested with IS_ERR,
not with a test for 0.

The problem was found using the following semantic match.
(http://www.emn.fr/x-info/coccinelle/)

//<smpl>
@a@
expression E, E1;
statement S,S1;
position p;
@@

E = rdma_create_id(...)
... when != E = E1
if@p (E) S else S1

@n@
position a.p;
expression E,E1;
statement S,S1;
@@

E = NULL
... when != E = E1
if@p (E) S else S1

@depends on !n@
expression E;
statement S,S1;
position a.p;
@@

* if@p (E)
  S else S1
//</smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:25 -07:00
Roland Dreier
4d43653263 RDMA/nes: Remove session_id from nes_cm stuff
The session_id members of struct nes_cm_listener and struct
nes_cm_node are write-only, so remove them.  This allows the
session_id member of struct nes_cm_core to be removed as well, since
it is only used to write those other session_id values.

This removes the use of current->tgid (which will be deprecated)
pointed out by Pavel Emelyanov <xemul@openvz.org>.

Acked-by: Glenn Streiff <gstreiff@neteffect.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:25 -07:00
Roland Dreier
782203884e IB/ipath: Fix PCI config write size used to clear linkctrl error bits
In slave_or_pri_blk(), pci_write_config_byte() is used to write a
16-bit quantity to clear linkctrl CRC error bits.  This is clearly a
bug and also causes the warning

    drivers/infiniband/hw/ipath/ipath_iba6110.c: In function 'slave_or_pri_blk':
    drivers/infiniband/hw/ipath/ipath_iba6110.c:849: warning: overflow in implicit constant conversion

Fix this by using pci_write_config_word() instead.

Acked-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:25 -07:00
Ralph Campbell
10a8c3cd01 IB/ipath: Fix sanity checks on QP number of WRs and SGEs
The receive queue number of WRs and SGEs shouldn't be checked if a
SRQ is specified.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:25 -07:00
Ralph Campbell
69bd74c696 IB/ipath: Remove useless comments
Remove useless comment about list removal since locks are held and
the code checks that the QP is on the list before removing it.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:25 -07:00
Dave Olson
72708a0a2b IB/ipath: HW workaround for case where chip can send but not receive
Workaround a QLE7140 problem that in rare cases causes flow control
problems after link recovery by forcing a link retrain after recovery.
A module parameter is provided to control the behavior in case it causes
problems.

Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:25 -07:00
Ralph Campbell
a51a2513a8 IB/ipath: Add code to support multiple link speeds and widths
This patch adds code to get/set portinfo to support multiple link speeds
and widths.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:24 -07:00
John Gregor
58411d1c01 IB/ipath: Head of Line blocking vs forward progress of user apps
There's a conflict between our need to quiesce PSM-based applications
to avoid HoL blocking when the IB link goes down and the apps' desire
to remain running so that their quiescence timout mechanism can keep
running.

The compromise is to STOP the processes for a fixed period of time and
then alternate between CONT and STOP until the link is again active.

If there are poor interactions with subnet manager configuration at a
given site, the interval can be adjusted via a module paramter.

Signed-off-by: John Gregor <john.gregor@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:24 -07:00
Ralph Campbell
6be979d71a IB/ipath: Make debug error message match the constraint that is checked for
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:24 -07:00
Ralph Campbell
c1702be20f IB/ipath: Don't try to handle freeze mode HW errors if diagnostic mode
Don't try to handle freeze mode HW errors if the driver is in diagnostic
mode since some tests can cause errors that shouldn't be processed.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:01:14 -07:00
Arthur Jones
b848882153 IB/ipath: Fix link up LED display
The check for link up was incorrect, thus setting the LED display
inconsistently with the link state.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:01:13 -07:00
Ralph Campbell
8bae0ff259 IB/ipath: Fix error recovery for send buffer status after chip freeze mode
The error recovery code for updating the driver's cached status information
for which send buffers are busy or free wasn't updated for IBA7220.
It should be similar to the initialization code in enable_chip().

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:01:13 -07:00
Ralph Campbell
0349d16620 IB/ipath: Fix byte order of pioavail in handle_errors()
Fix byte order of value assigned to pioavailshadow.  This bug was
detected by sparse endianness warnings.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:01:13 -07:00
Roland Dreier
c263ff65d5 IB/mthca: Avoid integer overflow when allocating huge ICM table
In mthca_alloc_icm_table(), the number of entries to allocate for the
table->icm array is computed by calculating obj_size * nobj and then
dividing by MTHCA_TABLE_CHUNK_SIZE.  If nobj is really large, then
obj_size * nobj may overflow and the division may get the wrong value
(even a negative value).  Fix this by calculating the number of
objects per chunk and then dividing nobj by this value instead.

This patch allows crazy configurations such as loading ib_mthca with
the module parameter num_mtt=33554432 to work properly.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:01:13 -07:00
Roland Dreier
19773539d6 IB/mthca: Avoid integer overflow when dealing with profile size
mthca_make_profile() returns the size in bytes of the HCA context
layout it creates, or a negative value if an error occurs.  However,
the return value is declared as u64 and the memfree initialization
path casts this value to int to test if it is negative.  This makes it
think incorrectly than an error has occurred if the context size
happens to be bigger than 2GB, since this turns into a negative int.

Fix this by having mthca_make_profile() return an s64 and testing
for an error by checking whether this 64-bit value itself is negative.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:01:13 -07:00
Hoang-Nam Nguyen
f4f82994d1 IB/ehca: Remove tgid checking
Pavel Emelyanov <xemul@openvz.org> mentioned in <http://lkml.org/lkml/2008/3/17/131>
that the task_struct->tgid field is about to become deprecated, so the
uses in the ehca driver need to be fixed up.

However, all the uses in ehca are for some object ownership checking
that is not really needed, and anyway is implementing a policy that
should be in common code rather than a low-level driver.  So just
remove all the checks.

Signed-off-by: Hoang-Nam Nguyen <hnguyen@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:01:13 -07:00
David Dillow
1e89a1946c IB/srp: Enforce protocol limit on srp_sg_tablesize
The current SRP initiator will allow unlimited s/g entries in the
indirect descriptors lists, but the entry count field in the SRP_CMD
request is 8 bits, so setting srp_sg_tablesize too large will open the
possibility of wrapping the count and generating invalid requests.

Clamp srp_sg_tablesize to the protocol limits to prevent surprises.

Reported by Martin W. Schlining III <mschlining@datadirectnet.com>.

Signed-off-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:01:12 -07:00
Dave Olson
826d801009 IB/ipath: Enable 4KB MTU
Enable use of 4KB MTU.  Since the driver uses more pinned memory for
receive buffers when the 4KB MTU is enabled, whether or not the fabric
supports that MTU, add a "mtu4096" module parameter that can be used to
limit the MTU to 2KB when it is known that 4KB MTUs can't be used
anyway.

Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:01:12 -07:00
Dave Olson
5d1ce03dd3 IB/ipath: Shared context code needs to be sure device is usable
The code was checking if units are present, but not that present units
were usable (link up, etc.)

Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:01:12 -07:00
Arthur Jones
6ca2abf4c0 IB/ipath: Provide I/O bus speeds for diagnostic purposes
Modern I/O buses like PCIe and HT can be configured for multiple speeds
and widths.  When an ipath HCA seems to have lower than expected
performance, it is very useful to be able to display what the driver
thinks the bus speed is.

Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:01:12 -07:00
Dave Olson
f2ceb4929a IB/ipath: Make some constants chip-specific, related cleanup
This patch makes some constants chip-specific, and makes some related
changes to prepare for supporting another HCA.

Signed-off-by: Dave Olson <dave.olson@qlogic.com
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:01:12 -07:00
Arthur Jones
3dd59e226e IB/ipath: Misc sparse warning cleanup
Recent sparse versions and kernel cleanups knock down the false positive
rate of the ipath driver code to a point where having it be sparse clean
is worthwhile. Here we fixup the sparse warnings.  Some of these warnings
(and the impetus to run sparse again) are due to work by Roland Dreier.

Signed-off-by: Arthur Jones <arthur.jones@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:01:11 -07:00
Eli Cohen
680b575f6d IB/mthca: Add IPoIB checksum offload support
Arbel and Sinai devices support checksum generation and verification
of TCP and UDP packets for UD IPoIB messages.  This patch checks if
the HCA supports this and sets the IB_DEVICE_UD_IP_CSUM capability
flag if it does.  It implements support for handling the IB_SEND_IP_CSUM
send flag and setting the csum_ok field in receive work completions.

Signed-off-by: Eli Cohen <eli@mellnaox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:01:11 -07:00
Eli Cohen
8ff095ec4b IB/mlx4: Add IPoIB checksum offload support
ConnectX devices support checksum generation and verification of TCP
and UDP packets for UD IPoIB messages.  This patch checks if the HCA
supports this and sets the IB_DEVICE_UD_IP_CSUM capability flag if it
does.  It implements support for handling the IB_SEND_IP_CSUM send
flag and setting the csum_ok field in receive work completions.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Ali Ayub <ali@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:01:10 -07:00
Eli Cohen
6046136c74 IPoIB: Use checksum offload support if available
For HCAs that support checksum offload (ie that set IB_DEVICE_UD_IP_CSUM
in the device capabilities flags), have IPoIB set NETIF_F_IP_CSUM and
use the HCA to generate and verify IP checksums.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:01:10 -07:00
Harvey Harrison
3371836383 IB: Replace remaining __FUNCTION__ occurrences with __func__
__FUNCTION__ is gcc-specific, use __func__ instead.

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:01:10 -07:00
Roland Dreier
e8e91f6b4d IB/ehca: Make symbols used only in a single source file static
Allow the compiler to optimize better and generate smaller code:

add/remove: 0/6 grow/shrink: 2/0 up/down: 1528/-1864 (-336)
function                                     old     new   delta
.ehca_set_pagebuf                           1344    2172    +828
.ehca_probe                                 2312    3012    +700
ehca_set_pagebuf_phys                         24       -     -24
ehca_set_pagebuf_fmr                          24       -     -24
ehca_init_device                              24       -     -24
.ehca_set_pagebuf_fmr                        480       -    -480
.ehca_set_pagebuf_phys                       512       -    -512
.ehca_init_device                            800       -    -800

Also this fixes warnings like:

    drivers/infiniband/hw/ehca/ehca_mrmw.c:2015:5: warning: symbol 'ehca_set_pagebuf_fmr' was not declared. Should it be static?

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:01:10 -07:00
Roland Dreier
1a855fbfb6 RDMA/nes: Make symbols used only in a single source file static
Avoid namespace pollution and allow the compiler to optimize better.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:01:09 -07:00
Roland Dreier
71e0957c62 RDMA/nes: Use proper format and cast to print dma_addr_t
On some platforms, eg sparc64, dma_addr_t is not the same size as a
pointer, so printing dma_addr_t values by casting to void * and using
a %p format generates warnings.  Fix this by casting to unsigned long
and using %lx instead.  This fixes the warnings:

    drivers/infiniband/hw/nes/nes_verbs.c: In function 'nes_setup_virt_qp':
    drivers/infiniband/hw/nes/nes_verbs.c:1047: warning: cast to pointer from integer of different size
    drivers/infiniband/hw/nes/nes_verbs.c:1078: warning: cast to pointer from integer of different size
    drivers/infiniband/hw/nes/nes_verbs.c:1078: warning: cast to pointer from integer of different size
    drivers/infiniband/hw/nes/nes_verbs.c: In function 'nes_reg_user_mr':
    drivers/infiniband/hw/nes/nes_verbs.c:2657: warning: cast to pointer from integer of different size

Reported by Andrew Morton <akpm@linux-foundation.org>.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:01:09 -07:00
Roland Dreier
9d84ab9c7e RDMA/nes: Remove unused nes_netdev_exit() function
nes_netdev_exit() has no callers, so delete it.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:01:09 -07:00
Roland Dreier
5bd8341ce2 RDMA/nes: Remove redundant NULL check in nes_unregister_ofa_device()
nes_unregister_ofa_device() dereferences the nesibdev pointer before
testing if it's NULL.  Also, the test is doubly redundant because the
only caller of nes_unregister_ofa_device() is nes_destroy_ofa_device(),
which already tests if nesibdev is NULL.  Remove the unnecessary test.

This was spotted by the Coverity checker (CID 2190).

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:01:09 -07:00
Roland Dreier
a7dab9e887 IB/uverbs: Use alloc_file() instead of get_empty_filp()
Christoph Hellwig wants to unexport get_empty_filp(), which is an ugly
internal interface.  Change the modular user in ib_uverbs_alloc_event_file()
to use the better alloc_file() interface; this makes the code cleaner too.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:01:08 -07:00
Roland Dreier
1ae5c187ac IB/uverbs: Don't store struct file * for event files
The file member of struct ib_uverbs_event_file was only used to keep
track of whether the file had been closed or not.  The only thing we
ever did with the value was check if it was NULL or not.  Simplify the
code and get rid of the need to keep track of the struct file * we
allocate by replacing the file member with an is_closed member.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:01:08 -07:00
Roland Dreier
37608eea86 mlx4_core: Fix confusion between mlx4_event and mlx4_dev_event enums
The struct mlx4_interface.event() method was supposed to get an enum
mlx4_dev_event, but the driver code was actually passing in the
hardware enum mlx4_event values.  Fix up the callers of
mlx4_dispatch_event() so that they pass in the right type of value,
and fix up the event method in mlx4_ib so that it can handle the enum
mlx4_dev_event values.

This eliminates the need for the subtype parameter to the event
method, so remove it.

This also fixes the sparse warning

    drivers/net/mlx4/intf.c:127:48: warning: mixing different enum types
    drivers/net/mlx4/intf.c:127:48:     int enum mlx4_event  versus
    drivers/net/mlx4/intf.c:127:48:     int enum mlx4_dev_event

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:01:08 -07:00
Roland Dreier
26c4fc26d0 RDMA/amso1100: Endian annotate mqsq allocator
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:01:08 -07:00
Roland Dreier
dc544bc9cb RDMA/amso1100: Start of endianness annotation
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
2008-04-16 21:01:08 -07:00
Roland Dreier
d23b9d8ff2 RDMA/nes: Delete unused variables
None of the cqp_reqs_XXX counters were ever used anywhere, and neither
was the nics_per_function variable.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:01:08 -07:00
Roland Dreier
b30db1c186 RDMA/nes: Trivial endianness annotations
Fix a couple of htonl() that should really be ntohl().

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:01:07 -07:00
Roland Dreier
9cda779cc2 RDMA/ucma: Endian annotation
Add __force cast of node_guid to __u64, since we are sticking it into a
structure whose definition is shared with userspace.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:01:07 -07:00
Roland Dreier
a88f488857 IB/cm: Endianness annotations
Mostly update the RB tree comparisons to force __be types to normal
integers, but the change to cm_format_sidr_req() is a real fix:
param->path->pkey is already __be16.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
Acked-by: Sean Hefty <sean.hefty@intel.com>
2008-04-16 21:01:07 -07:00
Roland Dreier
d2ae16d576 IB/mlx4: Endianness annotations
Trivial fixes to stamp_send_wqe().

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:01:07 -07:00
Roland Dreier
6358ae25fd IB/ipath: Fix sparse warning about shadowed symbol
Fix

    drivers/infiniband/hw/ipath/ipath_init_chip.c:526:10: warning: symbol 'val' shadows an earlier one
    drivers/infiniband/hw/ipath/ipath_init_chip.c:473:6: originally declared here

by giving the second val a different name.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
Acked-by: Arthur Jones <arthur.jones@qlogic.com>
2008-04-16 21:01:07 -07:00
Arthur Jones
6ef6aee2f0 IB/ipath: Fix sparse warning about pointer signedness
There's no reason for the third parameter of ipath_count_units() to be
a u32 *, so change it to be an int * instead.  This fixes the sparse
warning:

    drivers/infiniband/hw/ipath/ipath_file_ops.c:1654:47: warning: incorrect type in argument 3 (different signedness)
    drivers/infiniband/hw/ipath/ipath_file_ops.c:1654:47:    expected unsigned int [usertype] *maxportsp
    drivers/infiniband/hw/ipath/ipath_file_ops.c:1654:47:    got int *<noident>

Signed-off-by: Arthur Jones <arthur.jones@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:01:06 -07:00
Roland Dreier
edba846af9 RDMA/cxgb3: IDR IDs are signed
Fix sparse warnings about pointer signedness by using a signed int when
calling idr_get_new_above().

Signed-off-by: Roland Dreier <rolandd@cisco.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
2008-04-16 21:01:06 -07:00
Roland Dreier
4b29043921 RDMA/amso1100: Don't use 0UL as a NULL pointer
Write tests for NULL pointers as

	if (!ptr)

instead of

	if (ptr == 0UL)

to fix sparse warnings.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
2008-04-16 21:01:06 -07:00
Roland Dreier
5d5e815db9 IB/mlx4: Convert "if(foo)" to "if (foo)"
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:01:04 -07:00
Roland Dreier
b39993936d IB/mthca: Formatting cleanups
Fix a few whitespace and other coding style problems.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:01:03 -07:00
Al Viro
1b90c137cc trivial endianness annotations: infiniband core
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-03-30 14:20:24 -07:00
Roland Dreier
1f71f50342 RDMA/cxgb3: Program hardware IRD with correct value
Because of a typo in iwch_accept_cr(), the cxgb3 connection handling
code programs the hardware IRD (incoming RDMA read queue depth) with
the value that is passed in for the ORD (outgoing RDMA read queue
depth).  In particular this means that if an application passes in IRD
> 0 and ORD = 0 (which is a completely sane and valid thing to do for
an app that expects only incoming RDMA read requests), then the
hardware will end up programmed with IRD = 0 and the app will fail in
a mysterious way.

Fix this by using "ep->ird" instead of "ep->ord" in the intended place.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-03-28 10:45:32 -07:00
Chien Tung
f2b2b59b93 RDMA/nes: Fix MSS calculation on RDMA path
Fix the calculation of the MSS for RDMA connections: we need to
allow space in frames for a VLAN tag too.

Signed-off-by: Chien Tung <ctung@neteffect.com>
Signed-off-by: Glenn Streiff <gstreiff@neteffect.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-03-21 13:59:28 -07:00
Roland Dreier
10313cbb92 IPoIB: Allocate priv->tx_ring with vmalloc()
Commit 7143740d ("IPoIB: Add send gather support") made struct
ipoib_tx_buf significantly larger, since the mapping member changed
from a single u64 to an array with MAX_SKB_FRAGS + 1 entries.  This
means that allocating tx_rings with kzalloc() may fail because there
is not enough contiguous memory for the new, much bigger size.  Fix
this regression by allocating the rings with vmalloc() instead.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-03-12 07:51:03 -07:00
Roland Dreier
4200406b8f IPoIB/cm: Set tx_wr.num_sge in connected mode post_send()
Commit 7143740d ("IPoIB: Add send gather support") made it possible
for tx_wr.num_sge to be != 1 -- this happens if send gather support is
enabled.  However, the code in the connected mode post_send() function
assumes the old invariant, namely that tx_wr.num_sge is always 1.  Fix
this by explicitly setting tx_wr.num_sge to 1 in the CM post_send().

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-03-11 18:35:20 -07:00
Or Gerlitz
b3e2749bf3 IPoIB: Don't drop multicast sends when they can be queued
When set_multicast_list() is called the multicast task is restarted
and the IPOIB_MCAST_STARTED bit is cleared.  As a result for some
window of time, multicast packets are not transmitted nor queued but
rather dropped by ipoib_mcast_send().  These dropped packets are
painful in two cases:

 - bonding fail-over which both calls set_multicast_list() on the new
   active slave and sends Gratuitous ARP through that slave.

 - IP_DROP_MEMBERSHIP code which both calls set_multicast_list() on the
   device and issues IGMP leave.

In both these cases, depending on the scheduling of the IPoIB
multicast task, the packets would be dropped.  As a result, in the
bonding case, the failover would not be detected by the peers until
their neighbour is renewed the neighbour (which takes a few tens of
seconds).  In the IGMP case, the IP router doesn't get an IGMP leave
and would only learn on that from further probes on the group (also a
delay of at least a few tens of seconds).

Fix this by allowing transmission (or queuing) depending on the
IPOIB_FLAG_OPER_UP flag instead of the IPOIB_MCAST_STARTED flag.

Signed-off-by: Olga Shern <olgas@voltaire.com>
Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-03-11 14:12:03 -07:00
Patrick Marchand Latifi
450bb3875f IB/ipath: Reset the retry counter for RDMA_READ_RESPONSE_MIDDLE packets
Reset the retry counter when we get a good RDMA_READ_RESPONSE_MIDDLE
packet.  This fix will prevent the requester from reporting a retry
exceeded error too early.

Signed-off-by: Patrick Marchand Latifi <patrick.latifi@qlogic.com>
2008-03-11 14:04:35 -07:00
Patrick Marchand Latifi
2a049e514b IB/ipath: Fix error completion put on send CQ instead of recv CQ
A work completion entry could be placed on the wrong completion
queue when an RC QP is placed in the error state.

Signed-off-by: Patrick Marchand Latifi <patrick.latifi@qlogic.com>
Acked-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-03-11 14:03:54 -07:00
Patrick Marchand Latifi
4cd5060cf7 IB/ipath: Fix RC QP initialization
This patch fixes the initialization of RC QPs, since we would rely on
the queue pair type (ibqp->qp_type) being set, but this field is only
initialized when we return from ipath_create_qp (it is initialized by
the user-level verbs library).

The fix is to not depend on this field to initialize the send and
the receive state of the RC QP.

Signed-off-by: Patrick Marchand Latifi <patrick.latifi@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-03-11 14:02:32 -07:00
Patrick Marchand Latifi
87d5aed85b IB/ipath: Fix potentially wrong RNR retry counter returned in ipath_query_qp()
There can be a case where the requester's rnr retry counter
(s_rnr_retry) is less than the number of rnr retries allowed per QP
(s_rnr_retry_cnt).  This can happen if the s_rnr_retry counter is being
decremented and an ipath_query_qp call is issued during that time frame.
The fix is to always return the number of rnr retries allowed per QP
instead of the requester's rnr counter.

Found by code review.

Signed-off-by: Patrick Marchand Latifi <patrick.latifi@qlogic.com>
Acked-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-03-11 14:01:14 -07:00
Ralph Campbell
140277e9a7 IB/ipath: Fix IB compliance problems with link state vs physical state
Subnet manager SetPortinfo messages distingush between changing the link
state (DOWN, ARM, ACTIVE) and the link physical state (POLL, SLEEP,
DISABLED).  These are somewhat independent commands and affect when link
width and speed changes take effect.  Without this patch, a link DOWN
physical state NOP command was causing the link width and speed settings
to take effect which should only happen when the link physical state is
goes down (either by a SMP or some link physical error like link errors
exceeding the threshold).

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-03-11 13:58:22 -07:00
Steve Wise
d7c1fbd660 RDMA/iwcm: Don't access a cm_id after dropping reference
cm_work_handler() can access cm_id_priv after it drops its reference
by calling iwch_deref_id(), which might cause it to be freed.  The fix
is to look at whether IWCM_F_CALLBACK_DESTROY is set _before_ dropping
the reference.  Then if it was set, free the cm_id on this thread.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-03-10 21:22:22 -07:00
Arne Redlich
d33ed425c6 IB/iser: Handle iser_device allocation error gracefully
"iser_device" allocation failure is "handled" with a BUG_ON() right
before dereferencing the NULL-pointer - fix this!

Signed-off-by: Arne Redlich <arne.redlich@xiranet.com>
Signed-off-by: Erez Zilber <erezz@voltaire.com>
2008-03-10 21:17:51 -07:00
Arne Redlich
9a378270c0 IB/iser: Fix list iteration bug
The iteration through the list of "iser_device"s during device
lookup/creation is broken -- it might result in an infinite loop if
more than one HCA is used with iSER.  Fix this by using
list_for_each_entry() instead of the open-coded flawed list iteration
code.

Signed-off-by: Arne Redlich <arne.redlich@xiranet.com>
Signed-off-by: Erez Zilber <erezz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-03-10 21:15:49 -07:00
Jon Mason
4fa45725df RDMA/cxgb3: Fix iwch_create_cq() off-by-one error
The cxbg3 driver is unnecessarily decreasing the number of CQ entries by
one when creating a CQ.  This will cause the CQ not to have as many
entries as requested by the user if the user requests a power of 2 size.

Signed-off-by: Jon Mason <jon@opengridcomputing.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-03-09 13:54:12 -07:00
Jon Mason
1bab74e691 RDMA/cxgb3: Return correct max_inline_data when creating a QP
Set cap.max_inline_data to the actual max inline data that the adapter
support, so that userspace apps see the right value returned.

Signed-off-by: Jon Mason <jon@opengridcomputing.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-29 13:53:18 -08:00
Pete Wyckoff
331552925d IB/fmr_pool: Flush all dirty FMRs from ib_fmr_pool_flush()
Commit a3cd7d90 ("IB/fmr_pool: ib_fmr_pool_flush() should flush all
dirty FMRs") caused a regression for iSER and was reverted in
e5507736.

This change attempts to redo the original patch so that all used FMR
entries are flushed when ib_flush_fmr_pool() is called without
affecting the normal FMR pool cleaning thread.  Simply move used
entries from the clean list onto the dirty list in ib_flush_fmr_pool()
before letting the cleanup thread do its job.

Signed-off-by: Pete Wyckoff <pw@osc.edu>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-29 13:31:48 -08:00
Pete Wyckoff
35fb5340e3 Revert "IB/fmr_pool: ib_fmr_pool_flush() should flush all dirty FMRs"
This reverts commit a3cd7d9070.

The original commit breaks iSER reliably, making it complain:

    iser: iser_reg_page_vec:ib_fmr_pool_map_phys failed: -11

The FMR cleanup thread runs ib_fmr_batch_release() as dirty entries
build up.  This commit causes clean but used FMR entries also to be
purged.  During that process, another thread can see that there are no
free FMRs and fail, even though there should always have been enough
available.

Signed-off-by: Pete Wyckoff <pw@osc.edu>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-29 13:29:19 -08:00
Sean Hefty
84ba284cd7 IB/cm: Flush workqueue when removing device
When a CM MAD is received, it is queued to a CM workqueue for
processing.  The queued work item references the port and device on
which the MAD was received.  If that device is removed from the system
before the work item can execute, the work item will reference freed
memory.

To fix this, flush the workqueue after unregistering to receive MAD,
and before the device is be freed.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-29 13:27:52 -08:00
John Lacombe
4b1cc7e7ca RDMA/nes: Fix interrupt moderation low threshold
Interrupt moderation low threshold value was incorrectly triggering,
indicating that the threshold should be lowered.

The impact was the timer was likely to become 40usecs and get stuck
there.  The biggest side effect was too many interrupts and nonoptimal
performance.

Signed-off-by: John Lacombe <jlacombe@neteffect.com>
Signed-off-by: Glenn Streiff <gstreiff@neteffect.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-26 16:24:29 -08:00
Faisal Latif
30da7cff87 RDMA/nes: Fix CRC endianness for RDMA connection establishment on big-endian
With commit ef19454b ("[LIB] crc32c: Keep intermediate crc state in
cpu order"), the behavior of crc32c changes on big-endian platforms.

Our algorithm expects the previous behavior; otherwise we have RDMA
connection establishment failure on big-endian platforms like powerpc.
Apply cpu_to_le32() to value returned by crc32c() to get the previous
behavior.

Signed-off-by: Faisal Latif <flatif@neteffect.com>
Signed-off-by: Glenn Streiff <gstreiff@neteffect.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-26 16:24:29 -08:00
Faisal Latif
a2e9c384ce RDMA/nes: Fix use-after-free in mini_cm_dec_refcnt_listen()
Fix use-after-free spotted by Coverity checker flagged by Adrian Bunk.

Signed-off-by: Faisal Latif <flatif@neteffect.com>
Signed-off-by: Glenn Streiff <gstreiff@neteffect.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-26 16:24:29 -08:00
Glenn Streiff
f84fba6f96 RDMA/nes: Fix use-after-free in nes_create_cq()
Just delete the debugging statement so we don't use cqp_request after
freeing it.  Adrian Bunk flagged this use-after-free issue spotted by
the Coverity checker.

Signed-off-by: Glenn Streiff <gstreiff@neteffect.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-26 16:24:29 -08:00
Adrian Bunk
a4435febd4 RDMA/nes: Fix a check-after-use in nes_probe()
Fix a check-after-use spotted by the Coverity checker.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Glenn Streiff <gstreiff@neteffect.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-26 16:24:29 -08:00
Adrian Bunk
ed0ba33d64 RDMA/nes: Fix a memory leak in schedule_nes_timer()
Fix a memory leak spotted by the Coverity checker.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Glenn Streiff <gstreiff@neteffect.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-26 16:24:27 -08:00
Adrian Bunk
65b07ec293 RDMA/nes: Fix off-by-one
Fix an off-by-one spotted by the Coverity checker.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Glenn Streiff <gstreiff@neteffect.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-25 16:00:30 -08:00
Chien Tung
9300c0c067 RDMA/nes: Resurrect error path dead code
Adrian Bunk pointed out that a Coverity scan found some apparently
dead code in nes_verbs.c that really shouldn't have been dead.

The function nes_create_cq() was missing the assignment

	err = 1;

just prior to an iteration that conditionally set err = 0 if a PBL was
found for a given virtual CQ.  I also noticed we should have been
returning -EFAULT on a couple related error paths.

Signed-off-by: Chien Tung <ctung@neteffect.com>
Signed-off-by: Glenn Streiff <gstreiff@neteffect.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-25 16:00:30 -08:00
Bryan Rosenburg
82d416fffb RDMA/cxgb3: Fix shift calc in build_phys_page_list() for 1-entry page lists
A single entry (addr 0x10001000, size 0x2000) will get converted to
page address 0x10000000 with a page size of 0x4000.  The code as it
stands doesn't address the single buffer case, but in fact it allows
the subsequent single-buffer special case to be eliminated entirely.
Because the mask now includes the (page adjusted) starting and ending
addresses, the general case works for the single buffer case as well.

Signed-off-by: Bryan Rosenburg <rosnbrg@us.ibm.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-25 16:00:29 -08:00
Roland Dreier
b7f9c112a5 IB/mthca: Free correct MPT on error exit from mthca_fmr_alloc()
When mthca_fmr_alloc() returns an error, it should free the MPT at the
index key, not mr->ibmr.lkey, since the lkey has been mangled by
hw_index_to_key() and no longer is the real index.  This bug causes
corruption of the MPT table free bitmap when mthca_fmr_alloc() fails.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-19 10:42:50 -08:00
Pradeep Satyanarayana
ec229e5e81 IPoIB/cm: Fix ipoib_cm_dev_stop() cleanup when drain times out
Commit efcd9971 ("IPoIB/cm: Factor out ipoib_cm_free_rx_reap_list()")
introduced a bug in ipoib_cm_dev_stop() when the receive drain times
out.  In that case, the function moves all the pending rx stuff into a
private list but then calls ipoib_cm_free_rx_reap_list(), which
handles a different list.

Fix this by moving everything to the rx_reap_list that will actually
get freed up.

This fixes <https://bugs.openfabrics.org/show_bug.cgi?id=906>.

Signed-off-by: Pradeep Satyanarayana <pradeeps@linux.vnet.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-19 10:25:11 -08:00
Roland Dreier
51af33e8e4 RDMA/nes: Fix possible array overrun
In nes_create_qp(), the test

	if (nesqp->mmap_sq_db_index > NES_MAX_USER_WQ_REGIONS) {

is used to error out if the db_index is too large; however, if the
test doesn't trigger, then the index is used as

	nes_ucontext->mmap_nesqp[nesqp->mmap_sq_db_index] = nesqp;

and mmap_nesqp is declared as

	struct nes_qp      *mmap_nesqp[NES_MAX_USER_WQ_REGIONS];

which leads to an array overrun if the index is exactly equal to
NES_MAX_USER_WQ_REGIONS.  Fix this by bailing out if the index is
greater than or equal to NES_MAX_USER_WQ_REGIONS.

This was spotted by the Coverity checker (CID 2162).

Acked-by: Glenn Streiff <gstreiff@neteffect.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-18 10:33:59 -08:00
Chien Tung
edd2fd643c RDMA/nes: Fix VLAN support
We need to account for the VLAN header size in nes_netdev_change_mtu()
and nes_netdev_init().  Also, add spin lock/unlock during VLAN RX
registration so only one process can assign VLAN group for a given
interface at a time.

Signed-off-by: Chien Tung <ctung@neteffect.com>
Signed-off-by: Glenn Streiff <gstreiff@neteffect.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-16 21:16:33 -08:00
Glenn Streiff
11e0704b7e RDMA/nes: Fix MAC interrupt erroneously masked on ifdown
Only mask out MAC interrupt if necessary and re-enable on ifup.  There
could be multiple netdevs going through the same MAC.  MAC interrupts
should not be masked off until the last netdev is downed.

Signed-off-by: Chien Tung <ctung@neteffect.com>
Signed-off-by: Glenn Streiff <gstreiff@neteffect.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-15 15:05:05 -08:00
Li Zefan
c7482b81c8 IB: Fix return value in ib_device_register_sysfs()
If kobject_create_and_add() fails and returns NULL, the current code
in ib_device_register_sysfs() does not set ret and hence returns 0.
Set ret to -ENOMEM for this failure, so that the caller knows that
ib_device_register_sysfs() actually failed.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-15 15:05:05 -08:00
Sean Hefty
ead595aeb0 RDMA/cma: Do not issue MRA if user rejects connection request
There's an undesirable interaction with issuing MRA requests to
increase connection timeouts and the listen backlog.

When the rdma_cm receives a connection request, it queues an MRA with
the ib_cm.  (The ib_cm will send an MRA if it receives a duplicate
REQ.)  The rdma_cm will then create a new rdma_cm_id and give that to
the user, which in this case is the rdma_user_cm.

If the listen backlog maintained in the rdma_user_cm is full, it
destroys the rdma_cm_id, which in turns destroys the ib_cm_id.  The
ib_cm_id generates a REJ because the state of the ib_cm_id has changed
to MRA sent, versus REQ received.  When the backlog is full, we just
want to drop the REQ so that it is retried later.

Fix this by deferring queuing the MRA until after the user of the
rdma_cm has examined the connection request.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-14 15:30:41 -08:00
Jack Morgenstein
e6028c0e00 IB/mlx4: mlx4_ib_fmr_alloc() should call mlx4_fmr_enable()
Currently mlx4_ib_fmr_alloc() calls mlx4_mr_enable() instead of
mlx4_fmr_enable().  The two functions are equivalent at the moment, but 
this is not really correct (and the change is needed to fix a bug).

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-14 10:39:36 -08:00
Eli Cohen
a9d1884925 IPoIB: Remove unused struct ipoib_cm_tx.ibwc member
struct ipoib_cm_tx.ibwc is unused since commit 1b524963 ("IPoIB/cm:
Use common CQ for CM send completions"), so remove it.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
2008-02-14 10:30:50 -08:00
Jack Morgenstein
167c42655c IPoIB: On P_Key change event, reset state properly
In P_Key event handling, if the old P_Key is no longer available, the
driver must call ipoib_ib_dev_stop() -- just as it does when the P_Key
is still available (see procedure __ipoib_ib_dev_flush()).

When a P_Key becomes available, the driver will perform ipoib_open(),
which assumes that the QP is in RESET, the cm_id has been
destroyed/deleted, etc.  If ipoib_ib_dev_stop() is not called as
described above, then these assumptions will be false, and the attempt
to bring the interface up will fail.

Found by Mellanox QA.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-14 10:15:06 -08:00
Marcin Slusarz
5163dc1a64 IB/mthca: Convert to use be16_add_cpu()
replace:

	big_endian_variable = cpu_to_beX(beX_to_cpu(big_endian_variable) +
						expression_in_cpu_byteorder);

with:

	beX_add_cpu(&big_endian_variable, expression_in_cpu_byteorder);

Generated with a semantic patch.

Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-13 07:47:47 -08:00
Steve Wise
8704e9a879 RDMA/cxgb3: Fail loopback connections
The cxgb3 HW and driver don't support loopback RDMA connections.  So
fail any connection attempt where the destination address is local.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-13 07:47:42 -08:00
Roland Dreier
7c7a9bccd2 IB/cm: Fix infiniband_cm class kobject ref counting
Commit 9af57b7a ("IB/cm: Add basic performance counters") introduced a
bug in how the reference count for cm_class.subsys.kobj was handled:
the path that released a device did a kobject_put() on that kobject, but
there was no kobject_get() in the path the handles adding a device.  So
the reference count ended up too low, which leads to bad things.  Fix up
and simplify the reference counting to avoid this.

(Actually, I introduced the bug when fixing the patch up to match some
of Greg's kobject changes, but who's counting)

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-12 14:38:27 -08:00
Roland Dreier
ab64b96067 IB/cm: Remove debug printk()s that snuck upstream
Pesky little devils, sneaking around...

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-12 14:38:27 -08:00
Roland Dreier
fe174357eb IB/mthca: Add missing sg_init_table() in mthca_map_user_db()
Usually harmless, since the scatterlist is always hard-coded to a length
of 1, but it triggers a BUG() if CONFIG_DEBUG_SG=y, so we better fix it.
This fixes <http://bugzilla.kernel.org/show_bug.cgi?id=9934>.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-12 14:38:22 -08:00
Eli Cohen
7143740d26 IPoIB: Add send gather support
This patch acts as a preparation for using checksum offload for IB
devices capable of inserting/verifying checksum in IP packets.  The
patch does not actaully turn on NETIF_F_SG - we defer that to the
patches adding checksum offload capabilities.

We only add support for send gathers for datagram mode, since existing
HW does not support checksum offload on connected QPs.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-08 14:32:37 -08:00
Eli Cohen
eb14032f9e IPoIB: Add high DMA feature flag
All current InfiniBand devices can handle all DMA addresses, and it's
hard to imagine anyone would be silly enough to build a new device
that couldn't.  Therefore, enable the NETIF_F_HIGHDMA feature for IPoIB.

This has no effect for no, but is needed when we enable gather/scatter
support and checksum stateless offloads.

Signed-off-by: Eli Cohen <eli@mellnaox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-08 13:39:26 -08:00
Jack Morgenstein
ea54b10c77 IB/mlx4: Use multiple WQ blocks to post smaller send WQEs
ConnectX HCA supports shrinking WQEs, so that a single work request
can be made of multiple units of wqe_shift.  This way, WRs can differ
in size, and do not have to be a power of 2 in size, saving memory and
speeding up send WR posting.  Unfortunately, if we do this then the
wqe_index field in CQEs can't be used to look up the WR ID anymore, so
our implementation does this only if selective signaling is off.

Further, on 32-bit platforms, we can't use vmap() to make the QP
buffer virtually contigious. Thus we have to use constant-sized WRs to
make sure a WR is always fully within a single page-sized chunk.

Finally, we use WRs with the NOP opcode to avoid wrapping around the
queue buffer in the middle of posting a WR, and we set the
NoErrorCompletion bit to avoid getting completions with error for NOP
WRs.  However, NEC is only supported starting with firmware 2.2.232,
so we use constant-sized WRs for older firmware.  And, since MLX QPs
only support SEND, we use constant-sized WRs in this case.

When stamping during NOP posting, do stamping following setting of the
NOP WQE valid bit.

Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-08 13:30:02 -08:00
Roland Dreier
1c69fc2a90 IB/mlx4: Consolidate code to get an entry from a struct mlx4_buf
We use struct mlx4_buf for kernel QP, CQ and SRQ buffers, and the code
to look up an entry is duplicated in get_cqe_from_buf() and the QP and
SRQ versions of get_wqe().  Factor this out into mlx4_buf_offset().

This will also make it easier to switch over to using vmap() for buffers.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-06 21:07:54 -08:00
Glenn Streiff
3c2d774cad RDMA/nes: Add a driver for NetEffect RNICs
Add a standard NIC and RDMA/iWARP driver for NetEffect 1/10Gb ethernet adapters.

Signed-off-by: Glenn Streiff <gstreiff@neteffect.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-04 20:20:45 -08:00
Olaf Kirch
2c78853472 IB/mthca: Return proper error codes from mthca_fmr_alloc()
If the allocation of the MTT or the mailbox failed, mthca_fmr_alloc()
would return 0 (success) no matter what. This leads to crashes a
little down the road, when we try to dereference eg mr->mtt, which was
really ERR_PTR(-Ewhatever).

Signed-off-by: Olaf Kirch <olaf.kirch@oracle.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-04 20:20:44 -08:00
Roland Dreier
f33afc26dc IB: Avoid marking __devinitdata as const
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-04 20:20:44 -08:00
Roland Dreier
68f3948dab IB/mlx4: Actually print out the driver version
The string mlx4_ib_version was defined, but never used.  Print out the
version once when the first device is initialized.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-04 20:20:44 -08:00
Eli Cohen
1d368c5465 IB/ib_mthca: Pre-link receive WQEs in Tavor mode
We have recently discovered that Tavor mode requires each WQE in a
posted list of receive WQEs to have a valid NDA field at all times.
This requirement holds true for regular QPs as well as for SRQs.  This
patch prelinks the receive queue in a regular QP and keeps the free
list in SRQ always properly linked.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Reviewed-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-04 20:20:44 -08:00
Eli Cohen
1203c42e7b IB/mthca: Remove checks for srq->first_free < 0
The SRQ receive posting functions make sure that srq->first_free never
becomes negative, so we can remove tests of whether it is negative.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-04 20:20:44 -08:00
Or Gerlitz
1d96354e61 IB/fmr_pool: Allocate page list for pool FMRs only when caching enabled
Allocate memory for the page_list field of struct ib_pool_fmr only
when caching is enabled for the FMR pool, since the field is not used
otherwise.  This can save significant amounts of memory for large
pools with caching turned off.

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-04 20:20:44 -08:00
David Dillow
9fe4bcf45e IB/srp: Retry stale connections
When a host just goes away (crash, power loss, etc.) without tearing
down its IB connections, it can get stale connection errors when it
tries to reconnect to targets upon rebooting.  Retrying the connection
a few times will prevent sysadmins from playing the "which disk(s)
went missing?" game.

This would have made things slightly quicker when tracking down some
of the recent bugs, but it also helps quite a bit when you've got a
large number of targets hanging off a wedged server.

Signed-off-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-04 20:20:43 -08:00
Jack Morgenstein
893da75956 mlx4_core: Don't read reserved fields in mlx4_QUERY_ADAPTER()
The firmware QUERY_ADAPTER command does not return vendor_id,
device_id, and revision_id; eliminate these fields from the query.

Initialize the rev_id field of the mlx4 device via init_node_data (MAD
IFC query), as is done in the query_device verb implementation.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-04 20:20:43 -08:00
Jack Morgenstein
6ccef1de2c IB/mthca: Don't read reserved fields in mthca_QUERY_ADAPTER()
For memfree devices, the firmware QUERY_ADAPTER command does not
return vendor_id, device_id, and revision_id; do not return these
fields in the QUERY_ADAPTER function for memfree devices.

Instead, for memfree devices, initialize the rev_id field of the mthca
device via init_node_data (MAD IFC query), as is done in the
query_device verb implementation.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-04 20:20:43 -08:00
Or Gerlitz
7bc531dd88 IPoIB: Remove a misleading debug print
Commit 732a2170 ("IB/ipoib: Bound the net device to the ipoib_neigh
structue") left a misleading debug print (n->dev would be a bond
device only if boding is used).  Clean it up.

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-04 20:20:43 -08:00
Or Gerlitz
bafff97417 IPoIB: Handle bonding failover race for connected neighbours too
Move up the code that checks for a situation where the remote GID
stored in the ipoib_neigh is different than the one present in the
neighbour (handle gratuitous ARP) or that a bonding fail over has
happened but the neighbour still has a pointer to an ipoib_neigh
created by a different device than the current slave.  This will cause
the driver to apply the check also for connected mode neighbours.

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-04 20:20:43 -08:00
Roland Dreier
0d89fe2c0c IB/mthca: Fix and simplify page size calculation in mthca_reg_phys_mr()
In mthca_reg_phys_mr(), we calculate the page size for the HCA
hardware to use to map the buffer list passed in by the consumer.
For example, if the consumer passes in

    [0] addr 0x1000, size 0x1000
    [1] addr 0x2000, size 0x1000

then the algorithm would come up with a page size of 0x2000 and a list
of two pages, at 0x0000 and 0x2000.  Usually, this would work fine
since the memory region would start at an offset of 0x1000 and have a
length of 0x2000.

However, the old code did not take into account the alignment of the
IO virtual address passed in.  For example, if the consumer passed in
a virtual address of 0x6000 for the above, then the offset of 0x1000
would not be used correctly because the page mask of 0x1fff would
result in an offset of 0.

We can fix this quite neatly by making sure that the page shift we use
is no bigger than the first bit where the start of the first buffer
and the IO virtual address differ.  Also, we can further simplify the
code by removing the special case for a single buffer by noticing that
it doesn't matter if we use a page size that is too big.  This allows
the loop to compute the page shift to be replaced with __ffs().

Thanks to Bryan S Rosenburg <rosnbrg@us.ibm.com> for pointing out the
original bug and suggesting several ways to improve this patch.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-04 20:20:42 -08:00
Hoang-Nam Nguyen
2b5e6b120e IB/ehca: Add PMA support
This patch enables ehca to redirect any PMA queries to the
actual PMA QP.

Signed-off-by: Hoang-Nam Nguyen <hnguyen@de.ibm.com>
Reviewed-by: Joachim Fenkes <fenkes@de.ibm.com>
Reviewed-by: Christoph Raisch <raisch@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-04 20:20:42 -08:00
Joachim Fenkes
528b03f732 IB/ehca: Update sma_attr also in case of disruptive config change
Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-04 20:20:42 -08:00
Joachim Fenkes
2b7274c392 IB/ehca: Prevent sending UD packets to QP0
The IB spec doesn't allow packets to QP0 sent on any other VL than VL15.
Hardware doesn't filter those packets on the send side, so we need to do
this in the driver and firmware.

As eHCA doesn't support QP0, we can just filter out all traffic going to
QP0, regardless of SL or VL.

Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-04 20:20:42 -08:00
Sean Hefty
3971c9f6db IB/cm: Add interim support for routed paths
Paths with hop_limit > 1 indicate that the connection will be routed
between IB subnets.  Update the subnet local field in the CM REQ based
on the hop_limit value.  In addition, if the path is routed, then set
the LIDs in the REQ to the permissive LIDs.  This is used to indicate
to the passive side that it should use the LIDs in the received local
route header (LRH) associated with the REQ when programming the QP.

This is a temporary work-around to the IB CM to support IB router
development until the IB router specification is completed.  It is not
anticipated that this work-around will cause any interoperability
issues with existing stacks or future stacks that will properly
support IB routers when defined.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-04 20:20:42 -08:00
James Bottomley
d3f46f39b7 [SCSI] remove use_sg_chaining
With the sg table code, every SCSI driver is now either chain capable
or broken (or has sg_tablesize set so chaining is never activated), so
there's no need to have a check in the host template.

Also tidy up the code by moving the scatterlist size defines into the
SCSI includes and permit the last entry of the scatterlist pools not
to be a power of two.
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2008-01-30 13:14:02 -06:00
Linus Torvalds
0ba6c33bcd Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6.25
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6.25: (1470 commits)
  [IPV6] ADDRLABEL: Fix double free on label deletion.
  [PPP]: Sparse warning fixes.
  [IPV4] fib_trie: remove unneeded NULL check
  [IPV4] fib_trie: More whitespace cleanup.
  [NET_SCHED]: Use nla_policy for attribute validation in ematches
  [NET_SCHED]: Use nla_policy for attribute validation in actions
  [NET_SCHED]: Use nla_policy for attribute validation in classifiers
  [NET_SCHED]: Use nla_policy for attribute validation in packet schedulers
  [NET_SCHED]: sch_api: introduce constant for rate table size
  [NET_SCHED]: Use typeful attribute parsing helpers
  [NET_SCHED]: Use typeful attribute construction helpers
  [NET_SCHED]: Use NLA_PUT_STRING for string dumping
  [NET_SCHED]: Use nla_nest_start/nla_nest_end
  [NET_SCHED]: Propagate nla_parse return value
  [NET_SCHED]: act_api: use PTR_ERR in tcf_action_init/tcf_action_get
  [NET_SCHED]: act_api: use nlmsg_parse
  [NET_SCHED]: act_api: fix netlink API conversion bug
  [NET_SCHED]: sch_netem: use nla_parse_nested_compat
  [NET_SCHED]: sch_atm: fix format string warning
  [NETNS]: Add namespace for ICMP replying code.
  ...
2008-01-29 22:54:01 +11:00
Denis V. Lunev
f206351a50 [NETNS]: Add namespace parameter to ip_route_output_key.
Needed to propagate it down to the ip_route_output_flow.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:11:07 -08:00
Denis V. Lunev
f1b050bf7a [NETNS]: Add namespace parameter to ip_route_output_flow.
Needed to propagate it down to the __ip_route_output_key.

Signed_off_by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:11:06 -08:00
Denis V. Lunev
1ab352768f [NETNS]: Add namespace parameter to ip_dev_find.
in_dev_find() need a namespace to pass it to fib_get_table(), so add
an argument.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:11:04 -08:00
Joe Perches
6360a02af1 [IPV4] drivers/infiniband: Use ipv4_is_<type>
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:17 -08:00
WANG Cong
aff5905778 INFINIBAND: Remove 'TOPDIR' from Makefiles
This patch removes TOPDIR from infiniband Makefile and delete
one include statement pointing to a non-existing directory

Cc: Roland Dreier <rolandd@cisco.com>
Cc: Sean Hefty <mshefty@ichips.intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
2008-01-28 23:14:37 +01:00
Linus Torvalds
9b73e76f3c Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (200 commits)
  [SCSI] usbstorage: use last_sector_bug flag universally
  [SCSI] libsas: abstract STP task status into a function
  [SCSI] ultrastor: clean up inline asm warnings
  [SCSI] aic7xxx: fix firmware build
  [SCSI] aacraid: fib context lock for management ioctls
  [SCSI] ch: remove forward declarations
  [SCSI] ch: fix device minor number management bug
  [SCSI] ch: handle class_device_create failure properly
  [SCSI] NCR5380: fix section mismatch
  [SCSI] sg: fix /proc/scsi/sg/devices when no SCSI devices
  [SCSI] IB/iSER: add logical unit reset support
  [SCSI] don't use __GFP_DMA for sense buffers if not required
  [SCSI] use dynamically allocated sense buffer
  [SCSI] scsi.h: add macro for enclosure bit of inquiry data
  [SCSI] sd: add fix for devices with last sector access problems
  [SCSI] fix pcmcia compile problem
  [SCSI] aacraid: add Voodoo Lite class of cards.
  [SCSI] aacraid: add new driver features flags
  [SCSI] qla2xxx: Update version number to 8.02.00-k7.
  [SCSI] qla2xxx: Issue correct MBC_INITIALIZE_FIRMWARE command.
  ...
2008-01-25 17:19:08 -08:00
Steve Wise
8176d297c7 RDMA/cxgb3: Fix the T3A workaround checks
Correctly work around T3A issues by checking "hwtype != T3A" instead of
"hwtype == T3B".  This will be needed for new hardware types.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:17:47 -08:00
Jan Engelhardt
f7fca1e8a8 IB/ipath: Remove unnecessary cast
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:17:46 -08:00
Jan Engelhardt
1cf18d5aab IPoIB: Constify seq_operations function pointer tables
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:17:46 -08:00
Steve Wise
c6b5b50474 RDMA/cxgb3: Mark QP as privileged based on user capabilities
This is needed to support zero-stag properly.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:17:45 -08:00
Steve Wise
d08ca26cee RDMA/cxgb3: Fix page shift calculation in build_phys_page_list()
The existing logic incorrectly maps this buffer list:

    0: addr 0x10001000, size 0x1000
    1: addr 0x10002000, size 0x1000

To this bogus page list:

    0: 0x10000000
    1: 0x10002000

The shift calculation must also take into account the address of the
first entry masked by the page_mask as well as the last address+size
rounded up to the next page size.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:17:45 -08:00
Steve Wise
856b592504 RDMA/cxgb3: Flush the receive queue when closing
- for kernel mode cqs, call event notification handler when flushing.
- flush QP when moving from RTS -> CLOSING.
- fix logic to identify a kernel mode qp.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:17:45 -08:00
Ralph Campbell
4e1e93a418 IB/ipath: Trivial simplification of ipath_make_ud_req()
Move the increment of s_hdrwords into the existing if block that tests 
if we're doing a send with immediate, to save one test of the opcode.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:17:44 -08:00
Roland Dreier
950529e5c6 IB/mthca: Update latest "native Arbel" firmware revision
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:17:44 -08:00
Krishna Kumar
48fe5e594c IPoIB: Remove redundant check of netif_queue_stopped() in xmit handler
qdisc_run() now tests for queue_stopped() before calling
__qdisc_run(), and the same check is done in every iteration of
__qdisc_run(), so another check is not required in the driver xmit.
This means that ipoib_start_xmit() no longer needs to test
netif_queue_stopped(); the test was added to fix earlier kernels,
where the networking stack did not guarantee that the xmit method of
an LLTX driver would not be called after the queue was stopped, but
current kernels do provide this guarantee.

To validate, I put a debug in the TX_BUSY path which never hit with 64
threads running overnight exercising this code a few 100 million
times.

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:17:44 -08:00
Ralph Campbell
3d68ea3261 IB/ipath: Add mappings from HW register to PortInfo port physical state
Add new mappings from port physical state (a HW register value) to the
IB SubnGet(PortInfo) port physical state.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:17:44 -08:00
Dave Olson
6ac50727bd IB/ipath: Changes to support PIO bandwidth check on IBA7220
The IBA7220 uses a count-based triggering mechanism, and therefore
can't use the same bandwidth verification mechanism as older chips.

To support the 7220, allow enabling and disabling armlaunch errors on
application request.  Minor robustness improvements as well.

Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:17:43 -08:00
Dave Olson
ddb70c83a5 IB/ipath: Minor cleanup of unused fields and chip-specific errors
Clean up some unused header fields, minor related cleanup.

Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:17:43 -08:00
Michael Albaugh
359193ef43 IB/ipath: New sysfs entries to control 7220 features
IBA7220 includes many more configurable IB settings. Getting/setting
these is now grouped into a pair of chip specific functions accessed via
function pointers.  Provide sysfs access to these settings.

Signed-off-by: Michael Albaugh <michael.albaugh@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:17:32 -08:00
Dave Olson
c4bce8032e IB/ipath: Add new chip-specific functions to older chips, consistent init
This adds the new (sometimes empty) chip-specific functions to the older
chips, and makes the initialization and related functions consistent across
all 3 chips.

Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:45 -08:00
Dave Olson
7387273307 IB/ipath: Remove unused MDIO interface code
This code has been unused for some time, but still had leftovers
from when it was used.

Signed-off-by: Dave Olson <dave.olson@qlogic.com
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:44 -08:00
Joachim Fenkes
2ec8e66241 IB/ehca: Prevent RDMA-related connection failures on some eHCA2 hardware
Some HW revisions of eHCA2 may cause an RC connection to break if they
received RDMA Reads over that connection before.  This can be
prevented by assuring that, after the first RDMA Read, the QP receives
a new RDMA Read every few million link packets.

Include code into the driver that inserts an empty (size 0) RDMA Read
into the message stream every now and then if the consumer doesn't
post them frequently enough.

Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:44 -08:00
Hoang-Nam Nguyen
bbdd267ef2 IB/ehca: Add "port connection autodetect mode"
This patch enhances ehca with a capability to "autodetect" the ports
being connected physically. In order to utilize that function the
module option nr_ports must be set to -1 (default is 2 - two
ports). This feature is experimental and will made the default later.

More detail:

If the user connects only one port to the switch, current code requires
  1) port one to be connected and
  2) module option nr_ports=1 to be given.

If autodetect is enabled, ehca will not wait at creation of the GSI QP
for the respective port to become active. Since firmware does not
accept modify_qp() while the port is down at initialization, we need
to cache all calls to modify_qp() for the SMI/GSI QP and just return a
good return code.

When a port is activated and we get a PORT_ACTIVE event, we replay the
cached modify-qp() parms and re-trigger any posted recv WRs. Only then
do we forward the PORT_ACTIVE event to registered clients.

The result of this autodetect patch is that all ports will be
accessible by the users. Depending on their respective cabling only
those ports that are connected properly will become operable. If a
user tries to modify a regular QP of a non-connected port, modify_qp()
will fail. Furthermore, ibv_devinfo should show the port state
accordingly.

Note that this patch primarily improves the loading behaviour of
ehca. If the cable is removed while the driver is operating and
plugged in again, firmware will handle that properly by sending an
appropriate async event.

Signed-off-by: Hoang-Nam Nguyen <hnguyen@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:44 -08:00
Hoang-Nam Nguyen
b8b50e353b IB/ehca: Define array to store SMI/GSI QPs
Signed-off-by: Hoang-Nam Nguyen <hnguyen@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:44 -08:00
Hoang-Nam Nguyen
0c86e280fe IB/ehca: Remove CQ-QP-link before destroying QP in error path of create_qp()
Signed-off-by: Hoang-Nam Nguyen <hnguyen@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:43 -08:00
Erez Zilber
6410627eb9 IB/iser: Add change_queue_depth method
Add a .change_queue_depth handler to the scsi_host_template in the
iSER driver.  iscsi_change_queue_depth was added to iscsi_tcp in order
to solve the problem of queue depth which was too high for some
targets.  It is also applicable for iSER.

Signed-off-by: Erez Zilber <erezz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:43 -08:00
Erez Zilber
a4ef1451df IB/iser: Print information about unhandled RDMA CM events
Some RDMA CM events are not supported or not handled in iSER.
This patch adds some info (printk) for the user about them.

Signed-off-by: Erez Zilber <erezz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:43 -08:00
Olaf Kirch
a3cd7d9070 IB/fmr_pool: ib_fmr_pool_flush() should flush all dirty FMRs
When a FMR is released via ib_fmr_pool_unmap(), the FMR usually ends
up on the free_list rather than the dirty_list (because we allow a
certain number of remappings before actually requiring a flush).

However, ib_fmr_batch_release() only looks at dirty_list when flushing
out old mappings.  This means that when ib_fmr_pool_flush() is used to
force a flush of the FMR pool, some dirty FMRs that have not reached
their maximum remap count will not actually be flushed.

Fix this by flushing all FMRs that have been used at least once in
ib_fmr_batch_release().

Signed-off-by: Olaf Kirch <olaf.kirch@oracle.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:43 -08:00
Olaf Kirch
a656eb758f IB/fmr_pool: Flush serial numbers can get out of sync
Normally, the serial numbers for flush requests and flushes executed
for an FMR pool should be in sync.

However, if the FMR pool flushes dirty FMRs because the
dirty_watermark was reached, we wake up the cleanup thread and let it
do its stuff.  As a side effect, the cleanup thread increments
pool->flush_ser, which leaves it one higher than pool->req_ser.  The
next time the user calls ib_flush_fmr_pool(), the cleanup thread will
be woken up, but ib_flush_fmr_pool() won't wait for the flush to
complete because flush_ser is already past req_ser.  This means the
FMRs that the user expects to be flushed may not have all been flushed
when the function returns.

Fix this by telling the cleanup thread to do work exclusively by
incrementing req_ser, and by moving the comparison of dirty_len and
dirty_watermark into ib_fmr_pool_unmap().

Signed-off-by: Olaf Kirch <olaf.kirch@oracle.com>
2008-01-25 14:15:42 -08:00
Roland Dreier
2fe7e6f7c9 IB/umad: Simplify and fix locking
In addition to being overly complex, the locking in user_mad.c is
broken: there were multiple reports of deadlocks and lockdep warnings.
In particular it seems that a single thread may end up trying to take
the same rwsem for reading more than once, which is explicitly
forbidden in the comments in <linux/rwsem.h>.

To solve this, we change the locking to use plain mutexes instead of
rwsems.  There is one mutex per open file, which protects the contents
of the struct ib_umad_file, including the array of agents and list of
queued packets; and there is one mutex per struct ib_umad_port, which
protects the contents, including the list of open files.  We never
hold the file mutex across calls to functions like ib_unregister_mad_agent(),
which can call back into other ib_umad code to queue a packet, and we
always hold the port mutex as long as we need to make sure that a
device is not hot-unplugged from under us.

This even makes things nicer for users of the -rt patch, since we
remove calls to downgrade_write() (which is not implemented in -rt).

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:42 -08:00
Roland Dreier
cf9542aa92 IB/ipath: Fix some sparse warnings about shadowed symbols
There are a few places in the ipath driver where a variable is
re-declared within a block where it is already in scope.  Most of these
extra declarations can simply be removed, since the variable from the
outer scope is used in a way so that it does not need to keep its
variable across the block with the re-declaration.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:42 -08:00
Roland Dreier
1d6e658e8e RDMA/cxgb3: Endianness annotation for irs field
t3_rdma_init_wr.irs is a big-endian field, so declare it as __be32.
This fixes one sparse warning.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:42 -08:00
Anton Blanchard
1a7d2dce41 IB/ehca: Use round_jiffies() for EQ polling timer
Use round_jiffies() to align ehca's 1-second timer with other timers
and potentially save power by sleeping cores for longer.

Signed-off-by: Anton Blanchard <anton@samba.org>
Acked-by: Hoang-Nam Nguyen <hnguyen@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:41 -08:00
Sean Hefty
5851bb893e RDMA/cma: Override default responder_resources with user value
By default, the responder_resources parameter is set to that received
in a connection request.  The passive side may override this value
when accepting the connection.  Use the value provided by the passive
side when transitioning the QP to RTR state, rather than the value
given in the connect request.  Without this change, the RTR transition
may fail if the passive side supports fewer responder_resources than
that in the request.

For code consistency and to protect against QP destruction, restructure
overriding initiator_depth to match how responder_resources is set.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:41 -08:00
Dave Olson
1f813ca830 IB/ipath: Drop support for the original QHT7040 board
The original QHT7040 had significant performance issues so there was an
additional check in the driver for a newer serial number.  Support for
the small quantities of that board shipped has been dropped, so this
patch removes the special checks to simplify the code.

Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:40 -08:00
Arthur Jones
7da0498e7f IB/ipath: Add ipath_read_ireg() abstraction
Different chips have different width interrupt status registers, so add
a flag and accessor function to decide which width register read to use.

Signed-off-by: Arthur Jones <arthur.jones@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:40 -08:00
Ralph Campbell
4ea61b548b IB/ipath: Add flag and handling for chips with swapped register bug
The 6110 had a bug that caused some registers to be swapped; it was
fixed for the 7220 (and didn't affect the 6120 because it had fewer
registers).  This adds a flag and related code to handle that, and
includes some minor cleanups in the same area.

Signed-off-by:  Ralph Campbell <ralph.campbell@qlogic.com>
2008-01-25 14:15:39 -08:00
Ralph Campbell
60948a4158 IB/ipath: Port config has on-chip effects for 7220
The number of configured ports for the 7220 changes the number of eager
TIDs available per port, for all but port 0 (kernel port) which remains
constant, so add a field to give port0 count separate from the portdata
structure.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:39 -08:00
Ralph Campbell
a18e26ae44 IB/ipath: Allow more flexible user register alignments
User registers have different alignments on different chips (4KB on
older, 64KB on 7220).  Allow mapping the user registers on kernels with
page sizes up to 64K.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:39 -08:00
Dave Olson
9e2ef36b5a IB/ipath: Clean up some comments
Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:38 -08:00
Ralph Campbell
3029fcc3d4 IB/ipath: Export hardware counters more consistently
Various hardware counters are exported via the ipath file system (since
it is binary data).  The old file format was very dependent on the HW
offsets for these registers.  Newer HCA chips can have different
counters at different offsets.  This patch adds a level of indirection
to make the file format consistent across HCAs.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:38 -08:00
Ralph Campbell
6c719cae0b IB/ipath: MAD performance sampling registers support
Add support for QLogic HCAs which have hardware performance sampling
registers for PortSamplesControl and PortSamplesResult MADs.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:38 -08:00
David Dillow
7aa54bd730 IB/srp: Add identifying information to log messages
When you have multiple targets, it gets really confusing when you try
to track down who did a reset when there is no identifying information
in the log message, especially when the same extension ID is mapped
through two different local IB ports.  So, add an identifier that can
be used to track back to which local IB port/remote target pair is the
one having problems.

Signed-off-by: David Dillow <dillowda@ornl.gov>
Acked-by: Pete Wyckoff <pw@osc.edu>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:38 -08:00
Pradeep Satyanarayana
586a693448 IPoIB/CM: Enable SRQ support on HCAs that support fewer than 16 SG entries
Some HCAs (such as ehca2) support SRQ, but only support fewer than 16 SG
entries for SRQs.  Currently IPoIB/CM implicitly assumes all HCAs will
support 16 SG entries for SRQs (to handle a 64K MTU with 4K pages). This
patch removes that restriction by limiting the maximum MTU in connected
mode to what the maximum number of SRQ SG entries allows.

This patch addresses <https://bugs.openfabrics.org/show_bug.cgi?id=728>

Signed-off-by: Pradeep Satyanarayana <pradeeps@linux.vnet.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:37 -08:00
David Dillow
fff09a8e6e IB/srp: Enable SG list chaining
By default, the SCSI mid-layer seems to send down 512KB requests
(sg_tablesize = 256), with some requests occasionally combined. By
allowing the mid-layer to chain requests, we can easily grow to 1024KB
or larger -- I've tested 4096KB I/O requests with no problems.

I looked through the DMA paths on the hardware drivers to ensure they
could take advantage of the SG chaining, and it seems that every one
except ipath uses the system's DMA routines, which have been converted
to handle chaining.  ipath looks like it should be OK, but I have no
way to test it.

Signed-off-by: David Dillow <dillowda@ornl.gov>

[ Tested on ipath.  - Roland ]

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:37 -08:00
David Dillow
8cba207732 IB/srp: Respect target credit limit
The current SRP initiator will send requests even if it has no credits
available.  The results of sending extra requests are vendor specific,
but on some devices, overrunning credits will cost 85% of peak
performance -- e.g. 100 MB/s vs 720 MB/s.  Other devices may just drop
the requests.

This patch will tell the SCSI midlayer to queue requests if there are
fewer than two credits remaining, and will not issue a task management
request if there are no credits remaining.  The mid-layer will retry
the queued command once an outstanding command completes.

The patch also removes the unlikely() in __srp_get_tx_iu(), as it is
not at all unlikely to hit this limit under heavy load.

Signed-off-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:37 -08:00
Rolf Manderscheid
a9e527e3f9 IPoIB: improve IPv4/IPv6 to IB mcast mapping functions
An IPoIB subnet on an IB fabric that spans multiple IB subnets can't
use link-local scope in multicast GIDs.  The existing routines that
map IP/IPv6 multicast addresses into IB link-level addresses hard-code
the scope to link-local, and they also leave the partition key field
uninitialised.  This patch adds a parameter (the link-level broadcast
address) to the mapping routines, allowing them to initialise both the
scope and the P_Key appropriately, and fixes up the call sites.

The next step will be to add a way to configure the scope for an IPoIB
interface.

Signed-off-by: Rolf Manderscheid <rvm@obsidianresearch.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:37 -08:00
Dave Olson
755807a296 IB/ipath: Changes for fields moving from devdata to portdata
This patch moves some arrays that were defined per-device to be
variables defined in the per context data structure, thus avoiding extra
kzalloc() calls.

Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:36 -08:00
Dave Olson
d8274869d7 IB/ipath: Generalize some xxx_SHIFT macros
In preparation for upcoming chips that have different values for
INFINIPATH_R_PORTENABLE_SHIFT, INFINIPATH_R_INTRAVAIL_SHIFT,
INFINIPATH_R_TAILUPD_SHIFT, and portcfg_shift, remove the shared
#defines and use device-specific variables instead.

Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:36 -08:00
Ralph Campbell
c59a80aca0 IB/ipath: kreceive uses portdata rather than devdata
kreceive is now portdata * instead of devdata * and other kreceive
related cleanups....

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:35 -08:00
Ralph Campbell
d65708f3a7 IB/ipath: Cleanup ipath_get_egrbuf()
Remove an unused parameter and fix up the comment.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:35 -08:00
Ralph Campbell
cc65edcf0c IB/ipath: Fix RNR NAK handling
This patch fixes a couple of minor problems with RNR NAK handling:
 - The insertion sort was causing extra delay when inserting ahead
   vs. behind an existing entry on the list.
 - A resend of a first packet of a message which is still not ready,
   needs another RNR NAK (i.e., it was suppressed when it shouldn't).
 - Also, the resend tasklet doesn't need to be woken up unless the
   ACK/NAK actually indicates progress has been made.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:34 -08:00
Hoang-Nam Nguyen
e57d62a147 IB/ehca: Forward event client-reregister-required to registered clients
This patch allows ehca to forward event client-reregister-required to
registered clients.  One such event is generated by a switch eg. after
its reboot.

Signed-off-by: Hoang-Nam Nguyen <hnguyen@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:34 -08:00
Roland Dreier
b3226184af IB/mlx4: Micro-optimize mlx4_ib_poll_one()
Rather than byte-swapping cqe->g_mlpath_rqpn each time we extract a
field from it, byte-swap it once into a temporary variable.  This 
results in smaller, better code -- eg, on 32-bit x86:

add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-5 (-5)
function                                     old     new   delta
mlx4_ib_poll_cq                             1188    1183      -5

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:34 -08:00
Adrian Bunk
e57895d389 IB/mthca: Remove MSI support as scheduled
Remove MSI support from the mthca driver, as scheduled.  There is no
reason to use MSI instead of MSI-X, since MSI-X performs better.  No
one has spoken up since MSI support was deprecated in commit f6be6fbe
("IB/mthca: Schedule MSI support for removal"), so apparently the MSI
support is unused.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:33 -08:00
Oliver Pinter
38dc732f47 IB/iser: Typo fix (s/destory/destroy/)
Signed-off-by: Oliver Pinter <oliver.pntr@gmail.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:32 -08:00
Erez Zilber
bd5d7a8585 IB/iser: update URLs of iSER docs
Signed-off-by: Erez Zilber <erezz@voltaire.com>
2008-01-25 14:15:32 -08:00
Sean Hefty
88314e4dda RDMA/cma: add support for rdma_migrate_id()
This is based on user feedback from Doug Ledford at RedHat:

Events that occur on an rdma_cm_id are reported to userspace through an
event channel.  Connection request events are reported on the event
channel associated with the listen.  When the connection is accepted, a
new rdma_cm_id is created and automatically uses the listen event
channel.  This is suboptimal where the user only wants listen events on
that channel.

Additionally, it may be desirable to have events related to connection
establishment use a different event channel than those related to
already established connections.

Allow the user to migrate an rdma_cm_id between event channels. All
pending events associated with the rdma_cm_id are moved to the new event
channel.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:32 -08:00
Vladimir Sokolovsky
45d9478da1 RDMA/cma: Reenable device removal on passive side
Enable conn_id remove on the passive side after connection
establishment.  This corrects an issue where the IB driver can't be
unloaded after running applications over RDS.  The 'dev_remove' counter
does not reach 0 for established connections on the passive side.

This problem is limited to device removal, and only occurs on the
passive side if there are established connections.

Signed-off-by: Vladimir Sokolovsky <vlad@mellanox.co.il>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:31 -08:00
Sean Hefty
b61d92d8ae IB/mad: Fix incorrect access to items on local_list
In cancel_mads(), MADs are moved from the wait_list and local_list
to a cancel_list for processing.  However, the structures on these two
lists are not the same.  The wait_list references struct
ib_mad_send_wr_private, but local_list references struct
ib_mad_local_private.  Cancel_mads() treats all items moved to the
cancel_list as struct ib_mad_send_wr_private.  This leads to a system
crash when requests are moved from the local_list to the cancel_list.

Fix this by leaving local_list alone.  All requests on the local_list
have completed are just awaiting processing by a queued worker thread.

Bug (crash) reported by Dotan Barak <dotanb@dev.mellanox.co.il>.
Problem with local_list access reported by Robert Reynolds
<rreynolds@opengridcomputing.com>.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:31 -08:00
Sean Hefty
9af57b7a27 IB/cm: Add basic performance counters
Add performance/debug counters to track sent/received messages, retries,
and duplicates.  Counters are tracked per CM message type, per port.

The counters are always enabled, so intrusive state tracking is not done.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:30 -08:00
Sean Hefty
4fc8cd4919 IB/mad: Report number of times a mad was retried
To allow ULPs to tune timeout values and capture retry statistics,
report the number of times that a mad send operation was retried.

For RMPP mads, report the total number of times that the any portion
(send window) of the send operation was retried.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:30 -08:00
Sean Hefty
547af76521 IB/multicast: Report errors on multicast groups if P_key changes
P_key changes can invalidate multicast groups.  Report errors on all
multicast groups affected by a pkey change.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:29 -08:00
Joe Perches
94545e8c51 IB: Spelling fixes in comments
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:29 -08:00
Nick Piggin
3c8450860b IB/ipath: Convert from .nopage to .fault
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:29 -08:00
Ralph Campbell
a2f76cd69f IB/ipath: Add the work completion error code to the QP error debug output
Add the work completion error code to the QP error debug output.
This makes it easier to determine the cause of the error.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:29 -08:00
Arthur Jones
2f01a70011 IB/ipath: Better comment for rmb() in ipath_intr()
An internal code review found the comment here lacking -- update it with 
more specifics of how and why the rmb() is there.

Signed-off-by: Arthur Jones <arthur.jones@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:28 -08:00
Ralph Campbell
6276980138 IB/ipath: Fix comments for ipath_create_srq()
During a code review, someone noticed the comments didn't match the code.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:28 -08:00
Ralph Campbell
733d12813b IB/ipath: Fix error returned from ib_resize_cq if new size smaller than # entries
The gen2_basic tests check for the errno value when a CQ is resized
smaller than the number of outstanding completions queue on the CQ.
This patch changes ib_ipath to return EINVAL which is what ib_mthca
returns and what gen2_basic expects.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:28 -08:00
John Gregor
e342c11917 IB/ipath: Fix sendctrl locking
Code review pointed out that the locking around uses of ipath_sendctrl 
and kr_sendctrl were, in several places, incorrect and/or inconsistent.

Signed-off-by: John Gregor <john.gregor@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:27 -08:00
Ralph Campbell
9ab4295d1d IB/ipath: Remove dead code for user process waiting for send buffer
At one point in time there was code to allow a user process to
wait for a send buffer if none were available. This feature was
never used and most of the code was removed. This removes
some missed unused code.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:27 -08:00
Steve Wise
457fe7b8a6 RDMA/cxgb3: Support version 5.0 firmware
The 5.0 firmware now supports translating sgls in recv work requests,
so remove the host driver logic currently doing the translation.

Note: this change requires 5.0 firmware.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:26 -08:00
Steve Wise
7f049f2f42 RDMA/cxgb3: Hold rtnl_lock() around ethtool get_drvinfo call
Currently the call into cxgb3 to get the driver info is not serialized.
The iw_cxgb3 module needs to hold the rtnl_lock around the ethtool ops
call like dev_ioctl() does.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:26 -08:00
Joe Perches
908cf9a565 drivers/infiniband: Add missing "space"
Add missing spaces in the middle of format strings.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:26 -08:00
Matthias Kaehlcke
2c45688fae IB/ipath: Convert ipath_eep_sem semaphore to a mutex
Signed-off-by: Matthias Kaehlcke <matthias.kaehlcke@gmail.com>
Acked-by: Michael Albaugh <Michael.Albaugh@qlogic.com>
Tested-by: Arthur Jones <arthur.jones@qlogic.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:26 -08:00
Steve Welch
727792da2b IB/mad: Enable loopback of DR SMP responses from userspace
The local loopback of an outgoing DR SMP response is limited to those
that originate at the driver specific SMA implementation during the
driver specific process_mad() function.  This patch enables a
returning DR SMP originating in userspace (or elsewhere) to be
delivered to the local managment stack.  In this specific case the
driver process_mad() function does not consume or process the MAD, so
a reponse mad has not be created and the original MAD must manually be
copied to the MAD buffer that is to be handed off to the local agent.

Signed-off-by: Steve Welch <swelch@systemfabricworks.com>
Acked-by: Hal Rosenstock <hal@xsigo.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:25 -08:00
Ralph Campbell
f9b4035322 IB/ipath: Enable loopback of DR SMP responses from userspace
This patch is in response to reviewing a patch to the core MAD
processing which fixes loopback of directed route packets to/from user
level MAD agents.  This change enables the core code to work for
ib_ipath by fixing the return code from the ipath process_mad method.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:25 -08:00
Ralph Campbell
3828ff457a IB/mad: Remove redundant NULL pointer check in ib_mad_recv_done_handler()
In ib_mad_recv_done_handler(), the response pointer is checked for
NULL after allocating it.  It is then checked again in the local
process_mad() path but there is no possibility of it changing in
between.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Acked-by: Hal Rosenstock <hal@xsigo.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:25 -08:00
Steve Wise
8d8293cfb3 RDMA/iwcm: Set initiator depth and responder resources to device max values
Set the initiator depth and responder resources to the device max
values for new connect request events in the iWARP connection manager.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:25 -08:00
Dave Olson
e193e3326c IB/ipath: Improve interrupt handler cache footprint
Improve interrupt handler cache footprint by noinline'ing error
functions that are rarely called.

Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:25 -08:00
Pradeep Satyanarayana
68e995a295 IPoIB/cm: Add connected mode support for devices without SRQs
Some IB adapters (notably IBM's eHCA) do not implement SRQs (shared
receive queues).  The current IPoIB connected mode support only works
on devices that support SRQs.

Fix this by adding support for using the receive queue of each
connected mode receive QP.  The disadvantage of this compared to using
an SRQ is that it means a full queue of receives must be posted for
each remote connected mode peer, which means that total memory usage
is potentially much higher than when using SRQs.  To manage this, add
a new module parameter "max_nonsrq_conn_qp" that limits the number of
connections allowed per interface.

The rest of the changes are fairly straightforward: we use a table of
struct ipoib_cm_rx to hold all the active connections, and put the
table index of the connection in the high bits of receive WR IDs.
This is needed because we cannot rely on the struct ib_wc.qp field for
non-SRQ receive completions.  Most of the rest of the changes just
test whether or not an SRQ is available, and post receives or find
received packets in the right place depending on the answer.

Cleaning up dead connections actually becomes simpler, because we do
not have to do the "last WQE reached" dance that is required to
destroy QPs attached to an SRQ.  We just move the QP to the error
state and wait for all pending receives to be flushed.

Signed-off-by: Pradeep Satyanarayana <pradeeps@linux.vnet.ibm.com>

[ Completely rewritten and split up, based on Pradeep's work.  Several
  bugs fixed and no doubt several bugs introduced.  - Roland ]

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:24 -08:00
Roland Dreier
efcd99717f IPoIB/cm: Factor out ipoib_cm_free_rx_reap_list()
Factor out the code for going through the rx_reap list of struct
ipoib_cm_rx and freeing each one.  This consolidates the code
duplicated between ipoib_cm_dev_stop() and ipoib_cm_rx_reap() and
reduces the risk of error when adding additional accounting.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:24 -08:00
Roland Dreier
7b3687df66 IPoIB/cm: Factor out ipoib_cm_create_srq()
Factor out the code to create an SRQ and allocate the receive ring in
ipoib_cm_dev_init() into a new function ipoib_cm_create_srq().  This
will make the code neater when support for devices that don't implement
SRQs is added.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:24 -08:00
Roland Dreier
1efb61444c IPoIB/cm: Factor out ipoib_cm_free_rx_ring()
Factor out the code to unmap/free skbs and free the receive ring in
ipoib_cm_dev_cleanup() into a new function ipoib_cm_free_rx_ring().
This function will be called from a couple of other places when
support for devices that don't implement SRQs is added.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:24 -08:00
Roland Dreier
2337f80941 IPoIB: Trivial formatting cleanups
Fix whitespace blunders, convert "foo* bar" to "foo *bar", etc.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:23 -08:00
Roland Dreier
657c2f2cbc IB/ipath: Fix crash on unload introduced by sysfs changes
Commit 23b9c1ab ("Infiniband: make ipath driver use default driver
groups.") introduced a bug in the ipath driver where
ipath_device_create_group() fell through into the error path, even on
success, which meant that the sysfs groups it created would always get
removed right away.  This made ipath_device_remove_group() hit the
BUG_ON() in sysfs_remove_group() when it tried to remove those groups a
second time.

Correct the return path so that the groups stick around until they are
supposed to be cleaned up.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:21 -08:00
Greg Kroah-Hartman
c10997f657 Kobject: convert drivers/* from kobject_unregister() to kobject_put()
There is no need for kobject_unregister() anymore, thanks to Kay's
kobject cleanup changes, so replace all instances of it with
kobject_put().


Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-01-24 20:40:40 -08:00
Greg Kroah-Hartman
23b9c1ab5b Infiniband: make ipath driver use default driver groups.
Make the ipath driver use the new driver functions so that it does not
touch the sysfs portion of the driver structure.

We also remove the redundant symlink from the device back to the driver,
as it is already in the sysfs tree.  Any userspace tools should be using
the standard symlink, not some driver specific one.

Cc: Roland Dreier <rdreier@cisco.com>
Cc: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Cc: Arthur Jones <arthur.jones@qlogic.com>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-01-24 20:40:34 -08:00
Greg Kroah-Hartman
35be068198 Kobject: change drivers/infiniband to use kobject_init_and_add
Stop using kobject_register, as this way we can control the sending of
the uevent properly, after everything is properly initialized.

Cc: Roland Dreier <rolandd@cisco.com>
Cc: Sean Hefty <mshefty@ichips.intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-01-24 20:40:26 -08:00
Erez Zilber
90c18f3c28 [SCSI] IB/iSER: add logical unit reset support
eh_device_reset_handler was already added to scsi_host_template
in iscsi_tcp, and is now added also for iscsi_iser.

Signed-off-by: Erez Zilber <erezz@voltaire.com>
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2008-01-23 13:39:43 -06:00
Ralph Campbell
0a69631b28 IB/ipath: Fix receiving UD messages with immediate data
This fixes a small bug in ipath_ud_rcv()'s handling of UD messages
with immediate data.  We need to test whether immediate data is
present and update the header size accordingly *before* testing the
packet size from the header against the actual received length.
Otherwise the wrong header size will be used and all messages with
immediate data will be dropped.

This bug keeps MVAPICH-UD and HP MPI from working at all on ipath devices.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-16 14:42:35 -08:00
Olaf Kirch
a8ac6311cc [SCSI] iscsi: convert xmit path to iscsi chunks
Convert xmit to iscsi chunks.

from michaelc@cs.wisc.edu:

Bug fixes, more digest integration, sg chaining conversion and other
sg wrapper changes, coding style sync up, and removal of io fields,
like pdu_sent, that are not needed.

Signed-off-by: Olaf Kirch <olaf.kirch@oracle.com>
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2008-01-11 18:28:42 -06:00
Mike Christie
f6d5180c78 [SCSI] libiscsi: fix nop handling
During root boot and shutdown the target could send us nops.
At this time iscsid cannot be running, so the target will drop
the session and the boot or shutdown will hang.

To handle this and allow us to better control when to check the network
this patch moves the nop handling to the kernel.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2008-01-11 18:28:35 -06:00
Mike Christie
b3a7ea8d50 [SCSI] libiscsi: do not block session during logout
There is not need to block the session during logout. Since
we are going to fail the commands that were blocked just fail them
immediately instead.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2008-01-11 18:28:28 -06:00
Boaz Harrosh
38ad03de3f [SCSI] libiscsi,iser: patch for AHS support
- The default initialization of hdr_max is the minimum -
    sizeof(struct iscsi_cmd) - Once this patch goes into iser the default
    initialization at libiscsi can be removed.
  - This is not yet full support for AHSs at iser end. But it should be easy.
    Just allocate more space at iser_desc right after iscsi_hdr. Than
    at transmission time use ctask->hdr_len to retrieve the total
    size of all iscsi pdu headers. See previous patch at iscsi_tcp.[ch]

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2008-01-11 18:28:25 -06:00
Mike Christie
843c0a8a76 [SCSI] libiscsi, iscsi_tcp: add device support
This patch adds logical unit reset support. This should work for ib_iser,
but I have not finished testing that driver so it is not hooked in yet.

This patch also temporarily reverts the iscsi_tcp r2t write out patch.
That code is completely rewritten in this patchset.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2008-01-11 18:28:19 -06:00
Dave Dillow
ad696989b4 IB/srp: Release transport before removing host
The documented call sequence for removing a host is to call the
transport xxx_remove_host() prior to scsi_remove_host(). The SRP
transport used to crash when that order was followed, but as it is now
fixed, use the documented order.

Signed-off-by: David Dillow <dillowda@ornl.gov>
Acked-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-08 12:08:10 -08:00
Dotan Barak
e1bb7843e4 IB/mlx4: Fix value of pkey_index in QP1 completions
Fix the value of pkey_index in completions to get a valid value for
GSI QPs.  Without this fix, incoming GSI packets on port 2 get an
invalid P_Key index in the completion, which prevents the MAD layer
from sending back a response, which can make the second port of
ConnectX HCAs completely useless.

Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-08 12:05:53 -08:00
David Dillow
b0e47c8b79 IB/srp: Fix list corruption/oops on module reload
Add a missing call to srp_remove_host() in srp_remove_one() so that we 
don't leak SRP transport class list entries.

Tested-by: David Dillow <dillowda@ornl.gov>
Acked-by: FUJITA Tomonori <tomof@acm.org>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-03 10:25:27 -08:00
Joachim Fenkes
3d758a4a48 IB/ehca: Fix lock flag variable location, bump version number
Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-12-13 09:37:23 -08:00
Joachim Fenkes
4faf775795 IB/ehca: Serialize HCA-related hCalls if necessary
Several pSeries firmware versions share a rare locking issue in the
HCA-related hCalls. Check for a feature flag that indicates the issue
being fixed and serialize all HCA hCalls if not.

Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-12-12 14:09:43 -08:00
Joachim Fenkes
1457edc72d IB/ehca: Return correct number of SGEs for SRQ
Firmware would round up the number of SGEs to four, because the WQE
structure holds four SGEs. For SRQ, only three are supported, so return
a fixed value instead.

Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-12-12 14:09:43 -08:00
Joachim Fenkes
b1812582ba IB/ehca: Fix static rate if path faster than link
The formula would yield -1 if the path is faster than the link, which
is wrong in a bad way (max throttling).  Clamp to 0, which is the
correct value.

Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-11-30 16:19:41 -08:00
Jack Morgenstein
1401b53acc IPoIB: Fix oops if xmit is called when priv->broadcast is NULL
If a port goes down, ipoib_ib_dev_down() is invoked -- which flushes
the mcasts (clearing priv->broadcast) and clearing the path record
cache.  If ipoib_start_xmit() is then invoked (before the broadcast
group is rejoined), a kernel oops results from attempting to access
priv->broadcast, which is still unset.

Returning NULL from path_rec_create() if priv->broadcast is NULL is a
harmless way of bypassing the problem -- the offending packet is
simply discarded "without prejudice."

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-11-27 15:40:10 -08:00
Erez Zilber
a316b79c33 IB/iser: Add missing counter increment in iser_data_buf_aligned_len()
While adding sg chaining support to iSER, a "for" loop was replaced
with a "for_each_sg" loop. The "for" loop included the incrementation
of 2 variables. Only one of them is incremented in the current
"for_each_sg" loop. This caused iSER to think that all data is
unaligned, and all data was copied to aligned buffers.

This patch increments the missing counter inside the "for_each_sg"
loop whenever necessary.

Signed-off-by: Erez Zilber <erezz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-11-24 13:50:39 -08:00
Joachim Fenkes
3fe2ed344d IB/ehca: Fix static rate regression
Wrong choice of port number caused modify_qp() to fail -- fixed.

Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-11-24 13:47:59 -08:00
Ralph Campbell
4187b915a0 IB/ipath: Normalize error return codes for posting work requests
The error codes for ib_post_send(), ib_post_recv(), and ib_post_srq_recv()
were inconsistent. Use EINVAL for too many SGEs and ENOMEM for too many
WRs.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-11-20 11:05:42 -08:00
Ralph Campbell
14de986a0b IB/ipath: Fix offset returned to ibv_modify_srq()
The wrong offset was being returned to libipathverbs so that when
ibv_modify_srq() calls mmap(), it always fails.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-11-20 11:04:41 -08:00
Ralph Campbell
8a278e6d57 IB/ipath: Fix error path in QP creation
This patch fixes the code which frees the partially allocated QP
resources if there was an error while creating the QP. In particular,
the QPN wasn't deallocated and the QP wasn't removed from the hash
table.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-11-20 11:04:10 -08:00
Ralph Campbell
fb74dacb0f IB/ipath: Fix offset returned to ibv_resize_cq()
The wrong offset was being returned to libipathverbs so that when
ibv_resize_cq() calls mmap(), it always fails.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-11-20 11:03:26 -08:00
Steve Wise
9a7666494b RDMA/cxgb3: Set the max_qp_init_rd_atom attribute in query_device
The device attribute max_qp_init_rd_atom is not getting set in cxgb3's
query_device method.  Version 1.0.4 of librdmacm now validates the
user's requested initiator and responder resources against the max
supported by the device.  Since iw_cxgb3 wasn't setting this attribute
(and it defaulted to 0), all rdma_connect()s fail if there are
initiator resources requested by the app.  Fix this by setting the
correct value in iwch_query_device().

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-11-13 15:27:00 -08:00
Joachim Fenkes
51aaa54eb9 IB/ehca: Fix static rate calculation
The IPD (inter-packet delay) formula was a little off and assumed a
fixed physical link rate; fix the formula and query the actual
physical link rate, now that we can get it.  Also, refactor the
calculation into a common function ehca_calc_ipd() and use that
instead of duplicating code.

Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-11-13 15:27:00 -08:00
Joachim Fenkes
40ebb5615e IB/ehca: Return physical link information in query_port()
Newer firmware versions return physical port information to the
partition, so hand that information to the consumer if it's present.

Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-11-13 15:26:59 -08:00
Ralph Campbell
f4ad1bcc44 IB/ipath: Fix race with ACK retry timeout list management
When an ACK is received, the QP is removed from the timeout list and
then if there are still pending send WQEs, the QP is put back on the
timeout list. It is possible that another post send has put the QP on
the timeout list thus, a check needs to be made before trying to do it
again or the list is corrupted.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-11-13 15:26:58 -08:00
Ralph Campbell
a6e7550d8f IB/ipath: Fix memory leak in ipath_resize_cq() if copy_to_user() fails
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Patrick Marchand Latifi <patrick.latifi@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-11-13 15:26:57 -08:00
Linus Torvalds
53173920da Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
  IB/fmr_pool: Stop ib_fmr threads from contributing to load average
  IB/ipath: Fix incorrect use of sizeof on msg buffer (function argument)
  IB/ipath: Limit length checksummed in eeprom
  IB/ipath: Fix a race where s_last is updated without lock held
  IB/mlx4: Lock SQ lock in mlx4_ib_post_send()
  IPoIB/cm: Fix receive QP cleanup
2007-10-30 15:26:56 -07:00
Anton Blanchard
3f776e8a25 IB/fmr_pool: Stop ib_fmr threads from contributing to load average
I noticed my machine was at a constant load average of 1. This was
because ib_create_fmr_pool calls kthread_create but does not
immediately wake the thread up.

Change to using kthread_run so we enter ib_fmr_cleanup_thread(), set
TASK_INTERRUPTIBLE, then go to sleep.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-30 14:57:43 -07:00
Dave Olson
164ef7a252 IB/ipath: Fix incorrect use of sizeof on msg buffer (function argument)
Inside a function declared as

    void foo(char bar[512])

the value of sizeof bar is the size of a pointer, not 512.  So avoid
constructions like this by passing the size explicitly.

Also reduce the size of the buffer to 128 bytes (512 was overly generous).

Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-30 11:05:49 -07:00
Michael Albaugh
627934448e IB/ipath: Limit length checksummed in eeprom
The small eeprom that holds the GUID etc. contains a data-length, but if 
the actual eeprom is new or has been erased, that byte will be 0xFF,
which is greater than the maximum physical length of the eeprom, and
more importantly greater than the length of the buffer we vmalloc'd.
Sanity-check the length to avoid the possbility of reading past end of
buffer.

Signed-off-by: Michael Albaugh <Michael.Albaugh@Qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-30 10:58:53 -07:00
Ralph Campbell
fffbfeaa68 IB/ipath: Fix a race where s_last is updated without lock held
There is a small window where a send work queue entry could be
overwritten by ib_post_send() because s_last is updated before the
entry is read.

This patch closes the window by acquiring the lock and updating
the last send work queue entry index after reading the wr_id.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-30 10:57:24 -07:00
Roland Dreier
96db0e0335 IB/mlx4: Lock SQ lock in mlx4_ib_post_send()
Because of a typo, mlx4_ib_post_send() takes the same lock rq.lock as
mlx4_ib_post_recv().  Correct the code so the intended sq.lock is
taken when posting a send.

Noticed by Yossi Leybovitch and pointed out by Jack Morgenstein from
Mellanox.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-30 10:53:54 -07:00
Roland Dreier
09f60f8f54 IPoIB/cm: Fix receive QP cleanup
Commit 1b524963 ("IPoIB/cm: Use common CQ for CM send completions")
changed how the high-order bits of work request IDs were used, which
had the effect that IPOIB_CM_RX_DRAIN_WRID was no longer handled as a
connected mode receive completion.  This leads to the messages

    ib1: cm send completion event with wrid 1073741823 (> 64)
    ib1: RX drain timing out

when an interface with connected mode QPs is brought down.  Fix this
by making sure that both IPOIB_OP_CM and IPOIB_OP_RECV are set in
IPOIB_CM_RX_DRAIN_WRID.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-26 13:44:25 -07:00
Jens Axboe
642f149031 SG: Change sg_set_page() to take length and offset argument
Most drivers need to set length and offset as well, so may as well fold
those three lines into one.

Add sg_assign_page() for those two locations that only needed to set
the page, where the offset/length is set outside of the function context.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-10-24 11:20:47 +02:00
Linus Torvalds
0b776eb542 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
  mlx4_core: Increase command timeout for INIT_HCA to 10 seconds
  IPoIB/cm: Use common CQ for CM send completions
  IB/uverbs: Fix checking of userspace object ownership
  IB/mlx4: Sanity check userspace send queue sizes
  IPoIB: Rewrite "if (!likely(...))" as "if (unlikely(!(...)))"
  IB/ehca: Enable large page MRs by default
  IB/ehca: Change meaning of hca_cap_mr_pgsize
  IB/ehca: Fix ehca_encode_hwpage_size() and alloc_fmr()
  IB/ehca: Fix masking error in {,re}reg_phys_mr()
  IB/ehca: Supply QP token for SRQ base QPs
  IPoIB: Use round_jiffies() for ah_reap_task
  RDMA/cma: Fix deadlock destroying listen requests
  RDMA/cma: Add locking around QP accesses
  IB/mthca: Avoid alignment traps when writing doorbells
  mlx4_core: Kill mlx4_write64_raw()
2007-10-23 09:56:11 -07:00
Olof Johansson
c8ac5a7309 IB/ehca: Fix sg_page() fallout
More fallout from sg_page changes:

drivers/infiniband/hw/ehca/ehca_mrmw.c: In function 'ehca_set_pagebuf_user1':
drivers/infiniband/hw/ehca/ehca_mrmw.c:1779: error: 'struct scatterlist' has no member named 'page'
drivers/infiniband/hw/ehca/ehca_mrmw.c: In function 'ehca_check_kpages_per_ate':
drivers/infiniband/hw/ehca/ehca_mrmw.c:1835: error: 'struct scatterlist' has no member named 'page'
drivers/infiniband/hw/ehca/ehca_mrmw.c: In function 'ehca_set_pagebuf_user2':
drivers/infiniband/hw/ehca/ehca_mrmw.c:1870: error: 'struct scatterlist' has no member named 'page'

Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-10-23 09:12:52 +02:00
Jens Axboe
45711f1af6 [SG] Update drivers to use sg helpers
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-10-22 21:19:53 +02:00
Michael S. Tsirkin
1b524963fd IPoIB/cm: Use common CQ for CM send completions
Use the same CQ for CM send completions as for all other IPoIB
completions.  This means all completions are processed via the same
NAPI polling routine.  This should help reduce the number of
interrupts for bi-directional traffic (such as TCP) and fixes "driver
is hogging interrupts" errors reported for IPoIB send side, e.g.
<https://bugs.openfabrics.org/show_bug.cgi?id=508>

To do this, keep a per-interface counter of outstanding send WRs, and
stop the interface when this counter reaches the send queue size to
avoid CQ overruns.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-19 21:39:34 -07:00
Roland Dreier
cbfb50e6e2 IB/uverbs: Fix checking of userspace object ownership
Commit 9ead190b ("IB/uverbs: Don't serialize with ib_uverbs_idr_mutex")
rewrote how userspace objects are looked up in the uverbs module's
idrs, and introduced a severe bug in the process: there is no checking
that an operation is being performed by the right process any more.
Fix this by adding the missing check of uobj->context in __idr_get_uobj().

Apparently everyone is being very careful to only touch their own
objects, because this bug was introduced in June 2006 in 2.6.18, and
has gone undetected until now.

Cc: stable <stable@kernel.org>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-19 20:01:43 -07:00
Anton Arapov
a25de534f8 [INET]: Justification for local port range robustness.
There is a justifying patch for Stephen's patches. Stephen's patches
disallows using a port range of one single port and brakes the meaning
of the 'remaining' variable, in some places it has different meaning.
My patch gives back the sense of 'remaining' variable. It should mean
how many ports are remaining and nothing else. Also my patch allows
using a single port.

  I sure we must be able to use mentioned port range, this does not
restricted by documentation and does not brake current behavior.

usefull links:
Patches posted by Stephen Hemminger
  http://marc.info/?l=linux-netdev&m=119206106218187&w=2
  http://marc.info/?l=linux-netdev&m=119206109918235&w=2

Andrew Morton's comment
  http://marc.info/?l=linux-kernel&m=119248225007737&w=2

1. Allows using a port range of one single port.
2. Gives back sense of 'remaining' variable.

Signed-off-by: Anton Arapov <aarapov@redhat.com>
Acked-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-18 22:00:17 -07:00
Joe Perches
898eb71cb1 Add missing newlines to some uses of dev_<level> messages
Found these while looking at printk uses.

Add missing newlines to dev_<level> uses
Add missing KERN_<level> prefixes to multiline dev_<level>s
Fixed a wierd->weird spelling typo
Added a newline to a printk

Signed-off-by: Joe Perches <joe@perches.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Mark M. Hoffman <mhoffman@lightlink.com>
Cc: Roland Dreier <rolandd@cisco.com>
Cc: Tilman Schmidt <tilman@imap.cc>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Jeff Garzik <jeff@garzik.org>
Cc: Stephen Hemminger <shemminger@linux-foundation.org>
Cc: Greg KH <greg@kroah.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: David Brownell <david-b@pacbell.net>
Cc: James Smart <James.Smart@Emulex.Com>
Cc: Andrew Vasquez <andrew.vasquez@qlogic.com>
Cc: "Antonino A. Daplas" <adaplas@pol.net>
Cc: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Jaroslav Kysela <perex@suse.cz>
Cc: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-18 14:37:28 -07:00
Jack Morgenstein
839041329f IB/mlx4: Sanity check userspace send queue sizes
Add sanity checks to send queue sizes passed in from userspace. The
minimum sq stride value below is taken from the MT25408 PRM (section
11.10, Table 306, log_sq_stride definition).

Without this check, userspace can submit arbitrarily large/small
values for the number of WQEs and the stride, which can crash the
kernel.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-18 09:27:26 -07:00
Roland Dreier
fd312561ad IPoIB: Rewrite "if (!likely(...))" as "if (unlikely(!(...)))"
It's too hard to figure out what "!likely(...)" really means, and who
knows how compilers interpret the hint.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-17 21:54:44 -07:00
Joachim Fenkes
8da9ee9c1e IB/ehca: Enable large page MRs by default
Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-17 21:47:47 -07:00
Joachim Fenkes
abc39d3672 IB/ehca: Change meaning of hca_cap_mr_pgsize
ehca_shca.hca_cap_mr_pgsize now contains all supported page sizes ORed
together. This makes some checks easier to code and understand, plus
we can return this value verbatim in query_hca(), fixing a problem
with SRP (reported by Anton Blanchard -- thanks!).

Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-17 21:47:24 -07:00
Joachim Fenkes
8c08d50d4f IB/ehca: Fix ehca_encode_hwpage_size() and alloc_fmr()
Simplify ehca_encode_hwpage_size(), fixing an infinite loop for pgsize == 0
in the process. Fix the bug in alloc_fmr() that triggered the loop.

Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-17 21:46:37 -07:00
Joachim Fenkes
9511724da9 IB/ehca: Fix masking error in {,re}reg_phys_mr()
Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-17 21:45:56 -07:00
Joachim Fenkes
c0c84d566d IB/ehca: Supply QP token for SRQ base QPs
Because hardware reports the SRQ token in RWQEs of SRQ base QPs, supply the
base QP token as SRQ token, so we can properly find the SRQ base QP.

Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-17 21:45:17 -07:00
Joachim Fenkes
6b08f3ae8e [POWERPC] ibmebus: Move to of_device and of_platform_driver, match eHCA and eHEA drivers
Replace struct ibmebus_dev and struct ibmebus_driver with struct of_device
and struct of_platform_driver, respectively.  Match the external ibmebus
interface and drivers using it.

Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Roland Dreier <rolandd@cisco.com>
Acked-by: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-10-17 22:30:08 +10:00
Anton Blanchard
69fc507a14 IPoIB: Use round_jiffies() for ah_reap_task
Use round_jiffies() to align the 1 second ah_reap_task with other work
and potentially save power by sleeping cores for longer.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-16 12:28:56 -07:00
Sean Hefty
d02d1f5359 RDMA/cma: Fix deadlock destroying listen requests
Deadlock condition reported by Kanoj Sarcar <kanoj@netxen.com>.
The deadlock occurs when a connection request arrives at the same
time that a wildcard listen is being destroyed.

A wildcard listen maintains per device listen requests for each
RDMA device in the system.  The per device listens are automatically
added and removed when RDMA devices are inserted or removed from
the system.

When a wildcard listen is destroyed, rdma_destroy_id() acquires
the rdma_cm's device mutex ('lock') to protect against hot-plug
events adding or removing per device listens.  It then tries to
destroy the per device listens by calling ib_destroy_cm_id() or
iw_destroy_cm_id().  It does this while holding the device mutex.

However, if the underlying iw/ib CM reports a connection request
while this is occurring, the rdma_cm callback function will try
to acquire the same device mutex.  Since we're in a callback,
the ib_destroy_cm_id() or iw_destroy_cm_id() calls will block until
their callback thread returns, but the callback is blocked waiting for
the device mutex.

Fix this by re-working how per device listens are destroyed.  Use
rdma_destroy_id(), which avoids the deadlock, in place of
cma_destroy_listen().  Additional synchronization is added to handle
device hot-plug events and ensure that the id is not destroyed twice.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-16 12:25:49 -07:00
Sean Hefty
c5483388bb RDMA/cma: Add locking around QP accesses
If a user allocates a QP on an rdma_cm_id, the rdma_cm will automatically
transition the QP through its states (RTR, RTS, error, etc.)  While the
QP state transitions are occurring, the QP itself must remain valid.
Provide locking around the QP pointer to prevent its destruction while
accessing the pointer.

This fixes an issue reported by Olaf Kirch from Oracle that resulted in
a system crash:

"An incoming connection arrives and we decide to tear down the nascent
 connection.  The remote ends decides to do the same.  We start to shut
 down the connection, and call rdma_destroy_qp on our cm_id. ... Now
 apparently a 'connect reject' message comes in from the other host,
 and cma_ib_handler() is called with an event of IB_CM_REJ_RECEIVED.
 It calls cma_modify_qp_err, which for some odd reason tries to modify
 the exact same QP we just destroyed."

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-16 12:25:00 -07:00
Jens Axboe
53d412fce0 infiniband: sg chaining support
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-10-16 11:20:59 +02:00
Roland Dreier
ab8403c424 IB/mthca: Avoid alignment traps when writing doorbells
Architectures such as ia64 see alignment traps when doing a 64-bit 
read from __be32 doorbell[2] arrays to do doorbell writes in 
mthca_write64().  Fix this by just passing the two halves of the 
doorbell value into mthca_write64().  This actually improves the 
generated code by allowing the compiler to see what's going on better.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-15 20:17:27 -07:00
Moni Shoua
200d1713b4 IB/ipoib: Verify address handle validity on send
When the bonding device senses a carrier loss of its active slave it replaces
that slave with a new one. In between the times when the carrier of an IPoIB
device goes down and ipoib_neigh is destroyed, it is possible that the
bonding driver will send a packet on a new slave that uses an old ipoib_neigh.
This patch detects and prevents this from happenning.

Signed-off-by: Moni Shoua <monis at voltaire.com>
Signed-off-by: Or Gerlitz <ogerlitz at voltaire.com>
Acked-by: Roland Dreier <rdreier@cisco.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-10-15 14:20:45 -04:00
Moni Shoua
732a2170f4 IB/ipoib: Bound the net device to the ipoib_neigh structue
IPoIB uses a two layer neighboring scheme, such that for each struct neighbour
whose device is an ipoib one, there is a struct ipoib_neigh buddy which is
created on demand at the tx flow by an ipoib_neigh_alloc(skb->dst->neighbour)
call.

When using the bonding driver, neighbours are created by the net stack on behalf
of the bonding (master) device. On the tx flow the bonding code gets an skb such
that skb->dev points to the master device, it changes this skb to point on the
slave device and calls the slave hard_start_xmit function.

Under this scheme, ipoib_neigh_destructor assumption that for each struct
neighbour it gets, n->dev is an ipoib device and hence netdev_priv(n->dev)
can be casted to struct ipoib_dev_priv is buggy.

To fix it, this patch adds a dev field to struct ipoib_neigh which is used
instead of the struct neighbour dev one, when n->dev->flags has the
IFF_MASTER bit set.

Signed-off-by: Moni Shoua <monis at voltaire.com>
Signed-off-by: Or Gerlitz <ogerlitz at voltaire.com>
Acked-by: Roland Dreier <rdreier@cisco.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-10-15 14:20:45 -04:00
Linus Torvalds
df3d80f5a5 Merge master.kernel.org:/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6
* master.kernel.org:/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (207 commits)
  [SCSI] gdth: fix CONFIG_ISA build failure
  [SCSI] esp_scsi: remove __dev{init,exit}
  [SCSI] gdth: !use_sg cleanup and use of scsi accessors
  [SCSI] gdth: Move members from SCp to gdth_cmndinfo, stage 2
  [SCSI] gdth: Setup proper per-command private data
  [SCSI] gdth: Remove gdth_ctr_tab[]
  [SCSI] gdth: switch to modern scsi host registration
  [SCSI] gdth: gdth_interrupt() gdth_get_status() & gdth_wait() fixes
  [SCSI] gdth: clean up host private data
  [SCSI] gdth: Remove virt hosts
  [SCSI] gdth: Reorder scsi_host_template intitializers
  [SCSI] gdth: kill gdth_{read,write}[bwl] wrappers
  [SCSI] gdth: Remove 2.4.x support, in-kernel changelog
  [SCSI] gdth: split out pci probing
  [SCSI] gdth: split out eisa probing
  [SCSI] gdth: split out isa probing
  gdth: Make one abuse of scsi_cmnd less obvious
  [SCSI] NCR5380: Use scsi_eh API for REQUEST_SENSE invocation
  [SCSI] usb storage: use scsi_eh API in REQUEST_SENSE execution
  [SCSI] scsi_error: Refactoring scsi_error to facilitate in synchronous REQUEST_SENSE
  ...
2007-10-15 08:19:33 -07:00
Kay Sievers
7eff2e7a8b Driver core: change add_uevent_var to use a struct
This changes the uevent buffer functions to use a struct instead of a
long list of parameters. It does no longer require the caller to do the
proper buffer termination and size accounting, which is currently wrong
in some places. It fixes a known bug where parts of the uevent
environment are overwritten because of wrong index calculations.

Many thanks to Mathieu Desnoyers for finding bugs and improving the
error handling.

Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-10-12 14:51:01 -07:00
FUJITA Tomonori
aebd5e476e [SCSI] transport_srp: add rport roles attribute
This adds a 'roles' attribute to rport like transport_fc. The role can
be initiator or target. That is, the initiator driver creates target
remote ports and the target driver creates initiator remote ports.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
2007-10-12 14:37:46 -04:00
FUJITA Tomonori
3236822b1c [SCSI] ib_srp: convert to use the srp transport class
This converts ib_srp to use the srp transport class.

I don't have ib hardware so I've not tested this patch.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
2007-10-12 14:37:42 -04:00
Linus Torvalds
ce9d3c9a6a Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (87 commits)
  mlx4_core: Fix section mismatches
  IPoIB: Allow setting policy to ignore multicast groups
  IB/mthca: Mark error paths as unlikely() in post_srq_recv functions
  IB/ipath: Minor fix to ordering of freeing and zeroing of tid pages.
  IB/ipath: Remove redundant link state checks
  IB/ipath: Fix IB_EVENT_PORT_ERR event
  IB/ipath: Better handling of unexpected GPIO interrupts
  IB/ipath: Maintain active time on all chips
  IB/ipath: Fix QHT7040 serial number check
  IB/ipath: Indicate a couple of chip bugs to userspace
  IB/ipath: iba6110 rev4 no longer needs recv header overrun workaround
  IB/ipath: Use counters in ipath_poll and cleanup interrupts in ipath_close
  IB/ipath: Remove duplicate copy of LMC
  IB/ipath: Add ability to set the LMC via the sysfs debugging interface
  IB/ipath: Optimize completion queue entry insertion and polling
  IB/ipath: Implement IB_EVENT_QP_LAST_WQE_REACHED
  IB/ipath: Generate flush CQE when QP is in error state
  IB/ipath: Remove redundant code
  IB/ipath: Future proof eeprom checksum code (contents reading)
  IB/ipath: UC RDMA WRITE with IMMEDIATE doesn't send the immediate
  ...
2007-10-11 19:43:13 -07:00
Stephen Hemminger
227b60f510 [INET]: local port range robustness
Expansion of original idea from Denis V. Lunev <den@openvz.org>

Add robustness and locking to the local_port_range sysctl.
1. Enforce that low < high when setting.
2. Use seqlock to ensure atomic update.

The locking might seem like overkill, but there are
cases where sysadmin might want to change value in the
middle of a DoS attack.

Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 17:30:46 -07:00
Roland Dreier
9153f66a5b IPoIB: Fix unused variable warning
The conversion to use netdevice internal stats left an unused variable
in ipoib_neigh_free(), since there's no longer any reason to get
netdev_priv() in order to increment dropped packets.  Delete the
unused priv variable.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-10-10 16:55:30 -07:00
Roland Dreier
de90351219 [IPoIB]: Convert to netdevice internal stats
Use the stats member of struct netdevice in IPoIB, so we can save
memory by deleting the stats member of struct ipoib_dev_priv, and save
code by deleting ipoib_get_stats().

Signed-off-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:53:41 -07:00
Stephen Hemminger
3b04ddde02 [NET]: Move hardware header operations out of netdevice.
Since hardware header operations are part of the protocol class
not the device instance, make them into a separate object and
save memory.

Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:52:52 -07:00
Ralf Baechle
10d024c1b2 [NET]: Nuke SET_MODULE_OWNER macro.
It's been a useless no-op for long enough in 2.6 so I figured it's time to
remove it.  The number of people that could object because they're
maintaining unified 2.4 and 2.6 drivers is probably rather small.

[ Handled drivers added by netdev tree and some missed IRDA cases... -DaveM ]

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:51:13 -07:00
Eric W. Biederman
881d966b48 [NET]: Make the device list and device lookups per namespace.
This patch makes most of the generic device layer network
namespace safe.  This patch makes dev_base_head a
network namespace variable, and then it picks up
a few associated variables.  The functions:
dev_getbyhwaddr
dev_getfirsthwbytype
dev_get_by_flags
dev_get_by_name
__dev_get_by_name
dev_get_by_index
__dev_get_by_index
dev_ioctl
dev_ethtool
dev_load
wireless_process_ioctl

were modified to take a network namespace argument, and
deal with it.

vlan_ioctl_set and brioctl_set were modified so their
hooks will receive a network namespace argument.

So basically anthing in the core of the network stack that was
affected to by the change of dev_base was modified to handle
multiple network namespaces.  The rest of the network stack was
simply modified to explicitly use &init_net the initial network
namespace.  This can be fixed when those components of the network
stack are modified to handle multiple network namespaces.

For now the ifindex generator is left global.

Fundametally ifindex numbers are per namespace, or else
we will have corner case problems with migration when
we get that far.

At the same time there are assumptions in the network stack
that the ifindex of a network device won't change.  Making
the ifindex number global seems a good compromise until
the network stack can cope with ifindex changes when
you change namespaces, and the like.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:49:10 -07:00
Stephen Hemminger
bea3348eef [NET]: Make NAPI polling independent of struct net_device objects.
Several devices have multiple independant RX queues per net
device, and some have a single interrupt doorbell for several
queues.

In either case, it's easier to support layouts like that if the
structure representing the poll is independant from the net
device itself.

The signature of the ->poll() call back goes from:

	int foo_poll(struct net_device *dev, int *budget)

to

	int foo_poll(struct napi_struct *napi, int budget)

The caller is returned the number of RX packets processed (or
the number of "NAPI credits" consumed if you want to get
abstract).  The callee no longer messes around bumping
dev->quota, *budget, etc. because that is all handled in the
caller upon return.

The napi_struct is to be embedded in the device driver private data
structures.

Furthermore, it is the driver's responsibility to disable all NAPI
instances in it's ->stop() device close handler.  Since the
napi_struct is privatized into the driver's private data structures,
only the driver knows how to get at all of the napi_struct instances
it may have per-device.

With lots of help and suggestions from Rusty Russell, Roland Dreier,
Michael Chan, Jeff Garzik, and Jamal Hadi Salim.

Bug fixes from Thomas Graf, Roland Dreier, Peter Zijlstra,
Joseph Fannin, Scott Wood, Hans J. Koch, and Michael Chan.

[ Ported to current tree and all drivers converted.  Integrated
  Stephen's follow-on kerneldoc additions, and restored poll_list
  handling to the old style to fix mutual exclusion issues.  -DaveM ]

Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:47:45 -07:00
Or Gerlitz
335a64a5a9 IPoIB: Allow setting policy to ignore multicast groups
The kernel IB stack allows (through the RDMA CM) userspace
applications to join and use multicast groups from the IPoIB MGID
range.  This allows multicast traffic to be handled directly from
userspace QPs, without going through the kernel stack, which gives
better performance for some applications.

However, to fully interoperate with IP multicast, such userspace
applications need to participate in IGMP reports and queries, or else
routers may not forward the multicast traffic to the system where the
application is running.  The simplest way to do this is to share the
kernel IGMP implementation by using the IP_ADD_MEMBERSHIP option to
join multicast groups that are being handled directly in userspace.

However, in such cases, the actual multicast traffic should not also
be handled by the IPoIB interface, because that would burn resources
handling multicast packets that will just be discarded in the kernel.

To handle this, this patch adds lookup on the database used for IB
multicast group reference counting when IPoIB is joining multicast
groups, and if a multicast group is already handled by user space,
then the IPoIB kernel driver ignores the group.  This is controlled by
a per-interface policy flag.  When the flag is set, IPoIB will not
join and attach its QP to a multicast group which already has an entry
in the database; when the flag is cleared, IPoIB will behave as before
this change.

For each IPoIB interface, the /sys/class/net/$intf/umcast attribute
controls the policy flag.  The default value is off/0.

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-10 13:02:30 -07:00
Eli Cohen
55a98e955c IB/mthca: Mark error paths as unlikely() in post_srq_recv functions
Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-10 11:24:33 -07:00
Dave Olson
3ac8c70f74 IB/ipath: Minor fix to ordering of freeing and zeroing of tid pages.
Fixed to be the same as everywhere else.  copy and then zero the page *
in the array first, and then pass the copy to the VM routines.

Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 21:03:02 -07:00
Ralph Campbell
bda94e32b3 IB/ipath: Remove redundant link state checks
This patch removes some redundant checks when the SMA changes the link
state since the same checks are made in the lower level function that
sets the state.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 21:02:08 -07:00
Ralph Campbell
49739b3e24 IB/ipath: Fix IB_EVENT_PORT_ERR event
The link state event calls were being generated when the SM told the SMA
to change link states. This works for IB_EVENT_PORT_ACTIVE but not if
the link goes down and stays down. The fix is to generate event calls
from the interrupt handler when the HW link state changes.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 21:01:38 -07:00
Michael Albaugh
6a733cdc71 IB/ipath: Better handling of unexpected GPIO interrupts
The General Purpose I/O pins can be configured to cause interrupts. At
the end of the interrupt code dealing with all known causes, a message
is output if any bits remain un-handled. Since this is a "can't happen"
scenario, it should only be triggered by bugs elsewhere. It is harmless,
and potentially beneficial, to limit the damage by masking any such
unexpected interrupts.

This patch adds disabling of interrupts from any pins that should
not have been allowed to interrupt, in addition to emitting a message.

Signed-off-by: Michael Albaugh <Michael.Albaugh@Qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 21:00:47 -07:00
Michael Albaugh
192594d523 IB/ipath: Maintain active time on all chips
There is a count of "active hours" maintained in EEPROM, to aid
troubleshooting. The definition of "active" is based on traffic
exceeding a threshold in any given 5-second polling interval. As
originally written, the check was inadvertently bypassed for chips whose
counters were 64-bits wide, and only applied to chips with 32-bit wide
counters.

This patch moves the test for amount of traffic "out" to a more common
location, rather than depending on a side-effect of the software
emulation of 64-bit counts on chips whose hardware is only 32-bits wide.

Signed-off-by: Michael Albaugh <Michael.Albaugh@Qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 21:00:08 -07:00
Dave Olson
aa7c79abd1 IB/ipath: Fix QHT7040 serial number check
Remove all the OEM and bringup boards, and complain and fail
initialization if one is found.  QHT7040 with GPIO rework (128ywwuuuu)
is OK, older 112ywwuuuu is no longer supported). The check that had been
added was failing both the 112 and 128 series.

Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 20:58:49 -07:00
Arthur Jones
20bed34314 IB/ipath: Indicate a couple of chip bugs to userspace
A couple of chip bugs in the iba6110 and in the iba6120 are not in more
recent chips.  This first bug swaps two of the pioavail register
locations.  In the second bug, the chip can sometimes forget to dma the
pio avail register to memory.  We indicate the presence of these bugs
with runtime flags and we indicate the presence of the flags by bumping
the SWMINOR.

Signed-off-by: Arthur Jones <arthur.jones@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 20:57:54 -07:00
Arthur Jones
4bec0b9155 IB/ipath: iba6110 rev4 no longer needs recv header overrun workaround
iba6110 rev3 and earlier had a chip bug where the chip could overrun the
recv header queue. rev4 fixed this chip bug so userspace no longer needs
to workaround it.  Now we only set the workaround flag for older chip
versions.

Signed-off-by: Arthur Jones <arthur.jones@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 20:56:59 -07:00
Arthur Jones
70c51da2c4 IB/ipath: Use counters in ipath_poll and cleanup interrupts in ipath_close
ipath_poll() suffered from a couple subtle bugs.  Under the right
conditions we could leave recv interrupts enabled on an ipath user
context on close, thereby taking potentially unwanted interrupts on the
next open -- this is fixed by unconditionally turning off recv
interrupts on close.  Also, we now use counters rather than set/clear
bits which allows us to make sure we catch all interrupts at the cost of
changing the semantics slightly (it's now give me all events since the
last time I called poll() rather than give me all events since I called
_this_ poll routine).  We also added some memory barriers which may help
ensure we get all notifications in a timely manner.

Signed-off-by: Arthur Jones <arthur.jones@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 20:56:23 -07:00
Ralph Campbell
542869a17e IB/ipath: Remove duplicate copy of LMC
The LMC value was being saved by the SMA in two places. This patch
cleans it up so only one copy is kept.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 20:55:06 -07:00
Ralph Campbell
15cba26f42 IB/ipath: Add ability to set the LMC via the sysfs debugging interface
This patch adds the ability to set the LMC via a sysfs file as if the SM
sent a SubnSet(PortInfo) MAD.  It is useful for debugging when no SM is
running.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 20:53:50 -07:00
Ralph Campbell
6cff2faaf1 IB/ipath: Optimize completion queue entry insertion and polling
The code to add an entry to the completion queue stored the QPN which is
needed for the user level verbs view of the completion queue entry but
the kernel struct ib_wc contains a pointer to the QP instead of a QPN.
When the kernel polled for a completion queue entry, the QPN was lookup
up and the QP pointer recovered. This patch stores the CQE differently
based on whether the CQ is a kernel CQ or a user CQ thus avoiding the
QPN to QP lookup overhead.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 20:52:23 -07:00
Ralph Campbell
d42b01b584 IB/ipath: Implement IB_EVENT_QP_LAST_WQE_REACHED
This patch implements the IB_EVENT_QP_LAST_WQE_REACHED event which is
needed by ib_ipoib to destroy the QP when used in connected mode.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 20:51:20 -07:00
Ralph Campbell
c9cf7db2bc IB/ipath: Generate flush CQE when QP is in error state
Follow the IB spec. (C10-96) for post send which states that a flushed 
completion event should be generated for work requests posted when a QP
is in the error state.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 20:50:29 -07:00
Ralph Campbell
036be09ca5 IB/ipath: Remove redundant code
This patch removes some redundant initialization code.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 20:46:23 -07:00
Dave Olson
d29cc6efb9 IB/ipath: Future proof eeprom checksum code (contents reading)
In an earlier change, the amount of data read from the flash was
mistakenly limited to the size known to the current driver.  This causes
problems when the length is increased, and written with the new longer
version; the checksum would fail because not enough data was read.
Always read the full 128 byte length to prevent this.

Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 20:45:57 -07:00
Ralph Campbell
55046698fa IB/ipath: UC RDMA WRITE with IMMEDIATE doesn't send the immediate
This patch fixes a bug in the receive processing for UC RDMA WRITE with
immediate which caused the last packet to be dropped.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 20:44:56 -07:00
Dave Olson
9ef8617af7 IB/ipath: Correctly describe workaround for TID write chip bug
This is a comment change, only, correcting the comment to match the
implemented workaround, rather than the original workaround, and
clarifying why it's needed.

Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 20:44:20 -07:00
Ralph Campbell
1793b4771d IB/ipath: Remove unneeded code for ipathfs
The ipathfs file system is used to export binary data verses ASCII data
such as through /sys. This patch removes some unneeded files since the
data is available through other /sys files.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 20:43:17 -07:00
Dave Olson
9bec399231 IB/ipath: Verify host bus bandwidth to chip will not limit performance
There have been a number of issues where host bandwidth via HT or PCIe
to the InfiniPath chip has been limited in some fashion (BIOS,
configuration, etc.), resulting in user confusion.  This check gives a
clear warning that something is wrong and needs to be resolved.

Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 20:20:15 -07:00
Ralph Campbell
4ee97180ac IB/ipath: Change UD to queue work requests like RC & UC
The code to post UD sends tried to process work requests at the time
ib_post_send() is called without using a WQE queue.  This was fine as
long as HW resources were available for sending a packet.  This patch
changes UD to be handled more like RC and UC and shares more code.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 20:05:49 -07:00
Ralph Campbell
210d6ca3db IB/ipath: Performance optimization for CPU differences
Different processors have different ordering restrictions for write
combining.  By taking advantage of this, we can eliminate some write
barriers when writing to the send buffers.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 20:04:14 -07:00
Arthur Jones
327a338d4f IB/ipath: iba6110 rev4 GPIO counters support
On iba6110 rev4, support for three more IB counters were added.  The
LocalLinkIntegrityError counter, the ExcessiveBufferOverrunErrors
counter and support for error counting of flow control packets on an
invalid VL.  These counters trigger GPIO interrupts and the sw keeps
track of the counts.  Since we also use GPIO interrupts to signal packet
reception, we need to turn off the fast interrupts, or we risk losing a
GPIO interrupt.

Signed-off-by: Arthur Jones <arthur.jones@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 20:02:46 -07:00
Roland Dreier
76dea3bc26 IB/ehca: Fix clipping of device limits to INT_MAX
Doing min_t(int, foo, INT_MAX) doesn't work correctly, because if foo
is bigger than INT_MAX, then when treated as a signed integer, it will
become negative and hence such an expression is just an elaborate NOP.

Fix such cases in ehca to do min_t(unsigned, foo, INT_MAX) instead.
This fixes negative reported values for max_cqe, max_pd and max_ah:

Before:

        max_cqe:                        -64
        max_pd:                         -1
        max_ah:                         -1

After:
        max_cqe:                        2147483647
        max_pd:                         2147483647
        max_ah:                         2147483647

Based on a bug report and fix from Anton Blanchard <anton@samba.org>.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 19:59:18 -07:00
Dotan Barak
ede6bc04f3 IPoIB/cm: Clean up initialization of QP attr in ipoib_cm_create_tx_qp()
Make the way QP is being created in ipoib_cm_create_tx_qp()
consistent with ipoib_cm_create_rx_qp().

Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 19:59:18 -07:00
Roland Dreier
76d7cc0345 IB/mthca: Use mmiowb() to avoid firmware commands getting jumbled up
Firmware commands are sent to the HCA by writing multiple words to a
command register block.  Access to this block of registers is
serialized with a mutex.  However, on large SGI systems, problems were
seen with multiple CPUs issuing FW commands at the same time, because
the writes to the register block may be reordered within the system
interconnect and reach the HCA in a different order than they were
issued (even with the mutex).  Fix this by adding an mmiowb() before
dropping the mutex.

Tested-by: Arthur Kepner <akepner@sgi.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 19:59:17 -07:00
Sean Hefty
dcb3f974da RDMA/cma: Queue IB CM MRAs to avoid unnecessary remote retries
Automatically queue MRA message to decrease the number of retries sent
by the remote side during connection establishment.  This also has the
effect of increasing the overall connection timeout without using a
longer retry time in the case of dropped packets.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 19:59:17 -07:00
Sean Hefty
de98b693e9 IB/cm: Modify interface to send MRAs in response to duplicate messages
The IB CM provides a message received acknowledged (MRA) message that
can be sent to indicate that a REQ or REP message has been received, but
will require more time to process than the timeout specified by those
messages.  In many cases, the application may not know how long it will
take to respond to a CM message, but the majority of the time, it will
usually respond before a retry has been sent.  Rather than sending an
MRA in response to all messages just to handle the case where a longer
timeout is needed, it is more efficient to queue the MRA for sending in
case a duplicate message is received.

This avoids sending an MRA when it is not needed, but limits the number
of times that a REQ or REP will be resent.  It also provides for a
simpler implementation than generating the MRA based on a timer event.
(That is, trying to send the MRA after receiving the first REQ or REP if
a response has not been generated, so that it is received at the remote
side before a duplicate REQ or REP has been received)

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 19:59:17 -07:00
Roland Dreier
1a1eb6a646 IB/mthca: Increase max number of QPs per multicast group to 56
Increase the number of QPs allowed per multicast group from 8 to 56.
This allows for one QP per core on 16-core systems, which are now
quite common, and allows some space for future growth.

This is basically the same patch that Jack Morgenstein
<jackm@dev.mellanox.co.il> just supplied for mlx4.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 19:59:17 -07:00
Jack Morgenstein
8ad11fb6b0 IB/mlx4: Implement FMRs
Implement FMRs for mlx4.  This is an adaptation of code from mthca.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 19:59:16 -07:00
Jack Morgenstein
d7bb58fb1c mlx4_core: Write MTTs from CPU instead with of WRITE_MTT FW command
Write MTT entries directly to ICM from the driver (eliminating use of
WRITE_MTT command).  This reduces the number of FW commands needed to
register an MR by at least a factor of 2 and speeds up memory
registration significantly.  This code will also be used to implement
FMRs.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 19:59:16 -07:00
Roland Dreier
04d29b0ede IB/uverbs: Make ib_uverbs_release_event_file() static
ib_uverbs_release_event_file() is only used in uverbs_main.c, so make it
static to that file.  Also move the definition before the first use, so
a forward declaration is not needed.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 19:59:15 -07:00
Roland Dreier
a394f83bdf IB/umad: Fix bit ordering and 32-on-64 problems on big endian systems
The declaration of struct ib_user_mad_reg_req.method_mask[] exported
to userspace was an array of __u32, but the kernel internally treated
it as a bitmap made up of longs.  This makes a difference for 64-bit
big-endian kernels, where numbering the bits in an array of__u32 gives:

    |31.....0|63....31|95....64|127...96|

while numbering the bits in an array of longs gives:

    |63..............0|127............64|

64-bit userspace can handle this by just treating method_mask[] as an
array of longs, but 32-bit userspace is really stuck: the meaning of
the bits in method_mask[] depends on whether the kernel is 32-bit or
64-bit, and there's no sane way for userspace to know that.

Fix this by updating <rdma/ib_user_mad.h> to make it clear that
method_mask[] is an array of longs, and using a compat_ioctl method to
convert to an array of 64-bit longs to handle the 32-on-64 problem.
This fixes the interface description to match existing behavior (so
working binaries continue to work) in almost all situations, and gives
consistent semantics in the case of 32-bit userspace that can run on
either a 32-bit or 64-bit kernel, so that the same binary can work for
both 32-on-32 and 32-on-64 systems.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 19:59:15 -07:00
Roland Dreier
2be8e3ee8e IB/umad: Add P_Key index support
Add support for setting the P_Key index of sent MADs and getting the
P_Key index of received MADs.  This requires a change to the layout of
the ABI structure struct ib_user_mad_hdr, so to avoid breaking
compatibility, we default to the old (unchanged) ABI and add a new
ioctl IB_USER_MAD_ENABLE_PKEY that allows applications that are aware
of the new ABI to opt into using it.

We plan on switching to the new ABI by default in a year or so, and
this patch adds a warning that is printed when an application uses the
old ABI, to push people towards converting to the new ABI.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Reviewed-by: Hal Rosenstock <hal@xsigo.com>
2007-10-09 19:59:15 -07:00
Joachim Fenkes
c01759cee9 IB/ehca: Return srq_attr->max_sge in ehca_query_srq()
Totally forgot this.

Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 19:59:15 -07:00
Hoang-Nam Nguyen
a660722375 IB/ehca: Adjust 64-bit alignment of create QP response for userspace
Signed-off-by: Hoang-Nam Nguyen <hnguyen@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 19:59:14 -07:00
Hoang-Nam Nguyen
03f72a51cb IB/ehca: Fix mem leak of firmware ctrlblock in ehca_create_srq()
Signed-off-by: Hoang-Nam Nguyen <hnguyen@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 19:59:14 -07:00
Jack Morgenstein
cd9281d873 IB/mlx4: Display misc device information under /sys/class/infiniband/
display the following device information under /sys/class/infiniband/mlx4_X:
board_id, fw_ver, hw_rev, hca_type.

This patch makes this information available to userspace utilities
such as ibstat and ibv_devinfo.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 19:59:14 -07:00
Ralph Campbell
57cb61d587 IB/core: Fix handling of multicast response failures
I was looking at the code for multicast.c and noticed that
ib_sa_join_multicast() calls queue_join() which puts the
request at the front of the group->pending_list.  If this
is a second request, it seems like it would interfere with
process_join_error() since group->last_join won't point
to the member at the head of the pending_list. The sequence
would thus be:

1. ib_sa_join_multicast()
   puts member1 on head of pending_list and starts work thread
2. mcast_work_handler()
   calls send_join() which sets group->last_join to member1
3. ib_sa_join_multicast()
   puts member2 on head of pending_list
4. join operation for member1 receives failures response from SA.
5. join_handler() is called with error status
6. process_join_error() fails to process member1 since
   it doesn't match the first entry in the group->pending_list.

The impact is that the failed join request is tossed.  The second
request is processed, and after it completes, the original request ends
up being retried.

This change also results in join requests being processed in FIFO
order.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 19:59:14 -07:00
Satyam Sharma
9faa559c01 IB/ehca: Misc cpuinit section annotations and #ifdef cleanups
* Replace {un}register_cpu_notifier with {un}register_hotcpu_notifier
  thereby losing a couple of #ifdef HOTPLUG_CPU pairs.
* Move comp_pool_callback_nb declaration to below that of callback
  function so that initialization of .notifier_call and .priority can
  occur at build time itself and not runtime.
* Mark the notifier_block (and callback function, and another static
  function used by it) as __cpuinit{data} for the sake of consistency
  and remove enclosing #ifdef. (This may increase size for modular
  build of this module, however, because these are no longer dropped
  unconditionally now.)

Signed-off-by: Satyam Sharma <satyam@infradead.org>
Acked-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 19:59:14 -07:00
Roland Dreier
ec2a1344ad IB/iser: Remove unnecessary includes
<asm/scatterlist.h> is not needed because everyplace it appears,
<linux/scatterlist.h> also appears.  <asm/io.h> is not needed because
nothing seems to be using device IO anyway.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 19:59:13 -07:00
Steve Wise
935ef2d7a2 RDMA/cma: Use neigh_event_send() to start neighbour discovery
Calling arp_send() to initiate neighbour discovery (ND) doesn't do the
full ND protocol.  Namely, it doesn't handle retransmitting the arp
request if it is dropped. The function neigh_event_send() does all
this.  Without doing full ND, RDMA address resolution fails in the
presence of dropped ARP broadcast packets.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 19:59:13 -07:00
Joachim Fenkes
3a31c41901 IB/ehca: Only use MR large pages for hugetlb regions
...because, on virtualized hardware like System p, we can't be sure
that the physical pages behind them are contiguous otherwise.

Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 19:59:13 -07:00
Joachim Fenkes
c8d8beea03 IB/umem: Add hugetlb flag to struct ib_umem
During ib_umem_get(), determine whether all pages from the memory
region are hugetlb pages and report this in the "hugetlb" member.
Low-level drivers can use this information if they need it.

Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 19:59:13 -07:00
Sean Hefty
247e020ee5 IB/srp: Add QoS support through service ID
Provide the target service ID when performing a path record query to
support optional QoS capability.  QoS requires support from the SA.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 19:59:12 -07:00
Sean Hefty
7ce86409ad RDMA/ucma: Allow user space to set service type
Export the ability to set the type of service to user space.  Model
the interface after setsockopt.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 19:59:12 -07:00
Sean Hefty
a81c994d5e RDMA/cma: Add ability to specify type of service
Provide support to specify a type of service for a communication
identifier.  A new function call is used when dealing with IPv4
addresses.  For IPv6 addresses, the ToS is specified through the
traffic class field in the sockaddr_in6 structure.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

[ The comments Eitan Zahavi and myself have made over the v1 post at 
  <http://lists.openfabrics.org/pipermail/general/2007-August/039247.html>
  were fully addressed. ]
 
Reviewed-by: Or Gerlitz <ogerlitz@voltaire.com> 
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 19:59:12 -07:00
Sean Hefty
733d65fe33 IB/sa: Add new QoS fields to path record
The QoS annex defines new fields for path records.  Add them to the
ib_sa for consumers that want to use them.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Reviewed-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 19:59:12 -07:00
Sean Hefty
81668838c4 IPoIB: Specify Traffic Class with path record queries for QoS support
To support QoS within and between subnets, modify IPoIB to request
specific Traffic Class values with path record queries, using
the value associated with the IPoIB broadcast group.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

[ See some comments I made on this at v1 and v2 of the posts
  <http://lists.openfabrics.org/pipermail/general/2007-August/039275.html>
  <http://lists.openfabrics.org/pipermail/general/2007-September/040312.html> ]

Reviewed-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 19:59:11 -07:00
Hoang-Nam Nguyen
08c283ac26 IB/ehca: Fix large page HW cap defines
Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 19:59:11 -07:00
Joachim Fenkes
39089e7774 IB/ehca: Bump version number and change its format
Nobody needed the SVNEHCA_ prefix anyway.

Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 19:59:11 -07:00
Joachim Fenkes
5110e4de49 IB/ehca: Replace get_paca()->paca_index by the more portable raw_smp_processor_id()
We can use raw_smp_processor_id() here because the processor ID is
only used for debug output and therefore our use is preemption-unsafe.

Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 19:59:11 -07:00
Joachim Fenkes
0b5de96858 IB/ehca: Serialize MR alloc and MR free hvCalls
Some firmware levels exhibit a race condition between H_ALLOC_RESOURCE(MR)
and H_FREE_RESOURCE(MR).  Work around this problem by locking these hvCalls
against each other.

Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 19:59:11 -07:00
Joachim Fenkes
e90d0b3dae IB/ehca: Path migration support
Fix some modify_qp() issues related to path migration.

Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 19:59:10 -07:00
Joachim Fenkes
b708fba3c2 IB/ehca: Add check for max #SGE to create_qp()
Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 19:59:10 -07:00
Joachim Fenkes
86dce445e0 IB/ehca: ehca_gen_warn() should always print
Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 19:59:10 -07:00
Joachim Fenkes
e37221928b IB/ehca: Print return codes as signed decimal integers
...because -12 is easier to read than FFFFFFF4.

Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 19:59:10 -07:00
Joachim Fenkes
2863ad4bdd IB/ehca: Refactor hvcall tracing
Change hvcall trace output towards better readability: reg numbers
instead of argument numbers, return code as signed decimal instead of
unsigned hex.

Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 19:59:10 -07:00
Hoang-Nam Nguyen
e390d3b52f IB/ehca: Use remap_4k_pfn() to map firmware contexts to user space
Use Paul's new remap_4k_pfn() function to map our 4K firmware contexts
into user space on 64K-page machines without exposing neighboring
firmware contexts. Return the context's offset within a 64K page to
user space so it can determine the proper virtual address.

For details about remap_4k_pfn(), see commit 721151d0 or
http://patchwork.ozlabs.org/linuxppc/patch?id=10281

Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 19:59:09 -07:00
Stefan Roscher
5281a4b8a0 IB/ehca: Support more than 4k QPs for userspace and kernelspace
Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 19:59:08 -07:00
Stefan Roscher
441633b968 IB/ehca: Small QP userspace support
Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 19:59:07 -07:00
Peter Oruba
a855b1a742 IB/mthca: Use PCI-X/PCI-Express read control interfaces
These driver changes incorporate the proposed PCI-X / PCI-Express read
byte count interface.  Reading and setting those values doesn't take
place "manually", instead wrapping functions are called to allow
quirks for some PCI bridges.

Signed-off by: Peter Oruba <peter.oruba@amd.com>
Based on work by Stephen Hemminger <shemminger@linux-foundation.org>
Cc: Roland Dreier <rolandd@cisco.com>
Cc: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 19:59:07 -07:00
Ali Ayoub
3c10c7c929 IB/sa: Error handling thinko fix
ib_create_send_mad() returns an error code pointer on error, not NULL.

Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 19:59:07 -07:00
Anton Blanchard
339e2640a9 IB/ehca: Export module parameters in sysfs
At the moment the ehca module parameters are not exported in sysfs.
Export them with 0444 permissions.

Signed-off-by: Anton Blanchard <anton@samba.org>
Acked-by: Hoang-Nam Nguyen <hnguyen@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 19:59:06 -07:00
Anton Blanchard
1f79448302 IB/ehca: Make output clearer by removing some debug messages
ehca spits out a lot of debugging information. I had to look closely to
see the "Port 1 is not active" message within all the debug:

eHCA Infiniband Device Driver (Rel.: SVNEHCA_0022)
eHCA scaling code enabled
ehca D.001.DQDXYCB-P1-C9: PU0006 EHCA_ERR:ehca_define_sqp Port 1 is not active.
ehca D.001.DQDXYCB-P1-C9: PU0006 EHCA_ERR:ehca_create_qp ehca_define_sqp() failed rc=ffffffffffffffff
ib_mad: Couldn't create ib_mad QP1
ib_mad: Couldn't open ehca0 port 1
ehca D.001.DQDXYCB-P1-C9: PU0006 EHCA_ERR:ehca_alloc_fmr unsupported fmr_attr->page_shift=9
ehca D.001.DQDXYCB-P1-C9: PU0006 EHCA_ERR:ehca_alloc_fmr rc=ffffffffffffffea pd=c000000b4b5b2420 mr_access_flags=7 fmr_attr=c0000005afd37394
fmr_create failed for FMR 0

Remove a few debug statements so that things are clearer:

eHCA Infiniband Device Driver (Rel.: SVNEHCA_0022)
eHCA scaling code enabled
ehca D.001.DQDXYCB-P1-C9: PU0006 EHCA_ERR:ehca_define_sqp Port 1 is not active.
ib_mad: Couldn't create ib_mad QP1
ib_mad: Couldn't open ehca0 port 1
ehca D.001.DQDXYCB-P1-C9: PU0006 EHCA_ERR:ehca_alloc_fmr unsupported fmr_attr->page_shift=9
fmr_create failed for FMR 0

Signed-off-by: Anton Blanchard <anton@samba.org>
Acked-by: Hoang-Nam Nguyen <hnguyen@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 19:59:06 -07:00
Roland Dreier
d7dc3ccbe4 IB/mlx4: Fix up SRQ limit_watermark endianness
mlx4_srq_query() returns a big-endian 16-bit value through an int *,
which screws up sparse checking.  Fix this so that a CPU-endian value
is returned.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 19:59:06 -07:00
Eli Cohen
ca6de177ac IPoIB: Fix error path memory leak
Clean up properly if ib_query_pkey() or ib_query_gid() fail.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 19:59:06 -07:00
Eli Cohen
b3ac60fc24 IPoIB: Fix typo to end statement with ';' instead of ','
Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 19:59:06 -07:00
Michael S. Tsirkin
017aadc4b5 IB/mthca: Enable MSI-X by default
Recover from MSI-X errors by automatically falling back on regular
interrupt, instead of asking the user to do this manually.  This makes
it possible to enable MSI-X by default, and will make it possible to
get rid of the msi_x module option in the future.

Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 19:59:06 -07:00
Anton Blanchard
8a68bbe31d IB/fmr_pool: Clean up some error messages in fmr_pool.c
A number of printks in fmr_pool.c dont have newlines, eg:

    fmr_create failed for FMR 0<5>FS-Cache: Loaded

Fix them up.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 19:59:05 -07:00
Roland Dreier
1fea391039 IB/ehca: Include <linux/mutex.h> from ehca_classes.h
ehca_classes.h uses struct mutex, so while <linux/mutex.h> seems to be
pulled in indirectly by one of the headers it includes, the right
thing is to include <linux/mutex.h> directly.

Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Acked-by: Stefan Roscher <stefan.roscher@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 19:59:05 -07:00
Roland Dreier
2242fa4f04 IB/mlx4: Use __set_data_seg() in mlx4_ib_post_recv()
Use a __set_data_seg() helper in mlx4_ib_post_recv() too; in addition
to making the code easier to read, this also allows gcc to generate
better code -- on x86_64:

add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-8 (-8)
function                                     old     new   delta
mlx4_ib_post_recv                            359     351      -8

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 19:59:05 -07:00
Roland Dreier
65d470b3ea IB: find_first_zero_bit() takes unsigned pointer
Fix sparse warning

    drivers/infiniband/core/device.c:142:6: warning: incorrect type in argument 1 (different signedness)
    drivers/infiniband/core/device.c:142:6:    expected unsigned long const *addr
    drivers/infiniband/core/device.c:142:6:    got long *[assigned] inuse

by making the local variable inuse unsigned.  Does not affect generated
code at all.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 19:59:04 -07:00
Roland Dreier
ce423ef50e IPoIB: Make sure no receives are handled when stopping device
The current IPoIB code might process receive completions from
ipoib_drain_cq() when bringing down the interface.  This could cause
packets to be passed up the stack without the device's poll method
being called.  Avoid this by setting the status of any successful
completions to IB_WC_WR_FLUSH_ERR.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 19:59:04 -07:00
Steve Wise
e54664c095 RDMA/cxgb3: Make the iw_cxgb3 module parameters writable
Allow changing parameter values without having to reload the module.
This is safe because these parameters are only looked at when a new
connection is established.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 19:59:04 -07:00
Jack Morgenstein
6e694ea33e IB/mlx4: Fix data corruption triggered by wrong headroom marking order
This is an addendum to commit 0e6e7416 ("IB/mlx4: Handle new FW
requirement for send request prefetching").  We also need to handle
prefetch marking properly for S/G segments, or else the HCA may end up
processing S/G segments that are not fully written and end up sending
the wrong data.  This can actually cause data corruption in practice,
especially on systems with relatively slow CPUs (where the HCA is more
likely to prefetch while the CPU is in the middle of writing a work
request into memory).

We write S/G segments in reverse order into the WQE, in order to
guarantee that the first dword of all cachelines containing S/G
segments is written last (overwriting the headroom invalidation
pattern).  The entire cacheline will thus contain valid data when the
invalidation pattern is overwritten.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-09-23 13:03:22 -07:00
Linus Torvalds
6db602d447 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
  IB/ehca: SRQ fixes to enable IPoIB CM
  IB/ehca: Fix Small QP regressions
2007-08-31 20:40:37 -07:00
Joachim Fenkes
5ff70cac3e IB/ehca: SRQ fixes to enable IPoIB CM
Fix ehca SRQ support so that IPoIB connected mode works:

 - Report max_srq > 0 if SRQ is supported
 - Report "last wqe reached" asynchronous event when base QP dies;
   this is required by the IB spec and IPoIB CM relies on receiving it
   when cleaning up.

Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-08-31 13:58:04 -07:00
Stefan Roscher
fecea0ab34 IB/ehca: Fix Small QP regressions
The new Small QP code had a few bugs that would also make it trigger
for non-Small QPs.  Fix them.

Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-08-31 13:56:42 -07:00
Divy Le Ray
5fbf816fe7 cxgb3 - Fix dev->priv usage
cxgb3 used netdev_priv() and dev->priv for different purposes.
In 2.6.23, netdev_priv() == dev->priv, cxgb3 needs a fix.
This patch is a partial backport of Dave Miller's changes in the
net-2.6.24 git branch.

Without this fix, cxgb3 crashes on 2.6.23.

Signed-off-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-08-31 07:29:08 -04:00
Ilpo Järvinen
fe11cb6ba4 IB/mlx4: Incorrect semicolon after if statement
A stray semicolon makes us inadvertently ignore the value of err.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-08-15 20:24:06 -07:00
Jack Morgenstein
6958e827f1 IPoIB: Fix leak in ipoib_transport_dev_init() error path
ipoib_transport_dev_init() calls ipoib_cm_dev_init(), so it needs to
call ipoib_cm_dev_cleanup() to unwind that on the error path.

Found by Dotan Barak of Mellanox.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-08-07 12:40:56 -07:00
Vu Pham
198919151d IB/mlx4: Fix opcode returned in RDMA read completion
Current code has a cut-and-paste error and returns IB_WC_SEND when it
should return IB_WC_RDMA_READ.

Signed-off-by: Vu Pham <vu@mellanox.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-08-03 14:29:06 -07:00
Raghava Kondapalli
3d1ff48da7 IB/srp: Add OUI for new Cisco targets
New Cisco IB SRP targets use the Cisco OUI 00-1b-0d but still need the
Topspin workarounds.  Add this OUI to srp_target_is_topspin().

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-08-03 10:45:18 -07:00
Roland Dreier
5d7cbfd631 IB/srp: Wrap OUI checking for workarounds in helper functions
Wrap the checking for Mellanox and Topspin OUIs to decide whether to
use a workaround into helper functions.  This will make it cleaner to
add a new OUI to check (as we need to do now that some targets with a
Cisco OUI still need the Topspin workarounds).

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-08-03 10:45:18 -07:00
Steve Wise
699924b1e1 RDMA/cxgb3: Always call low level send function via cxgb3_ofld_send()
This avoids deadlocks.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-08-03 10:45:18 -07:00
Dotan Barak
92ddc447ce IB: Move the macro IB_UMEM_MAX_PAGE_CHUNK() to umem.c
After moving the definition of struct ib_umem_chunk from ib_verbs.h to
ib_umem.h there isn't any reason for the macro IB_UMEM_MAX_PAGE_CHUNK
to stay in ib_verbs.h.  Move the macro to umem.c, the only place where
it is used.

Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-08-03 10:45:18 -07:00
Sean Hefty
38d5af9565 IB/mad: Fix address handle leak in mad_rmpp
The address handle associated with dual-sided RMPP direction switch
ACKs is never destroyed.  Free the AH for ACKs which fall into this
category.

Problem was reported by Dotan Barak <dotanb@dev.mellanox.co.il>.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-08-03 10:45:17 -07:00
Hal Rosenstock
8fc394b197 IB/mad: agent_send_response() should be void
Nothing looks at the return value of agent_send_response(), so there's
no point in returning anything.

Signed-off-by: Hal Rosenstock <hal.rosenstock@gmail.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-08-03 10:45:17 -07:00
Hal Rosenstock
86dfbecdea IB/mad: Fix memory leak in switch handling in ib_mad_recv_done_handler()
If agent_send_response() returns an error, we shouldn't do anything
differently than if it succeeds; setting response to NULL just means
that the response buffer gets leaked.

Signed-off-by: Suresh Shelvapille <suri@baymicrosystems.com>
Signed-off-by: Hal Rosenstock <hal.rosenstock@gmail.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-08-03 10:45:17 -07:00
Hal Rosenstock
445d68070c IB/mad: Fix error path if response alloc fails in ib_mad_recv_done_handler()
If ib_mad_recv_done_handler() fails to allocate response, then it just
printed a warning and continued, which leads to an oops if the MAD is
being handled for a switch device, because the switch code uses
response without checking for NULL.  Fix this by bailing out of the
function if the allocation fails.

Signed-off-by: Suresh Shelvapille <suri@baymicrosystems.com>
Signed-off-by: Hal Rosenstock <hal.rosenstock@gmail.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-08-03 10:45:17 -07:00
Roland Dreier
5399891052 IB/sa: Don't need to check for default P_Key twice
Now that ib_find_pkey() ignores the membership bit of P_Keys, there's no
need for ib_sa to look for both 0x7fff and 0xffff in a port's P_Key table.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-08-03 10:45:17 -07:00
Moni Shoua
36026ecc20 IB/core: Ignore membership bit in ib_find_pkey()
ib_find_pkey() is used as a replacement for ib_find_cached_pkey(), and
the original function ignored the membership bit when searching for a
P_Key, so ib_find_pkey() should ignore the bit too.

In particular, IPoIB turns on the P_Key membership bit of limited
membership P_Keys when creating a child interface and looks for the
full membership P_key.  This broke if a port was a partial member of a
partition when IPoIB switched from ib_find_cached_pkey() to
ib_find_pkey(), and this change fixes things again.

Signed-off-by: Moni Shoua <monis@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-08-03 10:45:17 -07:00
Eddy L O Jansson
0e6ff1580f in-string typos of "error"
One patch for two trivial typos of 'error' with three R's, appearing in message strings.

There's a bunch more of the same in comments, not dealt with here.

Signed-off-by: Eddy L O Jansson <eddy@klopper.net>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-31 15:39:40 -07:00
Linus Torvalds
4dcf39c6cc Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
  IB/ipath: Workaround problem of errormask register being overwritten
  IB/ipath: Fix some issues with buffer cancel and sendctrl register update
  IB/ipath: Use faster put_tid_2 routine after initialization
  IB/ipath: Remove unsafe fastrcvint code from interrupt handler
  IB/ehca: Move extern declarations from .c files to .h files
  IB/mlx4: Whitespace fix
  IB/ehca: Fix include order to better match kernel style
  mlx4_core: Remove kfree() in mlx4_mr_alloc() error flow
  RDMA/amso1100: Initialize the wait_queue_head_t in the c2_qp structure
2007-07-30 16:36:33 -07:00
Dave Olson
78d1e02fac IB/ipath: Workaround problem of errormask register being overwritten
On some system hardware, we are seeing moderately common cases of the
chip errormask register being overwritten due to a chip bug in iba6120
that is triggered by a vendor-specific PCIe broadcast message.  This
patch merely checks periodically, and corrects it if needed (the
overwrite can cause us to not get error and hardware error
interrupts).  Also, make dd->ipath_errormask the one, true canonical
source for kr_errormask, and remove references to ipath_ignorederrs as
it is currently unused.

Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: John Gregor <john.gregor@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-30 13:16:46 -07:00
Dave Olson
3810f2a84e IB/ipath: Fix some issues with buffer cancel and sendctrl register update
There was confused use of INFINIPATH_S_PIOBUFAVAILUPD (value) and
IPATH_S_PIOBUFAVAILUPD (bit position).  Also, some callers of
ipath_cancel_sends() need kr_sendctrl restored, and some want to do it
later.

Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-30 13:16:46 -07:00
Dave Olson
cf5b60aa40 IB/ipath: Use faster put_tid_2 routine after initialization
At one time the ipath_minrev field was initialized prior to the
ipath_init_iba6120_funcs call, but that is no longer the case, so the
slower put_tid routine was always being used.

Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-30 13:16:46 -07:00
Dave Olson
f17fddc9e2 IB/ipath: Remove unsafe fastrcvint code from interrupt handler
The fastrcvint code's purpose was to avoid reading the interrupt
status if kernel packets were in the receive queue (to reduce
overhead).  Because intstatus was not read, we could miss the error
interrupt bit indicating freeze mode, since it only delivers a single
interrupt, even if still pending after intclear is written.

This patch removes that unsafe optimization.

Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-30 13:16:45 -07:00
Linus Torvalds
a6ce22a5f6 Merge master.kernel.org:/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6
* master.kernel.org:/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6: (28 commits)
  [SCSI] mpt fusion: Changes in mptctl.c for logging support
  [SCSI] mpt fusion: Changes in mptfc.c mptlan.c mptsas.c and mptspi.c for logging support
  [SCSI] mpt fusion: Changes in mptscsih.c for logging support
  [SCSI] mpt fusion: Changes in mptbase.c for logging support
  [SCSI] mpt fusion: logging support in Kconfig, Makefile, mptbase.h and addition of mptdebug.h
  [SCSI] libsas: Fix potential NULL dereference in sas_smp_get_phy_events()
  [SCSI] bsg: Fix build for CONFIG_BLOCK=n
  [SCSI] aacraid: fix Sunrise Lake reset handling
  [SCSI] aacraid: add SCSI SYNCHONIZE_CACHE range checking
  [SCSI] add easyRAID to the no report luns blacklist
  [SCSI] advansys: lindent and other large, uninteresting changes
  [SCSI] aic79xx, aic7xxx: Fix incorrect width setting
  [SCSI] qla2xxx: fix to honor ignored parameters in sysfs attributes
  [SCSI] aacraid: draw line in sand, sundry cleanup and version update
  [SCSI] iscsi_tcp: Turn off bounce buffers
  [SCSI] libiscsi: fix cmd seqeunce number checking
  [SCSI] iscsi_tcp, ib_iser Enable module refcounting for iscsi host template
  [SCSI] libiscsi: make sure session is not blocked when removing host
  [SCSI] libsas: Remove PCI dependencies
  [SCSI] simscsi: convert to use the data buffer accessors
  ...
2007-07-29 17:22:03 -07:00
Alexey Dobriyan
4e950f6f01 Remove fs.h from mm.h
Remove fs.h from mm.h. For this,
 1) Uninline vma_wants_writenotify(). It's pretty huge anyway.
 2) Add back fs.h or less bloated headers (err.h) to files that need it.

As result, on x86_64 allyesconfig, fs.h dependencies cut down from 3929 files
rebuilt down to 3444 (-12.3%).

Cross-compile tested without regressions on my two usual configs and (sigh):

alpha              arm-mx1ads        mips-bigsur          powerpc-ebony
alpha-allnoconfig  arm-neponset      mips-capcella        powerpc-g5
alpha-defconfig    arm-netwinder     mips-cobalt          powerpc-holly
alpha-up           arm-netx          mips-db1000          powerpc-iseries
arm                arm-ns9xxx        mips-db1100          powerpc-linkstation
arm-assabet        arm-omap_h2_1610  mips-db1200          powerpc-lite5200
arm-at91rm9200dk   arm-onearm        mips-db1500          powerpc-maple
arm-at91rm9200ek   arm-picotux200    mips-db1550          powerpc-mpc7448_hpc2
arm-at91sam9260ek  arm-pleb          mips-ddb5477         powerpc-mpc8272_ads
arm-at91sam9261ek  arm-pnx4008       mips-decstation      powerpc-mpc8313_rdb
arm-at91sam9263ek  arm-pxa255-idp    mips-e55             powerpc-mpc832x_mds
arm-at91sam9rlek   arm-realview      mips-emma2rh         powerpc-mpc832x_rdb
arm-ateb9200       arm-realview-smp  mips-excite          powerpc-mpc834x_itx
arm-badge4         arm-rpc           mips-fulong          powerpc-mpc834x_itxgp
arm-carmeva        arm-s3c2410       mips-ip22            powerpc-mpc834x_mds
arm-cerfcube       arm-shannon       mips-ip27            powerpc-mpc836x_mds
arm-clps7500       arm-shark         mips-ip32            powerpc-mpc8540_ads
arm-collie         arm-simpad        mips-jazz            powerpc-mpc8544_ds
arm-corgi          arm-spitz         mips-jmr3927         powerpc-mpc8560_ads
arm-csb337         arm-trizeps4      mips-malta           powerpc-mpc8568mds
arm-csb637         arm-versatile     mips-mipssim         powerpc-mpc85xx_cds
arm-ebsa110        i386              mips-mpc30x          powerpc-mpc8641_hpcn
arm-edb7211        i386-allnoconfig  mips-msp71xx         powerpc-mpc866_ads
arm-em_x270        i386-defconfig    mips-ocelot          powerpc-mpc885_ads
arm-ep93xx         i386-up           mips-pb1100          powerpc-pasemi
arm-footbridge     ia64              mips-pb1500          powerpc-pmac32
arm-fortunet       ia64-allnoconfig  mips-pb1550          powerpc-ppc64
arm-h3600          ia64-bigsur       mips-pnx8550-jbs     powerpc-prpmc2800
arm-h7201          ia64-defconfig    mips-pnx8550-stb810  powerpc-ps3
arm-h7202          ia64-gensparse    mips-qemu            powerpc-pseries
arm-hackkit        ia64-sim          mips-rbhma4200       powerpc-up
arm-integrator     ia64-sn2          mips-rbhma4500       s390
arm-iop13xx        ia64-tiger        mips-rm200           s390-allnoconfig
arm-iop32x         ia64-up           mips-sb1250-swarm    s390-defconfig
arm-iop33x         ia64-zx1          mips-sead            s390-up
arm-ixp2000        m68k              mips-tb0219          sparc
arm-ixp23xx        m68k-amiga        mips-tb0226          sparc-allnoconfig
arm-ixp4xx         m68k-apollo       mips-tb0287          sparc-defconfig
arm-jornada720     m68k-atari        mips-workpad         sparc-up
arm-kafa           m68k-bvme6000     mips-wrppmc          sparc64
arm-kb9202         m68k-hp300        mips-yosemite        sparc64-allnoconfig
arm-ks8695         m68k-mac          parisc               sparc64-defconfig
arm-lart           m68k-mvme147      parisc-allnoconfig   sparc64-up
arm-lpd270         m68k-mvme16x      parisc-defconfig     um-x86_64
arm-lpd7a400       m68k-q40          parisc-up            x86_64
arm-lpd7a404       m68k-sun3         powerpc              x86_64-allnoconfig
arm-lubbock        m68k-sun3x        powerpc-cell         x86_64-defconfig
arm-lusl7200       mips              powerpc-celleb       x86_64-up
arm-mainstone      mips-atlas        powerpc-chrp32

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-29 17:09:29 -07:00
Hoang-Nam Nguyen
1655fc2e12 IB/ehca: Move extern declarations from .c files to .h files
Make sure declarations stay in sync with definitions by keeping all 
extern declarations in common .h files.

Signed-off-by: Hoang-Nam Nguyen <hnguyen@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-28 21:47:53 -07:00
Roland Dreier
e0f5d99e8d IB/mlx4: Whitespace fix
Remove extra dumb-looking blank line that snuck in somehow.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-28 20:52:44 -07:00
Hoang-Nam Nguyen
05d989f948 IB/ehca: Fix include order to better match kernel style
Include <rdma/...> headers after <asm/...> headers.

Signed-off-by: Hoang-Nam Nguyen <hnguyen@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-28 08:36:32 -07:00
Tom Tucker
4e8e6ee380 RDMA/amso1100: Initialize the wait_queue_head_t in the c2_qp structure
Fix a crash if the driver has to wait for a QP reference to be dropped
when destroying the QP.

Signed-off-by: Ethan Burns <eaburns@iol.unh.edu>
Acked-by: Tom Tucker <tom@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-28 08:06:40 -07:00
Mike Christie
7974392c0b [SCSI] iscsi_tcp, ib_iser Enable module refcounting for iscsi host template
This prevents the iscsi modules from being unloaded while
there are active mounts from an iscsi target.

Signed-off-by: Olaf Kirch <olaf.kirch@oracle.com>
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
2007-07-27 09:11:45 -04:00
Stefan Roscher
e2f81daf23 IB/ehca: Support small QP queues
eHCA2 supports QP queues that can be as small as 512 bytes. This
greatly reduces memory overhead for consumers that use lots of QPs
with small queues (e.g. RDMA-only QPs). Apart from dealing with
firmware, this code needs to manage bite-sized chunks of kernel pages,
making sure that no kernel page is shared between different protection
domains.

Signed-off-by: Hoang-Nam Nguyen <hnguyen@de.ibm.com>
2007-07-20 21:19:47 -07:00
Joachim Fenkes
0c10f7b79b IB/ehca: Make internal_create/destroy_qp() static
They're only used in ehca_qp.c, so make them static to that file.

Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-20 21:19:44 -07:00
Hoang-Nam Nguyen
51d2bfbddb IB/ehca: Move ehca2ib_return_code() out of line
ehca2ib_return_code() is not used in any fast path, and making it
non-inline saves ~1.5K of code.

Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-20 21:19:44 -07:00
Hoang-Nam Nguyen
633a5aedae IB/ehca: Generate async event when SRQ limit reached
Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-20 21:19:44 -07:00
Hoang-Nam Nguyen
5bb7d9290c IB/ehca: Support large page MRs
Add support for MR pages larger than 4K on eHCA2. This reduces
firmware memory consumption.  If enabled via the mr_largepage module
parameter, the MR page size will be determined based on the MR length
and the hardware capabilities -- if the MR is >= 16M, 16M pages are
used, for example.

Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-20 21:19:43 -07:00
Roland Dreier
23f1b38481 IB/mlx4: Fix error path in create_qp_common()
The error handling code at err_wrid in create_qp_common() does not
handle a userspace QP attached to an SRQ correctly, since it ends up
in the else clause of the if statement.  This means it tries to
kfree() the uninitialized qp->sq.wrid and qp->rq.wrid pointers.  Fix
this so we only free the wrid arrays for kernel QPs.

Pointed out by Michael S. Tsirkin <mst@dev.mellanox.co.il>.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-20 21:19:43 -07:00
Michael S. Tsirkin
c1f74958db IB/mthca: Change command token on timeout
The FW command token is currently only updated on a command completion
event. This means that on command timeout, the same token will be
reused for new command, which results in a mess if the timed out
command *does* eventually complete.

Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-20 21:19:43 -07:00
Arthur Jones
bd63104811 IB/ipath: Remove ipath_layer dead code
The ipath_layer.[ch] code was an attempt to provide a single interface
for the ipath verbs and ipath_ether code to use.  As verbs
functionality increased, the layer's functionality became insufficient
and the verbs code broke away to interface directly to the driver.
The failed attempt to get ipath_ether upstream was the final nail in
the coffin and now it sits quietly in a dark kernel.org corner waiting
for someone to notice the smell and send it along to it's final
resting place.  Roland Dreier was that someone -- this patch expands
on his work...

Signed-off-by: Arthur Jones <arthur.jones@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-20 21:19:43 -07:00
Florin Malita
f5b404317b IB/mlx4: Fix leaks in __mlx4_ib_modify_qp
Temporarily allocated struct mlx4_qp_context *context is leaked by
several error paths.  The patch takes advantage of the return value
'err' being preinitialized to -EINVAL.

Spotted by Coverity (CID 1768).

Signed-off-by: Florin Malita <fmalita@gmail.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-20 21:19:43 -07:00
Paul Mundt
20c2df83d2 mm: Remove slab destructors from kmem_cache_create().
Slab destructors were no longer supported after Christoph's
c59def9f22 change. They've been
BUGs for both slab and slub, and slob never supported them
either.

This rips out support for the dtor pointer from kmem_cache_create()
completely and fixes up every single callsite in the kernel (there were
about 224, not including the slab allocator definitions themselves,
or the documentation references).

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2007-07-20 10:11:58 +09:00
Yoann Padioleau
dd00cc486a some kmalloc/memset ->kzalloc (tree wide)
Transform some calls to kmalloc/memset to a single kzalloc (or kcalloc).

Here is a short excerpt of the semantic patch performing
this transformation:

@@
type T2;
expression x;
identifier f,fld;
expression E;
expression E1,E2;
expression e1,e2,e3,y;
statement S;
@@

 x =
- kmalloc
+ kzalloc
  (E1,E2)
  ...  when != \(x->fld=E;\|y=f(...,x,...);\|f(...,x,...);\|x=E;\|while(...) S\|for(e1;e2;e3) S\)
- memset((T2)x,0,E1);

@@
expression E1,E2,E3;
@@

- kzalloc(E1 * E2,E3)
+ kcalloc(E1,E2,E3)

[akpm@linux-foundation.org: get kcalloc args the right way around]
Signed-off-by: Yoann Padioleau <padator@wanadoo.fr>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Acked-by: Russell King <rmk@arm.linux.org.uk>
Cc: Bryan Wu <bryan.wu@analog.com>
Acked-by: Jiri Slaby <jirislaby@gmail.com>
Cc: Dave Airlie <airlied@linux.ie>
Acked-by: Roland Dreier <rolandd@cisco.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Acked-by: Dmitry Torokhov <dtor@mail.ru>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Mauro Carvalho Chehab <mchehab@infradead.org>
Acked-by: Pierre Ossman <drzeus-list@drzeus.cx>
Cc: Jeff Garzik <jeff@garzik.org>
Cc: "David S. Miller" <davem@davemloft.net>
Acked-by: Greg KH <greg@kroah.com>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Cc: "Antonino A. Daplas" <adaplas@pol.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-19 10:04:50 -07:00
Roland Dreier
43509d1fec IB/mthca: Simplify use of size0 in work request posting
Current code sets size0 to 0 at the start of work request posting
functions and then handles size0 == 0 specially within the loop over
work requests.  Change this so size0 is set along with f0 the first
time through the loop (when nreq == 0).  This makes the code easier to
understand by making it clearer that f0 and size0 are always
initialized if nreq != 0 without having to know that size0 == 0
implies nreq == 0.

Also annotate size0 with uninitialized_var() so that this doesn't
introduce a new compiler warning.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-18 13:28:29 -07:00
Roland Dreier
e535c699bf IB/mthca: Factor out setting WQE UD segment entries
Factor code to set UD entries out of the work request posting
functions into inline functions set_tavor_ud_seg() and
set_arbel_ud_seg().  This doesn't change the generated code in any
significant way, and makes the source easier on the eyes.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-18 13:21:14 -07:00
Roland Dreier
400ddc11eb IB/mthca: Factor out setting WQE remote address and atomic segment entries
Factor code to set remote address and atomic segment entries out of the
work request posting functions into inline functions set_raddr_seg()
and set_atomic_seg().  This doesn't change the generated code in any
significant way, and makes the source easier on the eyes.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-18 12:55:42 -07:00
Roland Dreier
0fbfa6a906 IB/mlx4: Factor out setting other WQE segments
Factor code to set remote address, atomic and datagram segments out of
mlx4_ib_post_send() into small helper functions.  This doesn't change
the generated code in any significant way, and makes the source easier
on the eyes.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-18 11:47:55 -07:00
Roland Dreier
d420d9e32f IB/mlx4: Factor out setting WQE data segment entries
Factor code to set data segment entries out of mlx4_ib_post_send()
into set_data_seg().  This cleans up the code and lets the compiler do
a better job -- on x86_64:

add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-16 (-16)
function                                     old     new   delta
mlx4_ib_post_send                           1598    1582     -16

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-18 11:46:27 -07:00
Roland Dreier
80885456e8 IB/mthca: Factor out setting WQE data segment entries
Factor code to set data segment entries out of the work request
posting functions into inline functions mthca_set_data_seg() and
mthca_set_data_seg_inval().  This makes the code more readable and
also allows the compiler to do a better job -- on x86_64:

add/remove: 0/0 grow/shrink: 0/6 up/down: 0/-69 (-69)
function                                     old     new   delta
mthca_arbel_post_srq_recv                    373     369      -4
mthca_arbel_post_receive                     570     562      -8
mthca_tavor_post_srq_recv                    520     508     -12
mthca_tavor_post_send                       1344    1330     -14
mthca_arbel_post_send                       1481    1467     -14
mthca_tavor_post_receive                     792     775     -17

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-18 11:30:34 -07:00
Roland Dreier
7f5eb9bb8c IB/mlx4: Return receive queue sizes for userspace QPs from query QP
Return the receive queue sizes for both userspace QPs and kernel Qps
(not just kernel QPs) from mlx4_ib_query_qp().  Also zero the send
queue sizes for userspace QPs to avoid a possible information leak,
and set the max_inline_data for kernel QPs to 0 since inline sends are
not supported for kernel QPs.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-17 20:59:02 -07:00
Dotan Barak
8f076531cd RDMA/cma: Remove local write permission from QP access flags
Local write permission makes no sense as part of the QP access flags,
since the access flags only control what the remote end of the
connection is allowed to do.  Remove the code in the RDMA CM that
initializes qp_access_flags with IB_ACCESS_LOCAL_WRITE.

Signed-off-by: Dotan Barak <dotanb@mellanox.co.il>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-17 20:30:22 -07:00
Roland Dreier
6d7d080e9f IB/mthca: Use uninitialized_var() for f0
Commit 9db48926 ("drivers/infiniband/hw/mthca/mthca_qp: kill uninit'd
var warning") added "= 0" to the declarations of f0 to shut up gcc
warnings.  However, there's no point in making the code bigger by
initializing f0 to a random value just to get rid of a warning;
setting f0 to 0 is no safer than just using uninitialized_var(), which
documents the situation better and gives smaller code too.  For example, 
on x86_64:

add/remove: 0/0 grow/shrink: 0/2 up/down: 0/-16 (-16)
function                                     old     new   delta
mthca_tavor_post_send                       1352    1344      -8
mthca_arbel_post_send                       1489    1481      -8

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-17 19:30:51 -07:00
Roland Dreier
454a01e7f4 IB/cm: Make internal function cm_get_ack_delay() static
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-17 18:37:43 -07:00
Roland Dreier
1743b91710 IB/ipath: Remove ipath_get_user_pages_nocopy()
It has no callers and is completely dead code.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-17 18:37:43 -07:00
Roland Dreier
da9aec7b62 IB/ipath: Make a few functions static
Make some functions that are only used in a single .c file static.  In
addition to being a cleanup, this shrinks the generated code.  On x86_64:

add/remove: 1/3 grow/shrink: 2/1 up/down: 4777/-4956 (-179)
function                                     old     new   delta
handle_errors                                  -    3994   +3994
__verbs_timer                                 42     710    +668
ipath_do_ruc_send                           2131    2246    +115
ipath_no_bufs_available                      136       -    -136
ipath_disarm_senderrbufs                     639       -    -639
ipath_ib_timer                               658       -    -658
ipath_intr                                  5878    2355   -3523

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-17 18:37:43 -07:00
Roland Dreier
41179e2de6 IB/iser: Make a couple of functions static
Make iser_conn_release() and iser_start_rdma_unaligned_sg() static,
since they are only used in the .c file where they are defined.  In
addition to being a cleanup, this even shrinks the generated code by
allowing the single call of iser_start_rdma_unaligned_sg() to be
inlined into its callsite.  On x86_64:

add/remove: 0/1 grow/shrink: 1/0 up/down: 466/-533 (-67)
function                                     old     new   delta
iser_reg_rdma_mem                           1518    1984    +466
iser_start_rdma_unaligned_sg                 533       -    -533

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-17 18:37:42 -07:00
Roland Dreier
e4daf73868 IB/mthca: Fix printk format used for firmware version in warning
When warning about out-of-date firmware, current mthca code messes up
the formatting of the version if the subminor doesn't have three
digits.  It doesn't fill the field with 0s so we end up with:

    ib_mthca 0000:0b:00.0: HCA FW version 1.1.  0 is old (1.2.  0 is current).

Change the format from "%3d" to "%03d" to get the right thing printed.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-17 18:37:42 -07:00
Roland Dreier
f6be6fbe26 IB/mthca: Schedule MSI support for removal
The mthca driver supports both MSI and MSI-X.  However, MSI-X works with
all hardware that the driver handles, and provides a superset of what
MSI does, so there's no point in having code for both.  Schedule MSI
support for removal in 2008 to give anyone who actually needs MSI and
who can't use MSI time to speak up.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-17 18:37:41 -07:00
Hoang-Nam Nguyen
2b94397adc IB/ehca: Fix warnings issued by checkpatch.pl
Run the existing ehca code through checkpatch.pl and clean up the
worst of the coding style violations.

Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-17 18:37:40 -07:00
Hoang-Nam Nguyen
187c72e31f IB/ehca: Restructure ehca_set_pagebuf()
Split ehca_set_pagebuf() into three functions depending on MR type
(phys/user/fast) and remove superfluous ehca_set_pagebuf_1().

Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-17 18:37:40 -07:00
Hoang-Nam Nguyen
df17bfd4a0 IB/ehca: MR/MW structure refactoring
- Rename struct ehca_mr fields to clearly distinguish between kernel
  and HW page size.
- Sort struct ehca_mr_pginfo into a common part and a union containing
  specific fields for physical, user and fast MR

Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-17 18:37:40 -07:00
Hoang-Nam Nguyen
2492398e61 IB/ehca: Use macro to calculate number of chunks in a mem block
Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-17 18:37:39 -07:00
Hoang-Nam Nguyen
4e4e74cae7 IB/ehca: Use #define for "pages per register_rpage" instead of hardcoded value
Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-17 18:37:39 -07:00
Hoang-Nam Nguyen
a1a6ff1100 IB/ehca: Use common error code mapping instead of specific ones
Instead of one error mapping function for each potential error source
in ehca_mrmw.c, use a centralized function that handles all cases,
saving a three-figure line count.

Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-17 18:37:39 -07:00
Hoang-Nam Nguyen
3df78f81e0 IB/ehca: Fix memory leak in error path of ehca_get_dma_mr()
Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-17 18:37:39 -07:00
Joachim Fenkes
fbb9318be4 IB/ehca: Fix HW level autodetection
Autodetection was missing a few HW revisions, causing certain eHCA1
revisions to be treated like eHCA2.  Fixed.

Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-17 18:37:39 -07:00
Dotan Barak
8fcea95a2a IB/mlx4: Take sizeof the correct pointer in call to memset()
When clearing the ib_ah_attr parameter in to_ib_ah_attr(), use sizeof
*ib_ah_attr instead of sizeof *path.  This is the same bug as was
fixed for mthca in 99d4f22e ("IB/mthca: Use correct structure size in
call to memset()"), but the code was cut and pasted into mlx4 before the
fix was merged.

Signed-off-by: Dotan Barak <dotanb@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-17 18:37:38 -07:00
Jack Morgenstein
1c27cb71aa IB/mlx4: Fix port returned from query QP for QPs in INIT state
When a QP is in the INIT state, the sched_queue field hasn't been given 
to the firmware yet, so the firmware cannot return the value when the QP 
is queried.  To handle this, use the port number that is saved in the 
driver's QP data structure.

Found by Dotan Barak and Yaron Gepstein of Mellanox.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-17 18:37:38 -07:00
Jack Morgenstein
586bb586ae IB/mlx4: Fix flow label returned from query QP
Correct the mask used to get the flow label, since the field is 20 bits, 
not 24 bits.

Found by Dotan Barak and Yaron Gepstein of Mellanox. 

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-17 18:37:38 -07:00
Steve Wise
1b07db7079 RDMA/cxgb3: Remove cm_id reference on listen failures
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-17 18:37:38 -07:00
Jeff Garzik
9db4892620 drivers/infiniband/hw/mthca/mthca_qp: kill uninit'd var warning
drivers/infiniband/hw/mthca/mthca_qp.c: In function
  ‘mthca_tavor_post_send’:
drivers/infiniband/hw/mthca/mthca_qp.c:1594: warning: ‘f0’ may be used
  uninitialized in this function
drivers/infiniband/hw/mthca/mthca_qp.c: In function
  ‘mthca_arbel_post_send’:
drivers/infiniband/hw/mthca/mthca_qp.c:1949: warning: ‘f0’ may be used
  uninitialized in this function

Initializing 'f0' is not strictly necessary in either case, AFAICS.

I was considering use of uninitialized_var(), but looking at the
complex flow of control in each function, I feel it is wiser and
safer to simply zero the var and be certain of ourselves.

Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-07-17 16:18:00 -04:00
Linus Torvalds
bc06cffdec Merge master.kernel.org:/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6
* master.kernel.org:/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (166 commits)
  [SCSI] ibmvscsi: convert to use the data buffer accessors
  [SCSI] dc395x: convert to use the data buffer accessors
  [SCSI] ncr53c8xx: convert to use the data buffer accessors
  [SCSI] sym53c8xx: convert to use the data buffer accessors
  [SCSI] ppa: coding police and printk levels
  [SCSI] aic7xxx_old: remove redundant GFP_ATOMIC from kmalloc
  [SCSI] i2o: remove redundant GFP_ATOMIC from kmalloc from device.c
  [SCSI] remove the dead CYBERSTORMIII_SCSI option
  [SCSI] don't build scsi_dma_{map,unmap} for !HAS_DMA
  [SCSI] Clean up scsi_add_lun a bit
  [SCSI] 53c700: Remove printk, which triggers because of low scsi clock on SNI RMs
  [SCSI] sni_53c710: Cleanup
  [SCSI] qla4xxx: Fix underrun/overrun conditions
  [SCSI] megaraid_mbox: use mutex instead of semaphore
  [SCSI] aacraid: add 51245, 51645 and 52245 adapters to documentation.
  [SCSI] qla2xxx: update version to 8.02.00-k1.
  [SCSI] qla2xxx: add support for NPIV
  [SCSI] stex: use resid for xfer len information
  [SCSI] Add Brownie 1200U3P to blacklist
  [SCSI] scsi.c: convert to use the data buffer accessors
  ...
2007-07-15 16:51:54 -07:00
Linus Torvalds
0cdf6990e9 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (76 commits)
  IB: Update MAINTAINERS with Hal's new email address
  IB/mlx4: Implement query SRQ
  IB/mlx4: Implement query QP
  IB/cm: Send no match if a SIDR REQ does not match a listen
  IB/cm: Fix handling of duplicate SIDR REQs
  IB/cm: cm_msgs.h should include ib_cm.h
  IB/cm: Include HCA ACK delay in local ACK timeout
  IB/cm: Use spin_lock_irq() instead of spin_lock_irqsave() when possible
  IB/sa: Make sure SA queries use default P_Key
  IPoIB: Recycle loopback skbs instead of freeing and reallocating
  IB/mthca: Replace memset(<addr>, 0, PAGE_SIZE) with clear_page(<addr>)
  IPoIB/cm: Fix warning if IPV6 is not enabled
  IB/core: Take sizeof the correct pointer when calling kmalloc()
  IB/ehca: Improve latency by unlocking after triggering the hardware
  IB/ehca: Notify consumers of LID/PKEY/SM changes after nondisruptive events
  IB/ehca: Return QP pointer in poll_cq()
  IB/ehca: Change idr spinlocks into rwlocks
  IB/ehca: Refactor sync between completions and destroy_cq using atomic_t
  IB/ehca: Lock renaming, static initializers
  IB/ehca: Report RDMA atomic attributes in query_qp()
  ...
2007-07-12 16:45:40 -07:00
Jack Morgenstein
65541cb7cf IB/mlx4: Implement query SRQ
Signed-off-by: Dotan Barak <dotanb@mellanox.co.il>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-12 15:41:24 -07:00
Jack Morgenstein
6a775e2ba4 IB/mlx4: Implement query QP
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-12 15:41:00 -07:00
Linus Torvalds
21ba0f88ae Merge master.kernel.org:/pub/scm/linux/kernel/git/gregkh/pci-2.6
* master.kernel.org:/pub/scm/linux/kernel/git/gregkh/pci-2.6: (34 commits)
  PCI: Only build PCI syscalls on architectures that want them
  PCI: limit pci_get_bus_and_slot to domain 0
  PCI: hotplug: acpiphp: avoid acpiphp "cannot get bridge info" PCI hotplug failure
  PCI: hotplug: acpiphp: remove hot plug parameter write to PCI host bridge
  PCI: hotplug: acpiphp: fix slot poweroff problem on systems without _PS3
  PCI: hotplug: pciehp: wait for 1 second after power off slot
  PCI: pci_set_power_state(): check for PM capabilities earlier
  PCI: cpci_hotplug: Convert to use the kthread API
  PCI: add pci_try_set_mwi
  PCI: pcie: remove SPIN_LOCK_UNLOCKED
  PCI: ROUND_UP macro cleanup in drivers/pci
  PCI: remove pci_dac_dma_... APIs
  PCI: pci-x-pci-express-read-control-interfaces cleanups
  PCI: Fix typo in include/linux/pci.h
  PCI: pci_ids, remove double or more empty lines
  PCI: pci_ids, add atheros and 3com_2 vendors
  PCI: pci_ids, reorder some entries
  PCI: i386: traps, change VENDOR to DEVICE
  PCI: ATM: lanai, change VENDOR to DEVICE
  PCI: Change all drivers to use pci_device->revision
  ...
2007-07-12 13:40:57 -07:00
Tejun Heo
7b595756ec sysfs: kill unnecessary attribute->owner
sysfs is now completely out of driver/module lifetime game.  After
deletion, a sysfs node doesn't access anything outside sysfs proper,
so there's no reason to hold onto the attribute owners.  Note that
often the wrong modules were accounted for as owners leading to
accessing removed modules.

This patch kills now unnecessary attribute->owner.  Note that with
this change, userland holding a sysfs node does not prevent the
backing module from being unloaded.

For more info regarding lifetime rule cleanup, please read the
following message.

  http://article.gmane.org/gmane.linux.kernel/510293

(tweaked by Greg to not delete the field just yet, to make it easier to
merge things properly.)

Signed-off-by: Tejun Heo <htejun@gmail.com>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-07-11 16:09:06 -07:00
Auke Kok
44c10138fd PCI: Change all drivers to use pci_device->revision
Instead of all drivers reading pci config space to get the revision
ID, they can now use the pci_device->revision member.

This exposes some issues where drivers where reading a word or a dword
for the revision number, and adding useless error-handling around the
read. Some drivers even just read it for no purpose of all.

In devices where the revision ID is being copied over and used in what
appears to be the equivalent of hotpath, I have left the copy code
and the cached copy as not to influence the driver's performance.

Compile tested with make all{yes,mod}config on x86_64 and i386.

Signed-off-by: Auke Kok <auke-jan.h.kok@intel.com>
Acked-by: Dave Jones <davej@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-07-11 16:02:10 -07:00
Sean Hefty
6164c8cd13 IB/cm: Send no match if a SIDR REQ does not match a listen
If a SIDR REQ does not match a listen, we should reply with status
value 1 (service ID not supported), rather than dropping through to
the default case of status 2 (rejected by service provider).

Doing this also fixes a bug where the cm_id_priv is removed from the
remote_sidr_table twice.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-10 21:52:28 -07:00
Sean Hefty
29c2731cbf IB/cm: Fix handling of duplicate SIDR REQs
Fix handling to duplicate SIDR REQs to avoid sending a reject if a
duplicate is detected.  Duplicates should just be silently discarded.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-10 21:51:43 -07:00
Sean Hefty
5d861be8c8 IB/cm: cm_msgs.h should include ib_cm.h
cm_msgs.h uses definitions from ib_cm.h.  Include it directly, rather
than depending on a specific include order.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-10 21:50:53 -07:00
Sean Hefty
1d84612649 IB/cm: Include HCA ACK delay in local ACK timeout
The IB CM should include the HCA ACK delay when calculating the local
ACK timeout value to use for RC QPs.  If the HCA ACK delay is large
enough relative to the packet life time, then if it is not taken into
account, the calculated timeout value ends up being too small, which
can result in "retry exceeded" errors.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-10 21:50:05 -07:00
Sean Hefty
24be6e81c7 IB/cm: Use spin_lock_irq() instead of spin_lock_irqsave() when possible
The ib_cm is a little over zealous about using spin_lock_irqsave,
when spin_lock_irq would do.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-10 21:47:29 -07:00
Sean Hefty
2aec5c602c IB/sa: Make sure SA queries use default P_Key
MADs sent to the SA should use the the default P_Key (0x7fff/0xffff).
There's no requirement that the default P_Key is stored at index 0 in
the local P_Key table, so add code to the sa_query module to look up
the index of the default P_Key when creating an address handle for the
SA (which is done any time the P_Key table might change), and use this
index for all SA queries.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-10 21:45:31 -07:00
Roland Dreier
1b844afe9e IPoIB: Recycle loopback skbs instead of freeing and reallocating
InfiniBand HCAs replicate multicast packets back to the QP that sent
them if that QP is attached to the destination multicast group.  This
means that IPoIB multicasts are often replicated back to the receive
queue of the interface that generated them.  To avoid confusing the
network stack, we drop these duplicates within the IPoIB driver.

However, there's no reason to free the skb that received the duplicate
and then immediately allocate a new skb to post to the receive queue.
We can be more efficient and just repost the same skb.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-10 13:43:53 -07:00
Shani Moideen
8909c571fa IB/mthca: Replace memset(<addr>, 0, PAGE_SIZE) with clear_page(<addr>)
Signed-off-by: Shani Moideen <shani.moideen@wipro.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
----
2007-07-10 12:28:05 -07:00
Roland Dreier
20089ca557 IPoIB/cm: Fix warning if IPV6 is not enabled
Fix

    drivers/infiniband/ulp/ipoib/ipoib_cm.c:1151: warning: unused variable 'dev'

by getting rid of the variable dev, which is only used if CONFIG_IPV6
is enabled, and replacing the one use of it with the value it is
assigned, namely priv->dev.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-10 11:18:34 -07:00
Dotan Barak
856c52a741 IB/core: Take sizeof the correct pointer when calling kmalloc()
When allocating out_mad in show_pma_counter(), take sizeof *out_mad
instead of sizeof *in_mad.  It is true that today the type of in_mad
and out_mad are the same, but this patch will give us a cleaner code.

Signed-off-by: Dotan Barak <dotanb@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-10 11:04:40 -07:00
Hoang-Nam Nguyen
f72d2f0814 IB/ehca: Improve latency by unlocking after triggering the hardware
Kick the hardware before unlocking the send/receive queue to overlap
processing a little more.

Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-09 20:12:27 -07:00
Joachim Fenkes
8705ce5b90 IB/ehca: Notify consumers of LID/PKEY/SM changes after nondisruptive events
When firmware reports a nondisruptive port configuration change event,
previous versions of the eHCA driver didn't forward the event to consumers
like IPoIB.  Add code that determines the type of configuration change by
comparing old and new port attributes and reports it.

Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-09 20:12:27 -07:00
Joachim Fenkes
b1cfe43d4b IB/ehca: Return QP pointer in poll_cq()
Also add two unlikely() statements.

Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-09 20:12:27 -07:00
Joachim Fenkes
26ed687fdd IB/ehca: Change idr spinlocks into rwlocks
This eliminates lock contention among IRQs as well as the need to
disable IRQs around idr_find, because there are no IRQ writers.

Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-09 20:12:27 -07:00
Joachim Fenkes
28db6beb42 IB/ehca: Refactor sync between completions and destroy_cq using atomic_t
- ehca_cq.nr_events is made an atomic_t, eliminating a lot of locking.
- The CQ is removed from the CQ idr first now to make sure no more
  completions are scheduled on that CQ. The "wait for all completions to
  end" code becomes much simpler this way.

Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-09 20:12:27 -07:00
Joachim Fenkes
9844b71baa IB/ehca: Lock renaming, static initializers
- Rename all spinlock flags to "flags", matching the vast majority of kernel
  code.
- Move hcall_lock into the only file it's used in.
- Replaced spin_lock_init() and friends with static initializers for
  global variables.

Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-09 20:12:27 -07:00
Hoang-Nam Nguyen
15f001ec47 IB/ehca: Report RDMA atomic attributes in query_qp()
Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-09 20:12:27 -07:00
Stefan Roscher
85f003172f IB/ehca: Set SEND_GRH flag for all non-LL UD QPs on eHCA2
Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-09 20:12:27 -07:00
Stefan Roscher
472803dab8 IB/ehca: Support UD low-latency QPs
Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-09 20:12:27 -07:00
Joachim Fenkes
a6a12947fb IB/ehca: add Shared Receive Queue support
Support SRQs on eHCA2. Since an SRQ is a QP for eHCA2, a lot of code
(structures, create, destroy, post_recv) can be shared between QP and SRQ.

Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-09 20:12:27 -07:00
Joachim Fenkes
9a79fc0a1b IB/ehca: QP code restructuring in preparation for SRQ
- Replace init_qp_queues() by a shorter init_qp_queue(), eliminating
  duplicate code.

- hipz_h_alloc_resource_qp() doesn't need a pointer to struct ehca_qp any
  longer. All input and output data is transferred through the parms
  parameter.

- Change the interface to also support SRQ.

Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-09 20:12:27 -07:00
Joachim Fenkes
91f13aa3fc IB/ehca: HW level, HW caps and MTU autodetection
In preparation for support of new eHCA2 features, change adapter probing:
 - Hardware level is changed to encode major and minor chip version
 - Hardware capabilities are queried from the firmware
 - The maximum MTU is queried from the firmware instead of assuming a
   fixed value

Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-09 20:12:27 -07:00
Hoang-Nam Nguyen
b8a3ba5513 IB/ehca: Change scaling_code parameter description to match default value
Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-09 20:12:26 -07:00
Sean Hefty
f41d229865 IB/ipath: return correct PortGUID in NodeInfo
Return the PortGUID of the correct port when responding to a NodeInfo
query.  Returning the SystemImageGUID causes issues when there are
multiple HCAs in a single system.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-09 20:12:26 -07:00
Arthur Jones
4f5973fd3b IB/ipath: Remove bogus RD_ATOMIC checks from modify_qp
The changeset 3859e39d ("IB/ipath: Support larger IB_QP_MAX_DEST_RD_ATOMIC
and IB_QP_MAX_QP_RD_ATOMIC") added support for larger RD_ATOMIC values,
but it failed to take out the stricter checks that were before these and
hence had no effect.  This patch takes out the bogus checks....

Signed-off-by: Arthur Jones <arthur.jones@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-09 20:12:26 -07:00
Arthur Jones
3588423fba IB/ipath: Test interrupts at driver startup
All too often, interrupts do not get enabled for our card due to BIOS
misconfiguration and other issues.  This patch checks for that
condition on startup and warns the user.  This patch is based on work
(check LID availability) by Robert Walsh.

Signed-off-by: Arthur Jones <arthur.jones@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-09 20:12:26 -07:00
Ralph Campbell
9ca4865566 IB/ipath: Remove support for preproduction HTX InfiniPath cards
Clean up some code by removing support for some older pre-production
HTX InfiniPath cards.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
2007-07-09 20:12:26 -07:00
Dave Olson
12f9a49e1b IB/ipath: Change version wording to be less confusing with release number
Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-09 20:12:26 -07:00
Ralph Campbell
37a7e9b7f2 IB/ipath: Lower default number of kernel send buffers
The default calculation for the number of send buffers to allocate to
the kernel was too high for the PCIe version of the chip thus leaving
fewer than desired send buffers for user MPI applications.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-09 20:12:26 -07:00
Dave Olson
0f4fc5ebd9 IB/ipath: Be more cautious about coming out of freeze mode
We are more careful to be sure that we don't lose information about
changes that occurred while we were in freeze mode, when the chip will
not notify us, and try to avoid false error interrupts while doing
cleanup.  Put all of this logic in a new function ipath_clear_freeze().

Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-09 20:12:26 -07:00
Ralph Campbell
4fc570bcbe IB/ipath: Add barrier before updating WC head in shared memory
Add a barrier to make sure the CPU doesn't reorder writes to memory,
since user programs can be polling on the head index update and the
entry should be written before that.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-09 20:12:26 -07:00
Jan Engelhardt
06cc85086e IB: Use menuconfig for InfiniBand menu
Change Kconfig objects from "menu, config" into "menuconfig" so
that the user can disable the whole feature without having to
enter the menu first.

Signed-off-by: Jan Engelhardt <jengelh@gmx.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-09 20:12:26 -07:00
WANG Cong
6abb6ea80b RDMA/cxgb3: Check return of kmalloc() in iwch_register_device()
Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com>
[ Also remove cast from void * return of kmalloc() as suggested by  
  Jesper Juhl <jesper.juhl@gmail.com>. ]
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-09 20:12:26 -07:00
Steve Wise
ecc2f0060f RDMA/cxgb3: Don't abort after failures sending the mpa reply
This bug results in an abort request being sent down _after_ the tid
has been released.  If the tid happens to have been reused, then the
subsequent generation of the tid gets incorrectly aborted.

The thread running iwch_accecpt_cr() must not abort a connection if an
error is returned after being awakened.  If any errors did occur while
iwch_accept_cr() is blocked, then the connection has already been
aborted on the thread processing the error.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-09 20:12:26 -07:00
Steve Wise
96d0e4931e RDMA/cxgb3: Don't post TID_RELEASE message
The LLD does this for us in cxgb3_remove_tid().

Also fixed active open failure cases where we also shouldn't be
releasing the TID.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-09 20:12:26 -07:00
Steve Wise
6eda48d1e8 RDMA/cxgb3: ctrl-qp init/clear shouldn't set the gen bit
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-09 20:12:26 -07:00
Steve Wise
1580367e7b RDMA/cxgb3: Don't count neg_adv abort_req_rss messages as real aborts
Negative advice messages should _not_ count toward the 2 abort
requests needed to indicate an abort request.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-09 20:12:26 -07:00
Steve Wise
fb497d7266 RDMA/cxgb3: TERMINATE WRs can hang the tx ofld queue
Don't set the gen bits nor length bits in the terminate WR.  This is
done by the LLD driver.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-09 20:12:26 -07:00
Steve Wise
de3d353072 RDMA/cxgb3: Streaming -> RDMA mode transition fixes
Due to a HW issue, our current scheme to transition the connection from
streaming to rdma mode is broken on the passive side.  The firmware
and driver now support a new transition scheme for the passive side:

 - driver posts rdma_init_wr (now including the initial receive seqno)
 - driver posts last streaming message via TX_DATA message (MPA start
   response)
 - uP atomically sends the last streaming message and transitions the
   tcb to rdma mode.
 - driver waits for wr_ack indicating the last streaming message was ACKed.

NOTE: This change also bumps the required firmware version to 4.3.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-09 20:12:26 -07:00
Dotan Barak
149983af60 mlx4_core: Get the maximum message size from reported device capabilities
Get the maximum message size from the device capabilities returned
from the QUERY_DEV_CAP firmware command, rather than hard-coding 2 GB.

Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-09 20:12:26 -07:00
John Gregor
87427da55b IB/ipath: Update copyright dates
Now that it's June, it's about time to update
the copyright notices of files that have changed.

Signed-off-by: John Gregor <john.gregor@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-09 20:12:26 -07:00
Robert Walsh
991bda284d IB/ipath: Clean send flags properly on QP reset
Signed-off-by: Robert Walsh <robert.walsh@qlogic.com>
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-09 20:12:26 -07:00
Robert Walsh
f2d042313e IB/ipath: ipath_poll fixups and enhancements
Fix ipath_poll and enhance it so we can poll for urgent packets or
regular packets and receive notifications of when a header queue
overflows.

Signed-off-by: Robert Walsh <robert.walsh@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-09 20:12:26 -07:00
Robert Walsh
b506e1dc59 IB/ipath: Send ACK invalid where appropriate
The IB specification ch. 9.9.3 table 58 says that a QP which isn't set
up for the operation should return a NAK invalid request.

Signed-off-by: Robert Walsh <robert.walsh@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-09 20:12:26 -07:00
Michael Albaugh
e8e7ad7115 IB/ipath: Add capability to modify PBC word
During compliance testing and when debugging some interconnect issues,
it is very useful to be able to send malformed packets, without having
the device signal them as malformed (drop, or terminate with EBP). The
hardware supports this, but the driver "diagnostic packet" interface
did not.

Extend capability to send specific malformed packets for testing.

Signed-off-by: Michael Albaugh <Michael.Albaugh@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-09 20:12:26 -07:00
Mark Debbage
bacf401353 IB/ipath: Make handling of one subport consistent
Previously the driver and userspace code handled the case of 1 subport
somewhat inconsistently.  The new interpretation of this situation is
that if one subport is requested, the driver turns on the subport
mechanism and arranges for the port to be "shared" by one process.  In
normal use the userspace library does not use this configuration and
instead arranges for the port not to be shared at all.  This
particular idiom can be useful for testing purposes.

Signed-off-by: Mark Debbage <mark.debbage@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-09 20:12:26 -07:00
Mark Debbage
0df6291c8a IB/ipath: Correct checking of swminor version field when using subports
When subports are required to run a program, this patch checks that
the driver and the userspace library have compatible subport
implementations.  This is achieved through checks on the swminor
version field built into the driver and userspace library.  Bad
combinations are reported through syslog and result in an error when
opening the port.

Signed-off-by: Mark Debbage <mark.debbage@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-09 20:12:26 -07:00
Ralph Campbell
d781b129f1 IB/ipath: Duplicate RDMA reads can cause responder to NAK inappropriately
A duplicate RDMA read request can fool the responder into NAKing a new
RDMA read request because the responder wasn't keeping track of
whether the queue of RDMA read requests had been sent at least once.
For example, requester sends 4 2K byte RDMA read requests, times out,
and resends the first, then sees the 4 responses, then sends a 5th
RDMA read or atomic operation.  The responder sees the 4 requests,
sends 4 responses, sees the resent 1st request, rewinds the queue,
then sees the 5th request but thinks the queue is full and that the
requester is invalidly sending a 5th new request.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-09 20:12:26 -07:00
Ralph Campbell
30d149ab58 IB/ipath: Fix possible data corruption if multiple SGEs used for receive
The code to copy data from the receive queue buffers to the IB SGEs
doesn't check the SGE length, only the memory region/page length when
copying data.  This could overwrite parts of the user's memory that
were not intended to be written.  It can only happen if multiple SGEs
are used to describe a receive buffer which almost never happens in
practice.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-09 20:12:26 -07:00
Ralph Campbell
db5518cd09 IB/ipath: Wait for PIO available interrupt
The send function is called when posting new send work requests.
There is no point in trying to send a packet if the QP is already
waiting for a HW send buffer so don't clear the busy bit until the
buffer available interrupt happens.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-09 20:12:26 -07:00
Ralph Campbell
06ee109002 IB/ipath: Fix RDMA read retry code
A RDMA read response or atomic response can ACK earlier sends and RDMA
writes.  In this case, the wrong work request pointer was being used
to store the read first response or atomic result.  Also, if a RDMA
read request is retried, the code to compute which request to resend
was incorrect.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-09 20:12:26 -07:00
Dave Olson
9380068fc2 IB/ipath: Use S_ABORT not cancel and abort on exit freeze mode after recovery
This centralizes the use of the abort functionality, removes the
unneeded buffer cancel (abort does the same thing), sets up to ignore
launch errors after abort, same as cancel.  We need abort on exit from
freeze mode to avoid having buffers stuck in the busy state, if a user
process happened to complete the send while we were in freeze mode
doing the recovery.

Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-09 20:12:26 -07:00
Dave Olson
561095f20e IB/ipath: Fix the mtrr_add args for chips with 2 buffer sizes
The values passed have never been right for iba 6120 chips, but just
happened to work.  We needed to select the right buffer offset in the
chip (both are in same register), and the total length was wrong also,
but was covered by the rounding up.

Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-09 20:12:26 -07:00
Joan Eslinger
f716cdfe57 IB/ipath: Change use of constants for TID type to defined values
Define pkt rcvd 'type' in a way consistent with HW spec and chips.

The hardware considers received packets of type 0 to be expected, and
type 1 to be eager. The driver was calling the ipath_f_put_tid
functions using a variable called 'type' set to 0 for eager and to 1
for expected packets.  Worse, the iba6110 and iba6120 drivers used
those values inconsistently.  This was quite confusing.  Now
everything is consistent with the hardware.

Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-09 20:12:26 -07:00
Ralph Campbell
1dd6a1be14 IB/ipath: Set M bit in BTH according to IB spec
According to chapter 17.2.8.1.1, QPs start in the migrated state and
should send packets with the M bit set in the BTH.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-09 20:12:26 -07:00
Ralph Campbell
6d2fad0472 IB/ipath: Fix local loopback bug when waiting for resources
This patch fixes a minor bug where the wrong QP was checked for a send
work request that should wait for an RNR timeout.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-09 20:12:26 -07:00
Ralph Campbell
2c9749c3b5 IB/ipath: Fix problem with next WQE after a UC completion
This patch fixes a bug introduced when moving some code around for
readability.

Setting the wqe pointer at the end of the function is a NOP since it
isn't used.  Move it back to where it is used.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-09 20:12:26 -07:00
Robert Walsh
fdc7215fbd IB/ipath: Fill in some missing FMR-related fields in query_device
In ipath_query_device(), some of the struct ib_device_attr fields were
not being initialized.

Signed-off-by: Robert Walsh <robert.walsh@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-09 20:12:26 -07:00
Robert Walsh
e7340f0442 IB/ipath: Fix maximum MTU reporting
Although our chip supports 4K MTUs, our driver doesn't yet support
this feature, so limit the maximum MTU to 2K until we get support for
4K MTUs implemented.

Signed-off-by: Robert Walsh <robert.walsh@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-09 20:12:25 -07:00
Dave Olson
380bf5d38f IB/ipath: Support the IBA6110 revision 4
Recognize IBA 6110 Revision 4: same feature set, etc. as earlier revisions.

Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-09 20:12:25 -07:00
Michael Albaugh
aecd3b5ab1 IB/ipath: Log "active" time and some errors to EEPROM
We currently track various errors, now we enhance that capability by
logging some of them to EEPROM.  We also now log a cumulative "active"
time defined by traffic though the InfiniPath HCA beyond the normal SM
traffic.

Signed-off-by: Michael Albaugh <michael.albaugh@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-09 20:12:25 -07:00
John Gregor
8e9ab3f1c9 IB/ipath: Remove incompletely implemented ipath_runtime flags and code
The IPATH_RUNTIME_PBC_REWRITE and the IPATH_RUNTIME_LOOSE_DMA_ALIGN
flags were not ever implemented correctly and did not turn out to be
necessary.  Remove the last vestiges of these flags but mark the spot
with a comment to remind us to not reuse these flags in the interest
of binary compatibility.  The INFINIPATH_XGXS_SUPPRESS_ARMLAUNCH_ERR
bit was also not found to be useful, so it was dropped in the cleanup
as well.

Signed-off-by: John Gregor <john.gregor@qlogic.com>
Signed-off-by: Arthur Jones <arthur.jones@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-09 20:12:25 -07:00
Michael Albaugh
17b2eb9fe6 IB/ipath: Lock and always use shadow copies of GPIO register
The new LED blinking interface adds more contention for the
unprotected GPIO pins that were already shared, though not commonly at
the same time.  We add locks to the accesses to these pins so that
Read-Modify-Write is now safe.  Some of these locks are added at
interrupt context, so we shadow the registers which drive and inspect
these pins to avoid the mmio read/writes.  This mitigates the effects
of the locks and hastens us through the interrupt.

Add locking and always use shadows for registers controlling GPIO pins
(ExtCtrl and GPIOout). The use of shadows implies doing less I/O,
which can make I2C operation too fast on some platforms. An explicit
udelay(1) in SCL manipulation fixes that.

Signed-off-by: Michael Albaugh <michael.albaugh@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-09 20:12:25 -07:00
Michael Albaugh
82466f00ec IB/ipath: Support blinking LEDs with an led_override file
When we want to find an InfiniPath HCA in a rack of nodes, it is often
expeditious to blink the status LEDs via a userspace /sys file.

A write-only led_override "file" is published per device. Writes to
this file are interpreted as (string form) numbers, and the resulting
value sent to ipath_set_led_override(). The upper eight bits are
interpretted as a 4.4 fixed-point "frequency in Hertz", and the bottom
two 4-bit values are alternately (D0..3, then D4..7) used by the
board-specific LED-setting function to override the normal state.

Signed-off-by: Michael Albaugh <michael.albaugh@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-09 20:12:25 -07:00
Bryan O'Sullivan
a024291b36 IB/ipath: Include <linux/vmalloc.h> to fix ppc64 build
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-09 20:12:25 -07:00
Michael S. Tsirkin
63019d9329 IB/mlx4: Include linux/mutex.h from mlx4_ib.h
mlx4_ib.h uses struct mutex, so although <linux/mutex.h> seems to be
pulled in indirectly by one of the headers it includes, the right
thing is to include <linux/mutex.h> directly.

Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-09 16:17:33 -07:00
Andrew Morton
1d3f4b905a IB: Fix ib_umem_get() when npages == 0
gcc correctly warned:

drivers/infiniband/core/umem.c: In function 'ib_umem_get':
drivers/infiniband/core/umem.c:78: warning: 'ret' may be used uninitialized in this function

Set ret to 0 in case npages == 0 and the loop isn't entered at all.

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-09 16:17:33 -07:00
Roland Dreier
43506d954e IB: Remove garbage non-ASCII characters from comments
A few files had 0xa0 characters in comments.  Remove them so that the 
files are clean ASCII text.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-09 16:17:32 -07:00
Joachim Fenkes
fffba373ef IB/ehca: Refactor "maybe missed event" code
Refactor the ehca changes from commit ed23a727 ("IB: Return "maybe
missed event" hint from ib_req_notify_cq()") so the queue arithmetic
is done in slightly fewer lines.  Also, move the spinlock flags into
the block they're used in.

Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-09 16:17:32 -07:00
Hal Rosenstock
1bae4dbf95 IB/mad: Enhance SMI for switch support
Extend the SMI with switch (intermediate hop) support. Care has been
taken to ensure that the CA (and router) code paths are changed as
little as possible.

Signed-off-by: Suresh Shelvapille <suri@baymicrosystems.com>
Signed-off-by: Hal Rosenstock <halr@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-09 16:17:32 -07:00
Ralph Campbell
841adfca9c IPoIB/cm: Partial error clean up unmaps wrong address
If a page can't be allocated for the frag list of a skb, the code to
unmap the partially allocated list is off by one.  For exaple, if
'frags' equals one, i == 0, and the alloc_page() fails, then the old
loop would have unmapped mapping[1] which is uninitialized.  The same
would happen if the call to ib_dma_map_page() failed.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Acked-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-02 20:48:31 -07:00
Jack Morgenstein
c8681f1401 IB/mlx4: Correct max_srq_wr returned from mlx4_ib_query_device()
We need to keep a spare entry in the SRQ so that there always is a
next WQE available when posting receives (so that we can tell the
difference between a full queue and an empty queue).  So subtract 1
from the value HW gives us before reporting the limit on SRQ entries
to consumers.

Found by Mellanox QA.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-06-21 13:39:10 -07:00
Roland Dreier
13ef5f44c3 IPoIB/cm: Remove dead definition of struct ipoib_cm_id
It's completely unused.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-06-21 13:39:08 -07:00
Michael S. Tsirkin
82c3aca6ad IPoIB/cm: Fix interoperability when MTU doesn't match
IPoIB connected mode currently rejects a connection request unless the
supported MTU is >= the local netdevice MTU. This breaks
interoperability with implementations that might have tweaked
IPOIB_CM_MTU, and there's real no longer a reason to do so: this test
is just a leftover from when we did not tweak MTU per-connection.  Fix
this by making the test as permissive as possible.

Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-06-21 13:38:08 -07:00
Michael S. Tsirkin
3ec7393a68 IPoIB/cm: Initialize RX before moving QP to RTR
Fix a crasher bug in IPoIB CM: once a QP is in the RTR state, a
receive completion (or even an asynchronous error) might be observed
on this QP, so we have to initialize all of our receive data
structures before moving to the RTR state.

As an optimization (since modify_qp might take a long time), the
jiffies update done when moving RX to the passive_ids list is also
left in place to reduce the chance of the RX being misdetected as
stale.

This fixes bug <https://bugs.openfabrics.org/show_bug.cgi?id=662>.

Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-06-21 13:03:50 -07:00
Roland Dreier
24bce50803 IB/umem: Fix possible hang on process exit
If ib_umem_release() is called after ib_uverbs_close() sets context->closing,
then a process can get stuck in a D state, because the code boils down to

	if (down_write_trylock(&mm->mmap_sem))
		down_write(&mm->mmap_sem);

which is obviously a stupid instant deadlock.  Fix the code so that we
only try to take the lock once.

This bug was introduced in commit f7c6a7b5 ("IB/uverbs: Export
ib_umem_get()/ib_umem_release() to modules") which fortunately never
made it into a release, and was reported by Pete Wyckoff <pw@osc.edu>.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-06-21 11:05:58 -07:00
FUJITA Tomonori
da9c0c770e [SCSI] iscsi_iser: convert to use the data buffer accessors
iscsi_iser: convert to use the data buffer accessors

- remove the unnecessary map_single path.

- convert to use the new accessors for the sg lists and the
parameters.

TODO: use scsi_for_each_sg().

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: Erez Zilber <erezz@voltaire.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
2007-06-18 19:48:43 -05:00
Roland Dreier
e61ef2416b IB/mlx4: Make sure inline data segments don't cross a 64 byte boundary
Inline data segments in send WQEs are not allowed to cross a 64 byte
boundary.  We use inline data segments to hold the UD headers for MLX
QPs (QP0 and QP1).  A send with GRH on QP1 will have a UD header that
is too big to fit in a single inline data segment without crossing a
64 byte boundary, so split the header into two inline data segments.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-06-18 09:23:47 -07:00
Roland Dreier
5ae2a7a836 IB/mlx4: Handle FW command interface rev 3
Upcoming firmware introduces command interface revision 3, which
changes the way port capabilities are queried and set.  Update the
driver to handle both the new and old command interfaces by adding a
new MLX4_FLAG_OLD_PORT_CMDS that it is set after querying the firmware
interface revision and then using the correct interface based on the
setting of the flag.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-06-18 08:15:02 -07:00
Jack Morgenstein
082dee3216 IB/mlx4: Handle buffer wraparound in __mlx4_ib_cq_clean()
When compacting CQ entries, we need to set the correct value of the
ownership bit in case the value is different between the index we copy
the CQE from and the index we copy it to.

Found by Ronni Zimmerman of Mellanox.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-06-18 08:13:59 -07:00
Roland Dreier
54e95f8dcb IB/mlx4: Get rid of max_inline_data calculation
The calculation of max_inline_data in set_kernel_sq_size() is bogus,
since it doesn't take into account the fact that inline segments may
not cross a 64-byte boundary, and hence multiple inline segments will
probably need to be used to post large inline sends.

We don't support inline sends for kernel QPs anyway, so there's no
point in doing this calculation anyway, since the field is just zeroed
out a little later.  So just delete the bogus calculation.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-06-18 08:13:53 -07:00
Roland Dreier
0e6e741621 IB/mlx4: Handle new FW requirement for send request prefetching
New ConnectX firmware introduces FW command interface revision 2,
which requires that for each QP, a chunk of send queue entries (the
"headroom") is kept marked as invalid, so that the HCA doesn't get
confused if it prefetches entries that haven't been posted yet.  Add
code to the driver to do this, and also update the user ABI so that
userspace can request that the prefetcher be turned off for userspace
QPs (we just leave the prefetcher on for all kernel QPs).

Unfortunately, marking send queue entries this way is confuses older
firmware, so we change the driver to allow only FW command interface
revisions 2.  This means that users will have to update their firmware
to work with the new driver, but the firmware is changing quickly and
the old firmware has lots of other bugs anyway, so this shouldn't be too
big a deal.

Based on a patch from Jack Morgenstein <jackm@dev.mellanox.co.il>.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-06-18 08:13:48 -07:00
Roland Dreier
42c059ea2b IB/mlx4: Fix warning in rounding up queue sizes
Doing max(1, foo) where foo is u32 generates a warning, because 1 is a
signed constant.  Fix this by using 1U instead.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-06-12 10:52:02 -07:00
Roland Dreier
614c3c85b5 IB/mlx4: Fix handling of wq->tail for send completions
Cast the increment added to wq->tail when send completions are
processed to u16 to avoid using wrong values caused by standard
integer promotions.

The same bug was fixed in libmlx4 by Eli Cohen <eli@mellanox.co.il>.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-06-12 10:50:42 -07:00
Linus Torvalds
99f9f3d49c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
  IB/mlx4: Make sure RQ allocation is always valid
  RDMA/cma: Fix initialization of next_port
  IB/mlx4: Fix zeroing of rnr_retry value in ib_modify_qp()
  mlx4_core: Don't set MTT address in dMPT entries with PA set
  mlx4_core: Check firmware command interface revision
  IB/mthca, mlx4_core: Fix typo in comment
  mlx4_core: Free catastrophic error MSI-X interrupt with correct dev_id
  mlx4_core: Initialize ctx_list and ctx_lock earlier
  mlx4_core: Fix CQ context layout
2007-06-11 15:46:08 -07:00
Roland Dreier
a4cd7ed86f IB/mlx4: Make sure RQ allocation is always valid
QPs attached to an SRQ must never have their own RQ, and QPs not
attached to SRQs must have an RQ with at least 1 entry.  Enforce all
of this in set_rq_size().

Based on a patch by Eli Cohen <eli@mellanox.co.il>.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-06-07 23:24:39 -07:00
Sean Hefty
bf2944bd56 RDMA/cma: Fix initialization of next_port
next_port should be between sysctl_local_port_range[0] and [1].
However, it is initially set to a random value with get_random_bytes().  
If the value is negative when treated as a signed integer, next_port
can end up outside the expected range because of the result of the % 
operator being negative.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-06-07 23:24:38 -07:00
Jack Morgenstein
57f01b5339 IB/mlx4: Fix zeroing of rnr_retry value in ib_modify_qp()
The code in __mlx4_ib_modify_qp() overwrites context->params1 after
the RNR retry parameter is ORed in, which results in the RNR retry
parameter always being set to 0.  Fix this by moving where we OR in
the value to later in the function, after the initial assignment of
context->params1.

Found by the Mellanox firmware group.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-06-07 23:24:38 -07:00
Herbert Xu
42f811b8bc [IPV4]: Convert IPv4 devconf to an array
This patch converts the ipv4_devconf config members (everything except
sysctl) to an array.  This allows easier manipulation which will be
needed later on to provide better management of default config values.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-06-07 13:39:13 -07:00
Roland Dreier
3e1db334dc IB/mthca, mlx4_core: Fix typo in comment
s/signifant/significant/

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-06-07 11:51:59 -07:00
FUJITA Tomonori
bb350d1dec [SCSI] ib_srp: convert to use the data buffer accessors
- remove the unnecessary map_single path.

- convert to use the new accessors for the sg lists and the
parameters.

Jens Axboe <jens.axboe@oracle.com> did the for_each_sg cleanup.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: Roland Dreier <rdreier@cisco.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
2007-06-07 09:02:50 -05:00
Mike Christie
d8196ed218 [SCSI] iscsi class, iscsi_tcp, iser, qla4xxx: add netdevname sysfs attr
iSCSI must support software iscsi (iscsi_tcp, iser), hardware iscsi (qla4xxx),
and partial offload (broadcom). To be able to allow each stack or driver
or port (virtual or physical) to be able to log into the same target portal
we use the initiator tuple [[HWADDRESS | NETDEVNAME], INITIATOR_NAME] and
the target tuple [TARGETNAME, CONN_ADDRESS, CONN_PORT] to id a session.
This patch adds the netdev name, which is used by software iscsi when
it binds a session to a netdevice using the SO_BINDTODEVICE sock opt.
It cannot use HWADDRESS because if someone did vlans then the same netdevice
will have the same mac and the initiator,target id will not be unique.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Cc: Roland Dreier <rdreier@cisco.com>
Cc: David C Somayajulu <david.somayajulu@qlogic.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
2007-06-02 15:38:04 -04:00
Mike Christie
1548271ece [SCSI] libiscsi: make can_queue configurable
This patch allows us to set can_queue and cmds_per_lun from userspace
when we create the session/host. From there we can set it on a per
target basis. The patch fully converts iscsi_tcp, but only hooks
up ib_iser for cmd_per_lun since it currently has a lots of preallocations
based on can_queue.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Cc: Roland Dreier <rdreier@cisco.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
2007-06-02 15:34:46 -04:00
Mike Christie
77a23c21aa [SCSI] libiscsi: fix iscsi cmdsn allocation
The cmdsn allocation and pdu transmit code can race, and we can end
up sending a pdu with cmdsn 10 before a pdu with 5. The target will
then fail the connection/session. This patch fixes the problem by
delaying the cmdsn allocation until we are about to send the pdu.

This also removes the xmitmutex. We were using the connection xmitmutex
during error handling to handle races with mtask and ctask cleanup and
completion. For ctasks we now have nice refcounting and for the mtask,
if we hit the case where the mtask timesout and it is floating
around somewhere in the driver, we end up dropping the session.
And to handle session level cleanup, we use the xmit suspend bit
along with scsi_flush_queue and the session lock to make sure
that the xmit thread is not possibly transmitting a task while
we are trying to kill it.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Cc: Roland Dreier <rdreier@cisco.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
2007-06-02 15:34:14 -04:00
Mike Christie
b2c6416736 [SCSI] iscsi class, iscsi_tcp, ib_iser: add sysfs chap file
The attached patches add sysfs files for the chap settings
to the iscsi transport class, iscsi_tcp and ib_iser. This is
needed for software iscsi because there are times when iscsid
can die and it will need to reread the values it was using.
And it is needed by qla4xxx for basic management opertaions.
This patch does not hook in qla4xxx yet, because I am not sure
the mbx command to use.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Cc: Roland Dreier <rdreier@cisco.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
2007-06-01 12:58:58 -04:00
Mike Christie
857ae0bdb7 [SCSI] iscsi: Some fixes in preparation for bidirectional support - total_length
- Remove shadow of request length from struct iscsi_cmd_task.
- change all users to use scsi_cmnd->request_bufflen directly

(With bidi we will use scsi-ml API to retrieve in/out length)

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Cc: Roland Dreier <rdreier@cisco.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
2007-06-01 12:58:22 -04:00
Mike Christie
8ad5781ae9 [SCSI] iscsi class, qla4xxx, iscsi_tcp, ib_iser: export/set initiator name
For iscsi root boot, software iscsi needs to know what the BIOS/OF
initiator used for the initiator name so this puts it in sysfs
for userspace to be able to pick up.

For hw iscsi, it is nice to see what the card is using.

This patch adds the new param, and hooks in qla4xxx, iscsi_tcp, and ib_iser.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Cc: Roland Dreier <rdreier@cisco.com>
Cc: David C Somayajulu <david.somayajulu@qlogic.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
2007-06-01 12:56:40 -04:00
Mike Christie
0801c242a3 [SCSI] libiscsi, iscsi_tcp, ib_iser : add sw iscsi host get/set params helpers
iscsid and udev need to key off the hw address being
used so add some helpers for iser and iscsi tcp.

Also convert them

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Cc: Roland Dreier <rdreier@cisco.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
2007-06-01 12:55:23 -04:00
Sean Hefty
d998ccce02 IB/cm: Fix stale connection detection
The ib_cm can incorrectly detect a stale connection (a new connection
request for a QPN that is already connected) as a duplicate connection
request.  Separate the handling of potential duplicate REQs from stale
connections.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-29 16:07:09 -07:00
Michael S. Tsirkin
ec56dc0b7f IPoIB/cm: Fix performance regression on Mellanox
commit 518b1646 ("IPoIB/cm: Fix SRQ WR leak") introduced a severe
performance regression on Mellanox cards, because keeping a QP in the
error state for extended periods of time moves hardware to the slow
path (until the QP is destroyed).  For example, MPI latency goes from
~3 usecs to ~7 usecs.

Fix this by posting a send WR on one of the QPs that are being
flushed, instead of using a separate drain QP that is kept in the
error state.

This fixes bug <https://bugs.openfabrics.org/show_bug.cgi?id=636>,
reported and bisected by Scott Weitzenkamp at Cisco and debugged by
Sasha Mikheev at Voltaire.

Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-29 16:07:09 -07:00
Michael S. Tsirkin
8b7e15772a IB/mthca: Fix handling of send CQE with error for QPs connected to SRQ
mthca_free_err_wqe() currently treats both send and receive CQEs
identically if a QP is using an SRQ.  But for Tavor hardware, send
CQEs with error can be chained together even if the RQ is part of SRQ,
so we may miss some CQEs.

Fix by following the WQE chain for all send CQEs even for non-SRQ QPs.

This fixes crashes in IPoIB CM:
<https://bugs.openfabrics.org//show_bug.cgi?id=604>

Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-29 16:07:09 -07:00
Michael S. Tsirkin
2dfbfc3712 IPoIB/cm: Drain cq in ipoib_cm_dev_stop()
Since NAPI polling is disabled while ipoib_cm_dev_stop() is running,
ipoib_cm_dev_stop() must poll the CQ itself in order to see the
packets draining.

Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-24 14:02:40 -07:00
Michael S. Tsirkin
8fd357a6e3 IPoIB/cm: Fix timeout check in ipoib_cm_dev_stop()
time_after() was used backwards, so the timeout occurred immediately.

Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-24 14:02:39 -07:00
Stefan Roscher
65a2c841d6 IB/ehca: Fix number of send WRs reported for new QP
Due to a typo, the driver was reporting the wrong number of "actual send
WRs" after ehca_create_qp().

Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-24 14:02:39 -07:00
Eli Cohen
c0be5fb5f8 IB/mlx4: Initialize send queue entry ownership bits
We need to initialize the owner bit of send queue WQEs to hardware 
ownership whenever the QP is modified from reset to init, not just 
when the QP is first allocated.  This avoids having the hardware 
process stale WQEs when the QP is moved to reset but not destroyed and 
then modified to init again. 

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-24 14:02:38 -07:00
Roland Dreier
02d89b8708 IB/mlx4: Don't allocate RQ doorbell if using SRQ
If a QP is attached to a shared receive queue (SRQ), then it doesn't
have a receive queue (RQ).  So don't allocate an RQ doorbell (or map a
doorbell from userspace for userspace QPs) for that QP.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-23 15:16:08 -07:00
Linus Torvalds
8aee74c8ee Merge branch 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband:
  IB/cm: Improve local id allocation
  IPoIB/cm: Fix SRQ WR leak
  IB/ipoib: Fix typos in error messages
  IB/mlx4: Check if SRQ is full when posting receive
  IB/mlx4: Pass send queue sizes from userspace to kernel
  IB/mlx4: Fix check of opcode in mlx4_ib_post_send()
  mlx4_core: Fix array overrun in dump_dev_cap_flags()
  IB/mlx4: Fix RESET to RESET and RESET to ERROR transitions
  IB/mthca: Fix RESET to ERROR transition
  IB/mlx4: Set GRH:HopLimit when sending globally routed MADs
  IB/mthca: Set GRH:HopLimit when building MLX headers
  IB/mlx4: Fix check of max_qp_dest_rdma in modify QP
  IB/mthca: Fix use-after-free on device restart
  IB/ehca: Return proper error code if register_mr fails
  IPoIB: Handle P_Key table reordering
  IB/core: Use start_port() and end_port()
  IB/core: Add helpers for uncached GID and P_Key searches
  IB/ipath: Fix potential deadlock with multicast spinlocks
  IB/core: Free umem when mm is already gone
2007-05-21 16:19:32 -07:00
Michael S. Tsirkin
9f81036c54 IB/cm: Improve local id allocation
The IB CM uses an idr for local id allocations, with a running counter
as start_id.  This fails to generate distinct ids if

1. An id is constantly created and destroyed
2. A chunk of ids just beyond the current next_id value is occupied

This in turn leads to an increased chance of connection request being
mis-detected as a duplicate, sometimes for several retries, until
next_id gets past the block of allocated ids. This has been observed
in practice.

As a fix, remember the last id allocated and start immediately above it.
This also fixes a problem with the old code, where next_id might
overflow and become negative.

Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-21 13:41:29 -07:00
Michael S. Tsirkin
518b1646f8 IPoIB/cm: Fix SRQ WR leak
SRQ WR leakage has been observed with IPoIB/CM: e.g. flipping ports on
and off will, with time, leak out all WRs and then all connections
will start getting RNR NAKs.  Fix this in the way suggested by spec:
move the QP being destroyed to the error state, wait for "Last WQE
Reached" event and then post WR on a "drain QP" connected to the same
CQ.  Once we observe a completion on the drain QP, it's safe to call
ib_destroy_qp.

Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-21 13:35:40 -07:00
Michael S. Tsirkin
24bd1e4e32 IB/ipoib: Fix typos in error messages
Trivial error message fixups.

Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-21 13:29:15 -07:00
Alexey Dobriyan
e8edc6e03a Detach sched.h from mm.h
First thing mm.h does is including sched.h solely for can_do_mlock() inline
function which has "current" dereference inside. By dealing with can_do_mlock()
mm.h can be detached from sched.h which is good. See below, why.

This patch
a) removes unconditional inclusion of sched.h from mm.h
b) makes can_do_mlock() normal function in mm/mlock.c
c) exports can_do_mlock() to not break compilation
d) adds sched.h inclusions back to files that were getting it indirectly.
e) adds less bloated headers to some files (asm/signal.h, jiffies.h) that were
   getting them indirectly

Net result is:
a) mm.h users would get less code to open, read, preprocess, parse, ... if
   they don't need sched.h
b) sched.h stops being dependency for significant number of files:
   on x86_64 allmodconfig touching sched.h results in recompile of 4083 files,
   after patch it's only 3744 (-8.3%).

Cross-compile tested on

	all arm defconfigs, all mips defconfigs, all powerpc defconfigs,
	alpha alpha-up
	arm
	i386 i386-up i386-defconfig i386-allnoconfig
	ia64 ia64-up
	m68k
	mips
	parisc parisc-up
	powerpc powerpc-up
	s390 s390-up
	sparc sparc-up
	sparc64 sparc64-up
	um-x86_64
	x86_64 x86_64-up x86_64-defconfig x86_64-allnoconfig

as well as my two usual configs.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-21 09:18:19 -07:00
Roland Dreier
56a8c8b6ac IB/mlx4: Check if SRQ is full when posting receive
Make mlx4_post_srq_recv() fail if the SRQ is full (head == tail).

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-20 20:19:24 -07:00
Eli Cohen
2446304dd6 IB/mlx4: Pass send queue sizes from userspace to kernel
Pass the number of WQEs for the send queue and their size from userspace
to the kernel to avoid having to keep the QP size calculations in sync 
between the kernel driver and libmlx4.  This fixes a bug seen with the 
current mlx4_ib driver and current libmlx4 caused by a difference in the 
calculated sizes for SQ WQEs.  Also, this gives more flexibility for 
userspace to experiment with using multiple WQE BBs for a single SQ WQE.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-20 10:18:04 -07:00
Roland Dreier
59b0ed1212 IB/mlx4: Fix check of opcode in mlx4_ib_post_send()
wr->opcode is invalid if it's >= ARRAY_SIZE(mlx4_ib_opcode), not just
strictly >.

This was spotted by the Coverity checker (CID 1643).

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-19 08:51:58 -07:00
Michael S. Tsirkin
65adfa911a IB/mlx4: Fix RESET to RESET and RESET to ERROR transitions
According to the IB spec, a QP can be moved from RESET back to RESET
or to the ERROR state, but mlx4 firmware does not support this and
returns an error if we try.  Fix the RESET to RESET transition by
just returning 0 without doing anything, and fix RESET to ERROR by
moving the QP from RESET to INIT with dummy parameters and then
transitioning from INIT to ERROR.

Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-19 08:51:57 -07:00
Michael S. Tsirkin
b18aad7150 IB/mthca: Fix RESET to ERROR transition
According to the IB spec, a QP can be moved from RESET to the ERROR 
state, but mthca firmware does not support this and returns an error if 
we try.  Work around this FW limitation by moving the QP from RESET to
INIT with dummy parameters and then transitioning from INIT to ERROR.

Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-19 08:51:57 -07:00
Roland Dreier
1526130351 IB/mlx4: Set GRH:HopLimit when sending globally routed MADs
This is the same issue discovered in mthca by Rolf Manderscheid
<rvm@obsidianresearch.com>.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-19 08:51:57 -07:00
Rolf Manderscheid
3f37cae694 IB/mthca: Set GRH:HopLimit when building MLX headers
Global CM packets used by rmda_cm were being sent with a GRH:hopLimit
of zero, causing them to be dropped by the router.  The problem is a
missing initialization of the hop_limit field in mthca_read_ah(),
which was called by build_mlx_header() when sending a MAD on QP1.

Signed-off-by: Rolf Manderscheid <rvm@obsidianresearch.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-19 08:51:56 -07:00
Eli Cohen
1f8f7b7a7b IB/mlx4: Fix check of max_qp_dest_rdma in modify QP
max_qp_dest_rdma is already in natural units - no need to shift.  This
was discovered by a test that deliberately requests more outstanding
atomic operation than the device supports.

Found by Sagi Rotem at Mellanox.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-19 08:51:56 -07:00
Ali Ayoub
de57c9f102 IB/mthca: Fix use-after-free on device restart
Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-19 08:51:56 -07:00
Hoang-Nam Nguyen
bd5a6ccc0e IB/ehca: Return proper error code if register_mr fails
Set the return code of ehca_register_mr() to ENOMEM if the corresponding
firmware call fails due to out of resources.  Some other error codes
were explicitly mapped to EINVAL -- just remove those cases so they
get mapped to the default case, which already returns EINVAL anyway.

Signed-off-by: Hoang-Nam Nguyen <hnguyen@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-19 08:51:54 -07:00
Yosef Etigin
26bbf13ce1 IPoIB: Handle P_Key table reordering
SM reconfiguration or failover possibly causes a shuffling of the values
in the P_Key table. Right now, IPoIB only queries for the P_Key index
once when it creates the device QP, and hence there are problems if the
index of a P_Key value changes.  Fix this by using the PKEY_CHANGE event
to trigger a recheck of the P_Key index.

Signed-off-by: Yosef Etigin <yosefe@voltaire.com>
Acked-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-19 08:51:54 -07:00
Roland Dreier
1af4c435f3 IB/core: Use start_port() and end_port()
Clean up ib_query_port() and ib_modify_port() slightly by using the 
just-added start_port() and end_port() helpers.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-19 08:51:54 -07:00
Yosef Etigin
5eb620c81c IB/core: Add helpers for uncached GID and P_Key searches
Add ib_find_gid() and ib_find_pkey() functions that use uncached device
queries.  The calls might block but the returns are always up-to-date.
Cache P_Key and GID table lengths in core to avoid extra port info queries.

Signed-off-by: Yosef Etigin <yosefe@voltaire.com>
Acked-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-19 08:51:53 -07:00
Roland Dreier
8b8c8bca3a IB/ipath: Fix potential deadlock with multicast spinlocks
Lockdep found the following potential deadlock between mcast_lock and
n_mcast_grps_lock: mcast_lock is taken from both interrupt context and
process context, so spin_lock_irqsave() must be used to take it.
n_mcast_grps_lock is only taken from process context, so at first it
seems safe to take it with plain spin_lock(); however, it also nests
inside mcast_lock, and hence we could deadlock:

  cpu A                                   cpu B
    ipath_mcast_add():
      spin_lock_irq(&mcast_lock);

                                            ipath_mcast_detach():
                                              spin_lock(&n_mcast_grps_lock);

                                            <enter interrupt>

                                            ipath_mcast_find():
                                              spin_lock_irqsave(&mcast_lock);

      spin_lock(&n_mcast_grps_lock);

Fix this by using spin_lock_irq() to take n_mcast_grps_lock.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-19 08:51:53 -07:00
Eli Cohen
7b82cd8ee7 IB/core: Free umem when mm is already gone
Free umem when task's mm is already destroyed by the time
ib_umem_release gets called.

Found by Dotan Barak at Mellanox.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-19 08:51:53 -07:00
Linus Torvalds
de7860c3f3 Merge branch 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband:
  IPoIB/cm: Optimize stale connection detection
  IB/mthca: Set cleaned CQEs back to HW ownership when cleaning CQ
  IB/mthca: Fix posting >255 recv WRs for Tavor
  RDMA/cma: Add check to validate that cm_id is bound to a device
  RDMA/cma: Fix synchronization with device removal in cma_iw_handler
  RDMA/cma: Simplify device removal handling code
  IB/ehca: Disable scaling code by default, bump version number
  IB/ehca: Beautify sysfs attribute code and fix compiler warnings
  IB/ehca: Remove _irqsave, move #ifdef
  IB/ehca: Fix AQP0/1 QP number
  IB/ehca: Correctly set GRH mask bit in ehca_modify_qp()
  IB/ehca: Serialize hypervisor calls in ehca_register_mr()
  IB/ipath: Shadow the gpio_mask register
  IB/mlx4: Fix uninitialized spinlock for 32-bit archs
  mlx4_core: Remove unused doorbell_lock
  net: Trivial MLX4_DEBUG dependency fix.
2007-05-15 09:52:31 -07:00
Michael S. Tsirkin
7c5b9ef857 IPoIB/cm: Optimize stale connection detection
In the presence of some running RX connections, we repeat
queue_delayed_work calls each 4 RX WRs, which is a waste.  It's enough
to start stale task when a first passive connection is added, and
rerun it every IPOIB_CM_RX_DELAY as long as there are outstanding
passive connections.

This removes some code from RX data path.

Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-14 14:11:01 -07:00
Michael S. Tsirkin
bd18c11277 IB/mthca: Set cleaned CQEs back to HW ownership when cleaning CQ
mthca_cq_clean() updates the CQ consumer index without moving CQEs
back to HW ownership.  As a result, the same WRID might get reported
twice, resulting in a use-after-free.  This was observed in IPoIB CM.
Fix by moving all freed CQEs to HW ownership.

This fixes <https://bugs.openfabrics.org/show_bug.cgi?id=617>

Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-14 14:10:34 -07:00
Michael S. Tsirkin
3e28c56b9b IB/mthca: Fix posting >255 recv WRs for Tavor
Fix posting lists of > 255 receive WRs for Tavor: rq.next_ind must
be updated each doorbell, otherwise the next doorbell will use an
incorrect index.

Found by Ronni Zimmermann at Mellanox.

Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-14 14:10:34 -07:00
Sean Hefty
6c719f5c6c RDMA/cma: Add check to validate that cm_id is bound to a device
Several checks in the rdma_cm check against the state of the
cm_id, but only to validate that the cm_id is bound to an underlying
transport specific CM and an RDMA device.  Make the check explicit
in what we're trying to check for, since we're not synchronizing
against the cm_id state.

This will allow a user to disconnect a cm_id or reject a connection
after receiving a device removal event.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-14 14:10:32 -07:00
Sean Hefty
be65f086f2 RDMA/cma: Fix synchronization with device removal in cma_iw_handler
The cma_iw_handler needs to validate the state of the rdma_cm_id before
processing a new connection request to ensure that a device removal is
not already being processed for the same rdma_cm_id.  Without the state
check, the user can receive simultaneous callbacks for the same cm_id, or
a callback after they've destroyed the cm_id.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-14 13:56:32 -07:00
Sean Hefty
8aa08602bd RDMA/cma: Simplify device removal handling code
Add a new routine and rename another to encapsulate common code for
synchronizing with device removal.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-14 13:54:49 -07:00
Joachim Fenkes
4e430dcb7b IB/ehca: Disable scaling code by default, bump version number
- Scaling code is still considered experimental, so disable it by default
- Increase version to SVNEHCA_0023

Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-14 13:41:40 -07:00
Joachim Fenkes
bba9b6013e IB/ehca: Beautify sysfs attribute code and fix compiler warnings
eHCA's sysfs attributes are now being created via sysfs_create_group(),
making the process neatly table-driven. The return value is checked, thus
fixing a few compiler warnings.

Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-14 13:40:45 -07:00
Joachim Fenkes
c7a14939e7 IB/ehca: Remove _irqsave, move #ifdef
- In ehca_process_eq(), we're IRQ safe throughout the whole function, so we
  don't need another _irqsave in the middle of flight.

- take_over_work() is only called by comp_pool_callback(), so it can move
  into the same #ifdef block.

Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-14 13:40:05 -07:00
Hoang-Nam Nguyen
c55a0ddd8e IB/ehca: Fix AQP0/1 QP number
AQP0/1 should report qp_num={0|1} and the actual QP# should be stored
in struct ehca_qp, not the other way round.

Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-14 13:39:31 -07:00
Joachim Fenkes
92761cdaf2 IB/ehca: Correctly set GRH mask bit in ehca_modify_qp()
The driver needs to always supply the "GRH present" flag to the
hypervisor, whether it's true or false. Not supplying it (i.e. not
setting the corresponding mask bit) amounts to a "perhaps", which we
don't want.

Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-14 13:38:57 -07:00
Stefan Roscher
5d88278e3b IB/ehca: Serialize hypervisor calls in ehca_register_mr()
Some pSeries hypervisor versions show a race condition in the allocate
MR hCall.  Serialize this call per adapter to circumvent this problem.

Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-14 13:38:11 -07:00
Arthur Jones
8f140b407f IB/ipath: Shadow the gpio_mask register
Once upon a time, GPIO interrupts were rare.  But then a chip bug in
the waldo series forced the use of a GPIO interrupt to signal packet
reception.  This greatly increased the frequency of GPIO interrupts
which have the gpio_mask bits set on the waldo chips.  Other bits in
the gpio_status register are used for I2C clock and data lines, these
bits are usually on.  An "unlikely" annotation leftover from the old
days was improperly applied to these bits, and an unnecessary chip
mmio read was being accessed in the interrupt fast path on waldo.

Remove the stagnant unlikely annotation in the interrupt handler and
keep a shadow copy of the gpio_mask register to avoid the slow mmio
read when testing for interruptable GPIO bits.

Signed-off-by: Arthur Jones <arthur.jones@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-14 13:22:42 -07:00
Jack Morgenstein
26c6bc7b81 IB/mlx4: Fix uninitialized spinlock for 32-bit archs
uar_lock spinlock was used in mlx4_ib_cq_arm without being initialized
(this only affects 32-bit archs, because uar_lock is not used on
64-bit archs and MLX4_INIT_DOORBELL_LOCK() is a NOP).

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-14 13:02:58 -07:00
Martin Schwidefsky
e25df1205f [S390] Kconfig: menus with depends on HAS_IOMEM.
Add "depends on HAS_IOMEM" to a number of menus to make them
disappear for s390 which does not have I/O memory.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2007-05-10 15:46:07 +02:00
Linus Torvalds
de5603748a Merge branch 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband:
  IB/mlx4: Add a driver Mellanox ConnectX InfiniBand adapters
  IB: Put rlimit accounting struct in struct ib_umem
  IB/uverbs: Export ib_umem_get()/ib_umem_release() to modules
2007-05-09 19:40:09 -07:00
Rafael J. Wysocki
8bb7844286 Add suspend-related notifications for CPU hotplug
Since nonboot CPUs are now disabled after tasks and devices have been
frozen and the CPU hotplug infrastructure is used for this purpose, we need
special CPU hotplug notifications that will help the CPU-hotplug-aware
subsystems distinguish normal CPU hotplug events from CPU hotplug events
related to a system-wide suspend or resume operation in progress.  This
patch introduces such notifications and causes them to be used during
suspend and resume transitions.  It also changes all of the
CPU-hotplug-aware subsystems to take these notifications into consideration
(for now they are handled in the same way as the corresponding "normal"
ones).

[oleg@tv-sign.ru: cleanups]
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Gautham R Shenoy <ego@in.ibm.com>
Cc: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09 12:30:56 -07:00
Roland Dreier
225c7b1fee IB/mlx4: Add a driver Mellanox ConnectX InfiniBand adapters
Add an InfiniBand driver for Mellanox ConnectX adapters.  Because
these adapters can also be used as ethernet NICs and Fibre Channel 
HBAs, the driver is split into two modules: 
 
  mlx4_core: Handles low-level things like device initialization and 
    processing firmware commands.  Also controls resource allocation 
    so that the InfiniBand, ethernet and FC functions can share a 
    device without stepping on each other. 
 
  mlx4_ib: Handles InfiniBand-specific things; plugs into the 
    InfiniBand midlayer. 

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-08 18:00:38 -07:00
Roland Dreier
1bf66a3042 IB: Put rlimit accounting struct in struct ib_umem
When memory pinned with ib_umem_get() is released, ib_umem_release()
needs to subtract the amount of memory being unpinned from
mm->locked_vm.  However, ib_umem_release() may be called with
mm->mmap_sem already held for writing if the memory is being released
as part of an munmap() call, so it is sometimes necessary to defer
this accounting into a workqueue.

However, the work struct used to defer this accounting is dynamically
allocated before it is queued, so there is the possibility of failing
that allocation.  If the allocation fails, then ib_umem_release has no
choice except to bail out and leave the process with a permanently
elevated locked_vm.

Fix this by allocating the structure to defer accounting as part of
the original struct ib_umem, so there's no possibility of failing a
later allocation if creating the struct ib_umem and pinning memory
succeeds.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-08 18:00:37 -07:00
Roland Dreier
f7c6a7b5d5 IB/uverbs: Export ib_umem_get()/ib_umem_release() to modules
Export ib_umem_get()/ib_umem_release() and put low-level drivers in
control of when to call ib_umem_get() to pin and DMA map userspace,
rather than always calling it in ib_uverbs_reg_mr() before calling the
low-level driver's reg_user_mr method.

Also move these functions to be in the ib_core module instead of
ib_uverbs, so that driver modules using them do not depend on
ib_uverbs.

This has a number of advantages:
 - It is better design from the standpoint of making generic code a
   library that can be used or overridden by device-specific code as
   the details of specific devices dictate.
 - Drivers that do not need to pin userspace memory regions do not
   need to take the performance hit of calling ib_mem_get().  For
   example, although I have not tried to implement it in this patch,
   the ipath driver should be able to avoid pinning memory and just
   use copy_{to,from}_user() to access userspace memory regions.
 - Buffers that need special mapping treatment can be identified by
   the low-level driver.  For example, it may be possible to solve
   some Altix-specific memory ordering issues with mthca CQs in
   userspace by mapping CQ buffers with extra flags.
 - Drivers that need to pin and DMA map userspace memory for things
   other than memory regions can use ib_umem_get() directly, instead
   of hacks using extra parameters to their reg_phys_mr method.  For
   example, the mlx4 driver that is pending being merged needs to pin
   and DMA map QP and CQ buffers, but it does not need to create a
   memory key for these buffers.  So the cleanest solution is for mlx4
   to call ib_umem_get() in the create_qp and create_cq methods.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-08 18:00:37 -07:00
Linus Torvalds
df6d3916f3 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc
* 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: (77 commits)
  [POWERPC] Abolish powerpc_flash_init()
  [POWERPC] Early serial debug support for PPC44x
  [POWERPC] Support for the Ebony 440GP reference board in arch/powerpc
  [POWERPC] Add device tree for Ebony
  [POWERPC] Add powerpc/platforms/44x, disable platforms/4xx for now
  [POWERPC] MPIC U3/U4 MSI backend
  [POWERPC] MPIC MSI allocator
  [POWERPC] Enable MSI mappings for MPIC
  [POWERPC] Tell Phyp we support MSI
  [POWERPC] RTAS MSI implementation
  [POWERPC] PowerPC MSI infrastructure
  [POWERPC] Rip out the existing powerpc msi stubs
  [POWERPC] Remove use of 4level-fixup.h for ppc32
  [POWERPC] Add powerpc PCI-E reset API implementation
  [POWERPC] Holly bootwrapper
  [POWERPC] Holly DTS
  [POWERPC] Holly defconfig
  [POWERPC] Add support for 750CL Holly board
  [POWERPC] Generalize tsi108 PCI setup
  [POWERPC] Generalize tsi108 PHY types
  ...

Fixed conflict in include/asm-powerpc/kdebug.h manually

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:50:19 -07:00
Jeff Layton
1a1c9bb433 inode numbering: change libfs sb creation routines to avoid collisions with their root inodes
This patch makes it so that simple_fill_super and get_sb_pseudo assign their
root inodes to be number 1.  It also fixes up a couple of callers of
simple_fill_super that were passing in files arrays that had an index at
number 1, and adds a warning for any caller that sends in such an array.

It would have been nice to have made it so that it wasn't possible to make
such a collision, but some callers need to be able to control what inode
number their entries get, so I think this is the best that can be done.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:16 -07:00
Randy Dunlap
e63340ae6b header cleaning: don't include smp_lock.h when not used
Remove includes of <linux/smp_lock.h> where it is not used/needed.
Suggested by Al Viro.

Builds cleanly on x86_64, i386, alpha, ia64, powerpc, sparc,
sparc64, and arm (all 59 defconfigs).

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:07 -07:00
Paul Mackerras
02bbc0f09c Merge branch 'linux-2.6' 2007-05-08 13:37:51 +10:00
Linus Torvalds
972d45fb43 Merge branch 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband:
  IPoIB: Convert to NAPI
  IB: Return "maybe missed event" hint from ib_req_notify_cq()
  IB: Add CQ comp_vector support
  IB/ipath: Fix a race condition when generating ACKs
  IB/ipath: Fix two more spin lock problems
  IB/fmr_pool: Add prefix to all printks
  IB/srp: Set proc_name
  IB/srp: Add orig_dgid sysfs attribute to scsi_host
  IPoIB/cm: Don't crash if remote side uses one QP for both directions
  RDMA/cxgb3: Support for new abort logic
  RDMA/cxgb3: Initialize cpu_idx field in cpl_close_listserv_req message
  RDMA/cxgb3: Fail qp creation if the requested max_inline is too large
  RDMA/cxgb3: Fix TERM codes
  IPoIB/cm: Fix error handling in ipoib_cm_dev_open()
  IB/ipath: Don't corrupt pending mmap list when unmapped objects are freed
  IB/mthca: Work around kernel QP starvation
  IB/ipath: Don't put QP in timeout queue if waiting to send
  IB/ipath: Don't call spin_lock_irq() from interrupt context
2007-05-07 12:18:21 -07:00
Roland Dreier
8d1cc86a62 IPoIB: Convert to NAPI
Convert the IP-over-InfiniBand network device driver over to using
NAPI to handle completions for the main CQ.  This covers all receives
as well as datagram mode sends; send completions for connected mode
connections are still handled from interrupt context.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-06 21:18:11 -07:00
Roland Dreier
ed23a72778 IB: Return "maybe missed event" hint from ib_req_notify_cq()
The semantics defined by the InfiniBand specification say that
completion events are only generated when a completions is added to a
completion queue (CQ) after completion notification is requested.  In
other words, this means that the following race is possible:

	while (CQ is not empty)
		ib_poll_cq(CQ);
	// new completion is added after while loop is exited
	ib_req_notify_cq(CQ);
	// no event is generated for the existing completion

To close this race, the IB spec recommends doing another poll of the
CQ after requesting notification.

However, it is not always possible to arrange code this way (for
example, we have found that NAPI for IPoIB cannot poll after
requesting notification).  Also, some hardware (eg Mellanox HCAs)
actually will generate an event for completions added before the call
to ib_req_notify_cq() -- which is allowed by the spec, since there's
no way for any upper-layer consumer to know exactly when a completion
was really added -- so the extra poll of the CQ is just a waste.

Motivated by this, we add a new flag "IB_CQ_REPORT_MISSED_EVENTS" for
ib_req_notify_cq() so that it can return a hint about whether the a
completion may have been added before the request for notification.
The return value of ib_req_notify_cq() is extended so:

	 < 0	means an error occurred while requesting notification
	== 0	means notification was requested successfully, and if
		IB_CQ_REPORT_MISSED_EVENTS was passed in, then no
		events were missed and it is safe to wait for another
		event.
	 > 0	is only returned if IB_CQ_REPORT_MISSED_EVENTS was
		passed in.  It means that the consumer must poll the
		CQ again to make sure it is empty to avoid the race
		described above.

We add a flag to enable this behavior rather than turning it on
unconditionally, because checking for missed events may incur
significant overhead for some low-level drivers, and consumers that
don't care about the results of this test shouldn't be forced to pay
for the test.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-06 21:18:11 -07:00
Michael S. Tsirkin
f4fd0b224d IB: Add CQ comp_vector support
Add a num_comp_vectors member to struct ib_device and extend
ib_create_cq() to pass in a comp_vector parameter -- this parallels
the userspace libibverbs API.  Update all hardware drivers to set
num_comp_vectors to 1 and have all ULPs pass 0 for the comp_vector
value.  Pass the value of num_comp_vectors to userspace rather than
hard-coding a value of 1.

We want multiple CQ event vector support (via MSI-X or similar for
adapters that can generate multiple interrupts), but it's not clear
how many vectors we want, or how we want to deal with policy issues
such as how to decide which vector to use or how to set up interrupt
affinity.  This patch is useful for experimenting, since no core
changes will be necessary when updating a driver to support multiple
vectors, and we know that we want to make at least these changes
anyway.

Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-06 21:18:11 -07:00
Ralph Campbell
154257f362 IB/ipath: Fix a race condition when generating ACKs
Fix a problem where simple ACKs can be sent ahead of RDMA read
responses thus implicitly NAKing the RDMA read.

Signed-off-by: Ralph Campbell <ralph.cambpell@qlogic.com>
Signed-off-by: Robert Walsh <robert.walsh@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-06 21:18:11 -07:00
Ralph Campbell
6ed89b9574 IB/ipath: Fix two more spin lock problems
Fix a missing unlock in ipath_rc_rcv_resp() and remove an extra unlock
from ipath_rc_rcv_error().

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-06 21:18:11 -07:00
Roland Dreier
1a70a05d9d IB/fmr_pool: Add prefix to all printks
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-06 21:18:11 -07:00
Roland Dreier
b7f008fdc9 IB/srp: Set proc_name
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-06 21:18:11 -07:00
Ishai Rabinovitz
3633b3d096 IB/srp: Add orig_dgid sysfs attribute to scsi_host
Add an orig_dgid attribute in sysfs for SRP scsi_hosts, so that
userspace can tell what the original dgid value written to the
add_target file was, even if the connection is redirected to a
different port while connecting.

Signed-off-by: Ishai Rabinovitz <ishai@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-06 21:18:11 -07:00
Michael S. Tsirkin
d6ef7d68f6 IPoIB/cm: Don't crash if remote side uses one QP for both directions
The IPoIB CM spec allows the use of a single connection in both
active->passive and passive->active directions.  The current Linux
code uses one connection for both directions, but if another node only
uses one connection for both directions, we oops when we try to look
up the passive connection.  Fix by checking that qp_context is
non-NULL before dereferencing it.

Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
2007-05-06 21:18:11 -07:00
Steve Wise
aff9e39d97 RDMA/cxgb3: Support for new abort logic
The HW now posts 2 ABORT_RPL and/or PEER_ABORT_REQ messages.  We need
to handle them by silenty dropping the 1st but mark that we're ready
for the final message.  This plugs some close races between the uP and
HW.  Also update the minimum required firmware version.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-06 21:18:08 -07:00
Linus Torvalds
4f7a307dc6 Merge master.kernel.org:/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6
* master.kernel.org:/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (87 commits)
  [SCSI] fusion: fix domain validation loops
  [SCSI] qla2xxx: fix regression on sparc64
  [SCSI] modalias for scsi devices
  [SCSI] sg: cap reserved_size values at max_sectors
  [SCSI] BusLogic: stop using check_region
  [SCSI] tgt: fix rdma transfer bugs
  [SCSI] aacraid: fix aacraid not finding device
  [SCSI] aacraid: Correct SMC products in aacraid.txt
  [SCSI] scsi_error.c: Add EH Start Unit retry
  [SCSI] aacraid: [Fastboot] Panics for AACRAID driver during 'insmod' for kexec test.
  [SCSI] ipr: Driver version to 2.3.2
  [SCSI] ipr: Faster sg list fetch
  [SCSI] ipr: Return better qc_issue errors
  [SCSI] ipr: Disrupt device error
  [SCSI] ipr: Improve async error logging level control
  [SCSI] ipr: PCI unblock config access fix
  [SCSI] ipr: Fix for oops following SATA request sense
  [SCSI] ipr: Log error for SAS dual path switch
  [SCSI] ipr: Enable logging of debug error data for all devices
  [SCSI] ipr: Add new PCI-E IDs to device table
  ...
2007-05-05 13:30:44 -07:00
Jean Delvare
6473d160b4 PCI: Cleanup the includes of <linux/pci.h>
I noticed that many source files include <linux/pci.h> while they do
not appear to need it. Here is an attempt to clean it all up.

In order to find all possibly affected files, I searched for all
files including <linux/pci.h> but without any other occurence of "pci"
or "PCI". I removed the include statement from all of these, then I
compiled an allmodconfig kernel on both i386 and x86_64 and fixed the
false positives manually.

My tests covered 66% of the affected files, so there could be false
positives remaining. Untested files are:

arch/alpha/kernel/err_common.c
arch/alpha/kernel/err_ev6.c
arch/alpha/kernel/err_ev7.c
arch/ia64/sn/kernel/huberror.c
arch/ia64/sn/kernel/xpnet.c
arch/m68knommu/kernel/dma.c
arch/mips/lib/iomap.c
arch/powerpc/platforms/pseries/ras.c
arch/ppc/8260_io/enet.c
arch/ppc/8260_io/fcc_enet.c
arch/ppc/8xx_io/enet.c
arch/ppc/syslib/ppc4xx_sgdma.c
arch/sh64/mach-cayman/iomap.c
arch/xtensa/kernel/xtensa_ksyms.c
arch/xtensa/platform-iss/setup.c
drivers/i2c/busses/i2c-at91.c
drivers/i2c/busses/i2c-mpc.c
drivers/media/video/saa711x.c
drivers/misc/hdpuftrs/hdpu_cpustate.c
drivers/misc/hdpuftrs/hdpu_nexus.c
drivers/net/au1000_eth.c
drivers/net/fec_8xx/fec_main.c
drivers/net/fec_8xx/fec_mii.c
drivers/net/fs_enet/fs_enet-main.c
drivers/net/fs_enet/mac-fcc.c
drivers/net/fs_enet/mac-fec.c
drivers/net/fs_enet/mac-scc.c
drivers/net/fs_enet/mii-bitbang.c
drivers/net/fs_enet/mii-fec.c
drivers/net/ibm_emac/ibm_emac_core.c
drivers/net/lasi_82596.c
drivers/parisc/hppb.c
drivers/sbus/sbus.c
drivers/video/g364fb.c
drivers/video/platinumfb.c
drivers/video/stifb.c
drivers/video/valkyriefb.c
include/asm-arm/arch-ixp4xx/dma.h
sound/oss/au1550_ac97.c

I would welcome test reports for these files. I am fine with removing
the untested files from the patch if the general opinion is that these
changes aren't safe. The tested part would still be nice to have.

Note that this patch depends on another header fixup patch I submitted
to LKML yesterday:
  [PATCH] scatterlist.h needs types.h
  http://lkml.org/lkml/2007/3/01/141

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-05-02 19:02:35 -07:00
Stephen Rothwell
40cd3a4564 [POWERPC] Rename get_property to of_get_property: drivers
These are all the remaining instances of get_property.  Simple rename of
get_property to of_get_property.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-05-02 20:04:32 +10:00
Steve Wise
60be4b5966 RDMA/cxgb3: Initialize cpu_idx field in cpl_close_listserv_req message
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-30 17:30:28 -07:00
Steve Wise
1860cdf802 RDMA/cxgb3: Fail qp creation if the requested max_inline is too large
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-30 17:30:28 -07:00
Steve Wise
4a97d47ef7 RDMA/cxgb3: Fix TERM codes
Fix TERMINATE layer, type, and ecode values based on
conformance testing.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-30 17:30:28 -07:00
Michael S. Tsirkin
347fcfbed2 IPoIB/cm: Fix error handling in ipoib_cm_dev_open()
If skb allocation fails when we start the device, we call
ipoib_cm_dev_stop() even though ipoib_cm_dev_open() did not run to
completion, so we pass an invalid pointer to ib_destroy_cm_id and get
an oops.

Fix by clearing cm.id on error, and testing it in cm_dev_stop().
This fixes <https://bugs.openfabrics.org/show_bug.cgi?id=561>

Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-30 17:30:28 -07:00
Robert Walsh
6b66b2da1e IB/ipath: Don't corrupt pending mmap list when unmapped objects are freed
Fix the pending mmap code so it doesn't corrupt the list of pending
mmaps and crash the machine when pending mmaps are destroyed without
first being mapped.  Also, remove an unused variable, and use standard
kernel lists instead of our own homebrewed linked list implementation
to keep the pending mmap list.

Signed-off-by: Robert Walsh <robert.walsh@qlogic.com>
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-30 17:30:28 -07:00
Michael S. Tsirkin
9ba6d5529d IB/mthca: Work around kernel QP starvation
With mthca, RC QPs can starve each other and even UD QPs on the same
hardware schedule queue.  As a result, userspace MPI can starve
e.g. IPoIB traffic, with netdev watchdog warnings getting printed out,
and TCP connections getting stuck or failing.

Reduce the chance of this happening by using three separate hardware
schedule queues: one for userspace RC QPs, one for kernel RC QPs, and
one for all other QPs.

Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-30 17:30:28 -07:00
Ralph Campbell
c3af664adb IB/ipath: Don't put QP in timeout queue if waiting to send
This fixes a problem which causes too many RC timeouts and
retransmits.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-30 17:30:28 -07:00
Ralph Campbell
35ff032e65 IB/ipath: Don't call spin_lock_irq() from interrupt context
This patch fixes the problem reported by Bernd Schubert <bs@q-leap.de>
with kernel debug options enabled:

    BUG: at kernel/lockdep.c:1860 trace_hardirqs_on()

This was caused by using spin_lock_irq()/spin_unlock_irq() from
interrupt context.  Fix all the places that might be called from
interrupts to use spin_lock_irqsave()/spin_unlock_irqrestore().

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-30 17:30:27 -07:00
Paul Mackerras
49e1900d4c Merge branch 'linux-2.6' into for-2.6.22 2007-04-30 12:38:01 +10:00
Linus Torvalds
afc2e82c08 Merge branch 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband: (49 commits)
  IB: Set class_dev->dev in core for nice device symlink
  IB/ehca: Implement modify_port
  IB/umad: Clarify documentation of transaction ID
  IPoIB/cm: spin_lock_irqsave() -> spin_lock_irq() replacements
  IB/mad: Change SMI to use enums rather than magic return codes
  IB/umad: Implement GRH handling for sent/received MADs
  IB/ipoib: Use ib_init_ah_from_path to initialize ah_attr
  IB/sa: Set src_path_bits correctly in ib_init_ah_from_path()
  IB/ucm: Simplify ib_ucm_event()
  RDMA/ucma: Simplify ucma_get_event()
  IB/mthca: Simplify CQ cleaning in mthca_free_qp()
  IB/mthca: Fix mthca_write_mtt() on HCAs with hidden memory
  IB/mthca: Update HCA firmware revisions
  IB/ipath: Fix WC format drift between user and kernel space
  IB/ipath: Check that a UD work request's address handle is valid
  IB/ipath: Remove duplicate stuff from ipath_verbs.h
  IB/ipath: Check reserved memory keys
  IB/ipath: Fix unit selection when all CPU affinity bits set
  IB/ipath: Don't allow QPs 0 and 1 to be opened multiple times
  IB/ipath: Disable IB link earlier in shutdown sequence
  ...
2007-04-27 09:39:27 -07:00
Paul Mackerras
a48141db68 Revert "[POWERPC] Rename get_property to of_get_property: drivers"
This reverts commit d05c7a80cf,
which included changes which should go via other subsystem
maintainers.
2007-04-26 22:24:31 +10:00
Arnaldo Carvalho de Melo
d626f62b11 [SK_BUFF]: Introduce skb_copy_from_linear_data{_offset}
To clearly state the intent of copying from linear sk_buffs, _offset being a
overly long variant but interesting for the sake of saving some bytes.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2007-04-25 22:28:23 -07:00
Arnaldo Carvalho de Melo
4305b54135 [SK_BUFF]: Convert skb->end to sk_buff_data_t
Now to convert the last one, skb->data, that will allow many simplifications
and removal of some of the offset helpers.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:26:29 -07:00
Arnaldo Carvalho de Melo
27a884dc3c [SK_BUFF]: Convert skb->tail to sk_buff_data_t
So that it is also an offset from skb->head, reduces its size from 8 to 4 bytes
on 64bit architectures, allowing us to combine the 4 bytes hole left by the
layer headers conversion, reducing struct sk_buff size to 256 bytes, i.e. 4
64byte cachelines, and since the sk_buff slab cache is SLAB_HWCACHE_ALIGN...
:-)

Many calculations that previously required that skb->{transport,network,
mac}_header be first converted to a pointer now can be done directly, being
meaningful as offsets or pointers.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:26:28 -07:00
Arnaldo Carvalho de Melo
badff6d01a [SK_BUFF]: Introduce skb_reset_transport_header(skb)
For the common, open coded 'skb->h.raw = skb->data' operation, so that we can
later turn skb->h.raw into a offset, reducing the size of struct sk_buff in
64bit land while possibly keeping it as a pointer on 32bit.

This one touches just the most simple cases:

skb->h.raw = skb->data;
skb->h.raw = {skb_push|[__]skb_pull}()

The next ones will handle the slightly more "complex" cases.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:25:15 -07:00
Arnaldo Carvalho de Melo
459a98ed88 [SK_BUFF]: Introduce skb_reset_mac_header(skb)
For the common, open coded 'skb->mac.raw = skb->data' operation, so that we can
later turn skb->mac.raw into a offset, reducing the size of struct sk_buff in
64bit land while possibly keeping it as a pointer on 32bit.

This one touches just the most simple case, next will handle the slightly more
"complex" cases.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:24:32 -07:00
Arnaldo Carvalho de Melo
4c13eb6657 [ETH]: Make eth_type_trans set skb->dev like the other *_type_trans
One less thing for drivers writers to worry about.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:24:30 -07:00
Joachim Fenkes
1912ffbb88 IB: Set class_dev->dev in core for nice device symlink
All RDMA drivers except ehca set class_dev->dev to their dma_device
value (ehca leaves this unset).  dma_device is the only value that
makes any sense, so move this assignment to core/sysfs.c.  This reduce
the duplicated code in the rest of the drivers and gives ehca a nice
/sys/class/infiniband/ehcaX/device symlink.

Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-24 21:30:38 -07:00
Joachim Fenkes
c4ed790dfd IB/ehca: Implement modify_port
Add "Modify Port" verb support to eHCA driver.  The IB communication
manager needs this to set the IsCM port capability bit when
initializing.

Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-24 21:30:38 -07:00
Roland Dreier
37aebbde70 IPoIB/cm: spin_lock_irqsave() -> spin_lock_irq() replacements
There are quite a few places in ipoib_cm.c where we know IRQs are
enabled because we do something that sleeps in the same function, so
we can convert several occurrences of spin_lock_irqsave() to a plain
spin_lock_irq().  This cleans up the source a little and makes the
code smaller too:

add/remove: 0/0 grow/shrink: 1/5 up/down: 3/-51 (-48)
function                                     old     new   delta
ipoib_cm_tx_reap                             403     406      +3
ipoib_cm_stale_task                          146     145      -1
ipoib_cm_dev_stop                            173     172      -1
ipoib_cm_tx_handler                          964     956      -8
ipoib_cm_rx_handler                          956     937     -19
ipoib_cm_skb_reap                            212     190     -22

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-24 21:30:37 -07:00
Hal Rosenstock
de493d47d8 IB/mad: Change SMI to use enums rather than magic return codes
Clarify code by changing return values from SMI functions to named
enum values instead of magic 0/1 values.

Signed-off-by: Hal Rosenstock <halr@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-24 16:31:12 -07:00
Sean Hefty
aeba84a925 IB/umad: Implement GRH handling for sent/received MADs
We need to set the SGID index for routed MADs and pass received
GRH information to userspace when a MAD is received.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
2007-04-24 16:31:12 -07:00
Sean Hefty
46f1b3d7af IB/ipoib: Use ib_init_ah_from_path to initialize ah_attr
To support destinations that are not on the local IB subnet, IPoIB
should include the GRH information when constructing an address
handle.  Using the existing ib_init_ah_from_path() call will do this
for us.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
2007-04-24 16:31:12 -07:00
Sean Hefty
d0e7bb1418 IB/sa: Set src_path_bits correctly in ib_init_ah_from_path()
src_path_bits needs to mask off the base LID value.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
2007-04-24 16:31:12 -07:00
Sean Hefty
9d41b7fdea IB/ucm: Simplify ib_ucm_event()
Use wait_event_interruptible() instead of a more complicated
open-coded equivalent.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
2007-04-24 16:31:11 -07:00
Sean Hefty
d92f76448c RDMA/ucma: Simplify ucma_get_event()
Use wait_event_interruptible() instead of a more complicated
open-coded equivalent.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
2007-04-24 16:31:11 -07:00
Roland Dreier
30c00986f3 IB/mthca: Simplify CQ cleaning in mthca_free_qp()
mthca_free_qp() already has local variables to hold the QP's send_cq
and recv_cq, so we can slightly clean up the calls to mthca_cq_clean()
by using those local variables instead of expressions like
to_mcq(qp->ibqp.send_cq).

Also, by cleaning the recv_cq first, we can avoid worrying about
whether the QP is attached to an SRQ for the second call, because we
would only clean send_cq if send_cq is not equal to recv_cq, and that
means send_cq cannot have any receive completions from the QP being
destroyed.

All this work even improves the generated code a bit:

add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-5 (-5)
function                                     old     new   delta
mthca_free_qp                                510     505      -5

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-24 16:31:11 -07:00
Roland Dreier
532c3b5817 IB/mthca: Fix mthca_write_mtt() on HCAs with hidden memory
Commit b2875d4c ("IB/mthca: Always fill MTTs from CPU") causes a crash
in mthca_write_mtt() with non-memfree HCAs that have their memory
hidden (that is, have only two PCI BARs instead of having a third BAR
that allows access to the RAM attached to the HCA) on 64-bit
architectures.  This is because the commit just before, c20e20ab
("IB/mthca: Merge MR and FMR space on 64-bit systems") makes
dev->mr_table.fmr_mtt_buddy equal to &dev->mr_table.mtt_buddy and
hence mthca_write_mtt() tries to write directly into the HCA's MTT
table.  However, since that table is in the HCA's memory, this is
impossible without the PCI BAR that gives access to that memory.

This causes a crash because mthca_tavor_write_mtt_seg() basically
tries to dereference some offset of a NULL pointer.  Fix this by
adding a test of MTHCA_FLAG_FMR in mthca_write_mtt() so that we always
use the WRITE_MTT firmware command rather than writing directly if
FMRs are not enabled.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-24 16:31:04 -07:00
Roland Dreier
3f114853d4 IB/mthca: Update HCA firmware revisions
Update the driver's list of current firmware versions with Mellanox's
latest releases.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-18 20:21:02 -07:00
Robert Walsh
40b90430ec IB/ipath: Fix WC format drift between user and kernel space
The kernel ib_wc structure now uses a QP pointer, but the user space
equivalent uses a QP number instead.  This means we can no longer use
a simple structure copy to copy stuff into user space.

Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-18 20:21:01 -07:00
Robert Walsh
6ce73b07db IB/ipath: Check that a UD work request's address handle is valid
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-18 20:21:00 -07:00
Robert Walsh
0d6172a428 IB/ipath: Remove duplicate stuff from ipath_verbs.h
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-18 20:21:00 -07:00
Robert Walsh
253fb39020 IB/ipath: Check reserved memory keys
Don't let userspace use the direct-physical-map L_key or R_key.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-18 20:21:00 -07:00
Bryan O'Sullivan
f0810daf74 IB/ipath: Fix unit selection when all CPU affinity bits set
At some point things changed so that all the affinity bits can be set,
but cpus_full() macro is not true.  This caused problems with the unit
selection logic on multi-unit (board) configurations.

Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-18 20:20:59 -07:00
Bryan O'Sullivan
662af5813b IB/ipath: Don't allow QPs 0 and 1 to be opened multiple times
Signed-off-by: Robert Walsh <robert.walsh@qlogic.com>
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-18 20:20:59 -07:00
Bryan O'Sullivan
53c1d2c943 IB/ipath: Disable IB link earlier in shutdown sequence
Move the code that shuts down the IB link earlier in the unload
process, to be sure no new packets can arrive while we are unloading.

Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-18 20:20:59 -07:00
Bryan O'Sullivan
490462c268 IB/ipath: Prevent random program use of diags interface
To prevent random utility reads and writes of the diag interface to the
chip, we first require a handshake of reading from offset 0 and writing
to offset 0 before any other reads or writes can be done through the
diags device.   Otherwise chip errors can be triggered.

Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-18 20:20:59 -07:00
Bryan O'Sullivan
f5408ac7cc IB/ipath: On unrecoverable errors, force link down, LEDs off
If the chip is no longer usable, LEDs should be turned off so system
can be found easily in the cluster.

Also some minor reorganizing so both chips print hardware error
message at same point and only if there were unrecovered errors

Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-18 20:20:59 -07:00
Michael Albaugh
27b044a815 IB/ipath: Fix driver crash (in interrupt or during unload) after chip reset
Re-init of the kernel structures after a chip reset was leaving the
portdata structure for port zero in an inconsistent state, and a
pointer to it either stale (in re-init code) or NULL (in devdata)
Fixing the order of operations on this struct, and the condition for
interrupt access, prevents the crashes.

Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-18 20:20:58 -07:00
Bryan O'Sullivan
9783ab4058 IB/ipath: Improve handling and reporting of parity errors
Mostly cleanup.

Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-18 20:20:58 -07:00
Bryan O'Sullivan
820054b7ca IB/ipath: Print better error messages if kernel is misconfigured
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-18 20:20:58 -07:00
Arthur Jones
569b87b47f IB/ipath: Force PIOAvail update entry point
Due to a chip bug, the PIOAvail register is not always updated to
memory.  This patch allows userspace to force an update.

Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-18 20:20:58 -07:00
Arthur Jones
7b196e2ff3 IB/ipath: Call free_irq() on chip specific initialization failure
In initialization, if we bailed at chip specific initialization, we
forgot to clean up the irq we had requested.

Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-18 20:20:58 -07:00
Bryan O'Sullivan
5a7d4eea91 IB/ipath: Discard multicast packets without a GRH
This patch fixes a bug where multicast packets without a GRH were not
being dropped as per the IB spec.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-18 20:20:57 -07:00
Bryan O'Sullivan
0ed3c594e3 IB/ipath: Fix calculation for number of kernel PIO buffers
If the module parameter "kpiobufs" is set too high, the calculation to
reset it to a sane value was incorrect.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-18 20:20:57 -07:00
Bryan O'Sullivan
c8c6f5d496 IB/ipath: Remove unused ipath_read_kreg64_port()
Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-18 20:20:57 -07:00
Ralph Campbell
dd5190b6be IB/ipath: Fix RDMA reads of length zero and error handling
Fix RDMA read response length checking for RDMA_READ_RESPONSE_ONLY to
allow a zero length response.  RDMA read responses which don't match
the expected length or occur in response to some other operation
should generate a completion queue error (see table 56, ch. 9.9.2.3 in
the IB spec).

Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-18 20:20:57 -07:00
Mark Debbage
c7e29ff11f IB/ipath: Allow receive ports mapped into userspace to be shared
Improve port-sharing performance by allowing any process to receive
packets from the shared hardware port under a spin lock for mutual
exclusion. Previously, one process was nominated as the master and
that process was responsible for receiving all packets from the shared
hardware port and either consuming them or forwarding them to their
destination. This led to starvation problems for other processes when
the master process was busy in computation phases.

Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-18 20:20:57 -07:00
Ralph Campbell
0a5a83cffc IB/ipath: Fix port sharing on powerpc
The port sharing feature mixed kernel virtual addresses as well as
physical addresses for the offset used to describe the mmap address to
map the InfiniPath hardware into user space.  This had a conflict on
powerpc.  The new scheme converts it to a physical address so it
doesn't conflict with chip addresses and yet still fits in 40/44 bits
so it isn't truncated by 32-bit applications calling mmap64().

Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-18 20:20:56 -07:00
Bryan O'Sullivan
041eab9136 IB/ipath: Fix CQ flushing when QP is modified to error state
If a receive work request has been removed from the queue but has not
had a CQ entry generated for it and the QP is modified to the error
state, the completion entry generated is incorrect.  This patch fixes
the problem.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-18 20:20:56 -07:00
Bryan O'Sullivan
614d49a21e IB/ipath: Fix bad argument to clear_bit()
Code was converted from a &= ~mask to clear_bit, but the bit was left
shifted instead of being used directly, so we were either trashing
memory several pages away, or sometimes taking a kernel page fault on
an invalid page.

Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-18 20:20:56 -07:00
Bryan O'Sullivan
8ec1077b35 IB/ipath: Change packet problems vs chip errors handling and reporting
Some types of packet errors are moderately common with longer IB
cables and large clusters, and are not reported with prints by other
IB HCA drivers.  This suppresses those messages unless the new
__IPATH_ERRPKTDBG bit is set in ipath_debug.  Reporting of temporarily
disabled frequent error interrupts was also made clearer

We also distinguish between chip errors, and bad packets sent or
received in the wording of the messages.

Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-18 20:20:55 -07:00
Ralph Campbell
6f5c407460 IB/ipath: Fix PSN update for RC retries
This patch fixes a number of bugs with updating the PSN for retries of
RC requests.

Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-18 20:20:55 -07:00
Ralph Campbell
0434d271fd IB/ipath: Fix QP error completion queue entries
When switching to the QP error state, the completion queue entries
(error or flush) were not being generated correctly.

Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-18 20:20:55 -07:00
Bryan O'Sullivan
39c0d0b919 IB/ipath: Fix up some debug messages
ipath_dbg doesn't need the same prefixes that printk does.

Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-18 20:20:55 -07:00
Ralph Campbell
3859e39d75 IB/ipath: Support larger IB_QP_MAX_DEST_RD_ATOMIC and IB_QP_MAX_QP_RD_ATOMIC
This patch adds support for multiple RDMA reads and atomics to be sent
before an ACK is required to be seen by the requester.

Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-18 20:20:55 -07:00
Ralph Campbell
7b21d26dda IB/ipath: NMI cpu lockup if local loopback used
If a post send is done in loopback and there is no receive queue
entry, the sending QP is put on a timeout list for a while so the
receiver has a chance to post a receive buffer. If the another post
send is done, the code incorrectly tried to put the QP on the timeout
list again an corrupted the timeout list. This eventually leads to a
spin lock deadlock NMI due to the timer function looping forever with
the lock held.

Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-18 20:20:54 -07:00
Ralph Campbell
9f9630d5e1 IB/ipath: Fix SRQ limit event causing dropped CQ entry
A silly programming error causes a CQ entry to not be generated if a
SRQ limit event is generated.

Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-18 20:20:54 -07:00
Ralph Campbell
947d7617a1 IB/ipath: Don't initialize port memory for subports
A recent change was made to allocate memory for a port after CPU
affinity is set. That change didn't account for subports and was
trying to allocate memory for the port twice.

Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-18 20:20:54 -07:00
Bryan O'Sullivan
1908574559 IB/ipath: Definitions of two RXE parity err bits were reversed
The chip documentation on the expected TID vs eager TID parity error
bits was reversed from what was implemented in the RTL, for both
chips.  This corrects the definitions.

Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-18 20:20:54 -07:00
Bryan O'Sullivan
165c552c35 IB/ipath: Fix user memory region creation when IOMMU present
The loop which initializes the user memory region from an array of
pages was using the wrong limit for the array.  This worked OK when
dma_map_sg() returned the same number as the number of pages.  This
patch fixes the problem.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-18 20:20:54 -07:00
Bryan O'Sullivan
946db67fbf IB/ipath: Add ability to set and clear IB local loopback
This is a sticky state.  It is useful for diagnosing problems with
boards versus cable/switch problems.

Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-18 20:20:53 -07:00
Roland Dreier
a89875fc7e IPoIB: Remove pointless opcode field from debugging output
There's no point in printing the opcode field in the completion
handling debugging output, since the type of completion is already
printed at the beginning of the line.  In fact the opcode field is not
even defined for completions with a status other than success.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-18 20:20:53 -07:00
Hal Rosenstock
9a4b65e357 IB/umad: Fix declaration of dev_map[]
The current ib_umad code never accesses bits past IB_UMAD_MAX_PORTS in
dev_map[].  We shouldn't declare it to be twice as big.

Pointed-out-by: Roland Dreier <rolandd@cisco.com>

Signed-off-by: Hal Rosenstock <halr@voltaire.com>
2007-04-18 20:20:53 -07:00
Michael S. Tsirkin
608d8268be IB/mthca: Fix data corruption after FMR unmap on Sinai
In mthca_arbel_fmr_unmap(), the high bits of the key are masked off.
This gets rid of the effect of adjust_key(), which makes sure that
bits 3 and 23 of the key are equal when the Sinai throughput
optimization is enabled, and so it may happen that an FMR will end up
with bits 3 and 23 in the key being different.  This causes data
corruption, because when enabling the throughput optimization, the
driver promises the HCA firmware that bits 3 and 23 of all memory keys
will always be equal.

Fix by re-applying adjust_key() after masking the key.

Thanks to Or Gerlitz for reproducing the problem, and Ariel Shahar for
help in debug.

Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-16 14:10:55 -07:00
Stephen Rothwell
d05c7a80cf [POWERPC] Rename get_property to of_get_property: drivers
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-04-13 03:55:19 +10:00
Stephen Rothwell
a7edd0e676 [POWERPC] get_property returns const
This just tidies up some of the remains.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-04-13 03:55:17 +10:00
Steve Wise
1ca19770c5 RDMA/cxgb3: Add set_tcb_rpl_handler
As of commit 6cdbd77e ("cxgb3 - missing CPL hanler and register
setting."), the cxgb3 ethernet NIC driver no longer handles SET_TCB
replies, so we need to do it in the iWARP driver.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Acked-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-12 10:37:11 -07:00
Michael S. Tsirkin
6371ea3d48 IPoIB/cm: Fix DMA direction typo
Receive buffers need to be mapped with DMA_FROM_DEVICE.  Incorrectly
mapping with DMA_TO_DEVICE causes a hard lock on ppc64 machines with
an IOMMU.

This fixes <https://bugs.openfabrics.org/show_bug.cgi?id=431>

Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-10 08:58:30 -07:00
Erez Zilber
1d426d6418 IB/iser: Don't defer connection failure notification to workqueue
When a connection is terminated asynchronously from the iSCSI layer's
perspective, iSER needs to notify the iSCSI layer that the connection
has failed.  This is done using a workqueue (switched to from the iSER
tasklet context).  Meanwhile, the connection object (that holds the
work struct) is released.  If the workqueue function wasn't called
yet, it will be called later with a NULL pointer, which will crash the
kernel.

The context switch (tasklet to workqueue) is not required, and
everything can be done from the iSER tasklet. This eliminates the NULL
work struct bug (and simplifies the code).

Signed-off-by: Erez Zilber <erezz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-05 09:46:04 -07:00
Linus Torvalds
a26b5fce06 Merge branch 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband:
  IB/iser: Handle aborting a command after it is sent
  IB/mthca: Fix thinko in init_mr_table()
  RDMA/cxgb3: Fix resource leak in cxio_hal_init_ctrl_qp()
2007-03-28 14:00:01 -07:00
Erez Zilber
3104a2175d IB/iser: Handle aborting a command after it is sent
The SCSI midlayer may abort a command that was already sent.  If the
initiator is still trying to send the command (or data-out PDUs for
that command), the QP may time out after the midlayer times
out. Therefore, when aborting the command, iSER may still have
references for the command's buffers.  When sending these PDUs, the
sends will complete with an error and their resources will be released
then.

Signed-off-by: Erez Zilber <erezz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-03-26 16:35:09 -07:00
Michael S. Tsirkin
0264d88531 IB/mthca: Fix thinko in init_mr_table()
Commit c20e20ab ("IB/mthca: Merge MR and FMR space on 64-bit systems")
swapped the number of MTTs and MPTs when initializing the MR table. As
a result, we get a kernel oops when the number of MTT segments
allocated exceeds 0x20000.

Noted by Troy Benjegerdes <troy@scl.ameslab.gov>, and reproduced by
Dotan Barak <dotanb@mellanox.co.il>.  This fixes
https://bugs.openfabrics.org/show_bug.cgi?id=490

Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-03-26 15:59:32 -07:00
Steve Wise
ed6ee5178e RDMA/cxgb3: Fix resource leak in cxio_hal_init_ctrl_qp()
This was spotted by the Coverity checker (CID 1554).

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-03-26 15:54:40 -07:00
Alexey Kuznetsov
ecbb416939 [NET]: Fix neighbour destructor handling.
->neigh_destructor() is killed (not used), replaced with
->neigh_cleanup(), which is called when neighbor entry goes to dead
state. At this point everything is still valid: neigh->dev,
neigh->parms etc.

The device should guarantee that dead neighbor entries (neigh->dead !=
0) do not get private part initialized, otherwise nobody will cleanup
it.

I think this is enough for ipoib which is the only user of this thing.
Initialization private part of neighbor entries happens in ipib
start_xmit routine, which is not reached when device is down.  But it
would be better to add explicit test for neigh->dead in any case.

Signed-off-by: David S. Miller <davem@davemloft.net>
2007-03-25 18:48:01 -07:00
Michael S. Tsirkin
77d8e1efea IB/ipoib: Fix thinko in packet length checks
The packet length checks in ipoib are broken: we add 4 bytes (IPoIB
encapsulation header) when sending a packet, not 20 bytes (hardware
address length) to each packet.  Therefore, if connected mode is
enabled so that the interface MTU is larger than the multicast MTU,
IPoIB may end up trying to send too-long multicast packets.  For
example, multicast is broken if a message of size 2048 bytes is sent
on an interface with UD MTU 2048, because 2048 is bigger than the real
limit of 2044 but the code tests against the wrong limit of 2060.

This patch fixes <https://bugs.openfabrics.org/show_bug.cgi?id=418>,
submitted by Scott Weitzenkamp <sweitzen@cisco.com>.

Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-03-22 14:40:16 -07:00
Michael S. Tsirkin
d04d01b113 IPoIB: Fix use-after-free in path_rec_completion()
The connected mode code added the possibility that an neigh struct
gets freed in the list_for_each_entry() loop in path_rec_completion(),
which causes a use-after-free.  Fix this by changing to the _safe
variant of the list walking macro.

This was spotted by the Coverity checker (CID 1567).

Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-03-22 14:40:16 -07:00
Joachim Fenkes
73b9e9870f IB/ehca: Make scaling code work without CPU hotplug
eHCA scaling code must not depend on register_cpu_notifier() if
CONFIG_HOTPLUG_CPU is not set, so put all related code into #ifdefs.

Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-03-22 14:40:16 -07:00
Steve Wise
d601347188 RDMA/cxgb3: Handle build_phys_page_list() failure in iwch_reregister_phys_mem()
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-03-22 14:40:16 -07:00
Bryan O'Sullivan
fae8773b73 IB/ipath: Check return value of lookup_one_len
This fixes kernel.org bug 8003.

Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-03-22 14:40:15 -07:00
Sean Hefty
e07832b662 IPoIB: Fix race in detaching from mcast group before attaching
There's a race between ipoib_mcast_leave() and ipoib_mcast_join_finish()
where we can try to detach from a multicast group before we've
attached to it.  Fix this by reordering the code in ipoib_mcast_leave
to free the multicast group first, which waits for the multicast
callback thread (which calls ipoib_mcast_join_finish()) to complete
before detaching from the group.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-03-22 14:32:09 -07:00
Michael S. Tsirkin
60a596dab7 IPoIB/cm: Fix reaping of stale connections
The sense of the time_after_eq() test in ipoib_cm_stale_task() is
reversed so that only non-stale connections are reaped.  Fix this by
changing to time_before_eq().

Noticed by Pradeep Satyanarayana <pradeep@us.ibm.com>.

Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-03-22 14:32:09 -07:00
Al Viro
62577fa324 [PATCH] fix ipath_dma_free_coherent() prototype
method gets u64, not dma_addr_t

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-03-14 15:27:49 -07:00
Mike Christie
bf32ed33e9 [SCSI] iscsi: rename DEFAULT_MAX_RECV_DATA_SEGMENT_LENGTH
This patch renames DEFAULT_MAX_RECV_DATA_SEGMENT_LENGTH to avoid
confusion with the drivers default values (DEFAULT_MAX_RECV_DATA_SEGMENT_LENGTH
is the iscsi RFC specific default).

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
2007-03-11 11:26:50 -05:00
Shirley Ma
55c9adde13 IPoIB: Turn on interface's carrier after broadcast group is joined
Do netif_carrier_on() right after the IPv4 broadcast multicast group
is joined, rather than waiting for all of the initial set of multicast
group joins to finish.  This allows at least IPv4 traffic to limp
along on broken fabrics where not all multicast groups can be joined.

Signed-off-by: Shirley Ma <xma@us.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-03-08 14:59:30 -08:00
Sean Hefty
3492856e33 RDMA/ucma: Avoid sending reject if backlog is full
Change the returned error code to ENOMEM if the connection event
backlog is full.  This prevents the ib_cm from issuing a reject
on the connection, which can allow retries to succeed.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-03-06 14:58:11 -08:00
Steve Wise
e64518f373 RDMA/cxgb3: Fix MR permission problems
Fix memory region permission problems:

- remove useless and redundant iwch_mem_perms enum.

- create ib_to_tpt_access_rights() for mapping ib access rights
  to T3 TPT permissions.

- create ib_to_mwbind_access_rights() for mapping ib access rights
  to T3 MWBIND WR permissions.

- fix up the mem reg code to utilize the new functions.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-03-06 12:51:02 -08:00
Steve Wise
1f6a849b7c RDMA/cxgb3: Don't reuse skbs that are non-linear or cloned
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-03-06 12:50:57 -08:00
Steve Wise
8cfccf02bb RDMA/cxgb3: Squelch logging AE errors
Only print one AE error for a given connection in the kernel log.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-03-06 12:50:53 -08:00
Steve Wise
adf376b370 RDMA/cxgb3: Stop EP timer when MPA exchange is aborted by peer
Stop the endpoint timer when the MPA exchange is aborted by the peer.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-03-06 12:50:49 -08:00
Steve Wise
2df50da00e RDMA/cxgb3: Move QP to error on destroy if the state is IDLE
Change iwch_destroy_qp() to always move the QP to ERROR and let
iwch_modify_qp() decide what to do.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-03-06 12:50:45 -08:00
Steve Wise
42e3175354 RDMA/cxgb3: Fixes for "normal close" failures
Fixes for "normal close" failures:

- Start normal close timer when moving to CLOSING state.
- Handle ABORTING state in close_con_rpl().
- Stop timer correctly on abort during a normal close.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-03-06 12:50:37 -08:00
David Miller
c3bb1092c8 RDMA/cxgb3: Fix build on sparc64
cxgb3 uses dma_alloc_coherent() et al. thus needs linux/dma-mapping.h
include in order to build reliably.

Noticed on sparc64.

Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-03-06 12:45:57 -08:00
Sean Hefty
cb164b8c6a RDMA/cma: Initialize rdma_bind_list in cma_alloc_any_port()
The struct rdma_bind_list fields for hlist are not being initialized,
resulting in a corrupted list.  Fix this by using kzalloc() to make
sure all pointers are NULL.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-03-06 12:41:44 -08:00
Steve Wise
aeb100e246 RDMA/cxgb3: Don't use mm after it's freed in iwch_mmap()
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-03-06 11:48:17 -08:00
Steve Wise
7d526e6b2c RDMA/cxgb3: Start ep timer on a MPA reject
If the consumer rejects the connection we end up under-referencing the
endpoint structure.  The fix is to call iwch_ep_disconnect() instead
of the low level disconnect functions so that the endpoint close timer
is started correctly.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-03-06 11:47:05 -08:00
Roland Dreier
88171cfed5 IB/mthca: Fix error path in mthca_alloc_memfree()
The garbled logic in mthca_alloc_memfree() causes it to return 0, even
if it fails to allocate all doorbell records.  Fix it to return -ENOMEM
when it fails.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-03-01 13:17:14 -08:00
Hoang-Nam Nguyen
31726798bd IB/ehca: Fix sync between completion handler and destroy cq
This patch fixes two issues reported by Roland Dreier and Christoph Hellwig:

- Mismatched sync/locking between completion handler and destroy cq We
  introduced a counter nr_events per cq to track number of irq events
  seen. This counter is incremented when an event queue entry is seen
  and decremented after completion handler has been called regardless
  if scaling code is active or not. Note that nr_callbacks tracks
  number of events assigned to a cpu and both counters can potentially
  diverge.

  The sync between running completion handler and destroy cq is done
  by using the global spin lock ehca_cq_idr_lock.

- Replace yield by wait_event on the counter above to become zero.

Signed-off-by: Hoang-Nam Nguyen <hnguyen@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-03-01 13:04:05 -08:00
Roland Dreier
a27cbe8782 IPoIB: Only handle async events for one port
An asynchronous event carries the port number that the event occurred
on, so there's no reason for an IPoIB interface to process an event
associated with a different local HCA port.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-27 07:37:49 -08:00
Roland Dreier
843613b047 IPoIB: Correct debugging output when path record lookup fails
If path_rec_completion() is passed a non-NULL path record pointer
along with an unsuccessful status value, the tracing code incorrectly
prints the (invalid) DLID from the path record rather than the more
interesting status code.  The actual logic of the function correctly
uses the path record only if the status indicates a successful lookup.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-26 12:57:08 -08:00
Steve Wise
2f236735fd RDMA/cxgb3: Stop the EP Timer on BAD CLOSE
Stop the ep timer in ec_status() if the status indicates a
bad close.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-23 13:12:48 -08:00
Adrian Bunk
2b540355cd RDMA/cxgb3: cleanups
- don't mark static functions in C files as inline - gcc should know
  best whether inlining makes sense
- never compile the unused cxio_dbg.c
- make the following needlessly global functions static:
  - cxio_hal.c: cxio_hal_clear_qp_ctx()
  - iwch_provider.c: iwch_get_qp()
- remove the following unused global functions:
  - cxio_hal.c: cxio_allocate_stag()
  - cxio_resource.: cxio_hal_get_rhdl()
  - cxio_resource.: cxio_hal_put_rhdl()

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-23 13:10:43 -08:00
Sean Hefty
1836854f25 RDMA/cma: Remove unused node_guid from cma_device structure
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-22 17:54:35 -08:00
Sean Hefty
e971b8cd19 IB/cm: Remove ca_guid from cm_device structure
The cm_device references an ib_device, which already contains the node_guid.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-22 17:54:33 -08:00
Sean Hefty
962063e64b RDMA/cma: Request reversible paths only
The rdma_cm requires that path records be reversible.  Set the
reversible bit when issuing an path record query.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-22 17:54:07 -08:00
Sean Hefty
47645d8d25 IB/core: Set hop limit in ib_init_ah_from_wc correctly
The hop_limit value in the ah_attr should be 0xFF, not the value read
from the received GRH (which should be 0).  See 13.5.4.4 in the 1.2 IB
spec.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-22 17:54:02 -08:00
Roland Dreier
aaf1aef55f IB/uverbs: Return correct error for invalid PD in register MR
If no matching PD is found in ib_uverbs_reg_mr(), then the function
jumps to err_release without setting the return value ret.  This means
that ret will hold the return value of the call to ib_umem_get() a few
lines earlier; if the function reaches the point where it looks for
the PD, we know that ib_umem_get() must have returned 0, so
ib_uverbs_reg_mr() ends up return 0 for a bad PD ID.  Fix this by
setting ret to -EINVAL before jumping to the exit path when no PD is
found.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-22 13:16:51 -08:00
Roland Dreier
658bcef619 IPoIB: Remove unused local_rate tracking
Now that low-level drivers handle the conversion from an absolute rate
to a relative rate, there's no need for the IPoIB driver to keep track
of the local port's data rate.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-21 20:28:05 -08:00
Michael S. Tsirkin
1812063ba3 IPoIB/cm: Improve small message bandwidth
Avoid the overhead of freeing/reallocating and mapping/unmapping for
DMA pages that have not been written to by hardware.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-20 20:16:14 -08:00
Adrian Bunk
c9add6ec56 IB/mthca: Make 2 functions static
This patch makes the needlessly global functions mthca_tavor_write_mtt_seg()
and mthca_arbel_write_mtt_seg() static.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-20 20:12:02 -08:00
Linus Torvalds
874ff01bd9 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial
* git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial: (25 commits)
  Documentation/kernel-docs.txt update.
  arch/cris: typo in KERN_INFO
  Storage class should be before const qualifier
  kernel/printk.c: comment fix
  update I/O sched Kconfig help texts - CFQ is now default, not AS.
  Remove duplicate listing of Cris arch from README
  kbuild: more doc. cleanups
  doc: make doc. for maxcpus= more visible
  drivers/net/eexpress.c: remove duplicate comment
  add a help text for BLK_DEV_GENERIC
  correct a dead URL in the IP_MULTICAST help text
  fix the BAYCOM_SER_HDX help text
  fix SCSI_SCAN_ASYNC help text
  trivial documentation patch for platform.txt
  Fix typos concerning hierarchy
  Fix comment typo "spin_lock_irqrestore".
  Fix misspellings of "agressive".
  drivers/scsi/a100u2w.c: trivial typo patch
  Correct trivial typo in log2.h.
  Remove useless FIND_FIRST_BIT() macro from cardbus.c.
  ...
2007-02-19 13:29:02 -08:00
Robert P. J. Day
d08df601a3 Various typo fixes.
Correct mis-spellings of "algorithm", "appear", "consistent" and
(shame, shame) "kernel".

Signed-off-by: Robert P. J. Day <rpjday@mindspring.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2007-02-17 19:07:33 +01:00
Roland Dreier
7084f8429c IB/core: Set static rate in ib_init_ah_from_path()
The static rate from the path record should be put into the address
vector -- a long time ago the rate in the address attributes needed to
be a relative rate, which required more munging, but now that the
conversion from absolute to relative is done in the low-level driver,
it's easy for ib_init_ah_from_path() to put the absolute rate in.

Cc: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-16 15:31:24 -08:00
Roland Dreier
630e61f2fa IB/ipath: Make ipath_map_sg() static
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-16 14:58:08 -08:00
Roland Dreier
38abaa63bf IB/core: Fix sparse warnings about shadowed declarations
Change a couple of variable names to avoid sparse warnings about
symbols being shadowed.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-16 14:41:14 -08:00
Sean Hefty
c8f6a362bf RDMA/cma: Add multicast communication support
Extend rdma_cm to support multicast communication.  Multicast support
is added to the existing RDMA_PS_UDP port space, as well as a new
RDMA_PS_IPOIB port space.  The latter port space allows joining the
multicast groups used by IPoIB, which enables offloading IPoIB traffic
to a separate QP.  The port space determines the signature used in the
MGID when joining the group.  The newly added RDMA_PS_IPOIB also
allows for unicast operations, similar to RDMA_PS_UDP.

Supporting the RDMA_PS_IPOIB requires changing how UD QPs are initialized,
since we can no longer assume that the qkey is constant.  This requires
saving the Q_Key to use when attaching to a device, so that it is
available when creating the QP.  The Q_Key information is exported to
the user through the existing rdma_init_qp_attr() interface.

Multicast support is also exported to userspace through the rdma_ucm.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-16 14:29:07 -08:00
Sean Hefty
faec2f7b96 IB/sa: Track multicast join/leave requests
The IB SA tracks multicast join/leave requests on a per port basis and
does not do any reference counting: if two users of the same port join
the same group, and one leaves that group, then the SA will remove the
port from the group even though there is one user who wants to stay a
member left.  Therefore, in order to support multiple users of the
same multicast group from the same port, we need to perform reference
counting locally.

To do this, add an multicast submodule to ib_sa to perform reference
counting of multicast join/leave operations.  Modify ib_ipoib (the
only in-kernel user of multicast) to use the new interface.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-16 14:20:02 -08:00
Michael S. Tsirkin
8a2e65f87c IPoIB: CM error handling thinko fix
ipoib_cm_alloc_rx_skb() might be called from IRQ context, so it must
use dev_kfree_skb_any(), not kfree_skb().

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-16 13:57:35 -08:00
Steve Wise
c52daa2976 RDMA/cxgb3: Remove Open Grid Computing copyrights in iw_cxgb3 driver
Remove the Open Grid Computing copyright.  It shouldn't be there.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-16 13:57:35 -08:00
Steve Wise
a1a750523b RDMA/cxgb3: Fail posts synchronously when in TERMINATE state
For T3B devices, mark user QP in error once we transition
to TERMINATE.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-16 13:57:35 -08:00
Steve Wise
ebb90986e1 RDMA/iwcm: iw_cm_id destruction race fixes
iwcm iw_cm_id destruction race condition fixes:

- iwcm_deref_id() always wakes up if there's another reference.
- clean up race condition in cm_work_handler().
- create static void free_cm_id() which deallocs the work entries and then
  kfrees the cm_id memory.  This reduces code replication.
- rem_ref() if this is the last reference -and- the IWCM owns freeing the
  cm_id, then free it.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-16 13:57:35 -08:00
Hoang-Nam Nguyen
6bbcea0d42 IB/ehca: Change query_port() to return LINK_UP instead UNKNOWN
Set the port phys state as returned from ehca_query_port() to LINK_UP.
ehca actually represents a logical HCA, whose phys/link state always
is LINK_UP.

Signed-off-by: Hoang-Nam Nguyen <hnguyen@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-16 13:57:34 -08:00
Hoang-Nam Nguyen
4fd3006032 IB/ehca: Allow en/disabling scaling code via module parameter
Allow users to en/disable scaling code when loading ib_ehca module,
rather than requiring the module to be rebuilt to change the setting.

Signed-off-by: Hoang-Nam Nguyen <hnguyen@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-16 13:57:34 -08:00
Hoang-Nam Nguyen
8b16cef3df IB/ehca: Fix race condition/locking issues in scaling code
Fix a race condition in find_next_cpu_online() and some other locking
issues in ehca scaling code.

Signed-off-by: Hoang-Nam Nguyen <hnguyen@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-16 13:57:34 -08:00
Hoang-Nam Nguyen
78d8d5f9ef IB/ehca: Rework irq handler
Rework ehca interrupt handling to avoid/reduce missed irq events.

Signed-off-by: Hoang-Nam Nguyen <hnguyen@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-16 13:57:34 -08:00
Roland Dreier
551fd6122d IPoIB: Only allow root to change between datagram and connected mode
Change the permissions of the "mode" sysfs attribute to be S_IWUSR
instead of S_IWUGO.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-16 13:57:33 -08:00
Roland Dreier
11282b32a4 IB/mthca: Fix allocation of ICM chunks in coherent memory
The change to allow allocating ICM chunks from coherent memory did not
increment the count of sg entries properly, so a chunk that required
more than allocation would not be mapped properly by the HCA.

Fix this by adding the missing increment of chunk->nsg.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-16 13:57:33 -08:00
Dotan Barak
fc89afce34 IB/mthca: Allow the QP state transition RESET->RESET
RESET->RESET is an allowed QP state transition, so mthca should handle
it correctly, by just returning success without involving the firmware.

Signed-off-by: Dotan Barak <dotanb@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-16 13:57:32 -08:00
Thomas Gleixner
38515e908b [PATCH] Scheduled removal of SA_xxx interrupt flags fixups
The obsolete SA_xxx interrupt flags have been used despite the scheduled
removal.  Fixup the remaining users.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Jeff Garzik <jeff@garzik.org>
Cc: Wim Van Sebroeck <wim@iguana.be>
Cc: Roland Dreier <rolandd@cisco.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Cc: Greg KH <greg@kroah.com>
Cc: Dave Airlie <airlied@linux.ie>
Cc: James Simmons <jsimmons@infradead.org>
Cc: "Antonino A. Daplas" <adaplas@pol.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-14 08:09:54 -08:00
Tim Schmielau
cd354f1ae7 [PATCH] remove many unneeded #includes of sched.h
After Al Viro (finally) succeeded in removing the sched.h #include in module.h
recently, it makes sense again to remove other superfluous sched.h includes.
There are quite a lot of files which include it but don't actually need
anything defined in there.  Presumably these includes were once needed for
macros that used to live in sched.h, but moved to other header files in the
course of cleaning it up.

To ease the pain, this time I did not fiddle with any header files and only
removed #includes from .c-files, which tend to cause less trouble.

Compile tested against 2.6.20-rc2 and 2.6.20-rc2-mm2 (with offsets) on alpha,
arm, i386, ia64, mips, powerpc, and x86_64 with allnoconfig, defconfig,
allmodconfig, and allyesconfig as well as a few randconfigs on x86_64 and all
configs in arch/arm/configs on arm.  I also checked that no new warnings were
introduced by the patch (actually, some warnings are removed that were emitted
by unnecessarily included header files).

Signed-off-by: Tim Schmielau <tim@physik3.uni-rostock.de>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-14 08:09:54 -08:00
Linus Torvalds
93bbad8fe1 Merge branch 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband:
  IB/mthca: Always fill MTTs from CPU
  IB/mthca: Merge MR and FMR space on 64-bit systems
  IB/mthca: Fix access to MTT and MPT tables on non-cache-coherent CPUs
  IB/mthca: Give reserved MTTs a separate cache line
  IB/mthca: Fix reserved MTTs calculation on mem-free HCAs
  RDMA/cxgb3: Add driver for Chelsio T3 RNIC
  IB: Remove redundant "_wq" from workqueue names
  RDMA/cma: Increment port number after close to avoid re-use
  IB/ehca: Fix memleak on module unloading
  IB/mthca: Work around gcc bug on sparc64
  IPoIB: Connected mode experimental support
  IB/core: Use ARRAY_SIZE macro for mandatory_table
  IB/mthca: Use correct structure size in call to memset()
2007-02-13 21:16:39 -08:00
Michael S. Tsirkin
b2875d4c39 IB/mthca: Always fill MTTs from CPU
Speed up memory registration by filling in MTTs directly when the CPU
can write directly to the whole table (all mem-free cards, and to
Tavor mode on 64-bit systems with the patch I posted earlier).  This
reduces the number of FW commands needed to register an MR by at least
a factor of 2 and speeds up memory registration significantly.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-12 16:16:29 -08:00
Michael S. Tsirkin
c20e20ab0f IB/mthca: Merge MR and FMR space on 64-bit systems
For Tavor, we currently reserve separate MPT and MTT space for FMRs to
avoid abusing the vmalloc space on 32 bit kernels. No such problem
exists on 64 bit kernels so let's not do it there.

This way we have a shared pool for MR and FMR resources, used on
demand.  This will also make it possible to write MTTs for regular
regions directly from driver.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-12 16:16:29 -08:00
Michael S. Tsirkin
391e4dea71 IB/mthca: Fix access to MTT and MPT tables on non-cache-coherent CPUs
We allocate the MTT table with alloc_pages() and then do pci_map_sg(),
so we must call pci_dma_sync_sg() after the CPU writes to the MTT
table.  This works since the device will never write MTTs on mem-free
HCAs, once we get rid of the use of the WRITE_MTT firmware command.
This change is needed to make that work, and is an improvement for
now, since it gives FMRs a chance at working.

For MPTs, both the device and CPU might write there, so we must
allocate DMA coherent memory for these.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-12 16:16:29 -08:00
Michael S. Tsirkin
1d1f19cfce IB/mthca: Give reserved MTTs a separate cache line
MTTs are allocated in non-cache-coherent memory, so we must give
reserved MTTs their own cache line, to prevent both device and
CPU from writing into the same cache line at the same time.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-12 16:16:29 -08:00
Michael S. Tsirkin
c7d204e8fd IB/mthca: Fix reserved MTTs calculation on mem-free HCAs
The reserved_mtts field has different meaning in Tavor and Arbel, so
we are wasting mtt entries on memfree. Fix the Arbel case to match
Tavor semantics.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-12 16:16:29 -08:00
Steve Wise
b038ced7b3 RDMA/cxgb3: Add driver for Chelsio T3 RNIC
Add an RDMA/iWARP driver for the Chelsio T3 1GbE and 10GbE adapters.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-12 16:16:18 -08:00
Arjan van de Ven
2b8693c061 [PATCH] mark struct file_operations const 3
Many struct file_operations in the kernel can be "const".  Marking them const
moves these to the .rodata section, which avoids false sharing with potential
dirty data.  In addition it'll catch accidental writes at compile time to
these shared resources.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-12 09:48:45 -08:00
Robert P. J. Day
c376222960 [PATCH] Transform kmem_cache_alloc()+memset(0) -> kmem_cache_zalloc().
Replace appropriate pairs of "kmem_cache_alloc()" + "memset(0)" with the
corresponding "kmem_cache_zalloc()" call.

Signed-off-by: Robert P. J. Day <rpjday@mindspring.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Andi Kleen <ak@muc.de>
Cc: Roland McGrath <roland@redhat.com>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Cc: Greg KH <greg@kroah.com>
Acked-by: Joel Becker <Joel.Becker@oracle.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Jan Kara <jack@ucw.cz>
Cc: Michael Halcrow <mhalcrow@us.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: James Morris <jmorris@namei.org>
Cc: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-11 10:51:27 -08:00
Sean Hefty
c7f743a669 IB: Remove redundant "_wq" from workqueue names
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-10 08:00:50 -08:00
Sean Hefty
aedec08050 RDMA/cma: Increment port number after close to avoid re-use
Randomize the starting port number and avoid re-using port values
immediately after they are closed.  Instead keep track of the last
port value used and increment it every time a new port number is
assigned, to better replicate other port spaces.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-10 08:00:50 -08:00
Akinobu Mita
65e5c02621 IB/ehca: Fix memleak on module unloading
Percpu data is not freed on module unloading.

Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Christoph Raisch <raisch@de.ibm.com>
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Hoang-Nam Nguyen <hnguyen@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-10 08:00:49 -08:00
David Howells
6bdd61d876 IB/mthca: Work around gcc bug on sparc64
For some reason gcc-3.4.5 on sparc64 does:

 WARNING: "____ilog2_NaN" [drivers/infiniband/hw/mthca/ib_mthca.ko] undefined!

Points to note:

 (1) The asm volatile flush/flushw are just markers for viewing what comes out
     in the assembly; removing them has no effect on the result.

 (2) Changing almost anything else in dwh__mthca_arbel_init_srq_context() or
     dwh__mthca_alloc_srq() causes the problem to go away.

The compiler command line issued by the kernel build is:

/opt/crosstool/gcc-3.4.5-glibc-2.3.6/sparc64-unknown-linux-gnu/bin/sparc64-unknown-linux-gnu-gcc -fno-strict-aliasing -fno-common -Os -m64 -mno-fpu -mcpu=ultrasparc -mcmodel=medlow -ffixed-g4 -ffixed-g5 -fcall-used-g7 -Wa,--undeclared-regs -pg -fno-omit-frame-pointer -fno-optimize-sibling-calls -fasynchronous-unwind-tables -g  -c -o drivers/infiniband/hw/mthca/.tmp_mthca_srq.o drivers/infiniband/hw/mthca/mthca_srq.c

This can be reduced to this whilst still retaining the problem:

/opt/crosstool/gcc-3.4.5-glibc-2.3.6/sparc64-unknown-linux-gnu/bin/sparc64-unknown-linux-gnu-gcc -m64 -c -o drivers/infiniband/hw/mthca/mthca_srq.o drivers/infiniband/hw/mthca/mthca_srq.c -Os

Removing -Os or changing it to -O or -O0 thru -O6 gets rid of the problem.

This patch to the kernel code fixes the problem:

Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-10 08:00:49 -08:00
Michael S. Tsirkin
839fcaba35 IPoIB: Connected mode experimental support
The following patch adds experimental support for IPoIB connected
mode, as defined by the draft from the IETF ipoib working group.  The
idea is to increase performance by increasing the MTU from the maximum
of 2K (theoretically 4K) supported by IPoIB on top of UD.  With this
code, I'm able to get 800MByte/sec or more with netperf without
options on a Mellanox 4x back-to-back DDR system.

Some notes on code:
1. SRQ is used for scalability to large cluster sizes
2. Only RC connections are used (UC does not support SRQ now)
3. Retry count is set to 0 since spec draft warns against retries
4. Each connection is used for data transfers in only 1 direction, so
   each connection is either active(TX) or passive (RX).  2 sides that
   want to communicate create 2 connections.
5. Each active (TX) connection has a separate CQ for send completions -
   this keeps the code simple without CQ resize and other tricks
6. To detect stale passive side connections (where the remote side is
   down), we keep an LRU list of passive connections (updated once per
   second per connection) and destroy a connection after it has been
   unused for several seconds. The LRU rule makes it possible to avoid
   scanning connections that have recently been active.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-10 08:00:48 -08:00
Ahmed S. Darwish
9a6b090c0d IB/core: Use ARRAY_SIZE macro for mandatory_table
Use ARRAY_SIZE() macro already defined in kernel.h instead of open
coding equivalent code.

Signed-off-by: Ahmed S. Darwish <darwish.07@gmail.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-10 08:00:47 -08:00
Roland Dreier
99d4f22e91 IB/mthca: Use correct structure size in call to memset()
When clearing the ib_ah_attr parameter in to_ib_ah_attr(), use sizeof
*ib_ah_attr instead of sizeof *path.

Pointed out by Jack Morgenstein <jackm@mellanox.co.il>.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-10 08:00:47 -08:00
Al Viro
b437735645 [PATCH] iscsi endianness annotations
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-09 09:14:07 -08:00
Greg Kroah-Hartman
43cb76d91e Network: convert network devices to use struct device instead of class_device
This lets the network core have the ability to handle suspend/resume
issues, if it wants to.

Thanks to Frederik Deweerdt <frederik.deweerdt@gmail.com> for the arm
driver fixes.

Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-02-07 10:37:11 -08:00
Hoang-Nam Nguyen
b45bfcc1ae IB/ehca: Remove obsolete prototypes
Remove prototypes for functions that don't exist.

Signed-off-by: Hoang-Nam Nguyen <hnguyen@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-04 14:11:58 -08:00
Hoang-Nam Nguyen
4c34bdf58c IB/ehca: Remove use of do_mmap()
This patch removes do_mmap() from ehca:
 - Call remap_pfn_range() for hardware register block
 - Use vm_insert_page() to register memory allocated for completion
   queues and queue pairs
 - The actual mmap() call/trigger is now controlled by user space,
   ie. libehca

Signed-off-by: Hoang-Nam Nguyen <hnguyen@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-04 14:11:57 -08:00
Steve Wise
1f12667021 RDMA/addr: Handle ethernet neighbour updates during route resolution
The iWARP connection manager uses the ib_addr services to do route
resolution (neighbour discovery in the IP world).  The ib_addr
netevent callback routine, however, currently only acts on InfiniBand
neighbour updates.  It needs to act on ethernet neighbour updates as
well.

This patch just removes filtering on device type altogether and will
trigger on any neighour updates where the nud_type is valid.  This
simplifies the code some.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-04 14:11:57 -08:00
Ishai Rabinovitz
1033ff670d IB/srp: Don't wait for response when QP is in error state.
When there is a call to send_tsk_mgmt SRP posts a send and waits for 5
seconds to get a response.

When the QP is in the error state it is obvious that there will be no
response so it is quite useless to wait.  In fact, the timeout causes
SRP to wait a long time to reconnect when a QP error occurs. (Each
abort and each reset_device calls send_tsk_mgmt, which waits for the
timeout).  The following patch solves this problem by identifying the
failure and returning an immediate error code.

Signed-off-by: Ishai Rabinovitz <ishai@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-04 14:11:56 -08:00
Michael S. Tsirkin
062dbb69f3 IB: Return qp pointer as part of ib_wc
struct ib_wc currently only includes the local QP number: this matches
the IB spec, but seems mostly useless. The following patch replaces
this with the pointer to qp itself, and updates all low level drivers
and all users.

This has the following advantages:
- Ability to get a per-qp context through wc->qp->qp_context
- Existing drivers already have the qp pointer ready in poll cq, so
  this change actually saves a tiny bit (extra memory read) on data path
  (for ehca it would actually be expensive to find the QP pointer when
  polling a CQ, but ehca does not support SRQ so we can leave wc->qp as
  NULL for ehca)
- Users that need the QP number can still get it through wc->qp->qp_num

Use case:

In IPoIB connected mode code, I have a common CQ shared by multiple
QPs.  To track connection usage, I need a way to get at some per-QP
context upon the completion, and I would like to avoid allocating
context object per work request just to stick a QP pointer into it.
With this code, I can just use wc->qp->qp_context.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-04 14:11:55 -08:00
Hoang-Nam Nguyen
cea9ea67e9 IB/ehca: Fix mismatched spin_unlock in irq handler
The lock is taken with _irqsave and hence must be released with
_irqrestore on all paths.

Signed-off-by Hoang-Nam Nguyen <hnguyen@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-01-22 17:03:55 -08:00
Hoang-Nam Nguyen
ce29d72cc7 IB/ehca: Fix improper use of yield() with spinlock held
Signed-off-by: Hoang-Nam Nguyen <hnguyen@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-01-22 17:03:55 -08:00
Ishai Rabinovitz
a20f3a6d7e IB/srp: Check match_strdup() return
Checks if the kmalloc in match_strdup() was successful, and bail out
on looking at the token if it failed.

Signed-off-by: Ishai Rabinovitz <ishai@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-01-22 17:03:54 -08:00
Dotan Barak
f5e10529a9 IB/mthca: Don't execute QUERY_QP firmware command for QP in RESET state
If a QP being queried is in the RESET state, don't execute the
QUERY_QP firmware command (because it will fail).

Signed-off-by: Dotan Barak <dotanb@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-01-09 14:14:28 -08:00
Hoang-Nam Nguyen
f2d9136133 IB/ehca: Use proper GFP_ flags for get_zeroed_page()
Here is a patch for ehca to use proper flag, ie. GFP_ATOMIC
resp. GFP_KERNEL, when calling get_zeroed_page() to prevent "Bug:
scheduling while atomic...". This error does not cause a kernel panic
but makes ipoib un-usable afterwards.  It is reproducible on
2.6.20-rc4 if one does ifconfig down during a flood ping test.  I have
not observed this error in earlier releases incl. 2.6.20-rc1.

This error occurs when a qp event/irq is received and ehca event
handler allocates a control block/page to obtain HCA error data block.
Use of GFP_ATOMIC when in interrupt context prevents this issue.

Signed-off-by Hoang-Nam Nguyen <hnguyen@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-01-09 14:14:24 -08:00
Jack Morgenstein
98714cb161 IB/mthca: Fix PRM compliance problem in atomic-send completions
According to the Tavor and Arbel programmer's reference manuals, the
number of bytes transferred is not provided in the byte_cnt field of
the CQ entry for atomic operation completions.  For atomic operations,
the number of bytes transferred is always 8 (when the status is
"success"), and this constant value should always be used by the
driver in the ib_wc entry returned, rather than using the CQE.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-01-07 20:25:24 -08:00
Sean Hefty
0cefcf0bbc RDMA/ucma: Don't report events with invalid user context
There's a problem with how rdma cm events are reported to userspace
that can lead to application crashes.

When a new connection request arrives, a context for the connection is
allocated in the kernel.  The connection event is then reported to
userspace.  The userspace library retrieves the event and allocates
its own context for the connection.  The userspace context is
associated with the kernel's context when accepting.  This allows the
kernel to give userspace context with other events.

A problem occurs if a second event for the same connection occurs
before the user has had a chance to call accept.  The userspace
context has not yet been set, which causes the librdmacm to crash.
(This has been seen when the app takes too long to call accept,
resulting in the remote side timing out and rejecting the connection)

Fix this by ignoring events for new connections until userspace has
set their context.  This can only happen if an error occurs on a new
connection before the user accepts it.  This is okay, since the accept
will just fail later.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-01-07 20:20:08 -08:00
Sean Hefty
30a5ec982e RDMA/ucma: Fix struct ucma_event leak when backlog is full
We discard new connection requests while the listen backlog is full,
but leak a struct ucma_event in the process.  Free the structure in
this case.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-01-07 20:17:34 -08:00
Steve Wise
881a045fc5 RDMA/iwcm: iWARP connection timeouts shouldn't be reported as rejects
The iWARP CM should report timeouts as event RDMA_CM_EVENT_UNREACHABLE,
not event RDMA_CM_EVENT_REJECTED.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-01-07 20:15:58 -08:00
Erez Zilber
f0938401f2 IB/iser: Return error code when PDUs may not be sent
iSER limits the number of outstanding PDUs to send. When this threshold
is reached, it should return an error code (-ENOBUFS) instead of setting
the suspend_tx bit (which should be used only by libiscsi).

Signed-off-by: Erez Zilber <erezz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-01-07 10:15:15 -08:00
Michael S. Tsirkin
46707e96b7 IB/mthca: Fix off-by-one in FMR handling on memfree
mthca_table_find() will return the wrong address when the table entry
being searched for is exactly at the beginning of a sglist entry
(other than the first), because it uses >= when it should use >.

Example: assume we have 2 entries in scatterlist, 4K each, offset is
4K.  The current code will return first entry + 4K when we really want
the second entry.

In particular this means mapping an FMR on a memfree HCA may end up
writing the page table into the wrong place, leading to memory
corruption and also causing the HCA to use an incorrect address
translation table.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-01-04 19:46:32 -08:00
Michael S. Tsirkin
9d79f1b467 [PATCH] IB/mthca: Fix FMR breakage caused by kmemdup() conversion
Commit bed8bdfd ("IB: kmemdup() cleanup") introduced one bad conversion to
kmemdup() in mthca_alloc_fmr(), where the structure allocated and the
structure copied are not the same size.  Revert this back to the original
kmalloc()/memcpy() code.

Reported-by: Dotan Barak <dotanb@mellanox.co.il>.
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <roland@digitalvampire.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-30 10:55:55 -08:00
Roland Dreier
0b0df6f207 IB/mthca: Use DEFINE_MUTEX() instead of mutex_init()
mthca_device_mutex() can be initialized automatically with
DEFINE_MUTEX() rather than explicitly calling mutex_init().  This
saves a bit of text and shrinks the source by a line, so we may as
well do it....

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-12-15 20:55:28 -08:00
Leonid Arsh
82da703ee6 IB/mthca: Add HCA profile module parameters
Add module parameters that enable settting some of the HCA
profile values, such as the number of QPs, CQs, etc.

Signed-off-by: Leonid Arsh <leonida@voltaire.com>
Signed-off-by: Moni Shoua <monis@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-12-15 20:46:31 -08:00
Roland Dreier
bf628dc22a IB/srp: Fix FMR mapping for 32-bit kernels and addresses above 4G
struct srp_device.fmr_page_mask was unsigned long, which means that
the top part of addresses above 4G was being chopped off on 32-bit
architectures.  Of course nothing good happens when data from SRP
targets is DMAed to the wrong place.

Fix this by changing fmr_page_mask to u64, to match the addresses
actually used by IB devices.

Thanks to Brian Cain <Brian.Cain@ge.com> and David McMillen
<davem@systemfabricworks.com> for help diagnosing the bug and testing
the fix.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-12-15 14:01:49 -08:00
Roland Dreier
82b399133b IPoIB: Make sure struct ipoib_neigh.queue is always initialized
Move the initialization of ipoib_neigh's skb_queue into
ipoib_neigh_alloc(), since commit 2745b5b7 ("IPoIB: Fix skb leak when
freeing neighbour") will make iterate over the skb_queue to free any
packets left over when freeing the ipoib_neigh structure.

This fixes a crash when freeing ipoib_neigh structures allocated in
ipoib_mcast_send(), which otherwise don't have their skb_queue
initialized.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-12-12 14:48:18 -08:00
Ralph Campbell
5180311fe9 IB/iser: Use the new verbs DMA mapping functions
Convert iSER to use the new verbs DMA mapping functions for kernel
verbs consumers.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-12-12 14:31:00 -08:00
Ralph Campbell
85507bcce0 IB/srp: Use new verbs IB DMA mapping functions
Convert SRP to use the new verbs DMA mapping functions for kernel
verbs consumers.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-12-12 14:30:55 -08:00
Ralph Campbell
37ccf9df97 IPoIB: Use the new verbs DMA mapping functions
Convert IPoIB to use the new DMA mapping functions
for kernel verbs consumers.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-12-12 14:30:48 -08:00
Ralph Campbell
1527106ff8 IB/core: Use the new verbs DMA mapping functions
Convert code in core/ to use the new DMA mapping functions for kernel
verbs consumers.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-12-12 14:28:30 -08:00
Ralph Campbell
f2cbb660ed IB/ipath: Implement new verbs DMA mapping functions
This patch implements the interposing DMA mapping functions to allow
support for IOMMUs and remove the dependence on phys_to_virt() and
bus_to_virt().

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-12-12 14:28:28 -08:00
Sean Hefty
7521663857 RDMA/cma: Export rdma cm interface to userspace
Export the rdma cm interfaces to userspace via a misc device.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-12-12 11:50:22 -08:00
Sean Hefty
628e5f6d39 RDMA/cma: Add support for RDMA_PS_UDP
Allow the use of UD QPs through the rdma_cm, in order to provide
address translation services for resolving IB addresses for datagram
messages using SIDR.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-12-12 11:50:21 -08:00
Sean Hefty
0fe313b000 RDMA/cma: Allow early transition to RTS to handle lost CM messages
During connection establishment, the passive side of a connection can
receive messages from the active side before the connection event has
been delivered to the user.  Allow the passive side to send messages
in response to received data before the event is delivered.  To handle
the case where the connection messages are lost, a new rdma_notify()
function is added that users may invoke to force a connection into the
established state.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-12-12 11:50:21 -08:00
Sean Hefty
a1b1b61f80 RDMA/cma: Report connect info with connect events
Connection information was never given to the recipient of a
connection request or reply message.  Only the event was delivered.
Report the connection data with the event to allows user to
reject the connection based on the requested parameters, or adjust
their resources to match the request.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-12-12 11:50:21 -08:00
Sean Hefty
9b2e9c0c24 RDMA/cma: Remove unneeded qp_type parameter from rdma_cm
The qp_type parameter into the rdma_cm is unneeded, and can be
misleading.  The QP type should be determined from the port space.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-12-12 11:50:21 -08:00
Roland Dreier
0a1336c8c9 IB/ipath: Fix IRQ for PCI Express HCAs
Commit 51f65ebc ("IB/ipath - program intconfig register using new HT
irq hook"), which fixed interrupts for HyperTransport HCAs, broke PCI
Express HCAs, because for those HCAs, the driver uses the value of
pdev->irq before pci_enable_msi() and ends up getting a totally bogus
IRQ number.  Fix this by using the value of pdev->irq after
pci_enable_msi().

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-12-12 11:50:20 -08:00
Krishna Kumar
ad1f9791e9 RDMA/amso1100: Fix memory leak in c2_qp_modify()
vq_req is leaked in error cases.

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-12-12 11:50:20 -08:00
Roland Dreier
dee234f48a IB/iser: Remove unused "write-only" variables
Remove variables that are set but then never looked at in the iSER
initiator.  These cleanups came from David Binderman's list of "set
but never used" warnings from icc.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-12-12 11:50:20 -08:00
Roland Dreier
44f8e3f3f7 IB/ipath: Remove unused "write-only" variables
Remove variables that are set but then never looked at in the ipath
driver.  These cleanups came from David Binderman's list of "set but
never used" warnings from icc.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-12-12 11:50:20 -08:00
Roland Dreier
f47e22c6e4 IB/fmr: ib_flush_fmr_pool() may wait too long
ib_flush_fmr_pool() stashes away the request generation number
properly, but then goes ahead and rereads it every time it tests
whether the flush generation number has caught up.  This means that
there is a theoretical possibility of livelock, if the request
generation number keeps getting bumped and the flush generation number
never catches up.  The fix is simple: use the request generation
number read at the beginning of the function.

Also, atomic_inc() followed by atomic_read() can be replaced with
atomic_int_return().  There's no real requirement for atomicity here
but we might as well shrink the code.

This bug was discovered using David Binderman's list of "set but never
used" warnings from icc.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-12-12 11:50:19 -08:00
David Howells
f0d1b0b30d [PATCH] LOG2: Implement a general integer log2 facility in the kernel
This facility provides three entry points:

	ilog2()		Log base 2 of unsigned long
	ilog2_u32()	Log base 2 of u32
	ilog2_u64()	Log base 2 of u64

These facilities can either be used inside functions on dynamic data:

	int do_something(long q)
	{
		...;
		y = ilog2(x)
		...;
	}

Or can be used to statically initialise global variables with constant values:

	unsigned n = ilog2(27);

When performing static initialisation, the compiler will report "error:
initializer element is not constant" if asked to take a log of zero or of
something not reducible to a constant.  They treat negative numbers as
unsigned.

When not dealing with a constant, they fall back to using fls() which permits
them to use arch-specific log calculation instructions - such as BSR on
x86/x86_64 or SCAN on FRV - if available.

[akpm@osdl.org: MMC fix]
Signed-off-by: David Howells <dhowells@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: David Howells <dhowells@redhat.com>
Cc: Wojtek Kaniewski <wojtekka@toxygen.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-08 08:28:51 -08:00
Josef Sipek
1cfd6e648b [PATCH] struct path: convert infiniband
Signed-off-by: Josef Sipek <jsipek@fsl.cs.sunysb.edu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-08 08:28:46 -08:00
Christoph Lameter
e94b176609 [PATCH] slab: remove SLAB_KERNEL
SLAB_KERNEL is an alias of GFP_KERNEL.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:24 -08:00
Christoph Lameter
54e6ecb239 [PATCH] slab: remove SLAB_ATOMIC
SLAB_ATOMIC is an alias of GFP_ATOMIC

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:24 -08:00
David Howells
9db7372445 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6
Conflicts:

	drivers/ata/libata-scsi.c
	include/linux/libata.h

Futher merge of Linus's head and compilation fixups.

Signed-Off-By: David Howells <dhowells@redhat.com>
2006-12-05 17:01:28 +00:00
David Howells
4c1ac1b491 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6
Conflicts:

	drivers/infiniband/core/iwcm.c
	drivers/net/chelsio/cxgb2.c
	drivers/net/wireless/bcm43xx/bcm43xx_main.c
	drivers/net/wireless/prism54/islpci_eth.c
	drivers/usb/core/hub.h
	drivers/usb/input/hid-core.c
	net/core/netpoll.c

Fix up merge failures with Linus's head and fix new compilation failures.

Signed-Off-By: David Howells <dhowells@redhat.com>
2006-12-05 14:37:56 +00:00
Al Viro
a1f8e7f7fb [PATCH] severing skbuff.h -> highmem.h
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2006-12-04 02:00:29 -05:00
Michael S. Tsirkin
f469b2626f IB/ucm: Fix deadlock in cleanup
ib_ucm_cleanup_events() holds file_mutex while calling ib_destroy_cm_id().
This can deadlock since ib_destroy_cm_id() flushes event handlers, and
ib_ucm_event_handler() needs file_mutex, too.  Therefore, drop the
file_mutex during the call to ib_destroy_cm_id().

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-11-29 15:33:10 -08:00
Sean Hefty
e1444b5a16 IB/cm: Fix automatic path migration support
The ib_cm_establish() function is replaced with a more generic
ib_cm_notify().  This routine is used to notify the CM that failover
has occurred, so that future CM messages (LAP, DREQ) reach the remote
CM.  (Currently, we continue to use the original path)  This bumps the
userspace CM ABI.

New alternate path information is captured when a LAP message is sent
or received.  This allows QP attributes to be initialized for the user
when a new path is loaded after failover occurs.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-11-29 15:33:10 -08:00
Michael S. Tsirkin
2745b5b713 IPoIB: Fix skb leak when freeing neighbour
ipoib_neigh_free() is sometimes called while neighbour is still alive,
so it might still have queued skbs.  Fix skb leak in this case.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-11-29 15:33:09 -08:00
Vu Pham
d2fcea7d68 IB/srp: Fix memory leak on reconnect
SRP reallocates the IU buffers for tx_ring and rx_ring without freeing
the old buffers when it reconnects to a target.  Fix this by keeping
the old IU buffers around.

Signed-off-by: Vu Pham <vu@mellanox.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-11-29 15:33:09 -08:00
Roland Dreier
04699a1f86 RDMA/addr: list_move() cleanups
Replace a couple list_del()/list_add() combos with list_move().

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-11-29 15:33:09 -08:00
Krishna Kumar
c78bb8442b RDMA/addr: Fix some cancellation problems in process_req()
Fix following problems in process_req() relating to cancellation:

- Function is wrongly doing another addr_remote() when cancelled,
  which is not required.
- Make failure reporting immediate by using time_after_eq().
- On cancellation, -ETIMEDOUT was returned to the callback routine
  instead of the more appropriate -ECANCELLED (users getting notified
  may want to print/return this status, eg ucma_event_handler).

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-11-29 15:33:09 -08:00
Krishna Kumar
c9edea298e RDMA/amso1100: Prevent deadlock in destroy QP
It is possible to swap the CQs used for send_cq and recv_cq when
creating two different QPs.  If these two QPs are then destroyed at
the same time, an AB-BA deadlock can occur because the CQ locks are
taken our of order.  Fix this by always taking CQ locks in a fixed
order.

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-11-29 15:33:08 -08:00
Jack Morgenstein
7013696a5f IB/mthca: Fix initial SRQ logsize for mem-free HCAs
When initializing an mthca SRQ, the log_srq_size field should be the
log of the number of SRQ WQEs, not the log of the number of bytes in
the SRQ.

This affects only mthca drivers for memfree HCAs which set the initial
srq wqe counter (in the SW2HW transition) to a non-zero value.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-11-29 15:33:08 -08:00
Hoang-Nam Nguyen
2771e9ed47 IB/ehca: Use WQE offset instead of WQE addr for pending work reqs
This is a patch for ehca to fix a bug in prepare_sqe_to_rts(), which
used WQE address to iterate pending work requests.  This might cause
an access violation since the queue pages can not be assumed to follow
each other consecutively.  Thus, this patch introduces a few queue
functions to determine WQE offset based on its address and uses WQE
offset to iterate the pending work requests.

Signed-off-by: Hoang-Nam Nguyen <hnguyen@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-11-29 15:33:08 -08:00
Krishna Kumar
9ab1ffa877 RDMA/iwcm: Fix comment for iwcm_deref_id() to match code
In iwcm_deref_id(), the comment says : "If the last reference is being
removed and iw_destroy_cm_id is waiting, wake up the waiting
thread". The second part of the comment, "and iw_destroy_cm_id is
waiting," is wrong, since this function either wakes the waiter
already waiting in iwcm_deref_id, or enables it (so that when
wait_for_completion() is performed later, it will immediately return).

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-11-29 15:33:08 -08:00
Krishna Kumar
715a588f42 RDMA/iwcm: Remove unnecessary function argument
Remove unnecessary cm_id_priv argument to copy_private_data(), and
change text to reflect the code.  Fix couple of typos in comments.

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-11-29 15:33:08 -08:00
Krishna Kumar
13fccdb380 RDMA/iwcm: Remove unnecessary initializations
Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-11-29 15:33:08 -08:00
Krishna Kumar
83b9658623 RDMA/iwcm: Fix memory leak
If we get IW_CM_EVENT_CONNECT_REQUEST message and encounter an error
(not in the LISTEN state, cannot create an id, cannot alloc
work_entry, etc), then the memory allocated by cm_event_handler() in
the event->private_data gets leaked. Since cm_work_handler has already
put the event on the work_free_list, this allocated memory is
leaked. High backlog value can allow DoS attacks.

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-11-29 15:33:07 -08:00
Krishna Kumar
33ba0fa9f3 RDMA/iwcm: Fix memory corruption bug in cm_work_handler()
Possible memory corruption scenario: after putting the work entry back
on the work_free_list, we call process_event() which dereferences
work->event, which could have been modified to another value
meanwhile.

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-11-29 15:33:07 -08:00
Roland Dreier
e54f81889c IB: Convert kmem_cache_t -> struct kmem_cache
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-11-29 15:33:07 -08:00
Roland Dreier
53533e16b1 IB/ipath: Fix typo in pma_counter_select subscript
The array has only 5 entries, so [5] should have been [4].

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-11-29 15:33:07 -08:00
Roland Dreier
29666128a2 RDMA/amso1100: Fix section mismatches
The amso1100 driver was missing a couple of __devinit/__devexit
annotations for init/cleanup functions that are called from
__devinit/__devexit functions.

Reported by Randy Dunlap <randy.dunlap@oracle.com>.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-11-29 15:33:07 -08:00
Roland Dreier
f4f3d0f0ec IB/mthca: Fix section mismatches
Commit b3b30f5e ("IB/mthca: Recover from catastrophic errors")
introduced some section mismatch breakage, because the error recovery
code tears down and reinitializes the device, which calls into lots of
code originally marked __devinit and __devexit from regular .text.

Fix this by getting rid of these now-incorrect section markers.

Reported by Randy Dunlap <randy.dunlap@oracle.com>.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-11-29 15:33:06 -08:00
Arne Redlich
3c8edf0eca IB/srp: Increase supported CDB size
Set the Scsi_Host's max_cmd_len from 12 (default) to 16 for
SRP. Otherwise scsi_dispatch_cmd() won't pass down certain commands
such as READ CAPACITY 16, required for supporting disks > 2TB.

Signed-off-by: Arne Redlich <arne.redlich@xiranet.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-11-29 15:33:06 -08:00
Dotan Barak
e31353eaec RDMA/cm: Remove setting local write as part of QP access flags
The qp_access_flags are for remote access permissions only, so
IB_ACCESS_LOCAL_WRITE is an invalid value.  Remove it from the values
set by cm_init_qp_init_attr() and cma_init_ib_qp().

Signed-off-by: Dotan Barak <dotanb@mellanox.co.il>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-11-29 15:33:06 -08:00
Eric Sesterhenn
bed8bdfddd IB: kmemdup() cleanup
Replace open coded kmemdup() to save some screen space, and allow
inlining/not inlining to be triggered by gcc.

Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-11-29 15:33:06 -08:00
Krishna Kumar
a1a733f65b RDMA/cma: Rewrite cma_req_handler() to encapsulate common code
Rewrite cma_req_handler error handling case to encapsulate
common code.

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-11-29 15:33:05 -08:00
Krishna Kumar
f115db4803 RDMA/addr: Use time_after_eq() instead of time_after() in queue_req()
In queue_req(), use time_after_eq() instead of time_after()
for following reasons :

- Improves insert time if multiple entries with same time are
  present.
- set_timeout need not be called if entry with same time
  is added to the list (and that happens to be the entry
  with the smallest time), saving atomic/locking operations.
- Earlier entries with same time are deleted first (fifo).

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-11-29 15:33:05 -08:00
Krishna Kumar
e4022274cf RDMA/cma: Remove redundant check in cma_add_one
Remove redundant check of node_guid in cma_add_one().

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-11-29 15:33:04 -08:00
Krishna Kumar
e82153b54d RDMA/cma: Optimize cma_bind_loopback() to check for empty list
Optimize to test for an empty list first.  This ends up simplifying
the code too.

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-11-29 15:33:04 -08:00
David Howells
c4028958b6 WorkStruct: make allyesconfig
Fix up for make allyesconfig.

Signed-Off-By: David Howells <dhowells@redhat.com>
2006-11-22 14:57:56 +00:00
Bryan O'Sullivan
3f5a6ca31c IB/ipath: Depend on CONFIG_NET
ipath uses skb functions and won't build without CONFIG_NET.

Spotted by Randy Dunlap.

Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-11-20 13:06:19 -08:00
Linus Torvalds
f44ea62344 Merge branch 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband:
  IPoIB: Clear high octet in QP number
2006-11-20 10:48:23 -08:00
Michael S. Tsirkin
073ae841d6 IPoIB: Clear high octet in QP number
IPoIB assumes that high (reserved) octet in the hardware address is 0,
and copies it into the QPN.  This violates RFC 4391 (which requires
that the high 8 bits are ignored on receive), and will result in an
invalid QPN being used when interoperating with IPoIB connected mode.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-11-16 13:56:45 -08:00
Bryan O'Sullivan
e757bef270 [PATCH] IB/ipath - fix driver build for platforms with PCI, but not HT
The PCI Express and Hypertransport chip-specific source files should only
be built when the kernel has the capability of actually compiling them.

This fixes the driver build on, for example, ia64.

Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-11-16 11:43:37 -08:00
Linus Torvalds
0f66c08e96 Merge branch 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband:
  IB/mad: Fix race between cancel and receive completion
  RDMA/amso1100: Fix && typo
  RDMA/amso1100: Fix unitialized pseudo_netdev accessed in c2_register_device
  IB/ehca: Activate scaling code by default
  IB/ehca: Use named constant for max mtu
  IB/ehca: Assure 4K alignment for firmware control blocks
2006-11-13 09:52:04 -08:00
Roland Dreier
39798695b4 IB/mad: Fix race between cancel and receive completion
When ib_cancel_mad() is called, it puts the canceled send on a list
and schedules a "flushed" callback from process context.  However,
this leaves a window where a receive completion could be processed
before the send is fully flushed.

This is fine, except that ib_find_send_mad() will find the MAD and
return it to the receive processing, which results in the sender
getting both a successful receive and a "flushed" send completion for
the same request.  Understandably, this confuses the sender, which is
expecting only one of these two callbacks, and leads to grief such as
a use-after-free in IPoIB.

Fix this by changing ib_find_send_mad() to return a send struct only
if the status is still successful (and not "flushed").  The search of
the send_list already had this check, so this patch just adds the same
check to the search of the wait_list.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-11-13 09:38:07 -08:00
Jean Delvare
b26c791e9c RDMA/amso1100: Fix && typo
Fix the AMSO1100 firmware version computation, which was broken
due to "&&" being used where "&" should have.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-11-13 09:38:07 -08:00
Tom Tucker
2ffcab6ae4 RDMA/amso1100: Fix unitialized pseudo_netdev accessed in c2_register_device
Rework some load-time error handling: c2_register_device() leaked when
it failed, and the function that called it didn't check the return code.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-11-13 09:38:04 -08:00
Hoang-Nam Nguyen
f2c238a0c5 IB/ehca: Activate scaling code by default
Change ehca's Kconfig to activates scaling code as default.  After
several measurements we saw that this feature prevents dropped packets
(UD) in stress situation. Thus, enabling it helps to improve ehca's
bandwidth through IPoIB.

Signed-off-by: Hoang-Nam Nguyen <hnguyen@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-11-13 08:46:28 -08:00
Hoang-Nam Nguyen
c58121143f IB/ehca: Use named constant for max mtu
Define and use a constant EHCA_MAX_MTU instead hardcoded value.

Signed-off-by: Hoang-Nam Nguyen <hnguyen@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-11-13 08:46:28 -08:00
Hoang-Nam Nguyen
7e28db5d8f IB/ehca: Assure 4K alignment for firmware control blocks
Assure 4K alignment for firmware control blocks in 64K page mode,
because kzalloc()'s result address might not be 4K aligned if 64K
pages are enabled. Thus, we introduce wrappers called
ehca_{alloc,free}_fw_ctrlblock(), which use a slab cache for objects
with 4K length and 4K alignment in order to alloc/free firmware
control blocks in 64K page mode. In 4K page mode those wrappers just
are defines of get_zeroed_page() and free_page().

Signed-off-by: Hoang-Nam Nguyen <hnguyen@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-11-09 12:41:57 -08:00
Bryan O'Sullivan
51f65ebccf [PATCH] IB/ipath - program intconfig register using new HT irq hook
Eric's changes to the htirq infrastructure require corresponding
modifications to the ipath HT driver code so that interrupts are still
delivered properly.

Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Roland Dreier <rdreier@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-11-08 18:29:25 -08:00
Sean Hefty
7a118df3ea RDMA/addr: Use client registration to fix module unload race
Require registration with ib_addr module to prevent caller from
unloading while a callback is in progress.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-11-02 14:26:04 -08:00
Michael S. Tsirkin
68586b67ab IB/mthca: Fix MAD extended header format for MAD_IFC firmware command
Several fields in an incoming MAD extended info header were passed
into the MAD_IFC firmware command at incorrect offsets (mostly off by
4 bytes).  As the result, the HCA will fail to generate traps in which
this info is needed (e.g. traps which include the GRH of the incoming
packet), in violation of the IB spec.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-10-31 10:51:26 -08:00
Jack Morgenstein
0b26c88f29 IB/uverbs: Return sq_draining value in query_qp response
Return the sq_draining value back to user space for query_qp instead
of the en_sqd_async notify value, which is valid only for
modify_qp.  For query_qp, the draining status should returned.

Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-10-30 21:19:35 -08:00
Steve Wise
d7b748d63c IB/amso1100: Fix incorrect pr_debug()
pr_debug() was printing the wrong stuff.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-10-30 20:52:53 -08:00
Steve Wise
8de94ce19d IB/amso1100: Use dma_alloc_coherent() instead of kmalloc/dma_map_single
The Ammasso driver needs to use dma_alloc_coherent() for
allocating memory that will be used by the HW for dma.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-10-30 20:52:52 -08:00
Paul Mackerras
04d03bc576 IB/ehca: Fix eHCA driver compilation for uniprocessor
The eHCA driver does not compile for a uniprocessor configuration
(CONFIG_SMP=n), due to H_SUCCESS and other symbols being undefined.
This fixes it.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Acked-by: Hoang-Nam Nguyen <HNGUYEN@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-10-30 20:52:51 -08:00
Krishna Kumar
255d0c14b3 RDMA/cma: rdma_bind_addr() leaks a cma_dev reference count
rdma_bind_addr() leaks a cma_dev reference count in failure case.

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
2006-10-30 20:52:51 -08:00
Erez Zilber
2e7a742628 IB/iser: Start connection after enabling iSER
When a connection is started (a new connection or a recovered one),
iSER should prepare its resources for full-featured mode and only then
notify the iSCSI layer that it is ready to start queueing commands.

Signed-off-by: Erez Zilber <erezz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-10-30 20:52:50 -08:00
Arthur Kepner
1f5c23e2c1 IB/mthca: Use mmiowb after doorbell ring
We discovered a problem when running IPoIB applications on multiple
CPUs on an Altix system. Many messages such as:

ib_mthca 0002:01:00.0: SQ 000014 full (19941644 head, 19941707 tail, 64 max, 0 nreq)

appear in syslog, and the driver wedges up.

Apparently this is because writes to the doorbells from different CPUs
reach the device out of order. The following patch adds mmiowb() calls
after doorbell rings to ensure the doorbell writes are ordered.

Signed-off-by: Arthur Kepner <akepner@sgi.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-10-16 20:22:35 -07:00
Robert Walsh
6ef93dddfe IB/ipath: Initialize diagpkt file on device init only
Don't attempt to set up the diagpkt device in the module init code.
Instead, wait until a piece of hardware is initialized.  Fixes a
problem when loading the ib_ipath module when no InfiniPath hardware
is present: modprobe would go into the D state and stay there.

Signed-off-by: Robert Walsh <robert.walsh@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-10-16 10:06:07 -07:00
Adrian Bunk
fb7711e71e RDMA/amso1100: Fix a NULL dereference in error path
This patch fixes a NULL dereference spotted by the Coverity checker.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Acked-by: Tom Tucker <tom@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-10-16 10:06:07 -07:00
Henrik Kretzschmar
d986a27413 RDMA/amso1100: pci_module_init() conversion
pci_module_init() convertion in amso1100 driver.

Signed-off-by: Henrik Kretzschmar <henne@nachtwindheim.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-10-16 10:06:07 -07:00
Michael S. Tsirkin
2cbe19d48a IB/mthca: Fix off-by-one in mthca SRQ creation
All HCAs (not just mem-free) need a spare SRQ entry, so bump srq->max
by 1 in all cases.

Noted by Jack Morgenstein <jackm@mellanox.co.il>

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-10-10 12:50:38 -07:00
Roland Dreier
73fbe8be73 IPoIB: Check for DMA mapping error for TX packets
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-10-10 12:50:38 -07:00
Roland Dreier
1031bfb93a RDMA/amso1100: Fix build with debugging off
Since pr_debug() has changed from a macro to an inline function when
DEBUG is not defined, its arguments now need to be defined even when
debugging is off.  Therefore to_event_str() and to_qp_state_str() need
to be moved out of #ifdef DEBUG.  The compiler will throw the
definitions away if DEBUG is not defined, but it needs to be able to
see that the functions exist.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-10-10 12:50:38 -07:00
Sean Hefty
82a9c16a10 IB/cm: Send DREP in response to unmatched DREQ
Currently a DREP is only sent in response to a DREQ if a connection
has been found matching the DREQ, and it is in the proper state.  Once
a DREP is sent, the local connection moves into timewait.  Duplicate
DREQs received while in this state result in re-sending the DREP.

However, it's likely that the local connection will enter and exit
timewait before the remote side times out a lost DREP and resends a DREQ.
To handle this, we send a DREP in response to a DREQ, even if a local
connection is not found.  This avoids maintaining disconnected
id's in timewait states for excessively long times, just to handle a
lost DREP.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-10-10 12:50:38 -07:00
Sean Hefty
8575329d4f IB/cm: Fix timewait crash after module unload
If the ib_cm module is unloaded while id's are still in timewait, the
CM will destroy the work queue used to process timewait.  Once the
id's exit timewait, their timers will fire, leading to a crash trying
to access the destroyed work queue.

We need to track id's that are in timewait, and cancel their deferred
work on module unload.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-10-10 12:50:38 -07:00
Jack Morgenstein
a8bf4e7717 IB/mthca: Query port fix
Fill in "max_vl_num" (encoded according to VLCap field in the PortInfo MAD)
and "init_type_reply" values in the ib_query_port() verb.

Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-10-10 12:50:36 -07:00
Ishai Rabinovitz
01cb9bcbd3 IB/srp: Enable multiple connections to the same target
Enable multiple concurrent connections to the same SRP target:

1) Use port GUID instead of node GUID in the initiator port
   identifier.  This allows connections to be made from multiple HCA
   ports at the same time.
2) Let the user specify the identifier extention when adding the
   device.  This allows userspace to make multiple connections even
   from the same port, if it wants too.

Without this, only one connection can be made from any given HCA, even
if it has multiple ports, because we don't use multi-channel mode, so
targets will only allow one connection from a given initiator port ID.

Signed-off-by: Ishai Rabinovitz <ishai@mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-10-10 12:49:05 -07:00
Ishai Rabinovitz
9b0af401aa IB/srp: Remove redundant memset()
scsi_host_alloc() already allocates with kzalloc(), so the struct Scsi_Host
is zeroed out, including the private data portion.  Remove the redundant
memset that zeros this out again in the SRP initiator.

Signed-off-by: Ishai Rabinovitz <ishai@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-10-10 09:51:14 -07:00
Tom Tucker
e52e6080ca RDMA/amso1100: Add spinlocks to serialize ib_post_send/ib_post_recv
The AMSO driver was not thread-safe in the post WR code and had
code that would sleep if the WR post FIFO was full. Since these
functions can be called on interrupt level I changed the sleep to a
udelay.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-10-10 09:51:13 -07:00
David Howells
7d12e780e0 IRQ: Maintain regs pointer globally rather than passing to IRQ handlers
Maintain a per-CPU global "struct pt_regs *" variable which can be used instead
of passing regs around manually through all ~1800 interrupt handlers in the
Linux kernel.

The regs pointer is used in few places, but it potentially costs both stack
space and code to pass it around.  On the FRV arch, removing the regs parameter
from all the genirq function results in a 20% speed up of the IRQ exit path
(ie: from leaving timer_interrupt() to leaving do_IRQ()).

Where appropriate, an arch may override the generic storage facility and do
something different with the variable.  On FRV, for instance, the address is
maintained in GR28 at all times inside the kernel as part of general exception
handling.

Having looked over the code, it appears that the parameter may be handed down
through up to twenty or so layers of functions.  Consider a USB character
device attached to a USB hub, attached to a USB controller that posts its
interrupts through a cascaded auxiliary interrupt controller.  A character
device driver may want to pass regs to the sysrq handler through the input
layer which adds another few layers of parameter passing.

I've build this code with allyesconfig for x86_64 and i386.  I've runtested the
main part of the code on FRV and i386, though I can't test most of the drivers.
I've also done partial conversion for powerpc and MIPS - these at least compile
with minimal configurations.

This will affect all archs.  Mostly the changes should be relatively easy.
Take do_IRQ(), store the regs pointer at the beginning, saving the old one:

	struct pt_regs *old_regs = set_irq_regs(regs);

And put the old one back at the end:

	set_irq_regs(old_regs);

Don't pass regs through to generic_handle_irq() or __do_IRQ().

In timer_interrupt(), this sort of change will be necessary:

	-	update_process_times(user_mode(regs));
	-	profile_tick(CPU_PROFILING, regs);
	+	update_process_times(user_mode(get_irq_regs()));
	+	profile_tick(CPU_PROFILING);

I'd like to move update_process_times()'s use of get_irq_regs() into itself,
except that i386, alone of the archs, uses something other than user_mode().

Some notes on the interrupt handling in the drivers:

 (*) input_dev() is now gone entirely.  The regs pointer is no longer stored in
     the input_dev struct.

 (*) finish_unlinks() in drivers/usb/host/ohci-q.c needs checking.  It does
     something different depending on whether it's been supplied with a regs
     pointer or not.

 (*) Various IRQ handler function pointers have been moved to type
     irq_handler_t.

Signed-Off-By: David Howells <dhowells@redhat.com>
(cherry picked from 1b16e7ac850969f38b375e511e3fa2f474a33867 commit)
2006-10-05 15:10:12 +01:00
Dave Jones
038b0a6d8d Remove all inclusions of <linux/config.h>
kbuild explicitly includes this at build time.

Signed-off-by: Dave Jones <davej@redhat.com>
2006-10-04 03:38:54 -04:00
Matt LaPlante
cab00891c5 Still more typo fixes
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-10-03 22:36:44 +02:00
Hoang-Nam Nguyen
e5a0106901 IB/ehca: Tweak trace message format
Add an extra space to make things more readable.

Signed-off-by: Hoang-Nam Nguyen <hnguyen@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-10-02 14:52:17 -07:00
Hoang-Nam Nguyen
0f248d9cde IB/ehca: Fix device registration
Move the call to ib_register_device() later, since a device should not
be registered until it is completely read to be used.  This fixes
crashes that occur if an upper-layer driver such as IPoIB is loaded
before the ehca module.

Signed-off-by: Hoang-Nam Nguyen <hnguyen@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-10-02 14:52:17 -07:00