linux

mirror of https://github.com/torvalds/linux.git synced 2024-11-11 14:42:24 +00:00

Author	SHA1	Message	Date
Ramkrishna Vepa	e0f30baca1	IB/qib: Add optional NUMA affinity This patch adds context relative numa affinity conditioned on the module parameter numa_aware. The qib_ctxtdata has an additional node_id member and qib_create_ctxtdata() has an addition node_id parameter. The allocations within the hdr queue and eager queue setup routines now take this additional member and adjust allocations as necesary. PSM will pass the either current numa node or the node closest to the HCA depending on numa_aware. Verbs will always use the node closest to the HCA. Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Ramkrishna Vepa <ramkrishna.vepa@intel.com> Signed-off-by: Vinit Agnihotri <vinit.abhay.agnihotri@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-06-21 17:19:48 -07:00
Mike Marciniszyn	8469ba39a6	IB/qib: Add DCA support This patch adds DCA cache warming for systems that support DCA. The code uses cpu affinity notification to react to an affinity change from a user mode program like irqbalance and (re-)program the chip accordingly. This notification avoids reading the current cpu on every interrupt. Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> [ Add Kconfig dependency on SMP && GENERIC_HARDIRQS to avoid failure to build due to undefined struct irq_affinity_notify. - Roland ] Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-06-21 17:19:38 -07:00
Stephen Hemminger	1d3520357d	make drivers with pci error handlers const Covers the rest of the uses of pci error handler. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>	2012-09-07 16:35:00 -06:00
Mike Marciniszyn	5d7fe4efbf	IB/qib: Fix size of cc_supported_table_entries Commit `36a8f01cd2` ("IB/qib: Add congestion control agent implementation") tries to store the value 1984 in a u8, which leads to truncation. Fix this by making the member big enough. This bug was detected by a smatch warning. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Ramkrishna Vepa <ramkrishna.vepa@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-07-29 20:26:10 -07:00
Mike Marciniszyn	36a8f01cd2	IB/qib: Add congestion control agent implementation Add a congestion control agent in the driver that handles gets and sets from the congestion control manager in the fabric for the Performance Scale Messaging (PSM) library. Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-07-19 11:20:04 -07:00
Mike Marciniszyn	551ace124d	IB/qib: Reduce sdma_lock contention Profiling has shown that sdma_lock is proving a bottleneck for performance. The situations include: - RDMA reads when krcvqs > 1 - post sends from multiple threads For RDMA read the current global qib_wq mechanism runs on all CPUs and contends for the sdma_lock when multiple RMDA read requests are fielded on differenct CPUs. For post sends, the direct call to qib_do_send() from multiple threads causes the contention. Since the sdma mechanism is per port, this fix converts the existing workqueue to a per port single thread workqueue to reduce the lock contention in the RDMA read case, and for any other case where the QP is scheduled via the workqueue mechanism from more than 1 CPU. For the post send case, This patch modifies the post send code to test for a non empty sdma engine. If the sdma is not idle the (now single thread) workqueue will be used to trigger the send engine instead of the direct call to qib_do_send(). Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-07-19 11:19:58 -07:00
Mike Marciniszyn	1c94283ddb	IB/qib: Add cache line awareness to qib_qp and qib_devdata structures This patch reorganizes the QP and devdata files to be more cache line aware. qib_qp fields in particular are split into read-mostly, send, and receive fields. qib_devdata fields are split into read-mostly and read/write fields Testing has show that bidirectional tests improve by as much as 100% with this patch. Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-05-14 12:43:34 -07:00
Mike Marciniszyn	bb77a07723	IB/qib: Optimize pio ack buffer allocation This patch optimizes pio buffer allocation in the kernel. For qib, kernel pio buffers are used for sending acks. The code to allocate the buffer would always start at 0 until it found a buffer. This means that an average of 64 comparisions were done on each allocate, since the busy bit won't be cleared until the bits are refreshed when buffers are exhausted. This patch adds two new fields in the devdata struct, last_pio and min_kernel_pio. last_pio is the last buffer that was allocated. min_kernel_pio is the lowest potential available buffer. min_kernel_pio is modifed as contexts are allocated and deallocted. Reviewed-by: Ramkrishna Vepa <ramkrishna.vepa@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-05-14 12:37:03 -07:00
Mike Marciniszyn	a778f3fddc	IB/qib: Add logic for affinity hint Call irq_set_affinity_hint() to give userspace programs such as irqbalance the information to be able to distribute qib interrupts appropriately. The logic allocates all non-receive interrupts to the first CPU local to the HCA. Receive interrupts are allocated round robin starting with the second CPU local to the HCA with potential wrap back to the second CPU. This patch also adds a refinement to the name registered for MSI-X interrupts so that user level scripts can determine the device associated with the IRQs when there are multiple HCAs with a potentially different set of local CPUs. Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-02-25 17:45:49 -08:00
Mike Marciniszyn	af061a644a	IB/qib: Use RCU for qpn lookup The heavy weight spinlock in qib_lookup_qpn() is replaced with RCU. The hash list itself is now accessed via jhash functions instead of mod. The changes should benefit multiple receive contexts in different processors by not contending for the lock just to read the hash structures. The patch also adds a lookaside_qp (pointer) and a lookaside_qpn in the context. The interrupt handler will test the current packet's qpn against lookaside_qpn if the lookaside_qp pointer is non-NULL. The pointer is NULL'ed when the interrupt handler exits. Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-21 09:38:54 -07:00
Mike Marciniszyn	9e1c0e4325	IB/qib: Eliminate divide/mod in converting idx to egr buf pointer The context init now saves a shift from rcvegrbufs_perchunk rcvegrbufs_perchunk_shift using ilog2. A BUG_ON() protects the power of 2 assumption. Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-21 09:38:52 -07:00
Mike Marciniszyn	53ab1c6498	IB/qib: Correct nfreectxts for multiple HCAs The code that was recently introduced to report the number of free contexts is flawed for multiple HCAs: /* Return the number of free user ports (contexts) available. */ return scnprintf(buf, PAGE_SIZE, "%u\n", dd->cfgctxts - dd->first_user_ctxt - (u32)qib_stats.sps_ctxts); The qib_stats is global to the module, not per HCA, so the code is broken for multiple HCAs. This patch adds a qib_devdata field, freectxts, that reflects the free contexts for this HCA. Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Reviewed-by: Ram Vepa <ram.vepa@qlogic.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-06 09:33:35 -07:00
Mike Marciniszyn	e67306a380	IB/qib: Defer HCA error events to tasklet With ib_qib options: options ib_qib krcvqs=1 pcie_caps=0x51 rcvhdrcnt=4096 singleport=1 ibmtu=4 a run of ib_write_bw -a yields the following: ------------------------------------------------------------------ #bytes #iterations BW peak[MB/sec] BW average[MB/sec] 1048576 5000 2910.64 229.80 ------------------------------------------------------------------ The top cpu use in a profile is: CPU: Intel Architectural Perfmon, speed 2400.15 MHz (estimated) Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (No unit mask) count 1002300 Counted LLC_MISSES events (Last level cache demand requests from this core that missed the LLC) with a unit mask of 0x41 (No unit mask) count 10000 samples % samples % app name symbol name 15237 29.2642 964 17.1195 ib_qib.ko qib_7322intr 12320 23.6618 1040 18.4692 ib_qib.ko handle_7322_errors 4106 7.8860 0 0 vmlinux vsnprintf Analysis of the stats, profile, the code, and the annotated profile indicate: - All of the overflow interrupts (one per packet overflow) are serviced on CPU0 with no mitigation on the frequency. - All of the receive interrupts are being serviced by CPU0. (That is the way truescale.cmds statically allocates the kctx IRQs to CPU) - The code is spending all of its time servicing QIB_I_C_ERROR RcvEgrFullErr interrupts on CPU0, starving the packet receive processing. - The decode_err routine is very inefficient, using a printf variant to format a "%s" and continues to loop when the errs mask has been cleared. - Both qib_7322intr and handle_7322_errors read pci registers, which is very inefficient. The fix does the following: - Adds a tasklet to service QIB_I_C_ERROR - Replaces the very inefficient scnprintf() with a memcpy(). A field is added to qib_hwerror_msgs to save the sizeof("string") at compile time so that a strlen is not needed during err_decode(). - The most frequent errors (Overflows) are serviced first to exit the loop as early as possible. - The loop now exits as soon as the errs mask is clear rather than fruitlessly looping through the msp array. With this fix the performance changes to: ------------------------------------------------------------------ #bytes #iterations BW peak[MB/sec] BW average[MB/sec] 1048576 5000 2990.64 2941.35 ------------------------------------------------------------------ During testing of the error handling overflow patch, it was determined that some CPU's were slower when servicing both overflow and receive interrupts on CPU0 with different MSI interrupt vectors. This patch adds an option (krcvq01_no_msi) to not use a dedicated MSI interrupt for kctx's < 2 and to service them on the default interrupt. For some CPUs, the cost of the interrupt enter/exit is more costly than then the additional PCI read in the default handler. Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-07-22 11:56:05 -07:00
Lucas De Marchi	25985edced	Fix common misspellings Fixes generated by 'codespell' and manually reviewed. Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>	2011-03-31 11:26:23 -03:00
Mike Marciniszyn	19ede2e422	IB/qib: Fix interrupt mitigation For SusieQ we need to write to the interrupt timer register before updating the header queue head with interrupt count. This is to ensure that the timer is enabled properly and a receive available interrupt is delivered. Otherwise this interrupt can be lost if the receiver header/eager queues are full before the timer is enabled. Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2011-01-10 17:42:21 -08:00
Jason Gunthorpe	82fdb0ab54	IB/qib: Fix extra log level in qib_early_err() Noticed this odd looking thing in dmesg: ib_qib 0000:02:00.0: <3>ib_qib: Unable to enable pcie error reporting: -5 which is due to a bad use of dev_info. Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Acked-by: Ralph Campbell <ralph.campbell@qlogic.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2010-10-26 16:09:02 -07:00
David Miller	ba818afdc6	IB/qib: Add missing <linux/slab.h> include Fix build failure on sparc64 which is missing the include of <linux/slab.h> via <asm/pci.h> that x86, powerpc, ia64, etc. have. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2010-08-05 14:26:58 -07:00
Ralph Campbell	cc323b2aaa	IB/qib: Avoid variable-length array Rather than use a variable size array allocation on the stack, define a constant for the maximum array size possible. Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2010-07-19 13:21:24 -07:00
Dave Olson	fce24a9d28	IB/qib: Don't mark VL15 bufs as WC to avoid a rare 7322 chip problem Don't set write combining via PAT on the VL15 buffers to avoid a rare problem with unaligned writes from interrupt-flushed store buffers. Signed-off-by: Dave Olson <dave.olson@qlogic.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2010-07-06 14:13:20 -07:00
Ralph Campbell	f931551baf	IB/qib: Add new qib driver for QLogic PCIe InfiniBand adapters Add a low-level IB driver for QLogic PCIe adapters. Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2010-05-23 21:44:54 -07:00

20 Commits