Current message on FAWWN events is rather cryptic.
Expand the message to clarify its meaning.
Link: https://lore.kernel.org/r/20191105005708.7399-8-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The driver today is reading service parameters from the firmware and then
overwriting the firmware-provided values with values of its own. There are
some switch features that require preliminary FLOGI's that are
switch-specific and done prior to the actual fabric FLOGI for traffic. The
fw will perform those FLOGIs and will revise the service parameters for the
features configured. As the driver later overwrites those values with its
own values, it misconfigures things like BBSCN use by doing so.
Correct by eliminating the driver-overwrite of firmware values. The driver
correctly re-reads the service parameters after each link up to obtain the
latest values from firmware.
Link: https://lore.kernel.org/r/20191105005708.7399-3-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When debugging a recent discovery customer problem it was very hard to tell
what was happening with the existing discovery log messages. To fully debug
the issue additional log messages were necessary.
Add or extend log messages so that sufficient information is present for
debugging.
Link: https://lore.kernel.org/r/20191018211832.7917-16-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
After study, it was determined there was a double free of a CT iocb during
execution of lpfc_offline_prep and lpfc_offline. The prep routine issued
an abort for some CT iocbs, but the aborts did not complete fast enough for
a subsequent routine that waits for completion. Thus the driver proceeded
to lpfc_offline, which releases any pending iocbs. Unfortunately, the
completions for the aborts were then received which re-released the ct
iocbs.
Turns out the issue for why the aborts didn't complete fast enough was not
their time on the wire/in the adapter. It was the lpfc_work_done routine,
which requires the adapter state to be UP before it calls
lpfc_sli_handle_slow_ring_event() to process the completions. The issue is
the prep routine takes the link down as part of it's processing.
To fix, the following was performed:
- Prevent the offline routine from releasing iocbs that have had aborts
issued on them. Defer to the abort completions. Also means the driver
fully waits for the completions. Given this change, the recognition of
"driver-generated" status which then releases the iocb is no longer
valid. As such, the change made in the commit 296012285c is reverted.
As recognition of "driver-generated" status is no longer valid, this
patch reverts the changes made in
commit 296012285c ("scsi: lpfc: Fix leak of ELS completions on adapter reset")
- Modify lpfc_work_done to allow slow path completions so that the abort
completions aren't ignored.
- Updated the fdmi path to recognize a CT request that fails due to the
port being unusable. This stops FDMI retries. FDMI will be restarted on
next link up.
Link: https://lore.kernel.org/r/20190922035906.10977-14-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
An issue was seen discovering all SCSI Luns when a target device undergoes
link bounce.
The driver currently does not qualify the FC4 support on the target.
Therefore it will send a SCSI PRLI and an NVMe PRLI. The expectation is
that the target will reject the PRLI if it is not supported. If a PRLI
times out, the driver will retry. The driver will not proceed with the
device until both SCSI and NVMe PRLIs are resolved. In the failure case,
the device is FCP only and does not respond to the NVMe PRLI, thus
initiating the wait/retry loop in the driver. During that time, a RSCN is
received (device bounced) causing the driver to issue a GID_FT. The GID_FT
response comes back before the PRLI mess is resolved and it prematurely
cancels the PRLI retry logic and leaves the device in a STE_PRLI_ISSUE
state. Discovery with the target never completes or resets.
Fix by resetting the node state back to STE_NPR_NODE when GID_FT completes,
thereby restarting the discovery process for the node.
Link: https://lore.kernel.org/r/20190922035906.10977-10-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When target-side fault injections are made, the driver isn't reconnecting
to the remote port. The driver is logging "2753" error messages which
state:
"PLOGI failure DID:1B2400 Status:x3/xf0240008"
The failures status is indicating a Illegal field error, which points to
the Temporary RPI field being used for the ELS. This error typically means
the driver used an RPI that was already registered (shouldn't be registered
if using it in this context).
Study has found that if the driver were in discovery attempts and
encountered an error, it wouldn't flag the temporary rpi in error. Yet the
rpi was released for reallocation in these error paths and another ELS
could allocate the rpi. In the failure situation a retry was done on an ELS
that had encountered an error, and as the rpi wasn't marked in error, the
ELS reused the rpi it originally allocated. But that rpi had been allocated
by a different ELS issued after the original error and before the retry
attempt. The different ELS had succeeded and the RPI was registered.
Fix by marking the rpi state for the node to be in error, aka as needing
reallocation, upon an error in the els processing. Error state marking is
always done prior to release back to the internal rpi free list, which the
driver wasn't doing in cases prior.
Also enhanced some of the logging to help in the next case of problem
troubleshooting.
Link: https://lore.kernel.org/r/20190922035906.10977-7-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
A prior use-after-free mailbox fix solved it's problem by null'ing a ndlp
pointer. However, further testing has shown that this change causes a
later state change to occasionally be skipped, which results in a reference
count never being decremented thus the rpi is never released, which causes
a vport delete to never succeed.
Revise the fix in the prior patch to no longer null the ndlp. Instead the
RELEASE_RPI flag is set which will drive the release of the rpi.
Given the new code was added at a deep indentation level, refactor the code
block using a new routine that avoids the indentation issues.
Fixes: 9b16406864 ("scsi: lpfc: Fix use-after-free mailbox cmd completion")
Link: https://lore.kernel.org/r/20190922035906.10977-6-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Convert the remaining %pf users to %ps to prepare for the removal of the
old %pf conversion specifier support.
Fixes: 3235066449 ("scsi: lpfc: Migrate to %px and %pf in kernel print calls")
Link: https://lore.kernel.org/r/20190904160423.3865-1-sakari.ailus@linux.intel.com
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In order to see real addresses, convert %p with %px for kernel addresses
and replace %p with %pf for functions.
While converting, standardize on "x%px" throughout (not %px or 0x%px).
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
GetTrunkInfo is displaying an incorrect link speed when the link is a trunk
and the link has gone down. The driver is not clearing the logical speed
as part of the link down transition.
Fix by setting the logical speed to UNKNOWN SPEED when the link goes down.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
During cable pull testing a deadlock was seen between lpfc_nlp_counters()
vs lpfc_mbox_process_link_up() vs lpfc_work_list_done(). They are all
waiting on the shost->host_lock.
Issue is all of these cases raise irq when taking out the lock but use
spin_unlock_irq() when unlocking. The unlock path is will unconditionally
re-enable interrupts in cases where irq state should be preserved. The
re-enablement allowed the other paths to execute which then causes the
deadlock.
Fix by converting the lock/unlock to irqsave/irqrestore.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In tests with remote ports contantly logging out/logging coupled with
occassional local link bounce, if a remote port is disocnnected for longer
than devloss_tmo and then subsequently reconnected, eventually the test
will fail to login with the remote port and remote port connectivity is
lost.
When devloss_tmo expires, the driver does not free the node struct until
the port or npiv instances is being deleted. The node is left allocated but
the state set to UNUSED. If the node was in the process of logging in when
the local link drop occurred, meaning the RPI was allocated for the node in
order to send the ELS, but not yet registered which comes after successful
login, the node is moved to the NPR state, and if devloss expires, to
UNUSED state. If the remote port comes back, the node associated with it
is restarted and this path happens to allocate a new RPI and overwrites the
prior RPI value. In the cases where the port was logged in and loggs out,
the path did release the RPI but did not set the node rpi value. In the
cases where the remote port never finished logging in, the path never did
the call to release the rpi. In this latter case, when the node is
subsequently restore, the new rpi allocation overwrites the rpi that was
not released, and the rpi is now leaked. Eventually the port will run out
of RPI resources to log into new remote ports.
Fix by following changes:
- When an rpi is released, do so under locks and ensure the node rpi value
is set to a non-allocated value (LPFC_RPI_ALLOC_ERROR). Note:
refactored to a small service routine to avoid indentation issues.
- When re-enabling a node, check the rpi value to determine if a new
allocation is necessary. If already set, use the prior rpi.
Enhanced logging to help in the future.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When connected to a high number of remote ports, the driver is encountering
PLOGI errors. The errors are due to adapter detected failures indicating
illegal field values.
Turns out the driver was prematurely clearing an RPI bitmask before waiting
for an UNREG_RPI mailbox completion. This allowed the RPI to be reused
before it was actually available.
Fix by clearing RPI bitmask only after UNREG_RPI mailbox completion.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
As spin_unlock_irq will enable interrupts.
Function lpfc_findnode_rpi is called from
lpfc_sli_abts_err_handler (./drivers/scsi/lpfc/lpfc_sli.c)
<- lpfc_sli_async_event_handler
<- lpfc_sli_process_unsol_iocb
<- lpfc_sli_handle_fast_ring_event
<- lpfc_sli_fp_intr_handler
<- lpfc_sli_intr_handler
and lpfc_sli_intr_handler is an interrupt handler.
Interrupts are enabled in interrupt handler. Use
spin_lock_irqsave/spin_unlock_irqrestore instead of spin_(un)lock_irq in
IRQ context to avoid this.
Signed-off-by: Fuqian Huang <huangfq.daxian@gmail.com>
Reviewed-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This patch adds general RSCN support:
- The ability to transmit an RSCN to the port on the other end of
the link (regular port if pt2pt, or fabric controller if fabric).
- And general recognition of an RSCN ELS when an ELS is received.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Arun Easi <aeasi@marvell.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Currently the FC-NVMe driver is leverating the SCSI FC transport class to
access the remote ports. Which means that all FC-NVMe remote ports will be
visible to the fc transport layer, but due to missing definitions the port
roles will always be 'unknown'. This patch adds the missing definitions to
the fc transport class to that the port roles are correctly displayed.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Reviewed-by: Giridhar Malavali <gmalavali@marvell.com>
Reviewed-by: Himanshu Madhani <hmadhani@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This patch avoids that the compiler warns about missing fall-through
annotation when building with W=1.
Cc: James Smart <james.smart@broadcom.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
If all the trunk links drop and a single link resumes, the link_state is
not properly reported. When trunked, the driver receives two async
cqes. One acqe reports the trunk link states, which the driver records.
The other cqe reports the overall state of the trunk. In the failing case,
the trunk link state acqe preceeds the overall trunk link state acqe. The
trunk link state acqe, as it's an "up" transition, calls a code path which
ensures a down transition before moving to the up state. The down
transition had a side effect of clearing the just-saved trunk link states.
Fix by not clearing the trunk link states if we've already transitioned
to a down state.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
During debug, it was seen that the driver is issuing commands specific to
SLI3 on SLI4 devices. Although the adapter correctly rejected the command,
this should not be done.
Revise the code to stop sending these commands on a SLI4 adapter.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When unloading the driver, mailbox commands may be sent without holding a
reference on the ndlp. By the time the mailbox command completes, the ndlp
may have reduced its ref counts and been freed. The problem was reported
by KASAN.
While unregistering due to driver unload, have the completion noop'd by
setting the ndlp context NULL'd. Due to the unload, no further action was
necessary. Also, while reviewing this path, the generic nulling of the
context after handling should be slightly moved.
Reported by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
For files modified as part of 12.2.0.0 patches, update copyright to 2019
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The conversion to enable SCSI and NVME fc4 support ran into an issue with
NPIV support. With NVME, NPIV is not currently supported, but with SCSI it
was. The driver reverted to its lowest setting meaning NPIV with SCSI was
not allowed.
Convert the NPIV checks and implementation so that SCSI can continue to
allow NPIV support.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Both NVME and SCSI aborts are now processed off the CQ workqueue and do not
generate events for the slowpath any more.
Remove the unused event code.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When a target's link dropped, an RSCN was received to communicate the
change. The driver detected the loss of the target and issued and UNREG_RPI
mailbox command. While that was being processed, another RSCN was received
to communicate the port coming back. The driver deferred the PLOGI to the
port until the mailbox command finishes. When the mailbox command completed
it saw the pending port and called the routines to issue the
PLOGI. However, it forgot to clear the UNREG_INP state flag, so the PLOGI
xmt routine nooped the PLOGI request assuming it needed to wait for the
mailbox command. At this point, login would never be re-attempted.
Clear UNREG_INP before issuing the deferred PLOGI.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
If the adapter is taken offline, the trunk link port attributes continue to
report trunk links as up even though all links are down as the adapter is
offline.
Clear the trunk links state as part of taking the adapter offline.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Addition of support for if_type=6 missed several checks for interface type,
resulting in the failure of several key management features such as
firmware dump and loopback testing.
Correct the checks on the if_type so that both SLI4 IF_TYPE's 2 and 6 are
supported.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The current discovery state machine the driver treated FLOGI oddly. When
point to point, an FLOGI is to be exchanged by the two ports, with the port
with the most significant WWN then proceeding with PLOGI. The
implementation in the driver was keyed to closely with "what have I sent",
not with what has happened between the two endpoints. Thus, it blatantly
would ACC an FLOGI, but reject PLOGI's until it had its FLOGI ACC'd. The
problem is - the sending of FLOGI may be delayed for some reason, or the
response to FLOGI held off by the other side. In the failing situation the
other side sent an FLOGI, which was ACC'd, then sent PLOGIs which were then
rjt'd until the retry count for the PLOGIs were exceeded and the port gave
up. The FLOGI may have been very late in transmit, or the response held off
until the PLOGIs failed. Given the other port had the higher WWN, no PLOGIs
would occur and communication stopped.
Correct the situation by changing the FLOGI handling. Defer any response to
an FLOGI until the driver has sent its FLOGI as well. Then, upon either
completion of the sent FLOGI, or upon sending an ACC to a received FLOGI
(which may be received before or just after FLOGI was sent). the driver
will act on who has the higher WWN. if the other port does, the driver will
noop any handling of an FLOGI response (if outstanding) and wait for PLOGI.
If the local port does, the driver will transition to sending PLOGI and
will noop any action on responding to an FLOGI (if not yet received).
Fortunately, to implement this, it only took another state flag and
deferring any FLOGI response if the FLOGI has yet to be transmit. All
subsequent actions were already in place.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The driver is getting hit with 100s of RSCNs during remote port address
changes. Each of those RSCN's ends up generating UNREG_RPI and REG_PRI
mailbox commands. The discovery engine within the driver doesn't wait for
the mailbox command completions. Instead it sets state flags and moves
forward. At some point, there's a massive backlog of mailbox commands which
take time for the adapter to process. Additionally, it appears there were
duplicate events from the switch so the driver generated duplicate mailbox
commands for the same remote port. During this window, failures on PLOGI
and PRLI ELS's are see as the adapter is rejecting them as they are for
remote ports that still have pending mailbox commands.
Streamline the discovery engine so that PLOGI log checks for outstanding
UNREG_RPIs and defer the processing until the commands complete. This
better synchronizes the ELS transmission vs the RPI registrations.
Filter out multiple UNREG_RPIs being queued up for the same remote port.
Beef up log messages in this area.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The driver data structure for managing a mailbox command contained two
context fields. Unfortunately, the context were considered "generic" to be
used at the whim of the command code. Of course, one section of code used
fields this way, while another did it that way, and eventually there were
mixups.
Refactored the structure so that the generic contexts become a node context
and a buffer context and all code standardizes on their use.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Add trunking support to the driver. Trunking is found on more recent
asics. In general, trunking appears as a single "port" to the driver
and overall behavior doesn't differ. Link speed is reported as an
aggregate value, while link speed control is done on a per-physical
link basis with all links in the trunk symmetrical. Some commands
returning port information are updated to additionally provide
trunking information. And new ACQEs are generated to report physical
link events relative to the trunk.
This patch contains the following modifications:
- Added link speed settings of 128GB and 256GB.
- Added handling of trunk-related ACQEs, mainly logging and trapping
of physical link statuses.
- Added additional bsg interface to query trunk state by applications.
- Augment link_state sysfs attribtute to display trunk link status
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The switches seem to respond faster to GID_PT vs GID_FT NameServer
queries. Add support for GID_PT to be used over GID_FT to enable
faster storage failover detection. Includes addition of new module
parameter to select between GID_PT and GID_FT (GID_FT is default).
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Testing a point-to-point topology and a case of re-FLOGI without
intervening link bouncing, showed an odd interaction with firmware and
a resulting scenario where the driver no longer probed after accepting
the new FLOGI.
Work around the firmware issue by issuing a link bounce if a FLOGI is
received after the link is already up and FLOGI's accepted.
While debugging the issue, realized that some debug traces should be
clarified to help in the future.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
On FCoE adapters, when running link bounce test in a loop, initiator
failed to login with switch switch and required driver reload to
recover. Switch reached a point where all subsequent FLOGIs would be
LS_RJT'd. Further testing showed the condition to be related to not
performing FCF discovery between FLOGI's.
Fix by monitoring FLOGI failures and once a repeated error is seen
repeat FCF discovery.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Fixes gcc '-Wunused-but-set-variable' warning:
drivers/scsi/lpfc/lpfc_hbadisc.c: In function 'lpfc_free_tx':
drivers/scsi/lpfc/lpfc_hbadisc.c:5431:19: warning:
variable 'psli' set but not used [-Wunused-but-set-variable]
Since commit 895427bd01 ("scsi: lpfc: NVME Initiator: Base modifications")
'psli' is not used any more.
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When taking the board offline while performing i/o, unsafe locking errors
occurred and irq level isn't properly managed.
In lpfc_sli_hba_down, spin_lock_irqsave(&phba->hbalock, flags) does not
disable softirqs raised from timer expiry. It is possible that a softirq is
raised from the lpfc_els_retry_delay routine and recursively requests the same
phba->hbalock spinlock causing deadlock.
Address the deadlocks by creating a new port_list lock. The softirq behavior
can then be managed a level deeper into the calling sequences.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Driver only sends NVME PRLI to a device that also supports FCP. This resuls
in remote ports that don't have fc_remote_ports created for them. The driver
is clearing the nlp_fc4_type for a ndlp at the wrong time.
Fix by moving the nlp_fc4_type clearing to the discovery engine in the
DEVICE_RECOVERY state. Also ensure that rport registration is done for all
nlp_fc4_types.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Change references from "Broadcom Limited" to "Broadcom Inc." in the
copyright message. Update copyright duration if not yet updated for 2018.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
MDS diagnostics fail because of frame count mismatch.
Unavailability of SGL is the trigger for this issue. If ELS SGL is not
available to process MDS frame, IOCB is put in FCP txq but not attempted to
post afterwards. So, driver stops processing incoming frames as it runs out
of IOCB. lpfc_drain_txq attempts to submit IOCBS that are queued in ELS
txq but MDS frames are posted to FCP WQ.
Attempt to submit IOCBs that are present in FCP txq when MDS loopback is
running.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Remote port disappearance/reappearances would cause a series of RSCN
events to be delivered to the driver. During the resulting GID_FT
handling, the driver clears the fc4 settings on the remote port, which
makes it skip registration. As such, the nvme associations eventually
fail and return io errors to the applications.
Correct by not clearng the nlp_fc4_types for all nodes in
lpfc_issue_gidft. Instead, when the GID_FT response is handled, clear
the nlp_fc4_types of FCP and NVME prior to evaluating the fc4_type
returned by the GID_FT response. This approach leaves "skipped" nodes
with their nlp_fc4_types intacted.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When a port is configured for NVME and SCSI Initiator support and it probes
a target supporting both SCSI and NVME, NVME devices are discovered, but
SCSI devices are not.
The nlp_fc4_type for all NPorts should be cleared on Link Up or just before
GID_FTs get issued, as opposed to just during GID_FT cmpl. RSCN activity as
well as Link Up can trigger GID_FT. One GID_FT may complete before the next
one is issued.
Fix by clearng nlp_fc4_type on link up and just before both GID_FTs are
issued. During port swapping, copy nlp_fc4_type to the new ndlp
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The G7 adapter supports 64G link speeds. Add support to the driver.
In addition, a small cleanup to replace the odd bitmap logic with
a switch case.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Updated Copyright in files updated 11.4.0.7
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
During link bounce testing in a point-to-point topology, the host may
enter a soft lockup on the lpfc_worker thread:
Call Trace:
lpfc_work_done+0x1f3/0x1390 [lpfc]
lpfc_do_work+0x16f/0x180 [lpfc]
kthread+0xc7/0xe0
ret_from_fork+0x3f/0x70
The driver was simultaneously setting a combination of flags that caused
lpfc_do_work()to effectively spin between slow path work and new event
data, causing the lockup.
Ensure in the typical wq completions, that new event data flags are set
if the slow path flag is running. The slow path will eventually
reschedule the wq handling.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The driver currently registers any remote port that has NVME support.
It should only be registering target ports.
Register only target ports.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
During RSCN storms, the driver does not rediscover some targets. The
driver marks some RSCN as to be handled after the ones it's working
on. The driver missed processing some deferred RSCN.
Move where the driver checks for deferred RSCNs and initiate deferred
RSCN handling if the flag was set. Also revise nport state within the
RSCN confirm routine. Add some state data to a possible debug print to
aid future debugging.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
XRI_ABORTED_CQE completions were not being handled in the fast path.
They were being queued and deferred to the lpfc worker thread for
processing. This is an artifact of the driver design prior to moving
queue processing out of the isr and into a workq element. Now that queue
processing is already in a deferred context, remove this artifact and
process them directly.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This is mostly updates of the usual suspects: lpfc, qla2xxx, hisi_sas,
megaraid_sas, pm80xx, mpt3sas, be2iscsi, hpsa. and a host of minor
updates.
There's no major behaviour change or additions to the core in all of
this, so the potential for regressions should be small (biggest
potential being in the scsi error handler changes).
Signed-off-by: James E.J. Bottomley <jejb@linux.vnet.ibm.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABAgAGBQJaCxtCAAoJEAVr7HOZEZN4d9EQAI+OHP6ss6zjKKC21c9jNPcH
NhLrNv37gHg/LA2VXeUEL9RGUjCGLIUrI4HsrxzkFAMLKP4TkshMs8/2RvczY+Sa
VpayPqVybEKLIS6ipQyM1SLIQff2nvtDVcN/T+8z1lkk45TrbA6ZGuwUwd2aJyEA
2V2wtg51ObnL0Nr9QPPll0JrtL1AnCZyRlu9XrwTZuuSBZwk93opIuuvbZm/3dVg
Ir4GSS4Y+PuHIfu4cxqdsPMdzRdY9I2me1YiE4jeFSn1/VTAjL4HBz7fO9eITT42
VhXSpDz1XvFsa9dJ0ubkqoALpJzCfOcBw+EuGvSydLEvOBoEVwMccdfaD9lT1zc5
L9e1Z5qqJoq7hTA6xTXCYfWG73I9HYvljtmc8yudKHhADOdnSTUXhaO6uBF0RNqD
OxPSA1RZwRx3c6lDOcK6BTtvLAkTEuYKdrWSKJi0w+QXJAyQ6etqbmsKpmPdRim7
Z4ZSpJFro2gyo9gcdJO0ykTG+z3U7Z/ay1sNgnuprsv+eU/QjUdlAPl18o79EkRf
H54zZggZ4wC6q/cFVVt4Vx+V+oqIeu38s7NDXS9UltLoTZPm2EzDW6pXd/38Z4Tf
a1oBAUET8kYLC90P8sVZxUIHZjITlpgDbyE2Lq00PMYXhk8S4IxF0aMN5RvVqzUv
+7N2HrHkSSgG1nhw1t+E
=3O85
-----END PGP SIGNATURE-----
Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI updates from James Bottomley:
"This is mostly updates of the usual suspects: lpfc, qla2xxx, hisi_sas,
megaraid_sas, pm80xx, mpt3sas, be2iscsi, hpsa. and a host of minor
updates.
There's no major behaviour change or additions to the core in all of
this, so the potential for regressions should be small (biggest
potential being in the scsi error handler changes)"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (203 commits)
scsi: lpfc: Fix hard lock up NMI in els timeout handling.
scsi: mpt3sas: remove a stray KERN_INFO
scsi: mpt3sas: cleanup _scsih_pcie_enumeration_event()
scsi: aacraid: use timespec64 instead of timeval
scsi: scsi_transport_fc: add 64GBIT and 128GBIT port speed definitions
scsi: qla2xxx: Suppress a kernel complaint in qla_init_base_qpair()
scsi: mpt3sas: fix dma_addr_t casts
scsi: be2iscsi: Use kasprintf
scsi: storvsc: Avoid excessive host scan on controller change
scsi: lpfc: fix kzalloc-simple.cocci warnings
scsi: mpt3sas: Update mpt3sas driver version.
scsi: mpt3sas: Fix sparse warnings
scsi: mpt3sas: Fix nvme drives checking for tlr.
scsi: mpt3sas: NVMe drive support for BTDHMAPPING ioctl command and log info
scsi: mpt3sas: Add-Task-management-debug-info-for-NVMe-drives.
scsi: mpt3sas: scan and add nvme device after controller reset
scsi: mpt3sas: Set NVMe device queue depth as 128
scsi: mpt3sas: Handle NVMe PCIe device related events generated from firmware.
scsi: mpt3sas: API's to remove nvme drive from sml
scsi: mpt3sas: API 's to support NVMe drive addition to SML
...
In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly.
Cc: James Smart <james.smart@broadcom.com>
Cc: Dick Kennedy <dick.kennedy@broadcom.com>
Cc: "James E.J. Bottomley" <jejb@linux.vnet.ibm.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: linux-scsi@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Local Reject/Invalid RPI errors seen during discovery.
Temporary RPI cleanup was occurring regardless of SLI rev. It's only
necessary on SLI-4.
Adjust the test for whether cleanup is necessary.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The driver crashes when attempting to use a freed ndpl pointer.
The pci_remove_one handler runs on a separate kernel thread. The order
of the removal is starting by freeing all of the ndlps and then
disabling interrupts. In between these two events the driver can still
receive an ELS and process it. When it tries to use the ndlp pointer
will be NULL
Change the order of the pci_remove_one vs disable interrupts so that
interrupts are disabled before the ndlp's are freed.
Cc: <stable@vger.kernel.org> # 4.12+
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Add Buffer to buffer credit recovery support to the driver. This is a
negotiated feature with the peer that allows for both sides to detect
dropped RRDY's and FC Frames and recover credit.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In a server with an 8G adapter and a 32G adapter, running NVME and FCP,
the server would crash with the following stack.
RIP: 0010: ... lpfc_nvme_register_port+0x38/0x420 [lpfc]
lpfc_nlp_state_cleanup+0x154/0x4f0 [lpfc]
lpfc_nlp_set_state+0x9d/0x1a0 [lpfc]
lpfc_cmpl_prli_prli_issue+0x35f/0x440 [lpfc]
lpfc_disc_state_machine+0x78/0x1c0 [lpfc]
lpfc_cmpl_els_prli+0x17c/0x1f0 [lpfc]
lpfc_sli_sp_handle_rspiocb+0x39b/0x6b0 [lpfc]
lpfc_sli_handle_slow_ring_event_s3+0x134/0x2d0 [lpfc]
lpfc_work_done+0x8ac/0x13b0 [lpfc]
lpfc_do_work+0xf1/0x1b0 [lpfc]
Crash, on the 8G adapter, is due to a vport which does not have a nvme
local port structure. It's not supposed to have one. NVME is not
supported on the 8G adapter, so the NVME PRLI, which started this flow
shouldn't have been sent in the first place.
Correct discovery engine to recognize when on an SLI3 rport, which
doesn't support SLI3, if the rport supports only NVME, don't send a NVME
PRLI. Instead, as no FC4 will be used, a LOGO is sent. If rport is FCP
and NVME, only execute the SCSI PRLI.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When unloading the driver, the NVMET driver would wait the full 30
seconds for its UNMAPPED initiator node to get removed before continuing
with the unload process. NVMEI worked correctly.
For each rport put into UNMAPPED or MAPPED state by NVMET, the driver
puts a reference on the NDLP. The difference is that NVMEI has a
unregister call for its rports and the extra reference is removed in the
unregister process. For NVMET, the driver has to remove the reference
explicitly when dropping out of UNMAPPED or MAPPED because there is no
unregister call.
Add a call to lpfc_nlp_put on the ndlp when NVMET and the old state was
UNMAPPED or MAPPED.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Lun Priority level shown as NA
Remote port is not getting registered for nameserver and fdmi. Due to
which dfc SendCTPassThru cmd is failing.
Made changes to register the remote port for both.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Added code to support Cisco MDS loopback diagnostic. The diagnostics run
various loopbacks including one which loops-back frame through the
driver.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The driver panic when using the els_wq during port reset.
Check for NULL els_wq before dereferencing.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
NVMET didn't have any RSCN handling at all and
would not execute implicit LOGO when receiving a PLOGI
from an rport that NVMET had in state UNMAPPED.
Clean up the logic in lpfc_nlp_state_cleanup for
initiators (FCP and NVME). NVMET should not respond to
RSCN including allocating new ndlps so this code was
conditionalized when nvmet_support is true. The check
for NLP_RCV_PLOGI in lpfc_setup_disc_node was moved
below the check for nvmet_support to allow the NVMET
to recover initiator nodes correctly. The implicit
logo was introduced with lpfc_rcv_plogi when NVMET gets
a PLOGI on an ndlp in UNMAPPED state. The RSCN handling
was modified to not respond to an RSCN in NVMET. Instead
NVMET sends a GID_FT and determines if an NVMEP_INITIATOR
it has is UNMAPPED but no longer in the zone membership.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Adding support for Fabric assigned WWPN and WWNN.
Firmware sends first FLOGI to fabric with vendor version changes.
On link up driver gets updated service parameter with FAWWN assigned port
name. Driver sends 2nd FLOGI with updated fawwpn and modifies the
vport->fc_portname in driver.
Note:
Soft wwpn will not be allowed when fawwpn is enabled.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
When RPI is not available, driver sends WQE with invalid RPI value and
rejected by HBA.
lpfc 0000:82:00.3: 1:3154 BLS ABORT RSP failed, data: x3/xa0320008
and
lpfc :2753 PLOGI failure DID:FFFFFA Status:x3/xa0240008
In this case, driver accesses rpi_ids array out of bounds.
Fix:
Check return value of lpfc_sli4_alloc_rpi(). Do not allocate
lpfc_nodelist entry if RPI is not available.
When RPI is not available, we will get discovery timeouts and
command drops for some of the vports as seen below.
lpfc :0273 Unexpected discovery timeout, vport State x0
lpfc :0230 Unexpected timeout, hba link state x5
lpfc :0111 Dropping received ELS cmd Data: x0 xc90c55 x0
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
This patch addresses the smatch issues identified by Dan Carpenter
in http://www.spinics.net/lists/linux-scsi/msg105663.html
The issues are:
drivers/scsi/lpfc/lpfc_hbadisc.c:316 lpfc_dev_loss_tmo_handler()
warn: we tested 'vport->load_flag & 2' before and it was 'false'
Action: removed item from test
drivers/scsi/lpfc/lpfc_hbadisc.c:701 lpfc_work_done()
warn: test_bit() takes a bit number
Action: changed definition so bit number
drivers/scsi/lpfc/lpfc_hbadisc.c:2206 lpfc_mbx_cmpl_fcf_scan_read_fcf_rec()
error: uninitialized symbol 'vlan_id'.
drivers/scsi/lpfc/lpfc_hbadisc.c:2582 lpfc_mbx_cmpl_fcf_rr_read_fcf_rec()
error: uninitialized symbol 'vlan_id'.
drivers/scsi/lpfc/lpfc_hbadisc.c:2683 lpfc_mbx_cmpl_read_fcf_rec() error:
uninitialized symbol 'vlan_id'.
Action: initilized value
drivers/scsi/lpfc/lpfc_hbadisc.c:4025 lpfc_register_remote_port()
error: we previously assumed 'rdata' could be null (see line 4023)
Action: refactored check block
drivers/scsi/lpfc/lpfc_hbadisc.c:4613 lpfc_sli4_dequeue_nport_iocbs()
error: double unlock 'irq:'
Action: removed inner irq reference
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
previous code did little more than log a message.
This patch adds abort path support, modeled after the SCSI code paths.
Currently addresses only the initiator path. Target path under
development, but stubbed out.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This patch shortens every init_timer in lpfc module followed by function
and data assignment using setup_timer. This is purely cleanup patch, it
does not add new functionality nor remove any existing functionality.
An init_timer call in this form:
init_timer(&vport->fc_disctmo);
vport->fc_disctmo.function = lpfc_disc_timeout;
vport->fc_disctmo.data = vport;
is shortened to:
setup_timer(&vport->fc_disctmo, lpfc_disc_timeout, vport);
It increases readability and reduces chances of mistakes done by
developers.
Signed-off-by: Tomas Jasek <tomsik68@gmail.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: James Smart <james.smart@broadcom.com>
Cc: Dick Kennedy <dick.kennedy@broadcom.com>
Cc: "James E.J. Bottomley" <jejb@linux.vnet.ibm.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: <linux-scsi@vger.kernel.org>
Acked-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Update copyrights to 2017 for all files touched in this patch set
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
NVME Target: Tie in to NVME Fabrics nvmet_fc LLDD target api
Adds the routines to:
- register and deregister the FC port as a nvmet-fc targetport
- binding of nvme queues to adapter WQs
- receipt and passing of NVME LS's to transport, sending transport response
- receipt of NVME FCP CMD IUs, processing FCP target io data transmission
commands; transmission of FCP io response
- Abort operations for tgt io exchanges
[mkp: fixed space at end of file warning]
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
NVME Target: Merge into FC discovery
Adds NVME PRLI handling and Nameserver registrations for NVME
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
NVME Initiator: Tie in to NVME Fabrics nvme_fc LLDD initiator api
Adds the routines to:
- register and deregister the FC port as a nvme-fc initiator localport
- register and deregister remote FC ports as a nvme-fc remoteport
- binding of nvme queues to adapter WQs
- send/perform NVME LS's
- send/perform NVME FCP initiator io operations
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
NVME Initiator: Merge into FC discovery
Adds NVME PRLI support and Nameserver registrations and Queries for NVME
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
NVME Initiator: Base modifications
This patch adds base modifications for NVME initiator support.
The base modifications consist of:
- Formal split of SLI3 rings from SLI-4 WQs (sometimes referred to as
rings as well) as implementation now widely varies between the two.
- Addition of configuration modes:
SCSI initiator only; NVME initiator only; NVME target only; and
SCSI and NVME initiator.
The configuration mode drives overall adapter configuration,
offloads enabled, and resource splits.
NVME support is only available on SLI-4 devices and newer fw.
- Implements the following based on configuration mode:
- Exchange resources are split by protocol; Obviously, if only
1 mode, then no split occurs. Default is 50/50. module attribute
allows tuning.
- Pools and config parameters are separated per-protocol
- Each protocol has it's own set of queues, but share interrupt
vectors.
SCSI:
SLI3 devices have few queues and the original style of queue
allocation remains.
SLI4 devices piggy back on an "io-channel" concept that
eventually needs to merge with scsi-mq/blk-mq support (it is
underway). For now, the paradigm continues as it existed
prior. io channel allocates N msix and N WQs (N=4 default)
and either round robins or uses cpu # modulo N for scheduling.
A bunch of module parameters allow the configuration to be
tuned.
NVME (initiator):
Allocates an msix per cpu (or whatever pci_alloc_irq_vectors
gets)
Allocates a WQ per cpu, and maps the WQs to msix on a WQ #
modulo msix vector count basis.
Module parameters exist to cap/control the config if desired.
- Each protocol has its own buffer and dma pools.
I apologize for the size of the patch.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
----
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This contains code cleanups that were in the prior patch set.
This allows better review of real changes later.
minor code cleanups:
fix indentation, punctuation, line length
addition/reduction of whitespace
remove unneeded parens, braces
lpfc_debugfs_nodelist_data: print as u64 rather than byte by byte
covert printk(KERN_ERR to pr_err
small print string deltas
use num_present_cpus() rather than count them
comment updates
rctl/type names moved to module variable, not on stack
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Since we need to change the implementation, stop exposing internals.
Provide kref_read() to read the current reference count; typically
used for debug messages.
Kills two anti-patterns:
atomic_read(&kref->refcount)
kref->refcount.counter
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Dick Kennedy <dick.kennedy@avagotech.com>
Signed-off-by: James Smart <james.smart@avagotech.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The default rpi completion handler does back to back puts to force the
removal of the ndlp. This ends up calling lpfc_unreg_rpi after the
reference count is at 0.
Fix: Check the reference count of the ndlp before getting the ref to
make sure we are not getting a reference on a removed object.
Signed-off-by: Dick Kennedy <dick.kennedy@avagotech.com>
Signed-off-by: James Smart <james.smart@avagotech.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Several functions in lpfc have comments stating that the function must
be called with the hbalock (or hostlock, or ringlock) held. Add
lockdep_assert_held() annotations to these functions, so one can
actually verify the locks are held.
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Acked-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Use new FDMI speed definitions for 10G, 25G and 40G FCoE.
Signed-off-by: Dick Kennedy <dick.kennedy@avagotech.com>
Signed-off-by: James Smart <james.smart@avagotech.com>
Reviewed-by: Hannes Reinicke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Modularize, cleanup, add comments - for FDMI code in driver
Note: I don't like the comments with leading # - but as we have a lot if
present, I'm deferring to handle it in one big fix later.
Signed-off-by: Dick Kennedy <dick.kennedy@avagotech.com>
Signed-off-by: James Smart <james.smart@avagotech.com>
Reviewed-by: Hannes Reinicke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Fix RegLogin failed error seen on Lancer FC during port bounce
Fix the statemachine and ref counting.
Signed-off-by: Dick Kennedy <dick.kennedy@avagotech.com>
Signed-off-by: James Smart <james.smart@avagotech.com>
Reviewed-by: Hannes Reinicke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Fix the FLOGI discovery logic to comply with T11 standards
We weren't properly setting fabric parameters, such as R_A_TOV and E_D_TOV,
when we registered the vfi object in default configs and pt2pt configs.
Revise to now pass service params with the values to the firmware and
ensure they are reset on link bounce. Required reworking the call sequence
in the discovery threads.
Signed-off-by: Dick Kennedy <dick.kennedy@avagotech.com>
Signed-off-by: James Smart <james.smart@avagotech.com>
Reviewed-by: Hannes Reinicke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Initial link up defaults were not properly being tracked relative to
initial FLOGI or pt2pt PLOGI. Add code to initialize them.
Signed-off-by: Dick Kennedy <dick.kennedy@avagotech.com>
Signed-off-by: James Smart <james.smart@avagotech.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
Signed-off-by: Dick Kennedy <dick.kennedy@avagotech.com>
Signed-off-by: James Smart <james.smart@avagotech.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
Remove set but not used variables.
Signed-off-by: Sebastian Herbszt <herbszt@gmx.de>
Signed-off-by: James Smart <james.smart@avagotech.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
Use logical instead of bitwise AND.
Signed-off-by: Sebastian Herbszt <herbszt@gmx.de>
Reviewed-by: James Smart <james.smart@avagotech.com>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
The domain controller PLOGI's concurrent with prior LOGO's/unreg_rpi's
completing created a race condition where driver rpi ref count can
inadvertantly hit 0 and the rpi attempted to be freed. This error
sometimes resulted in Warning messages indicating kref.h via
lfpc_nlp_get+0x128.
Correct by dropping any new PLOGI until the prior nport state has settled.
Signed-off-by: Dick Kennedy <dick.kennedy@avagotech.com>
Signed-off-by: James Smart <james.smart@avagotech.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
Signed-off-by: Dick Kennedy <dick.kennedy@avagotech.com>
Signed-off-by: James Smart <james.smart@avagotech.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
Correct locking and refcounting in tracking our rports
Signed-off-by: Dick Kennedy <dick.kennedy@avagotech.com>
Signed-off-by: James Smart <james.smart@avagotech.com>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
Signed-off-by: Dick Kennedy <dick.kennedy@avagotech.com>
Signed-off-by: James Smart <james.smart@avagotech.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
Signed-off-by: Dick Kennedy <dick.kennedy@avagotech.com>
Signed-off-by: James Smart <james.smart@avagotech.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
Fix incorrect reference counting
Signed-off-by: Dick Kennedy <dick.kennedy@avagotech.com>
Signed-off-by: James Smart <james.smart@avagotech.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
The order (it's a shall, but hard to dictate after the fact) is given in
FC-SCM - kind of. SCM indicates what shall be implemented, lists it as (a),
(b), (c), but actually doesn't say it has to be in that order. The only hard
requirement, called out in FCP-4, is that you must register your FC-4 Type
(via RFT_ID) before registering FC-4 Type Features (via RFF_ID), which makes
sense. We obviously violated this and there were some switches (or newer fw in
them) that enforced it. The other rule of thumbs are: register your data with
the switch first, then register for SCRs, then do queries about the fabric,
with the SCRs telling you of changes post the queries.
Signed-off-by: Dick Kennedy <dick.kennedy@avagotech.com>
Signed-off-by: James Smart <james.smart@avagotech.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Sebastian Herbszt <herbszt@gmx.de>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
Update copyright to 2015
Signed-off-by: Dick Kennedy <dick.kennedy@emulex.com>
Signed-off-by: James Smart <james.smart@emulex.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
Currently, the driver plays off the fact that older sli4 adapters have a
different rpi access pattern that allowed for the rpi reference to be
released earlier in the teardown sequence, allowing the driver to recycle
the rpi value sooner. Newer sli4 adapters have a different access pattern that
requires us to wait for a later mailbox completion. This changes the put
call location on the newer sli4 adapters.
Symptoms of the error are "0110 ELS" and the "0372 iotag" errors.
Signed-off-by: Dick Kennedy <dick.kennedy@emulex.com>
Signed-off-by: James Smart <james.smart@emulex.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
Signed-off-by: Dick Kennedy <dick.kennedy@emulex.com>
Signed-off-by: James Smart <james.smart@emulex.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
Signed-off-by: Dick Kennedy <dick.kennedy@emulex.com>
Signed-off-by: James Smart <james.smart@emulex.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
Fixed Low priority issues from lpfc given by fortify source code scan.
Signed-off-by: James Smart <james.smart@emulex.com>
Signed-off-by: Dick Kennedy <dick.kennedy@emulex.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Fix crash from page fault caused by use after rport delete.
Signed-off-by: James Smart <james.smart@emulex.com>
Signed-off-by: Dick Kennedy <dick.kennedy@emulex.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Fix race between LOGO/PLOGI handling causing NULL pointer
Signed-off-by: James Smart <james.smart@emulex.com>
Signed-off-by: Dick Kennedy <dick.kennedy@emulex.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Fix quarantined XRI recovery qualifier state in link bounce
Signed-off-by: James Smart <james.smart@emulex.com>
Signed-off-by: Dick Kennedy <dick.kennedy@emulex.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
In prandom we have already reseeding mechanisms that trigger
periodically from a much better entropy source than just
feeding in jiffies through lpfc_mbx_cmpl_fcf_scan_read_fcf_rec()
[what a function name 8-)]. Therefore, just remove this.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Reviewed-by: James Smart <james.smart@emulex.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Mark functions as static in lpfc/lpfc_hbadisc.c because they are not
used outside this file.
This eliminates the following warnings in lpfc/lpfc_hbadisc.c:
drivers/scsi/lpfc/lpfc_hbadisc.c:2047:5: warning: no previous prototype for ‘lpfc_sli4_fcf_pri_list_add’ [-Wmissing-prototypes]
drivers/scsi/lpfc/lpfc_hbadisc.c:2681:1: warning: no previous prototype for ‘lpfc_init_vfi_cmpl’ [-Wmissing-prototypes]
drivers/scsi/lpfc/lpfc_hbadisc.c:4432:1: warning: no previous prototype for ‘lpfc_nlp_logo_unreg’ [-Wmissing-prototypes]
Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Reviewed-by: James Smart <james.smart@emulex.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
This is just a couple of drivers (hpsa and lpfc) that got left out for further
testing in linux-next. We also have one fix to a prior submission (qla2xxx
sparse).
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQEcBAABAgAGBQJTm48MAAoJEDeqqVYsXL0M1YEH/iZyEILT4EIZxre/tspqX/LB
dxtGlmlF8AEU8/Eze3k/OB5nSuGcnYZ1hN1CgT2zZEv+sih6FekQOQV06qTwzwbo
DnWA3dOrPVgMzzSVvXFEjryroIUNhZvMy8TGu+DefE9b6FUs6B3VZlMR3A+TcSgV
cgknkG2Q6mWN8rO44pTSVlVDe2JpkvCYsHnqhO8uneQXVHNtsPpV7FfoLMLjBUDX
dgsaDiUjyrj0sdR1yOgRjDH68FPewEiEONdtKi63kkI6zWDFASiKDY9yc1eIyjVd
/1gbBJxwTRl4dWEdsigr/pOBxs6yjXGBSl/6PPDtuvdpWLFWUg4C2XtDLz0KLfU=
=tdDT
-----END PGP SIGNATURE-----
Merge tag 'scsi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull more SCSI updates from James Bottomley:
"This is just a couple of drivers (hpsa and lpfc) that got left out for
further testing in linux-next. We also have one fix to a prior
submission (qla2xxx sparse)"
* tag 'scsi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (36 commits)
qla2xxx: fix sparse warnings introduced by previous target mode t10-dif patch
lpfc: Update lpfc version to driver version 10.2.8001.0
lpfc: Fix ExpressLane priority setup
lpfc: mark old devices as obsolete
lpfc: Fix for initializing RRQ bitmap
lpfc: Fix for cleaning up stale ring flag and sp_queue_event entries
lpfc: Update lpfc version to driver version 10.2.8000.0
lpfc: Update Copyright on changed files from 8.3.45 patches
lpfc: Update Copyright on changed files
lpfc: Fixed locking for scsi task management commands
lpfc: Convert runtime references to old xlane cfg param to fof cfg param
lpfc: Fix FW dump using sysfs
lpfc: Fix SLI4 s abort loop to process all FCP rings and under ring_lock
lpfc: Fixed kernel panic in lpfc_abort_handler
lpfc: Fix locking for postbufq when freeing
lpfc: Fix locking for lpfc_hba_down_post
lpfc: Fix dynamic transitions of FirstBurst from on to off
hpsa: fix handling of hpsa_volume_offline return value
hpsa: return -ENOMEM not -1 on kzalloc failure in hpsa_get_device_id
hpsa: remove messages about volume status VPD inquiry page not supported
...
Fix for initializing RRQ bitmap
Signed-off-by: James Smart <james.smart@emulex.com>
Reviewed-By: Dick Kennedy <dick.kennedy@emulex.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>