The ERP thread is performing a task before it is executing the
corresponding down on the semaphore. The response handler of the
just started exchange config should wait for the completion by
performing a down on this semaphore. Since this semaphore is still
positive from the ERP enqueue the handler won't wait and therefore
the exchange config will always fail leaving the adapter in error.
The problem can be solved by performing the down on the semaphore
before starting an ERP task. This is the logically correct order.
Only walk the ERP loop if there is a task to perform.
Signed-off-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
The nameserver port might be in state online when the adapter is
offlined. On adapter reactivation the nameserver port is not
re-opened due to the PORT_ONLINE status. This results in an
unsuccessful recovery. In forcing the nameserver port status
to offline on all adapter offline events this issue is prevented.
Waiting for the reference count to drop to zero in
zfcp_wka_port_offline is not required, so remove it.
Signed-off-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
When running the scsi_scan from the zfcp workqueue and the target
device does not respond, the zfcp workqueue can block until the
scsi_scan hits a timeout. Move the work to the scsi host workqueue,
since this one is also used for the scan from the SCSI midlayer.
Signed-off-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Ensure the refcounting is correct even if we were not able to
schedule a work. In addition we have to make sure no scheduled
work is pending while we're dequeing the adapter from the
systems environment.
Signed-off-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Use the I/O blocking mechanism in the FC transport class to allow
faster failovers for multipathing:
- Call fc_remote_port_delete early to set the rport to BLOCKED.
- Check the rport status in queuecommand with fc_remote_portchkready
to no longer accept new I/O for this port and fail the I/O with the
appropriate scsi_cmnd result.
- Implement the terminate_rport_io handler to abort all pending I/O
requests
- Return SCSI commands with DID_TRANSPORT_DISRUPTED while erp is
running.
- When updating the remote port status, check for late changes and
update the remote ports status accordingly.
Acked-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
The current number based id ERP logging is replaced by a string
based tag version. The benefit is an easier location of the code in
question and the removal of the lengthy array referencing the
individual messages.
The string (7 bytes) based version does not use more space since those
bytes were "used" anyway due to the alignment of the structure.
The encoding of the 7 byte string is as follows
[0-1] = filename
[2-5] = task/function
[6] = section
Due to the character of this string (fixed length) a string
termination is not required here.
Signed-off-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
An adapter close was always performed whether it was required,
(e.g. in an error scenario) or not (e.g. initial open).
This patch is changing the process in only doing an
adapter close when it is required.
Signed-off-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Use the device pointer in zfcp_unit for tracking if we have a
registered SCSI device. With this approach, the flag
ZFCP_STATUS_UNIT_REGISTERED is only redundant and can be removed.
Acked-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
PORT_PHYS_CLOSING is only set and cleared, but not actually used
for status checking.
PORT_INVALID_WWPN is set when the GID_PN request does not return
a d_id for a remote port, e.g. when a remote port has been
unplugged. For this case, the d_id is zero. In the erp we can
check the d_id and use the normal escalation procedure that gives
up after three retries and remove the special case.
PORT_NO_WWPN is unused: Each port in the remote port list has a
valid wwpn. The WKA ports are now tracked outside the port
list. Remove the PORT_NO_WWPN flag, since this is no longer set
for any port.
Acked-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
The port flag DID_DID indicates whether we know the current id of the
port. This is always set in parallel. Since the id 0 is invalid
(because the port id 0 is invalid) we can remove the DID_DID flag:
d_id of 0 indicates an invalid d_id != 0 is a valid one.
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Acked-by: Felix Beck <felix@linux.vnet.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Get rid of this one:
drivers/s390/scsi/zfcp_erp.c: In function 'zfcp_erp_thread':
drivers/s390/scsi/zfcp_erp.c:1400: warning: ignoring return value of
'down_interruptible', declared with attribute warn_unused_result
zfcp_erp_thread is a kernel thread which can't receive any signals.
So introduce a dummy variable and get rid of the warning.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Register zfcp with the new /proc/service_level interface to report the
FCP microcode level. When the adapter goes offline or a channel path
disappears, zfcp unregisters, since the microcode version might change
and zfcp does not know about it.
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Waiting for the ERP to be finished in a task running in the global
kernel work-queue is a bad idea, especially if the ERP needs to run
another job in this work-queue before it can finish. -> deadlock.
This patch removes the necessity to wait for a finished ERP from the
scan task and moves the job scheduling to the end of the ERP.
Signed-off-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Prevent a SCSI target scan for a rport which have turned invalid
in the meantime.
Signed-off-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
If an open port fsf request times out (in erp) the
corresponding erp_action member of the fsf
request need to set to NULL. If the port structure
will be removed later-on there will be still a
reference in the fsf request to the non existing
erp_action otherwise.
Signed-off-by: Martin Petermann <martin.petermann@de.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Some further bus_id -> dev_name() conversions in s390 code.
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Trace ids 107 and 3 are used twice, fix this to have unique ids for
the erp triggers.
Signed-off-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Due to the character of a scheduled work we cannot guarantee the
LUN register to be finished before an initial device tries to use it.
Therefor we have to wait for PENDING_SCSI_WORK flag to be cleared
before proceeding.
Signed-off-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
The zfcp_erp_thread was using the nolock version of the dbf function.
This resulted in a list access while other tasks could modifying the
list. The symptom was an erp thread running at 100% CPU and never
returning from the dbf function.
Signed-off-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
In case of an adapter reopen all rports have to be deleted from the
environment. This should only happen for already registered rports
otherwise fc_remote_port_delete is called with a NULL pointer.
Signed-off-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Each adapter reopen trigger automatically a scan_port task which
is waiting for the ERP to be finished before further processing.
Since the initial device setup enqueues adapter, port and LUN which
are individual ERP actions, this process would start after
everything is done. Unfortunately the port_reopen requires another
scheduled work to be finished which is queued after the automatic
scan_port -> deadlock !
This fix creates an own work queue for ERP based nameserver requests.
Signed-off-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Reduce the size of zfcp data structures by removing unused and
redundant members. scsi_lun is only the mangled version of the
fcp_lun. So, remove the redundant field and use the fcp_lun instead.
Since the queue lock and the pci_batch indicator are only used in the
request queue, move them from the common queue struct to the adapter
struct.
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Changing the zfcp behaviour from always having the nameserver port
open to an on-demand strategy. This strategy reduces the use of
limited resources like port connections. The patch provides a common
infrastructure which could be used for all WKA ports in future.
Also reduce the number of nameserver lookups by changing the zfcp
behaviour of always querying the nameserver for the corresponding
destination ID of the remote port. If the destination ID has changed
during the reopen process we will be informed and then trigger a
nameserver query on demand.
Signed-off-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
- Remove unused references and declarations, including one instance
of the FC ls_adisc struct that has been defined twice.
- Also remove the flags COMMON_OPENING, COMMON_CLOSING,
ADAPTER_REGISTERED and XPORT_OK that are only set and cleared, but
not checked anywhere.
- Remove the zfcp specific atomic_test_mask makro. Simply use
atomic_read directly instead.
- Remove the zfcp internal sg helper functions and switch the places
where it is still used to call sg_virt directly.
- With the update of the QDIO code, the QDIO data structures no
longer use the volatile type qualifier. Now we can also remove the
volatile qualifiers from the zfcp code.
Signed-off-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Update the kernel messages in zfcp with input from the message review
and remove some messages that have been identified as redundant.
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Cleanup the code in zfcp_erp.c, move erp internal definititions to
this file and move FSF timeout handling to the FSF layer.
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Automatically attach the remote ports in zfcp when the adapter is set
online. This is done by querying all available ports from the FC
namesever. The scan for remote ports is also triggered by RSCNs and
can be triggered manually with the sysfs attribute 'port_rescan'.
Signed-off-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Cleanup the messages used in the zfcp driver: Remove unnecessary debug
and trace message and convert the remaining messages to standard
kernel macros. Remove the zfcp message macros and while updating the
whole flie also update the copyright headers.
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Cleanup the interface code from zfcp to qdio. Also move code that
belongs to the qdio interface from the erp to the qdio file.
Signed-off-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Move all Fibre Channel related code to new file and cleanup the code
while doing so.
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
drivers/s390/scsi/zfcp_dbf.c:692:2: warning: context imbalance in
'zfcp_rec_dbf_event_thread' - different lock contexts for basic block
Replace the parameter indicating if the lock is held with a new entry
function that only acquires the lock. This makes the lock handling
more visible and removes the sparse warning.
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: Martin Peschke <mp3@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Processing of an unsolicted status request can lead to a locking race
of the request_queue's queue_lock during the recreation of the
used up status read request while still in interrupt context
of the response handler.
Detaching the 'refill' of the long running status read requests from
the handler to a scheduled work is solving this issue.
In addition, each refill-run is trying to re-establish the full amount
of status read requests, which might have failed in earlier runs.
Signed-off-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
drivers/s390/scsi/zfcp_aux.c: In function ‘zfcp_fsf_incoming_els_rscn’:
drivers/s390/scsi/zfcp_aux.c:1379: warning: cast from pointer to integer of
different size
drivers/s390/scsi/zfcp_aux.c: In function ‘zfcp_fsf_incoming_els_plogi’:
drivers/s390/scsi/zfcp_aux.c:1432: warning: cast from pointer to integer of
different size
drivers/s390/scsi/zfcp_aux.c: In function ‘zfcp_fsf_incoming_els_logo’:
drivers/s390/scsi/zfcp_aux.c:1457: warning: cast from pointer to integer of
different size
..
Just passing pointers rids us of these warnings and improves readability.
Signed-off-by: Martin Peschke <mp3@de.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
This patch removes the now obsolete erp_dbf trace.
Signed-off-by: Martin Peschke <mp3@de.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
This patch writes trace records for various phases of a recovery action:
action being created, action being processed, action continueing
asynchronously, action gone, action timed out, action dismissed etc.
Signed-off-by: Martin Peschke <mp3@de.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
This patch allows any recovery event to be traced back to an exact
cause, e.g. a particular request identified by an id (address).
Signed-off-by: Martin Peschke <mp3@de.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
This patch writes a trace record which provides information about state
changes for adapters, ports and units, e.g. target failure, targets becoming
online, targets being temporarily blocked due to pending recovery, targets
which have been recovered successfully etc.
Signed-off-by: Martin Peschke <mp3@de.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
This patch writes trace records which provide information about the
operation of the zfcp error recovery thread and the queues it works
on.
Signed-off-by: Martin Peschke <mp3@de.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
We need to hold the queue-lock when checking whether we still have a valid port
handle for the ELS command, i.e whether we can issue this request for this
port. If the error recovery is about to close this port, then it competes for
the queue-lock. If the close request issued by the error recovery wins, then it
is guaranteed that this port has been blocked for other requests.
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: Martin Peschke <mp3@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
zfcp_erp_strategy_check_fsfreq() checks if it is safe to access the
fsf_req associated with the erp_action that gets passed. To test if
it is safe it accesses the fsf_req in order to get its index into
the hash list. This is broken since the fsf_req might be freed already
and the read index has no meaning. It could lead to memory corruption.
Fix this by introducing a new zfcp_reqlist_find_safe() method which
just checks if addresses are equal. This is slower, but only gets
called in case of error recovery.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: Martin Peschke <mp3@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
When adding an invalid LUN, there is a deadlock between the add
via scsi_scan_target and the slave_destroy handler: The handler
waits for the scan to complete, but for an invalid unit,
scsi_scan_target directly calls the slave_destroy handler.
Fix the deadlock by removing the wait in the slave_destroy
handler, it was not necessary anyway.
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
It is not necessary to use jiffies or milliseconds to specify
waiting times that last a couple of seconds.
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Calling zfcp_erp_strategy_check_action() after zfcp_erp_action_to_running()
in zfcp_erp_strategy() might cause an unbalanced up() for erp_ready_sem,
which makes the zfcp recovery fail somewhere along the way:
erp thread processing erp_action:
|
| someone waking up erp thread for erp_action
| |
| | someone else dismissing erp_action:
| | |
V V V
write_lock_irqsave(&adapter->erp_lock, flags);
...
if (zfcp_erp_action_exists(erp_action) == ZFCP_ERP_ACTION_RUNNING) {
zfcp_erp_action_to_ready(erp_action);
up(&adapter->erp_ready_sem); /* first up() for erp_action */
}
write_unlock_irqrestore(&adapter->erp_lock, flags);
write_lock_irqsave(&adapter->erp_lock, flags);
...
zfcp_erp_action_to_running(erp_action);
write_unlock_restore(&adapter->erp_lock, flags);
/* processing erp_action */
write_lock_irqsave(&adapter->erp_lock, flags);
...
erp_action->status |= ZFCP_STATUS_ERP_DISMISSED;
if (zfcp_erp_action_exists(erp_action) ==
ZFCP_ERP_ACTION_RUNNING) {
zfcp_erp_action_to_ready(erp_action);
up(&adapter->erp_ready_sem);
/* second, unbalanced up() for erp_action */
}
...
write_unlock_restore(&adapter->erp_lock, flags);
write_lock_irqsave(&adapter->erp_lock, flags);
if (erp_action->status & ZFCP_STATUS_ERP_DISMISSED) {
zfcp_erp_action_dequeue(erp_action);
retval = ZFCP_ERP_DISMISSED;
}
...
write_unlock_restore(&adapter->erp_lock, flags);
down(&adapter->erp_ready_sem);
/* this down() is meant to balance the first up() */
The erp thread must not dismiss an erp_action after moving that action to
erp_running_head. Instead it should just go through the down() operation,
which balances the first up(), and run through zfcp_erp_strategy one more
time for the second up(), which eventually cleans up erp_action. Which
is similar to the normal processing of an event for erp_action doing
something asynchronously (e.g. waiting for the completion of an fsf_req).
This only works if we make sure that a dismissed erp_action is passed to
zfcp_erp_strategy() prior to the other action, which caused actions to be
dismissed. Therefore the patch implements this rule: running actions go to
the head of the ready list; new actions go to the tail of the ready list;
the erp thread picks actions to be processed from the ready list's head.
Signed-off-by: Martin Peschke <mp3@de.ibm.com>
Acked-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
zfcp_erp_action_dismiss() used to ignore any actions in the ready list. This
is a bug. Any action superseded by a stronger action needs to be dismissed.
This patch changes zfcp_erp_action_dismiss() so that it dismisses actions
regardless of their list affiliation. The ERP thread is able to handle this.
It is important to kick the erp thread only for actions in the running list,
though, as an imbalance of wakeup signals would confuse the erp thread
otherwise.
Signed-off-by: Martin Peschke <mp3@de.ibm.com>
Acked-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Most drivers need to set length and offset as well, so may as well fold
those three lines into one.
Add sg_assign_page() for those two locations that only needed to set
the page, where the offset/length is set outside of the function context.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>