Adds support for fc_block_scsi_eh to block the EH handlers if
the target device is in the blocked state to ensure we don't
take devices offline.
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
This fixes a softlockup seen on resume. During resume, the CRQ
must be reenabled. However, the H_ENABLE_CRQ hcall used to do
this may return H_BUSY or H_LONG_BUSY. When this happens, the
caller is expected to retry later. Normally the H_ENABLE_CRQ
succeeds relatively soon. However, we have seen cases where
this can take long enough to see softlockup warnings.
This patch changes a simple loop, which was causing the
softlockup, to a loop at task level which sleeps between
retries rather than simply spinning.
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Current driver is not clearing the per device tm_busy flag
following the Task Mangement request completion from the IOCTL path.
When this flag is set, the IO queues are frozen. The reason the flag
didn't get cleared is becuase the driver is referencing
memory associated to the mpi request following the completion, when
the memory had been reallocated for a new request. When the memory
was reallocated, the driver didn't clear the flag becuase it was
expecting a task managment reqeust, and the reallocated request was
for SCSI_IO. To fix the problem the driver needs to have a cached
backup copy of the original reqeust.
Signed-off-by: Kashyap Desai <kashyap.desai@lsi.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
(1) driver was not setting the sense data size prior to sending SCSI_IO,
resulting in the 0x31190000 loginfo
(2) The driver needs to copy the sense data to local buffer prior
to releasing the request message frame. If not, the sense buffer gets
overwritten by the next SCSI_IO request.
Signed-off-by: Kashyap Desai <kashyap.desai@lsi.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Adding additional messages to the error escallation callbacks which
displays the wwid, sas address, handle, phy number, enclosure logical id,
and slot. In the same eh callbacks, routines, the printks were converted
to sdev_printks, which displays the bus target mapping. These additional
modifications help better identify the device which is in recovery.
Signed-off-by: Kashyap Desai <kashyap.desai@lsi.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
ISSUE DESCRIPTION:
This test case involves creating two RAID1 volumes, then
simultaneiously issue host reset and pull all the drives associated to
the 1st raid volume. The observed behavour is the physical drives are
removed, however the volume remains. The expected behavour is the
volume as well as physical drives should be removed from OS.
FIX:
Add support in the post host reset device scan logic for raid volumes
where the driver will have an additional check for responding raid
volume where the status should be either online, optimal, or degraded.
So for voluemes that have a status of missing or failed, the driver
will mark them for deletion.
Signed-off-by: Kashyap Desai <kashyap.desai@lsi.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
In the driver mpt2sas_base_attach subroutine, we need to add
support to return the proper error code when there are memory allocation
failures, e.g. returning -ENOMEM.
Signed-off-by: Kashyap Desai <kashyap.desai@lsi.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Actual problem :
Driver may receiving the top level expander
removal event prior to all the individual PD removal events, hence the
driver is breaking down all the PDs in advanced to the actaul PD UNHIDE
event. Driver sends multiple
Target Resets to the same volume handle for each individual PD removal.
FIX DESCRIPTION:
To fix this issue, the entire PD device handshake protocal has to be
moved to interrupt context so the breakdown occurs immediately after the
actual UNHIDE event arrives. The driver will only issue one Target Reset to
the volume handle, occurring after the FAILED or MISSING volume status
event arrives from interrupt context. For the PD UNHIDE event, the driver
will issue target resets to the PD handles, followed by OP_REMOVE. The
driver will set the "deteleted" flag during interrupt context. A "pd_handle"
bitmask was introduced so the driver has a list of known pds during entire
life of the PD; this replaces the "hidden_raid_component" flag handle in
the sas_device object. Each bit in the bitmask represents a device handle.
The bit in the bitmask would be toggled ON/OFF when the HIDE/UNHIDE
events arrive; also this pd_handle bitmask would bould be refreshed
across host resets.
Here we kept older behavior of sending target reset to volume when there is
a single drive pull, wait for the reply, then send target resets
to the PDs. We kept this behavior so the driver will
behave the same for older versions of firmware.
Signed-off-by: Kashyap Desai <kashyap.desai@lsi.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Add support to display additional debug info for SCSI_IO and
RAID_SCSI_IO_PASSTHROUGH sent from the normal entry queued entry
point, as well as internal generated commands, and IOCTLS. The
additional debug info included the phy number, as well as the
sas address, enclosure logical id, and slot number. This debug info
has to be enabled thru the logging_level command line option, by
default this will not be displayed.
Signed-off-by: Kashyap Desai <kashyap.desai@lsi.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Converting print level from MPT2SAS_DEBUG_FMT to MPT2SAS_INFO_FMT.
Signed-off-by: Kashyap Desai <kashyap.desai@lsi.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Added support so the diag ring buffer can be pulled via sysfs
Added three new shost attributes: host_trace_buffer,
host_trace_buffer_enable, and host_trace_buffer_size. The
host_trace_buffer_enable attribute is used to either post or release
the trace buffers. The host_trace_buffer_size attribute contains
the size of the trace buffer. The host_trace_buffer atttribute contains
a maximum 4KB window of the buffer. In order to read the entire host buffer,
you will need to write the offset to host_trace_buffer prior to reading
it. release the host buffer, then write the entire host buffer contents to
a file.
In addition to this enhancement, we moved the automatic posting of host buffers
at driver load time to be called prior to port_enable, instead of after.
That way discovery is available in the host buffer.
Signed-off-by: Kashyap Desai <kashyap.desai@lsi.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Updating MPI header version N.
Removed mpi_history.txt.
Signed-off-by: Kashyap Desai <kashyap.desai@lsi.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Added a new sysfs shost attribute called ioc_reset_count. This will
keep count of host resets (both diagnostic and message unit).
Signed-off-by: Kashyap Desai <kashyap.desai@lsi.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Added support to send link resets, hard resets, enable/disable phys, and
changing link rates for for expanders. This will be exported to
attributes within the sas transport layer. A new wrapper function was
added for sending SMP passthru to expanders for phy control.
Signed-off-by: Kashyap Desai <kashyap.desai@lsi.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Added support to retrieve the invalid_dword_count,
running_disparity_error_count, loss_of_dword_sync_count, and
phy_reset_problem_count for expanders. This will be exported to
attributes within the sas transport layer. A new wrapper function was
added for sending SMP passthru to retrieve the expander phy error log.
Signed-off-by: Kashyap Desai <kashyap.desai@lsi.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Added command line option called disable_discovery. When enabled
on the command line, the driver will not send a port_enable when loaded
for the first time. If port_enable is not called, then there is
no discovery of devices, as well as the sas topology. Then later if one
desires to invoke discovery, then they will need to issue a diagnostic reset.
A diagnostic reset can be issued various ways. One of the way is throught
sysfs.
Signed-off-by: Kashyap Desai <kashyap.desai@lsi.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Driver should not allow multiple host reset when already host reset is in
progress. It is possible that host reset was sent by scsi mid layer while there was already an host reset active,
either issued via IOCTL interface or internaly, like a config page timeout.
Since there was a host reset active, the driver would return a FAILED response
to the scsi mid layer. The solution is make sure pending host resets will
wait for the active host reset to complete before returning control
back up the call stack.
Signed-off-by: Kashyap Desai <kashyap.desai@lsi.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Enclosure_identifier not being returned by mpt2sas
The driver exports callback function to the sas transport layer
for obtaining the enclosure logical id. This function is called
_transport_get_enclosure_identifier. The driver was searching
the wrong list for the enclosure_identifier. The driver should be
searching the sas device list instead of enclosure list. The
sas address that is passed to the driver is for the end device, not
enclosure.
Signed-off-by: Kashyap Desai <kashyap.desai@lsi.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Zero out the reserved or un-used CPL message fields to prevent any garbage
value.
Signed-off-by: Karen Xie <kxie@chelsio.com>
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Some controllers might try to tell us they support 0 commands
in performant mode. This is a lie told by buggy firmware.
We have to be wary of this lest we try to allocate a negative
number of command blocks, which will be treated as unsigned,
and get an out of memory condition.
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
There are things which need to be done in the intx
interrupt handler which do not need to be done in
the msi/msix interrupt handler, like checking that
the interrupt is actually for us, and checking that the
interrupt pending bit on the hardware is set (which we
weren't previously doing at all, which means old controllers
wouldn't work), so it makes sense to separate these into
two functions.
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
The 6402/6404 are two PCI devices -- two Smart Array controllers
-- that fit into one slot. It is possible to reset them independently,
however, they share a battery backed cache module. One of the pair
controls the cache and the 2nd one access the cache through the first
one. If you reset the one controlling the cache, the other one will
not be a happy camper. So we just forbid resetting this conjoined
mess.
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Smart Array controllers newer than the P600 do not honor the
PCI power state method of resetting the controllers. Instead,
in these cases we can get them to reset via the "doorbell" register.
This escaped notice until we began using "performant" mode because
the fact that the controllers did not reset did not normally
impede subsequent operation, and so things generally appeared to
"work". Once the performant mode code was added, if the controller
does not reset, it remains in performant mode. The code immediately
after the reset presumes the controller is in "simple" mode
(which previously, it had remained in simple mode the whole time).
If the controller remains in performant mode any code which presumes
it is in simple mode will not work. So the reset needs to be fixed.
Unfortunately there are some controllers which cannot be reset by
either method. (eg. p800). We detect these cases by noticing that
the controller seems to remain in performant mode even after a
reset has been attempted. In those case, we proceed anyway,
as if the reset has happened (and skip the step of waiting for
the controller to become ready -- which is expecting it to be in
"simple" mode.) To sum up, we try to do a better job of resetting
the controller if "reset_devices" is set, and if it doesn't work,
we print a message and try to continue anyway.
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Rationale for this is that I will also need to use this code
in fixing kdump host reset code prior to having the hba structure.
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Rationale for this is that in order to fix the hard reset code used
by kdump, we need to use this function before we even have the per
HBA structure.
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
We were previously only accepting HP boards.
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Add 5 CCISSE smart array controllers
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
1. To support 4M/1024 scatter-gather list entry, reorganize struct
ARCMSR_CDB and struct CommandControlBlock
2. To modify arcmsr_probe
3. In order to help fix F/W issue, add the driver mode for type B card
4. To improve AP's behavior while F/W resets
5. To unify struct MessageUnit_B's members' naming in all OS drivers'
6. To improve error handlers, arcmsr_bus_reset(), arcmsr_abort()
7. To fix the arcmsr_queue_command() in bus reset stage, just let the
commands pass down to FW, don't block
Signed-off-by: Nick Cheng <nick.cheng@areca.com.tw>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Remote ports were restarting indefinitely after getting
rejects in PRLI.
Fix by adding a counter of restarts and limiting that with
the port login retry limit as well.
Signed-off-by: Joe Eykholt <jeykholt@cisco.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
This patch somewhat combines two fixes to remote port handing in libfc.
The first problem was that rport work could be queued on a deleted
and freed rport. This is handled by not resetting rdata->event
ton NONE if the rdata is about to be deleted.
However, that fix led to the second problem, described by
Bhanu Gollapudi, as follows:
> Here is the sequence of events. T1 is first LOGO receive thread, T2 is
> fc_rport_work() scheduled by T1 and T3 is second LOGO receive thread and
> T4 is fc_rport_work scheduled by T3.
>
> 1. (T1)Received 1st LOGO in state Ready
> 2. (T1)Delete port & enter to RESTART state.
> 3. (T1)schdule event_work, since event is RPORT_EV_NONE.
> 4. (T1)set event = RPORT_EV_LOGO
> 5. (T1)Enter RESTART state as disc_id is set.
> 6. (T2)remember to PLOGI, and set event = RPORT_EV_NONE
> 6. (T3)Received 2nd LOGO
> 7. (T3)Delete Port & enter to RESTART state.
> 8. (T3)schedule event_work, since event is RPORT_EV_NONE.
> 9. (T3)Enter RESTART state as disc_id is set.
> 9. (T3)set event = RPORT_EV_LOGO
> 10.(T2)work restart, enter PLOGI state and issues PLOGI
> 11.(T4)Since state is not RESTART anymore, restart is not set, and the
> event is not reset to RPORT_EV_NONE. (current event is RPORT_EV_LOGO).
> 12. Now, PLOGI succeeds and fc_rport_enter_ready() will not schedule
> event_work, and hence the rport will never be created, eventually losing
> the target after dev_loss_tmo.
So, the problem here is that we were tracking the desire for
the rport be restarted by state RESTART, which was otherwise
equivalent to DELETE. A contributing factor is that we dropped
the lock between steps 6 and 10 in thread T2, which allows the
state to change, and we didn't completely re-evaluate then.
This is hopefully corrected by the following minor redesign:
Simplify the rport restart logic by making the decision to
restart after deleting the transport rport. That decision
is based on a new STARTED flag that indicates fc_rport_login()
has been called and fc_rport_logoff() has not been called
since then. This replaces the need for the RESTART state.
Only restart if the rdata is still in DELETED state
and only if it still has the STARTED flag set.
Also now, since we clear the event code much later in the
work thread, allow for the possibility that the rport may
have become READY again via incoming PLOGI, and if so,
queue another event to handle that.
In the problem scenario, the second LOGO received will
cause the LOGO event to occur again.
Reported-by: Bhanu Gollapudi <bprakash@broadcom.com>
Signed-off-by: Joe Eykholt <jeykholt@cisco.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
In fnic_abort_cmd() and fnic_device_reset() assign `rport' earlier to make
FNIC_SCSI_DBG() calls cleaner.
In fnic_clean_pending_aborts() `rport' is not used.
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Acked-by: Abhijeet Joglekar <abjoglek@cisco.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
lport state is enum not bit mask.
Signed-off-by: Yi Zou <yi.zou@intel.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
As per FC-BB-5 rev.2, section 7.8.7.1, strict ordering of FIP descriptors
is required for ELS requests. Also, look for missing and duplicate critical
descriptors.
Signed-off-by: Bhanu Prakash Gollapudi <bprakash@broadcom.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Clear virtual link for NPIV ports is now handled by resetting
the matching vnport.
Signed-off-by: Bhanu Prakash Gollapudi <bprakash@broadcom.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
As per FC-BB-5 rev 2, section 7.8.6.2, malformed FIP frame shall be
discarded. Drop discovery adv, ELS and CLV's with duplicate critical
descriptors.
[Resending after incorporating Joe's review comments]
Signed-off-by: Bhanu Prakash Gollapudi <bprakash@broadcom.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Allow the D flag (indicating that keep-alives are not needed) to
be updated dynamically from received FIP advertisements.
Signed-off-by: Joe Eykholt <jeykholt@cisco.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
keep alives are disabled due to fd_flags set and also
stop updating keep alive values in that case.
Update select fcf time only if fcf is not already selected or
select time is not already determined from parse adv, and then
have select time cleared only once after fcf is selected.
Changed deadline check to time_after_eq() from time_after()
since now next timeout will be on exact 2.5 times FKA followed
by first advertisement.
Signed-off-by: Vasu Dev <vasu.dev@intel.com>
Signed-off-by: Joe Eykholt <jeykholt@cisco.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
[This patch has several improvements to the code in
the fip timers. It hasn't been tested yet.
I'm sending it out for review. Vasu, perhaps you can
merge this with your patch and test it together.]
The current code allows an advertisement to be used
even if it has been 3 times the FCF keep-alive
advertisement period (FKA) since one was received from
that FCF. The spec. calls for 2.5 times FKA.
Fix this and make sure we detect missed keep-alives promptly.
Signed-off-by: Joe Eykholt <jeykholt@cisco.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Resubmitting after incorporating Joe's review comment.
Unsolicited PRLO request is now handled by sending LS_ACC,
and then relogin to the remote port if an N-port login
session exists for that remote port.
Note that this patch should be applied on top of Joe Eykholt's
"Fix remote port restart problem" patch.
Signed-off-by: Bhanu Prakash Gollapudi <bprakash@broadcom.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
As per FC-LS Rev 1.62 table 46, response codes are handled as follows:
1. If the Req executed is true, PRLI is accepted.
2. If Req executed is not set, if resp code is 5,
PRLI is not retried and port is logged out.
3. If resp code is anything apart from 1 or 5, PRLI is retired
upto max retry count.
Signed-off-by: Bhanu Prakash Gollapudi <bprakash@broadcom.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Retry upto max_rport_retry_count when a target responds with
LS_RJT for a PRLI request.
Signed-off-by: Bhanu Prakash Gollapudi <bprakash@broadcom.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Host does not send discovery solicitation messages if Disc. Adv
from FCF are dropped. It restarts sending solicitation only
after receiving a Discovery Adv. from FCF. Fix is to restart
solicitation immediately after CVL processing.
Signed-off-by: Bhanu Prakash Gollapudi <bprakash@broadcom.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Avoid infinite loop while processing FIP ELS or discovery
advertisement with non-critical descriptors.
Signed-off-by: Bhanu Prakash Gollapudi <bprakash@broadcom.com>
Acked-by: Joe Eykholt <jeykholt@cisco.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
A check in fcoe_ctlr_send_keep_alive() returns if there's no
port_id for the local port. This could miss a keep alive if
we just did a host reset and have logged off and will log back in.
Return only if we are doing the port keep alive, in which case
we need to be logged in.
Signed-off-by: Joe Eykholt <jeykholt@cisco.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
A problem was found where the call to scsi_add_device() fails intermittently
for an adapter. This is caused when __scsi_add_device() returns -ENODEV as
a result of not calling scsi_probe_and_add_lun() since the call to
scsi_host_scan_allowed() fails. scsi_host_scan_allowed() fails because the
adapter state is set to SHOST_RECOVERY instead of SHOST_RUNNING. The state of
the adapter is being set to SHOST_RECOVERY by scsi_eh_scmd_add() during
error handling.
This problem is avoided by moving the setting of the allow_restart flag to
later in the device initialization sequence. This prevents further error
handling if we get a NOT_READY response from a TUR command by causing
scsi_check_sense() to return SUCCESS. Therefore, scsi_eh_scmd_add() will
not run and the adapter state will remain as SHOST_RUNNING.
Signed-off-by: Wayne Boyer <wayneb@linux.vnet.ibm.com>
Acked-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
"phba" is always null here so we can't dereference it.
Signed-off-by: Dan Carpenter <error27@gmail.com>
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
I added a kfree(pwrb_arr) in front of the return.
Signed-off-by: Dan Carpenter <error27@gmail.com>
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Compiling the driver will fail on 32 bit powerpc and other
architectures where writeq is not defined. This patch adds a
definition for writeq.
Signed-off-by: Wayne Boyer <wayneb@linux.vnet.ibm.com>
Acked-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>