Commit graph

29 commits

Author SHA1 Message Date
Dan Williams
303694eeee [SCSI] libsas: suspend / resume support
libsas power management routines to suspend and recover the sas domain
based on a model where the lldd is allowed and expected to be
"forgetful".

sas_suspend_ha - disable event processing allowing the lldd to take down
                 links without concern for causing hotplug events.
                 Regardless of whether the lldd actually posts link down
                 messages libsas notifies the lldd that all
                 domain_devices are gone.

sas_prep_resume_ha - on the way back up before the lldd starts link
                     training clean out any spurious events that were
                     generated on the way down, and re-enable event
                     processing

sas_resume_ha - after the lldd has started and decided that all phys
		have posted link-up events this routine is called to let
		libsas start it's own timeout of any phys that did not
		resume.  After the timeout an lldd can cancel the
                phy teardown by posting a link-up event.

Storage for ex_change_count (u16) and phy_change_count (u8) are changed
to int so they can be set to -1 to indicate 'invalidated'.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Jacek Danecki <jacek.danecki@intel.com>
Tested-by: Maciej Patelczyk <maciej.patelczyk@intel.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-08-24 13:10:23 +04:00
Dan Williams
22b9153faa [SCSI] libsas: introduce sas_work to fix sas_drain_work vs sas_queue_work
When requeuing work to a draining workqueue the last work instance may
not be idle, so sas_queue_work() must not touch work->entry.  Introduce
sas_work with a drain_node list_head to have a private list for
collecting work deferred due to drain collision.

Fixes reports like:
  BUG: unable to handle kernel NULL pointer dereference at           (null)
  IP: [<ffffffff810410d4>] process_one_work+0x2e/0x338

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-04-23 12:03:39 +01:00
Dan Williams
899fcf40f3 [SCSI] libsas: set attached device type and target protocols for local phys
Before:
$ cat /sys/class/sas_phy/phy-6\:3/device_type
none
$ cat /sys/class/sas_phy/phy-6\:3/target_port_protocols
none

After:
$ cat /sys/class/sas_phy/phy-6\:3/device_type
end device
$ cat /sys/class/sas_phy/phy-6\:3/target_port_protocols
sata

Also downgrade the phy_list_lock to _irq instead of _irqsave since
libsas will never call sas_get_port_device with interrupts disbled.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-02-29 15:40:33 -06:00
Dan Williams
9508a66f89 [SCSI] libsas: async ata scanning
libsas ata error handling is already async but this does not help the
scan case.  Move initial link recovery out from under host->scan_mutex,
and delay synchronization with eh until after all port probe/recovery
work has been queued.

Device ordering is maintained with scan order by still calling
sas_rphy_add() in order of domain discovery.

Since we now scan the domain list when invoking libata-eh we need to be
careful to check for fully initialized ata ports.

Acked-by: Jack Wang <jack_wang@usish.com>
Acked-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-02-29 15:35:41 -06:00
Dan Williams
354cf82980 [SCSI] libsas: let libata recover links that fail to transmit initial sig-fis
libsas fails to discover all sata devices in the domain.  If a device fails
negotiation and does not transmit a signature fis the link needs recovery.
libata already understands how to manage slow to come up links, so treat these
conditions as ata device attach events for the purposes of creating an
ata_port.  This allows libata to manage retrying link bring up.

Rediscovery is modified to be careful about checking changes in dev_type.  It
looks like libsas leaks old devices if the sas address changes, but that's a
fix for another patch.

Acked-by: Jack Wang <jack_wang@usish.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-02-29 15:33:02 -06:00
Dan Williams
5d7f6d1071 [SCSI] libsas: fix sas_unregister_ports vs sas_drain_work
We need to hold drain_mutex across the unregistration as port down events
queue device removal as chained events, so we need to make sure no other
drainers are active.

[ 1118.673968] WARNING: at kernel/workqueue.c:996 __queue_work+0x11a/0x326()
[ 1118.681982] Hardware name: S2600CP
[ 1118.686193] Modules linked in: isci(-) libsas scsi_transport_sas nls_utf8
ipv6 uinput sg iTCO_wdt iTCO_vendor_support i2c_i801 i2c_core ioatdma dca
sd_mod sr_mod cdrom ahci libahci libata [last unloaded: scsi_transport_sas]
[ 1118.709893] Pid: 6831, comm: rmmod Not tainted 3.2.0-isci+ #1
[ 1118.716727] Call Trace:
[ 1118.719867]  [<ffffffff8103e9f5>] warn_slowpath_common+0x85/0x9d
[ 1118.727000]  [<ffffffff8103ea27>] warn_slowpath_null+0x1a/0x1c
[ 1118.733942]  [<ffffffff81056d44>] __queue_work+0x11a/0x326
[ 1118.740481]  [<ffffffff81056f99>] queue_work_on+0x1b/0x22
[ 1118.746925]  [<ffffffff81057106>] queue_work+0x37/0x3e
[ 1118.753105]  [<ffffffffa0120e05>] ? sas_discover_event+0x55/0x82 [libsas]
[ 1118.761094]  [<ffffffff813217c3>] scsi_queue_work+0x42/0x44
[ 1118.767717]  [<ffffffffa0120e19>] sas_discover_event+0x69/0x82 [libsas]
[ 1118.775509]  [<ffffffffa0120f5b>] sas_unregister_dev+0xc3/0xcc [libsas]
[ 1118.783319]  [<ffffffffa0120fae>] sas_unregister_domain_devices+0x4a/0xc8 [libsas]
[ 1118.792731]  [<ffffffffa0120071>] sas_deform_port+0x60/0x1a6 [libsas]
[ 1118.800339]  [<ffffffffa01201ea>] sas_unregister_ports+0x33/0x44 [libsas]
[ 1118.808342]  [<ffffffffa011f7e5>] sas_unregister_ha+0x41/0x6b [libsas]
[ 1118.816055]  [<ffffffffa0134055>] isci_unregister+0x22/0x4d [isci]
[ 1118.823384]  [<ffffffffa0143040>] isci_pci_remove+0x2e/0x60 [isci]

Reported-by: Jacek Danecki <jacek.danecki@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-02-29 15:27:09 -06:00
Dan Williams
ab5266335b [SCSI] libsas: route local link resets through ata-eh
Similar to the conversion of the transport-class reset we want bsg
initiated resets to be managed by libata.

Reported-by: Jacek Danecki <jacek.danecki@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-02-29 15:25:32 -06:00
Dan Williams
f41a0c441c [SCSI] libsas: fix sas_find_local_phy(), take phy references
In the direct-attached case this routine returns the phy on which this
device was first discovered.  Which is broken if we want to support
wide-targets, as this phy reference can become stale even though the
port is still active.

In the expander-attached case this routine tries to lookup the phy by
scanning the attached sas addresses of the parent expander, and BUG_ONs
if it can't find it.  However since eh and the libsas workqueue run
independently we can still be attempting device recovery via eh after
libsas has recorded the device as detached.  This is even easier to hit
now that eh is blocked while device domain rediscovery takes place, and
that libata is fed more timed out commands increasing the chances that
it will try to recover the ata device.

Arrange for dev->phy to always point to a last known good phy, it may be
stale after the port is torn down, but it will catch up for wide port
reconfigurations, and never be NULL.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-02-29 13:01:06 -06:00
Dan Williams
36a3994739 [SCSI] libsas: poll for ata device readiness after reset
Use ata_wait_after_reset() to poll for link recovery after a reset.
This combined with sas_ha->eh_mutex prevents expander rediscovery from
probing phys in an intermediate state.  Local discovery does not have a
mechanism to filter link status changes during this timeout, so it
remains the responsibility of lldds to prevent premature port teardown.
Although once all lldd's support ->lldd_ata_check_ready() that could be
used as a gate to local port teardown.

The signature fis is re-transmitted when the link comes back so we
should be revalidating the ata device class, but that is left to a future
patch.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-02-29 12:49:36 -06:00
Dan Williams
2a559f4ba4 [SCSI] libsas: sas_phy_enable via transport_sas_phy_reset
Execute the link-reset triggered by sas_phy_enable via
transport_sas_phy_reset so that it can be managed by libata.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-02-19 14:18:01 -06:00
Dan Williams
81c757bc69 [SCSI] libsas: execute transport link resets with libata-eh via host workqueue
Link resets leave ata affiliations intact, so arrange for libsas to make
an effort to avoid dropping the device due to a slow-to-recover link.
Towards this end carry out reset in the host workqueue so that it can
check for ata devices and kick the reset request to libata.  Hard
resets, in contrast, bypass libata since they are meant for associating
an ata device with another initiator in the domain (tears down
affiliations).

Need to add a new transport_sas_phy_reset() since the current
sas_phy_reset() is a utility function to libsas lldds.  They are not
prepared for it to loop back into eh.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-02-19 14:13:51 -06:00
Dan Williams
0b3e09da13 [SCSI] libsas: perform sas-transport resets in shost->workq context
Extend the sas transport class to allow transport users to attach extra
data to a sas_phy (->hostdata).  Use this area in libsas to move resets
to workq context in preparation for scheduling ata device resets through
libata-eh.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-02-19 14:11:33 -06:00
Dan Williams
9095a64a9a [SCSI] libsas: fix timeout vs completion race
Until we have told the lldd to forget a task a timed out operation can
return from the hardware at any time.  Since completion frees the task
we need to make sure that no tasks run their normal completion handler
once eh has decided to manage the task.  Similar to
ata_scsi_cmd_error_handler() freeze completions to let eh judge the
outcome of the race.

Task collector mode is problematic because it presents a situation where
a task can be timed out and aborted before the lldd has even seen it.
For this case we need to guarantee that a task that an lldd has been
told to forget does not get queued after the lldd says "never seen it".
With sas_scsi_timed_out we achieve this with the ->task_queue_flush
mutex, rather than adding more time.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-02-19 14:06:08 -06:00
Dan Williams
87c8331fcf [SCSI] libsas: prevent domain rediscovery competing with ata error handling
libata error handling provides for a timeout for link recovery.  libsas
must not rescan for previously known devices in this interval otherwise
it may remove a device that is simply waiting for its link to recover.
Let libata-eh make the determination of when the link is stable and
prevent libsas (host workqueue) from taking action while this
determination is pending.

Using a mutex (ha->disco_mutex) to flush and disable revalidation while
eh is running requires any discovery action that may block on eh be
moved to its own context outside the lock.  Probing ATA devices
explicitly waits on ata-eh and the cache-flush-io issued during device
removal may also pend awaiting eh completion.  Essentially any rphy
add/remove activity needs to run outside the lock.

This adds two new cleanup states for sas_unregister_domain_devices()
'allocated-but-not-probed', and 'flagged-for-destruction'.  In the
'allocated-but-not-probed' state  dev->rphy points to a rphy that is
known to have not been through a sas_rphy_add() event.  At domain
teardown check if this device is still pending probe and cleanup
accordingly.  Similarly if a device has already been queued for removal
then sas_unregister_domain_devices has nothing to do.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-02-19 13:52:34 -06:00
Dan Williams
b1124cd3ec [SCSI] libsas: introduce sas_drain_work()
When an lldd invokes ->notify_port_event() it can trigger a chain of libsas
events to:

  1/ form the port and find the direct attached device

  2/ if the attached device is an expander perform domain discovery

A call to flush_workqueue() will only flush the initial port formation work.
Currently libsas users need to call scsi_flush_work() up to the max depth of
chain (which will grow from 2 to 3 when ata discovery is moved to its own
discovery event).  Instead of open coding multiple calls switch to use
drain_workqueue() to flush sas work.

drain_workqueue() does not handle new work submitted during the drain so
libsas needs a bit of infrastructure to hold off unchained work submissions
while a drain is in flight.  A lldd ->notify() event is considered 'unchained'
while a sas_discover_event() is 'chained'.  As Tejun notes:

  "For now, I think it would be best to add private wrapper in libsas to
   support deferring unchained work items while draining."

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-02-19 13:48:51 -06:00
Dan Williams
f8daa6e6d8 [SCSI] libsas: convert ha->state to flags
In preparation for adding new states (SAS_HA_DRAINING, SAS_HA_FROZEN),
convert ha->state into a set of flags.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-02-19 13:47:29 -06:00
Dan Williams
b15ebe0b5d [SCSI] libsas: replace event locks with atomic bitops
The locks only served to make sure the pending event bitmask was updated
consistently.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-02-19 13:41:04 -06:00
Dan Williams
735f7d2fed [SCSI] libsas: fix domain_device leak
Arrange for the deallocation of a struct domain_device object when it no
longer has:
1/ any children
2/ references by any scsi_targets
3/ references by a lldd

The comment about domain_device lifetime in
Documentation/scsi/libsas.txt is stale as it appears mainline never had
a version of a struct domain_device that was registered as a kobject.
We now manage domain_device reference counts on behalf of external
agents.

Reviewed-by: Jack Wang <jack_wang@usish.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-02-19 13:37:47 -06:00
Dan Williams
90f1e10d08 [SCSI] libsas: fix/amend device gone notification in sas_deform_port()
Commit 56dd2c06 "libsas: Don't issue commands to devices that have been
hot-removed" edited Darrick's original patch to remove setting 'gone' in
the sas_deform_port() path because that prevented scsi sync cache
commands from being issued when the driver was unloaded.  However, this
allows true device gone notifications (as signaled port phy events) to
trigger sync cache commands to devices that are known to be unreachable.

Teach libsas which sas_deform_port() invocations are likely device gone
events.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <jbottomley@parallels.com>
2011-05-26 22:49:32 -05:00
James Bottomley
96db6fa992 [SCSI] libsas: convert to standard kernel debugging
Instead of using a config option for debugging, just dump the
messages with KERN_DEBUG.

Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2011-01-24 12:05:38 -06:00
Jens Axboe
242f9dcb8b block: unify request timeout handling
Right now SCSI and others do their own command timeout handling.
Move those bits to the block layer.

Instead of having a timer per command, we try to be a bit more clever
and simply have one per-queue. This avoids the overhead of having to
tear down and setup a timer for each command, so it will result in a lot
less timer fiddling.

Signed-off-by: Mike Anderson <andmike@linux.vnet.ibm.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-10-09 08:56:13 +02:00
James Bottomley
b98e66fa0b [SCSI] libsas: add host SMP processing
This adds support for host side SMP processing, via a separate
SMP interpreter file.

Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2008-01-11 18:29:11 -06:00
Darrick J. Wong
5929faf333 [SCSI] libsas: Convert sas_proto users to sas_protocol
sparse complains about the mixing of enums in libsas.  Since the
underlying numeric values of both enums are the same, combine them
to get rid of the warning.

Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2008-01-11 18:22:41 -06:00
Darrick J. Wong
3a2755af37 [SCSI] sas_ata: Implement sas_task_abort for ATA devices
ATA devices need special handling for sas_task_abort.  If the ATA command
came from SCSI, then we merely need to tell SCSI to abort the scsi_cmnd.
However, internal commands require a bit more work--we need to fill the qc
with the appropriate error status and complete the command, and eventually
post_internal will issue the actual ABORT TASK.

Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
2007-07-18 11:16:03 -05:00
Darrick J. Wong
6b0efb8516 [SCSI] libsas: Add SAS_HA state flags to avoid queueing events while unloading
Track sas_ha_struct state so that we ignore events that come in while
we're shutting things down.

Signed-off-by: Malahal Naineni <malahal@us.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
2007-01-13 16:21:53 -06:00
David Howells
c4028958b6 WorkStruct: make allyesconfig
Fix up for make allyesconfig.

Signed-Off-By: David Howells <dhowells@redhat.com>
2006-11-22 14:57:56 +00:00
James Bottomley
a01e70e570 [SCSI] aci94xx: implement link rate setting
This patch implements the ability to set the minimum and maximum
linkrates for both libsas (for expanders) and aic94xx (for the host
phys).  It also tidies up the setting of the hardware min and max to
make sure they're updated when the expander emits a change broadcast.

Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
2006-09-07 15:20:23 -05:00
James Bottomley
88edf74610 [SCSI] SAS: consolidate linkspeed definitions
At the moment we have two separate linkspeed enumerations covering
roughly the same values.  This patch consolidates on a single one enum
sas_linkspeed in scsi_transport_sas.h and uses it everywhere in the
aic94xx driver.  Eventually I'll get around to removing the duplicated
fields in asd_sas_phy and sas_phy ...

Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
2006-09-07 12:41:16 -05:00
James Bottomley
2908d778ab [SCSI] aic94xx: new driver
This is the end point of the separate aic94xx driver based on the
original driver and transport class from Luben Tuikov
<ltuikov@yahoo.com>

The log of the separate development is:

Alexis Bruemmer:
  o aic94xx: fix hotplug/unplug for expanderless systems
  o aic94xx: disable split completion timer/setting by default
  o aic94xx: wide port off expander support
  o aic94xx: remove various inline functions
  o aic94xx: use bitops
  o aic94xx: remove queue comment
  o aic94xx: remove sas_common.c
  o aic94xx: sas remove depot's
  o aic94xx: use available list_for_each_entry_safe_reverse()
  o aic94xx: sas header file merge

James Bottomley:
  o aic94xx: fix TF_TMF_NO_CTX processing
  o aic94xx: convert to request_firmware interface
  o aic94xx: fix hotplug/unplug
  o aic94xx: add link error counts to the expander phys
  o aic94xx: add transport class phy reset capability
  o aic94xx: remove local_attached flag
  o Remove README
  o Fixup Makefile variable for libsas rename
  o Rename sas->libsas
  o aic94xx: correct return code for sas_discover_event
  o aic94xx: use parent backlink port
  o aic94xx: remove channel abstraction
  o aic94xx: fix routing algorithms
  o aic94xx: add backlink port
  o aic94xx: fix cascaded expander properties
  o aic94xx: fix sleep under lock
  o aic94xx: fix panic on module removal in complex topology
  o aic94xx: make use of the new sas_port
  o rename sas_port to asd_sas_port
  o Fix for eh_strategy_handler move
  o aic94xx: move entirely over to correct transport class formulation
  o remove last vestages of sas_rphy_alloc()
  o update for eh_timed_out move
  o Preliminary expander support for aic94xx
  o sas: remove event thread
  o minor warning cleanups
  o remove last vestiges of id mapping arrays
  o Further updates
  o Convert aic94xx over entirely to the transport class end device and
  o update aic94xx/sas to use the new sas transport class end device
  o [PATCH] aic94xx: attaching to the sas transport class
  o Add missing completion removal from prior patch
  o [PATCH] aic94xx: attaching to the sas transport class
  o Build fixes from akpm

Jeff Garzik:
  o [scsi aic94xx] Remove ->owner from PCI info table

Luben Tuikov:
  o initial aic94xx driver

Mike Anderson:
  o aic94xx: fix panic on module insertion
  o aic94xx: stub out SATA_DEV case
  o aic94xx: compile warning cleanups
  o aic94xx: sas_alloc_task
  o aic94xx: ref count update
  o aic94xx nexus loss time value
  o [PATCH] aic94xx: driver assertion in non-x86 BIOS env

Randy Dunlap:
  o libsas: externs not needed

Robert Tarte:
  o aic94xx: sequence patch - fixes SATA support

Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
2006-08-29 09:52:29 -05:00