History log of /linux-master/drivers/ata/libata-eh.c
Revision Date Author Comments
# 0c76106c 19-Mar-2024 Damien Le Moal <dlemoal@kernel.org>

scsi: sd: Fix TCG OPAL unlock on system resume

Commit 3cc2ffe5c16d ("scsi: sd: Differentiate system and runtime start/stop
management") introduced the manage_system_start_stop scsi_device flag to
allow libata to indicate to the SCSI disk driver that nothing should be
done when resuming a disk on system resume. This change turned the
execution of sd_resume() into a no-op for ATA devices on system
resume. While this solved deadlock issues during device resume, this change
also wrongly removed the execution of opal_unlock_from_suspend(). As a
result, devices with TCG OPAL locking enabled remain locked and
inaccessible after a system resume from sleep.

To fix this issue, introduce the SCSI driver resume method and implement it
with the sd_resume() function calling opal_unlock_from_suspend(). The
former sd_resume() function is renamed to sd_resume_common() and modified
to call the new sd_resume() function. For non-ATA devices, this result in
no functional changes.

In order for libata to explicitly execute sd_resume() when a device is
resumed during system restart, the function scsi_resume_device() is
introduced. libata calls this function from the revalidation work executed
on devie resume, a state that is indicated with the new device flag
ATA_DFLAG_RESUMING. Doing so, locked TCG OPAL enabled devices are unlocked
on resume, allowing normal operation.

Fixes: 3cc2ffe5c16d ("scsi: sd: Differentiate system and runtime start/stop management")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=218538
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Link: https://lore.kernel.org/r/20240319071209.1179257-1-dlemoal@kernel.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>


# 54d7211d 12-Oct-2023 Damien Le Moal <dlemoal@kernel.org>

ata: libata-eh: Spinup disk on resume after revalidation

Move the call to ata_dev_power_set_active() to transition a disk in
standby power mode to the active power mode from
ata_eh_revalidate_and_attach() before doing revalidation to the end of
ata_eh_recover(), after the link speed for the device is reconfigured
(if that was necessary). This is safer as this ensure that the VERIFY
command executed to spinup the disk is executed with the drive properly
reconfigured first.

Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Niklas Cassel <niklas.cassel@wdc.com>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>


# 0fecb508 28-Aug-2023 Damien Le Moal <dlemoal@kernel.org>

ata: libata-eh: Reduce "disable device" message verbosity

There is no point in warning about a device being disabled when we
expect it to be, that is, on suspend, shutdown or when detaching the
device.

Suppress the message "disable device" for these cases by introducing the
EH static function ata_eh_dev_disable() and by using it in
ata_eh_unload() and ata_eh_detach_dev(). ata_dev_disable() code is
modified to call this new function after printing the "disable device"
message.

Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Tested-by: Chia-Lin Kao (AceLan) <acelan.kao@canonical.com>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>


# 7f95731c 28-Aug-2023 Damien Le Moal <dlemoal@kernel.org>

ata: libata-eh: Improve reset error messages

Some drives are really slow to spinup on resume, resulting is a very
slow response to COMRESET and to error messages such as:

ata1: COMRESET failed (errno=-16)
ata1: link is slow to respond, please be patient (ready=0)
ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
ata1.00: configured for UDMA/133

Given that the slowness of the response is indicated with the message
"link is slow to respond..." and that resets are retried until the
device is detected as online after up to 1min (ata_eh_reset_timeouts),
there is no point in printing the "COMRESET failed" error message. Let's
not scare the user with non fatal errors and only warn about reset
failures in ata_eh_reset() when all reset retries have been exhausted.

Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Tested-by: Chia-Lin Kao (AceLan) <acelan.kao@canonical.com>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>


# 49728bdc 11-Sep-2023 Damien Le Moal <dlemoal@kernel.org>

ata: libata-eh: Fix compilation warning in ata_eh_link_report()

The 6 bytes length of the tries_buf string in ata_eh_link_report() is
too short and results in a gcc compilation warning with W-!:

drivers/ata/libata-eh.c: In function ‘ata_eh_link_report’:
drivers/ata/libata-eh.c:2371:59: warning: ‘%d’ directive output may be truncated writing between 1 and 11 bytes into a region of size 4 [-Wformat-truncation=]
2371 | snprintf(tries_buf, sizeof(tries_buf), " t%d",
| ^~
drivers/ata/libata-eh.c:2371:56: note: directive argument in the range [-2147483648, 4]
2371 | snprintf(tries_buf, sizeof(tries_buf), " t%d",
| ^~~~~~
drivers/ata/libata-eh.c:2371:17: note: ‘snprintf’ output between 4 and 14 bytes into a destination of size 6
2371 | snprintf(tries_buf, sizeof(tries_buf), " t%d",
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2372 | ap->eh_tries);
| ~~~~~~~~~~~~~

Avoid this warning by increasing the string size to 16B.

Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>


# aa3998db 25-Aug-2023 Damien Le Moal <dlemoal@kernel.org>

ata: libata-scsi: Disable scsi device manage_system_start_stop

The introduction of a device link to create a consumer/supplier
relationship between the scsi device of an ATA device and the ATA port
of that ATA device fixes the ordering of system suspend and resume
operations. For suspend, the scsi device is suspended first and the ata
port after it. This is fine as this allows the synchronize cache and
START STOP UNIT commands issued by the scsi disk driver to be executed
before the ata port is disabled.

For resume operations, the ata port is resumed first, followed
by the scsi device. This allows having the request queue of the scsi
device to be unfrozen after the ata port resume is scheduled in EH,
thus avoiding to see new requests prematurely issued to the ATA device.
Since libata sets manage_system_start_stop to 1, the scsi disk resume
operation also results in issuing a START STOP UNIT command to the
device being resumed so that the device exits standby power mode.

However, restoring the ATA device to the active power mode must be
synchronized with libata EH processing of the port resume operation to
avoid either 1) seeing the start stop unit command being received too
early when the port is not yet resumed and ready to accept commands, or
after the port resume process issues commands such as IDENTIFY to
revalidate the device. In this last case, the risk is that the device
revalidation fails with timeout errors as the drive is still spun down.

Commit 0a8589055936 ("ata,scsi: do not issue START STOP UNIT on resume")
disabled issuing the START STOP UNIT command to avoid issues with it.
But this is incorrect as transitioning a device to the active power
mode from the standby power mode set on suspend requires a media access
command. The IDENTIFY, READ LOG and SET FEATURES commands executed in
libata EH context triggered by the ata port resume operation may thus
fail.

Fix these synchronization issues is by handling a device power mode
transitions for system suspend and resume directly in libata EH context,
without relying on the scsi disk driver management triggered with the
manage_system_start_stop flag.

To do this, the following libata helper functions are introduced:

1) ata_dev_power_set_standby():

This function issues a STANDBY IMMEDIATE command to transitiom a device
to the standby power mode. For HDDs, this spins down the disks. This
function applies only to ATA and ZAC devices and does nothing otherwise.
This function also does nothing for devices that have the
ATA_FLAG_NO_POWEROFF_SPINDOWN or ATA_FLAG_NO_HIBERNATE_SPINDOWN flag
set.

For suspend, call ata_dev_power_set_standby() in
ata_eh_handle_port_suspend() before the port is disabled and frozen.
ata_eh_unload() is also modified to transition all enabled devices to
the standby power mode when the system is shutdown or devices removed.

2) ata_dev_power_set_active() and

This function applies to ATA or ZAC devices and issues a VERIFY command
for 1 sector at LBA 0 to transition the device to the active power mode.
For HDDs, since this function will complete only once the disk spin up.
Its execution uses the same timeouts as for reset, to give the drive
enough time to complete spinup without triggering a command timeout.

For resume, call ata_dev_power_set_active() in
ata_eh_revalidate_and_attach() after the port has been enabled and
before any other command is issued to the device.

With these changes, the manage_system_start_stop and no_start_on_resume
scsi device flags do not need to be set in ata_scsi_dev_config(). The
flag manage_runtime_start_stop is still set to allow the sd driver to
spinup/spindown a disk through the sd runtime operations.

Fixes: 0a8589055936 ("ata,scsi: do not issue START STOP UNIT on resume")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>


# 7a3bc2b3 13-Sep-2023 Niklas Cassel <niklas.cassel@wdc.com>

ata: libata-eh: do not thaw the port twice in ata_eh_reset()

commit 1e641060c4b5 ("libata: clear eh_info on reset completion") added
a workaround that broke the retry mechanism in ATA EH.

Tejun himself suggested to remove this workaround when it was identified
to cause additional problems:
https://lore.kernel.org/linux-ide/20110426135027.GI878@htj.dyndns.org/

He even said:
"Hmm... it seems I wasn't thinking straight when I added that work around."
https://lore.kernel.org/linux-ide/20110426155229.GM878@htj.dyndns.org/

While removing the workaround solved the issue, however, the workaround was
kept to avoid "spurious hotplug events during reset", and instead another
workaround was added on top of the existing workaround in commit
8c56cacc724c ("libata: fix unexpectedly frozen port after ata_eh_reset()").

Because these IRQs happened when the port was frozen, we know that they
were actually a side effect of PxIS and IS.IPS(x) not being cleared before
the COMRESET. This is now done in commit 94152042eaa9 ("ata: libahci: clear
pending interrupt status"), so these workarounds can now be removed.

Since commit 1e641060c4b5 ("libata: clear eh_info on reset completion") has
now been reverted, the ATA EH retry mechanism is functional again, so there
is once again no need to thaw the port more than once in ata_eh_reset().

This reverts "the workaround on top of the workaround" introduced in commit
8c56cacc724c ("libata: fix unexpectedly frozen port after ata_eh_reset()").

Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>


# 80cc944e 13-Sep-2023 Niklas Cassel <niklas.cassel@wdc.com>

ata: libata-eh: do not clear ATA_PFLAG_EH_PENDING in ata_eh_reset()

ata_scsi_port_error_handler() starts off by clearing ATA_PFLAG_EH_PENDING,
before calling ap->ops->error_handler() (without holding the ap->lock).

If an error IRQ is received while ap->ops->error_handler() is running,
the irq handler will set ATA_PFLAG_EH_PENDING.

Once ap->ops->error_handler() returns, ata_scsi_port_error_handler()
checks if ATA_PFLAG_EH_PENDING is set, and if it is, another iteration
of ATA EH is performed.

The problem is that ATA_PFLAG_EH_PENDING is not only cleared by
ata_scsi_port_error_handler(), it is also cleared by ata_eh_reset().

ata_eh_reset() is called by ap->ops->error_handler(). This additional
clearing done by ata_eh_reset() breaks the whole retry logic in
ata_scsi_port_error_handler(). Thus, if an error IRQ is received while
ap->ops->error_handler() is running, the port will currently remain
frozen and will never get re-enabled.

The additional clearing in ata_eh_reset() was introduced in commit
1e641060c4b5 ("libata: clear eh_info on reset completion").

Looking at the original error report:
https://marc.info/?l=linux-ide&m=124765325828495&w=2

We can see the following happening:
[ 1.074659] ata3: XXX port freeze
[ 1.074700] ata3: XXX hardresetting link, stopping engine
[ 1.074746] ata3: XXX flipping SControl

[ 1.411471] ata3: XXX irq_stat=400040 CONN|PHY
[ 1.411475] ata3: XXX port freeze

[ 1.420049] ata3: XXX starting engine
[ 1.420096] ata3: XXX rc=0, class=1
[ 1.420142] ata3: XXX clearing IRQs for thawing
[ 1.420188] ata3: XXX port thawed
[ 1.420234] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)

We are not supposed to be able to receive an error IRQ while the port is
frozen (PxIE is set to 0, i.e. all IRQs for the port are disabled).

AHCI 1.3.1 section 10.7.1.1 First Tier (IS Register) states:
"Each bit location can be thought of as reporting a '1' if the virtual
"interrupt line" for that port is indicating it wishes to generate an
interrupt. That is, if a port has one or more interrupt status bit set,
and the enables for those status bits are set, then this bit shall be set."

Additionally, AHCI state P:ComInit clearly shows that the state machine
will only jump to P:ComInitSetIS (which sets IS.IPS(x) to '1'), if PxIE.PCE
is set to '1'. In our case, PxIE is set to 0, so IS.IPS(x) won't get set.

So IS.IPS(x) only gets set if PxIS and PxIE is set.

AHCI 1.3.1 section 10.7.1.1 First Tier (IS Register) also states:
"The bits in this register are read/write clear. It is set by the level of
the virtual interrupt line being a set, and cleared by a write of '1' from
the software."

So if IS.IPS(x) is set, you need to explicitly clear it by writing a 1 to
IS.IPS(x) for that port.

Since PxIE is cleared, the only way to get an interrupt while the port is
frozen, is if IS.IPS(x) is set, and the only way IS.IPS(x) can be set when
the port is frozen, is if it was set before the port was frozen.

However, since commit 737dd811a3db ("ata: libahci: clear pending interrupt
status"), we clear both PxIS and IS.IPS(x) after freezing the port, but
before the COMRESET, so the problem that commit 1e641060c4b5 ("libata:
clear eh_info on reset completion") fixed can no longer happen.

Thus, revert commit 1e641060c4b5 ("libata: clear eh_info on reset
completion"), so that the retry logic in ata_scsi_port_error_handler()
works once again. (The retry logic is still needed, since we can still
get an error IRQ _after_ the port has been thawed, but before
ata_scsi_port_error_handler() takes the ap->lock in order to check
if ATA_PFLAG_EH_PENDING is set.)

Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>


# ff8072d5 31-Jul-2023 Hannes Reinecke <hare@suse.de>

ata: libata: remove references to non-existing error_handler()

With commit 65a15d6560df ("scsi: ipr: Remove SATA support") all
libata drivers now have the error_handler() callback provided,
so we can stop checking for non-existing error_handler callback.

Signed-off-by: Hannes Reinecke <hare@suse.de>
[niklas: fixed review comments, rebased, solved conflicts during rebase,
fixed bug that unconditionally dumped all QCs, removed the now unused
function ata_dump_status(), removed the now unreachable failure paths in
atapi_qc_complete(), removed the non-EH function to request ATAPI sense]
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>


# ca02f225 29-Jul-2023 Sergey Shtylyov <s.shtylyov@omp.ru>

ata: libata-eh: fix reset timeout type

ata_eh_reset_timeouts[] stores 'unsigned long' timeouts in ms, while
ata_eh_reset() passes these values to ata_deadline() that takes just
'unsigned int timeout_msecs' parameter. Change the reset timeout table
element's type to 'unsigned int' -- all timeouts fit into 'unsigned int'
but we have to change ULONG_MAX to UINT_MAX...

Signed-off-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>


# 12980c1f 04-Jun-2023 Damien Le Moal <dlemoal@kernel.org>

ata: libata-eh: Use ata_ncq_enabled() in ata_eh_speed_down()

In ata_eh_speed_down(), instead of hard-coding the test on the device
flags to detect if NCQ is supported and enabled, use ata_ncq_enabled().

Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: John Garry <john.g.garry@oracle.com>


# e4c26a1b 18-May-2023 Niklas Cassel <niklas.cassel@wdc.com>

ata: libata-eh: Clarify ata_eh_qc_retry() behavior at call site

While the function documentation for ata_eh_qc_retry() is clear,
from simply reading the single function that calls ata_eh_qc_retry(),
it is not clear that ata_eh_qc_retry() might not retry the command.

Add a comment in the single function that calls ata_eh_qc_retry() to
clarify the behavior.

[Damien] Added curly braces to "if () else" with multi-line comment.

Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>


# 18bd7718 10-May-2023 Niklas Cassel <niklas.cassel@wdc.com>

scsi: ata: libata: Handle completion of CDL commands using policy 0xD

A CDL timeout for policy 0xF is defined as a NCQ error, just with a CDL
specific sk/asc/ascq in the sense data. Therefore, the existing code in
libata does not need to be modified to handle a policy 0xF CDL timeout.

For Command Duration Limits policy 0xD:

The device shall complete the command without error with the additional
sense code set to DATA CURRENTLY UNAVAILABLE.

Since a CDL timeout for policy 0xD is not an error, we cannot use the NCQ
Command Error log (10h).

Instead, we need to read the Sense Data for Successful NCQ Commands log
(0Fh).

In the success case, just like in the error case, we cannot simply read a
log page from the interrupt handler itself, since reading a log page
involves sending a READ LOG DMA EXT or READ LOG EXT command.

Therefore, we add a new EH action ATA_EH_GET_SUCCESS_SENSE. When a command
completes without error, and when the ATA_SENSE bit is set, this new action
is set as pending, and EH is scheduled.

This way, similar to the NCQ error case, the log page will be read from EH
context.

An alternative would have been to add a new kthread or workqueue to handle
this. However, extending EH can be done with minimal changes and avoids the
need to synchronize a new kthread/workqueue with EH.

Co-developed-by: Damien Le Moal <dlemoal@kernel.org>
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Link: https://lore.kernel.org/r/20230511011356.227789-20-nks@flawful.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>


# 24aeebbf 10-May-2023 Niklas Cassel <niklas.cassel@wdc.com>

scsi: ata: libata: Change ata_eh_request_sense() to not set CHECK_CONDITION

Currently, ata_eh_request_sense() unconditionally sets the scsicmd->result
to SAM_STAT_CHECK_CONDITION.

For Command Duration Limits policy 0xD:

The device shall complete the command without error (SAM_STAT_GOOD) with
the additional sense code set to DATA CURRENTLY UNAVAILABLE.

It is perfectly fine to have sense data for a command that returned
completion without error.

In order to support for CDL policy 0xD, we have to remove this assumption
that having sense data means that the command failed
(SAM_STAT_CHECK_CONDITION).

Change ata_eh_request_sense() to not set SAM_STAT_CHECK_CONDITION, and
instead move the setting of SAM_STAT_CHECK_CONDITION to the single caller
that wants SAM_STAT_CHECK_CONDITION set, that way ata_eh_request_sense()
can be reused in a follow-up patch that adds support for CDL policy 0xD.

The only caller of ata_eh_request_sense() is protected by: if (!(qc->flags
& ATA_QCFLAG_SENSE_VALID)), so we can remove this duplicated check from
ata_eh_request_sense() itself.

Additionally, ata_eh_request_sense() is only called from
ata_eh_analyze_tf(), which is only called when iteratating the QCs using
ata_qc_for_each_raw(), which does not include the internal tag, so cmd can
never be NULL (all non-internal commands have qc->scsicmd set), so remove
the !cmd check as well.

Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Link: https://lore.kernel.org/r/20230511011356.227789-14-nks@flawful.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>


# 6aa0365a 15-Jun-2023 Damien Le Moal <dlemoal@kernel.org>

ata: libata-scsi: Avoid deadlock on rescan after device resume

When an ATA port is resumed from sleep, the port is reset and a power
management request issued to libata EH to reset the port and rescanning
the device(s) attached to the port. Device rescanning is done by
scheduling an ata_scsi_dev_rescan() work, which will execute
scsi_rescan_device().

However, scsi_rescan_device() takes the generic device lock, which is
also taken by dpm_resume() when the SCSI device is resumed as well. If
a device rescan execution starts before the completion of the SCSI
device resume, the rcu locking used to refresh the cached VPD pages of
the device, combined with the generic device locking from
scsi_rescan_device() and from dpm_resume() can cause a deadlock.

Avoid this situation by changing struct ata_port scsi_rescan_task to be
a delayed work instead of a simple work_struct. ata_scsi_dev_rescan() is
modified to check if the SCSI device associated with the ATA device that
must be rescanned is not suspended. If the SCSI device is still
suspended, ata_scsi_dev_rescan() returns early and reschedule itself for
execution after an arbitrary delay of 5ms.

Reported-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Reported-by: Joe Breuer <linux-kernel@jmbreuer.net>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217530
Fixes: a19a93e4c6a9 ("scsi: core: pm: Rely on the device driver core for async power management")
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Tested-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Tested-by: Joe Breuer <linux-kernel@jmbreuer.net>


# 87629312 29-Dec-2022 Niklas Cassel <niklas.cassel@wdc.com>

ata: scsi: rename flag ATA_QCFLAG_FAILED to ATA_QCFLAG_EH

The name ATA_QCFLAG_FAILED is misleading since it does not mean that a
QC completed in error, or that it didn't complete at all. It means that
libata decided to schedule EH for the QC, so the QC is now owned by the
libata error handler (EH).

The normal execution path is responsible for not accessing a QC owned
by EH. libata core enforces the rule by returning NULL from
ata_qc_from_tag() for QCs owned by EH.

It is quite easy to mistake that a QC marked with ATA_QCFLAG_FAILED was
an error. However, a QC that was actually an error is instead indicated
by having qc->err_mask set. E.g. when we have a NCQ error, we abort all
QCs, which currently will mark all QCs as ATA_QCFLAG_FAILED. However, it
will only be a single QC that is an error (i.e. has qc->err_mask set).

Rename ATA_QCFLAG_FAILED to ATA_QCFLAG_EH to more clearly highlight that
this flag simply means that a QC is now owned by EH. This new name will
not mislead to think that the QC was an error (which is instead
indicated by having qc->err_mask set).

This also makes it more obvious that the EH code skips all QCs that do
not have ATA_QCFLAG_EH set (rather than ATA_QCFLAG_FAILED), since the EH
code should simply only care about QCs that are owned by EH itself.

Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>


# b83ad9ee 15-Dec-2022 Wenchao Hao <haowenchao@huawei.com>

ata: libata-eh: Cleanup ata_scsi_cmd_error_handler()

If ap->ops->error_handler is NULL just return. This patch also
fixes some comment style issue.

Signed-off-by: Wenchao Hao <haowenchao@huawei.com>
Reviewed-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>


# 3d8a3ae3 14-Nov-2022 Niklas Cassel <niklas.cassel@wdc.com>

ata: libata: fix commands incorrectly not getting retried during NCQ error

A NCQ error means that the device has aborted processing of all active
commands.
To get the single NCQ command that caused the NCQ error, host software has
to read the NCQ error log, which also takes the device out of error state.

When the device encounters a NCQ error, we receive an error interrupt from
the HBA, and call ata_do_link_abort() to mark all outstanding commands on
the link as ATA_QCFLAG_FAILED (which means that these commands are owned
by libata EH), and then call ata_qc_complete() on them.

ata_qc_complete() will call fill_result_tf() for all commands marked as
ATA_QCFLAG_FAILED.

The taskfile is simply the latest status/error as seen from the device's
perspective. The taskfile will have ATA_ERR set in the status field and
ATA_ABORTED set in the error field.

When we fill the current taskfile values for all outstanding commands,
that means that qc->result_tf will have ATA_ERR set for all commands
owned by libata EH.

When ata_eh_link_autopsy() later analyzes all commands owned by libata EH,
it will call ata_eh_analyze_tf(), which will check if qc->result_tf has
ATA_ERR set, if it does, it will set qc->err_mask (which marks the command
as an error).

When ata_eh_finish() later calls __ata_qc_complete() on all commands owned
by libata EH, it will call qc->complete_fn() (ata_scsi_qc_complete()),
ata_scsi_qc_complete() will call ata_gen_ata_sense() to generate sense
data if qc->err_mask is set.

This means that we will generate sense data for commands that should not
have any sense data set. Having sense data set for the non-failed commands
will cause SCSI to finish these commands instead of retrying them.

While this incorrect behavior has existed for a long time, this first
became a problem once we started reading the correct taskfile register in
commit 4ba09d202657 ("ata: libahci: read correct status and error field
for NCQ commands").

Before this commit, NCQ commands would read the taskfile values received
from the last non-NCQ command completion, which most likely did not have
ATA_ERR set, since the last non-NCQ command was most likely not an error.

Fix this by changing ata_eh_analyze_ncq_error() to mark all non-failed
commands as ATA_QCFLAG_RETRY, and change the loop in ata_eh_link_autopsy()
to skip commands marked as ATA_QCFLAG_RETRY.

While at it, make sure that we clear ATA_ERR and any error bits for all
commands except the actual command that caused the NCQ error, so that no
other libata code will be able to misinterpret these commands as errors.

Fixes: 4ba09d202657 ("ata: libahci: read correct status and error field for NCQ commands")
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>


# 4cb7c6f1 07-Oct-2022 Niklas Cassel <niklas.cassel@wdc.com>

ata: make use of ata_port_is_frozen() helper

Clean up the code by making use of the newly introduced
ata_port_is_frozen() helper function.

Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>


# 013115d9 26-Sep-2022 Niklas Cassel <niklas.cassel@wdc.com>

ata: libata: fetch sense data for ATA devices supporting sense reporting

Currently, the sense data reporting feature set is enabled for all ATA
devices which supports the feature set (ata_id_has_sense_reporting()),
see ata_dev_config_sense_reporting().

However, even if sense data reporting is enabled, and the device
indicates that sense data is available, the sense data is only fetched
for ATA ZAC devices. For regular ATA devices, the available sense data
is never fetched, it is simply ignored. Instead, libata will use the
ERROR + STATUS fields and map them to a very generic and reduced set
of sense data, see ata_gen_ata_sense() and ata_to_sense_error().

When sense data reporting was first implemented, regular ATA devices
did fetch the sense data from the device. However, this was restricted
to only ATA ZAC devices in commit ca156e006add ("libata: don't request
sense data on !ZAC ATA devices").

With recent changes related to sense data and NCQ autosense, we want
to, once again, fetch the sense data for all ATA devices supporting
sense reporting.
ata_gen_ata_sense() should only be used for devices that don't support
the sense data reporting feature set.
hopefully the features will be more robust this time around.

It is not just ZAC, many new ATA features, e.g. Command Duration
Limits, relies on working NCQ autosense and sense data. Therefore,
it is not really an option to avoid fetching the sense data forever.

If we encounter a device that is misbehaving because the sense data is
actually fetched, then that device should be quirked such that it
never enables the sense data reporting feature set in the first place,
since such a device is obviously not compliant with the specification.

The order in which we will try to add sense data to a scsi_cmnd:
1) NCQ autosense (if supported) - ata_eh_analyze_ncq_error()
2) REQUEST SENSE DATA EXT (if supported) - ata_eh_request_sense()
3) error + status field translation - ata_gen_ata_sense(), called
by ata_scsi_qc_complete() if neither 1) or 2) is supported.

Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>


# 4b89ad8e 26-Sep-2022 Niklas Cassel <niklas.cassel@wdc.com>

ata: libata: only set sense valid flag if sense data is valid

While this shouldn't be needed if all devices that claim that they
support NCQ autosense (ata_id_has_ncq_autosense()) and/or the sense
data reporting feature (ata_id_has_sense_reporting()), actually
supported those features.

However, there might be some old ATA devices that either have these
bits set, even when they don't support those features, or they simply
return malformed data when using those features.

These devices should be quirked, but in order to try to minimize the
impact for the users of these such devices, it was suggested by Damien
Le Moal that it might be a good idea to sanity check the sense data
received from the device. If the sense data looks bogus, then the
sense data is never added to the scsi_cmnd command.

Introduce a new function, ata_scsi_sense_is_valid(), and use it in all
places where sense data is received from the device.

Suggested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>


# 461ec040 26-Sep-2022 Niklas Cassel <niklas.cassel@wdc.com>

ata: libata: clarify when ata_eh_request_sense() will be called

ata_eh_request_sense() returns early when flag ATA_QCFLAG_SENSE_VALID is
set. However, since the call to ata_eh_request_sense() is guarded by a
ATA_SENSE bit conditional, the logical conclusion for the reader is that
all checks are performed at the call site.

Highlight the fact that the sense data will not be fetched if flag
ATA_QCFLAG_SENSE_VALID is already set by adding an additional check to
the existing guarding conditional. No functional change.

Additionally, add a comment explaining that ata_eh_analyze_tf() will
only fetch the sense data if:
-It was a non-NCQ command that failed, or
-It was a NCQ command that failed, but the sense data was not included
in the NCQ command error log (i.e. NCQ autosense is not supported).

Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>


# 71d7b6e5 21-Sep-2022 Niklas Cassel <niklas.cassel@wdc.com>

ata: libata-eh: avoid needless hard reset when revalidating link

Performing a revalidation on a AHCI controller supporting LPM,
while using a lpm mode of e.g. med_power_with_dip (hipm + dipm) or
medium_power (hipm), will currently always lead to a hard reset.

The expected behavior is that a hard reset is only performed when
revalidate fails, because the properties of the drive has changed.

A revalidate performed after e.g. a NCQ error, or such a simple thing
as disabling write-caching (hdparm -W 0 /dev/sda), should succeed on
the first try (and should therefore not cause the link to be reset).

This unwarranted hard reset happens because ata_phys_link_offline()
returns true for a link that is in deep sleep. Thus the call to
ata_phys_link_offline() in ata_eh_revalidate_and_attach() will cause
the revalidation to fail, which causes ata_eh_handle_dev_fail() to be
called, which will set ehc->i.action |= ATA_EH_RESET, such that the
link is reset before retrying revalidation.

When the link is reset, the link is reestablished, so when
ata_eh_revalidate_and_attach() is called the second time, directly
after the link has been reset, ata_phys_link_offline() will return
false, and the revalidation will succeed.

Looking at "8.3.1.3 HBA Initiated" in the AHCI 1.3.1 specification,
it is clear the when host software writes a new command to memory,
by setting a bit in the PxCI/PxSACT HBA port registers, the HBA will
automatically bring back the link before sending out the Command FIS.

However, simply reading a SCR (like ata_phys_link_offline() does),
will not cause the HBA to automatically bring back the link.

As long as hipm is enabled, the HBA will put an idle link into deep
sleep. Avoid this needless hard reset on revalidation by temporarily
disabling hipm, by setting the LPM mode to ATA_LPM_MAX_POWER.

After revalidation is complete, ata_eh_recover() will restore the link
policy by setting the LPM mode to ap->target_lpm_policy.

Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>


# e3b1fff6 16-Sep-2022 Niklas Cassel <niklas.cassel@wdc.com>

ata: libata: drop superfluous ata_eh_analyze_tf() parameter

The parameter can easily be derived from struct ata_queued_cmd.

Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>


# b46c760e 16-Sep-2022 Niklas Cassel <niklas.cassel@wdc.com>

ata: libata: drop superfluous ata_eh_request_sense() parameter

The parameter can easily be derived from struct ata_queued_cmd.

Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>


# cb6e73aa 20-Sep-2022 ye xingchen <ye.xingchen@zte.com.cn>

ata: libata-eh: Remove the unneeded result variable

Return the value ata_port_abort() directly instead of storing it in
another redundant variable.

Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: ye xingchen <ye.xingchen@zte.com.cn>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>


# d3122bf9 11-Aug-2022 Damien Le Moal <damien.lemoal@opensource.wdc.com>

ata: libata-eh: Add missing command name

Add the missing command name for ATA_CMD_NCQ_NON_DATA to
ata_get_cmd_name().

Fixes: 661ce1f0c4a6 ("libata/libsas: Define ATA_CMD_NCQ_NON_DATA")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>


# e06233f9 18-Jun-2022 Sergey Shtylyov <s.shtylyov@omp.ru>

ata: libata-eh: fix sloppy result type of ata_internal_cmd_timeout()

ata_internal_cmd_timeout() returns *unsigned long* timeout in ms, however
ata_exec_internal_sg() passes that timeout to msecs_to_jiffies() that takes
just *unsigned int*. Change ata_internal_cmd_timeout()'s result type to
*unsigned int* as well, also updating the *struct* ata_eh_cmd_timeout_ent
and the command timeout tables -- all timeouts fit into *unsigned int* but
we have to change ULONG_MAX to UINT_MAX...

Signed-off-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>


# afae461a 16-Jun-2022 Sergey Shtylyov <s.shtylyov@omp.ru>

ata: libata-eh: fix sloppy result type of ata_eh_nr_in_flight()

ata_eh_nr_in_flight() counts the # of the active tagged commands and
thus cannot return a negative value but the result type is nevertheless
int. Switching it to unsigned int (along with the local variables
receiving the function's result) helps avoiding the sign extension
instructions when comparing with or assigning to unsigned long
ata_port::fastdrain_cnt and thus results in a more compact 64-bit
code.

Found by Linux Verification Center (linuxtesting.org) with the SVACE
static analysis tool.

[Damien]
Fixed commit message.

Signed-off-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>


# efcef265 15-Feb-2022 Sergey Shtylyov <s.shtylyov@omp.ru>

ata: add/use ata_taskfile::{error|status} fields

Add the explicit error and status register fields to 'struct ata_taskfile'
using the anonymous *union*s ('struct ide_taskfile' had that for ages!) and
update the libata taskfile code accordingly. There should be no object code
changes resulting from that...

Signed-off-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>


# 2a7b02ea 01-Feb-2022 Sergey Shtylyov <s.shtylyov@omp.ru>

ata: libata-acpi: kill ata_acpi_on_suspend()

Since the commit c05e6ff035c1b25d17364a685432 ("libata-acpi: implement
and use ata_acpi_init_gtm()") ata_acpi_on_suspend() just returns 0, so
its call from ata_eh_handle_port_suspend() doesn't make sense anymore.
Remove the function completely, at last...

Found by Linux Verification Center (linuxtesting.org) with the SVACE
static analysis tool.

Signed-off-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>


# 1c95a27c 21-Dec-2021 Hannes Reinecke <hare@suse.de>

ata: libata: drop ata_msg_drv()

Callers are already protected by ata_dev_print_info(), so no need
to have an additional configuration parameter here.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>


# d4520903 21-Dec-2021 Hannes Reinecke <hare@suse.de>

ata: libata: revamp ata_get_cmd_descript()

Rename ata_get_cmd_descrip() to ata_get_cmd_name() and simplify
it to return "unknown" instead of NULL.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>


# c318458c 21-Dec-2021 Hannes Reinecke <hare@suse.de>

ata: libata: add tracepoints for ATA error handling

Add tracepoints for ATA error handling.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>


# f8ec26d0 21-Dec-2021 Hannes Reinecke <hare@suse.de>

ata: libata: add reset tracepoints

To follow the flow of control we should be using tracepoints, as
they will tie in with the actual I/O flow and deliver a better
overview about what it happening.
This patch adds tracepoints for hard reset, soft reset, and postreset
and adds them in the libata-eh control flow.
With that we can drop the reset DPRINTK calls in the various drivers.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>


# 68dbbe7d 04-Nov-2021 Damien Le Moal <damien.lemoal@opensource.wdc.com>

libata: fix read log timeout value

Some ATA drives are very slow to respond to READ_LOG_EXT and
READ_LOG_DMA_EXT commands issued from ata_dev_configure() when the
device is revalidated right after resuming a system or inserting the
ATA adapter driver (e.g. ahci). The default 5s timeout
(ATA_EH_CMD_DFL_TIMEOUT) used for these commands is too short, causing
errors during the device configuration. Ex:

...
ata9: SATA max UDMA/133 abar m524288@0x9d200000 port 0x9d200400 irq 209
ata9: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
ata9.00: ATA-9: XXX XXXXXXXXXXXXXXX, XXXXXXXX, max UDMA/133
ata9.00: qc timeout (cmd 0x2f)
ata9.00: Read log page 0x00 failed, Emask 0x4
ata9.00: Read log page 0x00 failed, Emask 0x40
ata9.00: NCQ Send/Recv Log not supported
ata9.00: Read log page 0x08 failed, Emask 0x40
ata9.00: 27344764928 sectors, multi 16: LBA48 NCQ (depth 32), AA
ata9.00: Read log page 0x00 failed, Emask 0x40
ata9.00: ATA Identify Device Log not supported
ata9.00: failed to set xfermode (err_mask=0x40)
ata9: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
ata9.00: configured for UDMA/133
...

The timeout error causes a soft reset of the drive link, followed in
most cases by a successful revalidation as that give enough time to the
drive to become fully ready to quickly process the read log commands.
However, in some cases, this also fails resulting in the device being
dropped.

Fix this by using adding the ata_eh_revalidate_timeouts entries for the
READ_LOG_EXT and READ_LOG_DMA_EXT commands. This defines a timeout
increased to 15s, retriable one time.

Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>


# c8329cd5 09-Aug-2021 Bart Van Assche <bvanassche@acm.org>

scsi: ata: Use scsi_cmd_to_rq() instead of scsi_cmnd.request

Prepare for removal of the request pointer by using scsi_cmd_to_rq()
instead. This patch does not change any functionality.

Link: https://lore.kernel.org/r/20210809230355.8186-8-bvanassche@acm.org
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>


# e06abcc6 20-Apr-2021 Gustavo A. R. Silva <gustavo@embeddedor.com>

libata: Fix fall-through warnings for Clang

In preparation to enable -Wimplicit-fallthrough for Clang, fix a warning
by explicitly adding a break statement instead of letting the code fall
through to the next case.

Link: https://github.com/KSPP/linux/issues/115
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# b8e162f9 15-Apr-2021 Bart Van Assche <bvanassche@acm.org>

scsi: core: Introduce enum scsi_disposition

Improve readability of the code in the SCSI core by introducing an
enumeration type for the values used internally that decide how to continue
processing a SCSI command. The eh_*_handler return values have not been
changed because that would involve modifying all SCSI drivers.

The output of the following command has been inspected to verify that no
out-of-range values are assigned to a variable of type enum
scsi_disposition:

KCFLAGS=-Wassign-enum make CC=clang W=1 drivers/scsi/

Link: https://lore.kernel.org/r/20210415220826.29438-6-bvanassche@acm.org
Cc: Christoph Hellwig <hch@lst.de>
Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>


# 94bd5719 23-Oct-2020 Mauro Carvalho Chehab <mchehab+huawei@kernel.org>

ata: fix some kernel-doc markups

Some functions have different names between their prototypes
and the kernel-doc markup.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# df561f66 23-Aug-2020 Gustavo A. R. Silva <gustavoars@kernel.org>

treewide: Use fallthrough pseudo-keyword

Replace the existing /* fall through */ comments and its variants with
the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary
fall-through markings when it is the case.

[1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>


# a0ccd251 26-Mar-2020 Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>

ata: move ata_eh_analyze_ncq_error() & co. to libata-sata.c

* move ata_eh_analyze_ncq_error() and ata_eh_read_log_10h() to
libata-sata.c

* add static inline for ata_eh_analyze_ncq_error() for
CONFIG_SATA_HOST=n case (link->sactive is non-zero only if
NCQ commands are actually queued so empty function body is
sufficient)

Code size savings on m68k arch using (modified) atari_defconfig:

text data bss dec hex filename
before:
16164 18 0 16182 3f36 drivers/ata/libata-eh.o
after:
15446 18 0 15464 3c68 drivers/ata/libata-eh.o

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# a695de27 26-Mar-2020 Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>

ata: start separating SATA specific code from libata-eh.c

Start separating SATA specific code from libata-eh.c:

* move sata_async_notification() to libata-sata.c:

Code size savings on m68k arch using (modified) atari_defconfig:

text data bss dec hex filename
before:
16243 18 0 16261 3f85 drivers/ata/libata-eh.o
after:
16164 18 0 16182 3f36 drivers/ata/libata-eh.o

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 4c9029e7 26-Mar-2020 Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>

ata: let compiler optimize out ata_eh_set_lpm() on non-SATA hosts

Add !IS_ENABLED(CONFIG_SATA_HOST) to ata_eh_set_lpm() to allow
compiler to optimize out the function for non-SATA configs (for
PATA hosts "ap && !ap->ops->set_lpm" condition is always true so
it's sufficient for the function to return zero).

Code size savings on m68k arch using (modified) atari_defconfig:

text data bss dec hex filename
before:
17353 18 0 17371 43db drivers/ata/libata-eh.o
after:
16607 18 0 16625 40f1 drivers/ata/libata-eh.o

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 2b67a6d3 26-Mar-2020 Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>

ata: remove EXPORT_SYMBOL_GPL()s not used by modules

Remove EXPORT_SYMBOL_GPL()s for functions used only by
the core libata code.

Code size savings on m68k arch using (modified) atari_defconfig:

text data bss dec hex filename
before:
39838 573 40 40451 9e03 drivers/ata/libata-core.o
21071 105 576 21752 54f8 drivers/ata/libata-scsi.o
17519 18 0 17537 4481 drivers/ata/libata-eh.o
after:
39688 573 40 40301 9d6d drivers/ata/libata-core.o
21040 105 576 21721 54d9 drivers/ata/libata-scsi.o
17405 18 0 17423 440f drivers/ata/libata-eh.o

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# a52fbcfc 26-Mar-2020 Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>

ata: move EXPORT_SYMBOL_GPL()s close to exported code

Move EXPORT_SYMBOL_GPL()s close to exported code like it is
done in other kernel subsystems. As a nice side effect this
results in the removal of few ifdefs.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 3e1ee734 26-Mar-2020 Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>

ata: remove stale maintainership information from core code

In commit 7634ccd2da97 ("libata: maintainership update") from 2018
Jens has officially taken over libata maintainership from Tejun so
remove stale information from core libata code.

Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# ca156e00 24-Jun-2019 Tejun Heo <tj@kernel.org>

libata: don't request sense data on !ZAC ATA devices

ZAC support added sense data requesting on error for both ZAC and ATA
devices. This seems to cause erratic error handling behaviors on some
SSDs where the device reports sense data availability and then
delivers the wrong content making EH take the wrong actions. The
failure mode was sporadic on a LITE-ON ssd and couldn't be reliably
reproduced.

There is no value in requesting sense data from non-ZAC ATA devices
while there's a significant risk of introducing EH misbehaviors which
are difficult to reproduce and fix. Let's do the sense data dancing
only for ZAC devices.

Reviewed-by: Hannes Reinecke <hare@suse.com>
Tested-by: Masato Suzuki <masato.suzuki@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# c82ee6d3 19-May-2019 Thomas Gleixner <tglx@linutronix.de>

treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 18

Based on 1 normalized pattern(s):

this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license as published by
the free software foundation either version 2 or at your option any
later version this program is distributed in the hope that it will
be useful but without any warranty without even the implied warranty
of merchantability or fitness for a particular purpose see the gnu
general public license for more details you should have received a
copy of the gnu general public license along with this program see
the file copying if not write to the free software foundation 675
mass ave cambridge ma 02139 usa

extracted by the scancode license scanner the SPDX license identifier

GPL-2.0-or-later

has been chosen to replace the boilerplate/reference in 52 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Jilayne Lovejoy <opensource@jilayne.com>
Reviewed-by: Steve Winslow <swinslow@gmail.com>
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190519154042.342335923@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# 39795d65 14-Nov-2018 Christoph Hellwig <hch@lst.de>

block: don't hold the queue_lock over blk_abort_request

There is nothing it could synchronize against, so don't go through
the pains of acquiring the lock.

Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 258c4e5c 19-Jun-2018 Jens Axboe <axboe@kernel.dk>

libata: convert eh to command iterators

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Tejun Heo <tj@kernel.org>


# 01fc27d9 29-May-2018 Christoph Hellwig <hch@lst.de>

libata: remove ata_scsi_timed_out

As far as I can tell this function can't even be called any more, given
that ATA implements its own eh_strategy_handler with ata_scsi_error, which
never calls ->eh_timed_out.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 28361c40 11-May-2018 Jens Axboe <axboe@kernel.dk>

libata: add extra internal command

Bump the internal tag to 32, instead of stealing the last tag in
our regular command space. This works just fine, since we don't
actually need a separate hardware tag for this. Internal commands
cannot coexist with NCQ commands.

As a bonus, we get rid of the special casing of what tag to use
for the internal command.

This is in preparation for utilizing all 32 commands for normal IO.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Tejun Heo <tj@kernel.org>


# 9d207acc 11-May-2018 Jens Axboe <axboe@kernel.dk>

libata: remove assumption that ATA_MAX_QUEUE - 1 is the max

In a few spots we iterate to ATA_MAX_QUEUE -1, including internal
knowledge that the last tag is the internal tag. Remove this
assumption.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Tejun Heo <tj@kernel.org>


# 804689ad 08-May-2018 Damien Le Moal <damien.lemoal@wdc.com>

libata: Fix command retry decision

For failed commands with valid sense data (e.g. NCQ commands),
scsi_check_sense() is used in ata_analyze_tf() to determine if the
command can be retried. In such case, rely on this decision and ignore
the command error mask based decision done in ata_worth_retry().

This fixes useless retries of commands such as unaligned writes on zoned
disks (TYPE_ZAC).

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>


# 7eb49509 08-May-2018 Damien Le Moal <damien.lemoal@wdc.com>

libata: Honor RQF_QUIET flag

Currently, libata ignores requests RQF_QUIET flag and print error
messages for failed commands, regardless if this flag is set in the
command request. Fix this by introducing the ata_eh_quiet() function and
using this function in ata_eh_link_autopsy() to determine if the EH
context should be quiet. This works by counting the number of failed
commands and the number of commands with the quiet flag set. If both
numbers are equal, the the EH context can be set to quiet and all error
messages suppressed. Otherwise, only the error messages for the failed
commands are suppressed and the link Emask and irq_stat messages printed.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>


# 54fb131b 08-May-2018 Damien Le Moal <damien.lemoal@wdc.com>

libata: Fix ata_err_string()

Add proper error string output for ATA_ERR_NCQ and ATA_ERR_NODEV_HINT
instead of returning "unknown error".

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>


# 79487259 08-May-2018 Damien Le Moal <damien.lemoal@wdc.com>

libata: Fix comment typo in ata_eh_analyze_tf()

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>


# 0d74d872 05-May-2018 Mathieu Malaterre <malat@debian.org>

driver core: add __printf verification to __ata_ehi_pushv_desc

__printf is useful to verify format and arguments. Remove the following
warning (with W=1):

drivers/ata/libata-eh.c:183:10: warning: function might be possible candidate for ‘gnu_printf’ format attribute [-Wsuggest-attribute=format]

Signed-off-by: Mathieu Malaterre <malat@debian.org>
Signed-off-by: Tejun Heo <tj@kernel.org>


# 6f54120e 27-Feb-2018 Jason Yan <yanaijie@huawei.com>

ata: do not schedule hot plug if it is a sas host

We've got a kernel panic when using sata disk with sas controller:

[115946.152283] Unable to handle kernel NULL pointer dereference at virtual address 000007d8
[115946.223963] CPU: 0 PID: 22175 Comm: kworker/0:1 Tainted: G W OEL 4.14.0 #1
[115946.232925] Workqueue: events ata_scsi_hotplug
[115946.237938] task: ffff8021ee50b180 task.stack: ffff00000d5d0000
[115946.244717] PC is at sas_find_dev_by_rphy+0x44/0x114
[115946.250224] LR is at sas_find_dev_by_rphy+0x3c/0x114
......
[115946.355701] Process kworker/0:1 (pid: 22175, stack limit = 0xffff00000d5d0000)
[115946.363369] Call trace:
[115946.456356] [<ffff000008878a9c>] sas_find_dev_by_rphy+0x44/0x114
[115946.462908] [<ffff000008878b8c>] sas_target_alloc+0x20/0x5c
[115946.469408] [<ffff00000885a31c>] scsi_alloc_target+0x250/0x308
[115946.475781] [<ffff00000885ba30>] __scsi_add_device+0xb0/0x154
[115946.481991] [<ffff0000088b520c>] ata_scsi_scan_host+0x180/0x218
[115946.488367] [<ffff0000088b53d8>] ata_scsi_hotplug+0xb0/0xcc
[115946.494801] [<ffff0000080ebd70>] process_one_work+0x144/0x390
[115946.501115] [<ffff0000080ec100>] worker_thread+0x144/0x418
[115946.507093] [<ffff0000080f2c98>] kthread+0x10c/0x138
[115946.512792] [<ffff0000080855dc>] ret_from_fork+0x10/0x18

We found that Ding Xiang has reported a similar bug before:
https://patchwork.kernel.org/patch/9179817/

And this bug still exists in mainline. Since libsas handles hotplug and
device adding/removing itself, do not need to schedule ata hot plug task
here if it is a sas host.

Signed-off-by: Jason Yan <yanaijie@huawei.com>
Cc: Ding Xiang <dingxiang@huawei.com>
Cc: stable@vger.kernel.org
Signed-off-by: Tejun Heo <tj@kernel.org>


# f1601113 02-Nov-2017 Rameshwar Prasad Sahu <rsahu@apm.com>

ata: fixes kernel crash while tracing ata_eh_link_autopsy event

When tracing ata link error event, the kernel crashes when the disk is
removed due to NULL pointer access by trace_ata_eh_link_autopsy API.
This occurs as the dev is NULL when the disk disappeared. This patch
fixes this crash by calling trace_ata_eh_link_autopsy only if "dev"
is not NULL.

v2 changes:
Removed direct passing "link" pointer instead of "dev" in trace API.

Signed-off-by: Rameshwar Prasad Sahu <rsahu@apm.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Fixes: 255c03d15a29 ("libata: Add tracepoints")
Cc: stable@vger.kernel.org # v4.1+


# 05b83605 12-Oct-2017 Gustavo A. R. Silva <garsilva@embeddedor.com>

ata: mark expected switch fall-throughs

In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

In cases where a "drop through" comment was already in place, I replaced
it with a proper "fall through" comment, which is what GCC is expecting
to find.

Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
Signed-off-by: Tejun Heo <tj@kernel.org>


# b93ab338 16-Oct-2017 Kees Cook <keescook@chromium.org>

libata: Convert timers to use timer_setup()

In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly.

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: linux-ide@vger.kernel.org
Link: https://lkml.kernel.org/r/20171005004842.GA23011@beast


# f4ac6476 13-Sep-2017 Hans de Goede <hdegoede@redhat.com>

libata: Add new med_power_with_dipm link_power_management_policy setting

As described by Matthew Garret quite a while back:
https://mjg59.dreamwidth.org/34868.html

Intel CPUs starting with the Haswell generation need SATA links to power
down for the "package" part of the CPU to reach low power-states like
PC7 / P8 which bring a significant power-saving with them.

The default max_performance lpm policy does not allow for these high
PC states, both the medium_power and min_power policies do allow this.

The min_power policy saves significantly more power, but there are some
reports of some disks / SSDs not liking min_power leading to system
crashes and in some cases even data corruption has been reported.

Matthew has found a document documenting the default settings of
Intel's IRST Windows driver with which most laptops ship:
https://www-ssl.intel.com/content/dam/doc/reference-guide/sata-devices-implementation-recommendations.pdf

Matthew wrote a patch changing med_power to match those defaults, but
that never got anywhere as some people where reporting issues with the
patch-set that patch was a part of.

This commit is another attempt to make the default IRST driver settings
available under Linux, but instead of changing medium_power and
potentially introducing regressions, this commit adds a new
med_power_with_dipm setting which is identical to the existing
medium_power accept that it enables dipm on top, which makes it match
the Windows IRST driver settings, which should hopefully be safe to
use on most devices.

The med_power_with_dipm setting is close to min_power, except that:
a) It does not use host-initiated slumber mode (ASP not set),
but it does allow device-initiated slumber
b) It does not enable DevSlp mode

On my T440s test laptop I get the following power savings when idle:
medium_power 0.9W
med_power_with_dipm 1.2W
min_power 1.2W

Suggested-by: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>


# a4f08141 29-Jun-2017 Paul E. McKenney <paulmck@kernel.org>

drivers/ata: Replace spin_unlock_wait() with lock/unlock pair

There is no agreed-upon definition of spin_unlock_wait()'s semantics,
and it appears that all callers could do just as well with a lock/unlock
pair. This commit therefore eliminates the spin_unlock_wait() call and
associated else-clause and hoists the then-clause's lock and unlock out of
the "if" statement. This should be safe from a performance perspective
because according to Tejun there should be few if any drivers that don't
set their own error handler.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: <linux-ide@vger.kernel.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Andrea Parri <parri.andrea@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>


# 2f60e1ab 30-Jul-2017 Jonathan Corbet <corbet@lwn.net>

libata: fix a couple of doc build warnings

The kerneldoc comments for a couple of functions in drivers/ata/libata-eh.c
had fallen behind the current implementation, resulting in these doc build
warnings:

./drivers/ata/libata-eh.c:1449: warning: No description found for parameter 'link'
./drivers/ata/libata-eh.c:1449: warning: Excess function parameter 'ap' description in 'ata_eh_done'
./drivers/ata/libata-eh.c:1590: warning: No description found for parameter 'qc'
./drivers/ata/libata-eh.c:1590: warning: Excess function parameter 'dev' description in 'ata_eh_request_sense'

Update the comments and make the warnings go away.

Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Tejun Heo <tj@kernel.org>


# ae867937 13-Jul-2017 Kefeng Wang <wangkefeng.wang@huawei.com>

libata: remove unused rc in ata_eh_handle_port_resume

Remove unused variable ‘rc’.

Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>


# f01f62c2 04-Jun-2017 Christoph Hellwig <hch@lst.de>

libata: move ata_read_log_page to libata-core.c

It is core functionality, and only one of the users is in the EH code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Tejun Heo <tj@kernel.org>


# 9bb9a39c 16-May-2017 Mauro Carvalho Chehab <mchehab@kernel.org>

ata: update references for libata documentation

The libata documentation is now using ReST. Update references
to it to point to the new place.

Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Acked-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Tejun Heo <tj@kernel.org>


# 19285f3c 14-May-2017 Mauro Carvalho Chehab <mchehab@kernel.org>

ata: update references for libata documentation

The libata documentation is now using ReST. Update references
to it to point to the new place.

Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>


# 4091fb95 27-Feb-2017 Masahiro Yamada <yamada.masahiro@socionext.com>

scripts/spelling.txt: add "followings" pattern and fix typo instances

Fix typos and add the following to the scripts/spelling.txt:

followings||following

While we are here, add a missing colon in the boilerplate in DT binding
documents. The "you SoC" in allwinner,sunxi-pinctrl.txt was fixed as
well.

I reworded "as the followings:" to "as follows:" for
drivers/usb/gadget/udc/renesas_usb3.c.

Link: http://lkml.kernel.org/r/1481573103-11329-32-git-send-email-yamada.masahiro@socionext.com
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# b6a05c82 30-Jan-2017 Christoph Hellwig <hch@lst.de>

scsi: remove eh_timed_out methods in the transport template

Instead define the timeout behavior purely based on the host_template
eh_timed_out method and wire up the existing transport implementations
in the host templates. This also clears up the confusion that the
transport template method overrides the host template one, so some
drivers have to re-override the transport template one.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>


# fb1b8b11 09-Jan-2017 Geert Uytterhoeven <geert@linux-m68k.org>

libata-eh: Use switch() instead of sparse array for protocol strings

Replace the sparse 256-pointer array for looking up protocol strings by
a switch() statement to reduce kernel size.

According to bloat-o-meter, this saves 910 bytes on m68k (32-bit), and
1892 bytes on arm64 (64-bit).

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Tejun Heo <tj@kernel.org>


# 5b844b63 13-Jul-2016 Hannes Reinecke <hare@suse.de>

ata: Handle ATA NCQ NO-DATA commands correctly

Add a new taskfile protocol ATA_PROT_NCQ_NODATA to handle
ATA NCQ NO-DATA commands correctly.
And fixup ata_scsi_zbc_out_xlat() to use it.

Signed-off-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Tejun Heo <tj@kernel.org>


# 5b51ba61 13-Jul-2016 Hannes Reinecke <hare@suse.de>

libata-eh: decode all taskfile protocols

Some taskfile protocol values where missing in ata_eh_link_report().

Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Tejun Heo <tj@kernel.org>


# bd18bc04 13-Jul-2016 Hannes Reinecke <hare@suse.de>

ata: fixup ATA_PROT_NODATA

The taskfile protocol is a numeric value, and can not be ORed. Currently
this is harmless as the protocol is always zeroed before, but if it ever
has a non-zero value the ORing would create incorrect results.

Signed-off-by: Hannes Reinecke <hare@suse.de>
[hch: updated patch description]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Tejun Heo <tj@kernel.org>


# 72d8c36e 07-Jun-2016 Wei Fang <fangwei1@huawei.com>

scsi: fix race between simultaneous decrements of ->host_failed

sas_ata_strategy_handler() adds the works of the ata error handler to
system_unbound_wq. This workqueue asynchronously runs work items, so the
ata error handler will be performed concurrently on different CPUs. In
this case, ->host_failed will be decreased simultaneously in
scsi_eh_finish_cmd() on different CPUs, and become abnormal.

It will lead to permanently inequality between ->host_failed and
->host_busy, and scsi error handler thread won't start running. IO
errors after that won't be handled.

Since all scmds must have been handled in the strategy handler, just
remove the decrement in scsi_eh_finish_cmd() and zero ->host_busy after
the strategy handler to fix this race.

Fixes: 50824d6c5657 ("[SCSI] libsas: async ata-eh")
Cc: stable@vger.kernel.org
Signed-off-by: Wei Fang <fangwei1@huawei.com>
Reviewed-by: James Bottomley <jejb@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>


# 27708a95 24-Apr-2016 Hannes Reinecke <hare@suse.de>

libata: Implement ZBC OUT translation

ZAC drives implement a 'ZAC Management Out' command template,
which maps onto the ZBC OUT command.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Tejun Heo <tj@kernel.org>


# 28a3fc22 24-Apr-2016 Hannes Reinecke <hare@suse.de>

libata: implement ZBC IN translation

ZAC drives implement a 'ZAC Management In' command template,
which maps onto the ZBC IN command.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Tejun Heo <tj@kernel.org>


# d238ffd5 24-Apr-2016 Hannes Reinecke <hare@suse.de>

libata: do not attempt to retrieve sense code twice

Do not call ata_request_sense() if the sense code is already
present.

Signed-off-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>


# 06dbde5f 04-Apr-2016 Hannes Reinecke <hare@suse.de>

libata: Implement control mode page to select sense format

Implement MODE SELECT for the control mode page to allow the OS
to switch to descriptor sense.

tj: Dropped s/sb/cmd->sense_buffer/ in ata_gen_ata_sense(). Added
@dev description to ata_msense_ctl_mode().

Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Tejun Heo <tj@kernel.org>


# 3852e373 04-Apr-2016 Hannes Reinecke <hare@suse.de>

libata: evaluate SCSI sense code

Whenever a sense code is set it would need to be evaluated to
update the error mask.

tj: Cosmetic formatting updates.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Tejun Heo <tj@kernel.org>


# 492bf621 04-Apr-2016 Hannes Reinecke <hare@suse.de>

libata-eh: Set 'information' field for autosense

If NCQ autosense or the sense data reporting feature is enabled
the LBA of the offending command should be stored in the sense
data 'information' field.

tj: s/(u64)-1/U64_MAX/

Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Tejun Heo <tj@kernel.org>


# e87fd28c 04-Apr-2016 Hannes Reinecke <hare@suse.de>

libata: Implement support for sense data reporting

ACS-4 defines a sense data reporting feature set.
This patch implements support for it.

tj: Cosmetic formatting updates.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Tejun Heo <tj@kernel.org>


# 5b01e4b9 04-Apr-2016 Hannes Reinecke <hare@suse.de>

libata: Implement NCQ autosense

Some newer devices support NCQ autosense (cf ACS-4), so we should
be using it to retrieve the sense code and speed up recovery.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Tejun Heo <tj@kernel.org>


# ea013a9b 04-Dec-2015 Andreas Werner <andreas.werner@men.de>

libata-eh.c: Introduce new ata port flag for controller which lockup on read log page

Some controller lockup on a ata_read_log_page.
Add new ata port flag ATA_FLAG_NO_LOG_PAGE which can used
to blacklist a controller.

If this flag is set, any attempt to read a log page returns an error
without actually issuing the command.

Signed-off-by: Andreas Werner <andreas.werner@men.de>
Signed-off-by: Tejun Heo <tj@kernel.org>


# 74a80d67 03-Aug-2015 Tejun Heo <tj@kernel.org>

Revert "libata: Implement NCQ autosense"

This reverts commit 42b966fbf35da9c87f08d98f9b8978edf9e717cf.

As implemented, ACS-4 sense reporting for ATA devices bypasses error
diagnosis and handling in libata degrading EH behavior significantly.
Revert the related changes for now.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Hannes Reinecke <hare@suse.de>
Cc: stable@vger.kernel.org #v4.1+


# 84ded2f8 03-Aug-2015 Tejun Heo <tj@kernel.org>

Revert "libata: Implement support for sense data reporting"

This reverts commit fe7173c206de63fc28475ee6ae42ff95c05692de.

As implemented, ACS-4 sense reporting for ATA devices bypasses error
diagnosis and handling in libata degrading EH behavior significantly.
Revert the related changes for now.

ATA_ID_COMMAND_SET_3/4 constants are not reverted as they're used by
later changes.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Hannes Reinecke <hare@suse.de>
Cc: stable@vger.kernel.org #v4.1+


# fe16d4f2 03-Aug-2015 Tejun Heo <tj@kernel.org>

Revert "libata-eh: Set 'information' field for autosense"

This reverts commit a1524f226a02aa6edebd90ae0752e97cfd78b159.

As implemented, ACS-4 sense reporting for ATA devices bypasses error
diagnosis and handling in libata degrading EH behavior significantly.
Revert the related changes for now.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Hannes Reinecke <hare@suse.de>
Cc: stable@vger.kernel.org #v4.1+


# eab6ee1c 19-May-2015 Martin K. Petersen <martin.petersen@oracle.com>

libata: Fix regression when the NCQ Send and Receive log page is absent

Commit 5d3abf8ff67f ("libata: Fall back to unqueued READ LOG EXT if
the DMA variant fails") allowed us to fall back to the unqueued READ
LOG variant if the queued version failed. However, if the device did
not support the page at all we would end up looping due to a merge
snafu.

Ensure we only take the fallback path once.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Reported-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Tested-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>


# 5d3abf8f 04-May-2015 Martin K. Petersen <martin.petersen@oracle.com>

libata: Fall back to unqueued READ LOG EXT if the DMA variant fails

Some devices advertise support for the READ/WRITE LOG DMA EXT commands
but fail when we try to issue them. This can lead to queued TRIM being
unintentionally disabled since the relevant feature flag is located in a
general purpose log page.

Fall back to unqueued READ LOG EXT if the DMA variant fails while
reading a log page.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Tejun Heo <tj@kernel.org>


# 09c5b480 25-Apr-2015 Gabriele Mazzotta <gabriele.mzt@gmail.com>

libata: Ignore spurious PHY event on LPM policy change

When the LPM policy is set to ATA_LPM_MAX_POWER, the device might
generate a spurious PHY event that cuases errors on the link.
Ignore this event if it occured within 10s after the policy change.

The timeout was chosen observing that on a Dell XPS13 9333 these
spurious events can occur up to roughly 6s after the policy change.

Link: http://lkml.kernel.org/g/3352987.ugV1Ipy7Z5@xps13
Signed-off-by: Gabriele Mazzotta <gabriele.mzt@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: stable@vger.kernel.org


# 255c03d1 27-Mar-2015 Hannes Reinecke <hare@suse.de>

libata: Add tracepoints

Add some tracepoints for ata_qc_issue, ata_qc_complete, and
ata_eh_link_autopsy.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Tejun Heo <tj@kernel.org>


# a1524f22 27-Mar-2015 Hannes Reinecke <hare@suse.de>

libata-eh: Set 'information' field for autosense

If NCQ autosense or the sense data reporting feature is enabled
the LBA of the offending command should be stored in the sense
data 'information' field.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Tejun Heo <tj@kernel.org>


# fe7173c2 27-Mar-2015 Hannes Reinecke <hare@suse.de>

libata: Implement support for sense data reporting

ACS-4 defines a sense data reporting feature set.
This patch implements support for it.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Tejun Heo <tj@kernel.org>


# 42b966fb 27-Mar-2015 Hannes Reinecke <hare@suse.de>

libata: Implement NCQ autosense

Some newer devices support NCQ autosense (cf ACS-4), so we should
be using it to retrieve the sense code and speed up recovery.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Tejun Heo <tj@kernel.org>


# 825e2d87 27-Mar-2015 Hannes Reinecke <hare@suse.de>

libata: whitespace cleanup in ata_get_cmd_descript()

Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Tejun Heo <tj@kernel.org>


# 9faa6438 27-Mar-2015 Hannes Reinecke <hare@suse.de>

libata: use READ_LOG_DMA_EXT

If READ_LOG_DMA_EXT is supported we should try to use it for
reading the log pages.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Tejun Heo <tj@kernel.org>


# a13b0c9d 19-Jan-2015 Hannes Reinecke <hare@suse.de>

libata: fixup oops in ata_eh_link_report()

We should only try to evaluate the cdb if this is an ATAPI
device, for any other device the 'cdb' field and the cdb_len
has no meaning.

Fixes: cbba5b0ee4c6c2fc8b78a21d0900099d480cf2e9
Reported-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Tejun Heo <tj@kernel.org>
Tested-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# cbba5b0e 07-Jan-2015 Hannes Reinecke <hare@suse.de>

libata: use __scsi_format_command()

libata already uses an internal buffer, so we should be using
__scsi_format_command() here.

Tested-by: Robert Elliott <elliott@hp.com>
Reviewed-by: Robert Elliott <elliott@hp.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# 9d9acda9 06-Jan-2015 Nicholas Krause <xerofoify@gmail.com>

libata: Remove FIXME comment in atapi_eh_request_sense

Remove the FIXME comment in atapi_eh_request_sense () asking whether
memset of sense buffer is necessary. The buffer may be partially or
fully filled by the device. We want it to be cleared.

Signed-off-by: Nicholas Krause <xerofoify@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>


# 36aae28e 12-Dec-2014 Andy Shevchenko <andriy.shevchenko@linux.intel.com>

libata: export ata_get_cmd_descript()

The driver sata_dwc_460ex is using this symbol. To build it as a
module we have to have the symbol exported. This patch adds
EXPORT_SYMBOL_GPL() macro for that.

tj: Updated to use EXPORT_SYMBOL_GPL() instead of EXPORT_SYMBOL() as
the only known user is an in-tree driver. Suggested by Sergei.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>


# 9162c657 05-Nov-2014 Hannes Reinecke <hare@suse.de>

libata: Implement ATA_DEV_ZAC

Add new ATA device type for ZAC devices.

Acked-by: Christoph Hellwig <hch@lst.de>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Tejun Heo <tj@kernel.org>


# eec7e1c1 15-Jul-2014 Alexey Asemov <alex@alex-at.ru>

libata: EH should handle AMNF error condition as a media error

libata-eh.c should handle AMNF error condition (error byte bit 0,
usually code 0x01) in libata-eh.c along with UNC as a media error so
SCSI stack can handle it properly (translation code 0x01 is already
present in libata-scsi.c) but was never passed down due to lack of
handling in EH.

While using linux-based machine (AMD 6550M-based notebook, PCI IDs for the
controller are 1022:7801 subsys 1025:059d) and ddrescue to salvage data
from failing hard drive (WD7500BPVT 2.5" 750G SATA2), I've found that pure
AMNF 0x01 error code generates generic "device error" that is retried
several times by SCSI stack instead of "media error" that is passed up to
software.

So we may assume deprecated AMNF error code is surely not dead yet, and
it's better for it to be handled properly. As we may see it is used by
modern enough devices, and used properly: drive returned AMNF only when IDs
for track cannot be read completely due to dying head or positioning,
otherwise it returned UNC(orrectables).

Not handling it causes wrong generic error code ("device error") reporting
down the stack, can damage failing drives further because of excessive
retries, and slows salvaging down a lot. Also, there is handling code in
libata-scsi.c for 0x01 AMNF error already.

https://bugzilla.kernel.org/show_bug.cgi?id=80031

tj: Shortened $SUBJ and moved its content to the first paragraph.

Signed-off-by: Alexey Asemov <alex@alex-at.ru>
Signed-off-by: Tejun Heo <tj@kernel.org>


# bc6e7c4b 14-Mar-2014 Dan Williams <dan.j.williams@intel.com>

libata, libsas: kill pm_result and related cleanup

Tejun says:
"At least for libata, worrying about suspend/resume failures don't make
whole lot of sense. If suspend failed, just proceed with suspend. If
the device can't be woken up afterwards, that's that. There isn't
anything we could have done differently anyway. The same for resume, if
spinup fails, the device is dud and the following commands will invoke
EH actions and will eventually fail. Again, there really isn't any
*choice* to make. Just making sure the errors are handled gracefully
(ie. don't crash) and the following commands are handled correctly
should be enough."

The only libata user that actually cares about the result from a suspend
operation is libsas. However, it only cares about whether queuing a new
operation collides with an in-flight one. All libsas does with the
error is retry, but we can just let libata wait for the previous
operation before continuing.

Other cleanups include:
1/ Unifying all ata port pm operations on an ata_port_pm_ prefix
2/ Marking all ata port pm helper routines as returning void, only
ata_port_pm_ entry points need to fake a 0 return value.
3/ Killing ata_port_{suspend|resume}_common() in favor of calling
ata_port_request_pm() directly
4/ Killing the wrappers that just do a to_ata_port() conversion
5/ Clearly marking the entry points that do async operations with an
_async suffix.

Reference: http://marc.info/?l=linux-scsi&m=138995409532286&w=2

Cc: Phillip Susi <psusi@ubuntu.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Todd Brandt <todd.e.brandt@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Tejun Heo <tj@kernel.org>


# 35bf8821 07-Mar-2014 Dan Williams <dan.j.williams@intel.com>

libata: end the r-word

Prompted by the social effort in the US to discourage usage of the
adjective "retarded".

In this case we needlessly anthropomorphize hard drives. The
implication is that due to design deficiencies in the device reset
recovery time is negatively impacted. We can simply clearly state that
fact. "Exceptional devices cause outliers in reset recovery time." This
steers clear of any unintended comparison of such devices to humans with
cognitive disabilities.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Tejun Heo <tj@kernel.org>


# 462098b0 29-Oct-2013 Levente Kurusa <levex@linux.com>

ata: libata-eh: Remove unnecessary snprintf arithmetic

Remove an unnecessary arithmetic operation from a call to snprintf, because
the size parameter of snprintf includes the trailing null space.
Also, initialize the buffer on definition instead of a memset call.

Signed-off-by: Levente Kurusa <levex@linux.com>
Signed-off-by: Tejun Heo <tj@kernel.org>


# 16735d02 14-Nov-2013 Wolfram Sang <wsa@kernel.org>

tree-wide: use reinit_completion instead of INIT_COMPLETION

Use this new function to make code more comprehensible, since we are
reinitialzing the completion, not initializing.

[akpm@linux-foundation.org: linux-next resyncs]
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Acked-by: Linus Walleij <linus.walleij@linaro.org> (personally at LCE13)
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 3915c3b5 21-Oct-2013 Robert Hancock <hancockrwd@gmail.com>

libata: Add some missing command descriptions

Add some missing command enumerations from the ATA-8 ACS-3 spec into
include/linux/ata.h, and add the corresponding human-readable command
descriptions in libata-eh.c.

Signed-off-by: Robert Hancock <hancockrwd@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>


# f13e2201 07-Aug-2009 Gwendal Grignou <gwendal@google.com>

libata: make ata_eh_qc_retry() bump scmd->allowed on bogus failures

libata EH decrements scmd->retries when the command failed for reasons
unrelated to the command itself so that, for example, commands aborted
due to suspend / resume cycle don't get penalized; however,
decrementing scmd->retries isn't enough for ATA passthrough commands.

Without this fix, ATA passthrough commands are not resend to the
drive, and no error is signalled to the caller because:

- allowed retry count is 1
- ata_eh_qc_complete fill the sense data, so result is valid
- sense data is filled with untouched ATA registers.

Signed-off-by: Gwendal Grignou <gwendal@google.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: stable@vger.kernel.org


# 8c3d3d4b 14-May-2013 Tejun Heo <tj@kernel.org>

libata: update "Maintained by:" tags

Jeff moved on to a greener pasture.

s/Maintained by: Jeff Garzik/Maintained by: Tejun Heo/g

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Jeff Garzik <jgarzik@pobox.com>


# a7ff60db 24-Jan-2013 Aaron Lu <aaron.lu@intel.com>

[libata] pm: differentiate system and runtime pm for ata port

We need to do different things for system PM and runtime PM, e.g. we do
not need to enable runtime wake for ZPODD when we are doing system
suspend, etc.

Currently, we use PMSG_SUSPEND for both system suspend and runtime
suspend and PMSG_ON for both system resume and runtime resume. Change
this by using PMSG_AUTO_SUSPEND for runtime suspend and PMSG_AUTO_RESUME
for runtime resume. And since PMSG_ON means no transition, it is changed
to PMSG_RESUME for ata port's system resume.

The ata_acpi_set_state is modified accordingly, and the sata case and
pata case is seperated for easy reading.

Signed-off-by: Aaron Lu <aaron.lu@intel.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>


# 21334205 15-Jan-2013 Aaron Lu <aaron.lu@intel.com>

libata: handle power transition of ODD

When ata port is runtime suspended, it will check if the ODD attched to
it is a zero power(ZP) capable ODD and if the ZP capable ODD is in zero
power ready state. And if this is not the case, the highest acpi state
will be limited to ACPI_STATE_D3_HOT to avoid powering off the ODD. And
if the ODD can be powered off, runtime wake capability needs to be
enabled and powered_off flag will be set to let resume code knows that
the ODD was in powered off state.

And on resume, before it is powered on, if it was powered off during
suspend, runtime wake capability needs to be disabled. After it is
recovered, the ODD is considered functional, post power on processing
like eject tray if the ODD is drawer type is done, and several ZPODD
related fields will also be reset.

Signed-off-by: Aaron Lu <aaron.lu@intel.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>


# 3dc67440 15-Jan-2013 Aaron Lu <aaron.lu@intel.com>

libata: check zero power ready status for ZPODD

Per the Mount Fuji spec, the ODD is considered zero power ready when:
- For slot type ODD, no media inside;
- For tray type ODD, no media inside and tray closed.

The information can be retrieved by either the returned information of
command GET_EVENT_STATUS_NOTIFICATION(the command is used to poll for
media event) or sense code.

The information provided by the media status byte is not accurate, it
is possible that after a new disc is just inserted, the status byte
still returns media not present. So this information can not be used as
the deciding factor, we use sense code to decide if zpready status is
true.

When we first sensed the ODD in the zero power ready state, the
zp_sampled will be set and timestamp will be recoreded. And after ODD
stayed in this state for some pre-defined period, the ODD is considered
as power off ready and the zp_ready flag will be set. The zp_ready flag
serves as the deciding factor other code will use to see if power off is
OK for the ODD.

The Mount Fuji spec suggests a delay should be used here, to avoid the
case user ejects the ODD and then instantly inserts a new one again, so
that we can avoid a power transition. And some ODDs may be slow to place
its head to the home position after disc is ejected, so a delay here is
generally a good idea. And the delay time can be changed via the module
param zpodd_poweroff_delay.

The zero power ready status check is performed in the ata port's runtime
suspend code path, when port is not frozen yet, as we need to issue some
IOs to the ODD.

Signed-off-by: Aaron Lu <aaron.lu@intel.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>


# 1eaca39a 12-Dec-2012 Bian Yu <bianyu@kedacom.com>

[libata] ahci: Fix lack of command retry after a success error handler.

It should be a mistake introduced by commit 8d899e70c1b3afff.

qc->flags can't be set AC_ERR_*

Signed-off-by: Bian Yu <bianyu@kedacom.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>


# 5416912a 02-Dec-2012 Aaron Lu <aaron.lu@intel.com>

libata: set dma_mode to 0xff in reset

ata_device->dma_mode's initial value is zero, which is not a valid dma
mode, but ata_dma_enabled will return true for this value. This patch
sets dma_mode to 0xff in reset function, so that ata_dma_enabled will
not return true for this case, or it will cause problem for pata_acpi.

The corrsponding bugzilla page is at:
https://bugzilla.kernel.org/show_bug.cgi?id=49151

Reported-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Aaron Lu <aaron.lu@intel.com>
Tested-by: Szymon Janc <szymon@janc.net.pl>
Tested-by: Dutra Julio <dutra.julio@gmail.com>
Acked-by: Alan Cox <alan@linux.intel.com>
Cc: <stable@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>


# 65fe1f0f 07-Sep-2012 Shane Huang <shane.huang@amd.com>

ahci: implement aggressive SATA device sleep support

Device Sleep is a feature as described in AHCI 1.3.1 Technical Proposal.
This feature enables an HBA and SATA storage device to enter the DevSleep
interface state, enabling lower power SATA-based systems.

Aggressive Device Sleep enables the HBA to assert the DEVSLP signal as
soon as there are no commands outstanding to the device and the port
specific Device Sleep idle timer has expired. This enables autonomous
entry into the DevSleep interface state without waiting for software
in power sensitive systems.

This patch enables Aggressive Device Sleep only if both host controller
and device support it.

Tested on AMD reference board together with Device Sleep supported device
sample.

Signed-off-by: Shane Huang <shane.huang@amd.com>
Reviewed-by: Aaron Lu <aaron.lwe@gmail.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>


# ca6d43b0 22-Jun-2012 Dan Williams <dan.j.williams@intel.com>

[SCSI] libata: reset once

Hotplug testing with libsas currently encounters a 55 second wait for
link recovery to give up. In the case where the user trusts the
response time of their devices permit the recovery attempts to be
limited to one.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Jeff Garzik <jgarzik@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>


# 60428407 12-Apr-2012 H Hartley Sweeten <hartleys@visionengravers.com>

libata-eh.c: local functions should not be exposed globally

The function ata_ering_clear_cb is only referenced in this file and
should be marked static to prevent it from being exposed globally.

This quiets the sparse warning:

warning: symbol 'ata_ering_clear_cb' was not declared. Should it be static?

Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>


# e4a9c373 22-Jun-2012 Dan Williams <dan.j.williams@intel.com>

[SCSI] libata, libsas: introduce sched_eh and end_eh port ops

When managing shost->host_eh_scheduled libata assumes that there is a
1:1 shost-to-ata_port relationship. libsas creates a 1:N relationship
so it needs to manage host_eh_scheduled cumulatively at the host level.
The sched_eh and end_eh port port ops allow libsas to track when domain
devices enter/leave the "eh-pending" state under ha->lock (previously
named ha->state_lock, but it is no longer just a lock for ha->state
changes).

Since host_eh_scheduled indicates eh without backing commands pinning
the device it can be deallocated at any time. Move the taking of the
domain_device reference under the port_lock to guarantee that the
ata_port stays around for the duration of eh.

Reviewed-by: Jacek Danecki <jacek.danecki@intel.com>
Acked-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>


# 8d899e70 02-May-2012 Mark Lord <kernel@teksavvy.com>

libata-eh don't waste time retrying media errors (v3)

ATA and SATA drives have had built-in retries for media errors
for as long as they've been commonplace in computers (early 1990s).

When libata stumbles across a bad sector, it can waste minutes
sitting there doing retry after retry before finally giving up
and letting the higher layers deal with it.

This patch removes retries for media errors only.

Signed-off-by: Mark Lord <mlord@pobox.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>


# 6868225e 03-May-2012 Lin Ming <ming.m.lin@intel.com>

libata: skip old error history when counting probe trials

Commit d902747("[libata] Add ATA transport class") introduced
ATA_EFLAG_OLD_ER to mark entries in the error ring as cleared.

But ata_count_probe_trials_cb() didn't check this flag and it still
counts the old error history. So wrong probe trials count is returned
and it causes problem, for example, SATA link speed is slowed down from
3.0Gbps to 1.5Gbps.

Fix it by checking ATA_EFLAG_OLD_ER in ata_count_probe_trials_cb().

Cc: stable <stable@vger.kernel.org> # 2.6.37+
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>


# 81c757bc 02-Dec-2011 Dan Williams <dan.j.williams@intel.com>

[SCSI] libsas: execute transport link resets with libata-eh via host workqueue

Link resets leave ata affiliations intact, so arrange for libsas to make
an effort to avoid dropping the device due to a slow-to-recover link.
Towards this end carry out reset in the host workqueue so that it can
check for ata devices and kick the reset request to libata. Hard
resets, in contrast, bypass libata since they are meant for associating
an ata device with another initiator in the domain (tears down
affiliations).

Need to add a new transport_sas_phy_reset() since the current
sas_phy_reset() is a utility function to libsas lldds. They are not
prepared for it to loop back into eh.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>


# 7a46c078 19-Oct-2011 Gwendal Grignou <gwendal@google.com>

[libata] Issue SRST to Sil3726 PMP

Reenable sending SRST to devices connected behind a Sil3726 PMP.
This allow staggered spinups and handles drives that spins up slowly.

While the drives spin up, the PMP will not accept SRST.
Most controller reissues the reset until the drive is ready, while
some [Sil3124] returns an error.
In ata_eh_error, wait 10s before reset the ATA port and try again.

Signed-off-by: Gwendal Grignou <gwendal@google.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>


# 38789fda 17-Jul-2011 Paul Gortmaker <paul.gortmaker@windriver.com>

ide/ata: Add export.h for EXPORT_SYMBOL/THIS_MODULE where needed

They were getting this implicitly by an include of module.h
from device.h -- but we are going to clean that up and break
that include chain, so include export.h explicitly now.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>


# e8411fba 09-Aug-2011 Sergei Shtylyov <sshtylyov@ru.mvista.com>

libata-eh: ata_eh_followup_srst_needed() does not need 'classes' parameter

... since it does not use it.

Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>


# 8ea7645c 24-May-2011 Tejun Heo <tj@kernel.org>

libata: leave port thawed after reset failure

libata EH intentionally left a port frozen if it failed
ata_eh_reset(). The intention was avoiding continuous loop of resets
when the controller or attached device is flaky and reporting spurious
hotplug events. Once port enters this state, it can be recovered with
manual rescan, which seemed reasonable.

However, outside of my convoluted test setup, there have been very few
reports justifying this choice while there have been more cases where
the automatic freezing of the port after hotplug attempt of a faulty
device caused confusion and led to unnecessary resets.

This patch changes the behavior so that the port is thawed after reset
failure. This change doesn't necessarily solve but makes it easier
and more intuitive to work around hotplug related problems
(ie. re-pluggin or power cycling the device) as reported in the
followings.

https://bugzilla.kernel.org/show_bug.cgi?id=34712
http://thread.gmane.org/gmane.linux.kernel/1123265/focus=49548

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Reartes Guillermo <rtguille@gmail.com>
Reported-by: Bruce Stenning <b.stenning@indigovision.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>


# a9a79dfe 15-Apr-2011 Joe Perches <joe@perches.com>

ata: Convert ata_<foo>_printk(KERN_<LEVEL> to ata_<foo>_<level>

Saves text by removing nearly duplicated text format strings by
creating ata_<foo>_printk functions and printf extension %pV.

ata defconfig size shrinks ~5% (~8KB), allyesconfig ~2.5% (~13KB)

Format string duplication comes from:

#define ata_link_printk(link, lv, fmt, args...) do { \
if (sata_pmp_attached((link)->ap) || (link)->ap->slave_link) \
printk("%sata%u.%02u: "fmt, lv, (link)->ap->print_id, \
(link)->pmp , ##args); \
else \
printk("%sata%u: "fmt, lv, (link)->ap->print_id , ##args); \
} while(0)

Coalesce long formats.

$ size drivers/ata/built-in.*
text data bss dec hex filename
544969 73893 116584 735446 b38d6 drivers/ata/built-in.allyesconfig.ata.o
558429 73893 117864 750186 b726a drivers/ata/built-in.allyesconfig.dev_level.o
141328 14689 4220 160237 271ed drivers/ata/built-in.defconfig.ata.o
149567 14689 4220 168476 2921c drivers/ata/built-in.defconfig.dev_level.o

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>


# 8c56cacc 25-May-2011 Tejun Heo <tj@kernel.org>

libata: fix unexpectedly frozen port after ata_eh_reset()

To work around controllers which can't properly plug events while
reset, ata_eh_reset() clears error states and ATA_PFLAG_EH_PENDING
after reset but before RESET is marked done. As reset is the final
recovery action and full verification of devices including onlineness
and classfication match is done afterwards, this shouldn't lead to
lost devices or missed hotplug events.

Unfortunately, it forgot to thaw the port when clearing EH_PENDING, so
if the condition happens after resetting an empty port, the port could
be left frozen and EH will end without thawing it, making the port
unresponsive to further hotplug events.

Thaw if the port is frozen after clearing EH_PENDING. This problem is
reported by Bruce Stenning in the following thread.

http://thread.gmane.org/gmane.linux.kernel/1123265

stable: I think we should weather this patch a bit longer in -rcX
before sending it to -stable. Please wait at least a month
after this patch makes upstream. Thanks.

-v2: Fixed spelling in the comment per Dave Howorth.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Bruce Stenning <b.stenning@indigovision.com>
Cc: stable@kernel.org
Cc: Dave Howorth <dhoworth@mrc-lmb.cam.ac.uk>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>


# 8a745f1f 04-Mar-2011 Kristen Carlson Accardi <kristen@linux.intel.com>

libata: Power off empty ports

Give users the option of completely powering off unoccupied
SATA ports using the existing min_power link_power_management_policy
option. When the use selects this option on an empty port, we
will power the port off by setting DET to off. For occupied ports,
behavior is unchanged.

Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>


# 5f6f12cc 09-May-2011 Tejun Heo <tj@kernel.org>

libata: fix oops when LPM is used with PMP

ae01b2493c (libata: Implement ATA_FLAG_NO_DIPM and apply it to mcp65)
added ATA_FLAG_NO_DIPM and made ata_eh_set_lpm() check the flag.
However, @ap is NULL if @link points to a PMP link and thus the
unconditional @ap->flags dereference leads to the following oops.

BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
IP: [<ffffffff813f98e1>] ata_eh_recover+0x9a1/0x1510
...
Pid: 295, comm: scsi_eh_4 Tainted: P 2.6.38.5-core2 #1 System76, Inc. Serval Professional/Serval Professional
RIP: 0010:[<ffffffff813f98e1>] [<ffffffff813f98e1>] ata_eh_recover+0x9a1/0x1510
RSP: 0018:ffff880132defbf0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff880132f40000 RCX: 0000000000000000
RDX: ffff88013377c000 RSI: ffff880132f40000 RDI: 0000000000000000
RBP: ffff880132defce0 R08: ffff88013377dc58 R09: ffff880132defd98
R10: 0000000000000000 R11: 00000000ffffffff R12: 0000000000000000
R13: 0000000000000000 R14: ffff88013377c000 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffff8800bf700000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000018 CR3: 0000000001a03000 CR4: 00000000000406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process scsi_eh_4 (pid: 295, threadinfo ffff880132dee000, task ffff880133b416c0)
Stack:
0000000000000000 ffff880132defcc0 0000000000000000 ffff880132f42738
ffffffff813ee8f0 ffffffff813eefe0 ffff880132defd98 ffff88013377f190
ffffffffa00b3e30 ffffffff813ef030 0000000032defc60 ffff880100000000
Call Trace:
[<ffffffff81400867>] sata_pmp_error_handler+0x607/0xc30
[<ffffffffa00b273f>] ahci_error_handler+0x1f/0x70 [libahci]
[<ffffffff813faade>] ata_scsi_error+0x5be/0x900
[<ffffffff813cf724>] scsi_error_handler+0x124/0x650
[<ffffffff810834b6>] kthread+0x96/0xa0
[<ffffffff8100cd64>] kernel_thread_helper+0x4/0x10
Code: 8b 95 70 ff ff ff b8 00 00 00 00 48 3b 9a 10 2e 00 00 48 0f 44 c2 48 89 85 70 ff ff ff 48 8b 8d 70 ff ff ff f6 83 69 02 00 00 01 <48> 8b 41 18 0f 85 48 01 00 00 48 85 c9 74 12 48 8b 51 08 48 83
RIP [<ffffffff813f98e1>] ata_eh_recover+0x9a1/0x1510
RSP <ffff880132defbf0>
CR2: 0000000000000018

Fix it by testing @link->ap->flags instead.

stable: ATA_FLAG_NO_DIPM was added during 2.6.39 cycle but was
backported to 2.6.37 and 38. This is a fix for that and thus
also applicable to 2.6.37 and 38.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: "Nathan A. Mourey II" <nmoureyii@ne.rr.com>
LKML-Reference: <1304555277.2059.2.camel@localhost.localdomain>
Cc: Connor H <cmdkhh@gmail.com>
Cc: stable@kernel.org
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>


# ae01b249 16-Mar-2011 Tejun Heo <tj@kernel.org>

libata: Implement ATA_FLAG_NO_DIPM and apply it to mcp65

NVIDIA mcp65 familiy of controllers cause command timeouts when DIPM
is used. Implement ATA_FLAG_NO_DIPM and apply it.

This problem was reported by Stefan Bader in the following thread.

http://thread.gmane.org/gmane.linux.ide/48841

stable: applicable to 2.6.37 and 38.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Stefan Bader <stefan.bader@canonical.com>
Cc: stable@kernel.org
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>


# 25985edc 30-Mar-2011 Lucas De Marchi <lucas.demarchi@profusion.mobi>

Fix common misspellings

Fixes generated by 'codespell' and manually reviewed.

Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>


# 0e0b494c 23-Jan-2011 James Bottomley <James.Bottomley@suse.de>

libata: separate error handler into usable components

Right at the moment, the libata error handler is incredibly
monolithic. This makes it impossible to use from composite drivers
like libsas and ipr which have to handle error themselves in the first
instance.

The essence of the change is to split the monolithic error handler
into two components: one which handles a queue of ata commands for
processing and the other which handles the back end of readying a
port. This allows the upper error handler fine grained control in
calling libsas functions (and making sure they only get called for ATA
commands whose lower errors have been fixed up).

Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>


# c34aeebc 23-Jan-2011 James Bottomley <James.Bottomley@suse.de>

libata: fix eh locking

The SCSI host eh_cmd_q should be protected by the host lock (not the
port lock). This probably doesn't matter that much at the moment,
since we try to serialise the add and eh pieces, but it might matter
in future for more convenient error handling. Plus this switches
libata to the standard eh pattern where you lock, remove from the cmd
queue to a local list and unlock and then operate on the local list.

Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>


# eb0e85e3 24-Feb-2011 Tejun Heo <tj@kernel.org>

libata: fix hotplug for drivers which don't implement LPM

ata_eh_analyze_serror() suppresses hotplug notifications if LPM is
being used because LPM generates spurious hotplug events. It compared
whether link->lpm_policy was different from ATA_LPM_MAX_POWER to
determine whether LPM is enabled; however, this is incorrect as for
drivers which don't implement LPM, lpm_policy is always
ATA_LPM_UNKNOWN. This disabled hotplug detection for all drivers
which don't implement LPM.

Fix it by comparing whether lpm_policy is greater than
ATA_LPM_MAX_POWER.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: stable@kernel.org
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>


# 64878c0e 23-Jan-2011 James Bottomley <James.Bottomley@suse.de>

[SCSI] libata: separate error handler into usable components

Right at the moment, the libata error handler is incredibly
monolithic. This makes it impossible to use from composite drivers
like libsas and ipr which have to handle error themselves in the first
instance.

The essence of the change is to split the monolithic error handler
into two components: one which handles a queue of ata commands for
processing and the other which handles the back end of readying a
port. This allows the upper error handler fine grained control in
calling libsas functions (and making sure they only get called for ATA
commands whose lower errors have been fixed up).

Cc: Tejun Heo <tj@kernel.org>
Cc: Jeff Garzik <jeff@garzik.org>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>


# 4451ef63 23-Jan-2011 James Bottomley <James.Bottomley@suse.de>

[SCSI] libata: fix eh locking

The SCSI host eh_cmd_q should be protected by the host lock (not the
port lock). This probably doesn't matter that much at the moment,
since we try to serialise the add and eh pieces, but it might matter
in future for more convenient error handling. Plus this switches
libata to the standard eh pattern where you lock, remove from the cmd
queue to a local list and unlock and then operate on the local list.

Cc: Tejun Heo <tj@kernel.org>
Cc: Jeff Garzik <jeff@garzik.org>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>


# e5005b15 09-Dec-2010 Tejun Heo <tj@kernel.org>

libata: issue DIPM enable commands with LPM state updated

Low level drivers may behave differently depending on the current
link->lpm_policy. During ata_eh_set_lpm(), DIPM enable commands are
issued after the successful completion of ap->ops->set_lpm(), which
means that the controller is already in the target state. This causes
DIPM enable commands to be processed with mismatching controller power
state and link->lpm_policy value.

In ahci, link->lpm_policy is used to ignore certain PHY events if LPM
is enabled; however, as DIPM commands are issued with stale
link->lpm_policy, they sometimes end up triggering these conditions
and get aborted leading to LPM configuration failure.

Fix it by updating link->lpm_policy before issuing DIPM enable
commands.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Kyle McMartin <kyle@mcmartin.ca>
Cc: stable@kernel.org
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>


# c0c362b6 06-Sep-2010 Tejun Heo <htejun@gmail.com>

libata: implement cross-port EH exclusion

In libata, the non-EH code paths should always take and release
ap->lock explicitly when accessing hardware or shared data structures.
However, once EH is active, it's assumed that the port is owned by EH
and EH methods don't explicitly take ap->lock unless race from irq
handler or other code paths are expected. However, libata EH didn't
guarantee exclusion among EHs for ports of the same host. IOW,
multiple EHs may execute in parallel on multiple ports of the same
controller.

In many cases, especially in SATA, the ports are completely
independent of each other and this doesn't cause problems; however,
there are cases where different ports share the same resource, which
lead to obscure timing related bugs such as the one fixed by commit
213373cf (ata_piix: fix locking around SIDPR access).

This patch implements exclusion among EHs of the same host. When EH
begins, it acquires per-host EH ownership by calling ata_eh_acquire().
When EH finishes, the ownership is released by calling
ata_eh_release(). EH ownership is also released whenever the EH
thread goes to sleep from ata_msleep() or explicitly and reacquired
after waking up.

This ensures that while EH is actively accessing the hardware, it has
exclusive access to it while allowing EHs to interleave and progress
in parallel as they hit waiting stages, which dominate the time spent
in EH. This achieves cross-port EH exclusion without pervasive and
fragile changes while still allowing parallel EH for the most part.

This was first reported by yuanding02@gmail.com more than three years
ago in the following bugzilla. :-)

https://bugzilla.kernel.org/show_bug.cgi?id=8223

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Reported-by: yuanding02@gmail.com
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>


# 97750ceb 06-Sep-2010 Tejun Heo <tj@kernel.org>

libata: add @ap to ata_wait_register() and introduce ata_msleep()

Add optional @ap argument to ata_wait_register() and replace msleep()
calls with ata_msleep() which take optional @ap in addition to the
duration. These will be used to implement EH exclusion.

This patch doesn't cause any behavior difference.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>


# 6c8ea89c 01-Sep-2010 Tejun Heo <tj@kernel.org>

libata: implement LPM support for port multipliers

Port multipliers can do DIPM on fan-out links fine. Implement support
for it. Tested w/ SIMG 57xx and marvell PMPs. Both the host and
fan-out links enter power save modes nicely.

SIMG 37xx and 47xx report link offline on SStatus causing EH to detach
the devices. Blacklisted.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>


# 6b7ae954 01-Sep-2010 Tejun Heo <tj@kernel.org>

libata: reimplement link power management

The current LPM implementation has the following issues.

* Operation order isn't well thought-out. e.g. HIPM should be
configured after IPM in SControl is properly configured. Not the
other way around.

* Suspend/resume paths call ata_lpm_enable/disable() which must only
be called from EH context directly. Also, ata_lpm_enable/disable()
were called whether LPM was in use or not.

* Implementation is per-port when it should be per-link. As a result,
it can't be used for controllers with slave links or PMP.

* LPM state isn't managed consistently. After a link reset for
whatever reason including suspend/resume the actual LPM state would
be reset leaving ap->lpm_policy inconsistent.

* Generic/driver-specific logic boundary isn't clear. Currently,
libahci has to mangle stuff which libata EH proper should be
handling. This makes the implementation unnecessarily complex and
fragile.

* Tied to ALPM. Doesn't consider DIPM only cases and doesn't check
whether the device allows HIPM.

* Error handling isn't implemented.

Given the extent of mismatch with the rest of libata, I don't think
trying to fix it piecewise makes much sense. This patch reimplements
LPM support.

* The new implementation is per-link. The target policy is still
port-wide (ap->target_lpm_policy) but all the mechanisms and states
are per-link and integrate well with the rest of link abstraction
and can work with slave and PMP links.

* Core EH has proper control of LPM state. LPM state is reconfigured
when and only when reconfiguration is necessary. It makes sure that
LPM state is reset when probing for new device on the link.
Controller agnostic logic is now implemented in libata EH proper and
driver implementation only has to deal with controller specifics.

* Proper error handling. LPM config failure is attributed to the
device on the link and LPM is disabled for the link if it fails
repeatedly.

* ops->enable/disable_pm() are replaced with single ops->set_lpm()
which takes @policy and @hints. This simplifies driver specific
implementation.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>


# c93b263e 01-Sep-2010 Tejun Heo <tj@kernel.org>

libata: clean up lpm related symbols and sysfs show/store functions

Link power management related symbols are in confusing state w/ mixed
usages of lpm, ipm and pm. This patch cleans up lpm related symbols
and sysfs show/store functions as follows.

* lpm states - NOT_AVAILABLE, MIN_POWER, MAX_PERFORMANCE and
MEDIUM_POWER are renamed to ATA_LPM_UNKNOWN and
ATA_LPM_{MIN|MAX|MED}_POWER.

* Pre/postfixes are unified to lpm.

* sysfs show/store functions for link_power_management_policy were
curiously named get/put and unnecessarily complex. Renamed to
show/store and simplified.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>


# d9027470 25-May-2010 Gwendal Grignou <gwendal@google.com>

[libata] Add ATA transport class

This is a scheleton for libata transport class.
All information is read only, exporting information from libata:
- ata_port class: one per ATA port
- ata_link class: one per ATA port or 15 for SATA Port Multiplier
- ata_device class: up to 2 for PATA link, usually one for SATA.

Signed-off-by: Gwendal Grignou <gwendal@google.com>
Reviewed-by: Grant Grundler <grundler@google.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>


# e2f3d75f 07-Sep-2010 Tejun Heo <htejun@gmail.com>

libata: skip EH autopsy and recovery during suspend

For some mysterious reason, certain hardware reacts badly to usual EH
actions while the system is going for suspend. As the devices won't
be needed until the system is resumed, ask EH to skip usual autopsy
and recovery and proceed directly to suspend.

Signed-off-by: Tejun Heo <tj@kernel.org>
Tested-by: Stephan Diestelhorst <stephan.diestelhorst@amd.com>
Cc: stable@kernel.org
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>


# acad76272 05-Jul-2010 FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>

[libata] add ATA_CMD_DSM to ata_get_cmd_descript

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>


# ad72cf98 02-Jul-2010 Tejun Heo <tj@kernel.org>

libata: take advantage of cmwq and remove concurrency limitations

libata has two concurrency related limitations.

a. ata_wq which is used for polling PIO has single thread per CPU. If
there are multiple devices doing polling PIO on the same CPU, they
can't be executed simultaneously.

b. ata_aux_wq which is used for SCSI probing has single thread. In
cases where SCSI probing is stalled for extended period of time
which is possible for ATAPI devices, this will stall all probing.

#a is solved by increasing maximum concurrency of ata_wq. Please note
that polling PIO might be used under allocation path and thus needs to
be served by a separate wq with a rescuer.

#b is solved by using the default wq instead and achieving exclusion
via per-port mutex.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Jeff Garzik <jgarzik@pobox.com>


# fe06e5f9 10-May-2010 Tejun Heo <tj@kernel.org>

libata-sff: separate out BMDMA EH

Some of error handling logic in ata_sff_error_handler() and all of
ata_sff_post_internal_cmd() are for BMDMA. Create
ata_bmdma_error_handler() and ata_bmdma_post_internal_cmd() and move
BMDMA part into those.

While at it, change DMA protocol check to ata_is_dma(), fix
post_internal_cmd to call ap->ops->bmdma_stop instead of directly
calling ata_bmdma_stop() and open code hardreset selection so that
ata_std_error_handler() doesn't have to know about sff hardreset.

As these two functions are BMDMA specific, there's no reason to check
for bmdma_addr before calling bmdma methods if the protocol of the
failed command is DMA. sata_mv and pata_mpc52xx now don't need to set
.post_internal_cmd to ATA_OP_NULL and pata_icside and sata_qstor don't
need to set it to their bmdma_stop routines.

ata_sff_post_internal_cmd() becomes noop and is removed.

This fixes p3 described in clean-up-BMDMA-initialization patch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>


# c429137a 10-May-2010 Tejun Heo <tj@kernel.org>

libata-sff: port_task is SFF specific

port_task is tightly bound to the standard SFF PIO HSM implementation.
Using it for any other purpose would be error-prone and there's no
such user and if some drivers need such feature, it would be much
better off using its own. Move it inside CONFIG_ATA_SFF and rename it
to sff_pio_task.

The only function which is exposed to the core layer is
ata_sff_flush_pio_task() which is renamed from ata_port_flush_task()
and now also takes care of resetting hsm_task_state to HSM_ST_IDLE,
which is possible as it's now specific to PIO HSM.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>


# a09bf4cd 22-Apr-2010 Jeff Garzik <jeff@garzik.org>

libata: ensure NCQ error result taskfile is fully initialized
before returning it via qc->result_tf.

Cc: stable@kernel.org
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>


# fa41efda 14-Apr-2010 Tejun Heo <tj@kernel.org>

libata: fix locking around blk_abort_request()

blk_abort_request() expectes queue lock to be held by the caller.
Grab it before calling the function.

Lack of this synchronization led to infinite loop on corrupt
q->timeout_list.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: stable@kernel.org
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>


# 534ead70 14-Jan-2010 Tejun Heo <tj@kernel.org>

libata: retry FS IOs even if it has failed with AC_ERR_INVALID

libata currently doesn't retry if a command fails with AC_ERR_INVALID
assuming that retrying won't get it any further even if retried.
However, a failure may be classified as invalid through hardware
glitch (incorrect reading of the error register or firmware bug) and
there isn't whole lot to gain by not retrying as actually invalid
commands will be failed immediately. Also, commands serving FS IOs
are extremely unlikely to be invalid. Retry FS IOs even if it's
marked invalid.

Transient and incorrect invalid failure was seen while debugging
firmware related issue on Samsung n130 on bko#14314.

http://bugzilla.kernel.org/show_bug.cgi?id=14314

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Johannes Stezenbach <js@sig21.net>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>


# 6013efd8 18-Nov-2009 Tejun Heo <tj@kernel.org>

libata: retry failed FLUSH if device didn't fail it

If ATA device failed FLUSH, it means that the device failed to write
out some amount of data and the error needs to be reported to upper
layers. As retries can't recover the lost data, FLUSH failures need to
be reported immediately in general.

However, if FLUSH fails due to transmission errors, the FLUSH needs to
be retried; otherwise, filesystems may switch to RO mode and/or raid
array may drop a drive for a random transmission glitch.

This condition can be rather easily reproduced on certain ahci
controllers which go through a PHY event after powersave mode switch +
ext4 combination. Powersave mode switch is often closely followed by
flush from the filesystem failing the FLUSH with ATA bus error which
makes the filesystem code believe that data is lost and drop to RO
mode. This was reported in the following bugzilla bug.

http://bugzilla.kernel.org/show_bug.cgi?id=14543

This patch makes libata EH retry FLUSH if it wasn't failed by the
device.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Andrey Vihrov <andrey.vihrov@gmail.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>


# 4f7c2874 15-Oct-2009 Tejun Heo <tj@kernel.org>

libata: fix PMP initialization

Commit 842faa6c1a1d6faddf3377948e5cf214812c6c90 fixed error handling
during attach by not committing detected device class to dev->class
while attaching a new device. However, this change missed the PMP
class check in the configuration loop causing a new PMP device to go
through ata_dev_configure() as if it were an ATA or ATAPI device.

As PMP device doesn't have a regular IDENTIFY data, this makes
ata_dev_configure() tries to configure a PMP device using an invalid
data. For the most part, it wasn't too harmful and went unnoticed but
this ends up clearing dev->flags which may have ATA_DFLAG_AN set by
sata_pmp_attach(). This means that SATA_PMP_FEAT_NOTIFY ends up being
disabled on PMPs and on PMPs which honor the flag breaks hotplug
support.

This problem was discovered and reported by Ethan Hsiao.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Ethan Hsiao <ethanhsiao@jmicron.com>
Cc: stable@kernel.org
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>


# 3b761d3d 06-Oct-2009 Tejun Heo <tj@kernel.org>

libata: fix incorrect link online check during probe

While trying to work around spurious detection retries for
non-existent devices on slave links, commit
816ab89782ac139a8b65147cca990822bb7e8675 incorrectly added link
offline check logic before ata_eh_thaw() was called. This means that
if an occupied link goes down briefly at the time that offline check
was performed, device class will be cleared to ATA_DEV_NONE and libata
wouldn't retry thus failing detection of the device.

The offline check should be done after the port is thawed together
with online check so that such link glitches can be detected by the
interrupt handler and handled properly.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Tim Blechmann <tim@klingt.org>
Cc: stable@kernel.org
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>


# 6521148c 14-Jul-2009 Robert Hancock <hancockrwd@gmail.com>

libata: add command name parsing for error output

This patch improve libata's output for error/notification messages
to allow easier comprehension and debugging:

When ATAPI commands issued through the SCSI layer fail, use SCSI
functions to print the CDB in human-readable form instead of just
dumping out the CDB in hex.

Print out the name of the failed command (as defined by the ATA
specification) in error handling output along with the raw register
contents.

When reporting status of ACPI taskfile commands executed on resume,
also output the names of the commands being executed (or not) in
readable form.

Since the extra data for printing command names increases kernel
size slightly, a config option has been added to allow disabling
command name output (as well as some of the error register parsing)
for those highly sensitive to kernel text size.

Signed-off-by: Robert Hancock <hancockrwd@gmail.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>


# 1e641060 16-Jul-2009 Tejun Heo <tj@kernel.org>

libata: clear eh_info on reset completion

Resets are done with port frozen but some controllers still issue
interrupts during reset and they may end up recording error conditions
in ehi leading to unnecessary EH retrials.

This patch makes ata_eh_reset() clear ehi on reset completion. As
reset is the most severe recovery action, there's nothing to lose by
clearing ehi on its completion.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Zdenek Kaspar <zkaspar82@gmail.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>


# 54c38444 07-Apr-2009 Jeff Garzik <jeff@garzik.org>

[libata] EH: freeze port before aborting commands

Call the ->freeze() hook before aborting qc's, because some hardware
requires special handling prior to accessing the taskfile registers
(for diagnosis/analysis/reset). Most notably, hardware may wish to
disable the DMA engine or interrupts in the ->freeze() hook.

Signed-off-by: Jeff Garzik <jgarzik@redhat.com>


# 705d2014 26-Jul-2009 Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>

libata: add missing NULL pointer check to ata_eh_reset()

drivers/ata/libata-eh.c +2403 ata_eh_reset(80) warning: variable derefenced before check 'slave'

Please note that this is _not_ a real bug at the moment since ata_eh_context
structure is embedded into ata_list structure and the code alwas checks for
'slave' before accessing 'sehc'.

Anyway lets add missing check and always have a valid 'sehc' pointer (which
makes code easier to understand and prevents introducing some possible bugs
in the future).

Reported-by: Dan Carpenter <error27@gmail.com>
Cc: corbet@lwn.net
Cc: eteo@redhat.com
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>


# fe2c4d01 07-Jul-2009 Tejun Heo <tj@kernel.org>

libata: fix follow-up SRST failure path

ata_eh_reset() was missing error return handling after follow-up SRST
allowing EH to continue the normal probing path after reset failure.
This was discovered while testing new WD 2TB drives which take longer
than 10 secs to spin up and cause the first follow-up SRST to time
out.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>


# 98a1708d 22-Apr-2009 Martin Olsson <martin@minimum.se>

trivial: fix typos s/paramter/parameter/ and s/excute/execute/ in documentation and source comments.

Signed-off-by: Martin Olsson <martin@minimum.se>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>


# 6f9c1ea2 22-Apr-2009 Tejun Heo <tj@kernel.org>

libata: clear ering on resume

Error timestamps are in jiffies which doesn't run while suspended and
PHY events during resume isn't too uncommon. When the two are
combined, it can lead to unnecessary speed downs if the machine is
suspended and resumed repeatedly. Clear error history on resume.

This was reported and verified in bnc#486803 by Vladimir Botka.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Vladimir Botka <vbotka@novell.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>


# 842faa6c 09-May-2009 Tejun Heo <tj@kernel.org>

libata: fix attach error handling

New device attach path in ata_eh_revalidate_and_attach() is divided
into two separate loops because ATA requires IDENTIFY to be issued to
slave first while the user expects to see device probe messages from
the master device. new_mask is used to track which devices are the
new ones between the first loop and the second.

This usually works well but if an error occurs during configuration
stage, ata_dev_revalidate_and_attach() returns with error code and
forgets new_mask. On the retry run, dev->class is set and new_mask
for the device is clear, so the device just gets revalidated and thus
ends up skipping post-configuration procedure including scheduling of
SCSI_HOTPLUG for the device. When this occurs, ATA part of probing
works fine but SCSI probing usually doesn't happen and makes the
device unreachable.

The behavior has been around for a very long time but it has been
uncovered with the recent addition of 1_5_GBPS horkage which uses
-EAGAIN return value from ata_dev_configure() to restart the probing
sequence after forcing cable speed.

This can be fixed by making sure dev->class is permanently set only
after all configurations are successfully complete. Fix it.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Tim Connors <tconnors+linuxkml@astro.swin.edu.au>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>


# c96f1732 24-Mar-2009 Alan Cox <alan@redhat.com>

[libata] Improve timeout handling

On a timeout call a device specific handler early in the recovery so that
we can complete and process successful commands which timed out due to IRQ
loss or the like rather more elegantly.

[Revised to exclude the timeout handling on a few devices that inherit from
SFF but are not SFF enough to use the default timeout handler]

Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>


# d6515e6f 03-Mar-2009 Tejun Heo <tj@kernel.org>

libata: make sure port is thawed when skipping resets

When SCR access is available and the link is offline, softreset is
skipped as it only wastes time and some controllers don't respond very
well. However, the skip path forgot to thaw the port, which not only
blocks further event notification from the port but also causes
repeated EH invocations on the same event on drivers which rely on
->thaw() to clear events if the IRQ is shared with another device or
port.

This problem has always been there but is uncovered by recent sata_nv
nf2/3 change which dropped hardreset support while maintaining SCR
access. nf2/3 doesn't clear hotplug event mask from the interrupt
handler but relies on ->thaw() to clear them. When the hardreset was
there, the reset action was never skipped and the port was always
thawed but, with the hardreset gone, ->prereset() determines that
there's no need for softreset and both ->softreset() and ->thaw() are
skipped. This leads to stuck hotplug event in the IRQ status register
triggering hotplug event whenever IRQ is delieverd on the same IRQ.
As the controller shares the same IRQ for both ports, this happens on
every IO if one port is occpupied and the other isn't.

This patch fixes the problem by making sure that the port is thawed on
reset-skip path.

bko#11615 reports this problem.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Robert Hancock <hancockrwd@gmail.com>
Reported-by: Dan Andresan <danyer@gmail.com>
Reported-by: Arne Woerner <arne_woerner@yahoo.com>
Reported-by: Stefan Lippers-Hollmann <s.L-H@gmx.de>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>


# b5357081 02-Mar-2009 Tejun Heo <tj@kernel.org>

libata: don't use on-stack sense buffer

sense_buffer is used as DMA target and shouldn't be allocated on
stack. Use ap->sector_buf instead. This problem is spotted by Chuck
Ebbert.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Chuck Ebbert <cebbert@redhat.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>


# cf9a590a 29-Jan-2009 Tejun Heo <tj@kernel.org>

libata: add no penalty retry request for EH device handling routines

Let -EAGAIN from EH device handling routines trigger EH retry without
consuming its tries count. This will be used to implement link SPD
horkage which requires hardreset to adjust SPD without affecting other
EH decisions. As it bypasses the forward progress guarantee provided
by the tries count, the requester is responsible for ensuring forward
progress.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>


# c2c7a89c 29-Jan-2009 Tejun Heo <tj@kernel.org>

libata: improve probe failure handling

When link is flaky at high speed, it isn't uncommon for a device to
repeatedly fail probing sequence early after successfully negotiating
high link speed. This often leads to consecutive hotplug events
without successful probing.

This patch improves libata EH such that it remembers probing trials
and if there have been more than two unsuccessful trials in the past
60 seconds, slows down link speed to 1.5Gbps.

As link speed negotiation is the duty of the PHY layer proper, the
goal of this fallback mechanism is to provide the last resort when
everything else fails, which unfortunately happens not too
infrequently, so no fancy 6->3->1.5 speeding down or highest
successful transmission speed seen kind of logics (yet).

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>


# a07d499b 29-Jan-2009 Tejun Heo <tj@kernel.org>

libata: add @spd_limit to sata_down_spd_limit()

Add @spd_limit to sata_down_spd_limit() so that the caller can specify
the SPD limit it wants. This parameter doesn't get in the way even
when it's too low. The closest possible limit is applied.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>


# 99cf610a 29-Jan-2009 Tejun Heo <tj@kernel.org>

libata: clear dev->ering in smarter way

dev->ering used to be cleared together with the rest of ata_device in
ata_dev_init() which is called whenever a probing event occurs.
dev->ering is about to be used to track probing failures so it needs
to remain persistent over multiple porbing events. This patch
achieves this by doing the following.

* Instead of CLEAR_OFFSET, define CLEAR_BEGIN and CLEAR_END and only
clear between BEGIN and END. ering is moved after END. The split
of persistent area is to allow hotter items remain at the head.

* ering is explicitly cleared on ata_dev_disable() and when device
attach succeeds. So, ering is persistent throug a device's life
time (unless explicitly cleared of course) and also through periods
inbetween disablement of an attached device and successful detection
of the next one.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>


# 678afac6 29-Jan-2009 Tejun Heo <tj@kernel.org>

libata: move ata_dev_disable() to libata-eh.c

ata_dev_disable() is about to be more tightly integrated into EH
logic. Move it to libata-eh.c.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>


# d89293ab 29-Jan-2009 Tejun Heo <tj@kernel.org>

libata: fix EH device failure handling

The dev->pio_mode > XFER_PIO_0 test is there to avoid unnecessary
speed down warning messages but it accidentally disabled SATA link spd
down during configuration phase after reset where PIO mode is always
zero.

This patch fixes the problem by moving the test where it belongs.
This makes libata probing sequence behave better when the connection
is flaky at higher link speeds which isn't too uncommon for eSATA
devices.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>


# ece180d1 03-Nov-2008 Tejun Heo <tj@kernel.org>

libata: perform port detach in EH

ata_port_detach() first made sure EH saw ATA_PFLAG_UNLOADING and then
assumed EH context belongs to it and performed detach operation
itself. However, UNLOADING doesn't disable all of EH and this could
lead to problems including triggering WARN_ON()'s in EH path.

This patch makes port detach behave more like other EH actions such
that ata_port_detach() requests EH to detach and waits for completion.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>


# 1eca4365 03-Nov-2008 Tejun Heo <tj@kernel.org>

libata: beef up iterators

There currently are the following looping constructs.

* __ata_port_for_each_link() for all available links
* ata_port_for_each_link() for edge links
* ata_link_for_each_dev() for all devices
* ata_link_for_each_dev_reverse() for all devices in reverse order

Now there's a need for looping construct which is similar to
__ata_port_for_each_link() but iterates over PMP links before the host
link. Instead of adding another one with long name, do the following
cleanup.

* Implement and export ata_link_next() and ata_dev_next() which take
@mode parameter and can be used to build custom loop.
* Implement ata_for_each_link() and ata_for_each_dev() which take
looping mode explicitly.

The following iteration modes are implemented.

* ATA_LITER_EDGE : loop over edge links
* ATA_LITER_HOST_FIRST : loop over all links, host link first
* ATA_LITER_PMP_FIRST : loop over all links, PMP links first

* ATA_DITER_ENABLED : loop over enabled devices
* ATA_DITER_ENABLED_REVERSE : loop over enabled devices in reverse order
* ATA_DITER_ALL : loop over all devices
* ATA_DITER_ALL_REVERSE : loop over all devices in reverse order

This change removes exlicit device enabledness checks from many loops
and makes it clear which ones are iterated over in which direction.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>


# 19b72321 04-Nov-2008 Tejun Heo <tj@kernel.org>

libata: fix last_reset timestamp handling

ehc->last_reset is used to ensure that resets are not issued too
close to each other. It's initialized to jiffies minus one minute
on EH entry. However, when new links are initialized after PMP is
probed, new links have zero for this timestamp resulting in long wait
depending on the current jiffies.

This patch makes last_set considered iff ATA_EHI_DID_RESET is set, in
which case last_reset is always initialized. As an added precaution,
WARN_ON() is added so that warning is printed if last_reset is
in future.

This problem is spotted and debugged by Shane Huang.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Shane Huang <Shane.Huang@amd.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>


# 90484ebf 26-Oct-2008 Tejun Heo <tj@kernel.org>

libata: clear saved xfer_mode and ncq_enabled on device detach

libata EH saves xfer_mode and ncq_enabled at start to later set
DUBIOUS_XFER flag if it has changed. These values need to be cleared
on device detach such that hot device swap doesn't accidentally miss
DUBIOUS_XFER.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>


# 4a9c7b33 27-Oct-2008 Tejun Heo <tj@kernel.org>

libata: fix device iteration bugs

There were several places where only enabled devices should be
iterated over but device enabledness wasn't checked.

* IDENTIFY data 40 wire check in cable_is_40wire()
* xfer_mode/ncq_enabled saving in ata_scsi_error()
* DUBIOUS_XFER handling in ata_set_mode()

While at it, reformat comments in cable_is_40wire().

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>


# 816ab897 21-Oct-2008 Tejun Heo <tj@kernel.org>

libata: set device class to NONE if phys_offline

Reset methods don't have access to phys link status for slave links
and may incorrectly indicate device presence causing unnecessary probe
failures for unoccupied links. This patch clears device class to NONE
during post-reset processing if phys link is offline.

As on/offlineness semantics is strictly defined and used in multiple
places by the core layer, this won't change behavior for drivers which
don't use slave links.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>


# a568d1d2 21-Oct-2008 Tejun Heo <tj@kernel.org>

libata-eh: fix slave link EH action mask handling

Slave link action mask is transferred to master link and all the EH
actions are taken by the master link. ata_eh_about_to_do() and
ata_eh_done() are called with ATA_EH_ALL_ACTIONS to clear the slave
link actions during transfer. This always sets ATA_PFLAG_RECOVERED
flag causing spurious "EH complete" messages.

Don't set ATA_PFLAG_RECOVERED for slave link actions.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>


# 848e4c68 20-Oct-2008 Tejun Heo <tj@kernel.org>

libata: transfer EHI control flags to slave ehc.i

ATA_EHI_NO_AUTOPSY and ATA_EHI_QUIET are used to control the behavior
of EH. As only the master link is visible outside EH, these flags are
set only for the master link although they should also apply to the
slave link, which causes spurious EH messages during probe and
suspend/resume.

This patch transfers those two flags to slave ehc.i before performing
slave autopsy and reporting.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>


# 242f9dcb 14-Sep-2008 Jens Axboe <jens.axboe@oracle.com>

block: unify request timeout handling

Right now SCSI and others do their own command timeout handling.
Move those bits to the block layer.

Instead of having a timer per command, we try to be a bit more clever
and simply have one per-queue. This avoids the overhead of having to
tear down and setup a timer for each command, so it will result in a lot
less timer fiddling.

Signed-off-by: Mike Anderson <andmike@linux.vnet.ibm.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>


# 11fc33da 30-Aug-2008 Tejun Heo <htejun@gmail.com>

libata-eh: clear UNIT ATTENTION after reset

Resets make ATAPI devices raise UNIT ATTENTION which fails the next
command. As resets can happen asynchronously for unrelated reasons,
this sometimes disrupts innocent users. For example, reading DVD
fails after the system wakes up from suspend or the other device
sharing the channel went through bus error.

Clearing UA has some problems as it might clear UA which the userland
needs to know about. However, UA after resets can only be about the
reset itself and benefits of clearing it overweights cons. Missing UA
can only delay failure to one of the following commands anyway. For
example, timeout while burning is in progress will trigger reset and
reset the device state and probably corrupt the burning run. Although
the userland application won't get the UA, its pending writes will
fail.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>


# 45fabbb7 21-Sep-2008 Elias Oltmanns <eo@nebensachen.de>

libata: Implement disk shock protection support

On user request (through sysfs), the IDLE IMMEDIATE command with UNLOAD
FEATURE as specified in ATA-7 is issued to the device and processing of
the request queue is stopped thereafter until the specified timeout
expires or user space asks to resume normal operation. This is supposed
to prevent the heads of a hard drive from accidentally crashing onto the
platter when a heavy shock is anticipated (like a falling laptop
expected to hit the floor). In fact, the whole port stops processing
commands until the timeout has expired in order to avoid any resets due
to failed commands on another device.

Signed-off-by: Elias Oltmanns <eo@nebensachen.de>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>


# b1c72916 31-Jul-2008 Tejun Heo <tj@kernel.org>

libata: implement slave_link

Explanation taken from the comment of ata_slave_link_init().

In libata, a port contains links and a link contains devices. There
is single host link but if a PMP is attached to it, there can be
multiple fan-out links. On SATA, there's usually a single device
connected to a link but PATA and SATA controllers emulating TF based
interface can have two - master and slave.

However, there are a few controllers which don't fit into this
abstraction too well - SATA controllers which emulate TF interface
with both master and slave devices but also have separate SCR
register sets for each device. These controllers need separate links
for physical link handling (e.g. onlineness, link speed) but should
be treated like a traditional M/S controller for everything else
(e.g. command issue, softreset).

slave_link is libata's way of handling this class of controllers
without impacting core layer too much. For anything other than
physical link handling, the default host link is used for both master
and slave. For physical link handling, separate @ap->slave_link is
used. All dirty details are implemented inside libata core layer.
From LLD's POV, the only difference is that prereset, hardreset and
postreset are called once more for the slave link, so the reset
sequence looks like the following.

prereset(M) -> prereset(S) -> hardreset(M) -> hardreset(S) ->
softreset(M) -> postreset(M) -> postreset(S)

Note that softreset is called only for the master. Softreset resets
both M/S by definition, so SRST on master should handle both (the
standard method will work just fine).

As slave_link excludes PMP support and only code paths which deal with
the attributes of physical link are affected, all the changes are
localized to libata.h, libata-core.c and libata-eh.c.

* ata_is_host_link() updated so that slave_link is considered as host
link too.

* iterator extended to iterate over the slave_link when using the
underbarred version.

* force param handling updated such that devno 16 is mapped to the
slave link/device.

* ata_link_on/offline() updated to return the combined result from
master and slave link. ata_phys_link_on/offline() are the direct
versions.

* EH autopsy and report are performed separately for master slave
links. Reset is udpated to implement the above described reset
sequence.

Except for reset update, most changes are minor, many of them just
modifying dev->link to ata_dev_phys_link(dev) or using phys online
test instead.

After this update, LLDs can take full advantage of per-dev SCR
registers by simply turning on slave link.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>


# da0e21d3 31-Jul-2008 Tejun Heo <tj@kernel.org>

libata: use ata_link_printk() when printing SError

SError belongs to link not port. Use ata_link_printk() to print it.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>


# 5dbfc9cb 31-Jul-2008 Tejun Heo <tj@kernel.org>

libata: always do follow-up SRST if hardreset returned -EAGAIN

As an optimization, follow-up SRST used to be skipped if
classification wasn't requested even when hardreset requested it via
-EAGAIN. However, some hardresets can't wait for device readiness and
skipping SRST can cause timeout or other failures during revalidation.
Always perform follow-up SRST if hardreset returns -EAGAIN. This
makes reset paths more predictable and thus less error-prone.

While at it, move hardreset error checking such that it's done right
after hardreset is finished. This simplifies followup SRST condition
check a bit and makes the reset path easier to modify.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>


# a674050e 31-Jul-2008 Tejun Heo <tj@kernel.org>

libata: fix EH action overwriting in ata_eh_reset()

ehc->i.action got accidentally overwritten to ATA_EH_HARD/SOFTRESET in
ata_eh_reset(). The original intention was to clear reset action
which wasn't selected. This can cause unexpected behavior when other
EH actions are scheduled together with reset. Fix it.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>


# 05944bdf 13-Aug-2008 Tejun Heo <tj@kernel.org>

libata: implement no[hs]rst force params

Implement force params nohrst, nosrst and norst. This is to work
around reset related problems and ease debugging.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>


# 3eabddb8 10-Jun-2008 Tejun Heo <htejun@gmail.com>

libata-eh: update atapi_eh_request_sense() to take @dev instead of @qc

Update atapi_eh_request_sense() to take @dev, @sense_buf and
@dfl_sense_key instead of taking @qc and extracting information from
it. This change is to make the function more generic and allow it to
be called from other places.

While at it, make cdb initialization use initializer.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>


# 87fbc5a0 19-May-2008 Tejun Heo <htejun@gmail.com>

libata: improve EH internal command timeout handling

ATA_TMOUT_INTERNAL which was 30secs were used for all internal
commands which is way too long when something goes wrong. This patch
implements command type based stepped timeouts. Different command
types can use different timeouts and each command type can use
different timeout values after timeouts.

ie. the initial timeout is set to a value which should cover most of
the cases but not too long so that run away cases don't delay things
too much. After the first try times out, the second try can use
longer timeout and if that one times out too, it can go for full 30sec
timeout.

IDENTIFYs use 5s - 10s - 30s timeout and all other commands use 5s -
10s timeouts.

This patch significantly cuts down the needed time to handle failure
cases while still allowing libata to work with nut job devices through
retries.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>


# d8af0eb6 19-May-2008 Tejun Heo <htejun@gmail.com>

libata: use ULONG_MAX to terminate reset timeout table

This doesn't introduce any functional changes. This is to make reset
timeout table consistent with to-be-added command timeout tables.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>


# 0a2c0f56 19-May-2008 Tejun Heo <htejun@gmail.com>

libata: improve EH retry delay handling

EH retries were delayed by 5 seconds to ensure that resets don't occur
back-to-back. However, this 5 second delay is superflous or excessive
in many cases. For example, after IDENTIFY times out, there's no
reason to wait five more seconds before retrying.

This patch adds ehc->last_reset timestamp and record the timestamp for
the last reset trial or success and uses it to space resets by
ATA_EH_RESET_COOL_DOWN which is 5 secs and removes unconditional 5 sec
sleeps.

As this change makes inter-try waits often shorter and they're
redundant in nature, this patch also removes the "retrying..."
messages.

While at it, convert explicit rounding up division to DIV_ROUND_UP().

This change speeds up EH in many cases w/o sacrificing robustness.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>


# 341c2c95 19-May-2008 Tejun Heo <htejun@gmail.com>

libata: consistently use msecs for time durations

libata has been using mix of jiffies and msecs for time druations.
This is getting confusing. As writing sub HZ values in jiffies is
PITA and msecs_to_jiffies() can't be used as initializer, unify unit
for all time durations to msecs. So, durations are in msecs and
deadlines are in jiffies. ata_deadline() is added to compute deadline
from a start time and duration in msecs.

While at it, drop now superflous _msec suffix from arguments and
rename @timeout to @deadline if it represents a fixed point in time
rather than duration.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>


# e0614db2 18-May-2008 Tejun Heo <htejun@gmail.com>

libata: ignore recovered PHY errors

No reason to get overzealous about recovered comm and data errors.
Some PHYs habitually sets them w/o no good reason and being draconian
about these soft error conditions doesn't seem to help anybody.

If need ever rises, we might need to add soft PHY error condition, say
AC_ERR_MAYBE_ATA_BUS and use it only to determine whether speed down
is necessary but I don't think that's very likely to happen. It's far
more likely we'll get timeouts or fatal transmission errors if
recovered errors are so prominent that they hamper operation.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>


# f046519f 18-May-2008 Tejun Heo <htejun@gmail.com>

libata: kill hotplug related race condition

Originally, whole reset processing was done while the port is frozen
and SError was cleared during @postreset(). This had two race
conditions. 1: hotplug could occur after reset but before SError is
cleared and libata won't know about it. 2: hotplug could occur after
all the reset is complete but before the port is thawed. As all
events are cleared on thaw, the hotplug event would be lost.

Commit ac371987a81c61c2efbd6931245cdcaf43baad89 kills the first race
by clearing SError during link resume but before link onlineness test.
However, this doesn't fix race #2 and in some cases clearing SError
after SRST is a good idea.

This patch solves this problem by cross checking link onlineness with
classification result after SError is cleared and port is thawed.
Reset is retried if link is online but all devices attached to the
link are unknown. As all devices will be revalidated, this one-way
check is enough to ensure that all devices are detected and
revalidated reliably.

This, luckily, also fixes the cases where host controller returns
bogus status while harddrive is spinning up after hotplug making
classification run before the device sends the first FIS and thus
causes misdetection.

Low level drivers can bypass the logic by setting class explicitly to
ATA_DEV_NONE if ever necessary (currently none requires this).

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>


# dc98c32c 18-May-2008 Tejun Heo <htejun@gmail.com>

libata: move reset freeze/thaw handling into ata_eh_reset()

Previously reset freeze/thaw handling lived outside of ata_eh_reset()
mainly because the original PMP reset code needed the port frozen
while resetting all the fan-out ports, which is no longer the case.

This patch moves freeze/thaw handling into ata_eh_reset().
@prereset() and @postreset() are now called w/o freezing the port
although @prereset() an be called frozen if the port is frozen prior
to entering ata_eh_reset().

This makes code simpler and will help removing hotplug event related
races.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>


# 932648b0 18-May-2008 Tejun Heo <htejun@gmail.com>

libata: reorganize ata_eh_reset() no reset method path

Reorganize ata_eh_reset() such that @prereset() is called even when no
reset method is available and if block is used instead of goto to skip
actual reset. This makes no reset case behave better (readiness wait)
and future changes easier.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>


# 10acf3b0 02-May-2008 Mark Lord <liml@rtr.ca>

libata: export ata_eh_analyze_ncq_error

Export ata_eh_analyze_ncq_error() for subsequent use by sata_mv,
as suggested by Tejun.

Signed-off-by: Mark Lord <mlord@pobox.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>


# a6116c9e 23-Apr-2008 Mark Lord <liml@rtr.ca>

libata-eh set tf flags in NCQ EH result_tf

Fix mis-reporting of NCQ errors by ensuring that result_tf->flags
is properly initialized in libata-eh. This allows ata_gen_ata_sense()
to report the failed block number correctly to SCSI after a media error
during NCQ.

This patch may also be a candidate for backporting to earlier kernels.
Without this fix, SCSI will fail I/O on the entire request rather
than just the bad sector. That can be bad for a request that was
merged from many independent read reads from different tasks.

Signed-off-by: Mark Lord <mlord@pobox.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>


# 4f7faa3f 30-Jan-2008 Tejun Heo <htejun@gmail.com>

libata: make EH fail gracefully if no reset method is available

When no reset method is available, libata currently oopses. Although
the condition can't happen unless there's a bug in a low level driver,
oopsing isn't the best way to report the error condition. Complain,
dump stack and fail reset instead.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>


# 45db2f6c 07-Apr-2008 Tejun Heo <htejun@gmail.com>

libata: move link onlineness check out of softreset methods

Currently, SATA softresets should do link onlineness check before
actually performing SRST protocol but it doesn't really belong to
softreset.

This patch moves onlineness check in softreset to ata_eh_reset() and
ata_eh_followup_srst_needed() to clean up code and help future sata_mv
changes which need clear separation between SCR and TF accesses.

sata_fsl is peculiar in that its softreset really isn't softreset but
combination of hardreset and softreset. This patch adds dummy private
->prereset to keep the current behavior but the driver really should
implement separate hard and soft resets and return -EAGAIN from
hardreset if it should be follwed by softreset.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>


# 2a0c15ca 07-Apr-2008 Tejun Heo <htejun@gmail.com>

libata: kill dead code paths in reset path

Some code paths which had been made obsolete by recent reset
simplification were still around. Kill them.

* ata_eh_reset() checked for ATA_DEV_UNKNOWN to determine
classification failure. This is no longer applicable.

* ata_do_reset() should convert ATA_DEV_UNKNOWN to ATA_DEV_NONE
regardless of reset result (e.g. -EAGAIN).

* LLDs don't need to convert ATA_DEV_UNKNOWN to ATA_DEV_NONE.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>


# 071f44b1 07-Apr-2008 Tejun Heo <htejun@gmail.com>

libata: implement PMP helpers

Implement helpers to test whether PMP is supported, attached and
determine pmp number to use when issuing SRST to a link. While at it,
move ata_is_host_link() so that it's together with the two new PMP
helpers.

This change simplifies LLDs and helps making PMP support optional.

Signed-off-by: Tejun Heo <htejun@gmail.com>


# 305d2a1a 07-Apr-2008 Tejun Heo <htejun@gmail.com>

libata: unify mechanism to request follow-up SRST

Previously, there were two ways to trigger follow-up SRST from
hardreset method - returning -EAGAIN and leaving all device classes
unmodified. Drivers never used the latter mechanism and the only use
case for the former was when hardreset couldn't classify.

Drop the latter mechanism and let -EAGAIN mean "perform follow-up SRST
if classification is required". This change removes unnecessary
follow-up SRSTs and simplifies reset implementations.

Signed-off-by: Tejun Heo <htejun@gmail.com>


# 5958e302 07-Apr-2008 Tejun Heo <htejun@gmail.com>

libata: move PMP SCR access failure during reset to ata_eh_reset()

If PMP fan-out reset fails and SCR isn't accessible, PMP should be
reset. This used to be tested by sata_pmp_std_hardreset() and
communicated to EH by -ERESTART. However, this logic is generic and
doesn't really have much to do with specific hardreset implementation.

This patch moves SCR access failure detection logic to ata_eh_reset()
where it belongs. As this makes sata_pmp_std_hardreset() identical to
sata_std_hardreset(), the function is killed and replaced with the
standard method.

Signed-off-by: Tejun Heo <htejun@gmail.com>


# 57c9efdf 07-Apr-2008 Tejun Heo <htejun@gmail.com>

libata: implement and use sata_std_hardreset()

Implement sata_std_hardreset(), which simply wraps around
sata_link_hardreset(). sata_std_hardreset() becomes new standard
hardreset method for sata_port_ops and sata_sff_hardreset() moves from
ata_base_port_ops to ata_sff_port_ops, which is where it really
belongs.

ata_is_builtin_hardreset() is added so that both
ata_std_error_handler() and ata_sff_error_handler() skip both builtin
hardresets if SCR isn't accessible.

piix_sidpr_hardreset() in ata_piix.c is identical to
sata_std_hardreset() in functionality and got replaced with the
standard function.

Signed-off-by: Tejun Heo <htejun@gmail.com>


# 9363c382 07-Apr-2008 Tejun Heo <htejun@gmail.com>

libata: rename SFF functions

SFF functions have confusing names. Some have sff prefix, some have
bmdma, some std, some pci and some none. Unify the naming by...

* SFF functions which are common to both BMDMA and non-BMDMA are
prefixed with ata_sff_.

* SFF functions which are specific to BMDMA are prefixed with
ata_bmdma_.

* SFF functions which are specific to PCI but apply to both BMDMA and
non-BMDMA are prefixed with ata_pci_sff_.

* SFF functions which are specific to PCI and BMDMA are prefixed with
ata_pci_bmdma_.

* Drop generic prefixes from LLD specific routines. For example,
bfin_std_dev_select -> bfin_dev_select.

The following renames are noteworthy.

ata_qc_issue_prot() -> ata_sff_qc_issue()
ata_pci_default_filter() -> ata_bmdma_mode_filter()
ata_dev_try_classify() -> ata_sff_dev_classify()

This rename is in preparation of separating SFF support out of libata
core layer. This patch strictly renames functions and doesn't
introduce any behavior difference.

Signed-off-by: Tejun Heo <htejun@gmail.com>


# 03faab78 27-Mar-2008 Tejun Heo <htejun@gmail.com>

libata: implement ATA_QCFLAG_RETRY

Currently whether a command should be retried after failure is
determined inside ata_eh_finish(). Add ATA_QCFLAG_RETRY and move the
logic into ata_eh_autopsy(). This makes things clearer and helps
extending retry determination logic.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>


# a1efdaba 24-Mar-2008 Tejun Heo <htejun@gmail.com>

libata: make reset related methods proper port operations

Currently reset methods are not specified directly in the
ata_port_operations table. If a LLD wants to use custom reset
methods, it should construct and use a error_handler which uses those
reset methods. It's done this way for two reasons.

First, the ops table already contained too many methods and adding
four more of them would noticeably increase the amount of necessary
boilerplate code all over low level drivers.

Second, as ->error_handler uses those reset methods, it can get
confusing. ie. By overriding ->error_handler, those reset ops can be
made useless making layering a bit hazy.

Now that ops table uses inheritance, the first problem doesn't exist
anymore. The second isn't completely solved but is relieved by
providing default values - most drivers can just override what it has
implemented and don't have to concern itself about higher level
callbacks. In fact, there currently is no driver which actually
modifies error handling behavior. Drivers which override
->error_handler just wraps the standard error handler only to prepare
the controller for EH. I don't think making ops layering strict has
any noticeable benefit.

This patch makes ->prereset, ->softreset, ->hardreset, ->postreset and
their PMP counterparts propoer ops. Default ops are provided in the
base ops tables and drivers are converted to override individual reset
methods instead of creating custom error_handler.

* ata_std_error_handler() doesn't use sata_std_hardreset() if SCRs
aren't accessible. sata_promise doesn't need to use separate
error_handlers for PATA and SATA anymore.

* softreset is broken for sata_inic162x and sata_sx4. As libata now
always prefers hardreset, this doesn't really matter but the ops are
forced to NULL using ATA_OP_NULL for documentation purpose.

* pata_hpt374 needs to use different prereset for the first and second
PCI functions. This used to be done by branching from
hpt374_error_handler(). The proper way to do this is to use
separate ops and port_info tables for each function. Converted.

Signed-off-by: Tejun Heo <htejun@gmail.com>


# b558eddd 23-Jan-2008 Tejun Heo <htejun@gmail.com>

libata: kill ata_ehi_schedule_probe()

ata_ehi_schedule_probe() was created to hide details of link-resuming
reset magic. Now that all the softreset workarounds are gone,
scheduling probe is very simple - set probe_mask and request RESET.
Kill ata_ehi_schedule_probe() and open code it. This also increases
consistency as ata_ehi_schedule_probe() couldn't cover individual
device probings so they were open-coded even when the helper existed.

While at it, define ATA_ALL_DEVICES as mask of all possible devices on
a link and always use it when requesting probe on link level for
simplicity and consistency. Setting extra bits in the probe_mask
doesn't hurt anybody.

Signed-off-by: Tejun Heo <htejun@gmail.com>


# 672b2d65 23-Jan-2008 Tejun Heo <htejun@gmail.com>

libata: kill ATA_EHI_RESUME_LINK

ATA_EHI_RESUME_LINK has two functions - promote reset to hardreset if
ATA_LFLAG_HRST_TO_RESUME is set and preventing EH from shortcutting
reset action when probing is requested. The former is gone now and
the latter can easily be achieved by making EH to perform at least one
reset if reset is requested, which also makes more sense than
depending on RESUME_LINK flag.

As ATA_EHI_RESUME_LINK was the only EHI reset modifier, this also
kills reset modifier handling.

Signed-off-by: Tejun Heo <htejun@gmail.com>


# cf480626 23-Jan-2008 Tejun Heo <htejun@gmail.com>

libata: prefer hardreset

When both soft and hard resets are available, libata preferred
softreset till now. The logic behind it was to be softer to devices;
however, this doesn't really help much. Rationales for the change:

* BIOS may freeze lock certain things during boot and softreset can't
unlock those. This by itself is okay but during operation PHY event
or other error conditions can trigger hardreset and the device may
end up with different configuration.

For example, after a hardreset, previously unlockable HPA can be
unlocked resulting in different device size and thus revalidation
failure. Similar condition can occur during or after resume.

* Certain ATAPI devices require hardreset to recover after certain
error conditions. On PATA, this is done by issuing the DEVICE RESET
command. On SATA, COMRESET has equivalent effect. The problem is
that DEVICE RESET needs its own execution protocol.

For SFF controllers with bare TF access, it can be easily
implemented but more advanced controllers (e.g. ahci and sata_sil24)
require specialized implementations. Simply using hardreset solves
the problem nicely.

* COMRESET initialization sequence is the norm in SATA land and many
SATA devices don't work properly if only SRST is used. For example,
some PMPs behave this way and libata works around by always issuing
hardreset if the host supports PMP.

Like the above example, libata has developed a number of mechanisms
aiming to promote softreset to hardreset if softreset is not going
to work. This approach is time consuming and error prone.

Also, note that, dependingon how you read the specs, it could be
argued that PMP fan-out ports require COMRESET to start operation.
In fact, all the PMPs on the market except one don't work properly
if COMRESET is not issued to fan-out ports after PMP reset.

* COMRESET is an integral part of SATA connection and any working
device should be able to handle COMRESET properly. After all, it's
the way to signal hardreset during reboot. This is the most used
and recommended (at least by the ahci spec) method of resetting
devices.

So, this patch makes libata prefer hardreset over softreset by making
the following changes.

* Rename ATA_EH_RESET_MASK to ATA_EH_RESET and use it whereever
ATA_EH_{SOFT|HARD}RESET used to be used. ATA_EH_{SOFT|HARD}RESET is
now only used to tell prereset whether soft or hard reset will be
issued.

* Strip out now unneeded promote-to-hardreset logics from
ata_eh_reset(), ata_std_prereset(), sata_pmp_std_prereset() and
other places.

Signed-off-by: Tejun Heo <htejun@gmail.com>


# 3ec25ebd 27-Mar-2008 Tejun Heo <htejun@gmail.com>

libata: ATA_EHI_LPM should be ATA_EH_LPM

EH actions are ATA_EH_* not ATA_EHI_*. Rename ATA_EHI_LPM to
ATA_EH_LPM.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Cc: Kristen Carlson Accardi <kristen.c.accardi@intel.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>


# eec59f76 05-Mar-2008 Tejun Heo <htejun@gmail.com>

libata: allow LLDs w/o any reset method

Some old SFF controllers don't have any way to reset the channel.
Currently, this isn't supported and libata EH causes an oops. Allow
LLDs w/o any reset method and just assume ATA class in such cases.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Jeff Garzik <jeff@garzik.org>


# 33267325 12-Feb-2008 Tejun Heo <htejun@gmail.com>

libata: implement libata.force module parameter

This patch implements libata.force module parameter which can
selectively override ATA port, link and device configurations
including cable type, SATA PHY SPD limit, transfer mode and NCQ.

For example, you can say "use 1.5Gbps for all fan-out ports attached
to the second port but allow 3.0Gbps for the PMP device itself, oh,
the device attached to the third fan-out port chokes on NCQ and
shouldn't go over UDMA4" by the following.

libata.force=2:1.5g,2.15:3.0g,2.03:noncq,udma4

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>


# 75f9cafc 02-Jan-2008 Tejun Heo <htejun@gmail.com>

libata: fix off-by-one in error categorization

ATA_ECAT_DUBIOUS_BASE was too high by one and thus all DUBIOUS error
categorizations were wrong. This passed test because only ATA_BUS and
UNK_DEV were used during testing and the ones after them - ATA_BUS and
an overflowed entry - behaved similarly.

This patch fixes the problem by adding DUBIOUS_NONE category and use
it as base.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>


# 0dc36888 18-Dec-2007 Tejun Heo <htejun@gmail.com>

libata: rename ATA_PROT_ATAPI_* to ATAPI_PROT_*

ATA_PROT_ATAPI_* are ugly and naming schemes between ATA_PROT_* and
ATA_PROT_ATAPI_* are inconsistent causing confusion. Rename them to
ATAPI_PROT_* and make them consistent with ATA counterpart.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>


# e6a73ab1 13-Dec-2007 Andrew Morton <akpm@linux-foundation.org>

drivers/ata/libata-eh.c: fix printk warning

drivers/ata/libata-eh.c: In function `ata_port_pbar_desc':
drivers/ata/libata-eh.c:215: warning: long long unsigned int format, long unsigned int arg (arg 4)

Cc: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>


# e39eec13 01-Dec-2007 Jeff Garzik <jeff@garzik.org>

[libata] Build fix WRT ata_is_xxx() new API introduction

Signed-off-by: Jeff Garzik <jgarzik@redhat.com>


# 76326ac1 27-Nov-2007 Tejun Heo <htejun@gmail.com>

libata: implement fast speed down for unverified data transfer mode

It's very likely that the configured data transfer mode is the wrong
one if device fails data transfers right after initial data transfer
mode configuration (including NCQ on/off and xfermode). libata EH
needs to speed down fast before upper layers give up on probing.

This patch implement fast speed down rules to handle such cases
better. Error occured while data transfer hasn't been verified
trigger fast back-to-back speed down actions until data transfer
works.

This change will make cable mis-detection and other initial
configuration problems corrected before partition scanning code gives
up.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>


# 00115e0f 27-Nov-2007 Tejun Heo <htejun@gmail.com>

libata: implement ATA_DFLAG_DUBIOUS_XFER

ATA_DFLAG_DUBIOUS_XFER is set whenever data transfer speed or method
changes and gets cleared when data transfer command succeeds in the
newly configured transfer mode.

This will be used to improve speed down logic.

Signed-off-by: Tejun Heo <htejun@gmail.com<
Signed-off-by: Jeff Garzik <jeff@garzik.org>


# 663f99b8 27-Nov-2007 Tejun Heo <htejun@gmail.com>

libata: adjust speed down rules

Speed down rules were too conservative. Adjust them a bit.

* More than 10 timeouts can't happen in 5 minutes as command timeout
is 30secs. Lower the limit for rule #1 to 6.

* 10 timeouts is too high for rule #3 too. Lower it to 6.

* SATAPI can benefit from falling back to PIO too. Allow SATAPI
devices to fall back to PIO.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>


# 3884f7b0 27-Nov-2007 Tejun Heo <htejun@gmail.com>

libata: clean up EH speed down implementation

Clean up EH speed down implementation.

* is_io boolean variable is replaced eflags. is_io is ATA_EFLAG_IS_IO.

* Error categories now have names.

* Better comments.

* Reorder 5min and 10min rules in ata_eh_speed_down_verdict()

* Use local variable @link to cache @dev->link in ata_eh_speed_down()

These changes are to improve readability and ease further changes.
This patch doesn't introduce any behavior change.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>


# 6f1d1e3a 27-Nov-2007 Tejun Heo <htejun@gmail.com>

libata: move ata_set_mode() to libata-eh.c

Move ata_set_mode() to libata-eh.c. ata_set_mode() is surely an EH
action and will be more tightly coupled with the rest of error
handling. Move it to libata-eh.c.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>


# 02c05a27 27-Nov-2007 Tejun Heo <htejun@gmail.com>

libata: factor out ata_eh_schedule_probe()

Factor out ata_eh_schedule_probe() from ata_eh_handle_dev_fail() and
ata_eh_recover(). This is to improve maintainability and make future
changes easier.

In the previous revision, ata_dev_enabled() test was accidentally
dropped while factoring out. This problem was spotted by Bartlomiej.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Cc: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>


# bd3adca5 01-Nov-2007 Shaohua Li <shaohua.li@intel.com>

libata-acpi: add ACPI _PSx method

ACPI spec (ver 3.0a, p289) requires IDE power on/off executes ACPI _PSx
methods. As recently most PATA drivers use libata, this patch adds _PSx
method support in libata. ACPI spec doesn't mention if SATA requires the
same _PSx method.

Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Acked-by: Len Brown <len.brown@intel.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>


# 4ccd3329 08-Jan-2008 Tejun Heo <htejun@gmail.com>

libata: don't normalize UNKNOWN to NONE after reset

After non-classifying reset, ehc->classes[] could contain
ATA_DEV_UNKNOWN which used to be normalized to ATA_DEV_NONE for
consistency. However, this causes unfortunate side effect for drivers
which have non-classifying hardresets (e.g. sata_nv) by making
hardreset report ATA_DEV_NONE for non-classifying resets and thus
makes EH believe that the port is unoccupied and recovery can be
skipped. The end result is that after a device is swapped with
another one, the new device isn't attached after the old one is
detached.

This patch makes ata_eh_reset() not normalize UNKNOWN to NONE after
non-classifying resets. This fixes the above problem. As UNKNOWN and
NONE are handled differently by only EH hotplug logic, this doesn't
cause other behavior changes.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Cc: Robert Hancock <hancockr@shaw.ca>
Signed-off-by: Jeff Garzik <jeff@garzik.org>


# 2695e366 09-Jan-2008 Tejun Heo <htejun@gmail.com>

libata-pmp: propagate timeout to host link

Timeout on downstream command may indicate transmission problem on
host link. Propagate timeouts to host link.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>


# f2dfc1a1 11-Dec-2007 Tejun Heo <htejun@gmail.com>

libata: update atapi_eh_request_sense() such that lbam/lbah contains buffer size

While updating lbam/h for ATAPI commands, atapi_eh_request_sense() was
left out. Update it.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>


# abb6a889 28-Nov-2007 Tejun Heo <htejun@gmail.com>

libata: report protocol and full CDB on error

Protocol and CDB allocation size field are important in determining
what went wrong with ATAPI commands. Report them on failure.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>


# 21bef6dd 14-Nov-2007 Adrian Bunk <bunk@kernel.org>

libata: remove unused functions

This patch removes the following obsolete functions:
- libata-core.c: __sata_phy_reset()
- libata-core.c: sata_phy_reset()
- libata-eh.c: ata_qc_timeout()
- libata-eh.c: ata_eng_timeout()

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Tejun Heo <htejun@gmail.com>


# dfcc173d 30-Oct-2007 Tejun Heo <htejun@gmail.com>

libata: consider errors not associated with commands for speed down

libata EH used to ignore errors not associated with commands when
determining whether speed down is necessary or not. This leads to the
following problems.

* Errors not associated with commands can occur indefinitely without
libata EH taking corrective actions.

* Upstream link errors don't trigger speed down when PMP is attached
to it and commands issued to downstream device trigger errors on the
upstream link.

This patch makes ata_eh_link_autopsy() consider errors not associated
with command for speed down.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>


# 08cf69d0 30-Oct-2007 Tejun Heo <htejun@gmail.com>

libata: more robust reset failure handling

Reset failure is a critical error. It results in disabling the link
requiring user intervention to re-enable it. Make reset failure
handling more robust such that libata EH doesn't give up too early.

* Temporary glitches during hardreset may lead to classification
failure when there's no softreset available. Retry instead of
giving up.

* Initial softreset or follow up softreset may fail classification.
Move classification error handling block out of followup softreset
block such that both cases are handled and retry instead of giving
up. Also, on the last try, give ATA class a blind shot.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>


# 416dc9ed 30-Oct-2007 Tejun Heo <htejun@gmail.com>

libata: cosmetic clean up / reorganization of ata_eh_reset()

Clean up and reorganize ata_eh_reset() to ease further changes.

* Cache ARRAY_SIZE(ata_eh_reset_timeouts) in @max_tries.
* Cache link->flags in @lflags.
* Move failure handling block to the end of the function and unnest
both success and failure handling blocks.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>


# cd955463 30-Oct-2007 Tejun Heo <htejun@gmail.com>

libata: fix timing computation in ata_eh_reset()

As jiffies changes asynchronously, it needs to be cached if unchanging
timestamp is needed. The code in ata_eh_reset() intended to do that
with @now but never actually did it. Fix it.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>


# e027bd36 26-Oct-2007 Tejun Heo <htejun@gmail.com>

libata: implement and use ATA_QCFLAG_QUIET

Implement ATA_QCFLAG_QUIET which indicates that there's no need to
report if the command fails with AC_ERR_DEV and set it for passthrough
commands.

Combined with previous changes, this now makes device errors for all
direct commands reported directly to the issuer without going through
EH actions and reporting.

Note that EH is still invoked after non-IO device errors to determine
the nature of the error and resume command execution (some controller
requires special care after error to continue). It just performs
default maintenance after error, examines what's going on, realizes
that it's none of its business and reports the command failure without
logging any error messages.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>


# f90f0828 26-Oct-2007 Tejun Heo <htejun@gmail.com>

libata: stop being overjealous about non-IO commands

libata EH always revalidated device and retried failed command after
error except for ATAPI CCs. This is unnecessary and hinders with
users issuing direct commands. This patch makes the following
changes.

* Make sata_sil24 not request ATA_EH_REVALIDATE on device errors.
sil24 is the only driver which does this. All others let libata EH
core code decide.

* Don't request revalidation after device error of non-IO command.
Revalidation doesn't really help anybody. As ATA_EH_REVALIDATE
isn't set by default, there's no reason to clear it after sense data
is read. Kill ATA_EH_REVALIDATE clearing code while at it.

* Don't retry non-IO command after device error. Device has rejected
the command. There's no point in retrying.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>


# ca77329f 24-Oct-2007 Kristen Carlson Accardi <kristen.c.accardi@intel.com>

[libata] Link power management infrastructure

Device Initiated Power Management, which is defined
in SATA 2.5 can be enabled for disks which support it.
This patch enables DIPM when the user sets the link
power management policy to "min_power".

Additionally, libata drivers can define a function
(enable_pm) that will perform hardware specific actions to
enable whatever power management policy the user set up
for Host Initiated Power management (HIPM).
This power management policy will be activated after all
disks have been enumerated and intialized. Drivers should
also define disable_pm, which will turn off link power
management, but not change link power management policy.

Documentation/scsi/link_power_management_policy.txt has additional
information.

Signed-off-by: Kristen Carlson Accardi <kristen.c.accardi@intel.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>


# 4fb4615b 29-Oct-2007 Tejun Heo <htejun@gmail.com>

libata: no need to speed down if already at PIO0

After reset, transfer mode is always PIO0 regardless of
dev->xfer_mask. Check dev->pio_mode before trying to slow down after
configuration failure. This prevents bogus speed down before device
is actually configured.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Acked-by: Alan Cox <alan@redhat.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>


# cdeab114 29-Oct-2007 Tejun Heo <htejun@gmail.com>

libata: relocate forcing PIO0 on reset

Forcing PIO0 on reset was done inside ata_bus_softreset(), which is a
bit out of place as it should be applied to all resets - hard, soft
and implementation which don't use ata_bus_softreset(). Relocate it
such that...

* For new EH, it's done in ata_eh_reset() before calling prereset.

* For old EH, it's done before calling ap->ops->phy_reset() in
ata_bus_probe().

This makes PIO0 forced after all resets. Another difference is that
reset itself is done after PIO0 is forced.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Acked-by: Alan Cox <alan@redhat.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>


# 054a5fba 25-Oct-2007 Tejun Heo <htejun@gmail.com>

libata: track SLEEP state and issue SRST to wake it up

ATA devices in SLEEP mode don't respond to any commands. SRST is
necessary to wake it up. Till now, when a command is issued to a
device in SLEEP mode, the command times out, which makes EH reset the
device and retry the command after that, causing a long delay.

This patch makes libata track SLEEP state and issue SRST automatically
if a command is about to be issued to a device in SLEEP.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Cc: Bruce Allen <ballen@gravity.phys.uwm.edu>
Cc: Andrew Paprocki <andrew@ishiboo.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>


# 0e06d9ce 24-Oct-2007 Tejun Heo <htejun@gmail.com>

libata: cosmetic clean up in ata_eh_reset()

Local variable @action usage in ata_eh_reset() is a bit confusing.
It's used only to cache ehc->i.action to test reset masks after
clearing it; however, due to the generic name "action", it's easy to
misinterpret the local variable as containing the selected reset
method later. Also, the reason for caching the original value is easy
to miss.

This patch renames @action to @tmp_action and make it buffer newly
selected value instead to improve readability.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>


# 2dcb407e 19-Oct-2007 Jeff Garzik <jeff@garzik.org>

[libata] checkpatch-inspired cleanups

Tackle the relatively sane complaints of checkpatch --file.

The vast majority is indentation and whitespace changes, the rest are

* #include fixes
* printk KERN_xxx prefix addition
* BSS/initializer cleanups

Signed-off-by: Jeff Garzik <jgarzik@redhat.com>


# 2855568b 11-Oct-2007 Jeff Garzik <jeff@garzik.org>

[libata] struct pci_dev related cleanups

* remove pointless pci_dev_to_dev() wrapper. Just directly reference
the embedded struct device like everyone else does.

* pata_cs5520: delete cs5520_remove_one(), it was a duplicate of
ata_pci_remove_one()

* linux/libata.h: don't bother including linux/pci.h, we don't need it.
Simply declare 'struct pci_dev' and assume interested parties will
include the header, as they should be doing anyway.

* linux/libata.h: consolidate all CONFIG_PCI declarations into a
single location in the header.

Signed-off-by: Jeff Garzik <jgarzik@redhat.com>


# b06ce3e5 09-Oct-2007 Tejun Heo <htejun@gmail.com>

libata: use ata_exec_internal() for PMP register access

PMP registers used to be accessed with dedicated accessors ->pmp_read
and ->pmp_write. During reset, those callbacks are called with the
port frozen so they should be able to run without depending on
interrupt delivery. To achieve this, they were implemented polling.

However, as resetting the host port makes the PMP to isolate fan-out
ports until SError.X is cleared, resetting fan-out ports while port is
frozen doesn't buy much additional safety.

This patch updates libata PMP support such that PMP registers are
accessed using regular ata_exec_internal() mechanism and kills
->pmp_read/write() callbacks. The following changes are made.

* PMP access helpers - sata_pmp_read_init_tf(), sata_pmp_read_val(),
sata_pmp_write_init_tf() are folded into sata_pmp_read/write() which
are now standalone PMP register access functions.

* sata_pmp_read/write() returns err_mask instead of rc. This is
consistent with other functions which issue internal commands and
allows more detailed error reporting.

* ahci interrupt handler is modified to ignore BAD_PMP and
spurious/illegal completion IRQs while reset is in progress. These
conditions are expected during reset.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>


# afaa5c37 09-Oct-2007 Tejun Heo <htejun@gmail.com>

libata: implement ATA_PFLAG_RESETTING

Implement ATA_PFLAG_RESETTING. This flag is set while reset is in
progress. It's set before prereset is called and cleared after reset
fails or postreset is finished.

This flag itself doesn't have any function. It will be used by LLDs
to tell whether reset is in progress if it needs to behave differently
during reset.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>


# 2b789108 09-Oct-2007 Tejun Heo <htejun@gmail.com>

libata: add @timeout to ata_exec_internal[_sg]()

Add @timeout argument to ata_exec_internal[_sg](). If 0, default
timeout ata_probe_timeout is used.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>


# 90738683 08-Oct-2007 Tejun Heo <htejun@gmail.com>

libata: wrap schedule_timeout_uninterruptible() in loop

Tasks in uninterruptible sleep might be woken up by unrelated events
and should check whether the condition it was waiting for has actually
triggered. Wrap schedule_timeout_uninterruptible() in loop to achieve
it.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>


# 94ff3d54 08-Oct-2007 Tejun Heo <htejun@gmail.com>

libata: skip suppress reporting if ATA_EHI_QUIET

ATA_EHI_NO_AUTOPSY and ATA_EHI_QUIET are used during initial probing
to skip exception analysis and reporting. Usually, there's nothing to
report but on some allowed but rare corner cases (e.g. phy status
changed interrupt when IRQ is enabled on frozen port - this happens if
IRQ pending status isn't cleared in the IRQ router or controller)
exception messages get printed.

Skip reporting if ATA_EHI_QUIET is set.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>


# 1333e194 02-Oct-2007 Robert Hancock <hancockr@shaw.ca>

libata: add human-readable error value decoding

This adds human-readable decoding of the ATA status and error registers
(similar to what drivers/ide does) as well as the SATA Serror register
to libata error handling output. This prevents the need to pore
through standards documents to figure out the meaning of the bits
in these registers when looking at error reports. Some bits that
drivers/ide decoded are not decoded here, since the bits are either
command-dependent or obsolete, and properly parsing them would add
too much complexity.

Signed-off-by: Robert Hancock <hancockr@shaw.ca>

[edited slightly to make output a bit more symmetric]
Signed-off-by: Jeff Garzik <jeff@garzik.org>


# 633273a3 22-Sep-2007 Tejun Heo <htejun@gmail.com>

libata-pmp: hook PMP support and enable it

Hook PMP support into libata and enable it. Connect SCR and probing
functions, and update ata_dev_classify() to detect PMP.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>


# 3495de73 22-Sep-2007 Tejun Heo <htejun@gmail.com>

libata-pmp: update ata_eh_reset() for PMP

PMP always requires SRST to be enabled. Also, hardreset reports
classification code from the first device when PMP is attached, not
from the PMP. Update ata_eh_reset() such that followup softreset is
performed if the controller is PMP capable and the host link is being
reset.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>


# 7d77b247 22-Sep-2007 Tejun Heo <htejun@gmail.com>

libata-pmp-prep: implement sata_async_notification()

AN serves multiple purposes. For ATAPI, it's used for media change
notification. For PMP, for downstream PHY status change notification.
Implement sata_async_notification() which demultiplexes AN.

To avoid unnecessary port events, ATAPI AN is not enabled if PMP is
attached but SNTF is not available.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Cc: Kriten Carlson Accardi <kristen.c.accardi@intel.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>


# 668108d7 22-Sep-2007 Tejun Heo <htejun@gmail.com>

libata-pmp-prep: implement EH fast-fail path

If PMP itself becomes inaccessible while trying to link a downstream
link, spending time to recover the downstream link doesn't make any
sense. Make EH skip retry and fail fast if -ERESTART is received.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>


# f9df58cb 22-Sep-2007 Tejun Heo <htejun@gmail.com>

libata-pmp-prep: implement ATA_LFLAG_DISABLED

Implement ATA_LFLAG_DISABLED. The flag indicates the link is disabled
due to EH recovery failure. While a link is disabled, no EH action is
taken on the link and suspend/resume become noop too.

This will be used by PMP links to manage failed links.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>


# fd995f70 22-Sep-2007 Tejun Heo <htejun@gmail.com>

libata-pmp-prep: implement ATA_LFLAG_NO_RETRY

Some PMP links are connected to internal pseudo devices which may come
and go depending on situation. There's no reason to try hard to
recover them. ATA_LFLAG_NO_RETRY tells EH to not retry if the device
attached to the link fails.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>


# ae791c05 22-Sep-2007 Tejun Heo <htejun@gmail.com>

libata-pmp-prep: implement ATA_LFLAG_NO_SRST, ASSUME_ATA and ASSUME_SEMB

Some links on some PMPs locks up on SRST and/or report incorrect
device signature. Implement ATA_LFLAG_NO_SRST, ASSUME_ATA and
ASSUME_SEMB to handle these quirky links. NO_SRST makes EH avoid
SRST. ASSUME_ATA and SEMB forces class code to ATA and SEMB_UNSUP
respectively. Note that SEMB isn't currently supported yet so the
_UNSUP variant is used.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>


# da917d69 22-Sep-2007 Tejun Heo <htejun@gmail.com>

libata-pmp-prep: implement qc_defer helpers

Implement ap->nr_active_links (the number of links with active qcs),
ap->excl_link (pointer to link which can be used by ->qc_defer and is
cleared when a qc with ATA_QCFLAG_CLEAR_EXCL completes), and
ata_link_active().

These can be used by ->qc_defer() to implement proper command
exclusion. This set of helpers seem enough for both sil24 (ATAPI
exclusion needed) and cmd-switching PMP.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>


# fb7fd614 22-Sep-2007 Tejun Heo <htejun@gmail.com>

libata-pmp-prep: make a number of functions global to libata

Make a number of functions from libata-core.c and libata-eh.c global
to libata (drivers/ata/libata.h). These will be used by PMP.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>


# 422c9daa 22-Sep-2007 Tejun Heo <htejun@gmail.com>

libata-pmp-prep: add @new_class to ata_dev_revalidate()

Consider newly found class code while revalidating. PMP resetting
always results in valid class code and issuing PMP commands to
ATA/ATAPI device isn't very attractive. Add @new_class to
ata_dev_revalidate() and check class code for revalidation.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>


# a1e10f7e 17-Aug-2007 Tejun Heo <htejun@gmail.com>

libata: move EH repeat reporting into ata_eh_report()

EH is sometimes repeated without any error or action. For example,
this happens when probing IDENTIFY fails because of a phantom device.
In these cases, all the repeated EH does is making sure there is no
unhandled error or pending action and return. This repeation is
necessary to avoid losing any event which occurred while EH was in
progress.

Unfortunately, this dry run causes annonying "EH pending after
completion" message. This patch moves the repeat reporting into
ata_eh_report() such that it's more compact and skipped on dry runs.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Cc: Mikael Pettersson <mikep@it.uu.se>
Signed-off-by: Jeff Garzik <jeff@garzik.org>


# cbcdd875 17-Aug-2007 Tejun Heo <htejun@gmail.com>

libata: implement and use ata_port_desc() to report port configuration

Currently, port configuration reporting has the following problems.

* iomapped address is reported instead of raw address
* report contains irrelevant fields or lacks necessary fields for
non-SFF controllers.
* host->irq/irq2 are there just for reporting and hacky.

This patch implements and uses ata_port_desc() and
ata_port_pbar_desc(). ata_port_desc() is almost identical to
ata_ehi_push_desc() except that it takes @ap instead of @ehi, has no
locking requirement, can only be used during host initialization and "
" is used as separator instead of ", ". ata_port_pbar_desc() is a
helper to ease reporting of a PCI BAR or an offsetted address into it.

LLD pushes whatever description it wants using the above two
functions. The accumulated description is printed on host
registration after "[S/P]ATA max MAX_XFERMODE ".

SFF init helpers and ata_host_activate() automatically add
descriptions for addresses and irq respectively, so only LLDs which
isn't standard SFF need to add custom descriptions. In many cases,
such controllers need to report different things anyway.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>


# 9b1e2658 06-Aug-2007 Tejun Heo <htejun@gmail.com>

libata-link: update EH to deal with PMP links

Update ata_eh_autopsy(), ata_eh_report(),
ata_eh_revalidate_and_attach() and ata_eh_recover() to deal with PMP
links. ata_eh_autopsy() and ata_eh_report() updates are
straightforward. They just repeat the same operation over all
configured links. The only change to ata_eh_revalidate_and_attach()
is avoiding calling ->cable_select() on non-host ports.

ata_eh_recover() update is more complex as it first processes all
resets and then performs the rest. This is necessary as thawing with
some links in unknown state can be dangerous. ehi->action is cleared
on successful recovery of a link to avoid repeating recovery due to
failures in other links.

ata_eh_recover() iterates over only PMP links if PMP is attached, and,
on failure, the failing link is returned in @failed_link instead of
disabling devices directly. These are to integrate ata_eh_recover()
into PMP EH later.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>


# cf1b86c8 06-Aug-2007 Tejun Heo <htejun@gmail.com>

libata-link: update ata_scsi_error() to handle PMP links

Update ata_scsi_error() to handle PMP links. As error conditions can
occur on both host and PMP links, __ata_port_for_each_link() is used.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>


# dbd82616 06-Aug-2007 Tejun Heo <htejun@gmail.com>

libata-link: implement ata_link_abort()

Implement ata_link_abort().

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>


# 0260731f 06-Aug-2007 Tejun Heo <htejun@gmail.com>

libata-link: linkify config/EH related functions

Make the following functions deal with ata_link instead of ata_port.

* ata_set_mode()
* ata_eh_autopsy() and related functions
* ata_eh_report() and related functions
* suspend/resume related functions
* ata_eh_recover() and related functions

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>


# cc0680a5 06-Aug-2007 Tejun Heo <htejun@gmail.com>

libata-link: linkify reset

Make reset methods and related functions deal with ata_link instead of
ata_port.

* ata_do_reset()
* ata_eh_reset()
* all prereset/reset/postreset methods and related functions

This patch introduces no behavior change.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>


# 955e57df 06-Aug-2007 Tejun Heo <htejun@gmail.com>

libata-link: linkify EH action helpers

Make ata_eh_about_to_do() and ata_eh_done() deal with ata_link instead
of ata_port.

This patch introduces no behavior change.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>


# 936fd732 06-Aug-2007 Tejun Heo <htejun@gmail.com>

libata-link: linkify PHY-related functions

Make the following PHY-related functions to deal with ata_link instead
of ata_port.

* sata_print_link_status()
* sata_down_spd_limit()
* ata_set_sata_spd_limit() and friends
* sata_link_debounce/resume()
* sata_scr_valid/read/write/write_flush()
* ata_link_on/offline()

This patch introduces no behavior change.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>


# f58229f8 06-Aug-2007 Tejun Heo <htejun@gmail.com>

libata-link: implement and use link/device iterators

Multiple links and different number of devices per link should be
considered to iterate over links and devices. This patch implements
and uses link and device iterators - ata_port_for_each_link() and
ata_link_for_each_dev() - and ata_link_max_devices().

This change makes a lot of functions iterate over only possible
devices instead of from dev 0 to dev ATA_MAX_DEVICES. All such
changes have been examined and nothing should be broken.

While at it, add a separating comment before device helpers to
distinguish them better from link helpers and others.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>


# 9af5c9c9 06-Aug-2007 Tejun Heo <htejun@gmail.com>

libata-link: introduce ata_link

Introduce ata_link. It abstracts PHY and sits between ata_port and
ata_device. This new level of abstraction is necessary to support
SATA Port Multiplier, which basically adds a bunch of links (PHYs) to
a ATA host port. Fields related to command execution, spd_limit and
EH are per-link and thus moved to ata_link.

This patch only defines the host link. Multiple link handling will be
added later. Also, a lot of ap->link derefences are added but many of
them will be removed as each part is converted to deal directly with
ata_link instead of ata_port.

This patch introduces no behavior change.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Cc: James Bottomley <James.Bottomley@SteelEye.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>


# 5ddf24c5 15-Jul-2007 Tejun Heo <htejun@gmail.com>

libata: implement EH fast drain

In most cases, when EH is scheduled, all in-flight commands are
aborted causing EH to kick in immediately. However, in some cases
(especially with PMP), it's unclear which commands are affected by the
error condition and although aborting all in-flight commands work, it
isn't optimal and may cause unnecessary disruption. On the other
hand, waiting for in-flight commands to drain themselves can take up
to 30seconds.

This patch implements EH fast drain to handle such situations. It
gives in-flight commands some time to finish up but doesn't wait for
too long. After EH is scheduled, fast drain timer is started and if
no other completion occurs in ATA_EH_FASTDRAIN_INTERVAL all in-flight
commands are aborted. If any completion occurred in the interval, the
port is given another interval to finish up itself.

Currently ATA_EH_FASTDRAIN_INTERVAL is 3 secs which should be enough
for finishing up most commands.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>


# 4e57c517 15-Jul-2007 Tejun Heo <htejun@gmail.com>

libata: schedule probing after SError access failure during autopsy

If SError isn't accessible, EH can't tell whether hotplug has happened
or not. Report SError read failure with AC_ERR_OTHER and schedule
probing with hardreset. This will be mainly useful for PMPs.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>


# fccb6ea5 15-Jul-2007 Tejun Heo <htejun@gmail.com>

libata: clear HOTPLUG flag after a reset

ATA_EHI_HOTPLUGGED is a hint for reset functions indicating the the
port might have gone through hotplug/unplug just before entering EH.
Reset functions modify their behaviors a bit to handle the situation
better - e.g. using longer debouncing delay.

Currently, once HOTPLUG is set, it isn't cleared till the end of EH.
This is unnecessary and makes EH take longer. Clear the HOTPLUGGED
flag after a reset try (successful or not).

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>


# f1545154 15-Jul-2007 Tejun Heo <htejun@gmail.com>

libata: quickly trigger SATA SPD down after debouncing failed

Debouncing failure is a good indicator of basic link problem. Use
-EPIPE to indicate debouncing failure and make ata_eh_reset() invoke
sata_down_spd_limit() if the error occurs during reset.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>


# 008a7896 15-Jul-2007 Tejun Heo <htejun@gmail.com>

libata: improve SATA PHY speed down logic

sata_down_spd_limit() first reads the current SPD from SStatus and
limit the speed to the lower one of one below the current limit or one
below the current SPD in SStatus. SPD may not be accessible or valid
when SPD down is requested making sata_down_spd_limit() fail when it's
most needed.

This patch makes the current SPD cached after each successful reset
and forces GEN I speed (1.5Gbps) if neither of SStatus or the cached
value is valid, so sata_down_spd_limit() is now guaranteed to lower
the speed limit if lower speed is available.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>


# 5335b729 15-Jul-2007 Tejun Heo <htejun@gmail.com>

libata: implement AC_ERR_NCQ

When an NCQ command fails, all commands in flight are aborted and the
offending one is reported using log page 10h. Depending on controller
characteristics and LLD implementation, all commands may appear as
having a device error due to shared TF status making it hard to
determine what's actually going on.

This patch adds AC_ERR_NCQ, marks the command reported by log page 10h
with it and print extra "<F>" after the error report for the command
to help distinguishing the offending command.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>


# b64bbc39 15-Jul-2007 Tejun Heo <htejun@gmail.com>

libata: improve EH report formatting

Requiring LLDs to format multiple error description messages properly
doesn't work too well. Help LLDs a bit by making ata_ehi_push_desc()
insert ", " on each invocation. __ata_ehi_push_desc() is the raw
version without the automatic separator.

While at it, make ehi_desc interface proper functions instead of
macros.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>


# fee7ca72 01-Jul-2007 Tejun Heo <htejun@gmail.com>

libata-link: separate out ata_eh_handle_dev_fail()

Separate out ata_eh_handle_dev_fail() from ata_eh_recover(). This is
in preparation of ata_link and PMP support.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>


# 64578a3de 14-May-2007 Tejun Heo <htejun@gmail.com>

libata-acpi: implement _GTM/_STM support

Implement _GTM/_STM support. acpi_gtm is added to ata_port which
stores _GTM parameters over suspend/resume cycle. A new hook
ata_acpi_on_suspend() is responsible for storing _GTM parameters
during suspend. _STM is executed in ata_acpi_on_resume(). With this
change, invoking _GTF is safe on IDE hierarchy and acpi_sata check
before _GTF is removed.

ata_acpi_gtm() and ata_acpi_stm() implementation is taken from Alan
Cox's pata_acpi implementation. ata_acpi_gtm() is fixed such that the
result parameter is not shifted by sizeof(union acpi_object).

Signed-off-by: Tejun Heo <htejun@gmail.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Jeff Garzik <jeff@garzik.org>


# 6746544c 14-May-2007 Tejun Heo <htejun@gmail.com>

libata: reimplement ACPI invocation

This patch reimplements ACPI invocation such that, instead of
exporting ACPI details to the rest of libata, ACPI event handlers -
ata_acpi_on_resume() and ata_acpi_on_devcfg() - are used. These two
functions are responsible for determining whether specific ACPI method
is used and when.

On resume, _GTF is scheduled by setting ATA_DFLAG_ACPI_PENDING device
flag. This is done this way to avoid performing the action on wrong
device device (device swapping while suspended).

On every ata_dev_configure(), ata_acpi_on_devcfg() is called, which
performs _SDD and _GTF. _GTF is performed only after resuming and, if
SATA, hardreset as the ACPI spec specifies. As _GTF may contain
arbitrary commands, IDENTIFY page is re-read after _GTF taskfiles are
executed.

If one of ACPI methods fails, ata_acpi_on_devcfg() retries on the
first failure. If it fails again on the second try, ACPI is disabled
on the device. Note that successful configuration clears ACPI failed
status.

With all feature checks moved to the above two functions,
do_drive_set_taskfiles() is trivial and thus collapsed into
ata_acpi_exec_tfs(), which is now static and converted to return the
number of executed taskfiles to be used by ata_acpi_on_resume(). As
failures are handled properly, ata_acpi_push_id() now returns -errno
on errors instead of unconditional zero.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>


# 914616a3 25-Jun-2007 Tejun Heo <htejun@gmail.com>

libata: fix infinite EH waiting bug

When EH gives up after repeated exceptions, it doesn't't clear the
PENDING bit on exit which leaves PENDING bit set without EH actually
scheduled. This makes ata_port_wait_eh() to wait forever makes rmmod
hang on such port. Fix it by clearing the flag.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>


# 8b5bb2fa 25-Jun-2007 Tejun Heo <htejun@gmail.com>

libata: remove unused variable from ata_eh_reset()

Removed unused variable did_followup_srst from ata_eh_reset().

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>


# 8af500bc 25-Jun-2007 Tejun Heo <htejun@gmail.com>

libata: kill non-sense warning message

prereset() is now allowed to set flag for unsupported reset method.
EH layer is responsible for selecting the fallback. Remove non-sense
warning message.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>


# a617c09f 21-May-2007 Jeff Garzik <jeff@garzik.org>

libata: Trim trailing whitespace

Signed-off-by: Jeff Garzik <jeff@garzik.org>


# 8575b814 11-May-2007 Tejun Heo <htejun@gmail.com>

libata: give devices one last chance even if recovery failed with -EINVAL

After certain errors, some devices report complete garbage on
IDENTIFY. This can cause ata_dev_read_id() to fail with -EINVAL
resulting in immediate disabling of the device. Give the device one
last chance after -EINVAL to allow recovery from such situations. As
-EINVAL is triggered very rarely, this shouldn't cause any noticeable
affect on more common error paths.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Cc: Harald Dunkel <harald.dunkel@t-online.de>
Signed-off-by: Jeff Garzik <jeff@garzik.org>


# f4d6d004 01-May-2007 Tejun Heo <htejun@gmail.com>

libata: ignore EH scheduling during initialization

libata enables SCSI host during ATA host activation which happens
after IRQ handler is registered and IRQ is enabled. All ATA ports are
in frozen state when IRQ is enabled but frozen ports may raise limited
number of IRQs after being frozen - IOW, ->freeze() is not responsible
for clearing pending IRQs. During normal operation, the IRQ handler
is responsible for clearing spurious IRQs on frozen ports and it
usually doesn't require any extra code.

Unfortunately, during host initialization, the IRQ handler can end up
scheduling EH for a port whose SCSI host isn't initialized yet. This
results in OOPS in the SCSI midlayer. This is relatively short window
and scheduling EH for probing is the first thing libata does after
initialization, so ignoring EH scheduling until initialization is
complete solves the problem nicely.

This problem was spotted by Berck E. Nash in the following thread.

http://thread.gmane.org/gmane.linux.kernel/519412

Signed-off-by: Tejun Heo <htejun@gmail.com>
Cc: Berck E. Nash <flyboy@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>


# 9666f400 04-May-2007 Tejun Heo <htejun@gmail.com>

libata: reimplement suspend/resume support using sdev->manage_start_stop

Reimplement suspend/resume support using sdev->manage_start_stop.

* Device suspend/resume is now SCSI layer's responsibility and the
code is simplified a lot.

* DPM is dropped. This also simplifies code a lot. Suspend/resume
status is port-wide now.

* ata_scsi_device_suspend/resume() and ata_dev_ready() removed.

* Resume now has to wait for disk to spin up before proceeding. I
couldn't find easy way out as libata is in EH waiting for the
disk to be ready and sd is waiting for EH to complete to issue
START_STOP.

* sdev->manage_start_stop is set to 1 in ata_scsi_slave_config().
This fixes spindown on shutdown and suspend-to-disk.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>


# 31daabda 02-Feb-2007 Tejun Heo <htejun@gmail.com>

libata: reimplement reset sequencing

libata previously depended upon waits in prereset to get resets after
hotplug right for both spin up and device ready wait. This was
necessary both for reliablity and speed as reset was likely to fail if
initiated too early and each try usually took more than 30secs to
fail. Previous patches fixed the reliability part by fixing status
and SCR handling in resets. This patch remedies the speed part by
improving reset sequencing.

Prereset waiting timeout is adjusted to 10s because spinup wait is
replaced by reset sequencing and !BSY wait is not as important as
before. During boot or module loading where the drive is already
fully spun up, !BSY wait succeeds immediately, so 10s should be enough
in most cases. It matters after hotplugging or other error
conditions, but in those cases, !BSY wait in prereset simply can't be
relied upon due to the varied and weird behaviors ATA controllers and
devices show.

Reset is now driven by ata_eh_reset_timeouts[] table which contains
timeouts for each reset try. The first reset can be softreset but the
following ones are always hardreset if available. Each timeout
defines deadline for the reset try. If a reset try fails, reset is
retried with the next timeout till the end of the timeout table is
reached. If a reset try fails before the timeout with error, libata
waits till the deadline of the failed try before retrying.

IOW, the timeout table defines timetable of reset tries such that the
n'th try always begins at least after the sum of all previous timeouts
has passed. The current timetable defines 4 tries and takes around 1
minute.

@0 : First try. This should succeed most of the time during boot.
@10 : 10s is enough to spin up most consumer harddrives. Give it
another shot.
@20 : 20s should spin up > 99% of working drives. This has 30s
timeout for retarded devices needing long idleness post reset.
@55 : Final try with 5s timeout just in case.

The above timetable is trade off between not annoying the device too
much with frequent resets and taking reasonable amount of time in most
cases. Some controllers may do better with shorter timeouts while
others may fare better with longer but we just can't rely upon LLD
writers to test each controller with wide variety of devices using
various scenarios. We need default behavior which reasonably fits
most cases.

I've tested the above timetable on a dozen SATA controllers and a few
PATA controllers with about a dozen different drives from all major
vendors and 4 different ODDs from three different vendors for both
boot and hotplug (if available) cases.

Boot probing is not affected unless the device is broken in which
cases new code gives up on the port after a minute rather than five or
nine minutes. When hotplugging, most devices get detected on the
first or second try. Multi-platter drives with long spin up time
which sometimes took > 40 secs with the original code, now usually
comes up during the second try and at least right after the third try
@20.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>


# d4b2bab4 02-Feb-2007 Tejun Heo <htejun@gmail.com>

libata: add deadline support to prereset and reset methods

Add @deadline to prereset and reset methods and make them honor it.
ata_wait_ready() which directly takes @deadline is implemented to be
used as the wait function. This patch is in preparation for EH timing
improvements.

* ata_wait_ready() never does busy sleep. It's only used from EH and
no wait in EH is that urgent. This function also prints 'be
patient' message automatically after 5 secs of waiting if more than
3 secs is remaining till deadline.

* ata_bus_post_reset() now fails with error code if any of its wait
fails. This is important because earlier reset tries will have
shorter timeout than the spec requires. If a device fails to
respond before the short timeout, reset should be retried with
longer timeout rather than silently ignoring the device.

There are three behavior differences.

1. Timeout is applied to both devices at once, not separately. This
is more consistent with what the spec says.

2. When a device passes devchk but fails to become ready before
deadline. Previouly, post_reset would just succeed and let
device classification remove the device. New code fails the
reset thus causing reset retry. After a few times, EH will give
up disabling the port.

3. When slave device passes devchk but fails to become accessible
(TF-wise) after reset. Original code disables dev1 after 30s
timeout and continues as if the device doesn't exist, while the
patched code fails reset. When this happens, new code fails
reset on whole port rather than proceeding with only the primary
device.

If the failing device is suffering transient problems, new code
retries reset which is a better behavior. If the failing device is
actually broken, the net effect is identical to it, but not to the
other device sharing the channel. In the previous code, reset would
have succeeded after 30s thus detecting the working one. In the new
code, reset fails and whole port gets disabled. IMO, it's a
pathological case anyway (broken device sharing bus with working
one) and doesn't really matter.

* ata_bus_softreset() is changed to return error code from
ata_bus_post_reset(). It used to return 0 unconditionally.

* Spin up waiting is to be removed and not converted to honor
deadline.

* To be on the safe side, deadline is set to 40s for the time being.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>


# 0d64a233 22-Apr-2007 Tejun Heo <htejun@gmail.com>

libata: separate ATA_EHI_DID_RESET into DID_SOFTRESET and DID_HARDRESET

Separate ATA_EHI_DID_RESET into ATA_EHI_DID_SOFTRESET and
ATA_EHI_DID_HARDRESET. ATA_EHI_DID_RESET is redefined as OR of the
two flags. This patch doesn't introduce any behavior change. This
will be used later to determine whether _SDD is necessary or not.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>


# c1c4e8d5 22-Apr-2007 Tejun Heo <htejun@gmail.com>

libata: add missing call to ->cable_detect() in new EH path

->cable_detect() used to be called on by the old ata_bus_probe() path.
Add invocation to ata_eh_revalidate_and_attach() right after IDENTIFYs
are done.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Jeff Garzik <jeff@garzik.org>


# a51d644a 20-Mar-2007 Tejun Heo <htejun@gmail.com>

libata: improve AC_ERR_DEV handling for ->post_internal_cmd

->post_internal_cmd is simplified EH for internal commands. Its
primary mission is to stop the controller such that no rogue memory
access or other activities occur after the internal command is
released. It may provide error diagnostics by setting qc->err_mask
but this hasn't been a requirement.

To ignore SETXFER failure for CFA devices, libata needs to know
whether a command was failed by the device or for any other reason.
ie. internal command needs to get AC_ERR_DEV right.

This patch makes the following changes to AC_ERR_DEV handling and
->post_internal_cmd semantics to accomodate this need and simplify
callback implementation.

1. As long as the correct bits in the result TF registers are set,
there is no need to set AC_ERR_DEV explicitly. libata EH core
takes care of that for both normal and internal commands.

2. The only requirement for ->post_internal_cmd() is to put the
controller into quiescent state. It needs not to set any err_mask.

3. ata_exec_internal_sg() performs minimal error analysis such that
AC_ERR_DEV is automatically set as long as result_tf is filled
correctly.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>


# 771b8dad 13-Mar-2007 Tejun Heo <htejun@gmail.com>

libata: hardreset on SERR_INTERNAL

There was a rare report where SB600 reported SERR_INTERNAL and SRST
couldn't get it out of the failure mode. Hardreset on SERR_INTERNAL.
As the problem is intermittent, whether this fixes the problem or not
hasn't been verified yet, but hardresetting the channel on internal
error is a good idea anyway.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>


# 56287768 01-Apr-2007 Albert Lee <albertcc@tw.ibm.com>

libata: Clear tf before doing request sense (take 3)

patch 2/4:
Clear tf before doing request sense.

This fixes the AOpen 56X/AKH timeout problem.
(http://bugzilla.kernel.org/show_bug.cgi?id=8244)

Signed-off-by: Albert Lee <albertcc@tw.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>


# 8c3c52a8 22-Mar-2007 Tejun Heo <htejun@gmail.com>

libata: IDENTIFY backwards for drive side cable detection

For drive side cable detection to work correctly, drives need to be
identified backwards such that the slave device releases PDIAG- before
the mater drive tries to detect cable type. ata_bus_probe() was fixed
by commit f31f0cc2f0b7527072d94d02da332d9bb8d7d94c but the new EH path
wasn't fixed. This patch makes new EH path do IDENTIFY backwards.

ata_dev_configure() for new devices are still performed master first.
This is to keep the detection messages in forward order.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>


# 4aa9ab67 12-Mar-2007 Tejun Heo <htejun@gmail.com>

libata: don't whine if ->prereset() returns -ENOENT

->prereset() returns -ENOENT to tell libata that the port is empty and
reset sequencing should be stopped. This is not an error condition.
Update ata_eh_reset() such that it sets device classes to ATA_DEV_NONE
and return success in on -ENOENT. This makes spurious error message
go away.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>


# 6ffa01d8 02-Mar-2007 Tejun Heo <htejun@gmail.com>

libata: add CONFIG_PM to libata core layer

Conditionalize all PM related stuff in libata core layer using
CONFIG_PM.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>


# 44877b4e 20-Feb-2007 Tejun Heo <htejun@gmail.com>

libata: s/ap->id/ap->print_id/g

ata_port has two different id fields - id and port_no. id is
system-wide 1-based unique id for the port while port_no is 0-based
host-wide port number. The former is primarily used to identify the
ATA port to the user in printk messages while the latter is used in
various places in libata core and LLDs to index the port inside the
host.

The two fields feel quite similar and sometimes ap->id is used in
place of ap->port_no, which is very difficult to spot. This patch
renames ap->id to ap->print_id to reduce the possibility of such bugs.

Some printk messages are adjusted such that id string (ata%u[.%u])
isn't printed twice and/or to use ata_*_printk() instead of hardcoded
id format.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>


# 7d47e8d4 02-Feb-2007 Tejun Heo <htejun@gmail.com>

libata: put some intelligence into EH speed down sequence

The current EH speed down code is more of a proof that the EH
framework is capable of adjusting transfer speed in response to error.
This patch puts some intelligence into EH speed down sequence. The
rules are..

* If there have been more than three timeout, HSM violation or
unclassified DEV errors for known supported commands during last 10
mins, NCQ is turned off.

* If there have been more than three timeout or HSM violation for known
supported command, transfer mode is slowed down. If DMA is active,
it is first slowered by one grade (e.g. UDMA133->100). If that
doesn't help, it's slowered to 40c limit (UDMA33). If PIO is
active, it's slowered by one grade first. If that doesn't help,
PIO0 is forced. Note that this rule does not change transfer mode.
DMA is never degraded into PIO by this rule.

* If there have been more than ten ATA bus, timeout, HSM violation or
unclassified device errors for known supported commands && speeding
down DMA mode didn't help, the device is forced into PIO mode. Note
that this rule is considered only for PATA devices and is pretty
difficult to trigger.

One error can only trigger one rule at a time. After a rule is
triggered, error history is cleared such that the next speed down
happens only after some number of errors are accumulated. This makes
sense because now speed down is done in bigger stride.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>


# 4ae72a1e 02-Feb-2007 Tejun Heo <htejun@gmail.com>

libata: improve probe failure handling

* Move forcing device to PIO0 on device disable into
ata_dev_disable(). This makes both old and new EHs act the same
way.

* Speed down only PIO mode on probe failure. All commands used during
probing are PIO commands. There's no point in speeding down DMA.

* Retry at least once after -ENODEV. Some devices report garbled
IDENTIFY data after certain events. This shouldn't cause device
detach and re-attach.

* Rearrange EH failure path for simplicity.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>


# 458337db 02-Feb-2007 Tejun Heo <htejun@gmail.com>

libata: improve ata_down_xfermask_limit()

Make ata_down_xfermask_limit() accept @sel instead of @force_pio0.
@sel selects how the xfermask limit will be adjusted. The following
selectors are defined.

* ATA_DNXFER_PIO : only speed down PIO
* ATA_DNXFER_DMA : only speed down DMA, don't cause transfer mode change
* ATA_DNXFER_40C : apply 40c cable limit
* ATA_DNXFER_FORCE_PIO : force PIO
* ATA_DNXFER_FORCE_PIO0 : force PIO0 (same as original with @force_pio0 == 1)
* ATA_DNXFER_ANY : same as original with @force_pio0 == 0

Currently, only ANY and FORCE_PIO0 are used to maintain the original
behavior. Other selectors will be used later to improve EH speed down
sequence.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>


# 726f0785 03-Jan-2007 Tejun Heo <htejun@gmail.com>

libata: kill qc->nsect and cursect

libata used two separate sets of variables to record request size and
current offset for ATA and ATAPI. This is confusing and fragile.
This patch replaces qc->nsect/cursect with qc->nbytes/curbytes and
kills them. Also, ata_pio_sector() is updated to use bytes for
qc->cursg_ofs instead of sectors. The field used to be used in bytes
for ATAPI and in sectors for ATA.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>


# 03ee5b1c 26-Jan-2007 Tejun Heo <htejun@gmail.com>

libata: fix ata_eh_suspend() return value

ata_eh_suspend() was returning 0 regardless of failure. This bug has
potential to lose data on suspend. Fix it.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>


# 79a55b72 18-Jan-2007 Tejun Heo <htejun@gmail.com>

libata: fix handling of port actions in per-dev action mask

libata EH ignores port-wide actions in per-dev action mask. However,
device resume requests EH_SOFTRESET using per-dev action mask. Under
certain circumstances, this results in not resetting frozen port after
resuming which causes failure of all commands.

This patch allows port-wide actions to be requested in per-dev action
mask. Before EH recovery starts, port-wide actions will be collected.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>


# 800b3996 03-Dec-2006 Tejun Heo <htejun@gmail.com>

[PATCH] libata: always use polling IDENTIFY

libata switched to IRQ-driven IDENTIFY when IRQ-driven PIO was
introduced. This has caused a lot of problems including device
misdetection and phantom device.

ATA_FLAG_DETECT_POLLING was added recently to selectively use polling
IDENTIFY on problemetic drivers but many controllers and devices are
affected by this problem and trying to adding ATA_FLAG_DETECT_POLLING
for each such case is diffcult and not very rewarding.

This patch makes libata always use polling IDENTIFY. This is
consistent with libata's original behavior and drivers/ide's behavior.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>


# a569a30d 20-Nov-2006 Tejun Heo <htejun@gmail.com>

[PATCH] libata: don't request sense if the port is frozen

If EH command is issued to a frozen port, it fails with AC_ERR_SYSTEM.
libata used to request sense even when the port is frozen needlessly
adding AC_ERR_SYSTEM to err_mask. Don't do it.

Signed-off-by: Tejun Heo <htejun@gmail.com>


# 664e8503 20-Nov-2006 Tejun Heo <htejun@gmail.com>

[PATCH] libata: print cdb[0] in failed qc report

Print cdb[0] in failed qc report.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>


# 8a937581 14-Nov-2006 Tejun Heo <htejun@gmail.com>

[PATCH] libata: improve failed qc reporting

Improve failed qc reporting. The original message didn't include the
actual command nor full error status and it was necessary to
temporarily patch the code to find out exactly which command is
causing problem. This patch makes EH report full command and result
TFs along with data direction and length. This change will make bug
reports more useful.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>


# 55a8e2c8 10-Nov-2006 Tejun Heo <htejun@gmail.com>

[PATCH] libata: implement presence detection via polling IDENTIFY

On some controllers (ICHs in piix mode), there is *NO* reliable way to
determine device presence other than issuing IDENTIFY and see how the
transaction proceeds by watching the TF status register.

libata acted this way before irq-pio and phantom devices caused very
little problem but now that IDENTIFY is performed using IRQ drive PIO,
such phantom devices now result in multiple 30sec timeouts during
boot.

This patch implements ATA_FLAG_DETECT_POLLING. If a LLD sets this
flag, libata core issues the initial IDENTIFY in polling mode and if
the initial data transfer fails w/ HSM violation, the port is
considered to be empty thus replicating the old libata and IDE
behavior.

Signed-off-by: Jeff Garzik <jeff@garzik.org>


# bff04647 10-Nov-2006 Tejun Heo <htejun@gmail.com>

[PATCH] libata: convert @post_reset to @flags in ata_dev_read_id()

Make ata_dev_read_id() take @flags instead of @post_reset. Currently
there is only one flag defined - ATA_READID_POSTRESET, which is
equivalent to @post_reset. This is preparation for polling presence
detection.

Signed-off-by: Jeff Garzik <jeff@garzik.org>


# baa1e78a 01-Nov-2006 Tejun Heo <htejun@gmail.com>

[PATCH] libata: implement ATA_EHI_SETMODE and ATA_EHI_POST_SETMODE

libata EH used to perform ata_set_mode() iff the EH session performed
reset as indicated by ATA_EHI_DID_RESET. This is incorrect because
->dev_config() called by revalidation is allowed to modify transfer
mode which ata_set_mode() should take care of. This patch implements
the following two flags.

* ATA_EHI_SETMODE: set during EH to schedule ata_set_mode(). Both new
device attachment and revalidation set this flag.

* ATA_EHI_POST_SETMODE: set while the device is revalidated after
ata_set_mode(). Post-setmode revalidation is different from initial
configuaration and EH revalidation in that ->dev_config() is not
allowed tune transfer mode. LLD can use this flag to determine
whether it's allowed to tune transfer mode. Note that POST_SETMODE
->dev_config() is guaranteed to be preceded by non-POST_SETMODE
->dev_config().

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>


# efdaedc4 01-Nov-2006 Tejun Heo <htejun@gmail.com>

[PATCH] libata: implement ATA_EHI_PRINTINFO

Implement ehi flag ATA_EHI_PRINTINFO. This flag is set when device
configuration needs to print out device info. This used to be handled
by @print_info argument to ata_dev_configure() but LLDs also need to
know about it in ->dev_config() callback.

This patch replaces @print_info w/ ATA_EHI_PRINTINFO and make sata_sil
print workaround messages only on the initial configuration.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>


# 52bad64d 22-Nov-2006 David Howells <dhowells@redhat.com>

WorkStruct: Separate delayable and non-delayable events.

Separate delayable work items from non-delayable work items be splitting them
into a separate structure (delayed_work), which incorporates a work_struct and
the timer_list removed from work_struct.

The work_struct struct is huge, and this limits it's usefulness. On a 64-bit
architecture it's nearly 100 bytes in size. This reduces that by half for the
non-delayable type of event.

Signed-Off-By: David Howells <dhowells@redhat.com>


# c961922b 26-Sep-2006 Alan Cox <alan@lxorguk.ukuu.org.uk>

[PATCH] libata-eh: Remove layering violation and duplication when handling absent ports

This removes the layering violation where drivers have to fiddle
directly with EH flags. Instead we now recognize -ENOENT means "no port"
and do the handling in the core code.

This also removes an instance of a call to disable the port, and an
identical printk from each driver doing this. Even better - future rule
changes will be in one place only.

Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>


# cca3974e 24-Aug-2006 Jeff Garzik <jeff@garzik.org>

libata: Grand renaming.

The biggest change is that ata_host_set is renamed to ata_host.

* ata_host_set => ata_host
* ata_probe_ent->host_flags => ata_probe_ent->port_flags
* ata_probe_ent->host_set_flags => ata_probe_ent->_host_flags
* ata_host_stats => ata_port_stats
* ata_port->host => ata_port->scsi_host
* ata_port->host_set => ata_port->host
* ata_port_info->host_flags => ata_port_info->flags
* ata_(.*)host_set(.*)\(\) => ata_\1host\2()

The leading underscore in ata_probe_ent->_host_flags is to avoid
reusing ->host_flags for different purpose. Currently, the only user
of the field is libata-bmdma.c and probe_ent itself is scheduled to be
removed.

ata_port->host is reused for different purpose but this field is used
inside libata core proper and of different type.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>


# c6fd2807 10-Aug-2006 Jeff Garzik <jeff@garzik.org>

Move libata to drivers/ata.