History log of /freebsd-10-stable/sys/dev/nvme/
Revision Date Author Comments
297126 21-Mar-2016 mav

MFC r296617: Revert r292074 (by smh): Limit stripesize reported from
nvd(4) to 4K

I believe that this patch handled the problem from the wrong side.
Instead of making ZFS properly handle large stripe sizes, it made an
unrelated driver lie about its reported parameters to work around that.

An alternative solution for this problem on the ZFS side was committed
in r296615.

296191 29-Feb-2016 jimharris

MFC r295944:

nvme: fix intx handler to not dereference ioq during initialization

This was a regression from r293328, which deferred allocation
of the controller's ioq array until after interrupts are enabled
during boot.

Approved by: re (gjb)
Sponsored by: Intel

295704 17-Feb-2016 jimharris

MFC r295532:

nvme: avoid duplicate SET_NUM_QUEUES commands

nvme(4) issues a SET_NUM_QUEUES command during device
initialization to ensure enough I/O queues exist for each
of the MSI-X vectors we have allocated. The SET_NUM_QUEUES
command is then issued again during nvme_ctrlr_start(), to
ensure that it is properly set after any controller reset.

At least one NVMe drive exists which fails this second
SET_NUM_QUEUES command during device initialization. So
change nvme_ctrlr_start() to only issue its SET_NUM_QUEUES
command when it is coming out of a reset - avoiding the
duplicate SET_NUM_QUEUES during device initialization.

Approved by: re (glebius)
Sponsored by: Intel

294711 25-Jan-2016 smh

MFC r292074:

Limit stripesize reported from nvd(4) to 4K

Sponsored by: Multiplay

293673 11-Jan-2016 jimharris

MFC r293354:

nvme: replace NVME_CEILING macro with howmany()
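
For readers unfamiliar with howmany(): it is the ceiling-division macro
from <sys/param.h>. The sketch below uses a local copy of the idiom so
that it builds outside the kernel tree; EXAMPLE_HOWMANY and
example_pages_needed() are hypothetical names for illustration, not
identifiers from the driver.

```c
/*
 * Illustrative copy of the ceiling-division idiom. In the tree, the
 * howmany() macro itself comes from <sys/param.h>.
 */
#define EXAMPLE_HOWMANY(x, y)	(((x) + ((y) - 1)) / (y))

/* Number of page_size-byte pages needed to hold nbytes bytes. */
unsigned int
example_pages_needed(unsigned int nbytes, unsigned int page_size)
{
	return (EXAMPLE_HOWMANY(nbytes, page_size));
}
```

With a 4096-byte page, 4096 bytes need one page and 4097 bytes need two.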

293672 11-Jan-2016 jimharris

MFC r293352:

nvme: add hw.nvme.min_cpus_per_ioq tunable

Due to FreeBSD system-wide limits on number of MSI-X vectors
(https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=199321),
it may be desirable to allocate fewer than the maximum number
of vectors for an NVMe device, in order to save vectors for
other devices (usually Ethernet) that can take better
advantage of them and may be probed after NVMe.

This tunable is expressed in terms of minimum number of CPUs
per I/O queue instead of max number of queues per controller,
to allow for a more even distribution of CPUs per queue. This
avoids cases where some number of CPUs have a dedicated queue,
but other CPUs need to share queues. Ideally the PR referenced
above will eventually be fixed and the mechanism implemented
here becomes obsolete anyways.

While here, fix a bug in the CPUs per I/O queue calculation to
properly account for the admin queue's MSI-X vector.
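
A minimal sketch of the arithmetic described above, under stated
assumptions: one MSI-X vector is reserved for the admin queue, the
remaining vectors each back one I/O queue, and CPUs are assigned to
queues in contiguous groups. The struct and function names are
hypothetical and this is not the driver's actual code.

```c
/* Hypothetical result type for this sketch; not a driver structure. */
struct example_ioq_layout {
	unsigned int cpus_per_ioq;
	unsigned int num_io_queues;
};

static unsigned int
example_ceil_div(unsigned int a, unsigned int b)
{
	return ((a + b - 1) / b);
}

/*
 * Pick a CPUs-per-queue value that honors the requested minimum and the
 * number of vectors available, then derive the I/O queue count from it.
 */
struct example_ioq_layout
example_plan_ioqs(unsigned int ncpus, unsigned int vectors,
    unsigned int min_cpus_per_ioq)
{
	struct example_ioq_layout l;
	unsigned int io_vectors;

	if (ncpus < 1)
		ncpus = 1;
	/* One vector is set aside for the admin queue. */
	io_vectors = (vectors > 1) ? vectors - 1 : 1;
	/* Keep the minimum within 1..ncpus. */
	if (min_cpus_per_ioq < 1)
		min_cpus_per_ioq = 1;
	if (min_cpus_per_ioq > ncpus)
		min_cpus_per_ioq = ncpus;

	l.cpus_per_ioq = example_ceil_div(ncpus, io_vectors);
	if (l.cpus_per_ioq < min_cpus_per_ioq)
		l.cpus_per_ioq = min_cpus_per_ioq;
	l.num_io_queues = example_ceil_div(ncpus, l.cpus_per_ioq);
	return (l);
}

/* Map a CPU index to the I/O queue serving its group. */
unsigned int
example_cpu_to_ioq(const struct example_ioq_layout *l, unsigned int cpu)
{
	return (cpu / l->cpus_per_ioq);
}
```

For example, 8 CPUs with 9 vectors and a minimum of 4 CPUs per queue
yields 2 queues of 4 CPUs each, leaving the other vectors unused by this
device. With 8 CPUs and only 4 vectors, the sketch yields 3 queues
serving 3, 3 and 2 CPUs.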

293671 11-Jan-2016 jimharris

MFC r293328:

nvme: do not revert to single I/O queue when per-CPU queues not available

Previously nvme(4) would revert to a single I/O queue if it could not
allocate enough interrupt vectors or NVMe submission/completion queues
to have one I/O queue per core. This patch determines how to utilize a
smaller number of available interrupt vectors, and assigns (as closely
as possible) an equal number of cores to each associated I/O queue.

293670 11-Jan-2016 jimharris

MFC r293327:

nvme: break out interrupt setup code into a separate function

293669 11-Jan-2016 jimharris

MFC r293326:

nvme: do not pre-allocate MSI-X IRQ resources

The issue referenced here was resolved by other changes
in recent commits, so this code is no longer needed.

293668 11-Jan-2016 jimharris

MFC r293325:

nvme: remove per_cpu_io_queues from struct nvme_controller

Instead just use num_io_queues to make this determination.

This prepares for some future changes enabling use of multiple
queues when we do not have enough queues or MSI-X vectors
for one queue per CPU.

293667 11-Jan-2016 jimharris

MFC r293324:

nvme: simplify some of the nested ifs in interrupt setup code

This prepares for some follow-up commits which do more work in
this area.

291214 23-Nov-2015 jimharris

MFC r290199:

nvd, nvme: report stripesize through GEOM disk layer

Sponsored by: Intel

291213 23-Nov-2015 jimharris

MFC r290198:

nvme: fix race condition in split bio completion path

Sponsored by: Intel

287676 11-Sep-2015 jimharris

MFC r286043:

nvme: do not notify a consumer about failures that occur during initialization

Sponsored by: Intel

285918 27-Jul-2015 jimharris

MFC r285816:

nvme: ensure csts.rdy bit is cleared before returning from nvme_ctrlr_disable

Sponsored by: Intel

285917 27-Jul-2015 jimharris

MFC r285815:

nvme: properly handle case where pci_alloc_msix does not alloc all vectors

Sponsored by: Intel

282926 14-May-2015 jimharris

MFC r281283:

nvme: remove CHATHAM related code

Chatham was an internal NVMe prototype board used for
early driver development.

Sponsored by: Intel

282925 14-May-2015 jimharris

MFC r281282:

nvme: add device strings for Intel DC series NVMe SSDs

Sponsored by: Intel

282924 14-May-2015 jimharris

MFC r281281, r281285:

nvme: create separate DMA tag for non-payload DMA buffers

Submission and completion queue memory need to use a
separate DMA tag for mappings than payload buffers,
to ensure mappings remain contiguous even with DMAR
enabled.

Sponsored by: Intel

282923 14-May-2015 jimharris

MFC r281280:

nvme: fall back to a smaller MSI-X vector allocation if necessary

Previously, if per-CPU MSI-X vectors could not be allocated,
nvme(4) would fall back to INTx with a single I/O queue pair.
This change will still fall back to a single I/O queue pair, but
allocate MSI-X vectors instead of reverting to INTx.

Sponsored by: Intel

267620 18-Jun-2014 jimharris

MFC r267342:

Use bitwise OR instead of logical OR when constructing value for
SET_FEATURES/NUMBER_OF_QUEUES command.

Sponsored by: Intel
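
To illustrate the class of bug fixed here: the sketch below assumes the
NVMe field layout for the Number of Queues feature (submission queue
count in bits 15:0 and completion queue count in bits 31:16 of command
dword 11, both 0-based). The function names are hypothetical and the
code is not taken from the driver.

```c
#include <stdint.h>

/* Correct form: bitwise OR merges the two 16-bit fields. */
uint32_t
example_num_queues_cdw11(uint32_t num_queues)
{
	uint32_t zero_based = num_queues - 1;

	return ((zero_based << 16) | zero_based);
}

/* The mistake: logical OR collapses both fields to a truth value. */
uint32_t
example_num_queues_cdw11_logical_or(uint32_t num_queues)
{
	uint32_t zero_based = num_queues - 1;

	return ((zero_based << 16) || zero_based);
}
```

Requesting 4 queues gives 0x00030003 with the bitwise form, while the
logical form evaluates to 1 for any request larger than one queue.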

265577 07-May-2014 jimharris

MFC r263311:

nvme: Allocate all MSI resources up front so that we can fall back to
INTx if necessary.

265576 07-May-2014 jimharris

MFC r263310:

nvme: Close hole where nvd(4) would not be notified of all nvme(4)
instances if modules loaded during boot.

265573 07-May-2014 jimharris

MFC r263278:

nvme: NVMe specification dictates 4-byte alignment for PRPs (not 8).

265572 07-May-2014 jimharris

MFC r263277:

nvme: Remove the software progress marker SET_FEATURE command during
controller initialization.

The spec says OS drivers should send this command after controller
initialization completes successfully, but other NVMe OS drivers are
not sending this command. This change will therefore reduce differences
between the FreeBSD and other OS drivers.

265569 07-May-2014 jimharris

MFC r260382:

For IDENTIFY passthrough commands to Chatham prototype controllers, copy
the spoofed identify data into the user buffer rather than issuing the
command to the controller, since Chatham IDENTIFY data is always spoofed.

While here, fix a bug in the spoofed data for Chatham submission and
completion queue entry sizes.

257707 05-Nov-2013 jimharris

MFC r257534:

Create a unique unit number for each controller and namespace cdev.

Sponsored by: Intel
Approved by: re (glebius)

256281 10-Oct-2013 gjb

Copy head (r256279) to stable/10 as part of the 10.0-RELEASE cycle.

Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation


256169 08-Oct-2013 jimharris

Fix the LINT build.

Approved by: re (implicit)
MFC after: 1 week


256155 08-Oct-2013 jimharris

Do not leak resources during attach if nvme_ctrlr_construct() or the initial
controller resets fail.

Sponsored by: Intel
Reviewed by: carl
Approved by: re (hrs)
MFC after: 1 week


256154 08-Oct-2013 jimharris

Log and then disable asynchronous notification of persistent events after
they occur.

This prevents repeated notifications of the same event.

Status of these events may be viewed at any time by viewing the
SMART/Health Info Page using nvmecontrol, whether or not asynchronous
events notifications for those events are enabled. This log page can
be viewed using:

nvmecontrol logpage -p 2 <ctrlr id>

Future enhancements may re-enable these notifications on a periodic basis
so that if the notified condition persists, it will continue to be logged.

Sponsored by: Intel
Reviewed by: carl
Approved by: re (hrs)
MFC after: 1 week


256153 08-Oct-2013 jimharris

Do not enable temperature threshold as an asynchronous event notification
on NVMe controllers that do not support it.

Sponsored by: Intel
Reviewed by: carl
Approved by: re (hrs)
MFC after: 1 week


256152 08-Oct-2013 jimharris

Extend some 32-bit fields and variables to 64-bit to prevent overflow
when calculating stats in nvmecontrol perftest.

Sponsored by: Intel
Reported by: Joe Golio <joseph.golio@emc.com>
Reviewed by: carl
Approved by: re (hrs)
MFC after: 1 week
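
A small sketch of the kind of overflow this guards against. Which
fields were widened is not stated above, so the byte-count calculation
here is an assumed example with hypothetical names, not the perftest
code.

```c
#include <stdint.h>

/* 32-bit result: wraps modulo 2^32 once the product reaches 4 GiB. */
uint32_t
example_total_bytes32(uint32_t io_completed, uint32_t io_size)
{
	return (io_completed * io_size);
}

/* Widen one operand first so the multiplication is done in 64 bits. */
uint64_t
example_total_bytes64(uint32_t io_completed, uint32_t io_size)
{
	return ((uint64_t)io_completed * io_size);
}
```

Two million 4096-byte I/Os total 8192000000 bytes, which the 32-bit
version reports as 3897032704.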


256151 08-Oct-2013 jimharris

Add driver-assisted striping for upcoming Intel NVMe controllers that can
benefit from it.

Sponsored by: Intel
Reviewed by: kib (earlier version), carl
Approved by: re (hrs)
MFC after: 1 week


254389 15-Aug-2013 ken

Change the way that unmapped I/O capability is advertised.

The previous method was to set the D_UNMAPPED_IO flag in the cdevsw
for the driver. The problem with this is that in many cases (e.g.
sa(4)) there may be some instances of the driver that can handle
unmapped I/O and some that can't. The isp(4) driver can handle
unmapped I/O, but the esp(4) driver currently cannot. The cdevsw
is shared among all driver instances.

So instead of setting a flag on the cdevsw, set a flag on the cdev.
This allows drivers to indicate support for unmapped I/O on a
per-instance basis.

sys/conf.h: Remove the D_UNMAPPED_IO cdevsw flag and replace it
with an SI_UNMAPPED cdev flag.

kern_physio.c: Look at the cdev SI_UNMAPPED flag to determine
whether or not a particular driver can handle
unmapped I/O.

geom_dev.c: Set the SI_UNMAPPED flag for all GEOM cdevs.
Since GEOM will create a temporary mapping when
needed, setting SI_UNMAPPED unconditionally will
work.

Remove the D_UNMAPPED_IO flag.

nvme_ns.c: Set the SI_UNMAPPED flag on cdevs created here
if NVME_UNMAPPED_BIO_SUPPORT is enabled.

vfs_aio.c: In aio_qphysio(), check the SI_UNMAPPED flag on a
cdev instead of the D_UNMAPPED_IO flag on the cdevsw.

sys/param.h: Bump __FreeBSD_version to 1000045 for the switch from
setting the D_UNMAPPED_IO flag in the cdevsw to setting
SI_UNMAPPED in the cdev.

Reviewed by: kib, jimharris
MFC after: 1 week
Sponsored by: Spectra Logic


254303 13-Aug-2013 jimharris

If a controller fails to initialize, do not notify consumers (nvd) of its
namespaces.

Sponsored by: Intel
Reviewed by: carl
MFC after: 3 days


254302 13-Aug-2013 jimharris

Send a shutdown notification in the driver unload path, to ensure
notification gets sent in cases where system shuts down with driver
unloaded.

Sponsored by: Intel
Reviewed by: carl
MFC after: 3 days


253476 19-Jul-2013 jimharris

Add message when nvd disks are attached and detached.

As part of this commit, add an nvme_strvis() function which borrows
heavily from cam_strvis(). This will allow stripping of
leading/trailing whitespace and also handle unprintable characters
in model/serial numbers. This function goes into a new nvme_util.c
file which is used by both the driver and nvmecontrol.

Sponsored by: Intel
Reviewed by: carl
MFC after: 3 days
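
A simplified sketch of the string cleanup described above. This is not
nvme_strvis() or cam_strvis(); example_strvis() is a hypothetical helper
that assumes space or NUL padding and replaces every byte outside
printable ASCII with '?'.

```c
#include <stddef.h>

/*
 * Copy a fixed-width identify field into a NUL-terminated buffer,
 * dropping leading/trailing padding and masking unprintable bytes.
 */
void
example_strvis(char *dst, size_t dstlen, const char *src, size_t srclen)
{
	size_t out = 0;

	if (dstlen == 0)
		return;
	/* Skip leading spaces. */
	while (srclen > 0 && *src == ' ') {
		src++;
		srclen--;
	}
	/* Drop trailing spaces and NUL padding. */
	while (srclen > 0 &&
	    (src[srclen - 1] == ' ' || src[srclen - 1] == '\0'))
		srclen--;
	/* Copy what fits, leaving room for the terminator. */
	while (srclen > 0 && out + 1 < dstlen) {
		unsigned char c = (unsigned char)*src++;

		srclen--;
		dst[out++] = (c >= 0x20 && c <= 0x7e) ? (char)c : '?';
	}
	dst[out] = '\0';
}
```

The source length is passed explicitly because the identify fields are
fixed-width and are not guaranteed to be NUL-terminated.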


253474 19-Jul-2013 jimharris

Fix nvme(4) and nvd(4) to support non 512-byte sector sizes.

Recent testing with QEMU that has variable sector size support for
NVMe uncovered some of these issues. Chatham prototype boards supported
only 512 byte sectors.

Sponsored by: Intel
Reviewed by: carl
MFC after: 3 days


253438 17-Jul-2013 jimharris

Use pause() instead of DELAY() when polling for completion of admin
commands during controller initialization.

DELAY() does not work here during config_intrhook context - we need to
explicitly relinquish the CPU for the admin command completion to
get processed.

Sponsored by: Intel
Reported by: Adam Brooks <adam.j.brooks@intel.com>
Reviewed by: carl
MFC after: 3 days


253437 17-Jul-2013 jimharris

Define constants for the lengths of the serial number, model number
and firmware revision in the controller's identify structure.

Also modify consumers of these fields to ensure they only use the
specified number of bytes for their respective fields.

Sponsored by: Intel
Reviewed by: carl
MFC after: 3 days


253209 11-Jul-2013 jimharris

Fix a poorly worded comment in nvme(4).

MFC after: 3 days


253113 09-Jul-2013 jimharris

Add comment explaining why CACHE_LINE_SIZE is defined in nvme_private.h
if not already defined elsewhere.

Requested by: attilio
MFC after: 3 days


253112 09-Jul-2013 jimharris

Update copyright dates.

MFC after: 3 days


253108 09-Jul-2013 jimharris

Do not retry failed async event requests.

Sponsored by: Intel
MFC after: 3 days


253107 09-Jul-2013 jimharris

Add pci_enable_busmaster() and pci_disable_busmaster() calls in
nvme_attach() and nvme_detach() respectively.

Sponsored by: Intel
MFC after: 3 days


252278 27-Jun-2013 jimharris

Add firmware replacement and activation support to nvmecontrol(8) through
a new firmware command.

NVMe controllers may support up to 7 firmware slots for storing of
different firmware revisions. This new firmware command supports
firmware replacement (i.e. firmware download) with or without immediate
activation, or activation of a previously stored firmware image. It
also supports selection of the firmware slot during replacement
operations, using IDENTIFY information from the controller to
check that the specified slot is valid.

Newly activated firmware does not take effect until the next controller
reset, either via a reboot or separate 'nvmecontrol reset' command to the
same controller.

Submitted by: Joe Golio <joseph.golio@emc.com>
Obtained from: EMC / Isilon Storage Division
MFC after: 3 days


252273 26-Jun-2013 jimharris

Remove remaining uio-related code.

The nvme_physio() function was removed quite a while ago, which was the
only user of this uio-related code.

Sponsored by: Intel
MFC after: 3 days


252272 26-Jun-2013 jimharris

Fail any passthrough command whose transfer size exceeds the controller's
max transfer size. This guards against rogue commands coming in from
userspace.

Also add KASSERTS for the virtual address and unmapped bio cases, if the
transfer size exceeds the controller's max transfer size.

Sponsored by: Intel
MFC after: 3 days


252271 26-Jun-2013 jimharris

Use MAXPHYS to specify the maximum I/O size for nvme(4).

Also allow admin commands to transfer up to this maximum I/O size, rather
than the artificial limit previously imposed. The larger I/O size is very
beneficial for upcoming firmware download support. This has the added
benefit of simplifying the code since both admin and I/O commands now use
the same maximum I/O size.

Sponsored by: Intel
MFC after: 3 days


249422 12-Apr-2013 jimharris

Remove the NVME_IDENTIFY_CONTROLLER and NVME_IDENTIFY_NAMESPACE IOCTLs and replace
them with the NVMe passthrough equivalent.

Sponsored by: Intel


249421 12-Apr-2013 jimharris

Add support for passthrough NVMe commands.

This includes a new IOCTL to support a generic method for nvmecontrol(8) to pass
IDENTIFY, GET_LOG_PAGE, GET_FEATURES and other commands to the controller, rather than
separate IOCTLs for each.

Sponsored by: Intel


249420 12-Apr-2013 jimharris

Move the busdma mapping functions to nvme_qpair.c.

This removes nvme_uio.c completely.

Sponsored by: Intel


249419 12-Apr-2013 jimharris

Remove the NVMe-specific physio and associated routines.

These were added early on for benchmarking purposes to avoid the mapped I/O
penalties incurred in kern_physio. Now that FreeBSD (including kern_physio)
supports unmapped I/O, the need for these NVMe-specific routines no longer exists.

Sponsored by: Intel


249418 12-Apr-2013 jimharris

Add a mutex to each namespace, for general locking operations on the namespace.

Sponsored by: Intel


249417 12-Apr-2013 jimharris

Rename the controller's fail_req_lock, so that it can be used for other
locking operations on the controller.

Sponsored by: Intel


249416 12-Apr-2013 jimharris

Do not panic when a busdma mapping operation fails.

Instead, print an error message and fail the associated command with
DATA_TRANSFER_ERROR NVMe completion status.

Sponsored by: Intel


248977 01-Apr-2013 jimharris

Add unmapped bio support to nvme(4) and nvd(4).

Sponsored by: Intel


248913 29-Mar-2013 jimharris

Add "type" to nvme_request, signifying if its payload is a VADDR, UIO, or
NULL. This simplifies decisions around if/how requests are routed through
busdma. It also paves the way for supporting unmapped bios.

Sponsored by: Intel


248835 28-Mar-2013 jimharris

Remove obsolete comment. This code has now been tested with the QEMU
NVMe device emulator.


248834 28-Mar-2013 jimharris

Delete extra IO qpairs allocated based on number of MSI-X vectors, but
later found to not be usable because the controller doesn't support the
same number of queues.

This is not the normal case, but does occur with the Chatham prototype
board.

Sponsored by: Intel


248780 27-Mar-2013 jimharris

Fix printf format issue on i386.

Reported by: bz


248773 26-Mar-2013 jimharris

Clean up debug prints.

1) Consistently use device_printf.
2) Make dump_completion and dump_command into something more
human-readable.

Sponsored by: Intel
Reviewed by: carl


248771 26-Mar-2013 jimharris

Move common code from the different nvme_allocate_request functions into a
separate function.

Sponsored by: Intel
Suggested by: carl
Reviewed by: carl


248770 26-Mar-2013 jimharris

Change a number of malloc(9) calls to use M_WAITOK instead of
M_NOWAIT.

Sponsored by: Intel
Suggested by: carl
Reviewed by: carl


248769 26-Mar-2013 jimharris

Replace usages of mtx_pool_find used for admin commands with a polling
mechanism.

Now that all requests are timed, we are guaranteed to get a completion
notification, even if it is an abort status due to a timed out admin
command.

This has the effect of simplifying the controller and namespace setup
code, so that it reads straight through rather than broken up into
a bunch of different callback functions.

Sponsored by: Intel
Reviewed by: carl


248768 26-Mar-2013 jimharris

Abort and do not retry any outstanding admin commands left over after
a controller reset.

Sponsored by: Intel
Reviewed by: carl


248767 26-Mar-2013 jimharris

Add the ability to internally mark a controller as failed, if it is unable to
start or reset. Also add a notifier for NVMe consumers for controller fail
conditions and plumb this notifier for nvd(4) to destroy the associated
GEOM disks when a failure occurs.

This requires a bit of work to cover the races when a consumer is sending
I/O requests to a controller that is transitioning to the failed state. To
help cover this condition, add a task to defer completion of I/Os submitted
to a failed controller, so that the consumer will still always receive its
completions in a different context than the submission.

Sponsored by: Intel
Reviewed by: carl


248766 26-Mar-2013 jimharris

Just disable the controller instead of deleting IO queues during detach.

This is just as effective, and removes the need for a bunch of admin commands
to a controller that's going to be disabled shortly anyways.

Sponsored by: Intel
Reviewed by: carl


248764 26-Mar-2013 jimharris

Set Pre-boot Software Load Count to 0 at the end of the controller
start process.

The spec indicates the OS driver should use Set Features (Software
Progress Marker) to set the pre-boot software load count to 0
after the OS driver has successfully been initialized. This allows
pre-boot software to determine if there have been any issues with the
OS loading.

Sponsored by: Intel
Reviewed by: carl


248763 26-Mar-2013 jimharris

Remove the is_started flag from struct nvme_controller.

This flag was originally added to communicate to the sysctl code
which oids should be built, but there are easier ways to do this. This
needs to be cleaned up prior to adding new controller states - for example,
controller failure.

Sponsored by: Intel
Reviewed by: carl


248762 26-Mar-2013 jimharris

Ensure the controller's MDTS is accounted for in max_xfer_size.

The controller's IDENTIFY data contains MDTS (Max Data Transfer Size) to
allow the controller to specify the maximum I/O data transfer size. nvme(4)
already provides a default maximum, but make sure it does not exceed what
MDTS reports.

Sponsored by: Intel
Reviewed by: carl
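
A sketch of the capping logic, assuming the NVMe definition of MDTS: a
power-of-two multiplier applied to the controller's minimum memory page
size, where 0 means the controller reports no limit. The function and
parameter names are hypothetical.

```c
#include <stdint.h>

/*
 * Return the smaller of the driver's default maximum and the limit
 * implied by MDTS (min_page_size * 2^mdts).
 */
uint32_t
example_max_xfer_size(uint32_t default_max, uint32_t min_page_size,
    uint8_t mdts)
{
	uint64_t limit;

	/* 0 means "no limit"; very large shifts cannot lower a 32-bit cap. */
	if (mdts == 0 || mdts >= 32)
		return (default_max);
	limit = (uint64_t)min_page_size << mdts;
	return (limit < default_max ? (uint32_t)limit : default_max);
}
```

With a 128 KiB default and 4 KiB pages, MDTS of 3 lowers the maximum to
32 KiB, while MDTS of 8 leaves the default in place.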


248761 26-Mar-2013 jimharris

Cap the number of retry attempts to a configurable number. This ensures
that if a specific I/O repeatedly times out, we don't retry it indefinitely.

The default number of retries will be 4, but can be adjusted using hw.nvme.retry_count.

Sponsored by: Intel
Reviewed by: carl


248760 26-Mar-2013 jimharris

Pass associated log page data to async event consumers, if requested.

Sponsored by: Intel
Reviewed by: carl


248759 26-Mar-2013 jimharris

When an asynchronous event request is completed, automatically fetch the
specified log page.

This satisfies the spec condition that future async events of the same type
will not be sent until the associated log page is fetched.

Sponsored by: Intel
Reviewed by: carl


248758 26-Mar-2013 jimharris

Add structure definitions and controller command function for firmware
log pages.

Sponsored by: Intel
Reviewed by: carl


248757 26-Mar-2013 jimharris

Add structure definitions and a controller command function for
error log pages.

Sponsored by: Intel
Reviewed by: carl


248756 26-Mar-2013 jimharris

Create struct nvme_status.

NVMe error log entries include status, so breaking this out into
its own data structure allows it to be included in both the
nvme_completion data structure as well as error log entry data
structures.

While here, expose nvme_completion_is_error(), and change all of
the places that were explicitly looking at sc/sct bits to use this
macro instead.

Sponsored by: Intel
Reviewed by: carl


248755 26-Mar-2013 jimharris

Make nvme_ctrlr_reset a nop if a reset is already in progress.

This protects against cases where a controller crashes with multiple
I/O outstanding, each timing out and requesting controller resets
simultaneously.

While here, remove a debugging printf from a previous commit, and add
more logging around I/O that need to be resubmitted after a controller
reset.

Sponsored by: Intel
Reviewed by: carl
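
One common way to make concurrent reset requests collapse into a single
reset is an atomic compare-and-swap on an "in progress" flag. The sketch
below uses C11 atomics and hypothetical names; it shows the general
technique, not the driver's implementation.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical controller state for this sketch. */
struct example_ctrlr {
	atomic_uint is_resetting;
};

/*
 * Returns true only for the caller that moves the flag from 0 to 1.
 * Every other concurrent caller gets false and should do nothing.
 */
bool
example_reset_begin(struct example_ctrlr *ctrlr)
{
	unsigned int expected = 0;

	return (atomic_compare_exchange_strong(&ctrlr->is_resetting,
	    &expected, 1));
}

/* Clear the flag once the reset has finished. */
void
example_reset_end(struct example_ctrlr *ctrlr)
{
	atomic_store(&ctrlr->is_resetting, 0);
}
```

If several timed-out I/Os call example_reset_begin() at once, exactly
one of them proceeds with the reset.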


248754 26-Mar-2013 jimharris

By default, always escalate to controller reset when an I/O times out.

While aborts are typically cleaner than a full controller reset, many times
an I/O timeout indicates other controller-level issues where aborts may not
work. NVMe drivers for other operating systems are also defaulting to
controller reset rather than aborts for timed out I/O.

Sponsored by: Intel
Reviewed by: carl


248749 26-Mar-2013 jimharris

Add a tunable for the I/O timeout interval. Default is still 30 seconds,
but can be adjusted between a min/max of 5 and 120 seconds.

Sponsored by: Intel
Reviewed by: carl
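
A sketch of bounding the timeout to the range given above. It assumes
out-of-range values are clamped rather than rejected, which the commit
message does not specify; the macro and function names are hypothetical.

```c
#define EXAMPLE_MIN_TIMEOUT_PERIOD	5	/* seconds */
#define EXAMPLE_MAX_TIMEOUT_PERIOD	120	/* seconds */
#define EXAMPLE_DEFAULT_TIMEOUT_PERIOD	30	/* seconds */

/* Force a requested timeout into the supported range. */
int
example_clamp_timeout(int requested)
{
	if (requested < EXAMPLE_MIN_TIMEOUT_PERIOD)
		return (EXAMPLE_MIN_TIMEOUT_PERIOD);
	if (requested > EXAMPLE_MAX_TIMEOUT_PERIOD)
		return (EXAMPLE_MAX_TIMEOUT_PERIOD);
	return (requested);
}
```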


248748 26-Mar-2013 jimharris

Add handling for controller fatal status (csts.cfs).

On any I/O timeout, check for csts.cfs==1. If set, the controller
is reporting fatal status and we reset the controller immediately,
rather than trying to abort the timed out command.

This changeset also includes deferring the controller start portion
of the reset to a separate task. This ensures we are always performing
a controller start operation from a consistent context.

Sponsored by: Intel
Reviewed by: carl


248747 26-Mar-2013 jimharris

Add API for nvme consumers to access controller and namespace identify data.

Sponsored by: Intel
Reviewed by: carl


248746 26-Mar-2013 jimharris

Add controller reset capability to nvme(4) and ability to explicitly
invoke it from nvmecontrol(8).

Controller reset will be performed in cases where I/O are repeatedly
timing out, the controller reports an unrecoverable condition, or
when explicitly requested via IOCTL or an nvme consumer. Since the
controller may be in such a state where it cannot even process queue
deletion requests, we will perform a controller reset without trying
to clean up anything on the controller first.

Sponsored by: Intel
Reviewed by: carl


248741 26-Mar-2013 jimharris

Keep a doubly-linked list of outstanding trackers.

This enables in-order re-submission of I/O after a controller reset.

Sponsored by: Intel


248740 26-Mar-2013 jimharris

Create a generic nvme_ctrlr_cmd_get_log_page function, and change the
health information log page function to use it.

Sponsored by: Intel


248739 26-Mar-2013 jimharris

Expose the get/set features API to nvme consumers.

Sponsored by: Intel


248738 26-Mar-2013 jimharris

Add an interface for nvme shim drivers (i.e. nvd) to register for
notifications when new nvme controllers are added to the system.

Sponsored by: Intel


248737 26-Mar-2013 jimharris

Enable asynchronous event requests on non-Chatham devices.

Also add logic to clean up all outstanding asynchronous event requests
when resetting or shutting down the controller, since these requests
will not be explicitly completed by the controller itself.

Sponsored by: Intel


248736 26-Mar-2013 jimharris

Move controller destruction code from nvme_detach() to new nvme_ctrlr_destruct()
function.

Sponsored by: Intel


248735 26-Mar-2013 jimharris

Specify command timeout interval on a per-command type basis.

This is primarily driven by the need to disable timeouts for asynchronous
event requests, which by nature should not be timed out.

Sponsored by: Intel


248734 26-Mar-2013 jimharris

Explicitly abort a timed out command, if the ABORT command sent to the
controller indicates the command was not found.

Sponsored by: Intel


248733 26-Mar-2013 jimharris

Break out the code for completing an nvme_tracker object into a separate
function.

This allows for completions outside the normal completion path, for example
when an ABORT command fails due to the controller reporting the targeted
command does not exist. This is mainly for protection against a faulty
controller, but we need to clean up our internal request nonetheless.

Sponsored by: Intel


248732 26-Mar-2013 jimharris

Add support for ABORT commands, including issuing these commands when
an I/O times out.

Also ensure that we retry commands that are aborted due to a timeout.

Sponsored by: Intel


248731 26-Mar-2013 jimharris

Add an internal _nvme_qpair_submit_request function, which performs
the submit action assuming the qpair lock has already been acquired.

Also change nvme_qpair_submit_request to just lock/unlock the mutex
around a call to this new function.

This fixes a recursive mutex acquisition in the retry path.

Sponsored by: Intel


248730 26-Mar-2013 jimharris

Make the DSM range count 0-based. Previously we were deallocating one more
LBA than we should have been.

Sponsored by: Intel


248729 26-Mar-2013 jimharris

Do not look at the namespace's thin provisioning field to determine if DSM
command is supported. The two are not related.

Sponsored by: Intel


247963 07-Mar-2013 obrien

Fix GCC build:
/usr/src/sys/modules/nvme/../../dev/nvme/nvme.c:211: warning: format '%qx' expects type 'long unsigned int', but argument 9 has type 'long long unsigned int' [-Wformat]


245136 07-Jan-2013 jimharris

Revert r244549.

This change was originally intended to account for test kthreads under
the nvmecontrol process, but jhb indicated it may not be safe to
associate kthreads with userland processes and this could have
unintended consequences.

I did not observe any problems with this change, but my testing didn't
exhaust the kinds of corner cases that could cause problems. It is not
that important to account for these test threads under nvmecontrol, so I
am just reverting this change for now.

On a related note, the part of this patch for <= 7.x fails compilation
so reverting this fixes that too.

Suggested by: jhb


244549 21-Dec-2012 jimharris

Put kthreads under curproc so they are attached to nvmecontrol rather
than pid 0.

Sponsored by: Intel


244413 18-Dec-2012 jimharris

Map BAR 4/5, because NVMe spec says devices may place the MSI-X table
behind BAR 4/5, rather than in BAR 0/1 with the control/doorbell registers.

Sponsored by: Intel


244411 18-Dec-2012 jimharris

Simplify module definition by adding nvme_modevent to DRIVER_MODULE()
definition.

Submitted by: Carl Delsey <carl.r.delsey@intel.com>


244410 18-Dec-2012 jimharris

Do not use taskqueue to defer completion work when using INTx. INTx now
matches MSI-X behavior.

Sponsored by: Intel


243951 06-Dec-2012 jimharris

Add PCI device ID for 8-channel IDT NVMe controller, and clarify that the
previously defined IDT PCI device ID was for a 32-channel controller.

Submitted by: Joe Golio <joseph.golio@isilon.com>


242420 31-Oct-2012 jimharris

Use callout_reset_curcpu to allow the callout to be handled by the
current CPU and not always CPU 0.

This has the added benefit of reducing a huge amount of spinlock
contention on the callout_cpu spinlock for CPU 0.

Sponsored by: Intel


241689 18-Oct-2012 glebius

Fix build after r241659.


241665 18-Oct-2012 jimharris

Add ability to queue nvme_request objects if no nvme_trackers are available.

This eliminates the need to manage queue depth at the nvd(4) level for
Chatham prototype board workarounds, and also adds the ability to
accept a number of requests on a single qpair that is much larger
than the number of trackers allocated.

Sponsored by: Intel


241664 18-Oct-2012 jimharris

Preallocate a limited number of nvme_tracker objects per qpair, rather
than dynamically creating them at runtime.

Sponsored by: Intel


241663 18-Oct-2012 jimharris

Create nvme_qpair_submit_request() which eliminates all of the code
duplication between the admin and io controller-level submit
functions.

Sponsored by: Intel


241662 18-Oct-2012 jimharris

Simplify how the qpair lock is acquired and released.

Sponsored by: Intel


241661 18-Oct-2012 jimharris

Cleanup uio-related code to use struct nvme_request and
nvme_ctrlr_submit_io_request().

While here, also fix case where a uio may have more than 1 iovec.
NVMe's definition of SGEs (called PRPs) only allows for the first SGE to
start on a non-page boundary. The simplest way to handle this is to
construct a temporary uio for each iovec, and submit an NVMe request
for each.

Sponsored by: Intel


241660 18-Oct-2012 jimharris

Add nvme_ctrlr_submit_[admin|io]_request functions which consolidate
code for allocating nvme_tracker objects and making calls into
bus_dmamap_load for commands which have payloads.

Sponsored by: Intel


241659 18-Oct-2012 jimharris

Add struct nvme_request object which contains all of the parameters passed
from an NVMe consumer.

This allows us to mostly build NVMe command buffers without holding the
qpair lock, and also allows for future queueing of nvme_request objects
in cases where the submission queue is full and no nvme_tracker objects
are available.

Sponsored by: Intel


241658 18-Oct-2012 jimharris

Merge struct nvme_prp_list into struct nvme_tracker.

This simplifies the driver significantly where it is constructing
commands to be submitted to hardware. By reducing the number of
PRPs (NVMe parlance for SGE) from 128 to 32, it ensures we do not
allocate too much memory for more common smaller I/O sizes, while
still supporting up to 128KB I/O sizes.

This also paves the way for pre-allocation of nvme_tracker objects
for each queue which will simplify the I/O path even further.

Sponsored by: Intel
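
The 128KB figure follows from 32 entries of one page each, assuming
4 KiB pages and a page-aligned buffer. The sketch below counts how many
pages a transfer touches; the function name is hypothetical, and it does
not model how the driver splits entries between the command itself and
the PRP list.

```c
#include <stdint.h>

/*
 * Number of page-sized PRP entries needed to describe len bytes
 * starting at addr. page_size must be a power of two.
 */
uint32_t
example_prp_entries(uint64_t addr, uint32_t len, uint32_t page_size)
{
	uint64_t offset = addr & ((uint64_t)page_size - 1);

	if (len == 0)
		return (0);
	return ((uint32_t)((offset + len + page_size - 1) / page_size));
}
```

A page-aligned 128 KiB transfer needs 32 entries. The same transfer
starting 512 bytes into a page touches 33 pages, because only the first
entry may begin at a non-page boundary.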


241657 18-Oct-2012 jimharris

Add return codes to all functions used for submitting commands to I/O
queues.

Sponsored by: Intel


241434 10-Oct-2012 jimharris

Count number of times each queue pair's interrupt handler is invoked.

Also add sysctls to query and reset each queue pair's stats, including
the new count added here.

Sponsored by: Intel


241433 10-Oct-2012 jimharris

Put the nvme_qpair mutex on its own cacheline.

Sponsored by: Intel


241394 10-Oct-2012 kevlo

Revert previous commit...

Pointyhat to: kevlo (myself)


241370 09-Oct-2012 kevlo

Prefer NULL over 0 for pointers


240700 19-Sep-2012 jimharris

In nvme(4), set device description for BUS_PROBE_GENERIC case.

Reported by: jhb


240697 19-Sep-2012 jimharris

Report nvme(4) as a generic driver for NVMe devices if PCI class, subclass
and programming interface codes match.

Sponsored by: Intel


240672 18-Sep-2012 jimharris

Add #if 0 around nvme_async_event_cb() until NVMe AER functionality
can be tested.

This fixes a build warning found only with clang.


240671 18-Sep-2012 jimharris

Add __aligned(4) to NVMe defined data structures.

This fixes an issue in nvmecontrol(8), where clang throws a cast-align
warning when casting a __packed structure pointer to a uint32_t
pointer as part of printing raw hex output.

Reported by: dhw


240616 17-Sep-2012 jimharris

This is the first of several commits which will add NVM Express (NVMe)
support to FreeBSD. A full description of the overall functionality
being added is below. nvmexpress.org defines NVM Express as "an optimized
register interface, command set and feature set for PCI Express (PCIe)-based
Solid-State Drives (SSDs)."

This commit adds nvme(4) and nvd(4) driver source code and Makefiles
to the tree.

Full NVMe functionality description:
Add nvme(4) and nvd(4) drivers and nvmecontrol(8) for NVM Express (NVMe)
device support.

There will continue to be ongoing work on NVM Express support, but there
is more than enough to allow for evaluation of pre-production NVM Express
devices as well as soliciting feedback. Questions and feedback are welcome.

nvme(4) implements NVMe hardware abstraction and is a provider of NVMe
namespaces. The closest equivalent of an NVMe namespace is a SCSI LUN.
nvd(4) is an NVMe consumer, surfacing NVMe namespaces as GEOM disks.
nvmecontrol(8) is used for NVMe configuration and management.

The following are currently supported:
nvme(4)
- full mandatory NVM command set support
- per-CPU IO queues (enabled by default but configurable)
- per-queue sysctls for statistics and full command/completion queue
dumps for debugging
- registration API for NVMe namespace consumers
- I/O error handling (except for timeouts; see below)
- compilation switches for support back to stable-7

nvd(4)
- BIO_DELETE and BIO_FLUSH (if supported by controller)
- proper BIO_ORDERED handling

nvmecontrol(8)
- devlist: list NVMe controllers and their namespaces
- identify: display controller or namespace identify data in
human-readable or hex format
- perftest: quick and dirty performance test to measure raw
performance of NVMe device without userspace/physio/GEOM
overhead

The following are still work in progress and will be completed over the
next 3-6 months in rough priority order:
- complete man pages
- firmware download and activation
- asynchronous error requests
- command timeout error handling
- controller resets
- nvmecontrol(8) log page retrieval

This has been primarily tested on amd64, with light testing on i386. I
would be happy to provide assistance to anyone interested in porting
this to other architectures, but am not currently planning to do this
work myself. Big-endian and dmamap sync for command/completion queues
are the main areas that would need to be addressed.

The nvme(4) driver currently has references to Chatham, which is an
Intel-developed prototype board which is not fully spec compliant.
These references will all be removed over time.

Sponsored by: Intel
Contributions from: Joe Golio/EMC <joseph dot golio at emc dot com>