History log of /freebsd-current/sys/geom/geom_dev.c
Revision Date Author Comments
# fdafd315 24-Nov-2023 Warner Losh <imp@FreeBSD.org>

sys: Automated cleanup of cdefs and other formatting

Apply the following automated changes to try to eliminate
no-longer-needed sys/cdefs.h includes as well as now-empty
blank lines in a row.

Remove /^#if.*\n#endif.*\n#include\s+<sys/cdefs.h>.*\n/
Remove /\n+#include\s+<sys/cdefs.h>.*\n+#if.*\n#endif.*\n+/
Remove /\n+#if.*\n#endif.*\n+/
Remove /^#if.*\n#endif.*\n/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/types.h>/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/param.h>/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/capsicum.h>/

Sponsored by: Netflix


# 685dc743 16-Aug-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove $FreeBSD$: one-line .c pattern

Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/


# bd5d9037 28-Dec-2022 Zhenlei Huang <zlei@FreeBSD.org>

GEOM: Remove redundant NULL pointer check before g_free()

Reviewed by: melifaro, pjd, imp
Approved by: kp (mentor)
Differential Revision: https://reviews.freebsd.org/D37779


# 5f438dd3 10-Jun-2022 Alan Somers <asomers@FreeBSD.org>

ses: don't panic if disk elements have really weird descriptors

SES allows element descriptors to contain characters like spaces and
quotes that devfs does not allow to appear in device aliases. Since SES
element descriptors are outside of the kernel's control, we should
gracefully handle a failure to create a device physical path alias.

PR: 264513
Reported by: Yuri <yuri@aetern.org>
Reviewed by: imp, mav
Sponsored by: Axcient
MFC after: 2 weeks


# 65c87a6c 28-Apr-2022 Robert Wing <rew@FreeBSD.org>

geom_dev: extend kevent support for geom dev

Add support for the following NOTE events:
NOTE_OPEN, NOTE_CLOSE, NOTE_CLOSE_WRITE, NOTE_READ, and NOTE_WRITE.

Differential Revision: https://reviews.freebsd.org/D34777


# 0a90043e 14-Apr-2022 Mitchell Horne <mhorne@FreeBSD.org>

Remove 12.x ABI compat for kernel dump ioctls

This code was marked gone_in(14), so it can now be removed.

The only consumer of this interface is dumpon(8). We do not maintain
strict backwards compatibility for this utility because a) it
can't/shouldn't be used from a jail or chroot and b) it is highly
specific interface unique to FreeBSD. The host's (presumably more
up-to-date) copy of dumpon(8) should be used to configure kernel dump
devices.

Reviewed by: markj, emaste
MFC after: never
Differential Revision: https://reviews.freebsd.org/D34914


# 9c90bfcd 14-Apr-2022 Mitchell Horne <mhorne@FreeBSD.org>

Remove 11.x ABI compat for kernel dump ioctls

This code was marked gone_in(13), so its time has passed.

The only consumer of this interface is dumpon(8). We do not maintain
strict backwards compatibility for this utility because a) it
can't/shouldn't be used from a jail or chroot and b) it is highly
specific interface unique to FreeBSD. The host's (presumably more
up-to-date) copy of dumpon(8) should be used to configure kernel dump
devices.

Reviewed by: markj, emaste
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D34913


# d91d2b51 20-Jan-2022 Mark Johnston <markj@FreeBSD.org>

geom: Handle partial I/O in g_{read,write,delete}_data()

These routines are used internally by GEOM to dispatch I/O requests to a
provider, typically for tasting or for updating GEOM class metadata
blocks.

These routines assumed that partial I/O did not occur without setting
BIO_ERROR, but this is possible in at least two cases:
- Some or all of the I/O range is beyond the provider's mediasize.
In this scenario g_io_check() truncates the bounds of the request
before it is handed to the target provider.
- A read from vnode-backed md(4) device returns EOF (the backing vnode
is allowed to be smaller than the device itself) or partial vnode I/O
occurs.
In these scenarios g_read_data() could return a partially uninitialized
buffer. Many consumers are not affected by the first case, since the
offsets used for provider metadata or tasting are relative to the
provider's mediasize, but in some cases metadata is read at fixed
offsets, such as when searching for a UFS superblock using the offsets
defined by SBLOCKSEARCH.

Thus, modify the routines to explicitly check for a non-zero residual
and return EIO in that case. Remove a related check from the
DIOCGDELETE ioctl handler, it is handled within g_delete_data() now.

Reviewed by: mav, imp, kib
Reported by: KMSAN
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31293


# a50e92cc 18-Jan-2022 Robert Wing <rew@FreeBSD.org>

geom: add kqfilter support for geom dev

The only event hooked up is NOTE_ATTRIB, which is triggered when the
device is resized. Support for other NOTE_* events to follow.

Reviewed by: kib, jhb
Differential Revision: https://reviews.freebsd.org/D33402


# cd853791 27-Nov-2020 Konstantin Belousov <kib@FreeBSD.org>

Make MAXPHYS tunable. Bump MAXPHYS to 1M.

Replace MAXPHYS by runtime variable maxphys. It is initialized from
MAXPHYS by default, but can be also adjusted with the tunable kern.maxphys.

Make b_pages[] array in struct buf flexible. Size b_pages[] for buffer
cache buffers exactly to atop(maxbcachebuf) (currently it is sized to
atop(MAXPHYS)), and b_pages[] for pbufs is sized to atop(maxphys) + 1.
The +1 for pbufs allow several pbuf consumers, among them vmapbuf(),
to use unaligned buffers still sized to maxphys, esp. when such
buffers come from userspace (*). Overall, we save significant amount
of otherwise wasted memory in b_pages[] for buffer cache buffers,
while bumping MAXPHYS to desired high value.

Eliminate all direct uses of the MAXPHYS constant in kernel and driver
sources, except a place which initialize maxphys. Some random (and
arguably weird) uses of MAXPHYS, e.g. in linuxolator, are converted
straight. Some drivers, which use MAXPHYS to size embeded structures,
get private MAXPHYS-like constant; their convertion is out of scope
for this work.

Changes to cam/, dev/ahci, dev/ata, dev/mpr, dev/mpt, dev/mvs,
dev/siis, where either submitted by, or based on changes by mav.

Suggested by: mav (*)
Reviewed by: imp, mav, imp, mckusick, scottl (intermediate versions)
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D27225


# a3f4217e 27-Oct-2020 Warner Losh <imp@FreeBSD.org>

Remove frontstuff

Nothing implements this in the tree. Remove the ioctl and the
conversion to the geom atttribute stuff.

This was introduced in r94287 in 2002 and was retired in r113390
2003. It appeared in FreeBSD 5.0, but no other releases. This is a
vestige that was missed at the time and overlooked until now. No
compat is provided for this reason. And there's no implementation of
it today. And it was never part of a release from a stable branch.

Reviewed by: phk@
Differential Revision: https://reviews.freebsd.org/D26967


# 3001e97d 19-Oct-2020 Edward Tomasz Napierala <trasz@FreeBSD.org>

Fix fallout from r366811.

PR: 250442
Reported by: lwhsu
Reviewed by: mav
MFC after: 2 weeks
Sponsored by: NetApp, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D26855


# d22ff249 18-Oct-2020 Edward Tomasz Napierala <trasz@FreeBSD.org>

Make g_attach() return ENXIO for orphaned providers; update various
classes to add missing error checking.

Reviewed by: imp
MFC after: 2 weeks
Sponsored by: NetApp, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D26658


# 887611b1 28-Aug-2020 Warner Losh <imp@FreeBSD.org>

Retire devctl_notify_f()

devctl_notify_f isn't needed, so retire it. The flags argument is now
unused, so rather than keep it around, retire it. Convert all old
users of it to devctl_notify(). This path no longer sleeps, so is safe
to call from any context. Since it doesn't sleep, it doesn't need to
know if it is OK to sleep or not.

Reviewed by: markj@
Differential Revision: https://reviews.freebsd.org/D26140


# 773e541e 20-Aug-2020 Warner Losh <imp@FreeBSD.org>

Use devctl.h instead of bus.h to reduce newbus pollution.

There's no need for these parts of the kernel to know about newbus,
so narrow what is included to devctl.h for device_notify_*.

Suggested by: kib@


# 8510f61a 08-Jul-2020 Xin LI <delphij@FreeBSD.org>

sys/geom: consistently use _PATH_DEV instead of hardcoding "/dev/".

Reviewed by: cem
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D25565


# 4a711b8d 25-Jun-2020 John Baldwin <jhb@FreeBSD.org>

Use zfree() instead of explicit_bzero() and free().

In addition to reducing lines of code, this also ensures that the full
allocation is always zeroed avoiding possible bugs with incorrect
lengths passed to explicit_bzero().

Suggested by: cem
Reviewed by: cem, delphij
Approved by: csprng (cem)
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D25435


# a9ca503b 06-Jun-2020 Conrad Meyer <cem@FreeBSD.org>

Revert r361838

Reported by: delphij


# 5b9b571c 05-Jun-2020 Conrad Meyer <cem@FreeBSD.org>

geom_label: Use provider aliasing to alias upstream geoms

For synthetic aliases (just pseudonyms inferred from metadata like GPT or
UFS labels, GPT UUIDs, etc), use the GEOM provider aliasing system to create
a symlink to the real device instead of creating an independent device.
This makes it more clear which labels and devices correspond, and we can
safely have multiple labels to a single device accessed at once.

The confusingly named geom_label on-disk construct continues to behave
identically to how it did before.

This requires teaching GEOM's provider aliasing about the possibility
that aliases might be added later in time, and GEOM's devfs interaction
layer not to worry about existing aliases during retaste.

Discussed with: imp
Relnotes: sure, if we don't end up reverting it
Differential Revision: https://reviews.freebsd.org/D24968


# ae1cce52 13-May-2020 Warner Losh <imp@FreeBSD.org>

Reimplement aliases in geom

The alias needs to be part of the provider instead of the geom to work
properly. To bind the DEV geom, we need to look at the provider's names and
aliases and create the dev entries from there. If this lives in the GEOM, then
it won't propigate down the tree properly. Remove it from geom, add it provider.

Update geli, gmountver, gnop, gpart, and guzip to use it, which handles the bulk
of the uses in FreeBSD. I think this is all the providers that create a new name
based on their parent's name.


# 7029da5c 26-Feb-2020 Pawel Biernacki <kaktus@FreeBSD.org>

Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many)

r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.

This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.

Mark all obvious cases as MPSAFE. All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT

Approved by: kib (mentor, blanket)
Commented by: kib, gallatin, melifaro
Differential Revision: https://reviews.freebsd.org/D23718


# 19cfcf25 05-Dec-2019 Alexander Motin <mav@FreeBSD.org>

Block ioctls for dying GEOM_DEV instances.

For normal I/Os consumer and provider statuses are checked by g_io_check().
But ioctl calls often do not go through it, being dispatched directly. This
change makes their semantics more alike, protecting lower levels.

MFC after: 2 weeks


# 6b3c68bf 05-Dec-2019 Alexander Motin <mav@FreeBSD.org>

Make GEOM_DEV code slightly more compact.

Should be no functional change.

MFC after: 2 weeks


# e61ed798 04-Dec-2019 Alexander Motin <mav@FreeBSD.org>

Switch GEOM_DEV from make_dev_p() to make_dev_s().

It closes the race condition and so allows to remove few NULL checks.

Also while there, use dev->si_drv1 in addition to cp->private to store
softc pointer. For calls coming from the dev side it gives reliable cache
hit instead of often miss before.

MFC after: 2 weeks
Sponsored by: iXsystems, Inc.


# 6b6e2954 06-May-2019 Conrad Meyer <cem@FreeBSD.org>

List-ify kernel dump device configuration

Allow users to specify multiple dump configurations in a prioritized list.
This enables fallback to secondary device(s) if primary dump fails. E.g.,
one might configure a preference for netdump, but fallback to disk dump as a
second choice if netdump is unavailable.

This change does not list-ify netdump configuration, which is tracked
separately from ordinary disk dumps internally; only one netdump
configuration can be made at a time, for now. It also does not implement
IPv6 netdump.

savecore(8) is already capable of scanning and iterating multiple devices
from /etc/fstab or passed on the command line.

This change doesn't update the rc or loader variables 'dumpdev' in any way;
it can still be set to configure a single dump device, and rc.d/savecore
still uses it as a single device. Only dumpon(8) is updated to be able to
configure the more complicated configurations for now.

As part of revving the ABI, unify netdump and disk dump configuration ioctl
/ structure, and leave room for ipv6 netdump as a future possibility.
Backwards-compatibility ioctls are added to smooth ABI transition,
especially for developers who may not keep kernel and userspace perfectly
synced.

Reviewed by: markj, scottl (earlier version)
Relnotes: maybe
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D19996


# 9c498bd5 24-Apr-2019 Alexander Motin <mav@FreeBSD.org>

Call delist_dev() before destroy_dev_sched_cb().

destroy_dev_sched_cb() is excessively asynchronous, and during media change
retaste new provider may appear sooner then device of the previous one get
destroyed.

MFC after: 1 week
Sponsored by: iXsystems, Inc.


# e6b0d5eb 30-Mar-2019 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Introduce new event SIZECHANGE within GEOM system to inform about GEOM
providers mediasize changes.

While here, use GEOM nomenclature to describe providers instead of calling
them device nodes.

Obtained from: Fudo Security
Tested in: AWS


# d4fbe32c 19-Feb-2019 Mark Johnston <markj@FreeBSD.org>

Limit the number of entries allocated for a REPORT_ZONES command.

The DIOCGETZONE ioctl can be used to fetch the zone list of an SMR
drive, and the caller specifies the number of entries it wants to fetch.
Clamp the caller's request to a sane limit so that a user cannot attempt
large allocations. Callers already need to invoke the ioctl multiple
times to fetch the full list in general, so there's no harm in limiting
the number of entries returned.

Fix style while here.

admbug: 807
Reported by: Ilja Van Sprundel <ivansprundel@ioactive.com>
Reviewed by: asomers, ken
Tested by: ken
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D19249


# 02a99230 27-Dec-2018 Alexander Motin <mav@FreeBSD.org>

Switch from mutexes to atomics in GEOM_DEV I/O path.

Mutexes in I/O path there were used twice per I/O to atomically access
several variables to close and/or destroy the device on last request
completion. I found the way to fit all required info into one integer,
suitable for atomic operations. It opened race window on device close,
but addition of timeout to the msleep() there should cover it.

Profiling shows removal of significant spinning time on those mutexes
and IOPS increase from ~600K to >800K to NVMe on 72-core systems.

MFC after: 1 month
Sponsored by: iXsystems, Inc.


# 9dcafe16 04-Dec-2018 Maxim Sobolev <sobomax@FreeBSD.org>

Another attempt to fix issue with the DIOCGDELETE ioctl(2) not
handling slightly out-of-bound requests properly (r340187).
Perform range check here rather then rely on g_delete_data() to DTRT.

The g_delete_data() would always return success for requests
starting just the next byte after providers media boundary.

MFC after: 4 weeks


# bd92e6b6 05-May-2018 Mark Johnston <markj@FreeBSD.org>

Refactor some of the MI kernel dump code in preparation for netdump.

- Add clear_dumper() to complement set_dumper().
- Drain netdump's preallocated mbuf pool when clearing the dumper.
- Don't do bounds checking for dumpers with mediasize 0.
- Add dumper callbacks for initialization for writing out headers.

Reviewed by: sbruno
MFC after: 1 month
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D15252


# 6469bdcd 06-Apr-2018 Brooks Davis <brooks@FreeBSD.org>

Move most of the contents of opt_compat.h to opt_global.h.

opt_compat.h is mentioned in nearly 180 files. In-progress network
driver compabibility improvements may add over 100 more so this is
closer to "just about everywhere" than "only some files" per the
guidance in sys/conf/options.

Keep COMPAT_LINUX32 in opt_compat.h as it is confined to a subset of
sys/compat/linux/*.c. A fake _COMPAT_LINUX option ensure opt_compat.h
is created on all architectures.

Move COMPAT_LINUXKPI to opt_dontuse.h as it is only used to control the
set of compiled files.

Reviewed by: kib, cem, jhb, jtl
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D14941


# 3728855a 27-Nov-2017 Pedro F. Giffuni <pfg@FreeBSD.org>

sys/geom: adoption of SPDX licensing ID tags.

Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.


# 64a16434 24-Oct-2017 Mark Johnston <markj@FreeBSD.org>

Add support for compressed kernel dumps.

When using a kernel built with the GZIO config option, dumpon -z can be
used to configure gzip compression using the in-kernel copy of zlib.
This is useful on systems with large amounts of RAM, which require a
correspondingly large dump device. Recovery of compressed dumps is also
faster since fewer bytes need to be copied from the dump device.

Because we have no way of knowing the final size of a compressed dump
until it is written, the kernel will always attempt to dump when
compression is configured, regardless of the dump device size. If the
dump is aborted because we run out of space, an error is reported on
the console.

savecore(8) is modified to handle compressed dumps and save them to
vmcore.<index>.gz, as it does when given the -z option.

A new rc.conf variable, dumpon_flags, is added. Its value is added to
the boot-time dumpon(8) invocation that occurs when a dump device is
configured in rc.conf.

Reviewed by: cem (earlier version)
Discussed with: def, rgrimes
Relnotes: yes
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D11723


# 36d6e014 07-Aug-2017 Warner Losh <imp@FreeBSD.org>

Eliminate useless adjustments of aliased device.

No need to set any fields in the cloned device. devfs uses symlinks,
so the adev entries returned won't be presented to the drivers. Since
we don't save copies, nothing else will see them. This code came from
the old compat code, and it appears to be obsolete or never needed.

Submitted by: kib@
Differential Review: https://reviews.freebsd.org/D11919


# c624eb25 07-Aug-2017 Warner Losh <imp@FreeBSD.org>

Add aliasing concept to geom.

Add an alias name list to geoms. Use them in geom_dev to create
aliases. Previously, geom_dev would create an device node for the name
of the geom. Now, additional nodes are created pointing back to the
primary node with make_dev_alias_p. Aliases must be in place on the
geom before any tasting occurs.

Differential Revision: https://reviews.freebsd.org/D11873


# 4d5832bc 07-Mar-2017 Alexander Motin <mav@FreeBSD.org>

When chunking large DIOCGDELETE, do it on stripe edge.

MFC after: 2 weeks


# 480f31c2 10-Dec-2016 Konrad Witaszczyk <def@FreeBSD.org>

Add support for encrypted kernel crash dumps.

Changes include modifications in kernel crash dump routines, dumpon(8) and
savecore(8). A new tool called decryptcore(8) was added.

A new DIOCSKERNELDUMP I/O control was added to send a kernel crash dump
configuration in the diocskerneldump_arg structure to the kernel.
The old DIOCSKERNELDUMP I/O control was renamed to DIOCSKERNELDUMP_FREEBSD11 for
backward ABI compatibility.

dumpon(8) generates an one-time random symmetric key and encrypts it using
an RSA public key in capability mode. Currently only AES-256-CBC is supported
but EKCD was designed to implement support for other algorithms in the future.
The public key is chosen using the -k flag. The dumpon rc(8) script can do this
automatically during startup using the dumppubkey rc.conf(5) variable. Once the
keys are calculated dumpon sends them to the kernel via DIOCSKERNELDUMP I/O
control.

When the kernel receives the DIOCSKERNELDUMP I/O control it generates a random
IV and sets up the key schedule for the specified algorithm. Each time the
kernel tries to write a crash dump to the dump device, the IV is replaced by
a SHA-256 hash of the previous value. This is intended to make a possible
differential cryptanalysis harder since it is possible to write multiple crash
dumps without reboot by repeating the following commands:
# sysctl debug.kdb.enter=1
db> call doadump(0)
db> continue
# savecore

A kernel dump key consists of an algorithm identifier, an IV and an encrypted
symmetric key. The kernel dump key size is included in a kernel dump header.
The size is an unsigned 32-bit integer and it is aligned to a block size.
The header structure has 512 bytes to match the block size so it was required to
make a panic string 4 bytes shorter to add a new field to the header structure.
If the kernel dump key size in the header is nonzero it is assumed that the
kernel dump key is placed after the first header on the dump device and the core
dump is encrypted.

Separate functions were implemented to write the kernel dump header and the
kernel dump key as they need to be unencrypted. The dump_write function encrypts
data if the kernel was compiled with the EKCD option. Encrypted kernel textdumps
are not supported due to the way they are constructed which makes it impossible
to use the CBC mode for encryption. It should be also noted that textdumps don't
contain sensitive data by design as a user decides what information should be
dumped.

savecore(8) writes the kernel dump key to a key.# file if its size in the header
is nonzero. # is the number of the current core dump.

decryptcore(8) decrypts the core dump using a private RSA key and the kernel
dump key. This is performed by a child process in capability mode.
If the decryption was not successful the parent process removes a partially
decrypted core dump.

Description on how to encrypt crash dumps was added to the decryptcore(8),
dumpon(8), rc.conf(5) and savecore(8) manual pages.

EKCD was tested on amd64 using bhyve and i386, mipsel and sparc64 using QEMU.
The feature still has to be tested on arm and arm64 as it wasn't possible to run
FreeBSD due to the problems with QEMU emulation and lack of hardware.

Designed by: def, pjd
Reviewed by: cem, oshogbo, pjd
Partial review: delphij, emaste, jhb, kib
Approved by: pjd (mentor)
Differential Revision: https://reviews.freebsd.org/D4712


# 8532d381 31-Oct-2016 Conrad Meyer <cem@FreeBSD.org>

Add BUF_TRACKING and FULL_BUF_TRACKING buffer debugging

Upstream the BUF_TRACKING and FULL_BUF_TRACKING buffer debugging code.
This can be handy in tracking down what code touched hung bios and bufs
last. The full history is especially useful, but adds enough bloat that
it shouldn't be enabled in release builds.

Function names (or arbitrary string constants) are tracked in a
fixed-size ring in bufs. Bios gain a pointer to the upper buf for
tracking. SCSI CCBs gain a pointer to the upper bio for tracking.

Reviewed by: markj
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D8366


# 151746b2 27-May-2016 Alan Somers <asomers@FreeBSD.org>

Avoid issuing spa config updates for physical path when not necessary

ZFS's configuration needs to be updated whenever the physical path for a
device changes, but not when a new device is introduced. This is because new
devices necessarily cause config updates, but only if they are actually
accepted into the pool.

sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
Split vdev_geom_set_physpath out of vdev_geom_attrchanged. When
setting the vdev's physical path, only request a config update if
the physical path has changed. Don't request it when opening a
device for the first time, because the config sync will happen
anyway upstack.

sys/geom/geom_dev.c
Split g_dev_set_physpath and g_dev_set_media out of
g_dev_attrchanged

Submitted by: will, asomers
MFC after: 4 weeks
Sponsored by: Spectra Logic Corp
Differential Revision: https://reviews.freebsd.org/D6428


# 9a6844d5 19-May-2016 Kenneth D. Merry <ken@FreeBSD.org>

Add support for managing Shingled Magnetic Recording (SMR) drives.

This change includes support for SCSI SMR drives (which conform to the
Zoned Block Commands or ZBC spec) and ATA SMR drives (which conform to
the Zoned ATA Command Set or ZAC spec) behind SAS expanders.

This includes full management support through the GEOM BIO interface, and
through a new userland utility, zonectl(8), and through camcontrol(8).

This is now ready for filesystems to use to detect and manage zoned drives.
(There is no work in progress that I know of to use this for ZFS or UFS, if
anyone is interested, let me know and I may have some suggestions.)

Also, improve ATA command passthrough and dispatch support, both via ATA
and ATA passthrough over SCSI.

Also, add support to camcontrol(8) for the ATA Extended Power Conditions
feature set. You can now manage ATA device power states, and set various
idle time thresholds for a drive to enter lower power states.

Note that this change cannot be MFCed in full, because it depends on
changes to the struct bio API that break compatilibity. In order to
avoid breaking the stable API, only changes that don't touch or depend on
the struct bio changes can be merged. For example, the camcontrol(8)
changes don't depend on the new bio API, but zonectl(8) and the probe
changes to the da(4) and ada(4) drivers do depend on it.

Also note that the SMR changes have not yet been tested with an actual
SCSI ZBC device, or a SCSI to ATA translation layer (SAT) that supports
ZBC to ZAC translation. I have not yet gotten a suitable drive or SAT
layer, so any testing help would be appreciated. These changes have been
tested with Seagate Host Aware SATA drives attached to both SAS and SATA
controllers. Also, I do not have any SATA Host Managed devices, and I
suspect that it may take additional (hopefully minor) changes to support
them.

Thanks to Seagate for supplying the test hardware and answering questions.

sbin/camcontrol/Makefile:
Add epc.c and zone.c.

sbin/camcontrol/camcontrol.8:
Document the zone and epc subcommands.

sbin/camcontrol/camcontrol.c:
Add the zone and epc subcommands.

Add auxiliary register support to build_ata_cmd(). Make sure to
set the CAM_ATAIO_NEEDRESULT, CAM_ATAIO_DMA, and CAM_ATAIO_FPDMA
flags as appropriate for ATA commands.

Add a new get_ata_status() function to parse ATA result from SCSI
sense descriptors (for ATA passthrough over SCSI) and ATA I/O
requests.

sbin/camcontrol/camcontrol.h:
Update the build_ata_cmd() prototype

Add get_ata_status(), zone(), and epc().

sbin/camcontrol/epc.c:
Support for ATA Extended Power Conditions features. This includes
support for all features documented in the ACS-4 Revision 12
specification from t13.org (dated February 18, 2016).

The EPC feature set allows putting a drive into a power power mode
immediately, or setting timeouts so that the drive will
automatically enter progressively lower power states after various
idle times.

sbin/camcontrol/fwdownload.c:
Update the firmware download code for the new build_ata_cmd()
arguments.

sbin/camcontrol/zone.c:
Implement support for Shingled Magnetic Recording (SMR) drives
via SCSI Zoned Block Commands (ZBC) and ATA Zoned Device ATA
Command Set (ZAC).

These specs were developed in concert, and are functionally
identical. The primary differences are due to SCSI and ATA
differences. (SCSI is big endian, ATA is little endian, for
example.)

This includes support for all commands defined in the ZBC and
ZAC specs.

sys/cam/ata/ata_all.c:
Decode a number of additional ATA command names in ata_op_string().

Add a new CCB building function, ata_read_log().

Add ata_zac_mgmt_in() and ata_zac_mgmt_out() CCB building
functions. These support both DMA and NCQ encapsulation.

sys/cam/ata/ata_all.h:
Add prototypes for ata_read_log(), ata_zac_mgmt_out(), and
ata_zac_mgmt_in().

sys/cam/ata/ata_da.c:
Revamp the ada(4) driver to support zoned devices.

Add four new probe states to gather information needed for zone
support.

Add a new adasetflags() function to avoid duplication of large
blocks of flag setting between the async handler and register
functions.

Add new sysctl variables that describe zone support and paramters.

Add support for the new BIO_ZONE bio, and all of its subcommands:
DISK_ZONE_OPEN, DISK_ZONE_CLOSE, DISK_ZONE_FINISH, DISK_ZONE_RWP,
DISK_ZONE_REPORT_ZONES, and DISK_ZONE_GET_PARAMS.

sys/cam/scsi/scsi_all.c:
Add command descriptions for the ZBC IN/OUT commands.

Add descriptions for ZBC Host Managed devices.

Add a new function, scsi_ata_pass() to do ATA passthrough over
SCSI. This will eventually replace scsi_ata_pass_16() -- it
can create the 12, 16, and 32-byte variants of the ATA
PASS-THROUGH command, and supports setting all of the
registers defined as of SAT-4, Revision 5 (March 11, 2016).

Change scsi_ata_identify() to use scsi_ata_pass() instead of
scsi_ata_pass_16().

Add a new scsi_ata_read_log() function to facilitate reading
ATA logs via SCSI.

sys/cam/scsi/scsi_all.h:
Add the new ATA PASS-THROUGH(32) command CDB. Add extended and
variable CDB opcodes.

Add Zoned Block Device Characteristics VPD page.

Add ATA Return SCSI sense descriptor.

Add prototypes for scsi_ata_read_log() and scsi_ata_pass().

sys/cam/scsi/scsi_da.c:
Revamp the da(4) driver to support zoned devices.

Add five new probe states, four of which are needed for ATA
devices.

Add five new sysctl variables that describe zone support and
parameters.

The da(4) driver supports SCSI ZBC devices, as well as ATA ZAC
devices when they are attached via a SCSI to ATA Translation (SAT)
layer. Since ZBC -> ZAC translation is a new feature in the T10
SAT-4 spec, most SATA drives will be supported via ATA commands
sent via the SCSI ATA PASS-THROUGH command. The da(4) driver will
prefer the ZBC interface, if it is available, for performance
reasons, but will use the ATA PASS-THROUGH interface to the ZAC
command set if the SAT layer doesn't support translation yet.
As I mentioned above, ZBC command support is untested.

Add support for the new BIO_ZONE bio, and all of its subcommands:
DISK_ZONE_OPEN, DISK_ZONE_CLOSE, DISK_ZONE_FINISH, DISK_ZONE_RWP,
DISK_ZONE_REPORT_ZONES, and DISK_ZONE_GET_PARAMS.

Add scsi_zbc_in() and scsi_zbc_out() CCB building functions.

Add scsi_ata_zac_mgmt_out() and scsi_ata_zac_mgmt_in() CCB/CDB
building functions. Note that these have return values, unlike
almost all other CCB building functions in CAM. The reason is
that they can fail, depending upon the particular combination
of input parameters. The primary failure case is if the user
wants NCQ, but fails to specify additional CDB storage. NCQ
requires using the 32-byte version of the SCSI ATA PASS-THROUGH
command, and the current CAM CDB size is 16 bytes.

sys/cam/scsi/scsi_da.h:
Add ZBC IN and ZBC OUT CDBs and opcodes.

Add SCSI Report Zones data structures.

Add scsi_zbc_in(), scsi_zbc_out(), scsi_ata_zac_mgmt_out(), and
scsi_ata_zac_mgmt_in() prototypes.

sys/dev/ahci/ahci.c:
Fix SEND / RECEIVE FPDMA QUEUED in the ahci(4) driver.

ahci_setup_fis() previously set the top bits of the sector count
register in the FIS to 0 for FPDMA commands. This is okay for
read and write, because the PRIO field is in the only thing in
those bits, and we don't implement that further up the stack.

But, for SEND and RECEIVE FPDMA QUEUED, the subcommand is in that
byte, so it needs to be transmitted to the drive.

In ahci_setup_fis(), always set the the top 8 bits of the
sector count register. We need it in both the standard
and NCQ / FPDMA cases.

sys/geom/eli/g_eli.c:
Pass BIO_ZONE commands through the GELI class.

sys/geom/geom.h:
Add g_io_zonecmd() prototype.

sys/geom/geom_dev.c:
Add new DIOCZONECMD ioctl, which allows sending zone commands to
disks.

sys/geom/geom_disk.c:
Add support for BIO_ZONE commands.

sys/geom/geom_disk.h:
Add a new flag, DISKFLAG_CANZONE, that indicates that a given
GEOM disk client can handle BIO_ZONE commands.

sys/geom/geom_io.c:
Add a new function, g_io_zonecmd(), that handles execution of
BIO_ZONE commands.

Add permissions check for BIO_ZONE commands.

Add command decoding for BIO_ZONE commands.

sys/geom/geom_subr.c:
Add DDB command decoding for BIO_ZONE commands.

sys/kern/subr_devstat.c:
Record statistics for REPORT ZONES commands. Note that the
number of bytes transferred for REPORT ZONES won't quite match
what is received from the harware. This is because we're
necessarily counting bytes coming from the da(4) / ada(4) drivers,
which are using the disk_zone.h interface to communicate up
the stack. The structure sizes it uses are slightly different
than the SCSI and ATA structure sizes.

sys/sys/ata.h:
Add many bit and structure definitions for ZAC, NCQ, and EPC
command support.

sys/sys/bio.h:
Convert the bio_cmd field to a straight enumeration. This will
yield more space for additional commands in the future. After
change r297955 and other related changes, this is now possible.
Converting to an enumeration will also prevent use as a bitmask
in the future.

sys/sys/disk.h:
Define the DIOCZONECMD ioctl.

sys/sys/disk_zone.h:
Add a new API for managing zoned disks. This is very close to
the SCSI ZBC and ATA ZAC standards, but uses integers in native
byte order instead of big endian (SCSI) or little endian (ATA)
byte arrays.

This is intended to offer to the complete feature set of the ZBC
and ZAC disk management without requiring the application developer
to include SCSI or ATA headers. We also use one set of headers
for ioctl consumers and kernel bio-level consumers.

sys/sys/param.h:
Bump __FreeBSD_version for sys/bio.h command changes, and inclusion
of SMR support.

usr.sbin/Makefile:
Add the zonectl utility.

usr.sbin/diskinfo/diskinfo.c
Add disk zoning capability to the 'diskinfo -v' output.

usr.sbin/zonectl/Makefile:
Add zonectl makefile.

usr.sbin/zonectl/zonectl.8
zonectl(8) man page.

usr.sbin/zonectl/zonectl.c
The zonectl(8) utility. This allows managing SCSI or ATA zoned
disks via the disk_zone.h API. You can report zones, reset write
pointers, get parameters, etc.

Sponsored by: Spectra Logic
Differential Revision: https://reviews.freebsd.org/D6147
Reviewed by: wblock (documentation)


# e8d57122 29-Apr-2016 Pedro F. Giffuni <pfg@FreeBSD.org>

sys/geom: spelling fixes in comments.

No functional change.


# 86787e8d 17-Nov-2015 Steven Hartland <smh@FreeBSD.org>

Fix early kernel dump via dumpdev env

Setting the dumpdev via env e.g. loader.conf provides the ability to
configure the kernel dump device during early boot. When using this
g_io_getattr was returning EPERM due to cp->acr == 0.

Fix this by calling g_access to ensure we're a read consumer prior
to calling g_dev_setdumpdev.

MFC after: 2 weeks
Sponsored by: Multiplay


# 4a3760ba 11-Oct-2015 Alexander Motin <mav@FreeBSD.org>

Remove compatibility shims for legacy ATA device names.

We got new ATA stack in FreeBSD 8.x, switched to it at 9.x, completely
removed old stack at 10.x, so at 11.x it is time to remove compat shims.


# b4d72907 23-Sep-2015 Conrad Meyer <cem@FreeBSD.org>

geom_dev: Use kenv 'dumpdev' in the same way as rc/etc.d/dumpon

Skip a /dev/ prefix, if one is present, when checking for matching
device names for dump.

Suggested by: avg
Reviewed by: markj
Sponsored by: EMC / Isilon Storage Division
Differential Revision: https://reviews.freebsd.org/D3725


# 72800098 04-Aug-2015 Edward Tomasz Napierala <trasz@FreeBSD.org>

Fix panic triggered by code like this:

open("/dev/md0", O_EXEC);

Discussed with: kib@, mav@
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3051


# d6cc35b2 03-Aug-2015 Edward Tomasz Napierala <trasz@FreeBSD.org>

Fix panic that would happen on forcibly unmounting devfs (note that
as it is now, devfs ignores MNT_FORCE anyway, so it needs to be modified
to trigger the panic) with consumers still opened.

Note that this still results in a leak of r/w/e counters. It seems
to be harmless, though. If anyone knows a better way to approach
this - please tell.

Discussed with: kib@, mav@
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3050


# 0ada3afc 09-Apr-2015 Alexander Motin <mav@FreeBSD.org>

Remove sleeps from geom_up thread on device destruction.

MFC after: 3 days.


# 01de1a06 14-Jan-2015 Edward Tomasz Napierala <trasz@FreeBSD.org>

Add devd(8) notifications for creation and destruction of GEOM devices.

Differential Revision: https://reviews.freebsd.org/D1211
MFC after: 1 month
Sponsored by: The FreeBSD Foundation


# 5ebb15b9 10-Nov-2014 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Add missing privilege check when setting the dump device. Before that change it
was possible for a regular user to setup the dump device if he had write access
to the given device. In theory it is a security issue as user might get access
to kernel's memory after provoking kernel crash, but in practise it is not
recommended to give regular users direct access to storage devices.

Rework the code so that we do privileges check within the set_dumper() function
to avoid similar problems in the future.

Discussed with: secteam


# c3e7ba3e 05-Nov-2014 Alexander Motin <mav@FreeBSD.org>

Add to CTL support for logical block provisioning threshold notifications.

For ZVOL-backed LUNs this allows to inform initiators if storage's used or
available spaces get above/below the configured thresholds.

MFC after: 2 weeks
Sponsored by: iXsystems, Inc.


# 2be111bf 16-Oct-2014 Davide Italiano <davide@FreeBSD.org>

Follow up to r225617. In order to maximize the re-usability of kernel code
in userland rename in-kernel getenv()/setenv() to kern_setenv()/kern_getenv().
This fixes a namespace collision with libc symbols.

Submitted by: kmacy
Tested by: make universe


# 0478dc0c 07-Oct-2014 Andrey V. Elsukov <ae@FreeBSD.org>

Add an ability to set dumpdev via loader(8) tunable.

MFC after: 3 weeks


# d1718390 02-Oct-2014 Hiroki Sato <hrs@FreeBSD.org>

Fix a bug in r272297 which prevented dumpdev from setting.
!u is not equivalent to (u != 0).


# 227f68ed 29-Sep-2014 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Be prepared that set_dumper() might fail even when resetting it or prefix
the call with (void) to document that we intentionally ignore the return
value - no way to handle an error in case of device disappearing.


# 7f5b5071 30-Sep-2014 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Style fixes.


# 274919e9 18-Aug-2014 Scott Long <scottl@FreeBSD.org>

Deal explicitly with possible failures of make_dev_alias_p() in GEOM.

Submitted by: Mariusz Zaborski <oshogbo@FreeBSD.org>
MFC after: 3 days


# 2634da8c 12-Dec-2013 Alexander Motin <mav@FreeBSD.org>

Fix bug introduced at r256607. We have to recalculate bp_resid here since
sizes of original and completed requests may differ due to end of media.

Bisected by: pho


# 40ea77a0 22-Oct-2013 Alexander Motin <mav@FreeBSD.org>

Merge GEOM direct dispatch changes from the projects/camlock branch.

When safety requirements are met, it allows to avoid passing I/O requests
to GEOM g_up/g_down thread, executing them directly in the caller context.
That allows to avoid CPU bottlenecks in g_up/g_down threads, plus avoid
several context switches per I/O.

The defined now safety requirements are:
- caller should not hold any locks and should be reenterable;
- callee should not depend on GEOM dual-threaded concurency semantics;
- on the way down, if request is unmapped while callee doesn't support it,
the context should be sleepable;
- kernel thread stack usage should be below 50%.

To keep compatibility with GEOM classes not meeting above requirements
new provider and consumer flags added:
- G_CF_DIRECT_SEND -- consumer code meets caller requirements (request);
- G_CF_DIRECT_RECEIVE -- consumer code meets callee requirements (done);
- G_PF_DIRECT_SEND -- provider code meets caller requirements (done);
- G_PF_DIRECT_RECEIVE -- provider code meets callee requirements (request).
Capable GEOM class can set them, allowing direct dispatch in cases where
it is safe. If any of requirements are not met, request is queued to
g_up or g_down thread same as before.

Such GEOM classes were reviewed and updated to support direct dispatch:
CONCAT, DEV, DISK, GATE, MD, MIRROR, MULTIPATH, NOP, PART, RAID, STRIPE,
VFS, ZERO, ZFS::VDEV, ZFS::ZVOL, all classes based on g_slice KPI (LABEL,
MAP, FLASHMAP, etc).

To declare direct completion capability disk(9) KPI got new flag equivalent
to G_PF_DIRECT_SEND -- DISKFLAG_DIRECT_COMPLETION. da(4) and ada(4) disk
drivers got it set now thanks to earlier CAM locking work.

This change more then twice increases peak block storage performance on
systems with manu CPUs, together with earlier CAM locking changes reaching
more then 1 million IOPS (512 byte raw reads from 16 SATA SSDs on 4 HBAs to
256 user-level threads).

Sponsored by: iXsystems, Inc.
MFC after: 2 months


# 21d0712c 16-Oct-2013 Alexander Motin <mav@FreeBSD.org>

MFprojects/camlock r256371:
Fix passing uninitialized bio_resid argument to g_trace().


# ce625ec7 15-Aug-2013 Kenneth D. Merry <ken@FreeBSD.org>

Change the way that unmapped I/O capability is advertised.

The previous method was to set the D_UNMAPPED_IO flag in the cdevsw
for the driver. The problem with this is that in many cases (e.g.
sa(4)) there may be some instances of the driver that can handle
unmapped I/O and some that can't. The isp(4) driver can handle
unmapped I/O, but the esp(4) driver currently cannot. The cdevsw
is shared among all driver instances.

So instead of setting a flag on the cdevsw, set a flag on the cdev.
This allows drivers to indicate support for unmapped I/O on a
per-instance basis.

sys/conf.h: Remove the D_UNMAPPED_IO cdevsw flag and replace it
with an SI_UNMAPPED cdev flag.

kern_physio.c: Look at the cdev SI_UNMAPPED flag to determine
whether or not a particular driver can handle
unmapped I/O.

geom_dev.c: Set the SI_UNMAPPED flag for all GEOM cdevs.
Since GEOM will create a temporary mapping when
needed, setting SI_UNMAPPED unconditionally will
work.

Remove the D_UNMAPPED_IO flag.

nvme_ns.c: Set the SI_UNMAPPED flag on cdevs created here
if NVME_UNMAPPED_BIO_SUPPORT is enabled.

vfs_aio.c: In aio_qphysio(), check the SI_UNMAPPED flag on a
cdev instead of the D_UNMAPPED_IO flag on the cdevsw.

sys/param.h: Bump __FreeBSD_version to 1000045 for the switch from
setting the D_UNMAPPED_IO flag in the cdevsw to setting
SI_UNMAPPED in the cdev.

Reviewed by: kib, jimharris
MFC after: 1 week
Sponsored by: Spectra Logic


# 6f926c0b 26-Apr-2013 Steven Hartland <smh@FreeBSD.org>

Added a sysctl (kern.geom.dev.delete_max_sectors) to control the maximum
size of a delete request sent to the providing device performed by g_dev_ioctl.

This allows the kernel and apps via ioctl e.g. newfs -E to request large LBA
deletes which siginificantly improves performance.

Previously this was hard coded to 65536 sectors, the new default is 262144
which doubles the throughput of deletes on commonly available SSD's.

In tests on a Intel 520 120GB FW: 400i disk it improved the delete throughput
from 1.6GB/s to over 2.6GB/s on a full disk delete such as that done via
newfs -E

For some SSD's where delete time is pretty much constant, no matter what
the request, setting this to 0 will provide significantly better throughput
e.g. Samsung 840 240GB FW DXT07B0Q @ 262144 = 79G/s, @ 0 = 2259G/s

Reviewed by: mav
Approved by: pjd (mentor)
MFC after: 2 weeks


# 16fac6c9 06-Apr-2013 Edward Tomasz Napierala <trasz@FreeBSD.org>

Make it possible to submit FLUSH bios through geom_dev strategy. This
is required for CTL to work with device-backed LUNs.

Reviewed by: mav


# 31932fae 25-Mar-2013 Alexander Kabaev <kan@FreeBSD.org>

Do not pass unmapped buffers to drivers that cannot handle them

In physio, check if device can handle unmapped IO and pass an
appropriately mapped buffer to the driver strategy routine. The
only driver in the tree that can handle unmapped buffers is one
exposed by GEOM, so mark it as such with the new flag in the
driver cdevsw structure.

This fixes insta-panics on hosts, running dconschat, as /dev/fwmem
is an example of the driver that makes use of physio routine, but
bypasses the g_down thread, where the buffer gets mapped normally.

Discussed with: kib (earlier version)


# 3c330aff 24-Mar-2013 Alexander Motin <mav@FreeBSD.org>

Fix long known deadlock between geom dev destruction and d_close() call.
Use destroy_dev_sched_cb() to not wait for device destruction while holding
GEOM topology lock (that actually caused deadlock). Use request counting
protected by mutex to properly wait for outstanding requests completion in
cases of device closing and geom destruction. Unlike r227009, this code
does not block taskqueue thread for indefinite time, waiting for completion.


# 02c62349 19-Nov-2012 Jaakko Heinonen <jh@FreeBSD.org>

- Don't pass geom and provider names as format strings.
- Add __printflike() attributes.
- Remove an extra argument for the g_new_geomf() call in swapongeom_ev().

Reviewed by: pjd


# bad7e7f3 01-Nov-2012 Alfred Perlstein <alfred@FreeBSD.org>

Provide a device name in the sysctl tree for programs to query the
state of crashdump target devices.

This will be used to add a "-l" (ell) flag to dumpon(8) to list the
currently configured dumpdev.

Reviewed by: phk


# 24d1105d 28-Aug-2012 Ed Schouten <ed@FreeBSD.org>

Remove unneeded G_PF_CANDELETE flag.

This flag is only used by GEOM so it can be propagated to the character
device's SI_CANDELETE. Unfortunately, SI_CANDELETE seems to do nothing.


# 3631c638 29-Jul-2012 Alexander Motin <mav@FreeBSD.org>

Implement media change notification for DA and CD removable media devices.
It includes three parts:
1) Modifications to CAM to detect media media changes and report them to
disk(9) layer. For modern SATA (and potentially UAS) devices it utilizes
Asynchronous Notification mechanism to receive events from hardware.
Active polling with TEST UNIT READY commands with 3 seconds period is used
for incapable hardware. After that both CD and DA drivers work the same way,
detecting two conditions: "NOT READY: Medium not present" after medium was
detected previously, and "UNIT ATTENTION: Not ready to ready change, medium
may have changed". First one reported to disk(9) as media removal, second
as media insert/change. To reliably receive second event new
AC_UNIT_ATTENTION async added to make UAs broadcasted to all periphs by
generic error handling code in cam_periph_error().
2) Modifications to GEOM core to handle media remove and change events.
Media removal handled by spoiling all consumers attached to the provider.
Media change event also schedules provider retaste after spoiling to probe
new media. New flag G_CF_ORPHAN was added to consumers to reflect that
consumer is in process of destruction. It allows retaste to create new
geom instance of the same class, while previous one is still dying.
3) Modifications to some GEOM classes: DEV -- to report media change
events to devd; VFS -- to handle spoiling same as orphan to prevent
accessing replaced media. PART class already handles spoiling alike to
orphan.

Reviewed by: silence on geom@ and scsi@
Tested by: avg
Sponsored by: iXsystems, Inc. / PC-BSD
MFC after: 2 months


# aaaf515f 06-Jul-2012 Edward Tomasz Napierala <trasz@FreeBSD.org>

Fix typo in the comment.


# 107c1508 14-Nov-2011 Alexander Motin <mav@FreeBSD.org>

Temporary revert r227009 to fix freeze on UP systems without PREEMPTION.

Before r215687, if some withered geom or provider could not be destroyed,
g_event thread went to sleep for 0.1s before retrying. After that change
it is just restarting immediately. r227009 made orphaned (withered) provider
to not detach immediately, but only after context switch. That made loop
inside g_event thread infinite on UP systems without PREEMPTION.

To address original problem with possible dead lock addressed by r227009
we have to fix r215687 change first, that needs some time to think and test.


# 755d1ea5 01-Nov-2011 Alexander Motin <mav@FreeBSD.org>

Make orphan() method in geom_dev asynchronous using destroy_dev_sched_cb()
instead of destroy_dev(). It moves device destruction waiting out of the
topology lock and so fixes dead lock between orphanization and closing.
Real provider and geom destruction called from swi context after device
destroyed as callback of the destroy_dev_sched_cb().


# 416494d7 14-Jun-2011 Justin T. Gibbs <gibbs@FreeBSD.org>

Plumb device physical path reporting from CAM devices, through GEOM and
DEVFS, and make it accessible via the diskinfo utility.

Extend GEOM's generic attribute query mechanism into generic disk consumers.
sys/geom/geom_disk.c:
sys/geom/geom_disk.h:
sys/cam/scsi/scsi_da.c:
sys/cam/ata/ata_da.c:
- Allow disk providers to implement a new method which can override
the default BIO_GETATTR response, d_getattr(struct bio *). This
function returns -1 if not handled, otherwise it returns 0 or an
errno to be passed to g_io_deliver().

sys/cam/scsi/scsi_da.c:
sys/cam/ata/ata_da.c:
- Don't copy the serial number to dp->d_ident anymore, as the CAM XPT
is now responsible for returning this information via
d_getattr()->(a)dagetattr()->xpt_getatr().

sys/geom/geom_dev.c:
- Implement a new ioctl, DIOCGPHYSPATH, which returns the GEOM
attribute "GEOM::physpath", if possible. If the attribute request
returns a zero-length string, ENOENT is returned.

usr.sbin/diskinfo/diskinfo.c:
- If the DIOCGPHYSPATH ioctl is successful, report physical path
data when diskinfo is executed with the '-v' option.

Submitted by: will
Reviewed by: gibbs
Sponsored by: Spectra Logic Corporation

Add generic attribute change notification support to GEOM.

sys/sys/geom/geom.h:
Add a new attrchanged method field to both g_class
and g_geom.

sys/sys/geom/geom.h:
sys/geom/geom_event.c:
- Provide the g_attr_changed() function that providers
can use to advertise attribute changes.
- Perform delivery of attribute change notifications
from a thread context via the standard GEOM event
mechanism.

sys/geom/geom_subr.c:
Inherit the attrchanged method from class to geom (class instance).

sys/geom/geom_disk.c:
Provide disk_attr_changed() to provide g_attr_changed() access
to consumers of the disk API.

sys/cam/scsi/scsi_pass.c:
sys/cam/scsi/scsi_da.c:
sys/geom/geom_dev.c:
sys/geom/geom_disk.c:
Use attribute changed events to track updates to physical path
information.

sys/cam/scsi/scsi_da.c:
Add AC_ADVINFO_CHANGED to the registered asynchronous CAM
events for this driver. When this event occurs, and
the updated buffer type references our physical path
attribute, emit a GEOM attribute changed event via the
disk_attr_changed() API.

sys/cam/scsi/scsi_pass.c:
Add AC_ADVINFO_CHANGED to the registered asynchronous CAM
events for this driver. When this event occurs, update
the physical patch devfs alias for this pass instance.

Submitted by: gibbs
Sponsored by: Spectra Logic Corporation


# bd5c3686 03-May-2011 Alexander Motin <mav@FreeBSD.org>

Use make_dev_alias_p() added in r221397 to create alias dev entry.
It removes panic in case if alias name is already busy for some reason.


# 0d307e09 26-Apr-2011 Alexander Motin <mav@FreeBSD.org>

- Add shim to simplify migration to the CAM-based ATA. For each new adaX
device in /dev/ create symbolic link with adY name, trying to mimic old ATA
numbering. Imitation is not complete, but should be enough in most cases to
mount file systems without touching /etc/fstab.
- To know what behavior to mimic, restore ATA_STATIC_ID option in cases
where it was present before.
- Add some more details to UPDATING.


# 06f4c96d 24-Mar-2011 Alexander Motin <mav@FreeBSD.org>

MFgraid/head r217827:
Change BIO_GETATTR("GEOM::kerneldump") API to make set_dumper() called by
consumer (geom_dev) instead of provider (geom_disk). This allows any geom
insert it's code into the dump call chain, implementing more sophisticated
functionality then just disk partitioning.


# a7d5f7eb 19-Oct-2010 Jamie Gritton <jamie@FreeBSD.org>

A new jail(8) with a configuration file, to replace the work currently done
by /etc/rc.d/jail.


# bc2589f5 19-Oct-2010 Jaakko Heinonen <jh@FreeBSD.org>

Use make_dev_p(9) with the MAKEDEV_CHECKNAME flag instead of make_dev(9)
and print a diagnostic if the call fails.

This avoids a panic when a device with an invalid name is attempted to
be registered. For example the label class gets device names from
untrusted input.

Reviewed by: freebsd-geom


# 1bdfff22 11-Jun-2010 Andriy Gapon <avg@FreeBSD.org>

fix a few cases where a string is passed via format argument instead of
via %s

Most of the cases looked harmless, but this is done for the sake of
correctness. In one case it even allowed to drop an intermediate buffer.

Found by: clang
MFC after: 2 week


# 92d8dadc 18-Jan-2010 Alexander Motin <mav@FreeBSD.org>

MFC r201139:
Add BIO_DELETE support to ada(4):
- For SSDs use TRIM feature of DATA SET MANAGEMENT command, as defined by
ACS-2 specification working draft.
- For CompactFlash use CFA ERASE command, same as ad(4) does.

With this patch, `newfs -E /dev/ada1` was able to restore write speed of
my heavily weared OCZ Vertex SSD (firmware 1.4) up to the initial level
for the most part of it's capacity.

I have no idea whether it is normal, but for some reason it takes 200ms
to handle any TRIM command on this drive, that was making delete extremely
slow. But TRIM command is able to accept long list of LBAs and the length of
that list seems doesn't affect it's execution time. Implemented request
clusting algorithm allowed me to rise delete rate up to reasonable numbers,
when many parallel DELETE requests running.


# 2aa244f2 05-Jan-2010 Alexander Motin <mav@FreeBSD.org>

MFC r200934:
Add two disk ioctls, giving user-level tools information about disk/array
stripe (optimal access block) size and offset.


# 1c80ec0a 28-Dec-2009 Alexander Motin <mav@FreeBSD.org>

Add BIO_DELETE support to ada(4):
- For SSDs use TRIM feature of DATA SET MANAGEMENT command, as defined by
ACS-2 specification working draft.
- For CompactFlash use CFA ERASE command, same as ad(4) does.

With this patch, `newfs -E /dev/ada1` was able to restore write speed of
my heavily weared OCZ Vertex SSD (firmware 1.4) up to the initial level
for the most part of it's capacity. Previous 1.3 firmware, even reportiong
TRIM capabilty bit set, was not working, reporting ABORT error for every
DSM command.

I have no idea whether it is normal, but for some reason it takes 200ms
to handle any TRIM command on this drive, that was making delete extremely
slow. But TRIM command is able to accept long list of LBAs and the length of
that list seems doesn't affect it's execution time. Implemented request
clusting algorithm allowed me to rise delete rate up to reasonable numbers,
when many parallel DELETE requests running.


# 8b303238 24-Dec-2009 Alexander Motin <mav@FreeBSD.org>

Add two disk ioctls, giving user-level tools information about disk/array
stripe (optimal access block) size and offset.


# c4511bef 17-Nov-2009 Alexander Motin <mav@FreeBSD.org>

MFC r196964:
Do not check proper request alignment here in geom_dev in production.
It will be checked any way later by g_io_check() in g_io_schedule_down().
It is only needed here to not trigger panic from additional check, when
INVARIANTS enabled. So cover it with #ifdef INVARIANTS. It saves two
64bit divisions per request.


# 18e42503 07-Sep-2009 Alexander Motin <mav@FreeBSD.org>

Do not check proper request alignment here in geom_dev in production.
It will be checked any way later by g_io_check() in g_io_schedule_down().
It is only needed here to not trigger panic from additional check, when
INVARIANTS enabled. So cover it with #ifdef INVARIANTS. It saves two
64bit divisions per request.


# f43b57e3 07-Jul-2009 Marcel Moolenaar <marcel@FreeBSD.org>

Revert revisions 188839 and 188868. Use of the ioctl in geom_dev.c
is invalid because the ioctl happens without prior open. The ioctl
got introduced to provide backward compatibility for extended
partitions, but it ended up not being used because it didn't work
as expected. Since there are no consumers of the ioctl and the
implementation is broken, the best fix is to remove the code
entirely.

Spotted by: phk
Approved by: re (kensmith)


# 59c532c5 19-Feb-2009 Marcel Moolenaar <marcel@FreeBSD.org>

Provide compatibility symlink for logical partitions:
1. Extend geom_dev by having it create the symlink (i.e. call
make_dev_alias) based on the DIOCGPROVIDERALIAS ioctl.
In this way the functionaility is generic and thus usable
by any geom/provider.
2. Have g_part handle said ioctl through the devalias method,
so that it's under control of the scheme itself. By design
the alias will not be created for newly added partitions.


# 739b705c 24-Jan-2009 Ed Schouten <ed@FreeBSD.org>

Remove unused unrhdr from GEOM character device module.

Now that make_dev() doesn't require unit numbers to be unique, there is
no need to use an unrhdr here to generate the numbers. Remove the entire
init-routine, because it is optional.


# d7f03759 19-Oct-2008 Ulf Lilleengen <lulf@FreeBSD.org>

- Import the HEAD csup code which is the basis for the cvsmode work.


# d3ce8327 26-Sep-2008 Ed Schouten <ed@FreeBSD.org>

Remove unit2minor() use from kernel code.

When I changed kern_conf.c three months ago I made device unit numbers
equal to (unneeded) device minor numbers. We used to require
bitshifting, because there were eight bits in the middle that were
reserved for a device major number. Not very long after I turned
dev2unit(), minor(), unit2minor() and minor2unit() into macro's.
The unit2minor() and minor2unit() macro's were no-ops.

We'd better not remove these four macro's from the kernel, because there
is a lot of (external) code that may still depend on them. For now it's
harmless to remove all invocations of unit2minor() and minor2unit().

Reviewed by: kib


# f805f204 07-Sep-2008 Ulf Lilleengen <lulf@FreeBSD.org>

- Add a new ioctl for getting the provider name of a geom provider.
- Add a routine for looking up a device and checking if it is a valid geom
provider given a partial or full path to its device node.

Reviewed by: phk
Approved by: pjd (mentor)


# 06d425f9 28-May-2008 Ed Schouten <ed@FreeBSD.org>

Remove the distinction between device minor and unit numbers.

Even though we got rid of device major numbers some time ago, device
drivers still need to provide unique device minor numbers to make_dev().
These numbers are only used inside the kernel. They are not related to
device major and minor numbers which are visible in devfs. These are
actually based on the inode number of the device.

It would eventually be nice to remove minor numbers entirely, but we
don't want to be too agressive here.

Because the 8-15 bits of the device number field (si_drv0) are still
reserved for the major number, there is no 1:1 mapping of the device
minor and unit numbers. Because this is now unused, remove the
restrictions on these numbers.

The MAXMAJOR definition was actually used for two purposes. It was used
to convert both the userspace and kernelspace device numbers to their
major/minor pair, which is why it is now named UMINORMASK.

minor2unit() and unit2minor() have now become useless. Both minor() and
dev2unit() now serve the same purpose. We should eventually remove some
of them, at least turning them into macro's. If devfs would become
completely minor number unaware, we could consider using si_drv0 directly,
just like si_drv1 and si_drv2.

Approved by: philip (mentor)


# 015a11e6 16-Dec-2007 Poul-Henning Kamp <phk@FreeBSD.org>

Chop DIOCGDELETE from userland up in 1024 sector chunks to give geom_disk
or any other bio chopping geom a reasonable size of work.

Check for delivered signals between chunks, because the request size
and service time is unbounded.


# eed6cda9 16-Dec-2007 Poul-Henning Kamp <phk@FreeBSD.org>

Don't limit BIO_DELETE requests to MAXPHYS, they perform no data
transfers, so they are not subject to the VM system limitation.


# 0589353a 05-May-2007 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Implement three new ioctls that can be used with GEOM provider:

DIOCGFLUSH - Flush write cache (sends BIO_FLUSH).

DIOCGDELETE - Delete data (mark as unused) (sends BIO_DELETE).

DIOCGIDENT - Get provider's uniqe and fixed identifier (asks for
GEOM::ident attribute).

First two are self-explanatory, but the last one might not be. Here are
properties of provider's ident:

- ident value is preserved between reboots,
- provider can be detached/attached and ident is preserved,
- provider's name can change - ident can't,
- ident value should not be based on on-disk metadata; in other words
copying whole data from one disk to another should not yield the same
ident for the other disk,
- there could be more than one provider with the same ident, but only if
they point at exactly the same physical storage, this is the case for
multipathing for example,
- GEOM classes that consumes single providers and provide single providers,
like geli, gbde, should just attach class name to the ident of the
underlying provider,
- ident is an ASCII string (is printable),
- ident is optional and applications can't relay on its presence.

The main purpose for this is that application and remember provider's ident
and once it tries to open provider by its name again, it may compare idents
to be sure this is the right provider. If it is not (idents don't match),
then it can open provider by its ident.

OK'ed by: phk


# 17e910a2 26-Mar-2007 Kris Kennaway <kris@FreeBSD.org>

make_dev(9) can be (and is) called without Giant, so there is no need to
drop the topology lock and acquire Giant around this call.

Reviewed by: phk


# 4d70511a 27-Feb-2007 John Baldwin <jhb@FreeBSD.org>

Use pause() rather than tsleep() on stack variables and function pointers.


# 6e50e38f 23-Feb-2007 John Baldwin <jhb@FreeBSD.org>

Use tsleep() rather than msleep() with a NULL mtx parameter.


# 274ede62 18-Jun-2006 Simon L. B. Nielsen <simon@FreeBSD.org>

In g_dev_strategy(), when failing an IO request with EINVAL due to
offset or request size which is not a multiple of the sector size, make
sure that the bio is set to indicate that no data has actually been
transferred.

The result of this is that the file offset is no longer incremented for
these requests. The fact that the file offset was incremented broke
fdisk(8)'s probing of sector size for non-512 byte sector sizes.

Reviewed by: phk, cperciva
Submitted by: mdodd
MFC after: 2 weeks


# b3b21113 17-Mar-2005 Poul-Henning Kamp <phk@FreeBSD.org>

Avoid null pointer dereference.


# 3b3f38ed 07-Mar-2005 Poul-Henning Kamp <phk@FreeBSD.org>

Add placeholder mutex argument to new_unrhdr().


# 2221dbeb 12-Dec-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Pass the file->flags down to geom ioctl handlers.

Reject certain ioctls if write permission is not indicated.

Bump geom API version.

Reported by: Ruben de Groot <mail25@bzerk.org>


# 55f499a9 29-Oct-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Don't set si_bsize_phys, nobody cares.


# 6afb3b1c 29-Oct-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Give dev_strategy() an explict cdev argument in preparation for removing
buf->b-dev.

Put a bio between the buf passed to dev_strategy() and the device driver
strategy routine in order to not clobber fields in the buf.

Assert copyright on vfs_bio.c and update copyright message to canonical
text. There is no legal difference between John Dysons two-clause
abbreviated BSD license and the canonical text.


# 8c24ef5f 24-Oct-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Use unit number allocation functions for GEOM minor numbers.


# f8fe7a73 25-Oct-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Retire si_stripesize and si_stripeoffset they will not be needed in cdev
in the future.


# 85986ce0 23-Oct-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Don't call g_waitidle(), it happens automagically now.


# 6c252337 27-Sep-2004 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Deny invalid I/O requests which comes from userland here, because later
we'll get a panic.
MT5 candidate.

Reviewed by: phk


# a7830346 24-Sep-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Assert topology is held in g_dev_getprovider().

Don't call devsw(). It is not necessary, and we do not need to hold dev_lock
to compare the devsw pointer to our own since we do not dereference it.


# 5721c9c7 08-Aug-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Tag all geom classes in the tree with a version number.


# 650ee351 08-Aug-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Use default method initialization on geoms.


# fc6c63b4 19-Jun-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Duplicate the securelevel check from spec_vnops.c here.


# b90c8559 17-Jun-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Reduce the thaumaturgical level of root filesystem mounts: Instead of using
an otherwise redundant clone routine in geom_disk.c, mount a temporary
DEVFS and do a proper lookup.

Submitted by: thomas


# f3732fd1 17-Jun-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Second half of the dev_t cleanup.

The big lines are:
NODEV -> NULL
NOUDEV -> NODEV
udev_t -> dev_t
udev2dev() -> findcdev()

Various minor adjustments including handling of userland access to kernel
space struct cdev etc.


# 89c9c53d 16-Jun-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Do the dreaded s/dev_t/struct cdev */
Bump __FreeBSD_version accordingly.


# dc08ffec 21-Feb-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Device megapatch 4/6:

Introduce d_version field in struct cdevsw, this must always be
initialized to D_VERSION.

Flip sense of D_NOGIANT flag to D_NEEDGIANT, this involves removing
four D_NOGIANT flags and adding 145 D_NEEDGIANT flags.


# d2bae332 12-Feb-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Remove the absolute count g_access_abs() function since experience has
shown that it is not useful.

Rename the relative count g_access_rel() function to g_access(), only
the name has changed.

Change all g_access_rel() calls in our CVS tree to call g_access() instead.

Add an #ifndef BURN_BRIDGES #define of g_access_rel() for source
code compatibility.


# 752e0f01 23-Jan-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Add missing newline in printf.

Submitted by: Pawel Jakub Dawidek <nick@garage.freebsd.pl>


# d1b8bf47 19-Oct-2003 Poul-Henning Kamp <phk@FreeBSD.org>

Remove KASSERT check for negative bio_offsets, add "normal" EIO
error return for same.


# e83d1f3b 12-Oct-2003 Poul-Henning Kamp <phk@FreeBSD.org>

Assume that bp->bio_offset is correctly initialized.

This fixes non-power-of-2 blocksize GEOM I/O.


# f03bec94 04-Sep-2003 Poul-Henning Kamp <phk@FreeBSD.org>

Make sure to return ENOIOCTL if the ioctl is not handled.


# 497c3347 01-Sep-2003 Poul-Henning Kamp <phk@FreeBSD.org>

Simplify the ioctl handling in GEOM.

This replaces the current ioctl processing with a direct call path
from geom_dev() where the ioctl arrives (from SPECFS) to any directly
connected GEOM class.

The inverse of the above is no longer supported. This is the
situation were you have one or more intervening GEOM classes, for
instance a BSDlabel on top of a MBR or PC98. If you want to issue
MBR or PC98 specific ioctls, you will need to issue them on a MBR
or PC98 providers.

This paves the way for inviting CD's, FD's and other special cases
inside GEOM.


# bff1e299 30-Aug-2003 Poul-Henning Kamp <phk@FreeBSD.org>

Add the new g_dev_getprovider() function, the swap_pager needs it now.

Spotted by: mr


# 4ba5a129 12-Aug-2003 Poul-Henning Kamp <phk@FreeBSD.org>

Replace a panic with a .1Hz retry loop.
Not a perfect solution, but far cheaper than one.


# a35006e8 02-Aug-2003 Poul-Henning Kamp <phk@FreeBSD.org>

Kick Giant compatibility one layer up.


# 50b1faef 11-Jun-2003 David E. O'Brien <obrien@FreeBSD.org>

Use __FBSDID().

Approved by: phk


# 84c080a8 07-Jun-2003 Poul-Henning Kamp <phk@FreeBSD.org>

Improve the root-dev prompt facility for printing devices which could
possibly be a root filesystem.


# ac2ba9e3 07-Jun-2003 Poul-Henning Kamp <phk@FreeBSD.org>

Wait for everything to settle before we try to print the list of
geom devices.


# 3bae8877 31-May-2003 Poul-Henning Kamp <phk@FreeBSD.org>

Remove unused variables.

Found by: FlexeLint


# f075585f 31-May-2003 Poul-Henning Kamp <phk@FreeBSD.org>

Remove the G_CLASS_INITIALIZER, we do not need it anymore.


# 4da6e74c 09-May-2003 Poul-Henning Kamp <phk@FreeBSD.org>

When a GEOM (/dev-)device is closed and we find that I/O requests are
still outstanding, give them a chance to complete.

If after 10 seconds we still find outstanding I/O requests, complete
the close with a console warning that the system is likely to panic
later on.

This is a workaround for umount -f not quite doing the right thing.

Approved by: re/scottl


# c4da4e46 02-May-2003 Poul-Henning Kamp <phk@FreeBSD.org>

Back out all the stuff that didn't belong in the last commit.


# e65ab0f8 02-May-2003 Poul-Henning Kamp <phk@FreeBSD.org>

Use g_slice_spoiled() rather than g_std_spoiled().

Remember to free the buffer we got from g_read_data().


# 104a9b7e 29-Apr-2003 Alexander Kabaev <kan@FreeBSD.org>

Deprecate machine/limits.h in favor of new sys/limits.h.
Change all in-tree consumers to include <sys/limits.h>

Discussed on: standards@
Partially submitted by: Craig Rodrigues <rodrigc@attbi.com>


# c7e1925c 02-Apr-2003 Poul-Henning Kamp <phk@FreeBSD.org>

Properly handle races between open/close and orphan.

KASSERT the race between close and strategy, it is an error in the upper
echelons if this happens,

Add XXX: comment explaining why the ioctl/orphan race is not closed.


# c138fec0 24-Mar-2003 Poul-Henning Kamp <phk@FreeBSD.org>

Premptively change initializations of struct g_class to use C99
sparse struct initializations before we extend the struct with
new OAM related member functions.


# b4b138c2 18-Mar-2003 Poul-Henning Kamp <phk@FreeBSD.org>

Including <sys/stdint.h> is (almost?) universally only to be able to use
%j in printfs, so put a newsted include in <sys/systm.h> where the printf
prototype lives and save everybody else the trouble.


# 564632b0 09-Mar-2003 Poul-Henning Kamp <phk@FreeBSD.org>

Remove unneeded #include of geom_stats.h


# df6c9fe9 09-Mar-2003 Poul-Henning Kamp <phk@FreeBSD.org>

When a DEV class consumer is orphan'ed we need to wait for all the
outstanding requests to return before we unravel the mesh.

It is very important that the stuff below us plays nice and don't
overlook a couple of outstanding bio's, because until they remember
the geom event thread is blocked. At an expense in code here this
could be made more robust, but I actually _want_ a robust failure
in this case so any offending drivers can be fixed.


# 7ac40f5f 02-Mar-2003 Poul-Henning Kamp <phk@FreeBSD.org>

Gigacommit to improve device-driver source compatibility between
branches:

Initialize struct cdevsw using C99 sparse initializtion and remove
all initializations to default values.

This patch is automatically generated and has been tested by compiling
LINT with all the fields in struct cdevsw in reverse order on alpha,
sparc64 and i386.

Approved by: re(scottl)


# aa8918fa 02-Mar-2003 Poul-Henning Kamp <phk@FreeBSD.org>

NO_GEOM cleanup:

Remove cdevsw->d_psize() implementation, we don't need it any more.


# a163d034 18-Feb-2003 Warner Losh <imp@FreeBSD.org>

Back out M_* changes, per decision of the TRB.

Approved by: trb


# 0364fe2c 11-Feb-2003 Poul-Henning Kamp <phk@FreeBSD.org>

Advertise MAXPHYS upwards, we will split as necessary before we get to the
bottom of things.


# 8a63edc3 11-Feb-2003 Poul-Henning Kamp <phk@FreeBSD.org>

Better names for struct disk elements: d_maxsize, d_stripeoffset
and d_stripesisze;

Introduce si_stripesize and si_stripeoffset in struct cdev so we
can make the visible to clustering code.

Add stripesize and stripeoffset to providers.

DTRT with stripesize and stripeoffset in various places in GEOM.


# 88b1af77 10-Feb-2003 Poul-Henning Kamp <phk@FreeBSD.org>

Use the SI_CANDELETE flag on the dev_t rather than the D_CANFREE flag
on the cdevsw to determine ability to handle the BIO_DELETE request.


# 4ec35300 08-Feb-2003 Poul-Henning Kamp <phk@FreeBSD.org>

Move the g_stat struct to its own .h file, we will export it to other code.

Insted of embedding a struct g_stat in consumers and providers, merely
include a pointer.

Remove a couple of <sys/time.h> includes now unneeded.

Add a special allocator for struct g_stat. This allocator will allocate
entire pages and hand out g_stat functions from there. The "id" field
indicates free/used status.

Add "/dev/geom.stats" device driver whic exports the pages from the
allocator to userland with mmap(2) in read-only mode.

This mmap(2) interface should be considered a non-public interface and
the functions in libgeom (not yet committed) should be used to access
the statistics data.


# 91cd3dc6 07-Feb-2003 Poul-Henning Kamp <phk@FreeBSD.org>

Move #defines of major/minor to internal header file so other bits can
share and coordinate with geom_dev.


# 801bb689 07-Feb-2003 Poul-Henning Kamp <phk@FreeBSD.org>

Commit the correct copy of the g_stat structure.

Add debug.sizeof.g_stat sysctl.

Set the id field of the g_stat when we create consumers and providers.

Remove biocount from consumer, we will use the counters in the g_stat
structure instead. Replace one field which will need to be atomically
manipulated with two fields which will not (stat.nop and stat.nend).

Change add companion field to bio_children: bio_inbed for the exact
same reason.

Don't output the biocount in the confdot output.

Fix KASSERT in g_io_request().

Add sysctl kern.geom.collectstats defaulting to off.

Collect the following raw statistics conditioned on this sysctl:

for each consumer and provider {
total number of operations started.
total number of operations completed.
time last operation completed.
sum of idle-time.
for each of BIO_READ, BIO_WRITE and BIO_DELETE {
number of operations completed.
number of bytes completed.
number of ENOMEM errors.
number of other errors.
sum of transaction time.
}
}

API for getting hold of these statistics data not included yet.


# 936cc461 07-Feb-2003 Poul-Henning Kamp <phk@FreeBSD.org>

Rename bio_linkage to the more obvious bio_parent.
Add bio_t0 timestamp, and include <sys/time.h> where needed


# 44956c98 21-Jan-2003 Alfred Perlstein <alfred@FreeBSD.org>

Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0.
Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.


# d320fdbc 14-Jan-2003 Poul-Henning Kamp <phk@FreeBSD.org>

Now that we have non-geom_disk based drivers, we need to cover for those,
in case they return EOPNOTSUPP on an ioctl.

Found by: jhb


# 4b2f4ce9 13-Jan-2003 Poul-Henning Kamp <phk@FreeBSD.org>

Always issue ioctls as BIO_GEATTR requests. The direction of data copies on
ioctls are no reliable indication of the ioctls "set" or "get" nature or if
such simplistic categories can even be applied.

MFC candidate: boot0cfg issue.


# abb50a48e 13-Jan-2003 Poul-Henning Kamp <phk@FreeBSD.org>

Remove g_silence(). It does not do anything anymore.


# 105df8c3 02-Jan-2003 Poul-Henning Kamp <phk@FreeBSD.org>

Update si_bsize_phys on open.

MFC candidate.


# cd4b1352 26-Dec-2002 Poul-Henning Kamp <phk@FreeBSD.org>

Add an XXX comment to explain the predicament.


# cc0163a3 13-Dec-2002 Poul-Henning Kamp <phk@FreeBSD.org>

Add a couple of KASSERTS, just in case.


# b630d83f 01-Nov-2002 Poul-Henning Kamp <phk@FreeBSD.org>

Add KASSERT for bio_cmd validity here as well. Various hacks still
bypass specfs.


# ce225127 25-Oct-2002 Poul-Henning Kamp <phk@FreeBSD.org>

Add a g_dev_print() function which prints all the /dev entries GEOM
know about.


# c03bf4f2 25-Oct-2002 Poul-Henning Kamp <phk@FreeBSD.org>

Loose the g_dev_clone() noise.


# 3f12caa1 20-Oct-2002 Poul-Henning Kamp <phk@FreeBSD.org>

Now that the sectorsize and mediasize are properties of the provider,
don't take the detour over the I/O path to discover them using getattr(),
we can just pick them out directly.

Do note though, that for now they are only valid after the first open
of the underlying disk device due compatibility with the old disk_create()
API. This will change in the future so they will always be valid.

Sponsored by: DARPA & NAI Labs.


# 14ac6812 20-Oct-2002 Poul-Henning Kamp <phk@FreeBSD.org>

Use %jd instead of %lld now that we have it.


# 2408cfeb 19-Oct-2002 Thomas Moestl <tmm@FreeBSD.org>

The argument to the DIOCGMEDIASIZE ioctl() is an off_t, not an u_int.

Reviewed by: phk


# 02fcfac0 15-Oct-2002 Nate Lawson <njl@FreeBSD.org>

Return an error if the drive reports heads/sectors that do not make sense.
This fixes a divide by zero in fdisk(8)

Reviewed by: phk


# adfa3213 07-Oct-2002 Poul-Henning Kamp <phk@FreeBSD.org>

Copyin and copyout are only possible from a process-native thread,
and therefore we need a way for ioctl handlers to run in that thread
in GEOM. Rather than invent a complicated registration system to
recognize which ioctl handler to use for a given ioctl, we still
schedule all ioctls down the tree as bio transactions but add a
special return code that means "call me directly" and have the
geom_dev layer do that.

Use this for all ioctls that make it as far as a diskdriver to
avoid any backwards compatibility problems.

Requested by: scottl
Sponsored by: DARPA & NAI Labs


# 2874f1cf 04-Oct-2002 Poul-Henning Kamp <phk@FreeBSD.org>

Properly isolate the locking domains of sysctl from the topology lock
for the sysctls which report the configuration.

Sponsored by: DARPA & NAI Labs.


# a4319fd0 02-Oct-2002 Poul-Henning Kamp <phk@FreeBSD.org>

Put some failing ioctl related printfs under a suitable debug flag.

Sponsored by: DARPA & NAI Labs.


# 0a2ece04 01-Oct-2002 Poul-Henning Kamp <phk@FreeBSD.org>

Use the canonical root:operator 0640 for GEOM disk devices.

Spotted by: brooks
Sponsored by: DARPA & NAI Labs.


# 4ae67700 28-Sep-2002 Poul-Henning Kamp <phk@FreeBSD.org>

Style, whitespace and lint fixes.

Sponsored by: DARPA & NAI Labs.


# 9169e800 27-Sep-2002 Poul-Henning Kamp <phk@FreeBSD.org>

Various no-ops:

Add a __unused.

Make the 2byte decoder functions return 16 bits for the benefits
of picky lints.

No need to grab giant around a tsleep() when we have a timeout.

Sponsored by: DARPA & NAI Labs.


# 46714777 20-Sep-2002 Poul-Henning Kamp <phk@FreeBSD.org>

Retire now unused DIOCGDVIRGIN kludge.

Sponsored by: DARPA & NAI Labs.


# 02945fef 06-Sep-2002 Poul-Henning Kamp <phk@FreeBSD.org>

Don't respect the O_EXCL flag, we don't get it back on close so we cannot
correctly track it.

Spotted by: peter
Sponsored by: DARPA & NAI Labs.


# 503abe45 09-Jun-2002 Poul-Henning Kamp <phk@FreeBSD.org>

Improve some on the naming.

Submitted by: iedowse


# 3abe4a80 21-May-2002 Poul-Henning Kamp <phk@FreeBSD.org>

Remove the "-class" suffix from classes, they will not be ambiguous.

Sponsored by: DARPA & NAI Labs.


# 4b8374a7 20-May-2002 Poul-Henning Kamp <phk@FreeBSD.org>

Don't grab Giant around malloc(9) and free(9).
Don't grab Giant around wakeup(9).
Don't print verbose messages about each device found in geom_dev.
Various cleanups.

Sponsored by: DARPA & NAI Labs.


# 53705e35 23-Apr-2002 Poul-Henning Kamp <phk@FreeBSD.org>

Implement the GEOMGETCONF ioctl which returns vital stats for the
current device in XML in an sbuf.

Sponsored by: DARPA & NAI Labs


# 95c24b31 19-Apr-2002 Poul-Henning Kamp <phk@FreeBSD.org>

Make kernel dumps work with GEOM.

Notice that if the device on which the dump is set is destroyed for
any reason, the dump setting is lost. This in particular will
happen in the case of spoilage. For instance if you set dump on
ad0s1b and open ad0 for writing, ad0s* will be spoilt and the dump
setting lost. See geom(4) for more about spoiling.

Sponsored by: DARPA & NAI Labs.


# 1bdb20a6 09-Apr-2002 Poul-Henning Kamp <phk@FreeBSD.org>

Implement DIOCGFRONTSTUFF ioctl which reports how many bytes from the start
of the device magic stuff might occupy.

Sponsored by: DARPA & NAI Labs.


# c7b1a1d1 09-Apr-2002 Poul-Henning Kamp <phk@FreeBSD.org>

Various stylistic nit picking.

Sponsored by: DARPA & NAI Labs.


# 1265c0ce 08-Apr-2002 Poul-Henning Kamp <phk@FreeBSD.org>

In reverence of the 3rd X11 development rule:

3.The only thing worse than generalizing from one example
is generalizing from no examples at all.

Remove the fwcylinders attribute before anybody gets the idea that we
alone have squared the circle.

Sponsored by: DARPA & NAI Labs.


# 07d77fc6 04-Apr-2002 Poul-Henning Kamp <phk@FreeBSD.org>

Move access and orphan member functions from class to geom.

Sponsored by: DARPA & NAI Labs


# 4c0a424c 28-Mar-2002 Poul-Henning Kamp <phk@FreeBSD.org>

In the absense of any smarter way to do this, cast various printf
arguments to silence printf format warnings.


# b1876192 26-Mar-2002 Poul-Henning Kamp <phk@FreeBSD.org>

Eliminate some thread pointers which do not make sense anymore.

Split private parts of geom.h into geom_int.h. The latter should
never be included in class implemtations.


# e805e8f0 26-Mar-2002 Poul-Henning Kamp <phk@FreeBSD.org>

Cave in to tradition and rename "methods" to "classes".


# 00dcdc8d 19-Mar-2002 Poul-Henning Kamp <phk@FreeBSD.org>

Add five GEOM oriented ioctls to get basic information about a geom device.


# b14d84e2 17-Mar-2002 Poul-Henning Kamp <phk@FreeBSD.org>

Change the giant-dropping method a fair bit to keep WITNESS more
happy.


# 14e4cefc 16-Mar-2002 Poul-Henning Kamp <phk@FreeBSD.org>

Hmm, talk about optimizer-fodder. Make the DIOCGDVIRGIN hack work again.


# a5b2e75d 16-Mar-2002 Poul-Henning Kamp <phk@FreeBSD.org>

Add a generic and general ioctl pass-through mechanism.

It should now be posible to issue ioctls to SCSI CD drives.


# dd84a43c 11-Mar-2002 Poul-Henning Kamp <phk@FreeBSD.org>

First commit of the GEOM subsystem to make it easier for people to
test and play with this.

This is not yet production quality and should be run only on dedicated
test boxes.

For people who want to develop transformations for GEOM there exist a
set of shims to run geom in userland (ask phk@freebsd.org).

Reports of all kinds to: phk@freebsd.org
Please include in report:
dmesg
sysctl debug.geomdot
sysctl debug.geomconf

Known significant limitations:
no kernel dump facility.
ioctls severely restricted.

Sponsored by: DARPA, NAI Labs