History log of /freebsd-current/sys/kern/subr_devstat.c
Revision Date Author Comments
# fdafd315 24-Nov-2023 Warner Losh <imp@FreeBSD.org>

sys: Automated cleanup of cdefs and other formatting

Apply the following automated changes to try to eliminate
no-longer-needed sys/cdefs.h includes as well as now-empty
blank lines in a row.

Remove /^#if.*\n#endif.*\n#include\s+<sys/cdefs.h>.*\n/
Remove /\n+#include\s+<sys/cdefs.h>.*\n+#if.*\n#endif.*\n+/
Remove /\n+#if.*\n#endif.*\n+/
Remove /^#if.*\n#endif.*\n/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/types.h>/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/param.h>/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/capsicum.h>/

Sponsored by: Netflix


# 685dc743 16-Aug-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove $FreeBSD$: one-line .c pattern

Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/


# 32faf071 29-Aug-2022 Mark Johnston <markj@FreeBSD.org>

devstat: Remove DTrace io probes lacking a BIO reference

The io:::start and end probes trace individual I/O requests.

Also remove the unimplemented wait-start and wait-done probes.

PR: 266098
MFC after: 1 week


# ab63da35 27-Feb-2021 Alan Somers <asomers@FreeBSD.org>

Speed up geom_stats_resync in the presence of many devices

The old code had a O(n) loop, where n is the size of /dev/devstat.
Multiply that by another O(n) loop in devstat_mmap for a total of
O(n^2).

This change adds DIOCGMEDIASIZE support to /dev/devstat so userland can
quickly determine the right amount of memory to map, eliminating the
O(n) loop in userland.

This change decreases the time to run "gstat -bI0.001" with 16,384 md
devices from 29.7s to 4.2s.

Also, fix a memory leak first reported as PR 203097.

Sponsored by: Axcient
Reviewed by: mav, imp
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D28968


# 8b220f89 24-Oct-2020 Alexander Motin <mav@FreeBSD.org>

Fix asymmetry in devstat(9) calls by GEOM.

Before this GEOM passed bio pointer to transaction start, but not end.
It was irrelevant until devstat(9) got DTrace hooks, that appeared to
provide bio pointer on I/O completion, but not on submission.

MFC after: 2 weeks
Sponsored by: iXsystems, Inc.


# 7029da5c 26-Feb-2020 Pawel Biernacki <kaktus@FreeBSD.org>

Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many)

r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.

This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.

Mark all obvious cases as MPSAFE. All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT

Approved by: kib (mentor, blanket)
Commented by: kib, gallatin, melifaro
Differential Revision: https://reviews.freebsd.org/D23718


# 024932aa 29-Dec-2019 Alexander Motin <mav@FreeBSD.org>

Use atomic for start_count in devstat_start_transaction().

Combined with earlier nstart/nend removal it allows to remove several locks
from request path of GEOM and few other places. It would be cool if we had
more SMP-friendly statistics, but this helps too.

Sponsored by: iXsystems, Inc.


# cb847b81 06-Dec-2019 Alexander Motin <mav@FreeBSD.org>

Make devstat_end_transaction_bio() count BIO_ORDERED.

MFC after: 2 weeks


# b6c7d9c3 22-Aug-2018 Conrad Meyer <cem@FreeBSD.org>

devstat(9): Constify function parameters that can be const

No functional change.

When attempting to document the changed argument types in devstat.9, I
discovered the 20 year old manual page severely mismatched reality even
prior to my simple change. So I took a first cut pass cleaning that up to
match reality. I'm sure I've missed some things; the goal was just to leave
it better than when I started.

Sponsored by: Dell EMC Isilon


# 8a36da99 27-Nov-2017 Pedro F. Giffuni <pfg@FreeBSD.org>

sys/kern: adoption of SPDX licensing ID tags.

Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.


# 9a6844d5 19-May-2016 Kenneth D. Merry <ken@FreeBSD.org>

Add support for managing Shingled Magnetic Recording (SMR) drives.

This change includes support for SCSI SMR drives (which conform to the
Zoned Block Commands or ZBC spec) and ATA SMR drives (which conform to
the Zoned ATA Command Set or ZAC spec) behind SAS expanders.

This includes full management support through the GEOM BIO interface, and
through a new userland utility, zonectl(8), and through camcontrol(8).

This is now ready for filesystems to use to detect and manage zoned drives.
(There is no work in progress that I know of to use this for ZFS or UFS, if
anyone is interested, let me know and I may have some suggestions.)

Also, improve ATA command passthrough and dispatch support, both via ATA
and ATA passthrough over SCSI.

Also, add support to camcontrol(8) for the ATA Extended Power Conditions
feature set. You can now manage ATA device power states, and set various
idle time thresholds for a drive to enter lower power states.

Note that this change cannot be MFCed in full, because it depends on
changes to the struct bio API that break compatilibity. In order to
avoid breaking the stable API, only changes that don't touch or depend on
the struct bio changes can be merged. For example, the camcontrol(8)
changes don't depend on the new bio API, but zonectl(8) and the probe
changes to the da(4) and ada(4) drivers do depend on it.

Also note that the SMR changes have not yet been tested with an actual
SCSI ZBC device, or a SCSI to ATA translation layer (SAT) that supports
ZBC to ZAC translation. I have not yet gotten a suitable drive or SAT
layer, so any testing help would be appreciated. These changes have been
tested with Seagate Host Aware SATA drives attached to both SAS and SATA
controllers. Also, I do not have any SATA Host Managed devices, and I
suspect that it may take additional (hopefully minor) changes to support
them.

Thanks to Seagate for supplying the test hardware and answering questions.

sbin/camcontrol/Makefile:
Add epc.c and zone.c.

sbin/camcontrol/camcontrol.8:
Document the zone and epc subcommands.

sbin/camcontrol/camcontrol.c:
Add the zone and epc subcommands.

Add auxiliary register support to build_ata_cmd(). Make sure to
set the CAM_ATAIO_NEEDRESULT, CAM_ATAIO_DMA, and CAM_ATAIO_FPDMA
flags as appropriate for ATA commands.

Add a new get_ata_status() function to parse ATA result from SCSI
sense descriptors (for ATA passthrough over SCSI) and ATA I/O
requests.

sbin/camcontrol/camcontrol.h:
Update the build_ata_cmd() prototype

Add get_ata_status(), zone(), and epc().

sbin/camcontrol/epc.c:
Support for ATA Extended Power Conditions features. This includes
support for all features documented in the ACS-4 Revision 12
specification from t13.org (dated February 18, 2016).

The EPC feature set allows putting a drive into a power power mode
immediately, or setting timeouts so that the drive will
automatically enter progressively lower power states after various
idle times.

sbin/camcontrol/fwdownload.c:
Update the firmware download code for the new build_ata_cmd()
arguments.

sbin/camcontrol/zone.c:
Implement support for Shingled Magnetic Recording (SMR) drives
via SCSI Zoned Block Commands (ZBC) and ATA Zoned Device ATA
Command Set (ZAC).

These specs were developed in concert, and are functionally
identical. The primary differences are due to SCSI and ATA
differences. (SCSI is big endian, ATA is little endian, for
example.)

This includes support for all commands defined in the ZBC and
ZAC specs.

sys/cam/ata/ata_all.c:
Decode a number of additional ATA command names in ata_op_string().

Add a new CCB building function, ata_read_log().

Add ata_zac_mgmt_in() and ata_zac_mgmt_out() CCB building
functions. These support both DMA and NCQ encapsulation.

sys/cam/ata/ata_all.h:
Add prototypes for ata_read_log(), ata_zac_mgmt_out(), and
ata_zac_mgmt_in().

sys/cam/ata/ata_da.c:
Revamp the ada(4) driver to support zoned devices.

Add four new probe states to gather information needed for zone
support.

Add a new adasetflags() function to avoid duplication of large
blocks of flag setting between the async handler and register
functions.

Add new sysctl variables that describe zone support and paramters.

Add support for the new BIO_ZONE bio, and all of its subcommands:
DISK_ZONE_OPEN, DISK_ZONE_CLOSE, DISK_ZONE_FINISH, DISK_ZONE_RWP,
DISK_ZONE_REPORT_ZONES, and DISK_ZONE_GET_PARAMS.

sys/cam/scsi/scsi_all.c:
Add command descriptions for the ZBC IN/OUT commands.

Add descriptions for ZBC Host Managed devices.

Add a new function, scsi_ata_pass() to do ATA passthrough over
SCSI. This will eventually replace scsi_ata_pass_16() -- it
can create the 12, 16, and 32-byte variants of the ATA
PASS-THROUGH command, and supports setting all of the
registers defined as of SAT-4, Revision 5 (March 11, 2016).

Change scsi_ata_identify() to use scsi_ata_pass() instead of
scsi_ata_pass_16().

Add a new scsi_ata_read_log() function to facilitate reading
ATA logs via SCSI.

sys/cam/scsi/scsi_all.h:
Add the new ATA PASS-THROUGH(32) command CDB. Add extended and
variable CDB opcodes.

Add Zoned Block Device Characteristics VPD page.

Add ATA Return SCSI sense descriptor.

Add prototypes for scsi_ata_read_log() and scsi_ata_pass().

sys/cam/scsi/scsi_da.c:
Revamp the da(4) driver to support zoned devices.

Add five new probe states, four of which are needed for ATA
devices.

Add five new sysctl variables that describe zone support and
parameters.

The da(4) driver supports SCSI ZBC devices, as well as ATA ZAC
devices when they are attached via a SCSI to ATA Translation (SAT)
layer. Since ZBC -> ZAC translation is a new feature in the T10
SAT-4 spec, most SATA drives will be supported via ATA commands
sent via the SCSI ATA PASS-THROUGH command. The da(4) driver will
prefer the ZBC interface, if it is available, for performance
reasons, but will use the ATA PASS-THROUGH interface to the ZAC
command set if the SAT layer doesn't support translation yet.
As I mentioned above, ZBC command support is untested.

Add support for the new BIO_ZONE bio, and all of its subcommands:
DISK_ZONE_OPEN, DISK_ZONE_CLOSE, DISK_ZONE_FINISH, DISK_ZONE_RWP,
DISK_ZONE_REPORT_ZONES, and DISK_ZONE_GET_PARAMS.

Add scsi_zbc_in() and scsi_zbc_out() CCB building functions.

Add scsi_ata_zac_mgmt_out() and scsi_ata_zac_mgmt_in() CCB/CDB
building functions. Note that these have return values, unlike
almost all other CCB building functions in CAM. The reason is
that they can fail, depending upon the particular combination
of input parameters. The primary failure case is if the user
wants NCQ, but fails to specify additional CDB storage. NCQ
requires using the 32-byte version of the SCSI ATA PASS-THROUGH
command, and the current CAM CDB size is 16 bytes.

sys/cam/scsi/scsi_da.h:
Add ZBC IN and ZBC OUT CDBs and opcodes.

Add SCSI Report Zones data structures.

Add scsi_zbc_in(), scsi_zbc_out(), scsi_ata_zac_mgmt_out(), and
scsi_ata_zac_mgmt_in() prototypes.

sys/dev/ahci/ahci.c:
Fix SEND / RECEIVE FPDMA QUEUED in the ahci(4) driver.

ahci_setup_fis() previously set the top bits of the sector count
register in the FIS to 0 for FPDMA commands. This is okay for
read and write, because the PRIO field is in the only thing in
those bits, and we don't implement that further up the stack.

But, for SEND and RECEIVE FPDMA QUEUED, the subcommand is in that
byte, so it needs to be transmitted to the drive.

In ahci_setup_fis(), always set the the top 8 bits of the
sector count register. We need it in both the standard
and NCQ / FPDMA cases.

sys/geom/eli/g_eli.c:
Pass BIO_ZONE commands through the GELI class.

sys/geom/geom.h:
Add g_io_zonecmd() prototype.

sys/geom/geom_dev.c:
Add new DIOCZONECMD ioctl, which allows sending zone commands to
disks.

sys/geom/geom_disk.c:
Add support for BIO_ZONE commands.

sys/geom/geom_disk.h:
Add a new flag, DISKFLAG_CANZONE, that indicates that a given
GEOM disk client can handle BIO_ZONE commands.

sys/geom/geom_io.c:
Add a new function, g_io_zonecmd(), that handles execution of
BIO_ZONE commands.

Add permissions check for BIO_ZONE commands.

Add command decoding for BIO_ZONE commands.

sys/geom/geom_subr.c:
Add DDB command decoding for BIO_ZONE commands.

sys/kern/subr_devstat.c:
Record statistics for REPORT ZONES commands. Note that the
number of bytes transferred for REPORT ZONES won't quite match
what is received from the harware. This is because we're
necessarily counting bytes coming from the da(4) / ada(4) drivers,
which are using the disk_zone.h interface to communicate up
the stack. The structure sizes it uses are slightly different
than the SCSI and ATA structure sizes.

sys/sys/ata.h:
Add many bit and structure definitions for ZAC, NCQ, and EPC
command support.

sys/sys/bio.h:
Convert the bio_cmd field to a straight enumeration. This will
yield more space for additional commands in the future. After
change r297955 and other related changes, this is now possible.
Converting to an enumeration will also prevent use as a bitmask
in the future.

sys/sys/disk.h:
Define the DIOCZONECMD ioctl.

sys/sys/disk_zone.h:
Add a new API for managing zoned disks. This is very close to
the SCSI ZBC and ATA ZAC standards, but uses integers in native
byte order instead of big endian (SCSI) or little endian (ATA)
byte arrays.

This is intended to offer to the complete feature set of the ZBC
and ZAC disk management without requiring the application developer
to include SCSI or ATA headers. We also use one set of headers
for ioctl consumers and kernel bio-level consumers.

sys/sys/param.h:
Bump __FreeBSD_version for sys/bio.h command changes, and inclusion
of SMR support.

usr.sbin/Makefile:
Add the zonectl utility.

usr.sbin/diskinfo/diskinfo.c
Add disk zoning capability to the 'diskinfo -v' output.

usr.sbin/zonectl/Makefile:
Add zonectl makefile.

usr.sbin/zonectl/zonectl.8
zonectl(8) man page.

usr.sbin/zonectl/zonectl.c
The zonectl(8) utility. This allows managing SCSI or ATA zoned
disks via the disk_zone.h API. You can report zones, reset write
pointers, get parameters, etc.

Sponsored by: Spectra Logic
Differential Revision: https://reviews.freebsd.org/D6147
Reviewed by: wblock (documentation)


# e3043798 29-Apr-2016 Pedro F. Giffuni <pfg@FreeBSD.org>

sys/kern: spelling fixes in comments.

No functional change.


# f0188618 21-Oct-2014 Hans Petter Selasky <hselasky@FreeBSD.org>

Fix multiple incorrect SYSCTL arguments in the kernel:

- Wrong integer type was specified.

- Wrong or missing "access" specifier. The "access" specifier
sometimes included the SYSCTL type, which it should not, except for
procedural SYSCTL nodes.

- Logical OR where binary OR was expected.

- Properly assert the "access" argument passed to all SYSCTL macros,
using the CTASSERT macro. This applies to both static- and dynamically
created SYSCTLs.

- Properly assert the the data type for both static and dynamic
SYSCTLs. In the case of static SYSCTLs we only assert that the data
pointed to by the SYSCTL data pointer has the correct size, hence
there is no easy way to assert types in the C language outside a
C-function.

- Rewrote some code which doesn't pass a constant "access" specifier
when creating dynamic SYSCTL nodes, which is now a requirement.

- Updated "EXAMPLES" section in SYSCTL manual page.

MFC after: 3 days
Sponsored by: Mellanox Technologies


# b646225a 24-Mar-2014 Maksim Yevmenkin <emax@FreeBSD.org>

change defaule permissions on /dev/devstat. while i'm here remove
D_NEEDGIANT flag

Submitted by: jhb
Reviewed by: jhb, scottl, rwatson, delphij, phk
MFC after: 1 week


# d9fae5ab 26-Nov-2013 Andriy Gapon <avg@FreeBSD.org>

dtrace sdt: remove the ugly sname parameter of SDT_PROBE_DEFINE

In its stead use the Solaris / illumos approach of emulating '-' (dash)
in probe names with '__' (two consecutive underscores).

Reviewed by: markj
MFC after: 3 weeks


# 54366c0b 25-Nov-2013 Attilio Rao <attilio@FreeBSD.org>

- For kernel compiled only with KDTRACE_HOOKS and not any lock debugging
option, unbreak the lock tracing release semantic by embedding
calls to LOCKSTAT_PROFILE_RELEASE_LOCK() direclty in the inlined
version of the releasing functions for mutex, rwlock and sxlock.
Failing to do so skips the lockstat_probe_func invokation for
unlocking.
- As part of the LOCKSTAT support is inlined in mutex operation, for
kernel compiled without lock debugging options, potentially every
consumer must be compiled including opt_kdtrace.h.
Fix this by moving KDTRACE_HOOKS into opt_global.h and remove the
dependency by opt_kdtrace.h for all files, as now only KDTRACE_FRAMES
is linked there and it is only used as a compile-time stub [0].

[0] immediately shows some new bug as DTRACE-derived support for debug
in sfxge is broken and it was never really tested. As it was not
including correctly opt_kdtrace.h before it was never enabled so it
was kept broken for a while. Fix this by using a protection stub,
leaving sfxge driver authors the responsibility for fixing it
appropriately [1].

Sponsored by: EMC / Isilon storage division
Discussed with: rstone
[0] Reported by: rstone
[1] Discussed with: philip


# 2e1ae0b3 23-Oct-2013 Mark Johnston <markj@FreeBSD.org>

Redefine the io provider using the SDT(9) macros instead of doing everything
manually. This change has no functional impact.

Discussed with: gnn


# 40ea77a0 22-Oct-2013 Alexander Motin <mav@FreeBSD.org>

Merge GEOM direct dispatch changes from the projects/camlock branch.

When safety requirements are met, it allows to avoid passing I/O requests
to GEOM g_up/g_down thread, executing them directly in the caller context.
That allows to avoid CPU bottlenecks in g_up/g_down threads, plus avoid
several context switches per I/O.

The defined now safety requirements are:
- caller should not hold any locks and should be reenterable;
- callee should not depend on GEOM dual-threaded concurency semantics;
- on the way down, if request is unmapped while callee doesn't support it,
the context should be sleepable;
- kernel thread stack usage should be below 50%.

To keep compatibility with GEOM classes not meeting above requirements
new provider and consumer flags added:
- G_CF_DIRECT_SEND -- consumer code meets caller requirements (request);
- G_CF_DIRECT_RECEIVE -- consumer code meets callee requirements (done);
- G_PF_DIRECT_SEND -- provider code meets caller requirements (done);
- G_PF_DIRECT_RECEIVE -- provider code meets callee requirements (request).
Capable GEOM class can set them, allowing direct dispatch in cases where
it is safe. If any of requirements are not met, request is queued to
g_up or g_down thread same as before.

Such GEOM classes were reviewed and updated to support direct dispatch:
CONCAT, DEV, DISK, GATE, MD, MIRROR, MULTIPATH, NOP, PART, RAID, STRIPE,
VFS, ZERO, ZFS::VDEV, ZFS::ZVOL, all classes based on g_slice KPI (LABEL,
MAP, FLASHMAP, etc).

To declare direct completion capability disk(9) KPI got new flag equivalent
to G_PF_DIRECT_SEND -- DISKFLAG_DIRECT_COMPLETION. da(4) and ada(4) disk
drivers got it set now thanks to earlier CAM locking work.

This change more then twice increases peak block storage performance on
systems with manu CPUs, together with earlier CAM locking changes reaching
more then 1 million IOPS (512 byte raw reads from 16 SATA SSDs on 4 HBAs to
256 user-level threads).

Sponsored by: iXsystems, Inc.
MFC after: 2 months


# e431d66c 16-Oct-2013 Alexander Motin <mav@FreeBSD.org>

MFprojects/camlock r254905:
Introduce new function devstat_end_transaction_bio_bt(), adding new argument
to specify present time. Use this function to move binuptime() out of lock,
substantially reducing lock congestion when slow timecounter is used.


# 7833d828 24-Jul-2012 Neel Natu <neel@FreeBSD.org>

Fix compilation error when compiling a kernel without KDTRACE_HOOKS


# 92a9c65b 11-Jul-2012 Konstantin Belousov <kib@FreeBSD.org>

Fix build for kernels with dtrace hooks.

MFC after: 1 month


# 3fac94ba 11-Jul-2012 George V. Neville-Neil <gnn@FreeBSD.org>

Initial commit of an I/O provider for DTrace on FreeBSD.

These probes are most useful when looking into the structures
they provide, which are listed in io.d. For example:

dtrace -n 'io:genunix::start { printf("%d\n", args[0]->bio_bcount); }'

Note that the I/O systems in FreeBSD and Solaris/Illumos are sufficiently
different that there is not a 1:1 mapping from scripts that work
with one to the other.
MFC after: 1 month


# 6472ac3d 07-Nov-2011 Ed Schouten <ed@FreeBSD.org>

Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs.

The SYSCTL_NODE macro defines a list that stores all child-elements of
that node. If there's no SYSCTL_DECL macro anywhere else, there's no
reason why it shouldn't be static.


# 094efe75 13-Jun-2011 Kenneth D. Merry <ken@FreeBSD.org>

Instead of using an atomic operation to determine whether the devstat(9)
device node has been created, pass MAKEDEV_CHECKNAME in so that the devfs
code will do the check.

Use a regular static variable as before, that's good enough to keep us from
calling into devfs most of the time.

Suggested by: kib
MFC after: 1 week
Sponsored by: Spectra Logic Corporation


# 27c959cf 13-Jun-2011 Justin T. Gibbs <gibbs@FreeBSD.org>

Fix a couple of race conditions in devstat(9) initialization.

In devstat_new_entry(), there is no need to initialize the queue
and the mutex in this function. There are ways to do static
initialization on both, so use STAILQ_HEAD_INITIALIZER and
MTX_SYSINIT to initialize the queue and the mutex.

In devstat_alloc(), use an atomic test and set routine to guard
making our entry in /dev. Using just a plain static variable
creates a race condition on multiprocessor machines. If you
attempt to create a second entry in devfs, the kernel will panic.

Submitted by: kdm
Reviewed by: gibbs
Sponsored by: Spectra Logic Corporation
MFC after: 1 week.


# 23b70c1a 04-Jan-2011 Konstantin Belousov <kib@FreeBSD.org>

Finish r210923, 210926. Mark some devices as eternal.

MFC after: 2 weeks


# a7d5f7eb 19-Oct-2010 Jamie Gritton <jamie@FreeBSD.org>

A new jail(8) with a configuration file, to replace the work currently done
by /etc/rc.d/jail.


# 60ae52f7 21-Jun-2010 Ed Schouten <ed@FreeBSD.org>

Use ISO C99 integer types in sys/kern where possible.

There are only about 100 occurences of the BSD-specific u_int*_t
datatypes in sys/kern. The ISO C99 integer types are used here more
often.


# cfd7bace 29-Dec-2009 Robert Noland <rnoland@FreeBSD.org>

Update d_mmap() to accept vm_ooffset_t and vm_memattr_t.

This replaces d_mmap() with the d_mmap2() implementation and also
changes the type of offset to vm_ooffset_t.

Purge d_mmap2().

All driver modules will need to be rebuilt since D_VERSION is also
bumped.

Reviewed by: jhb@
MFC after: Not in this lifetime...


# 39df6da8 18-Sep-2009 Attilio Rao <attilio@FreeBSD.org>

Don't allocate new unnecessary pages when devstat_alloc() looses the
run for re-acuiring the lock, but recheck if new pages are allocatable
from the pool and free the previously allocated ones.

Tested by: pho, Giovanni Trematerra
<giovanni dot trematerra at gmail dot com>


# 2c204a16 03-Feb-2009 Warner Losh <imp@FreeBSD.org>

Use NULL in preference to 0 in pointer contexts.


# d7f03759 19-Oct-2008 Ulf Lilleengen <lulf@FreeBSD.org>

- Import the HEAD csup code which is the basis for the cvsmode work.


# 279d4d37 03-May-2005 Jeff Roberson <jeff@FreeBSD.org>

- Remove two mtx_asserts that can incorrectly trigger if
devstat_end_transaction is called from a fast interrupt. Presently
there is no way for mtx_assert to determine that we're not executing
in a real thread context.

Submitted by: jhusted@isilon.com


# 9454b2d8 06-Jan-2005 Warner Losh <imp@FreeBSD.org>

/* -> /*- for copyright notices, minor format tweaks as necessary


# 89c9c53d 16-Jun-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Do the dreaded s/dev_t/struct cdev */
Bump __FreeBSD_version accordingly.


# dc08ffec 21-Feb-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Device megapatch 4/6:

Introduce d_version field in struct cdevsw, this must always be
initialized to D_VERSION.

Flip sense of D_NOGIANT flag to D_NEEDGIANT, this involves removing
four D_NOGIANT flags and adding 145 D_NEEDGIANT flags.


# 70cd7713 26-Sep-2003 Poul-Henning Kamp <phk@FreeBSD.org>

The present defaults for the open and close for device drivers which
provide no methods does not make any sense, and is not used by any
driver.

It is a pretty hard to come up with even a theoretical concept of
a device driver which would always fail open and close with ENODEV.

Change the defaults to be nullopen() and nullclose() which simply
does nothing.

Remove explicit initializations to these from the drivers which
already used them.


# 037c3d0f 16-Aug-2003 Poul-Henning Kamp <phk@FreeBSD.org>

It is not an error to have no devices in the kernel: Return the
generation number and start it from one instead of zero.


# 677b542e 10-Jun-2003 David E. O'Brien <obrien@FreeBSD.org>

Use __FBSDID().


# 6e17a0d7 17-Apr-2003 Hartmut Brandt <harti@FreeBSD.org>

Unbreak vinum, iostat and systat on sparc64 by changing the devstat
generation number back to a long (sizeof(u_int) != sizeof(long) on
sparc64). The alternative would have been to heavily change the libdevstat API.

Discussed with: phk, ken


# 227f9a1c 24-Mar-2003 Jake Burkholder <jake@FreeBSD.org>

- Add vm_paddr_t, a physical address type. This is required for systems
where physical addresses larger than virtual addresses, such as i386s
with PAE.
- Use this to represent physical addresses in the MI vm system and in the
i386 pmap code. This also changes the paddr parameter to d_mmap_t.
- Fix printf formats to handle physical addresses >4G in the i386 memory
detection code, and due to kvtop returning vm_paddr_t instead of u_long.

Note that this is a name change only; vm_paddr_t is still the same as
vm_offset_t on all currently supported platforms.

Sponsored by: DARPA, Network Associates Laboratories
Discussed with: re, phk (cdevsw change)


# 84513ca2 18-Mar-2003 Jake Burkholder <jake@FreeBSD.org>

long != int. Use SYSCTL_UINT for kern.devstat.generation. Fixes booting
on sparc64.


# c967bee7 18-Mar-2003 Poul-Henning Kamp <phk@FreeBSD.org>

If devstat_new_entry() is passed a unit number of -1 assume that
the devstat is for an "interior" GEOM node and register using the
name argument as a geom identity pointer. Do not put these devstat
structures on the list returned by the sysctl.

This gives us the ability to tell the two kinds of nodes apart and
leave the current "strictly physical" view of devstat intact without
modifications, yet be able to use devstat for both kinds of devices.

It also saves us bloating struct devstat with another 48 bytes of
space for the name. At least for now.

Reviewed by: ken


# 224d5539 18-Mar-2003 Poul-Henning Kamp <phk@FreeBSD.org>

Make devstat fully Giant agnostic:

Add a mutex and protect the allocation and traversal of the list with it.

When we allocate a page for devstat use we drop the mutex and use
M_WAITOK this is not nice, but under the given circumstances the
best we can do.

In the sysctl handler for returning the devstat entries we do not want to
hold the mutex across copyout(9) calls, so we keep a very careful eye on
the devstat_generation count, and abandon with EBUSY if it changes under
our feet.

Specifically test for BIO_WRITE, rather than default non-read,non-deletes
as write. Make the default be DEVSTAT_NO_DATA.

Add atomic increments of the sequence[01] fields so applications using the
mmap'ed view stand a chance of detecting updates in progress.

Reviewed by: ken


# 538aabaa 18-Mar-2003 Poul-Henning Kamp <phk@FreeBSD.org>

Make devstat_new_entry() take a const void * rather than const char *
argument, GEOM nodes are not identified by ascii string.


# 5fa5746d 16-Mar-2003 Poul-Henning Kamp <phk@FreeBSD.org>

Add a #define for the device name of the mmap device for devstat.
Constify the geom identification pointer.


# d15cd510 15-Mar-2003 Poul-Henning Kamp <phk@FreeBSD.org>

One devstat_start_transaction_bio() is enough.


# 7194d335 15-Mar-2003 Poul-Henning Kamp <phk@FreeBSD.org>

Run a revision of the devstat interface:

Kernel:

Change statistics to use the *uptime() timescale (ie: relative to
boottime) rather than the UTC aligned timescale. This makes the
device statistics code oblivious to clock steps.

Change timestamps to bintime format, they are cheaper.

Remove the "busy_count", and replace it with two counter fields:
"start_count" and "end_count", which are updated in the down and
up paths respectively. This removes the locking constraint on
devstat.

Add a timestamp argument to devstat_start_transaction(), this will
normally be a timestamp set by the *_bio() function in bp->bio_t0.
Use this field to calculate duration of I/O operations.

Add two timestamp arguments to devstat_end_transaction(), one is
the current time, a NULL pointer means "take timestamp yourself",
the other is the timestamp of when this transaction started (see
above).

Change calculation of busy_time to operate on "the salami principle":
Only when we are idle, which we can determine by the start+end
counts being identical, do we update the "busy_from" field in the
down path. In the up path we accumulate the timeslice in busy_time
and update busy_from.

Change the byte_* and num_* fields into two arrays: bytes[] and
operations[].

Userland:

Change the misleading "busy_time" name to be called "snap_time" and
make the time long double since that is what most users need anyway,
fill it using clock_gettime(CLOCK_MONOTONIC) to put it on the same
timescale as the kernel fields.

Change devstat_compute_etime() to operate on struct bintime.

Remove the version 2 legacy interface: the change to bintime makes
compatibility far too expensive.

Fix a bug in systat's "vm" page where boot relative busy times would
be bogus.

Bump __FreeBSD_version to 500107

Review & Collaboration by: ken


# 9fa85de2 15-Mar-2003 Poul-Henning Kamp <phk@FreeBSD.org>

Add a devstat_start_transaction_bio() to match the
devstat_end_transaction_bio() we already have.

For now it just calls devstat_start_transaction(), but that will change
shortly.


# d42ee4e4 09-Mar-2003 Poul-Henning Kamp <phk@FreeBSD.org>

Note that MAJOR_AUTO is now the default if d_maj is not initialized. This
is more robust and prevents the hijacking of /dev/console for the typical
mistake.

Remove unneeded MAJOR_AUTO uses, it is only needed explicitly now if the
driver source has cross-branch compatibility to old releases.


# f37de122 08-Mar-2003 Poul-Henning Kamp <phk@FreeBSD.org>

Retire devstat_add_entry() as a public function and bump __FreeBSD_version
to mark this act.


# c7e73d59 08-Mar-2003 Poul-Henning Kamp <phk@FreeBSD.org>

Introduce a device driver for /dev/devstat, this will allow us to mmap
the device statistics structures into userland instead of using sysctl.

Introduce new devstat_new_entry() function which allocates the devstat
structure an calls devstat_add_entry() on it.


# e80fb434 17-Oct-2002 Robert Drehmel <robert@FreeBSD.org>

Use strlcpy() instead of strncpy() to copy NUL terminated strings
for safety and consistency.


# 57c10583 22-Feb-2002 Poul-Henning Kamp <phk@FreeBSD.org>

GC: BIO_ORDERED, various infrastructure dealing with BIO_ORDERED.


# 938a4e5c 04-Aug-2001 Thomas Moestl <tmm@FreeBSD.org>

Export the head structure for the device statistics STAILQ in
sys/devicestat.h, so that the queue can be walked in crashdumps using
libkvm.


# 37d40066 04-Feb-2001 Poul-Henning Kamp <phk@FreeBSD.org>

Another round of the <sys/queue.h> FOREACH transmogriffer.

Created with: sed(1)
Reviewed by: md5(1)


# 9701cd40 05-Jul-2000 John Baldwin <jhb@FreeBSD.org>

Support for unsigned integer and long sysctl variables. Update the
SYSCTL_LONG macro to be consistent with other integer sysctl variables
and require an initial value instead of assuming 0. Update several
sysctl variables to use the unsigned types.

PR: 15251
Submitted by: Kelly Yancey <kbyanc@posi.net>


# 77978ab8 04-Jul-2000 Poul-Henning Kamp <phk@FreeBSD.org>

Previous commit changing SYSCTL_HANDLER_ARGS violated KNF.

Pointed out by: bde


# 82d9ae4e 03-Jul-2000 Poul-Henning Kamp <phk@FreeBSD.org>

Style police catches up with rev 1.26 of src/sys/sys/sysctl.h:

Sanitize SYSCTL_HANDLER_ARGS so that simplistic tools can grog our
sources:

-sysctl_vm_zone SYSCTL_HANDLER_ARGS
+sysctl_vm_zone (SYSCTL_HANDLER_ARGS)


# e3975643 25-May-2000 Jake Burkholder <jake@FreeBSD.org>

Back out the previous change to the queue(3) interface.
It was not discussed and should probably not happen.

Requested by: msmith and others


# 740a1973 23-May-2000 Jake Burkholder <jake@FreeBSD.org>

Change the way that the queue(3) structures are declared; don't assume that
the type argument to *_HEAD and *_ENTRY is a struct.

Suggested by: phk
Reviewed by: phk
Approved by: mdodd


# ad7ba3d4 06-May-2000 Poul-Henning Kamp <phk@FreeBSD.org>

Remove devstat_end_transaction_buf() everybody uses
devstat_end_transaction_bio() now.


# 9626b608 05-May-2000 Poul-Henning Kamp <phk@FreeBSD.org>

Separate the struct bio related stuff out of <sys/buf.h> into
<sys/bio.h>.

<sys/bio.h> is now a prerequisite for <sys/buf.h> but it shall
not be made a nested include according to bdes teachings on the
subject of nested includes.

Diskdrivers and similar stuff below specfs::strategy() should no
longer need to include <sys/buf.> unless they need caching of data.

Still a few bogus uses of struct buf to track down.

Repocopy by: peter


# 282ac69e 02-Apr-2000 Poul-Henning Kamp <phk@FreeBSD.org>

Clone bio versions of certain bits of infrastructure:
devstat_end_transaction_bio()
bioq_* versions of bufq_* incl bioqdisksort()
the corresponding "buf" versions will disappear when no longer used.

Move b_offset, b_data and b_bcount to struct bio.

Add BIO_FORMAT as a hack for fd.c etc.

We are now largely ready to start converting drivers to use struct
bio instead of struct buf.


# c244d2de 02-Apr-2000 Poul-Henning Kamp <phk@FreeBSD.org>

Move B_ERROR flag to b_ioflags and call it BIO_ERROR.

(Much of this done by script)

Move B_ORDERED flag to b_ioflags and call it BIO_ORDERED.

Move b_pblkno and b_iodone_chain to struct bio while we transition, they
will be obsoleted once bio structs chain/stack.

Add bio_queue field for struct bio aware disksort.

Address a lot of stylistic issues brought up by bde.


# 21144e3b 20-Mar-2000 Poul-Henning Kamp <phk@FreeBSD.org>

Remove B_READ, B_WRITE and B_FREEBUF and replace them with a new
field in struct buf: b_iocmd. The b_iocmd is enforced to have
exactly one bit set.

B_WRITE was bogusly defined as zero giving rise to obvious coding
mistakes.

Also eliminate the redundant struct buf flag B_CALL, it can just
as efficiently be done by comparing b_iodone to NULL.

Should you get a panic or drop into the debugger, complaining about
"b_iocmd", don't continue. It is likely to write on your disk
where it should have been reading.

This change is a step in the direction towards a stackable BIO capability.

A lot of this patch were machine generated (Thanks to style(9) compliance!)

Vinum users: Greg has not had time to test this yet, be careful.


# 2e3c8fcb 16-Nov-1999 Poul-Henning Kamp <phk@FreeBSD.org>

This is a partial commit of the patch from PR 14914:

Alot of the code in sys/kern directly accesses the *Q_HEAD and *Q_ENTRY
structures for list operations. This patch makes all list operations
in sys/kern use the queue(3) macros, rather than directly accessing the
*Q_{HEAD,ENTRY} structures.

This batch of changes compile to the same object files.

Reviewed by: phk
Submitted by: Jake Burkholder <jake@checker.org>
PR: 14914


# 39b3c6a9 02-Oct-1999 Bruce Evans <bde@FreeBSD.org>

Removed unnecessary splclock() protection for getmicrotime() and
getmicrouptime().

Removed unused includes.

Reviewed by: ken


# c8a90c31 22-Sep-1999 Poul-Henning Kamp <phk@FreeBSD.org>

Stylistic cleanup.

Submitted by: ken.


# 8db3b947 19-Sep-1999 Poul-Henning Kamp <phk@FreeBSD.org>

Move an end-paren to its intended place.


# f80d57ee 18-Sep-1999 Poul-Henning Kamp <phk@FreeBSD.org>

Count "free" operations in their own new tranaction type.

WARNING: libdevstat, iostat, vmstat, systat etc etc will need a recompile.

Add devstat_end_transaction_buf() which pulls all the vital data out
of a struct buf which is ready for biodone().


# c3aac50f 27-Aug-1999 Peter Wemm <peter@FreeBSD.org>

$Id$ -> $FreeBSD$


# 3d177f46 03-May-1999 Bill Fumerola <billf@FreeBSD.org>

Add sysctl descriptions to many SYSCTL_XXXs

PR: kern/11197
Submitted by: Adrian Chadd <adrian@FreeBSD.org>
Reviewed by: billf(spelling/style/minor nits)
Looked at by: bde(style)


# 2a96b3fa 10-Apr-1999 Eivind Eklund <eivind@FreeBSD.org>

Staticize.


# 2a888f93 09-Feb-1999 Kenneth D. Merry <ken@FreeBSD.org>

Add a prioritization field to the devstat_add_entry() call so that
peripheral drivers can determine where in the devstat(9) list they are
inserted.

This requires recompilation of libdevstat, systat, vmstat, rpc.rstatd, and
any ports that depend on the devstat code, since the size of the devstat
structure has changed. The devstat version number has been incremented as
well to reflect the change.

This sorts devices in the devstat list in "more interesting" to "less
interesting" order. So, for instance, da devices are now more important
than floppy drives, and so will appear before floppy drives in the default
output from systat, iostat, vmstat, etc.

The order of devices is, for now, kept in a central table in devicestat.h.
If individual drivers were able to make a meaningful decision on what
priority they should be at attach time, we could consider splitting the
priority information out into the various drivers. For now, though, they
have no way of knowing that, so it's easier to put them in an easy to find
table.

Also, move the checkversion() call in vmstat(8) to a more logical place.

Thanks to Bruce and David O'Brien for suggestions, for reviewing this, and
for putting up with the long time it has taken me to commit it. Bruce did
object somewhat to the central priority table (he would rather the
priorities be distributed in each driver), so his objection is duly noted
here.

Reviewed by: bde, obrien


# 486bddb0 27-Dec-1998 Doug Rabson <dfr@FreeBSD.org>

Fix some 64bit truncation problems which crept into SYSCTL_LONG() with the
last cleanup. Since the oid_arg2 field of struct sysctl_oid is not wide
enough to hold a long, the SYSCTL_LONG() macro has been modified to only
support exporting long variables by pointer instead of by value.

Reviewed by: bde


# 2127f260 04-Dec-1998 Archie Cobbs <archie@FreeBSD.org>

Examine all occurrences of sprintf(), strcat(), and str[n]cpy()
for possible buffer overflow problems. Replaced most sprintf()'s
with snprintf(); for others cases, added terminating NUL bytes where
appropriate, replaced constants like "16" with sizeof(), etc.

These changes include several bug fixes, but most changes are for
maintainability's sake. Any instance where it wasn't "immediately
obvious" that a buffer overflow could not occur was made safer.

Reviewed by: Bruce Evans <bde@zeta.org.au>
Reviewed by: Matthew Dillon <dillon@apollo.backplane.com>
Reviewed by: Mike Spengler <mks@networkcs.com>


# d9e371b9 15-Nov-1998 Kenneth D. Merry <ken@FreeBSD.org>

Now that the wd driver is fixed (Thanks Bruce!), re-enable the
devstat_end_transaction error message that gets printed whenever the
busy count is < 0.

This will help catch drivers that improperly implement devstat(9) support.


# 14177d72 14-Nov-1998 Garrett Wollman <wollman@FreeBSD.org>

My changes to the new device interface:

- Interface wth the new resource manager.
- Allow for multiple drivers implementing a single devclass.
- Remove ordering dependencies between header files.
- Style cleanup.
- Add DEVICE_SUSPEND and DEVICE_RESUME methods.
- Move to a single-phase interrupt setup scheme.

Kernel builds on the Alpha are brken until Doug gets a chance to incorporate
these changes on that side.

Agreed to in principle by: dfr


# a937ccdc 14-Oct-1998 Kenneth D. Merry <ken@FreeBSD.org>

Disable the 'devstat_end_transaction' busy count printf until after 3.0
release goes out the door. We know there's a bug in the devstat
implementation in the wd driver, but bde and msmith haven't been able to
fix it yet.

So, disable the printf to avoid confusing/worrying people.

Suggested by: msmith


# a795f8bb 05-Oct-1998 Kenneth D. Merry <ken@FreeBSD.org>

Make the printf when busy_time < 0 a little more descriptive. This may
help track down bugs in the devstat implementation in various drivers.
(i.e., any situation where the driver does not call the devstat routines
once and only once for each transaction initiation and completion)

Prompted by: msmith


# bcc6a3da 19-Sep-1998 Kenneth D. Merry <ken@FreeBSD.org>

Change the devstat generation number from an int to a long. The int-sized
generation was causing unaligned access faults on the Alpha.

I have incremented the devstat version number, since this is an interface
change. You'll need to recompile libdevstat, systat, iostat, vmstat and
rpc.rstatd along with your kernel.

Partially Submitted by: Andrew Gallatin <gallatin@cs.duke.edu>


# 7a59208d 15-Sep-1998 Justin T. Gibbs <gibbs@FreeBSD.org>

New Kernel device statistics code.

Submitted by: "Kenneth D. Merry" <ken@plutotech.com>