History log of /freebsd-current/sys/sys/aio.h
Revision Date Author Comments
# 8dfc788b 03-Feb-2024 Konstantin Belousov <kib@FreeBSD.org>

aio_read2/aio_write2: add AIO_OP2_VECTORED

Suggested by: Vinícius dos Santos Oliveira <vini.ipsmaker@gmail.com>
Reviewed by: jhb
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D43448


# 06cb1c3f 31-Jan-2024 Konstantin Belousov <kib@FreeBSD.org>

libc: add aio_read2() and aio_write2() functions

as wrappers around lio_listio(LIO_READ/WRITE | LIO_FOFFSET, &iocb, 1);

Suggested and reviewed by: jhb
Discussed with: asomers
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D43448
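The commit describes aio_read2() and aio_write2() as one-element lio_listio() calls with LIO_FOFFSET in the opcode. The sketch below shows that one-element shape using only standard POSIX <aio.h> calls, so it can be tried on other systems. It leaves out LIO_FOFFSET (a FreeBSD extension) and therefore still honors aio_offset. The helper name read_one_via_lio is hypothetical.

```c
#include <aio.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Submit one read through lio_listio() and wait for it. Standard POSIX only:
 * LIO_FOFFSET is not set, so aio_offset is used as the file position.
 * read_one_via_lio is a hypothetical helper, not a libc function.
 */
static ssize_t read_one_via_lio(int fd, void *buf, size_t len, off_t off)
{
    struct aiocb cb;
    struct aiocb *list[1] = { &cb };
    const struct aiocb *wait_list[1] = { &cb };

    memset(&cb, 0, sizeof(cb));
    cb.aio_fildes = fd;
    cb.aio_buf = buf;
    cb.aio_nbytes = len;
    cb.aio_offset = off;
    cb.aio_lio_opcode = LIO_READ;   /* per-request opcode lives in the aiocb */
    cb.aio_sigevent.sigev_notify = SIGEV_NONE;

    /* LIO_WAIT: return only once the list (one request here) is done.
     * EIO means a request failed; EINTR means the wait was interrupted. */
    if (lio_listio(LIO_WAIT, list, 1, NULL) == -1 &&
        errno != EIO && errno != EINTR)
        return -1;                  /* one-element list: nothing was queued */

    while (aio_error(&cb) == EINPROGRESS)   /* only possible after EINTR */
        (void)aio_suspend(wait_list, 1, NULL);

    int err = aio_error(&cb);
    ssize_t n = aio_return(&cb);    /* reap the request exactly once */
    if (err != 0) {
        errno = err;
        return -1;
    }
    return n;
}
```

LIO_WAIT keeps the example synchronous for brevity. An asynchronous wrapper would pass LIO_NOWAIT and collect the result later with aio_error() and aio_return().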


# e4b7bbd6 13-Jan-2024 Konstantin Belousov <kib@FreeBSD.org>

lio_listio(2): add LIO_FOFFSET flag to ignore aiocb aio_offset

and use the current file offset instead.

Requested by: Vinícius dos Santos Oliveira <vini.ipsmaker@gmail.com>
Reviewed by: jhb
Discussed with: asomers
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D43448
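LIO_FOFFSET changes which offset a request uses: the descriptor's current file offset instead of aio_offset. This is the same distinction as the one between read(2) and pread(2). The synchronous sketch below demonstrates that distinction. It does not use AIO at all, and offset_demo is a hypothetical helper. This log does not say whether an LIO_FOFFSET request also advances the file offset afterwards, as read() does here.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * pread() uses an explicit offset (like aio_offset) and leaves the file
 * offset alone; read() starts at the current file offset (the position
 * LIO_FOFFSET selects) and advances it. Returns the file offset afterwards,
 * or -1 on a short or failed transfer. offset_demo is a hypothetical helper.
 */
static off_t offset_demo(int fd, char *a, char *b, size_t len)
{
    if (pread(fd, a, len, 0) != (ssize_t)len)   /* explicit offset 0 */
        return -1;
    if (read(fd, b, len) != (ssize_t)len)       /* current file offset */
        return -1;
    return lseek(fd, 0, SEEK_CUR);
}
```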


# 95ee2897 16-Aug-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove $FreeBSD$: two-line .h pattern

Remove /^\s*\*\n \*\s+\$FreeBSD\$$\n/


# 4d846d26 10-May-2023 Warner Losh <imp@FreeBSD.org>

spdx: The BSD-2-Clause-FreeBSD identifier is obsolete, drop -FreeBSD

The SPDX folks have obsoleted the BSD-2-Clause-FreeBSD identifier. Catch
up to that fact and revert to their recommended match of BSD-2-Clause.

Discussed with: pfg
MFC After: 3 days
Sponsored by: Netflix


# 98844e99 15-Feb-2023 John Baldwin <jhb@FreeBSD.org>

aio: Fix more synchronization issues in aio_biowakeup.

- Use atomic_store to set job->error. atomic_set does an or
operation, not assignment.

- Use refcount_* to manage job->nbio.

This ensures proper memory barriers are present so that the last bio
won't see a possibly stale value of job->error.

- Don't re-read job->error after reading it via atomic_load.

Reported by: markj (1)
Reviewed by: mjg, markj
Differential Revision: https://reviews.freebsd.org/D38611
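Together, these fixes form a common completion pattern:

- record an error by assignment;
- drop a reference with release/acquire ordering;
- let only the final releaser read the error, and read it only once.

The sketch below restates that pattern with C11 atomics in user space. It is not the vfs_aio.c code; struct job and bio_done are hypothetical names. The acq_rel decrement is meant to mirror, roughly, the fences FreeBSD's refcount_release() issues.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the job fields touched in aio_biowakeup. */
struct job {
    atomic_int error;   /* 0 = success, otherwise some bio's errno */
    atomic_uint nbio;   /* bios still outstanding */
};

/*
 * Called once per completed bio. Returns true, and stores the job's error
 * in *final_error, only for the caller that completes the last bio.
 */
static bool bio_done(struct job *job, int bio_error, int *final_error)
{
    if (bio_error != 0) {
        /* Assignment. An OR would merge two errno values into a
         * meaningless one, e.g. 5 | 6 == 7. */
        atomic_store_explicit(&job->error, bio_error, memory_order_relaxed);
    }
    /* Release publishes our store to error; the last decrementer also
     * acquires, so it sees every store made before earlier decrements. */
    if (atomic_fetch_sub_explicit(&job->nbio, 1, memory_order_acq_rel) != 1)
        return false;
    /* Read once and hand out the local copy; never re-read. */
    *final_error = atomic_load_explicit(&job->error, memory_order_relaxed);
    return true;
}
```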


# f30a1ae8 22-Aug-2021 Thomas Munro <tmunro@FreeBSD.org>

lio_listio(2): Allow LIO_READV and LIO_WRITEV.

Allow multiple vector IOs to be started with one system call.
aio_readv() and aio_writev() already used these opcodes under the
covers. This commit makes them available to user space.

Being non-standard extensions, they're only visible if __BSD_VISIBLE is
defined, like the functions.

Reviewed by: asomers, kib
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D31627


# 2247f489 02-Jan-2021 Alan Somers <asomers@FreeBSD.org>

aio: micro-optimize the lio_opcode assignments

This allows slightly more efficient opcode testing in-kernel. It is
transparent to userland, except to applications that sneakily submit
aio fsync or aio mlock operations via lio_listio, which has never been
documented, requires the use of deliberately undefined constants
(LIO_SYNC and LIO_MLOCK), and is arguably a bug.

Reviewed by: jhb
Differential Revision: https://reviews.freebsd.org/D27942
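The log does not list the new values. Hypothetical ones can still show the benefit of flag-style opcodes: if each property (read, write, vectored) has its own bit, a check such as "any kind of write" becomes a single bitwise AND instead of several comparisons. The EX_ constants below are invented for illustration and are not the values in <sys/aio.h>.

```c
#include <stdbool.h>

/* Hypothetical opcode values, one bit per property (NOT from <sys/aio.h>). */
enum {
    EX_LIO_NOP      = 0x0,
    EX_LIO_WRITE    = 0x1,
    EX_LIO_READ     = 0x2,
    EX_LIO_VECTORED = 0x4,
    EX_LIO_WRITEV   = EX_LIO_WRITE | EX_LIO_VECTORED,
    EX_LIO_READV    = EX_LIO_READ | EX_LIO_VECTORED,
};

/* Each test is one AND, covering plain and vectored variants at once. */
static bool ex_is_write(int op)    { return (op & EX_LIO_WRITE) != 0; }
static bool ex_is_read(int op)     { return (op & EX_LIO_READ) != 0; }
static bool ex_is_vectored(int op) { return (op & EX_LIO_VECTORED) != 0; }
```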


# ff1a3078 09-Jan-2021 Alan Somers <asomers@FreeBSD.org>

lio_listio: validate aio_lio_opcode

Previously, we would accept any kind of LIO_* opcode, including ones
that were intended for in-kernel use only like LIO_SYNC (which is not
defined in userland). The situation became more serious with
022ca2fc7fe08d51f33a1d23a9be49e6d132914e. After that revision, setting
aio_lio_opcode to LIO_WRITEV or LIO_READV would trigger an assertion.

Note that POSIX does not specify what should happen if aio_lio_opcode is
invalid.

MFC-with: 022ca2fc7fe08d51f33a1d23a9be49e6d132914e
Reviewed by: jhb, tmunro, 0mp
Differential Revision: https://reviews.freebsd.org/D28078
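Below is a minimal sketch of the validation described above. It uses only the three opcodes POSIX defines in <aio.h> (LIO_NOP, LIO_READ, LIO_WRITE); validate_lio_opcode is a hypothetical helper. Once lio_listio() accepts FreeBSD's vectored opcodes, a real check must admit those as well. Returning EINVAL for anything else is one reasonable choice where, as the commit notes, POSIX is silent.

```c
#include <aio.h>
#include <errno.h>

/* Allowlist check: 0 for a known opcode, EINVAL otherwise. */
static int validate_lio_opcode(int op)
{
    switch (op) {
    case LIO_NOP:
    case LIO_READ:
    case LIO_WRITE:
        return 0;
    default:
        return EINVAL;   /* unknown or kernel-internal opcode */
    }
}
```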


# 801ac943 07-Jan-2021 Thomas Munro <tmunro@FreeBSD.org>

aio_fsync(2): Support O_DSYNC.

aio_fsync(O_DSYNC, ...) is the asynchronous version of fdatasync(2).

Reviewed by: kib, asomers, jhb
Differential Review: https://reviews.freebsd.org/D25071


# 022ca2fc 02-Jan-2021 Alan Somers <asomers@FreeBSD.org>

Add aio_writev and aio_readv

POSIX AIO is great, but it lacks vectored I/O functions. This commit
fixes that shortcoming by adding aio_writev and aio_readv. They aren't
part of the standard, but they're an obvious extension. They work just
like their synchronous equivalents pwritev and preadv.

It isn't yet possible to use vectored aiocbs with lio_listio, but that
could be added in the future.

Reviewed by: jhb, kib, bcr
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D27743


# 01206038 21-Dec-2020 Alan Somers <asomers@FreeBSD.org>

AIO: remove the kaiocb->bio linkage

Vectored aio will require each aiocb to be associated with multiple
bios, so we can't store a link to the latter from the former. But we
don't really need to. aio_biowakeup already knows the bio it's using,
and the other fields can be stored within the bio and/or buf itself.

Also, remove the unused kaiocb.backend2 field.

Reviewed By: kib
Differential Revision: https://reviews.freebsd.org/D27682


# cd853791 27-Nov-2020 Konstantin Belousov <kib@FreeBSD.org>

Make MAXPHYS tunable. Bump MAXPHYS to 1M.

Replace MAXPHYS by runtime variable maxphys. It is initialized from
MAXPHYS by default, but can also be adjusted with the tunable kern.maxphys.

Make b_pages[] array in struct buf flexible. Size b_pages[] for buffer
cache buffers exactly to atop(maxbcachebuf) (currently it is sized to
atop(MAXPHYS)), and b_pages[] for pbufs is sized to atop(maxphys) + 1.
The +1 for pbufs allows several pbuf consumers, among them vmapbuf(),
to use unaligned buffers still sized to maxphys, esp. when such
buffers come from userspace (*). Overall, we save a significant amount
of otherwise wasted memory in b_pages[] for buffer cache buffers,
while bumping MAXPHYS to desired high value.
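The "+1" follows from page arithmetic. A buffer of maxphys bytes that starts on a page boundary touches atop(maxphys) pages, but the same length starting one byte later touches one more page. The sketch assumes 4 KiB pages and uses a hypothetical helper, pages_spanned, rather than the kernel's atop() macro. With the new 1 MiB maxphys, that is 256 pages when aligned and 257 when unaligned.

```c
#include <stddef.h>
#include <stdint.h>

#define EX_PAGE_SIZE ((uintptr_t)4096)   /* assumed page size */

/* Number of pages touched by len bytes starting at address addr. */
static size_t pages_spanned(uintptr_t addr, size_t len)
{
    if (len == 0)
        return 0;
    uintptr_t first = addr / EX_PAGE_SIZE;              /* page of first byte */
    uintptr_t last = (addr + len - 1) / EX_PAGE_SIZE;   /* page of last byte */
    return (size_t)(last - first + 1);
}
```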

Eliminate all direct uses of the MAXPHYS constant in kernel and driver
sources, except the place which initializes maxphys. Some random (and
arguably weird) uses of MAXPHYS, e.g. in linuxolator, are converted
straight. Some drivers, which use MAXPHYS to size embedded structures,
get a private MAXPHYS-like constant; their conversion is out of scope
for this work.

Changes to cam/, dev/ahci, dev/ata, dev/mpr, dev/mpt, dev/mvs,
dev/siis, were either submitted by, or based on changes by, mav.

Suggested by: mav (*)
Reviewed by: imp, mav, imp, mckusick, scottl (intermediate versions)
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D27225


# 21b8a7a6 12-Feb-2018 Alan Somers <asomers@FreeBSD.org>

Fix a comment. No functional change.

MFC after: 3 weeks
Sponsored by: Spectra Logic Corp


# c4e20cad 27-Nov-2017 Pedro F. Giffuni <pfg@FreeBSD.org>

sys/sys: further adoption of SPDX licensing ID tags.

Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
supersede or replace the license texts.


# b8db1c9f 14-Nov-2017 John Baldwin <jhb@FreeBSD.org>

Use #if instead of #ifdef for __BSD_VISIBLE tests.

__BSD_VISIBLE is always defined and its value instead needs to be
tested via #if to determine if FreeBSD-specific APIs should be
exposed.

PR: 196226, 223481 (exp-run)
Submitted by: pluknet
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D12977
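Because __BSD_VISIBLE is always defined (to 0 or 1), an #ifdef test is always true, and only #if actually checks the value. The compile-time sketch below shows the difference using a hypothetical stand-in macro, EX_BSD_VISIBLE, set to 0.

```c
/* Hypothetical stand-in for __BSD_VISIBLE: always defined, here to 0. */
#define EX_BSD_VISIBLE 0

#ifdef EX_BSD_VISIBLE
static const int seen_by_ifdef = 1;   /* taken: the macro exists, even as 0 */
#else
static const int seen_by_ifdef = 0;
#endif

#if EX_BSD_VISIBLE
static const int seen_by_if = 1;
#else
static const int seen_by_if = 0;      /* taken: the value 0 is tested */
#endif
```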


# b1012d80 21-Jun-2016 John Baldwin <jhb@FreeBSD.org>

Account for AIO socket operations in thread/process resource usage.

File and disk-backed I/O requests store counts of read/written disk
blocks in each AIO job so that they can be charged to the thread that
completes an AIO request via aio_return() or aio_waitcomplete(). This
change extends AIO jobs to store counts of received/sent messages and
updates socket backends to set these counts accordingly. Note that
the socket backends are careful to only charge a single message for
each AIO request even though a single request on a blocking socket might
invoke sosend or soreceive multiple times. This is to mimic the
resource accounting of synchronous read/write.

Adjust the UNIX socketpair AIO test to verify that the message resource
usage counts update accordingly for aio_read and aio_write.

Approved by: re (hrs)
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D6911


# fe0bdd1d 15-Jun-2016 John Baldwin <jhb@FreeBSD.org>

Move backend-specific fields of kaiocb into a union.

This reduces the size of kaiocb slightly. I've also added some generic
fields that other backends can use in place of the BIO-specific fields.

Change the socket and Chelsio DDP backends to use 'backend3' instead of
abusing _aiocb_private.status directly. This confines the use of
_aiocb_private to the AIO internals in vfs_aio.c.

Reviewed by: kib (earlier version)
Approved by: re (gjb)
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D6547


# 049342a9 30-May-2016 Ed Schouten <ed@FreeBSD.org>

Add missing restrict keywords to lio_listio().


# bb430bc7 21-Mar-2016 John Baldwin <jhb@FreeBSD.org>

Fully handle size_t lengths in AIO requests.

First, update the return types of aio_return() and aio_waitcomplete() to
ssize_t.

POSIX requires aio_return() to return a ssize_t so that it can represent
all return values from read() and write(). aio_waitcomplete() should use
ssize_t for the same reason.

aio_return() has used ssize_t in <aio.h> since r31620 but the manpage and
system call entry were not updated. aio_waitcomplete() has always
returned int.

Note that this does not require new system call stubs as this is
effectively only an API change in how the compiler interprets the return
value.
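These return types feed the usual completion path:

1. Poll aio_error() until the request leaves EINPROGRESS.
2. Call aio_return() once to get the ssize_t byte count, the value a synchronous pread(2) would have returned.

The sketch below is portable. read_async_blocking is a hypothetical helper, and blocking in aio_suspend() is for brevity only.

```c
#include <aio.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Start an aio_read(), wait for it, and return aio_return()'s ssize_t
 * result (-1 with errno set on failure). read_async_blocking is hypothetical.
 */
static ssize_t read_async_blocking(int fd, void *buf, size_t len, off_t off)
{
    struct aiocb cb;
    const struct aiocb *wait_list[1] = { &cb };

    memset(&cb, 0, sizeof(cb));
    cb.aio_fildes = fd;
    cb.aio_buf = buf;
    cb.aio_nbytes = len;
    cb.aio_offset = off;
    cb.aio_sigevent.sigev_notify = SIGEV_NONE;

    if (aio_read(&cb) == -1)
        return -1;                  /* not queued; errno set by aio_read */

    int err;
    while ((err = aio_error(&cb)) == EINPROGRESS)
        (void)aio_suspend(wait_list, 1, NULL);  /* retry if interrupted */

    ssize_t n = aio_return(&cb);    /* call once per completed request */
    if (err != 0)
        errno = err;
    return n;                       /* bytes read, 0 at EOF, or -1 */
}
```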

Second, allow aio_nbytes values up to IOSIZE_MAX instead of just INT_MAX.

aio_read/write should now honor the same length limits as normal read/write.

Third, use longs instead of ints in the aio_return() and aio_waitcomplete()
system call functions so that the 64-bit size_t in the in-kernel aiocb
isn't truncated to 32-bits before being copied out to userland or
being returned.

Finally, a simple test has been added to verify the bounds checking on the
maximum read size from a file.


# f3215338 01-Mar-2016 John Baldwin <jhb@FreeBSD.org>

Refactor the AIO subsystem to permit file-type-specific handling and
improve cancellation robustness.

Introduce a new file operation, fo_aio_queue, which is responsible for
queueing and completing an asynchronous I/O request for a given file.
The AIO subsystem now exports a library of routines to manipulate AIO
requests as well as the ability to run a handler function in the
"default" pool of AIO daemons to service a request.

A default implementation for file types which do not include an
fo_aio_queue method queues requests to the "default" pool invoking the
fo_read or fo_write methods as before.

The AIO subsystem permits file types to install a private "cancel"
routine when a request is queued to permit safe dequeueing and cleanup
of cancelled requests.

Sockets now use their own pool of AIO daemons and service per-socket
requests in FIFO order. Socket requests will not block indefinitely,
permitting timely cancellation of all requests.

Due to the now-tight coupling of the AIO subsystem with file types,
the AIO subsystem is now a standard part of all kernels. The VFS_AIO
kernel option and aio.ko module are gone.

Many file types may block indefinitely in their fo_read or fo_write
callbacks resulting in a hung AIO daemon. This can result in hung
user processes (when processes attempt to cancel all outstanding
requests during exit) or a hung system. To protect against this, AIO
requests are only permitted for known "safe" files by default. AIO
requests for all file types can be enabled by setting the new
vfs.aio.enable_unsafe sysctl to a non-zero value. The AIO tests have
been updated to skip operations on unsafe file types if the sysctl is
zero.

Currently, AIO requests on sockets and raw disks are considered safe
and are enabled by default. aio_mlock() is also enabled by default.

Reviewed by: cem, jilles
Discussed with: kib (earlier version)
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D5289


# 6160e12c 08-Jun-2013 Gleb Smirnoff <glebius@FreeBSD.org>

Add new system call - aio_mlock(). The name speaks for itself. It allows
the mlock(2) operation, which can consume a lot of time, to be performed
under control of aio(4).

Reviewed by: kib, jilles
Sponsored by: Nginx, Inc.


# a7d5f7eb 19-Oct-2010 Jamie Gritton <jamie@FreeBSD.org>

A new jail(8) with a configuration file, to replace the work currently done
by /etc/rc.d/jail.


# 34d3ac59 14-Mar-2009 David Schultz <das@FreeBSD.org>

Namespace: aio_waitcomplete() is a BSD extension.
Also, don't pollute the namespace by including <sys/time.h>.


# c4592cbc 10-Dec-2008 John Baldwin <jhb@FreeBSD.org>

Rather than using a char array with explicit assumptions about the layout
of 'struct osigevent' in 'struct aiocb', use int and void pointer spare
members that are identical to 'struct osigevent'.

MFC after: 1 month


# d7f03759 19-Oct-2008 Ulf Lilleengen <lulf@FreeBSD.org>

- Import the HEAD csup code which is the basis for the cvsmode work.


# 53fcc63c 23-Mar-2006 David Xu <davidxu@FreeBSD.org>

Add aio_fsync() prototype.


# 0972628a 29-Oct-2005 David Xu <davidxu@FreeBSD.org>

Fix sigevent's POSIX incompatible problem by adding member fields
sigev_notify_function and sigev_notify_attributes. AIO syscalls
use sigevent, so they have to be adjusted.

Reviewed by: alc


# 60727d8b 06-Jan-2005 Warner Losh <imp@FreeBSD.org>

/* -> /*- for license, minor formatting changes


# 48dac059 06-Jan-2002 Alan Cox <alc@FreeBSD.org>

o Add missing synchronization (splnet()/splx()) in aio_free_entry().
o Move the definition of struct aiocblist from sys/aio.h to kern/vfs_aio.c.
o Make aio_swake_cb() static.


# eae43d0e 31-Dec-2001 Alan Cox <alc@FreeBSD.org>

o Some style(9)-motivated changes to white space.


# 21d56e9c 29-Dec-2001 Alfred Perlstein <alfred@FreeBSD.org>

Make AIO a loadable module.

Remove the explicit call to aio_proc_rundown() from exit1(), instead AIO
will use at_exit(9).

Add functions at_exec(9), rm_at_exec(9) which function nearly the
same as at_exit(9) and rm_at_exit(9); these functions are called
on behalf of modules at the time of execve(2) after the image
activator has run.

Use a modified version of tegge's suggestion via at_exec(9) to close
an exploitable race in AIO.

Fix SYSCALL_MODULE_HELPER such that it's architecturally neutral,
the problem was that one had to pass it a parameter indicating the
number of arguments which were actually the number of "int". Fix
it by using an inline version of the AS macro against the syscall
arguments. (AS should be available globally but we'll get to that
later.)

Add a primitive system for dynamically adding kqueue ops, it's really
not as sophisticated as it should be, but I'll discuss with jlemon when
he's around.


# b40ce416 12-Sep-2001 Julian Elischer <julian@FreeBSD.org>

KSE Milestone 2
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to the previous -current except
that there is a thread associated with each process.

Sorry john! (your next MFC will be a doozy!)

Reviewed by: peter@freebsd.org, dillon@freebsd.org

X-MFC after: ha ha ha ha


# 13644654 10-Mar-2001 Alan Cox <alc@FreeBSD.org>

When aio_read/write() is used on a raw device, physical buffers are
used for up to "vfs.aio.max_buf_aio" of the requests. If a request
size is MAXPHYS, but the request base isn't page aligned, vmapbuf()
will map the end of the user space buffer into the start of the kva
allocated for the next physical buffer. Don't use a physical buffer
in this case. (This change addresses problem report 25617.)

When an aio_read/write() on a raw device has completed, timeout() is
used to schedule a signal to the process. Thus, the reporting is
delayed up to 10 ms (assuming hz is 100). The process might have
terminated in the meantime, causing a trap 12 when attempting to
deliver the signal. Thus, the timeout must be cancelled when removing
the job.

aio jobs in state JOBST_JOBQGLOBAL should be removed from the
kaio_jobqueue list during process rundown.

During process rundown, some aio jobs might move from one list to a
different list that has already been "emptied", causing the rundown to
be incomplete. Retry the rundown.

A call to BUF_KERNPROC() is needed after obtaining a physical buffer
to disassociate the lock from the running process since it can return
to userland without releasing that lock.

PR: 25617
Submitted by: tegge


# 4270aed7 04-Mar-2001 Alan Cox <alc@FreeBSD.org>

Remove another outdated comment about aio_cancel().


# 88137067 04-Mar-2001 Alan Cox <alc@FreeBSD.org>

Remove an out-of-date comment: aio_cancel() has been supported
since revision 1.69 of kern/vfs_aio.c.


# fb579e9a 03-Mar-2001 Alan Cox <alc@FreeBSD.org>

Remove the field privatemodes from struct __aiocb_private and the
related code from aio_read() and aio_write(). This field was
intended, but never used, to allow a mythical user-level library to
make an aio_read() or aio_write() behave like an ordinary read() or
write(), i.e., a blocking I/O operation.


# badfcb1b 25-Nov-2000 Alan Cox <alc@FreeBSD.org>

Undo rev 1.8: This commit actually added a second declaration
of aio_error() to the same file.


# 6464c1fe 03-Oct-2000 Alan Cox <alc@FreeBSD.org>

Remove another unused field from struct __aiocb_private.


# b5fbbe95 25-Sep-2000 Alan Cox <alc@FreeBSD.org>

Remove (long) unused fields from struct __aiocb_private.


# 96b8882a 05-Sep-2000 Alan Cox <alc@FreeBSD.org>

Make the basic AIO functions, i.e., aio_read() and aio_write(),
work on the Alpha, at least, for the aio_qphysio() case. Specifically,
fix an unaligned access fault.


# e3975643 25-May-2000 Jake Burkholder <jake@FreeBSD.org>

Back out the previous change to the queue(3) interface.
It was not discussed and should probably not happen.

Requested by: msmith and others


# 740a1973 23-May-2000 Jake Burkholder <jake@FreeBSD.org>

Change the way that the queue(3) structures are declared; don't assume that
the type argument to *_HEAD and *_ENTRY is a struct.

Suggested by: phk
Reviewed by: phk
Approved by: mdodd


# 4807c4eb 22-Apr-2000 Garrett Wollman <wollman@FreeBSD.org>

Fix a warning with a forward struct declaration.


# cb679c38 16-Apr-2000 Jonathan Lemon <jlemon@FreeBSD.org>

Introduce kqueue() and kevent(), a kernel event notification facility.


# bfbbc4aa 13-Jan-2000 Jason Evans <jasone@FreeBSD.org>

Add aio_waitcomplete(). Make aio work correctly for socket descriptors.
Make gratuitous style(9) fixes (me, not the submitter) to make the aio
code more readable.

PR: kern/12053
Submitted by: Chris Sedore <cmsedore@maxwell.syr.edu>


# 664a31e4 28-Dec-1999 Peter Wemm <peter@FreeBSD.org>

Change #ifdef KERNEL to #ifdef _KERNEL in the public headers. "KERNEL"
is an application space macro and the applications are supposed to be free
to use it as they please (but cannot). This is consistent with the other
BSDs who made this change quite some time ago. More commits to come.


# c3aac50f 27-Aug-1999 Peter Wemm <peter@FreeBSD.org>

$Id$ -> $FreeBSD$


# 501bebb5 14-Jun-1999 Nik Clayton <nik@FreeBSD.org>

Include <sys/time.h> for correctness. BDE has a better version of this,
but it's more complex, and in his words

Commit your version, since it is the only one that is clearly permitted
(if not best), and I'll untangle it later.

PR: docs/11589
Reviewed by: Bruce "he kicks ass" Evans


# f0231545 17-Jan-1999 Dmitrij Tejblum <dt@FreeBSD.org>

Bring a bit closer to the normal form. (In particular, add
__BEGIN_DECLS/__END_DECLS).


# 71af8fba 11-Apr-1998 John Dyson <dyson@FreeBSD.org>

Add aio_error decl.


# 8a6472b7 28-Mar-1998 Peter Dufault <dufault@FreeBSD.org>

Finish _POSIX_PRIORITY_SCHEDULING. Needs P1003_1B and
_KPOSIX_PRIORITY_SCHEDULING options to work. Changes:

Change all "posix4" to "p1003_1b". Misnamed files are left
as "posix4" until I'm told if I can simply delete them and add
new ones;

Add _POSIX_PRIORITY_SCHEDULING system calls for FreeBSD and Linux;

Add man pages for _POSIX_PRIORITY_SCHEDULING system calls;

Add options to LINT;

Minor fixes to P1003_1B code during testing.


# 2d5936d3 08-Mar-1998 Peter Dufault <dufault@FreeBSD.org>

Preprocessor directives require a leading '#'

Submitted by: ccsanady@friley585.res.iastate.edu


# 7a2ac24c 08-Mar-1998 Peter Dufault <dufault@FreeBSD.org>

Put sigevent and AIO_LISTIO_MAX back in aio.h so
that kernels can be built.


# aac4ad2c 08-Mar-1998 Peter Dufault <dufault@FreeBSD.org>

Reviewed by: bde

Changes to support building with _POSIX_SOURCE set to 199309L:

1. Add sys/_posix.h to handle those preprocessor defs that POSIX
says have effects when defined before including any header files;

2. Change POSIX4_VISIBLE back to _POSIX4_VISIBLE

3. Add _POSIX4_VISIBLE_HISTORICALLY for pre-existing BSD features now
defined in POSIX. These show up when:

_POSIX_SOURCE and _POSIX_C_SOURCE are not set or
_POSIX_C_SOURCE is set >= 199309L

and vanish when:

_POSIX_SOURCE is set or _POSIX_C_SOURCE is < 199309L.

4. Explain these in man 9 posix4;

5. Include _posix.h and conditionalize on new feature test.


# 78922e41 07-Dec-1997 John Dyson <dyson@FreeBSD.org>

Correct prototypes to match POSIX. Correct return code for aio_cancel.
Submitted by: Alex Nash <nash@mcs.com>


# 5aaef07c 16-Jul-1997 John Dyson <dyson@FreeBSD.org>

Clean up some lint associated with the AIO code.


# 0f606585 15-Jun-1997 John Dyson <dyson@FreeBSD.org>

Ouch!!! This should fix a serious build problem after the addition of the
new preliminary AIO support. Unfortunately, I had a stray copy of aio.h
that made me think that things worked.