History log of /freebsd-current/sys/sys/event.h
Revision Date Author Comments
# 95ee2897 16-Aug-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove $FreeBSD$: two-line .h pattern

Remove /^\s*\*\n \*\s+\$FreeBSD\$$\n/


# 67f938c5 02-Jun-2023 Mark Johnston <markj@FreeBSD.org>

kevent: Make references to filter definitions const

Follow-up revisions can make individual filter definitions const. No
functional change intended.

Reviewed by: kib
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D35842


# 4d846d26 10-May-2023 Warner Losh <imp@FreeBSD.org>

spdx: The BSD-2-Clause-FreeBSD identifier is obsolete, drop -FreeBSD

The SPDX folks have obsoleted the BSD-2-Clause-FreeBSD identifier. Catch
up to that fact and revert to their recommended match of BSD-2-Clause.

Discussed with: pfg
MFC After: 3 days
Sponsored by: Netflix


# 54579376 31-Mar-2023 Konstantin Belousov <kib@FreeBSD.org>

Change kqueue1() to be compatible with NetBSD

by making it accept some open(2) flags. More precisely, only
O_CLOEXEC is supported, the flag is translated into the KQUEUE_CLOEXEC flag
for kqueuex(2), and O_NONBLOCK is silently ignored.

Reported and tested by: vishwin
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D39377


# dac31024 31-Mar-2023 Konstantin Belousov <kib@FreeBSD.org>

Rename kqueue1(2) to kqueuex(2) to avoid compat issues with NetBSD

Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D39377


# 375732cc 25-Mar-2023 Konstantin Belousov <kib@FreeBSD.org>

kqueue1(2): export the symbol from libc

Reviewed by: emaste, jhb
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D39271


# 61194e98 25-Mar-2023 Konstantin Belousov <kib@FreeBSD.org>

Add kqueue1() syscall

It takes the flags argument. Immediate use is to provide the KQUEUE_CLOEXEC
flag for kqueue(2).

Reviewed by: emaste, jhb
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D39271


# 3454a7ca 20-Aug-2022 Robert Wing <rew@FreeBSD.org>

kqueue: retire knlist_init_rw_reader()

Last usage was removed in afa85850e79c1839ec33efa1138206687b952cfa.

Reviewed by: pauamma, melifaro, kib
Differential Revision: https://reviews.freebsd.org/D36205


# 91091921 23-Nov-2021 Warner Losh <imp@FreeBSD.org>

kqueue: Define older kqueue event types better

struct kqueue is designed to live in a restricted namespace, but the
older compat versions are not. Shift to using unsigned short instead
of u_short, unsigned int instead of u_int and the __*int*_t types
instead of the unprefiexed versions.

Sponsored by: Netflix
Reviewed by: brooks
Differential Revision: https://reviews.freebsd.org/D33056


# 8e4a3add 15-Nov-2021 Brooks Davis <brooks@FreeBSD.org>

struct kevent_freebsd11 -> struct freebsd11_kevent

Rename to match the naming of syscalls and allow 32 to be appended
without making an ugly name like kevent_freebsd1132.

While here, make the kevent changelist argument const.

Reviewed by: kib


# 0321a799 23-Sep-2021 Nathaniel Wesley Filardo <nfilardo@microsoft.com>

kqueue: Add EV_KEEPUDATA flag

When this flag is set, operations that update an existing kevent will
not change the udata field. This can be used to NOTE_TRIGGER or
EV_{EN,DIS}ABLE events without overwriting the stashed pointer.

Reviewed by: Domagoj Stolfa <domagoj.stolfa@gmail.com>
Obtained from: CheriBSD
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D30286


# 98168a6e 06-Sep-2021 Konstantin Belousov <kib@FreeBSD.org>

kqueue: drain kqueue taskqueue if syscall tickled it

Otherwise return from the syscall and next syscall, which could be
kevent(2) on the kqueue that should be notified, races with the kqueue
taskqueue thread, and potentially misses the wakeup. This is reliably
visible when kevent(2) only peeks into events using zeroed timeout.

PR: 258310
Reported by: arichardson, Jan Kokemüller <jan.kokemueller@gmail.com>
Reviewed by: arichardson, markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D31858


# f1f98706 18-Apr-2021 Warner Losh <imp@FreeBSD.org>

Minor style cleanup

We prefer 'while (0)' to 'while(0)' according to grep and stlye(9)'s
space after keyword rule. Remove a few stragglers of the latter.
Many of these usages were inconsistent within the file.

MFC After: 3 days
Sponsored by: Netflix


# e90afaa0 08-Nov-2020 Mateusz Guzik <mjg@FreeBSD.org>

kqueue: save space by using only one func pointer for assertions


# f6e54eb3 01-Sep-2020 Mateusz Guzik <mjg@FreeBSD.org>

sys: clean up empty lines in .c and .h files


# 83ec37c8 21-Apr-2020 Kyle Evans <kevans@FreeBSD.org>

kevent32: fix the split of data into data1/data2

The current situation results in intermittent breakage if data gets split up
with the sign bit set on the data1 half of it, as PAIR32TO64 will then:
data1 | (data2 << 32) -> resulting in data1 getting sign-extended when it's
implicitly widened and clobbering the result. AFAICT, there's no compelling
reason for these to be signed.

This was most exposed by flakiness in the kqueue timer tests under compat32
after the ABSTIME test got switched over to using a better clock and
microseconds.

Reviewed by: kib
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D24518


# 792843c3 24-Nov-2018 Mark Johnston <markj@FreeBSD.org>

Pass malloc flags directly through kevent(2) subroutines.

Some kevent functions have a boolean "waitok" parameter for use when
calling malloc(9). Replace them with the corresponding malloc() flags:
the desired behaviour is known at compile-time, so this eliminates a
couple of conditional branches, and makes the code easier to read.

No functional change intended.

Reviewed by: kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D18318


# 45aecd04 21-Nov-2018 Mark Johnston <markj@FreeBSD.org>

Remove KN_HASKQLOCK.

It is a write-only flag whose last use was removed in r302235.

No functional change intended.

Reviewed by: kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D18059


# 1dfc4dbf 18-Jul-2018 David Bright <dab@FreeBSD.org>

Make the definition of struct kevent in event.h match what the man page for kevent(2) says.

This is a trivial comment-only fix. The man page for kevent(2) gives
the definition of struct kevent, including a comment on each
field. The actual definition in sys/event.h omitted the comments on
some fields. Add the comments in. Not only does this make the man page
and include file agree, but the comments are useful in and of
themselves.

Reviewed by: kib (D15778: commented that this should be a separate commit)
MFC after: 3 days
Sponsored by: Dell EMC


# 5539e88a 10-Jul-2018 David Bright <dab@FreeBSD.org>

Address some (although not all) style(9) issues in event.h after r335776.

Reported by: bde@
MFC after: 1 day
Sponsored by: Dell EMC


# 11f8e2ef 28-Jun-2018 David Bright <dab@FreeBSD.org>

Fix compilation error in r335765 under gcc 4.2.1.

The anonymous object initialization introduced in r335765 was
acceptable to clang, but not gcc 4.2.1. Fix it for both.

Reported by: jhibbits@
Pointy Hat: myself
MFC after: 1 week
X-MFC-with: r335765
Sponsored by: Dell EMC


# 240e9dcd 28-Jun-2018 David Bright <dab@FreeBSD.org>

Remove potential identifier conflict in the EV_SET macro.

PR43905 pointed out a problem with the EV_SET macro if the passed
struct kevent pointer were specified with an expression with side
effects (e.g., "kevp++"). This was fixed in rS110241, but by using a
local block that defined an internal variable (named "kevp") to get
the pointer value once. This worked, but could cause issues if an
existing variable named "kevp" is in scope. To avoid that issue,
jilles@ pointed out that "C99 compound literals and designated
initializers allow doing this cleanly using a macro". This change
incorporates that suggestion, essentially verbatim from jilles@
comment on PR43905, except retaining the old definition for pre-C99 or
non-STDC (e.g., C++) compilers.

PR: 43905
Submitted by: Jilles Tjoelker (jilles@)
Reported by: Lamont Granquist <lamont@scriptkiddie.org>
Reviewed by: jmg (no comments), jilles
MFC after: 1 week
Sponsored by: Dell EMC
Differential Revision: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=43905


# c4e20cad 27-Nov-2017 Pedro F. Giffuni <pfg@FreeBSD.org>

sys/sys: further adoption of SPDX licensing ID tags.

Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.


# ffb66079 24-Nov-2017 John Baldwin <jhb@FreeBSD.org>

Decode kevent structures logged via ktrace(2) in kdump.

- Add a new KTR_STRUCT_ARRAY ktrace record type which dumps an array of
structures.

The structure name in the record payload is preceded by a size_t
containing the size of the individual structures. Use this to
replace the previous code that dumped the kevent arrays dumped for
kevent(). kdump is now able to decode the kevent structures rather
than dumping their contents via a hexdump.

One change from before is that the 'changes' and 'events' arrays are
not marked with separate 'read' and 'write' annotations in kdump
output. Instead, the first array is the 'changes' array, and the
second array (only present if kevent doesn't fail with an error) is
the 'events' array. For kevent(), empty arrays are denoted by an
entry with an array containing zero entries rather than no record.

- Move kevent decoding tables from truss to libsysdecode.

This adds three new functions to decode members of struct kevent:
sysdecode_kevent_filter, sysdecode_kevent_flags, and
sysdecode_kevent_fflags.

kdump uses these helper functions to pretty-print kevent fields.

- Move structure definitions for freebsd11 and freebsd32 kevent
structures to <sys/event.h> so that they can be shared with userland.
The 32-bit structures are only exposed if _WANT_KEVENT32 is defined.
The freebsd11 structures are only exposed if _WANT_FREEBSD11_KEVENT is
defined. The 32-bit freebsd11 structure requires both.

- Decode freebsd11 kevent structures in truss for the compat11.kevent()
system call.

- Log 32-bit kevent structures via ktrace for 32-bit compat kevent()
system calls.

- While here, constify the 'void *data' argument to ktrstruct().

Reviewed by: kib (earlier version)
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D12470


# 2b34e843 16-Jun-2017 Konstantin Belousov <kib@FreeBSD.org>

Add abstime kqueue(2) timers and expand struct kevent members.

This change implements NOTE_ABSTIME flag for EVFILT_TIMER, which
specifies that the data field contains absolute time to fire the
event.

To make this useful, data member of the struct kevent must be extended
to 64bit. Using the opportunity, I also added ext members. This
changes struct kevent almost to Apple struct kevent64, except I did
not changed type of ident and udata, the later would cause serious API
incompatibilities.

The type of ident was kept uintptr_t since EVFILT_AIO returns a
pointer in this field, and e.g. CHERI is sensitive to the type
(discussed with brooks, jhb).

Unlike Apple kevent64, symbol versioning allows us to claim ABI
compatibility and still name the new syscall kevent(2). Compat shims
are provided for both host native and compat32.

Requested by: bapt
Reviewed by: bapt, brooks, ngie (previous version)
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D11025


# fe817852 07-Apr-2017 Patrick Kelsey <pkelsey@FreeBSD.org>

Fix typo.

hist -> hint

MFC after: 3 days


# 92bb8c68 13-Feb-2017 Ed Schouten <ed@FreeBSD.org>

Make <sys/event.h> work on its own.

Right now this header file doesn't want to build when included on its
own, as it depends on some integer types that are only declared
internally. Switch it over to use <sys/_types.h> and the __*
counterparts.


# 7d03ff1f 16-Jan-2017 Hiren Panchasara <hiren@FreeBSD.org>

Add kevent EVFILT_EMPTY for notification when a client has received all data
i.e. everything outstanding has been acked.

Reviewed by: bz, gnn (previous version)
MFC after: 3 days
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D9150


# 67319388 02-Jan-2017 Konstantin Belousov <kib@FreeBSD.org>

Remove unneeded externs keywords. Reindent long lines.

Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# fd30dd7c 26-Dec-2016 Konstantin Belousov <kib@FreeBSD.org>

Make knote KN_INFLUX state counted. This is final fix for the issue
closed by r310302 for knote().

If KN_INFLUX | KN_SCAN flags are set for the note passed to knote() or
knote_fork(), i.e. the knote is scanned, we might erronously clear
INFLUX when finishing notification. For normal knote() it was fixed
in r310302 simply by remembering the fact that we do not own
KN_INFLUX, since there we own knlist lock and scan thread cannot clear
KN_INFLUX until we drop the lock. For knote_fork(), the situation is
more complicated, e must drop knlist lock AKA the process lock, since
we need to register new knotes.

Change KN_INFLUX into counter and allow shared ownership of the
in-flux state between scan and knote_fork() or knote(). Both in-flux
setters need to ensure that knote is not dropped in parallel. Added
assert about kn_influx == 1 in knote_drop() verifies that in-flux state
is not shared when knote is destroyed.

Since KBI of the struct knote is changed by addition of the int
kn_influx field, reorder kn_hook and kn_hookid to fill pad on LP64
arches [1]. This keeps sizeof(struct knote) to same 128 bytes as it
was before addition of kn_influx, on amd64.

Reviewed by: markj
Suggested by: markj [1]
Tested by: pho (previous version)
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D8898


# fc05543f 25-Dec-2016 Konstantin Belousov <kib@FreeBSD.org>

Some optimizations for kqueue timers.

There is no need to do two allocations per kqueue timer. Gather all
data needed by the timer callout into the structure and allocate it at
once.

Use the structure to preserve the result of timer2sbintime(), to not
perform repeated 64bit calculations in callout.

Remove tautological casts.
Remove now unused p_nexttime [1].

Noted by: markj [1]
Reviewed by: markj (previous version)
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
X-MFC note: do not remove p_nexttime
Differential revision: https://reviews.freebsd.org/D8901


# 9eb3f143 27-Jun-2016 Konstantin Belousov <kib@FreeBSD.org>

Fix userspace build after r302235: do not expose bool field of the
structure, change it to int.

The real fix is to sanitize user-visible definitions in sys/event.h,
e.g. the affected struct knlist is of no use for userspace programs.

Reported and tested by: jkim
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Approved by: re (gjb)


# 9e590ff0 27-Jun-2016 Konstantin Belousov <kib@FreeBSD.org>

When filt_proc() removes event from the knlist due to the process
exiting (NOTE_EXIT->knlist_remove_inevent()), two things happen:
- knote kn_knlist pointer is reset
- INFLUX knote is removed from the process knlist.
And, there are two consequences:
- KN_LIST_UNLOCK() on such knote is nop
- there is nothing which would block exit1() from processing past the
knlist_destroy() (and knlist_destroy() resets knlist lock pointers).
Both consequences result either in leaked process lock, or
dereferencing NULL function pointers for locking.

Handle this by stopping embedding the process knlist into struct proc.
Instead, the knlist is allocated together with struct proc, but marked
as autodestroy on the zombie reap, by knlist_detach() function. The
knlist is freed when last kevent is removed from the list, in
particular, at the zombie reap time if the list is empty. As result,
the knlist_remove_inevent() is no longer needed and removed.

Other changes:

In filt_procattach(), clear NOTE_EXEC and NOTE_FORK desired events
from kn_sfflags for knote registered by kernel to only get NOTE_CHILD
notifications. The flags leak resulted in excessive
NOTE_EXEC/NOTE_FORK reports.

Fix immediate note activation in filt_procattach(). Condition should
be either the immediate CHILD_NOTE activation, or immediate NOTE_EXIT
report for the exiting process.

In knote_fork(), do not perform racy check for KN_INFLUX before kq
lock is taken. Besides being racy, it did not accounted for notes
just added by scan (KN_SCAN).

Some minor and incomplete style fixes.

Analyzed and tested by: Eric Badger <eric@badgerio.us>
Reviewed by: jhb
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Approved by: re (gjb)
Differential revision: https://reviews.freebsd.org/D6859


# c89e1b87 03-May-2016 Konstantin Belousov <kib@FreeBSD.org>

Add EVFILT_VNODE open, read and close notifications.

While there, order EVFILT_VNODE notes descriptions alphabetically.

Based on submission, and tested by: Vladimir Kondratyev <wulf@cicgroup.ru>
MFC after: 2 weeks


# 5652770d 05-Feb-2016 John Baldwin <jhb@FreeBSD.org>

Rename aiocblist to kaiocb and use consistent variable names.

Typically <foo>list is used for a structure that holds a list head in
FreeBSD, not for members of a list. As such, rename 'struct aiocblist'
to 'struct kaiocb' (the kernel version of 'struct aiocb').

While here, use more consistent variable names for AIO control blocks:

- Use 'job' instead of 'aiocbe', 'cb', 'cbe', or 'iocb' for kernel job
objects.
- Use 'jobn' instead of 'cbn' for use with TAILQ_FOREACH_SAFE().
- Use 'sjob' and 'sjobn' instead of 'scb' and 'scbn' for fsync jobs.
- Use 'ujob' instead of 'aiocbp', 'job', 'uaiocb', or 'uuaiocb' to hold
a user pointer to a 'struct aiocb'.
- Use 'ujobp' instead of 'aiocbp' for a user pointer to a 'struct aiocb *'.

Reviewed by: kib
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D5125


# 0e3d6ed4 28-Jan-2016 Eric van Gyzen <vangyzen@FreeBSD.org>

kqueue EVFILT_PROC: avoid collision between NOTE_CHILD and NOTE_EXIT

NOTE_CHILD and NOTE_EXIT return something in kevent.data: the parent
pid (ppid) for NOTE_CHILD and the exit status for NOTE_EXIT.
Do not let the two events be combined, since one would overwrite
the other's data.

PR: 180385
Submitted by: David A. Bright <david_a_bright@dell.com>
Reviewed by: jhb
MFC after: 1 month
Sponsored by: Dell Inc.
Differential Revision: https://reviews.freebsd.org/D4900


# 2433a4eb 05-Aug-2015 Ed Schouten <ed@FreeBSD.org>

Make it possible to implement poll(2) on top of kqueue(2).

It looks like EVFILT_READ and EVFILT_WRITE trigger under the same
conditions as poll()'s POLLRDNORM and POLLWRNORM as described by POSIX.
The only difference is that POLLRDNORM has to be triggered on regular
files unconditionally, whereas EVFILT_READ only triggers when not EOF.

Introduce a new flag, NOTE_FILE_POLL, that can be used to make
EVFILT_READ and EVFILT_WRITE behave identically to poll(). This flag
will be used by cloudlibc's poll() function.

Reviewed by: jmg
Differential Revision: https://reviews.freebsd.org/D3303


# 2c30bc1f 15-Nov-2014 John-Mark Gurney <jmg@FreeBSD.org>

prevent doing filter ops locking for staticly compiled filter ops...
This significantly reduces lock contention when adding/removing knotes
on busy multi-kq system... Next step is to cache these references per
kq.. i.e. kq refs it once and keeps a local ref count so that the same
refs don't get accessed by many cpus...

only allocate a knote when we might use it...

Add a new flag, _FORCEONESHOT.. This allows a thread to force the
delivery of another event in a safe manner, say waking up an idle http
connection to force it to be reaped...

If we are _DISABLE'ing a knote, don't bother to call f_event on it, it's
disabled, so won't be delivered anyways..

Tested by: adrian


# 41e8f7ef 04-Oct-2014 Ian Lepore <ian@FreeBSD.org>

Make kevent(2) periodic timer events more reliably periodic. The event
callout is now scheduled using the C_ABSOLUTE flag, and the absolute time
of each event is calculated as the time the previous event was scheduled
for plus the interval. This ensures that latency in processing a given
event doesn't perturb the arrival time of any subsequent events.

Reviewed by: jhb


# 3b12cd43 24-Jul-2014 Baptiste Daroussin <bapt@FreeBSD.org>

Fix a typo in a comment

Reported by: jhb


# 42e62eca 18-Jul-2014 Baptiste Daroussin <bapt@FreeBSD.org>

Extend kqueue's EVFILT_TIMER by adding precision unit flags support

Define the precision macros as bits sets to conform with XNU equivalent.
Test fflags passed for EVFILT_TIMER and return EINVAL in case an invalid flag
is passed.

Phabric: https://phabric.freebsd.org/D421
Reviewed by: kib


# 38219d6a 07-Apr-2014 Ed Schouten <ed@FreeBSD.org>

Implement kqueue(2) for procdesc(4).

kqueue(2) already supports EVFILT_PROC. Add an EVFILT_PROCDESC that
behaves the same, but operates on a procdesc(4) instead. Only implement
NOTE_EXIT for now. The nice thing about NOTE_EXIT is that it also
returns the exit status of the process, meaning that we can now obtain
this value, even if pdwait4(2) is still unimplemented.

Notes:

- Simply reuse EVFILT_NETDEV for EVFILT_PROCDESC. As both of these will
be used on totally different descriptor types, this should not clash.

- Let procdesc_kqops_event() reuse the same structure as filt_proc().
The only difference is that procdesc_kqops_event() should also be able
to deal with the case where the process was already terminated after
registration. Simply test this when hint == 0.

- Fix some style(9) issues in filt_proc() to keep it consistent with the
newly added procdesc_kqops_event().

- Save the exit status of the process in pd->pd_xstat, as we cannot pick
up the proctree_lock from within procdesc_kqops_event().

Discussed on: arch@
Reviewed by: kib@


# 1a5edcf8 05-Apr-2014 Konstantin Belousov <kib@FreeBSD.org>

When KN_INFLUX is set on the knote due to kqueue_register() or
kqueue_scan() unlocking the kqueue to call f_event, knote() or
knote_fork() should not skip the knote. The knote is not going to
disappear during the influx time, and the mutual exclusion between
scan and knote() is ensured by both code pathes taking knlist lock.
The race appears since knlist lock is before kq lock, so KN_INFLUX
must be set, kq lock must be dropped and only then knlist lock can be
taken. The window between kq unlock and knlist lock causes lost
events.

Add a flag KN_SCAN to indicate that KN_INFLUX is set in a manner safe
for the knote(), and check for it to ignore KN_INFLUX in the knote*()
as needed. Also, in knote(), remove the lockless check for the
KN_INFLUX flag, which could also result in the lost notification.

Reported and tested by: Kohji Okuno <okuno.kohji@jp.panasonic.com>
Discussed with: jmg
Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# 46d3e7e9 07-Jan-2014 Adrian Chadd <adrian@FreeBSD.org>

Reserve an event type for the upcoming EVENT_SENDFILE and
extend the event struct pointer union to allow for 'other' types.

Sponsored by: Netflix, Inc.


# b12698e1 18-Sep-2013 Roman Divacky <rdivacky@FreeBSD.org>

Revert r255672, it has some serious flaws, leaking file references etc.

Approved by: re (delphij)


# 253c75c0 18-Sep-2013 Roman Divacky <rdivacky@FreeBSD.org>

Implement epoll support in Linuxulator. This is a tiny wrapper around kqueue
to implement epoll subset of functionality. The kqueue user data are 32bit
on i386 which is not enough for epoll user data so this patch overrides
kqueue fileops to maintain enough space in struct file.

Initial patch developed by me in 2007 and then extended and finished
by Yuri Victorovich.

Approved by: re (delphij)
Sponsored by: Google Summer of Code
Submitted by: Yuri Victorovich <yuri at rawbw dot com>
Tested by: Yuri Victorovich <yuri at rawbw dot com>


# e8de242d 13-Sep-2013 Konstantin Belousov <kib@FreeBSD.org>

Use TAILQ instead of STAILQ for kqeueue filedescriptors to ensure constant
time removal on kqueue close.

Reported and tested by: pho
Reviewed by: jmg
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Approved by: re (delphij)


# 5b596f0f 07-Aug-2013 John Baldwin <jhb@FreeBSD.org>

Don't emit a spurious EVFILT_PROC event with no fflags set on process exit
if NOTE_EXIT is not being monitored. The rationale is that a listener
should only get an event for exit() if they registered interest via
NOTE_EXIT. This matches the behavior on OS X.
- Don't save the exit status on process exit unless NOTE_EXIT is being
monitored.
- Add an internal EV_DROP flag that requests kqueue_scan() to free the
knote without signalling it to userland and use this when a process
exits but the fflags in the knote is zero.

Reviewed by: jmg
MFC after: 1 month


# b25711e6 26-Mar-2012 Alexander V. Chernikov <melifaro@FreeBSD.org>

- Add knlist_init_rw_reader() function to kqueue(9).
Function acquired reader lock if needed.
Assert check for reader or writer lock (RA_LOCKED / RA_UNLOCKED)
- While here, add knlist_init_mtx.9 to MLINKS and fix some style(9) issues

Reviewed by: glebius
Approved by: ae(mentor)

MFC after: 2 weeks


# 8f80f103 06-Nov-2011 Ed Schouten <ed@FreeBSD.org>

Remove MALLOC_DECLAREs of nonexisting malloc-pools.

After careful grepping, it seems none of these pools can be found in our
source tree. They are not in use, nor are they defined.


# a7d5f7eb 19-Oct-2010 Jamie Gritton <jamie@FreeBSD.org>

A new jail(8) with a configuration file, to replace the work currently done
by /etc/rc.d/jail.


# 52c240aa 22-Jan-2010 Brooks Davis <brooks@FreeBSD.org>

MFC r201350:

The devices that supported EVFILT_NETDEV kqueue filters were removed in
r195175. Remove all definitions, documentation, and usage.

The change of function signature for vlan_link_state() was not merged to
maintain the ABI.


# a6fffd6c 31-Dec-2009 Brooks Davis <brooks@FreeBSD.org>

The devices that supported EVFILT_NETDEV kqueue filters were removed in
r195175. Remove all definitions, documentation, and usage.

fifo_misc.c:
Remove all kqueue tests as fifo_io.c performs all those that
would have remained.

Reviewed by: rwatson
MFC after: 3 weeks
X-MFC note: don't change vlan_link_state() function signature


# 83613795 31-Oct-2009 Stacey Son <sson@FreeBSD.org>

MFC 197240,197241,197242,197243,197293,197294,197407:

Add EVFILT_USER filter and EV_DISPATCH/EV_RECEIPT flags to kevent(2).

Approved by: rwatson (mentor)


# 1c2825bd 22-Sep-2009 Roman Divacky <rdivacky@FreeBSD.org>

Change unsigned foo to u_foo as required by style(9).

Requested by: bde
Approved by: ed (mentor)


# 6413e27b 17-Sep-2009 Roman Divacky <rdivacky@FreeBSD.org>

Fix the style of the previous commit.

Approved by: ed (mentor, implicit)


# abc8594d 17-Sep-2009 Roman Divacky <rdivacky@FreeBSD.org>

Make these argument/variable unsigned as the defines for them don't fit
into signed 32bit integer.

Approved by: ed (mentor, implicit)
Approved by: sson


# fdc1a113 15-Sep-2009 Stacey Son <sson@FreeBSD.org>

Add EV_RECEIPT to kevents.

EV_RECEIPT is useful to disambiguating error conditions when multiple
events structures are passed to kevent(2). The error code is returned
in the data field and EV_ERROR is set.

Approved by: rwatson (co-mentor)


# 1a921c41 15-Sep-2009 Stacey Son <sson@FreeBSD.org>

Add the EV_DISPATCH flag to kevents.

When the EV_DISPATCH flag is used the event source will be disabled
immediately after the delivery of an event. This is similar to the
EV_ONESHOT flag but it doesn't delete the event.

Approved by: rwatson (co-mentor)


# 2c2e4499 15-Sep-2009 Stacey Son <sson@FreeBSD.org>

Add EVFILT_USER to kevents.

Add user events support to kernel events which are not associated with any
kernel mechanism but are triggered by user level code. This is useful for
adding user level events to an event handler that may also be monitoring
kernel events.

Approved by: rwatson (co-mentor)


# 95128e98 15-Sep-2009 Stacey Son <sson@FreeBSD.org>

Add optional touch event filter hooks to kevents.

The touch event filter is called when a kernel event data is possibly
updated. There are two hook points. First, during a kevent() system
call. Second, when an event has been triggered.

Approved by: rwatson (co-mentor)


# fe1d3f15 28-Jun-2009 Stanislav Sedov <stas@FreeBSD.org>

- Turn the third (islocked) argument of the knote call into flags parameter.
Introduce the new flag KNF_NOKQLOCK to allow event callers to be called
without KQ_LOCK mtx held.
- Modify VFS knote calls to always use KNF_NOKQLOCK flag. This is required
for ZFS as its getattr implementation may sleep.

Approved by: re (rwatson)
Reviewed by: kib
MFC after: 2 weeks


# d8b0556c 10-Jun-2009 Konstantin Belousov <kib@FreeBSD.org>

Adapt vfs kqfilter to the shared vnode lock used by zfs write vop. Use
vnode interlock to protect the knote fields [1]. The locking assumes
that shared vnode lock is held, thus we get exclusive access to knote
either by exclusive vnode lock protection, or by shared vnode lock +
vnode interlock.

Do not use kl_locked() method to assert either lock ownership or the
fact that curthread does not own the lock. For shared locks, ownership
is not recorded, e.g. VOP_ISLOCKED can return LK_SHARED for the shared
lock not owned by curthread, causing false positives in kqueue subsystem
assertions about knlist lock.

Remove kl_locked method from knlist lock vector, and add two separate
assertion methods kl_assert_locked and kl_assert_unlocked, that are
supposed to use proper asserts. Change knlist_init accordingly.

Add convenience function knlist_init_mtx to reduce number of arguments
for typical knlist initialization.

Submitted by: jhb [1]
Noted by: jhb [2]
Reviewed by: jhb
Tested by: rnoland


# d7f03759 19-Oct-2008 Ulf Lilleengen <lulf@FreeBSD.org>

- Import the HEAD csup code which is the basis for the cvsmode work.


# 7054ee4e 07-Jul-2008 Konstantin Belousov <kib@FreeBSD.org>

The kqueue_register() function assumes that it is called from the top of
the syscall code and acquires various event subsystem locks as needed.
The handling of the NOTE_TRACK for EVFILT_PROC is currently done by
calling the kqueue_register() from filt_proc() filter, causing recursive
entrance of the kqueue code. This results in the LORs and recursive
acquisition of the locks.

Implement the variant of the knote() function designed to only handle
the fork() event. It mostly copies the knote() body, but also handles
the NOTE_TRACK, removing the handling from the filt_proc(), where it
causes problems described above. The function is called from the fork1()
instead of knote().

When encountering NOTE_TRACK knote, it marks the knote as influx
and drops the knlist and kqueue lock. In this context call to
kqueue_register is safe from the problems.

An error from the kqueue_register() is reported to the observer as
NOTE_TRACKERR fflag.

PR: 108201
Reviewed by: jhb, Pramod Srinivasan <pramod juniper net> (previous version)
Discussed with: jmg
Tested by: pho
MFC after: 2 weeks


# a8afa221 24-Jan-2008 Jean-Sébastien Pédron <dumbbell@FreeBSD.org>

When asked to use kqueue, AIO stores its internal state in the
`kn_sdata' member of the newly registered knote. The problem is that
this member is overwritten by a call to kevent(2) with the EV_ADD flag,
targetted at the same kevent/knote. For instance, a userland application
may set the pointer to NULL, leading to a panic.

A testcase was provided by the submitter.

PR: kern/118911
Submitted by: MOROHOSHI Akihiko <moro@remus.dti.ne.jp>
MFC after: 1 day


# 4db71d27 23-Sep-2006 John-Mark Gurney <jmg@FreeBSD.org>

hide kqueue_register from public view, and replace it w/ kqfd_register...
this eliminates a possible race in aio registering a kevent..


# cc548bd9 16-Mar-2006 Poul-Henning Kamp <phk@FreeBSD.org>

Remove nested includes of <sys/_lock.h> and <sys/_mutex.h> which spill into
userland. The comment indicated that something in userland needed them, but
make universe can't seem to find any traces of it.

Move <sys/queue.h> include up.


# b484d9f6 24-Nov-2005 Ruslan Ermilov <ru@FreeBSD.org>

Fix prototype to match the code and documentation.


# 9ac16277 12-Oct-2005 Doug Ambrisko <ambrisko@FreeBSD.org>

Use a better EVFILT_LIO description!

Submitted by: alc


# 69cd28da 12-Oct-2005 Doug Ambrisko <ambrisko@FreeBSD.org>

Add in kqueue support to LIO event notification and fix how it handled
notifications when LIO operations completed. These were the problems
with LIO event complete notification:
- Move all LIO/AIO event notification into one general function
so we don't have bugs in different data paths. This unification
got rid of several notification bugs one of which if kqueue was
used a SIGILL could get sent to the process.
- Change the LIO event accounting to count all AIO request that
could have been split across the fast path and daemon mode.
The prior accounting only kept track of AIO op's in that
mode and not the entire list of operations. This could cause
a bogus LIO event complete notification to occur when all of
the fast path AIO op's completed and not the AIO op's that
ended up queued for the daemon.

Suggestions from: alc


# 571dcd15 01-Jul-2005 Suleiman Souhlal <ssouhlal@FreeBSD.org>

Fix the recent panics/LORs/hangs created by my kqueue commit by:

- Introducing the possibility of using locks different than mutexes
for the knlist locking. In order to do this, we add three arguments to
knlist_init() to specify the functions to use to lock, unlock and
check if the lock is owned. If these arguments are NULL, we assume
mtx_lock, mtx_unlock and mtx_owned, respectively.

- Using the vnode lock for the knlist locking, when doing kqueue operations
on a vnode. This way, we don't have to lock the vnode while holding a
mutex, in filt_vfsread.

Reviewed by: jmg
Approved by: re (scottl), scottl (mentor override)
Pointyhat to: ssouhlal
Will be happy: everyone


# 679985d0 09-Jun-2005 Suleiman Souhlal <ssouhlal@FreeBSD.org>

Allow EVFILT_VNODE events to work on every filesystem type, not just
UFS by:
- Making the pre and post hooks for the VOP functions work even when
DEBUG_VFS_LOCKS is not defined.
- Moving the KNOTE activations into the corresponding VOP hooks.
- Creating a MNTK_NOKNOTE flag for the mnt_kern_flag field of struct
mount that permits filesystems to disable the new behavior.
- Creating a default VOP_KQFILTER function: vfs_kqfilter()

My benchmarks have not revealed any performance degradation.

Reviewed by: jeff, bde
Approved by: rwatson, jmg (kqueue changes), grehan (mentor)


# efe5beca 03-Jun-2005 Paul Saab <ps@FreeBSD.org>

Wrap copyin/copyout for kevent so the 32bit wrapper does not have
to malloc nchanges * sizeof(struct kevent) AND/OR nevents *
sizeof(struct kevent) on every syscall.

Glanced at by: peter, jmg
Obtained from: Yahoo!
MFC after: 2 weeks


# a517cb30 25-Mar-2005 John-Mark Gurney <jmg@FreeBSD.org>

remove unimplemented part of the interface..

MFC after: 3 days


# c4c44d29 17-Mar-2005 John-Mark Gurney <jmg@FreeBSD.org>

fix aio+kq... I've been running ambrisko's test program for much longer
w/o problems than I was before... This simply brings back the knote_delete
as knlist_delete which will also drop the knote's, instead of just clearing
the list and seeing _ONESHOT...

Fix a race where if a note was _INFLUX and _DETACHED, it could end up being
modified... whoopse..

MFC after: 1 week
Prodded by: ambrisko and dwhite


# b8a4edc1 01-Mar-2005 Paul Saab <ps@FreeBSD.org>

Use kern_kevent instead of the stackgap for 32bit syscall wrapping.

Submitted by: jhb
Tested on: amd64


# aafa519a 15-Aug-2004 John-Mark Gurney <jmg@FreeBSD.org>

move the declaration of struct kqlist into the non-KERNEL visable section
to fix userland.


# ad3b9257 15-Aug-2004 John-Mark Gurney <jmg@FreeBSD.org>

Add locking to the kqueue subsystem. This also makes the kqueue subsystem
a more complete subsystem, and removes the knowlege of how things are
implemented from the drivers. Include locking around filter ops, so a
module like aio will know when not to be unloaded if there are outstanding
knotes using it's filter ops.

Currently, it uses the MTX_DUPOK even though it is not always safe to
aquire duplicate locks. Witness currently doesn't support the ability
to discover if a dup lock is ok (in some cases).

Reviewed by: green, rwatson (both earlier versions)


# 215bb69f 14-Jul-2004 Alfred Perlstein <alfred@FreeBSD.org>

do { } while(0) KNOTE macro, whitespace


# 94ed9c8a 04-Jul-2004 Alfred Perlstein <alfred@FreeBSD.org>

Introduce a new kevent filter. EVFILT_FS that will be used to signal
generic filesystem events to userspace. Currently only mount and unmount
of filesystems are signalled. Soon to be added, up/down status of NFS.

Introduce a sysctl node used to route requests to/from filesystems
based on filesystem ids.

Introduce a new vfsop, vfs_sysctl(mp, req) that is used as the callback/
entrypoint by the sysctl code to change individual filesystems.


# 485a42f4 02-Feb-2003 Jacques Vidrine <nectar@FreeBSD.org>

Tweak the definition of the EV_SET macro so that it evaluates each
of its arguments exactly once. (Previously it evaluated the first
argument six times.)

MFC after: 1 week


# 83b9e252 29-Jun-2002 Bruce Evans <bde@FreeBSD.org>

Updated a comment. Namspace pollution in <sys/select.h> is now moot since
it was moved to <sys/selinfo.h>.

Fixed indentation of $FreeBSD$.


# 80208239 28-Jun-2002 Alfred Perlstein <alfred@FreeBSD.org>

More caddr_t removal.
Change struct knote's kn_hook from caddr_t to void *.


# c58eb46e 23-Mar-2002 Bruce Evans <bde@FreeBSD.org>

Fixed some style bugs in the removal of __P(()). The main ones were
not removing tabs before "__P((", and not outdenting continuation lines
to preserve non-KNF lining up of code with parentheses. Switch to KNF
formatting and/or rewrap the whole prototype in some cases.


# 789f12fe 19-Mar-2002 Alfred Perlstein <alfred@FreeBSD.org>

Remove __P


# 21d56e9c 29-Dec-2001 Alfred Perlstein <alfred@FreeBSD.org>

Make AIO a loadable module.

Remove the explicit call to aio_proc_rundown() from exit1(), instead AIO
will use at_exit(9).

Add functions at_exec(9), rm_at_exec(9) which function nearly the
same as at_exec(9) and rm_at_exec(9), these functions are called
on behalf of modules at the time of execve(2) after the image
activator has run.

Use a modified version of tegge's suggestion via at_exec(9) to close
an exploitable race in AIO.

Fix SYSCALL_MODULE_HELPER such that it's archetecuterally neutral,
the problem was that one had to pass it a paramater indicating the
number of arguments which were actually the number of "int". Fix
it by using an inline version of the AS macro against the syscall
arguments. (AS should be available globally but we'll get to that
later.)

Add a primative system for dynamically adding kqueue ops, it's really
not as sophisticated as it should be, but I'll discuss with jlemon when
he's around.


# 9a2a57a1 29-Sep-2001 Jonathan Lemon <jlemon@FreeBSD.org>

Add ability to attach knotes to network devices.
Introduce EVFILT_NETDEV to report network device changes.


# 0217f5c7 29-Sep-2001 Jonathan Lemon <jlemon@FreeBSD.org>

Have EVFILT_TIMERS allocate their callouts via malloc() instead of using
the static callout list allocated by the system.

Change malloc type from M_TEMP to M_KQUEUE to better track memory.

Add a kern.kq_calloutmax to globally limit the amount of kernel memory
that can be allocated by callouts.

Submitted by: iedowse (items 1, 2)


# b40ce416 12-Sep-2001 Julian Elischer <julian@FreeBSD.org>

KSE Milestone 2
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to teh previousl -current except
that there is a thread associated with each process.

Sorry john! (your next MFC will be a doosie!)

Reviewed by: peter@freebsd.org, dillon@freebsd.org

X-MFC after: ha ha ha ha


# 5f5c2e95 19-Jul-2001 Jonathan Lemon <jlemon@FreeBSD.org>

Introduce EVFILT_TIMER, which allows a process to establish an
arbitrary number of timers, both oneshot and periodic.

Repeatedly reminded to commit by: jayanth
Reviewed by: peter (a while back)


# 24607d88 23-Feb-2001 Jonathan Lemon <jlemon@FreeBSD.org>

Add an EV_SET() convenience macro for initializing struct kevent prior
to the call to kevent().

Update the copyright notices as well.


# da403b9d 23-Feb-2001 Jonathan Lemon <jlemon@FreeBSD.org>

Introduce a NOTE_LOWAT flag for use with the read/write filters, which
allow the watermark to be passed in via the data field during the EV_ADD
operation.

Hook this up to the socket read/write filters; if specified, it overrides
the so_{rcv|snd}.sb_lowat values in the filter.

Inspired by: "Ronald F. Guilmette" <rfg@monkeys.com>


# 7df2842d 23-Feb-2001 Jonathan Lemon <jlemon@FreeBSD.org>

Add a NOTE_REVOKE flag for vnodes, which is triggered from within vclean().
Use this to tell a filter attached to a vnode that the underlying vnode is
no longer valid, by returning EV_EOF.

PR: kern/25309, kern/25206


# 608a3ce6 15-Feb-2001 Jonathan Lemon <jlemon@FreeBSD.org>

Extend kqueue down to the device layer.

Backwards compatible approach suggested by: peter


# a8e65b91 18-Jul-2000 Jonathan Lemon <jlemon@FreeBSD.org>

Simplify kqueue API slightly.

Discussed on: -arch


# e3975643 25-May-2000 Jake Burkholder <jake@FreeBSD.org>

Back out the previous change to the queue(3) interface.
It was not discussed and should probably not happen.

Requested by: msmith and others


# 740a1973 23-May-2000 Jake Burkholder <jake@FreeBSD.org>

Change the way that the queue(3) structures are declared; don't assume that
the type argument to *_HEAD and *_ENTRY is a struct.

Suggested by: phk
Reviewed by: phk
Approved by: mdodd


# 5e97a562 04-May-2000 Jonathan Lemon <jlemon@FreeBSD.org>

Change the definition of sdata from u_long --> intptr_t to correctly
match the data type in struct kevent.


# 76690876 18-Apr-2000 Jonathan Lemon <jlemon@FreeBSD.org>

Add forward declaration of `struct timespec' to quiet compiler warnings.


# 2ae7162e 17-Apr-2000 Jonathan Lemon <jlemon@FreeBSD.org>

Add user prototypes for kevent() and kqueue().


# 94f7814e 16-Apr-2000 Jonathan Lemon <jlemon@FreeBSD.org>

change {u}long -> {u}intptr_t to make it clear that these fields
may also contain pointers as well.


# 3ee12e4f 16-Apr-2000 Jonathan Lemon <jlemon@FreeBSD.org>

Add files that I forgot to `cvs add' on last commit.