History log of /freebsd-current/sys/geom/geom_kern.c
Revision Date Author Comments
# fdafd315 24-Nov-2023 Warner Losh <imp@FreeBSD.org>

sys: Automated cleanup of cdefs and other formatting

Apply the following automated changes to try to eliminate
no-longer-needed sys/cdefs.h includes as well as now-empty
blank lines in a row.

Remove /^#if.*\n#endif.*\n#include\s+<sys/cdefs.h>.*\n/
Remove /\n+#include\s+<sys/cdefs.h>.*\n+#if.*\n#endif.*\n+/
Remove /\n+#if.*\n#endif.*\n+/
Remove /^#if.*\n#endif.*\n/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/types.h>/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/param.h>/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/capsicum.h>/

Sponsored by: Netflix


# 685dc743 16-Aug-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove $FreeBSD$: one-line .c pattern

Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/


# ffc1cc95 28-Jan-2022 Alexander Motin <mav@FreeBSD.org>

GEOM: Relax direct dispatch for GEOM threads.

The only cases where direct dispatch does not make sense are I/O
submission from the down thread and completion from the up thread. In
all other cases, if both consumer and producer are OK with it, we
can save on context switches.

MFC after: 2 weeks


# c2da9542 10-Aug-2021 Alexander Motin <mav@FreeBSD.org>

geom(4): Mark all sysctls as CTLFLAG_MPSAFE.

This code has not used the Giant lock for a very long time.

MFC after: 2 weeks


# d40bc607 01-Sep-2020 Mateusz Guzik <mjg@FreeBSD.org>

geom: clean up empty lines in .c and .h files


# 7029da5c 26-Feb-2020 Pawel Biernacki <kaktus@FreeBSD.org>

Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many)

r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.

This is a non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.

Mark all obvious cases as MPSAFE. All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT.

Approved by: kib (mentor, blanket)
Commented by: kib, gallatin, melifaro
Differential Revision: https://reviews.freebsd.org/D23718


# 61322a0a 04-Dec-2019 Alexander Motin <mav@FreeBSD.org>

Mark some more hot global variables with __read_mostly.

MFC after: 1 week


# 5c32e9fc 18-Jun-2019 Alexander Motin <mav@FreeBSD.org>

Optimize kern.geom.conf* sysctls.

On large systems those sysctls may generate megabytes of output. Before
this change the sbuf(9) code grew the buffer by 4KB at a time, many times
over, generating tons of TLB shootdowns. Unfortunately the existing
sbuf_new_for_sysctl() mechanism, which is supposed to help with this issue,
is not applicable here, since all the sbuf writes are done in a different
kernel thread.

This change improves the situation in two ways:
- on the first sysctl call, which provides no output buffer, it sets a
special sbuf drain function that just counts the data and so needs no big
buffer;
- on the second sysctl call it uses the size saved on the previous call as
the initial buffer size, so that in most cases there will be no
reallocation, unless the GEOM topology changed significantly.

MFC after: 1 week
Sponsored by: iXsystems, Inc.
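
The two-pass scheme described in this commit can be sketched in userland C.
This is only an illustration of the idea (count first, then allocate from the
remembered size); it does not use sbuf(9), and every name in it (ex_sink,
ex_saved_size, ex_render, ex_probe_size, ex_fetch) is hypothetical rather than
taken from geom_kern.c.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical output sink: counts bytes only, or stores them in a buffer. */
struct ex_sink {
	char	*buf;	/* NULL selects counting mode */
	size_t	 cap;	/* bytes allocated when buf != NULL */
	size_t	 len;	/* bytes produced so far */
	unsigned grows;	/* number of reallocations performed */
};

/* Size remembered from the previous pass (stand-in for the saved value). */
size_t ex_saved_size;

/* Append n bytes; in counting mode only the running length is updated. */
int
ex_sink_put(struct ex_sink *s, const char *data, size_t n)
{
	if (s->buf != NULL) {
		if (s->len + n > s->cap) {
			size_t ncap = s->cap;
			char *nbuf;

			while (ncap < s->len + n)
				ncap *= 2;
			nbuf = realloc(s->buf, ncap);
			if (nbuf == NULL)
				return (-1);
			s->buf = nbuf;
			s->cap = ncap;
			s->grows++;
		}
		memcpy(s->buf + s->len, data, n);
	}
	s->len += n;
	return (0);
}

/* Stand-in for the code that renders the configuration (8 bytes per item). */
int
ex_render(struct ex_sink *s, int nitems)
{
	for (int i = 0; i < nitems; i++)
		if (ex_sink_put(s, "<geom/>\n", 8) != 0)
			return (-1);
	return (0);
}

/* First pass: no buffer at all, just count and remember the size. */
size_t
ex_probe_size(int nitems)
{
	struct ex_sink s = { NULL, 0, 0, 0 };

	(void)ex_render(&s, nitems);
	ex_saved_size = s.len;
	return (s.len);
}

/* Second pass: start from the remembered size; the caller frees the result. */
char *
ex_fetch(int nitems, size_t *lenp, unsigned *growsp)
{
	struct ex_sink s = { NULL, 0, 0, 0 };

	s.cap = ex_saved_size != 0 ? ex_saved_size : 1;
	s.buf = malloc(s.cap);
	if (s.buf == NULL)
		return (NULL);
	if (ex_render(&s, nitems) != 0) {
		free(s.buf);
		return (NULL);
	}
	*lenp = s.len;
	*growsp = s.grows;
	return (s.buf);
}
```

Calling ex_probe_size(n) and then ex_fetch(n, ...) with the same n performs no
reallocation; growth happens only if the amount of output increased between
the two calls, which mirrors the "topology changed significantly" caveat.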


# 3728855a 27-Nov-2017 Pedro F. Giffuni <pfg@FreeBSD.org>

sys/geom: adoption of SPDX licensing ID tags.

Mainly focus on files that use the BSD 2-Clause license; however, the tool I
was using misidentified many licenses, so this was mostly a manual (and
error-prone) task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
open source licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
supersede or replace the license texts.


# d5446cc8 20-May-2016 Konstantin Belousov <kib@FreeBSD.org>

Remove unneeded Giant locking around kthreads creation.

Sponsored by: The FreeBSD Foundation


# dff9131e 20-May-2016 Konstantin Belousov <kib@FreeBSD.org>

Remove asserts that Giant is not held on entrance into the geom KPI, which
have outlived their usefulness. This allows removing the drop/pickup Giant
wrappers around GEOM calls.

Discussed with: alfred, imp, phk
Sponsored by: The FreeBSD Foundation


# e8d57122 29-Apr-2016 Pedro F. Giffuni <pfg@FreeBSD.org>

sys/geom: spelling fixes in comments.

No functional change.


# f0188618 21-Oct-2014 Hans Petter Selasky <hselasky@FreeBSD.org>

Fix multiple incorrect SYSCTL arguments in the kernel:

- Wrong integer type was specified.

- Wrong or missing "access" specifier. The "access" specifier
sometimes included the SYSCTL type, which it should not, except for
procedural SYSCTL nodes.

- Logical OR where binary OR was expected.

- Properly assert the "access" argument passed to all SYSCTL macros,
using the CTASSERT macro. This applies to both static- and dynamically
created SYSCTLs.

- Properly assert the data type for both static and dynamic
SYSCTLs. In the case of static SYSCTLs we only assert that the data
pointed to by the SYSCTL data pointer has the correct size, since
there is no easy way to assert types in the C language outside a
C-function.

- Rewrote some code which doesn't pass a constant "access" specifier
when creating dynamic SYSCTL nodes, which is now a requirement.

- Updated "EXAMPLES" section in SYSCTL manual page.

MFC after: 3 days
Sponsored by: Mellanox Technologies
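
The "logical OR where binary OR was expected" item is easy to reproduce in
isolation. The following is a minimal sketch with illustrative flag bits; the
EX_* names, their values, and the ex_access_ok() check are assumptions made
for this example, not the kernel's CTLFLAG_* definitions or its CTASSERT
logic.

```c
/* Illustrative flag bits only; real CTLFLAG_* values come from <sys/sysctl.h>. */
#define EX_FLAG_RD	0x80000000u
#define EX_FLAG_WR	0x40000000u
#define EX_FLAG_MASK	(EX_FLAG_RD | EX_FLAG_WR)

/* Intended form: bitwise OR keeps both flag bits. */
unsigned
ex_combine_bitwise(unsigned a, unsigned b)
{
	return (a | b);
}

/* Buggy form: logical OR collapses any non-zero operands to 1. */
unsigned
ex_combine_logical(unsigned a, unsigned b)
{
	return (a || b);
}

/* Hypothetical sanity check: non-empty and no bits outside the access mask. */
int
ex_access_ok(unsigned access)
{
	return (access != 0 && (access & ~EX_FLAG_MASK) == 0);
}
```

With these definitions the bitwise form passes ex_access_ok(), while the
logical form yields 1, which lies outside the mask and is rejected. A check of
this kind is what makes such mistakes visible.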


# af3b2549 27-Jun-2014 Hans Petter Selasky <hselasky@FreeBSD.org>

Pull in r267961 and r267973 again. Fix for issues reported will follow.


# 37a107a4 27-Jun-2014 Glen Barber <gjb@FreeBSD.org>

Revert r267961, r267973:

These changes prevent sysctl(8) from returning proper output,
such as:

1) no output from sysctl(8)
2) erroneously returning ENOMEM with tools like truss(1)
or uname(1)
truss: can not get etype: Cannot allocate memory


# 3da1cf1e 27-Jun-2014 Hans Petter Selasky <hselasky@FreeBSD.org>

Extend the meaning of the CTLFLAG_TUN flag to automatically check if
there is an environment variable which shall initialize the SYSCTL
during early boot. This works for all SYSCTL types both statically and
dynamically created ones, except for the SYSCTL NODE type and SYSCTLs
which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to
be used in the case a tunable sysctl has a custom initialisation
function allowing the sysctl to still be marked as a tunable. The
kernel SYSCTL API is mostly the same, with a few exceptions for some
special operations like iterating children of a static/extern SYSCTL
node. This operation should probably be made into a factored out
common macro, since some device drivers use this. The reason for
changing the SYSCTL API was the need for a SYSCTL parent OID pointer
and not only the SYSCTL parent OID list pointer in order to quickly
generate the sysctl path. The motivation behind this patch is to avoid
parameter loading kludges inside the OFED driver subsystem. Instead of
adding special code to the OFED driver subsystem to post-load tunables
into dynamically created sysctls, we generalize this in the kernel.

Other changes:
- Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask"
to "hw.pcic.intr_mask".
- Removed redundant TUNABLE statements throughout the kernel.
- Some minor code rewrites in connection to removing not needed
TUNABLE statements.
- Added a missing SYSCTL_DECL().
- Wrapped two very long lines.
- Avoid malloc()/free() inside sysctl string handling, in case it is
called to initialize a sysctl from a tunable, since malloc()/free() is
not ready when sysctls from the sysctl dataset are registered.
- Bumped FreeBSD version to indicate SYSCTL API change.

MFC after: 2 weeks
Sponsored by: Mellanox Technologies


# 40ea77a0 22-Oct-2013 Alexander Motin <mav@FreeBSD.org>

Merge GEOM direct dispatch changes from the projects/camlock branch.

When safety requirements are met, it avoids passing I/O requests
to the GEOM g_up/g_down threads, executing them directly in the caller
context. That avoids CPU bottlenecks in the g_up/g_down threads, plus it
avoids several context switches per I/O.

The safety requirements currently defined are:
- caller should not hold any locks and should be reenterable;
- callee should not depend on GEOM dual-threaded concurrency semantics;
- on the way down, if request is unmapped while callee doesn't support it,
the context should be sleepable;
- kernel thread stack usage should be below 50%.

To keep compatibility with GEOM classes not meeting the above requirements,
new provider and consumer flags were added:
- G_CF_DIRECT_SEND -- consumer code meets caller requirements (request);
- G_CF_DIRECT_RECEIVE -- consumer code meets callee requirements (done);
- G_PF_DIRECT_SEND -- provider code meets caller requirements (done);
- G_PF_DIRECT_RECEIVE -- provider code meets callee requirements (request).
A capable GEOM class can set them, allowing direct dispatch in cases where
it is safe. If any of the requirements are not met, the request is queued to
the g_up or g_down thread, same as before.

The following GEOM classes were reviewed and updated to support direct dispatch:
CONCAT, DEV, DISK, GATE, MD, MIRROR, MULTIPATH, NOP, PART, RAID, STRIPE,
VFS, ZERO, ZFS::VDEV, ZFS::ZVOL, all classes based on g_slice KPI (LABEL,
MAP, FLASHMAP, etc).

To declare direct completion capability, the disk(9) KPI got a new flag
equivalent to G_PF_DIRECT_SEND -- DISKFLAG_DIRECT_COMPLETION. The da(4) and
ada(4) disk drivers now have it set, thanks to earlier CAM locking work.

This change more than doubles peak block storage performance on
systems with many CPUs, together with earlier CAM locking changes reaching
more than 1 million IOPS (512 byte raw reads from 16 SATA SSDs on 4 HBAs to
256 user-level threads).

Sponsored by: iXsystems, Inc.
MFC after: 2 months
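
The flag pairing described in this commit can be sketched as a pair of
predicates. This is a simplified model, not the kernel code: the EX_* flag
values are made up for illustration, only the flag checks and the 50% stack
rule are modeled, and the unmapped/sleepable condition is left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative bit values; the real G_CF_* and G_PF_* values are in geom.h. */
#define EX_CF_DIRECT_SEND	0x1u	/* consumer meets caller requirements */
#define EX_CF_DIRECT_RECEIVE	0x2u	/* consumer meets callee requirements */
#define EX_PF_DIRECT_SEND	0x1u	/* provider meets caller requirements */
#define EX_PF_DIRECT_RECEIVE	0x2u	/* provider meets callee requirements */

/* Stack usage must stay strictly below 50% of the stack size. */
static bool
ex_stack_ok(size_t used, size_t size)
{
	return (used * 2 < size);
}

/* Request path (down): the consumer sends, the provider receives. */
bool
ex_direct_request(unsigned cflags, unsigned pflags, size_t used, size_t size)
{
	return ((cflags & EX_CF_DIRECT_SEND) != 0 &&
	    (pflags & EX_PF_DIRECT_RECEIVE) != 0 &&
	    ex_stack_ok(used, size));
}

/* Completion path (up): the provider sends, the consumer receives. */
bool
ex_direct_done(unsigned cflags, unsigned pflags, size_t used, size_t size)
{
	return ((pflags & EX_PF_DIRECT_SEND) != 0 &&
	    (cflags & EX_CF_DIRECT_RECEIVE) != 0 &&
	    ex_stack_ok(used, size));
}
```

When either predicate returns false, the request would be queued to the g_up
or g_down thread as before, so a class that sets none of the flags keeps the
old behavior.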


# 1b2cb2b3 24-Sep-2013 Dag-Erling Smørgrav <des@FreeBSD.org>

Introduce a kern.geom.notaste sysctl that can be used to temporarily
disable GEOM tasting to avoid the "bouncing GEOM" problem where, when
you shut down the consumer of a provider which can be viewed in multiple
ways (typically a mirror whose members are labeled partitions), GEOM
will immediately taste that provider's alter ego and reattach the
consumer.

Approved by: re (glebius)


# b2901e99 11-May-2011 Andrew Thompson <thompsa@FreeBSD.org>

Move the three geom kprocs as threads under a single pid.

Reviewed by: julian


# f7842e00 22-Nov-2010 Jaakko Heinonen <jh@FreeBSD.org>

Use g_eventlock to protect against losing wakeups in the g_event process
and replace tsleep(9) with msleep(9) which doesn't use a timeout. The
previously used timeout caused the event process to wake up ten times
per second on an idle system.

one_event() is now called with the topology lock held and it returns
with both the topology and event locks held when there are no more
events in the queue.

Reported by: mav, Marius Nünnerich
Reviewed by: freebsd-geom


# a7d5f7eb 19-Oct-2010 Jamie Gritton <jamie@FreeBSD.org>

A new jail(8) with a configuration file, to replace the work currently done
by /etc/rc.d/jail.


# d7f03759 19-Oct-2008 Ulf Lilleengen <lulf@FreeBSD.org>

- Import the HEAD csup code which is the basis for the cvsmode work.


# 2616144e 09-Aug-2008 Dag-Erling Smørgrav <des@FreeBSD.org>

Add sbuf_new_auto as a shortcut for the very common case of creating a
completely dynamic sbuf.

Obtained from: Varnish
MFC after: 2 weeks


# 982d11f8 04-Jun-2007 Jeff Roberson <jeff@FreeBSD.org>

Commit 14/14 of sched_lock decomposition.
- Use thread_lock() rather than sched_lock for per-thread scheduling
synchronization.
- Use the per-process spinlock rather than the sched_lock for per-process
scheduling synchronization.

Tested by: kris, current@
Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)


# 8c957640 25-Nov-2005 Lukas Ertl <le@FreeBSD.org>

Add sysctl descriptions.


# d1c712ed 19-Apr-2005 Poul-Henning Kamp <phk@FreeBSD.org>

Call g_waitidle() instead of GEOM using the root_mount_hold() KPI.
GEOM could (and will) get events as a result of drivers coming in
late so a one-shot method is not good enough for GEOM.


# 73fbaa74 18-Apr-2005 Poul-Henning Kamp <phk@FreeBSD.org>

Add a named reference-count KPI to hold off mounting of the root filesystem.

While we wait for holds to be released, print a list of who holds us
back once per second.

Use the new KPI from GEOM instead of vfs_mount.c calling g_waitidle().

Use the new KPI also from ata.

With ATAmkIII's newbusification, ata could narrowly miss the window
and ad0 would not exist when we tried to mount root.


# 07e95ed6 09-Feb-2005 Poul-Henning Kamp <phk@FreeBSD.org>

Make various random things static


# 63710c4d 30-Dec-2004 John Baldwin <jhb@FreeBSD.org>

Stop explicitly touching td_base_pri outside of the scheduler and simply
set a thread's priority via sched_prio() when that is the desired action.
The schedulers will start managing td_base_pri internally shortly.


# 7e8ca741 13-Sep-2004 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Make kern.geom.debugflags sysctl tunable from /boot/loader.conf.
It will help to debug problems when booting.

Approved by: phk


# 99cf2f94 10-Feb-2004 Poul-Henning Kamp <phk@FreeBSD.org>

don't call sbuf_clear() right after sbuf_new(), it is not necessary.


# 44be139b 18-Jun-2003 Poul-Henning Kamp <phk@FreeBSD.org>

Sleep on "-" in our normal state to simplify debugging.


# 50b1faef 11-Jun-2003 David E. O'Brien <obrien@FreeBSD.org>

Use __FBSDID().

Approved by: phk


# 51da11a2 29-Apr-2003 Mark Murray <markm@FreeBSD.org>

Fix some easy, global, lint warnings. In most cases, this means
making some local variables static. In a couple of cases, this means
removing an unused variable.


# 0a9c130c 23-Apr-2003 Poul-Henning Kamp <phk@FreeBSD.org>

Introduce a g_waitfor_event() function which posts an event and waits for
it to be run (or cancelled) and use this instead of home-rolled versions.


# a974614b 23-Apr-2003 Poul-Henning Kamp <phk@FreeBSD.org>

More of the event stuff can now be private to geom_event.c


# 8cd1535a 23-Apr-2003 Poul-Henning Kamp <phk@FreeBSD.org>

Rename g_call_me() to g_post_event(), and give it a flag
argument to determine if we can M_WAITOK in malloc.


# b5cba416 23-Apr-2003 Poul-Henning Kamp <phk@FreeBSD.org>

Move the shutdown eventhandler stuff to a more logical place.


# afcbcfae 02-Apr-2003 Poul-Henning Kamp <phk@FreeBSD.org>

Change events to have an array of "void *" references, and give the
event posting functions varargs to fill these.

Attribute g_call_me() to appropriate g_geom's where necessary.

Add a flag argument to g_call_me() methods which will be used to signal
cancellation of events in the future.

This commit should be a no-op.


# d49d7ca5 24-Mar-2003 Poul-Henning Kamp <phk@FreeBSD.org>

Turn /dev/geom.ctl from a GEOM class into a plain character device driver
instead, it will never see a disk-I/O transaction, so this is a lot simpler.


# e24cbd90 18-Mar-2003 Poul-Henning Kamp <phk@FreeBSD.org>

Retire the GEOM private statistics code and use devstat instead.


# f0e185d7 11-Feb-2003 Poul-Henning Kamp <phk@FreeBSD.org>

Implement a bio-taskqueue to reduce number of context switches in
disk I/O processing.

The intent is that the disk driver in its hardware interrupt
routine will simply schedule the bio on the task queue with
a routine to finish off whatever needs done.

The g_up thread will then schedule this routine, the likely
outcome of which is a biodone() which queues the bio on
g_up's regular queue where it will be picked up and processed.

Compared to using the regular taskqueue, this saves one
context switch.

Change our scheduling of the g_up and g_down queues to be water-tight,
at the cost of breaking the userland regression test-shims.

Input and ideas from: scottl


# dc803e37 11-Feb-2003 Poul-Henning Kamp <phk@FreeBSD.org>

Remove another printf which does not say anything we didn't already know.


# cce7303a 09-Feb-2003 Poul-Henning Kamp <phk@FreeBSD.org>

Update the statistics collection code to track busy time instead of
idle time.

Statistics now default to "on" and can be turned off with
sysctl kern.geom.collectstats=0

Performance impact of statistics collection is on the order of
800 nsec per consumer/provider set on a 700MHz Athlon.


# 4ec35300 08-Feb-2003 Poul-Henning Kamp <phk@FreeBSD.org>

Move the g_stat struct to its own .h file, we will export it to other code.

Instead of embedding a struct g_stat in consumers and providers, merely
include a pointer.

Remove a couple of <sys/time.h> includes now unneeded.

Add a special allocator for struct g_stat. This allocator will allocate
entire pages and hand out g_stat structures from there. The "id" field
indicates free/used status.

Add "/dev/geom.stats" device driver which exports the pages from the
allocator to userland with mmap(2) in read-only mode.

This mmap(2) interface should be considered a non-public interface and
the functions in libgeom (not yet committed) should be used to access
the statistics data.
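
A page-at-a-time allocator that uses an "id" field as the free/used marker
might look roughly like the userland sketch below. The structure layout, the
4096-byte page size, and all ex_* names are assumptions for illustration; the
commit does not show the real struct g_stat or its allocator.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define EX_PAGE_SIZE	4096

/* Hypothetical statistics record; id == NULL marks a free slot. */
struct ex_stat {
	void		*id;
	unsigned long	 nop;
	unsigned long	 nend;
};

#define EX_PER_PAGE	(EX_PAGE_SIZE / sizeof(struct ex_stat))

/* One zero-filled page worth of slots, chained to the other pages. */
struct ex_page {
	struct ex_page	*next;
	struct ex_stat	*slots;
};

static struct ex_page *ex_pages;

/* Hand out the first free slot, adding a whole new page when none is free. */
struct ex_stat *
ex_stat_new(void *id)
{
	struct ex_page *pg;
	size_t i;

	if (id == NULL)
		return (NULL);
	for (pg = ex_pages; pg != NULL; pg = pg->next) {
		for (i = 0; i < EX_PER_PAGE; i++) {
			if (pg->slots[i].id == NULL) {
				pg->slots[i].id = id;
				return (&pg->slots[i]);
			}
		}
	}
	pg = malloc(sizeof(*pg));
	if (pg == NULL)
		return (NULL);
	pg->slots = calloc(EX_PER_PAGE, sizeof(struct ex_stat));
	if (pg->slots == NULL) {
		free(pg);
		return (NULL);
	}
	pg->next = ex_pages;
	ex_pages = pg;
	pg->slots[0].id = id;
	return (&pg->slots[0]);
}

/* Release a slot by clearing it, which resets id to NULL (free). */
void
ex_stat_delete(struct ex_stat *st)
{
	memset(st, 0, sizeof(*st));
}
```

Keeping the records packed in whole pages is what would let a device driver
export them to userland page by page, and a reader can skip any slot whose id
is NULL.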


# 801bb689 07-Feb-2003 Poul-Henning Kamp <phk@FreeBSD.org>

Commit the correct copy of the g_stat structure.

Add debug.sizeof.g_stat sysctl.

Set the id field of the g_stat when we create consumers and providers.

Remove biocount from consumer, we will use the counters in the g_stat
structure instead. Replace one field which will need to be atomically
manipulated with two fields which will not (stat.nop and stat.nend).

Add a companion field to bio_children: bio_inbed, for the exact
same reason.

Don't output the biocount in the confdot output.

Fix KASSERT in g_io_request().

Add sysctl kern.geom.collectstats defaulting to off.

Collect the following raw statistics conditioned on this sysctl:

for each consumer and provider {
total number of operations started.
total number of operations completed.
time last operation completed.
sum of idle-time.
for each of BIO_READ, BIO_WRITE and BIO_DELETE {
number of operations completed.
number of bytes completed.
number of ENOMEM errors.
number of other errors.
sum of transaction time.
}
}

API for getting hold of these statistics data not included yet.
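
The replacement of one atomically updated counter with two plain ones
(stat.nop and stat.nend) can be illustrated as follows. This sketch assumes
each counter has a single writer (one side only starts operations, the other
only completes them); that reading and the ex_* names are assumptions, not
something stated in the commit.

```c
/* Hypothetical pair of counters standing in for stat.nop and stat.nend. */
struct ex_counters {
	unsigned long	nop;	/* operations started */
	unsigned long	nend;	/* operations completed */
};

/* Called only on the submission side. */
void
ex_op_start(struct ex_counters *c)
{
	c->nop++;
}

/* Called only on the completion side. */
void
ex_op_end(struct ex_counters *c)
{
	c->nend++;
}

/* Operations in flight are the difference; no shared read-modify-write. */
unsigned long
ex_in_flight(const struct ex_counters *c)
{
	return (c->nop - c->nend);
}
```

A single "outstanding" count would be incremented on one path and decremented
on the other, which is why it would need atomic updates; splitting it means
neither field is modified from both paths.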


# 886d1942 07-Feb-2003 Poul-Henning Kamp <phk@FreeBSD.org>

Fix some sleep strings to make more sense.


# 9693da43 27-Dec-2002 Poul-Henning Kamp <phk@FreeBSD.org>

Remove the "ascii" attribute from the sysctls so that "sysctl -a" will
skip them.


# 3ff81a4c 26-Dec-2002 Poul-Henning Kamp <phk@FreeBSD.org>

Use a mutex assert to document our locking circumstances.


# f03692cb 01-Dec-2002 Poul-Henning Kamp <phk@FreeBSD.org>

Fix a cut&past-o.

Spotted by: yar
Approved by: re (blanket)


# d518e539 28-Oct-2002 Poul-Henning Kamp <phk@FreeBSD.org>

Add the remaining part of the new libdisk interaction.

WARNING: This is not a published interface, it is a stopgap measure for
WARNING: libdisk so we can get 5.0-R out of the door.

Sponsored by: DARPA & NAI Labs


# 3d5500fc 25-Oct-2002 Poul-Henning Kamp <phk@FreeBSD.org>

Reduce the GEOM verbosity under bootverbose to something more sufferable.
This is not quite the set of information I would want, but the tree where
I have the "correct" version is messed up with conflicts.

Sponsored by: DARPA & NAI Labs.


# 6adb7488 20-Oct-2002 Poul-Henning Kamp <phk@FreeBSD.org>

No need to specify CTLTYPE_INT when we use SYSCTL_INT.


# 37e7c03d 17-Oct-2002 Poul-Henning Kamp <phk@FreeBSD.org>

Be consistent and return the NUL at the end of kern.geom.conf{xml,dot}.

Spotted by: sam


# 2874f1cf 04-Oct-2002 Poul-Henning Kamp <phk@FreeBSD.org>

Properly isolate the locking domains of sysctl from the topology lock
for the sysctls which report the configuration.

Sponsored by: DARPA & NAI Labs.


# 29c21195 02-Oct-2002 Poul-Henning Kamp <phk@FreeBSD.org>

Move GEOM's sysctls under kern.geom.

Sponsored by: DARPA & NAI Labs.


# 079a527a 28-Sep-2002 Poul-Henning Kamp <phk@FreeBSD.org>

Zero the local-variable mutexes before we call mtx_init() on them;
failing to do this may lead mtx_init() to believe they have already
been initialized.

Detected by: Marc Recht <marc@informatik.uni-bremen.de>


# 4ae67700 28-Sep-2002 Poul-Henning Kamp <phk@FreeBSD.org>

Style, whitespace and lint fixes.

Sponsored by: DARPA & NAI Labs.


# b1937dd1 27-Sep-2002 Poul-Henning Kamp <phk@FreeBSD.org>

Make the UP/DOWN threads hold on to their own private mutex while doing
work.

This prevents people from sleeping in the UP/DOWN I/O path by mistake
or design (doing so almost invariably results in deadlocks since it
stalls all I/O processing in the given direction).

Sponsored by: DARPA & NAI Labs.


# 9169e800 27-Sep-2002 Poul-Henning Kamp <phk@FreeBSD.org>

Various no-ops:

Add a __unused.

Make the 2-byte decoder functions return 16 bits for the benefit
of picky lints.

No need to grab giant around a tsleep() when we have a timeout.

Sponsored by: DARPA & NAI Labs.


# f04af827 29-Jun-2002 Julian Elischer <julian@FreeBSD.org>

Don't use the static thread.. it is going away.


# b1876192 26-Mar-2002 Poul-Henning Kamp <phk@FreeBSD.org>

Eliminate some thread pointers which do not make sense anymore.

Split private parts of geom.h into geom_int.h. The latter should
never be included in class implementations.


# e805e8f0 26-Mar-2002 Poul-Henning Kamp <phk@FreeBSD.org>

Cave in to tradition and rename "methods" to "classes".


# dd84a43c 11-Mar-2002 Poul-Henning Kamp <phk@FreeBSD.org>

First commit of the GEOM subsystem to make it easier for people to
test and play with this.

This is not yet production quality and should be run only on dedicated
test boxes.

For people who want to develop transformations for GEOM there exists a
set of shims to run geom in userland (ask phk@freebsd.org).

Reports of all kinds to: phk@freebsd.org
Please include in report:
dmesg
sysctl debug.geomdot
sysctl debug.geomconf

Known significant limitations:
no kernel dump facility.
ioctls severely restricted.

Sponsored by: DARPA, NAI Labs