History log of /freebsd-current/sys/kern/kern_sysctl.c
Revision Date Author Comments
# d5eae570 01-May-2024 Mark Johnston <markj@FreeBSD.org>

sysctl: Make sysctl_ctx_free() a bit safer

Clear the list before returning so that sysctl_ctx_free() can be called
more than once on the same list without side effects. This simplifies
error handling in drivers; previously, drivers would have to be careful
to call sysctl_ctx_free() at most once to avoid a use-after-free.

While here, use TAILQ_FOREACH_SAFE in the loop which unregisters OIDs.

Reviewed by: thj, emaste
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D45041


# 29363fb4 23-Nov-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove ancient SCCS tags.

Remove ancient SCCS tags from the tree, automated scripting, with two
minor fixup to keep things compiling. All the common forms in the tree
were removed with a perl script.

Sponsored by: Netflix


# 0a713948 22-Nov-2023 Alexander Motin <mav@FreeBSD.org>

Replace random sbuf_printf() with cheaper cat/putc.


# f80babf9 22-Sep-2023 Alexander Motin <mav@FreeBSD.org>

kern_sysctl: Make name2oid() non-destructive to the name

It is not the first time I see it panicking while trying to modify
const memory. Lets make it safer and easier to use. While there,
mark few functions using it also const.

MFC after: 10 days


# cf7974fd 20-Sep-2023 Zhenlei Huang <zlei@FreeBSD.org>

sysctl: Update 'master' copy of vnet SYSCTLs on kernel environment variables change

Complete phase three of 3da1cf1e88f8.

With commit 110113bc086f, vnet sysctl variables can be loader tunable
but the feature is limited. When the kernel modules have been initialized,
any changes (e.g. via kenv) to kernel environment variable will not affect
subsequently created VNETs.

This change relexes the limitation by listening on kernel environment
variable's set / unset events, and then update the 'master' copy of vnet
SYSCTL or restore it to its initial value.

With this change, TUNABLE_XXX_FETCH can be greately eliminated for vnet
loader tunables.

Reviewed by: glebius
Fixes: 110113bc086f sysctl(9): Enable vnet sysctl variables to be loader tunable
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D41825


# 110113bc 09-Sep-2023 Zhenlei Huang <zlei@FreeBSD.org>

sysctl(9): Enable vnet sysctl variables to be loader tunable

Complete phase two of 3da1cf1e88f8.

In 3da1cf1e88f8, the meaning of the flag CTLFLAG_TUN is extended to
automatically check if there is a kernel environment variable which
shall initialize the SYSCTL during early boot. It works for all SYSCTL
types both statically and dynamically created ones, except for the
SYSCTLs which belong to VNETs.

This change extends the meaning further, to allow it also works for
the SYSCTLs which belong to VNETs. A typical usage is
```
VNET_DEFINE_STATIC(int, foo) = 0;
SYSCTL_INT(_net, OID_AUTO, foo, CTLFLAG_RWTUN | CTLFLAG_VNET,
&VNET_NAME(foo), 0, "Description of the foo loader tunable");
```

Note that the implementation has a limitation. It behaves the same way
as that of non-vnet loader tunables. That is, after the kernel or modules
being initialized, any changes (e.g. via kenv) to kernel environment
variable will not affect the corresponding vnet variable of subsequently
created VNETs. To overcome it, we can use TUNABLE_XXX_FETCH to fetch
the kernel environment variable into those vnet variables during vnet
constructing.

This change will fix the following SYSCTLs those belong to VNETs and
have CTLFLAG_TUN flag:
```
net.add_addr_allfibs
net.bpf.optimize_writers
net.inet.tcp.fastopen.ccache_buckets
net.link.bridge.inherit_mac
net.link.bridge.ipfw_arp
net.link.bridge.log_stp
net.link.bridge.pfil_bridge
net.link.bridge.pfil_local_phys
net.link.bridge.pfil_member
net.link.bridge.pfil_onlyip
net.link.lagg.default_use_flowid
net.link.lagg.default_use_numa
net.link.lagg.default_flowid_shift
net.link.lagg.lacp.debug
net.link.lagg.lacp.default_strict_mode
```

Although the following vnet SYSCTLs have CTLFLAG_TUN flag, theirs
values are re-fetched via TUNABLE_XXX_FETCH, thus are not affected
by this change.
```
net.inet.ip.reass_hashsize
net.inet.tcp.hostcache.cachelimit
net.inet.tcp.hostcache.hashsize
net.inet.tcp.hostcache.bucketlimit
net.inet.tcp.syncache.bucketlimit
net.inet.tcp.syncache.cachelimit
net.inet.tcp.syncache.hashsize
net.key.spdcache.maxentries
net.key.spdcache.threshold
```

In memoriam: hselasky
Discussed with: hselasky, glebius
Fixes: 3da1cf1e88f8 Extend the meaning of the CTLFLAG_TUN flag ...
MFC after: 2 weeks
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D39638


# 685dc743 16-Aug-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove $FreeBSD$: one-line .c pattern

Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/


# 105e397e 17-Apr-2023 Gordon Bergling <gbe@FreeBSD.org>

kern_sysctl: Remove double words in source code comments

- s/on on/on/

MFC after: 5 days


# f394d9c0 27-Jan-2023 Gleb Smirnoff <glebius@FreeBSD.org>

sysctl: use correct types and names in sysctl_*sec_to_sbintime

The functions are intended to report kernel variables that are
stored as sbintime_t (pointed to by arg1) as human readable
nanoseconds or milliseconds (reported via sysctl_handle_64).
The variable types and names were reversed. I guess there is
no functional change here, as all types flipped around were
signed 64. Note that these function aren't used yet anywhere
in the kernel.

Reviewed by: mav
Differential revision: https://reviews.freebsd.org/D38217


# e5f93d10 30-Sep-2022 Doug Moore <dougm@FreeBSD.org>

show_sysctl_all: reduce copying, please coverity

Modify db_show_sysctl_all so that it does not copy more than once the
data of the input oid, and so that what it passes to db_show_oid does
not alarm coverity.

Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D36847


# 5294bfa7 28-Sep-2022 Doug Moore <dougm@FreeBSD.org>

sysctl_search_oid: remove all-NULL precondition

The implementation of sysctl_search_oid no longer relies on the
initial value of nodes to be all NULL, so remove the comment that
demands it and let the caller stop enforcing it.

Reviewed by: hselasky
Differential Revision: https://reviews.freebsd.org/D36768


# 9f6f9007 27-Sep-2022 Doug Moore <dougm@FreeBSD.org>

name2oid: use find_oidname

In name2oid, use sysctl _find_oidname instead of re-implementing it.
Reviewed by: hselasky
Differential Revision: https://reviews.freebsd.org/D36765


# e96ae5cb 27-Sep-2022 Doug Moore <dougm@FreeBSD.org>

sysctl_search_oid: remove useless tests

sysctl_search_old makes several tests in a loop that can be removed.

The first test in the loop is only ever true on the first loop
iteration, and is always true on that iteration, so its work can be
done before the loop begins.

The upper and lower bounds on the loop variable 'indx' are each tested
on each iteration, but 'indx' is changed in one direction or the other
only once within the loop, so only one bound needs to be checked.

Two ways remain in the loop that nodes[indx] can change (after one of
them is put before the loop start), and one of them applies exactly
when indx has been incremented, so no separate test for that case
requires testing.

Restructure and add comments that makes clearer that this is a basic
depth-first search.

Reviewed by: hselasky
Differential Revision: https://reviews.freebsd.org/D36741


# ed518345 26-Sep-2022 Doug Moore <dougm@FreeBSD.org>

register_oid: fix duplicate oid after d3f96f661050

sysctl_register_oid must check the uniqueness of any newly computed
oid_number in sysctl_register_oid.

Reviewed by: asomers
MFC with: d3f96f661050
Differential Revision: https://reviews.freebsd.org/D36743


# c075ea46 27-Sep-2022 Hans Petter Selasky <hselasky@FreeBSD.org>

sysctl(3): Implement SYSCTL_FOREACH() to iterate all OIDs in a sysctl list.

To avoid using the sysctl list macros directly in external kernel modules.

Reviewed by: asomers, manu and asiciliano
Differential Revision: https://reviews.freebsd.org/D36748
MFC after: 1 week
Sponsored by: NVIDIA Networking


# d3f96f66 07-Sep-2022 Alan Somers <asomers@FreeBSD.org>

Fix O(n^2) behavior in sysctl

Sysctl OIDs were internally stored in linked lists, triggering O(n^2)
behavior when userland iterates over many of them. The slowdown is
noticeable for MIBs that have > 100 children (for example, vm.uma). But
it's unignorable for kstat.zfs when a pool has > 1000 datasets.

Convert the linked lists into RB trees. This produces a ~25x speedup
for listing kstat.zfs with 4100 datasets, and no measurable penalty for
small dataset counts.

Bump __FreeBSD_version for the KPI change.

Sponsored by: Axcient
Reviewed by: mjg
Differential Revision: https://reviews.freebsd.org/D36500


# 258958b3 05-Jul-2022 Mitchell Horne <mhorne@FreeBSD.org>

ddb: use _FLAGS command macros where appropriate

Some command definitions were forced to use DB_FUNC in order to specify
their required flags, CS_OWN or CS_MORE. Use the new macros to simplify
these.

Reviewed by: markj, jhb
MFC after: 3 days
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D35582


# d8bd949b 22-Nov-2021 Brooks Davis <brooks@FreeBSD.org>

sys___sysctl: regularize argument struct

Let makesyscalls generate the normal struct __sysctl_args structure.
It works fine.

Reviewed by: kib


# 6c950655 21-Jul-2021 Alan Somers <asomers@FreeBSD.org>

Escape any '.' characters in sysctl node names

ZFS creates some sysctl nodes that include a pool name, and '.' is an
allowed character in pool names. But it's the separator in the sysctl
tree, so it can't be included in a sysctl name. Replace it with "%25".
Handily, "%" is illegal in ZFS pool names, so there's no ambiguity
there.

PR: 257316
MFC after: 3 weeks
Sponsored by: Axcient
Reviewed by: freqlabs
Differential Revision: https://reviews.freebsd.org/D31265


# 4342ba18 18-Apr-2021 Konstantin Belousov <kib@FreeBSD.org>

sysctl_handle_string: do not malloc when SYSCTL_IN cannot fault

In particular, this avoids malloc(9) calls when from early tunable handling,
with no working malloc yet.

Reported and tested by: mav
Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# 571a1a64 18-Apr-2021 Warner Losh <imp@FreeBSD.org>

Minor style tidy: if( -> if (

Fix a few 'if(' to be 'if (' in a few places, per style(9) and
overwhelming usage in the rest of the kernel / tree.

MFC After: 3 days
Sponsored by: Netflix


# 8db8bebf 30-Nov-2020 Alexander V. Chernikov <melifaro@FreeBSD.org>

Move inner loop logic out of sysctl_sysctl_next_ls().

Refactor sysctl_sysctl_next_ls():
* Move huge inner loop out of sysctl_sysctl_next_ls() into a separate
non-recursive function, returning the next step to be taken.
* Update resulting node oid parts only on successful lookup
* Make sysctl_sysctl_next_ls() return boolean success/failure instead of errno,
slightly simplifying logic

Reviewed by: freqlabs
Differential Revision: https://reviews.freebsd.org/D27029


# e58483c4 24-Oct-2020 Ryan Moeller <freqlabs@FreeBSD.org>

sysctl+kern_sysctl: Honor SKIP for descendant nodes

Ensure we also skip descendants of SKIP nodes when iterating through children
of an explicitly specified node.

Reported by: np
Reviewed by: np
MFC after: 1 week
Sponsored by: iXsystems, Inc.
Differential Revision: https://reviews.freebsd.org/D26833


# 0595c124 24-Oct-2020 Ryan Moeller <freqlabs@FreeBSD.org>

kern_sysctl: Misc code cleanup

Remove unused oidpp parameter from sysctl_sysctl_next_ls and
add high level comments to describe how it works.

No functional change.

Reviewed by: imp
MFC after: 1 week
Sponsored by: iXsystems, Inc.
Differential Revision: https://reviews.freebsd.org/D26854


# 92e17803 05-Oct-2020 Ryan Moeller <freqlabs@FreeBSD.org>

Enable iterating all sysctls, even ones with CTLFLAG_SKIP

Add an "nextnoskip" sysctl that allows for listing of sysctls intended to be
normally skipped for cost reasons.

This makes it so the names/descriptions of those sysctls can be discovered with
sysctl -aN/sysctl -ad/sysctl -at.

It also makes it so children are visited when a node flagged with CTLFLAG_SKIP
is explicitly requested.

The intended use case is to mark the root "kstat" node with CTLFLAG_SKIP so that
the extensive and expensive stats are skipped by default but may still be easily
obtained without having to know them all (which may not even be possible) and
request each one-by-one.

Reviewed by: jhb
MFC after: 2 weeks
Relnotes: yes
Sponsored by: iXsystems, Inc.
Differential Revision: https://reviews.freebsd.org/D26560


# 6fed89b1 01-Sep-2020 Mateusz Guzik <mjg@FreeBSD.org>

kern: clean up empty lines in .c and .h files


# 19112958 16-Apr-2020 Brooks Davis <brooks@FreeBSD.org>

style(9): end continued line with operator.


# f65eac0f 15-Apr-2020 Pawel Biernacki <kaktus@FreeBSD.org>

sysctl_handle_string: Put logical or in parentheses.

Reported by: rdivacky
Approved by: kib (mentor)
Pointy-hat to: kaktus


# 1627b1fd 15-Apr-2020 Pawel Biernacki <kaktus@FreeBSD.org>

sysctl(9): fix handling string tunables.

r357614 changed internals of handling string sysctls, and inadvertently
broke setting string tunables. Take them into account.

PR: 245463
Reported by: jhb, np
Reviewed by: imp, jhb, kib
Approved by: kib (mentor)
Differential Revision: https://reviews.freebsd.org/D24429


# 37d4ece7 10-Feb-2020 Li-Wen Hsu <lwhsu@FreeBSD.org>

Restore the behavior of allowing empty string in a string sysctl

Added as a special case to avoid unnecessary memory operations.

Reviewed by: delphij
Sponsored by: The FreeBSD Foundation


# 210176ad 05-Feb-2020 Pawel Biernacki <kaktus@FreeBSD.org>

sysctl(9): add CTLFLAG_NEEDGIANT flag

Add CTLFLAG_NEEDGIANT flag (modelled after D_NEEDGIANT) that will be used to
mark sysctls that still require locking Giant.

Rewrite sysctl_handle_string() to use internal locking instead of locking
Giant.

Mark SYSCTL_STRING, SYSCTL_OPAQUE and their variants as MPSAFE.

Add infrastructure support for enforcing proper use of CTLFLAG_NEEDGIANT
and CTLFLAG_MPSAFE flags with SYSCTL_PROC and SYSCTL_NODE, not enabled yet.

Reviewed by: kib (mentor)
Approved by: kib (mentor)
Differential Revision: https://reviews.freebsd.org/D23378


# 3ff65f71 30-Jan-2020 Mateusz Guzik <mjg@FreeBSD.org>

Remove duplicated empty lines from kern/*.c

No functional changes.


# 91c4b68f 06-Jan-2020 Pawel Biernacki <kaktus@FreeBSD.org>

kern_sysctl: make sysctl.debug work as intended

r136999 introduced SYSTCL_DEBUG but apparently "opt_sysctl.h" was never
included making the option ignored.

r322954 introduced sysctl.reuse_test with OID number equal to 0, effectively
shadowing the very special sysctl.debug one. Use OID_AUTO as it doesn't need
any special treatment.

Reviewed by: kib (mentor)
Approved by: kib (mentor)
Differential Revision: https://reviews.freebsd.org/D23056


# 43cefe8b 25-Nov-2019 Ryan Libby <rlibby@FreeBSD.org>

sysctl sysctls: wire old buf before output with sysctl lock

Several sysctl sysctls output to a user buffer while holding a
non-sleepable lock that protects the sysctl topology. They need to wire
the output buffer, or else they may try to sleep on a page fault.

Reviewed by: cem, markj
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D22528


# 382e01c8 18-Sep-2019 Konstantin Belousov <kib@FreeBSD.org>

sysctl: use names instead of magic numbers.

Replace magic numbers with symbols for internal sysctl operations.
Convert in-kernel and libc consumers.

Submitted by: Pawel Biernacki
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D21693


# d05b53e0 02-Sep-2019 Mateusz Guzik <mjg@FreeBSD.org>

Add sysctlbyname system call

Previously userspace would issue one syscall to resolve the sysctl and then
another one to actually use it. Do it all in one trip.

Fallback is provided in case newer libc happens to be running on an older
kernel.

Submitted by: Pawel Biernacki
Reported by: kib, brooks
Differential Revision: https://reviews.freebsd.org/D17282


# 7d0658ad 07-Aug-2019 Conrad Meyer <cem@FreeBSD.org>

Fix !DDB kernel configurations after r350713

KDB is standard and the kdb_active variable is always available. So,
de-conditionalize inclusion of sys/kdb.h in kern_sysctl.c.

Reported by: Michael Butler <imb AT protected-networks.net>
X-MFC-With: r350713
Sponsored by: Dell EMC Isilon


# 088c17b4 07-Aug-2019 Conrad Meyer <cem@FreeBSD.org>

ddb(4): Add 'sysctl' command

Implement `sysctl` in `ddb` by overriding `SYSCTL_OUT`. When handling the
req, we install custom ddb in/out handlers. The out handler prints straight
to the debugger, while the in handler ignores all input. This is intended
to allow us to print just about any sysctl.

There is a known issue when used from ddb(4) entered via 'sysctl
debug.kdb.enter=1'. The DDB mode does not quite prevent all lock
interactions, and it is possible for the recursive Giant lock to be unlocked
when the ddb(4) 'sysctl' command is used. This may result in a panic on
return from ddb(4) via 'c' (continue). Obviously, this is not a problem
when debugging already-paniced systems.

Submitted by: Travis Lane (formerly: <travis.lane AT isilon.com>)
Reviewed by: vangyzen (earlier version), Don Morris <dgmorris AT earthlink.net>
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D20219


# 0f702183 11-Jun-2019 John Baldwin <jhb@FreeBSD.org>

Make the warning intervals for deprecated crypto algorithms tunable.

New sysctl/tunables can now set the interval (in seconds) between
rate-limited crypto warnings. The new sysctls are:
- kern.cryptodev_warn_interval for /dev/crypto
- net.inet.ipsec.crypto_warn_interval for IPsec
- kern.kgssapi_warn_interval for KGSSAPI

Reviewed by: cem
MFC after: 1 month
Relnotes: yes
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D20555


# 7d7db529 07-May-2019 Conrad Meyer <cem@FreeBSD.org>

device_printf: Use sbuf for more coherent prints on SMP

device_printf does multiple calls to printf allowing other console messages to
be inserted between the device name, and the rest of the message. This change
uses sbuf to compose to two into a single buffer, and prints it all at once.

It exposes an sbuf drain function (drain-to-printf) for common use.

Update documentation to match; some unit tests included.

Submitted by: jmg
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D16690


# 10f7b12c 17-Dec-2018 Brooks Davis <brooks@FreeBSD.org>

const poison the `new` pointer of __sysctl.

Reviewed by: kib
Obtained from: CheriBSD
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D18444


# 90acd1d1 19-Nov-2018 Hans Petter Selasky <hselasky@FreeBSD.org>

Minor code factoring. No functional change.

MFC after: 1 week
Sponsored by: Mellanox Technologies


# 2205f61a 19-Nov-2018 Hans Petter Selasky <hselasky@FreeBSD.org>

Be more verbose when a sysctl fails to unregister.
Print name of sysctl in question.

MFC after: 1 week
Sponsored by: Mellanox Technologies


# 36173f69 15-Nov-2018 Warner Losh <imp@FreeBSD.org>

Do proper conversion to/from sbt.

Doh! sbttoX and Xtosbt were backwards. While they ran, they produced
bogus results.

Pointy hat to: imp@


# 003ffd57 02-Nov-2018 Warner Losh <imp@FreeBSD.org>

Add sysctl_usec_to_sbintime and sysctl_msec_to_sbintime.

These functions are used to present a sbintime_t as either a number of
microseconds or a number of milliseconds respectively.

Sponsored by: Netflix


# ce70c572 20-Jun-2018 Hans Petter Selasky <hselasky@FreeBSD.org>

Permit the kernel environment to set an array of numeric values for a single
sysctl(9) node.

Reviewed by: kib@, imp@, jhb@
Differential Revision: https://reviews.freebsd.org/D15802
MFC after: 1 week
Sponsored by: Mellanox Technologies


# 891cf3ed 18-May-2018 Ed Maste <emaste@FreeBSD.org>

Use NULL for SYSINIT's last arg, which is a pointer type

Sponsored by: The FreeBSD Foundation


# 6469bdcd 06-Apr-2018 Brooks Davis <brooks@FreeBSD.org>

Move most of the contents of opt_compat.h to opt_global.h.

opt_compat.h is mentioned in nearly 180 files. In-progress network
driver compabibility improvements may add over 100 more so this is
closer to "just about everywhere" than "only some files" per the
guidance in sys/conf/options.

Keep COMPAT_LINUX32 in opt_compat.h as it is confined to a subset of
sys/compat/linux/*.c. A fake _COMPAT_LINUX option ensure opt_compat.h
is created on all architectures.

Move COMPAT_LINUXKPI to opt_dontuse.h as it is only used to control the
set of compiled files.

Reviewed by: kib, cem, jhb, jtl
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D14941


# 64282a12 01-Feb-2018 Hans Petter Selasky <hselasky@FreeBSD.org>

Slightly bump the maximum OID path for loading tunable SYSCTLs.

Coming updates to the mlx5en(4) driver will require this.

MFC after: 1 week
Sponsored by: Mellanox Technologies


# 51369649 20-Nov-2017 Pedro F. Giffuni <pfg@FreeBSD.org>

sys: further adoption of SPDX licensing ID tags.

Mainly focus on files that use BSD 3-Clause license.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.


# 8b9817a4 11-Nov-2017 Mateusz Guzik <mjg@FreeBSD.org>

sysctl: try to avoid malloc in name2oid

name2oid is called all the time and passed names are almost always very short
(< 16 characters).


# 9b8de76b 18-Oct-2017 Mateusz Guzik <mjg@FreeBSD.org>

sysctl: only take mem lock if oldlen is > 4 * PAGE_SIZE

The previous limit of just one page is hit by ps.

The entire mechanism should be reworked, if not whacked. It seems the intent
is to reduce kernel dos-ability - some handlers wire the amount of memory
passed here. Handlers should probably stop wiring in the first place or in
the worst case indicate they are doing so so that the check is done only if
necessary. It should also probably be a counter, not a lock.

MFC after: 1 week


# 693593b6 04-Oct-2017 Andriy Gapon <avg@FreeBSD.org>

sysctl-s in a module should be accessible only when the module is initialized

A sysctl can have a custom handler that may access data that is initialized
via SYSINIT(9) or via a module event handler (also invoked via SYSINIT).
Thus, it is not safe to allow access to the module's sysctl-s until
the initialization is performed. Likewise, we should not allow access
to teh sysctl-s after the module is uninitialized.
The latter is easy to achieve by properly ordering linker_file_unregister_sysctls
and linker_file_sysuninit.
The former is not as easy for two reasons:
- the initialization may depend on tunables which get set when sysctl-s are
registered, so we need to set the tunables before running sysinit-s
- the initialization may try to dynamically add more sysctl-s under statically
defined sysctl nodes
So, this change splits the sysctl setup into two phases. In the first phase
the sysctl-s are registered as before but they are disabled and hidden from
consumers. In the second phase, done after sysinit-s, normal access to the
sysctl-s is enabled.

The change should affect only dynamic module loading and unloading after
the system boot-up. Nothing changes for sysctl-s compiled into the kernel
and sysctl-s in preloaded modules.

Discussed with: hselasky, ian, jhb
Reviewed by: julian, kib
MFC after: 2 weeks
Sponsored by: Panzura
Differential Revision: https://reviews.freebsd.org/D12545


# 550374ef 01-Oct-2017 Andriy Gapon <avg@FreeBSD.org>

revert r324166, it has an unrelated change in it


# a79d52d7 26-Sep-2017 Mateusz Guzik <mjg@FreeBSD.org>

sysctl: remove target buffer read/write checks prior to calling the handler

Said checks were inherently racy anyway as jokers could unmap target areas
before the handler got around to accessing them.

This saves time by avoiding locking the address space.

MFC after: 1 week


# 956713cb 26-Sep-2017 Mateusz Guzik <mjg@FreeBSD.org>

Annotate sysctlmemlock with __exclusive_cache_line.

MFC after: 1 week


# 4ae2ade1 27-Aug-2017 Conrad Meyer <cem@FreeBSD.org>

Enhance debugibility of sysctl leaf re-use warnings

Print the full conflicting oid path, and include the function name in the
warning so it is clear that the warnings are sysctl-related.

PR: 221853
Submitted by: Fabian Keil <fk AT fabiankeil.de> (earlier version)
Sponsored by: Dell EMC Isilon


# 8b69c3e7 21-Mar-2017 Enji Cooper <ngie@FreeBSD.org>

Print out name of non-dynamic sysctl in sysctl_remove_oid_locked

This will provide a slightly better smoking gun than just stating
"can't remove non-dynamic nodes!" when calling sysctl_ctx_free(9)
and sysctl_remove_{name,oid}(9) with a non-dynamic (likely
static) sysctl.

MFC after: 1 week
Sponsored by: Dell EMC Isilon


# 669a25b5 15-Dec-2016 Ed Schouten <ed@FreeBSD.org>

Document the existence of the {0, 6, ...} sysctl.


# 1e1f3941 13-Dec-2016 Ed Schouten <ed@FreeBSD.org>

Add support for attaching aggregation labels to sysctl objects.

I'm currently working on writing a metrics exporter for the Prometheus
monitoring system to provide access to sysctl metrics. Prometheus and
sysctl have some structural differences:

- sysctl is a tree of string component names.
- Prometheus uses a flat namespace for its metrics, but allows you to
attach labels with values to them, so that you can do aggregation.

An initial version of my exporter simply translated

hw.acpi.thermal.tz1.temperature

to

sysctl_hw_acpi_thermal_tz1_temperature_celcius

while we should ideally have

sysctl_hw_acpi_thermal_temperature_celcius{thermal_zone="tz1"}

allowing you to graph all thermal zones on a system in one go.

The change presented in this commit adds support for accomplishing this,
by providing the ability to attach labels to nodes. In the example I
gave above, the label "thermal_zone" would be attached to "tz1". As this
is a feature that will only be used very rarely, I decided to not change
the KPI too aggressively.

Discussed on: hackers@
Reviewed by: cem
Differential Revision: https://reviews.freebsd.org/D8775


# 69a28758 15-Sep-2016 Ed Maste <emaste@FreeBSD.org>

Renumber license clauses in sys/kern to avoid skipping #3


# 84e717c4 26-May-2016 Hans Petter Selasky <hselasky@FreeBSD.org>

Add support for boolean sysctl's.

Because the size of bool can be implementation defined, make a bool
sysctl handler which handle bools. Userspace sees the bools like
unsigned 8-bit integers. Values are filtered to either 1 or 0 upon
read and write, similar to what a compiler would do.

Requested by: kmacy @
Sponsored by: Mellanox Technologies


# e3043798 29-Apr-2016 Pedro F. Giffuni <pfg@FreeBSD.org>

sys/kern: spelling fixes in comments.

No functional change.


# b85f65af 15-Apr-2016 Pedro F. Giffuni <pfg@FreeBSD.org>

kern: for pointers replace 0 with NULL.

These are mostly cosmetical, no functional change.

Found with devel/coccinelle.


# 87b87bb8 25-Jan-2016 Mark Johnston <markj@FreeBSD.org>

Evaluate the sysctl_running fail point before taking the sysctl lock.

The fail point handler may sleep, but this is not permitted while holding a
rm read lock.

MFC after: 1 week
Sponsored by: EMC / Isilon Storage Division


# e072f955 07-Nov-2015 Conrad Meyer <cem@FreeBSD.org>

Flesh out sysctl types further (follow-up of r290475)

Use the right intmax_t type instead of intptr_t in a few remaining
places.

Add support for CTLFLAG_TUN for the new fixed with types. Bruce will be
upset that the new handlers silently truncate tuned quad-sized inputs,
but so do all of the existing handlers.

Add the new types to debug_dump_node, for whatever use that is.

Bump FreeBSD_version again, for good measure. We are changing
SYSCTL_HANDLER_ARGS and a member of struct sysctl_oid to intmax_t.

Correct the sysctl typed NULL values for the fixed-width types. (Hat
tip: hps@.)

Suggested by: hps (partial)
Sponsored by: EMC / Isilon Storage Division


# be87839e 06-Nov-2015 Conrad Meyer <cem@FreeBSD.org>

Round out SYSCTL macros to the full set of fixed-width types

Add S8, S16, S32, and U32 types; add SYSCTL*() macros for them, as well
as for the existing 64-bit types. (While SYSCTL*QUAD and UQUAD macros
already exist, they do not take the same sort of 'val' parameter that
the other macros do.)

Clean up the documented "types" in the sysctl.9 document. (These are
macros and thus not real types, but the manual page documents intent.)

The sysctl_add_oid(9) arg2 has been bumped from intptr_t to intmax_t to
accommodate 64-bit types on 32-bit pointer architectures.

This is just the kernel support piece; the userspace sysctl(1) support
will follow in a later patch.

Submitted by: Ravi Pokala <rpokala@panasas.com>
Reviewed by: cem
Relnotes: no
Sponsored by: Panasas
Differential Revision: https://reviews.freebsd.org/D4091


# c3220d0b 22-Oct-2015 Conrad Meyer <cem@FreeBSD.org>

Sysctl: Add common support for U8, U16 types

Sponsored by: EMC / Isilon Storage Division


# 7665e341 15-Sep-2015 Mateusz Guzik <mjg@FreeBSD.org>

sysctl: switch sysctllock to a sleepable rmlock, take 2

This restores r285125. Previous attempt was reverted due to a bug in rmlocks,
which is fixed since r287833.


# 4ae1e3c7 30-Jul-2015 Mateusz Guzik <mjg@FreeBSD.org>

Revert r285125 until rmlocks get fixed.

Right now there is a chance that sysctl unregister will cause reader to
block on the sx lock associated with sysctl rmlock, in which case kernels
with debug enabled will panic.


# 6f99ea05 06-Jul-2015 Patrick Kelsey <pkelsey@FreeBSD.org>

Don't acquire sysctlmemlock in userland_sysctl() when the old value
pointer is NULL, as in that case there are no userland pages that
could potentially be wired. It is common for old to be NULL and
oldlenp to be non-NULL in calls to userland_sysctl(), as this is used
to probe for the length of a variable-length sysctl entry before
retrieving a value. Note that it is typical for such calls to be made
with an uninitialized value in *oldlenp, so sysctlmemlock was
essentially being acquired at random (depending on the uninitialized
value in *oldlenp being > PAGE_SIZE or not) for these calls prior to
this patch.

Differential Revision: https://reviews.freebsd.org/D2987
Reviewed by: mjg, kib
Approved by: jmallett (mentor)
MFC after: 1 month


# ee5f66f8 04-Jul-2015 Mateusz Guzik <mjg@FreeBSD.org>

sysctl: get rid of sysctl_lock/unlock

Inline their contents into the only consumer.


# d5fc115a 04-Jul-2015 Mateusz Guzik <mjg@FreeBSD.org>

sysctl: remove a debugging printf which crept in with r285125


# b8633775 04-Jul-2015 Mateusz Guzik <mjg@FreeBSD.org>

sysctl: switch sysctllock to a sleepable rmlock

The lock is almost never taken for writing.


# 38668c60 25-Mar-2015 Hans Petter Selasky <hselasky@FreeBSD.org>

Implement a simple OID number garbage collector. Given the increasing
number of dynamically created and destroyed SYSCTLs during runtime it
is very likely that the current new OID number limit of 0x7fffffff can
be reached. Especially if dynamic OID creation and destruction results
from automatic tests. Additional changes:

- Optimize the typical use case by decrementing the next automatic OID
sequence number instead of incrementing it. This saves searching time
when inserting new OIDs into a fresh parent OID node.

- Add simple check for duplicate non-automatic OID numbers.

MFC after: 1 week


# 502702c6 24-Mar-2015 Hans Petter Selasky <hselasky@FreeBSD.org>

Make sure tunable sysctls are only fetched once. The existing code can
re-register sysctls when destroying sysctl contexts or when moving
sysctls from one tree to another.


# ab91c9a7 24-Mar-2015 Hans Petter Selasky <hselasky@FreeBSD.org>

Correct string pointer offset for error printout.


# 28349245 17-Mar-2015 Ian Lepore <ian@FreeBSD.org>

In sbuf_new_for_sysctl(), default the buffer size to 64 bytes if the
passed-in pointer is NULL and the length is zero.


# 1eafc078 14-Mar-2015 Ian Lepore <ian@FreeBSD.org>

Set the SBUF_INCLUDENUL flag in sbuf_new_for_sysctl() so that sysctl
strings returned to userland include the nulterm byte.

Some uses of sbuf_new_for_sysctl() write binary data rather than strings;
clear the SBUF_INCLUDENUL flag after calling sbuf_new_for_sysctl() in
those cases. (Note that the sbuf code still automatically adds a nulterm
byte in sbuf_finish(), but since it's not included in the length it won't
get copied to userland along with the binary data.)

Remove explicit adding of a nulterm byte in a couple places now that it
gets done automatically by the sbuf drain code.

PR: 195668


# 84267cac 28-Dec-2014 Mateusz Guzik <mjg@FreeBSD.org>

sysctl: don't modify oid_running for static nodes

It is necessary to prevent nodes from being destroyed while used, but static
ones cannot be destroyed.


# f84f8f94 25-Oct-2014 Mateusz Guzik <mjg@FreeBSD.org>

Now that sysctl_root is only called with sysctl lock in shared mode, update
its assertion to require that.

Update comment missed in r273400: sysctl_xlock/unlock -> sysctl_xlock/xunlock

Noted by: jhb


# b0d69dfa 23-Oct-2014 Dag-Erling Smørgrav <des@FreeBSD.org>

In all cases except CTLTYPE_STRING, penv is NULL here, so passing it
indiscriminately to printf() and freeenv() is incorrect. Add a NULL
check before freeenv(); as for printf(), we could use req.newptr
instead, but we'd have to select the correct format string based on
the type, and that's too much work for an error message, so just
remove it.


# fca77320 21-Oct-2014 Mateusz Guzik <mjg@FreeBSD.org>

Mark some more sysctl stuff shared-locked and MPSAFE.


# b564c5d6 21-Oct-2014 Mateusz Guzik <mjg@FreeBSD.org>

Make sysctl name2oid shared-locked as well.

This is a follow-up to r273401.


# efe0abdd 21-Oct-2014 Mateusz Guzik <mjg@FreeBSD.org>

Implement shared locking for sysctl.


# 580a0117 21-Oct-2014 Mateusz Guzik <mjg@FreeBSD.org>

Rename sysctl_lock and _unlock to sysctl_xlock and _xunlock.


# 2be111bf 16-Oct-2014 Davide Italiano <davide@FreeBSD.org>

Follow up to r225617. In order to maximize the re-usability of kernel code
in userland rename in-kernel getenv()/setenv() to kern_setenv()/kern_getenv().
This fixes a namespace collision with libc symbols.

Submitted by: kmacy
Tested by: make universe


# 30d58d6b 10-Jul-2014 Mateusz Guzik <mjg@FreeBSD.org>

Don't make a temporary copy of fixed sysctl strings.


# 604bf9d3 05-Jul-2014 Hans Petter Selasky <hselasky@FreeBSD.org>

When getting the initial value of numeric tunables use the
getenv_xxx() functions instead of strtoq(), because the getenv_xxx()
functions include wrappers for various postfixes like G/M/K, which
strtoq() doesn't do.


# 6a3287f8 27-Jun-2014 Hans Petter Selasky <hselasky@FreeBSD.org>

Fix regression issue after r267961. Handle special string case for
SYSCTLs like previously.

MFC after: 2 weeks
Reported by: several people


# af3b2549 27-Jun-2014 Hans Petter Selasky <hselasky@FreeBSD.org>

Pull in r267961 and r267973 again. Fix for issues reported will follow.


# 37a107a4 27-Jun-2014 Glen Barber <gjb@FreeBSD.org>

Revert r267961, r267973:

These changes prevent sysctl(8) from returning proper output,
such as:

1) no output from sysctl(8)
2) erroneously returning ENOMEM with tools like truss(1)
or uname(1)
truss: can not get etype: Cannot allocate memory


# 3da1cf1e 27-Jun-2014 Hans Petter Selasky <hselasky@FreeBSD.org>

Extend the meaning of the CTLFLAG_TUN flag to automatically check if
there is an environment variable which shall initialize the SYSCTL
during early boot. This works for all SYSCTL types both statically and
dynamically created ones, except for the SYSCTL NODE type and SYSCTLs
which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to
be used in the case a tunable sysctl has a custom initialisation
function allowing the sysctl to still be marked as a tunable. The
kernel SYSCTL API is mostly the same, with a few exceptions for some
special operations like iterating childrens of a static/extern SYSCTL
node. This operation should probably be made into a factored out
common macro, hence some device drivers use this. The reason for
changing the SYSCTL API was the need for a SYSCTL parent OID pointer
and not only the SYSCTL parent OID list pointer in order to quickly
generate the sysctl path. The motivation behind this patch is to avoid
parameter loading cludges inside the OFED driver subsystem. Instead of
adding special code to the OFED driver subsystem to post-load tunables
into dynamically created sysctls, we generalize this in the kernel.

Other changes:
- Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask"
to "hw.pcic.intr_mask".
- Removed redundant TUNABLE statements throughout the kernel.
- Some minor code rewrites in connection to removing not needed
TUNABLE statements.
- Added a missing SYSCTL_DECL().
- Wrapped two very long lines.
- Avoid malloc()/free() inside sysctl string handling, in case it is
called to initialize a sysctl from a tunable, hence malloc()/free() is
not ready when sysctls from the sysctl dataset are registered.
- Bumped FreeBSD version to indicate SYSCTL API change.

MFC after: 2 weeks
Sponsored by: Mellanox Technologies


# 4a144410 16-Mar-2014 Robert Watson <rwatson@FreeBSD.org>

Update kernel inclusions of capability.h to use capsicum.h instead; some
further refinement is required as some device drivers intended to be
portable over FreeBSD versions rely on __FreeBSD_version to decide whether
to include capability.h.

MFC after: 3 weeks


# b5c32cf4 07-Feb-2014 Gleb Smirnoff <glebius@FreeBSD.org>

Remove identical vnet sysctl handlers, and handle CTLFLAG_VNET
in the sysctl_root().

Note: SYSCTL_VNET_* macros can be removed as well. All is
needed to virtualize a sysctl oid is set CTLFLAG_VNET on it.
But for now keep macros in place to avoid large code churn.

Sponsored by: Nginx, Inc.


# f510415d 08-Aug-2013 Scott Long <scottl@FreeBSD.org>

Add a helpful message that can help point to why a sysctl tree removal failed

Obtained from: Netflix
MFC after: 3 days


# db9066f7 01-Mar-2013 Marius Strobl <marius@FreeBSD.org>

- Use strdup(9) instead of reimplementing it.
- Use __DECONST instead of strange casts.
- Reduce code duplication and simplify name2oid().

PR: 176373
Submitted by: Christoph Mallon
MFC after: 1 week


# 18716f9f 11-Feb-2013 Marius Strobl <marius@FreeBSD.org>

Update comments to reflect r246689.


# bdc5f017 11-Feb-2013 Marius Strobl <marius@FreeBSD.org>

Make SYSCTL_{LONG,QUAD,ULONG,UQUAD}(9) work as advertised and also handle
constant values.

Reviewed by: kib
MFC after: 3 days


# 5730afc9 21-Mar-2012 Alan Cox <alc@FreeBSD.org>

Handle spurious page faults that may occur in no-fault sections of the
kernel.

When access restrictions are added to a page table entry, we flush the
corresponding virtual address mapping from the TLB. In contrast, when
access restrictions are removed from a page table entry, we do not
flush the virtual address mapping from the TLB. This is exactly as
recommended in AMD's documentation. In effect, when access
restrictions are removed from a page table entry, AMD's MMUs will
transparently refresh a stale TLB entry. In short, this saves us from
having to perform potentially costly TLB flushes. In contrast,
Intel's MMUs are allowed to generate a spurious page fault based upon
the stale TLB entry. Usually, such spurious page faults are handled
by vm_fault() without incident. However, when we are executing
no-fault sections of the kernel, we are not allowed to execute
vm_fault(). This change introduces special-case handling for spurious
page faults that occur in no-fault sections of the kernel.

In collaboration with: kib
Tested by: gibbs (an earlier version)

I would also like to acknowledge Hiroki Sato's assistance in
diagnosing this problem.

MFC after: 1 week


# 8451d0dd 16-Sep-2011 Kip Macy <kmacy@FreeBSD.org>

In order to maximize the re-usability of kernel code in user space this
patch modifies makesyscalls.sh to prefix all of the non-compatibility
calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel
entry points and all places in the code that use them. It also
fixes an additional name space collision between the kernel function
psignal and the libc function of the same name by renaming the kernel
psignal kern_psignal(). By introducing this change now we will ease future
MFCs that change syscalls.

Reviewed by: rwatson
Approved by: re (bz)


# ff66f6a4 17-Jul-2011 Robert Watson <rwatson@FreeBSD.org>

Define two new sysctl node flags: CTLFLAG_CAPRD and CTLFLAG_CAPRW, which
may be jointly referenced via the mask CTLFLAG_CAPRW. Sysctls with these
flags are available in Capsicum's capability mode; other sysctl nodes are
not.

Flag several useful sysctls as available in capability mode, such as memory
layout sysctls required by the run-time linker and malloc(3). Also expose
access to randomness and available kernel features.

A few sysctls are enabled to support name->MIB conversion; these may leak
information to capability mode by virtue of providing resolution on names
not flagged for access in capability mode. This is, generally, not a huge
problem, but might be something to resolve in the future. Flag these cases
with XXX comments.

Submitted by: jonathan
Sponsored by: Google, Inc.


# 3d08a76b 12-May-2011 Matthew D Fleming <mdf@FreeBSD.org>

Use a name instead of a magic number for kern_yield(9) when the priority
should not change. Fetch the td_user_pri under the thread lock. This
is probably not necessary but a magic number also seems preferable to
knowing the implementation details here.

Requested by: Jason Behmer < jason DOT behmer AT isilon DOT com >


# e4cd31dd 21-Mar-2011 Jeff Roberson <jeff@FreeBSD.org>

- Merge changes to the base system to support OFED. These include
a wider arg2 for sysctl, updates to vlan code, IFT_INFINIBAND,
and other miscellaneous small features.


# e7ceb1e9 07-Feb-2011 Matthew D Fleming <mdf@FreeBSD.org>

Based on discussions on the svn-src mailing list, rework r218195:

- entirely eliminate some calls to uio_yeild() as being unnecessary,
such as in a sysctl handler.

- move should_yield() and maybe_yield() to kern_synch.c and move the
prototypes from sys/uio.h to sys/proc.h

- add a slightly more generic kern_yield() that can replace the
functionality of uio_yield().

- replace source uses of uio_yield() with the functional equivalent,
or in some cases do not change the thread priority when switching.

- fix a logic inversion bug in vlrureclaim(), pointed out by bde@.

- instead of using the per-cpu last switched ticks, use a per thread
variable for should_yield(). With PREEMPTION, the only reasonable
use of this is to determine if a lock has been held a long time and
relinquish it. Without PREEMPTION, this is essentially the same as
the per-cpu variable.


# 00f0e671 26-Jan-2011 Matthew D Fleming <mdf@FreeBSD.org>

Explicitly wire the user buffer rather than doing it implicitly in
sbuf_new_for_sysctl(9). This allows using an sbuf with a SYSCTL_OUT
drain for extremely large amounts of data where the caller knows that
appropriate references are held, and sleeping is not an issue.

Inspired by: rwatson


# 73d6f851 26-Jan-2011 Matthew D Fleming <mdf@FreeBSD.org>

Remove the CTLFLAG_NOLOCK as it seems to be both unused and
unfunctional. Wiring the user buffer has only been done explicitly
since r101422.

Mark the kern.disks sysctl as MPSAFE since it is and it seems to have
been mis-using the NOLOCK flag.

Partially break the KPI (but not the KBI) for the sysctl_req 'lock'
field since this member should be private and the "REQ_LOCKED" state
seems meaningless now.


# cbc134ad 19-Jan-2011 Matthew D Fleming <mdf@FreeBSD.org>

Introduce signed and unsigned version of CTLTYPE_QUAD, renaming
existing uses. Rename sysctl_handle_quad() to sysctl_handle_64().


# 2fee06f0 18-Jan-2011 Matthew D Fleming <mdf@FreeBSD.org>

Specify a CTLTYPE_FOO so that a future sysctl(8) change does not need
to rely on the format string.


# dd6312a7 29-Nov-2010 Matthew D Fleming <mdf@FreeBSD.org>

Fix uninitialized variable warning that shows on Tinderbox but not my
setup. (??)

Submitted by: Michael Butler <imb at protected-networks dot net>


# ccecef29 29-Nov-2010 Matthew D Fleming <mdf@FreeBSD.org>

Do not hold the sysctl lock across a call to the handler. This fixes a
general LOR issue where the sysctl lock had no good place in the
hierarchy. One specific instance is #284 on
http://sources.zabbadoz.net/freebsd/lor.html .

Reviewed by: jhb
MFC after: 1 month
X-MFC-note: split oid_refcnt field for oid_running to preserve KBI


# d0bb6f25 29-Nov-2010 Matthew D Fleming <mdf@FreeBSD.org>

Slightly modify the logic in sysctl_find_oid to reduce the indentation.
There should be no functional change.

MFC after: 3 days


# 5127ecb8 29-Nov-2010 Matthew D Fleming <mdf@FreeBSD.org>

Use the SYSCTL_CHILDREN macro in kern_sysctl.c to help de-obfuscate the
code.

MFC after: 3 days


# a7d5f7eb 19-Oct-2010 Jamie Gritton <jamie@FreeBSD.org>

A new jail(8) with a configuration file, to replace the work currently done
by /etc/rc.d/jail.


# 4e657159 16-Sep-2010 Matthew D Fleming <mdf@FreeBSD.org>

Re-add r212370 now that the LOR in powerpc64 has been resolved:

Add a drain function for struct sysctl_req, and use it for a variety
of handlers, some of which had to do awkward things to get a large
enough SBUF_FIXEDLEN buffer.

Note that some sysctl handlers were explicitly outputting a trailing
NUL byte. This behaviour was preserved, though it should not be
necessary.

Reviewed by: phk (original patch)


# 404a593e 13-Sep-2010 Matthew D Fleming <mdf@FreeBSD.org>

Revert r212370, as it causes a LOR on powerpc. powerpc does a few
unexpected things in copyout(9) and so wiring the user buffer is not
sufficient to perform a copyout(9) while holding a random mutex.

Requested by: nwhitehorn


# dd67e210 09-Sep-2010 Matthew D Fleming <mdf@FreeBSD.org>

Add a drain function for struct sysctl_req, and use it for a variety of
handlers, some of which had to do awkward things to get a large enough
FIXEDLEN buffer.

Note that some sysctl handlers were explicitly outputting a trailing NUL
byte. This behaviour was preserved, though it should not be necessary.

Reviewed by: phk


# da2a30fc 13-Aug-2009 Bjoern A. Zeeb <bz@FreeBSD.org>

MFC r196176:

Make it possible to change the vnet sysctl variables on jails
with their own virtual network stack. Jails only inheriting a
network stack cannot change anything that cannot be changed from
within a prison.

Reviewed by: rwatson, zec

Approved by: re (kib)


# eb79e1c7 13-Aug-2009 Bjoern A. Zeeb <bz@FreeBSD.org>

Make it possible to change the vnet sysctl variables on jails
with their own virtual network stack. Jails only inheriting a
network stack cannot change anything that cannot be changed from
within a prison.

Reviewed by: rwatson, zec
Approved by: re (kib)


# 530c0060 01-Aug-2009 Robert Watson <rwatson@FreeBSD.org>

Merge the remainder of kern_vimage.c and vimage.h into vnet.c and
vnet.h, we now use jails (rather than vimages) as the abstraction
for virtualization management, and what remained was specific to
virtual network stacks. Minor cleanups are done in the process,
and comments updated to reflect these changes.

Reviewed by: bz
Approved by: re (vimage blanket)


# a08362ce 21-Jul-2009 Bjoern A. Zeeb <bz@FreeBSD.org>

sysctl_msec_to_ticks is used with both virtualized and
non-vrtiualized sysctls so we cannot used one common function.

Add a macro to convert the arg1 in the virtualized case to
vnet.h to not expose the maths to all over the code.

Add a wrapper for the single virtualized call, properly handling
arg1 and call the default implementation from there.

Convert the two over places to use the new macro.

Reviewed by: rwatson
Approved by: re (kib)


# eddfbb76 14-Jul-2009 Robert Watson <rwatson@FreeBSD.org>

Build on Jeff Roberson's linker-set based dynamic per-CPU allocator
(DPCPU), as suggested by Peter Wemm, and implement a new per-virtual
network stack memory allocator. Modify vnet to use the allocator
instead of monolithic global container structures (vinet, ...). This
change solves many binary compatibility problems associated with
VIMAGE, and restores ELF symbols for virtualized global variables.

Each virtualized global variable exists as a "reference copy", and also
once per virtual network stack. Virtualized global variables are
tagged at compile-time, placing the in a special linker set, which is
loaded into a contiguous region of kernel memory. Virtualized global
variables in the base kernel are linked as normal, but those in modules
are copied and relocated to a reserved portion of the kernel's vnet
region with the help of a the kernel linker.

Virtualized global variables exist in per-vnet memory set up when the
network stack instance is created, and are initialized statically from
the reference copy. Run-time access occurs via an accessor macro, which
converts from the current vnet and requested symbol to a per-vnet
address. When "options VIMAGE" is not compiled into the kernel, normal
global ELF symbols will be used instead and indirection is avoided.

This change restores static initialization for network stack global
variables, restores support for non-global symbols and types, eliminates
the need for many subsystem constructors, eliminates large per-subsystem
structures that caused many binary compatibility issues both for
monitoring applications (netstat) and kernel modules, removes the
per-function INIT_VNET_*() macros throughout the stack, eliminates the
need for vnet_symmap ksym(2) munging, and eliminates duplicate
definitions of virtualized globals under VIMAGE_GLOBALS.

Bump __FreeBSD_version and update UPDATING.

Portions submitted by: bz
Reviewed by: bz, zec
Discussed with: gnn, jamie, jeff, jhb, julian, sam
Suggested by: peter
Approved by: re (kensmith)


# ebd8672c 17-Jun-2009 Bjoern A. Zeeb <bz@FreeBSD.org>

Add explicit includes for jail.h to the files that need them and
remove the "hidden" one from vimage.h.


# 9ed47d01 15-Jun-2009 Jamie Gritton <jamie@FreeBSD.org>

Get vnets from creds instead of threads where they're available, and from
passed threads instead of curthread.

Reviewed by: zec, julian
Approved by: bz (mentor)


# bcf11e8d 05-Jun-2009 Robert Watson <rwatson@FreeBSD.org>

Move "options MAC" from opt_mac.h to opt_global.h, as it's now in GENERIC
and used in a large number of files, but also because an increasing number
of incorrect uses of MAC calls were sneaking in due to copy-and-paste of
MAC-aware code without the associated opt_mac.h include.

Discussed with: pjd


# 433e2f47 15-May-2009 Dag-Erling Smørgrav <des@FreeBSD.org>

Remove do-nothing code that was required to dirty the old buffer on Alpha.

Coverity ID: 838
Approved by: jhb, alc


# 6b72d8db 15-May-2009 Konstantin Belousov <kib@FreeBSD.org>

Revert r192094. The revision caused problems for sysctl(3) consumers
that expect that oldlen is filled with required buffer length even when
supplied buffer is too short and returned error is ENOMEM.

Redo the fix for kern.proc.filedesc, by reverting the req->oldidx when
remaining buffer space is too short for the current kinfo_file structure.
Also, only ignore ENOMEM. We have to convert ENOMEM to no error condition
to keep existing interface for the sysctl, though.

Reported by: ed, Florian Smeets <flo kasimir com>
Tested by: pho


# 3e829b18 14-May-2009 John Baldwin <jhb@FreeBSD.org>

- Use a separate sx lock to try to limit the number of concurrent userland
sysctl requests to avoid wiring too much user memory. Only grab this
lock if the user's old buffer is larger than a page as a tradeoff to
allow more concurrency for common small requests.
- Just use a shared lock on the sysctl tree for user sysctl requests now.

MFC after: 1 week


# e401a6a5 14-May-2009 Konstantin Belousov <kib@FreeBSD.org>

Do not advance req->oldidx when sysctl_old_user returning an
error due to copyout failure or short buffer.

The later breaks the usermode iterators of the sysctl results that pack
arbitrary number of variable-sized structures. Iterator expects that
kernel filled exactly oldlen bytes, and tries to interpret half-filled
or garbage structure at the end of the buffer. In particular,
kinfo_getfile(3) segfaulted.

Reported and tested by: pho
MFC after: 3 weeks


# f6dfe47a 30-Apr-2009 Marko Zec <zec@FreeBSD.org>

Permit buiding kernels with options VIMAGE, restricted to only a single
active network stack instance. Turning on options VIMAGE at compile
time yields the following changes relative to default kernel build:

1) V_ accessor macros for virtualized variables resolve to structure
fields via base pointers, instead of being resolved as fields in global
structs or plain global variables. As an example, V_ifnet becomes:

options VIMAGE: ((struct vnet_net *) vnet_net)->_ifnet
default build: vnet_net_0._ifnet
options VIMAGE_GLOBALS: ifnet

2) INIT_VNET_* macros will declare and set up base pointers to be used
by V_ accessor macros, instead of resolving to whitespace:

INIT_VNET_NET(ifp->if_vnet); becomes

struct vnet_net *vnet_net = (ifp->if_vnet)->mod_data[VNET_MOD_NET];

3) Memory for vnet modules registered via vnet_mod_register() is now
allocated at run time in sys/kern/kern_vimage.c, instead of per vnet
module structs being declared as globals. If required, vnet modules
can now request the framework to provide them with allocated bzeroed
memory by filling in the vmi_size field in their vmi_modinfo structures.

4) structs socket, ifnet, inpcbinfo, tcpcb and syncache_head are
extended to hold a pointer to the parent vnet. options VIMAGE builds
will fill in those fields as required.

5) curvnet is introduced as a new global variable in options VIMAGE
builds, always pointing to the default and only struct vnet.

6) struct sysctl_oid has been extended with additional two fields to
store major and minor virtualization module identifiers, oid_v_subs and
oid_v_mod. SYSCTL_V_* family of macros will fill in those fields
accordingly, and store the offset in the appropriate vnet container
struct in oid_arg1.
In sysctl handlers dealing with virtualized sysctls, the
SYSCTL_RESOLVE_V_ARG1() macro will compute the address of the target
variable and make it available in arg1 variable for further processing.

Unused fields in structs vnet_inet, vnet_inet6 and vnet_ipfw have
been deleted.

Reviewed by: bz, rwatson
Approved by: julian (mentor)


# a56be37e 11-Mar-2009 John Baldwin <jhb@FreeBSD.org>

Add a new type of KTRACE record for sysctl(3) invocations. It uses the
internal sysctl_sysctl_name() handler to map the MIB array to a string
name and logs this name in the trace log. This can be useful to see
exactly which sysctls a thread is invoking.

MFC after: 1 month


# 72150f4c 10-Mar-2009 John Baldwin <jhb@FreeBSD.org>

- Remove a recently added comment from kernel_sysctlbyname() that isn't
needed.
- Move the release of the sysctl sx lock after the vsunlock() in
userland_sysctl() to restore the original memlock behavior of
minimizing the amount of memory wired to handle sysctl requests.

MFC after: 1 week


# 875b66a0 06-Feb-2009 John Baldwin <jhb@FreeBSD.org>

Expand the scope of the sysctllock sx lock to protect the sysctl tree itself.
Back in 1.1 of kern_sysctl.c the sysctl() routine wired the "old" userland
buffer for most sysctls (everything except kern.vnode.*). I think to prevent
issues with wiring too much memory it used a 'memlock' to serialize all
sysctl(2) invocations, meaning that only one user buffer could be wired at
a time. In 5.0 the 'memlock' was converted to an sx lock and renamed to
'sysctl lock'. However, it still only served the purpose of serializing
sysctls to avoid wiring too much memory and didn't actually protect the
sysctl tree as its name suggested. These changes expand the lock to actually
protect the tree.

Later on in 5.0, sysctl was changed to not wire buffers for requests by
default (sysctl_handle_opaque() will still wire buffers larger than a single
page, however). As a result, user buffers are no longer wired as often.
However, many sysctl handlers still wire user buffers, so it is still
desirable to serialize userland sysctl requests. Kernel sysctl requests
are allowed to run in parallel, however.

- Expose sysctl_lock()/sysctl_unlock() routines to exclusively lock the
sysctl tree for a few places outside of kern_sysctl.c that manipulate
the sysctl tree directly including the kernel linker and vfs_register().
- sysctl_register() and sysctl_unregister() require the caller to lock
the sysctl lock using sysctl_lock() and sysctl_unlock(). The rest of
the public sysctl API manage the locking internally.
- Add a locked variant of sysctl_remove_oid() for internal use so that
external uses of the API do not need to be aware of locking requirements.
- The kernel linker no longer needs Giant when manipulating the sysctl
tree.
- Add a missing break to the loop in vfs_register() so that we stop looking
at the sysctl MIB once we have changed it.

MFC after: 1 month


# f3b86a5f 28-Jan-2009 Ed Schouten <ed@FreeBSD.org>

Mark most often used sysctl's as MPSAFE.

After running a `make buildkernel', I noticed most of the Giant locks in
sysctl are only caused by a very small amount of sysctl's:

- sysctl.name2oid. This one is locked by SYSCTL_LOCK, just like
sysctl.oidfmt.

- kern.ident, kern.osrelease, kern.version, etc. These are just constant
strings.

- kern.arandom, used by the stack protector. It is already protected by
arc4_mtx.

I also saw the following sysctl's show up. Not as often as the ones
above, but still quite often:

- security.jail.jailed. Also mark security.jail.list as MPSAFE. They
don't need locking or already use allprison_lock.

- kern.devname, used by devname(3), ttyname(3), etc.

This seems to reduce Giant locking inside sysctl by ~75% in my primitive
test setup.


# 1e99191d 23-Jan-2009 John Baldwin <jhb@FreeBSD.org>

Add a flag to tag individual sysctl leaf nodes as MPSAFE and thus not
needing Giant.

Submitted by: csjp (an older version)


# 4cfe4790 31-Dec-2008 Ed Schouten <ed@FreeBSD.org>

Don't clobber sysctl_root()'s error number.

When sysctl() is being called with a buffer that is too small, it will
return ENOMEM. Unfortunately the changes I made the other day sets the
error number to 0, because it just returns the error number of the
copyout(). Revert this part of the change.


# 71b6d504 29-Dec-2008 Ed Schouten <ed@FreeBSD.org>

Fix compilation. Also move ogetkerninfo() to kern_xxx.c.

It seems I forgot to remove `int error' from a single piece of code. I'm
also moving ogetkerninfo() to kern_xxx.c, because it belongs to the
class of compat system information system calls, not the generic sysctl
code.


# ddf9d243 28-Dec-2008 Ed Schouten <ed@FreeBSD.org>

Push down Giant inside sysctl. Also add some more assertions to the code.

In the existing code we didn't really enforce that callers hold Giant
before calling userland_sysctl(), even though there is no guarantee it
is safe. Fix this by just placing Giant locks around the call to the oid
handler. This also means we only pick up Giant for a very short period
of time. Maybe we should add MPSAFE flags to sysctl or phase it out all
together.

I've also added SYSCTL_LOCK_ASSERT(). We have to make sure sysctl_root()
and name2oid() are called with the sysctl lock held.

Reviewed by: Jille Timmermans <jille quis cx>


# cd2983ca 12-Dec-2008 Konstantin Belousov <kib@FreeBSD.org>

Uio_yield() already does DROP_GIANT/PICKUP_GIANT, no need to repeat this
around the call.

Noted by: bde


# af80b2c9 11-Dec-2008 Konstantin Belousov <kib@FreeBSD.org>

The userland_sysctl() function retries sysctl_root() until returned
error is not EAGAIN. Several sysctls that inspect another process use
p_candebug() for checking access right for the curproc. p_candebug()
returns EAGAIN for some reasons, in particular, for the process doing
exec() now. If execing process tries to lock Giant, we get a livelock,
because sysctl handlers are covered by Giant, and often do not sleep.

Break the livelock by dropping Giant and allowing other threads to
execute in the EAGAIN loop.

Also, do not return EAGAIN from p_candebug() when process is executing,
use more appropriate EBUSY error [1].

Reported and tested by: pho
Suggested by: rwatson [1]
Reviewed by: rwatson, des
MFC after: 1 week


# 97021c24 26-Nov-2008 Marko Zec <zec@FreeBSD.org>

Merge more of currently non-functional (i.e. resolving to
whitespace) macros from p4/vimage branch.

Do a better job at enclosing all instantiations of globals
scheduled for virtualization in #ifdef VIMAGE_GLOBALS blocks.

De-virtualize and mark as const saorder_state_alive and
saorder_state_any arrays from ipsec code, given that they are never
updated at runtime, so virtualizing them would be pointless.

Reviewed by: bz, julian
Approved by: julian (mentor)
Obtained from: //depot/projects/vimage-commit2/...
X-MFC after: never
Sponsored by: NLnet Foundation, The FreeBSD Foundation


# d7f03759 19-Oct-2008 Ulf Lilleengen <lulf@FreeBSD.org>

- Import the HEAD csup code which is the basis for the cvsmode work.


# cd17ceaa 30-Nov-2007 Peter Wemm <peter@FreeBSD.org>

Add sysctl_rename_oid() to support device_set_unit() usage. Otherwise,
when unit numbers are changed, the sysctl devinfo tree gets out of sync
and duplicate trees are attempted to be attached with the original name.


# 30d239bc 24-Oct-2007 Robert Watson <rwatson@FreeBSD.org>

Merge first in a series of TrustedBSD MAC Framework KPI changes
from Mac OS X Leopard--rationalize naming for entry points to
the following general forms:

mac_<object>_<method/action>
mac_<object>_check_<method/action>

The previous naming scheme was inconsistent and mostly
reversed from the new scheme. Also, make object types more
consistent and remove spaces from object types that contain
multiple parts ("posix_sem" -> "posixsem") to make mechanical
parsing easier. Introduce a new "netinet" object type for
certain IPv4/IPv6-related methods. Also simplify, slightly,
some entry point names.

All MAC policy modules will need to be recompiled, and modules
not updates as part of this commit will need to be modified to
conform to the new KPI.

Sponsored by: SPARTA (original patches against Mac OS X)
Obtained from: TrustedBSD Project, Apple Computer


# 70ffc2fb 02-Sep-2007 Robert Watson <rwatson@FreeBSD.org>

In userland_sysctl(), call useracc() with the actual newlen value to be
used, rather than the one passed via 'req', which may not reflect a
rewrite. This call to useracc() is redundant to validation performed by
later copyin()/copyout() calls, so there isn't a security issue here,
but this could technically lead to excessive validation of addresses if
the length in newlen is shorter than req.newlen.

Approved by: re (kensmith)
Reviewed by: jhb
Submitted by: Constantine A. Murenin <cnst+freebsd@bugmail.mojo.ru>
Sponsored by: Google Summer of Code 2007


# 32f9753c 11-Jun-2007 Robert Watson <rwatson@FreeBSD.org>

Eliminate now-unused SUSER_ALLOWJAIL arguments to priv_check_cred(); in
some cases, move to priv_check() if it was an operation on a thread and
no other flags were present.

Eliminate caller-side jail exception checking (also now-unused); jail
privilege exception code now goes solely in kern_jail.c.

We can't yet eliminate suser() due to some cases in the KAME code where
a privilege check is performed and then used in many different deferred
paths. Do, however, move those prototypes to priv.h.

Reviewed by: csjp
Obtained from: TrustedBSD Project


# df82ff50 04-Jun-2007 David Malone <dwmalone@FreeBSD.org>

Add a function for exporting 64 bit types.


# 873fbcd7 05-Mar-2007 Robert Watson <rwatson@FreeBSD.org>

Further system call comment cleanup:

- Remove also "MP SAFE" after prior "MPSAFE" pass. (suggested by bde)
- Remove extra blank lines in some cases.
- Add extra blank lines in some cases.
- Remove no-op comments consisting solely of the function name, the word
"syscall", or the system call name.
- Add punctuation.
- Re-wrap some comments.


# 0c14ff0e 04-Mar-2007 Robert Watson <rwatson@FreeBSD.org>

Remove 'MPSAFE' annotations from the comments above most system calls: all
system calls now enter without Giant held, and then in some cases, acquire
Giant explicitly.

Remove a number of other MPSAFE annotations in the credential code and
tweak one or two other adjacent comments.


# acd3428b 06-Nov-2006 Robert Watson <rwatson@FreeBSD.org>

Sweep kernel replacing suser(9) calls with priv(9) calls, assigning
specific privilege names to a broad range of privileges. These may
require some future tweaking.

Sponsored by: nCircle Network Security, Inc.
Obtained from: TrustedBSD Project
Discussed on: arch@
Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri,
Alex Lyashkov <umka at sevcity dot net>,
Skip Ford <skip dot ford at verizon dot net>,
Antoine Brodin <antoine dot brodin at laposte dot net>


# aed55708 22-Oct-2006 Robert Watson <rwatson@FreeBSD.org>

Complete break-out of sys/sys/mac.h into sys/security/mac/mac_framework.h
begun with a repo-copy of mac.h to mac_framework.h. sys/mac.h now
contains the userspace and user<->kernel API and definitions, with all
in-kernel interfaces moved to mac_framework.h, which is now included
across most of the kernel instead.

This change is the first step in a larger cleanup and sweep of MAC
Framework interfaces in the kernel, and will not be MFC'd.

Obtained from: TrustedBSD Project
Sponsored by: SPARTA


# be70abcc 16-Jun-2006 Yaroslav Tykhiy <ytykhiy@gmail.com>

Kill an XXX remark that has been untrue since rev. 1.150 of this file.


# a4684d74 16-Feb-2006 Andre Oppermann <andre@FreeBSD.org>

Make sysctl_msec_to_ticks(SYSCTL_HANDLER_ARGS) generally available instead
of being private to tcp_timer.c.

Sponsored by: TCP/IP Optimization Fundraise 2005
MFC after: 3 days


# f4af687a 24-Jan-2006 Don Lewis <truckman@FreeBSD.org>

Touch all the pages wired by sysctl_wire_old_buffer() to avoid PTE
modified bit emulation traps on Alpha while holding locks in the
sysctl handler.

A better solution would be to pass a hint to the Alpha pmap code to
tell mark these pages as modified when they as they are being wired,
but that appears to be more difficult to implement.

Suggested by: jhb
MFC after: 3 days


# d8339a26 08-Aug-2005 Christian S.J. Peron <csjp@FreeBSD.org>

Drop in a WITNESS_WARN into SYSCTL_IN to make sure that we are
not holding any non-sleep-able-locks locks when copyin is called.
This gets executed un-conditionally since we have no function
to wire the buffer in this direction.

Pointed out by: truckman
MFC after: 1 week


# 417ab24f 08-Aug-2005 Christian S.J. Peron <csjp@FreeBSD.org>

Check to see if we wired the user-supplied buffers in SYSCTL_OUT, if
the buffer has not been wired and we are holding any non-sleep-able locks,
drop a witness warning. If the buffer has not been wired, it is possible
that the writing of the data can sleep, especially if the page is not in
memory. This can result in a number of different locking issues, including
dead locks.

MFC after: 1 week
Discussed with: rwatson
Reviewed by: jhb


# 85eb15a2 09-Feb-2005 Poul-Henning Kamp <phk@FreeBSD.org>

Make another bunch of SYSCTL_NODEs static


# 5937226d 07-Feb-2005 Poul-Henning Kamp <phk@FreeBSD.org>

Add a missing prefix to a struct field for consistency.


# 46003fb3 31-Dec-2004 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Be consistent and always use form 'return (value);' instead of 'return value;'.
We had (before this change) 84 lines where it was style(9)-clean and 15 lines
where it was not.


# df970488 27-Oct-2004 Robert Watson <rwatson@FreeBSD.org>

Move the 'debug' sysctl tree under options SYSCTL_DEBUG. It generates
an inordinate amount of synchronous console output that is fairly
undesirable on slower serial console. It's easily hit by accident
when frobbing other sysctls late at night.


# a1bd71b2 12-Oct-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Add missing zero flag arguments to calls to userland_sysctl()


# a7bc3102 11-Oct-2004 Peter Wemm <peter@FreeBSD.org>

Put on my peril sensitive sunglasses and add a flags field to the internal
sysctl routines and state. Add some code to use it for signalling the need
to downconvert a data structure to 32 bits on a 64 bit OS when requested by
a 32 bit app.

I tried to do this in a generic abi wrapper that intercepted the sysctl
oid's, or looked up the format string etc, but it was a real can of worms
that turned into a fragile mess before I even got it partially working.

With this, we can now run 'sysctl -a' on a 32 bit sysctl binary and have
it not abort. Things like netstat, ps, etc have a long way to go.

This also fixes a bug in the kern.ps_strings and kern.usrstack hacks.
These do matter very much because they are used by libc_r and other things.


# 00fbcda8 28-Jul-2004 Alexander Kabaev <kan@FreeBSD.org>

Avoid casts as lvalues.


# 56f21b9d 26-Jul-2004 Colin Percival <cperciva@FreeBSD.org>

Rename suser_cred()'s PRISON_ROOT flag to SUSER_ALLOWJAIL. This is
somewhat clearer, but more importantly allows for a consistent naming
scheme for suser_cred flags.

The old name is still defined, but will be removed in a few days (unless I
hear any complaints...)

Discussed with: rwatson, scottl
Requested by: jhb


# b4adfcf2 10-Jun-2004 Brian Feldman <green@FreeBSD.org>

Make sysctl_wire_old_buffer() respect ENOMEM from vslock() by marking
the valid length as 0. This prevents vsunlock() from removing a system
wire from memory that was not successfully wired (by us).

Submitted by: tegge


# 7f8a436f 05-Apr-2004 Warner Losh <imp@FreeBSD.org>

Remove advertising clause from University of California Regent's license,
per letter dated July 22, 1999.

Approved by: core


# a961520c 15-Mar-2004 Don Lewis <truckman@FreeBSD.org>

Rename the wiredlen member of struct sysctl_req to validlen and always
set it to avoid the need for a bunch of code that tests whether or
not the lock member is set to REQ_WIRED in order to determine which
length member should be used.

Fix another bug in the oldlen return value code.

Fix a potential wired memory leak if a sysctl handler uses
sysctl_wire_old_buffer() and returns an EAGAIN error to trigger
a retry.


# 8ac3e8e9 15-Mar-2004 Don Lewis <truckman@FreeBSD.org>

Don't bother calling vslock() and vsunlock() if oldlen is zero.

If vslock() returns ENOMEM, sysctl_wire_old_buffer() should set
wiredlen to zero and return zero (success) so that the handler will
operate according to sysctl(3):
The size of the buffer is given by the location specified by
oldlenp before the call, and that location gives the amount
of data copied after a successful call and after a call that
returns with the error code ENOMEM.
The handler will return an ENOMEM error because the zero length
buffer will overflow.


# ce8660e3 14-Mar-2004 Don Lewis <truckman@FreeBSD.org>

Revert to the original vslock() and vsunlock() API with the following
exceptions:
Retain the recently added vslock() error return.

The type of the len argument should be size_t, not u_int.

Suggested by: bde


# 16929939 05-Mar-2004 Don Lewis <truckman@FreeBSD.org>

Undo the merger of mlock()/vslock and munlock()/vsunlock() and the
introduction of kern_mlock() and kern_munlock() in
src/sys/kern/kern_sysctl.c 1.150
src/sys/vm/vm_extern.h 1.69
src/sys/vm/vm_glue.c 1.190
src/sys/vm/vm_mmap.c 1.179
because different resource limits are appropriate for transient and
"permanent" page wiring requests.

Retain the kern_mlock() and kern_munlock() API in the revived
vslock() and vsunlock() functions.

Combine the best parts of each of the original sets of implementations
with further code cleanup. Make the mclock() and vslock()
implementations as similar as possible.

Retain the RLIMIT_MEMLOCK check in mlock(). Move the most strigent
test, which can return EAGAIN, last so that requests that have no
hope of ever being satisfied will not be retried unnecessarily.

Disable the test that can return EAGAIN in the vslock() implementation
because it will cause the sysctl code to wedge.

Tested by: Cy Schubert <Cy.Schubert AT komquats.com>


# 21885af5 27-Feb-2004 Dag-Erling Smørgrav <des@FreeBSD.org>

Add sysctl_move_oid() which reparents an existing OID.


# 47934cef 25-Feb-2004 Don Lewis <truckman@FreeBSD.org>

Split the mlock() kernel code into two parts, mlock(), which unpacks
the syscall arguments and does the suser() permission check, and
kern_mlock(), which does the resource limit checking and calls
vm_map_wire(). Split munlock() in a similar way.

Enable the RLIMIT_MEMLOCK checking code in kern_mlock().

Replace calls to vslock() and vsunlock() in the sysctl code with
calls to kern_mlock() and kern_munlock() so that the sysctl code
will obey the wired memory limits.

Nuke the vslock() and vsunlock() implementations, which are no
longer used.

Add a member to struct sysctl_req to track the amount of memory
that is wired to handle the request.

Modify sysctl_wire_old_buffer() to return an error if its call to
kern_mlock() fails. Only wire the minimum of the length specified
in the sysctl request and the length specified in its argument list.
It is recommended that sysctl handlers that use sysctl_wire_old_buffer()
should specify reasonable estimates for the amount of data they
want to return so that only the minimum amount of memory is wired
no matter what length has been specified by the request.

Modify the callers of sysctl_wire_old_buffer() to look for the
error return.

Modify sysctl_old_user to obey the wired buffer length and clean up
its implementation.

Reviewed by: bms


# 63dba32b 21-Feb-2004 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Reimplement sysctls handling by MAC framework.
Now I believe it is done in the right way.

Removed some XXMAC cases, we now assume 'high' integrity level for all
sysctls, except those with CTLFLAG_ANYBODY flag set. No more magic.

Reviewed by: rwatson
Approved by: rwatson, scottl (mentor)
Tested with: LINT (compilation), mac_biba(4) (functionality)


# f0597024 05-Oct-2003 Bruce M Simpson <bms@FreeBSD.org>

Bring back sysctl_wire_old_buffer(). Fix a bug in sysctl_handle_opaque()
whereby the pointers would not get reset on a retried SYSCTL_OUT() call.

Noticed by: bde


# dcf59a59 05-Oct-2003 Bruce M Simpson <bms@FreeBSD.org>

Fix a security problem in sysctl() the long way round.

Use pre-emption detection to avoid the need for wiring a userland buffer
when copying opaque data structures.

sysctl_wire_old_buffer() is now a no-op. Other consumers of this
API should use pre-emption detection to notice update collisions.

vslock() and vsunlock() should no longer be called by any code
and should be retired in subsequent commits.

Discussed with: pete, phk
MFC after: 1 week


# 51830edc 05-Oct-2003 Bruce M Simpson <bms@FreeBSD.org>

Fold the vslock() and vsunlock() calls in this file with #if 0's; they will
go away in due course. Involuntary pre-emption means that we can't count
on wiring of pages alone for consistency when performing a SYSCTL_OUT()
bigger than PAGE_SIZE.

Discussed with: pete, phk


# 5be99846 04-Oct-2003 Bruce M Simpson <bms@FreeBSD.org>

Remove magic numbers surrounding locking state in the sysctl module, and
replace them with more meaningful defines.


# 677b542e 10-Jun-2003 David E. O'Brien <obrien@FreeBSD.org>

Use __FBSDID().


# 193f2edb 29-May-2003 Maxime Henrion <mux@FreeBSD.org>

When loading a module that contains a sysctl which is already compiled
in the kernel, the sysctl_register() call would fail, as expected.
However, when unloading this module again, the kernel would then panic
in sysctl_unregister(). Print a message error instead.

Submitted by: Nicolai Petri <nicolai@catpipe.net>
Reviewed by: imp
Approved by: re@ (jhb)


# 74019059 11-Mar-2003 John Baldwin <jhb@FreeBSD.org>

Use a shorter and less redundant name for the sysctl tree lock.


# 26306795 04-Mar-2003 John Baldwin <jhb@FreeBSD.org>

Replace calls to WITNESS_SLEEP() and witness_list() with equivalent calls
to WITNESS_WARN().


# 90623e1a 22-Feb-2003 Robert Watson <rwatson@FreeBSD.org>

Don't panic when enumerating SYSCTL_NODE() nodes without any children
nodes.

Submitted by: green, Hiten Pandya <hiten@unixdaemons.com>


# a163d034 18-Feb-2003 Warner Losh <imp@FreeBSD.org>

Back out M_* changes, per decision of the TRB.

Approved by: trb


# 44956c98 21-Jan-2003 Alfred Perlstein <alfred@FreeBSD.org>

Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0.
Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.


# fe41ca53 14-Jan-2003 Matthew Dillon <dillon@FreeBSD.org>

Introduce the ability to flag a sysctl for operation at secure level 2 or 3
in addition to secure level 1. The mask supports up to a secure level of 8
but only add defines through CTLFLAG_SECURE3 for now.

As per the missif in the log entry for 1.11 of ip_fw2.c which added the
secure flag to the IPFW sysctl's in the first place, change the secure
level requirement from 1 to 3 now that we have support for it.

Reviewed by: imp
With Design Suggestions by: imp


# 5d9155df 10-Jan-2003 Maxime Henrion <mux@FreeBSD.org>

Fix kernel build.

Pointy hats to: dillon, Hiten Pandya <hiten@unixdaemons.com>


# d3fc69ee 27-Oct-2002 Robert Watson <rwatson@FreeBSD.org>

Implement mac_check_system_sysctl(), a MAC Framework entry point to
permit MAC policies to augment the security protections on sysctl()
operations. This is not really a wonderful entry point, as we
only have access to the MIB of the target sysctl entry, rather than
the more useful entry name, but this is sufficient for policies
like Biba that wish to use their notions of privilege or integrity
to prevent inappropriate sysctl modification. Affects MAC kernels
only. Since SYSCTL_LOCK isn't in sysctl.h, just kern_sysctl.c,
we can't assert the SYSCTL subsystem lockin the MAC Framework.

Approved by: re
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories


# 5b8ee62b 26-Oct-2002 Maxime Henrion <mux@FreeBSD.org>

Fix a style nit.


# e80fb434 17-Oct-2002 Robert Drehmel <robert@FreeBSD.org>

Use strlcpy() instead of strncpy() to copy NUL terminated strings
for safety and consistency.


# 37c84183 28-Sep-2002 Poul-Henning Kamp <phk@FreeBSD.org>

Be consistent about "static" functions: if the function is marked
static in its prototype, mark it static at the definition too.

Inspired by: FlexeLint warning #512


# 306e6b83 10-Aug-2002 Maxime Henrion <mux@FreeBSD.org>

Introduce a new sysctl flag, CTLFLAG_SKIP, which will cause
sysctl_sysctl_next() to skip this sysctl. The sysctl is
still available, but doesn't appear in a "sysctl -a".

This is especially useful when you want to deprecate a sysctl,
and add a warning into it to warn users that they are using
an old interface. Without this flag, the warning would get
echoed when running "sysctl -a" (which happens at boot).


# b74f2c18 06-Aug-2002 Don Lewis <truckman@FreeBSD.org>

Don't automagically call vslock() from SYSCTL_OUT(). Instead, complain
about calls to SYSCTL_OUT() made with locks held if the buffer has not
been pre-wired. SYSCTL_OUT() should not be called while holding locks,
but if this is not possible, the buffer should be wired by calling
sysctl_wire_old_buffer() before grabbing any locks.


# 9e74cba3 28-Jul-2002 Don Lewis <truckman@FreeBSD.org>

Make a temporary copy of the output data in the generic sysctl handlers
so that the data is less likely to be inconsistent if SYSCTL_OUT() blocks.
If the data is large, wire the output buffer instead.

This is somewhat less than optimal, since the handler could skip the copy
if it knew that the data was static.

If the data is dynamic, we are still not guaranteed to get a consistent
copy since another processor could change the data while the copy is in
progress because the data is not locked. This problem could be solved if
the generic handlers had the ability to grab the proper lock before the
copy and release it afterwards.

This may duplicate work done in other sysctl handlers in the kernel which
also copy the data, possibly while a lock is held, before calling they call
a generic handler to output the data. These handlers should probably call
SYSCTL_OUT() directly.


# 0600730d 22-Jul-2002 Don Lewis <truckman@FreeBSD.org>

Provide a way for sysctl handlers to pre-wire their output buffer before
they grab a lock so that they don't block in SYSCTL_OUT() with the lock
being held.


# f0d2d038 15-Jul-2002 Mark Murray <markm@FreeBSD.org>

Fix a bazillion lint and WARNS warnings. One major fix is the removal of
semicolons from the end of macros:

#define FOO() bar(a,b,c);

becomes

#define FOO() bar(a,b,c)

Thus requiring the semicolon in the invocation of FOO. This is much
cleaner syntax and more consistent with expectations when writing
function-like things in source.

With both peril-sensitive sunglasses and flame-proof undies on, tighten
up some types, and work around some warnings generated by this. There
are some _horrible_ const/non-const issues in this code.


# 01609114 28-Jun-2002 Alfred Perlstein <alfred@FreeBSD.org>

more caddr_t removal.


# 3bd1da29 01-Apr-2002 Robert Watson <rwatson@FreeBSD.org>

Update comment regarding the locking of the sysctl tree.

Rename memlock to sysctllock, and MEMLOCK()/MEMUNLOCK() to SYSCTL_LOCK()/
SYSCTL_UNLOCK() and related changes to make the lock names make more
sense.

Submitted by: Jonathan Mini <mini@haikugeek.com>


# 29a2c0cd 01-Apr-2002 Alfred Perlstein <alfred@FreeBSD.org>

Use sx locks instead of flags+tsleep locks.

Submitted by: Jonathan Mini <mini@haikugeek.com>


# 44731cab 01-Apr-2002 John Baldwin <jhb@FreeBSD.org>

Change the suser() API to take advantage of td_ucred as well as do a
general cleanup of the API. The entire API now consists of two functions
similar to the pre-KSE API. The suser() function takes a thread pointer
as its only argument. The td_ucred member of this thread must be valid
so the only valid thread pointers are curthread and a few kernel threads
such as thread0. The suser_cred() function takes a pointer to a struct
ucred as its first argument and an integer flag as its second argument.
The flag is currently only used for the PRISON_ROOT flag.

Discussed on: smp@


# 7906271f 22-Mar-2002 Robert Watson <rwatson@FreeBSD.org>

In sysctl, req->td is believed always to be non-NULL, so there's no need
to test req->td for NULL values and then do somewhat more bizarre things
relating to securelevel special-casing and suser checks. Remove the
testing and conditional security checks based on req->td!=NULL, and insert
a KASSERT that td != NULL. Callers to sysctl must always specify the
thread (be it kernel or otherwise) requesting the operation, or a
number of current sysctls will fail due to assumptions that the thread
exists.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs
Discussed with: bde


# a854ed98 27-Feb-2002 John Baldwin <jhb@FreeBSD.org>

Simple p_ucred -> td_ucred changes to start using the per-thread ucred
reference.


# 6105f815 15-Dec-2001 Luigi Rizzo <luigi@FreeBSD.org>

Add code to export and print the description associated to sysctl
variables. Use the -d flag in sysctl(8) to see this information.

Possible extensions to sysctl:
+ report variables that do not have a description
+ given a name, report the oid it maps to.

Note to developers: have a look at your code, there are a number of
variables which do not have a description.

Note to developers: do we want this in 4.5 ? It is a very small change
and very useful for documentation purposes.

Suggested by: Orion Hodson


# 023a0e61 27-Nov-2001 Peter Wemm <peter@FreeBSD.org>

Dont print the sysctl node tree unless you're root.

Found by: jkb (Yahoo OS troublemaker)


# ce178806 07-Nov-2001 Robert Watson <rwatson@FreeBSD.org>

o Replace reference to 'struct proc' with 'struct thread' in 'struct
sysctl_req', which describes in-progress sysctl requests. This permits
sysctl handlers to have access to the current thread, permitting work
on implementing td->td_ucred, migration of suser() to using struct
thread to derive the appropriate ucred, and allowing struct thread to be
passed down to other code, such as network code where td is not currently
available (and curproc is used).

o Note: netncp and netsmb are not updated to reflect this change, as they
are not currently KSE-adapted.

Reviewed by: julian
Obtained from: TrustedBSD Project


# 88fbb423 12-Oct-2001 Peter Pentchev <roam@FreeBSD.org>

Remove the panic when trying to register a sysctl with an oid too high.
This stops panics on unloading modules which define their own sysctl sets.

However, this also removes the protection against somebody actually
defining a static sysctl with an oid in the range of the dynamic ones,
which would break badly if there is already a dynamic sysctl with
the requested oid.

Apparently, the algorithm for removing sysctl sets needs a bit more work.
For the present, the panic I introduced only leads to Bad Things (tm).

Submitted by: many users of -current :(
Pointy hat to: roam (myself) for not testing rev. 1.112 enough.


# c2f413af 26-Sep-2001 Robert Watson <rwatson@FreeBSD.org>

o Modify sysctl access control check to use securelevel_gt(), and
clarify sysctl access control logic.

Obtained from: TrustedBSD Project


# b40ce416 12-Sep-2001 Julian Elischer <julian@FreeBSD.org>

KSE Milestone 2
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to teh previousl -current except
that there is a thread associated with each process.

Sorry john! (your next MFC will be a doosie!)

Reviewed by: peter@freebsd.org, dillon@freebsd.org

X-MFC after: ha ha ha ha


# fb99ab88 01-Sep-2001 Matthew Dillon <dillon@FreeBSD.org>

Giant Pushdown

clock_gettime() clock_settime() nanosleep() settimeofday()
adjtime() getitimer() setitimer() __sysctl() ogetkerninfo()
sigaction() osigaction() sigpending() osigpending() osigvec()
osigblock() osigsetmask() sigsuspend() osigsuspend() osigstack()
sigaltstack() kill() okillpg() trapsignal() nosys()


# df557538 29-Aug-2001 Peter Wemm <peter@FreeBSD.org>

Fix the ogetkerninfo() syscall handling of sizes for
KINFO_BSDI_SYSINFO. This supposedly fixes Netscape 3.0.4 (bsdi binary)
on -current. (and is also applicable to RELENG_4)

PR: 25476
Submitted by: Philipp Mergenthaler <un1i@rz.uni-karlsruhe.de>


# 7ca4d05f 25-Jul-2001 Peter Pentchev <roam@FreeBSD.org>

Make dynamic sysctl entries start at 0x100, not decimal 100 - there are
static entries with oid's over 100, and defining enough dynamic entries
causes an overlap.

Move the "magic" value 0x100 into <sys/sysctl.h> where it belongs.

PR: 29131
Submitted by: "Alexander N. Kabaev" <kabaev@mail.ru>
Reviewed by: -arch, -audit
MFC after: 2 weeks


# 107e7dc5 25-Jul-2001 Peter Pentchev <roam@FreeBSD.org>

Style(9): function names on a separate line, max line length 80 chars.

Reviewed by: -arch, -audit
MFC after: 2 weeks


# 2f7f966c 22-Jun-2001 Matt Jacob <mjacob@FreeBSD.org>

int -> size_t fix


# f41325db 13-Jun-2001 Peter Wemm <peter@FreeBSD.org>

With this commit, I hereby pronounce gensetdefs past its use-by date.

Replace the a.out emulation of 'struct linker_set' with something
a little more flexible. <sys/linker_set.h> now provides macros for
accessing elements and completely hides the implementation.

The linker_set.h macros have been on the back burner in various
forms since 1998 and has ideas and code from Mike Smith (SET_FOREACH()),
John Polstra (ELF clue) and myself (cleaned up API and the conversion
of the rest of the kernel to use it).

The macros declare a strongly typed set. They return elements with the
type that you declare the set with, rather than a generic void *.

For ELF, we use the magic ld symbols (__start_<setname> and
__stop_<setname>). Thanks to Richard Henderson <rth@redhat.com> for the
trick about how to force ld to provide them for kld's.

For a.out, we use the old linker_set struct.

NOTE: the item lists are no longer null terminated. This is why
the code impact is high in certain areas.

The runtime linker has a new method to find the linker set
boundaries depending on which backend format is in use.

linker sets are still module/kld unfriendly and should never be used
for anything that may be modular one day.

Reviewed by: eivind


# b8edb44c 02-Jun-2001 Dima Dorfman <dd@FreeBSD.org>

When tring to find out if this is a request for a write in
kernel_sysctl and userland_sysctl, check for whether new is NULL, not
whether newlen is 0. This allows one to set a string sysctl to "".


# 1890520a 18-May-2001 Jonathan Lemon <jlemon@FreeBSD.org>

Add convenience function kernel_sysctlbyname() for kernel consumers,
so they don't have to roll their own sysctlbyname function.


# 3a515572 07-Mar-2001 Thomas Moestl <tmm@FreeBSD.org>

Make the SYSCTL_OUT handlers sysctl_old_user() and sysctl_old_kernel()
more robust. They would correctly return ENOMEM for the first time when
the buffer was exhausted, but subsequent calls in this case could cause
writes ouside of the buffer bounds.

Approved by: rwatson


# fc2ffbe6 04-Feb-2001 Poul-Henning Kamp <phk@FreeBSD.org>

Mechanical change to use <sys/queue.h> macro API instead of
fondling implementation details.

Created with: sed(1)
Reviewed by: md5(1)


# fea8c9f2 29-Jan-2001 Peter Wemm <peter@FreeBSD.org>

Remove unused variable 'int n;'


# 5ed57d32 23-Jan-2001 Kirk McKusick <mckusick@FreeBSD.org>

Never reuse AUTO_OID values.

Approved by: Alfred Perlstein <bright@wintelcom.net>


# c5f9b6d0 05-Jan-2001 John Baldwin <jhb@FreeBSD.org>

- For dynamic sysctl's added at runtime, don't assume that the name passed
to the SYSCTL_ADD_FOO() macros is a constant that should be turned into
a string via the pre-processor. Instead, require it to be an explicit
string so that names can be generated on the fly.
- Make some of the char * arguments to sysctl_add_oid() const to quiet
warnings.


# 7cc0979f 08-Dec-2000 David Malone <dwmalone@FreeBSD.org>

Convert more malloc+bzero to malloc+M_ZERO.

Submitted by: josh@zipperup.org
Submitted by: Robert Drehmel <robd@gmx.net>


# 93e8459a 28-Jul-2000 Peter Wemm <peter@FreeBSD.org>

Fix some style nits.
Fix(?) some compile warnings regarding const handling.


# bd3cdc31 15-Jul-2000 Andrzej Bialecki <abial@FreeBSD.org>

These patches implement dynamic sysctls. It's possible now to add
and remove sysctl oids at will during runtime - they don't rely on
linker sets. Also, the node oids can be referenced by more than
one kernel user, which means that it's possible to create partially
overlapping trees.

Add sysctl contexts to help programmers manage multiple dynamic
oids in convenient way.

Please see the manpages for detailed discussion, and example module
for typical use.

This work is based on ideas and code snippets coming from many
people, among them: Arun Sharma, Jonathan Lemon, Doug Rabson,
Brian Feldman, Kelly Yancey, Poul-Henning Kamp and others. I'd like
to specially thank Brian Feldman for detailed review and style
fixes.

PR: kern/16928
Reviewed by: dfr, green, phk


# 77978ab8 04-Jul-2000 Poul-Henning Kamp <phk@FreeBSD.org>

Previous commit changing SYSCTL_HANDLER_ARGS violated KNF.

Pointed out by: bde


# 82d9ae4e 03-Jul-2000 Poul-Henning Kamp <phk@FreeBSD.org>

Style police catches up with rev 1.26 of src/sys/sys/sysctl.h:

Sanitize SYSCTL_HANDLER_ARGS so that simplistic tools can grog our
sources:

-sysctl_vm_zone SYSCTL_HANDLER_ARGS
+sysctl_vm_zone (SYSCTL_HANDLER_ARGS)


# e3975643 25-May-2000 Jake Burkholder <jake@FreeBSD.org>

Back out the previous change to the queue(3) interface.
It was not discussed and should probably not happen.

Requested by: msmith and others


# 740a1973 23-May-2000 Jake Burkholder <jake@FreeBSD.org>

Change the way that the queue(3) structures are declared; don't assume that
the type argument to *_HEAD and *_ENTRY is a struct.

Suggested by: phk
Reviewed by: phk
Approved by: mdodd


# ed6aff73 18-Apr-2000 Poul-Henning Kamp <phk@FreeBSD.org>

Remove unneeded <sys/buf.h> includes.

Due to some interesting cpp tricks in lockmgr, the LINT kernel shrinks
by 924 bytes.


# 7de47255 13-Mar-2000 Poul-Henning Kamp <phk@FreeBSD.org>

Remove unused 3rd argument from vsunlock() which abused B_WRITE.


# 226420a4 30-Nov-1999 Brian Feldman <green@FreeBSD.org>

Separate some common sysctl code into sysctl_find_oid() and calling
thereof. Also, make the errno returns _correct_, and add a new one
which is more appropriate.


# 02c58685 30-Oct-1999 Poul-Henning Kamp <phk@FreeBSD.org>

Change useracc() and kernacc() to use VM_PROT_{READ|WRITE|EXECUTE} for the
"rw" argument, rather than hijacking B_{READ|WRITE}.

Fix two bugs (physio & cam) resulting by the confusion caused by this.

Submitted by: Tor.Egge@fast.no
Reviewed by: alc, ken (partly)


# c3aac50f 27-Aug-1999 Peter Wemm <peter@FreeBSD.org>

$Id$ -> $FreeBSD$


# e96c1fdc 27-Jun-1999 Peter Wemm <peter@FreeBSD.org>

Minor tweaks to make sure (new) prerequisites for <sys/buf.h> (mostly
splbio()/splx()) are #included in time.


# 75c13541 28-Apr-1999 Poul-Henning Kamp <phk@FreeBSD.org>

This Implements the mumbled about "Jail" feature.

This is a seriously beefed up chroot kind of thing. The process
is jailed along the same lines as a chroot does it, but with
additional tough restrictions imposed on what the superuser can do.

For all I know, it is safe to hand over the root bit inside a
prison to the customer living in that prison, this is what
it was developed for in fact: "real virtual servers".

Each prison has an ip number associated with it, which all IP
communications will be coerced to use and each prison has its own
hostname.

Needless to say, you need more RAM this way, but the advantage is
that each customer can run their own particular version of apache
and not stomp on the toes of their neighbors.

It generally does what one would expect, but setting up a jail
still takes a little knowledge.

A few notes:

I have no scripts for setting up a jail, don't ask me for them.

The IP number should be an alias on one of the interfaces.

mount a /proc in each jail, it will make ps more useable.

/proc/<pid>/status tells the hostname of the prison for
jailed processes.

Quotas are only sensible if you have a mountpoint per prison.

There are no privisions for stopping resource-hogging.

Some "#ifdef INET" and similar may be missing (send patches!)

If somebody wants to take it from here and develop it into
more of a "virtual machine" they should be most welcome!

Tools, comments, patches & documentation most welcome.

Have fun...

Sponsored by: http://www.rndassociates.com/
Run for almost a year by: http://www.servetheweb.com/


# f711d546 27-Apr-1999 Poul-Henning Kamp <phk@FreeBSD.org>

Suser() simplification:

1:
s/suser/suser_xxx/

2:
Add new function: suser(struct proc *), prototyped in <sys/proc.h>.

3:
s/suser_xxx(\([a-zA-Z0-9_]*\)->p_ucred, \&\1->p_acflag)/suser(\1)/

The remaining suser_xxx() calls will be scrutinized and dealt with
later.

There may be some unneeded #include <sys/cred.h>, but they are left
as an exercise for Bruce.

More changes to the suser() API will come along with the "jail" code.


# 6a5d592a 30-Mar-1999 Poul-Henning Kamp <phk@FreeBSD.org>

Purging lint from the Bruce filter.


# 7f4173cc 23-Mar-1999 Poul-Henning Kamp <phk@FreeBSD.org>

Fix some nasty hangs if garbage were passed.

Noticed by: Emmanuel DELOGET <pixel@DotCom.FR>
Remembered by: msmith


# ce02431f 16-Feb-1999 Doug Rabson <dfr@FreeBSD.org>

* Change sysctl from using linker_set to construct its tree using SLISTs.
This makes it possible to change the sysctl tree at runtime.

* Change KLD to find and register any sysctl nodes contained in the loaded
file and to unregister them when the file is unloaded.

Reviewed by: Archie Cobbs <archie@whistle.com>,
Peter Wemm <peter@netplex.com.au> (well they looked at it anyway)


# 86415b71 10-Jan-1999 Poul-Henning Kamp <phk@FreeBSD.org>

Back out last change to sysctl.

It was nay'ed before committing on the grounds that this is not
the way to do it, and has been decided as such several times in
the past.

There is not point in loading gobs of ascii into the kernel when
the only use of that ascii is presentation to the user.

Next thing we'd be adding all section 4 man pages to the loaded
kernel as well.

The argument about KLD's is bogus, klds can store a file in
/usr/share/doc/sysctl/dev/foo/thisvar.txt with a description and
sysctl or other facilities can pick it up there.

Proper documentation will take several K worth of text for many
sysctl variables, we don't want that in the kernel under any
circumstances.

I will welcome any well thought out attempt at improving the
situation wrt. sysctl documentation, but this wasn't it.


# 302a1102 09-Jan-1999 Dag-Erling Smørgrav <des@FreeBSD.org>

Add kernel support for sysctl descriptions. The NO_SYSCTL_DESCRIPTIONS option
disables them if they're not wanted; in that case, sysctl_sysctl_descr will
always return an empty string.

Apporved by: jkh


# 486bddb0 27-Dec-1998 Doug Rabson <dfr@FreeBSD.org>

Fix some 64bit truncation problems which crept into SYSCTL_LONG() with the
last cleanup. Since the oid_arg2 field of struct sysctl_oid is not wide
enough to hold a long, the SYSCTL_LONG() macro has been modified to only
support exporting long variables by pointer instead of by value.

Reviewed by: bde


# 2b648ac0 13-Dec-1998 Don Lewis <truckman@FreeBSD.org>

Add a generic flag, CTLFLAG_SECURE, which can be used to mark a sysctl
variable unwriteable when securelevel > 0.
Reviewed by: jdp, eivind


# 2127f260 04-Dec-1998 Archie Cobbs <archie@FreeBSD.org>

Examine all occurrences of sprintf(), strcat(), and str[n]cpy()
for possible buffer overflow problems. Replaced most sprintf()'s
with snprintf(); for others cases, added terminating NUL bytes where
appropriate, replaced constants like "16" with sizeof(), etc.

These changes include several bug fixes, but most changes are for
maintainability's sake. Any instance where it wasn't "immediately
obvious" that a buffer overflow could not occur was made safer.

Reviewed by: Bruce Evans <bde@zeta.org.au>
Reviewed by: Matthew Dillon <dillon@apollo.backplane.com>
Reviewed by: Mike Spengler <mks@networkcs.com>


# aa855a59 15-Oct-1998 Peter Wemm <peter@FreeBSD.org>

*gulp*. Jordan specifically OK'ed this..

This is the bulk of the support for doing kld modules. Two linker_sets
were replaced by SYSINIT()'s. VFS's and exec handlers are self registered.
kld is now a superset of lkm. I have converted most of them, they will
follow as a seperate commit as samples.
This all still works as a static a.out kernel using LKM's.


# e99ea9ec 05-Sep-1998 Bruce Evans <bde@FreeBSD.org>

Ignore the statically configured vfs type numbers and assign vfs
type numbers in vfs attach order (modulo incomplete reuse of old
numbers after vfs LKMs are unloaded). This requires reinitializing
the sysctl tree (or at least the vfs subtree) for vfs's that support
sysctls (currently only nfs). sysctl_order() already handled
reinitialization reasonably except it checked for annulled self
references in the wrong place.

Fixed sysctls for vfs LKMs.


# 134e06fe 05-Sep-1998 Bruce Evans <bde@FreeBSD.org>

Fixed bogotification of pseudocode for syscall args by rev.1.53 of
syscalls.master.


# 069e9bc1 24-Aug-1998 Doug Rabson <dfr@FreeBSD.org>

Change various syscalls to use size_t arguments instead of u_int.

Add some overflow checks to read/write (from bde).

Change all modifications to vm_page::flags, vm_page::busy, vm_object::flags
and vm_object::paging_in_progress to use operations which are not
interruptable.

Reviewed by: Bruce Evans <bde@zeta.org.au>


# 5591b823d 16-Dec-1997 Eivind Eklund <eivind@FreeBSD.org>

Make COMPAT_43 and COMPAT_SUNOS new-style options.


# cb226aaa 06-Nov-1997 Poul-Henning Kamp <phk@FreeBSD.org>

Move the "retval" (3rd) parameter from all syscall functions and put
it in struct proc instead.

This fixes a boatload of compiler warning, and removes a lot of cruft
from the sources.

I have not removed the /*ARGSUSED*/, they will require some looking at.

libkvm, ps and other userland struct proc frobbing programs will need
recompiled.


# a1c995b6 12-Oct-1997 Poul-Henning Kamp <phk@FreeBSD.org>

Last major round (Unless Bruce thinks of somthing :-) of malloc changes.

Distribute all but the most fundamental malloc types. This time I also
remembered the trick to making things static: Put "static" in front of
them.

A couple of finer points by: bde


# 55166637 11-Oct-1997 Poul-Henning Kamp <phk@FreeBSD.org>

Distribute and statizice a lot of the malloc M_* types.

Substantial input from: bde


# 4e750649 09-Apr-1997 Bruce Evans <bde@FreeBSD.org>

Include <sys/buf.h> instead of <sys/vnode.h>. kern_sysctl.c no
longer has anything to do with vnodes and never had anything to do
with buffers, but it needs the definitions of B_READ and B_WRITE
for use with the bogus useracc() interface and was getting them
bogusly due to excessive cleanups in rev.1.49.


# 6875d254 22-Feb-1997 Peter Wemm <peter@FreeBSD.org>

Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not
ready for it yet.


# 1130b656 14-Jan-1997 Jordan K. Hubbard <jkh@FreeBSD.org>

Make the long-awaited change from $Id$ to $FreeBSD$

This will make a number of things easier in the future, as well as (finally!)
avoiding the Id-smashing problem which has plagued developers for so long.

Boy, I'm glad we're not using sup anymore. This update would have been
insane otherwise.


# 1fbf1f71 15-Dec-1996 Bruce Evans <bde@FreeBSD.org>

Fixed garbage being returned for constant int values, e.g., for
KERN_SAVED_IDS.

Should be in 2.2.

Reviewed by: phk
Found by: NIST-PCTS


# 3f6a052a 03-Sep-1996 Bruce Evans <bde@FreeBSD.org>

Fixed bogus casts (const on the wrong `*' in `**') in a qsort-comparision
function.


# 09a8dfa2 31-Aug-1996 Bruce Evans <bde@FreeBSD.org>

Don't depend in the kernel on the gcc feature of doing arithmetic on
pointers of type `void *'. Warn about this in future.


# 1c346c70 10-Jun-1996 Nate Williams <nate@FreeBSD.org>

Implemented 'kern_sysctl', which differs from 'userland_sysctl' in that
it assumes all of the data exists in the kernel. Also, fix
sysctl_new-kernel (unused until now) which had reversed operands to
bcopy().

Reviewed by: phk

Poul writes:
... actually the lock/sleep/wakeup cruft shouldn't be needed in the
kernel version I think, but just leave it there for now.


# 7a69d923 06-Jun-1996 Poul-Henning Kamp <phk@FreeBSD.org>

If handler function returns EAGAIN, restart operation.


# 61220614 13-Apr-1996 Poul-Henning Kamp <phk@FreeBSD.org>

Fix a longstanding bug and a buglet of no significance.
Now net.ipx works.

Noticed by: John Hay -- John.Hay@csir.co.za


# 45ec3b38 07-Apr-1996 Poul-Henning Kamp <phk@FreeBSD.org>

Move the "mib" variables out to their own file.


# edbfedac 11-Mar-1996 Peter Wemm <peter@FreeBSD.org>

Import 4.4BSD-Lite2 onto the vendor branch, note that in the kernel, all
files are off the vendor branch, so this should not change anything.

A "U" marker generally means that the file was not changed in between
the 4.4Lite and Lite-2 releases, and does not need a merge. "C" generally
means that there was a change.
[note new unused (in this form) syscalls.conf, to be 'cvs rm'ed]


# 04d41bbe 10-Mar-1996 Jeffrey Hsu <hsu@FreeBSD.org>

From Lite2: rename fs to vfs.
Reviewed by: davidg & bde


# a1d2540f 01-Jan-1996 Peter Wemm <peter@FreeBSD.org>

Fix the reversed source and dest args to bcopy() in the kernel space
sysctl handler (ouch!)

Add a "const" qualifier to the source of the copyin() and copyout()
functions - the other const warning in kern_sysctl.c was silenced when
copyout was declared as having a const source.. (which it is)


# 3ac9f819 17-Dec-1995 Poul-Henning Kamp <phk@FreeBSD.org>

Add an obscure feature, needed for debugging.


# 87b6de2b 14-Dec-1995 Poul-Henning Kamp <phk@FreeBSD.org>

A Major staticize sweep. Generates a couple of warnings that I'll deal
with later.
A number of unused vars removed.
A number of unused procs removed or #ifdefed.


# efeaf95a 06-Dec-1995 David Greenman <dg@FreeBSD.org>

Untangled the vm.h include file spaghetti.


# 65d0bc13 06-Dec-1995 Poul-Henning Kamp <phk@FreeBSD.org>

A couple of minor tweaks to the sysctl stuff.


# 4cb03b1b 05-Dec-1995 Bruce Evans <bde@FreeBSD.org>

Include <vm/vm.h> or <vm/vm_page.h> explicitly to avoid breaking when
vnode_if.h doesn't include vm stuff.


# 946bb7a2 04-Dec-1995 Poul-Henning Kamp <phk@FreeBSD.org>

A major sweep over the sysctl stuff.

Move a lot of variables home to their own code (In good time before xmas :-)

Introduce the string descrition of format.

Add a couple more functions to poke into these marvels, while I try to
decide what the correct interface should look like.

Next is adding vars on the fly, and sysctl looking at them too.

Removed a tine bit of defunct and #ifdefed notused code in swapgeneric.


# 4b2af45f 19-Nov-1995 Poul-Henning Kamp <phk@FreeBSD.org>

Mega commit for sysctl.
Convert the remaining sysctl stuff to the new way of doing things.
the devconf stuff is the reason for the large number of files.
Cleaned up some compiler warnings while I were there.


# 52041295 16-Nov-1995 Poul-Henning Kamp <phk@FreeBSD.org>

All net.* sysctl converted now.


# 43f24226 14-Nov-1995 Poul-Henning Kamp <phk@FreeBSD.org>

Do what is generally belived to be the right thing, though it may not be :-)


# d457bade 14-Nov-1995 Poul-Henning Kamp <phk@FreeBSD.org>

Final part of this bunch of sysctl commits: cleanup.


# af8364b0 14-Nov-1995 Poul-Henning Kamp <phk@FreeBSD.org>

Get rid of the last debug sysctl variables of the old style.


# 9565c0e6 14-Nov-1995 Poul-Henning Kamp <phk@FreeBSD.org>

Get rid of hostnamelen variable.


# a9ad941c 14-Nov-1995 Poul-Henning Kamp <phk@FreeBSD.org>

Move all the VM sysctl stuff home where it belongs.


# 16cd04a3 14-Nov-1995 Poul-Henning Kamp <phk@FreeBSD.org>

A couple of nitpicks.


# f7152c81 14-Nov-1995 Poul-Henning Kamp <phk@FreeBSD.org>

Convert dumpdev & securelevel.


# 549a075a 14-Nov-1995 Poul-Henning Kamp <phk@FreeBSD.org>

KERN_MAXFILESPERPROC, KERN_MAXFILES went to another file.


# 27aef046 14-Nov-1995 Poul-Henning Kamp <phk@FreeBSD.org>

Get rid of domainnamelen.


# a52752a4 14-Nov-1995 Poul-Henning Kamp <phk@FreeBSD.org>

Move KERN_NTP to a more suitable file.


# 45a4ad11 14-Nov-1995 Poul-Henning Kamp <phk@FreeBSD.org>

Move the process-table stuff to a more suitable file.
Remove filetable stuff from kern_sysctl.c


# deae269a 13-Nov-1995 Poul-Henning Kamp <phk@FreeBSD.org>

Try to make my new scheme work more along the lines of the manual.
There are still some gray areas here and there.


# ae0eb976 12-Nov-1995 Poul-Henning Kamp <phk@FreeBSD.org>

The entire sysctl callback to read/write version. I havn't tested this as
much as I'd like to, but the malloc stunt I tried for an interim for
sure does worse.
Now we can read and write from any kind of address-space, not only
user and kernel, using callbacks.
This may be over-generalization for now, but it's actually simpler.


# d2d3e875 11-Nov-1995 Bruce Evans <bde@FreeBSD.org>

Included <sys/sysproto.h> to get central declarations for syscall args
structs and prototypes for syscalls.

Ifdefed duplicated decentralized declarations of args structs. It's
convenient to have this visible but they are hard to maintain. Some
are already different from the central declarations. 4.4lite2 puts
them in comments in the function headers but I wanted to avoid the
large changes for that.


# 3c8e79dd 10-Nov-1995 Bruce Evans <bde@FreeBSD.org>

Fixed type of sysctl_order_cmp().
KNFized sysctl_order_cmp().
Staticized definition of kern_sysctl() to match its declaration.


# 69feb388 10-Nov-1995 Poul-Henning Kamp <phk@FreeBSD.org>

Fix a minor buglet.


# 6ff1bbeb 10-Nov-1995 Poul-Henning Kamp <phk@FreeBSD.org>

convert more sysctl variables.


# b8da2396 09-Nov-1995 Poul-Henning Kamp <phk@FreeBSD.org>

Make the old compat functions use the sysctl front door, rather than
crashing through the walls.
This should save Peters blood pressure and netscapes uname call.


# 787d58f2 08-Nov-1995 Poul-Henning Kamp <phk@FreeBSD.org>

Fix some of the sysctl broke, and add a lot more to it.


# 2e210993 06-Nov-1995 Poul-Henning Kamp <phk@FreeBSD.org>

On working the new sysctl vars a bit I realized that I needed more generality.
This is here now. We can now access (the new) sysctl variables from the
kernel too and using functions to handle access is more sane now.
I will now attack sysctl variables in the rest of the kernel and get them
all converted to newspeak.


# 3a34a5c3 28-Oct-1995 Poul-Henning Kamp <phk@FreeBSD.org>

Sorry, the last commit screwed up for me, this is the right one (I hope!)
Please refer to the previous commit message about sysctl variables.


# b396cd83 27-Oct-1995 Poul-Henning Kamp <phk@FreeBSD.org>

Rewamp the way we make sysctl variables to be easier to cope with.

The goal is to make them "user-friendly" :-)

In the end this will allow a SNMP style "getnext" function, sysctl editing
in the boot-editor and/or debugger, LKMs can define sysctl vars when
they get loaded, and remove them when unloaded and other interesting
uses for dynamic sysctl variables.


# e1453736 31-Jul-1995 Mike Pritchard <mpp@FreeBSD.org>

Fix the sysctl string routines to return as much of the
string as possible and return ENOMEM if the entire string cannot
be returned. This brings the routines in line with how the man
page says they work, and how the calling routines are expecting
them to work. This allows the dummy uname() routine in libc to
obtain the version string, since the kernel version string is
longer than that normally returned by the uname() routine.
This is 3/4 of the fix for PR# 462.

Reviewed by: Bruce Evans


# 1c8fc26c 28-Jul-1995 David Greenman <dg@FreeBSD.org>

Fixed panic in fill_eproc() caused by inadequate checking for NULL pointers.


# 6ece4a51 08-Jul-1995 Peter Wemm <peter@FreeBSD.org>

This implements enough of the BSDI extensions to the net-2 ogetkerninfo()
syscall to allow applications linked against their libc's uname() to
work. Netscape 1.1N being a prime example, which prints:
"uname() failed. cant tell what system we're running on".
This change is a little ugly, but that's mainly because of the "interesting"
semantics of the BSDI extension.
Since ogetkerninfo() is only enabled by COMPAT_43, Netscape will only
be affected on kernels with that option (eg: "GENERIC")
Reviewed by: davidg


# 9b2e5354 30-May-1995 Rodney W. Grimes <rgrimes@FreeBSD.org>

Remove trailing whitespace.


# 5bb4f738 12-May-1995 Garrett Wollman <wollman@FreeBSD.org>

The death of `options NODUMP'. Now the dump area can be dynamically
configured (and unconfigured) on the fly. A sysctl(3) MIB variable is
provided to inspect and modify the dump device setting.


# b5e8ce9f 16-Mar-1995 Bruce Evans <bde@FreeBSD.org>

Add and move declarations to fix all of the warnings from `gcc -Wimplicit'
(except in netccitt, netiso and netns) and most of the warnings from
`gcc -Wnested-externs'. Fix all the bugs found. There were no serious
ones.


# e6373c9e 20-Feb-1995 Guido van Rooij <guido@FreeBSD.org>

Implement maxprocperuid and maxfilesperproc. They are tunable
via sysctl(8). The initial value of maxprocperuid is maxproc-1,
that of maxfilesperproc is maxfiles (untill maxfile will disappear)

Now it is at least possible to prohibit one user opening maxfiles

-Guido

Submitted by:
Obtained from:


# be6a1d14 27-Dec-1994 David Greenman <dg@FreeBSD.org>

Fixed multiple bugs that cause null pointers to be followed or FREEed data
to be accessed if a process blocks when it is being run down.


# f81fcd41 18-Dec-1994 Guido van Rooij <guido@FreeBSD.org>

Fix bug in sysctl_string so that when a string has a length that is to
short, it gets filled uop to its length. This matches the getdomainname
and gethostname manual pages.
(getbootfile also uses this function and I think it should have the same
behaviour)

This also fixes a bug with keyinit where the seed was not saved in
/etc/skeykeys. So S/Key should be fully functional again.

Reviewed by:
Submitted by:
Obtained from:


# 6124ac44 14-Nov-1994 Bruce Evans <bde@FreeBSD.org>

Move declarations of public functions to <sys/sysctl.h>.

Make some private data static.

Comment about MAXPATHLEN bytes of bloat for the kernel name.


# 8478caba 15-Oct-1994 Garrett Wollman <wollman@FreeBSD.org>

kern_clock.c: define dk_names[][].
kern_sysctl.c: call dev_sysctl for hw.devconf mib subtree
kern_devconf.c: sysctl-accessible device-configuration and -management
interface


# 82478919 06-Oct-1994 David Greenman <dg@FreeBSD.org>

Use tsleep() rather than sleep so that 'ps' is more informative about
the wait.


# 797f2d22 02-Oct-1994 Poul-Henning Kamp <phk@FreeBSD.org>

All of this is cosmetic. prototypes, #includes, printfs and so on. Makes
GCC a lot more silent.


# 63b46ee5 23-Sep-1994 Garrett Wollman <wollman@FreeBSD.org>

Add MIB variable kern.bootfile (R/W) giving the name of the booted kernel.
Kernel variable is kernelname[].


# c901836c 20-Sep-1994 Garrett Wollman <wollman@FreeBSD.org>

Implemented loadable VFS modules, and made most existing filesystems
loadable. (NFS is a notable exception.)


# e9e2a852 19-Sep-1994 Andrey A. Chernov <ache@FreeBSD.org>

sysctl incorrectly check name[2] instead of name[1]


# 3f31c649 18-Sep-1994 Garrett Wollman <wollman@FreeBSD.org>

Redo Kernel NTP PLL support, kernel side.

This code is mostly taken from the 1.1 port (which was in turn taken from
Dave Mills's kern.tar.Z example). A few significant differences:

1) ntp_gettime() is now a MIB variable rather than a system call. A few
fiddles are done in libc to make it behave the same.

2) mono_time does not participate in the PLL adjustments.

3) A new interface has been defined (in <machine/clock.h>) for doing
possibly machine-dependent things around the time of the clock update.
This is used in Pentium kernels to disable interrupts, set `time', and
reset the CPU cycle counter as quickly as possible to avoid jitter in
microtime(). Measurements show an apparent resolution of a bit more than
8.14usec, which is reasonable given system-call overhead.


# 7a808ab7 15-Sep-1994 Andrey A. Chernov <ache@FreeBSD.org>

KERN_ADJKERNTZ removed from here to cpu_sysctl MACHDEP section


# b93df246 14-Sep-1994 Andrey A. Chernov <ache@FreeBSD.org>

KERN_ADJKERNTZ added in preparation of resettodr() implementation


# 501c2393 09-Sep-1994 Garrett Wollman <wollman@FreeBSD.org>

Define new MIB variable, hw.floatingpoint, which is true if FP hardware
is present, and false if an emulator is being used.


# f23b4c91 18-Aug-1994 Garrett Wollman <wollman@FreeBSD.org>

Fix up some sloppy coding practices:

- Delete redundant declarations.
- Add -Wredundant-declarations to Makefile.i386 so they don't come back.
- Delete sloppy COMMON-style declarations of uninitialized data in
header files.
- Add a few prototypes.
- Clean up warnings resulting from the above.

NB: ioconf.c will still generate a redundant-declaration warning, which
is unavoidable unless somebody volunteers to make `config' smarter.


# 9ae15916 10-Aug-1994 Garrett Wollman <wollman@FreeBSD.org>

Make it easier for programs to figure out what revision of FreeBSD they
are running under. Here's how to bootstrap (order is important):

1) Re-compile gcc (just the driver is all you need).
2) Re-compile libc.
3) Re-compile your kernel. Reboot.
4) cd /usr/src/include; make install

You can now detect the compilation environment with the following code:

#if !defined(__FreeBSD__)
#define __FreeBSD_version 199401
#elif __FreeBSD__ == 1
#define __FreeBSD_version 199405
#else
#include <osreldate.h>
#endif

You can determine the run-time environment by calling the new C library
function getosreldate(), or by examining the MIB variable kern.osreldate.

For the time being, the release date is defined as 199409, which we have
already established as our target.


# 6ae6a09b 09-Aug-1994 Garrett Wollman <wollman@FreeBSD.org>

Change default security level to -1, so that users don't get bitten by
upcoming makefile change.


# 57034e74 08-Aug-1994 Garrett Wollman <wollman@FreeBSD.org>

Run-time configuration of VFS update interval. Old UPDATE_INTERVAL
configuration option is no longer supported.


# 4849be9c 07-Aug-1994 Garrett Wollman <wollman@FreeBSD.org>

Define a sysctl MIB variable for the YP domain name.


# 3c4dd356 02-Aug-1994 David Greenman <dg@FreeBSD.org>

Added $Id$


# 26f9a767 25-May-1994 Rodney W. Grimes <rgrimes@FreeBSD.org>

The big 4.4BSD Lite to FreeBSD 2.0.0 (Development) patch.

Reviewed by: Rodney W. Grimes
Submitted by: John Dyson and David Greenman


# df8bae1d 24-May-1994 Rodney W. Grimes <rgrimes@FreeBSD.org>

BSD 4.4 Lite Kernel Sources