History log of /freebsd-current/sys/sys/user.h
Revision Date Author Comments
# 1a8d1764 29-Mar-2024 Gleb Smirnoff <glebius@FreeBSD.org>

inpcb: fully retire inp_ppcb pointer

Before a protocol specific control block started to embed inpcb in self
(see 0aa120d52f3c, e68b3792440c, 483fe96511ec) this pointer used to point
at it.

Retain kf_sock_inpcb field in the struct kinfo_file in <sys/user.h>. The
exp-run detected a minimal use of the field in ports:
* sysutils/lsof - patched upstream
* net-mgmt/netdata - patch accepted upstream
* emulators/qemu-user-static - upstream master branch seems not using
the field anymore
We can keep the field around for some time, but eventually it may be
reused for something else.

PR: 277659 (exp-run)
Reviewed by: tuexen
Differential Revision: https://reviews.freebsd.org/D44491


# 29363fb4 23-Nov-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove ancient SCCS tags.

Remove ancient SCCS tags from the tree, automated scripting, with two
minor fixup to keep things compiling. All the common forms in the tree
were removed with a perl script.

Sponsored by: Netflix


# af93fea7 23-Aug-2023 Jake Freeland <jfree@freebsd.org>

timerfd: Move implementation from linux compat to sys/kern

Move the timerfd impelemntation from linux compat code to sys/kern. Use
it to implement the new system calls for timerfd. Add a hook to kern_tc
to allow timerfd to know when the system time has stepped. Add kqueue
support to timerfd. Adjust a few names to be less Linux centric.

RelNotes: YES
Reviewed by: markj (on irc), imp, kib (with reservations), jhb (slack)
Differential Revision: https://reviews.freebsd.org/D38459


# 2ff63af9 16-Aug-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove $FreeBSD$: one-line .h pattern

Remove /^\s*\*+\s*\$FreeBSD\$.*$\n/


# dec7db49 23-Jan-2023 Jiajie Chen <c@jia.je>

Add kf_file_nlink field to kf_file and populate it

This will allow user-space programs (e.g. lsof) to locate deleted files
whose nlink equals zero. Prior to this commit, programs has to use
stat(kf_path) to get nlink, but that will fail if the file is deleted.

[mjg: s/fail/file in the commit message]

Reviewed by: mjg
Differential Revision: https://reviews.freebsd.org/D38169


# 939f0b63 10-May-2022 Kornel Dulęba <kd@FreeBSD.org>

Implement shared page address randomization

It used to be mapped at the top of the UVA.
If the randomization is enabled any address above .data section will be
randomly chosen and a guard page will be inserted in the shared page
default location.
The shared page is now mapped in exec_map_stack, instead of
exec_new_vmspace. The latter function is called before image activator
has a chance to parse ASLR related flags.
The KERN_PROC_VM_LAYOUT sysctl was extended to provide shared page
address.
The feature is enabled by default for 64 bit applications on all
architectures.
It can be toggled kern.elf64.aslr.shared_page sysctl.

Approved by: mw(mentor)
Sponsored by: Stormshield
Obtained from: Semihalf
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D35349


# 7f3c78fb 16-Jul-2022 Mark Johnston <markj@FreeBSD.org>

vm_pager: Remove references to KVME_TYPE_DEFAULT in the kernel

Keep the definition around since it's used by userspace.

Reviewed by: alc, imp, kib
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D35791


# 8c309d48 17-Jun-2022 Damjan Jovanovic <damjan.jov@gmail.com>

struct kinfo_file changes needed for lsof to work using only usermode APIs`

Add kf_pipe_buffer_[in/out/size] fields to kf_pipe, and populate them.

Add a kf_kqueue struct to the kf_un union, to allow querying kqueue state,
and populate it.

Populate the kf_sock_rcv_sb_state and kf_sock_snd_sb_state fields in
kf_sock for INET/INET6 sockets, and populate all other fields for all
transport layer protocols, not just TCP.

Bump __FreeBSD_version.

Differential revision: https://reviews.freebsd.org/D34184
Reviewed by: jhb, kib, se
MFC after: 1 week


# 6ead1379 01-Apr-2022 Konstantin Belousov <kib@FreeBSD.org>

sys/user.h: Add kinfo_lockf structure to report advisory locks

Reviewed by: markj, rmacklem
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D34756


# 3ce04aca 17-Jan-2022 Mark Johnston <markj@FreeBSD.org>

proc: Add a sysctl to fetch virtual address space layout info

This provides information about fixed regions of the target process'
user memory map.

Reviewed by: kib
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D33708


# 04389c85 08-Aug-2021 Gordon Bergling <gbe@FreeBSD.org>

Fix some common typos in comments

- s/configuraiton/configuration/
- s/specifed/specified/
- s/compatiblity/compatibility/

MFC after: 5 days


# ecfbddf0 14-Apr-2021 Konstantin Belousov <kib@FreeBSD.org>

sysctl vm.objects: report backing object and swap use

For anonymous objects, provide a handle kvo_me naming the object,
and report the handle of the backing object. This allows userspace
to deconstruct the shadow chain. Right now the handle is the address
of the object in KVA, but this is not guaranteed.

For the same anonymous objects, report the swap space used for actually
swapped out pages, in kvo_swapped field. I do not believe that it is
useful to report full 64bit counter there, so only uint32_t value is
returned, clamped to the max.

For kinfo_vmentry, report anonymous object handle backing the entry,
so that the shadow chain for the specific mapping can be deconstructed.

Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D29771


# 25c6318c 13-Feb-2021 Konstantin Belousov <kib@FreeBSD.org>

procstat: distinguish vm map guards in procstat vm output.

Requested and reviewed by: rwatson (previous version)
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D28658


# 7a202823 23-Dec-2020 Konstantin Belousov <kib@FreeBSD.org>

Expose eventfd in the native API/ABI using a new __specialfd syscall

eventfd is a Linux system call that produces special file descriptors
for event notification. When porting Linux software, it is currently
usually emulated by epoll-shim on top of kqueues. Unfortunately, kqueues
are not passable between processes. And, as noted by the author of
epoll-shim, even if they were, the library state would also have to be
passed somehow. This came up when debugging strange HW video decode
failures in Firefox. A native implementation would avoid these problems
and help with porting Linux software.

Since we now already have an eventfd implementation in the kernel (for
the Linuxulator), it's pretty easy to expose it natively, which is what
this patch does.

Submitted by: greg@unrelenting.technology
Reviewed by: markj (previous version)
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D26668


# 688f8b82 24-Nov-2020 John Baldwin <jhb@FreeBSD.org>

Remove the cloned file descriptors for /dev/crypto.

Crypto file descriptors were added in the original OCF import as a way
to provide per-open data (specifically the list of symmetric
sessions). However, this gives a bit of a confusing API where one has
to open /dev/crypto and then invoke an ioctl to obtain a second file
descriptor. This also does not match the API used with /dev/crypto on
other BSDs or with Linux's /dev/crypto driver.

Character devices have gained support for per-open data via cdevpriv
since OCF was imported, so use cdevpriv to simplify the userland API
by permitting ioctls directly on /dev/crypto descriptors.

To provide backwards compatibility, CRIOGET now opens another
/dev/crypto descriptor via kern_openat() rather than dup'ing the
existing file descriptor. This preserves prior semantics in case
CRIOGET is invoked multiple times on a single file descriptor.

Reviewed by: markj
Relnotes: yes
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D27302


# 85078b85 17-Nov-2020 Conrad Meyer <cem@FreeBSD.org>

Split out cwd/root/jail, cmask state from filedesc table

No functional change intended.

Tracking these structures separately for each proc enables future work to
correctly emulate clone(2) in linux(4).

__FreeBSD_version is bumped (to 1300130) for consumption by, e.g., lsof.

Reviewed by: kib
Discussed with: markj, mjg
Differential Revision: https://reviews.freebsd.org/D27037


# fea73412 24-Dec-2019 Conrad Meyer <cem@FreeBSD.org>

sleep(9), sleepqueue(9): const'ify wchan pointers

_sleep(9), wakeup(9), sleepqueue(9), et al do not dereference or modify the
channel pointers provided in any way; they are merely used as intptrs into a
dictionary structure to match waiters with wakers. Correctly annotate this
such that _sleep() and wakeup() may be used on const pointers without
invoking ugly patterns like __DECONST(). Plumb const through all of the
underlying sleepqueue bits.

No functional change.

Reviewed by: rlibby
Discussed with: kib, markj
Differential Revision: https://reviews.freebsd.org/D22914


# bc2d137a 22-May-2019 Konstantin Belousov <kib@FreeBSD.org>

Make pack_kinfo() available for external callers.

Reviewed by: jilles, tmunro
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D20258


# 6a855903 05-Mar-2019 Mark Johnston <markj@FreeBSD.org>

Show wiring state of map entries in procstat -v.

Note that only entries wired by userspace are shown as such. In
particular, entries transiently wired by sysctl_wire_old_buffer() are
not flagged as wired in procstat -v output.

Reviewed by: kib (previous version)
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D19461


# f1863400 03-Dec-2018 Konstantin Belousov <kib@FreeBSD.org>

Improve procstat reporting for the linux cdev file descriptors.

If there is a vnode attached to the linux file, use it to fill
kinfo_file. Otherwise, report a new KF_TYPE_DEV file type, without
supplying any type-specific information.

KF_TYPE_DEV is supposed to be used by most devfs-specific file types.

Sponsored by: Mellanox Technologies
MFC after: 1 week


# b65a9568 24-Sep-2018 John Baldwin <jhb@FreeBSD.org>

Restore the API of the kf_sa_local and kf_sa_peer members.

In 11.x and earlier these were accessible as direct members of 'struct
kinfo_file'. Existing code already knows about the new location of
these members as well, so wrapper macros did not work for these
fields. Instead, define an anonymous struct containing the fields
from 'struct kinfo_file' in FreeBSD 11 that were not part of the
'kf_un' union. This anonymous struct is then placed in an anonymous
union along with the new 'kf_un' union. This preserves the API of
both structure layouts without requiring any wrapper macros.

PR: 231525
Reviewed by: kib
Approved by: re (gjb)
Differential Revision: https://reviews.freebsd.org/D17262


# 51369649 20-Nov-2017 Pedro F. Giffuni <pfg@FreeBSD.org>

sys: further adoption of SPDX licensing ID tags.

Mainly focus on files that use BSD 3-Clause license.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.


# 95b97895 26-May-2017 Conrad Meyer <cem@FreeBSD.org>

procstat(1): Add TCP socket send/recv buffer size

Add TCP socket send and receive buffer size to procstat -f output.

Reviewed by: kib, markj
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D10689


# 69921123 23-May-2017 Konstantin Belousov <kib@FreeBSD.org>

Commit the 64-bit inode project.

Extend the ino_t, dev_t, nlink_t types to 64-bit ints. Modify
struct dirent layout to add d_off, increase the size of d_fileno
to 64-bits, increase the size of d_namlen to 16-bits, and change
the required alignment. Increase struct statfs f_mntfromname[] and
f_mntonname[] array length MNAMELEN to 1024.

ABI breakage is mitigated by providing compatibility using versioned
symbols, ingenious use of the existing padding in structures, and
by employing other tricks. Unfortunately, not everything can be
fixed, especially outside the base system. For instance, third-party
APIs which pass struct stat around are broken in backward and
forward incompatible ways.

Kinfo sysctl MIBs ABI is changed in backward-compatible way, but
there is no general mechanism to handle other sysctl MIBS which
return structures where the layout has changed. It was considered
that the breakage is either in the management interfaces, where we
usually allow ABI slip, or is not important.

Struct xvnode changed layout, no compat shims are provided.

For struct xtty, dev_t tty device member was reduced to uint32_t.
It was decided that keeping ABI compat in this case is more useful
than reporting 64-bit dev_t, for the sake of pstat.

Update note: strictly follow the instructions in UPDATING. Build
and install the new kernel with COMPAT_FREEBSD11 option enabled,
then reboot, and only then install new world.

Credits: The 64-bit inode project, also known as ino64, started life
many years ago as a project by Gleb Kurtsou (gleb). Kirk McKusick
(mckusick) then picked up and updated the patch, and acted as a
flag-waver. Feedback, suggestions, and discussions were carried
by Ed Maste (emaste), John Baldwin (jhb), Jilles Tjoelker (jilles),
and Rick Macklem (rmacklem). Kris Moore (kris) performed an initial
ports investigation followed by an exp-run by Antoine Brodin (antoine).
Essential and all-embracing testing was done by Peter Holm (pho).
The heavy lifting of coordinating all these efforts and bringing the
project to completion were done by Konstantin Belousov (kib).

Sponsored by: The FreeBSD Foundation (emaste, kib)
Differential revision: https://reviews.freebsd.org/D10439


# fbbd9655 28-Feb-2017 Warner Losh <imp@FreeBSD.org>

Renumber copyright clause 4

Renumber cluase 4 to 3, per what everybody else did when BSD granted
them permission to remove clause 3. My insistance on keeping the same
numbering for legal reasons is too pedantic, so give up on that point.

Submitted by: Jan Schaumann <jschauma@stevens.edu>
Pull Request: https://github.com/freebsd/freebsd/pull/96


# 3d32d4a7 07-Dec-2016 Eric van Gyzen <vangyzen@FreeBSD.org>

Export the whole thread name in kinfo_proc

kinfo_proc::ki_tdname is three characters shorter than
thread::td_name. Add a ki_moretdname field for these three
extra characters. Add the new field to kinfo_proc32, as well.
Update all in-tree consumers to read the new field and assemble
the full name, except for lldb's HostThreadFreeBSD.cpp, which
I will handle separately. Bump __FreeBSD_version.

Reviewed by: kib
MFC after: 1 week
Relnotes: yes
Sponsored by: Dell EMC
Differential Revision: https://reviews.freebsd.org/D8722


# 7f417bfa 03-May-2016 Pedro F. Giffuni <pfg@FreeBSD.org>

sys/sys: minor spelling fixes.

While the changes are minor, these headers are very visible.

MFC after: 2 weeks


# e6b95927 06-Oct-2015 Conrad Meyer <cem@FreeBSD.org>

Fix core corruption caused by race in note_procstat_vmmap

This fix is spiritually similar to r287442 and was discovered thanks to
the KASSERT added in that revision.

NT_PROCSTAT_VMMAP output length, when packing kinfo structs, is tied to
the length of filenames corresponding to vnodes in the process' vm map
via vn_fullpath. As vnodes may move during coredump, this is racy.

We do not remove the race, only prevent it from causing coredump
corruption.

- Add a sysctl, kern.coredump_pack_vmmapinfo, to allow users to disable
kinfo packing for PROCSTAT_VMMAP notes. This avoids VMMAP corruption
and truncation, even if names change, at the cost of up to PATH_MAX
bytes per mapped object. The new sysctl is documented in core.5.

- Fix note_procstat_vmmap to self-limit in the second pass. This
addresses corruption, at the cost of sometimes producing a truncated
result.

- Fix PROCSTAT_VMMAP consumers libutil (and libprocstat, via copy-paste)
to grok the new zero padding.

Reported by: pho (https://people.freebsd.org/~pho/stress/log/datamove4-2.txt)
Relnotes: yes
Sponsored by: EMC / Isilon Storage Division
Differential Revision: https://reviews.freebsd.org/D3824


# 14bdbaf2 03-Sep-2015 Conrad Meyer <cem@FreeBSD.org>

Detect badly behaved coredump note helpers

Coredump notes depend on being able to invoke dump routines twice; once
in a dry-run mode to get the size of the note, and another to actually
emit the note to the corefile.

When a note helper emits a different length section the second time
around than the length it requested the first time, the kernel produces
a corrupt coredump.

NT_PROCSTAT_FILES output length, when packing kinfo structs, is tied to
the length of filenames corresponding to vnodes in the process' fd table
via vn_fullpath. As vnodes may move around during dump, this is racy.

So:

- Detect badly behaved notes in putnote() and pad underfilled notes.

- Add a fail point, debug.fail_point.fill_kinfo_vnode__random_path to
exercise the NT_PROCSTAT_FILES corruption. It simply picks random
lengths to expand or truncate paths to in fo_fill_kinfo_vnode().

- Add a sysctl, kern.coredump_pack_fileinfo, to allow users to
disable kinfo packing for PROCSTAT_FILES notes. This should avoid
both FILES note corruption and truncation, even if filenames change,
at the cost of about 1 kiB in padding bloat per open fd. Document
the new sysctl in core.5.

- Fix note_procstat_files to self-limit in the 2nd pass. Since
sometimes this will result in a short write, pad up to our advertised
size. This addresses note corruption, at the risk of sometimes
truncating the last several fd info entries.

- Fix NT_PROCSTAT_FILES consumers libutil and libprocstat to grok the
zero padding.

With suggestions from: bjk, jhb, kib, wblock
Approved by: markj (mentor)
Relnotes: yes
Sponsored by: EMC / Isilon Storage Division
Differential Revision: https://reviews.freebsd.org/D3548


# ff87ae35 27-May-2015 John Baldwin <jhb@FreeBSD.org>

Export a list of VM objects in the system via a sysctl. The list can be
examined via 'vmstat -o'. It can be used to determine which files are
using physical pages of memory and how much each is using.

Differential Revision: https://reviews.freebsd.org/D2277
Reviewed by: alc, kib
MFC after: 2 weeks
Sponsored by: Norse Corp, Inc. (forward porting to HEAD/10)


# bfda9935 06-Nov-2014 Mateusz Guzik <mjg@FreeBSD.org>

Add sysctl kern.proc.cwd

It returns only current working directory of given process which saves a lot of
overhead over kern.proc.filedesc if given proc has a lot of open fds.

Submitted by: Tiwei Bie <btw mail.ustc.edu.cn> (slightly modified)
X-Additional: JuniorJobs project


# e77f9fed 18-Oct-2014 Adrian Chadd <adrian@FreeBSD.org>

Update the ULE scheduler + thread and kinfo structs to use int for cpuid
rather than u_char.

To try and play nice with the ABI, the u_char CPU ID values are clamped
at 254. The new fields now contain the full CPU ID, or -1 for no cpu.

Differential Revision: D955
Reviewed by: jhb, kib
Sponsored by: Norse Corp, Inc.


# 8b04bbef 28-Aug-2014 Mateusz Guzik <mjg@FreeBSD.org>

Return real parent pid in kinfo (used by e.g. ps)

Add a separate field which exports tracer pid and add a new keyword
("tracer") for ps to display it.

This is a follow up to r270444.

Reviewed by: kib
MFC after: 1 week
Relnotes: yes


# 6b6348aa 02-May-2014 Mateusz Guzik <mjg@FreeBSD.org>

Fix typo in KF_FD_TYPE_TRACE comment: ptrace -> ktrace


# 2db08c03 11-Feb-2014 John Baldwin <jhb@FreeBSD.org>

Expose OBJT_MGTDEVICE VM objects used for GEM/TTM with drm2 as an
explicit object type.

Reviewed by: kib
MFC after: 1 week


# fdaff241 04-Jan-2014 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Bring back the old size of the kinfo_file structure to preserve ABI.
Keep only one uint64_t spare for further cap_rights_t expension.

Add a comment clarifying that if the size of this structure changes,
a new sysctl MIB has to be allocate for it and the old structure has
to be returned by the old sysctl MIB.

Requested by: re
MFC after: 3 days


# 80c3af4e 26-Nov-2013 Konstantin Belousov <kib@FreeBSD.org>

Add an kinfo sysctl to retrieve signal trampoline location for the
given process.

Note that the correctness of the trampoline length returned for ABIs
which do not use shared page depends on the correctness of the struct
sysvec sv_szsigcodebase member, which will be fixed on as-need basis.

Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# 55648840 19-Sep-2013 John Baldwin <jhb@FreeBSD.org>

Extend the support for exempting processes from being killed when swap is
exhausted.
- Add a new protect(1) command that can be used to set or revoke protection
from arbitrary processes. Similar to ktrace it can apply a change to all
existing descendants of a process as well as future descendants.
- Add a new procctl(2) system call that provides a generic interface for
control operations on processes (as opposed to the debugger-specific
operations provided by ptrace(2)). procctl(2) uses a combination of
idtype_t and an id to identify the set of processes on which to operate
similar to wait6().
- Add a PROC_SPROTECT control operation to manage the protection status
of a set of processes. MADV_PROTECT still works for backwards
compatability.
- Add a p_flag2 to struct proc (and a corresponding ki_flag2 to kinfo_proc)
the first bit of which is used to track if P_PROTECT should be inherited
by new child processes.

Reviewed by: kib, jilles (earlier version)
Approved by: re (delphij)
MFC after: 1 month


# 7008be5b 04-Sep-2013 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Change the cap_rights_t type from uint64_t to a structure that we can extend
in the future in a backward compatible (API and ABI) way.

The cap_rights_t represents capability rights. We used to use one bit to
represent one right, but we are running out of spare bits. Currently the new
structure provides place for 114 rights (so 50 more than the previous
cap_rights_t), but it is possible to grow the structure to hold at least 285
rights, although we can make it even larger if 285 rights won't be enough.

The structure definition looks like this:

struct cap_rights {
uint64_t cr_rights[CAP_RIGHTS_VERSION + 2];
};

The initial CAP_RIGHTS_VERSION is 0.

The top two bits in the first element of the cr_rights[] array contain total
number of elements in the array - 2. This means if those two bits are equal to
0, we have 2 array elements.

The top two bits in all remaining array elements should be 0.
The next five bits in all array elements contain array index. Only one bit is
used and bit position in this five-bits range defines array index. This means
there can be at most five array elements in the future.

To define new right the CAPRIGHT() macro must be used. The macro takes two
arguments - an array index and a bit to set, eg.

#define CAP_PDKILL CAPRIGHT(1, 0x0000000000000800ULL)

We still support aliases that combine few rights, but the rights have to belong
to the same array element, eg:

#define CAP_LOOKUP CAPRIGHT(0, 0x0000000000000400ULL)
#define CAP_FCHMOD CAPRIGHT(0, 0x0000000000002000ULL)

#define CAP_FCHMODAT (CAP_FCHMOD | CAP_LOOKUP)

There is new API to manage the new cap_rights_t structure:

cap_rights_t *cap_rights_init(cap_rights_t *rights, ...);
void cap_rights_set(cap_rights_t *rights, ...);
void cap_rights_clear(cap_rights_t *rights, ...);
bool cap_rights_is_set(const cap_rights_t *rights, ...);

bool cap_rights_is_valid(const cap_rights_t *rights);
void cap_rights_merge(cap_rights_t *dst, const cap_rights_t *src);
void cap_rights_remove(cap_rights_t *dst, const cap_rights_t *src);
bool cap_rights_contains(const cap_rights_t *big, const cap_rights_t *little);

Capability rights to the cap_rights_init(), cap_rights_set(),
cap_rights_clear() and cap_rights_is_set() functions are provided by
separating them with commas, eg:

cap_rights_t rights;

cap_rights_init(&rights, CAP_READ, CAP_WRITE, CAP_FSTAT);

There is no need to terminate the list of rights, as those functions are
actually macros that take care of the termination, eg:

#define cap_rights_set(rights, ...) \
__cap_rights_set((rights), __VA_ARGS__, 0ULL)
void __cap_rights_set(cap_rights_t *rights, ...);

Thanks to using one bit as an array index we can assert in those functions that
there are no two rights belonging to different array elements provided
together. For example this is illegal and will be detected, because CAP_LOOKUP
belongs to element 0 and CAP_PDKILL to element 1:

cap_rights_init(&rights, CAP_LOOKUP | CAP_PDKILL);

Providing several rights that belongs to the same array's element this way is
correct, but is not advised. It should only be used for aliases definition.

This commit also breaks compatibility with some existing Capsicum system calls,
but I see no other way to do that. This should be fine as Capsicum is still
experimental and this change is not going to 9.x.

Sponsored by: The FreeBSD Foundation


# 5e9ccc87 26-Aug-2013 Will Andrews <will@FreeBSD.org>

Add the ability to display the default FIB number for a process to the
ps(1) utility, e.g. "ps -O fib".

bin/ps/keyword.c:
Add the "fib" keyword and default its column name to "FIB".

bin/ps/ps.1:
Add "fib" as a supported keyword.

sys/compat/freebsd32/freebsd32.h:
sys/kern/kern_proc.c:
sys/sys/user.h:
Add the default fib number for a process (p->p_fibnum)
to the user land accessible process data of struct kinfo_proc.

Submitted by: Oliver Fromme <olli@fromme.com>, gibbs


# 958aa575 03-May-2013 John Baldwin <jhb@FreeBSD.org>

Similar to 233760 and 236717, export some more useful info about the
kernel-based POSIX semaphore descriptors to userland via procstat(1) and
fstat(1):
- Change sem file descriptors to track the pathname they are associated
with and add a ksem_info() method to copy the path out to a
caller-supplied buffer.
- Use the fo_stat() method of shared memory objects and ksem_info() to
export the path, mode, and value of a semaphore via struct kinfo_file.
- Add a struct semstat to the libprocstat(3) interface along with a
procstat_get_sem_info() to export the mode and value of a semaphore.
- Teach fstat about semaphores and to display their path, mode, and value.

MFC after: 2 weeks


# fe52cf54 14-Apr-2013 Mikolaj Golub <trociny@FreeBSD.org>

Re-factor the code to provide kern_proc_filedesc_out(), kern_proc_out(),
and kern_proc_vmmap_out() functions to output process kinfo structures
to sbuf, to make the code reusable.

The functions are going to be used in the coredump routine to store
procstat info in the core program header notes.

Reviewed by: kib
MFC after: 3 weeks


# 2609222a 01-Mar-2013 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Merge Capsicum overhaul:

- Capability is no longer separate descriptor type. Now every descriptor
has set of its own capability rights.

- The cap_new(2) system call is left, but it is no longer documented and
should not be used in new code.

- The new syscall cap_rights_limit(2) should be used instead of
cap_new(2), which limits capability rights of the given descriptor
without creating a new one.

- The cap_getrights(2) syscall is renamed to cap_rights_get(2).

- If CAP_IOCTL capability right is present we can further reduce allowed
ioctls list with the new cap_ioctls_limit(2) syscall. List of allowed
ioctls can be retrived with cap_ioctls_get(2) syscall.

- If CAP_FCNTL capability right is present we can further reduce fcntls
that can be used with the new cap_fcntls_limit(2) syscall and retrive
them with cap_fcntls_get(2).

- To support ioctl and fcntl white-listing the filedesc structure was
heavly modified.

- The audit subsystem, kdump and procstat tools were updated to
recognize new syscalls.

- Capability rights were revised and eventhough I tried hard to provide
backward API and ABI compatibility there are some incompatible changes
that are described in detail below:

CAP_CREATE old behaviour:
- Allow for openat(2)+O_CREAT.
- Allow for linkat(2).
- Allow for symlinkat(2).
CAP_CREATE new behaviour:
- Allow for openat(2)+O_CREAT.

Added CAP_LINKAT:
- Allow for linkat(2). ABI: Reuses CAP_RMDIR bit.
- Allow to be target for renameat(2).

Added CAP_SYMLINKAT:
- Allow for symlinkat(2).

Removed CAP_DELETE. Old behaviour:
- Allow for unlinkat(2) when removing non-directory object.
- Allow to be source for renameat(2).

Removed CAP_RMDIR. Old behaviour:
- Allow for unlinkat(2) when removing directory.

Added CAP_RENAMEAT:
- Required for source directory for the renameat(2) syscall.

Added CAP_UNLINKAT (effectively it replaces CAP_DELETE and CAP_RMDIR):
- Allow for unlinkat(2) on any object.
- Required if target of renameat(2) exists and will be removed by this
call.

Removed CAP_MAPEXEC.

CAP_MMAP old behaviour:
- Allow for mmap(2) with any combination of PROT_NONE, PROT_READ and
PROT_WRITE.
CAP_MMAP new behaviour:
- Allow for mmap(2)+PROT_NONE.

Added CAP_MMAP_R:
- Allow for mmap(PROT_READ).
Added CAP_MMAP_W:
- Allow for mmap(PROT_WRITE).
Added CAP_MMAP_X:
- Allow for mmap(PROT_EXEC).
Added CAP_MMAP_RW:
- Allow for mmap(PROT_READ | PROT_WRITE).
Added CAP_MMAP_RX:
- Allow for mmap(PROT_READ | PROT_EXEC).
Added CAP_MMAP_WX:
- Allow for mmap(PROT_WRITE | PROT_EXEC).
Added CAP_MMAP_RWX:
- Allow for mmap(PROT_READ | PROT_WRITE | PROT_EXEC).

Renamed CAP_MKDIR to CAP_MKDIRAT.
Renamed CAP_MKFIFO to CAP_MKFIFOAT.
Renamed CAP_MKNODE to CAP_MKNODEAT.

CAP_READ old behaviour:
- Allow pread(2).
- Disallow read(2), readv(2) (if there is no CAP_SEEK).
CAP_READ new behaviour:
- Allow read(2), readv(2).
- Disallow pread(2) (CAP_SEEK was also required).

CAP_WRITE old behaviour:
- Allow pwrite(2).
- Disallow write(2), writev(2) (if there is no CAP_SEEK).
CAP_WRITE new behaviour:
- Allow write(2), writev(2).
- Disallow pwrite(2) (CAP_SEEK was also required).

Added convinient defines:

#define CAP_PREAD (CAP_SEEK | CAP_READ)
#define CAP_PWRITE (CAP_SEEK | CAP_WRITE)
#define CAP_MMAP_R (CAP_MMAP | CAP_SEEK | CAP_READ)
#define CAP_MMAP_W (CAP_MMAP | CAP_SEEK | CAP_WRITE)
#define CAP_MMAP_X (CAP_MMAP | CAP_SEEK | 0x0000000000000008ULL)
#define CAP_MMAP_RW (CAP_MMAP_R | CAP_MMAP_W)
#define CAP_MMAP_RX (CAP_MMAP_R | CAP_MMAP_X)
#define CAP_MMAP_WX (CAP_MMAP_W | CAP_MMAP_X)
#define CAP_MMAP_RWX (CAP_MMAP_R | CAP_MMAP_W | CAP_MMAP_X)
#define CAP_RECV CAP_READ
#define CAP_SEND CAP_WRITE

#define CAP_SOCK_CLIENT \
(CAP_CONNECT | CAP_GETPEERNAME | CAP_GETSOCKNAME | CAP_GETSOCKOPT | \
CAP_PEELOFF | CAP_RECV | CAP_SEND | CAP_SETSOCKOPT | CAP_SHUTDOWN)
#define CAP_SOCK_SERVER \
(CAP_ACCEPT | CAP_BIND | CAP_GETPEERNAME | CAP_GETSOCKNAME | \
CAP_GETSOCKOPT | CAP_LISTEN | CAP_PEELOFF | CAP_RECV | CAP_SEND | \
CAP_SETSOCKOPT | CAP_SHUTDOWN)

Added defines for backward API compatibility:

#define CAP_MAPEXEC CAP_MMAP_X
#define CAP_DELETE CAP_UNLINKAT
#define CAP_MKDIR CAP_MKDIRAT
#define CAP_RMDIR CAP_UNLINKAT
#define CAP_MKFIFO CAP_MKFIFOAT
#define CAP_MKNOD CAP_MKNODAT
#define CAP_SOCK_ALL (CAP_SOCK_CLIENT | CAP_SOCK_SERVER)

Sponsored by: The FreeBSD Foundation
Reviewed by: Christoph Mallon <christoph.mallon@gmx.de>
Many aspects discussed with: rwatson, benl, jonathan
ABI compatibility discussed with: kib


# bac7a76e 21-Aug-2012 David E. O'Brien <obrien@FreeBSD.org>

Missing one in r239505.


# b8d5e2e6 21-Aug-2012 David E. O'Brien <obrien@FreeBSD.org>

Restore the style of r195843 to that of pre-r194498
to reduce gratuitous diffs to older sources.


# 599fc82b 16-Jul-2012 Gabor Pali <pgj@FreeBSD.org>

- Add support for displaying process stack memory regions.

Approved by: rwatson
MFC after: 3 days


# 4d34e019 23-May-2012 Konstantin Belousov <kib@FreeBSD.org>

Calculate the count of per-process cow faults. Export the count to
userspace using the obscure spare int field in struct kinfo_proc.

Submitted by: Andrey Zonov <andrey zonov org>
MFC after: 1 week


# 5384d089 07-Nov-2011 Mikolaj Golub <trociny@FreeBSD.org>

Add KVME_FLAG_SUPER and use it in sysctl_kern_proc_vmmap for marking
entries with superpages.

Submitted by: Mel Flynn <mel.flynn+fbsd.hackers@mailing.thruhere.net>
Reviewed by: alc, rwatson


# cfb5f768 18-Aug-2011 Jonathan Anderson <jonathan@FreeBSD.org>

Add experimental support for process descriptors

A "process descriptor" file descriptor is used to manage processes
without using the PID namespace. This is required for Capsicum's
Capability Mode, where the PID namespace is unavailable.

New system calls pdfork(2) and pdkill(2) offer the functional equivalents
of fork(2) and kill(2). pdgetpid(2) allows querying the PID of the remote
process for debugging purposes. The currently-unimplemented pdwait(2) will,
in the future, allow querying rusage/exit status. In the interim, poll(2)
may be used to check (and wait for) process termination.

When a process is referenced by a process descriptor, it does not issue
SIGCHLD to the parent, making it suitable for use in libraries---a common
scenario when using library compartmentalisation from within large
applications (such as web browsers). Some observers may note a similarity
to Mach task ports; process descriptors provide a subset of this behaviour,
but in a UNIX style.

This feature is enabled by "options PROCDESC", but as with several other
Capsicum kernel features, is not enabled by default in GENERIC 9.0.

Reviewed by: jhb, kib
Approved by: re (kib), mentor (rwatson)
Sponsored by: Google Inc


# c30b9b51 20-Jul-2011 Jonathan Anderson <jonathan@FreeBSD.org>

Export capability information via sysctls.

When reporting on a capability, flag the fact that it is a capability,
but also unwrap to report all of the usual information about the
underlying file.

Approved by: re (kib), mentor (rwatson)
Sponsored by: Google Inc


# 925af544 18-Jul-2011 Bjoern A. Zeeb <bz@FreeBSD.org>

Rename ki_ocomm to ki_tdname and OCOMMLEN to TDNAMLEN.
Provide backward compatibility defines under BURN_BRIDGES.

Suggested by: jhb
Reviewed by: emaste
Sponsored by: Sandvine Incorporated
Approved by: re (kib)


# 0daf62d9 12-May-2011 Stanislav Sedov <stas@FreeBSD.org>

- Commit work from libprocstat project. These patches add support for runtime
file and processes information retrieval from the running kernel via sysctl
in the form of new library, libprocstat. The library also supports KVM backend
for analyzing memory crash dumps. Both procstat(1) and fstat(1) utilities have
been modified to take advantage of the library (as the bonus point the fstat(1)
utility no longer need superuser privileges to operate), and the procstat(1)
utility is now able to display information from memory dumps as well.

The newly introduced fuser(1) utility also uses this library and able to operate
via sysctl and kvm backends.

The library is by no means complete (e.g. KVM backend is missing vnode name
resolution routines, and there're no manpages for the library itself) so I
plan to improve it further. I'm commiting it so it will get wider exposure
and review.

We won't be able to MFC this work as it relies on changes in HEAD, which
was introduced some time ago, that break kernel ABI. OTOH we may be able
to merge the library with KVM backend if we really need it there.

Discussed with: rwatson


# 7123f4cd6 05-Mar-2011 Edward Tomasz Napierala <trasz@FreeBSD.org>

Export login class information via kinfo and make it possible to view
it using "ps -o class".


# 96fcc75f 01-Mar-2011 Robert Watson <rwatson@FreeBSD.org>

Add initial support for Capsicum's Capability Mode to the FreeBSD kernel,
compiled conditionally on options CAPABILITIES:

Add a new credential flag, CRED_FLAG_CAPMODE, which indicates that a
subject (typically a process) is in capability mode.

Add two new system calls, cap_enter(2) and cap_getmode(2), which allow
setting and querying (but never clearing) the flag.

Export the capability mode flag via process information sysctls.

Sponsored by: Google, Inc.
Reviewed by: anderson
Discussed with: benl, kris, pjd
Obtained from: Capsicum Project
MFC after: 3 months


# a7d5f7eb 19-Oct-2010 Jamie Gritton <jamie@FreeBSD.org>

A new jail(8) with a configuration file, to replace the work currently done
by /etc/rc.d/jail.


# 6239ef1d 07-Oct-2010 Ed Maste <emaste@FreeBSD.org>

Make a thread's address available via the kern proc sysctl, just like the
process address.

Add "tdaddr" keyword to ps(1) to display this thread address.

Distilled from Sandvine's patch set by Mark Johnston.


# 937912ea 27-May-2010 Attilio Rao <attilio@FreeBSD.org>

Add the support for reporting the NOCOREDUMP flag from
sysctl_kern_proc_vmmap().

Sponsored by: Sandvine Incorporated
Reviewed by: kib, emaste
MFC after: 1 week


# 19effccd 08-May-2010 Konstantin Belousov <kib@FreeBSD.org>

MFC r204051 (by imp):
n64 has a different size for KINFO_PROC_SIZE.

Approved by: imp

MFC r207152:
Move the constants specifying the size of struct kinfo_proc into
machine-specific header files. Add KINFO_PROC32_SIZE for struct
kinfo_proc32 for architectures providing COMPAT_FREEBSD32. Add
CTASSERT for the size of struct kinfo_proc32.

MFC r207269:
Style: use #define<TAB> instead of #define<SPACE>.


# ed780687 23-Apr-2010 Konstantin Belousov <kib@FreeBSD.org>

Move the constants specifying the size of struct kinfo_proc into
machine-specific header files. Add KINFO_PROC32_SIZE for struct
kinfo_proc32 for architectures providing COMPAT_FREEBSD32. Add
CTASSERT for the size of struct kinfo_proc32.

Submitted by: pluknet
Reviewed by: imp, jhb, nwhitehorn
MFC after: 2 weeks


# ae67f550 18-Feb-2010 Warner Losh <imp@FreeBSD.org>

n64 has a different size for KINFO_PROC_SIZE.


# 1b5768be 24-Jul-2009 Brooks Davis <brooks@FreeBSD.org>

Revert the changes to struct kinfo_proc in r194498. Instead, fill
in up to 16 (KI_NGROUPS) values and steal a bit from ki_cr_flags
(all bits currently unused) to indicate overflow with the new flag
KI_CRF_GRP_OVERFLOW.

This fixes procstat -s.

Approved by: re (kib)


# 01381811 24-Jul-2009 John Baldwin <jhb@FreeBSD.org>

Add a new type of VM object: OBJT_SG. An OBJT_SG object is very similar to
a device pager (OBJT_DEVICE) object in that it uses fictitious pages to
provide aliases to other memory addresses. The primary difference is that
it uses an sglist(9) to determine the physical addresses for a given offset
into the object instead of invoking the d_mmap() method in a device driver.

Reviewed by: alc
Approved by: re (kensmith)
MFC after: 2 weeks


# 9690fcee 10-Jul-2009 Warner Losh <imp@FreeBSD.org>

Properly size things for n64 builds.


# 838d9858 19-Jun-2009 Brooks Davis <brooks@FreeBSD.org>

Rework the credential code to support larger values of NGROUPS and
NGROUPS_MAX, eliminate ABI dependencies on them, and raise the to 1024
and 1023 respectively. (Previously they were equal, but under a close
reading of POSIX, NGROUPS_MAX was defined to be too large by 1 since it
is the number of supplemental groups, not total number of groups.)

The bulk of the change consists of converting the struct ucred member
cr_groups from a static array to a pointer. Do the equivalent in
kinfo_proc.

Introduce new interfaces crcopysafe() and crsetgroups() for duplicating
a process credential before modifying it and for setting group lists
respectively. Both interfaces take care for the details of allocating
groups array. crsetgroups() takes care of truncating the group list
to the current maximum (NGROUPS) if necessary. In the future,
crsetgroups() may be responsible for insuring invariants such as sorting
the supplemental groups to allow groupmember() to be implemented as a
binary search.

Because we can not change struct xucred without breaking application
ABIs, we leave it alone and introduce a new XU_NGROUPS value which is
always 16 and is to be used or NGRPS as appropriate for things such as
NFS which need to use no more than 16 groups. When feasible, truncate
the group list rather than generating an error.

Minor changes:
- Reduce the number of hand rolled versions of groupmember().
- Do not assign to both cr_gid and cr_groups[0].
- Modify ipfw to cache ucreds instead of part of their contents since
they are immutable once referenced by more than one entity.

Submitted by: Isilon Systems (initial implementation)
X-MFC after: never
PR: bin/113398 kern/133867


# 820c5181 01-Jun-2009 Robert Watson <rwatson@FreeBSD.org>

Add a flags field to struct ucred, and export that via kinfo_proc,
consuming one of its spare fields. The cr_flags field is currently
unused, but will be used for features, including capability mode and
pay-as-you-go audit.

Discussed with: jhb, sson


# a8f4518c 02-Dec-2008 Peter Wemm <peter@FreeBSD.org>

kf_offset was supposed to be signed.


# 43151ee6 01-Dec-2008 Peter Wemm <peter@FreeBSD.org>

Merge user/peter/kinfo branch as of r185547 into head.

This changes struct kinfo_filedesc and kinfo_vmentry such that they are
same on both 32 and 64 bit platforms like i386/amd64 and won't require
sysctl wrapping.

Two new OIDs are assigned. The old ones are available under
COMPAT_FREEBSD7 - but it isn't that simple. The superceded interface
was never actually released on 7.x.

The other main change is to pack the data passed to userland via the
sysctl. kf_structsize and kve_structsize are reduced for the copyout.
If you have a process with 100,000+ sockets open, the unpacked records
require a 132MB+ copyout. With packing, it is "only" ~35MB. (Still
seriously unpleasant, but not quite as devastating). A similar problem
exists for the vmentry structure - have lots and lots of shared libraries
and small mmaps and its copyout gets expensive too.

My immediate problem is valgrind. It traditionally achieves this
functionality by parsing procfs output, in a packed format. Secondly, when
tracing 32 bit binaries on amd64 under valgrind, it uses a cross compiled
32 bit binary which ran directly into the differing data structures in 32
vs 64 bit mode. (valgrind uses this to track file descriptor operations
and this therefore affected every single 32 bit binary)

I've added two utility functions to libutil to unpack the structures into
a fixed record length and to make it a little more convenient to use.


# 7a9c4d24 30-Oct-2008 Peter Wemm <peter@FreeBSD.org>

Add three extra to the kinfo_proc_vmmap data. kve_offset - the offset
within an object that a mapping refers to. fileid and fsid are inode/dev
for vnodes. (Linux procfs has these and valgrind is really unhappy
without them.) I believe I didn't change the size of the struct.


# d7f03759 19-Oct-2008 Ulf Lilleengen <lulf@FreeBSD.org>

- Import the HEAD csup code which is the basis for the cvsmode work.


# bc093719 20-Aug-2008 Ed Schouten <ed@FreeBSD.org>

Integrate the new MPSAFE TTY layer to the FreeBSD operating system.

The last half year I've been working on a replacement TTY layer for the
FreeBSD kernel. The new TTY layer was designed to improve the following:

- Improved driver model:

The old TTY layer has a driver model that is not abstract enough to
make it friendly to use. A good example is the output path, where the
device drivers directly access the output buffers. This means that an
in-kernel PPP implementation must always convert network buffers into
TTY buffers.

If a PPP implementation would be built on top of the new TTY layer
(still needs a hooks layer, though), it would allow the PPP
implementation to directly hand the data to the TTY driver.

- Improved hotplugging:

With the old TTY layer, it isn't entirely safe to destroy TTY's from
the system. This implementation has a two-step destructing design,
where the driver first abandons the TTY. After all threads have left
the TTY, the TTY layer calls a routine in the driver, which can be
used to free resources (unit numbers, etc).

The pts(4) driver also implements this feature, which means
posix_openpt() will now return PTY's that are created on the fly.

- Improved performance:

One of the major improvements is the per-TTY mutex, which is expected
to improve scalability when compared to the old Giant locking.
Another change is the unbuffered copying to userspace, which is both
used on TTY device nodes and PTY masters.

Upgrading should be quite straightforward. Unlike previous versions,
existing kernel configuration files do not need to be changed, except
when they reference device drivers that are listed in UPDATING.

Obtained from: //depot/projects/mpsafetty/...
Approved by: philip (ex-mentor)
Discussed: on the lists, at BSDCan, at the DevSummit
Sponsored by: Snow B.V., the Netherlands
dcons(4) fixed by: kan


# 6bc1e9cd 26-Jun-2008 John Baldwin <jhb@FreeBSD.org>

Rework the lifetime management of the kernel implementation of POSIX
semaphores. Specifically, semaphores are now represented as new file
descriptor type that is set to close on exec. This removes the need for
all of the manual process reference counting (and fork, exec, and exit
event handlers) as the normal file descriptor operations handle all of
that for us nicely. It is also suggested as one possible implementation
in the spec and at least one other OS (OS X) uses this approach.

Some bugs that were fixed as a result include:
- References to a named semaphore whose name is removed still work after
the sem_unlink() operation. Prior to this patch, if a semaphore's name
was removed, valid handles from sem_open() would get EINVAL errors from
sem_getvalue(), sem_post(), etc. This fixes that.
- Unnamed semaphores created with sem_init() were not cleaned up when a
process exited or exec'd. They were only cleaned up if the process
did an explicit sem_destroy(). This could result in a leak of semaphore
objects that could never be cleaned up.
- On the other hand, if another process guessed the id (kernel pointer to
'struct ksem' of an unnamed semaphore (created via sem_init)) and had
write access to the semaphore based on UID/GID checks, then that other
process could manipulate the semaphore via sem_destroy(), sem_post(),
sem_wait(), etc.
- As part of the permission check (UID/GID), the umask of the proces
creating the semaphore was not honored. Thus if your umask denied group
read/write access but the explicit mode in the sem_init() call allowed
it, the semaphore would be readable/writable by other users in the
same group, for example. This includes access via the previous bug.
- If the module refused to unload because there were active semaphores,
then it might have deregistered one or more of the semaphore system
calls before it noticed that there was a problem. I'm not sure if
this actually happened as the order that modules are discovered by the
kernel linker depends on how the actual .ko file is linked. One can
make the order deterministic by using a single module with a mod_event
handler that explicitly registers syscalls (and deregisters during
unload after any checks). This also fixes a race where even if the
sem_module unloaded first it would have destroyed locks that the
syscalls might be trying to access if they are still executing when
they are unloaded.

XXX: By the way, deregistering system calls doesn't do any blocking
to drain any threads from the calls.
- Some minor fixes to errno values on error. For example, sem_init()
isn't documented to return ENFILE or EMFILE if we run out of semaphores
the way that sem_open() can. Instead, it should return ENOSPC in that
case.

Other changes:
- Kernel semaphores now use a hash table to manage the namespace of
named semaphores nearly in a similar fashion to the POSIX shared memory
object file descriptors. Kernel semaphores can now also have names
longer than 14 chars (up to MAXPATHLEN) and can include subdirectories
in their pathname.
- The UID/GID permission checks for access to a named semaphore are now
done via vaccess() rather than a home-rolled set of checks.
- Now that kernel semaphores have an associated file object, the various
MAC checks for POSIX semaphores accept both a file credential and an
active credential. There is also a new posixsem_check_stat() since it
is possible to fstat() a semaphore file descriptor.
- A small set of regression tests (using the ksem API directly) is present
in src/tools/regression/posixsem.

Reported by: kris (1)
Tested by: kris
Reviewed by: rwatson (lightly)
MFC after: 1 month


# ac0ddd0b 29-Apr-2008 Oleksandr Tymoshenko <gonzo@FreeBSD.org>

Define KINFO_PROC_SIZE for mips.

Approved by: cognet (mentor)


# f2805949 08-Feb-2008 Joe Marcus Clarke <marcus@FreeBSD.org>

Add support for displaying a process' current working directory, root
directory, and jail directory within procstat. While this functionality
is available already in fstat, encapsulating it in the kern.proc.filedesc
sysctl makes it accessible without using kvm and thus without needing
elevated permissions.

The new procstat output looks like:

PID COMM FD T V FLAGS REF OFFSET PRO NAME
76792 tcsh cwd v d -------- - - - /usr/src
76792 tcsh root v d -------- - - - /
76792 tcsh 15 v c rw------ 16 9130 - -
76792 tcsh 16 v c rw------ 16 9130 - -
76792 tcsh 17 v c rw------ 16 9130 - -
76792 tcsh 18 v c rw------ 16 9130 - -
76792 tcsh 19 v c rw------ 16 9130 - -

I am also bumping __FreeBSD_version for this as this new feature will be
used in at least one port.

Reviewed by: rwatson
Approved by: rwatson


# 07dd4a31 20-Jan-2008 Robert Watson <rwatson@FreeBSD.org>

Export a type for POSIX SHM file descriptors via kern.proc.filedesc as
used by procstat, or SHM descriptors will show up as type unknown in
userspace.


# 1cc8c45c 02-Dec-2007 Robert Watson <rwatson@FreeBSD.org>

Add another new sysctl in support of the forthcoming procstat(1) to
support its -k argument:

kern.proc.kstack - dump the kernel stack of a process, if debugging
is permitted.

This sysctl is present if either "options DDB" or "options STACK" is
compiled into the kernel. Having support for tracing the kernel
stacks of processes from user space makes it much easier to debug
(or understand) specific wmesg's while avoiding the need to enter
DDB in order to determine the path by which a process came to be
blocked on a particular wait channel or lock.


# cc43c38c 02-Dec-2007 Robert Watson <rwatson@FreeBSD.org>

Add two new sysctls in support of the forthcoming procstat(1) to support
its -f and -v arguments:

kern.proc.filedesc - dump file descriptor information for a process, if
debugging is permitted, including socket addresses, open flags, file
offsets, file paths, etc.

kern.proc.vmmap - dump virtual memory mapping information for a process,
if debugging is permitted, including layout and information on
underlying objects, such as the type of object and path.

These provide a superset of the information historically available
through the now-deprecated procfs(4), and are intended to be exported
in an ABI-robust form.


# b61ce5b0 16-Sep-2007 Jeff Roberson <jeff@FreeBSD.org>

- Move all of the PS_ flags into either p_flag or td_flags.
- p_sflag was mostly protected by PROC_LOCK rather than the PROC_SLOCK or
previously the sched_lock. These bugs have existed for some time.
- Allow swapout to try each thread in a process individually and then
swapin the whole process if any of these fail. This allows us to move
most scheduler related swap flags into td_flags.
- Keep ki_sflag for backwards compat but change all in source tools to
use the new and more correct location of P_INMEM.

Reported by: pho
Reviewed by: attilio, kib
Approved by: re (kensmith)


# ef4d5877 14-May-2006 Olivier Houchard <cognet@FreeBSD.org>

Switch to a 64bit time_t, while it's not a big problem to do so.

Suggested by: imp


# 73dbd3da 11-May-2006 John Baldwin <jhb@FreeBSD.org>

Remove various bits of conditional Alpha code and fixup a few comments.


# 11f4763d 18-Jan-2006 Julian Elischer <julian@FreeBSD.org>

Return the thread name in the kinfo_proc structure.
Also correct the comment describing what the value is.


# a6180acb 09-Jun-2005 Garance A Drosehn <gad@FreeBSD.org>

Re-arrange some variables in kinfo_proc, and add more spare room. This
makes the amount of spare room consistent across architectures, and gets
rid of some cases where space was wasted-and-unusable because types have
different sizes & alignment on different hardware platforms. This change
causes more variables to move around on i386, and is much less disruptive
on other platforms. This has been tested on i386, ppc, and sparc, and has
at least been compiled on everything but arm.

Reviewed by: no objections from freebsd-arch, bde


# c78941e6 20-Mar-2005 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Add ki_jid field to the kinfo_proc structure and store jail ID there.

Reviewed by: gad
MFC after: 3 days


# 60727d8b 06-Jan-2005 Warner Losh <imp@FreeBSD.org>

/* -> /*- for license, minor formatting changes


# 1eecfae3 26-Nov-2004 David Schultz <das@FreeBSD.org>

Axe a.out core dump support. Neither older gdb binaries nor current
bfd sources understand the present format.


# 75b5bcba 19-Nov-2004 David Schultz <das@FreeBSD.org>

We don't do U area swapping anymore, so update some comments. Also,
prototype the new routine that creates a mock U area for a.out core
dumps.

Reviewed by: arch@


# 8a3f1adf 11-Jul-2004 Alfred Perlstein <alfred@FreeBSD.org>

Reserve a pointer "ki_udata" in kinfo_proc as a convenience for userland.


# b21c2d7b 21-Jun-2004 Garance A Drosehn <gad@FreeBSD.org>

Use the correct type (lwpid_t) for ki_tid .

Noticed by: julian
Approved by: julian, marcel


# 83e36b89 20-Jun-2004 Garance A Drosehn <gad@FreeBSD.org>

Change the architecture-based setting of KINFO_PROC_SIZE and KI_NSPARE so
that it is a series of alphabetically-ordered #fidef's, from Bruce Evans.
Define two new thread-related values in kproc_info, from Cyrille Lefevre.
Also remove a few values from kproc_info that were not needed, and change
around a few comments, from me. Changes are combined into a single commit
simply because it is a hassle to make sure that alignments and sizes are
not changed on any platform when modifying kproc_info.


# d688cb4c 19-Jun-2004 Garance A Drosehn <gad@FreeBSD.org>

Add some more fields to the 'struct kinfo_proc', including some fields
which just mark areas which are empty due to issues with the alignment
of already-existing fields. This defines several unrelated variables
in one shot, because most of the work for updating kinfo_proc is making
sure the sizeof(struct kinfo_proc) remains the same across all hardware
platforms, and that no space is wasted on any platform due to alignment
issues with the new variables.

Submitted by: some by Cyrille Lefevre, some by me


# f3732fd1 17-Jun-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Second half of the dev_t cleanup.

The big lines are:
NODEV -> NULL
NOUDEV -> NODEV
udev_t -> dev_t
udev2dev() -> findcdev()

Various minor adjustments including handling of userland access to kernel
space struct cdev etc.


# 818f833a 07-May-2004 Olivier Houchard <cognet@FreeBSD.org>

Define KINFO_PROC_SIZE for arm.


# 82c6e879 06-Apr-2004 Warner Losh <imp@FreeBSD.org>

Remove advertising clause from University of California Regent's license,
per letter dated July 22, 1999.

Approved by: core


# 53cd4099 21-Jan-2004 Peter Grehan <grehan@FreeBSD.org>

Make proc's kg_nice/ki_nice explicitly signed for PPC. This is a
no-op on {i386/alpha/ia64/sparc64} where chars are signed by
default. Should help ARM and S390 which also suffer from this.

Tested on: ppc, i386, objdump disasm before/after diffs
Reviewed by: obrien, bde (a while back)


# 90af4afa 13-May-2003 John Baldwin <jhb@FreeBSD.org>

- Merge struct procsig with struct sigacts.
- Move struct sigacts out of the u-area and malloc() it using the
M_SUBPROC malloc bucket.
- Add a small sigacts_*() API for managing sigacts structures: sigacts_alloc(),
sigacts_free(), sigacts_copy(), sigacts_share(), and sigacts_shared().
- Remove the p_sigignore, p_sigacts, and p_sigcatch macros.
- Add a mutex to struct sigacts that protects all the members of the struct.
- Add sigacts locking.
- Remove Giant from nosys(), kill(), killpg(), and kern_sigaction() now
that sigacts is locked.
- Several in-kernel functions such as psignal(), tdsignal(), trapsignal(),
and thread_stopped() are now MP safe.

Reviewed by: arch@
Approved by: re (rwatson)


# 7c2063f7 30-Apr-2003 Peter Wemm <peter@FreeBSD.org>

Use the 64 bit sized struct kinfo_proc for AMD64.


# af3f249f 14-Oct-2002 Peter Wemm <peter@FreeBSD.org>

The a.out md_coredump stuff isn't referenced anywhere anymore, and
hasn't been filled in for ages.. Nuked.


# 551cf4e1 02-Oct-2002 John Baldwin <jhb@FreeBSD.org>

Rename the mutex thread and process states to use a more generic 'LOCK'
name instead. (e.g., SLOCK instead of SMTX, TD_ON_LOCK() instead of
TD_ON_MUTEX()) Eventually a turnstile abstraction will be added that
will be shared with mutexes and other types of locks. SLOCK/TDI_LOCK will
be used internally by the turnstile code and will not be specific to
mutexes. Making the change now ensures that turnstiles can be dropped
in at a later date without affecting the ABI of userland applications.


# cb5e1f4f 02-May-2002 Marcel Moolenaar <marcel@FreeBSD.org>

Adjust KINFO_PROC_SIZE due to segsz_t being changed from a 32-bit to
a 64-bit integral.


# 902200a1 06-Apr-2002 Jake Burkholder <jake@FreeBSD.org>

Correct KINFO_PROC_SIZE for sparc64 due to segsz_t change.


# 789f12fe 19-Mar-2002 Alfred Perlstein <alfred@FreeBSD.org>

Remove __P


# 50c649d2 10-Mar-2002 Dima Dorfman <dd@FreeBSD.org>

Don't depend on ucred.h to include sys/queue.h for us.


# 9e3b3d75 11-Oct-2001 Peter Wemm <peter@FreeBSD.org>

Update comments regarding the transient nature of k_kproc and u_md
in struct user.


# 2af8d76d 13-Sep-2001 David E. O'Brien <obrien@FreeBSD.org>

Re-apply rev 1.178 -- style(9) the structure definitions.
I have to wonder how many other changes were lost in the KSE mildstone 2 merge.


# b40ce416 12-Sep-2001 Julian Elischer <julian@FreeBSD.org>

KSE Milestone 2
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to teh previousl -current except
that there is a thread associated with each process.

Sorry john! (your next MFC will be a doosie!)

Reviewed by: peter@freebsd.org, dillon@freebsd.org

X-MFC after: ha ha ha ha


# 77330eeb 16-Aug-2001 Peter Wemm <peter@FreeBSD.org>

Use the backwards compatability mechanisms so that ps/top etc dont have
unnecessary breakage.

While here, use explicit sizes for the string fields so that we dont
have unintentional changes again in the future when key tunables change.

This still is not quite right, but a june userland is happy with
a -current kernel with these tweaks.


# c37c2d03 09-Aug-2001 John Baldwin <jhb@FreeBSD.org>

Bump MAXCOMLEN from 16 to 19 to take advantage of 32-bit alignment.

Approved by: peter, jasone


# bcbde237 06-Aug-2001 John Baldwin <jhb@FreeBSD.org>

Get the order of the sys/_lock.h and sys/_mutex.h headers right.


# 9ceb1844 30-Jul-2001 Jake Burkholder <jake@FreeBSD.org>

Machine dependent ifdefs for sparc64.


# fb49f549 09-Jun-2001 Benno Rice <benno@FreeBSD.org>

Changes to sys/ includes to support PowerPC.

Reviewed by: obrien, dfr


# fb919e4d 01-May-2001 Mark Murray <markm@FreeBSD.org>

Undo part of the tangle of having sys/lock.h and sys/mutex.h included in
other "system" header files.

Also help the deprecation of lockmgr.h by making it a sub-include of
sys/lock.h and removing sys/lockmgr.h form kernel .c files.

Sort sys/*.h includes where possible in affected files.

OK'ed by: bde (with reservations)


# f34fa851 28-Mar-2001 John Baldwin <jhb@FreeBSD.org>

Catch up to header include changes:
- <sys/mutex.h> now requires <sys/systm.h>
- <sys/mutex.h> and <sys/sx.h> now require <sys/lock.h>


# 288c7d77 09-Mar-2001 Doug Rabson <dfr@FreeBSD.org>

Define KINFO_PROC_SIZE for ia64.


# 5af5ae6e 08-Mar-2001 Andrew Gallatin <gallatin@FreeBSD.org>

Take the KINFO_PROC_SIZE back down to 912 on alpha.

Since the compiler lays out the stuct so that pointers are naturally
(8-byte) aligned aligned, adding the int ki_layout didn't change the size of
the stuct; it just converted the alignment padding to a usable struct
field.


# 2b4b2c9b 06-Mar-2001 Kirk McKusick <mckusick@FreeBSD.org>

Apply i386 fix in 1.32 for the alpha too.


# 923e3066 06-Mar-2001 Poul-Henning Kamp <phk@FreeBSD.org>

Silence the sizeof warning from struct kinfo_proc


# cf92d2f0 12-Feb-2001 Poul-Henning Kamp <phk@FreeBSD.org>

Since we're in "everybody is hosed anyway" add an layout identifier
to struct kinfo_proc.

All userland/kernel shared structs should contain *both* a size and
a layout field.

I will add the code to use the field later.


# d5a08a60 11-Feb-2001 Jake Burkholder <jake@FreeBSD.org>

Implement a unified run queue and adjust priority levels accordingly.

- All processes go into the same array of queues, with different
scheduling classes using different portions of the array. This
allows user processes to have their priorities propogated up into
interrupt thread range if need be.
- I chose 64 run queues as an arbitrary number that is greater than
32. We used to have 4 separate arrays of 32 queues each, so this
may not be optimal. The new run queue code was written with this
in mind; changing the number of run queues only requires changing
constants in runq.h and adjusting the priority levels.
- The new run queue code takes the run queue as a parameter. This
is intended to be used to create per-cpu run queues. Implement
wrappers for compatibility with the old interface which pass in
the global run queue structure.
- Group the priority level, user priority, native priority (before
propogation) and the scheduling class into a struct priority.
- Change any hard coded priority levels that I found to use
symbolic constants (TTIPRI and TTOPRI).
- Remove the curpriority global variable and use that of curproc.
This was used to detect when a process' priority had lowered and
it should yield. We now effectively yield on every interrupt.
- Activate propogate_priority(). It should now have the desired
effect without needing to also propogate the scheduling class.
- Temporarily comment out the call to vm_page_zero_idle() in the
idle loop. It interfered with propogate_priority() because
the idle process needed to do a non-blocking acquire of Giant
and then other processes would try to propogate their priority
onto it. The idle process should not do anything except idle.
vm_page_zero_idle() will return in the form of an idle priority
kernel thread which is woken up at apprioriate times by the vm
system.
- Update struct kinfo_proc to the new priority interface. Deliberately
change its size by adjusting the spare fields. It remained the same
size, but the layout has changed, so userland processes that use it
would parse the data incorrectly. The size constraint should really
be changed to an arbitrary version number. Also add a debug.sizeof
sysctl node for struct kinfo_proc.


# 05687cca 09-Feb-2001 Peter Wemm <peter@FreeBSD.org>

Remove some leftovers. This is obviously unused, since the #defines
referred to members that no longer exist.


# 6d3e7b9b 23-Jan-2001 John Baldwin <jhb@FreeBSD.org>

Add a new item to kinfo_proc: ki_sflag to mirror p_sflag.


# 52f1d19f 14-Jan-2001 Matt Jacob <mjacob@FreeBSD.org>

The size of kinfo_proc on an alpha is 904 (not 640).


# 1f7d2501 12-Dec-2000 Kirk McKusick <mckusick@FreeBSD.org>

Change the proc information returned from the kernel so that it
no longer contains kernel specific data structures, but rather
only scalar values and structures that are already part of the
kernel/user interface, specifically rusage and rtprio. It no
longer contains proc, session, pcred, ucred, procsig, vmspace,
pstats, mtx, sigiolst, klist, callout, pasleep, or mdproc. If
any of these changed in size, ps, w, fstat, gcore, systat, and
top would all stop working. The new structure has over 200 bytes
of unassigned space for future values to be added, yet is nearly
100 bytes smaller per entry than the structure that it replaced.


# c345dd33 29-Nov-2000 John Baldwin <jhb@FreeBSD.org>

- Add a p_mtxname field to proc which points to the description of the
mutex a process is blocked on or NULL.
- Add a corresponding field e_mtxname to eproc.
- Fix a spelling nit in a comment.


# 664a31e4 28-Dec-1999 Peter Wemm <peter@FreeBSD.org>

Change #ifdef KERNEL to #ifdef _KERNEL in the public headers. "KERNEL"
is an application space macro and the applications are supposed to be free
to use it as they please (but cannot). This is consistant with the other
BSD's who made this change quite some time ago. More commits to come.


# e29e202c 16-Nov-1999 Peter Wemm <peter@FreeBSD.org>

Add e_stats (p->p_stats, from struct user->u_stats) to eproc so it's
fetchable via sysctl. This saves ps having to read the u-area for stats.
Be sure to recompile libkvm, ps, w, top and the usual suspects.


# c3aac50f 27-Aug-1999 Peter Wemm <peter@FreeBSD.org>

$Id$ -> $FreeBSD$


# f40ddd55 17-May-1999 Doug Rabson <dfr@FreeBSD.org>

Change the definition of e_tdev in struct kinfo_proc from dev_t to udev_t

Reviewed by: Poul-Henning Kamp <phk@critter.freebsd.dk>


# 88c5ea45 25-Jan-1999 Julian Elischer <julian@FreeBSD.org>

Enable Linux threads support by default.
This takes the conditionals out of the code that has been tested by
various people for a while.
ps and friends (libkvm) will need a recompile as some proc structure
changes are made.

Submitted by: "Richard Seaman, Jr." <dick@tar.com>


# d8c85307 12-Jan-1999 Julian Elischer <julian@FreeBSD.org>

Re-enable the options in ps(1) that were disabled with the Linux
threads support.

Submitted by: "Richard Seaman, Jr." <dick@tar.com>


# dc9c271a 07-Jan-1999 Julian Elischer <julian@FreeBSD.org>

Changes to the LINUX_THREADS support to only allocate extra memory for
shared signal handling when there is shared signal handling being
used.

This removes the main objection to making the shared signal handling
a standard ability in rfork() and friends and 'unconditionalising'
this code. (i.e. the allocation of an extra 328 bytes per process).

Signal handling information remains in the U area until such a time as
it's reference count would be incremented to > 1. At that point a new
struct is malloc'd and maintained in KVM so that it can be shared between
the processes (threads) using it.

A function to check the reference count and move the struct back to the U
area when it drops back to 1 is also supplied. Signal information is
therefore now swapable for all processes that are not sharing that
information with other processes. THis should addres the concerns raised
by Garrett and others.

Submitted by: "Richard Seaman, Jr." <dick@tar.com>


# 6626c604 18-Dec-1998 Julian Elischer <julian@FreeBSD.org>

Reviewed by: Luoqi Chen, Jordan Hubbard
Submitted by: "Richard Seaman, Jr." <lists@tar.com>
Obtained from: linux :-)

Code to allow Linux Threads to run under FreeBSD.

By default not enabled
This code is dependent on the conditional
COMPAT_LINUX_THREADS (suggested by Garret)
This is not yet a 'real' option but will be within some number of hours.


# adf93ebc 15-Jul-1998 Doug Rabson <dfr@FreeBSD.org>

Point at the right place for alpha registers.


# 08637435 28-Mar-1998 Bruce Evans <bde@FreeBSD.org>

Moved some #includes from <sys/param.h> nearer to where they are actually
used.


# e91f3cb0 03-Mar-1997 Andrey A. Chernov <ache@FreeBSD.org>

Use roundup(MAXLOGNAME, sizeof(long)) as e_login/s_login arrays size
instead of all hardcoded assumptions historically used
(i.e. sizeof(long) == 4)

Use MAXLOGNAME == 17 for stricter setlogin() size checking. Since
it rounds up to 20, all sizes remains the same


# 1366201a 03-Mar-1997 Andrey A. Chernov <ache@FreeBSD.org>

Bump MAXLOGNAME to 20 to really hold 16-bytes user names + NUL
and decrease spare array by one to keep the same size of eproc


# 6875d254 22-Feb-1997 Peter Wemm <peter@FreeBSD.org>

Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not
ready for it yet.


# 996c772f 09-Feb-1997 John Dyson <dyson@FreeBSD.org>

This is the kernel Lite/2 commit. There are some requisite userland
changes, so don't expect to be able to run the kernel as-is (very well)
without the appropriate Lite/2 userland changes.

The system boots and can mount UFS filesystems.

Untested: ext2fs, msdosfs, NFS
Known problems: Incorrect Berkeley ID strings in some files.
Mount_std mounts will not work until the getfsent
library routine is changed.

Reviewed by: various people
Submitted by: Jeffery Hsu <hsu@freebsd.org>


# 1130b656 14-Jan-1997 Jordan K. Hubbard <jkh@FreeBSD.org>

Make the long-awaited change from $Id$ to $FreeBSD$

This will make a number of things easier in the future, as well as (finally!)
avoiding the Id-smashing problem which has plagued developers for so long.

Boy, I'm glad we're not using sup anymore. This update would have been
insane otherwise.


# 5e714635 06-Dec-1996 Jordan K. Hubbard <jkh@FreeBSD.org>

Drop the e_spare[] array to 3 elements so that padding of the eproc
structure remains the same despite bumping MAXLOGNAME.


# f313170d 10-Sep-1996 Bruce Evans <bde@FreeBSD.org>

Updated #includes to 4.4Lite style.


# 33c2e07d 08-Dec-1995 Peter Wemm <peter@FreeBSD.org>

A really gross hack to make more of the source tree compile again.
#include <sys/user> used to be self contained, but now it needs either
half a dozen VM specific includes beforehand (yuck, so much for
portability), or some horrible hack like this for user-mode only
applications.. The kind of stuff that needs this is the libkvm stuff,
w, ps, etc... I would welcome a better fix for this BTW.. :-)
(note: this is #ifndef KERNEL, so it shouldn't be re-polluting the kernel
space after it's been so painfully cleaned up...)


# 7e247b34 29-Oct-1995 Poul-Henning Kamp <phk@FreeBSD.org>

Don't include things more than once :-)


# 3a34a5c3 28-Oct-1995 Poul-Henning Kamp <phk@FreeBSD.org>

Sorry, the last commit screwed up for me, this is the right one (I hope!)
Please refer to the previous commit message about sysctl variables.


# 9b2e5354 30-May-1995 Rodney W. Grimes <rgrimes@FreeBSD.org>

Remove trailing whitespace.


# af9da405 20-Aug-1994 Paul Richards <paul@FreeBSD.org>

Made them all idempotent.
Reviewed by:
Submitted by:


# 3c4dd356 02-Aug-1994 David Greenman <dg@FreeBSD.org>

Added $Id$


# df8bae1d 24-May-1994 Rodney W. Grimes <rgrimes@FreeBSD.org>

BSD 4.4 Lite Kernel Sources