History log of /freebsd-current/sys/kern/sys_procdesc.c
Revision Date Author Comments
# f006524d 19-Jan-2024 Konstantin Belousov <kib@FreeBSD.org>

kcmp(2): implement for procdesc

Reviewed by: brooks, markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D43518


# fdafd315 24-Nov-2023 Warner Losh <imp@FreeBSD.org>

sys: Automated cleanup of cdefs and other formatting

Apply the following automated changes to try to eliminate
no-longer-needed sys/cdefs.h includes as well as now-empty
blank lines in a row.

Remove /^#if.*\n#endif.*\n#include\s+<sys/cdefs.h>.*\n/
Remove /\n+#include\s+<sys/cdefs.h>.*\n+#if.*\n#endif.*\n+/
Remove /\n+#if.*\n#endif.*\n+/
Remove /^#if.*\n#endif.*\n/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/types.h>/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/param.h>/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/capsicum.h>/

Sponsored by: Netflix


# 5657e065 28-Aug-2023 Warner Losh <imp@FreeBSD.org>

procdesc: Remove stray references to pdwait4

We've never had pdwait4, and have no plans of adding pdwait4. Remove the
stray references here.

Sponsored by: Netflix


# 685dc743 16-Aug-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove $FreeBSD$: one-line .c pattern

Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/


# 4d846d26 10-May-2023 Warner Losh <imp@FreeBSD.org>

spdx: The BSD-2-Clause-FreeBSD identifier is obsolete, drop -FreeBSD

The SPDX folks have obsoleted the BSD-2-Clause-FreeBSD identifier. Catch
up to that fact and revert to their recommended match of BSD-2-Clause.

Discussed with: pfg
MFC After: 3 days
Sponsored by: Netflix


# 2b68eb8e 01-Oct-2021 Mateusz Guzik <mjg@FreeBSD.org>

vfs: remove thread argument from VOP_STAT

and fo_stat.


# 523d6673 08-Nov-2020 Mateusz Guzik <mjg@FreeBSD.org>

procdesc: convert the zone to a malloc type

The object is 128 bytes in size.


# 8e493611 25-Nov-2019 Mariusz Zaborski <oshogbo@FreeBSD.org>

procdesc: allow to collect status through wait(1) if process is traced

The debugger like truss(1) depends on the wait(2) syscall. This syscall
waits for ALL children. When it is waiting for ALL child's the children
created by process descriptors are not returned. This behavior was
introduced because we want to implement libraries which may pdfork(1).

The behavior of process descriptor brakes truss(1) because it will
not be able to collect the status of processes with process descriptors.

To address this problem the status is returned to parent when the
child is traced. While the process is traced the debugger is the new parent.
In case the original parent and debugger are the same process it means the
debugger explicitly used pdfork() to create the child. In that case the debugger
should be using kqueue()/pdwait() instead of wait().

Add test case to verify that. The test case was implemented by markj@.

Reviewed by: kib, markj
Discussed with: jhb
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D20362


# c878d1eb 05-Aug-2019 Mariusz Zaborski <oshogbo@FreeBSD.org>

procdesc: fix the function name

I changed name of the function r350429 and forgot to update
the r350612 patch.

Reported by: jenkins
MFC after: 1 month


# fd631bcd 05-Aug-2019 Mariusz Zaborski <oshogbo@FreeBSD.org>

procdesc: fix reparenting when the debugger is attached

The process is reparented to the debugger while it is attached.
B B
/ ----> |
A A D

Every time when the process is reparented, it is added to the orphan list
of the previous parent:

A->orphan = B
D->orphan = NULL

When the A process will close the process descriptor to the B process,
the B process will be reparented to the init process.
B B - init
| ---->
A D A D

A->orphan = B
D->orphan = B

In this scenario, the B process is in the orphan list of A and D.

When the last process descriptor is closed instead of reparenting
it to the reaper let it stay with the debugger process and set
our previews parent to the reaper.

Add test case for this situation.
Notice that without this patch the kernel will crash with this test case:
panic: orphan 0xfffff8000e990530 of 0xfffff8000e990000 has unexpected oppid 1

Reviewed by: markj, kib
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D20361


# 2c054ce9 16-Nov-2018 Mateusz Guzik <mjg@FreeBSD.org>

proc: always store parent pid in p_oppid

Doing so removes the dependency on proctree lock from sysctl process list
export which further reduces contention during poudriere -j 128 runs.

Reviewed by: kib (previous version)
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D17825


# ddab8c35 16-Oct-2018 Mark Johnston <markj@FreeBSD.org>

Reparent a child of pdfork(2) to its reaper when the procdesc is closed.

Unconditionally reparenting to PID 1 breaks the procctl(2) reaper
functionality.

Add a regression test for this case.

Reviewed by: kib
Approved by: re (gjb)
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D17589


# cbd92ce6 09-May-2018 Matt Macy <mmacy@FreeBSD.org>

Eliminate the overhead of gratuitous repeated reinitialization of cap_rights

- Add macros to allow preinitialization of cap_rights_t.

- Convert most commonly used code paths to use preinitialized cap_rights_t.
A 3.6% speedup in fstat was measured with this change.

Reported by: mjg
Reviewed by: oshogbo
Approved by: sbruno
MFC after: 1 month


# 862db53f 20-Feb-2018 Mateusz Guzik <mjg@FreeBSD.org>

Fix reaping on process fd close broken after r329449

The only consumer of proc_reap other than proc_to_reap was not updated
to not PROC_SLOCK.

Reported by: Juan Ramon Molina Menor <listjm club.fr>


# 8a36da99 27-Nov-2017 Pedro F. Giffuni <pfg@FreeBSD.org>

sys/kern: adoption of SPDX licensing ID tags.

Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.


# 584b675e 27-Jul-2016 Konstantin Belousov <kib@FreeBSD.org>

Hide the boottime and bootimebin globals, provide the getboottime(9)
and getboottimebin(9) KPI. Change consumers of boottime to use the
KPI. The variables were renamed to avoid shadowing issues with local
variables of the same name.

Issue is that boottime* should be adjusted from tc_windup(), which
requires them to be members of the timehands structure. As a
preparation, this commit only introduces the interface.

Some uses of boottime were found doubtful, e.g. NLM uses boottime to
identify the system boot instance. Arguably the identity should not
change on the leap second adjustment, but the commit is about the
timekeeping code and the consumers were kept bug-to-bug compatible.

Tested by: pho (as part of the bigger patch)
Reviewed by: jhb (same)
Discussed with: bde
Sponsored by: The FreeBSD Foundation
MFC after: 1 month
X-Differential revision: https://reviews.freebsd.org/D7302


# 5fa69ff0 10-Jul-2016 Robert Watson <rwatson@FreeBSD.org>

In process-descriptor close(2) and fstat(2), audit target process
information. pgkill(2) already audits target process ID.

MFC after: 3 days
Sponsored by: DARPA, AFRL


# b3a73448 07-Jun-2016 Mariusz Zaborski <oshogbo@FreeBSD.org>

Introduce the PD_CLOEXEC for pdfork(2).

Reviewed by: mjg


# b4490c6e 18-Jul-2015 Konstantin Belousov <kib@FreeBSD.org>

The si_status field of the siginfo_t, provided by the waitid(2) and
SIGCHLD signal, should keep full 32 bits of the status passed to the
_exit(2).

Split the combined p_xstat of the struct proc into the separate exit
status p_xexit for normal process exit, and signalled termination
information p_xsig. Kernel-visible macro KW_EXITCODE() reconstructs
old p_xstat from p_xexit and p_xsig. p_xexit contains complete status
and copied out into si_status.

Requested by: Joerg Schilling
Reviewed by: jilles (previous version), pho
Tested by: pho
Sponsored by: The FreeBSD Foundation


# 9696feeb 22-Sep-2014 John Baldwin <jhb@FreeBSD.org>

Add a new fo_fill_kinfo fileops method to add type-specific information to
struct kinfo_file.
- Move the various fill_*_info() methods out of kern_descrip.c and into the
various file type implementations.
- Rework the support for kinfo_ofile to generate a suitable kinfo_file object
for each file and then convert that to a kinfo_ofile structure rather than
keeping a second, different set of code that directly manipulates
type-specific file information.
- Remove the shm_path() and ksem_info() layering violations.

Differential Revision: https://reviews.freebsd.org/D775
Reviewed by: kib, glebius (earlier version)


# 2d69d0dc 12-Sep-2014 John Baldwin <jhb@FreeBSD.org>

Fix various issues with invalid file operations:
- Add invfo_rdwr() (for read and write), invfo_ioctl(), invfo_poll(),
and invfo_kqfilter() for use by file types that do not support the
respective operations. Home-grown versions of invfo_poll() were
universally broken (they returned an errno value, invfo_poll()
uses poll_no_poll() to return an appropriate event mask). Home-grown
ioctl routines also tended to return an incorrect errno (invfo_ioctl
returns ENOTTY).
- Use the invfo_*() functions instead of local versions for
unsupported file operations.
- Reorder fileops members to match the order in the structure definition
to make it easier to spot missing members.
- Add several missing methods to linuxfileops used by the OFED shim
layer: fo_write(), fo_truncate(), fo_kqfilter(), and fo_stat(). Most
of these used invfo_*(), but a dummy fo_stat() implementation was
added.


# 38219d6a 07-Apr-2014 Ed Schouten <ed@FreeBSD.org>

Implement kqueue(2) for procdesc(4).

kqueue(2) already supports EVFILT_PROC. Add an EVFILT_PROCDESC that
behaves the same, but operates on a procdesc(4) instead. Only implement
NOTE_EXIT for now. The nice thing about NOTE_EXIT is that it also
returns the exit status of the process, meaning that we can now obtain
this value, even if pdwait4(2) is still unimplemented.

Notes:

- Simply reuse EVFILT_NETDEV for EVFILT_PROCDESC. As both of these will
be used on totally different descriptor types, this should not clash.

- Let procdesc_kqops_event() reuse the same structure as filt_proc().
The only difference is that procdesc_kqops_event() should also be able
to deal with the case where the process was already terminated after
registration. Simply test this when hint == 0.

- Fix some style(9) issues in filt_proc() to keep it consistent with the
newly added procdesc_kqops_event().

- Save the exit status of the process in pd->pd_xstat, as we cannot pick
up the proctree_lock from within procdesc_kqops_event().

Discussed on: arch@
Reviewed by: kib@


# d7a39436 06-Apr-2014 Ed Schouten <ed@FreeBSD.org>

Fix a typo. The function name is pdfork; not pfork.


# a90feb39 06-Apr-2014 Ed Schouten <ed@FreeBSD.org>

Nit: fix locking of p->p_state in procdesc_close().

According to <sys/proc.h>, this field needs to be locked with either the
p_mtx or the p_slock. In this case the damage was quite small. Instead
of being reaped, the process would just be reparented to init, so it
could be reaped from there.


# 4a144410 16-Mar-2014 Robert Watson <rwatson@FreeBSD.org>

Update kernel inclusions of capability.h to use capsicum.h instead; some
further refinement is required as some device drivers intended to be
portable over FreeBSD versions rely on __FreeBSD_version to decide whether
to include capability.h.

MFC after: 3 weeks


# f2b525e6 30-Nov-2013 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Make process descriptors standard part of the kernel. rwhod(8) already
requires process descriptors to work and having PROCDESC in GENERIC
seems not enough, especially that we hope to have more and more consumers
in the base.

MFC after: 3 days


# 7008be5b 04-Sep-2013 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Change the cap_rights_t type from uint64_t to a structure that we can extend
in the future in a backward compatible (API and ABI) way.

The cap_rights_t represents capability rights. We used to use one bit to
represent one right, but we are running out of spare bits. Currently the new
structure provides place for 114 rights (so 50 more than the previous
cap_rights_t), but it is possible to grow the structure to hold at least 285
rights, although we can make it even larger if 285 rights won't be enough.

The structure definition looks like this:

struct cap_rights {
uint64_t cr_rights[CAP_RIGHTS_VERSION + 2];
};

The initial CAP_RIGHTS_VERSION is 0.

The top two bits in the first element of the cr_rights[] array contain total
number of elements in the array - 2. This means if those two bits are equal to
0, we have 2 array elements.

The top two bits in all remaining array elements should be 0.
The next five bits in all array elements contain array index. Only one bit is
used and bit position in this five-bits range defines array index. This means
there can be at most five array elements in the future.

To define new right the CAPRIGHT() macro must be used. The macro takes two
arguments - an array index and a bit to set, eg.

#define CAP_PDKILL CAPRIGHT(1, 0x0000000000000800ULL)

We still support aliases that combine few rights, but the rights have to belong
to the same array element, eg:

#define CAP_LOOKUP CAPRIGHT(0, 0x0000000000000400ULL)
#define CAP_FCHMOD CAPRIGHT(0, 0x0000000000002000ULL)

#define CAP_FCHMODAT (CAP_FCHMOD | CAP_LOOKUP)

There is new API to manage the new cap_rights_t structure:

cap_rights_t *cap_rights_init(cap_rights_t *rights, ...);
void cap_rights_set(cap_rights_t *rights, ...);
void cap_rights_clear(cap_rights_t *rights, ...);
bool cap_rights_is_set(const cap_rights_t *rights, ...);

bool cap_rights_is_valid(const cap_rights_t *rights);
void cap_rights_merge(cap_rights_t *dst, const cap_rights_t *src);
void cap_rights_remove(cap_rights_t *dst, const cap_rights_t *src);
bool cap_rights_contains(const cap_rights_t *big, const cap_rights_t *little);

Capability rights to the cap_rights_init(), cap_rights_set(),
cap_rights_clear() and cap_rights_is_set() functions are provided by
separating them with commas, eg:

cap_rights_t rights;

cap_rights_init(&rights, CAP_READ, CAP_WRITE, CAP_FSTAT);

There is no need to terminate the list of rights, as those functions are
actually macros that take care of the termination, eg:

#define cap_rights_set(rights, ...) \
__cap_rights_set((rights), __VA_ARGS__, 0ULL)
void __cap_rights_set(cap_rights_t *rights, ...);

Thanks to using one bit as an array index we can assert in those functions that
there are no two rights belonging to different array elements provided
together. For example this is illegal and will be detected, because CAP_LOOKUP
belongs to element 0 and CAP_PDKILL to element 1:

cap_rights_init(&rights, CAP_LOOKUP | CAP_PDKILL);

Providing several rights that belongs to the same array's element this way is
correct, but is not advised. It should only be used for aliases definition.

This commit also breaks compatibility with some existing Capsicum system calls,
but I see no other way to do that. This should be fine as Capsicum is still
experimental and this change is not going to 9.x.

Sponsored by: The FreeBSD Foundation


# b1dd38f4 16-Aug-2013 Konstantin Belousov <kib@FreeBSD.org>

Restore the previous sendfile(2) behaviour on the block devices.
Provide valid .fo_sendfile method for several missed struct fileops.

Reviewed by: glebius
Sponsored by: The FreeBSD Foundation


# f13b5a0f 12-Nov-2012 Konstantin Belousov <kib@FreeBSD.org>

Add the wait6(2) system call. It takes POSIX waitid()-like process
designator to select a process which is waited for. The system call
optionally returns siginfo_t which would be otherwise provided to
SIGCHLD handler, as well as extended structure accounting for child
and cumulative grandchild resource usage.

Allow to get the current rusage information for non-exited processes
as well, similar to Solaris.

The explicit WEXITED flag is required to wait for exited processes,
allowing for more fine-grained control of the events the waiter is
interested in.

Fix the handling of siginfo for WNOWAIT option for all wait*(2)
family, by not removing the queued signal state.

PR: standards/170346
Submitted by: "Jukka A. Ukkonen" <jau@iki.fi>
MFC after: 1 month


# 707641ec 01-Sep-2012 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Fix panic in procdesc that can be triggered in the following scenario:

1. Process A pdfork(2)s process B.
2. Process A passes process descriptor of B to unrelated process C.
3. Hit CTRL+C to terminate process A. Process B is also terminated
with SIGINT.
4. init(8) collects status of process B.
5. Process C closes process descriptor associated with process B.

When we have such order of events, init(8), by collecting status of
process B, will call procdesc_reap(). This function sets pd_proc to NULL.

Now when process C calls close on this process descriptor,
procdesc_close() is called. Unfortunately procdesc_close() assumes that
pd_proc points at a valid proc structure, but it was set to NULL earlier,
so the kernel panics.

The patch also adds setting 'p->p_procdesc' to NULL in procdesc_reap(),
which I think should be done.

MFC after: 1 week


# 53e16463 19-Jun-2012 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Check proper flag (PDF_DAEMON, not PD_DAEMON) when deciding if the process
should be killed or not.

This fixes killing pdfork(2)ed process on last close of the corresponding
process descriptor.

Reviewed by: rwatson
MFC after: 1 month


# 8451d0dd 16-Sep-2011 Kip Macy <kmacy@FreeBSD.org>

In order to maximize the re-usability of kernel code in user space this
patch modifies makesyscalls.sh to prefix all of the non-compatibility
calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel
entry points and all places in the code that use them. It also
fixes an additional name space collision between the kernel function
psignal and the libc function of the same name by renaming the kernel
psignal kern_psignal(). By introducing this change now we will ease future
MFCs that change syscalls.

Reviewed by: rwatson
Approved by: re (bz)


# cfb5f768 18-Aug-2011 Jonathan Anderson <jonathan@FreeBSD.org>

Add experimental support for process descriptors

A "process descriptor" file descriptor is used to manage processes
without using the PID namespace. This is required for Capsicum's
Capability Mode, where the PID namespace is unavailable.

New system calls pdfork(2) and pdkill(2) offer the functional equivalents
of fork(2) and kill(2). pdgetpid(2) allows querying the PID of the remote
process for debugging purposes. The currently-unimplemented pdwait(2) will,
in the future, allow querying rusage/exit status. In the interim, poll(2)
may be used to check (and wait for) process termination.

When a process is referenced by a process descriptor, it does not issue
SIGCHLD to the parent, making it suitable for use in libraries---a common
scenario when using library compartmentalisation from within large
applications (such as web browsers). Some observers may note a similarity
to Mach task ports; process descriptors provide a subset of this behaviour,
but in a UNIX style.

This feature is enabled by "options PROCDESC", but as with several other
Capsicum kernel features, is not enabled by default in GENERIC 9.0.

Reviewed by: jhb, kib
Approved by: re (kib), mentor (rwatson)
Sponsored by: Google Inc