History log of /freebsd-current/sys/kern/kern_sig.c
Revision Date Author Comments
# 53186bc1 19-Apr-2024 Konstantin Belousov <kib@FreeBSD.org>

sigqueue(2): add impl-specific flag __SIGQUEUE_TID

The flag allows the pid argument to designate a thread from the calling
process. The flag value is carved from the high bit of the signal
number, which slightly changes the ABI of syscall.

Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D44867


# 2effad53 19-Apr-2024 Konstantin Belousov <kib@FreeBSD.org>

kern_thr.c/kern_sig.c: remove sys/cdefs.h

Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D44867


# 6a4616a5 06-Apr-2024 Jake Freeland <jfree@FreeBSD.org>

ktrace: Record signal violations with KTR_CAPFAIL

Report the delivery of signals to processes other than self while
Capsicum violation tracing with CAPFAIL_SIGNAL.

Reviewed by: markj
Approved by: markj (mentor)
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D40679


# 05296a0f 06-Apr-2024 Jake Freeland <jfree@FreeBSD.org>

ktrace: Record syscall violations with KTR_CAPFAIL

Report syscalls that are not allowed in capability mode with
CAPFAIL_SYSCALL.

Reviewed by: markj
Approved by: markj (mentor)
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D40678


# ed410b78 28-Nov-2023 Konstantin Belousov <kib@FreeBSD.org>

EVFILT_SIGNAL: do not use target process pointer on detach

It is enough to know knlist to remove from it, and the list is
autodestroyed on last removal.

PR: 275286
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D42777


# 29363fb4 23-Nov-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove ancient SCCS tags.

Remove ancient SCCS tags from the tree, automated scripting, with two
minor fixup to keep things compiling. All the common forms in the tree
were removed with a perl script.

Sponsored by: Netflix


# 0a713948 22-Nov-2023 Alexander Motin <mav@FreeBSD.org>

Replace random sbuf_printf() with cheaper cat/putc.


# 5804a162 25-Sep-2023 Konstantin Belousov <kib@FreeBSD.org>

nosys(): add kern.signosys tunable/sysctl to control SIGSYS

Reviewed by: dchagin, markj
Discussed with: jhb
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D41976


# b82b4ae7 25-Sep-2023 Konstantin Belousov <kib@FreeBSD.org>

sysentvec: add SV_SIGSYS flag

to allow ABIs to indicate that SIGSYS is needed. Mark all native
FreeBSD ABIs with the flag.

This implicitly marks Linux' ABIs as not delivering SIGSYS on invalid
syscall.

Reviewed by: dchagin, markj
Discussed with: jhb
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D41976


# 39024a89 25-Sep-2023 Konstantin Belousov <kib@FreeBSD.org>

syscalls: fix missing SIGSYS for several ENOSYS errors

In particular, when the syscall number is too large, or when syscall is
dynamic. For that, add nosys_sysent structure to pass fake sysent to
syscall top code.

Reviewed by: dchagin, markj
Discussed with: jhb
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D41976


# 685dc743 16-Aug-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove $FreeBSD$: one-line .c pattern

Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/


# 1a6238d1 05-Aug-2023 Ed Maste <emaste@FreeBSD.org>

sigexit: add a break in default case

Suggested by: markj
Fixes: 6edbe5616c76 ("Provide some more information for...")
Sponsored by: The FreeBSD Foundation


# 6edbe561 27-Oct-2017 Ed Maste <emaste@FreeBSD.org>

Provide some more information for userland core dumps

Previously the log message indicated only "(core dumped)" if a core was
successfully created, or nothing if it was not. This provides
insufficient information to faciliate debugging. Dtrace is no help as
coredump() is static and we cannot find the return value via fbt.
Expand the log message to include error return value information.

Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D39942


# dfe17248 20-Jul-2023 Konstantin Belousov <kib@FreeBSD.org>

sigtd(): prefer non-stopped thread as a target for signal queue

This should improve signal delivery latency and better expose the
process state to the executing threads.

Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D41128


# aaa92413 20-Jul-2023 Konstantin Belousov <kib@FreeBSD.org>

Revert "killpg(): close a race with fork(), part 2"

This reverts commits 81a37995c757b4e3ad8a5c699864197fd1ebdcf5 and
565a343ae3a30bc2973182ff8dfd2fa37d7f615f.

There is still a leakage of the p_killpg_cnt, some but not all sources
of which were identified.

Second, and more important, is that there is a fundamental issue with
blocked signals having KSI_KILLPG flag set. Queueing of such signal
increments p_killpg_cnt, but it cannot be decremented until the signal
is delivered. If, for instance, a single-threaded process with blocked
signal receives killpg-kill and executes fork(2), the fork enter check
returns with ERESTART. And since signal is blocked, the condition
cannot be cleared.

Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D41128


# 437e1e37 24-Jul-2023 Brooks Davis <brooks@FreeBSD.org>

kern_sig.c: include sys/jail.h per style(9)

Fixes: e7228204343ac61d2316e74891e3606361a23f6e
Sponsored by: DARPA


# 17cb2ac3 13-Jul-2023 Dmitry Chagin <dchagin@FreeBSD.org>

signal: Get rid of gsignal() as it not used anywhere

Reviewed by: imp, kib
Differential Revision: https://reviews.freebsd.org/D41007
MFC after: 1 week


# 565a343a 09-Jul-2023 Konstantin Belousov <kib@FreeBSD.org>

sigqueue_delete_set_proc(): initialize sq_proc for worklist

This should fix leaks for the p_killpg_cnt counter, because
sigqueue_flush() drops ksi's.

Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# 81a37995 15-Jun-2023 Konstantin Belousov <kib@FreeBSD.org>

killpg(): close a race with fork(), part 2

When we are sending terminating signal to the group, killpg() needs to
guarantee that all group members are to be terminated (it does not need
to ensure that they are terminated on return from killpg()). The
pg_killsx change eliminates the largest window there, but still, if a
multithreaded process is signalled, the following could happen:
- thread 1 is selected for the signal delivery and gets descheduled
- thread 2 waits for pg_killsx lock, obtains it and forks
- thread 1 continue executing and terminates the process
This scenario allows the child to escape still.

To fix it, count the number of signals sent to the process with
killpg(2), in p_killpg_cnt variable, which is incremented in killpg()
and decremented after signal handler frame is created or in exit1()
after single-threading. This way we avoid forking if the termination is
due.

Noted and reviewed by: markj (previous version)
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D40493


# 3360b485 12-Jun-2023 Konstantin Belousov <kib@FreeBSD.org>

killpg(2): close a race with fork(2), part1

If the process group member performs fork(), the child could escape
signalling from killpg(). Prevent it by introducing an sx process group
lock pg_killsx which is taken interruptibly shared around fork. If there
is a pending signal, do the trip through userspace with ERESTART to
handle signal ASTs. The lock is taken exclusively during killpg().

The lock is also locked exclusive when the process changes group
membership, to avoid escaping a signal by this means, by ensuring that
the process group is stable during fork.

Note that the new lock is before proctree lock, so in some situations we
could only do trylocking to obtain it.

This relatively simple approach cannot work for REAP_KILL, because
process potentially belongs to more than one reaper tree by having
sub-reapers.

Reported by: dchagin
Tested by: dchagin, pho
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D40493


# 4b59d172 15-Jun-2023 Konstantin Belousov <kib@FreeBSD.org>

killpg1(): update the herald comment

Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D40493


# 0d3f1b4f 01-Jun-2023 Mark Johnston <markj@FreeBSD.org>

signal: Make the signal disposition table const

No functional change intended.

MFC after: 1 week


# 974be51b 22-Dec-2022 Konstantin Belousov <kib@FreeBSD.org>

Fixes for ptrace_syscallreq()

Re-assign the sc local (syscall number) before moving args for SYS_syscall.
Correct the audit and kdtrace hooks invocations.

Fixes: 140ceb5d956bb8795a77c23d3fd5ef047b0f3c68
Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# 140ceb5d 30-Nov-2022 Konstantin Belousov <kib@FreeBSD.org>

ptrace(2): add PT_SC_REMOTE remote syscall request

Reviewed by: markj
Discussed with: jhb
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D37590


# f0592b3c 30-Nov-2022 Konstantin Belousov <kib@FreeBSD.org>

Add a thread debugging flag TDB_BOUNDARY

It indicates to a debugger that the thread is stopped at the
kernel->user exit path.

Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D37590


# e6feeae2 30-Nov-2022 Konstantin Belousov <kib@FreeBSD.org>

sys: rename td_coredump to td_remotereq

and TDB_COREDUMPRQ to TDB_COREDUMPREQ

Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D37590


# 69413598 10-Mar-2022 Mateusz Guzik <mjg@FreeBSD.org>

signal: use proc_iterate to save on work

Most notably poudriere performs kill -9 -1 in jails for each port
being built. This reduces the scan from hundrends of processes to
literally 1.

Reviewed by: jamie, markj
Differential Revision: https://reviews.freebsd.org/D34522


# 0a4f2ac3 17-Aug-2022 Konstantin Belousov <kib@FreeBSD.org>

kern_sig.c: style

Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D36207


# cdb58f9d 17-Aug-2022 Konstantin Belousov <kib@FreeBSD.org>

ksiginfo_tryfree(): change return type to bool

The function result is already used as bool.

Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D36207


# cc29f221 17-Aug-2022 Konstantin Belousov <kib@FreeBSD.org>

ksiginfo_alloc(): change to directly take M_WAITOK/NOWAIT flags

Also style, and remove unneeded cast.

Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D36207


# c53fec76 02-Aug-2022 Konstantin Belousov <kib@FreeBSD.org>

sig_suspend_threads(): remove 'sending' arg

The TDA_AST flag is set on td2 unconditionally (as it was TDF_ASTPENDING
before AST rework), so it is not used practically for some time.

Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D36033


# f2fd7d8b 03-Aug-2022 Konstantin Belousov <kib@FreeBSD.org>

ast_sig(): add missed TDAI()

Mask checked was completely wrong

Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D36033


# 4fced864 22-Jul-2022 Konstantin Belousov <kib@FreeBSD.org>

sigfastblock_setpend() and fastblock_mask can be static now

Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D35888


# c6d31b83 18-Jul-2022 Konstantin Belousov <kib@FreeBSD.org>

AST: rework

Make most AST handlers dynamically registered. This allows to have
subsystem-specific handler source located in the subsystem files,
instead of making subr_trap.c aware of it. For instance, signal
delivery code on return to userspace is now moved to kern_sig.c.

Also, it allows to have some handlers designated as the cleanup (kclear)
type, which are called both at AST and on thread/process exit. For
instance, ast(), exit1(), and NFS server no longer need to be aware
about UFS softdep processing.

The dynamic registration also allows third-party modules to register AST
handlers if needed. There is one caveat with loadable modules: the
code does not make any effort to ensure that the module is not unloaded
before all threads processed through AST handler in it. In fact, this
is already present behavior for hwpmc.ko and ufs.ko. I do not think it
is worth the efforts and the runtime overhead to try to fix it.

Reviewed by: markj
Tested by: emaste (arm64), pho
Discussed with: jhb
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D35888


# 4493a13e 15-May-2022 Konstantin Belousov <kib@FreeBSD.org>

Do not single-thread itself when the process single-threaded some another process

Since both self single-threading and remote single-threading rely on
suspending the thread doing thread_single(), it cannot be mixed: thread
doing thread_suspend_switch() might be subject to thread_suspend_one()
and vice versa.

In collaboration with: pho
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D35310


# 02a2aacb 26-May-2022 Konstantin Belousov <kib@FreeBSD.org>

issignal(): ignore signals when process is single-threading for exit

Places that will wait for curproc->p_singlethr to become zero (in the
next commit, the counter of number of external single-threading is
to be introduced), must wait for it interruptible, otherwise we
deadlock. On the other hand, a signal delivered during this window,
if directed to the waiting thread, would cause the wait loop to become
a busy loop.

Since we are exiting, it is safe to ignore the signals.

Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D35310


# d3000939 04-May-2022 Konstantin Belousov <kib@FreeBSD.org>

P2_WEXIT: avoid thread_single() for exiting process earlier

before the process itself does thread_single(SINGLE_EXIT). We cannot
single-thread such process in ALLPROC (external) mode, and properly
detect and report the failure to do so due to the process becoming
zombie is easier to prevent than handle.

In collaboration with: pho
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D35310


# 4a700f3c 25-Apr-2022 Dmitry Chagin <dchagin@FreeBSD.org>

sigtimedwait: Prevent timeout math overflows.

Our kern_sigtimedwait() calculates absolute sleep timo value as 'uptime+timeout'.
So, when the user specifies a big timeout value (LONG_MAX), the calculated
timo can be less the the current uptime value.
In that case kern_sigtimedwait() returns EAGAIN instead of EINTR, if
unblocked signal was caught.

While here switch to a high-precision sleep method.

Reviewed by: mav, kib
In collaboration with: mav
Differential revision: https://reviews.freebsd.org/D34981
MFC after: 2 weeks


# c5c981d4 18-Apr-2022 Mateusz Guzik <mjg@FreeBSD.org>

signals: plug a set-but-not-used var

Sponsored by: Rubicon Communications, LLC ("Netgate")


# 863070bb 04-Mar-2022 Eric van Gyzen <vangyzen@FreeBSD.org>

ksiginfo_alloc: pass M_WAITOK or M_NOWAIT to uma_zalloc

It expects exactly one of those flags. A future commit will assert this.

Reviewed by: rstone
MFC after: 1 month
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D34451


# bb92cd7b 24-Mar-2022 Mateusz Guzik <mjg@FreeBSD.org>

vfs: NDFREE(&nd, NDF_ONLY_PNBUF) -> NDFREE_PNBUF(&nd)


# a24afbb4 08-Jan-2022 Konstantin Belousov <kib@FreeBSD.org>

Ignore debugger-injected signals left after detaching

PR: 261010
Reported by: Martin Simmons <martin@lispworks.com>
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D33787


# 753c8513 04-Jan-2022 John Baldwin <jhb@FreeBSD.org>

sigev_findtd: Fix whitespace nit in argument list.

Obtained from: CheriBSD


# fe27f1db 25-Dec-2021 Alexander Motin <mav@FreeBSD.org>

kern: Remove CTLFLAG_NEEDGIANT from some sysctls.

MFC after: 2 weeks


# 7e1d3eef 25-Nov-2021 Mateusz Guzik <mjg@FreeBSD.org>

vfs: remove the unused thread argument from NDINIT*

See b4a58fbf640409a1 ("vfs: remove cn_thread")

Bump __FreeBSD_version to 1400043.


# 3d277851 21-Oct-2021 Konstantin Belousov <kib@FreeBSD.org>

sig_ast_checksusp(): mark the local p as __diagused

It is only used to assert that the (current) process is locked

Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# 81f2e906 16-Oct-2021 Mark Johnston <markj@FreeBSD.org>

signal: Add SIG_FOREACH and refactor issignal()

Add a SIG_FOREACH macro that can be used to iterate over a signal set.
This is a bit cleaner and more efficient than calling sig_ffs() in a
loop. The implementation is based on BIT_FOREACH_ISSET(), except
that the bitset limbs are always 32 bits wide, and signal sets are
1-indexed rather than 0-indexed like bitset(9) sets.

issignal() cannot really be modified to use SIG_FOREACH() directly.
Take this opportunity to split the function into two explicit loops.
I've always found this function hard to read and think that this change
is an improvement.

Remove sig_ffs(), nothing uses it now.

Reviewed by: kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32473


# 1adebca1 14-Oct-2021 Konstantin Belousov <kib@FreeBSD.org>

Style

Sponsored by: The FreeBSD Foundation
MFC after: 3 days


# 244ab566 04-Oct-2021 Konstantin Belousov <kib@FreeBSD.org>

Add curproc_sigkilled()

Function returns an indicator that the process was killed with SIGKILL

Reviewed by: imp, markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D32313


# dc2d0899 04-Oct-2021 Konstantin Belousov <kib@FreeBSD.org>

kern_sig.c: Remove unused SIGPROP_CANTMASK

Reviewed by: imp, markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D32313


# 9b86d3e5 02-Oct-2021 Konstantin Belousov <kib@FreeBSD.org>

When queuing ignored signal, only abort target thread' sleep if it is inside sigwait()

Reported and tested by: trasz
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D32252


# f17eb93d 01-Oct-2021 Konstantin Belousov <kib@FreeBSD.org>

When sending ignored signal, arrange for zero return code from sleep

Otherwise consumers get unexpected EINTR errors without seeing
a properly discarded signal.

Reported and tested by: trasz
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D32252


# b599982b 02-Oct-2021 Konstantin Belousov <kib@FreeBSD.org>

Move td_pflags2 TDP2_SIGWAIT to td_flags TDF_SIGWAIT

The flag should be accessible from non-current threads.

Reviewed by: markj
Tested by: trasz
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D32252


# cf0ee873 12-Sep-2021 Konstantin Belousov <kib@FreeBSD.org>

Drop cloudabi

According to https://github.com/NuxiNL/cloudlibc:
CloudABI is no longer being maintained. It was an awesome experiment,
but it never got enough traction to be sustainable.

There is no reason to keep it in FreeBSD.

Approved by: ed (private mail)
Reviewed by: emaste
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D31923


# c4feb1ab 16-Aug-2021 Mark Johnston <markj@FreeBSD.org>

sigtimedwait: Use a unique wait channel for sleeping

When a sigtimedwait(2) caller goes to sleep, it uses a wait channel of
p->p_sigacts with the proc lock as the interlock. However, p_sigacts
can be shared between processes if a child is created with
rfork(RFSIGSHARE | RFPROC). Thus we can end up with two threads
sleeping on the same wait channel using different locks, which is not
permitted.

Fix the problem simply by using a process-unique wait channel, following
the example of sigsuspend. The actual wait channel value is irrelevant
here, sleeping threads are awoken using sleepq_abort().

Reported by: syzbot+8c417afabadb50bb8827@syzkaller.appspotmail.com
Reported by: syzbot+1d89fc2a9ef92ef64fa8@syzkaller.appspotmail.com
Reviewed by: kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31563


# bc387624 05-Jun-2021 Konstantin Belousov <kib@FreeBSD.org>

Add a knob to not drop signal with default ignored or ignored actions

Traditionally, BSD drops signals with the default action during send,
not even putting them to the destination process queue. This semantic
is not shared with other operating systems (Linux), which do queue
such signals. In particular, sigtimedwait(2) and related syscalls can
observe the delivery.

Add a global knob kern.sig_discard_ign which can be set to false to force
enqueuing of the signals with default action. Also add an ABI flag to
indicate that signals should be queued.

Note that it is not practical to run with the knob turned on, because almost
all software that care about the delivery of such signals, is aware of the
difference, and misbehaves if the signals are actually queued. The purpose
of the knob as is is to allow for easier diagnostic of the programs that
need the adjustments, to confirm the cause of problem.

Reported by: dchagin
Reviewed by: dchagin, markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D30675


# acced8b0 07-Jun-2021 Konstantin Belousov <kib@FreeBSD.org>

sigwait: add comment explaining EINTR/ERESTART details

Reviewed by: dchagin, markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D30675


# afb36e28 06-Jun-2021 Konstantin Belousov <kib@FreeBSD.org>

sigwait(2) and sigtimedwait(2) must not be restarted.

Reported by: dchagin
Reviewed by: dchagin, markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D30675


# 87a64872 23-Apr-2021 Konstantin Belousov <kib@FreeBSD.org>

Add ptrace(PT_COREDUMP)

It writes the core of live stopped process to the file descriptor
provided as an argument.

Based on the initial version from https://reviews.freebsd.org/D29691,
submitted by Michał Górny <mgorny@gentoo.org>.

Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D29955


# 68d311b6 24-Apr-2021 Konstantin Belousov <kib@FreeBSD.org>

ptracestop: mark threads suspended there with the new TDB_SSWITCH flag

This way threads in ptracestop can be discovered by debugger

Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D29955


# 2fd1ffef 05-Mar-2021 Konstantin Belousov <kib@FreeBSD.org>

Stop arming kqueue timers on knote owner suspend or terminate

This way, even if the process specified very tight reschedule
intervals, it should be stoppable/killable.

Reported and reviewed by: markj
Tested by: markj, pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D29106


# dc47fdf1 05-Mar-2021 Konstantin Belousov <kib@FreeBSD.org>

Stop arming periodic process timers on suspend or terminate

Reported and reviewed by: markj
Tested by: markj, pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D29106


# dbec10e0 12-Mar-2021 Jonathan T. Looney <jtl@FreeBSD.org>

Fetch the sigfastblock value in syscalls that wait for signals

We have seen several cases of processes which have become "stuck" in
kern_sigsuspend(). When this occurs, the kernel's td_sigblock_val
is set to 0x10 (one block outstanding) and the userspace copy of the
word is set to 0 (unblocked). Because the kernel's cached value
shows that signals are blocked, kern_sigsuspend() blocks almost all
signals, which means the process hangs indefinitely in sigsuspend().

It is not entirely clear what is causing this condition to occur.
However, it seems to make sense to add some protection against this
case by fetching the latest sigfastblock value from userspace for
syscalls which will sleep waiting for signals. Here, the change is
applied to kern_sigsuspend() and kern_sigtimedwait().

Reviewed by: kib
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D29225


# 513320c0 11-Jan-2021 Konstantin Belousov <kib@FreeBSD.org>

sigfastblock_setpend(): do not set PEND user flag unless TDP_SIGFASTPENDING is set.

User pending bit should not be set if kernel did not noted a pending signal.

Reviewed by: markj
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D28089


# 5844bd05 28-Dec-2020 Konstantin Belousov <kib@FreeBSD.org>

jobc: rework detection of orphaned groups.

Instead of trying to maintain pg_jobc counter on each process group
update (and sometimes before), just calculate the counter when needed.
Still, for the benefit of the signal delivery code, explicitly mark
orphaned groups as such with the new process group flag.

This way we prevent bugs in the corner cases where updates to the counter
were missed due to complicated configuration of p_pptr/p_opptr/real_parent
(debugger).

Since we need to iterate over all children of the process on exit, this
change mostly affects the process group entry and leave, where we need
to iterate all process group members to detect orpaned status.

(For MFC, keep pg_jobc around but unused).

Reported by: jhb
Reviewed by: jilles
Tested by: pho
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D27871


# e0d83cd3 30-Dec-2020 Konstantin Belousov <kib@FreeBSD.org>

issignal(): when handling STOP-like signals, drop sigacts mutex earlier.

Reviewed by: jilles
Tested by: pho
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D27871


# 993a1699 30-Dec-2020 Konstantin Belousov <kib@FreeBSD.org>

Style. Improve some KASSERTs messages.

Reviewed by: jilles
Tested by: pho
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D27871


# 203dda8a 08-Oct-2020 Konstantin Belousov <kib@FreeBSD.org>

sig_intr(9): return early if AST is not scheduled.

Check td_flags for relevant AST requests lock-less. This opens the
race slightly wider where sig_intr() returns false negative, but might
be it is worth it.

Requested by: mjg
Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# 0400be45 04-Oct-2020 Konstantin Belousov <kib@FreeBSD.org>

Add sig_intr(9).

It gives the answer would the thread sleep according to current state
of signals and suspensions. Of course the answer is racy and allows
for false-negatives (no sleep when signal is delivered after process
lock is dropped). Also the answer might change due to signal
rescheduling among threads in multi-threaded process.

Still it is the best approximation I can provide, to answering the
question was the thread interrupted.

Reviewed by: markj
Tested by: pho, rmacklem
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D26628


# 0c82fb26 04-Oct-2020 Konstantin Belousov <kib@FreeBSD.org>

Refactor sleepq_catch_signals().

- Extract suspension check into sig_ast_checksusp() helper.
- Extract signal check and calculation of the interruption errno into
sig_ast_needsigchk() helper.
The helpers are moved to kern_sig.c which is the proper place for
signal-related code.

Improve control flow in sleepq_catch_signals(), to handle ret == 0
(can sleep) and ret != 0 (interrupted) only once, by separating
checking code into sleepq_check_ast_sq_locked(), which return value is
interpreted at single location.

Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D26628


# 18f917a9 02-Sep-2020 Brooks Davis <brooks@FreeBSD.org>

Always report ENOSYS in init

While rare, encountering an unimplemented system call early in init is
catastrophic and difficult to debug. Even after a SIGSYS handler is
registered, such configurations are problematic. As such, always report
such events for pid 1 (following kern.lognosys if non-zero).

Reviewed by: kevans, imp
Obtained from: CheriBSD (plus suggestions from kevans)
MFC after: 1 week
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D26288


# 6fed89b1 01-Sep-2020 Mateusz Guzik <mjg@FreeBSD.org>

kern: clean up empty lines in .c and .h files


# feabaaf9 24-Aug-2020 Mateusz Guzik <mjg@FreeBSD.org>

cache: drop the always curthread argument from reverse lookup routines

Note VOP_VPTOCNP keeps getting it as temporary compatibility for zfs.

Tested by: pho


# 773e541e 20-Aug-2020 Warner Losh <imp@FreeBSD.org>

Use devctl.h instead of bus.h to reduce newbus pollution.

There's no need for these parts of the kernel to know about newbus,
so narrow what is included to devctl.h for device_notify_*.

Suggested by: kib@


# e298466e 29-May-2020 Andriy Gapon <avg@FreeBSD.org>

corefile_open_last: don't keep a locked vnode while locking other ones

Consider this scenario:
- kern.corefile=/var/coredumps/%N.%U.%I.core
- multiple processes with the same name crash at the same time

It's possible that one process selects existing file N as oldvp while it
keeps looking for an unused file number. Another process scans through
files and stumbles upon N. That process would be blocked on the vnode
lock while holding the directory vnode exclusively locked. The first
process would, thus, get blocked on the directory's vnode lock.

More generally, holding a file's vnode lock (oldvp) while trying to lock
its directory (for the next lookup) is a violation of the vnode locking
order.

I have observed this deadlock in the wild.

So, the change to keep oldvp "opened" but unlocked and to lock it again
only if it's to be returned as the result.
As kib noted, an alternative would be to keep the directory locked and
to use VOP_LOOKUP directly for scanning through existing core files.

Reviewed by: kib
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D25027


# fb3c434b 11-May-2020 Konstantin Belousov <kib@FreeBSD.org>

sigfastblock: fix delivery of the pending signals in single-threaded processes.

If single-threaded process receives a signal during critical section
established by sigfastblock(2) word, unblock did not caused signal
delivery because sigfastblock(SIGFASTBLOCK_UNBLOCK) failed to request
ast handling of the pending signals.

Set TDF_ASTPENDING | TDF_NEEDSIGCHK on unblock or when kernel forces
end of sigfastblock critical section, to cause syscall exit to recheck
and deliver any signal pending.

Reported by: corydoras@ridiculousfish.com
PR: 246385
Sponsored by: The FreeBSD Foundation


# 59838c1a 01-Apr-2020 John Baldwin <jhb@FreeBSD.org>

Retire procfs-based process debugging.

Modern debuggers and process tracers use ptrace() rather than procfs
for debugging. ptrace() has a supserset of functionality available
via procfs and new debugging features are only added to ptrace().
While the two debugging services share some fields in struct proc,
they each use dedicated fields and separate code. This results in
extra complexity to support a feature that hasn't been enabled in the
default install for several years.

PR: 244939 (exp-run)
Reviewed by: kib, mjg (earlier version)
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D23837


# 00ebd809 10-Mar-2020 Konstantin Belousov <kib@FreeBSD.org>

Fix signal delivery might be on sigfastblock clearing.

When clearing sigfastblock, either by sigfastblock(UNSETPTR) call or
implicitly on execve(2), kernel must check for pending signals and
reschedule them if needed.

E.g. on execve, all other threads are terminated, and current thread
fast block pointer is cleaned. If any signal was left pending, it can
now be delivered to the current thread, and we should prepare for
ast() on return to userspace to notice the signals.

Reported and tested by: pho
Sponsored by: The FreeBSD Foundation


# 0bc52b0b 10-Mar-2020 Konstantin Belousov <kib@FreeBSD.org>

Return reschedule_signals() to being static again.

It was used after sigfastblock_setpend() call in in ast() when current
thread fast-blocks signals. Add a flag to sigfastblock_setpend() to
request reschedule, and remove the direct use of the function from
subr_trap.c

Tested by: pho
Sponsored by: The FreeBSD Foundation


# 7029da5c 26-Feb-2020 Pawel Biernacki <kaktus@FreeBSD.org>

Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many)

r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.

This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.

Mark all obvious cases as MPSAFE. All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT

Approved by: kib (mentor, blanket)
Commented by: kib, gallatin, melifaro
Differential Revision: https://reviews.freebsd.org/D23718


# fe20aaec 22-Feb-2020 Ryan Libby <rlibby@FreeBSD.org>

sys/kern: quiet -Wwrite-strings

Quiet a variety of Wwrite-strings warnings in sys/kern at low-impact
sites. This patch avoids addressing certain others which would need to
plumb const through structure definitions.

Reviewed by: kib, markj
Differential Revision: https://reviews.freebsd.org/D23798


# a113b17f 20-Feb-2020 Konstantin Belousov <kib@FreeBSD.org>

Do not read sigfastblock word on syscall entry.

On machines with SMAP, fueword executes two serializing instructions
which can be seen in microbenchmarks.

As a measure to restore microbenchmark numbers, only read the word on
the attempt to deliver signal in ast(). If the word is set, signal is
not delivered and word is kept, preventing interruption of
interruptible sleeps by signals until userspace calls
sigfastblock(UNBLOCK) which clears the word.

This way, the spurious EINTR that userspace can see while in critical
section is on first interruptible sleep, if a signal is pending, and
on signal posting. It is believed that it is not important for rtld
and lbithr critical sections. It might be visible for the application
code e.g. for the callback of dl_iterate_phdr(3), but again the belief
is that the non-compliance is acceptable. Most important is that the
retry of the sleeping syscall does not interrupt unless additional
signal is posted.

For now I added the knob kern.sigfastblock_fetch_always to enable the
word read on syscall entry to be able to diagnose possible issues due
to spurious EINTR.

While there, do some code restructuting to have all sigfastblock()
handling located in kern_sig.c.

Reviewed by: jeff
Discussed with: mjg
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D23622


# 146fc63f 09-Feb-2020 Konstantin Belousov <kib@FreeBSD.org>

Add a way to manage thread signal mask using shared word, instead of syscall.

A new syscall sigfastblock(2) is added which registers a uint32_t
variable as containing the count of blocks for signal delivery. Its
content is read by kernel on each syscall entry and on AST processing,
non-zero count of blocks is interpreted same as the signal mask
blocking all signals.

The biggest downside of the feature that I see is that memory
corruption that affects the registered fast sigblock location, would
cause quite strange application misbehavior. For instance, the process
would be immune to ^C (but killable by SIGKILL).

With consumers (rtld and libthr added), benchmarks do not show a
slow-down of the syscalls in micro-measurements, and macro benchmarks
like buildworld do not demonstrate a difference. Part of the reason is
that buildworld time is dominated by compiler, and clang already links
to libthr. On the other hand, small utilities typically used by shell
scripts have the total number of syscalls cut by half.

The syscall is not exported from the stable libc version namespace on
purpose. It is intended to be used only by our C runtime
implementation internals.

Tested by: pho
Disscussed with: cem, emaste, jilles
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D12773


# 7739d927 01-Feb-2020 Mateusz Guzik <mjg@FreeBSD.org>

cache: replace kern___getcwd with vn_getcwd

The previous routine was resulting in extra data copies most notably in
linux_getcwd.


# 3ff65f71 30-Jan-2020 Mateusz Guzik <mjg@FreeBSD.org>

Remove duplicated empty lines from kern/*.c

No functional changes.


# b249ce48 03-Jan-2020 Mateusz Guzik <mjg@FreeBSD.org>

vfs: drop the mostly unused flags argument from VOP_UNLOCK

Filesystems which want to use it in limited capacity can employ the
VOP_UNLOCK_FLAGS macro.

Reviewed by: kib (previous version)
Differential Revision: https://reviews.freebsd.org/D21427


# 61a74c5c 15-Dec-2019 Jeff Roberson <jeff@FreeBSD.org>

schedlock 1/4

Eliminate recursion from most thread_lock consumers. Return from
sched_add() without the thread_lock held. This eliminates unnecessary
atomics and lock word loads as well as reducing the hold time for
scheduler locks. This will eventually allow for lockless remote adds.

Discussed with: kib
Reviewed by: jhb
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D22626


# 34ad5ac2 13-Dec-2019 Edward Tomasz Napierala <trasz@FreeBSD.org>

Add kern_kill() and use it in Linuxulator. It's just a cleanup,
no functional changes.

Reviewed by: kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D22645


# 0cc9fb75 07-Dec-2019 Konstantin Belousov <kib@FreeBSD.org>

Only return EPERM from kill(-pid) when no process was signalled.

As mandated by POSIX. Also clarify the kill(2) manpage.

While there, restructure the code in killpg1() to use helper which
keeps overall state of the process list iteration in the killpg1_ctx
structued, later used to infer the error returned.

Reported by: amdmi3
Reviewed by: jilles
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D22621


# ef401a85 27-Nov-2019 Konstantin Belousov <kib@FreeBSD.org>

Requested and tested by: kevans
Reviewed by: kevans (previous version), markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D22546


# 079c5b9e 25-Sep-2019 Kyle Evans <kevans@FreeBSD.org>

rfork(2): add RFSPAWN flag

When RFSPAWN is passed, rfork exhibits vfork(2) semantics but also resets
signal handlers in the child during creation to avoid a point of corruption
of parent state from the child.

This flag will be used by posix_spawn(3) to handle potential signal issues.

Reviewed by: jilles, kib
Differential Revision: https://reviews.freebsd.org/D19058


# 7e097daa 11-Aug-2019 Konstantin Belousov <kib@FreeBSD.org>

Only enable COMPAT_43 changes for syscalls ABI for a.out processes.

Reviewed by: imp, jhb
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D21200


# 91898857 29-Jul-2019 Mark Johnston <markj@FreeBSD.org>

Avoid relying on header pollution from sys/refcount.h.

MFC after: 3 days
Sponsored by: The FreeBSD Foundation


# d26d63a4 17-Jul-2019 Alan Somers <asomers@FreeBSD.org>

fusefs: multiple interruptility improvements

1) Don't explicitly not mask SIGKILL. kern_sigprocmask won't allow it to be
masked, anyway.

2) Fix an infinite loop bug. If a process received both a maskable signal
lower than 9 (like SIGINT) and then received SIGKILL,
fticket_wait_answer would spin. msleep would immediately return EINTR,
but cursig would return SIGINT, so the sleep would get retried. Fix it
by explicitly checking whether SIGKILL has been received.

3) Abandon the sig_isfatal optimization introduced by r346357. That
optimization would cause fticket_wait_answer to return immediately,
without waiting for a response from the server, if the process were going
to exit anyway. However, it's vulnerable to a race:

1) fatal signal is received while fticket_wait_answer is sleeping.
2) fticket_wait_answer sends the FUSE_INTERRUPT operation.
3) fticket_wait_answer determines that the signal was fatal and returns
without waiting for a response.
4) Another thread changes the signal to non-fatal.
5) The first thread returns to userspace. Instead of exiting, the
process continues.
6) The application receives EINTR, wrongly believes that the operation
was successfully interrupted, and restarts it. This could cause
problems for non-idempotent operations like FUSE_RENAME.

Reported by: kib (the race part)
Sponsored by: The FreeBSD Foundation


# 9d3ecb7e 16-Jul-2019 Eric van Gyzen <vangyzen@FreeBSD.org>

Adds signal number format to kern.corefile

Add format capability to core file names to include signal
that generated the core. This can help various validation workflows
where all cores should not be considered equally (SIGQUIT is often
intentional and not an error unlike SIGSEGV or SIGBUS)

Submitted by: David Leimbach (leimy2k@gmail.com)
Reviewed by: markj
MFC after: 1 week
Relnotes: sysctl kern.corefile can now include the signal number
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D20970


# 89f2ab06 23-Jun-2019 Konstantin Belousov <kib@FreeBSD.org>

Switch to check for effective user id in r349320, and disable dumping
into existing files for sugid processes.

Despite using real user id pronounces the intent, it actually breaks
suid coredumps, while not making any difference for non-sugid
processes. The reason for the breakage is that non-existent core file
is created with the effective uid (unless weird hacks like SUIDDIR are
configured).

Then, if user enabled kern.sugid_coredump, core dumping should not
overwrite core files owned by effective uid, but we cannot pretend to
use real uid for dumping.

PR: 68905
admbugs: 358
Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# 7a29e0bf 23-Jun-2019 Konstantin Belousov <kib@FreeBSD.org>

coredump: avoid writing to core files not owned by the real user.

Reported by: blake frantz <trew@hick.org>
PR: 68905
admbugs: 358
Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# ab74c843 29-May-2019 Konstantin Belousov <kib@FreeBSD.org>

Do not go into sleep in sleepq_catch_signals() when SIGSTOP from
PT_ATTACH was consumed.

In particular, do not clear TDP_FSTP in ptracestop() if td_wchan is
non-NULL. Leave it to sleepq_catch_signal() to clear and convert zero
return code to EINTR.

Otherwise, per submitter report, if the PT_ATTACH SIGSTOP was
delivered right after the thread was added to the sleepqueue but not
yet really sleep, and cursig() caused debugger attach, the thread
sleeps instead of returning to the userspace boundary with EINTR.

PR: 231445
Reported by: Efi Weiss <valmarelox@gmail.com>
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D20381


# 83bf5ec3 25-Apr-2019 John Baldwin <jhb@FreeBSD.org>

Remove p_code from struct proc.

Contrary to the comments, it was never used by core dumps or
debuggers. Instead, it used to hold the signal code of a pending
signal, but that was replaced by the 'ksi_code' member of ksiginfo_t
when signal information was reworked in 7.0.

Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D20047


# ebbfe00e 24-Apr-2019 Alan Somers <asomers@FreeBSD.org>

fusefs: interruptibility improvements suggested by kib

* Block stop signals in fticket_wait_answer
* Hold ps_mtx while checking signal disposition
* style(9) changes

PR: 346357
Reported by: kib
Sponsored by: The FreeBSD Foundation


# 559966d6 21-Apr-2019 Alan Somers <asomers@FreeBSD.org>

fusefs: commit missing files from r346387

PR: 346357
Sponsored by: The FreeBSD Foundation


# de2d0d29 04-Jan-2019 Kristof Provost <kp@FreeBSD.org>

Remove unneeded NULL check for td_ucred

td_ucred is always set, so we don't need the ternary expression to check for
it.


# 4ae4822d 29-Dec-2018 Kristof Provost <kp@FreeBSD.org>

Simplify jail ID printing on process exit

As suggested by kib@, we don't need to check p_ucred, because that's only NULL
during process creation, and cr_prison is never NULL.


# af8becca 29-Dec-2018 Kristof Provost <kp@FreeBSD.org>

Make kernel print jail ID when logging a process exit

Kernel now includes jail ID when logging a process exit. jid is 0 for unjailed
processes.

Submitted by: Marie Helene Kvello-Aune <freebsd@mhka.no>
Relnotes: yes
Sponsored by: Modirum MDPay
Differential Revision: https://reviews.freebsd.org/D18618


# c5786670 10-Dec-2018 John Baldwin <jhb@FreeBSD.org>

Don't report stale signal information for non-signal events in ptrace_lwpinfo.

Once a signal's siginfo was copied to 'td_si' as part of the signal
exchange in issignal(), it was never cleared. This caused future
thread events that are reported as SIGTRAP events without signal
information to report the stale siginfo in 'td_si'. For example, if a
debugger created a new process and used SIGSTOP to stop it after
PT_ATTACH, future system call entry / exit events would set PL_FLAG_SI
with the SIGSTOP siginfo in pl_siginfo. This broke 'catch syscall' in
current versions of gdb as it assumed PL_FLAG_SI with SIGTRAP
indicates a breakpoint or single step trap.

Reviewed by: kib
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D18487


# affd9185 27-Nov-2018 Konstantin Belousov <kib@FreeBSD.org>

Improve sigonstack().

Avoid relying on unsigned overflow for the test.
Simplify expressions to avoid duplicate check for the range.
Style.
Add herald comment.

Reviewed by: jhb
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D18361


# a70e9a13 04-Aug-2018 Konstantin Belousov <kib@FreeBSD.org>

Swap in WKILLED processes.

Swapped-out process that is WKILLED must be swapped in as soon as
possible. The reason is that such process can be killed by OOM and
its pages can be only freed if the process exits. To exit, the kernel
stack of the process must be mapped.

When allocating pages for the stack of the WKILLED process on swap in,
use VM_ALLOC_SYSTEM requests to increase the chance of the allocation
to succeed.

Add counter of the swapped out processes to avoid unneeded iteration
over the allprocs list when there is no work to do, reducing the
allproc_lock ownership.

Reviewed by: alc, markj (previous version)
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D16489


# 6040822c 30-Jul-2018 Alan Somers <asomers@FreeBSD.org>

Make timespecadd(3) and friends public

The timespecadd(3) family of macros were imported from NetBSD back in
r35029. However, they were initially guarded by #ifdef _KERNEL. In the
meantime, we have grown at least 28 syscalls that use timespecs in some
way, leading many programs both inside and outside of the base system to
redefine those macros. It's better just to make the definitions public.

Our kernel currently defines two-argument versions of timespecadd and
timespecsub. NetBSD, OpenBSD, and FreeDesktop.org's libbsd, however, define
three-argument versions. Solaris also defines a three-argument version, but
only in its kernel. This revision changes our definition to match the
common three-argument version.

Bump _FreeBSD_version due to the breaking KPI change.

Discussed with: cem, jilles, ian, bde
Differential Revision: https://reviews.freebsd.org/D14725


# f1fe1e02 15-Jul-2018 Mariusz Zaborski <oshogbo@FreeBSD.org>

Extend amount of possible coredumps from 10 to 100000 when using index format.
The amount of digits in the name of corefile is assigned dynamically.

Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D16118


# 6cad1a5d 04-Jul-2018 Mariusz Zaborski <oshogbo@FreeBSD.org>

Add description to debug.ncores sysctl.

Reviewed by: bcr
Differential Revision: https://reviews.freebsd.org/D16123


# 0dea6e3c 01-Jul-2018 Mariusz Zaborski <oshogbo@FreeBSD.org>

core(5): overwrite the oldest core dump

The '%I' format in the kern.corefile sysctl limits the number of
core files that a process can generate to the number stored in the
debug.ncores sysctl. The '%I' format is replaced by the single digit
index. Previously, if all indexes were taken the kernel would overwrite
only a core file with the highest index in a filename.
Currently the system will create a new core file if there is a free
index or if all slots are taken it will overwrite the oldest one.

Reviewed by: kib(code), bcr (updating)
Differential Revision: https://reviews.freebsd.org/D15991
Differential Revision: https://reviews.freebsd.org/D16084


# 349fcda4 26-Jun-2018 Warner Losh <imp@FreeBSD.org>

Fix devctl generation for core files.

We have a problem with vn_fullpath_global when the file exists. Work
around it by printing the full path if the core file name starts with /,
or current working directory followed by the filename if not.

Sponsored by: Netflix
Differential Review: https://reviews.freebsd.org/D16026


# 6e22bbf6 21-Jun-2018 Konstantin Belousov <kib@FreeBSD.org>

fork: avoid endless wait with PTRACE_FORK and RFSTOPPED.

An RFSTOPPED thread can't clean TDB_STOPATFORK, which is done in the
fork_return() in its context, so parent is stuck forever. Triggered
when trying to ptrace linux process. Instead of waiting for the new
thread to clear TDB_STOPATFORK, tag it as traced and reparent to the
debugger in do_fork(), and let it only notify the debugger when run.

Submitted by: Yanko Yankulov <yanko.yankulov@gmail.com>
Reviewed by: jhb
MFC after: 1 week
X-MFC-Note: keep p_dbgwait placeholder intact
Differential revision: https://reviews.freebsd.org/D15857


# ddd4d15e 18-May-2018 Matt Macy <mmacy@FreeBSD.org>

signotify: don't create a stack local that isn't used on non-debug builds


# cbd92ce6 09-May-2018 Matt Macy <mmacy@FreeBSD.org>

Eliminate the overhead of gratuitous repeated reinitialization of cap_rights

- Add macros to allow preinitialization of cap_rights_t.

- Convert most commonly used code paths to use preinitialized cap_rights_t.
A 3.6% speedup in fstat was measured with this change.

Reported by: mjg
Reviewed by: oshogbo
Approved by: sbruno
MFC after: 1 month


# 6469bdcd 06-Apr-2018 Brooks Davis <brooks@FreeBSD.org>

Move most of the contents of opt_compat.h to opt_global.h.

opt_compat.h is mentioned in nearly 180 files. In-progress network
driver compabibility improvements may add over 100 more so this is
closer to "just about everywhere" than "only some files" per the
guidance in sys/conf/options.

Keep COMPAT_LINUX32 in opt_compat.h as it is confined to a subset of
sys/compat/linux/*.c. A fake _COMPAT_LINUX option ensure opt_compat.h
is created on all architectures.

Move COMPAT_LINUXKPI to opt_dontuse.h as it is only used to control the
set of compiled files.

Reviewed by: kib, cem, jhb, jtl
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D14941


# fb441a88 27-Mar-2018 Konstantin Belousov <kib@FreeBSD.org>

Fix several leaks of kernel stack data through paddings.

It is random collection of fixes for issues not yet corrected,
reported at https://tsyrklevi.ch/clang_analyzer/freebsd_013017/. Many
issues from that list were already corrected. Most of them are for
compat32, old compat32 or affect both primary host ABI and compat32.

The freebsd32_kldstat(), for instance, was already fixed by using
malloc(M_ZERO). Patch includes correction to report the supplied
version back, which is just pedantic.

Reviewed by: brooks, emaste (previous version)
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D14868


# 681a1b75 20-Feb-2018 Mateusz Guzik <mjg@FreeBSD.org>

Make killpg1 perform process validity checks without proc lock held.


# 6026dcd7 13-Feb-2018 Mark Johnston <markj@FreeBSD.org>

Add support for zstd-compressed user and kernel core dumps.

This works similarly to the existing gzip compression support, but
zstd is typically faster and gives better compression ratios.

Support for this functionality must be configured by adding ZSTDIO to
one's kernel configuration file. dumpon(8)'s new -Z option is used to
configure zstd compression for kernel dumps. savecore(8) now recognizes
and saves zstd-compressed kernel dumps with a .zst extension.

Submitted by: cem (original version)
Relnotes: yes
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D13101,
https://reviews.freebsd.org/D13633


# 78f57a9c 08-Jan-2018 Mark Johnston <markj@FreeBSD.org>

Generalize the gzio API.

We currently use a set of subroutines in kern_gzio.c to perform
compression of user and kernel core dumps. In the interest of adding
support for other compression algorithms (zstd) in this role without
complicating the API consumers, add a simple compressor API which can be
used to select an algorithm.

Also change the (non-default) GZIO kernel option to not enable
compressed user cores by default. It's not clear that such a default
would be desirable with support for multiple algorithms implemented,
and it's inconsistent in that it isn't applied to kernel dumps.

Reviewed by: cem
Differential Revision: https://reviews.freebsd.org/D13632


# 51369649 20-Nov-2017 Pedro F. Giffuni <pfg@FreeBSD.org>

sys: further adoption of SPDX licensing ID tags.

Mainly focus on files that use BSD 3-Clause license.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.


# 537d0fb1 11-Nov-2017 Mateusz Guzik <mjg@FreeBSD.org>

Use pfind_any in linux_rt_sigqueueinfo and kern_sigqueue


# 6e1619da 11-Nov-2017 Mateusz Guzik <mjg@FreeBSD.org>

Add pfind_any

It looks for both regular and zombie processes. This avoids allproc relocking
previously seen with pfind -> zpfind calls.


# e9445808 16-Oct-2017 Konstantin Belousov <kib@FreeBSD.org>

Re-evaluate thread' signal mask after ptracestop().

The stop drops process lock, which allows the signal mask to be
changed and our selected signal might become blocked, i.e. should be
returned to the process queue instead of delivery.

Also, for the existing check of the process no longer having an
attached debugger, we should not loose the signal, but requeue it.

Reported and tested by: bdrewery
Reviewed by: jhb
Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# cd735d8f 16-Oct-2017 Konstantin Belousov <kib@FreeBSD.org>

Improve assertion that an ignored or blocked signal is not delivered.

Split two conditions into separate asserts. Print additional details,
like the signal number and action value.

Reviewed by: jhb
Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# 0167b33b 16-Oct-2017 Konstantin Belousov <kib@FreeBSD.org>

Style.

Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# 7ceeb35b 27-Jul-2017 Konstantin Belousov <kib@FreeBSD.org>

Make it possible to request nosys logging to console.

New kern.lognosys values are
1 - log to ctty
2 - log to console
3 - log to both.

Inspired by: eugen
Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# f5a077c3 12-Jun-2017 Konstantin Belousov <kib@FreeBSD.org>

Print unimplemented syscall number to the ctty on SIGSYS, if enabled
by the knob kern.lognosys.

Discussed with: imp
Reviewed by: jhb
Sponsored by: The FreeBSD Foundation
MFC after: 3 weeks
X-Differential revision: https://reviews.freebsd.org/D11080


# 3e85b721 16-May-2017 Ed Maste <emaste@FreeBSD.org>

Remove register keyword from sys/ and ANSIfy prototypes

A long long time ago the register keyword told the compiler to store
the corresponding variable in a CPU register, but it is not relevant
for any compiler used in the FreeBSD world today.

ANSIfy related prototypes while here.

Reviewed by: cem, jhb
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D10193


# 396a0d44 12-May-2017 Konstantin Belousov <kib@FreeBSD.org>

Do not wake up sleeping thread in reschedule_signals() if the signal
is blocked. The spurious wakeup might result in spurious EINTR.

The reschedule_signals() function is called when the calling thread
has the signal mask changed. For each newly blocked signal, we try to
find a thread which might have the signal not blocked. If no such
thread exists, sigtd() returns random thread, which must not be waken
up. I decided that re-checking, as suggested by PR submitter, is more
reasonable change than to change sigtd() interface, due to other uses
of sigtd(). signotify() already performs this check.

Submitted by: Duane <parakleta@darkreality.org>
PR: 219228
Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# 0c46712c 11-May-2017 Mark Johnston <markj@FreeBSD.org>

Let ptracestop() suspend threads sleeping in an SBDRY section.

When a thread enters ptracestop(), for example because it had received
SIGSTOP from ptrace(PT_ATTACH), it attempts to suspend other threads in
the same process. In the case of a thread sleeping interruptibly in an
SBDRY section, sig_suspend_threads() must wake the thread and allow it to
reach the user-mode boundary. However, sig_suspend_threads() would
erroneously avoid waking up such threads, resulting in an apparent hang.

Reviewed by: kib
Tested by: pho
MFC after: 2 weeks
Sponsored by: Dell EMC Isilon


# f19351aa 05-May-2017 Brooks Davis <brooks@FreeBSD.org>

Provide a freebsd32 implementation of sigqueue()

The previous misuse of sys_sigqueue() was sending random register or
stack garbage to 64-bit targets. The freebsd32 implementation preserves
the sival_int member of value when signaling a 64-bit process.

Document the mixed ABI implementation of union sigval and the
incompability of sival_ptr with pointer integrity schemes.

Reviewed by: kib, wblock
MFC after: 1 week
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D10605


# 86be94fc 30-Mar-2017 Tycho Nightingale <tychon@FreeBSD.org>

Add support for capturing 'struct ptrace_lwpinfo' for signals
resulting in a process dumping core in the corefile.

Also extend procstat to view select members of 'struct ptrace_lwpinfo'
from the contents of the note.

Sponsored by: Dell EMC Isilon


# 469ec1eb 17-Mar-2017 Konstantin Belousov <kib@FreeBSD.org>

When clearing altsigstack settings on exec, do it to the right thread.

Diagnosed by: smh
Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# b4d33259 16-Mar-2017 Eric Badger <badger@FreeBSD.org>

Don't clear p_ptevents on normal SIGKILL delivery

The ptrace() user has the option of discarding the signal. In such a
case, p_ptevents should not be modified. If the ptrace() user decides to
send a SIGKILL, ptevents will be cleared in ptracestop(). procfs events
do not have the capability to discard the signal, so continue to clear
the mask in that case.

Reviewed by: jhb (initial revision)
MFC after: 1 week
Sponsored by: Dell EMC
Differential Revision: https://reviews.freebsd.org/D9939


# b38bd91f 07-Mar-2017 Eric Badger <badger@FreeBSD.org>

don't stop in issignal() if P_SINGLE_EXIT is set

Suppose a traced process is stopped in ptracestop() due to receipt of a
SIGSTOP signal, and is awaiting orders from the tracing process on how
to handle the signal. Before sending any such orders, the tracing
process exits. This should kill the traced process. But suppose a second
thread handles the SIGKILL and proceeds to exit1(), calling
thread_single(). The first thread will now awaken and will have a chance
to check once more if it should go to sleep due to the SIGSTOP. It must
not sleep after P_SINGLE_EXIT has been set; this would prevent the
SIGKILL from taking effect, leaving a stopped orphan behind after the
tracing process dies.

Also add new tests for this condition.

Reviewed by: kib
MFC after: 2 weeks
Sponsored by: Dell EMC
Differential Revision: https://reviews.freebsd.org/D9890


# e052a8b9 02-Mar-2017 Ed Maste <emaste@FreeBSD.org>

kern_sig.c: ANSIfy and remove archaic register keyword

Sponsored by: The FreeBSD Foundation


# 82a4538f 20-Feb-2017 Eric Badger <badger@FreeBSD.org>

Defer ptracestop() signals that cannot be delivered immediately

When a thread is stopped in ptracestop(), the ptrace(2) user may request
a signal be delivered upon resumption of the thread. Heretofore, those signals
were discarded unless ptracestop()'s caller was issignal(). Fix this by
modifying ptracestop() to queue up signals requested by the ptrace user that
will be delivered when possible. Take special care when the signal is SIGKILL
(usually generated from a PT_KILL request); no new stop events should be
triggered after a PT_KILL.

Add a number of tests for the new functionality. Several tests were authored
by jhb.

PR: 212607
Reviewed by: kib
Approved by: kib (mentor)
MFC after: 2 weeks
Sponsored by: Dell EMC
In collaboration with: jhb
Differential Revision: https://reviews.freebsd.org/D9260


# 69a28758 15-Sep-2016 Ed Maste <emaste@FreeBSD.org>

Renumber license clauses in sys/kern to avoid skipping #3


# ed6d876b 06-Sep-2016 Brooks Davis <brooks@FreeBSD.org>

Modernize the initalization of sigproptbl.

Use C99 designators to set the value of each slot and the nitems macro to
check for valid entries. In the process, switch to indexing by signal
number rather than signal-1 for improved clarity.

Obtained from: CheriBSD (a6053c5abf03a5f53bbfcdd3a26429383f67e09f)
Sponsored by: DARPA, AFRL
Reviewed by: kib


# fd50a707 02-Sep-2016 Brooks Davis <brooks@FreeBSD.org>

Merge from CheriBSD:

Rename sigprop-table constants to SIGPROP_ from SA_ to reduce the
impression of a namespace collision.

Submitted by: rwatson
Reviewed by: jhb, kib (slightly different versions)
Obtained from: CheriBSD (814ec5771cb1cb53deba317c561de62a91ae7684)
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D7616


# 7f649dda 18-Aug-2016 Mark Johnston <markj@FreeBSD.org>

Correct a check for P2_PTRACE_FSTP in ptracestop().

MFC after: 1 day


# b7a25e63 28-Jul-2016 Konstantin Belousov <kib@FreeBSD.org>

When a debugger attaches to the process, SIGSTOP is sent to the
target. Due to a way issignal() selects the next signal to deliver
and report, if the simultaneous or already pending another signal
exists, that signal might be reported by the next waitpid(2) call.
This causes minor annoyance for debuggers, which must be prepared to
take any signal as the first event, then filter SIGSTOP later.

More importantly, for tools like gcore(1), which attach and then
detach without processing events, SIGSTOP might leak to be delivered
after PT_DETACH. This results in the process being unintentionally
stopped after detach, which is fatal for automatic tools.

The solution is to force SIGSTOP to be the first signal reported after
the attach. Attach code is modified to set P2_PTRACE_FSTP to indicate
that the attaching ritual was not yet finished, and issignal() prefers
SIGSTOP in that condition. Also, the thread which handles
P2_PTRACE_FSTP is made to guarantee to own p_xthread during the first
waitpid(2). All that ensures that SIGSTOP is consumed first.

Additionally, if P2_PTRACE_FSTP is still set on detach, which means
that waitpid(2) was not called at all, SIGSTOP is removed from the
queue, ensuring that the process is resumed on detach.

In issignal(), when acting on STOPing signals, remove the signal from
queue before suspending. Otherwise parallel attach could result in
ptracestop() acting on that STOP as if it was the STOP signal from the
attach. Then SIGSTOP from attach leaks again.

As a minor refactoring, some bits of the common attach code is moved
to new helper proc_set_traced().

Reported by: markj
Reviewed by: jhb, markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D7256


# cc37baea 26-Jul-2016 Stephen J. Kiernan <stevek@FreeBSD.org>

Add the NUM_CORE_FILES kernel config option which specifies the limit for the
number of core files allowed by a particular process when using the %I core
file name pattern.

Sanity check at compile time to ensure the value is within the valid range of
0-10.

Reviewed by: jtl, sjg
Approved by: sjg (mentor)
Sponsored by: Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D6812


# 8d570f64 15-Jul-2016 John Baldwin <jhb@FreeBSD.org>

Add a mask of optional ptrace() events.

ptrace() now stores a mask of optional events in p_ptevents. Currently
this mask is a single integer, but it can be expanded into an array of
integers in the future.

Two new ptrace requests can be used to manipulate the event mask:
PT_GET_EVENT_MASK fetches the current event mask and PT_SET_EVENT_MASK
sets the current event mask.

The current set of events include:
- PTRACE_EXEC: trace calls to execve().
- PTRACE_SCE: trace system call entries.
- PTRACE_SCX: trace syscam call exits.
- PTRACE_FORK: trace forks and auto-attach to new child processes.
- PTRACE_LWP: trace LWP events.

The S_PT_SCX and S_PT_SCE events in the procfs p_stops flags have
been replaced by PTRACE_SCE and PTRACE_SCX. PTRACE_FORK replaces
P_FOLLOW_FORK and PTRACE_LWP replaces P2_LWP_EVENTS.

The PT_FOLLOW_FORK and PT_LWP_EVENTS ptrace requests remain for
compatibility but now simply toggle corresponding flags in the
event mask.

While here, document that PT_SYSCALL, PT_TO_SCE, and PT_TO_SCX both
modify the event mask and continue the traced process.

Reviewed by: kib
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D7044


# 46e47c4f 03-Jul-2016 Konstantin Belousov <kib@FreeBSD.org>

Provide helper macros to detect 'non-silent SBDRY' state and to
calculate appropriate return value for stops. Simplify the code by
using them.

Fix typo in sig_suspend_threads(). The thread which sleep must be
aborted is td2. (*)

In issignal(), when handling stopping signal for thread in
TD_SBDRY_INTR state, do not stop, this is wrong and fires assert.
This is yet another place where execution should be forced out of
SBDRY-protected region. For such case, return -1 from issignal() and
translate it to corresponding error code in sleepq_catch_signals().
Assert that other consumers of cursig() are not affected by the new
return value. (*)

Micro-optimize, mostly VFS and VOP methods, by avoiding calling the
functions when SIGDEFERSTOP_NOP non-change is requested. (**)

Reported and tested by: pho (*)
Requested by: bde (**)
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Approved by: re (gjb)


# 6f56cb8d 28-Jun-2016 Konstantin Belousov <kib@FreeBSD.org>

Complete r302215. TDF_SBDRY | TDF_SERESTART and TDF_SBDRY |
TDF_SEINTR flags values, unlike TDF_SBDRY, must be treated almost as
if TDF_SBDRY is not set for STOP signal delivery. The only difference
is that sig_suspend_threads() should abort the sleep instead of doing
immediate suspension.

Reported by: ngie
Sponsored by: The FreeBSD Foundation
MFC after: 12 days
Approved by: re (gjb)


# 9e590ff0 27-Jun-2016 Konstantin Belousov <kib@FreeBSD.org>

When filt_proc() removes event from the knlist due to the process
exiting (NOTE_EXIT->knlist_remove_inevent()), two things happen:
- knote kn_knlist pointer is reset
- INFLUX knote is removed from the process knlist.
And, there are two consequences:
- KN_LIST_UNLOCK() on such knote is nop
- there is nothing which would block exit1() from processing past the
knlist_destroy() (and knlist_destroy() resets knlist lock pointers).
Both consequences result either in leaked process lock, or
dereferencing NULL function pointers for locking.

Handle this by stopping embedding the process knlist into struct proc.
Instead, the knlist is allocated together with struct proc, but marked
as autodestroy on the zombie reap, by knlist_detach() function. The
knlist is freed when last kevent is removed from the list, in
particular, at the zombie reap time if the list is empty. As result,
the knlist_remove_inevent() is no longer needed and removed.

Other changes:

In filt_procattach(), clear NOTE_EXEC and NOTE_FORK desired events
from kn_sfflags for knote registered by kernel to only get NOTE_CHILD
notifications. The flags leak resulted in excessive
NOTE_EXEC/NOTE_FORK reports.

Fix immediate note activation in filt_procattach(). Condition should
be either the immediate CHILD_NOTE activation, or immediate NOTE_EXIT
report for the exiting process.

In knote_fork(), do not perform racy check for KN_INFLUX before kq
lock is taken. Besides being racy, it did not accounted for notes
just added by scan (KN_SCAN).

Some minor and incomplete style fixes.

Analyzed and tested by: Eric Badger <eric@badgerio.us>
Reviewed by: jhb
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Approved by: re (gjb)
Differential revision: https://reviews.freebsd.org/D6859


# 3a1e5dd8 26-Jun-2016 Konstantin Belousov <kib@FreeBSD.org>

Rewrite sigdeferstop(9) and sigallowstop(9) into more flexible
framework allowing to set the suspension policy for the dynamic block.
Extend the currently possible policies of stopping on interruptible
sleeps and ignoring such sleeps by two more: do not suspend at
interruptible sleeps, but interrupt them with either EINTR or ERESTART.

Reviewed by: jilles
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Approved by: re (gjb)


# fb4cdc96 09-Jun-2016 Mariusz Zaborski <oshogbo@FreeBSD.org>

Define tunable instead of using CTLFLAG_RWTUN flag with kern.corefile.

The allproc_lock lock used in the sysctl_kern_corefile function is initialized
in the procinit function which is called after setting sysctl values at boot.
That means if we set kern.corefile at boot we will be trying to use
lock with is uninitialized and machine will crash.

If we define kern.corefile as tunable instead of using CTFLAG_RWTUN we will
not call the sysctl_kern_corefile function and we will not use an uninitialized
lock. When machine will boot then we will start using function depending on
the lock.

Reviewed by: pjd


# 5fcfab6e 29-Dec-2015 John Baldwin <jhb@FreeBSD.org>

Add ptrace(2) reporting for LWP events.

Add two new LWPINFO flags: PL_FLAG_BORN and PL_FLAG_EXITED for reporting
thread creation and destruction. Newly created threads will stop to report
PL_FLAG_BORN before returning to userland and exiting threads will stop to
report PL_FLAG_EXIT before exiting completely. Both of these events are
only enabled and reported if PT_LWP_EVENTS is enabled on a process.


# 36160958 16-Dec-2015 Mark Johnston <markj@FreeBSD.org>

Fix style issues around existing SDT probes.

- Use SDT_PROBE<N>() instead of SDT_PROBE(). This has no functional effect
at the moment, but will be needed for some future changes.
- Don't hardcode the module component of the probe identifier. This is
set automatically by the SDT framework.

MFC after: 1 week


# 2f2f522b 27-Sep-2015 Andriy Gapon <avg@FreeBSD.org>

save some bytes by using more concise SDT_PROBE<n> instead of SDT_PROBE

SDT_PROBE requires 5 parameters whereas SDT_PROBE<n> requires n parameters
where n is typically smaller than 5.

Perhaps SDT_PROBE should be made a private implementation detail.

MFC after: 20 days


# f3fe76ec 12-Aug-2015 Ed Schouten <ed@FreeBSD.org>

Unignore signals when starting CloudABI processes.

As CloudABI processes cannot adjust their signal handlers, we need to
make sure that we start up CloudABI processes with consistent signal
masks. Though the POSIx standard signal behavior is all right, we do
need to make sure that we ignore SIGPIPE, as it would otherwise be
hard to interact with pipes and sockets.

Extend execsigs() to iterate over ps_sigignore and call sigdflt() for
each of the ignored signals.

Reviewed by: kib
Obtained from: https://github.com/NuxiNL/freebsd
Differential Revision: https://reviews.freebsd.org/D3365


# b4490c6e 18-Jul-2015 Konstantin Belousov <kib@FreeBSD.org>

The si_status field of the siginfo_t, provided by the waitid(2) and
SIGCHLD signal, should keep full 32 bits of the status passed to the
_exit(2).

Split the combined p_xstat of the struct proc into the separate exit
status p_xexit for normal process exit, and signalled termination
information p_xsig. Kernel-visible macro KW_EXITCODE() reconstructs
old p_xstat from p_xexit and p_xsig. p_xexit contains complete status
and copied out into si_status.

Requested by: Joerg Schilling
Reviewed by: jilles (previous version), pho
Tested by: pho
Sponsored by: The FreeBSD Foundation


# ea566832 10-Jul-2015 Ed Schouten <ed@FreeBSD.org>

Add missing const keyword to kern_sigaction()'s 'act' parameter.

This structure is not modified by the function. Also add const to
sigact_flag_test(), as it is called by kern_sigaction().


# f6f6d240 10-Jun-2015 Mateusz Guzik <mjg@FreeBSD.org>

Implement lockless resource limits.

Use the same scheme implemented to manage credentials.

Code needing to look at process's credentials (as opposed to thred's) is
provided with *_proc variants of relevant functions.

Places which possibly had to take the proc lock anyway still use the proc
pointer to access limits.


# aef68c96 29-May-2015 Konstantin Belousov <kib@FreeBSD.org>

When delivering a signal with default disposition to the thread,
tdsigwakeup() increases the priority of the low-priority threads, to
give them a chance to be terminated timely. Also, kernel allows user
to signal kernel processes. The combined effect is that signalling
idle process bump a priority of the selected delivery thread, which
starts eating CPU.

Check for the delivery thread be an idle thread and do not raise its
priority then.

The signal delivery to the kernel threads must be opt-in feature.
Kernel thread should explicitely declare the ability to handle signals
directed to it. E.g., nfsd threads check for signal as an indication
of exit request.

Most threads do not handle signals at all, and queuing the signal to
them causes odd side-effects. Most innocent consequence is the memory
leak due to queued ksiginfo, which is never deleted from the sigqueue.
Code to prevent even queuing signals to the kernel threads is trivial,
but it requires careful examination of each call to kproc/kthread
creation to decide should the signalling be allowed. The commit is a
stop-gap measure which fixes the immediate case for now.

PR: 200493
Reported and tested by: trasz
Discussed with: trasz, emaste
Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# 515b7a0b 25-May-2015 John Baldwin <jhb@FreeBSD.org>

Add KTR tracing for some MI ptrace events.

Differential Revision: https://reviews.freebsd.org/D2643
Reviewed by: kib


# 0da9e11b 23-Mar-2015 Rui Paulo <rpaulo@FreeBSD.org>

Disable coredump_devctl because it could lead to leaking paths to
jails.


# 5bc0ff88 20-Mar-2015 Mateusz Guzik <mjg@FreeBSD.org>

coredump: protect corefilename access with a lock

Previously format string traversal could happen while the string itself was
being modified.

Use allproc_lock as coredumping is a rare operation and as such we don't
have to create a dedicated lock.

Submitted by: Tiwei Bie <btw mail.ustc.edu.cn>
Reviewed by: kib
X-Additional: JuniorJobs project


# aa14e9b7 08-Mar-2015 Mark Johnston <markj@FreeBSD.org>

Reimplement support for userland core dump compression using a new interface
in kern_gzio.c. The old gzio interface was somewhat inflexible and has not
worked properly since r272535: currently, the gzio functions are called with
a range lock held on the output vnode, but kern_gzio.c does not pass the
IO_RANGELOCKED flag to vn_rdwr() calls, resulting in deadlock when vn_rdwr()
attempts to reacquire the range lock. Moreover, the new gzio interface can
be used to implement kernel core compression.

This change also modifies the kernel configuration options needed to enable
userland core dump compression support: gzio is now an option rather than a
device, and the COMPRESS_USER_CORES option is removed. Core dump compression
is enabled using the kern.compress_user_cores sysctl/tunable.

Differential Revision: https://reviews.freebsd.org/D1832
Reviewed by: rpaulo
Discussed with: kib


# dacbc9db 24-Feb-2015 Konstantin Belousov <kib@FreeBSD.org>

Keep a reference on the coredump vnode for vn_fullpath() call. Do it
by moving vn_close() after the point where notification is sent.

Reported by: sbruno
Tested by: pho, sbruno
Sponsored by: The FreeBSD Foundation


# b5263b26 11-Feb-2015 Rui Paulo <rpaulo@FreeBSD.org>

Remove check against NULL after M_WAITOK.

Submitted by: Oliver Pinter


# 6fbc0f7d 10-Feb-2015 Rui Paulo <rpaulo@FreeBSD.org>

Restore the data array in coredump(), but use a different style to
calculate the length.

Requested by: kib


# 624157bb 10-Feb-2015 Rui Paulo <rpaulo@FreeBSD.org>

Remove a printf and an strlen() from the coredump code.


# eb6368d4 09-Feb-2015 Rui Paulo <rpaulo@FreeBSD.org>

Sanitise the coredump file names sent to devd.

While there, add a sysctl to turn this feature off as requested by
kib@.


# 842ab62b 09-Feb-2015 Rui Paulo <rpaulo@FreeBSD.org>

Notify devd(8) when a process crashed.

This change implements a notification (via devctl) to userland when
the kernel produces coredumps after a process has crashed.
devd can then run a specific command to produce a human readable crash
report. The command is most usually a helper that runs gdb/lldb
commands on the file/coredump pair. It's possible to use this
functionality for implementing automatic generation of crash reports.

devd(8) will be notified of the full path of the binary that crashed and
the full path of the coredump file.


# 677258f7 18-Jan-2015 Konstantin Belousov <kib@FreeBSD.org>

Add procctl(2) PROC_TRACE_CTL command to enable or disable debugger
attachment to the process. Note that the command is not intended to
be a security measure, rather it is an obfuscation feature,
implemented for parity with other operating systems.

Discussed with: jilles, rwatson
Man page fixes by: rwatson
Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# e3612a4c 18-Jan-2015 Konstantin Belousov <kib@FreeBSD.org>

Make SIGSTOP working for sleeps done while waiting for fifo readers or
writers in open(2), when the fifo is located on an NFS mount.

Reported by: bde
Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# 271ab240 16-Jan-2015 Konstantin Belousov <kib@FreeBSD.org>

For sigaction(2), ignore possible garbage in sa_flags for sa_handler
== SIG_DFL or SIG_IGN. Sloppy code does not fully initialize struct
sigaction for such cases, and being too demanding in the case of
default handler does not catch anything.

Reported and tested by: Alex Tutubalin <lexa@lexa.ru>
Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# 8ee9765a 21-Dec-2014 Konstantin Belousov <kib@FreeBSD.org>

Add VN_OPEN_NAMECACHE flag for vn_open_cred(9), which requests that
the created file name was cached. Use the flag for core dumps.

Requested by: rpaulo
Tested by: pho (previous version)
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks


# 6ddcc233 13-Dec-2014 Konstantin Belousov <kib@FreeBSD.org>

Add facility to stop all userspace processes. The supposed use of the
feature is to quisce the system before suspend.

Stop is implemented by reusing the thread_single(9) with the special
mode SINGLE_ALLPROC. SINGLE_ALLPROC differs from the existing
single-threading modes by allowing (requiring) caller to operate on
other process. Interruptible sleeps for !TDF_SBDRY threads are
suspended like SIGSTOP does it, instead of aborting the sleep, like
SINGLE_NO_EXIT, to avoid spurious EINTRs on resume.

Provide debugging sysctl debug.stop_all_proc, which causes total stop
and suspends syncer, while waiting for variable reset for resume. It
is used for debugging; should be removed after the real use of the
interface is added.

In collaboration with: pho
Discussed with: avg
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks


# 70778bba 28-Nov-2014 Konstantin Belousov <kib@FreeBSD.org>

Assert the state of the process lock and sigact mutex in
kern_sigprocmask() and reschedule_signals().

Discussed with: rea
Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# e442f29f 26-Nov-2014 Konstantin Belousov <kib@FreeBSD.org>

Fix SA_SIGINFO | SA_RESETHAND handling. The sysent' sv_sendsig()
method needs pre-reset state of the ps_siginfo to correctly construct
signal frame.

Move sigdflt() call after the sv_sendsig() invocation in postsig().
Simultaneously extract common code from trapsignal() and postsig()
into new helper postsig_done().

Submitted by: rea
MFC after: 1 week


# 539c9eef 04-Oct-2014 Konstantin Belousov <kib@FreeBSD.org>

Fixes for i/o during coredumping:
- Do not dump into system files.
- Do not acquire write reference to the mount point where img.core is
written, in the coredump(). The vn_rdwr() calls from ELF imgact
request the write ref from vn_rdwr(). Recursive acqusition of the
write ref deadlocks with the unmount.
- Instead, take the range lock for the whole core file. This prevents
parallel dumping from two processes executing the same image,
converting the useless interleaved dump into sequential dumping,
with second core overwriting the first.

Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks


# c83655f3 24-Aug-2014 Konstantin Belousov <kib@FreeBSD.org>

Revert the handling of all siginfo sa_flags except SA_SIGINFO to the
pre-r270321. Namely, the flags are preserved for SIG_DFL and SIG_IGN
dispositions.

Requested and reviewed by: jilles
Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# ce8daaad 24-Aug-2014 Mateusz Guzik <mjg@FreeBSD.org>

Use refcount_init in sigacts_alloc.

This change is a no-op, but fixes up an inconsistency introduced with
r268634.

MFC after: 3 days


# 350ae563 22-Aug-2014 Konstantin Belousov <kib@FreeBSD.org>

Ensure that sigaction flags for signal, which disposition is reset to
ignored or default, are not leaking. Apparently, there exists code
which relies on SA_SIGINFO not reported for SIG_DFL or SIG_IGN.

In kern_sigaction, ignore flags when resetting. Encapsulate the flag
and disposition testing into helper sigact_flag_test().

On exec, and when delivering signal with SA_RESETHAND flag set,
signals are reset automatically. Use new helper sigdflt(), which
removes duplicated code and corrects all flag bits for the signal.

For proc0, set sigintr bit for all ignored signals. Ignored signals
are consumed in tdsendsignal() and not delivered to the victim thread
at all.

Reported and tested by: royger
Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# 2d864174 22-Aug-2014 Konstantin Belousov <kib@FreeBSD.org>

Check the validity of struct sigaction sa_flags value, reject unknown
flags.

Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# c959c237 14-Jul-2014 Mateusz Guzik <mjg@FreeBSD.org>

Manage struct sigacts refcnt with atomics instead of a mutex.

MFC after: 1 week


# d00c8ea4 01-Jul-2014 Mateusz Guzik <mjg@FreeBSD.org>

Perform a lockless check in sigacts_shared.

It is used only during execve (i.e. singlethreaded), so there is no fear
of returning 'not shared' which soon becomes 'shared'.

While here reorganize the code a little to avoid proc lock/unlock in
shared case.

MFC after: 1 week


# af3b2549 27-Jun-2014 Hans Petter Selasky <hselasky@FreeBSD.org>

Pull in r267961 and r267973 again. Fix for issues reported will follow.


# 37a107a4 27-Jun-2014 Glen Barber <gjb@FreeBSD.org>

Revert r267961, r267973:

These changes prevent sysctl(8) from returning proper output,
such as:

1) no output from sysctl(8)
2) erroneously returning ENOMEM with tools like truss(1)
or uname(1)
truss: can not get etype: Cannot allocate memory


# 3da1cf1e 27-Jun-2014 Hans Petter Selasky <hselasky@FreeBSD.org>

Extend the meaning of the CTLFLAG_TUN flag to automatically check if
there is an environment variable which shall initialize the SYSCTL
during early boot. This works for all SYSCTL types both statically and
dynamically created ones, except for the SYSCTL NODE type and SYSCTLs
which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to
be used in the case a tunable sysctl has a custom initialisation
function allowing the sysctl to still be marked as a tunable. The
kernel SYSCTL API is mostly the same, with a few exceptions for some
special operations like iterating childrens of a static/extern SYSCTL
node. This operation should probably be made into a factored out
common macro, hence some device drivers use this. The reason for
changing the SYSCTL API was the need for a SYSCTL parent OID pointer
and not only the SYSCTL parent OID list pointer in order to quickly
generate the sysctl path. The motivation behind this patch is to avoid
parameter loading cludges inside the OFED driver subsystem. Instead of
adding special code to the OFED driver subsystem to post-load tunables
into dynamically created sysctls, we generalize this in the kernel.

Other changes:
- Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask"
to "hw.pcic.intr_mask".
- Removed redundant TUNABLE statements throughout the kernel.
- Some minor code rewrites in connection to removing not needed
TUNABLE statements.
- Added a missing SYSCTL_DECL().
- Wrapped two very long lines.
- Avoid malloc()/free() inside sysctl string handling, in case it is
called to initialize a sysctl from a tunable, hence malloc()/free() is
not ready when sysctls from the sysctl dataset are registered.
- Bumped FreeBSD version to indicate SYSCTL API change.

MFC after: 2 weeks
Sponsored by: Mellanox Technologies


# 4a144410 16-Mar-2014 Robert Watson <rwatson@FreeBSD.org>

Update kernel inclusions of capability.h to use capsicum.h instead; some
further refinement is required as some device drivers intended to be
portable over FreeBSD versions rely on __FreeBSD_version to decide whether
to include capability.h.

MFC after: 3 weeks


# f2b525e6 30-Nov-2013 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Make process descriptors standard part of the kernel. rwhod(8) already
requires process descriptors to work and having PROCDESC in GENERIC
seems not enough, especially that we hope to have more and more consumers
in the base.

MFC after: 3 days


# d9fae5ab 26-Nov-2013 Andriy Gapon <avg@FreeBSD.org>

dtrace sdt: remove the ugly sname parameter of SDT_PROBE_DEFINE

In its stead use the Solaris / illumos approach of emulating '-' (dash)
in probe names with '__' (two consecutive underscores).

Reviewed by: markj
MFC after: 3 weeks


# 54366c0b 25-Nov-2013 Attilio Rao <attilio@FreeBSD.org>

- For kernel compiled only with KDTRACE_HOOKS and not any lock debugging
option, unbreak the lock tracing release semantic by embedding
calls to LOCKSTAT_PROFILE_RELEASE_LOCK() direclty in the inlined
version of the releasing functions for mutex, rwlock and sxlock.
Failing to do so skips the lockstat_probe_func invokation for
unlocking.
- As part of the LOCKSTAT support is inlined in mutex operation, for
kernel compiled without lock debugging options, potentially every
consumer must be compiled including opt_kdtrace.h.
Fix this by moving KDTRACE_HOOKS into opt_global.h and remove the
dependency by opt_kdtrace.h for all files, as now only KDTRACE_FRAMES
is linked there and it is only used as a compile-time stub [0].

[0] immediately shows some new bug as DTRACE-derived support for debug
in sfxge is broken and it was never really tested. As it was not
including correctly opt_kdtrace.h before it was never enabled so it
was kept broken for a while. Fix this by using a protection stub,
leaving sfxge driver authors the responsibility for fixing it
appropriately [1].

Sponsored by: EMC / Isilon storage division
Discussed with: rstone
[0] Reported by: rstone
[1] Discussed with: philip


# b20a9aa9 17-Nov-2013 Jilles Tjoelker <jilles@FreeBSD.org>

Fix siginfo_t.si_status for wait6/waitid/SIGCHLD.

Per POSIX, si_status should contain the value passed to exit() for
si_code==CLD_EXITED and the signal number for other si_code. This was
incorrect for CLD_EXITED and CLD_DUMPED.

This is still not fully POSIX-compliant (Austin group issue #594 says that
the full value passed to exit() shall be returned via si_status, not just
the low 8 bits) but is sufficient for a si_status-related test in libnih
(upstart, Debian/kFreeBSD).

PR: kern/184002
Reported by: Dmitrijs Ledkovs
Tested by: Dmitrijs Ledkovs


# 7008be5b 04-Sep-2013 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Change the cap_rights_t type from uint64_t to a structure that we can extend
in the future in a backward compatible (API and ABI) way.

The cap_rights_t represents capability rights. We used to use one bit to
represent one right, but we are running out of spare bits. Currently the new
structure provides place for 114 rights (so 50 more than the previous
cap_rights_t), but it is possible to grow the structure to hold at least 285
rights, although we can make it even larger if 285 rights won't be enough.

The structure definition looks like this:

struct cap_rights {
uint64_t cr_rights[CAP_RIGHTS_VERSION + 2];
};

The initial CAP_RIGHTS_VERSION is 0.

The top two bits in the first element of the cr_rights[] array contain total
number of elements in the array - 2. This means if those two bits are equal to
0, we have 2 array elements.

The top two bits in all remaining array elements should be 0.
The next five bits in all array elements contain array index. Only one bit is
used and bit position in this five-bits range defines array index. This means
there can be at most five array elements in the future.

To define new right the CAPRIGHT() macro must be used. The macro takes two
arguments - an array index and a bit to set, eg.

#define CAP_PDKILL CAPRIGHT(1, 0x0000000000000800ULL)

We still support aliases that combine few rights, but the rights have to belong
to the same array element, eg:

#define CAP_LOOKUP CAPRIGHT(0, 0x0000000000000400ULL)
#define CAP_FCHMOD CAPRIGHT(0, 0x0000000000002000ULL)

#define CAP_FCHMODAT (CAP_FCHMOD | CAP_LOOKUP)

There is new API to manage the new cap_rights_t structure:

cap_rights_t *cap_rights_init(cap_rights_t *rights, ...);
void cap_rights_set(cap_rights_t *rights, ...);
void cap_rights_clear(cap_rights_t *rights, ...);
bool cap_rights_is_set(const cap_rights_t *rights, ...);

bool cap_rights_is_valid(const cap_rights_t *rights);
void cap_rights_merge(cap_rights_t *dst, const cap_rights_t *src);
void cap_rights_remove(cap_rights_t *dst, const cap_rights_t *src);
bool cap_rights_contains(const cap_rights_t *big, const cap_rights_t *little);

Capability rights to the cap_rights_init(), cap_rights_set(),
cap_rights_clear() and cap_rights_is_set() functions are provided by
separating them with commas, eg:

cap_rights_t rights;

cap_rights_init(&rights, CAP_READ, CAP_WRITE, CAP_FSTAT);

There is no need to terminate the list of rights, as those functions are
actually macros that take care of the termination, eg:

#define cap_rights_set(rights, ...) \
__cap_rights_set((rights), __VA_ARGS__, 0ULL)
void __cap_rights_set(cap_rights_t *rights, ...);

Thanks to using one bit as an array index we can assert in those functions that
there are no two rights belonging to different array elements provided
together. For example this is illegal and will be detected, because CAP_LOOKUP
belongs to element 0 and CAP_PDKILL to element 1:

cap_rights_init(&rights, CAP_LOOKUP | CAP_PDKILL);

Providing several rights that belongs to the same array's element this way is
correct, but is not advised. It should only be used for aliases definition.

This commit also breaks compatibility with some existing Capsicum system calls,
but I see no other way to do that. This should be fine as Capsicum is still
experimental and this change is not going to 9.x.

Sponsored by: The FreeBSD Foundation


# 7b77e1fe 14-Aug-2013 Mark Johnston <markj@FreeBSD.org>

Specify SDT probe argument types in the probe definition itself rather than
using SDT_PROBE_ARGTYPE(). This will make it easy to extend the SDT(9) API
to allow probes with dynamically-translated types.

There is no functional change.

MFC after: 2 weeks


# 462314b3 21-Jul-2013 Mateusz Guzik <mjg@FreeBSD.org>

Remove duplicate assertion from tdsendsignal.

MFC after: 2 weeks


# b9ce4f67 05-Apr-2013 Gleb Smirnoff <glebius@FreeBSD.org>

Fix memory leak in coredump().

Reviewed by: kib


# 1968f37b 18-Mar-2013 John Baldwin <jhb@FreeBSD.org>

Tweak some comments.


# 3cf3b9f0 18-Mar-2013 John Baldwin <jhb@FreeBSD.org>

Partially revert r195702. Deferring stops is now implemented via a set of
calls to toggle TDF_SBDRY rather than passing PBDRY to individual sleep
calls.
- Remove the stop_allowed parameters from cursig() and issignal().
issignal() checks TDF_SBDRY directly.
- Remove the PBDRY and SLEEPQ_STOP_ON_BDRY flags.


# 593efaf9 21-Feb-2013 John Baldwin <jhb@FreeBSD.org>

Further refine the handling of stop signals in the NFS client. The
changes in r246417 were incomplete as they did not add explicit calls to
sigdeferstop() around all the places that previously passed SBDRY to
_sleep(). In addition, nfs_getcacheblk() could trigger a write RPC from
getblk() resulting in sigdeferstop() recursing. Rather than manually
deferring stop signals in specific places, change the VFS_*() and VOP_*()
methods to defer stop signals for filesystems which request this behavior
via a new VFCF_SBDRY flag. Note that this has to be a VFC flag rather than
a MNTK flag so that it works properly with VFS_MOUNT() when the mount is
not yet fully constructed. For now, only the NFS clients are set this new
flag in VFS_SET().

A few other related changes:
- Add an assertion to ensure that TDF_SBDRY doesn't leak to userland.
- When a lookup request uses VOP_READLINK() to follow a symlink, mark
the request as being on behalf of the thread performing the lookup
(cnp_thread) rather than using a NULL thread pointer. This causes
NFS to properly handle signals during this VOP on an interruptible
mount.

PR: kern/176179
Reported by: Russell Cattelan (sigdeferstop() recursion)
Reviewed by: kib
MFC after: 1 month


# 6c08be2b 17-Feb-2013 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Add break to the default case.


# 888d4d4f 07-Feb-2013 Konstantin Belousov <kib@FreeBSD.org>

When vforked child is traced, the debugging events are not generated
until child performs exec(). The behaviour is reasonable when a
debugger is the real parent, because the parent is stopped until
exec(), and sending a debugging event to the debugger would deadlock
both parent and child.

On the other hand, when debugger is not the parent of the vforked
child, not sending debugging signals makes it impossible to debug
across vfork.

Fix the issue by declining generating debug signals only when vfork()
was done and child called ptrace(PT_TRACEME). Set a new process flag
P_PPTRACE from the attach code for PT_TRACEME, if P_PPWAIT flag is
set, which indicates that the process was created with vfork() and
still did not execed. Check P_PPTRACE from issignal(), instead of
refusing the trace outright for the P_PPWAIT case. The scope of
P_PPTRACE is exactly contained in the scope of P_PPWAIT.

Found and tested by: zont
Reviewed by: pluknet
MFC after: 2 weeks


# a120a7a3 06-Feb-2013 John Baldwin <jhb@FreeBSD.org>

Rework the handling of stop signals in the NFS client. The changes in
195702, 195703, and 195821 prevented a thread from suspending while holding
locks inside of NFS by forcing the thread to fail sleeps with EINTR or
ERESTART but defer the thread suspension to the user boundary. However,
this had the effect that stopping a process during an NFS request could
abort the request and trigger EINTR errors that were visible to userland
processes (previously the thread would have suspended and completed the
request once it was resumed).

This change instead effectively masks stop signals while in the NFS client.
It uses the existing TDF_SBDRY flag to effect this since SIGSTOP cannot
be masked directly. Also, instead of setting PBDRY on individual sleeps,
the NFS client now sets the TDF_SBDRY flag around each NFS request and
stop signals are masked for all sleeps during that region (the previous
change missed sleeps in lockmgr locks). The end result is that stop
signals sent to threads performing an NFS request are completely
ignored until after the NFS request has finished processing and the
thread prepares to return to userland. This restores the behavior of
stop signals being transparent to userland processes while still
preventing threads from suspending while holding NFS locks.

Reviewed by: kib
MFC after: 1 month


# c345faea 19-Dec-2012 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Replace expand_name() function with corefile_open() function, which not
only returns name, but also vnode of corefile to use.

This simplifies the code and closes few races, especially in %I handling.

Reviewed by: kib
Obtained from: WHEEL Systems


# 22a5d85a 19-Dec-2012 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Use correct file permissions when looking for available core file if
kern.corefile contains %I.

Obtained from: WHEEL Systems


# 07a8e078 18-Dec-2012 Pawel Jakub Dawidek <pjd@FreeBSD.org>

The 'flags' argument can be modified in vn_open_cred(), so we need to
set it for every loop interation.

Pointed out by: kib


# cc58032c 18-Dec-2012 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Do not audit paths we try when kern.corefile contains %I.

Obtained from: WHEEL Systems


# 29146f1a 18-Dec-2012 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Style cleanups.


# 086053a3 18-Dec-2012 Pawel Jakub Dawidek <pjd@FreeBSD.org>

The expand_name() function isn't called with the process lock held anymore,
so we can safely use malloc(M_WAITOK) now.

Pointed out by: kib


# f06f465d 17-Dec-2012 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Minor style tweaks.

Obtained from: WHEEL Systems


# c52ff611 17-Dec-2012 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Better variables naming in expand_name() to be more consistent with coredump().

Obtained from: WHEEL Systems


# dd57ce87 16-Dec-2012 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Move expand_name() after process lock is released.

This fixed panic where we hold mutex (process lock) and try to obtain sleepable
lock (vnode lock in expand_name()). The panic could occur when %I was used
in kern.corefile.

Additionally we avoid expand_name() overhead when coredumps are disabled.

Obtained from: WHEEL Systems


# 2ce1b32d 16-Dec-2012 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Don't add audit record when coredumps are disabled or name cannot be expanded.

Discussed with: rwatson
Obtained from: WHEEL Systems


# 7e73ee85 16-Dec-2012 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Make the check easier to read.

Obtained from: WHEEL Systems


# b039f8c2 16-Dec-2012 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Use 'cred' variable.

Obtained from: WHEEL Systems


# b0c9d4d7 27-Nov-2012 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Add kern.capmode_coredump sysctl/tunable to allow processes in capability mode
to dump core.

Reviewed by: rwatson
Obtained from: WHEEL Systems
MFC after: 2 weeks


# 8890f5d0 27-Nov-2012 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Allow to use kill(2) in capability mode, but process can send a signal only
to himself. For example abort(3) at first tries to do kill(getpid(), SIGABRT)
which was failing in capability mode, so the code was failing back to exit(1).

Reviewed by: rwatson
Obtained from: WHEEL Systems
MFC after: 2 weeks


# b62d05fc 27-Nov-2012 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Allow to modify kern.sugid_coredump and kern.corefile from loader.conf.

Obtained from: WHEEL Systems


# c3209846 27-Nov-2012 Pawel Jakub Dawidek <pjd@FreeBSD.org>

More style fixes.


# 23c6445a 27-Nov-2012 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Style fixes (mostly whitespaces).


# 5050aa86 22-Oct-2012 Konstantin Belousov <kib@FreeBSD.org>

Remove the support for using non-mpsafe filesystem modules.

In particular, do not lock Giant conditionally when calling into the
filesystem module, remove the VFS_LOCK_GIANT() and related
macros. Stop handling buffers belonging to non-mpsafe filesystems.

The VFS_VERSION is bumped to indicate the interface change which does
not result in the interface signatures changes.

Conducted and reviewed by: attilio
Tested by: pho


# 3d74f47b 21-Oct-2012 Eitan Adler <eadler@FreeBSD.org>

Correct the killpg(2) return values:

Return EPERM if processes were found but they
were unable to be signaled.

Return the first error from p_cansignal if no signal was successful.

Reviewed by: jilles
Approved by: cperciva
MFC after: 1 week


# 10950e46 21-Oct-2012 Eitan Adler <eadler@FreeBSD.org>

Colin acked the wrong diff originally. fixed version coming soon.

Approved by: cperciva (implicit)


# 2a1c0e4d 21-Oct-2012 Eitan Adler <eadler@FreeBSD.org>

Correct the killpg(2) return values:

Return EPERM if processes were found but they
were unable to be signaled.

Return the first error from p_cansignal if no signal was successful.

Reviewed by: jilles
Approved by: cperciva
MFC after: 1 week


# 0f14f15b 13-Sep-2012 John Baldwin <jhb@FreeBSD.org>

Ignore stop and continue signals sent to an exiting process. Stop signals
set p_xstat to the signal that triggered the stop, but p_xstat is also
used to hold the exit status of an exiting process. Without this change,
a stop signal that arrived after a process was marked P_WEXIT but before
it was marked a zombie would overwrite the exit status with the stop signal
number.

Reviewed by: kib
MFC after: 1 week


# 888aefef 18-Aug-2012 Konstantin Belousov <kib@FreeBSD.org>

Deliver SIGSYS to the guilty thread, not to the process.

MFC after: 1 week


# 7ce60f60 09-Jul-2012 David Xu <davidxu@FreeBSD.org>

Always clear p_xthread if current thread no longer needs it, in theory, if
debugger exited without calling ptrace(PT_DETACH), there is a time window
that the p_xthread may be pointing to non-existing thread, in practical,
this is not a problem because child process soon will be killed by parent
process.


# 2dd9ea6f 12-Apr-2012 Konstantin Belousov <kib@FreeBSD.org>

Add thread-private flag to indicate that error value is already placed
in td_errno. Flag is supposed to be used by syscalls returning
EJUSTRETURN because errno was already placed into the usermode frame
by a call to set_syscall_retval(9). Both ktrace and dtrace get errno
value from td_errno if the flag is set.

Use the flag to fix sigsuspend(2) error return ktrace records.

Requested by: bde
MFC after: 1 week


# 8a8be776 09-Apr-2012 Jilles Tjoelker <jilles@FreeBSD.org>

Remove unused and wrong SA_PROC internal signal property.

The SA_PROC signal property indicated whether each signal number is directed
at a specific thread or at the process in general. However, that depends on
how the signal was generated and not on the signal number. SA_PROC was not
used.


# 6472ac3d 07-Nov-2011 Ed Schouten <ed@FreeBSD.org>

Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs.

The SYSCTL_NODE macro defines a list that stores all child-elements of
that node. If there's no SYSCTL_DECL macro anywhere else, there's no
reason why it shouldn't be static.


# c241c5e4 28-Oct-2011 Sergey Kandaurov <pluknet@FreeBSD.org>

Fix arguments list for proc:::signal-discard DTrace probe.

Reported by: Anton Yuzhaninov <citrin citrin ru>
MFC after: 1 week


# 8e9a54ee 01-Oct-2011 Konstantin Belousov <kib@FreeBSD.org>

The sigwait(3) function shall not return EINTR, according to the
POSIX/SUSvN. The sigwait(2) syscall does return EINTR, and libc.so.7
contains the wrapper sigwait(3) which hides EINTR from callers. The
EINTR return is used by libthr to handle required cancellation point
in the sigwait(3).

To help the binaries linked against pre-libc.so.7, i.e. RELENG_6 and
earlier, to have right ABI for sigwait(3), transform EINTR return from
sigwait(2) into ERESTART.

Discussed with: davidxu
MFC after: 1 week


# 8451d0dd 16-Sep-2011 Kip Macy <kmacy@FreeBSD.org>

In order to maximize the re-usability of kernel code in user space this
patch modifies makesyscalls.sh to prefix all of the non-compatibility
calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel
entry points and all places in the code that use them. It also
fixes an additional name space collision between the kernel function
psignal and the libc function of the same name by renaming the kernel
psignal kern_psignal(). By introducing this change now we will ease future
MFCs that change syscalls.

Reviewed by: rwatson
Approved by: re (bz)


# cfb5f768 18-Aug-2011 Jonathan Anderson <jonathan@FreeBSD.org>

Add experimental support for process descriptors

A "process descriptor" file descriptor is used to manage processes
without using the PID namespace. This is required for Capsicum's
Capability Mode, where the PID namespace is unavailable.

New system calls pdfork(2) and pdkill(2) offer the functional equivalents
of fork(2) and kill(2). pdgetpid(2) allows querying the PID of the remote
process for debugging purposes. The currently-unimplemented pdwait(2) will,
in the future, allow querying rusage/exit status. In the interim, poll(2)
may be used to check (and wait for) process termination.

When a process is referenced by a process descriptor, it does not issue
SIGCHLD to the parent, making it suitable for use in libraries---a common
scenario when using library compartmentalisation from within large
applications (such as web browsers). Some observers may note a similarity
to Mach task ports; process descriptors provide a subset of this behaviour,
but in a UNIX style.

This feature is enabled by "options PROCDESC", but as with several other
Capsicum kernel features, is not enabled by default in GENERIC 9.0.

Reviewed by: jhb, kib
Approved by: re (kib), mentor (rwatson)
Sponsored by: Google Inc


# b8fdb0d9 26-May-2011 Edward Tomasz Napierala <trasz@FreeBSD.org>

Fix support for RACCT_CORE by merging forgotten file.


# 61009552 17-Apr-2011 Jilles Tjoelker <jilles@FreeBSD.org>

ktrace: Log the code for all signals (PSIG events).

The code provides information on how the signal was generated.

Formerly, the code was only logged for traps, much like only signal handlers
for traps received a meaningful si_code before FreeBSD 7.0.

In rare cases, no information is available and 0 is still logged.

MFC after: 1 week


# e806d352 06-Apr-2011 John Baldwin <jhb@FreeBSD.org>

Fix several places to ignore processes that are not yet fully constructed.

MFC after: 1 week


# c3b127e0 23-Mar-2011 John Baldwin <jhb@FreeBSD.org>

Small style fix.


# 6fa39a73 25-Jan-2011 Konstantin Belousov <kib@FreeBSD.org>

Allow debugger to specify that children of the traced process should be
automatically traced. Extend the ptrace(PL_LWPINFO) to report that child
just forked.

Reviewed by: davidxu, jhb
MFC after: 2 weeks


# a7d5f7eb 19-Oct-2010 Jamie Gritton <jamie@FreeBSD.org>

A new jail(8) with a configuration file, to replace the work currently done
by /etc/rc.d/jail.


# 407af02b 14-Oct-2010 David Xu <davidxu@FreeBSD.org>

In kern_sigtimedwait(), move initialization code out of process lock,
instead of using SIGISMEMBER to test every interesting signal, just
unmask the signal set and let cursig() return one, get the signal
after it returns, call reschedule_signal() after signals are blocked
again.

In kern_sigprocmask(), don't call reschedule_signal() when it is
unnecessary.

In reschedule_signal(), replace SIGISEMPTY() + SIGISMEMBER() with
sig_ffs(), rename variable 'i' to sig.


# fc4ecc1d 13-Oct-2010 David Xu <davidxu@FreeBSD.org>

sigqueue_collect_set() is no longer needed because other functions
maintain pending set correctly.


# cf7d9a8c 08-Oct-2010 David Xu <davidxu@FreeBSD.org>

Create a global thread hash table to speed up thread lookup, use
rwlock to protect the table. In old code, thread lookup is done with
process lock held, to find a thread, kernel has to iterate through
process and thread list, this is quite inefficient.
With this change, test shows in extreme case performance is
dramatically improved.

Earlier patch was reviewed by: jhb, julian


# 4d369413 10-Sep-2010 Matthew D Fleming <mdf@FreeBSD.org>

Replace sbuf_overflowed() with sbuf_error(), which returns any error
code associated with overflow or with the drain function. While this
function is not expected to be used often, it produces more information
in the form of an errno that sbuf_overflowed() did.


# 137cf33d 31-Aug-2010 David Xu <davidxu@FreeBSD.org>

rescure comments from RELENG_4.


# 83b718eb 31-Aug-2010 David Xu <davidxu@FreeBSD.org>

If a process is being debugged, skips job control caused by SIGSTOP/SIGCONT
signals, because it is managed by debugger, however a normal signal sent to
a interruptibly sleeping thread wakes up the thread so it will handle the
signal when the process leaves the stopped state.

PR: 150138
MFC after: 1 week


# 79856499 22-Aug-2010 Rui Paulo <rpaulo@FreeBSD.org>

Add an extra comment to the SDT probes definition. This allows us to get
use '-' in probe names, matching the probe names in Solaris.[1]

Add userland SDT probes definitions to sys/sdt.h.

Sponsored by: The FreeBSD Foundation
Discussed with: rwaston [1]


# 212bc4b3 19-Jul-2010 David Xu <davidxu@FreeBSD.org>

Fix function name in error messages.


# fc8cca02 08-Jul-2010 John Baldwin <jhb@FreeBSD.org>

- Various style and whitespace fixes.
- Make sugid_coredump and kern_logsigexit private to kern_sig.c.

Submitted by: bde (partially)
MFC after: 1 month


# 8a260079 04-Jul-2010 Konstantin Belousov <kib@FreeBSD.org>

Extend ptrace(PT_LWPINFO) to report siginfo for the signal that caused
debugee stop. The change should keep the ABI. Take care of compat32.

Discussed with: davidxu, jhb
MFC after: 2 weeks


# ad6eec7b 29-Jun-2010 John Baldwin <jhb@FreeBSD.org>

Tweak the in-kernel API for sending signals to threads:
- Rename tdsignal() to tdsendsignal() and make it private to kern_sig.c.
- Add tdsignal() and tdksignal() routines that mirror psignal() and
pksignal() except that they accept a thread as an argument instead of
a process. They send a signal to a specific thread rather than to an
individual process.

Reviewed by: kib


# c5105012 21-Jun-2010 Konstantin Belousov <kib@FreeBSD.org>

Do not report a stack garbage as the old value for debug.ncores sysctl.

Reported by: brucec


# afe1a688 23-May-2010 Konstantin Belousov <kib@FreeBSD.org>

Reorganize syscall entry and leave handling.

Extend struct sysvec with three new elements:
sv_fetch_syscall_args - the method to fetch syscall arguments from
usermode into struct syscall_args. The structure is machine-depended
(this might be reconsidered after all architectures are converted).
sv_set_syscall_retval - the method to set a return value for usermode
from the syscall. It is a generalization of
cpu_set_syscall_retval(9) to allow ABIs to override the way to set a
return value.
sv_syscallnames - the table of syscall names.

Use sv_set_syscall_retval in kern_sigsuspend() instead of hardcoding
the call to cpu_set_syscall_retval().

The new functions syscallenter(9) and syscallret(9) are provided that
use sv_*syscall* pointers and contain the common repeated code from
the syscall() implementations for the architecture-specific syscall
trap handlers.

Syscallenter() fetches arguments, calls syscall implementation from
ABI sysent table, and set up return frame. The end of syscall
bookkeeping is done by syscallret().

Take advantage of single place for MI syscall handling code and
implement ptrace_lwpinfo pl_flags PL_FLAG_SCE, PL_FLAG_SCX and
PL_FLAG_EXEC. The SCE and SCX flags notify the debugger that the
thread is stopped at syscall entry or return point respectively. The
EXEC flag augments SCX and notifies debugger that the process address
space was changed by one of exec(2)-family syscalls.

The i386, amd64, sparc64, sun4v, powerpc and ia64 syscall()s are
changed to use syscallenter()/syscallret(). MIPS and arm are not
converted and use the mostly unchanged syscall() implementation.

Reviewed by: jhb, marcel, marius, nwhitehorn, stas
Tested by: marcel (ia64), marius (sparc64), nwhitehorn (powerpc),
stas (mips)
MFC after: 1 month


# 88389d61 03-May-2010 Konstantin Belousov <kib@FreeBSD.org>

MFC r206264:
When OOM searches for a process to kill, ignore the processes already
killed by OOM. When killed process waits for a page allocation, try to
satisfy the request as fast as possible.


# b7402d82 29-Apr-2010 Alfred Perlstein <alfred@FreeBSD.org>

Avoid allocating MAXHOSTNAMELEN bytes on the stack in expand_name(),
use the heap instead.

Obtained from: Juniper Networks

Reviewed by: jhb


# 3f1c4c4f 06-Apr-2010 Konstantin Belousov <kib@FreeBSD.org>

When OOM searches for a process to kill, ignore the processes already
killed by OOM. When killed process waits for a page allocation, try to
satisfy the request as fast as possible.

This removes the often encountered deadlock, where OOM continously
selects the same victim process, that sleeps uninterruptibly waiting
for a page. The killed process may still sleep if page cannot be
obtained immediately, but testing has shown that system has much
higher chance to survive in OOM situation with the patch.

In collaboration with: pho
Reviewed by: alc
MFC after: 4 weeks


# e7228204 01-Mar-2010 Alfred Perlstein <alfred@FreeBSD.org>

Merge projects/enhanced_coredumps (r204346) into HEAD:

Enhanced process coredump routines.

This brings in the following features:
1) Limit number of cores per process via the %I coredump formatter.
Example:
if corefilename is set to %N.%I.core AND num_cores = 3, then
if a process "rpd" cores, then the corefile will be named
"rpd.0.core", however if it cores again, then the kernel will
generate "rpd.1.core" until we hit the limit of "num_cores".

this is useful to get several corefiles, but also prevent filling
the machine with corefiles.

2) Encode machine hostname in core dump name via %H.

3) Compress coredumps, useful for embedded platforms with limited space.
A sysctl kern.compress_user_cores is made available if turned on.

To enable compressed coredumps, the following config options need to be set:
options COMPRESS_USER_CORES
device zlib # brings in the zlib requirements.
device gzio # brings in the kernel vnode gzip output module.

4) Eventhandlers are fired to indicate coredumps in progress.

5) The imgact sv_coredump routine has grown a flag to pass in more
state, currently this is used only for passing a flag down to compress
the coredump or not.

Note that the gzio facility can be used for generic output of gzip'd
streams via vnodes.

Obtained from: Juniper Networks
Reviewed by: kan


# 661f092a 31-Jan-2010 Konstantin Belousov <kib@FreeBSD.org>

MFC r202881:
Staticise sigqueue manipulation functions used only in kern_sig.c.


# c9dc5d49 29-Jan-2010 Konstantin Belousov <kib@FreeBSD.org>

MFC r202692:
Remove the signal from sigqueue before notifying the debugger for traced
process, fixing the race between resuming from stopped state and other
thread noting the old signal on the queue and acting.


# a5799a4f 23-Jan-2010 Konstantin Belousov <kib@FreeBSD.org>

Staticise sigqueue manipulation functions used only in kern_sig.c.

MFC after: 1 week


# 6a671a6b 20-Jan-2010 Konstantin Belousov <kib@FreeBSD.org>

When traced process is about to receive the signal, the process is
stopped and debugger may modify or drop the signal. After the changes to
keep process-targeted signals on the process sigqueue, another thread
may note the old signal on the queue and act before the thread removes
changed or dropped signal from the process queue. Since process is
traced, it usually gets stopped. Or, if the same signal is delivered
while process was stopped, the thread may erronously remove it,
intending to remove the original signal.

Remove the signal from the queue before notifying the debugger. Restore
the siginfo to the head of sigqueue when signal is allowed to be
delivered to the debugee, using newly introduced KSI_HEAD ksiginfo_t
flag. This preserves required order of delivery. Always restore the
unchanged signal on the curthread sigqueue, not to the process queue,
since the thread is about to get it anyway, because sigmask cannot be
changed.

Handle failure of reinserting the siginfo into the queue by falling
back to sq_kill method, calling sigqueue_add with NULL ksi.

If debugger changed the signal to be delivered, use sigqueue_add()
with NULL ksi instead of only setting sq_signals bit.

Reported by: Gardner Bell <gbell72 rogers com>
Analyzed and first version of fix by: Tijl Coosemans <tijl coosemans org>
PR: 142757
Reviewed by: davidxu
MFC after: 2 weeks


# fb70e2f7 18-Dec-2009 Konstantin Belousov <kib@FreeBSD.org>

MFC r199355:
Add SI_KERNEL.

MFC r199418:
Fix pgsignal() call after signature change in r199355.


# 43ba7803 19-Dec-2009 Konstantin Belousov <kib@FreeBSD.org>

MFC r198507:
Use kern_sigprocmask() instead of direct manipulation of td_sigmask to
reschedule newly blocked signals.

MFC r198590:
Trapsignal() calls kern_sigprocmask() when delivering catched signal
with proc lock held.

MFC r198670:
For trapsignal() and postsig(), kern_sigprocmask() is called with
both process lock and curproc->p_sigacts->ps_mtx locked. Prevent lock
recursion on ps_mtx in reschedule_signals().


# 3134e115 19-Dec-2009 Konstantin Belousov <kib@FreeBSD.org>

MFC r198506:
In kern_sigsuspend(), manipulate thread signal mask using
kern_sigprocmask(). Also, do cursig/postsig loop immediately after
waiting for signal, repeating the wait if wakeup was spurious due to
race with other thread fetching signal from the process queue before us.

MFC r199136:
Use cpu_set_syscall_retval(9) to set syscall result, and return
EJUSTRETURN from kern_sigsuspend() to prevent syscall return code from
modifying wrong frame.
Take care of possibility that pending SIGCONT might be cancelled by
SIGSTOP, causing postsig() not to deliver any catched signal.


# 6ddf1cd2 19-Dec-2009 Konstantin Belousov <kib@FreeBSD.org>

MFC r197963:
Put process-directed signals to the process queue unconditionally,
selecting the thread to deliver the signal only by the thread returning
to usermode.
Change cursig() and postsig() to look both into the thread and process
signal queues.

MFC r197976:
Fix typo.

MFC r200082:
Remove wrong assertion. Debugee is allowed to lose a signal


# 4f17d481 03-Dec-2009 Konstantin Belousov <kib@FreeBSD.org>

Remove wrong assertion. Debugee is allowed to lose a signal.

Reported and tested by: jh
MFC after: 2 weeks


# a3de221d 17-Nov-2009 Konstantin Belousov <kib@FreeBSD.org>

Among signal generation syscalls, only sigqueue(2) is allowed by POSIX
to fail due to lack of resources to queue siginfo. Add KSI_SIGQ flag
that allows sigqueue_add() to fail while trying to allocate memory for
new siginfo. When the flag is not set, behaviour is the same as for
KSI_TRAP: if memory cannot be allocated, set bit in sq_kill. KSI_TRAP is
kept to preserve KBI.

Add SI_KERNEL si_code, to be used in siginfo.si_code when signal is
generated by kernel. Deliver siginfo when signal is generated by kill(2)
family of syscalls (SI_USER with properly filled si_uid and si_pid), or
by kernel (SI_KERNEL, mostly job control or SIGIO). Since KSI_SIGQ flag
is not set for the ksi, low memory condition cause old behaviour.

Keep psignal(9) KBI intact, but modify it to generate SI_KERNEL
si_code. Pgsignal(9) and gsignal(9) now take ksi explicitely. Add
pksignal(9) that behaves like psignal but takes ksi, and ddb kill
command implemented as pksignal(..., ksi = NULL) to not do allocation
while in debugger.

While there, remove some register specifiers and use ANSI C prototypes.

Reviewed by: davidxu
MFC after: 1 month


# 75c586a4 10-Nov-2009 Konstantin Belousov <kib@FreeBSD.org>

In r198506, kern_sigsuspend() started doing cursig/postsig loop to make
sure that a signal was delivered to the thread before returning from
syscall. Signal delivery puts new return frame on the user stack, and
modifies trap frame to enter signal handler. As a consequence, syscall
return code sets EINTR as error return for signal frame, instead of the
syscall return.

Also, for ia64, due to different registers layout for those two kind of
frames, usermode sigsegfaulted when returned from signal handler.

Use newly-introduced cpu_set_syscall_retval(9) to set syscall result,
and return EJUSTRETURN from kern_sigsuspend() to prevent syscall return
code from modifying this frame [1].

Another issue is that pending SIGCONT might be cancelled by SIGSTOP,
causing postsig() not to deliver any catched signal [2]. Modify
postsig() to return 1 if signal was posted, and 0 otherwise, and use
this in the kern_sigsuspend loop.

Proposed by: marcel [1]
Noted by: davidxu [2]
Reviewed by: marcel, davidxu
MFC after: 1 month


# 80a8b0f3 30-Oct-2009 Konstantin Belousov <kib@FreeBSD.org>

Trapsignal() and postsig() call kern_sigprocmask() with both process
lock and curproc->p_sigacts->ps_mtx. Reschedule_signals may need to have
ps_mtx locked to decide and wakeup a thread, causing recursion on the
mutex.

Inform kern_sigprocmask() and reschedule_signals() about lock state
of the ps_mtx by new flag SIGPROCMASK_PS_LOCKED to avoid recursion.

Reported and tested by: keramida
MFC after: 1 month


# 80845402 29-Oct-2009 Konstantin Belousov <kib@FreeBSD.org>

Trapsignal() calls kern_sigprocmask() when delivering catched signal
with proc lock held.

Reported and tested by: Mykola Dzham freebsd at levsha org ua
MFC after: 1 month


# d6e029ad 27-Oct-2009 Konstantin Belousov <kib@FreeBSD.org>

In r197963, a race with thread being selected for signal delivery
while in kernel mode, and later changing signal mask to block the
signal, was fixed for sigprocmask(2) and ptread_exit(3). The same race
exists for sigreturn(2), setcontext(2) and swapcontext(2) syscalls.

Use kern_sigprocmask() instead of direct manipulation of td_sigmask to
reschedule newly blocked signals, closing the race.

Reviewed by: davidxu
Tested by: pho
MFC after: 1 month


# 84440afb 27-Oct-2009 Konstantin Belousov <kib@FreeBSD.org>

In kern_sigsuspend(), better manipulate thread signal mask using
kern_sigprocmask() to properly notify other possible candidate threads
for signal delivery.

Since sigsuspend() shall only return to usermode after a signal was
delivered, do cursig/postsig loop immediately after waiting for
signal, repeating the wait if wakeup was spurious due to race with
other thread fetching signal from the process queue before us. Add
thread_suspend_check() call to allow the thread to be stopped or killed
while in loop.

Modify last argument of kern_sigprocmask() from boolean to flags,
allowing the function to be called with locked proc. Convertion of the
callers that supplied 1 to the old argument will be done in the next
commit, and due to SIGPROCMASK_OLD value equial to 1, code is formally
correct in between.

Reviewed by: davidxu
Tested by: pho
MFC after: 1 month


# 8b5adf9d 12-Oct-2009 Joseph Koshy <jkoshy@FreeBSD.org>

Improve the description of sysctl "kern.sugid_coredump".

Submitted by: Mel Flynn <mel.flynn+fbsd.hackers at mailing.thruhere.net>
on -hackers


# 7df8f6ab 12-Oct-2009 Konstantin Belousov <kib@FreeBSD.org>

Fix typo.

Submitted by: rdivacky
MFC after: 1 month


# 6b286ee8 11-Oct-2009 Konstantin Belousov <kib@FreeBSD.org>

Currently, when signal is delivered to the process and there is a thread
not blocking the signal, signal is placed on the thread sigqueue. If
the selected thread is in kernel executing thr_exit() or sigprocmask()
syscalls, then signal might be not delivered to usermode for arbitrary
amount of time, and for exiting thread it is lost.

Put process-directed signals to the process queue unconditionally,
selecting the thread to deliver the signal only by the thread returning
to usermode, since only then the thread can handle delivery of signal
reliably. For exiting thread or thread that has blocked some signals,
check whether the newly blocked signal is queued for the process, and
try to find a thread to wakeup for delivery, in reschedule_signal(). For
exiting thread, assume that all signals are blocked.

Change cursig() and postsig() to look both into the thread and process
signal queues. When there is a signal that thread returning to usermode
could consume, TDF_NEEDSIGCHK flag is not neccessary set now. Do
unlocked read of p_siglist and p_pendingcnt to check for queued signals.

Note that thread that has a signal unblocked might get spurious wakeup
and EINTR from the interruptible system call now, due to the possibility
of being selected by reschedule_signals(), while other thread returned
to usermode earlier and removed the signal from process queue. This
should not cause compliance issues, since the thread has not blocked a
signal and thus should be ready to receive it anyway.

Reported by: Justin Teller <justin.teller gmail com>
Reviewed by: davidxu, jilles
MFC after: 1 month


# 68ee1aac 03-Oct-2009 Konstantin Belousov <kib@FreeBSD.org>

MFC r197660:
Fix typo.

Approved by: re (bz, kensmith)


# 15b7a831 30-Sep-2009 Konstantin Belousov <kib@FreeBSD.org>

Fix typo.

MFC after: 3 days


# e76d823b 12-Sep-2009 Robert Watson <rwatson@FreeBSD.org>

Use C99 initialization for struct filterops.

Obtained from: Mac OS X
Sponsored by: Apple Inc.
MFC after: 3 weeks


# f33a947b 14-Jul-2009 Konstantin Belousov <kib@FreeBSD.org>

Add new msleep(9) flag PBDY that shall be specified together with
PCATCH, to indicate that thread shall not be stopped upon receipt of
SIGSTOP until it reaches the kernel->usermode boundary.

Also change thread_single(SINGLE_NO_EXIT) to only stop threads at
the user boundary unconditionally.

Tested by: pho
Reviewed by: jhb
Approved by: re (kensmith)


# 14961ba7 27-Jun-2009 Robert Watson <rwatson@FreeBSD.org>

Replace AUDIT_ARG() with variable argument macros with a set more more
specific macros for each audit argument type. This makes it easier to
follow call-graphs, especially for automated analysis tools (such as
fxr).

In MFC, we should leave the existing AUDIT_ARG() macros as they may be
used by third-party kernel modules.

Suggested by: brooks
Approved by: re (kib)
Obtained from: TrustedBSD Project
MFC after: 1 week


# 401679de 23-Jun-2009 Peter Holm <pho@FreeBSD.org>

vn_open_cred() needs a non NULL ucred pointer

Reviewed by: kib


# e0c161b8 21-Jun-2009 Konstantin Belousov <kib@FreeBSD.org>

Add another flags argument to vn_open_cred. Use it to specify that some
vn_open_cred invocations shall not audit namei path.

In particular, specify VN_OPEN_NOAUDIT for dotdot lookup performed by
default implementation of vop_vptocnp, and for the open done for core
file. vn_fullpath is called from the audit code, and vn_open there need
to disable audit to avoid infinite recursion. Core file is created on
return to user mode, that, in particular, happens during syscall return.
The creation of the core file is audited by direct calls, and we do not
want to overwrite audit information for syscall.

Reported, reviewed and tested by: rwatson


# 885868cd 10-Apr-2009 Robert Watson <rwatson@FreeBSD.org>

Remove VOP_LEASE and supporting functions. This hasn't been used since
the removal of NQNFS, but was left in in case it was required for NFSv4.
Since our new NFSv4 client and server can't use it for their
requirements, GC the old mechanism, as well as other unused lease-
related code and interfaces.

Due to its impact on kernel programming and binary interfaces, this
change should not be MFC'd.

Proposed by: jeff
Reviewed by: jeff
Discussed with: rmacklem, zach loafman @ isilon


# c90c9021 26-Feb-2009 Ed Schouten <ed@FreeBSD.org>

Remove even more unneeded variable assignments.

kern_time.c:
- Unused variable `p'.

kern_thr.c:
- Variable `error' is always caught immediately, so no reason to
initialize it. There is no way that error != 0 at the end of
create_thread().

kern_sig.c:
- Unused variable `code'.

kern_synch.c:
- `rval' is always assigned in all different cases.

kern_rwlock.c:
- `v' is always overwritten with RW_UNLOCKED further on.

kern_malloc.c:
- `size' is always initialized with the proper value before being used.

kern_exit.c:
- `error' is always caught and returned immediately. abort2() never
returns a non-zero value.

kern_exec.c:
- `len' is always assigned inside the if-statement right below it.

tty_info.c:
- `td' is always overwritten by FOREACH_THREAD_IN_PROC().

Found by: LLVM's scan-build


# 7b4a950a 04-Nov-2008 David Xu <davidxu@FreeBSD.org>

Revert rev 184216 and 184199, due to the way the thread_lock works,
it may cause a lockup.

Noticed by: peter, jhb


# 3f9be10e 23-Oct-2008 David Xu <davidxu@FreeBSD.org>

Actually, for signal and thread suspension, extra process spin lock is
unnecessary, the normal process lock and thread lock are enough. The
spin lock is still needed for process and thread exiting to mimic
single sched_lock.


# d7f03759 19-Oct-2008 Ulf Lilleengen <lulf@FreeBSD.org>

- Import the HEAD csup code which is the basis for the cvsmode work.


# 904c5ec4 15-Oct-2008 David Xu <davidxu@FreeBSD.org>

Move per-thread userland debugging flags into seperated field,
this eliminates some problems of locking, e.g, a thread lock is needed
but can not be used at that time. Only the process lock is needed now
for new field.


# 0359a12e 28-Aug-2008 Attilio Rao <attilio@FreeBSD.org>

Decontextualize the couplet VOP_GETATTR / VOP_SETATTR as the passed thread
was always curthread and totally unuseful.

Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>


# da7bbd2c 05-Aug-2008 John Baldwin <jhb@FreeBSD.org>

If a thread that is swapped out is made runnable, then the setrunnable()
routine wakes up proc0 so that proc0 can swap the thread back in.
Historically, this has been done by waking up proc0 directly from
setrunnable() itself via a wakeup(). When waking up a sleeping thread
that was swapped out (the usual case when waking proc0 since only sleeping
threads are eligible to be swapped out), this resulted in a bit of
recursion (e.g. wakeup() -> setrunnable() -> wakeup()).

With sleep queues having separate locks in 6.x and later, this caused a
spin lock LOR (sleepq lock -> sched_lock/thread lock -> sleepq lock).
An attempt was made to fix this in 7.0 by making the proc0 wakeup use
the ithread mechanism for doing the wakeup. However, this required
grabbing proc0's thread lock to perform the wakeup. If proc0 was asleep
elsewhere in the kernel (e.g. waiting for disk I/O), then this degenerated
into the same LOR since the thread lock would be some other sleepq lock.

Fix this by deferring the wakeup of the swapper until after the sleepq
lock held by the upper layer has been locked. The setrunnable() routine
now returns a boolean value to indicate whether or not proc0 needs to be
woken up. The end result is that consumers of the sleepq API such as
*sleep/wakeup, condition variables, sx locks, and lockmgr, have to wakeup
proc0 if they get a non-zero return value from sleepq_abort(),
sleepq_broadcast(), or sleepq_signal().

Discussed with: jeff
Glanced at by: sam
Tested by: Jurgen Weber jurgen - ish com au
MFC after: 2 weeks


# 5d217f17 24-May-2008 John Birrell <jb@FreeBSD.org>

Add DTrace 'proc' provider probes using the Statically Defined Trace
(sdt) mechanism.


# b7edba77 21-Mar-2008 Jeff Roberson <jeff@FreeBSD.org>

- Add a new td flag TDF_NEEDSUSPCHK that is set whenever a thread needs
to enter thread_suspend_check().
- Set TDF_ASTPENDING along with TDF_NEEDSUSPCHK so we can move the
thread_suspend_check() to ast() rather than userret().
- Check TDF_NEEDSUSPCHK in the sleepq_catch_signals() optimization so
that we don't miss a suspend request. If this is set use the
expensive signal path.
- Set NEEDSUSPCHK when creating a new thread in thr in case the
creating thread is due to be suspended as well but has not yet.

Reviewed by: davidxu (Authored original patch)


# 374ae2a3 19-Mar-2008 Jeff Roberson <jeff@FreeBSD.org>

- Relax requirements for p_numthreads, p_threads, p_swtick, and p_nice from
requiring the per-process spinlock to only requiring the process lock.
- Reflect these changes in the proc.h documentation and consumers throughout
the kernel. This is a substantial reduction in locking cost for these
fields and was made possible by recent changes to threading support.


# 6617724c 12-Mar-2008 Jeff Roberson <jeff@FreeBSD.org>

Remove kernel support for M:N threading.

While the KSE project was quite successful in bringing threading to
FreeBSD, the M:N approach taken by the kse library was never developed
to its full potential. Backwards compatibility will be provided via
libmap.conf for dynamically linked binaries and static binaries will
be broken.


# 36b208e0 08-Mar-2008 Robert Watson <rwatson@FreeBSD.org>

Use sbuf routines to construct core dump filenames rather than custom
string buffer handling, making the code both easier to read and more
robust against string-handling bugs.

MFC after: 1 week


# eeccc367 08-Mar-2008 Robert Watson <rwatson@FreeBSD.org>

Unlock the process lock when expand_name() fails, or we may leak the
process lock leading to a hang. This bug was introduced in
kern_sig.c:1.351, when the call to expand_name() was moved earlier
bit this particular error case was not updated.


# 22db15c0 13-Jan-2008 Attilio Rao <attilio@FreeBSD.org>

VOP_LOCK1() (and so VOP_LOCK()) and VOP_UNLOCK() are only used in
conjuction with 'thread' argument passing which is always curthread.
Remove the unuseful extra-argument and pass explicitly curthread to lower
layer functions, when necessary.

KPI results broken by this change, which should affect several ports, so
version bumping and manpage update will be further committed.

Tested by: kris, pho, Diego Sardina <siarodx at gmail dot com>


# cb05b60a 09-Jan-2008 Attilio Rao <attilio@FreeBSD.org>

vn_lock() is currently only used with the 'curthread' passed as argument.
Remove this argument and pass curthread directly to underlying
VOP_LOCK1() VFS method. This modify makes the code cleaner and in
particular remove an annoying dependence helping next lockmgr() cleanup.
KPI results, obviously, changed.

Manpage and FreeBSD_version will be updated through further commits.

As a side note, would be valuable to say that next commits will address
a similar cleanup about VFS methods, in particular vop_lock1 and
vop_unlock.

Tested by: Diego Sardina <siarodx at gmail dot com>,
Andrea Di Pasquale <whyx dot it at gmail dot com>


# 10c2b8e1 18-Dec-2007 David E. O'Brien <obrien@FreeBSD.org>

Be more exact with sigaction SA_SIGINFO handling.

Reviewed by: marcel


# 89b57fcf 05-Nov-2007 Konstantin Belousov <kib@FreeBSD.org>

Fix for the panic("vm_thread_new: kstack allocation failed") and
silent NULL pointer dereference in the i386 and sparc64 pmap_pinit()
when the kmem_alloc_nofault() failed to allocate address space. Both
functions now return error instead of panicing or dereferencing NULL.

As consequence, vmspace_exec() and vmspace_unshare() returns the errno
int. struct vmspace arg was added to vm_forkproc() to avoid dealing
with failed allocation when most of the fork1() job is already done.

The kernel stack for the thread is now set up in the thread_alloc(),
that itself may return NULL. Also, allocation of the first process
thread is performed in the fork1() to properly deal with stack
allocation failure. proc_linkup() is separated into proc_linkup()
called from fork1(), and proc_linkup0(), that is used to set up the
kernel process (was known as swapper).

In collaboration with: Peter Holm
Reviewed by: jhb


# 57274c51 25-Oct-2007 Christian S.J. Peron <csjp@FreeBSD.org>

Implement AUE_CORE, which adds process core dump support into the kernel.
This change introduces audit_proc_coredump() which is called by coredump(9)
to create an audit record for the coredump event. When a process
dumps a core, it could be security relevant. It could be an indicator that
a stack within the process has been overflowed with an incorrectly constructed
malicious payload or a number of other events.

The record that is generated looks like this:

header,111,10,process dumped core,0,Thu Oct 25 19:36:29 2007, + 179 msec
argument,0,0xb,signal
path,/usr/home/csjp/test.core
subject,csjp,csjp,staff,csjp,staff,1101,1095,50457,10.37.129.2
return,success,1
trailer,111

- We allocate a completely new record to make sure we arent clobbering
the audit data associated with the syscall that produced the core
(assuming the core is being generated in response to SIGABRT and not
an invalid memory access).
- Shuffle around expand_name() so we can use the coredump name at the very
beginning of the coredump call. Make sure we free the storage referenced
by "name" if we need to bail out early.
- Audit both successful and failed coredump creation efforts

Obtained from: TrustedBSD Project
Reviewed by: rwatson
MFC after: 1 month


# 5ff3816d 23-Oct-2007 Christian S.J. Peron <csjp@FreeBSD.org>

Move where we audit the PID argument such that we unconditionally
audit it at the beginning of the syscall. This fixes a problem
where the user supplies an invalid process ID which is > 0 which
results in the PID argument not being audited.

Obtained from: TrustedBSD Project
MFC after: 1 week


# 6eeb364b 19-Jul-2007 Jeff Roberson <jeff@FreeBSD.org>

- Calling sched_nice() in tdsigwakeup() is no longer required by ULE and
actually causes LORs and other panics.

Reported by: mlaier
Approved by: re


# efe641b9 11-Jun-2007 Jeff Roberson <jeff@FreeBSD.org>

- Add a missing PROC_SUNLOCK() in tdsignal()


# 9b73d239 09-Jun-2007 Matt Jacob <mjacob@FreeBSD.org>

Initialized ets to zero. This is arguably a gcc bug in that ets is always
set to rts when timeout is non-NULL and then timevalid is set and ets is
only checked later when timervalid is set.


# a54e85fd 04-Jun-2007 Jeff Roberson <jeff@FreeBSD.org>

Commit 4/14 of sched_lock decomposition.
- Use thread_lock() rather than sched_lock for per-thread scheduling
sychronization.
- Use the per-process spinlock rather than the sched_lock for per-process
scheduling synchronization.
- Move some common code into thread_suspend_switch() to handle the
mechanics of suspending a thread. The locking here is incredibly
convoluted and should be simplified.

Tested by: kris, current@
Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)


# 1c4bcd05 31-May-2007 Jeff Roberson <jeff@FreeBSD.org>

- Move rusage from being per-process in struct pstats to per-thread in
td_ru. This removes the requirement for per-process synchronization in
statclock() and mi_switch(). This was previously supported by
sched_lock which is going away. All modifications to rusage are now
done in the context of the owning thread. reads proceed without locks.
- Aggregate exiting threads rusage in thread_exit() such that the exiting
thread's rusage is not lost.
- Provide a new routine, rufetch() to fetch an aggregate of all rusage
structures from all threads in a process. This routine must be used
in any place requiring a rusage from a process prior to it's exit. The
exited process's rusage is still available via p_ru.
- Aggregate tick statistics only on demand via rufetch() or when a thread
exits. Tick statistics are kept in the thread and protected by sched_lock
until it exits.

Initial patch by: attilio
Reviewed by: attilio, bde (some objections), arch (mostly silent)


# 9e223287 31-May-2007 Konstantin Belousov <kib@FreeBSD.org>

Revert UF_OPENING workaround for CURRENT.
Change the VOP_OPEN(), vn_open() vnode operation and d_fdopen() cdev operation
argument from being file descriptor index into the pointer to struct file.

Proposed and reviewed by: jhb
Reviewed by: daichi (unionfs)
Approved by: re (kensmith)


# 4dec0e67 23-May-2007 Robert Watson <rwatson@FreeBSD.org>

Comment that tdsignal() may be entered from the debugger.


# aa89d8cd 21-Mar-2007 John Baldwin <jhb@FreeBSD.org>

Rename the 'mtx_object', 'rw_object', and 'sx_object' members of mutexes,
rwlocks, and sx locks to 'lock_object'.


# 873fbcd7 05-Mar-2007 Robert Watson <rwatson@FreeBSD.org>

Further system call comment cleanup:

- Remove also "MP SAFE" after prior "MPSAFE" pass. (suggested by bde)
- Remove extra blank lines in some cases.
- Add extra blank lines in some cases.
- Remove no-op comments consisting solely of the function name, the word
"syscall", or the system call name.
- Add punctuation.
- Re-wrap some comments.


# 0c14ff0e 04-Mar-2007 Robert Watson <rwatson@FreeBSD.org>

Remove 'MPSAFE' annotations from the comments above most system calls: all
system calls now enter without Giant held, and then in some cases, acquire
Giant explicitly.

Remove a number of other MPSAFE annotations in the credential code and
tweak one or two other adjacent comments.


# d60226bd 09-Feb-2007 Xin LI <delphij@FreeBSD.org>

Give which signal caller has attempted to deliver when panicking.


# 4f506694 17-Jan-2007 Xin LI <delphij@FreeBSD.org>

Use FOREACH_PROC_IN_SYSTEM instead of using its unrolled form.


# 016fa302 24-Dec-2006 David Xu <davidxu@FreeBSD.org>

break loop early if we know that there are at least two signals.


# 6aeb05d7 11-Nov-2006 Tom Rhodes <trhodes@FreeBSD.org>

Merge posix4/* into normal kernel hierarchy.

Reviewed by: glanced at by jhb
Approved by: silence on -arch@ and -standards@


# 8460a577 26-Oct-2006 John Birrell <jb@FreeBSD.org>

Make KSE a kernel option, turned on by default in all GENERIC
kernel configs except sun4v (which doesn't process signals properly
with KSE).

Reviewed by: davidxu@


# 5c28a8d4 21-Oct-2006 David Xu <davidxu@FreeBSD.org>

Use macro TAILQ_FOREACH_SAFE instead of expanding it.


# 0fc32899 20-Oct-2006 John Baldwin <jhb@FreeBSD.org>

Remove the check that prevented signals from being delivered to exiting
processes. It was originally added back when support for Linux threads
(and thus shared sigacts objects) was added, but no one knows why. My
guess is that at some point during the Linux threads patches, the sigacts
object was torn down during exit1(), so this check was added to prevent
a panic for that race. However, the stuff that was actually committed to
the tree doesn't teardown sigacts until wait() making the above race moot.
Re-allowing signals here lets one interrupt a NFS request during process
teardown (such as closing descriptors) on an interruptible mount.

Requested by: kib (long time ago)
MFC after: 1 week


# c6511aea 04-Oct-2006 David Xu <davidxu@FreeBSD.org>

Move some declaration of 32-bit signal structures into file
freebsd32-signal.h, implement sigtimedwait and sigwaitinfo system calls.


# 73dbd3da 11-May-2006 John Baldwin <jhb@FreeBSD.org>

Remove various bits of conditional Alpha code and fixup a few comments.


# 11991ab4 07-May-2006 Tor Egge <tegge@FreeBSD.org>

Call vn_finished_write() before calling the coredump handler which will
indirectly call vn_start_write() as necessary for each write.


# 95f16c1e 21-Apr-2006 Paul Saab <ps@FreeBSD.org>

Don't try to kill embryonic processes in killpg1(). This prevents
a race condition between fork() and kill(pid,sig) with pid < 0 that
can cause a kernel panic.

Submitted by: up
MFC after: 3 weeks


# 33f19bee 28-Mar-2006 John Baldwin <jhb@FreeBSD.org>

- Conditionalize Giant around VFS operations for ALQ, ktrace, and
generating a coredump as the result of a signal.
- Fix a bug where we could leak a Giant lock if vn_start_write() failed
in coredump().

Reported by: jmg (2)


# 7b8d5e48 09-Mar-2006 David Xu <davidxu@FreeBSD.org>

Remove _STOPEVENT call, it is already called in issignal, simplify
code for SIGKILL signal.


# 3dfcaad6 02-Mar-2006 David Xu <davidxu@FreeBSD.org>

Add signal set sq_kill to sigqueue structure, the member saves all
signals sent by kill() syscall, without this, a signal sent by
sigqueue() can cause a signal sent by kill() to be lost.


# 7e0221a2 23-Feb-2006 David Xu <davidxu@FreeBSD.org>

1. Refine kern_sigtimedwait() to remove redundant code.
2. Fix a bug, if thread got a SIGKILL signal, call sigexit() to kill
its process.

MFC after: 3 days


# 7c9a98f1 22-Feb-2006 David Xu <davidxu@FreeBSD.org>

Code cleanup, simply compare with curproc.


# 94f0972b 15-Feb-2006 David Xu <davidxu@FreeBSD.org>

Fix a long standing race between sleep queue and thread
suspension code. When a thread A is going to sleep, it calls
sleepq_catch_signals() to detect any pending signals or thread
suspension request, if nothing happens, it returns without
holding process lock or scheduler lock, this opens a race
window which allows thread B to come in and do process
suspension work, however since A is still at running state,
thread B can do nothing to A, thread A continues, and puts
itself into actually sleeping state, but B has never seen it,
and it sits there forever until B is woken up by other threads
sometimes later(this can be very long delay or never
happen). Fix this bug by forcing sleepq_catch_signals to
return with scheduler lock held.
Fix sleepq_abort() by passing it an interrupted code, previously,
it worked as wakeup_one(), and the interruption can not be
identified correctly by sleep queue code when the sleeping
thread is resumed.
Let thread_suspend_check() returns EINTR or ERESTART, so sleep
queue no longer has to use SIGSTOP as a hack to build a return
value.

Reviewed by: jhb
MFC after: 1 week


# bfd7575a 13-Feb-2006 Wayne Salamon <wsalamon@FreeBSD.org>

Audit the arguments to the kill(2) and killpg(2) system calls.

Obtained from: TrustedBSD Project
Approved by: rwatson (mentor)


# d8267df7 12-Feb-2006 David Xu <davidxu@FreeBSD.org>

In order to speed up process suspension on MP machine, send IPI to
remote CPU. While here, abstract thread suspension code into a function
called sig_suspend_threads, the function is called when a process received
a STOP signal.


# 7f96995e 04-Feb-2006 David Xu <davidxu@FreeBSD.org>

Create childproc_jobstate function to report job control state, this
also fixes a bug in childproc_continued which ignored PS_NOCLDSTOP.


# d7bc12b0 23-Dec-2005 David Xu <davidxu@FreeBSD.org>

Avoid kernel panic when attaching a process which may not be stopped
by debugger, e.g process is dumping core. Only access p_xthread if
P_STOPPED_TRACE is set, this means thread is ready to exchange signal
with debugger, print a warning if P_STOPPED_TRACE is not set due to
some bugs in other code, if there is.

The patch has been tested by Anish Mistry mistry.7 at osu dot edu, and
is slightly adjusted.


# f71a882f 09-Dec-2005 David Xu <davidxu@FreeBSD.org>

Add a sysctl to force a process to sigexit if a trap signal is
being hold by current thread or ignored by current process,
otherwise, it is very possible the thread will enter an infinite loop
and lead to an administrator's nightmare.


# 761a4d94 08-Dec-2005 David Xu <davidxu@FreeBSD.org>

Cleanup sigqueue sysctl.


# 027f7604 05-Dec-2005 David Xu <davidxu@FreeBSD.org>

Fix a lock leak in childproc_continued().


# b51d237a 30-Nov-2005 David Xu <davidxu@FreeBSD.org>

set signal queue values for sysconf().


# 413cf3bb 11-Nov-2005 David Xu <davidxu@FreeBSD.org>

Make sure only remove one signal by debugger.


# f4d85223 09-Nov-2005 David Xu <davidxu@FreeBSD.org>

WIFxxx macros requires an int type but p_xstat is short, convert it
to int before using the macros.

Bug reported by : Pyun YongHyeon pyunyh at gmail dot com


# ebceaf6d 08-Nov-2005 David Xu <davidxu@FreeBSD.org>

Add support for queueing SIGCHLD same as other UNIX systems did.

For each child process whose status has been changed, a SIGCHLD instance
is queued, if the signal is stilling pending, and process changed status
several times, signal information is updated to reflect latest process
status. If wait() returns because the status of a child process is
available, pending SIGCHLD signal associated with the child process is
discarded. Any other pending SIGCHLD signals remain pending.

The signal information is allocated at the same time when proc structure
is allocated, if process signal queue is fully filled or there is a memory
shortage, it can still send the signal to process.

There is a booting time tunable kern.sigqueue.queue_sigchild which
can control the behavior, setting it to zero disables the SIGCHLD queueing
feature, the tunable will be removed if the function is proved that it is
stable enough.

Tested on: i386 (SMP and UP)


# 8f0371f1 04-Nov-2005 David Xu <davidxu@FreeBSD.org>

Fix name compatible problem with POSIX standard. the sigval_ptr and
sigval_int really should be sival_ptr and sival_int.
Also sigev_notify_function accepts a union sigval value but not a
pointer.


# 6d7b314b 02-Nov-2005 David Xu <davidxu@FreeBSD.org>

Cleanup some signal interfaces. Now the tdsignal function accepts
both proc pointer and thread pointer, if thread pointer is NULL,
tdsignal automatically finds a thread, otherwise it sends signal
to given thread.
Add utility function psignal_event to send a realtime sigevent
to a process according to the delivery requirement specified in
struct sigevent.


# 56c06c4b 29-Oct-2005 David Xu <davidxu@FreeBSD.org>

Let itimer store itimerspec instead of itimerval, so I don't have to
convert to or from timeval frequently.

Introduce function itimer_accept() to ack a timer signal in signal
acceptance code, this allows us to return more fresh overrun counter
than at signal generating time. while POSIX says:
"the value returned by timer_getoverrun() shall apply to the most
recent expiration signal delivery or acceptance for the timer,.."
I prefer returning it at acceptance time.

Introduce SIGEV_THREAD_ID notification mode, it is used by thread
libary to request kernel to deliver signal to a specified thread,
and in turn, the thread library may use the mechanism to implement
SIGEV_THREAD which is required by POSIX.

Timer signal is managed by timer code, so it can not fail even if
signal queue is full filled by sigqueue syscall.


# 5da49fcb 22-Oct-2005 David Xu <davidxu@FreeBSD.org>

1. Make ksiginfo_alloc and ksiginfo_free public.
2. Introduce flags KSI_EXT and KSI_INS. The flag KSI_EXT allows a ksiginfo
to be managed by outside code, the KSI_INS indicates sigqueue_add should
directly insert passed ksiginfo into queue other than copy it.


# 9104847f 13-Oct-2005 David Xu <davidxu@FreeBSD.org>

1. Change prototype of trapsignal and sendsig to use ksiginfo_t *, most
changes in MD code are trivial, before this change, trapsignal and
sendsig use discrete parameters, now they uses member fields of
ksiginfo_t structure. For sendsig, this change allows us to pass
POSIX realtime signal value to user code.

2. Remove cpu_thread_siginfo, it is no longer needed because we now always
generate ksiginfo_t data and feed it to libpthread.

3. Add p_sigqueue to proc structure to hold shared signals which were
blocked by all threads in the proc.

4. Add td_sigqueue to thread structure to hold all signals delivered to
thread.

5. i386 and amd64 now return POSIX standard si_code, other arches will
be fixed.

6. In this sigqueue implementation, pending signal set is kept as before,
an extra siginfo list holds additional siginfo_t data for signals.
kernel code uses psignal() still behavior as before, it won't be failed
even under memory pressure, only exception is when deleting a signal,
we should call sigqueue_delete to remove signal from sigqueue but
not SIGDELSET. Current there is no kernel code will deliver a signal
with additional data, so kernel should be as stable as before,
a ksiginfo can carry more information, for example, allow signal to
be delivered but throw away siginfo data if memory is not enough.
SIGKILL and SIGSTOP have fast path in sigqueue_add, because they can
not be caught or masked.
The sigqueue() syscall allows user code to queue a signal to target
process, if resource is unavailable, EAGAIN will be returned as
specification said.
Just before thread exits, signal queue memory will be freed by
sigqueue_flush.
Current, all signals are allowed to be queued, not only realtime signals.

Earlier patch reviewed by: jhb, deischen
Tested on: i386, amd64


# ec8297bd 05-Jun-2005 David Xu <davidxu@FreeBSD.org>

Fix a bug relavant to debugging, a masked signal unexpectedly interrupts
a sleeping thread when process is being debugged.

PR: GNU/77818
Tested by: Sean C. Farley <sean-freebsd at farley org>


# 407948a5 19-Apr-2005 David Xu <davidxu@FreeBSD.org>

Oops, forgot to update this file.
Fix a race condition between kern_wait() and thread_stopped().
Problem is in kern_wait(), parent process steps through children list,
once a child process is skipped, and later even if the child is stopped,
parent process still sleeps in msleep(), the race happens if parent
masked SIGCHLD.

Submitted by : Peter Edwards peadar.edwards at gmail dot com
MFC after : 4 days


# f97c3df1 09-Apr-2005 David Schultz <das@FreeBSD.org>

Suspend all other threads in the process while generating a core dump.
The main reason for doing this is that the ELF dump handler expects
the thread list to be fixed while the dump header is generated, so an
upcall that occurs at the wrong time can lead to buffer overruns and
other Bad Things.

Another solution would be to grab sched_lock in the ELF dump handler,
but we might as well single-thread, since the process is about to die.
Furthermore, I think this should ensure that the register sets in the
core file are sequentially consistent.


# 627451c1 04-Mar-2005 David Xu <davidxu@FreeBSD.org>

The td_waitset is pointing to a stack address when thread is waiting
for a signal, because kernel stack is swappable, this causes page fault
in kernel under heavy swapping case. Fix this bug by eliminating unneeded
code.


# 6675b36e 02-Mar-2005 David Xu <davidxu@FreeBSD.org>

In kern_sigtimedwait, remove waitset bits for td_sigmask before
sleeping, so in do_tdsignal, we no longer need to test td_waitset.
now td_waitset is only used to give a thread higher priority when
delivering signal to multithreads process.
This also fixes a bug:
when a thread in sigwait states was suspended and later resumed
by SIGCONT, it can no longer receive signals belong to waitset.


# 1089f031 18-Feb-2005 David Xu <davidxu@FreeBSD.org>

Don't restart a timeout wait in kern_sigtimedwait, also allow it
to wait longer than a single integer can represent.


# 1a88a252 13-Feb-2005 Maxim Sobolev <sobomax@FreeBSD.org>

Backout previous change (disabling of security checks for signals delivered
in emulation layers), since it appears to be too broad.

Requested by: rwatson


# d8ff44b7 13-Feb-2005 Maxim Sobolev <sobomax@FreeBSD.org>

Split out kill(2) syscall service routine into user-level and kernel part, the
former is callable from user space and the latter from the kernel one. Make
kernel version take additional argument which tells if the respective call
should check for additional restrictions for sending signals to suid/sugid
applications or not.

Make all emulation layers using non-checked version, since signal numbers in
emulation layers can have different meaning that in native mode and such
protection can cause misbehaviour.

As a result remove LIBTHR from the signals allowed to be delivered to a
suid/sugid application.

Requested (sorta) by: rwatson
MFC after: 2 weeks


# 9454b2d8 06-Jan-2005 Warner Losh <imp@FreeBSD.org>

/* -> /*- for copyright notices, minor format tweaks as necessary


# 3ef6ac33 13-Dec-2004 Jeff Roberson <jeff@FreeBSD.org>

- If delivering a signal will result in killing a process that has a
nice value above 0, set it to 0 so that it may proceed with haste.
This is especially important on ULE, where adjusting the priority
does not guarantee that a thread will be granted a greater time slice.


# 6d1ab6ed 15-Nov-2004 Warner Losh <imp@FreeBSD.org>

Fix an off by one error. MAXPATHLEN already has +1.


# 90d75f78 29-Oct-2004 Alfred Perlstein <alfred@FreeBSD.org>

Allow kill -9 to kill processes stuck in procfs STOPEVENTs.


# cd71c414 29-Oct-2004 Alfred Perlstein <alfred@FreeBSD.org>

Backout 1.291.

re doesn't seem to think this fixes:
Desired features for 5.3-RELEASE "More truss problems"


# b3a4fb14 05-Oct-2004 David Xu <davidxu@FreeBSD.org>

Use scheduler api to adjust thread priority.


# 482d099c 03-Oct-2004 David Xu <davidxu@FreeBSD.org>

Don't bother to turn off other P_STOPPED bits for SIGKILL, doing
so would cause kernel to produce an unkillable process in some cases,
especially, P_STOPPED_SINGLE has a singling thread, turning off the
bit would mess the state.


# 50434413 01-Oct-2004 Alfred Perlstein <alfred@FreeBSD.org>

Clear a process's procfs trace points upon delivery of SIGKILL.

MT5 candidate. (Desired features for 5.3-RELEASE "More truss problems")


# 5995adc2 31-Aug-2004 Julian Elischer <julian@FreeBSD.org>

Remove an unneeded argument..
The removed argument could trivially be derived from the remaining one.
That in turn should be the same as curthread, but it is possible that curthread could be expensive to derive on some syste,s so leave it as an argument.
Having both proc and thread as an argumen tjust gives an opportunity for
them to get out sync.

MFC after: 3 days


# ad3b9257 15-Aug-2004 John-Mark Gurney <jmg@FreeBSD.org>

Add locking to the kqueue subsystem. This also makes the kqueue subsystem
a more complete subsystem, and removes the knowlege of how things are
implemented from the drivers. Include locking around filter ops, so a
module like aio will know when not to be unloaded if there are outstanding
knotes using it's filter ops.

Currently, it uses the MTX_DUPOK even though it is not always safe to
aquire duplicate locks. Witness currently doesn't support the ability
to discover if a dup lock is ok (in some cases).

Reviewed by: green, rwatson (both earlier versions)


# 6141e04a 08-Aug-2004 John-Mark Gurney <jmg@FreeBSD.org>

add option to automaticly mark core dumps with the nodump flag

PR: 57065
Submitted by: Walter C. Pelissero


# 24b2151f 03-Aug-2004 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Don't skip permission checks when sending signals to zombie processes.

Pointed out by: bde
Reviewed by: rwatson


# 0b011ea3 29-Jul-2004 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Syscall kill(2) called for a zombie process should return 0.

Obtained from: Darwin


# 6dbc0850 16-Jul-2004 John Baldwin <jhb@FreeBSD.org>

Improve readability a bit by changing some code at the end of a function
that did:

if (foo)
return
else
blah

to just do the simpler

if (!foo)
blah

instead.


# cbf4e354 13-Jul-2004 David Xu <davidxu@FreeBSD.org>

Add code to support debugging threaded process.

1. Add tm_lwpid into kse_thr_mailbox to indicate which kernel
thread current user thread is running on. Add tm_dflags into
kse_thr_mailbox, the flags is written by debugger, it tells
UTS and kernel what should be done when the process is being
debugged, current, there two flags TMDF_SSTEP and TMDF_DONOTRUNUSER.

TMDF_SSTEP is used to tell kernel to turn on single stepping,
or turn off if it is not set.

TMDF_DONOTRUNUSER is used to tell kernel to schedule upcall
whenever possible, to UTS, it means do not run the user thread
until debugger clears it, this behaviour is necessary because
gdb wants to resume only one thread when the thread's pc is
at a breakpoint, and thread needs to go forward, in order to
avoid other threads sneak pass the breakpoints, it needs to remove
breakpoint, only wants one thread to go. Also, add km_lwp to
kse_mailbox, the lwp id is copied to kse_thr_mailbox at context
switch time when process is not being debugged, so when process
is attached, debugger can map kernel thread to user thread.

2. Add p_xthread to proc strcuture and td_xsig to thread structure.
p_xthread is used by a thread when it wants to report event
to debugger, every thread can set the pointer, especially, when
it is used in ptracestop, it is the last thread reporting event
will win the race. Every thread has a td_xsig to exchange signal
with debugger, thread uses TDF_XSIG flag to indicate it is reporting
signal to debugger, if the flag is not cleared, thread will keep
retrying until it is cleared by debugger, p_xthread may be
used by debugger to indicate CURRENT thread. The p_xstat is still
in proc structure to keep wait() to work, in future, we may
just use td_xsig.

3. Add TDF_DBSUSPEND flag, the flag is used by debugger to suspend
a thread. When process stops, debugger can set the flag for
thread, thread will check the flag in thread_suspend_check,
enters a loop, unless it is cleared by debugger, process is
detached or process is existing. The flag is also checked in
ptracestop, so debugger can temporarily suspend a thread even
if the thread wants to exchange signal.

4. Current, in ptrace, we always resume all threads, but if a thread
has already a TDF_DBSUSPEND flag set by debugger, it won't run.

Encouraged by: marcel, julian, deischen


# fbc3247d 11-Jul-2004 Marcel Moolenaar <marcel@FreeBSD.org>

Implement the PT_LWPINFO request. This request can be used by the
tracing process to obtain information about the LWP that caused the
traced process to stop. Debuggers can use this information to select
the thread currently running on the LWP as the current thread.

The request has been made compatible with NetBSD for as much as
possible. This implementation differs from NetBSD in the following
ways:
1. The data argument is allowed to be smaller than the size of the
ptrace_lwpinfo structure known to the kernel, but not 0. This
is opposite to what NetBSD allows. The reason for this is that
we can extend the structure without affecting older binaries.
2. On NetBSD the tracing process is to set the pl_lwpid field to
the Id of the LWP it wants information of. We don't do that.
Our ptrace interface allows passing the LWP Id instead of the
PID. The tracing process is to set the PID to the LWP Id it
wants information of.
3. When the PID is actually the PID of the tracing process, this
request returns the information about the LWP that caused the
process to stop. This was the whole purpose of the request in
the first place.

When the traced process has exited, this request will return the
LWP Id 0, indicating that the process state is not the result of
an event specific to a LWP.


# bf0acc27 02-Jul-2004 John Baldwin <jhb@FreeBSD.org>

- Change mi_switch() and sched_switch() to accept an optional thread to
switch to. If a non-NULL thread pointer is passed in, then the CPU will
switch to that thread directly rather than calling choosethread() to pick
a thread to choose to.
- Make sched_switch() aware of idle threads and know to do
TD_SET_CAN_RUN() instead of sticking them on the run queue rather than
requiring all callers of mi_switch() to know to do this if they can be
called from an idlethread.
- Move constants for arguments to mi_switch() and thread_single() out of
the middle of the function prototypes and up above into their own
section.


# 1930e303 11-Jun-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Deorbit COMPAT_SUNOS.

We inherited this from the sparc32 port of BSD4.4-Lite1. We have neither
a sparc32 port nor a SunOS4.x compatibility desire these days.


# 36939a0a 07-Jun-2004 David Xu <davidxu@FreeBSD.org>

According to SUSv3, sigwait is different with sigwaitinfo, sigwait
returns error code in return value, not in errno.


# aa0aa7a1 02-Jun-2004 Tim J. Robbins <tjr@FreeBSD.org>

Move TDF_SA from td_flags to td_pflags (and rename it accordingly)
so that it is no longer necessary to hold sched_lock while
manipulating it.

Reviewed by: davidxu


# a4c2da15 21-May-2004 Bruce Evans <bde@FreeBSD.org>

Fixed some style bugs in tdsigwakeup().


# 80c4433c 20-May-2004 John Baldwin <jhb@FreeBSD.org>

In tdsigwakeup(), use TD_ON_SLEEPQ() rather than TD_IS_SLEEPING() to see if
a thread is on a sleep queue and should have it's sleep aborted.

Reported by: Thierry Herbelot thierry at herbelot dot com


# 4a3b3dcb 12-Apr-2004 Colin Percival <cperciva@FreeBSD.org>

stop() no longer needs sched_lock held; in fact, holding sched_lock causes
a LOR against sleepq. Fix the comment, and fix ptracestop() to pick up
sched_lock after stop() rather than before.

Reported by: Scott Sipe <cscotts@mindspring.com>
Reviewed by: rwatson, jhb


# 7f8a436f 05-Apr-2004 Warner Losh <imp@FreeBSD.org>

Remove advertising clause from University of California Regent's license,
per letter dated July 22, 1999.

Approved by: core


# 9a6a4cb5 29-Mar-2004 Peter Wemm <peter@FreeBSD.org>

Shorten some XXXKSE commentry


# 4ae89b95 05-Mar-2004 John Baldwin <jhb@FreeBSD.org>

- Push down Giant in exit() and wait().
- Push Giant down a bit in coredump() and call coredump() with the proc
lock already held rather than unlocking it only to turn around and
relock it.

Requested by: peter


# 86b5e563 03-Mar-2004 Dag-Erling Smørgrav <des@FreeBSD.org>

Use different dummy wait channels to avoid panic in msleep().

Reviewed by: jhb


# 44f3b092 27-Feb-2004 John Baldwin <jhb@FreeBSD.org>

Switch the sleep/wakeup and condition variable implementations to use the
sleep queue interface:
- Sleep queues attempt to merge some of the benefits of both sleep queues
and condition variables. Having sleep qeueus in a hash table avoids
having to allocate a queue head for each wait channel. Thus, struct cv
has shrunk down to just a single char * pointer now. However, the
hash table does not hold threads directly, but queue heads. This means
that once you have located a queue in the hash bucket, you no longer have
to walk the rest of the hash chain looking for threads. Instead, you have
a list of all the threads sleeping on that wait channel.
- Outside of the sleepq code and the sleep/cv code the kernel no longer
differentiates between cv's and sleep/wakeup. For example, calls to
abortsleep() and cv_abort() are replaced with a call to sleepq_abort().
Thus, the TDF_CVWAITQ flag is removed. Also, calls to unsleep() and
cv_waitq_remove() have been replaced with calls to sleepq_remove().
- The sched_sleep() function no longer accepts a priority argument as
sleep's no longer inherently bump the priority. Instead, this is soley
a propery of msleep() which explicitly calls sched_prio() before
blocking.
- The TDF_ONSLEEPQ flag has been dropped as it was never used. The
associated TDF_SET_ONSLEEPQ and TDF_CLR_ON_SLEEPQ macros have also been
dropped and replaced with a single explicit clearing of td_wchan.
TD_SET_ONSLEEPQ() would really have only made sense if it had taken
the wait channel and message as arguments anyway. Now that that only
happens in one place, a macro would be overkill.


# 91d5354a 04-Feb-2004 John Baldwin <jhb@FreeBSD.org>

Locking for the per-process resource limits structure.
- struct plimit includes a mutex to protect a reference count. The plimit
structure is treated similarly to struct ucred in that is is always copy
on write, so having a reference to a structure is sufficient to read from
it without needing a further lock.
- The proc lock protects the p_limit pointer and must be held while reading
limits from a process to keep the limit structure from changing out from
under you while reading from it.
- Various global limits that are ints are not protected by a lock since
int writes are atomic on all the archs we support and thus a lock
wouldn't buy us anything.
- All accesses to individual resource limits from a process are abstracted
behind a simple lim_rlimit(), lim_max(), and lim_cur() API that return
either an rlimit, or the current or max individual limit of the specified
resource from a process.
- dosetrlimit() was renamed to kern_setrlimit() to match existing style of
other similar syscall helper functions.
- The alpha OSF/1 compat layer no longer calls getrlimit() and setrlimit()
(it didn't used the stackgap when it should have) but uses lim_rlimit()
and kern_setrlimit() instead.
- The svr4 compat no longer uses the stackgap for resource limits calls,
but uses lim_rlimit() and kern_setrlimit() instead.
- The ibcs2 compat no longer uses the stackgap for resource limits. It
also no longer uses the stackgap for accessing sysctl's for the
ibcs2_sysconf() syscall but uses kernel_sysctl() instead. As a result,
ibcs2_sysconf() no longer needs Giant.
- The p_rlimit macro no longer exists.

Submitted by: mtm (mostly, I only did a few cleanups and catchups)
Tested on: i386
Compiled on: alpha, amd64


# 30a9f26d 28-Jan-2004 Robert Watson <rwatson@FreeBSD.org>

Assert process lock in ptracestop(), since we're going to rely
on it, and later unlock it.


# 97563428 27-Jan-2004 Alexander Kabaev <kan@FreeBSD.org>

Move the part of the comment which applies to osigsuspend where
it belongs. The current sigsuspend syscall does expect a pointer
to the mask as argument.

Submitted by: Igor Sysoev <is at rambler-co dot ru>


# 29bcc451 24-Jan-2004 Jeff Roberson <jeff@FreeBSD.org>

- Add a flags parameter to mi_switch. The value of flags may be SW_VOL or
SW_INVOL. Assert that one of these is set in mi_switch() and propery
adjust the rusage statistics. This is to simplify the large number of
users of this interface which were previously all required to adjust the
proper counter prior to calling mi_switch(). This also facilitates more
switch and locking optimizations.
- Change all callers of mi_switch() to pass the appropriate paramter and
remove direct references to the process statistics.


# def05568 10-Jan-2004 Robert Watson <rwatson@FreeBSD.org>

When not creating a core dump due to resource limits specifying
a maximum dump size of 0, return a size-related error, rather
than returning success. Otherwise, waitpid() will incorrectly
return a status indicating that a core dump was created. Note
that the specific error doesn't actually matter, since it's lost.

MFC after: 2 weeks
PR: 60367
Submitted by: Valentin Nechayev <netch@netch.kiev.ua>


# 047aa39b 08-Jan-2004 Robert Watson <rwatson@FreeBSD.org>

Drop the sigacts mutex around calls to stopevent() to avoid sleeping
holding the mutex. Because the sigacts pointer can't change while
the process is "live" (proc locking (x)), we know our pointer is still
valid.

In communication with: truckman
Reviewed by: jhb


# a30ec4b9 02-Jan-2004 David Xu <davidxu@FreeBSD.org>

Make sigaltstack as per-threaded, because per-process sigaltstack state
is useless for threaded programs, multiple threads can not share same
stack.
The alternative signal stack is private for thread, no lock is needed,
the orignal P_ALTSTACK is now moved into td_pflags and renamed to
TDP_ALTSTACK.
For single thread or Linux clone() based threaded program, there is no
semantic changed, because those programs only have one kernel thread
in every process.

Reviewed by: deischen, dfr


# a9a48d68 07-Dec-2003 David Xu <davidxu@FreeBSD.org>

Lock and unlock sched_lock when walking through thread list, current we
insert kse upcall thread into thread list at mi_switch time, process lock
is not enough.


# 7eeaaf9b 29-Oct-2003 David Xu <davidxu@FreeBSD.org>

Try to fetch thread mailbox address in page fault trap, so when thread
blocks in page fault hanlder, and upcall thread can be scheduled. It is
useful if process is doing lots of mmap based I/O.


# 36bbf86b 25-Oct-2003 Robert Watson <rwatson@FreeBSD.org>

Check (locked) before performing an advisory unlock following a failure
of vn_start_write(). Otherwise, we may inconsistently attempt to release
the advisory lock.

Pointed out by: teggej


# c447f5b2 25-Oct-2003 Robert Watson <rwatson@FreeBSD.org>

When generate a core dump, use advisory locking in an advisory way:
if we do acquire an advisory lock, great! We'll release it later.
However, if we fail to acquire a lock, we perform the coredump
anyway. This problem became particularly visible with NFS after
the introduction of rpc.lockd: if the lock manager isn't running,
then locking calls will fail, aborting the core dump (resulting in
a zero-byte dump file).

Reported by: Yogeshwar Shenoy <ynshenoy@alumni.cs.ucsb.edu>


# 3a2e2a0e 13-Oct-2003 David Xu <davidxu@FreeBSD.org>

Don't clear signal mask in execsig(). RELENG_4 does not clear it and POSIX
asks to inherit signal mask for execv.


# 4cc9f52f 26-Sep-2003 Robert Drehmel <robert@FreeBSD.org>

Move some tracing related code into its own function as it will
be needed for system call related ptrace functionality I plan
to commit soon.


# 41b3077a 10-Aug-2003 Jacques Vidrine <nectar@FreeBSD.org>

panic() if we try to handle an out-of-range signal number in
psignal()/tdsignal(). The test was historically in psignal(). It was
changed into a KASSERT, and then later moved to tdsignal() when the
latter was introduced.

Reviewed by: iedowse, jhb


# 1fc434dc 30-Jul-2003 David Xu <davidxu@FreeBSD.org>

Use correct signal when calling sigexit.


# 7c89f162 27-Jul-2003 Poul-Henning Kamp <phk@FreeBSD.org>

Add fdidx argument to vn_open() and vn_open_cred() and pass -1 throughout.


# a6ca4808 24-Jul-2003 Mike Makonnen <mtm@FreeBSD.org>

The POSIX spec also requires that kern_sigtimedwait return
EINVAL if tv_nsec of the timeout is less than zero.


# 432b45de 20-Jul-2003 David Xu <davidxu@FreeBSD.org>

Always deliver synchronous signal to UTS for SA threads.


# 3074d1b4 17-Jul-2003 David Xu <davidxu@FreeBSD.org>

Fix sigwait to conform to POSIX.
When a signal is being delivered to process, first find a sigwait
thread to deliver, POSIX's argument is speed of delivering signal
to sigwait thread is faster than other ways. A signal in its wait
set will cause sigwait to return the signal number, a signal not
in its wait set but in not blocked by the thread also causes sigwait
to return, but sigwait returns EINTR, sigwait is oneshot operation,
only one signal can be delivered to its wait set, when a signal is
delivered to the sigwait thread, the thread's sigwait state is canceled.


# 4b7d5d84 14-Jul-2003 David Xu <davidxu@FreeBSD.org>

Rename thread_siginfo to cpu_thread_siginfo


# ffb2e92a 11-Jul-2003 David Xu <davidxu@FreeBSD.org>

If a thread is sending signal to its process, if the thread can handle
the signal itself, it should get it without looking for other threads.


# 14b5ae1a 05-Jul-2003 Mike Makonnen <mtm@FreeBSD.org>

Make the conditional, which decides what siglist to put a signal on,
more concise and improve the comment.

Submitted by: bde


# c197abc4 03-Jul-2003 Mike Makonnen <mtm@FreeBSD.org>

Signals sent specifically to a particular thread must
be delivered to that thread, regardless of whether it
has it masked or not.

Previously, if the targeted thread had the signal masked,
it would be put on the processes' siglist. If
another thread has the signal umasked or unmasks it before
the target, then the thread it was intended for would never
receive it.

This patch attempts to solve the problem by requiring callers
of tdsignal() to say whether the signal is for the thread or
for the process. If it is for the process, then normal processing
occurs and any thread that has it unmasked can receive it.
But if it is destined for a specific thread, it is put on
that thread's pending list regardless of whether it is currently
masked or not.

The new behaviour still needs more work, though. If the signal
is reposted for some reason it is always posted back to the
thread that handled it because the information regarding the
target of the signal has been lost by then.

Reviewed by: jdp, jeff, bde (style)


# 9dde3bc9 28-Jun-2003 David Xu <davidxu@FreeBSD.org>

o Change kse_thr_interrupt to allow send a signal to a specified thread,
or unblock a thread in kernel, and allow UTS to specify whether syscall
should be restarted.
o Add ability for UTS to monitor signal comes in and removed from process,
the flag PS_SIGEVENT is used to indicate the events.
o Add a KMF_WAITSIGEVENT for KSE mailbox flag, UTS call kse_release with
this flag set to wait for above signal event.
o For SA based thread, kernel masks all signal in its signal mask, let
UTS to use kse_thr_interrupt interrupt a thread, and install a signal
frame in userland for the thread.
o Add a tm_syncsig in thread mailbox, when a hardware trap occurs,
it is used to deliver synchronous signal to userland, and upcall
is schedule, so UTS can process the synchronous signal for the thread.

Reviewed by: julian (mentor)


# 418228df 28-Jun-2003 David Xu <davidxu@FreeBSD.org>

Fix POSIX compatible bug for sigwaitinfo and sigtimedwait.
POSIX says siginfo pointer parameter can be NULL and if the
function success, it should return signal number but not zero.
The waitset it past should be negatived before it can be
used as thread signal mask.


# 062cf543 19-Jun-2003 David Xu <davidxu@FreeBSD.org>

When a STOP signal is being sent to a process, it is possible all
threads in the process have already masked the signal, so job control
is delayed. But later a thread unmasking the STOP signal should enable
job control, so in issignal(), scanning all threads in process to see
if we can direct suspend some of them, not just suspend current thread.


# 8b56079e 19-Jun-2003 David Xu <davidxu@FreeBSD.org>

Fix typo. td should be td0.


# cd4f6ebb 14-Jun-2003 David Xu <davidxu@FreeBSD.org>

1. Add code to support bound thread. when blocked, a bound thread never
schedules an upcall. Signal delivering to a bound thread is same as
non-threaded process. This is intended to be used by libpthread to
implement PTHREAD_SCOPE_SYSTEM thread.
2. Simplify kse_release() a bit, remove sleep loop.


# 0e2a4d3a 14-Jun-2003 David Xu <davidxu@FreeBSD.org>

Rename P_THREADED to P_SA. P_SA means a process is using scheduler
activations.


# 677b542e 10-Jun-2003 David E. O'Brien <obrien@FreeBSD.org>

Use __FBSDID().


# 5e26dcb5 09-Jun-2003 John Baldwin <jhb@FreeBSD.org>

- Add a td_pflags field to struct thread for private flags accessed only by
curthread. Unlike td_flags, this field does not need any locking.
- Replace the td_inktr and td_inktrace variables with equivalent private
thread flags.
- Move TDF_OLDMASK over to the private flags field so it no longer requires
sched_lock.


# 8d542cb5 15-May-2003 David E. O'Brien <obrien@FreeBSD.org>

Fix long standing bug that prevents the PT_CONTINUE, PT_KILL and
PT_DETACH ptrace(2) requests from functioning as advertised in the
manual page. As described in kern/35175, the PT_DETACH request will,
under certain circumstances, pass an unwanted signal on to the traced
process upan detaching from it. The PT_CONTINUE request will
sometimes fail if you make it pass a signal that has "properties" that
differ from the properties of the signal that origionally caused the
traced process to be stopped. Since PT_KILL is nothing than
PT_CONTINUE with SIGKILL, it is broken too. In the PT_KILL case, this
leads to an unkillable process.

PR: 44011
Submitted by: Mark Kettenis <kettenis@chello.nl>
Approved by: re(jhb)


# 90af4afa 13-May-2003 John Baldwin <jhb@FreeBSD.org>

- Merge struct procsig with struct sigacts.
- Move struct sigacts out of the u-area and malloc() it using the
M_SUBPROC malloc bucket.
- Add a small sigacts_*() API for managing sigacts structures: sigacts_alloc(),
sigacts_free(), sigacts_copy(), sigacts_share(), and sigacts_shared().
- Remove the p_sigignore, p_sigacts, and p_sigcatch macros.
- Add a mutex to struct sigacts that protects all the members of the struct.
- Add sigacts locking.
- Remove Giant from nosys(), kill(), killpg(), and kern_sigaction() now
that sigacts is locked.
- Several in-kernel functions such as psignal(), tdsignal(), trapsignal(),
and thread_stopped() are now MP safe.

Reviewed by: arch@
Approved by: re (rwatson)


# b1bf1c3a 09-May-2003 John Baldwin <jhb@FreeBSD.org>

Remove Giant from kern_sigsuspend() and osigsuspend() as these should now
be MP safe.

Approved by: re (scottl)


# 854dc8c2 05-May-2003 John Baldwin <jhb@FreeBSD.org>

Mostly sort the includes.


# 18440c7f 05-May-2003 John Baldwin <jhb@FreeBSD.org>

Lock the proc lock around calls to tdsignal() in the sigwait() family of
syscalls.


# 6711f10f 05-May-2003 John Baldwin <jhb@FreeBSD.org>

Make issignal() private to kern_sig.c since it is only called from cursig()
and cursig() is now a function rather than a macro.


# a14e1189 30-Apr-2003 John Baldwin <jhb@FreeBSD.org>

Forgot to remove Giant around call to kern_sigaction() in
freebsd4_sigaction() in revision 1.232.


# 25d6dc06 25-Apr-2003 John Baldwin <jhb@FreeBSD.org>

Push Giant down into kern_sigaction() instead of locking it around calls
to kern_sigaction() in the various callers of the function.


# cf60731b 23-Apr-2003 John Baldwin <jhb@FreeBSD.org>

Remove Giant from osigblock(), osigsetmask(), and kern_sigaltstack().


# 5afe0c99 23-Apr-2003 John Baldwin <jhb@FreeBSD.org>

- Reorganize osigstack() to do the copyin first, grab the proc lock once,
do all the various sigstack dances, unlock the proc lock, and finally do
the copyout. This more closely resembles the behavior of
kern_sigaltstack() and closes a small race.
- Remove Giant from osigstack as it is no longer needed.


# 06ce69a7 18-Apr-2003 David Xu <davidxu@FreeBSD.org>

Unbreak sigaltstack syscall. sigonstack is now a function and
want proc lock be held.


# 8b94a061 18-Apr-2003 John Baldwin <jhb@FreeBSD.org>

- Make sigonstack() a regular function instead of an inline and add a proc
lock assertion to it.
- SIGPENDING() no longer needs sched_lock, so only grab sched_lock to set
the TDF_NEEDSIGCHK and TDF_ASTPENDING flags in signotify().
- Add a proc lock assertion to tdsigwakeup().
- Since we always set TDF_OLDMASK while holding the proc lock, the proc
lock is sufficient protection to check its state in postsig() and we only
need sched_lock when clearing the actual flag.


# e77daab1 18-Apr-2003 John Baldwin <jhb@FreeBSD.org>

Rename do_sigprocmask() to kern_sigprocmask() and make it a global symbol
so that it can be used by binary emulators.


# 9d8643ec 17-Apr-2003 John Baldwin <jhb@FreeBSD.org>

Don't hold the proc lock while performing sigset conversions on local
variables.


# 5edadff9 17-Apr-2003 John Baldwin <jhb@FreeBSD.org>

- Remove garbage SIGSETOR() that snuck into struct sigpending_args
definition.
- Use the proper constant for the last arg to kern_sigaction() in osigvec()
instead of a magic value.


# f9b89f7e 11-Apr-2003 David Xu <davidxu@FreeBSD.org>

Style fix.


# 5312b1c7 11-Apr-2003 David Xu <davidxu@FreeBSD.org>

Check SIG_HOLD action ealier to avoid missing test it in later code.


# c9dfa2e0 01-Apr-2003 Jeff Roberson <jeff@FreeBSD.org>

- p will be unused in cursig() if INVARIANTS is not defined. Access it
through td->td_proc to avoid the unused variable.

Spotted by: Maxim Konovalov <maxim@macomnet.ru>


# a447cd8b 31-Mar-2003 Jeff Roberson <jeff@FreeBSD.org>

- Define sigwait, sigtimedwait, and sigwaitinfo in terms of
kern_sigtimedwait() which is capable of supporting all of their semantics.
- These should be POSIX compliant but more careful review is needed before
we announce this.


# 4093529d 31-Mar-2003 Jeff Roberson <jeff@FreeBSD.org>

- Move p->p_sigmask to td->td_sigmask. Signal masks will be per thread with
a follow on commit to kern_sig.c
- signotify() now operates on a thread since unmasked pending signals are
stored in the thread.
- PS_NEEDSIGCHK moves to TDF_NEEDSIGCHK.


# da33176f 31-Mar-2003 Jeff Roberson <jeff@FreeBSD.org>

- Mark signals which may be delivered to any thread in the process with
SA_PROC. Signals without this flag should be directed to a particular
thread if this is possible.


# 1bf4700b 31-Mar-2003 Jeff Roberson <jeff@FreeBSD.org>

- Change trapsignal() to accept a thread and not a proc.
- Change all consumers to pass in a thread.

Right now this does not cause any functional changes but it will be important
later when signals can be delivered to specific threads.


# e574e444 10-Mar-2003 David Xu <davidxu@FreeBSD.org>

Fix threaded process job control bug. SMP tested.

Reviewed by: julian


# ef3dab76 08-Mar-2003 Tim J. Robbins <tjr@FreeBSD.org>

Hold the proc lock while accessing p_procsig in trapsignal().


# 26306795 04-Mar-2003 John Baldwin <jhb@FreeBSD.org>

Replace calls to WITNESS_SLEEP() and witness_list() with equivalent calls
to WITNESS_WARN().


# ac2e4153 26-Feb-2003 Julian Elischer <julian@FreeBSD.org>

Change the process flags P_KSES to be P_THREADED.
This is just a cosmetic change but I've been meaning to do it for about a year.


# 426269b2 25-Feb-2003 David Xu <davidxu@FreeBSD.org>

Fix a bug when handling SIGCONT.

Reported By: Mike Makonnen <mtm@identd.net>


# 58a3c273 17-Feb-2003 Jeff Roberson <jeff@FreeBSD.org>

- Add a new function, thread_signal_add(), that is called from postsig to
add a signal to a mailbox's pending set.
- Add a new function, thread_signal_upcall(), this causes the current thread
to upcall so that we can deliver pending signals.

Reviewed by: mini


# 4a338afd 17-Feb-2003 Julian Elischer <julian@FreeBSD.org>

Move a bunch of flags from the KSE to the thread.
I was in two minds as to where to put them in the first case..
I should have listenned to the other mind.

Submitted by: parts by davidxu@
Reviewed by: jeff@ mini@


# 5215b187 16-Feb-2003 Jeff Roberson <jeff@FreeBSD.org>

- Split the struct kse into struct upcall and struct kse. struct kse will
soon be visible only to schedulers. This greatly simplifies much the
KSE code.

Submitted by: davidxu


# 44443757 15-Feb-2003 Tim J. Robbins <tjr@FreeBSD.org>

Acquire Giant around calls to kern_sigaction() in sigaction(),
freebsd4_sigaction() and osigaction() instead of around the whole
body of those functions. They now no longer hold Giant around calls
to copyin() and copyout(), and it is slightly more obvious what
Giant is protecting.


# c41c566c 15-Feb-2003 Tim J. Robbins <tjr@FreeBSD.org>

osigpending() no longer needs Giant, for the same reason sigpending()
does not.


# 48e8f774 15-Feb-2003 Tim J. Robbins <tjr@FreeBSD.org>

All uses of p_siglist are protected by the proc lock now, so there's
no need to acquire Giant in sigpending() anymore.


# 6f8132a8 31-Jan-2003 Julian Elischer <julian@FreeBSD.org>

Reversion of commit by Davidxu plus fixes since applied.

I'm not convinced there is anything major wrong with the patch but
them's the rules..

I am using my "David's mentor" hat to revert this as he's
offline for a while.


# bf2053ca 27-Jan-2003 Peter Wemm <peter@FreeBSD.org>

No longer force COMPAT_FREEBSD4 to be on.


# 0dbb100b 26-Jan-2003 David Xu <davidxu@FreeBSD.org>

Move UPCALL related data structure out of kse, introduce a new
data structure called kse_upcall to manage UPCALL. All KSE binding
and loaning code are gone.

A thread owns an upcall can collect all completed syscall contexts in
its ksegrp, turn itself into UPCALL mode, and takes those contexts back
to userland. Any thread without upcall structure has to export their
contexts and exit at user boundary.

Any thread running in user mode owns an upcall structure, when it enters
kernel, if the kse mailbox's current thread pointer is not NULL, then
when the thread is blocked in kernel, a new UPCALL thread is created and
the upcall structure is transfered to the new UPCALL thread. if the kse
mailbox's current thread pointer is NULL, then when a thread is blocked
in kernel, no UPCALL thread will be created.

Each upcall always has an owner thread. Userland can remove an upcall by
calling kse_exit, when all upcalls in ksegrp are removed, the group is
atomatically shutdown. An upcall owner thread also exits when process is
in exiting state. when an owner thread exits, the upcall it owns is also
removed.

KSE is a pure scheduler entity. it represents a virtual cpu. when a thread
is running, it always has a KSE associated with it. scheduler is free to
assign a KSE to thread according thread priority, if thread priority is changed,
KSE can be moved from one thread to another.

When a ksegrp is created, there is always N KSEs created in the group. the
N is the number of physical cpu in the current system. This makes it is
possible that even an userland UTS is single CPU safe, threads in kernel still
can execute on different cpu in parallel. Userland calls kse_create to add more
upcall structures into ksegrp to increase concurrent in userland itself, kernel
is not restricted by number of upcalls userland provides.

The code hasn't been tested under SMP by author due to lack of hardware.

Reviewed by: julian


# b81c4d1e 06-Jan-2003 David Xu <davidxu@FreeBSD.org>

Forgot to call setrunnable() for un-idled thread.


# ea5ab16e 06-Jan-2003 David Xu <davidxu@FreeBSD.org>

Check signals for idled threads.


# 93a7aa79 27-Dec-2002 Julian Elischer <julian@FreeBSD.org>

Add code to ddb to allow backtracing an arbitrary thread.
(show thread {address})

Remove the IDLE kse state and replace it with a change in
the way threads sahre KSEs. Every KSE now has a thread, which is
considered its "owner" however a KSE may also be lent to other
threads in the same group to allow completion of in-kernel work.
n this case the owner remains the same and the KSE will revert to the
owner when the other work has been completed.

All creations of upcalls etc. is now done from
kse_reassign() which in turn is called from mi_switch or
thread_exit(). This means that special code can be removed from
msleep() and cv_wait().

kse_release() does not leave a KSE with no thread any more but
converts the existing thread into teh KSE's owner, and sets it up
for doing an upcall. It is just inhibitted from being scheduled until
there is some reason to do an upcall.

Remove all trace of the kse_idle queue since it is no-longer needed.
"Idle" KSEs are now on the loanable queue.


# d321df47 17-Dec-2002 Poul-Henning Kamp <phk@FreeBSD.org>

Don't cast a pointer to (intptr_t) and then on to (int) when we cannot
be sure that (int) is large enough. Instead cast only to (intptr_t) and
cast the switch/case values to (intptr_t) as well.


# 23eeeff7 25-Oct-2002 Peter Wemm <peter@FreeBSD.org>

Split 4.x and 5.x signal handling so that we can keep 4.x signal
handling clean and functional as 5.x evolves. This allows some of the
nasty bandaids in the 5.x codepaths to be unwound.

Encapsulate 4.x signal handling under COMPAT_FREEBSD4 (there is an
anti-foot-shooting measure in place, 5.x folks need this for a while) and
finish encapsulating the older stuff under COMPAT_43. Since the ancient
stuff is required on alpha (longjmp(3) passes a 'struct osigcontext *'
to the current sigreturn(2), instead of the 'ucontext_t *' that sigreturn
is supposed to take), add a compile time check to prevent foot shooting
there too. Add uniform COMPAT_43 stubs for ia64/sparc64/powerpc.

Tested on: i386, alpha, ia64. Compiled on sparc64 (a few days ago).
Approved by: re


# 8c5d0137 02-Oct-2002 Poul-Henning Kamp <phk@FreeBSD.org>

Fix mis-indentation.

Spotted by: FlexeLint


# 1d9c5696 01-Oct-2002 Juli Mallett <jmallett@FreeBSD.org>

Back our kernel support for reliable signal queues.

Requested by: rwatson, phk, and many others


# a88b260a 30-Sep-2002 Juli Mallett <jmallett@FreeBSD.org>

Back out code changes that snuck into the previous forced commit.


# 226e1171 30-Sep-2002 Juli Mallett <jmallett@FreeBSD.org>

(Forced commit, to clarify previous commit of ksiginfo/signal queue code.)

I've added a structure, kernel-private, to represent a pending or in-delivery
signal, called `ksiginfo'. It is roughly analogous to the basic information
that is exported by the POSIX interface 'siginfo_t', but more basic. I've
added functions to allocate these structures, and further to wrap all signal
operations using them.

Once the operations are wrapped, I've added a TailQ (see queue(3)) of these
structures to 'struct proc', and all pending signals are in that TailQ. When
a signal is being delivered, it is dequeued from the list. Once I finish
the spreading of ksiginfo throughout the tree, the dequeued structure will be
delivered to the process in question, whereas currently and normally, the
signal number is what is used.


# 1226f694 30-Sep-2002 Juli Mallett <jmallett@FreeBSD.org>

First half of implementation of ksiginfo, signal queues, and such. This
gets signals operating based on a TailQ, and is good enough to run X11,
GNOME, and do job control. There are some intricate parts which could be
more refined to match the sigset_t versions, but those require further
evaluation of directions in which our signal system can expand and contract
to fit our needs.

After this has been in the tree for a while, I will make in kernel API
changes, most notably to trapsignal(9) and sendsig(9), to use ksiginfo
more robustly, such that we can actually pass information with our
(queued) signals to the userland. That will also result in using a
struct ksiginfo pointer, rather than a signal number, in a lot of
kern_sig.c, to refer to an individual pending signal queue member, but
right now there is no defined behaviour for such.

CODAFS is unfinished in this regard because the logic is unclear in
some places.

Sponsored by: New Gold Technology
Reviewed by: bde, tjr, jake [an older version, logic similar]


# 21b68415 28-Sep-2002 David E. O'Brien <obrien@FreeBSD.org>

Fix style nit where conditionally compiled code was unconditionalized,
but style(9) was consulted.

Submitted by: bde


# 37c84183 28-Sep-2002 Poul-Henning Kamp <phk@FreeBSD.org>

Be consistent about "static" functions: if the function is marked
static in its prototype, mark it static at the definition too.

Inspired by: FlexeLint warning #512


# c76e33b6 16-Sep-2002 Jonathan Mini <mini@FreeBSD.org>

Add kernel support needed for the KSE-aware libpthread:
- Use ucontext_t's to store KSE thread state.
- Synthesize state for the UTS upon each upcall, rather than
saving and copying a trapframe.
- Deliver signals to KSE-aware processes via upcall.
- Rename kse mailbox structure fields to be more BSD-like.
- Store the UTS's stack in struct proc in a stack_t.

Reviewed by: bde, deischen, julian
Approved by: -arch


# 4f0db5e0 15-Sep-2002 Julian Elischer <julian@FreeBSD.org>

Allocate KSEs and KSEGRPs separatly and remove them from the proc structure.
next step is to allow > 1 to be allocated per process. This would give
multi-processor threads. (when the rest of the infrastructure is
in place)

While doing this I noticed libkvm and sys/kern/kern_proc.c:fill_kinfo_proc
are diverging more than they should.. corrective action needed soon.


# 71fad9fd 11-Sep-2002 Julian Elischer <julian@FreeBSD.org>

Completely redo thread states.

Reviewed by: davidxu@freebsd.org


# 1279572a 05-Sep-2002 David Xu <davidxu@FreeBSD.org>

s/SGNL/SIG/
s/SNGL/SINGLE/
s/SNGLE/SINGLE/

Fix abbreviation for P_STOPPED_* etc flags, in original code they were
inconsistent and difficult to distinguish between them.

Approved by: julian (mentor)


# 35c32a76 02-Sep-2002 David Xu <davidxu@FreeBSD.org>

In the kernel code, we have the tsleep() call with the PCATCH argument.
PCATCH means 'if we get a signal, interrupt me!" and tsleep returns
either EINTR or ERESTART depending on the circumstances. ERESTART is
"special" because it causes the system call to fail, but right as it
returns back to userland it tells the trap handler to move %eip back a
bit so that userland will immediately re-run the syscall.
This is a syscall restart. It only works for things like read() etc where
nothing has changed yet. Note that *userland* is tricked into restarting
the syscall by the kernel. The kernel doesn't actually do the restart. It
is deadly for things like select, poll, nanosleep etc where it might cause
the elapsed time to be reset and start again from scratch. So those
syscalls do this to prevent userland rerunning the syscall:
if (error == ERESTART) error = EINTR;

Fake "signals" like SIGTSTP from ^Z etc do not normally invoke userland
signal handlers. But, in -current, the PCATCH *is* being triggered and
tsleep is returning ERESTART, and the syscall is aborted even though no
userland signal handler was run.
That is the fault here. We're triggering the PCATCH in cases that we
shouldn't. ie: it is being triggered on *any* signal processing, rather
than the case where the signal is posted to userland.
--- Peter

The work of psignal() is a patchwork of special case required by the process
debugging and job-control facilities...
--- Kirk McKusick
"The design and impelementation of the 4.4BSD Operating system"
Page 105

in STABLE source, when psignal is posting a STOP signal to sleeping
process and the signal action of the process is SIG_DFL, system will
directly change the process state from SSLEEP to SSTOP, and when
SIGCONT is posted to the stopped process, if it finds that the process
is still on sleep queue, the process state will be restored to SSLEEP,
and won't wakeup the process.

this commit mimics the behaviour in STABLE source tree.

Reviewed by: Jon Mini, Tim Robbins, Peter Wemm
Approved by: julian@freebsd.org (mentor)


# 8f19eb88 01-Sep-2002 Ian Dowse <iedowse@FreeBSD.org>

Split out a number of mostly VFS and signal related syscalls into
a kernel-internal kern_*() version and a wrapper that is called via
the syscall vector table. For paths and structure pointers, the
internal version either takes a uio_seg parameter or requires the
caller to copyin() the data to kernel memory as appropiate. This
will permit emulation layers to use these syscalls without having
to copy out translated arguments to the stack gap.

Discussed on: -arch
Review/suggestions: bde, jhb, peter, marcel


# b39f3284 25-Aug-2002 Julian Elischer <julian@FreeBSD.org>

move the assert to cover more cases


# d9d6e34f 23-Aug-2002 Julian Elischer <julian@FreeBSD.org>

Don't re-lock the sched lock if we didn't unlock it.

Original error by: David Xu <bsddiy@yahoo.com>
Fix by: David Xu <bsddiy@yahoo.com>
Completely failed to spot it: Julian Elischer <julian@freebsd.org>


# 721e5910 21-Aug-2002 Julian Elischer <julian@FreeBSD.org>

Revert some suspension/sleep/signal code from KSE-III
We need to rethink a bit of this and it doesn't matter if
we break the KSE test program for now as long
as non-KSE programs act as expected.

Submitted by: David Xu <bsddiy@yahoo.com>
(this guy's just asking to get hit with a commit bit..)


# 6933e3c1 08-Aug-2002 Julian Elischer <julian@FreeBSD.org>

Do some work on keeping better track of stopped/continued state.
I'm not sure what happenned to the original setting of the P_CONTINUED
flag. it appears to have been lost in the paper shuffling...

Submitted by: David Xu <bsddiy@yahoo.com>


# 1c530be4 06-Aug-2002 Bruce Evans <bde@FreeBSD.org>

Try harder to "set signal flags proprly [sic] for ast()". See rev.1.154.


# 04774f23 01-Aug-2002 Julian Elischer <julian@FreeBSD.org>

Slight cleanup of some comments/whitespace.
Make idle process state more consistant.
Add an assert on thread state.
Clean up idleproc/mi_switch() interaction.
Use a local instead of referencing curthread 7 times in a row
(I've been told curthread can be expensive on some architectures)
Remove some commented out code.
Add a little commented out code (completion coming soon)

Reviewed by: jhb@freebsd.org


# 4d492b43 30-Jul-2002 Julian Elischer <julian@FreeBSD.org>

Don't need to hold schedlock specifically for stop() ans it calls wakeup()
that locks it anyhow.

Reviewed by: jhb@freebsd.org


# 38038891 24-Jul-2002 Julian Elischer <julian@FreeBSD.org>

revert some of the handling of STOP signals in
issignal(). Let thread_suspend_check() actually do the suspension
at the user boundary.

Submitted by: David Xu <bsddiy@yahoo.com>


# 832dafad 10-Jul-2002 Don Lewis <truckman@FreeBSD.org>

Rearrange the code so that it checks whether the file is something
valid to write a core dump to before doing the preparations to actually
write to the file.

Call VOP_GETATTR() before dropping the initial vnode lock.


# aa0fa334 03-Jul-2002 Julian Elischer <julian@FreeBSD.org>

Try clean up some of the mess that resulted from layers and layers
of p4 merges from -current as things started getting different.

Corroborated by: Similar patches just mailed by BDE.


# ee9919b0 03-Jul-2002 Julian Elischer <julian@FreeBSD.org>

White space commit.
I'm working on this file but I wanted to make the whitespece commit
separatly.


# 0ac3b636 02-Jul-2002 Andrew Gallatin <gallatin@FreeBSD.org>

Hold the sched lock across call to forward_signal() in tdsignal() to
keep SMP systems from panic'ing when ^C'ing an app

suggested by julian


# e602ba25 29-Jun-2002 Julian Elischer <julian@FreeBSD.org>

Part 1 of KSE-III

The ability to schedule multiple threads per process
(one one cpu) by making ALL system calls optionally asynchronous.
to come: ia64 and power-pc patches, patches for gdb, test program (in tools)

Reviewed by: Almost everyone who counts
(at various times, peter, jhb, matt, alfred, mini, bernd,
and a cast of thousands)

NOTE: this is still Beta code, and contains lots of debugging stuff.
expect slight instability in signals..


# 01609114 28-Jun-2002 Alfred Perlstein <alfred@FreeBSD.org>

more caddr_t removal.


# 374a15aa 06-Jun-2002 John Baldwin <jhb@FreeBSD.org>

- trapsignal() no longer needs to acquire Giant for ktrpsig().
- Catch up to new ktrace API.


# ca18d53e 06-Jun-2002 Chad David <davidc@FreeBSD.org>

s/!SIGNOTEMPY/SIGISEMPTY/

Reviewed by: marcel, jhb, alfred


# 6ee093fb 01-Jun-2002 Mike Barcroft <mike@FreeBSD.org>

Add POSIX.1-2001 WCONTINUED option for waitpid(2). A proc flag
(P_CONTINUED) is set when a stopped process receives a SIGCONT and
cleared after it has notified a parent process that has requested
notification via waitpid(2) with WCONTINUED specified in its options
operand. The status value can be checked with the new WIFCONTINUED()
macro.

Reviewed by: jake


# 628855e7 29-May-2002 Julian Elischer <julian@FreeBSD.org>

CURSIG() is not a macro so rename it cursig().

Obtained from: KSE tree


# f44d9e24 18-May-2002 John Baldwin <jhb@FreeBSD.org>

Change p_can{debug,see,sched,signal}()'s first argument to be a thread
pointer instead of a proc pointer and require the process pointed to
by the second argument to be locked. We now use the thread ucred reference
for the credential checks in p_can*() as a result. p_canfoo() should now
no longer need Giant.


# 66101641 14-May-2002 Robert Watson <rwatson@FreeBSD.org>

p_cansignal() returns an errno value; at some point, the check for
inter-process signalling ceased to preserve and return that value,
instead always returning EPERM. This meant that it was possible
to "probe" the pid space for processes that were not otherwise
visible. This change reverts that reversion.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs


# d8f4f6a4 08-May-2002 Jonathan Mini <mini@FreeBSD.org>

Remove trace_req().

Reviewed by: alfred, jhb, peter


# 8b43b535 08-May-2002 Alfred Perlstein <alfred@FreeBSD.org>

expand_name fixes:

.) don't use MAXPATHLEN + 1, fix logic to compensate.
.) style(9) function parameters.
.) fix line wrapping.
.) remove duplicated error and string handling code.
.) don't NUL terminate already NUL terminated string.
.) all string length variables changed from int to size_t.
.) constify variables.
.) catch when corename would be truncated.
.) cast pid_t and uid_t args for format string.
.) add parens around return arguments.

Help and suggestions from: bde


# b2bc3101 07-May-2002 Alfred Perlstein <alfred@FreeBSD.org>

M_ZERO the temp buffer in expand_name() otherwise if an error occurs
while logging we may pass a non NUL terminated string to log(9) for a
%s format arg.


# f5216b9a 04-May-2002 Bruce Evans <bde@FreeBSD.org>

Return the correct error code (ENOSYS, not EINVAL) from nosys(). Getting
killed by SIGSYS for unimlemented syscalls is bad enough.

Obtained from: Lite2 branch

The Lite2 branch has some other interesting unmerged (?) bits in this
file. They are well hidden among cosmetic regressions.


# 9b3b1c5f 02-May-2002 John Baldwin <jhb@FreeBSD.org>

- Reorder execve() so that it performs blocking operations before it
locks the process.
- Defer other blocking operations such as vrele()'s until after we
release locks.
- execsigs() now requires the proc lock to be held when it is called
rather than locking the process internally.


# f1320723 01-May-2002 Alfred Perlstein <alfred@FreeBSD.org>

Redo the sigio locking.

Turn the sigio sx into a mutex.

Sigio lock is really only needed to protect interrupts from dereferencing
the sigio pointer in an object when the sigio itself is being destroyed.

In order to do this in the most unintrusive manner change pgsigio's
sigio * argument into a **, that way we can lock internally to the
function.


# ba1551ca 27-Apr-2002 Ian Dowse <iedowse@FreeBSD.org>

Avoid the user-visible effect of setting SA_NOCLDWAIT when the
SIGCHLD handler is SIG_IGN. This is a reimplementation of the
problematic revision 1.131 of kern_exit.c. To avoid accessing process
UPAGES, we set a new procsig flag when the SIGCHLD handler is SIG_IGN
and use that instead.


# ba626c1d 16-Apr-2002 John Baldwin <jhb@FreeBSD.org>

Lock proctree_lock instead of pgrpsess_lock.


# 9c1ab3e0 13-Apr-2002 John Baldwin <jhb@FreeBSD.org>

- Change killpg1()'s first argument to be a thread instead of a process so
we can use td_ucred.
- In killpg1(), the proc lock is sufficient to check if p_stat is SZOMB
or not. We don't need sched_lock.
- Close some races in psignal(). In psignal() there is a big switch
statement based on p_stat. All the different cases are assuming that
the process (or thread) isn't going to change state out from under it.
To ensure this is true, just lock sched_lock for the entire switch. We
practically held it the entire time already anyways. This also
simplifies the locking somewhat and actually results in fewer lock
operations.
- Allow signotify() to be called with the sched_lock held since psignal()
now does that.
- Use td_ucred in a couple of places.


# 79065dba 04-Apr-2002 Bruce Evans <bde@FreeBSD.org>

Moved signal handling and rescheduling from userret() to ast() so that
they aren't in the usual path of execution for syscalls and traps.
The main complication for this is that we have to set flags to control
ast() everywhere that changes the signal mask.

Avoid locking in userret() in most of the remaining cases.

Submitted by: luoqi (first part only, long ago, reorganized by me)
Reminded by: dillon


# 179235b3 04-Apr-2002 Bruce Evans <bde@FreeBSD.org>

Optimized the check for unmasked pending signals in CURSIG() using a new
inline function sigsetmasked() and a new macro SIGPENDING(). CURSIG()
will soon be moved out of the normal path of execution for syscalls and
traps. Then its efficiency will be less important but the new interfaces
will be useful for checking for unmasked pending signals in more places.

Submitted by: luoqi (long ago, in a slightly different form)

Assert that sched_lock is not held in CURSIG().


# 70f52b48 23-Mar-2002 Bruce Evans <bde@FreeBSD.org>

Fixed some style bugs in the removal of __P(()). The main ones were
not removing tabs before "__P((", and not outdenting continuation lines
to preserve non-KNF lining up of code with parentheses. Switch to KNF
formatting and/or rewrap the whole prototype in some cases.


# 4d77a549 19-Mar-2002 Alfred Perlstein <alfred@FreeBSD.org>

Remove __P.


# 0f5c7c4b 26-Feb-2002 Poul-Henning Kamp <phk@FreeBSD.org>

Fix warning in !SMP case.

Submitted by: Maxime Henrion <mux@mu.org>


# f591779b 23-Feb-2002 Seigo Tanimura <tanimura@FreeBSD.org>

Lock struct pgrp, session and sigio.

New locks are:

- pgrpsess_lock which locks the whole pgrps and sessions,
- pg_mtx which protects the pgrp members, and
- s_mtx which protects the session members.

Please refer to sys/proc.h for the coverage of these locks.

Changes on the pgrp/session interface:

- pgfind() needs the pgrpsess_lock held.

- The caller of enterpgrp() is responsible to allocate a new pgrp and
session.

- Call enterthispgrp() in order to enter an existing pgrp.

- pgsignal() requires a pgrp lock held.

Reviewed by: jhb, alfred
Tested on: cvsup.jp.FreeBSD.org
(which is a quad-CPU machine running -current)


# 8c3d74f4 14-Feb-2002 Bruce Evans <bde@FreeBSD.org>

Fixed a typo in rev.1.65 that gave a reference to a nonexistent variable.
This was not detected by LINT because LINT is missing COMPAT_SUNOS.


# 2c100766 11-Feb-2002 Julian Elischer <julian@FreeBSD.org>

In a threaded world, differnt priorirites become properties of
different entities. Make it so.

Reviewed by: jhb@freebsd.org (john baldwin)


# 5da271f5 10-Feb-2002 Robert Watson <rwatson@FreeBSD.org>

Add a comment indicating that VOP_GETATTR() is called without appropriate
locking in the core dump code. This should be fixed.


# 079b7bad 07-Feb-2002 Julian Elischer <julian@FreeBSD.org>

Pre-KSE/M3 commit.
this is a low-functionality change that changes the kernel to access the main
thread of a process via the linked list of threads rather than
assuming that it is embedded in the process. It IS still embeded there
but remove all teh code that assumes that in preparation for the next commit
which will actually move it out.

Reviewed by: peter@freebsd.org, gallatin@cs.duke.edu, benno rice,


# 2b87b6d4 09-Jan-2002 Robert Watson <rwatson@FreeBSD.org>

o Revert kern_sig.c#1.143, as cr_cansignal() doesn't currently permit
a number of desirable cases in which SIGIO/SIGURG are delivered. We'll
keep tweaking.

Reported by: Alexander Kabaev <ak03@gte.com>


# f8efde89 05-Jan-2002 Robert Watson <rwatson@FreeBSD.org>

- Teach SIGIO code to use cr_cansignal() instead of a custom CANSIGIO()
macro. As a result, mandatory signal delivery policies will be
applied consistently across the kernel.

- Note that this subtly changes the protection semantics, and we should
watch out for any resulting breakage. Previously, delivery of SIGIO
in this circumstance was limited to situations where the subject was
privileged, or where one of the subject's (ruid, euid) matched one
of the object's (ruid, euid). In the new scenario, subject (ruid, euid)
are matched against the object's (ruid, svuid), and the object uid's
must be a subset of the subject uid's. Likewise, jail now affects
delivery, and special handling for P_SUGID of the object is present.
This change can always be reversed or tweaked if it proves to disrupt
application behavior substantially.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs


# c86b6ff5 05-Jan-2002 John Baldwin <jhb@FreeBSD.org>

Change the preemption code for software interrupt thread schedules and
mutex releases to not require flags for the cases when preemption is
not allowed:

The purpose of the MTX_NOSWITCH and SWI_NOSWITCH flags is to prevent
switching to a higher priority thread on mutex releease and swi schedule,
respectively when that switch is not safe. Now that the critical section
API maintains a per-thread nesting count, the kernel can easily check
whether or not it should switch without relying on flags from the
programmer. This fixes a few bugs in that all current callers of
swi_sched() used SWI_NOSWITCH, when in fact, only the ones called from
fast interrupt handlers and the swi_sched of softclock needed this flag.
Note that to ensure that swi_sched()'s in clock and fast interrupt
handlers do not switch, these handlers have to be explicitly wrapped
in critical_enter/exit pairs. Presently, just wrapping the handlers is
sufficient, but in the future with the fully preemptive kernel, the
interrupt must be EOI'd before critical_exit() is called. (critical_exit()
can switch due to a deferred preemption in a fully preemptive kernel.)

I've tested the changes to the interrupt code on i386 and alpha. I have
not tested ia64, but the interrupt code is almost identical to the alpha
code, so I expect it will work fine. PowerPC and ARM do not yet have
interrupt code in the tree so they shouldn't be broken. Sparc64 is
broken, but that's been ok'd by jake and tmm who will be fixing the
interrupt code for sparc64 shortly.

Reviewed by: peter
Tested on: i386, alpha


# 48f1ba5b 13-Dec-2001 Robert Watson <rwatson@FreeBSD.org>

o Wording fix in comment.

Submitted by: tanimura via p4


# 6c1534a7 03-Nov-2001 Peter Wemm <peter@FreeBSD.org>

_SIG_MAXSIG (128) is the highest legal signal. The arrays are offset
by one - see _SIG_IDX(). Revert part of my mis-correction in kern_sig.c
(but signal 0 still has to be allowed) and fix _SIG_VALID() (it was
rejecting ignal 128).


# 049954de 02-Nov-2001 Peter Wemm <peter@FreeBSD.org>

Partial reversion of rev 1.138. kill and killpg allow a signal
argument of 0. You cannot return EINVAL for signal 0. This broke
(in 5 minutes of testing) at least ssh-agent and screen.

However, there was a bug in the original code. Signal 128 is not
valid.

Pointy-hat to: des, jhb


# 2899d606 02-Nov-2001 Dag-Erling Smørgrav <des@FreeBSD.org>

We have a _SIG_VALID() macro, so use it instead of duplicating the test all
over the place. Also replace a printf() + panic() with a KASSERT().

Reviewed by: jhb


# 80f42b55 07-Oct-2001 Ian Dowse <iedowse@FreeBSD.org>

Fix a typo in do_sigaction() where sa_sigaction and sa_handler were
confused. Since sa_sigaction and sa_handler alias each other in a
union, the bug was completely harmless. This had been fixed as part
of the SIGCHLD changes in revision 1.125, but it was reverted when
they were backed out in revision 1.126.


# 88b1d98f 25-Sep-2001 Paul Saab <ps@FreeBSD.org>

Lock the vnode while truncating the corefile. This fixes a panic
with softupdates dangling deps.

Submitted by: peter
MFC: ASAP :)


# fdd4e5c6 17-Sep-2001 Julian Elischer <julian@FreeBSD.org>

Replace line accidentally deleted during KSE additions.
Symptom.. Stopped program unable to be restarted if it was stopped
while already sleeping.


# 9844fbc3 15-Sep-2001 Robert Watson <rwatson@FreeBSD.org>

o Correct authorization check in CANSIGIO(), which suffered from incorrect
transcription during the (pcred,ucred) merge; this was not used for
the kill() system call, so does not affect direct explicit process
signalling.

Pointed out by: fenner


# b40ce416 12-Sep-2001 Julian Elischer <julian@FreeBSD.org>

KSE Milestone 2
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to teh previousl -current except
that there is a thread associated with each process.

Sorry john! (your next MFC will be a doosie!)

Reviewed by: peter@freebsd.org, dillon@freebsd.org

X-MFC after: ha ha ha ha


# 06ae1e91 08-Sep-2001 Matthew Dillon <dillon@FreeBSD.org>

This brings in a Yahoo coredump patch from Paul, with additional mods by
me (addition of vn_rdwr_inchunks). The problem Yahoo is solving is that
if you have large process images core dumping, or you have a large number of
forked processes all core dumping at the same time, the original coredump code
would leave the vnode locked throughout. This can cause the directory vnode
to get locked up, which can cause the parent directory vnode to get locked
up, and so on all the way to the root node, locking the entire machine up
for extremely long periods of time.

This patch solves the problem in two ways. First it uses an advisory
non-blocking lock to abort multiple processes trying to core to the same
file. Second (my contribution) it chunks up the writes and uses bwillwrite()
to avoid holding the vnode locked while blocking in the buffer cache.

Submitted by: ps
Reviewed by: dillon
MFC after: 2 weeks


# df53e91c 06-Sep-2001 John Baldwin <jhb@FreeBSD.org>

Call sendsig() with the proc lock held and return with it held.


# fb99ab88 01-Sep-2001 Matthew Dillon <dillon@FreeBSD.org>

Giant Pushdown

clock_gettime() clock_settime() nanosleep() settimeofday()
adjtime() getitimer() setitimer() __sysctl() ogetkerninfo()
sigaction() osigaction() sigpending() osigpending() osigvec()
osigblock() osigsetmask() sigsuspend() osigsuspend() osigstack()
sigaltstack() kill() okillpg() trapsignal() nosys()


# 356861db 30-Aug-2001 Matthew Dillon <dillon@FreeBSD.org>

Remove the MPSAFE keyword from the parser for syscalls.master.
Instead introduce the [M] prefix to existing keywords. e.g.
MSTD is the MP SAFE version of STD. This is prepatory for a
massive Giant lock pushdown. The old MPSAFE keyword made
syscalls.master too messy.

Begin comments MP-Safe procedures with the comment:
/*
* MPSAFE
*/
This comments means that the procedure may be called without
Giant held (The procedure itself may still need to obtain
Giant temporarily to do its thing).

sv_prepsyscall() is now MP SAFE and assumed to be MP SAFE
sv_transtrap() is now MP SAFE and assumed to be MP SAFE

ktrsyscall() and ktrsysret() are now MP SAFE (Giant Pushdown)
trapsignal() is now MP SAFE (Giant Pushdown)

Places which used to do the if (mtx_owned(&Giant)) mtx_unlock(&Giant)
test in syscall[2]() in */*/trap.c now do not. Instead they
explicitly unlock Giant if they previously obtained it, and then
assert that it is no longer held to catch broken system calls.

Rebuild syscall tables.


# ccdbd10c 24-Aug-2001 Peter Pentchev <roam@FreeBSD.org>

Prevent passing a null pointer as a filename to vn_open(),
if for some reason expand_name() failed to build a core file name.

PR: 29931
Submitted by: Foldi Tamas <crow@kapu.hu>
Reviewed by: dd, -arch
MFC after: 1 month


# e8ebc08f 20-Aug-2001 Peter Wemm <peter@FreeBSD.org>

Make COMPAT_43 optional again. XXX we need COMPAT_FBSD3 etc for this
stuff.


# aa7a4dae 01-Aug-2001 Peter Wemm <peter@FreeBSD.org>

Temporarily back out kern_sig.c rev 1.125 and kern_exit.c rev 1.131.
This paniced my one of my machines one time too many :-( and there is
no sign of a solution in the pipeline. The deltas are still easily
available in cvs. The problem is that if the parent has been swapped
out, the child process cannot grope around in the parent's UPAGES to
see the sigact[] array or it will fault. This probably is a showstopper
for this implementation anyway.


# 4fec48c6 22-Jul-2001 Matthew Dillon <dillon@FreeBSD.org>

As per further discussions on hackers redo the SIGCHLD patch to not generate
an unexpected user-visible side effect with the sigaction flags. Also cleanup
a minor union issue.

Submitted by: Rudolf Cejka <cejkar@dcse.fee.vutbr.cz>
MFC addendum: MFC will be combined w/ original commit
MFC after: 3 days


# 64acb05b 02-Jul-2001 John Baldwin <jhb@FreeBSD.org>

Grab Giant around postsig() since sendsig() can call into the vm to
grow the stack and we already needed Giant for KTRACE.


# 2ad7d304 22-Jun-2001 John Baldwin <jhb@FreeBSD.org>

- Change CURSIG() and postsig() to require that the proc lock is held
rather than grabbing it and releasing it themselves. This allows callers
of these functions to get the lock to close race conditions.
- Grab Giant around ktrace in postsig.
- Count the switches performed on SIGSTOP's as involuntary context switches
in the resource usage stats.

Reported by: tegge (signal race), bde (missing csw stats)


# 6fad32af 18-Jun-2001 John Baldwin <jhb@FreeBSD.org>

Lock Giant in postsig() for the KTRACE case as ktrpsig() needs Giant when
it writes out to the trace file.

Reported by: peter, gallatin, and others


# c7fd62da 11-Jun-2001 David Malone <dwmalone@FreeBSD.org>

Try to make the setting of the SIGCHLD handler the same as setting of
the NOCLDWAI flag. Susv2 seems to require this.

Submitted by: Cejka Rudolf <cejkar@dcse.fee.vutbr.cz>
Reviewed by: dillon


# b1fc0ec1 25-May-2001 Robert Watson <rwatson@FreeBSD.org>

o Merge contents of struct pcred into struct ucred. Specifically, add the
real uid, saved uid, real gid, and saved gid to ucred, as well as the
pcred->pc_uidinfo, which was associated with the real uid, only rename
it to cr_ruidinfo so as not to conflict with cr_uidinfo, which
corresponds to the effective uid.
o Remove p_cred from struct proc; add p_ucred to struct proc, replacing
original macro that pointed.
p->p_ucred to p->p_cred->pc_ucred.
o Universally update code so that it makes use of ucred instead of pcred,
p->p_ucred instead of p->p_pcred, cr_ruidinfo instead of p_uidinfo,
cr_{r,sv}{u,g}id instead of p_*, etc.
o Remove pcred0 and its initialization from init_main.c; initialize
cr_ruidinfo there.
o Restruction many credential modification chunks to always crdup while
we figure out locking and optimizations; generally speaking, this
means moving to a structure like this:
newcred = crdup(oldcred);
...
p->p_ucred = newcred;
crfree(oldcred);
It's not race-free, but better than nothing. There are also races
in sys_process.c, all inter-process authorization, fork, exec, and
exit.
o Remove sigio->sio_ruid since sigio->sio_ucred now contains the ruid;
remove comments indicating that the old arrangement was a problem.
o Restructure exec1() a little to use newcred/oldcred arrangement, and
use improved uid management primitives.
o Clean up exit1() so as to do less work in credential cleanup due to
pcred removal.
o Clean up fork1() so as to do less work in credential cleanup and
allocation.
o Clean up ktrcanset() to take into account changes, and move to using
suser_xxx() instead of performing a direct uid==0 comparision.
o Improve commenting in various kern_prot.c credential modification
calls to better document current behavior. In a couple of places,
current behavior is a little questionable and we need to check
POSIX.1 to make sure it's "right". More commenting work still
remains to be done.
o Update credential management calls, such as crfree(), to take into
account new ruidinfo reference.
o Modify or add the following uid and gid helper routines:
change_euid()
change_egid()
change_ruid()
change_rgid()
change_svuid()
change_svgid()
In each case, the call now acts on a credential not a process, and as
such no longer requires more complicated process locking/etc. They
now assume the caller will do any necessary allocation of an
exclusive credential reference. Each is commented to document its
reference requirements.
o CANSIGIO() is simplified to require only credentials, not processes
and pcreds.
o Remove lots of (p_pcred==NULL) checks.
o Add an XXX to authorization code in nfs_lock.c, since it's
questionable, and needs to be considered carefully.
o Simplify posix4 authorization code to require only credentials, not
processes and pcreds. Note that this authorization, as well as
CANSIGIO(), needs to be updated to use the p_cansignal() and
p_cansched() centralized authorization routines, as they currently
do not take into account some desirable restrictions that are handled
by the centralized routines, as well as being inconsistent with other
similar authorization instances.
o Update libkvm to take these changes into account.

Obtained from: TrustedBSD Project
Reviewed by: green, bde, jhb, freebsd-arch, freebsd-audit


# 9081e5e8 15-May-2001 John Baldwin <jhb@FreeBSD.org>

- Remove unneeded include of sys/ipl.h.
- Require the proc lock be held for killproc() to allow for the vmdaemon to
kill a process when memory is exhausted while holding the lock of the
process to kill.


# 3b26be6a 07-May-2001 Akinori MUSHA <knu@FreeBSD.org>

Properly copy the P_ALTSTACK flag in struct proc::p_flag to the child
process on fork(2).

It is the supposed behavior stated in the manpage of sigaction(2), and
Solaris, NetBSD and FreeBSD 3-STABLE correctly do so.

The previous fix against libc_r/uthread/uthread_fork.c fixed the
problem only for the programs linked with libc_r, so back it out and
fix fork(2) itself to help those not linked with libc_r as well.

PR: kern/26705
Submitted by: KUROSAWA Takahiro <fwkg7679@mb.infoweb.ne.jp>
Tested by: knu, GOTOU Yuuzou <gotoyuzo@notwork.org>,
and some other people
Not objected by: hackers
MFC in: 3 days


# 6caa8a15 27-Apr-2001 John Baldwin <jhb@FreeBSD.org>

Overhaul of the SMP code. Several portions of the SMP kernel support have
been made machine independent and various other adjustments have been made
to support Alpha SMP.

- It splits the per-process portions of hardclock() and statclock() off
into hardclock_process() and statclock_process() respectively. hardclock()
and statclock() call the *_process() functions for the current process so
that UP systems will run as before. For SMP systems, it is simply necessary
to ensure that all other processors execute the *_process() functions when the
main clock functions are triggered on one CPU by an interrupt. For the alpha
4100, clock interrupts are delievered in a staggered broadcast fashion, so
we simply call hardclock/statclock on the boot CPU and call the *_process()
functions on the secondaries. For x86, we call statclock and hardclock as
usual and then call forward_hardclock/statclock in the MD code to send an IPI
to cause the AP's to execute forwared_hardclock/statclock which then call the
*_process() functions.
- forward_signal() and forward_roundrobin() have been reworked to be MI and to
involve less hackery. Now the cpu doing the forward sets any flags, etc. and
sends a very simple IPI_AST to the other cpu(s). AST IPIs now just basically
return so that they can execute ast() and don't bother with setting the
astpending or needresched flags themselves. This also removes the loop in
forward_signal() as sched_lock closes the race condition that the loop worked
around.
- need_resched(), resched_wanted() and clear_resched() have been changed to take
a process to act on rather than assuming curproc so that they can be used to
implement forward_roundrobin() as described above.
- Various other SMP variables have been moved to a MI subr_smp.c and a new
header sys/smp.h declares MI SMP variables and API's. The IPI API's from
machine/ipl.h have moved to machine/smp.h which is included by sys/smp.h.
- The globaldata_register() and globaldata_find() functions as well as the
SLIST of globaldata structures has become MI and moved into subr_smp.c.
Also, the globaldata list is only available if SMP support is compiled in.

Reviewed by: jake, peter
Looked over by: eivind


# 33a9ed9d 23-Apr-2001 John Baldwin <jhb@FreeBSD.org>

Change the pfind() and zpfind() functions to lock the process that they
find before releasing the allproc lock and returning.

Reviewed by: -smp, dfr, jake


# 4c5eb9c3 11-Apr-2001 Robert Watson <rwatson@FreeBSD.org>

o Replace p_cankill() with p_cansignal(), remove wrappage of p_can()
from signal authorization checking.
o p_cansignal() takes three arguments: subject process, object process,
and signal number, unlike p_cankill(), which only took into account
the processes and not the signal number, improving the abstraction
such that CANSIGNAL() from kern_sig.c can now also be eliminated;
previously CANSIGNAL() special-cased the handling of SIGCONT based
on process session. privused is now deprecated.
o The new p_cansignal() further limits the set of signals that may
be delivered to processes with P_SUGID set, and restructures the
access control check to allow it to be extended more easily.
o These changes take into account work done by the OpenBSD Project,
as well as by Robert Watson and Thomas Moestl on the TrustedBSD
Project.

Obtained from: TrustedBSD Project


# 5b3047d5 02-Apr-2001 John Baldwin <jhb@FreeBSD.org>

Change stop() to require the sched_lock as well as p's process lock to
avoid silly lock contention on sched_lock since in 2 out of the 3 places
that we call stop(), we get sched_lock right after calling it and we were
locking sched_lock inside of stop() anyways.


# 13330476 02-Apr-2001 John Baldwin <jhb@FreeBSD.org>

- Move the second stop() of process 'p' in issignal() to be after we send
SIGCHLD to our parent process. Otherwise, we could block while obtaining
the process lock for our parent process and switch out while we were
in SSTOP. Even worse, when we try to resume from the mutex being blocked
on our p_stat will be SRUN, not SSTOP.
- Fix a comment above stop() to indicate that it requires that the proc lock
be held, not a proctree lock.

Reported by: markm
Sleuthing by: jake


# 1005a129 28-Mar-2001 John Baldwin <jhb@FreeBSD.org>

Convert the allproc and proctree locks from lockmgr locks to sx locks.


# c31146a1 28-Mar-2001 John Baldwin <jhb@FreeBSD.org>

- Resort some includes to deal with the new witness code coming in shortly.
- Make sure we have Giant locked before calling coredump() in sigexit().

Spotted by: peter (2)


# 628d2653 06-Mar-2001 John Baldwin <jhb@FreeBSD.org>

- Proc locking. Most of signal handling is now MP safe and doesn't require
Giant. The only exception is the CANSIGNAL() macro. Unlocking the proc
lock around sendsig() in trapsignal() is also questionable. Note that
the functions sigexit(), psignal(), and issignal() must be called with
the proc lock of the process in question held. postsig() and
trapsignal() should not be called with the proc lock held, but they
also do not require Giant anymore either.
- Remove spl's that are now no longer needed as they are fully replaced.


# d2ef4060 19-Feb-2001 Bruce Evans <bde@FreeBSD.org>

Fixed a longstanding latency bug in signal delivery. When a signal
is sent to a process, psignal() needs to schedule an AST for the
process if the process is runnable, not just if it is current, so that
pending signals get checked for on the next return of the process to
user mode. This wasn't practical until recently because the AST flag
was per-cpu so setting it for a non-current process would usually just
cause a bogus AST for the current process.

For non-current processes looping in user mode, it took accidental
(?) magic to deliver signals at all. Signals were usually delivered
late as a side effect of rescheduling (need_resched() sets astpending,
etc.). In pre-SMPng, delivery was delayed by at most 1 quantum (the
need_resched() call in roundrobin() is certain to occur within 1
quantum for looping processes). In -current, things are complicated
by normal interrupt handlers being threads. Missing handling of the
complications makes roundrobin() a bogus no-op, but preemptive
scheduling sort of works anyway due to even larger bogons elsewhere.


# d5a08a60 11-Feb-2001 Jake Burkholder <jake@FreeBSD.org>

Implement a unified run queue and adjust priority levels accordingly.

- All processes go into the same array of queues, with different
scheduling classes using different portions of the array. This
allows user processes to have their priorities propogated up into
interrupt thread range if need be.
- I chose 64 run queues as an arbitrary number that is greater than
32. We used to have 4 separate arrays of 32 queues each, so this
may not be optimal. The new run queue code was written with this
in mind; changing the number of run queues only requires changing
constants in runq.h and adjusting the priority levels.
- The new run queue code takes the run queue as a parameter. This
is intended to be used to create per-cpu run queues. Implement
wrappers for compatibility with the old interface which pass in
the global run queue structure.
- Group the priority level, user priority, native priority (before
propogation) and the scheduling class into a struct priority.
- Change any hard coded priority levels that I found to use
symbolic constants (TTIPRI and TTOPRI).
- Remove the curpriority global variable and use that of curproc.
This was used to detect when a process' priority had lowered and
it should yield. We now effectively yield on every interrupt.
- Activate propogate_priority(). It should now have the desired
effect without needing to also propogate the scheduling class.
- Temporarily comment out the call to vm_page_zero_idle() in the
idle loop. It interfered with propogate_priority() because
the idle process needed to do a non-blocking acquire of Giant
and then other processes would try to propogate their priority
onto it. The idle process should not do anything except idle.
vm_page_zero_idle() will return in the form of an idle priority
kernel thread which is woken up at apprioriate times by the vm
system.
- Update struct kinfo_proc to the new priority interface. Deliberately
change its size by adjusting the spare fields. It remained the same
size, but the layout has changed, so userland processes that use it
would parse the data incorrectly. The size constraint should really
be changed to an arbitrary version number. Also add a debug.sizeof
sysctl node for struct kinfo_proc.


# 142ba5f3 09-Feb-2001 John Baldwin <jhb@FreeBSD.org>

- Make astpending and need_resched process attributes rather than CPU
attributes. This is needed for AST's to be properly posted in a preemptive
kernel. They are backed by two new flags in p_sflag: PS_ASTPENDING and
PS_NEEDRESCHED. They are still accesssed by their old macros:
aston(), astoff(), etc. For completeness, an astpending() macro has been
added to check for a pending AST, and clear_resched() has been added to
clear need_resched().
- Rename syscall2() on the x86 back to syscall() to be consistent with
other architectures.


# 9ed346ba 08-Feb-2001 Bosko Milekic <bmilekic@FreeBSD.org>

Change and clean the mutex lock interface.

mtx_enter(lock, type) becomes:

mtx_lock(lock) for sleep locks (MTX_DEF-initialized locks)
mtx_lock_spin(lock) for spin locks (MTX_SPIN-initialized)

similarily, for releasing a lock, we now have:

mtx_unlock(lock) for MTX_DEF and mtx_unlock_spin(lock) for MTX_SPIN.
We change the caller interface for the two different types of locks
because the semantics are entirely different for each case, and this
makes it explicitly clear and, at the same time, it rids us of the
extra `type' argument.

The enter->lock and exit->unlock change has been made with the idea
that we're "locking data" and not "entering locked code" in mind.

Further, remove all additional "flags" previously passed to the
lock acquire/release routines with the exception of two:

MTX_QUIET and MTX_NOSWITCH

The functionality of these flags is preserved and they can be passed
to the lock/unlock routines by calling the corresponding wrappers:

mtx_{lock, unlock}_flags(lock, flag(s)) and
mtx_{lock, unlock}_spin_flags(lock, flag(s)) for MTX_DEF and MTX_SPIN
locks, respectively.

Re-inline some lock acq/rel code; in the sleep lock case, we only
inline the _obtain_lock()s in order to ensure that the inlined code
fits into a cache line. In the spin lock case, we inline recursion and
actually only perform a function call if we need to spin. This change
has been made with the idea that we generally tend to avoid spin locks
and that also the spin locks that we do have and are heavily used
(i.e. sched_lock) do recurse, and therefore in an effort to reduce
function call overhead for some architectures (such as alpha), we
inline recursion for this case.

Create a new malloc type for the witness code and retire from using
the M_DEV type. The new type is called M_WITNESS and is only declared
if WITNESS is enabled.

Begin cleaning up some machdep/mutex.h code - specifically updated the
"optimized" inlined code in alpha/mutex.h and wrote MTX_LOCK_SPIN
and MTX_UNLOCK_SPIN asm macros for the i386/mutex.h as we presently
need those.

Finally, caught up to the interface changes in all sys code.

Contributors: jake, jhb, jasone (in no particular order)


# 40447cd4 24-Jan-2001 John Baldwin <jhb@FreeBSD.org>

- Proc locking.
- Catch up to proc flag changes.


# 568ae39f 19-Jan-2001 John Baldwin <jhb@FreeBSD.org>

Revert revision 1.102. I don't think p_nice needs to be protected with
sched_lock, and I'm fairly certain P_TRACED will be protected with the
proc lock instead.

Pointed out indirectly by: bde


# 238510fc 15-Jan-2001 Jason Evans <jasone@FreeBSD.org>

Implement condition variables.


# 5192404a 05-Jan-2001 John Baldwin <jhb@FreeBSD.org>

Protect p_nice and P_TRACED in psignal() above the switch statement with
sched_lock.


# 3e6831f5 02-Jan-2001 John Baldwin <jhb@FreeBSD.org>

The previous commit wasn't entirely correct. At least one goto to the
out: label in psignal() did not grab sched_lock before trying to release
it. Also, the previous version had several cases where it grabbed
sched_lock before jumping to out: unneccessarily, so rework this a bit.
The runfast: and out: labels must be called with sched_lock released, and
the run: label must be called with it held. Appropriate mtx_assert()'s
have been added that should catch any bugs that may still be in this
code.

Noticed by: bde


# 4bfba0cf 31-Dec-2000 John Baldwin <jhb@FreeBSD.org>

Push down sched_lock in psignal(). sched_lock was being held across
recursive calls into psignal() as well as calls to signotify(),
forward_signal(), etc.


# ef829407 31-Dec-2000 John Baldwin <jhb@FreeBSD.org>

Add in a missing release of the proctree lock.

Submitted by: Sja <sakari.jalovaara@eqonline.fi>


# 98f03f90 23-Dec-2000 Jake Burkholder <jake@FreeBSD.org>

Protect proc.p_pptr and proc.p_children/p_sibling with the
proctree_lock.

linprocfs not locked pending response from informal maintainer.

Reviewed by: jhb, -smp@


# d96cfeae 16-Dec-2000 Marcel Moolenaar <marcel@FreeBSD.org>

Fix a typo that allowed signals caused by traps to be delivered
to the process when said signal is masked.

PR: 23457
Submitted by: Yasuhiko Watanabe <yasu@mrit.mei.co.jp>


# c0c25570 12-Dec-2000 Jake Burkholder <jake@FreeBSD.org>

- Change the allproc_lock to use a macro, ALLPROC_LOCK(how), instead
of explicit calls to lockmgr. Also provides macros for the flags
pased to specify shared, exclusive or release which map to the
lockmgr flags. This is so that the use of lockmgr can be easily
replaced with optimized reader-writer locks.
- Add some locking that I missed the first time.


# 1c32c37c 01-Dec-2000 John Baldwin <jhb@FreeBSD.org>

Protect p_stat with sched_lock.


# d034d459 29-Nov-2000 Marcel Moolenaar <marcel@FreeBSD.org>

Don't use p->p_sigstk.ss_flags to keep state of whether the
process is on the alternate stack or not. For compatibility
with sigstack(2) state is being updated if such is needed.

We now determine whether the process is on the alternate
stack by looking at its stack pointer. This allows a process
to siglongjmp from a signal handler on the alternate stack
to the place of the sigsetjmp on the normal stack. When
maintaining state, this would have invalidated the state
information and causing a subsequent signal to be delivered
on the normal stack instead of the alternate stack.

PR: 22286


# 553629eb 22-Nov-2000 Jake Burkholder <jake@FreeBSD.org>

Protect the following with a lockmgr lock:

allproc
zombproc
pidhashtbl
proc.p_list
proc.p_hash
nextpid

Reviewed by: jhb
Obtained from: BSD/OS and netbsd


# 7da6f977 17-Nov-2000 Jake Burkholder <jake@FreeBSD.org>

- Split the run queue and sleep queue linkage, so that a process
may block on a mutex while on the sleep queue without corrupting
it.
- Move dropping of Giant to after the acquire of sched_lock.

Tested by: John Hay <jhay@icomtek.csir.co.za>
jhb


# 20cdcc5b 15-Nov-2000 John Baldwin <jhb@FreeBSD.org>

Don't release and acquire Giant in mi_switch(). Instead, release and
acquire Giant as needed in functions that call mi_switch(). The releases
need to be done outside of the sched_lock to avoid potential deadlocks
from trying to acquire Giant while interrupts are disabled.

Submitted by: witness


# 806d7daa 09-Nov-2000 Marcel Moolenaar <marcel@FreeBSD.org>

Make MINSIGSTKSZ machine dependent, and have the sigaltstack
syscall compare against a variable sv_minsigstksz in struct
sysentvec as to properly take the size of the machine- and
ABI dependent struct sigframe into account.

The SVR4 and iBCS2 modules continue to have a minsigstksz of
8192 to preserve behavior. The real values (if different) are
not known at this time. Other ABI modules use the real
values.

The native MINSIGSTKSZ is now defined as follows:

Arch MINSIGSTKSZ
---- -----------
alpha 4096
i386 2048
ia64 12288

Reviewed by: mjacob
Suggested by: bde


# 35e0e5b3 20-Oct-2000 John Baldwin <jhb@FreeBSD.org>

Catch up to moving headers:
- machine/ipl.h -> sys/ipl.h
- machine/mutex.h -> sys/mutex.h


# 33510ef1 17-Sep-2000 Bruce Evans <bde@FreeBSD.org>

Unpessimized CURSIG(). The fast path through CURSIG() was broken in
the 128-bit sigset_t changes by moving conditionally (rarely) executed
code to the beginning where it is always executed, and since this code
now involves 3 128-bit operations, the pessimization was relatively
large. This change speeds up lmbench's pipe latency benchmark by
3.5%.

Fixed style bugs in CURSIG().


# fbbeeb6c 17-Sep-2000 Bruce Evans <bde@FreeBSD.org>

Uninlined CURSIG() and unpolluted <sys/signalvar.h>. CURSIG() had become
very bloated, first with 128-bit sigset_t's, then with locking in the
SMP case, then with locking in all cases. The space bloat was probably
also time bloat, partly because the fast path through CURSIG() was
pessimized by the sigset_t changes. This change speeds up lmbench's
pipe-based latency benchmark by 4% on a Celeron. <sys/signalvar.h>
had become very polluted to support the bloat.


# 36240ea5 10-Sep-2000 Doug Rabson <dfr@FreeBSD.org>

Move the include of <sys/systm.h> so that KTR gets a declaration for
snprintf().


# 0384fff8 06-Sep-2000 Jason Evans <jasone@FreeBSD.org>

Major update to the way synchronization is done in the kernel. Highlights
include:

* Mutual exclusion is used instead of spl*(). See mutex(9). (Note: The
alpha port is still in transition and currently uses both.)

* Per-CPU idle processes.

* Interrupts are run in their own separate kernel threads and can be
preempted (i386 only).

Partially contributed by: BSDi (BSD/OS)
Submissions by (at least): cp, dfr, dillon, grog, jake, jhb, sheldonh


# 387d2c03 29-Aug-2000 Robert Watson <rwatson@FreeBSD.org>

o Centralize inter-process access control, introducing:

int p_can(p1, p2, operation, privused)

which allows specification of subject process, object process,
inter-process operation, and an optional call-by-reference privused
flag, allowing the caller to determine if privilege was required
for the call to succeed. This allows jail, kern.ps_showallprocs and
regular credential-based interaction checks to occur in one block of
code. Possible operations are P_CAN_SEE, P_CAN_SCHED, P_CAN_KILL,
and P_CAN_DEBUG. p_can currently breaks out as a wrapper to a
series of static function checks in kern_prot, which should not
be invoked directly.

o Commented out capabilities entries are included for some checks.

o Update most inter-process authorization to make use of p_can() instead
of manual checks, PRISON_CHECK(), P_TRESPASS(), and
kern.ps_showallprocs.

o Modify suser{,_xxx} to use const arguments, as it no longer modifies
process flags due to the disabling of ASU.

o Modify some checks/errors in procfs so that ENOENT is returned instead
of ESRCH, further improving concealment of processes that should not
be visible to other processes. Also introduce new access checks to
improve hiding of processes for procfs_lookup(), procfs_getattr(),
procfs_readdir(). Correct a bug reported by bp concerning not
handling the CREATE case in procfs_lookup(). Remove volatile flag in
procfs that caused apparently spurious qualifier warnigns (approved by
bde).

o Add comment noting that ktrace() has not been updated, as its access
control checks are different from ptrace(), whereas they should
probably be the same. Further discussion should happen on this topic.

Reviewed by: bde, green, phk, freebsd-security, others
Approved by: bde
Obtained from: TrustedBSD Project


# 31c8f3f0 25-Aug-2000 Marcel Moolenaar <marcel@FreeBSD.org>

Make this file compile again when COMPAT_43 has not been
defined. This boils down to conditionally compile the
old signal syscalls.

We might want to extend the types in syscalls.master to
make these syscalls conditionally on something more
appropriate than COMPAT_43.


# f2a2857b 11-Jul-2000 Kirk McKusick <mckusick@FreeBSD.org>

Add snapshots to the fast filesystem. Most of the changes support
the gating of system calls that cause modifications to the underlying
filesystem. The gating can be enabled by any filesystem that needs
to consistently suspend operations by adding the vop_stdgetwritemount
to their set of vnops. Once gating is enabled, the function
vfs_write_suspend stops all new write operations to a filesystem,
allows any filesystem modifying system calls already in progress
to complete, then sync's the filesystem to disk and returns. The
function vfs_write_resume allows the suspended write operations to
begin again. Gating is not added by default for all filesystems as
for SMP systems it adds two extra locks to such critical kernel
paths as the write system call. Thus, gating should only be added
as needed.

Details on the use and current status of snapshots in FFS can be
found in /sys/ufs/ffs/README.snapshot so for brevity and timelyness
is not included here. Unless and until you create a snapshot file,
these changes should have no effect on your system (famous last words).


# e6796b67 03-Jul-2000 Kirk McKusick <mckusick@FreeBSD.org>

Move the truncation code out of vn_open and into the open system call
after the acquisition of any advisory locks. This fix corrects a case
in which a process tries to open a file with a non-blocking exclusive
lock. Even if it fails to get the lock it would still truncate the
file even though its open failed. With this change, the truncation
is done only after the lock is successfully acquired.

Obtained from: BSD/OS


# e3975643 25-May-2000 Jake Burkholder <jake@FreeBSD.org>

Back out the previous change to the queue(3) interface.
It was not discussed and should probably not happen.

Requested by: msmith and others


# 740a1973 23-May-2000 Jake Burkholder <jake@FreeBSD.org>

Change the way that the queue(3) structures are declared; don't assume that
the type argument to *_HEAD and *_ENTRY is a struct.

Suggested by: phk
Reviewed by: phk
Approved by: mdodd


# 2c9b67a8 30-Apr-2000 Poul-Henning Kamp <phk@FreeBSD.org>

Remove unneeded #include <vm/vm_zone.h>

Generated by: src/tools/tools/kerninclude


# cb679c38 16-Apr-2000 Jonathan Lemon <jlemon@FreeBSD.org>

Introduce kqueue() and kevent(), a kernel event notification facility.


# 7c8fdcbd 02-Apr-2000 Matthew Dillon <dillon@FreeBSD.org>

Make the sigprocmask() and geteuid() system calls MP SAFE. Expand
commentary for copyin/copyout to indicate that they are MP SAFE as
well.

Reviewed by: msmith


# db6a4261 28-Mar-2000 Matthew Dillon <dillon@FreeBSD.org>

The SMP cleanup commit broke UP compiles. Make UP compiles work again.


# 36e9f877 28-Mar-2000 Matthew Dillon <dillon@FreeBSD.org>

Commit major SMP cleanups and move the BGL (big giant lock) in the
syscall path inward. A system call may select whether it needs the MP
lock or not (the default being that it does need it).

A great deal of conditional SMP code for various deadended experiments
has been removed. 'cil' and 'cml' have been removed entirely, and the
locking around the cpl has been removed. The conditional
separately-locked fast-interrupt code has been removed, meaning that
interrupts must hold the CPL now (but they pretty much had to anyway).
Another reason for doing this is that the original separate-lock for
interrupts just doesn't apply to the interrupt thread mechanism being
contemplated.

Modifications to the cpl may now ONLY occur while holding the MP
lock. For example, if an otherwise MP safe syscall needs to mess with
the cpl, it must hold the MP lock for the duration and must (as usual)
save/restore the cpl in a nested fashion.

This is precursor work for the real meat coming later: avoiding having
to hold the MP lock for common syscalls and I/O's and interrupt threads.
It is expected that the spl mechanisms and new interrupt threading
mechanisms will be able to run in tandem, allowing a slow piecemeal
transition to occur.

This patch should result in a moderate performance improvement due to
the considerable amount of code that has been removed from the critical
path, especially the simplification of the spl*() calls. The real
performance gains will come later.

Approved by: jkh
Reviewed by: current, bde (exception.s)
Some work taken from: luoqi's patch


# e5a28db9 21-Mar-2000 Paul Saab <ps@FreeBSD.org>

Add sysctl kern.coredump to enable/disable core dumps system wide.


# 762e6b85 15-Dec-1999 Eivind Eklund <eivind@FreeBSD.org>

Introduce NDFREE (and remove VOP_ABORTOP)


# a9e0361b 21-Nov-1999 Poul-Henning Kamp <phk@FreeBSD.org>

Introduce the new function
p_trespass(struct proc *p1, struct proc *p2)
which returns zero or an errno depending on the legality of p1 trespassing
on p2.

Replace kern_sig.c:CANSIGNAL() with call to p_trespass() and one
extra signal related check.

Replace procfs.h:CHECKIO() macros with calls to p_trespass().

Only show command lines to process which can trespass on the target
process.


# da654d90 20-Nov-1999 Poul-Henning Kamp <phk@FreeBSD.org>

s/p_cred->pc_ucred/p_ucred/g


# 2e3c8fcb 16-Nov-1999 Poul-Henning Kamp <phk@FreeBSD.org>

This is a partial commit of the patch from PR 14914:

Alot of the code in sys/kern directly accesses the *Q_HEAD and *Q_ENTRY
structures for list operations. This patch makes all list operations
in sys/kern use the queue(3) macros, rather than directly accessing the
*Q_{HEAD,ENTRY} structures.

This batch of changes compile to the same object files.

Reviewed by: phk
Submitted by: Jake Burkholder <jake@checker.org>
PR: 14914


# 35a2598f 30-Oct-1999 Sean Eric Fagan <sef@FreeBSD.org>

Bail out of the process early if the coredumpfile limit is 0.

PR: kern/14540
Reviewed by: Nate Williams


# 6f841fb7 12-Oct-1999 Marcel Moolenaar <marcel@FreeBSD.org>

Don't let osigaction and osigvec accept the new signal numbers.

Fix style bugs caused by the sigset_t in general while I'm here.

Submitted by: bde


# 645682fd 11-Oct-1999 Luoqi Chen <luoqi@FreeBSD.org>

Add a per-signal flag to mark handlers registered with osigaction, so we
can provide the correct context to each signal handler.

Fix broken sigsuspend(): don't use p_oldsigmask as a flag, use SAS_OLDMASK
as we did before the linuxthreads support merge (submitted by bde).

Move ps_sigstk from to p_sigacts to the main proc structure since signal
stack should not be shared among threads.

Move SAS_OLDMASK and SAS_ALTSTACK flags from sigacts::ps_flags to proc::p_flag.
Move PS_NOCLDSTOP and PS_NOCLDWAIT flags from proc::p_flag to procsig::ps_flag.

Reviewed by: marcel, jdp, bde


# 2c42a146 29-Sep-1999 Marcel Moolenaar <marcel@FreeBSD.org>

sigset_t change (part 2 of 5)
-----------------------------

The core of the signalling code has been rewritten to operate
on the new sigset_t. No methodological changes have been made.
Most references to a sigset_t object are through macros (see
signalvar.h) to create a level of abstraction and to provide
a basis for further improvements.

The NSIG constant has not been changed to reflect the maximum
number of signals possible. The reason is that it breaks
programs (especially shells) which assume that all signals
have a non-null name in sys_signame. See src/bin/sh/trap.c
for an example. Instead _SIG_MAXSIG has been introduced to
hold the maximum signal possible with the new sigset_t.

struct sigprop has been moved from signalvar.h to kern_sig.c
because a) it is only used there, and b) access must be done
though function sigprop(). The latter because the table doesn't
holds properties for all signals, but only for the first NSIG
signals.

signal.h has been reorganized to make reading easier and to
add the new and/or modified structures. The "old" structures
are moved to signalvar.h to prevent namespace polution.

Especially the coda filesystem suffers from the change, because
it contained lines like (p->p_sigmask == SIGIO), which is easy
to do for integral types, but not for compound types.

NOTE: kdump (and port linux_kdump) must be recompiled.

Thanks to Garrett Wollman and Daniel Eischen for pressing the
importance of changing sigreturn as well.


# f3a6cf70 01-Sep-1999 Sean Eric Fagan <sef@FreeBSD.org>

Make prototype match function.


# fca666a1 31-Aug-1999 Julian Elischer <julian@FreeBSD.org>

General cleanup of core-dumping code.

Submitted by: Sean Fagan,


# c3aac50f 27-Aug-1999 Peter Wemm <peter@FreeBSD.org>

$Id$ -> $FreeBSD$


# 84300d62 23-Aug-1999 Martin Cracauer <cracauer@FreeBSD.org>

Fix a mistake in my last SA_SIGINFO commit. Processes could block
SIGKILL and SIGSTOP.

PR: kern/13293
Submitted by: dwmalone@maths.tcd.ie
Obtained from: PR had correct fix


# 87f1de5f 16-Aug-1999 Bill Fumerola <billf@FreeBSD.org>

expand_name:
use pid_t and uid_t in the declaration as that is what we are passed
fix printf formatters accordingly.

Reviewed by: green


# ce38ca0f 14-Aug-1999 Alfred Perlstein <alfred@FreeBSD.org>

Fix potential overflow, remove unnecessary bzero.
Pointed out by: green

remove redundant strlen, sprintf returns the length.

Reviewed by: peter


# 80e907a1 18-Jul-1999 Peter Wemm <peter@FreeBSD.org>

Reset SA_NOCLDWAIT on exec().

PR: kern/12669
Submitted by: Doug Ambrisko <ambrisko@whistle.com>


# aff66c54 06-Jul-1999 Martin Cracauer <cracauer@FreeBSD.org>

Implement SA_SIGINFO for i386. Thanks to Bruce Evans for much more
than a review, this was a nice puzzle.

This is supposed to be binary and source compatible with older
applications that access the old FreeBSD-style three arguments to a
signal handler.

Except those applications that access hidden signal handler arguments
bejond the documented third one. If you have applications that do,
please let me know so that we take the opportunity to provide the
functionality they need in a documented manner.

Also except application that use 'struct sigframe' directly. You need
to recompile gdb and doscmd. `make world` is recommended.

Example program that demonstrates how SA_SIGINFO and old-style FreeBSD
handlers (with their three args) may be used in the same process is at
http://www3.cons.org/tmp/fbsd-siginfo.c

Programs that use the old FreeBSD-style three arguments are easy to
change to SA_SIGINFO (although they don't need to, since the old style
will still work):

Old args to signal handler:
void handler_sn(int sig, int code, struct sigcontext *scp)

New args:
void handler_si(int sig, siginfo_t *si, void *third)
where:
old:code == new:second->si_code
old:scp == &(new:si->si_scp) /* Passed by value! */

The latter is also pointed to by new:third, but accessing via
si->si_scp is preferred because it is type-save.

FreeBSD implementation notes:
- This is just the framework to make the interface POSIX compatible.
For now, no additional functionality is provided. This is supposed
to happen now, starting with floating point values.
- We don't use 'sigcontext_t.si_value' for now (POSIX meant it for
realtime-related values).
- Documentation will be updated when new functionality is added and
the exact arguments passed are determined. The comments in
sys/signal.h are meant to be useful.

Reviewed by: BDE


# 3d177f46 03-May-1999 Bill Fumerola <billf@FreeBSD.org>

Add sysctl descriptions to many SYSCTL_XXXs

PR: kern/11197
Submitted by: Adrian Chadd <adrian@FreeBSD.org>
Reviewed by: billf(spelling/style/minor nits)
Looked at by: bde(style)


# 75c13541 28-Apr-1999 Poul-Henning Kamp <phk@FreeBSD.org>

This Implements the mumbled about "Jail" feature.

This is a seriously beefed up chroot kind of thing. The process
is jailed along the same lines as a chroot does it, but with
additional tough restrictions imposed on what the superuser can do.

For all I know, it is safe to hand over the root bit inside a
prison to the customer living in that prison, this is what
it was developed for in fact: "real virtual servers".

Each prison has an ip number associated with it, which all IP
communications will be coerced to use and each prison has its own
hostname.

Needless to say, you need more RAM this way, but the advantage is
that each customer can run their own particular version of apache
and not stomp on the toes of their neighbors.

It generally does what one would expect, but setting up a jail
still takes a little knowledge.

A few notes:

I have no scripts for setting up a jail, don't ask me for them.

The IP number should be an alias on one of the interfaces.

mount a /proc in each jail, it will make ps more useable.

/proc/<pid>/status tells the hostname of the prison for
jailed processes.

Quotas are only sensible if you have a mountpoint per prison.

There are no privisions for stopping resource-hogging.

Some "#ifdef INET" and similar may be missing (send patches!)

If somebody wants to take it from here and develop it into
more of a "virtual machine" they should be most welcome!

Tools, comments, patches & documentation most welcome.

Have fun...

Sponsored by: http://www.rndassociates.com/
Run for almost a year by: http://www.servetheweb.com/


# 88c5ea45 25-Jan-1999 Julian Elischer <julian@FreeBSD.org>

Enable Linux threads support by default.
This takes the conditionals out of the code that has been tested by
various people for a while.
ps and friends (libkvm) will need a recompile as some proc structure
changes are made.

Submitted by: "Richard Seaman, Jr." <dick@tar.com>


# 219cbf59 09-Jan-1999 Eivind Eklund <eivind@FreeBSD.org>

KNFize, by bde.


# 5526d2d9 08-Jan-1999 Eivind Eklund <eivind@FreeBSD.org>

Split DIAGNOSTIC -> DIAGNOSTIC, INVARIANTS, and INVARIANT_SUPPORT as
discussed on -hackers.

Introduce 'KASSERT(assertion, ("panic message", args))' for simple
check + panic.

Reviewed by: msmith


# 6626c604 18-Dec-1998 Julian Elischer <julian@FreeBSD.org>

Reviewed by: Luoqi Chen, Jordan Hubbard
Submitted by: "Richard Seaman, Jr." <lists@tar.com>
Obtained from: linux :-)

Code to allow Linux Threads to run under FreeBSD.

By default not enabled
This code is dependent on the conditional
COMPAT_LINUX_THREADS (suggested by Garret)
This is not yet a 'real' option but will be within some number of hours.


# 0bfe2990 01-Dec-1998 Eivind Eklund <eivind@FreeBSD.org>

Check return value of malloc() in expand_name.

Reviewed by: sef


# 831d27a9 11-Nov-1998 Don Lewis <truckman@FreeBSD.org>

Installed the second patch attached to kern/7899 with some changes suggested
by bde, a few other tweaks to get the patch to apply cleanly again and
some improvements to the comments.

This change closes some fairly minor security holes associated with
F_SETOWN, fixes a few bugs, and removes some limitations that F_SETOWN
had on tty devices. For more details, see the description on the PR.

Because this patch increases the size of the proc and pgrp structures,
it is necessary to re-install the includes and recompile libkvm,
the vinum lkm, fstat, gcore, gdb, ipfilter, ps, top, and w.

PR: kern/7899
Reviewed by: bde, elvind


# 2c2a0cf1 21-Oct-1998 John Polstra <jdp@FreeBSD.org>

Eliminate a superfluous comment.


# c2844275 14-Sep-1998 John Polstra <jdp@FreeBSD.org>

Remove includes that are no longer needed, now that the core dumping
code has been moved into the respective imgact_xxx.c sources.


# 22d4b0fb 13-Sep-1998 John Polstra <jdp@FreeBSD.org>

Add provisions for variant core dump file formats, depending on the
object format of the executable being dumped. This is the first
step toward producing ELF core dumps in the proper format. I will
commit the code to generate the ELF core dumps Real Soon Now. In
the meantime, ELF executables won't dump core at all. That is
probably no less useful than dumping a.out-style core dumps as they
have done until now.

Submitted by: Alex <garbanzo@hooked.net> (with very minor changes by me)


# 57308494 28-Jul-1998 Joerg Wunsch <joerg@FreeBSD.org>

Make the logging of abnormally exiting processes optional by a sysctl.
PR: kern/1711
Submitted by: Nick Sayer <nsayer@kfu.com>


# a23d65bf 14-Jul-1998 Bruce Evans <bde@FreeBSD.org>

Cast pointers to uintptr_t/intptr_t instead of to u_long/long,
respectively. Most of the longs should probably have been
u_longs, but this changes is just to prevent warnings about
casts between pointers and integers of different sizes, not
to fix poorly chosen types.


# c5edb423 08-Jul-1998 Sean Eric Fagan <sef@FreeBSD.org>

Add support for run-time configuration of core file names. In a nutshell,
you can specify the corefile name by using:

sysctl -w kern.corefile="format"

where format is a pathname (relative or absolute -- default is "%N.core"),
with "%N" (process name), "%P" (process ID), and "%U" (user ID) formats.

Reviewed by: Mike Smith, with strong requests by Julian :)


# c87e2930 28-Jun-1998 David Greenman <dg@FreeBSD.org>

Added a sysctl variable kern.sugid_coredump for controlling coredump
behavior of setuid/setgid binaries that defaults to 0 (coredump disabled).


# ecbb00a2 07-Jun-1998 Doug Rabson <dfr@FreeBSD.org>

This commit fixes various 64bit portability problems required for
FreeBSD/alpha. The most significant item is to change the command
argument to ioctl functions from int to u_long. This change brings us
inline with various other BSD versions. Driver writers may like to
use (__FreeBSD_version == 300003) to detect this change.

The prototype FreeBSD/alpha machdep will follow in a couple of days
time.


# 3163861c 03-Mar-1998 Tor Egge <tegge@FreeBSD.org>

Forward the signal if the process runs on a different CPU. This reduces
the signal handling latency for cpu-bound processes that performs very
few system calls.

The IPI for forcing an additional software trap is no longer dependent upon
BETTER_CLOCK being defined.


# 0b08f5f7 05-Feb-1998 Eivind Eklund <eivind@FreeBSD.org>

Back out DIAGNOSTIC changes.


# 47cfdb16 04-Feb-1998 Eivind Eklund <eivind@FreeBSD.org>

Turn DIAGNOSTIC into a new-style option.


# 5591b823d 16-Dec-1997 Eivind Eklund <eivind@FreeBSD.org>

Make COMPAT_43 and COMPAT_SUNOS new-style options.


# 2a024a2b 05-Dec-1997 Sean Eric Fagan <sef@FreeBSD.org>

Changes to allow event-based process monitoring and control.


# cb226aaa 06-Nov-1997 Poul-Henning Kamp <phk@FreeBSD.org>

Move the "retval" (3rd) parameter from all syscall functions and put
it in struct proc instead.

This fixes a boatload of compiler warning, and removes a lot of cruft
from the sources.

I have not removed the /*ARGSUSED*/, they will require some looking at.

libkvm, ps and other userland struct proc frobbing programs will need
recompiled.


# 245f17d4 13-Sep-1997 Joerg Wunsch <joerg@FreeBSD.org>

Implement SA_NOCLDWAIT.

The implementation is done (unlike what i've originally been
contemplating) by reparenting kids of processes that have the
appropriate bit set to PID 1, and let PID 1 handle the zombie. This
is far less problematical than what would seem to be ``doing it
right'', for a number of reasons.

Of our currently shipping PID-1-intended programs, 50 % fail the above
assumption. ;-) (Read this: sysinstall doesn't do it right. This is
no problem as long as no program called by sysinstall actually uses
SA_NOCLDWAIT.)

ToDo: . clarify the correct SA_* flag inheritance, compared
to other systems,
. decide whether the compat cruft (osigvec(9)) should
deal with new system additions or not,
. merge OpenBSD's SA_SIGINFO implementation. ;)
Reviewed by: bde


# e4ba6a82 02-Sep-1997 Bruce Evans <bde@FreeBSD.org>

Removed unused #includes.


# 8a2d9f50 25-Aug-1997 Bruce Evans <bde@FreeBSD.org>

Finished staticizing.


# 3ac4d1ef 22-Mar-1997 Bruce Evans <bde@FreeBSD.org>

Don't #include <sys/fcntl.h> in <sys/file.h> if KERNEL is defined.
Fixed everything that depended on getting fcntl.h stuff from the wrong
place. Most things don't depend on file.h stuff at all.


# 6875d254 22-Feb-1997 Peter Wemm <peter@FreeBSD.org>

Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not
ready for it yet.


# 996c772f 09-Feb-1997 John Dyson <dyson@FreeBSD.org>

This is the kernel Lite/2 commit. There are some requisite userland
changes, so don't expect to be able to run the kernel as-is (very well)
without the appropriate Lite/2 userland changes.

The system boots and can mount UFS filesystems.

Untested: ext2fs, msdosfs, NFS
Known problems: Incorrect Berkeley ID strings in some files.
Mount_std mounts will not work until the getfsent
library routine is changed.

Reviewed by: various people
Submitted by: Jeffery Hsu <hsu@freebsd.org>


# 1130b656 14-Jan-1997 Jordan K. Hubbard <jkh@FreeBSD.org>

Make the long-awaited change from $Id$ to $FreeBSD$

This will make a number of things easier in the future, as well as (finally!)
avoiding the Id-smashing problem which has plagued developers for so long.

Boy, I'm glad we're not using sup anymore. This update would have been
insane otherwise.


# e6eeb36d 29-Nov-1996 Bruce Evans <bde@FreeBSD.org>

Fixed sigaction() for SIGKILL and SIGSTOP. Reading the old action now
succeeds. Writing an action now succeeds iff the handler isn't changed.
(POSIX allows attempts to change the handler to be ignored or cause an
error. Changing other parts of the action is allowed (except attempts
to mask unmaskable signals are silently ignored as usual).)

Found by: NIST-PCTS


# 8713ad74 18-Oct-1996 David Greenman <dg@FreeBSD.org>

Kill unnecessary test in coredump() that wasn't removed in rev 1.19
when the check for P_SUGID was added.


# 3d1b21c6 09-Jul-1996 Andrey A. Chernov <ache@FreeBSD.org>

Log not exited signal only, but the fact that core dumped (or not) too


# a794e791 30-Apr-1996 Bruce Evans <bde@FreeBSD.org>

Removed unnecessary #includes from <sys/imgact.h> so that it is
self-sufficient and added explicit #includes where required.


# 289ccde0 30-Mar-1996 Peter Wemm <peter@FreeBSD.org>

Correct the handling of NOCLDSTOP when using sigvec()
Make the SA_NODEFER handling more correct, previously if you called
sigaction to set a handler and had SA_NODEFER set, and manually masked
the signal itself in sa_mask, and when you read the settings back later,
you'd find SA_NODEFER incorrectly cleared.

Pointed out by: bde


# dedc04fe 15-Mar-1996 Peter Wemm <peter@FreeBSD.org>

Actually implement SA_RESETHAND - some of the sigaction code recognised it
but didn't actually do anything with it (*blush*).

This should fix bde's test case where the test program set SA_RESETHAND
and when reading it back, it was gone.

Tweak/optimize SA_NODEFER so that the implementation is a little simpler
and does not incur (slight) overhead for every signal at delivery time.


# edbfedac 11-Mar-1996 Peter Wemm <peter@FreeBSD.org>

Import 4.4BSD-Lite2 onto the vendor branch, note that in the kernel, all
files are off the vendor branch, so this should not change anything.

A "U" marker generally means that the file was not changed in between
the 4.4Lite and Lite-2 releases, and does not need a merge. "C" generally
means that there was a change.
[note new unused (in this form) syscalls.conf, to be 'cvs rm'ed]


# b75356e1 10-Mar-1996 Jeffrey Hsu <hsu@FreeBSD.org>

From Lite2: proc LIST changes.
Reviewed by: david & bde


# 8674077a 10-Mar-1996 Jeffrey Hsu <hsu@FreeBSD.org>

From Lite2: change code parameter to u_long and initialize ps_sig.
Reviewed by: davidg & bde


# d66a5066 02-Mar-1996 Peter Wemm <peter@FreeBSD.org>

Mega-commit for Linux emulator update.. This has been stress tested under
netscape-2.0 for Linux running all the Java stuff. The scrollbars are now
working, at least on my machine. (whew! :-)

I'm uncomfortable with the size of this commit, but it's too
inter-dependant to easily seperate out.

The main changes:

COMPAT_LINUX is *GONE*. Most of the code has been moved out of the i386
machine dependent section into the linux emulator itself. The int 0x80
syscall code was almost identical to the lcall 7,0 code and a minor tweak
allows them to both be used with the same C code. All kernels can now
just modload the lkm and it'll DTRT without having to rebuild the kernel
first. Like IBCS2, you can statically compile it in with "options LINUX".

A pile of new syscalls implemented, including getdents(), llseek(),
readv(), writev(), msync(), personality(). The Linux-ELF libraries want
to use some of these.

linux_select() now obeys Linux semantics, ie: returns the time remaining
of the timeout value rather than leaving it the original value.

Quite a few bugs removed, including incorrect arguments being used in
syscalls.. eg: mixups between passing the sigset as an int, vs passing
it as a pointer and doing a copyin(), missing return values, unhandled
cases, SIOC* ioctls, etc.

The build for the code has changed. i386/conf/files now knows how
to build linux_genassym and generate linux_assym.h on the fly.

Supporting changes elsewhere in the kernel:

The user-mode signal trampoline has moved from the U area to immediately
below the top of the stack (below PS_STRINGS). This allows the different
binary emulations to have their own signal trampoline code (which gets rid
of the hardwired syscall 103 (sigreturn on BSD, syslog on Linux)) and so
that the emulator can provide the exact "struct sigcontext *" argument to
the program's signal handlers.

The sigstack's "ss_flags" now uses SS_DISABLE and SS_ONSTACK flags, which
have the same values as the re-used SA_DISABLE and SA_ONSTACK which are
intended for sigaction only. This enables the support of a SA_RESETHAND
flag to sigaction to implement the gross SYSV and Linux SA_ONESHOT signal
semantics where the signal handler is reset when it's triggered.

makesyscalls.sh no longer appends the struct sysentvec on the end of the
generated init_sysent.c code. It's a lot saner to have it in a seperate
file rather than trying to update the structure inside the awk script. :-)

At exec time, the dozen bytes or so of signal trampoline code are copied
to the top of the user's stack, rather than obtaining the trampoline code
the old way by getting a clone of the parent's user area. This allows
Linux and native binaries to freely exec each other without getting
trampolines mixed up.


# 729b1e51 30-Jan-1996 David Greenman <dg@FreeBSD.org>

Improved killproc() log message and made it and the other similar message
tolerant of p_ucred being invalid. Starting using killproc() where
appropriate.


# db6a20e2 03-Jan-1996 Garrett Wollman <wollman@FreeBSD.org>

Converted two options over to the new scheme: USER_LDT and KTRACE.


# 87b6de2b 14-Dec-1995 Poul-Henning Kamp <phk@FreeBSD.org>

A Major staticize sweep. Generates a couple of warnings that I'll deal
with later.
A number of unused vars removed.
A number of unused procs removed or #ifdefed.


# efeaf95a 06-Dec-1995 David Greenman <dg@FreeBSD.org>

Untangled the vm.h include file spaghetti.


# 7bcb7905 18-Nov-1995 Bruce Evans <bde@FreeBSD.org>

Cleaned up SA_NODEFER changes.

Added prototypes.


# d2d3e875 11-Nov-1995 Bruce Evans <bde@FreeBSD.org>

Included <sys/sysproto.h> to get central declarations for syscall args
structs and prototypes for syscalls.

Ifdefed duplicated decentralized declarations of args structs. It's
convenient to have this visible but they are hard to maintain. Some
are already different from the central declarations. 4.4lite2 puts
them in comments in the function headers but I wanted to avoid the
large changes for that.


# 1e41c1b5 19-Oct-1995 Steven Wallace <swallace@FreeBSD.org>

Implement SA_NODEFER sa_flag for sigaction():
Add SA_NODEFER define to signal.h
Add ps_nodefer field to struct sigacts in signalvar.h.
Add code to kern_sig.c to handle SA_NODEFER.

If flag is set, when the signal is delivered, it is not masked automatically
from receiving the same signal again.

Reviewed by: wollman, bde


# 9b2e5354 30-May-1995 Rodney W. Grimes <rgrimes@FreeBSD.org>

Remove trailing whitespace.


# b5e8ce9f 16-Mar-1995 Bruce Evans <bde@FreeBSD.org>

Add and move declarations to fix all of the warnings from `gcc -Wimplicit'
(except in netccitt, netiso and netns) and most of the warnings from
`gcc -Wnested-externs'. Fix all the bugs found. There were no serious
ones.


# aa98692f 28-Jan-1995 Andreas Schulz <ats@FreeBSD.org>

Correct a name of one structure member in the sigaltstack structure.
Now it matches the man page and also the only other commercial implementation
i have found so far ( Solaris 2.x).
Changed the name from ss_base to ss_sp.


# 08ee5d13 06-Nov-1994 Andrey A. Chernov <ache@FreeBSD.org>

Security nitpicking: don't make *.core world readable


# d93f860c 09-Oct-1994 Poul-Henning Kamp <phk@FreeBSD.org>

Cosmetics. related to getting prototypes into view.


# c364e17e 29-Sep-1994 Andrey A. Chernov <ache@FreeBSD.org>

Log SA_CORE signals
Obtained from: FreeBSD 1.x


# bb56ec4a 25-Sep-1994 Poul-Henning Kamp <phk@FreeBSD.org>

While in the real world, I had a bad case of being swapped out for a lot of
cycles. While waiting there I added a lot of the extra ()'s I have, (I have
never used LISP to any extent). So I compiled the kernel with -Wall and
shut up a lot of "suggest you add ()'s", removed a bunch of unused var's
and added a couple of declarations here and there. Having a lap-top is
highly recommended. My kernel still runs, yell at me if you kernel breaks.


# 0b53fbe8 19-Sep-1994 Bruce Evans <bde@FreeBSD.org>

Don't use SIG_DFL or SIG_IGN for case label expressions. ANSI requires
such expressions to have integral type. "gcc -ansi -pedantic -W..."
fails to diagnose this constraint error.


# 3c4dd356 02-Aug-1994 David Greenman <dg@FreeBSD.org>

Added $Id$


# 26f9a767 25-May-1994 Rodney W. Grimes <rgrimes@FreeBSD.org>

The big 4.4BSD Lite to FreeBSD 2.0.0 (Development) patch.

Reviewed by: Rodney W. Grimes
Submitted by: John Dyson and David Greenman


# df8bae1d 24-May-1994 Rodney W. Grimes <rgrimes@FreeBSD.org>

BSD 4.4 Lite Kernel Sources