History log of /freebsd-current/sys/kern/kern_sig.c
Revision	Date	Author	Comments
# 53186bc1	19-Apr-2024	Konstantin Belousov <kib@FreeBSD.org>	sigqueue(2): add impl-specific flag __SIGQUEUE_TID The flag allows the pid argument to designate a thread from the calling process. The flag value is carved from the high bit of the signal number, which slightly changes the ABI of syscall. Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D44867
# 2effad53	19-Apr-2024	Konstantin Belousov <kib@FreeBSD.org>	kern_thr.c/kern_sig.c: remove sys/cdefs.h Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D44867
# 6a4616a5	06-Apr-2024	Jake Freeland <jfree@FreeBSD.org>	ktrace: Record signal violations with KTR_CAPFAIL Report the delivery of signals to processes other than self while Capsicum violation tracing with CAPFAIL_SIGNAL. Reviewed by: markj Approved by: markj (mentor) MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D40679
# 05296a0f	06-Apr-2024	Jake Freeland <jfree@FreeBSD.org>	ktrace: Record syscall violations with KTR_CAPFAIL Report syscalls that are not allowed in capability mode with CAPFAIL_SYSCALL. Reviewed by: markj Approved by: markj (mentor) MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D40678
# ed410b78	28-Nov-2023	Konstantin Belousov <kib@FreeBSD.org>	EVFILT_SIGNAL: do not use target process pointer on detach It is enough to know knlist to remove from it, and the list is autodestroyed on last removal. PR: 275286 Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D42777
# 29363fb4	23-Nov-2023	Warner Losh <imp@FreeBSD.org>	sys: Remove ancient SCCS tags. Remove ancient SCCS tags from the tree, automated scripting, with two minor fixup to keep things compiling. All the common forms in the tree were removed with a perl script. Sponsored by: Netflix
# 0a713948	22-Nov-2023	Alexander Motin <mav@FreeBSD.org>	Replace random sbuf_printf() with cheaper cat/putc.
# 5804a162	25-Sep-2023	Konstantin Belousov <kib@FreeBSD.org>	nosys(): add kern.signosys tunable/sysctl to control SIGSYS Reviewed by: dchagin, markj Discussed with: jhb Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D41976
# b82b4ae7	25-Sep-2023	Konstantin Belousov <kib@FreeBSD.org>	sysentvec: add SV_SIGSYS flag to allow ABIs to indicate that SIGSYS is needed. Mark all native FreeBSD ABIs with the flag. This implicitly marks Linux' ABIs as not delivering SIGSYS on invalid syscall. Reviewed by: dchagin, markj Discussed with: jhb Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D41976
# 39024a89	25-Sep-2023	Konstantin Belousov <kib@FreeBSD.org>	syscalls: fix missing SIGSYS for several ENOSYS errors In particular, when the syscall number is too large, or when syscall is dynamic. For that, add nosys_sysent structure to pass fake sysent to syscall top code. Reviewed by: dchagin, markj Discussed with: jhb Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D41976
# 685dc743	16-Aug-2023	Warner Losh <imp@FreeBSD.org>	sys: Remove $FreeBSD$: one-line .c pattern Remove /^[\s]__FBSDID$"\$FreeBSD\$"$;?\s*\n/
# 1a6238d1	05-Aug-2023	Ed Maste <emaste@FreeBSD.org>	sigexit: add a break in default case Suggested by: markj Fixes: 6edbe5616c76 ("Provide some more information for...") Sponsored by: The FreeBSD Foundation
# 6edbe561	27-Oct-2017	Ed Maste <emaste@FreeBSD.org>	Provide some more information for userland core dumps Previously the log message indicated only "(core dumped)" if a core was successfully created, or nothing if it was not. This provides insufficient information to facilitate debugging. Dtrace is no help as coredump() is static and we cannot find the return value via fbt. Expand the log message to include error return value information. Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D39942
# dfe17248	20-Jul-2023	Konstantin Belousov <kib@FreeBSD.org>	sigtd(): prefer non-stopped thread as a target for signal queue This should improve signal delivery latency and better expose the process state to the executing threads. Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D41128
# aaa92413	20-Jul-2023	Konstantin Belousov <kib@FreeBSD.org>	Revert "killpg(): close a race with fork(), part 2" This reverts commits 81a37995c757b4e3ad8a5c699864197fd1ebdcf5 and 565a343ae3a30bc2973182ff8dfd2fa37d7f615f. There is still a leakage of the p_killpg_cnt, some but not all sources of which were identified. Second, and more important, is that there is a fundamental issue with blocked signals having KSI_KILLPG flag set. Queueing of such signal increments p_killpg_cnt, but it cannot be decremented until the signal is delivered. If, for instance, a single-threaded process with blocked signal receives killpg-kill and executes fork(2), the fork enter check returns with ERESTART. And since signal is blocked, the condition cannot be cleared. Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D41128
# 437e1e37	24-Jul-2023	Brooks Davis <brooks@FreeBSD.org>	kern_sig.c: include sys/jail.h per style(9) Fixes: e7228204343ac61d2316e74891e3606361a23f6e Sponsored by: DARPA
# 17cb2ac3	13-Jul-2023	Dmitry Chagin <dchagin@FreeBSD.org>	signal: Get rid of gsignal() as it is not used anywhere Reviewed by: imp, kib Differential Revision: https://reviews.freebsd.org/D41007 MFC after: 1 week
# 565a343a	09-Jul-2023	Konstantin Belousov <kib@FreeBSD.org>	sigqueue_delete_set_proc(): initialize sq_proc for worklist This should fix leaks for the p_killpg_cnt counter, because sigqueue_flush() drops ksi's. Sponsored by: The FreeBSD Foundation MFC after: 1 week
# 81a37995	15-Jun-2023	Konstantin Belousov <kib@FreeBSD.org>	killpg(): close a race with fork(), part 2 When we are sending terminating signal to the group, killpg() needs to guarantee that all group members are to be terminated (it does not need to ensure that they are terminated on return from killpg()). The pg_killsx change eliminates the largest window there, but still, if a multithreaded process is signalled, the following could happen: - thread 1 is selected for the signal delivery and gets descheduled - thread 2 waits for pg_killsx lock, obtains it and forks - thread 1 continue executing and terminates the process This scenario allows the child to escape still. To fix it, count the number of signals sent to the process with killpg(2), in p_killpg_cnt variable, which is incremented in killpg() and decremented after signal handler frame is created or in exit1() after single-threading. This way we avoid forking if the termination is due. Noted and reviewed by: markj (previous version) Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D40493
# 3360b485	12-Jun-2023	Konstantin Belousov <kib@FreeBSD.org>	killpg(2): close a race with fork(2), part1 If the process group member performs fork(), the child could escape signalling from killpg(). Prevent it by introducing an sx process group lock pg_killsx which is taken interruptibly shared around fork. If there is a pending signal, do the trip through userspace with ERESTART to handle signal ASTs. The lock is taken exclusively during killpg(). The lock is also locked exclusive when the process changes group membership, to avoid escaping a signal by this means, by ensuring that the process group is stable during fork. Note that the new lock is before proctree lock, so in some situations we could only do trylocking to obtain it. This relatively simple approach cannot work for REAP_KILL, because process potentially belongs to more than one reaper tree by having sub-reapers. Reported by: dchagin Tested by: dchagin, pho Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D40493
# 4b59d172	15-Jun-2023	Konstantin Belousov <kib@FreeBSD.org>	killpg1(): update the herald comment Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D40493
# 0d3f1b4f	01-Jun-2023	Mark Johnston <markj@FreeBSD.org>	signal: Make the signal disposition table const No functional change intended. MFC after: 1 week
# 974be51b	22-Dec-2022	Konstantin Belousov <kib@FreeBSD.org>	Fixes for ptrace_syscallreq() Re-assign the sc local (syscall number) before moving args for SYS_syscall. Correct the audit and kdtrace hooks invocations. Fixes: 140ceb5d956bb8795a77c23d3fd5ef047b0f3c68 Sponsored by: The FreeBSD Foundation MFC after: 1 week
# 140ceb5d	30-Nov-2022	Konstantin Belousov <kib@FreeBSD.org>	ptrace(2): add PT_SC_REMOTE remote syscall request Reviewed by: markj Discussed with: jhb Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D37590
# f0592b3c	30-Nov-2022	Konstantin Belousov <kib@FreeBSD.org>	Add a thread debugging flag TDB_BOUNDARY It indicates to a debugger that the thread is stopped at the kernel->user exit path. Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D37590
# e6feeae2	30-Nov-2022	Konstantin Belousov <kib@FreeBSD.org>	sys: rename td_coredump to td_remotereq and TDB_COREDUMPRQ to TDB_COREDUMPREQ Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D37590
# 69413598	10-Mar-2022	Mateusz Guzik <mjg@FreeBSD.org>	signal: use proc_iterate to save on work Most notably poudriere performs kill -9 -1 in jails for each port being built. This reduces the scan from hundreds of processes to literally 1. Reviewed by: jamie, markj Differential Revision: https://reviews.freebsd.org/D34522
# 0a4f2ac3	17-Aug-2022	Konstantin Belousov <kib@FreeBSD.org>	kern_sig.c: style Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D36207
# cdb58f9d	17-Aug-2022	Konstantin Belousov <kib@FreeBSD.org>	ksiginfo_tryfree(): change return type to bool The function result is already used as bool. Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D36207
# cc29f221	17-Aug-2022	Konstantin Belousov <kib@FreeBSD.org>	ksiginfo_alloc(): change to directly take M_WAITOK/NOWAIT flags Also style, and remove unneeded cast. Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D36207
# c53fec76	02-Aug-2022	Konstantin Belousov <kib@FreeBSD.org>	sig_suspend_threads(): remove 'sending' arg The TDA_AST flag is set on td2 unconditionally (as it was TDF_ASTPENDING before AST rework), so it is not used practically for some time. Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D36033
# f2fd7d8b	03-Aug-2022	Konstantin Belousov <kib@FreeBSD.org>	ast_sig(): add missed TDAI() Mask checked was completely wrong Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D36033
# 4fced864	22-Jul-2022	Konstantin Belousov <kib@FreeBSD.org>	sigfastblock_setpend() and fastblock_mask can be static now Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D35888
# c6d31b83	18-Jul-2022	Konstantin Belousov <kib@FreeBSD.org>	AST: rework Make most AST handlers dynamically registered. This allows to have subsystem-specific handler source located in the subsystem files, instead of making subr_trap.c aware of it. For instance, signal delivery code on return to userspace is now moved to kern_sig.c. Also, it allows to have some handlers designated as the cleanup (kclear) type, which are called both at AST and on thread/process exit. For instance, ast(), exit1(), and NFS server no longer need to be aware about UFS softdep processing. The dynamic registration also allows third-party modules to register AST handlers if needed. There is one caveat with loadable modules: the code does not make any effort to ensure that the module is not unloaded before all threads processed through AST handler in it. In fact, this is already present behavior for hwpmc.ko and ufs.ko. I do not think it is worth the efforts and the runtime overhead to try to fix it. Reviewed by: markj Tested by: emaste (arm64), pho Discussed with: jhb Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D35888
# 4493a13e	15-May-2022	Konstantin Belousov <kib@FreeBSD.org>	Do not single-thread itself when the process single-threaded some another process Since both self single-threading and remote single-threading rely on suspending the thread doing thread_single(), it cannot be mixed: thread doing thread_suspend_switch() might be subject to thread_suspend_one() and vice versa. In collaboration with: pho Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D35310
# 02a2aacb	26-May-2022	Konstantin Belousov <kib@FreeBSD.org>	issignal(): ignore signals when process is single-threading for exit Places that will wait for curproc->p_singlethr to become zero (in the next commit, the counter of number of external single-threading is to be introduced), must wait for it interruptible, otherwise we deadlock. On the other hand, a signal delivered during this window, if directed to the waiting thread, would cause the wait loop to become a busy loop. Since we are exiting, it is safe to ignore the signals. Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D35310
# d3000939	04-May-2022	Konstantin Belousov <kib@FreeBSD.org>	P2_WEXIT: avoid thread_single() for exiting process earlier before the process itself does thread_single(SINGLE_EXIT). We cannot single-thread such process in ALLPROC (external) mode, and properly detect and report the failure to do so due to the process becoming zombie is easier to prevent than handle. In collaboration with: pho Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D35310
# 4a700f3c	25-Apr-2022	Dmitry Chagin <dchagin@FreeBSD.org>	sigtimedwait: Prevent timeout math overflows. Our kern_sigtimedwait() calculates absolute sleep timo value as 'uptime+timeout'. So, when the user specifies a big timeout value (LONG_MAX), the calculated timo can be less than the current uptime value. In that case kern_sigtimedwait() returns EAGAIN instead of EINTR, if unblocked signal was caught. While here switch to a high-precision sleep method. Reviewed by: mav, kib In collaboration with: mav Differential revision: https://reviews.freebsd.org/D34981 MFC after: 2 weeks
# c5c981d4	18-Apr-2022	Mateusz Guzik <mjg@FreeBSD.org>	signals: plug a set-but-not-used var Sponsored by: Rubicon Communications, LLC ("Netgate")
# 863070bb	04-Mar-2022	Eric van Gyzen <vangyzen@FreeBSD.org>	ksiginfo_alloc: pass M_WAITOK or M_NOWAIT to uma_zalloc It expects exactly one of those flags. A future commit will assert this. Reviewed by: rstone MFC after: 1 month Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D34451
# bb92cd7b	24-Mar-2022	Mateusz Guzik <mjg@FreeBSD.org>	vfs: NDFREE(&nd, NDF_ONLY_PNBUF) -> NDFREE_PNBUF(&nd)
# a24afbb4	08-Jan-2022	Konstantin Belousov <kib@FreeBSD.org>	Ignore debugger-injected signals left after detaching PR: 261010 Reported by: Martin Simmons <martin@lispworks.com> Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D33787
# 753c8513	04-Jan-2022	John Baldwin <jhb@FreeBSD.org>	sigev_findtd: Fix whitespace nit in argument list. Obtained from: CheriBSD
# fe27f1db	25-Dec-2021	Alexander Motin <mav@FreeBSD.org>	kern: Remove CTLFLAG_NEEDGIANT from some sysctls. MFC after: 2 weeks
# 7e1d3eef	25-Nov-2021	Mateusz Guzik <mjg@FreeBSD.org>	vfs: remove the unused thread argument from NDINIT* See b4a58fbf640409a1 ("vfs: remove cn_thread") Bump __FreeBSD_version to 1400043.
# 3d277851	21-Oct-2021	Konstantin Belousov <kib@FreeBSD.org>	sig_ast_checksusp(): mark the local p as __diagused It is only used to assert that the (current) process is locked Sponsored by: The FreeBSD Foundation MFC after: 1 week
# 81f2e906	16-Oct-2021	Mark Johnston <markj@FreeBSD.org>	signal: Add SIG_FOREACH and refactor issignal() Add a SIG_FOREACH macro that can be used to iterate over a signal set. This is a bit cleaner and more efficient than calling sig_ffs() in a loop. The implementation is based on BIT_FOREACH_ISSET(), except that the bitset limbs are always 32 bits wide, and signal sets are 1-indexed rather than 0-indexed like bitset(9) sets. issignal() cannot really be modified to use SIG_FOREACH() directly. Take this opportunity to split the function into two explicit loops. I've always found this function hard to read and think that this change is an improvement. Remove sig_ffs(), nothing uses it now. Reviewed by: kib MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D32473
# 1adebca1	14-Oct-2021	Konstantin Belousov <kib@FreeBSD.org>	Style Sponsored by: The FreeBSD Foundation MFC after: 3 days
# 244ab566	04-Oct-2021	Konstantin Belousov <kib@FreeBSD.org>	Add curproc_sigkilled() Function returns an indicator that the process was killed with SIGKILL Reviewed by: imp, markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D32313
# dc2d0899	04-Oct-2021	Konstantin Belousov <kib@FreeBSD.org>	kern_sig.c: Remove unused SIGPROP_CANTMASK Reviewed by: imp, markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D32313
# 9b86d3e5	02-Oct-2021	Konstantin Belousov <kib@FreeBSD.org>	When queuing ignored signal, only abort target thread's sleep if it is inside sigwait() Reported and tested by: trasz Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D32252
# f17eb93d	01-Oct-2021	Konstantin Belousov <kib@FreeBSD.org>	When sending ignored signal, arrange for zero return code from sleep Otherwise consumers get unexpected EINTR errors without seeing a properly discarded signal. Reported and tested by: trasz Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D32252
# b599982b	02-Oct-2021	Konstantin Belousov <kib@FreeBSD.org>	Move td_pflags2 TDP2_SIGWAIT to td_flags TDF_SIGWAIT The flag should be accessible from non-current threads. Reviewed by: markj Tested by: trasz Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D32252
# cf0ee873	12-Sep-2021	Konstantin Belousov <kib@FreeBSD.org>	Drop cloudabi According to https://github.com/NuxiNL/cloudlibc: CloudABI is no longer being maintained. It was an awesome experiment, but it never got enough traction to be sustainable. There is no reason to keep it in FreeBSD. Approved by: ed (private mail) Reviewed by: emaste Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D31923
# c4feb1ab	16-Aug-2021	Mark Johnston <markj@FreeBSD.org>	sigtimedwait: Use a unique wait channel for sleeping When a sigtimedwait(2) caller goes to sleep, it uses a wait channel of p->p_sigacts with the proc lock as the interlock. However, p_sigacts can be shared between processes if a child is created with rfork(RFSIGSHARE \| RFPROC). Thus we can end up with two threads sleeping on the same wait channel using different locks, which is not permitted. Fix the problem simply by using a process-unique wait channel, following the example of sigsuspend. The actual wait channel value is irrelevant here, sleeping threads are awoken using sleepq_abort(). Reported by: syzbot+8c417afabadb50bb8827@syzkaller.appspotmail.com Reported by: syzbot+1d89fc2a9ef92ef64fa8@syzkaller.appspotmail.com Reviewed by: kib MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31563
# bc387624	05-Jun-2021	Konstantin Belousov <kib@FreeBSD.org>	Add a knob to not drop signal with default ignored or ignored actions Traditionally, BSD drops signals with the default action during send, not even putting them to the destination process queue. This semantic is not shared with other operating systems (Linux), which do queue such signals. In particular, sigtimedwait(2) and related syscalls can observe the delivery. Add a global knob kern.sig_discard_ign which can be set to false to force enqueuing of the signals with default action. Also add an ABI flag to indicate that signals should be queued. Note that it is not practical to run with the knob turned on, because almost all software that care about the delivery of such signals, is aware of the difference, and misbehaves if the signals are actually queued. The purpose of the knob as is is to allow for easier diagnostic of the programs that need the adjustments, to confirm the cause of problem. Reported by: dchagin Reviewed by: dchagin, markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D30675
# acced8b0	07-Jun-2021	Konstantin Belousov <kib@FreeBSD.org>	sigwait: add comment explaining EINTR/ERESTART details Reviewed by: dchagin, markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D30675
# afb36e28	06-Jun-2021	Konstantin Belousov <kib@FreeBSD.org>	sigwait(2) and sigtimedwait(2) must not be restarted. Reported by: dchagin Reviewed by: dchagin, markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D30675
# 87a64872	23-Apr-2021	Konstantin Belousov <kib@FreeBSD.org>	Add ptrace(PT_COREDUMP) It writes the core of live stopped process to the file descriptor provided as an argument. Based on the initial version from https://reviews.freebsd.org/D29691, submitted by Michał Górny <mgorny@gentoo.org>. Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D29955
# 68d311b6	24-Apr-2021	Konstantin Belousov <kib@FreeBSD.org>	ptracestop: mark threads suspended there with the new TDB_SSWITCH flag This way threads in ptracestop can be discovered by debugger Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D29955
# 2fd1ffef	05-Mar-2021	Konstantin Belousov <kib@FreeBSD.org>	Stop arming kqueue timers on knote owner suspend or terminate This way, even if the process specified very tight reschedule intervals, it should be stoppable/killable. Reported and reviewed by: markj Tested by: markj, pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D29106
# dc47fdf1	05-Mar-2021	Konstantin Belousov <kib@FreeBSD.org>	Stop arming periodic process timers on suspend or terminate Reported and reviewed by: markj Tested by: markj, pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D29106
# dbec10e0	12-Mar-2021	Jonathan T. Looney <jtl@FreeBSD.org>	Fetch the sigfastblock value in syscalls that wait for signals We have seen several cases of processes which have become "stuck" in kern_sigsuspend(). When this occurs, the kernel's td_sigblock_val is set to 0x10 (one block outstanding) and the userspace copy of the word is set to 0 (unblocked). Because the kernel's cached value shows that signals are blocked, kern_sigsuspend() blocks almost all signals, which means the process hangs indefinitely in sigsuspend(). It is not entirely clear what is causing this condition to occur. However, it seems to make sense to add some protection against this case by fetching the latest sigfastblock value from userspace for syscalls which will sleep waiting for signals. Here, the change is applied to kern_sigsuspend() and kern_sigtimedwait(). Reviewed by: kib Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D29225
# 513320c0	11-Jan-2021	Konstantin Belousov <kib@FreeBSD.org>	sigfastblock_setpend(): do not set PEND user flag unless TDP_SIGFASTPENDING is set. User pending bit should not be set if kernel did not note a pending signal. Reviewed by: markj Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D28089
# 5844bd05	28-Dec-2020	Konstantin Belousov <kib@FreeBSD.org>	jobc: rework detection of orphaned groups. Instead of trying to maintain pg_jobc counter on each process group update (and sometimes before), just calculate the counter when needed. Still, for the benefit of the signal delivery code, explicitly mark orphaned groups as such with the new process group flag. This way we prevent bugs in the corner cases where updates to the counter were missed due to complicated configuration of p_pptr/p_opptr/real_parent (debugger). Since we need to iterate over all children of the process on exit, this change mostly affects the process group entry and leave, where we need to iterate all process group members to detect orphaned status. (For MFC, keep pg_jobc around but unused). Reported by: jhb Reviewed by: jilles Tested by: pho MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D27871
# e0d83cd3	30-Dec-2020	Konstantin Belousov <kib@FreeBSD.org>	issignal(): when handling STOP-like signals, drop sigacts mutex earlier. Reviewed by: jilles Tested by: pho MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D27871
# 993a1699	30-Dec-2020	Konstantin Belousov <kib@FreeBSD.org>	Style. Improve some KASSERTs messages. Reviewed by: jilles Tested by: pho MFC after: 3 days Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D27871
# 203dda8a	08-Oct-2020	Konstantin Belousov <kib@FreeBSD.org>	sig_intr(9): return early if AST is not scheduled. Check td_flags for relevant AST requests lock-less. This opens the race slightly wider where sig_intr() returns false negative, but might be it is worth it. Requested by: mjg Sponsored by: The FreeBSD Foundation MFC after: 1 week
# 0400be45	04-Oct-2020	Konstantin Belousov <kib@FreeBSD.org>	Add sig_intr(9). It gives the answer would the thread sleep according to current state of signals and suspensions. Of course the answer is racy and allows for false-negatives (no sleep when signal is delivered after process lock is dropped). Also the answer might change due to signal rescheduling among threads in multi-threaded process. Still it is the best approximation I can provide, to answering the question was the thread interrupted. Reviewed by: markj Tested by: pho, rmacklem Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D26628
# 0c82fb26	04-Oct-2020	Konstantin Belousov <kib@FreeBSD.org>	Refactor sleepq_catch_signals(). - Extract suspension check into sig_ast_checksusp() helper. - Extract signal check and calculation of the interruption errno into sig_ast_needsigchk() helper. The helpers are moved to kern_sig.c which is the proper place for signal-related code. Improve control flow in sleepq_catch_signals(), to handle ret == 0 (can sleep) and ret != 0 (interrupted) only once, by separating checking code into sleepq_check_ast_sq_locked(), which return value is interpreted at single location. Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D26628
# 18f917a9	02-Sep-2020	Brooks Davis <brooks@FreeBSD.org>	Always report ENOSYS in init While rare, encountering an unimplemented system call early in init is catastrophic and difficult to debug. Even after a SIGSYS handler is registered, such configurations are problematic. As such, always report such events for pid 1 (following kern.lognosys if non-zero). Reviewed by: kevans, imp Obtained from: CheriBSD (plus suggestions from kevans) MFC after: 1 week Sponsored by: DARPA Differential Revision: https://reviews.freebsd.org/D26288
# 6fed89b1	01-Sep-2020	Mateusz Guzik <mjg@FreeBSD.org>	kern: clean up empty lines in .c and .h files
# feabaaf9	24-Aug-2020	Mateusz Guzik <mjg@FreeBSD.org>	cache: drop the always curthread argument from reverse lookup routines Note VOP_VPTOCNP keeps getting it as temporary compatibility for zfs. Tested by: pho
# 773e541e	20-Aug-2020	Warner Losh <imp@FreeBSD.org>	Use devctl.h instead of bus.h to reduce newbus pollution. There's no need for these parts of the kernel to know about newbus, so narrow what is included to devctl.h for device_notify_*. Suggested by: kib@
# e298466e	29-May-2020	Andriy Gapon <avg@FreeBSD.org>	corefile_open_last: don't keep a locked vnode while locking other ones Consider this scenario: - kern.corefile=/var/coredumps/%N.%U.%I.core - multiple processes with the same name crash at the same time It's possible that one process selects existing file N as oldvp while it keeps looking for an unused file number. Another process scans through files and stumbles upon N. That process would be blocked on the vnode lock while holding the directory vnode exclusively locked. The first process would, thus, get blocked on the directory's vnode lock. More generally, holding a file's vnode lock (oldvp) while trying to lock its directory (for the next lookup) is a violation of the vnode locking order. I have observed this deadlock in the wild. So, the change to keep oldvp "opened" but unlocked and to lock it again only if it's to be returned as the result. As kib noted, an alternative would be to keep the directory locked and to use VOP_LOOKUP directly for scanning through existing core files. Reviewed by: kib MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D25027
# fb3c434b	11-May-2020	Konstantin Belousov <kib@FreeBSD.org>	sigfastblock: fix delivery of the pending signals in single-threaded processes. If single-threaded process receives a signal during critical section established by sigfastblock(2) word, unblock did not cause signal delivery because sigfastblock(SIGFASTBLOCK_UNBLOCK) failed to request ast handling of the pending signals. Set TDF_ASTPENDING \| TDF_NEEDSIGCHK on unblock or when kernel forces end of sigfastblock critical section, to cause syscall exit to recheck and deliver any signal pending. Reported by: corydoras@ridiculousfish.com PR: 246385 Sponsored by: The FreeBSD Foundation
# 59838c1a	01-Apr-2020	John Baldwin <jhb@FreeBSD.org>	Retire procfs-based process debugging. Modern debuggers and process tracers use ptrace() rather than procfs for debugging. ptrace() has a superset of functionality available via procfs and new debugging features are only added to ptrace(). While the two debugging services share some fields in struct proc, they each use dedicated fields and separate code. This results in extra complexity to support a feature that hasn't been enabled in the default install for several years. PR: 244939 (exp-run) Reviewed by: kib, mjg (earlier version) Relnotes: yes Differential Revision: https://reviews.freebsd.org/D23837
# 00ebd809	10-Mar-2020	Konstantin Belousov <kib@FreeBSD.org>	Fix signal delivery might be on sigfastblock clearing. When clearing sigfastblock, either by sigfastblock(UNSETPTR) call or implicitly on execve(2), kernel must check for pending signals and reschedule them if needed. E.g. on execve, all other threads are terminated, and current thread fast block pointer is cleaned. If any signal was left pending, it can now be delivered to the current thread, and we should prepare for ast() on return to userspace to notice the signals. Reported and tested by: pho Sponsored by: The FreeBSD Foundation
# 0bc52b0b	10-Mar-2020	Konstantin Belousov <kib@FreeBSD.org>	Return reschedule_signals() to being static again. It was used after sigfastblock_setpend() call in ast() when current thread fast-blocks signals. Add a flag to sigfastblock_setpend() to request reschedule, and remove the direct use of the function from subr_trap.c Tested by: pho Sponsored by: The FreeBSD Foundation
# 7029da5c	26-Feb-2020	Pawel Biernacki <kaktus@FreeBSD.org>	Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many) r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are still not MPSAFE (or already are but aren’t properly marked). Use it in preparation for a general review of all nodes. This is non-functional change that adds annotations to SYSCTL_NODE and SYSCTL_PROC nodes using one of the soon-to-be-required flags. Mark all obvious cases as MPSAFE. All entries that haven't been marked as MPSAFE before are by default marked as NEEDGIANT Approved by: kib (mentor, blanket) Commented by: kib, gallatin, melifaro Differential Revision: https://reviews.freebsd.org/D23718
# fe20aaec	22-Feb-2020	Ryan Libby <rlibby@FreeBSD.org>	sys/kern: quiet -Wwrite-strings Quiet a variety of Wwrite-strings warnings in sys/kern at low-impact sites. This patch avoids addressing certain others which would need to plumb const through structure definitions. Reviewed by: kib, markj Differential Revision: https://reviews.freebsd.org/D23798
# a113b17f	20-Feb-2020	Konstantin Belousov <kib@FreeBSD.org>	Do not read sigfastblock word on syscall entry. On machines with SMAP, fueword executes two serializing instructions which can be seen in microbenchmarks. As a measure to restore microbenchmark numbers, only read the word on the attempt to deliver signal in ast(). If the word is set, signal is not delivered and word is kept, preventing interruption of interruptible sleeps by signals until userspace calls sigfastblock(UNBLOCK) which clears the word. This way, the spurious EINTR that userspace can see while in critical section is on first interruptible sleep, if a signal is pending, and on signal posting. It is believed that it is not important for rtld and libthr critical sections. It might be visible for the application code e.g. for the callback of dl_iterate_phdr(3), but again the belief is that the non-compliance is acceptable. Most important is that the retry of the sleeping syscall does not interrupt unless additional signal is posted. For now I added the knob kern.sigfastblock_fetch_always to enable the word read on syscall entry to be able to diagnose possible issues due to spurious EINTR. While there, do some code restructuring to have all sigfastblock() handling located in kern_sig.c. Reviewed by: jeff Discussed with: mjg Tested by: pho Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D23622
# 146fc63f	09-Feb-2020	Konstantin Belousov <kib@FreeBSD.org>	Add a way to manage thread signal mask using shared word, instead of syscall. A new syscall sigfastblock(2) is added which registers a uint32_t variable as containing the count of blocks for signal delivery. Its content is read by kernel on each syscall entry and on AST processing, non-zero count of blocks is interpreted same as the signal mask blocking all signals. The biggest downside of the feature that I see is that memory corruption that affects the registered fast sigblock location, would cause quite strange application misbehavior. For instance, the process would be immune to ^C (but killable by SIGKILL). With consumers (rtld and libthr added), benchmarks do not show a slow-down of the syscalls in micro-measurements, and macro benchmarks like buildworld do not demonstrate a difference. Part of the reason is that buildworld time is dominated by compiler, and clang already links to libthr. On the other hand, small utilities typically used by shell scripts have the total number of syscalls cut by half. The syscall is not exported from the stable libc version namespace on purpose. It is intended to be used only by our C runtime implementation internals. Tested by: pho Discussed with: cem, emaste, jilles Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D12773
# 7739d927	01-Feb-2020	Mateusz Guzik <mjg@FreeBSD.org>	cache: replace kern___getcwd with vn_getcwd The previous routine was resulting in extra data copies most notably in linux_getcwd.
# 3ff65f71	30-Jan-2020	Mateusz Guzik <mjg@FreeBSD.org>	Remove duplicated empty lines from kern/*.c No functional changes.
# b249ce48	03-Jan-2020	Mateusz Guzik <mjg@FreeBSD.org>	vfs: drop the mostly unused flags argument from VOP_UNLOCK Filesystems which want to use it in limited capacity can employ the VOP_UNLOCK_FLAGS macro. Reviewed by: kib (previous version) Differential Revision: https://reviews.freebsd.org/D21427
# 61a74c5c	15-Dec-2019	Jeff Roberson <jeff@FreeBSD.org>	schedlock 1/4 Eliminate recursion from most thread_lock consumers. Return from sched_add() without the thread_lock held. This eliminates unnecessary atomics and lock word loads as well as reducing the hold time for scheduler locks. This will eventually allow for lockless remote adds. Discussed with: kib Reviewed by: jhb Tested by: pho Differential Revision: https://reviews.freebsd.org/D22626
# 34ad5ac2	13-Dec-2019	Edward Tomasz Napierala <trasz@FreeBSD.org>	Add kern_kill() and use it in Linuxulator. It's just a cleanup, no functional changes. Reviewed by: kib MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D22645
# 0cc9fb75	07-Dec-2019	Konstantin Belousov <kib@FreeBSD.org>	Only return EPERM from kill(-pid) when no process was signalled. As mandated by POSIX. Also clarify the kill(2) manpage. While there, restructure the code in killpg1() to use helper which keeps overall state of the process list iteration in the killpg1_ctx structure, later used to infer the error returned. Reported by: amdmi3 Reviewed by: jilles Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D22621
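The error inference described in 0cc9fb75 can be illustrated with a small userland sketch. This is not the code from killpg1(); the struct, the function names (all prefixed `ex_`) and the per-process permission array are hypothetical stand-ins, and the rule shown is only the one stated in the log message: a permission error is reported only when no process was signalled.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical iteration state, loosely modelled on the idea of killpg1_ctx. */
struct ex_kill_ctx {
	bool found;	/* at least one candidate process was seen */
	bool sent;	/* at least one signal was actually delivered */
	int ret;	/* first permission error encountered, if any */
};

/* Visit one candidate; perm_err is 0 if signalling is allowed, else an errno. */
static void
ex_kill_visit(struct ex_kill_ctx *ctx, int perm_err)
{
	ctx->found = true;
	if (perm_err == 0)
		ctx->sent = true;
	else if (ctx->ret == 0)
		ctx->ret = perm_err;
}

/* Infer the overall result once the iteration is finished. */
static int
ex_kill_result(const struct ex_kill_ctx *ctx)
{
	if (ctx->sent)
		return (0);		/* something was signalled: success */
	if (!ctx->found)
		return (ESRCH);		/* nothing matched at all */
	return (ctx->ret != 0 ? ctx->ret : EPERM);
}

/* Run the whole iteration over an array of per-process permission results. */
static int
ex_kill_all(const int *perm_errs, size_t n)
{
	struct ex_kill_ctx ctx = { false, false, 0 };

	for (size_t i = 0; i < n; i++)
		ex_kill_visit(&ctx, perm_errs[i]);
	return (ex_kill_result(&ctx));
}
```

Keeping the state in one context structure means the final error is decided once, after the whole list has been walked, rather than by whichever process happened to be visited last.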
# ef401a85	27-Nov-2019	Konstantin Belousov <kib@FreeBSD.org>	Requested and tested by: kevans Reviewed by: kevans (previous version), markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D22546
# 079c5b9e	25-Sep-2019	Kyle Evans <kevans@FreeBSD.org>	rfork(2): add RFSPAWN flag When RFSPAWN is passed, rfork exhibits vfork(2) semantics but also resets signal handlers in the child during creation to avoid a point of corruption of parent state from the child. This flag will be used by posix_spawn(3) to handle potential signal issues. Reviewed by: jilles, kib Differential Revision: https://reviews.freebsd.org/D19058
# 7e097daa	11-Aug-2019	Konstantin Belousov <kib@FreeBSD.org>	Only enable COMPAT_43 changes for syscalls ABI for a.out processes. Reviewed by: imp, jhb Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D21200
# 91898857	29-Jul-2019	Mark Johnston <markj@FreeBSD.org>	Avoid relying on header pollution from sys/refcount.h. MFC after: 3 days Sponsored by: The FreeBSD Foundation
# d26d63a4	17-Jul-2019	Alan Somers <asomers@FreeBSD.org>	fusefs: multiple interruptibility improvements 1) Don't explicitly not mask SIGKILL. kern_sigprocmask won't allow it to be masked, anyway. 2) Fix an infinite loop bug. If a process received both a maskable signal lower than 9 (like SIGINT) and then received SIGKILL, fticket_wait_answer would spin. msleep would immediately return EINTR, but cursig would return SIGINT, so the sleep would get retried. Fix it by explicitly checking whether SIGKILL has been received. 3) Abandon the sig_isfatal optimization introduced by r346357. That optimization would cause fticket_wait_answer to return immediately, without waiting for a response from the server, if the process were going to exit anyway. However, it's vulnerable to a race: 1) fatal signal is received while fticket_wait_answer is sleeping. 2) fticket_wait_answer sends the FUSE_INTERRUPT operation. 3) fticket_wait_answer determines that the signal was fatal and returns without waiting for a response. 4) Another thread changes the signal to non-fatal. 5) The first thread returns to userspace. Instead of exiting, the process continues. 6) The application receives EINTR, wrongly believes that the operation was successfully interrupted, and restarts it. This could cause problems for non-idempotent operations like FUSE_RENAME. Reported by: kib (the race part) Sponsored by: The FreeBSD Foundation
# 9d3ecb7e	16-Jul-2019	Eric van Gyzen <vangyzen@FreeBSD.org>	Adds signal number format to kern.corefile Add format capability to core file names to include signal that generated the core. This can help various validation workflows where all cores should not be considered equally (SIGQUIT is often intentional and not an error unlike SIGSEGV or SIGBUS) Submitted by: David Leimbach (leimy2k@gmail.com) Reviewed by: markj MFC after: 1 week Relnotes: sysctl kern.corefile can now include the signal number Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D20970
# 89f2ab06	23-Jun-2019	Konstantin Belousov <kib@FreeBSD.org>	Switch to check for effective user id in r349320, and disable dumping into existing files for sugid processes. Despite using real user id pronounces the intent, it actually breaks suid coredumps, while not making any difference for non-sugid processes. The reason for the breakage is that non-existent core file is created with the effective uid (unless weird hacks like SUIDDIR are configured). Then, if user enabled kern.sugid_coredump, core dumping should not overwrite core files owned by effective uid, but we cannot pretend to use real uid for dumping. PR: 68905 admbugs: 358 Sponsored by: The FreeBSD Foundation MFC after: 1 week
# 7a29e0bf	23-Jun-2019	Konstantin Belousov <kib@FreeBSD.org>	coredump: avoid writing to core files not owned by the real user. Reported by: blake frantz <trew@hick.org> PR: 68905 admbugs: 358 Sponsored by: The FreeBSD Foundation MFC after: 1 week
# ab74c843	29-May-2019	Konstantin Belousov <kib@FreeBSD.org>	Do not go into sleep in sleepq_catch_signals() when SIGSTOP from PT_ATTACH was consumed. In particular, do not clear TDP_FSTP in ptracestop() if td_wchan is non-NULL. Leave it to sleepq_catch_signal() to clear and convert zero return code to EINTR. Otherwise, per submitter report, if the PT_ATTACH SIGSTOP was delivered right after the thread was added to the sleepqueue but not yet really sleep, and cursig() caused debugger attach, the thread sleeps instead of returning to the userspace boundary with EINTR. PR: 231445 Reported by: Efi Weiss <valmarelox@gmail.com> Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D20381
# 83bf5ec3	25-Apr-2019	John Baldwin <jhb@FreeBSD.org>	Remove p_code from struct proc. Contrary to the comments, it was never used by core dumps or debuggers. Instead, it used to hold the signal code of a pending signal, but that was replaced by the 'ksi_code' member of ksiginfo_t when signal information was reworked in 7.0. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D20047
# ebbfe00e	24-Apr-2019	Alan Somers <asomers@FreeBSD.org>	fusefs: interruptibility improvements suggested by kib * Block stop signals in fticket_wait_answer * Hold ps_mtx while checking signal disposition * style(9) changes PR: 346357 Reported by: kib Sponsored by: The FreeBSD Foundation
# 559966d6	21-Apr-2019	Alan Somers <asomers@FreeBSD.org>	fusefs: commit missing files from r346387 PR: 346357 Sponsored by: The FreeBSD Foundation
# de2d0d29	04-Jan-2019	Kristof Provost <kp@FreeBSD.org>	Remove unneeded NULL check for td_ucred td_ucred is always set, so we don't need the ternary expression to check for it.
# 4ae4822d	29-Dec-2018	Kristof Provost <kp@FreeBSD.org>	Simplify jail ID printing on process exit As suggested by kib@, we don't need to check p_ucred, because that's only NULL during process creation, and cr_prison is never NULL.
# af8becca	29-Dec-2018	Kristof Provost <kp@FreeBSD.org>	Make kernel print jail ID when logging a process exit Kernel now includes jail ID when logging a process exit. jid is 0 for unjailed processes. Submitted by: Marie Helene Kvello-Aune <freebsd@mhka.no> Relnotes: yes Sponsored by: Modirum MDPay Differential Revision: https://reviews.freebsd.org/D18618
# c5786670	10-Dec-2018	John Baldwin <jhb@FreeBSD.org>	Don't report stale signal information for non-signal events in ptrace_lwpinfo. Once a signal's siginfo was copied to 'td_si' as part of the signal exchange in issignal(), it was never cleared. This caused future thread events that are reported as SIGTRAP events without signal information to report the stale siginfo in 'td_si'. For example, if a debugger created a new process and used SIGSTOP to stop it after PT_ATTACH, future system call entry / exit events would set PL_FLAG_SI with the SIGSTOP siginfo in pl_siginfo. This broke 'catch syscall' in current versions of gdb as it assumed PL_FLAG_SI with SIGTRAP indicates a breakpoint or single step trap. Reviewed by: kib MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D18487
# affd9185	27-Nov-2018	Konstantin Belousov <kib@FreeBSD.org>	Improve sigonstack(). Avoid relying on unsigned overflow for the test. Simplify expressions to avoid duplicate check for the range. Style. Add herald comment. Reviewed by: jhb Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D18361
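For readers unfamiliar with the kind of test affd9185 refers to, here is a minimal sketch of a stack-range check that states its lower bound explicitly instead of relying on an unsigned subtraction wrapping around. The function name and parameters are hypothetical and this is not the kernel's sigonstack(); it only illustrates the general technique.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: is sp inside the half-open range [base, base + size)?
 */
static bool
ex_on_altstack(uintptr_t sp, uintptr_t base, size_t size)
{
	/* Check the lower bound first, so that sp - base cannot wrap around. */
	if (sp < base)
		return (false);
	/* sp >= base here, so the difference is a plain in-range offset. */
	return (sp - base < size);
}
```

Comparing the offset against the size, rather than computing `base + size`, also avoids an overflow when the range ends near the top of the address space.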
# a70e9a13	04-Aug-2018	Konstantin Belousov <kib@FreeBSD.org>	Swap in WKILLED processes. Swapped-out process that is WKILLED must be swapped in as soon as possible. The reason is that such process can be killed by OOM and its pages can be only freed if the process exits. To exit, the kernel stack of the process must be mapped. When allocating pages for the stack of the WKILLED process on swap in, use VM_ALLOC_SYSTEM requests to increase the chance of the allocation to succeed. Add counter of the swapped out processes to avoid unneeded iteration over the allprocs list when there is no work to do, reducing the allproc_lock ownership. Reviewed by: alc, markj (previous version) Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D16489
# 6040822c	30-Jul-2018	Alan Somers <asomers@FreeBSD.org>	Make timespecadd(3) and friends public The timespecadd(3) family of macros were imported from NetBSD back in r35029. However, they were initially guarded by #ifdef _KERNEL. In the meantime, we have grown at least 28 syscalls that use timespecs in some way, leading many programs both inside and outside of the base system to redefine those macros. It's better just to make the definitions public. Our kernel currently defines two-argument versions of timespecadd and timespecsub. NetBSD, OpenBSD, and FreeDesktop.org's libbsd, however, define three-argument versions. Solaris also defines a three-argument version, but only in its kernel. This revision changes our definition to match the common three-argument version. Bump _FreeBSD_version due to the breaking KPI change. Discussed with: cem, jilles, ian, bde Differential Revision: https://reviews.freebsd.org/D14725
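The three-argument form mentioned in 6040822c takes two inputs and a separate result pointer. The sketch below writes it as a function with a hypothetical name (`ex_timespecadd3`) so that it does not clash with any system definition; it assumes both inputs are already normalized, that is, `tv_nsec` is in the range 0 to 999999999.

```c
#include <time.h>

/*
 * Hypothetical three-argument add: *vsp = *tsp + *usp.
 * Assumes both inputs are normalized.
 */
static void
ex_timespecadd3(const struct timespec *tsp, const struct timespec *usp,
    struct timespec *vsp)
{
	vsp->tv_sec = tsp->tv_sec + usp->tv_sec;
	vsp->tv_nsec = tsp->tv_nsec + usp->tv_nsec;
	/* At most one carry is needed because each input is below 1e9 ns. */
	if (vsp->tv_nsec >= 1000000000L) {
		vsp->tv_sec++;
		vsp->tv_nsec -= 1000000000L;
	}
}
```

With a separate result argument, a caller can keep both operands unchanged, which the older two-argument style (accumulating into the first operand) does not allow.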
# f1fe1e02	15-Jul-2018	Mariusz Zaborski <oshogbo@FreeBSD.org>	Extend amount of possible coredumps from 10 to 100000 when using index format. The amount of digits in the name of corefile is assigned dynamically. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D16118
# 6cad1a5d	04-Jul-2018	Mariusz Zaborski <oshogbo@FreeBSD.org>	Add description to debug.ncores sysctl. Reviewed by: bcr Differential Revision: https://reviews.freebsd.org/D16123
# 0dea6e3c	01-Jul-2018	Mariusz Zaborski <oshogbo@FreeBSD.org>	core(5): overwrite the oldest core dump The '%I' format in the kern.corefile sysctl limits the number of core files that a process can generate to the number stored in the debug.ncores sysctl. The '%I' format is replaced by the single digit index. Previously, if all indexes were taken the kernel would overwrite only a core file with the highest index in a filename. Currently the system will create a new core file if there is a free index or if all slots are taken it will overwrite the oldest one. Reviewed by: kib(code), bcr (updating) Differential Revision: https://reviews.freebsd.org/D15991 Differential Revision: https://reviews.freebsd.org/D16084
# 349fcda4	26-Jun-2018	Warner Losh <imp@FreeBSD.org>	Fix devctl generation for core files. We have a problem with vn_fullpath_global when the file exists. Work around it by printing the full path if the core file name starts with /, or current working directory followed by the filename if not. Sponsored by: Netflix Differential Review: https://reviews.freebsd.org/D16026
# 6e22bbf6	21-Jun-2018	Konstantin Belousov <kib@FreeBSD.org>	fork: avoid endless wait with PTRACE_FORK and RFSTOPPED. An RFSTOPPED thread can't clean TDB_STOPATFORK, which is done in the fork_return() in its context, so parent is stuck forever. Triggered when trying to ptrace linux process. Instead of waiting for the new thread to clear TDB_STOPATFORK, tag it as traced and reparent to the debugger in do_fork(), and let it only notify the debugger when run. Submitted by: Yanko Yankulov <yanko.yankulov@gmail.com> Reviewed by: jhb MFC after: 1 week X-MFC-Note: keep p_dbgwait placeholder intact Differential revision: https://reviews.freebsd.org/D15857
# ddd4d15e	18-May-2018	Matt Macy <mmacy@FreeBSD.org>	signotify: don't create a stack local that isn't used on non-debug builds
# cbd92ce6	09-May-2018	Matt Macy <mmacy@FreeBSD.org>	Eliminate the overhead of gratuitous repeated reinitialization of cap_rights - Add macros to allow preinitialization of cap_rights_t. - Convert most commonly used code paths to use preinitialized cap_rights_t. A 3.6% speedup in fstat was measured with this change. Reported by: mjg Reviewed by: oshogbo Approved by: sbruno MFC after: 1 month
# 6469bdcd	06-Apr-2018	Brooks Davis <brooks@FreeBSD.org>	Move most of the contents of opt_compat.h to opt_global.h. opt_compat.h is mentioned in nearly 180 files. In-progress network driver compatibility improvements may add over 100 more so this is closer to "just about everywhere" than "only some files" per the guidance in sys/conf/options. Keep COMPAT_LINUX32 in opt_compat.h as it is confined to a subset of sys/compat/linux/*.c. A fake _COMPAT_LINUX option ensure opt_compat.h is created on all architectures. Move COMPAT_LINUXKPI to opt_dontuse.h as it is only used to control the set of compiled files. Reviewed by: kib, cem, jhb, jtl Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D14941
# fb441a88	27-Mar-2018	Konstantin Belousov <kib@FreeBSD.org>	Fix several leaks of kernel stack data through paddings. It is random collection of fixes for issues not yet corrected, reported at https://tsyrklevi.ch/clang_analyzer/freebsd_013017/. Many issues from that list were already corrected. Most of them are for compat32, old compat32 or affect both primary host ABI and compat32. The freebsd32_kldstat(), for instance, was already fixed by using malloc(M_ZERO). Patch includes correction to report the supplied version back, which is just pedantic. Reviewed by: brooks, emaste (previous version) Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D14868
# 681a1b75	20-Feb-2018	Mateusz Guzik <mjg@FreeBSD.org>	Make killpg1 perform process validity checks without proc lock held.
# 6026dcd7	13-Feb-2018	Mark Johnston <markj@FreeBSD.org>	Add support for zstd-compressed user and kernel core dumps. This works similarly to the existing gzip compression support, but zstd is typically faster and gives better compression ratios. Support for this functionality must be configured by adding ZSTDIO to one's kernel configuration file. dumpon(8)'s new -Z option is used to configure zstd compression for kernel dumps. savecore(8) now recognizes and saves zstd-compressed kernel dumps with a .zst extension. Submitted by: cem (original version) Relnotes: yes Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D13101, https://reviews.freebsd.org/D13633
# 78f57a9c	08-Jan-2018	Mark Johnston <markj@FreeBSD.org>	Generalize the gzio API. We currently use a set of subroutines in kern_gzio.c to perform compression of user and kernel core dumps. In the interest of adding support for other compression algorithms (zstd) in this role without complicating the API consumers, add a simple compressor API which can be used to select an algorithm. Also change the (non-default) GZIO kernel option to not enable compressed user cores by default. It's not clear that such a default would be desirable with support for multiple algorithms implemented, and it's inconsistent in that it isn't applied to kernel dumps. Reviewed by: cem Differential Revision: https://reviews.freebsd.org/D13632
# 51369649	20-Nov-2017	Pedro F. Giffuni <pfg@FreeBSD.org>	sys: further adoption of SPDX licensing ID tags. Mainly focus on files that use BSD 3-Clause license. The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known opensource licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, supersede or replace the license texts. Special thanks to Wind River for providing access to "The Duke of Highlander" tool: an older (2014) run over FreeBSD tree was useful as a starting point.
# 537d0fb1	11-Nov-2017	Mateusz Guzik <mjg@FreeBSD.org>	Use pfind_any in linux_rt_sigqueueinfo and kern_sigqueue
# 6e1619da	11-Nov-2017	Mateusz Guzik <mjg@FreeBSD.org>	Add pfind_any It looks for both regular and zombie processes. This avoids allproc relocking previously seen with pfind -> zpfind calls.
# e9445808	16-Oct-2017	Konstantin Belousov <kib@FreeBSD.org>	Re-evaluate thread's signal mask after ptracestop(). The stop drops process lock, which allows the signal mask to be changed and our selected signal might become blocked, i.e. should be returned to the process queue instead of delivery. Also, for the existing check of the process no longer having an attached debugger, we should not lose the signal, but requeue it. Reported and tested by: bdrewery Reviewed by: jhb Sponsored by: The FreeBSD Foundation MFC after: 1 week
# cd735d8f	16-Oct-2017	Konstantin Belousov <kib@FreeBSD.org>	Improve assertion that an ignored or blocked signal is not delivered. Split two conditions into separate asserts. Print additional details, like the signal number and action value. Reviewed by: jhb Sponsored by: The FreeBSD Foundation MFC after: 1 week
# 0167b33b	16-Oct-2017	Konstantin Belousov <kib@FreeBSD.org>	Style. Sponsored by: The FreeBSD Foundation MFC after: 1 week
# 7ceeb35b	27-Jul-2017	Konstantin Belousov <kib@FreeBSD.org>	Make it possible to request nosys logging to console. New kern.lognosys values are 1 - log to ctty 2 - log to console 3 - log to both. Inspired by: eugen Sponsored by: The FreeBSD Foundation MFC after: 1 week
# f5a077c3	12-Jun-2017	Konstantin Belousov <kib@FreeBSD.org>	Print unimplemented syscall number to the ctty on SIGSYS, if enabled by the knob kern.lognosys. Discussed with: imp Reviewed by: jhb Sponsored by: The FreeBSD Foundation MFC after: 3 weeks X-Differential revision: https://reviews.freebsd.org/D11080
# 3e85b721	16-May-2017	Ed Maste <emaste@FreeBSD.org>	Remove register keyword from sys/ and ANSIfy prototypes A long long time ago the register keyword told the compiler to store the corresponding variable in a CPU register, but it is not relevant for any compiler used in the FreeBSD world today. ANSIfy related prototypes while here. Reviewed by: cem, jhb Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D10193
# 396a0d44	12-May-2017	Konstantin Belousov <kib@FreeBSD.org>	Do not wake up sleeping thread in reschedule_signals() if the signal is blocked. The spurious wakeup might result in spurious EINTR. The reschedule_signals() function is called when the calling thread has the signal mask changed. For each newly blocked signal, we try to find a thread which might have the signal not blocked. If no such thread exists, sigtd() returns random thread, which must not be waken up. I decided that re-checking, as suggested by PR submitter, is more reasonable change than to change sigtd() interface, due to other uses of sigtd(). signotify() already performs this check. Submitted by: Duane <parakleta@darkreality.org> PR: 219228 Sponsored by: The FreeBSD Foundation MFC after: 1 week
# 0c46712c	11-May-2017	Mark Johnston <markj@FreeBSD.org>	Let ptracestop() suspend threads sleeping in an SBDRY section. When a thread enters ptracestop(), for example because it had received SIGSTOP from ptrace(PT_ATTACH), it attempts to suspend other threads in the same process. In the case of a thread sleeping interruptibly in an SBDRY section, sig_suspend_threads() must wake the thread and allow it to reach the user-mode boundary. However, sig_suspend_threads() would erroneously avoid waking up such threads, resulting in an apparent hang. Reviewed by: kib Tested by: pho MFC after: 2 weeks Sponsored by: Dell EMC Isilon
# f19351aa	05-May-2017	Brooks Davis <brooks@FreeBSD.org>	Provide a freebsd32 implementation of sigqueue() The previous misuse of sys_sigqueue() was sending random register or stack garbage to 64-bit targets. The freebsd32 implementation preserves the sival_int member of value when signaling a 64-bit process. Document the mixed ABI implementation of union sigval and the incompatibility of sival_ptr with pointer integrity schemes. Reviewed by: kib, wblock MFC after: 1 week Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D10605
# 86be94fc	30-Mar-2017	Tycho Nightingale <tychon@FreeBSD.org>	Add support for capturing 'struct ptrace_lwpinfo' for signals resulting in a process dumping core in the corefile. Also extend procstat to view select members of 'struct ptrace_lwpinfo' from the contents of the note. Sponsored by: Dell EMC Isilon
# 469ec1eb	17-Mar-2017	Konstantin Belousov <kib@FreeBSD.org>	When clearing altsigstack settings on exec, do it to the right thread. Diagnosed by: smh Sponsored by: The FreeBSD Foundation MFC after: 1 week
# b4d33259	16-Mar-2017	Eric Badger <badger@FreeBSD.org>	Don't clear p_ptevents on normal SIGKILL delivery The ptrace() user has the option of discarding the signal. In such a case, p_ptevents should not be modified. If the ptrace() user decides to send a SIGKILL, ptevents will be cleared in ptracestop(). procfs events do not have the capability to discard the signal, so continue to clear the mask in that case. Reviewed by: jhb (initial revision) MFC after: 1 week Sponsored by: Dell EMC Differential Revision: https://reviews.freebsd.org/D9939
# b38bd91f	07-Mar-2017	Eric Badger <badger@FreeBSD.org>	don't stop in issignal() if P_SINGLE_EXIT is set Suppose a traced process is stopped in ptracestop() due to receipt of a SIGSTOP signal, and is awaiting orders from the tracing process on how to handle the signal. Before sending any such orders, the tracing process exits. This should kill the traced process. But suppose a second thread handles the SIGKILL and proceeds to exit1(), calling thread_single(). The first thread will now awaken and will have a chance to check once more if it should go to sleep due to the SIGSTOP. It must not sleep after P_SINGLE_EXIT has been set; this would prevent the SIGKILL from taking effect, leaving a stopped orphan behind after the tracing process dies. Also add new tests for this condition. Reviewed by: kib MFC after: 2 weeks Sponsored by: Dell EMC Differential Revision: https://reviews.freebsd.org/D9890
# e052a8b9	02-Mar-2017	Ed Maste <emaste@FreeBSD.org>	kern_sig.c: ANSIfy and remove archaic register keyword Sponsored by: The FreeBSD Foundation
# 82a4538f	20-Feb-2017	Eric Badger <badger@FreeBSD.org>	Defer ptracestop() signals that cannot be delivered immediately When a thread is stopped in ptracestop(), the ptrace(2) user may request a signal be delivered upon resumption of the thread. Heretofore, those signals were discarded unless ptracestop()'s caller was issignal(). Fix this by modifying ptracestop() to queue up signals requested by the ptrace user that will be delivered when possible. Take special care when the signal is SIGKILL (usually generated from a PT_KILL request); no new stop events should be triggered after a PT_KILL. Add a number of tests for the new functionality. Several tests were authored by jhb. PR: 212607 Reviewed by: kib Approved by: kib (mentor) MFC after: 2 weeks Sponsored by: Dell EMC In collaboration with: jhb Differential Revision: https://reviews.freebsd.org/D9260
# 69a28758	15-Sep-2016	Ed Maste <emaste@FreeBSD.org>	Renumber license clauses in sys/kern to avoid skipping #3
# ed6d876b	06-Sep-2016	Brooks Davis <brooks@FreeBSD.org>	Modernize the initialization of sigproptbl. Use C99 designators to set the value of each slot and the nitems macro to check for valid entries. In the process, switch to indexing by signal number rather than signal-1 for improved clarity. Obtained from: CheriBSD (a6053c5abf03a5f53bbfcdd3a26429383f67e09f) Sponsored by: DARPA, AFRL Reviewed by: kib
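The pattern described in ed6d876b (C99 designated initializers indexed directly by signal number, with an element-count macro guarding lookups) can be sketched in userland as follows. The table contents, the flag values and every `EX_`/`ex_` name are hypothetical and chosen only for illustration; they are not the kernel's sigproptbl or its SIGPROP_ constants.

```c
#include <signal.h>
#include <stddef.h>

/* Element count of a true array, in the spirit of the kernel's nitems(). */
#define EX_NITEMS(x)	(sizeof(x) / sizeof((x)[0]))

/* Hypothetical property flags. */
#define EX_KILL		0x01	/* default action terminates the process */
#define EX_CORE		0x02	/* default action also dumps core */
#define EX_IGNORE	0x04	/* default action is to ignore */

/* Indexed by signal number itself, not by signal - 1. */
static const int ex_sigprops[] = {
	[SIGHUP] = EX_KILL,
	[SIGINT] = EX_KILL,
	[SIGQUIT] = EX_KILL | EX_CORE,
	[SIGCHLD] = EX_IGNORE,
};

/* Return the flags for sig, or 0 for anything outside the table. */
static int
ex_sigprop(int sig)
{
	if (sig > 0 && (size_t)sig < EX_NITEMS(ex_sigprops))
		return (ex_sigprops[sig]);
	return (0);
}
```

Slots that are not named in the initializer are zero-initialized, so a signal missing from the table simply reports no properties, and the array grows automatically to cover the largest designated index.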
# fd50a707	02-Sep-2016	Brooks Davis <brooks@FreeBSD.org>	Merge from CheriBSD: Rename sigprop-table constants to SIGPROP_ from SA_ to reduce the impression of a namespace collision. Submitted by: rwatson Reviewed by: jhb, kib (slightly different versions) Obtained from: CheriBSD (814ec5771cb1cb53deba317c561de62a91ae7684) Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D7616
# 7f649dda	18-Aug-2016	Mark Johnston <markj@FreeBSD.org>	Correct a check for P2_PTRACE_FSTP in ptracestop(). MFC after: 1 day
# b7a25e63	28-Jul-2016	Konstantin Belousov <kib@FreeBSD.org>	When a debugger attaches to the process, SIGSTOP is sent to the target. Due to a way issignal() selects the next signal to deliver and report, if the simultaneous or already pending another signal exists, that signal might be reported by the next waitpid(2) call. This causes minor annoyance for debuggers, which must be prepared to take any signal as the first event, then filter SIGSTOP later. More importantly, for tools like gcore(1), which attach and then detach without processing events, SIGSTOP might leak to be delivered after PT_DETACH. This results in the process being unintentionally stopped after detach, which is fatal for automatic tools. The solution is to force SIGSTOP to be the first signal reported after the attach. Attach code is modified to set P2_PTRACE_FSTP to indicate that the attaching ritual was not yet finished, and issignal() prefers SIGSTOP in that condition. Also, the thread which handles P2_PTRACE_FSTP is made to guarantee to own p_xthread during the first waitpid(2). All that ensures that SIGSTOP is consumed first. Additionally, if P2_PTRACE_FSTP is still set on detach, which means that waitpid(2) was not called at all, SIGSTOP is removed from the queue, ensuring that the process is resumed on detach. In issignal(), when acting on STOPing signals, remove the signal from queue before suspending. Otherwise parallel attach could result in ptracestop() acting on that STOP as if it was the STOP signal from the attach. Then SIGSTOP from attach leaks again. As a minor refactoring, some bits of the common attach code is moved to new helper proc_set_traced(). Reported by: markj Reviewed by: jhb, markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D7256
# cc37baea	26-Jul-2016	Stephen J. Kiernan <stevek@FreeBSD.org>	Add the NUM_CORE_FILES kernel config option which specifies the limit for the number of core files allowed by a particular process when using the %I core file name pattern. Sanity check at compile time to ensure the value is within the valid range of 0-10. Reviewed by: jtl, sjg Approved by: sjg (mentor) Sponsored by: Juniper Networks, Inc. Differential Revision: https://reviews.freebsd.org/D6812
# 8d570f64	15-Jul-2016	John Baldwin <jhb@FreeBSD.org>	Add a mask of optional ptrace() events. ptrace() now stores a mask of optional events in p_ptevents. Currently this mask is a single integer, but it can be expanded into an array of integers in the future. Two new ptrace requests can be used to manipulate the event mask: PT_GET_EVENT_MASK fetches the current event mask and PT_SET_EVENT_MASK sets the current event mask. The current set of events include: - PTRACE_EXEC: trace calls to execve(). - PTRACE_SCE: trace system call entries. - PTRACE_SCX: trace system call exits. - PTRACE_FORK: trace forks and auto-attach to new child processes. - PTRACE_LWP: trace LWP events. The S_PT_SCX and S_PT_SCE events in the procfs p_stops flags have been replaced by PTRACE_SCE and PTRACE_SCX. PTRACE_FORK replaces P_FOLLOW_FORK and PTRACE_LWP replaces P2_LWP_EVENTS. The PT_FOLLOW_FORK and PT_LWP_EVENTS ptrace requests remain for compatibility but now simply toggle corresponding flags in the event mask. While here, document that PT_SYSCALL, PT_TO_SCE, and PT_TO_SCX both modify the event mask and continue the traced process. Reviewed by: kib MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D7044
# 46e47c4f	03-Jul-2016	Konstantin Belousov <kib@FreeBSD.org>	Provide helper macros to detect 'non-silent SBDRY' state and to calculate appropriate return value for stops. Simplify the code by using them. Fix typo in sig_suspend_threads(). The thread which sleep must be aborted is td2. (*) In issignal(), when handling stopping signal for thread in TD_SBDRY_INTR state, do not stop, this is wrong and fires assert. This is yet another place where execution should be forced out of SBDRY-protected region. For such case, return -1 from issignal() and translate it to corresponding error code in sleepq_catch_signals(). Assert that other consumers of cursig() are not affected by the new return value. (*) Micro-optimize, mostly VFS and VOP methods, by avoiding calling the functions when SIGDEFERSTOP_NOP non-change is requested. (**) Reported and tested by: pho (*) Requested by: bde (**) Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Approved by: re (gjb)
# 6f56cb8d	28-Jun-2016	Konstantin Belousov <kib@FreeBSD.org>	Complete r302215. TDF_SBDRY | TDF_SERESTART and TDF_SBDRY | TDF_SEINTR flags values, unlike TDF_SBDRY, must be treated almost as if TDF_SBDRY is not set for STOP signal delivery. The only difference is that sig_suspend_threads() should abort the sleep instead of doing immediate suspension. Reported by: ngie Sponsored by: The FreeBSD Foundation MFC after: 12 days Approved by: re (gjb)
# 9e590ff0	27-Jun-2016	Konstantin Belousov <kib@FreeBSD.org>	When filt_proc() removes event from the knlist due to the process exiting (NOTE_EXIT->knlist_remove_inevent()), two things happen: - knote kn_knlist pointer is reset - INFLUX knote is removed from the process knlist. And, there are two consequences: - KN_LIST_UNLOCK() on such knote is nop - there is nothing which would block exit1() from processing past the knlist_destroy() (and knlist_destroy() resets knlist lock pointers). Both consequences result either in leaked process lock, or dereferencing NULL function pointers for locking. Handle this by stopping embedding the process knlist into struct proc. Instead, the knlist is allocated together with struct proc, but marked as autodestroy on the zombie reap, by knlist_detach() function. The knlist is freed when last kevent is removed from the list, in particular, at the zombie reap time if the list is empty. As result, the knlist_remove_inevent() is no longer needed and removed. Other changes: In filt_procattach(), clear NOTE_EXEC and NOTE_FORK desired events from kn_sfflags for knote registered by kernel to only get NOTE_CHILD notifications. The flags leak resulted in excessive NOTE_EXEC/NOTE_FORK reports. Fix immediate note activation in filt_procattach(). Condition should be either the immediate CHILD_NOTE activation, or immediate NOTE_EXIT report for the exiting process. In knote_fork(), do not perform racy check for KN_INFLUX before kq lock is taken. Besides being racy, it did not account for notes just added by scan (KN_SCAN). Some minor and incomplete style fixes. Analyzed and tested by: Eric Badger <eric@badgerio.us> Reviewed by: jhb Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Approved by: re (gjb) Differential revision: https://reviews.freebsd.org/D6859
# 3a1e5dd8	26-Jun-2016	Konstantin Belousov <kib@FreeBSD.org>	Rewrite sigdeferstop(9) and sigallowstop(9) into more flexible framework allowing to set the suspension policy for the dynamic block. Extend the currently possible policies of stopping on interruptible sleeps and ignoring such sleeps by two more: do not suspend at interruptible sleeps, but interrupt them with either EINTR or ERESTART. Reviewed by: jilles Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Approved by: re (gjb)
# fb4cdc96	09-Jun-2016	Mariusz Zaborski <oshogbo@FreeBSD.org>	Define tunable instead of using CTLFLAG_RWTUN flag with kern.corefile. The allproc_lock lock used in the sysctl_kern_corefile function is initialized in the procinit function which is called after setting sysctl values at boot. That means if we set kern.corefile at boot we will be trying to use a lock which is uninitialized and the machine will crash. If we define kern.corefile as tunable instead of using CTLFLAG_RWTUN we will not call the sysctl_kern_corefile function and we will not use an uninitialized lock. Once the machine has booted we will start using the function that depends on the lock. Reviewed by: pjd
# 5fcfab6e	29-Dec-2015	John Baldwin <jhb@FreeBSD.org>	Add ptrace(2) reporting for LWP events. Add two new LWPINFO flags: PL_FLAG_BORN and PL_FLAG_EXITED for reporting thread creation and destruction. Newly created threads will stop to report PL_FLAG_BORN before returning to userland and exiting threads will stop to report PL_FLAG_EXITED before exiting completely. Both of these events are only enabled and reported if PT_LWP_EVENTS is enabled on a process.
# 36160958	16-Dec-2015	Mark Johnston <markj@FreeBSD.org>	Fix style issues around existing SDT probes. - Use SDT_PROBE<N>() instead of SDT_PROBE(). This has no functional effect at the moment, but will be needed for some future changes. - Don't hardcode the module component of the probe identifier. This is set automatically by the SDT framework. MFC after: 1 week
# 2f2f522b	27-Sep-2015	Andriy Gapon <avg@FreeBSD.org>	save some bytes by using more concise SDT_PROBE<n> instead of SDT_PROBE SDT_PROBE requires 5 parameters whereas SDT_PROBE<n> requires n parameters where n is typically smaller than 5. Perhaps SDT_PROBE should be made a private implementation detail. MFC after: 20 days
# f3fe76ec	12-Aug-2015	Ed Schouten <ed@FreeBSD.org>	Unignore signals when starting CloudABI processes. As CloudABI processes cannot adjust their signal handlers, we need to make sure that we start up CloudABI processes with consistent signal masks. Though the POSIX standard signal behavior is all right, we do need to make sure that we ignore SIGPIPE, as it would otherwise be hard to interact with pipes and sockets. Extend execsigs() to iterate over ps_sigignore and call sigdflt() for each of the ignored signals. Reviewed by: kib Obtained from: https://github.com/NuxiNL/freebsd Differential Revision: https://reviews.freebsd.org/D3365
# b4490c6e	18-Jul-2015	Konstantin Belousov <kib@FreeBSD.org>	The si_status field of the siginfo_t, provided by the waitid(2) and SIGCHLD signal, should keep full 32 bits of the status passed to the _exit(2). Split the combined p_xstat of the struct proc into the separate exit status p_xexit for normal process exit, and signalled termination information p_xsig. Kernel-visible macro KW_EXITCODE() reconstructs old p_xstat from p_xexit and p_xsig. p_xexit contains complete status and copied out into si_status. Requested by: Joerg Schilling Reviewed by: jilles (previous version), pho Tested by: pho Sponsored by: The FreeBSD Foundation
# ea566832	10-Jul-2015	Ed Schouten <ed@FreeBSD.org>	Add missing const keyword to kern_sigaction()'s 'act' parameter. This structure is not modified by the function. Also add const to sigact_flag_test(), as it is called by kern_sigaction().
# f6f6d240	10-Jun-2015	Mateusz Guzik <mjg@FreeBSD.org>	Implement lockless resource limits. Use the same scheme implemented to manage credentials. Code needing to look at process's credentials (as opposed to thread's) is provided with *_proc variants of relevant functions. Places which possibly had to take the proc lock anyway still use the proc pointer to access limits.
# aef68c96	29-May-2015	Konstantin Belousov <kib@FreeBSD.org>	When delivering a signal with default disposition to the thread, tdsigwakeup() increases the priority of the low-priority threads, to give them a chance to be terminated timely. Also, kernel allows user to signal kernel processes. The combined effect is that signalling the idle process bumps the priority of the selected delivery thread, which starts eating CPU. Check for the delivery thread being an idle thread and do not raise its priority then. The signal delivery to the kernel threads must be an opt-in feature. Kernel thread should explicitly declare the ability to handle signals directed to it. E.g., nfsd threads check for signal as an indication of exit request. Most threads do not handle signals at all, and queuing the signal to them causes odd side-effects. Most innocent consequence is the memory leak due to queued ksiginfo, which is never deleted from the sigqueue. Code to prevent even queuing signals to the kernel threads is trivial, but it requires careful examination of each call to kproc/kthread creation to decide should the signalling be allowed. The commit is a stop-gap measure which fixes the immediate case for now. PR: 200493 Reported and tested by: trasz Discussed with: trasz, emaste Sponsored by: The FreeBSD Foundation MFC after: 1 week
# 515b7a0b	25-May-2015	John Baldwin <jhb@FreeBSD.org>	Add KTR tracing for some MI ptrace events. Differential Revision: https://reviews.freebsd.org/D2643 Reviewed by: kib
# 0da9e11b	23-Mar-2015	Rui Paulo <rpaulo@FreeBSD.org>	Disable coredump_devctl because it could lead to leaking paths to jails.
# 5bc0ff88	20-Mar-2015	Mateusz Guzik <mjg@FreeBSD.org>	coredump: protect corefilename access with a lock Previously format string traversal could happen while the string itself was being modified. Use allproc_lock as coredumping is a rare operation and as such we don't have to create a dedicated lock. Submitted by: Tiwei Bie <btw mail.ustc.edu.cn> Reviewed by: kib X-Additional: JuniorJobs project
# aa14e9b7	08-Mar-2015	Mark Johnston <markj@FreeBSD.org>	Reimplement support for userland core dump compression using a new interface in kern_gzio.c. The old gzio interface was somewhat inflexible and has not worked properly since r272535: currently, the gzio functions are called with a range lock held on the output vnode, but kern_gzio.c does not pass the IO_RANGELOCKED flag to vn_rdwr() calls, resulting in deadlock when vn_rdwr() attempts to reacquire the range lock. Moreover, the new gzio interface can be used to implement kernel core compression. This change also modifies the kernel configuration options needed to enable userland core dump compression support: gzio is now an option rather than a device, and the COMPRESS_USER_CORES option is removed. Core dump compression is enabled using the kern.compress_user_cores sysctl/tunable. Differential Revision: https://reviews.freebsd.org/D1832 Reviewed by: rpaulo Discussed with: kib
# dacbc9db	24-Feb-2015	Konstantin Belousov <kib@FreeBSD.org>	Keep a reference on the coredump vnode for vn_fullpath() call. Do it by moving vn_close() after the point where notification is sent. Reported by: sbruno Tested by: pho, sbruno Sponsored by: The FreeBSD Foundation
# b5263b26	11-Feb-2015	Rui Paulo <rpaulo@FreeBSD.org>	Remove check against NULL after M_WAITOK. Submitted by: Oliver Pinter
# 6fbc0f7d	10-Feb-2015	Rui Paulo <rpaulo@FreeBSD.org>	Restore the data array in coredump(), but use a different style to calculate the length. Requested by: kib
# 624157bb	10-Feb-2015	Rui Paulo <rpaulo@FreeBSD.org>	Remove a printf and an strlen() from the coredump code.
# eb6368d4	09-Feb-2015	Rui Paulo <rpaulo@FreeBSD.org>	Sanitise the coredump file names sent to devd. While there, add a sysctl to turn this feature off as requested by kib@.
# 842ab62b	09-Feb-2015	Rui Paulo <rpaulo@FreeBSD.org>	Notify devd(8) when a process crashed. This change implements a notification (via devctl) to userland when the kernel produces coredumps after a process has crashed. devd can then run a specific command to produce a human readable crash report. The command is most usually a helper that runs gdb/lldb commands on the file/coredump pair. It's possible to use this functionality for implementing automatic generation of crash reports. devd(8) will be notified of the full path of the binary that crashed and the full path of the coredump file.
# 677258f7	18-Jan-2015	Konstantin Belousov <kib@FreeBSD.org>	Add procctl(2) PROC_TRACE_CTL command to enable or disable debugger attachment to the process. Note that the command is not intended to be a security measure, rather it is an obfuscation feature, implemented for parity with other operating systems. Discussed with: jilles, rwatson Man page fixes by: rwatson Sponsored by: The FreeBSD Foundation MFC after: 1 week
# e3612a4c	18-Jan-2015	Konstantin Belousov <kib@FreeBSD.org>	Make SIGSTOP work for sleeps done while waiting for fifo readers or writers in open(2), when the fifo is located on an NFS mount. Reported by: bde Sponsored by: The FreeBSD Foundation MFC after: 1 week
# 271ab240	16-Jan-2015	Konstantin Belousov <kib@FreeBSD.org>	For sigaction(2), ignore possible garbage in sa_flags for sa_handler == SIG_DFL or SIG_IGN. Sloppy code does not fully initialize struct sigaction for such cases, and being too demanding in the case of default handler does not catch anything. Reported and tested by: Alex Tutubalin <lexa@lexa.ru> Sponsored by: The FreeBSD Foundation MFC after: 1 week
# 8ee9765a	21-Dec-2014	Konstantin Belousov <kib@FreeBSD.org>	Add VN_OPEN_NAMECACHE flag for vn_open_cred(9), which requests that the created file name was cached. Use the flag for core dumps. Requested by: rpaulo Tested by: pho (previous version) Sponsored by: The FreeBSD Foundation MFC after: 2 weeks
# 6ddcc233	13-Dec-2014	Konstantin Belousov <kib@FreeBSD.org>	Add facility to stop all userspace processes. The supposed use of the feature is to quiesce the system before suspend. Stop is implemented by reusing the thread_single(9) with the special mode SINGLE_ALLPROC. SINGLE_ALLPROC differs from the existing single-threading modes by allowing (requiring) caller to operate on other process. Interruptible sleeps for !TDF_SBDRY threads are suspended like SIGSTOP does it, instead of aborting the sleep, like SINGLE_NO_EXIT, to avoid spurious EINTRs on resume. Provide debugging sysctl debug.stop_all_proc, which causes total stop and suspends syncer, while waiting for variable reset for resume. It is used for debugging; should be removed after the real use of the interface is added. In collaboration with: pho Discussed with: avg Sponsored by: The FreeBSD Foundation MFC after: 2 weeks
# 70778bba	28-Nov-2014	Konstantin Belousov <kib@FreeBSD.org>	Assert the state of the process lock and sigact mutex in kern_sigprocmask() and reschedule_signals(). Discussed with: rea Sponsored by: The FreeBSD Foundation MFC after: 1 week
# e442f29f	26-Nov-2014	Konstantin Belousov <kib@FreeBSD.org>	Fix SA_SIGINFO | SA_RESETHAND handling. The sysent's sv_sendsig() method needs pre-reset state of the ps_siginfo to correctly construct signal frame. Move sigdflt() call after the sv_sendsig() invocation in postsig(). Simultaneously extract common code from trapsignal() and postsig() into new helper postsig_done(). Submitted by: rea MFC after: 1 week
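The userland-visible contract behind the SA_SIGINFO | SA_RESETHAND fix above can be sketched as follows. This is an illustrative test program fragment, not code from kern_sig.c: the names `on_usr1` and `check_resethand_siginfo` are hypothetical, and the sketch only assumes the POSIX behavior that the handler still receives a populated siginfo_t while the disposition is reset to SIG_DFL after delivery.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Hypothetical names, for illustration only. */
static volatile sig_atomic_t seen_signo;   /* si_signo seen by the handler */
static volatile sig_atomic_t handler_runs; /* how many times the handler ran */

static void
on_usr1(int signo, siginfo_t *info, void *ucontext)
{
	(void)signo;
	(void)ucontext;
	/* With SA_SIGINFO the handler is passed a siginfo_t. */
	seen_signo = (info != NULL) ? info->si_signo : -1;
	handler_runs++;
}

/*
 * Returns 0 when the handler saw a valid siginfo_t and the disposition
 * was reset to SIG_DFL afterwards, 1 when either expectation fails, and
 * -1 when a system call fails.
 */
int
check_resethand_siginfo(void)
{
	struct sigaction sa, cur;
	sigset_t unblock;

	seen_signo = 0;
	handler_runs = 0;

	/* Make sure SIGUSR1 is not blocked, so raise() delivers it now. */
	sigemptyset(&unblock);
	sigaddset(&unblock, SIGUSR1);
	if (sigprocmask(SIG_UNBLOCK, &unblock, NULL) != 0)
		return (-1);

	sa.sa_sigaction = on_usr1;
	sa.sa_flags = SA_SIGINFO | SA_RESETHAND;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGUSR1, &sa, NULL) != 0)
		return (-1);

	if (raise(SIGUSR1) != 0)
		return (-1);
	if (handler_runs != 1 || seen_signo != SIGUSR1)
		return (1);

	/* SA_RESETHAND: the disposition must be back to the default. */
	if (sigaction(SIGUSR1, NULL, &cur) != 0)
		return (-1);
	return (cur.sa_handler == SIG_DFL ? 0 : 1);
}
```

A caller would treat a return value of 0 as "both properties hold"; the signal is raised only once, so the default action of SIGUSR1 is never triggered.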
# 539c9eef	04-Oct-2014	Konstantin Belousov <kib@FreeBSD.org>	Fixes for i/o during coredumping: - Do not dump into system files. - Do not acquire write reference to the mount point where img.core is written, in the coredump(). The vn_rdwr() calls from ELF imgact request the write ref from vn_rdwr(). Recursive acquisition of the write ref deadlocks with the unmount. - Instead, take the range lock for the whole core file. This prevents parallel dumping from two processes executing the same image, converting the useless interleaved dump into sequential dumping, with second core overwriting the first. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks
# c83655f3	24-Aug-2014	Konstantin Belousov <kib@FreeBSD.org>	Revert the handling of all siginfo sa_flags except SA_SIGINFO to the pre-r270321. Namely, the flags are preserved for SIG_DFL and SIG_IGN dispositions. Requested and reviewed by: jilles Sponsored by: The FreeBSD Foundation MFC after: 1 week
# ce8daaad	24-Aug-2014	Mateusz Guzik <mjg@FreeBSD.org>	Use refcount_init in sigacts_alloc. This change is a no-op, but fixes up an inconsistency introduced with r268634. MFC after: 3 days
# 350ae563	22-Aug-2014	Konstantin Belousov <kib@FreeBSD.org>	Ensure that sigaction flags for a signal whose disposition is reset to ignored or default are not leaking. Apparently, there exists code which relies on SA_SIGINFO not reported for SIG_DFL or SIG_IGN. In kern_sigaction, ignore flags when resetting. Encapsulate the flag and disposition testing into helper sigact_flag_test(). On exec, and when delivering signal with SA_RESETHAND flag set, signals are reset automatically. Use new helper sigdflt(), which removes duplicated code and corrects all flag bits for the signal. For proc0, set sigintr bit for all ignored signals. Ignored signals are consumed in tdsendsignal() and not delivered to the victim thread at all. Reported and tested by: royger Sponsored by: The FreeBSD Foundation MFC after: 1 week
# 2d864174	22-Aug-2014	Konstantin Belousov <kib@FreeBSD.org>	Check the validity of struct sigaction sa_flags value, reject unknown flags. Sponsored by: The FreeBSD Foundation MFC after: 1 week
# c959c237	14-Jul-2014	Mateusz Guzik <mjg@FreeBSD.org>	Manage struct sigacts refcnt with atomics instead of a mutex. MFC after: 1 week
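The general idea of managing a reference count with atomics instead of a mutex, as in the sigacts change above, can be sketched in portable C11. This is not the kernel's refcount(9) code: `struct refobj` and the `refobj_*` functions are hypothetical stand-ins, and the memory orderings are one common choice rather than what the kernel necessarily uses.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical object carrying an atomic reference count. */
struct refobj {
	atomic_uint refcnt;
};

/* A freshly created object starts with one reference held by its creator. */
void
refobj_init(struct refobj *o)
{
	atomic_init(&o->refcnt, 1);
}

/* Take an additional reference; no lock is needed. */
void
refobj_hold(struct refobj *o)
{
	atomic_fetch_add_explicit(&o->refcnt, 1, memory_order_relaxed);
}

/*
 * Drop a reference.  Returns true when the caller dropped the last one
 * and is therefore responsible for freeing the object.
 */
bool
refobj_release(struct refobj *o)
{
	return (atomic_fetch_sub_explicit(&o->refcnt, 1,
	    memory_order_acq_rel) == 1);
}
```

Because `atomic_fetch_sub_explicit` returns the previous value, exactly one releasing thread observes 1 and performs the teardown, which is what removes the need for a mutex around the counter.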
# d00c8ea4	01-Jul-2014	Mateusz Guzik <mjg@FreeBSD.org>	Perform a lockless check in sigacts_shared. It is used only during execve (i.e. singlethreaded), so there is no fear of returning 'not shared' which soon becomes 'shared'. While here reorganize the code a little to avoid proc lock/unlock in shared case. MFC after: 1 week
# af3b2549	27-Jun-2014	Hans Petter Selasky <hselasky@FreeBSD.org>	Pull in r267961 and r267973 again. Fix for issues reported will follow.
# 37a107a4	27-Jun-2014	Glen Barber <gjb@FreeBSD.org>	Revert r267961, r267973: These changes prevent sysctl(8) from returning proper output, such as: 1) no output from sysctl(8) 2) erroneously returning ENOMEM with tools like truss(1) or uname(1) truss: can not get etype: Cannot allocate memory
# 3da1cf1e	27-Jun-2014	Hans Petter Selasky <hselasky@FreeBSD.org>	Extend the meaning of the CTLFLAG_TUN flag to automatically check if there is an environment variable which shall initialize the SYSCTL during early boot. This works for all SYSCTL types both statically and dynamically created ones, except for the SYSCTL NODE type and SYSCTLs which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to be used in the case a tunable sysctl has a custom initialisation function allowing the sysctl to still be marked as a tunable. The kernel SYSCTL API is mostly the same, with a few exceptions for some special operations like iterating children of a static/extern SYSCTL node. This operation should probably be made into a factored out common macro, hence some device drivers use this. The reason for changing the SYSCTL API was the need for a SYSCTL parent OID pointer and not only the SYSCTL parent OID list pointer in order to quickly generate the sysctl path. The motivation behind this patch is to avoid parameter loading kludges inside the OFED driver subsystem. Instead of adding special code to the OFED driver subsystem to post-load tunables into dynamically created sysctls, we generalize this in the kernel. Other changes: - Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask" to "hw.pcic.intr_mask". - Removed redundant TUNABLE statements throughout the kernel. - Some minor code rewrites in connection to removing not needed TUNABLE statements. - Added a missing SYSCTL_DECL(). - Wrapped two very long lines. - Avoid malloc()/free() inside sysctl string handling, in case it is called to initialize a sysctl from a tunable, hence malloc()/free() is not ready when sysctls from the sysctl dataset are registered. - Bumped FreeBSD version to indicate SYSCTL API change. MFC after: 2 weeks Sponsored by: Mellanox Technologies
# 4a144410	16-Mar-2014	Robert Watson <rwatson@FreeBSD.org>	Update kernel inclusions of capability.h to use capsicum.h instead; some further refinement is required as some device drivers intended to be portable over FreeBSD versions rely on __FreeBSD_version to decide whether to include capability.h. MFC after: 3 weeks
# f2b525e6	30-Nov-2013	Pawel Jakub Dawidek <pjd@FreeBSD.org>	Make process descriptors standard part of the kernel. rwhod(8) already requires process descriptors to work and having PROCDESC in GENERIC seems not enough, especially that we hope to have more and more consumers in the base. MFC after: 3 days
# d9fae5ab	26-Nov-2013	Andriy Gapon <avg@FreeBSD.org>	dtrace sdt: remove the ugly sname parameter of SDT_PROBE_DEFINE In its stead use the Solaris / illumos approach of emulating '-' (dash) in probe names with '__' (two consecutive underscores). Reviewed by: markj MFC after: 3 weeks
# 54366c0b	25-Nov-2013	Attilio Rao <attilio@FreeBSD.org>	- For kernel compiled only with KDTRACE_HOOKS and not any lock debugging option, unbreak the lock tracing release semantic by embedding calls to LOCKSTAT_PROFILE_RELEASE_LOCK() directly in the inlined version of the releasing functions for mutex, rwlock and sxlock. Failing to do so skips the lockstat_probe_func invocation for unlocking. - As part of the LOCKSTAT support is inlined in mutex operation, for kernel compiled without lock debugging options, potentially every consumer must be compiled including opt_kdtrace.h. Fix this by moving KDTRACE_HOOKS into opt_global.h and remove the dependency by opt_kdtrace.h for all files, as now only KDTRACE_FRAMES is linked there and it is only used as a compile-time stub [0]. [0] immediately shows some new bug as DTRACE-derived support for debug in sfxge is broken and it was never really tested. As it was not including correctly opt_kdtrace.h before it was never enabled so it was kept broken for a while. Fix this by using a protection stub, leaving sfxge driver authors the responsibility for fixing it appropriately [1]. Sponsored by: EMC / Isilon storage division Discussed with: rstone [0] Reported by: rstone [1] Discussed with: philip
# b20a9aa9	17-Nov-2013	Jilles Tjoelker <jilles@FreeBSD.org>	Fix siginfo_t.si_status for wait6/waitid/SIGCHLD. Per POSIX, si_status should contain the value passed to exit() for si_code==CLD_EXITED and the signal number for other si_code. This was incorrect for CLD_EXITED and CLD_DUMPED. This is still not fully POSIX-compliant (Austin group issue #594 says that the full value passed to exit() shall be returned via si_status, not just the low 8 bits) but is sufficient for a si_status-related test in libnih (upstart, Debian/kFreeBSD). PR: kern/184002 Reported by: Dmitrijs Ledkovs Tested by: Dmitrijs Ledkovs
# 7008be5b	04-Sep-2013	Pawel Jakub Dawidek <pjd@FreeBSD.org>	Change the cap_rights_t type from uint64_t to a structure that we can extend in the future in a backward compatible (API and ABI) way. The cap_rights_t represents capability rights. We used to use one bit to represent one right, but we are running out of spare bits. Currently the new structure provides place for 114 rights (so 50 more than the previous cap_rights_t), but it is possible to grow the structure to hold at least 285 rights, although we can make it even larger if 285 rights won't be enough. The structure definition looks like this: struct cap_rights { uint64_t cr_rights[CAP_RIGHTS_VERSION + 2]; }; The initial CAP_RIGHTS_VERSION is 0. The top two bits in the first element of the cr_rights[] array contain total number of elements in the array - 2. This means if those two bits are equal to 0, we have 2 array elements. The top two bits in all remaining array elements should be 0. The next five bits in all array elements contain array index. Only one bit is used and bit position in this five-bits range defines array index. This means there can be at most five array elements in the future. To define new right the CAPRIGHT() macro must be used. The macro takes two arguments - an array index and a bit to set, eg. 
#define CAP_PDKILL CAPRIGHT(1, 0x0000000000000800ULL) We still support aliases that combine few rights, but the rights have to belong to the same array element, eg: #define CAP_LOOKUP CAPRIGHT(0, 0x0000000000000400ULL) #define CAP_FCHMOD CAPRIGHT(0, 0x0000000000002000ULL) #define CAP_FCHMODAT (CAP_FCHMOD | CAP_LOOKUP) There is new API to manage the new cap_rights_t structure: cap_rights_t *cap_rights_init(cap_rights_t *rights, ...); void cap_rights_set(cap_rights_t *rights, ...); void cap_rights_clear(cap_rights_t *rights, ...); bool cap_rights_is_set(const cap_rights_t *rights, ...); bool cap_rights_is_valid(const cap_rights_t *rights); void cap_rights_merge(cap_rights_t *dst, const cap_rights_t *src); void cap_rights_remove(cap_rights_t *dst, const cap_rights_t *src); bool cap_rights_contains(const cap_rights_t *big, const cap_rights_t *little); Capability rights to the cap_rights_init(), cap_rights_set(), cap_rights_clear() and cap_rights_is_set() functions are provided by separating them with commas, eg: cap_rights_t rights; cap_rights_init(&rights, CAP_READ, CAP_WRITE, CAP_FSTAT); There is no need to terminate the list of rights, as those functions are actually macros that take care of the termination, eg: #define cap_rights_set(rights, ...) \ __cap_rights_set((rights), __VA_ARGS__, 0ULL) void __cap_rights_set(cap_rights_t *rights, ...); Thanks to using one bit as an array index we can assert in those functions that there are no two rights belonging to different array elements provided together. For example this is illegal and will be detected, because CAP_LOOKUP belongs to element 0 and CAP_PDKILL to element 1: cap_rights_init(&rights, CAP_LOOKUP | CAP_PDKILL); Providing several rights that belongs to the same array's element this way is correct, but is not advised. It should only be used for aliases definition. This commit also breaks compatibility with some existing Capsicum system calls, but I see no other way to do that. 
This should be fine as Capsicum is still experimental and this change is not going to 9.x. Sponsored by: The FreeBSD Foundation
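The bit layout described in the cap_rights_t entry above (two top bits for the array size, a one-hot five-bit index field, and the remaining 57 bits for rights) can be illustrated with a small sketch. The `SKETCH_*` macros and `sketch_right_index()` are hypothetical names, and placing index 0 at the lowest bit of the five-bit field (bit 57) is an assumption made for this example; the commit message does not spell out that detail.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for CAPRIGHT(): set one bit of the five-bit
 * index field (assumed to start at bit 57) plus the right's own bit.
 */
#define	SKETCH_CAPRIGHT(idx, bit)	((1ULL << (57 + (idx))) | (bit))

/* Top two bits of element 0 hold (number of array elements - 2). */
#define	SKETCH_ARRAY_SIZE(word0)	((int)((uint64_t)(word0) >> 62) + 2)

/*
 * Return the array index encoded in a right, or -1 when the index
 * field is not one-hot, e.g. after OR-ing rights that belong to
 * different array elements.
 */
int
sketch_right_index(uint64_t right)
{
	uint64_t field = (right >> 57) & 0x1F;
	int idx = 0;

	/* Exactly one of the five index bits must be set. */
	if (field == 0 || (field & (field - 1)) != 0)
		return (-1);
	while ((field & 1) == 0) {
		field >>= 1;
		idx++;
	}
	return (idx);
}
```

With 7 of the 64 bits reserved per element, each element carries 57 rights, which matches the 114 rights quoted for the initial two-element array. The one-hot encoding is what allows mixing of elements to be detected: OR-ing a right from element 0 with one from element 1 sets two index bits, so `sketch_right_index()` returns -1.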
# 7b77e1fe	14-Aug-2013	Mark Johnston <markj@FreeBSD.org>	Specify SDT probe argument types in the probe definition itself rather than using SDT_PROBE_ARGTYPE(). This will make it easy to extend the SDT(9) API to allow probes with dynamically-translated types. There is no functional change. MFC after: 2 weeks
# 462314b3	21-Jul-2013	Mateusz Guzik <mjg@FreeBSD.org>	Remove duplicate assertion from tdsendsignal. MFC after: 2 weeks
# b9ce4f67	05-Apr-2013	Gleb Smirnoff <glebius@FreeBSD.org>	Fix memory leak in coredump(). Reviewed by: kib
# 1968f37b	18-Mar-2013	John Baldwin <jhb@FreeBSD.org>	Tweak some comments.
# 3cf3b9f0	18-Mar-2013	John Baldwin <jhb@FreeBSD.org>	Partially revert r195702. Deferring stops is now implemented via a set of calls to toggle TDF_SBDRY rather than passing PBDRY to individual sleep calls. - Remove the stop_allowed parameters from cursig() and issignal(). issignal() checks TDF_SBDRY directly. - Remove the PBDRY and SLEEPQ_STOP_ON_BDRY flags.
# 593efaf9	21-Feb-2013	John Baldwin <jhb@FreeBSD.org>	Further refine the handling of stop signals in the NFS client. The changes in r246417 were incomplete as they did not add explicit calls to sigdeferstop() around all the places that previously passed SBDRY to _sleep(). In addition, nfs_getcacheblk() could trigger a write RPC from getblk() resulting in sigdeferstop() recursing. Rather than manually deferring stop signals in specific places, change the VFS_*() and VOP_*() methods to defer stop signals for filesystems which request this behavior via a new VFCF_SBDRY flag. Note that this has to be a VFC flag rather than a MNTK flag so that it works properly with VFS_MOUNT() when the mount is not yet fully constructed. For now, only the NFS clients set this new flag in VFS_SET(). A few other related changes: - Add an assertion to ensure that TDF_SBDRY doesn't leak to userland. - When a lookup request uses VOP_READLINK() to follow a symlink, mark the request as being on behalf of the thread performing the lookup (cnp_thread) rather than using a NULL thread pointer. This causes NFS to properly handle signals during this VOP on an interruptible mount. PR: kern/176179 Reported by: Russell Cattelan (sigdeferstop() recursion) Reviewed by: kib MFC after: 1 month
# 6c08be2b	17-Feb-2013	Pawel Jakub Dawidek <pjd@FreeBSD.org>	Add break to the default case.
# 888d4d4f	07-Feb-2013	Konstantin Belousov <kib@FreeBSD.org>	When vforked child is traced, the debugging events are not generated until child performs exec(). The behaviour is reasonable when a debugger is the real parent, because the parent is stopped until exec(), and sending a debugging event to the debugger would deadlock both parent and child. On the other hand, when debugger is not the parent of the vforked child, not sending debugging signals makes it impossible to debug across vfork. Fix the issue by declining generating debug signals only when vfork() was done and child called ptrace(PT_TRACEME). Set a new process flag P_PPTRACE from the attach code for PT_TRACEME, if P_PPWAIT flag is set, which indicates that the process was created with vfork() and still did not execed. Check P_PPTRACE from issignal(), instead of refusing the trace outright for the P_PPWAIT case. The scope of P_PPTRACE is exactly contained in the scope of P_PPWAIT. Found and tested by: zont Reviewed by: pluknet MFC after: 2 weeks
# a120a7a3	06-Feb-2013	John Baldwin <jhb@FreeBSD.org>	Rework the handling of stop signals in the NFS client. The changes in 195702, 195703, and 195821 prevented a thread from suspending while holding locks inside of NFS by forcing the thread to fail sleeps with EINTR or ERESTART but defer the thread suspension to the user boundary. However, this had the effect that stopping a process during an NFS request could abort the request and trigger EINTR errors that were visible to userland processes (previously the thread would have suspended and completed the request once it was resumed). This change instead effectively masks stop signals while in the NFS client. It uses the existing TDF_SBDRY flag to effect this since SIGSTOP cannot be masked directly. Also, instead of setting PBDRY on individual sleeps, the NFS client now sets the TDF_SBDRY flag around each NFS request and stop signals are masked for all sleeps during that region (the previous change missed sleeps in lockmgr locks). The end result is that stop signals sent to threads performing an NFS request are completely ignored until after the NFS request has finished processing and the thread prepares to return to userland. This restores the behavior of stop signals being transparent to userland processes while still preventing threads from suspending while holding NFS locks. Reviewed by: kib MFC after: 1 month
# c345faea	19-Dec-2012	Pawel Jakub Dawidek <pjd@FreeBSD.org>	Replace expand_name() function with corefile_open() function, which returns not only the name but also the vnode of the corefile to use. This simplifies the code and closes a few races, especially in %I handling. Reviewed by: kib Obtained from: WHEEL Systems
# 22a5d85a	19-Dec-2012	Pawel Jakub Dawidek <pjd@FreeBSD.org>	Use correct file permissions when looking for available core file if kern.corefile contains %I. Obtained from: WHEEL Systems
# 07a8e078	18-Dec-2012	Pawel Jakub Dawidek <pjd@FreeBSD.org>	The 'flags' argument can be modified in vn_open_cred(), so we need to set it for every loop iteration. Pointed out by: kib
# cc58032c	18-Dec-2012	Pawel Jakub Dawidek <pjd@FreeBSD.org>	Do not audit paths we try when kern.corefile contains %I. Obtained from: WHEEL Systems
# 29146f1a	18-Dec-2012	Pawel Jakub Dawidek <pjd@FreeBSD.org>	Style cleanups.
# 086053a3	18-Dec-2012	Pawel Jakub Dawidek <pjd@FreeBSD.org>	The expand_name() function isn't called with the process lock held anymore, so we can safely use malloc(M_WAITOK) now. Pointed out by: kib
# f06f465d	17-Dec-2012	Pawel Jakub Dawidek <pjd@FreeBSD.org>	Minor style tweaks. Obtained from: WHEEL Systems
# c52ff611	17-Dec-2012	Pawel Jakub Dawidek <pjd@FreeBSD.org>	Better variables naming in expand_name() to be more consistent with coredump(). Obtained from: WHEEL Systems
# dd57ce87	16-Dec-2012	Pawel Jakub Dawidek <pjd@FreeBSD.org>	Move expand_name() after process lock is released. This fixes a panic where we hold a mutex (the process lock) and try to obtain a sleepable lock (the vnode lock in expand_name()). The panic could occur when %I was used in kern.corefile. Additionally we avoid expand_name() overhead when coredumps are disabled. Obtained from: WHEEL Systems
# 2ce1b32d	16-Dec-2012	Pawel Jakub Dawidek <pjd@FreeBSD.org>	Don't add audit record when coredumps are disabled or name cannot be expanded. Discussed with: rwatson Obtained from: WHEEL Systems
# 7e73ee85	16-Dec-2012	Pawel Jakub Dawidek <pjd@FreeBSD.org>	Make the check easier to read. Obtained from: WHEEL Systems
# b039f8c2	16-Dec-2012	Pawel Jakub Dawidek <pjd@FreeBSD.org>	Use 'cred' variable. Obtained from: WHEEL Systems
# b0c9d4d7	27-Nov-2012	Pawel Jakub Dawidek <pjd@FreeBSD.org>	Add kern.capmode_coredump sysctl/tunable to allow processes in capability mode to dump core. Reviewed by: rwatson Obtained from: WHEEL Systems MFC after: 2 weeks
# 8890f5d0	27-Nov-2012	Pawel Jakub Dawidek <pjd@FreeBSD.org>	Allow kill(2) to be used in capability mode, but a process can send a signal only to itself. For example abort(3) at first tries to do kill(getpid(), SIGABRT) which was failing in capability mode, so the code was falling back to exit(1). Reviewed by: rwatson Obtained from: WHEEL Systems MFC after: 2 weeks
# b62d05fc	27-Nov-2012	Pawel Jakub Dawidek <pjd@FreeBSD.org>	Allow kern.sugid_coredump and kern.corefile to be modified from loader.conf. Obtained from: WHEEL Systems
# c3209846	27-Nov-2012	Pawel Jakub Dawidek <pjd@FreeBSD.org>	More style fixes.
# 23c6445a	27-Nov-2012	Pawel Jakub Dawidek <pjd@FreeBSD.org>	Style fixes (mostly whitespaces).
# 5050aa86	22-Oct-2012	Konstantin Belousov <kib@FreeBSD.org>	Remove the support for using non-mpsafe filesystem modules. In particular, do not lock Giant conditionally when calling into the filesystem module, remove the VFS_LOCK_GIANT() and related macros. Stop handling buffers belonging to non-mpsafe filesystems. The VFS_VERSION is bumped to indicate the interface change which does not result in the interface signatures changes. Conducted and reviewed by: attilio Tested by: pho
# 3d74f47b	21-Oct-2012	Eitan Adler <eadler@FreeBSD.org>	Correct the killpg(2) return values: Return EPERM if processes were found but they were unable to be signaled. Return the first error from p_cansignal if no signal was successful. Reviewed by: jilles Approved by: cperciva MFC after: 1 week
# 10950e46	21-Oct-2012	Eitan Adler <eadler@FreeBSD.org>	Colin acked the wrong diff originally. fixed version coming soon. Approved by: cperciva (implicit)
# 2a1c0e4d	21-Oct-2012	Eitan Adler <eadler@FreeBSD.org>	Correct the killpg(2) return values: Return EPERM if processes were found but they were unable to be signaled. Return the first error from p_cansignal if no signal was successful. Reviewed by: jilles Approved by: cperciva MFC after: 1 week
# 0f14f15b	13-Sep-2012	John Baldwin <jhb@FreeBSD.org>	Ignore stop and continue signals sent to an exiting process. Stop signals set p_xstat to the signal that triggered the stop, but p_xstat is also used to hold the exit status of an exiting process. Without this change, a stop signal that arrived after a process was marked P_WEXIT but before it was marked a zombie would overwrite the exit status with the stop signal number. Reviewed by: kib MFC after: 1 week
# 888aefef	18-Aug-2012	Konstantin Belousov <kib@FreeBSD.org>	Deliver SIGSYS to the guilty thread, not to the process. MFC after: 1 week
# 7ce60f60	09-Jul-2012	David Xu <davidxu@FreeBSD.org>	Always clear p_xthread if the current thread no longer needs it. In theory, if the debugger exited without calling ptrace(PT_DETACH), there is a time window in which p_xthread may point to a non-existent thread; in practice this is not a problem, because the child process will soon be killed by the parent process.
# 2dd9ea6f	12-Apr-2012	Konstantin Belousov <kib@FreeBSD.org>	Add thread-private flag to indicate that error value is already placed in td_errno. Flag is supposed to be used by syscalls returning EJUSTRETURN because errno was already placed into the usermode frame by a call to set_syscall_retval(9). Both ktrace and dtrace get errno value from td_errno if the flag is set. Use the flag to fix sigsuspend(2) error return ktrace records. Requested by: bde MFC after: 1 week
# 8a8be776	09-Apr-2012	Jilles Tjoelker <jilles@FreeBSD.org>	Remove unused and wrong SA_PROC internal signal property. The SA_PROC signal property indicated whether each signal number is directed at a specific thread or at the process in general. However, that depends on how the signal was generated and not on the signal number. SA_PROC was not used.
# 6472ac3d	07-Nov-2011	Ed Schouten <ed@FreeBSD.org>	Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs. The SYSCTL_NODE macro defines a list that stores all child-elements of that node. If there's no SYSCTL_DECL macro anywhere else, there's no reason why it shouldn't be static.
# c241c5e4	28-Oct-2011	Sergey Kandaurov <pluknet@FreeBSD.org>	Fix arguments list for proc:::signal-discard DTrace probe. Reported by: Anton Yuzhaninov <citrin citrin ru> MFC after: 1 week
# 8e9a54ee	01-Oct-2011	Konstantin Belousov <kib@FreeBSD.org>	The sigwait(3) function shall not return EINTR, according to the POSIX/SUSvN. The sigwait(2) syscall does return EINTR, and libc.so.7 contains the wrapper sigwait(3) which hides EINTR from callers. The EINTR return is used by libthr to handle required cancellation point in the sigwait(3). To help the binaries linked against pre-libc.so.7, i.e. RELENG_6 and earlier, to have right ABI for sigwait(3), transform EINTR return from sigwait(2) into ERESTART. Discussed with: davidxu MFC after: 1 week
# 8451d0dd	16-Sep-2011	Kip Macy <kmacy@FreeBSD.org>	In order to maximize the re-usability of kernel code in user space this patch modifies makesyscalls.sh to prefix all of the non-compatibility calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel entry points and all places in the code that use them. It also fixes an additional name space collision between the kernel function psignal and the libc function of the same name by renaming the kernel psignal kern_psignal(). By introducing this change now we will ease future MFCs that change syscalls. Reviewed by: rwatson Approved by: re (bz)
# cfb5f768	18-Aug-2011	Jonathan Anderson <jonathan@FreeBSD.org>	Add experimental support for process descriptors A "process descriptor" file descriptor is used to manage processes without using the PID namespace. This is required for Capsicum's Capability Mode, where the PID namespace is unavailable. New system calls pdfork(2) and pdkill(2) offer the functional equivalents of fork(2) and kill(2). pdgetpid(2) allows querying the PID of the remote process for debugging purposes. The currently-unimplemented pdwait(2) will, in the future, allow querying rusage/exit status. In the interim, poll(2) may be used to check (and wait for) process termination. When a process is referenced by a process descriptor, it does not issue SIGCHLD to the parent, making it suitable for use in libraries---a common scenario when using library compartmentalisation from within large applications (such as web browsers). Some observers may note a similarity to Mach task ports; process descriptors provide a subset of this behaviour, but in a UNIX style. This feature is enabled by "options PROCDESC", but as with several other Capsicum kernel features, is not enabled by default in GENERIC 9.0. Reviewed by: jhb, kib Approved by: re (kib), mentor (rwatson) Sponsored by: Google Inc
# b8fdb0d9	26-May-2011	Edward Tomasz Napierala <trasz@FreeBSD.org>	Fix support for RACCT_CORE by merging forgotten file.
# 61009552	17-Apr-2011	Jilles Tjoelker <jilles@FreeBSD.org>	ktrace: Log the code for all signals (PSIG events). The code provides information on how the signal was generated. Formerly, the code was only logged for traps, much like only signal handlers for traps received a meaningful si_code before FreeBSD 7.0. In rare cases, no information is available and 0 is still logged. MFC after: 1 week
# e806d352	06-Apr-2011	John Baldwin <jhb@FreeBSD.org>	Fix several places to ignore processes that are not yet fully constructed. MFC after: 1 week
# c3b127e0	23-Mar-2011	John Baldwin <jhb@FreeBSD.org>	Small style fix.
# 6fa39a73	25-Jan-2011	Konstantin Belousov <kib@FreeBSD.org>	Allow debugger to specify that children of the traced process should be automatically traced. Extend ptrace(PT_LWPINFO) to report that the child just forked. Reviewed by: davidxu, jhb MFC after: 2 weeks
# a7d5f7eb	19-Oct-2010	Jamie Gritton <jamie@FreeBSD.org>	A new jail(8) with a configuration file, to replace the work currently done by /etc/rc.d/jail.
# 407af02b	14-Oct-2010	David Xu <davidxu@FreeBSD.org>	In kern_sigtimedwait(), move initialization code out of process lock, instead of using SIGISMEMBER to test every interesting signal, just unmask the signal set and let cursig() return one, get the signal after it returns, call reschedule_signal() after signals are blocked again. In kern_sigprocmask(), don't call reschedule_signal() when it is unnecessary. In reschedule_signal(), replace SIGISEMPTY() + SIGISMEMBER() with sig_ffs(), rename variable 'i' to sig.
# fc4ecc1d	13-Oct-2010	David Xu <davidxu@FreeBSD.org>	sigqueue_collect_set() is no longer needed because other functions maintain pending set correctly.
# cf7d9a8c	08-Oct-2010	David Xu <davidxu@FreeBSD.org>	Create a global thread hash table to speed up thread lookup, use rwlock to protect the table. In old code, thread lookup is done with process lock held, to find a thread, kernel has to iterate through process and thread list, this is quite inefficient. With this change, test shows in extreme case performance is dramatically improved. Earlier patch was reviewed by: jhb, julian
# 4d369413	10-Sep-2010	Matthew D Fleming <mdf@FreeBSD.org>	Replace sbuf_overflowed() with sbuf_error(), which returns any error code associated with overflow or with the drain function. While this function is not expected to be used often, it produces more information in the form of an errno than sbuf_overflowed() did.
# 137cf33d	31-Aug-2010	David Xu <davidxu@FreeBSD.org>	Rescue comments from RELENG_4.
# 83b718eb	31-Aug-2010	David Xu <davidxu@FreeBSD.org>	If a process is being debugged, skip job control caused by SIGSTOP/SIGCONT signals, because it is managed by the debugger; however, a normal signal sent to an interruptibly sleeping thread wakes up the thread so it will handle the signal when the process leaves the stopped state. PR: 150138 MFC after: 1 week
# 79856499	22-Aug-2010	Rui Paulo <rpaulo@FreeBSD.org>	Add an extra comment to the SDT probes definition. This allows us to use '-' in probe names, matching the probe names in Solaris.[1] Add userland SDT probes definitions to sys/sdt.h. Sponsored by: The FreeBSD Foundation Discussed with: rwatson [1]
# 212bc4b3	19-Jul-2010	David Xu <davidxu@FreeBSD.org>	Fix function name in error messages.
# fc8cca02	08-Jul-2010	John Baldwin <jhb@FreeBSD.org>	- Various style and whitespace fixes. - Make sugid_coredump and kern_logsigexit private to kern_sig.c. Submitted by: bde (partially) MFC after: 1 month
# 8a260079	04-Jul-2010	Konstantin Belousov <kib@FreeBSD.org>	Extend ptrace(PT_LWPINFO) to report siginfo for the signal that caused debugee stop. The change should keep the ABI. Take care of compat32. Discussed with: davidxu, jhb MFC after: 2 weeks
# ad6eec7b	29-Jun-2010	John Baldwin <jhb@FreeBSD.org>	Tweak the in-kernel API for sending signals to threads: - Rename tdsignal() to tdsendsignal() and make it private to kern_sig.c. - Add tdsignal() and tdksignal() routines that mirror psignal() and pksignal() except that they accept a thread as an argument instead of a process. They send a signal to a specific thread rather than to an individual process. Reviewed by: kib
# c5105012	21-Jun-2010	Konstantin Belousov <kib@FreeBSD.org>	Do not report a stack garbage as the old value for debug.ncores sysctl. Reported by: brucec
# afe1a688	23-May-2010	Konstantin Belousov <kib@FreeBSD.org>	Reorganize syscall entry and leave handling. Extend struct sysvec with three new elements: sv_fetch_syscall_args - the method to fetch syscall arguments from usermode into struct syscall_args. The structure is machine-dependent (this might be reconsidered after all architectures are converted). sv_set_syscall_retval - the method to set a return value for usermode from the syscall. It is a generalization of cpu_set_syscall_retval(9) to allow ABIs to override the way to set a return value. sv_syscallnames - the table of syscall names. Use sv_set_syscall_retval in kern_sigsuspend() instead of hardcoding the call to cpu_set_syscall_retval(). The new functions syscallenter(9) and syscallret(9) are provided that use sv_syscall pointers and contain the common repeated code from the syscall() implementations for the architecture-specific syscall trap handlers. Syscallenter() fetches arguments, calls the syscall implementation from the ABI sysent table, and sets up the return frame. The end of syscall bookkeeping is done by syscallret(). Take advantage of single place for MI syscall handling code and implement ptrace_lwpinfo pl_flags PL_FLAG_SCE, PL_FLAG_SCX and PL_FLAG_EXEC. The SCE and SCX flags notify the debugger that the thread is stopped at syscall entry or return point respectively. The EXEC flag augments SCX and notifies debugger that the process address space was changed by one of exec(2)-family syscalls. The i386, amd64, sparc64, sun4v, powerpc and ia64 syscall()s are changed to use syscallenter()/syscallret(). MIPS and arm are not converted and use the mostly unchanged syscall() implementation. Reviewed by: jhb, marcel, marius, nwhitehorn, stas Tested by: marcel (ia64), marius (sparc64), nwhitehorn (powerpc), stas (mips) MFC after: 1 month
# 88389d61	03-May-2010	Konstantin Belousov <kib@FreeBSD.org>	MFC r206264: When OOM searches for a process to kill, ignore the processes already killed by OOM. When killed process waits for a page allocation, try to satisfy the request as fast as possible.
# b7402d82	29-Apr-2010	Alfred Perlstein <alfred@FreeBSD.org>	Avoid allocating MAXHOSTNAMELEN bytes on the stack in expand_name(), use the heap instead. Obtained from: Juniper Networks Reviewed by: jhb
# 3f1c4c4f	06-Apr-2010	Konstantin Belousov <kib@FreeBSD.org>	When OOM searches for a process to kill, ignore the processes already killed by OOM. When killed process waits for a page allocation, try to satisfy the request as fast as possible. This removes the often encountered deadlock, where OOM continuously selects the same victim process, that sleeps uninterruptibly waiting for a page. The killed process may still sleep if page cannot be obtained immediately, but testing has shown that system has much higher chance to survive in OOM situation with the patch. In collaboration with: pho Reviewed by: alc MFC after: 4 weeks
# e7228204	01-Mar-2010	Alfred Perlstein <alfred@FreeBSD.org>	Merge projects/enhanced_coredumps (r204346) into HEAD: Enhanced process coredump routines. This brings in the following features: 1) Limit number of cores per process via the %I coredump formatter. Example: if corefilename is set to %N.%I.core AND num_cores = 3, then if a process "rpd" cores, then the corefile will be named "rpd.0.core", however if it cores again, then the kernel will generate "rpd.1.core" until we hit the limit of "num_cores". this is useful to get several corefiles, but also prevent filling the machine with corefiles. 2) Encode machine hostname in core dump name via %H. 3) Compress coredumps, useful for embedded platforms with limited space. A sysctl kern.compress_user_cores is made available if turned on. To enable compressed coredumps, the following config options need to be set: options COMPRESS_USER_CORES device zlib # brings in the zlib requirements. device gzio # brings in the kernel vnode gzip output module. 4) Eventhandlers are fired to indicate coredumps in progress. 5) The imgact sv_coredump routine has grown a flag to pass in more state, currently this is used only for passing a flag down to compress the coredump or not. Note that the gzio facility can be used for generic output of gzip'd streams via vnodes. Obtained from: Juniper Networks Reviewed by: kan
# 661f092a	31-Jan-2010	Konstantin Belousov <kib@FreeBSD.org>	MFC r202881: Staticise sigqueue manipulation functions used only in kern_sig.c.
# c9dc5d49	29-Jan-2010	Konstantin Belousov <kib@FreeBSD.org>	MFC r202692: Remove the signal from sigqueue before notifying the debugger for traced process, fixing the race between resuming from stopped state and other thread noting the old signal on the queue and acting.
# a5799a4f	23-Jan-2010	Konstantin Belousov <kib@FreeBSD.org>	Staticise sigqueue manipulation functions used only in kern_sig.c. MFC after: 1 week
# 6a671a6b	20-Jan-2010	Konstantin Belousov <kib@FreeBSD.org>	When traced process is about to receive the signal, the process is stopped and debugger may modify or drop the signal. After the changes to keep process-targeted signals on the process sigqueue, another thread may note the old signal on the queue and act before the thread removes changed or dropped signal from the process queue. Since process is traced, it usually gets stopped. Or, if the same signal is delivered while process was stopped, the thread may erroneously remove it, intending to remove the original signal. Remove the signal from the queue before notifying the debugger. Restore the siginfo to the head of sigqueue when signal is allowed to be delivered to the debugee, using newly introduced KSI_HEAD ksiginfo_t flag. This preserves required order of delivery. Always restore the unchanged signal on the curthread sigqueue, not to the process queue, since the thread is about to get it anyway, because sigmask cannot be changed. Handle failure of reinserting the siginfo into the queue by falling back to sq_kill method, calling sigqueue_add with NULL ksi. If debugger changed the signal to be delivered, use sigqueue_add() with NULL ksi instead of only setting sq_signals bit. Reported by: Gardner Bell <gbell72 rogers com> Analyzed and first version of fix by: Tijl Coosemans <tijl coosemans org> PR: 142757 Reviewed by: davidxu MFC after: 2 weeks
# fb70e2f7	18-Dec-2009	Konstantin Belousov <kib@FreeBSD.org>	MFC r199355: Add SI_KERNEL. MFC r199418: Fix pgsignal() call after signature change in r199355.
# 43ba7803	19-Dec-2009	Konstantin Belousov <kib@FreeBSD.org>	MFC r198507: Use kern_sigprocmask() instead of direct manipulation of td_sigmask to reschedule newly blocked signals. MFC r198590: Trapsignal() calls kern_sigprocmask() when delivering caught signal with proc lock held. MFC r198670: For trapsignal() and postsig(), kern_sigprocmask() is called with both process lock and curproc->p_sigacts->ps_mtx locked. Prevent lock recursion on ps_mtx in reschedule_signals().
# 3134e115	19-Dec-2009	Konstantin Belousov <kib@FreeBSD.org>	MFC r198506: In kern_sigsuspend(), manipulate thread signal mask using kern_sigprocmask(). Also, do cursig/postsig loop immediately after waiting for signal, repeating the wait if wakeup was spurious due to race with other thread fetching signal from the process queue before us. MFC r199136: Use cpu_set_syscall_retval(9) to set syscall result, and return EJUSTRETURN from kern_sigsuspend() to prevent syscall return code from modifying wrong frame. Take care of possibility that pending SIGCONT might be cancelled by SIGSTOP, causing postsig() not to deliver any caught signal.
# 6ddf1cd2	19-Dec-2009	Konstantin Belousov <kib@FreeBSD.org>	MFC r197963: Put process-directed signals to the process queue unconditionally, selecting the thread to deliver the signal only by the thread returning to usermode. Change cursig() and postsig() to look both into the thread and process signal queues. MFC r197976: Fix typo. MFC r200082: Remove wrong assertion. Debugee is allowed to lose a signal
# 4f17d481	03-Dec-2009	Konstantin Belousov <kib@FreeBSD.org>	Remove wrong assertion. Debugee is allowed to lose a signal. Reported and tested by: jh MFC after: 2 weeks
# a3de221d	17-Nov-2009	Konstantin Belousov <kib@FreeBSD.org>	Among signal generation syscalls, only sigqueue(2) is allowed by POSIX to fail due to lack of resources to queue siginfo. Add KSI_SIGQ flag that allows sigqueue_add() to fail while trying to allocate memory for new siginfo. When the flag is not set, behaviour is the same as for KSI_TRAP: if memory cannot be allocated, set bit in sq_kill. KSI_TRAP is kept to preserve KBI. Add SI_KERNEL si_code, to be used in siginfo.si_code when signal is generated by kernel. Deliver siginfo when signal is generated by kill(2) family of syscalls (SI_USER with properly filled si_uid and si_pid), or by kernel (SI_KERNEL, mostly job control or SIGIO). Since KSI_SIGQ flag is not set for the ksi, low memory condition causes old behaviour. Keep psignal(9) KBI intact, but modify it to generate SI_KERNEL si_code. Pgsignal(9) and gsignal(9) now take ksi explicitly. Add pksignal(9) that behaves like psignal but takes ksi, and ddb kill command implemented as pksignal(..., ksi = NULL) to not do allocation while in debugger. While there, remove some register specifiers and use ANSI C prototypes. Reviewed by: davidxu MFC after: 1 month
# 75c586a4	10-Nov-2009	Konstantin Belousov <kib@FreeBSD.org>	In r198506, kern_sigsuspend() started doing cursig/postsig loop to make sure that a signal was delivered to the thread before returning from syscall. Signal delivery puts new return frame on the user stack, and modifies trap frame to enter signal handler. As a consequence, syscall return code sets EINTR as error return for signal frame, instead of the syscall return. Also, for ia64, due to different registers layout for those two kinds of frames, usermode segfaulted when returned from signal handler. Use newly-introduced cpu_set_syscall_retval(9) to set syscall result, and return EJUSTRETURN from kern_sigsuspend() to prevent syscall return code from modifying this frame [1]. Another issue is that pending SIGCONT might be cancelled by SIGSTOP, causing postsig() not to deliver any caught signal [2]. Modify postsig() to return 1 if signal was posted, and 0 otherwise, and use this in the kern_sigsuspend loop. Proposed by: marcel [1] Noted by: davidxu [2] Reviewed by: marcel, davidxu MFC after: 1 month
# 80a8b0f3	30-Oct-2009	Konstantin Belousov <kib@FreeBSD.org>	Trapsignal() and postsig() call kern_sigprocmask() with both process lock and curproc->p_sigacts->ps_mtx. Reschedule_signals may need to have ps_mtx locked to decide and wakeup a thread, causing recursion on the mutex. Inform kern_sigprocmask() and reschedule_signals() about lock state of the ps_mtx by new flag SIGPROCMASK_PS_LOCKED to avoid recursion. Reported and tested by: keramida MFC after: 1 month
# 80845402	29-Oct-2009	Konstantin Belousov <kib@FreeBSD.org>	Trapsignal() calls kern_sigprocmask() when delivering caught signal with proc lock held. Reported and tested by: Mykola Dzham freebsd at levsha org ua MFC after: 1 month
# d6e029ad	27-Oct-2009	Konstantin Belousov <kib@FreeBSD.org>	In r197963, a race with thread being selected for signal delivery while in kernel mode, and later changing signal mask to block the signal, was fixed for sigprocmask(2) and pthread_exit(3). The same race exists for sigreturn(2), setcontext(2) and swapcontext(2) syscalls. Use kern_sigprocmask() instead of direct manipulation of td_sigmask to reschedule newly blocked signals, closing the race. Reviewed by: davidxu Tested by: pho MFC after: 1 month
# 84440afb	27-Oct-2009	Konstantin Belousov <kib@FreeBSD.org>	In kern_sigsuspend(), better manipulate thread signal mask using kern_sigprocmask() to properly notify other possible candidate threads for signal delivery. Since sigsuspend() shall only return to usermode after a signal was delivered, do cursig/postsig loop immediately after waiting for signal, repeating the wait if wakeup was spurious due to race with other thread fetching signal from the process queue before us. Add thread_suspend_check() call to allow the thread to be stopped or killed while in loop. Modify last argument of kern_sigprocmask() from boolean to flags, allowing the function to be called with locked proc. Conversion of the callers that supplied 1 to the old argument will be done in the next commit, and due to SIGPROCMASK_OLD value equal to 1, code is formally correct in between. Reviewed by: davidxu Tested by: pho MFC after: 1 month
# 8b5adf9d	12-Oct-2009	Joseph Koshy <jkoshy@FreeBSD.org>	Improve the description of sysctl "kern.sugid_coredump". Submitted by: Mel Flynn <mel.flynn+fbsd.hackers at mailing.thruhere.net> on -hackers
# 7df8f6ab	12-Oct-2009	Konstantin Belousov <kib@FreeBSD.org>	Fix typo. Submitted by: rdivacky MFC after: 1 month
# 6b286ee8	11-Oct-2009	Konstantin Belousov <kib@FreeBSD.org>	Currently, when signal is delivered to the process and there is a thread not blocking the signal, signal is placed on the thread sigqueue. If the selected thread is in kernel executing thr_exit() or sigprocmask() syscalls, then signal might be not delivered to usermode for arbitrary amount of time, and for exiting thread it is lost. Put process-directed signals to the process queue unconditionally, selecting the thread to deliver the signal only by the thread returning to usermode, since only then the thread can handle delivery of signal reliably. For exiting thread or thread that has blocked some signals, check whether the newly blocked signal is queued for the process, and try to find a thread to wakeup for delivery, in reschedule_signal(). For exiting thread, assume that all signals are blocked. Change cursig() and postsig() to look both into the thread and process signal queues. When there is a signal that thread returning to usermode could consume, TDF_NEEDSIGCHK flag is not necessarily set now. Do unlocked read of p_siglist and p_pendingcnt to check for queued signals. Note that thread that has a signal unblocked might get spurious wakeup and EINTR from the interruptible system call now, due to the possibility of being selected by reschedule_signals(), while other thread returned to usermode earlier and removed the signal from process queue. This should not cause compliance issues, since the thread has not blocked a signal and thus should be ready to receive it anyway. Reported by: Justin Teller <justin.teller gmail com> Reviewed by: davidxu, jilles MFC after: 1 month
# 68ee1aac	03-Oct-2009	Konstantin Belousov <kib@FreeBSD.org>	MFC r197660: Fix typo. Approved by: re (bz, kensmith)
# 15b7a831	30-Sep-2009	Konstantin Belousov <kib@FreeBSD.org>	Fix typo. MFC after: 3 days
# e76d823b	12-Sep-2009	Robert Watson <rwatson@FreeBSD.org>	Use C99 initialization for struct filterops. Obtained from: Mac OS X Sponsored by: Apple Inc. MFC after: 3 weeks
# f33a947b	14-Jul-2009	Konstantin Belousov <kib@FreeBSD.org>	Add new msleep(9) flag PBDRY that shall be specified together with PCATCH, to indicate that thread shall not be stopped upon receipt of SIGSTOP until it reaches the kernel->usermode boundary. Also change thread_single(SINGLE_NO_EXIT) to only stop threads at the user boundary unconditionally. Tested by: pho Reviewed by: jhb Approved by: re (kensmith)
# 14961ba7	27-Jun-2009	Robert Watson <rwatson@FreeBSD.org>	Replace AUDIT_ARG() with variable argument macros with a set of more specific macros for each audit argument type. This makes it easier to follow call-graphs, especially for automated analysis tools (such as fxr). In MFC, we should leave the existing AUDIT_ARG() macros as they may be used by third-party kernel modules. Suggested by: brooks Approved by: re (kib) Obtained from: TrustedBSD Project MFC after: 1 week
# 401679de	23-Jun-2009	Peter Holm <pho@FreeBSD.org>	vn_open_cred() needs a non NULL ucred pointer Reviewed by: kib
# e0c161b8	21-Jun-2009	Konstantin Belousov <kib@FreeBSD.org>	Add another flags argument to vn_open_cred. Use it to specify that some vn_open_cred invocations shall not audit namei path. In particular, specify VN_OPEN_NOAUDIT for dotdot lookup performed by default implementation of vop_vptocnp, and for the open done for core file. vn_fullpath is called from the audit code, and vn_open there needs to disable audit to avoid infinite recursion. Core file is created on return to user mode, that, in particular, happens during syscall return. The creation of the core file is audited by direct calls, and we do not want to overwrite audit information for syscall. Reported, reviewed and tested by: rwatson
# 885868cd	10-Apr-2009	Robert Watson <rwatson@FreeBSD.org>	Remove VOP_LEASE and supporting functions. This hasn't been used since the removal of NQNFS, but was left in in case it was required for NFSv4. Since our new NFSv4 client and server can't use it for their requirements, GC the old mechanism, as well as other unused lease- related code and interfaces. Due to its impact on kernel programming and binary interfaces, this change should not be MFC'd. Proposed by: jeff Reviewed by: jeff Discussed with: rmacklem, zach loafman @ isilon
# c90c9021	26-Feb-2009	Ed Schouten <ed@FreeBSD.org>	Remove even more unneeded variable assignments. kern_time.c: - Unused variable `p'. kern_thr.c: - Variable `error' is always caught immediately, so no reason to initialize it. There is no way that error != 0 at the end of create_thread(). kern_sig.c: - Unused variable `code'. kern_synch.c: - `rval' is always assigned in all different cases. kern_rwlock.c: - `v' is always overwritten with RW_UNLOCKED further on. kern_malloc.c: - `size' is always initialized with the proper value before being used. kern_exit.c: - `error' is always caught and returned immediately. abort2() never returns a non-zero value. kern_exec.c: - `len' is always assigned inside the if-statement right below it. tty_info.c: - `td' is always overwritten by FOREACH_THREAD_IN_PROC(). Found by: LLVM's scan-build
# 7b4a950a	04-Nov-2008	David Xu <davidxu@FreeBSD.org>	Revert rev 184216 and 184199, due to the way the thread_lock works, it may cause a lockup. Noticed by: peter, jhb
# 3f9be10e	23-Oct-2008	David Xu <davidxu@FreeBSD.org>	Actually, for signal and thread suspension, extra process spin lock is unnecessary, the normal process lock and thread lock are enough. The spin lock is still needed for process and thread exiting to mimic single sched_lock.
# d7f03759	19-Oct-2008	Ulf Lilleengen <lulf@FreeBSD.org>	- Import the HEAD csup code which is the basis for the cvsmode work.
# 904c5ec4	15-Oct-2008	David Xu <davidxu@FreeBSD.org>	Move per-thread userland debugging flags into a separate field. This eliminates some locking problems, e.g. a thread lock is needed but cannot be used at that time. Only the process lock is now needed for the new field.
# 0359a12e	28-Aug-2008	Attilio Rao <attilio@FreeBSD.org>	Decontextualize the couplet VOP_GETATTR / VOP_SETATTR as the passed thread was always curthread and totally unuseful. Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>
# da7bbd2c	05-Aug-2008	John Baldwin <jhb@FreeBSD.org>	If a thread that is swapped out is made runnable, then the setrunnable() routine wakes up proc0 so that proc0 can swap the thread back in. Historically, this has been done by waking up proc0 directly from setrunnable() itself via a wakeup(). When waking up a sleeping thread that was swapped out (the usual case when waking proc0 since only sleeping threads are eligible to be swapped out), this resulted in a bit of recursion (e.g. wakeup() -> setrunnable() -> wakeup()). With sleep queues having separate locks in 6.x and later, this caused a spin lock LOR (sleepq lock -> sched_lock/thread lock -> sleepq lock). An attempt was made to fix this in 7.0 by making the proc0 wakeup use the ithread mechanism for doing the wakeup. However, this required grabbing proc0's thread lock to perform the wakeup. If proc0 was asleep elsewhere in the kernel (e.g. waiting for disk I/O), then this degenerated into the same LOR since the thread lock would be some other sleepq lock. Fix this by deferring the wakeup of the swapper until after the sleepq lock held by the upper layer has been locked. The setrunnable() routine now returns a boolean value to indicate whether or not proc0 needs to be woken up. The end result is that consumers of the sleepq API such as *sleep/wakeup, condition variables, sx locks, and lockmgr, have to wakeup proc0 if they get a non-zero return value from sleepq_abort(), sleepq_broadcast(), or sleepq_signal(). Discussed with: jeff Glanced at by: sam Tested by: Jurgen Weber jurgen - ish com au MFC after: 2 weeks
# 5d217f17	24-May-2008	John Birrell <jb@FreeBSD.org>	Add DTrace 'proc' provider probes using the Statically Defined Trace (sdt) mechanism.
# b7edba77	21-Mar-2008	Jeff Roberson <jeff@FreeBSD.org>	- Add a new td flag TDF_NEEDSUSPCHK that is set whenever a thread needs to enter thread_suspend_check(). - Set TDF_ASTPENDING along with TDF_NEEDSUSPCHK so we can move the thread_suspend_check() to ast() rather than userret(). - Check TDF_NEEDSUSPCHK in the sleepq_catch_signals() optimization so that we don't miss a suspend request. If this is set use the expensive signal path. - Set NEEDSUSPCHK when creating a new thread in thr in case the creating thread is due to be suspended as well but has not yet. Reviewed by: davidxu (Authored original patch)
# 374ae2a3	19-Mar-2008	Jeff Roberson <jeff@FreeBSD.org>	- Relax requirements for p_numthreads, p_threads, p_swtick, and p_nice from requiring the per-process spinlock to only requiring the process lock. - Reflect these changes in the proc.h documentation and consumers throughout the kernel. This is a substantial reduction in locking cost for these fields and was made possible by recent changes to threading support.
# 6617724c	12-Mar-2008	Jeff Roberson <jeff@FreeBSD.org>	Remove kernel support for M:N threading. While the KSE project was quite successful in bringing threading to FreeBSD, the M:N approach taken by the kse library was never developed to its full potential. Backwards compatibility will be provided via libmap.conf for dynamically linked binaries and static binaries will be broken.
# 36b208e0	08-Mar-2008	Robert Watson <rwatson@FreeBSD.org>	Use sbuf routines to construct core dump filenames rather than custom string buffer handling, making the code both easier to read and more robust against string-handling bugs. MFC after: 1 week
# eeccc367	08-Mar-2008	Robert Watson <rwatson@FreeBSD.org>	Unlock the process lock when expand_name() fails, or we may leak the process lock leading to a hang. This bug was introduced in kern_sig.c:1.351, when the call to expand_name() was moved earlier but this particular error case was not updated.
# 22db15c0	13-Jan-2008	Attilio Rao <attilio@FreeBSD.org>	VOP_LOCK1() (and so VOP_LOCK()) and VOP_UNLOCK() are only used in conjunction with 'thread' argument passing which is always curthread. Remove the useless extra argument and pass explicitly curthread to lower layer functions, when necessary. KPI results broken by this change, which should affect several ports, so version bumping and manpage update will be further committed. Tested by: kris, pho, Diego Sardina <siarodx at gmail dot com>
# cb05b60a	09-Jan-2008	Attilio Rao <attilio@FreeBSD.org>	vn_lock() is currently only used with the 'curthread' passed as argument. Remove this argument and pass curthread directly to underlying VOP_LOCK1() VFS method. This modify makes the code cleaner and in particular remove an annoying dependence helping next lockmgr() cleanup. KPI results, obviously, changed. Manpage and FreeBSD_version will be updated through further commits. As a side note, would be valuable to say that next commits will address a similar cleanup about VFS methods, in particular vop_lock1 and vop_unlock. Tested by: Diego Sardina <siarodx at gmail dot com>, Andrea Di Pasquale <whyx dot it at gmail dot com>
# 10c2b8e1	18-Dec-2007	David E. O'Brien <obrien@FreeBSD.org>	Be more exact with sigaction SA_SIGINFO handling. Reviewed by: marcel
# 89b57fcf	05-Nov-2007	Konstantin Belousov <kib@FreeBSD.org>	Fix for the panic("vm_thread_new: kstack allocation failed") and silent NULL pointer dereference in the i386 and sparc64 pmap_pinit() when the kmem_alloc_nofault() failed to allocate address space. Both functions now return error instead of panicking or dereferencing NULL. As a consequence, vmspace_exec() and vmspace_unshare() return the errno int. struct vmspace arg was added to vm_forkproc() to avoid dealing with failed allocation when most of the fork1() job is already done. The kernel stack for the thread is now set up in the thread_alloc(), that itself may return NULL. Also, allocation of the first process thread is performed in the fork1() to properly deal with stack allocation failure. proc_linkup() is separated into proc_linkup() called from fork1(), and proc_linkup0(), that is used to set up the kernel process (was known as swapper). In collaboration with: Peter Holm Reviewed by: jhb
# 57274c51	25-Oct-2007	Christian S.J. Peron <csjp@FreeBSD.org>	Implement AUE_CORE, which adds process core dump support into the kernel. This change introduces audit_proc_coredump() which is called by coredump(9) to create an audit record for the coredump event. When a process dumps a core, it could be security relevant. It could be an indicator that a stack within the process has been overflowed with an incorrectly constructed malicious payload or a number of other events. The record that is generated looks like this: header,111,10,process dumped core,0,Thu Oct 25 19:36:29 2007, + 179 msec argument,0,0xb,signal path,/usr/home/csjp/test.core subject,csjp,csjp,staff,csjp,staff,1101,1095,50457,10.37.129.2 return,success,1 trailer,111 - We allocate a completely new record to make sure we aren't clobbering the audit data associated with the syscall that produced the core (assuming the core is being generated in response to SIGABRT and not an invalid memory access). - Shuffle around expand_name() so we can use the coredump name at the very beginning of the coredump call. Make sure we free the storage referenced by "name" if we need to bail out early. - Audit both successful and failed coredump creation efforts Obtained from: TrustedBSD Project Reviewed by: rwatson MFC after: 1 month
# 5ff3816d	23-Oct-2007	Christian S.J. Peron <csjp@FreeBSD.org>	Move where we audit the PID argument such that we unconditionally audit it at the beginning of the syscall. This fixes a problem where the user supplies an invalid process ID which is > 0 which results in the PID argument not being audited. Obtained from: TrustedBSD Project MFC after: 1 week
# 6eeb364b	19-Jul-2007	Jeff Roberson <jeff@FreeBSD.org>	- Calling sched_nice() in tdsigwakeup() is no longer required by ULE and actually causes LORs and other panics. Reported by: mlaier Approved by: re
# efe641b9	11-Jun-2007	Jeff Roberson <jeff@FreeBSD.org>	- Add a missing PROC_SUNLOCK() in tdsignal()
# 9b73d239	09-Jun-2007	Matt Jacob <mjacob@FreeBSD.org>	Initialized ets to zero. This is arguably a gcc bug in that ets is always set to rts when timeout is non-NULL and then timevalid is set and ets is only checked later when timervalid is set.
# a54e85fd	04-Jun-2007	Jeff Roberson <jeff@FreeBSD.org>	Commit 4/14 of sched_lock decomposition. - Use thread_lock() rather than sched_lock for per-thread scheduling synchronization. - Use the per-process spinlock rather than the sched_lock for per-process scheduling synchronization. - Move some common code into thread_suspend_switch() to handle the mechanics of suspending a thread. The locking here is incredibly convoluted and should be simplified. Tested by: kris, current@ Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc. Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
# 1c4bcd05	31-May-2007	Jeff Roberson <jeff@FreeBSD.org>	- Move rusage from being per-process in struct pstats to per-thread in td_ru. This removes the requirement for per-process synchronization in statclock() and mi_switch(). This was previously supported by sched_lock which is going away. All modifications to rusage are now done in the context of the owning thread. Reads proceed without locks. - Aggregate exiting threads rusage in thread_exit() such that the exiting thread's rusage is not lost. - Provide a new routine, rufetch() to fetch an aggregate of all rusage structures from all threads in a process. This routine must be used in any place requiring a rusage from a process prior to its exit. The exited process's rusage is still available via p_ru. - Aggregate tick statistics only on demand via rufetch() or when a thread exits. Tick statistics are kept in the thread and protected by sched_lock until it exits. Initial patch by: attilio Reviewed by: attilio, bde (some objections), arch (mostly silent)
# 9e223287	31-May-2007	Konstantin Belousov <kib@FreeBSD.org>	Revert UF_OPENING workaround for CURRENT. Change the VOP_OPEN(), vn_open() vnode operation and d_fdopen() cdev operation argument from being file descriptor index into the pointer to struct file. Proposed and reviewed by: jhb Reviewed by: daichi (unionfs) Approved by: re (kensmith)
# 4dec0e67	23-May-2007	Robert Watson <rwatson@FreeBSD.org>	Comment that tdsignal() may be entered from the debugger.
# aa89d8cd	21-Mar-2007	John Baldwin <jhb@FreeBSD.org>	Rename the 'mtx_object', 'rw_object', and 'sx_object' members of mutexes, rwlocks, and sx locks to 'lock_object'.
# 873fbcd7	05-Mar-2007	Robert Watson <rwatson@FreeBSD.org>	Further system call comment cleanup: - Remove also "MP SAFE" after prior "MPSAFE" pass. (suggested by bde) - Remove extra blank lines in some cases. - Add extra blank lines in some cases. - Remove no-op comments consisting solely of the function name, the word "syscall", or the system call name. - Add punctuation. - Re-wrap some comments.
# 0c14ff0e	04-Mar-2007	Robert Watson <rwatson@FreeBSD.org>	Remove 'MPSAFE' annotations from the comments above most system calls: all system calls now enter without Giant held, and then in some cases, acquire Giant explicitly. Remove a number of other MPSAFE annotations in the credential code and tweak one or two other adjacent comments.
# d60226bd	09-Feb-2007	Xin LI <delphij@FreeBSD.org>	Give which signal caller has attempted to deliver when panicking.
# 4f506694	17-Jan-2007	Xin LI <delphij@FreeBSD.org>	Use FOREACH_PROC_IN_SYSTEM instead of using its unrolled form.
# 016fa302	24-Dec-2006	David Xu <davidxu@FreeBSD.org>	break loop early if we know that there are at least two signals.
# 6aeb05d7	11-Nov-2006	Tom Rhodes <trhodes@FreeBSD.org>	Merge posix4/* into normal kernel hierarchy. Reviewed by: glanced at by jhb Approved by: silence on -arch@ and -standards@
# 8460a577	26-Oct-2006	John Birrell <jb@FreeBSD.org>	Make KSE a kernel option, turned on by default in all GENERIC kernel configs except sun4v (which doesn't process signals properly with KSE). Reviewed by: davidxu@
# 5c28a8d4	21-Oct-2006	David Xu <davidxu@FreeBSD.org>	Use macro TAILQ_FOREACH_SAFE instead of expanding it.
# 0fc32899	20-Oct-2006	John Baldwin <jhb@FreeBSD.org>	Remove the check that prevented signals from being delivered to exiting processes. It was originally added back when support for Linux threads (and thus shared sigacts objects) was added, but no one knows why. My guess is that at some point during the Linux threads patches, the sigacts object was torn down during exit1(), so this check was added to prevent a panic for that race. However, the stuff that was actually committed to the tree doesn't teardown sigacts until wait() making the above race moot. Re-allowing signals here lets one interrupt a NFS request during process teardown (such as closing descriptors) on an interruptible mount. Requested by: kib (long time ago) MFC after: 1 week
# c6511aea	04-Oct-2006	David Xu <davidxu@FreeBSD.org>	Move some declaration of 32-bit signal structures into file freebsd32-signal.h, implement sigtimedwait and sigwaitinfo system calls.
# 73dbd3da	11-May-2006	John Baldwin <jhb@FreeBSD.org>	Remove various bits of conditional Alpha code and fixup a few comments.
# 11991ab4	07-May-2006	Tor Egge <tegge@FreeBSD.org>	Call vn_finished_write() before calling the coredump handler which will indirectly call vn_start_write() as necessary for each write.
# 95f16c1e	21-Apr-2006	Paul Saab <ps@FreeBSD.org>	Don't try to kill embryonic processes in killpg1(). This prevents a race condition between fork() and kill(pid,sig) with pid < 0 that can cause a kernel panic. Submitted by: up MFC after: 3 weeks
# 33f19bee	28-Mar-2006	John Baldwin <jhb@FreeBSD.org>	- Conditionalize Giant around VFS operations for ALQ, ktrace, and generating a coredump as the result of a signal. - Fix a bug where we could leak a Giant lock if vn_start_write() failed in coredump(). Reported by: jmg (2)
# 7b8d5e48	09-Mar-2006	David Xu <davidxu@FreeBSD.org>	Remove _STOPEVENT call, it is already called in issignal, simplify code for SIGKILL signal.
# 3dfcaad6	02-Mar-2006	David Xu <davidxu@FreeBSD.org>	Add signal set sq_kill to sigqueue structure, the member saves all signals sent by kill() syscall, without this, a signal sent by sigqueue() can cause a signal sent by kill() to be lost.
# 7e0221a2	23-Feb-2006	David Xu <davidxu@FreeBSD.org>	1. Refine kern_sigtimedwait() to remove redundant code. 2. Fix a bug, if thread got a SIGKILL signal, call sigexit() to kill its process. MFC after: 3 days
# 7c9a98f1	22-Feb-2006	David Xu <davidxu@FreeBSD.org>	Code cleanup, simply compare with curproc.
# 94f0972b	15-Feb-2006	David Xu <davidxu@FreeBSD.org>	Fix a long standing race between sleep queue and thread suspension code. When a thread A is going to sleep, it calls sleepq_catch_signals() to detect any pending signals or thread suspension request, if nothing happens, it returns without holding process lock or scheduler lock, this opens a race window which allows thread B to come in and do process suspension work, however since A is still at running state, thread B can do nothing to A, thread A continues, and puts itself into actually sleeping state, but B has never seen it, and it sits there forever until B is woken up by other threads sometimes later(this can be very long delay or never happen). Fix this bug by forcing sleepq_catch_signals to return with scheduler lock held. Fix sleepq_abort() by passing it an interrupted code, previously, it worked as wakeup_one(), and the interruption can not be identified correctly by sleep queue code when the sleeping thread is resumed. Let thread_suspend_check() returns EINTR or ERESTART, so sleep queue no longer has to use SIGSTOP as a hack to build a return value. Reviewed by: jhb MFC after: 1 week
# bfd7575a	13-Feb-2006	Wayne Salamon <wsalamon@FreeBSD.org>	Audit the arguments to the kill(2) and killpg(2) system calls. Obtained from: TrustedBSD Project Approved by: rwatson (mentor)
# d8267df7	12-Feb-2006	David Xu <davidxu@FreeBSD.org>	In order to speed up process suspension on MP machine, send IPI to remote CPU. While here, abstract thread suspension code into a function called sig_suspend_threads, the function is called when a process received a STOP signal.
# 7f96995e	04-Feb-2006	David Xu <davidxu@FreeBSD.org>	Create childproc_jobstate function to report job control state, this also fixes a bug in childproc_continued which ignored PS_NOCLDSTOP.
# d7bc12b0	23-Dec-2005	David Xu <davidxu@FreeBSD.org>	Avoid kernel panic when attaching a process which may not be stopped by debugger, e.g process is dumping core. Only access p_xthread if P_STOPPED_TRACE is set, this means thread is ready to exchange signal with debugger, print a warning if P_STOPPED_TRACE is not set due to some bugs in other code, if there is. The patch has been tested by Anish Mistry mistry.7 at osu dot edu, and is slightly adjusted.
# f71a882f	09-Dec-2005	David Xu <davidxu@FreeBSD.org>	Add a sysctl to force a process to sigexit if a trap signal is being held by current thread or ignored by current process, otherwise, it is very possible the thread will enter an infinite loop and lead to an administrator's nightmare.
# 761a4d94	08-Dec-2005	David Xu <davidxu@FreeBSD.org>	Cleanup sigqueue sysctl.
# 027f7604	05-Dec-2005	David Xu <davidxu@FreeBSD.org>	Fix a lock leak in childproc_continued().
# b51d237a	30-Nov-2005	David Xu <davidxu@FreeBSD.org>	set signal queue values for sysconf().
# 413cf3bb	11-Nov-2005	David Xu <davidxu@FreeBSD.org>	Make sure only remove one signal by debugger.
# f4d85223	09-Nov-2005	David Xu <davidxu@FreeBSD.org>	WIFxxx macros requires an int type but p_xstat is short, convert it to int before using the macros. Bug reported by : Pyun YongHyeon pyunyh at gmail dot com
# ebceaf6d	08-Nov-2005	David Xu <davidxu@FreeBSD.org>	Add support for queueing SIGCHLD same as other UNIX systems did. For each child process whose status has been changed, a SIGCHLD instance is queued, if the signal is still pending, and process changed status several times, signal information is updated to reflect latest process status. If wait() returns because the status of a child process is available, pending SIGCHLD signal associated with the child process is discarded. Any other pending SIGCHLD signals remain pending. The signal information is allocated at the same time when proc structure is allocated, if process signal queue is fully filled or there is a memory shortage, it can still send the signal to process. There is a booting time tunable kern.sigqueue.queue_sigchild which can control the behavior, setting it to zero disables the SIGCHLD queueing feature, the tunable will be removed if the function is proved that it is stable enough. Tested on: i386 (SMP and UP)
# 8f0371f1	04-Nov-2005	David Xu <davidxu@FreeBSD.org>	Fix name compatible problem with POSIX standard. the sigval_ptr and sigval_int really should be sival_ptr and sival_int. Also sigev_notify_function accepts a union sigval value but not a pointer.
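The member names restored by the entry above (8f0371f1) are the ones POSIX defines for `union sigval`. As a minimal userland sketch, not taken from kern_sig.c, the following queues a signal to the calling process with sigqueue(2) and reads the value back through `si_value.sival_int`. The helper name `queue_and_fetch` is hypothetical, and the sketch assumes a single-threaded caller.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/*
 * Hypothetical helper: queue `signo` to ourselves carrying `payload`,
 * then accept it synchronously. Returns 0 on success or an errno value.
 */
static int
queue_and_fetch(int signo, int payload, int *out)
{
	sigset_t set;
	siginfo_t info;
	union sigval value;

	assert(out != NULL);

	/* Block the signal so it stays pending instead of being delivered. */
	sigemptyset(&set);
	sigaddset(&set, signo);
	if (sigprocmask(SIG_BLOCK, &set, NULL) == -1)
		return (errno);

	value.sival_int = payload;	/* POSIX name: sival_int */
	if (sigqueue(getpid(), signo, value) == -1)
		return (errno);		/* e.g. EAGAIN if it cannot be queued */

	/* Accept the pending signal and recover the queued value. */
	if (sigwaitinfo(&set, &info) == -1)
		return (errno);

	*out = info.si_value.sival_int;
	return (0);
}
```

Blocking the signal first keeps the example free of handlers: the queued instance stays pending until sigwaitinfo(2) accepts it.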
# 6d7b314b	02-Nov-2005	David Xu <davidxu@FreeBSD.org>	Cleanup some signal interfaces. Now the tdsignal function accepts both proc pointer and thread pointer, if thread pointer is NULL, tdsignal automatically finds a thread, otherwise it sends signal to given thread. Add utility function psignal_event to send a realtime sigevent to a process according to the delivery requirement specified in struct sigevent.
# 56c06c4b	29-Oct-2005	David Xu <davidxu@FreeBSD.org>	Let itimer store itimerspec instead of itimerval, so I don't have to convert to or from timeval frequently. Introduce function itimer_accept() to ack a timer signal in signal acceptance code, this allows us to return more fresh overrun counter than at signal generating time. while POSIX says: "the value returned by timer_getoverrun() shall apply to the most recent expiration signal delivery or acceptance for the timer,.." I prefer returning it at acceptance time. Introduce SIGEV_THREAD_ID notification mode, it is used by thread library to request kernel to deliver signal to a specified thread, and in turn, the thread library may use the mechanism to implement SIGEV_THREAD which is required by POSIX. Timer signal is managed by timer code, so it can not fail even if signal queue is fully filled by sigqueue syscall.
# 5da49fcb	22-Oct-2005	David Xu <davidxu@FreeBSD.org>	1. Make ksiginfo_alloc and ksiginfo_free public. 2. Introduce flags KSI_EXT and KSI_INS. The flag KSI_EXT allows a ksiginfo to be managed by outside code, the KSI_INS indicates sigqueue_add should directly insert passed ksiginfo into queue other than copy it.
# 9104847f	13-Oct-2005	David Xu <davidxu@FreeBSD.org>	1. Change prototype of trapsignal and sendsig to use ksiginfo_t *, most changes in MD code are trivial, before this change, trapsignal and sendsig use discrete parameters, now they uses member fields of ksiginfo_t structure. For sendsig, this change allows us to pass POSIX realtime signal value to user code. 2. Remove cpu_thread_siginfo, it is no longer needed because we now always generate ksiginfo_t data and feed it to libpthread. 3. Add p_sigqueue to proc structure to hold shared signals which were blocked by all threads in the proc. 4. Add td_sigqueue to thread structure to hold all signals delivered to thread. 5. i386 and amd64 now return POSIX standard si_code, other arches will be fixed. 6. In this sigqueue implementation, pending signal set is kept as before, an extra siginfo list holds additional siginfo_t data for signals. kernel code uses psignal() still behavior as before, it won't be failed even under memory pressure, only exception is when deleting a signal, we should call sigqueue_delete to remove signal from sigqueue but not SIGDELSET. Current there is no kernel code will deliver a signal with additional data, so kernel should be as stable as before, a ksiginfo can carry more information, for example, allow signal to be delivered but throw away siginfo data if memory is not enough. SIGKILL and SIGSTOP have fast path in sigqueue_add, because they can not be caught or masked. The sigqueue() syscall allows user code to queue a signal to target process, if resource is unavailable, EAGAIN will be returned as specification said. Just before thread exits, signal queue memory will be freed by sigqueue_flush. Current, all signals are allowed to be queued, not only realtime signals. Earlier patch reviewed by: jhb, deischen Tested on: i386, amd64
# ec8297bd	05-Jun-2005	David Xu <davidxu@FreeBSD.org>	Fix a bug relevant to debugging, a masked signal unexpectedly interrupts a sleeping thread when process is being debugged. PR: GNU/77818 Tested by: Sean C. Farley <sean-freebsd at farley org>
# 407948a5	19-Apr-2005	David Xu <davidxu@FreeBSD.org>	Oops, forgot to update this file. Fix a race condition between kern_wait() and thread_stopped(). Problem is in kern_wait(), parent process steps through children list, once a child process is skipped, and later even if the child is stopped, parent process still sleeps in msleep(), the race happens if parent masked SIGCHLD. Submitted by : Peter Edwards peadar.edwards at gmail dot com MFC after : 4 days
# f97c3df1	09-Apr-2005	David Schultz <das@FreeBSD.org>	Suspend all other threads in the process while generating a core dump. The main reason for doing this is that the ELF dump handler expects the thread list to be fixed while the dump header is generated, so an upcall that occurs at the wrong time can lead to buffer overruns and other Bad Things. Another solution would be to grab sched_lock in the ELF dump handler, but we might as well single-thread, since the process is about to die. Furthermore, I think this should ensure that the register sets in the core file are sequentially consistent.
# 627451c1	04-Mar-2005	David Xu <davidxu@FreeBSD.org>	The td_waitset is pointing to a stack address when thread is waiting for a signal, because kernel stack is swappable, this causes page fault in kernel under heavy swapping case. Fix this bug by eliminating unneeded code.
# 6675b36e	02-Mar-2005	David Xu <davidxu@FreeBSD.org>	In kern_sigtimedwait, remove waitset bits for td_sigmask before sleeping, so in do_tdsignal, we no longer need to test td_waitset. now td_waitset is only used to give a thread higher priority when delivering signal to multithreads process. This also fixes a bug: when a thread in sigwait states was suspended and later resumed by SIGCONT, it can no longer receive signals belong to waitset.
# 1089f031	18-Feb-2005	David Xu <davidxu@FreeBSD.org>	Don't restart a timeout wait in kern_sigtimedwait, also allow it to wait longer than a single integer can represent.
# 1a88a252	13-Feb-2005	Maxim Sobolev <sobomax@FreeBSD.org>	Backout previous change (disabling of security checks for signals delivered in emulation layers), since it appears to be too broad. Requested by: rwatson
# d8ff44b7	13-Feb-2005	Maxim Sobolev <sobomax@FreeBSD.org>	Split out kill(2) syscall service routine into user-level and kernel part, the former is callable from user space and the latter from the kernel one. Make kernel version take additional argument which tells if the respective call should check for additional restrictions for sending signals to suid/sugid applications or not. Make all emulation layers using non-checked version, since signal numbers in emulation layers can have different meaning than in native mode and such protection can cause misbehaviour. As a result remove LIBTHR from the signals allowed to be delivered to a suid/sugid application. Requested (sorta) by: rwatson MFC after: 2 weeks
# 9454b2d8	06-Jan-2005	Warner Losh <imp@FreeBSD.org>	/* -> /*- for copyright notices, minor format tweaks as necessary
# 3ef6ac33	13-Dec-2004	Jeff Roberson <jeff@FreeBSD.org>	- If delivering a signal will result in killing a process that has a nice value above 0, set it to 0 so that it may proceed with haste. This is especially important on ULE, where adjusting the priority does not guarantee that a thread will be granted a greater time slice.
# 6d1ab6ed	15-Nov-2004	Warner Losh <imp@FreeBSD.org>	Fix an off by one error. MAXPATHLEN already has +1.
# 90d75f78	29-Oct-2004	Alfred Perlstein <alfred@FreeBSD.org>	Allow kill -9 to kill processes stuck in procfs STOPEVENTs.
# cd71c414	29-Oct-2004	Alfred Perlstein <alfred@FreeBSD.org>	Backout 1.291. re doesn't seem to think this fixes: Desired features for 5.3-RELEASE "More truss problems"
# b3a4fb14	05-Oct-2004	David Xu <davidxu@FreeBSD.org>	Use scheduler api to adjust thread priority.
# 482d099c	03-Oct-2004	David Xu <davidxu@FreeBSD.org>	Don't bother to turn off other P_STOPPED bits for SIGKILL, doing so would cause kernel to produce an unkillable process in some cases, especially, P_STOPPED_SINGLE has a singling thread, turning off the bit would mess the state.
# 50434413	01-Oct-2004	Alfred Perlstein <alfred@FreeBSD.org>	Clear a process's procfs trace points upon delivery of SIGKILL. MT5 candidate. (Desired features for 5.3-RELEASE "More truss problems")
# 5995adc2	31-Aug-2004	Julian Elischer <julian@FreeBSD.org>	Remove an unneeded argument. The removed argument could trivially be derived from the remaining one. That in turn should be the same as curthread, but it is possible that curthread could be expensive to derive on some systems, so leave it as an argument. Having both proc and thread as arguments just gives an opportunity for them to get out of sync. MFC after: 3 days
# ad3b9257	15-Aug-2004	John-Mark Gurney <jmg@FreeBSD.org>	Add locking to the kqueue subsystem. This also makes the kqueue subsystem a more complete subsystem, and removes the knowledge of how things are implemented from the drivers. Include locking around filter ops, so a module like aio will know when not to be unloaded if there are outstanding knotes using its filter ops. Currently, it uses the MTX_DUPOK even though it is not always safe to acquire duplicate locks. Witness currently doesn't support the ability to discover if a dup lock is ok (in some cases). Reviewed by: green, rwatson (both earlier versions)
# 6141e04a	08-Aug-2004	John-Mark Gurney <jmg@FreeBSD.org>	add option to automatically mark core dumps with the nodump flag PR: 57065 Submitted by: Walter C. Pelissero
# 24b2151f	03-Aug-2004	Pawel Jakub Dawidek <pjd@FreeBSD.org>	Don't skip permission checks when sending signals to zombie processes. Pointed out by: bde Reviewed by: rwatson
# 0b011ea3	29-Jul-2004	Pawel Jakub Dawidek <pjd@FreeBSD.org>	Syscall kill(2) called for a zombie process should return 0. Obtained from: Darwin
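The behavior described by the entry above (0b011ea3) can be observed from userland. The following is a minimal sketch, not taken from kern_sig.c, that assumes a zombie still counts as an existing process for kill(2), as that entry establishes for FreeBSD. The helper name `probe_zombie` is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: fork a child that exits at once, wait until it is
 * a zombie without reaping it, and probe it with kill(pid, 0) before and
 * after the reap. Returns 0 on success or -1 if a setup step fails.
 */
static int
probe_zombie(int *before_reap, int *after_reap_errno)
{
	siginfo_t info;
	pid_t pid;

	assert(before_reap != NULL && after_reap_errno != NULL);

	pid = fork();
	if (pid == -1)
		return (-1);
	if (pid == 0)
		_exit(0);		/* child: exit immediately */

	/* WNOWAIT reports the exit but leaves the child as a zombie. */
	if (waitid(P_PID, (id_t)pid, &info, WEXITED | WNOWAIT) == -1)
		return (-1);
	*before_reap = kill(pid, 0);	/* zombie: expected to succeed */

	/* Reap the child; after this the pid no longer names a process. */
	if (waitpid(pid, NULL, 0) != pid)
		return (-1);
	errno = 0;
	(void)kill(pid, 0);		/* reaped: expected to fail */
	*after_reap_errno = errno;
	return (0);
}
```

Signal number 0 performs only the existence and permission checks, so the probe never delivers anything to the child.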
# 6dbc0850	16-Jul-2004	John Baldwin <jhb@FreeBSD.org>	Improve readability a bit by changing some code at the end of a function that did: if (foo) return else blah to just do the simpler if (!foo) blah instead.
# cbf4e354	13-Jul-2004	David Xu <davidxu@FreeBSD.org>	Add code to support debugging threaded process. 1. Add tm_lwpid into kse_thr_mailbox to indicate which kernel thread current user thread is running on. Add tm_dflags into kse_thr_mailbox, the flags is written by debugger, it tells UTS and kernel what should be done when the process is being debugged, currently there are two flags TMDF_SSTEP and TMDF_DONOTRUNUSER. TMDF_SSTEP is used to tell kernel to turn on single stepping, or turn off if it is not set. TMDF_DONOTRUNUSER is used to tell kernel to schedule upcall whenever possible, to UTS, it means do not run the user thread until debugger clears it, this behaviour is necessary because gdb wants to resume only one thread when the thread's pc is at a breakpoint, and thread needs to go forward, in order to avoid other threads sneaking past the breakpoints, it needs to remove breakpoint, only wants one thread to go. Also, add km_lwp to kse_mailbox, the lwp id is copied to kse_thr_mailbox at context switch time when process is not being debugged, so when process is attached, debugger can map kernel thread to user thread. 2. Add p_xthread to proc structure and td_xsig to thread structure. p_xthread is used by a thread when it wants to report event to debugger, every thread can set the pointer, especially, when it is used in ptracestop, it is the last thread reporting event will win the race. Every thread has a td_xsig to exchange signal with debugger, thread uses TDF_XSIG flag to indicate it is reporting signal to debugger, if the flag is not cleared, thread will keep retrying until it is cleared by debugger, p_xthread may be used by debugger to indicate CURRENT thread. The p_xstat is still in proc structure to keep wait() to work, in future, we may just use td_xsig. 3. Add TDF_DBSUSPEND flag, the flag is used by debugger to suspend a thread. When process stops, debugger can set the flag for thread, thread will check the flag in thread_suspend_check, enters a loop, unless it is cleared by debugger, process is detached or process is exiting. The flag is also checked in ptracestop, so debugger can temporarily suspend a thread even if the thread wants to exchange signal. 4. Currently, in ptrace, we always resume all threads, but if a thread has already a TDF_DBSUSPEND flag set by debugger, it won't run. Encouraged by: marcel, julian, deischen
# fbc3247d	11-Jul-2004	Marcel Moolenaar <marcel@FreeBSD.org>	Implement the PT_LWPINFO request. This request can be used by the tracing process to obtain information about the LWP that caused the traced process to stop. Debuggers can use this information to select the thread currently running on the LWP as the current thread. The request has been made compatible with NetBSD for as much as possible. This implementation differs from NetBSD in the following ways: 1. The data argument is allowed to be smaller than the size of the ptrace_lwpinfo structure known to the kernel, but not 0. This is opposite to what NetBSD allows. The reason for this is that we can extend the structure without affecting older binaries. 2. On NetBSD the tracing process is to set the pl_lwpid field to the Id of the LWP it wants information of. We don't do that. Our ptrace interface allows passing the LWP Id instead of the PID. The tracing process is to set the PID to the LWP Id it wants information of. 3. When the PID is actually the PID of the tracing process, this request returns the information about the LWP that caused the process to stop. This was the whole purpose of the request in the first place. When the traced process has exited, this request will return the LWP Id 0, indicating that the process state is not the result of an event specific to a LWP.
# bf0acc27	02-Jul-2004	John Baldwin <jhb@FreeBSD.org>	- Change mi_switch() and sched_switch() to accept an optional thread to switch to. If a non-NULL thread pointer is passed in, then the CPU will switch to that thread directly rather than calling choosethread() to pick a thread to switch to. - Make sched_switch() aware of idle threads and know to do TD_SET_CAN_RUN() instead of sticking them on the run queue rather than requiring all callers of mi_switch() to know to do this if they can be called from an idlethread. - Move constants for arguments to mi_switch() and thread_single() out of the middle of the function prototypes and up above into their own section.
# 1930e303	11-Jun-2004	Poul-Henning Kamp <phk@FreeBSD.org>	Deorbit COMPAT_SUNOS. We inherited this from the sparc32 port of BSD4.4-Lite1. We have neither a sparc32 port nor a SunOS4.x compatibility desire these days.
# 36939a0a	07-Jun-2004	David Xu <davidxu@FreeBSD.org>	According to SUSv3, sigwait is different from sigwaitinfo: sigwait returns the error code in its return value, not in errno.
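The return convention described in 36939a0a can be sketched from userland as follows. This is a minimal illustration, not code from kern_sig.c: the helper name wait_for_pending_usr1() is hypothetical, and the sketch assumes a single-threaded caller so that sigprocmask() is adequate.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: blocks SIGUSR1, makes it pending, then consumes it
 * with sigwait(). Returns the signal number, or -1 on failure.
 */
int wait_for_pending_usr1(void)
{
    sigset_t set;
    int sig = 0;
    int error;

    sigemptyset(&set);
    sigaddset(&set, SIGUSR1);

    /* Block the signal so it stays pending instead of being delivered. */
    if (sigprocmask(SIG_BLOCK, &set, NULL) != 0)
        return -1;
    if (raise(SIGUSR1) != 0)
        return -1;

    /* sigwait() reports failure through its return value, not errno. */
    error = sigwait(&set, &sig);
    if (error != 0) {
        fprintf(stderr, "sigwait: %s\n", strerror(error));
        return -1;
    }
    return sig;
}
```

A caller would compare the result with SIGUSR1; with sigwaitinfo() the equivalent failure check would be a return of -1 followed by a look at errno.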
# aa0aa7a1	02-Jun-2004	Tim J. Robbins <tjr@FreeBSD.org>	Move TDF_SA from td_flags to td_pflags (and rename it accordingly) so that it is no longer necessary to hold sched_lock while manipulating it. Reviewed by: davidxu
# a4c2da15	21-May-2004	Bruce Evans <bde@FreeBSD.org>	Fixed some style bugs in tdsigwakeup().
# 80c4433c	20-May-2004	John Baldwin <jhb@FreeBSD.org>	In tdsigwakeup(), use TD_ON_SLEEPQ() rather than TD_IS_SLEEPING() to see if a thread is on a sleep queue and should have its sleep aborted. Reported by: Thierry Herbelot thierry at herbelot dot com
# 4a3b3dcb	12-Apr-2004	Colin Percival <cperciva@FreeBSD.org>	stop() no longer needs sched_lock held; in fact, holding sched_lock causes a LOR against sleepq. Fix the comment, and fix ptracestop() to pick up sched_lock after stop() rather than before. Reported by: Scott Sipe <cscotts@mindspring.com> Reviewed by: rwatson, jhb
# 7f8a436f	05-Apr-2004	Warner Losh <imp@FreeBSD.org>	Remove advertising clause from University of California Regent's license, per letter dated July 22, 1999. Approved by: core
# 9a6a4cb5	29-Mar-2004	Peter Wemm <peter@FreeBSD.org>	Shorten some XXXKSE commentary
# 4ae89b95	05-Mar-2004	John Baldwin <jhb@FreeBSD.org>	- Push down Giant in exit() and wait(). - Push Giant down a bit in coredump() and call coredump() with the proc lock already held rather than unlocking it only to turn around and relock it. Requested by: peter
# 86b5e563	03-Mar-2004	Dag-Erling Smørgrav <des@FreeBSD.org>	Use different dummy wait channels to avoid panic in msleep(). Reviewed by: jhb
# 44f3b092	27-Feb-2004	John Baldwin <jhb@FreeBSD.org>	Switch the sleep/wakeup and condition variable implementations to use the sleep queue interface: - Sleep queues attempt to merge some of the benefits of both sleep queues and condition variables. Having sleep queues in a hash table avoids having to allocate a queue head for each wait channel. Thus, struct cv has shrunk down to just a single char * pointer now. However, the hash table does not hold threads directly, but queue heads. This means that once you have located a queue in the hash bucket, you no longer have to walk the rest of the hash chain looking for threads. Instead, you have a list of all the threads sleeping on that wait channel. - Outside of the sleepq code and the sleep/cv code the kernel no longer differentiates between cv's and sleep/wakeup. For example, calls to abortsleep() and cv_abort() are replaced with a call to sleepq_abort(). Thus, the TDF_CVWAITQ flag is removed. Also, calls to unsleep() and cv_waitq_remove() have been replaced with calls to sleepq_remove(). - The sched_sleep() function no longer accepts a priority argument as sleeps no longer inherently bump the priority. Instead, this is solely a property of msleep() which explicitly calls sched_prio() before blocking. - The TDF_ONSLEEPQ flag has been dropped as it was never used. The associated TDF_SET_ONSLEEPQ and TDF_CLR_ON_SLEEPQ macros have also been dropped and replaced with a single explicit clearing of td_wchan. TD_SET_ONSLEEPQ() would really have only made sense if it had taken the wait channel and message as arguments anyway. Now that that only happens in one place, a macro would be overkill.
# 91d5354a	04-Feb-2004	John Baldwin <jhb@FreeBSD.org>	Locking for the per-process resource limits structure. - struct plimit includes a mutex to protect a reference count. The plimit structure is treated similarly to struct ucred in that it is always copy on write, so having a reference to a structure is sufficient to read from it without needing a further lock. - The proc lock protects the p_limit pointer and must be held while reading limits from a process to keep the limit structure from changing out from under you while reading from it. - Various global limits that are ints are not protected by a lock since int writes are atomic on all the archs we support and thus a lock wouldn't buy us anything. - All accesses to individual resource limits from a process are abstracted behind a simple lim_rlimit(), lim_max(), and lim_cur() API that return either an rlimit, or the current or max individual limit of the specified resource from a process. - dosetrlimit() was renamed to kern_setrlimit() to match existing style of other similar syscall helper functions. - The alpha OSF/1 compat layer no longer calls getrlimit() and setrlimit() (it didn't use the stackgap when it should have) but uses lim_rlimit() and kern_setrlimit() instead. - The svr4 compat no longer uses the stackgap for resource limits calls, but uses lim_rlimit() and kern_setrlimit() instead. - The ibcs2 compat no longer uses the stackgap for resource limits. It also no longer uses the stackgap for accessing sysctl's for the ibcs2_sysconf() syscall but uses kernel_sysctl() instead. As a result, ibcs2_sysconf() no longer needs Giant. - The p_rlimit macro no longer exists. Submitted by: mtm (mostly, I only did a few cleanups and catchups) Tested on: i386 Compiled on: alpha, amd64
# 30a9f26d	28-Jan-2004	Robert Watson <rwatson@FreeBSD.org>	Assert process lock in ptracestop(), since we're going to rely on it, and later unlock it.
# 97563428	27-Jan-2004	Alexander Kabaev <kan@FreeBSD.org>	Move the part of the comment which applies to osigsuspend where it belongs. The current sigsuspend syscall does expect a pointer to the mask as argument. Submitted by: Igor Sysoev <is at rambler-co dot ru>
# 29bcc451	24-Jan-2004	Jeff Roberson <jeff@FreeBSD.org>	- Add a flags parameter to mi_switch. The value of flags may be SW_VOL or SW_INVOL. Assert that one of these is set in mi_switch() and properly adjust the rusage statistics. This is to simplify the large number of users of this interface which were previously all required to adjust the proper counter prior to calling mi_switch(). This also facilitates more switch and locking optimizations. - Change all callers of mi_switch() to pass the appropriate parameter and remove direct references to the process statistics.
# def05568	10-Jan-2004	Robert Watson <rwatson@FreeBSD.org>	When not creating a core dump due to resource limits specifying a maximum dump size of 0, return a size-related error, rather than returning success. Otherwise, waitpid() will incorrectly return a status indicating that a core dump was created. Note that the specific error doesn't actually matter, since it's lost. MFC after: 2 weeks PR: 60367 Submitted by: Valentin Nechayev <netch@netch.kiev.ua>
# 047aa39b	08-Jan-2004	Robert Watson <rwatson@FreeBSD.org>	Drop the sigacts mutex around calls to stopevent() to avoid sleeping holding the mutex. Because the sigacts pointer can't change while the process is "live" (proc locking (x)), we know our pointer is still valid. In communication with: truckman Reviewed by: jhb
# a30ec4b9	02-Jan-2004	David Xu <davidxu@FreeBSD.org>	Make sigaltstack per-thread, because per-process sigaltstack state is useless for threaded programs: multiple threads cannot share the same stack. The alternative signal stack is private to the thread, no lock is needed, the original P_ALTSTACK is now moved into td_pflags and renamed to TDP_ALTSTACK. For single-threaded or Linux clone() based threaded programs, there is no semantic change, because those programs only have one kernel thread in every process. Reviewed by: deischen, dfr
# a9a48d68	07-Dec-2003	David Xu <davidxu@FreeBSD.org>	Lock and unlock sched_lock when walking through thread list, currently we insert kse upcall thread into thread list at mi_switch time, process lock is not enough.
# 7eeaaf9b	29-Oct-2003	David Xu <davidxu@FreeBSD.org>	Try to fetch thread mailbox address in page fault trap, so when thread blocks in page fault handler, an upcall thread can be scheduled. It is useful if process is doing lots of mmap based I/O.
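Because the alternate signal stack is per-thread after a30ec4b9, each thread that wants one has to install its own. The following is a minimal userland sketch under that assumption; install_thread_altstack() is a hypothetical helper name, and the buffer is deliberately never freed because the stack must outlive the registration.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <stdlib.h>

/*
 * Hypothetical helper: installs an alternate signal stack for the calling
 * thread only and reads the setting back. Returns 0 on success, -1 on failure.
 */
int install_thread_altstack(void)
{
    stack_t ss, cur;

    ss.ss_sp = malloc(SIGSTKSZ);
    if (ss.ss_sp == NULL)
        return -1;
    ss.ss_size = SIGSTKSZ;
    ss.ss_flags = 0;

    if (sigaltstack(&ss, NULL) != 0) {
        free(ss.ss_sp);
        return -1;
    }

    /* Query the current setting; it describes this thread's stack only. */
    if (sigaltstack(NULL, &cur) != 0)
        return -1;
    return (cur.ss_sp == ss.ss_sp && cur.ss_size == ss.ss_size) ? 0 : -1;
}
```

In a threaded program this helper would be called once from every thread that installs SA_ONSTACK handlers, each with its own buffer.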
# 36bbf86b	25-Oct-2003	Robert Watson <rwatson@FreeBSD.org>	Check (locked) before performing an advisory unlock following a failure of vn_start_write(). Otherwise, we may inconsistently attempt to release the advisory lock. Pointed out by: teggej
# c447f5b2	25-Oct-2003	Robert Watson <rwatson@FreeBSD.org>	When generating a core dump, use advisory locking in an advisory way: if we do acquire an advisory lock, great! We'll release it later. However, if we fail to acquire a lock, we perform the coredump anyway. This problem became particularly visible with NFS after the introduction of rpc.lockd: if the lock manager isn't running, then locking calls will fail, aborting the core dump (resulting in a zero-byte dump file). Reported by: Yogeshwar Shenoy <ynshenoy@alumni.cs.ucsb.edu>
# 3a2e2a0e	13-Oct-2003	David Xu <davidxu@FreeBSD.org>	Don't clear signal mask in execsig(). RELENG_4 does not clear it and POSIX asks to inherit signal mask for execv.
# 4cc9f52f	26-Sep-2003	Robert Drehmel <robert@FreeBSD.org>	Move some tracing related code into its own function as it will be needed for system call related ptrace functionality I plan to commit soon.
# 41b3077a	10-Aug-2003	Jacques Vidrine <nectar@FreeBSD.org>	panic() if we try to handle an out-of-range signal number in psignal()/tdsignal(). The test was historically in psignal(). It was changed into a KASSERT, and then later moved to tdsignal() when the latter was introduced. Reviewed by: iedowse, jhb
# 1fc434dc	30-Jul-2003	David Xu <davidxu@FreeBSD.org>	Use correct signal when calling sigexit.
# 7c89f162	27-Jul-2003	Poul-Henning Kamp <phk@FreeBSD.org>	Add fdidx argument to vn_open() and vn_open_cred() and pass -1 throughout.
# a6ca4808	24-Jul-2003	Mike Makonnen <mtm@FreeBSD.org>	The POSIX spec also requires that kern_sigtimedwait return EINVAL if tv_nsec of the timeout is less than zero.
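The timeout check from a6ca4808 can be observed from userland with a short probe. This is a sketch under stated assumptions: errno_for_negative_nsec() is a hypothetical helper, SIGUSR2 is assumed not to be pending when it runs, and the caller is assumed to be single-threaded.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <time.h>

/*
 * Hypothetical probe: calls sigtimedwait() with a negative tv_nsec while no
 * signal in the set is pending. Returns the resulting errno value, 0 if a
 * signal was unexpectedly returned, or -1 if the mask could not be set.
 */
int errno_for_negative_nsec(void)
{
    sigset_t set;
    struct timespec timeout;

    sigemptyset(&set);
    sigaddset(&set, SIGUSR2);
    if (sigprocmask(SIG_BLOCK, &set, NULL) != 0)
        return -1;

    timeout.tv_sec = 0;
    timeout.tv_nsec = -1;   /* invalid: nanoseconds must not be negative */

    errno = 0;
    if (sigtimedwait(&set, NULL, &timeout) != -1)
        return 0;
    return errno;
}
```

On a system that follows the behaviour described in the commit, the probe is expected to report EINVAL rather than a timeout.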
# 432b45de	20-Jul-2003	David Xu <davidxu@FreeBSD.org>	Always deliver synchronous signal to UTS for SA threads.
# 3074d1b4	17-Jul-2003	David Xu <davidxu@FreeBSD.org>	Fix sigwait to conform to POSIX. When a signal is being delivered to process, first find a sigwait thread to deliver, POSIX's argument is speed of delivering signal to sigwait thread is faster than other ways. A signal in its wait set will cause sigwait to return the signal number, a signal not in its wait set but not blocked by the thread also causes sigwait to return, but sigwait returns EINTR, sigwait is a oneshot operation, only one signal can be delivered to its wait set, when a signal is delivered to the sigwait thread, the thread's sigwait state is canceled.
# 4b7d5d84	14-Jul-2003	David Xu <davidxu@FreeBSD.org>	Rename thread_siginfo to cpu_thread_siginfo
# ffb2e92a	11-Jul-2003	David Xu <davidxu@FreeBSD.org>	If a thread is sending signal to its process, if the thread can handle the signal itself, it should get it without looking for other threads.
# 14b5ae1a	05-Jul-2003	Mike Makonnen <mtm@FreeBSD.org>	Make the conditional, which decides what siglist to put a signal on, more concise and improve the comment. Submitted by: bde
# c197abc4	03-Jul-2003	Mike Makonnen <mtm@FreeBSD.org>	Signals sent specifically to a particular thread must be delivered to that thread, regardless of whether it has it masked or not. Previously, if the targeted thread had the signal masked, it would be put on the processes' siglist. If another thread has the signal umasked or unmasks it before the target, then the thread it was intended for would never receive it. This patch attempts to solve the problem by requiring callers of tdsignal() to say whether the signal is for the thread or for the process. If it is for the process, then normal processing occurs and any thread that has it unmasked can receive it. But if it is destined for a specific thread, it is put on that thread's pending list regardless of whether it is currently masked or not. The new behaviour still needs more work, though. If the signal is reposted for some reason it is always posted back to the thread that handled it because the information regarding the target of the signal has been lost by then. Reviewed by: jdp, jeff, bde (style)
# 9dde3bc9	28-Jun-2003	David Xu <davidxu@FreeBSD.org>	o Change kse_thr_interrupt to allow sending a signal to a specified thread, or unblocking a thread in kernel, and allow UTS to specify whether syscall should be restarted. o Add ability for UTS to monitor signals coming in and being removed from process, the flag PS_SIGEVENT is used to indicate the events. o Add a KMF_WAITSIGEVENT for KSE mailbox flag, UTS call kse_release with this flag set to wait for above signal event. o For SA based thread, kernel masks all signal in its signal mask, let UTS to use kse_thr_interrupt interrupt a thread, and install a signal frame in userland for the thread. o Add a tm_syncsig in thread mailbox, when a hardware trap occurs, it is used to deliver synchronous signal to userland, and upcall is scheduled, so UTS can process the synchronous signal for the thread. Reviewed by: julian (mentor)
# 418228df	28-Jun-2003	David Xu <davidxu@FreeBSD.org>	Fix POSIX compatibility bug for sigwaitinfo and sigtimedwait. POSIX says the siginfo pointer parameter can be NULL and if the function succeeds, it should return the signal number, not zero. The waitset that is passed in should be negated before it can be used as thread signal mask.
# 062cf543	19-Jun-2003	David Xu <davidxu@FreeBSD.org>	When a STOP signal is being sent to a process, it is possible all threads in the process have already masked the signal, so job control is delayed. But later a thread unmasking the STOP signal should enable job control, so in issignal(), scanning all threads in process to see if we can direct suspend some of them, not just suspend current thread.
# 8b56079e	19-Jun-2003	David Xu <davidxu@FreeBSD.org>	Fix typo. td should be td0.
# cd4f6ebb	14-Jun-2003	David Xu <davidxu@FreeBSD.org>	1. Add code to support bound thread. when blocked, a bound thread never schedules an upcall. Signal delivering to a bound thread is same as non-threaded process. This is intended to be used by libpthread to implement PTHREAD_SCOPE_SYSTEM thread. 2. Simplify kse_release() a bit, remove sleep loop.
# 0e2a4d3a	14-Jun-2003	David Xu <davidxu@FreeBSD.org>	Rename P_THREADED to P_SA. P_SA means a process is using scheduler activations.
# 677b542e	10-Jun-2003	David E. O'Brien <obrien@FreeBSD.org>	Use __FBSDID().
# 5e26dcb5	09-Jun-2003	John Baldwin <jhb@FreeBSD.org>	- Add a td_pflags field to struct thread for private flags accessed only by curthread. Unlike td_flags, this field does not need any locking. - Replace the td_inktr and td_inktrace variables with equivalent private thread flags. - Move TDF_OLDMASK over to the private flags field so it no longer requires sched_lock.
# 8d542cb5	15-May-2003	David E. O'Brien <obrien@FreeBSD.org>	Fix long standing bug that prevents the PT_CONTINUE, PT_KILL and PT_DETACH ptrace(2) requests from functioning as advertised in the manual page. As described in kern/35175, the PT_DETACH request will, under certain circumstances, pass an unwanted signal on to the traced process upon detaching from it. The PT_CONTINUE request will sometimes fail if you make it pass a signal that has "properties" that differ from the properties of the signal that originally caused the traced process to be stopped. Since PT_KILL is nothing more than PT_CONTINUE with SIGKILL, it is broken too. In the PT_KILL case, this leads to an unkillable process. PR: 44011 Submitted by: Mark Kettenis <kettenis@chello.nl> Approved by: re(jhb)
# 90af4afa	13-May-2003	John Baldwin <jhb@FreeBSD.org>	- Merge struct procsig with struct sigacts. - Move struct sigacts out of the u-area and malloc() it using the M_SUBPROC malloc bucket. - Add a small sigacts_*() API for managing sigacts structures: sigacts_alloc(), sigacts_free(), sigacts_copy(), sigacts_share(), and sigacts_shared(). - Remove the p_sigignore, p_sigacts, and p_sigcatch macros. - Add a mutex to struct sigacts that protects all the members of the struct. - Add sigacts locking. - Remove Giant from nosys(), kill(), killpg(), and kern_sigaction() now that sigacts is locked. - Several in-kernel functions such as psignal(), tdsignal(), trapsignal(), and thread_stopped() are now MP safe. Reviewed by: arch@ Approved by: re (rwatson)
# b1bf1c3a	09-May-2003	John Baldwin <jhb@FreeBSD.org>	Remove Giant from kern_sigsuspend() and osigsuspend() as these should now be MP safe. Approved by: re (scottl)
# 854dc8c2	05-May-2003	John Baldwin <jhb@FreeBSD.org>	Mostly sort the includes.
# 18440c7f	05-May-2003	John Baldwin <jhb@FreeBSD.org>	Lock the proc lock around calls to tdsignal() in the sigwait() family of syscalls.
# 6711f10f	05-May-2003	John Baldwin <jhb@FreeBSD.org>	Make issignal() private to kern_sig.c since it is only called from cursig() and cursig() is now a function rather than a macro.
# a14e1189	30-Apr-2003	John Baldwin <jhb@FreeBSD.org>	Forgot to remove Giant around call to kern_sigaction() in freebsd4_sigaction() in revision 1.232.
# 25d6dc06	25-Apr-2003	John Baldwin <jhb@FreeBSD.org>	Push Giant down into kern_sigaction() instead of locking it around calls to kern_sigaction() in the various callers of the function.
# cf60731b	23-Apr-2003	John Baldwin <jhb@FreeBSD.org>	Remove Giant from osigblock(), osigsetmask(), and kern_sigaltstack().
# 5afe0c99	23-Apr-2003	John Baldwin <jhb@FreeBSD.org>	- Reorganize osigstack() to do the copyin first, grab the proc lock once, do all the various sigstack dances, unlock the proc lock, and finally do the copyout. This more closely resembles the behavior of kern_sigaltstack() and closes a small race. - Remove Giant from osigstack as it is no longer needed.
# 06ce69a7	18-Apr-2003	David Xu <davidxu@FreeBSD.org>	Unbreak sigaltstack syscall. sigonstack is now a function and wants the proc lock to be held.
# 8b94a061	18-Apr-2003	John Baldwin <jhb@FreeBSD.org>	- Make sigonstack() a regular function instead of an inline and add a proc lock assertion to it. - SIGPENDING() no longer needs sched_lock, so only grab sched_lock to set the TDF_NEEDSIGCHK and TDF_ASTPENDING flags in signotify(). - Add a proc lock assertion to tdsigwakeup(). - Since we always set TDF_OLDMASK while holding the proc lock, the proc lock is sufficient protection to check its state in postsig() and we only need sched_lock when clearing the actual flag.
# e77daab1	18-Apr-2003	John Baldwin <jhb@FreeBSD.org>	Rename do_sigprocmask() to kern_sigprocmask() and make it a global symbol so that it can be used by binary emulators.
# 9d8643ec	17-Apr-2003	John Baldwin <jhb@FreeBSD.org>	Don't hold the proc lock while performing sigset conversions on local variables.
# 5edadff9	17-Apr-2003	John Baldwin <jhb@FreeBSD.org>	- Remove garbage SIGSETOR() that snuck into struct sigpending_args definition. - Use the proper constant for the last arg to kern_sigaction() in osigvec() instead of a magic value.
# f9b89f7e	11-Apr-2003	David Xu <davidxu@FreeBSD.org>	Style fix.
# 5312b1c7	11-Apr-2003	David Xu <davidxu@FreeBSD.org>	Check SIG_HOLD action earlier to avoid missing the test for it in later code.
# c9dfa2e0	01-Apr-2003	Jeff Roberson <jeff@FreeBSD.org>	- p will be unused in cursig() if INVARIANTS is not defined. Access it through td->td_proc to avoid the unused variable. Spotted by: Maxim Konovalov <maxim@macomnet.ru>
# a447cd8b	31-Mar-2003	Jeff Roberson <jeff@FreeBSD.org>	- Define sigwait, sigtimedwait, and sigwaitinfo in terms of kern_sigtimedwait() which is capable of supporting all of their semantics. - These should be POSIX compliant but more careful review is needed before we announce this.
# 4093529d	31-Mar-2003	Jeff Roberson <jeff@FreeBSD.org>	- Move p->p_sigmask to td->td_sigmask. Signal masks will be per thread with a follow on commit to kern_sig.c - signotify() now operates on a thread since unmasked pending signals are stored in the thread. - PS_NEEDSIGCHK moves to TDF_NEEDSIGCHK.
# da33176f	31-Mar-2003	Jeff Roberson <jeff@FreeBSD.org>	- Mark signals which may be delivered to any thread in the process with SA_PROC. Signals without this flag should be directed to a particular thread if this is possible.
# 1bf4700b	31-Mar-2003	Jeff Roberson <jeff@FreeBSD.org>	- Change trapsignal() to accept a thread and not a proc. - Change all consumers to pass in a thread. Right now this does not cause any functional changes but it will be important later when signals can be delivered to specific threads.
# e574e444	10-Mar-2003	David Xu <davidxu@FreeBSD.org>	Fix threaded process job control bug. SMP tested. Reviewed by: julian
# ef3dab76	08-Mar-2003	Tim J. Robbins <tjr@FreeBSD.org>	Hold the proc lock while accessing p_procsig in trapsignal().
# 26306795	04-Mar-2003	John Baldwin <jhb@FreeBSD.org>	Replace calls to WITNESS_SLEEP() and witness_list() with equivalent calls to WITNESS_WARN().
# ac2e4153	26-Feb-2003	Julian Elischer <julian@FreeBSD.org>	Change the process flags P_KSES to be P_THREADED. This is just a cosmetic change but I've been meaning to do it for about a year.
# 426269b2	25-Feb-2003	David Xu <davidxu@FreeBSD.org>	Fix a bug when handling SIGCONT. Reported By: Mike Makonnen <mtm@identd.net>
# 58a3c273	17-Feb-2003	Jeff Roberson <jeff@FreeBSD.org>	- Add a new function, thread_signal_add(), that is called from postsig to add a signal to a mailbox's pending set. - Add a new function, thread_signal_upcall(), this causes the current thread to upcall so that we can deliver pending signals. Reviewed by: mini
# 4a338afd	17-Feb-2003	Julian Elischer <julian@FreeBSD.org>	Move a bunch of flags from the KSE to the thread. I was in two minds as to where to put them in the first case.. I should have listened to the other mind. Submitted by: parts by davidxu@ Reviewed by: jeff@ mini@
# 5215b187	16-Feb-2003	Jeff Roberson <jeff@FreeBSD.org>	- Split the struct kse into struct upcall and struct kse. struct kse will soon be visible only to schedulers. This greatly simplifies much the KSE code. Submitted by: davidxu
# 44443757	15-Feb-2003	Tim J. Robbins <tjr@FreeBSD.org>	Acquire Giant around calls to kern_sigaction() in sigaction(), freebsd4_sigaction() and osigaction() instead of around the whole body of those functions. They now no longer hold Giant around calls to copyin() and copyout(), and it is slightly more obvious what Giant is protecting.
# c41c566c	15-Feb-2003	Tim J. Robbins <tjr@FreeBSD.org>	osigpending() no longer needs Giant, for the same reason sigpending() does not.
# 48e8f774	15-Feb-2003	Tim J. Robbins <tjr@FreeBSD.org>	All uses of p_siglist are protected by the proc lock now, so there's no need to acquire Giant in sigpending() anymore.
# 6f8132a8	31-Jan-2003	Julian Elischer <julian@FreeBSD.org>	Reversion of commit by Davidxu plus fixes since applied. I'm not convinced there is anything major wrong with the patch but them's the rules.. I am using my "David's mentor" hat to revert this as he's offline for a while.
# bf2053ca	27-Jan-2003	Peter Wemm <peter@FreeBSD.org>	No longer force COMPAT_FREEBSD4 to be on.
# 0dbb100b	26-Jan-2003	David Xu <davidxu@FreeBSD.org>	Move UPCALL related data structure out of kse, introduce a new data structure called kse_upcall to manage UPCALL. All KSE binding and loaning code are gone. A thread owns an upcall can collect all completed syscall contexts in its ksegrp, turn itself into UPCALL mode, and takes those contexts back to userland. Any thread without upcall structure has to export their contexts and exit at user boundary. Any thread running in user mode owns an upcall structure, when it enters kernel, if the kse mailbox's current thread pointer is not NULL, then when the thread is blocked in kernel, a new UPCALL thread is created and the upcall structure is transferred to the new UPCALL thread. if the kse mailbox's current thread pointer is NULL, then when a thread is blocked in kernel, no UPCALL thread will be created. Each upcall always has an owner thread. Userland can remove an upcall by calling kse_exit, when all upcalls in ksegrp are removed, the group is automatically shutdown. An upcall owner thread also exits when process is in exiting state. when an owner thread exits, the upcall it owns is also removed. KSE is a pure scheduler entity. it represents a virtual cpu. when a thread is running, it always has a KSE associated with it. scheduler is free to assign a KSE to thread according to thread priority, if thread priority is changed, KSE can be moved from one thread to another. When a ksegrp is created, there is always N KSEs created in the group. the N is the number of physical cpu in the current system. This makes it possible that even if a userland UTS is only single CPU safe, threads in kernel still can execute on different cpu in parallel. Userland calls kse_create to add more upcall structures into ksegrp to increase concurrency in userland itself, kernel is not restricted by number of upcalls userland provides. The code hasn't been tested under SMP by author due to lack of hardware. Reviewed by: julian
# b81c4d1e	06-Jan-2003	David Xu <davidxu@FreeBSD.org>	Forgot to call setrunnable() for un-idled thread.
# ea5ab16e	06-Jan-2003	David Xu <davidxu@FreeBSD.org>	Check signals for idled threads.
# 93a7aa79	27-Dec-2002	Julian Elischer <julian@FreeBSD.org>	Add code to ddb to allow backtracing an arbitrary thread. (show thread {address}) Remove the IDLE kse state and replace it with a change in the way threads share KSEs. Every KSE now has a thread, which is considered its "owner" however a KSE may also be lent to other threads in the same group to allow completion of in-kernel work. In this case the owner remains the same and the KSE will revert to the owner when the other work has been completed. All creations of upcalls etc. is now done from kse_reassign() which in turn is called from mi_switch or thread_exit(). This means that special code can be removed from msleep() and cv_wait(). kse_release() does not leave a KSE with no thread any more but converts the existing thread into the KSE's owner, and sets it up for doing an upcall. It is just inhibited from being scheduled until there is some reason to do an upcall. Remove all trace of the kse_idle queue since it is no longer needed. "Idle" KSEs are now on the loanable queue.
# d321df47	17-Dec-2002	Poul-Henning Kamp <phk@FreeBSD.org>	Don't cast a pointer to (intptr_t) and then on to (int) when we cannot be sure that (int) is large enough. Instead cast only to (intptr_t) and cast the switch/case values to (intptr_t) as well.
# 23eeeff7	25-Oct-2002	Peter Wemm <peter@FreeBSD.org>	Split 4.x and 5.x signal handling so that we can keep 4.x signal handling clean and functional as 5.x evolves. This allows some of the nasty bandaids in the 5.x codepaths to be unwound. Encapsulate 4.x signal handling under COMPAT_FREEBSD4 (there is an anti-foot-shooting measure in place, 5.x folks need this for a while) and finish encapsulating the older stuff under COMPAT_43. Since the ancient stuff is required on alpha (longjmp(3) passes a 'struct osigcontext ' to the current sigreturn(2), instead of the 'ucontext_t ' that sigreturn is supposed to take), add a compile time check to prevent foot shooting there too. Add uniform COMPAT_43 stubs for ia64/sparc64/powerpc. Tested on: i386, alpha, ia64. Compiled on sparc64 (a few days ago). Approved by: re
# 8c5d0137	02-Oct-2002	Poul-Henning Kamp <phk@FreeBSD.org>	Fix mis-indentation. Spotted by: FlexeLint
# 1d9c5696	01-Oct-2002	Juli Mallett <jmallett@FreeBSD.org>	Back out kernel support for reliable signal queues. Requested by: rwatson, phk, and many others
# a88b260a	30-Sep-2002	Juli Mallett <jmallett@FreeBSD.org>	Back out code changes that snuck into the previous forced commit.
# 226e1171	30-Sep-2002	Juli Mallett <jmallett@FreeBSD.org>	(Forced commit, to clarify previous commit of ksiginfo/signal queue code.) I've added a structure, kernel-private, to represent a pending or in-delivery signal, called `ksiginfo'. It is roughly analogous to the basic information that is exported by the POSIX interface 'siginfo_t', but more basic. I've added functions to allocate these structures, and further to wrap all signal operations using them. Once the operations are wrapped, I've added a TailQ (see queue(3)) of these structures to 'struct proc', and all pending signals are in that TailQ. When a signal is being delivered, it is dequeued from the list. Once I finish the spreading of ksiginfo throughout the tree, the dequeued structure will be delivered to the process in question, whereas currently and normally, the signal number is what is used.
# 1226f694	30-Sep-2002	Juli Mallett <jmallett@FreeBSD.org>	First half of implementation of ksiginfo, signal queues, and such. This gets signals operating based on a TailQ, and is good enough to run X11, GNOME, and do job control. There are some intricate parts which could be more refined to match the sigset_t versions, but those require further evaluation of directions in which our signal system can expand and contract to fit our needs. After this has been in the tree for a while, I will make in kernel API changes, most notably to trapsignal(9) and sendsig(9), to use ksiginfo more robustly, such that we can actually pass information with our (queued) signals to the userland. That will also result in using a struct ksiginfo pointer, rather than a signal number, in a lot of kern_sig.c, to refer to an individual pending signal queue member, but right now there is no defined behaviour for such. CODAFS is unfinished in this regard because the logic is unclear in some places. Sponsored by: New Gold Technology Reviewed by: bde, tjr, jake [an older version, logic similar]
# 21b68415	28-Sep-2002	David E. O'Brien <obrien@FreeBSD.org>	Fix style nit where conditionally compiled code was unconditionalized, but style(9) was consulted. Submitted by: bde
# 37c84183	28-Sep-2002	Poul-Henning Kamp <phk@FreeBSD.org>	Be consistent about "static" functions: if the function is marked static in its prototype, mark it static at the definition too. Inspired by: FlexeLint warning #512
# c76e33b6	16-Sep-2002	Jonathan Mini <mini@FreeBSD.org>	Add kernel support needed for the KSE-aware libpthread: - Use ucontext_t's to store KSE thread state. - Synthesize state for the UTS upon each upcall, rather than saving and copying a trapframe. - Deliver signals to KSE-aware processes via upcall. - Rename kse mailbox structure fields to be more BSD-like. - Store the UTS's stack in struct proc in a stack_t. Reviewed by: bde, deischen, julian Approved by: -arch
# 4f0db5e0	15-Sep-2002	Julian Elischer <julian@FreeBSD.org>	Allocate KSEs and KSEGRPs separately and remove them from the proc structure. next step is to allow > 1 to be allocated per process. This would give multi-processor threads. (when the rest of the infrastructure is in place) While doing this I noticed libkvm and sys/kern/kern_proc.c:fill_kinfo_proc are diverging more than they should.. corrective action needed soon.
# 71fad9fd	11-Sep-2002	Julian Elischer <julian@FreeBSD.org>	Completely redo thread states. Reviewed by: davidxu@freebsd.org
# 1279572a	05-Sep-2002	David Xu <davidxu@FreeBSD.org>	s/SGNL/SIG/ s/SNGL/SINGLE/ s/SNGLE/SINGLE/ Fix abbreviation for P_STOPPED_* etc flags, in original code they were inconsistent and difficult to distinguish between them. Approved by: julian (mentor)
# 35c32a76	02-Sep-2002	David Xu <davidxu@FreeBSD.org>	In the kernel code, we have the tsleep() call with the PCATCH argument. PCATCH means "if we get a signal, interrupt me!" and tsleep returns either EINTR or ERESTART depending on the circumstances. ERESTART is "special" because it causes the system call to fail, but right as it returns back to userland it tells the trap handler to move %eip back a bit so that userland will immediately re-run the syscall. This is a syscall restart. It only works for things like read() etc where nothing has changed yet. Note that userland is tricked into restarting the syscall by the kernel. The kernel doesn't actually do the restart. It is deadly for things like select, poll, nanosleep etc where it might cause the elapsed time to be reset and start again from scratch. So those syscalls do this to prevent userland rerunning the syscall: if (error == ERESTART) error = EINTR; Fake "signals" like SIGTSTP from ^Z etc do not normally invoke userland signal handlers. But, in -current, the PCATCH is being triggered and tsleep is returning ERESTART, and the syscall is aborted even though no userland signal handler was run. That is the fault here. We're triggering the PCATCH in cases that we shouldn't. ie: it is being triggered on any signal processing, rather than the case where the signal is posted to userland. --- Peter The work of psignal() is a patchwork of special cases required by the process debugging and job-control facilities... --- Kirk McKusick "The Design and Implementation of the 4.4BSD Operating System" Page 105 In STABLE source, when psignal is posting a STOP signal to a sleeping process and the signal action of the process is SIG_DFL, the system will directly change the process state from SSLEEP to SSTOP, and when SIGCONT is posted to the stopped process, if it finds that the process is still on the sleep queue, the process state will be restored to SSLEEP, and it won't wake up the process.
This commit mimics the behaviour in the STABLE source tree. Reviewed by: Jon Mini, Tim Robbins, Peter Wemm Approved by: julian@freebsd.org (mentor)
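The restart-suppression idiom quoted in the entry above can be sketched in isolation. This is a minimal illustration only: `sketch_no_restart()` and `SKETCH_ERESTART` are hypothetical names, and the value -1 is an assumed stand-in, since ERESTART is a kernel-internal code that userland headers do not export.

```c
#include <errno.h>

/*
 * Assumed stand-in for the kernel-internal ERESTART code; the real
 * constant is not visible to userland programs.
 */
#define SKETCH_ERESTART	(-1)

/*
 * Hypothetical helper, not a FreeBSD function. Calls such as select,
 * poll and nanosleep must not be re-run from scratch after a signal,
 * so a restart request is converted into a plain EINTR failure.
 * Every other value passes through unchanged.
 */
int
sketch_no_restart(int error)
{
	if (error == SKETCH_ERESTART)
		error = EINTR;
	return (error);
}
```

A timed syscall would apply this to the value returned by its sleep before returning to its caller, so the trap handler never sees a restart request.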
# 8f19eb88	01-Sep-2002	Ian Dowse <iedowse@FreeBSD.org>	Split out a number of mostly VFS and signal related syscalls into a kernel-internal kern_*() version and a wrapper that is called via the syscall vector table. For paths and structure pointers, the internal version either takes a uio_seg parameter or requires the caller to copyin() the data to kernel memory as appropriate. This will permit emulation layers to use these syscalls without having to copy out translated arguments to the stack gap. Discussed on: -arch Review/suggestions: bde, jhb, peter, marcel
# b39f3284	25-Aug-2002	Julian Elischer <julian@FreeBSD.org>	move the assert to cover more cases
# d9d6e34f	23-Aug-2002	Julian Elischer <julian@FreeBSD.org>	Don't re-lock the sched lock if we didn't unlock it. Original error by: David Xu <bsddiy@yahoo.com> Fix by: David Xu <bsddiy@yahoo.com> Completely failed to spot it: Julian Elischer <julian@freebsd.org>
# 721e5910	21-Aug-2002	Julian Elischer <julian@FreeBSD.org>	Revert some suspension/sleep/signal code from KSE-III We need to rethink a bit of this and it doesn't matter if we break the KSE test program for now as long as non-KSE programs act as expected. Submitted by: David Xu <bsddiy@yahoo.com> (this guy's just asking to get hit with a commit bit..)
# 6933e3c1	08-Aug-2002	Julian Elischer <julian@FreeBSD.org>	Do some work on keeping better track of stopped/continued state. I'm not sure what happened to the original setting of the P_CONTINUED flag. It appears to have been lost in the paper shuffling... Submitted by: David Xu <bsddiy@yahoo.com>
# 1c530be4	06-Aug-2002	Bruce Evans <bde@FreeBSD.org>	Try harder to "set signal flags proprly [sic] for ast()". See rev.1.154.
# 04774f23	01-Aug-2002	Julian Elischer <julian@FreeBSD.org>	Slight cleanup of some comments/whitespace. Make idle process state more consistent. Add an assert on thread state. Clean up idleproc/mi_switch() interaction. Use a local instead of referencing curthread 7 times in a row (I've been told curthread can be expensive on some architectures). Remove some commented out code. Add a little commented out code (completion coming soon). Reviewed by: jhb@freebsd.org
# 4d492b43	30-Jul-2002	Julian Elischer <julian@FreeBSD.org>	Don't need to hold schedlock specifically for stop() as it calls wakeup() that locks it anyhow. Reviewed by: jhb@freebsd.org
# 38038891	24-Jul-2002	Julian Elischer <julian@FreeBSD.org>	revert some of the handling of STOP signals in issignal(). Let thread_suspend_check() actually do the suspension at the user boundary. Submitted by: David Xu <bsddiy@yahoo.com>
# 832dafad	10-Jul-2002	Don Lewis <truckman@FreeBSD.org>	Rearrange the code so that it checks whether the file is something valid to write a core dump to before doing the preparations to actually write to the file. Call VOP_GETATTR() before dropping the initial vnode lock.
# aa0fa334	03-Jul-2002	Julian Elischer <julian@FreeBSD.org>	Try to clean up some of the mess that resulted from layers and layers of p4 merges from -current as things started getting different. Corroborated by: Similar patches just mailed by BDE.
# ee9919b0	03-Jul-2002	Julian Elischer <julian@FreeBSD.org>	White space commit. I'm working on this file but I wanted to make the whitespace commit separately.
# 0ac3b636	02-Jul-2002	Andrew Gallatin <gallatin@FreeBSD.org>	Hold the sched lock across the call to forward_signal() in tdsignal() to keep SMP systems from panic'ing when ^C'ing an app. Suggested by: julian
# e602ba25	29-Jun-2002	Julian Elischer <julian@FreeBSD.org>	Part 1 of KSE-III The ability to schedule multiple threads per process (on one cpu) by making ALL system calls optionally asynchronous. To come: ia64 and power-pc patches, patches for gdb, test program (in tools) Reviewed by: Almost everyone who counts (at various times, peter, jhb, matt, alfred, mini, bernd, and a cast of thousands) NOTE: this is still Beta code, and contains lots of debugging stuff. Expect slight instability in signals.
# 01609114	28-Jun-2002	Alfred Perlstein <alfred@FreeBSD.org>	more caddr_t removal.
# 374a15aa	06-Jun-2002	John Baldwin <jhb@FreeBSD.org>	- trapsignal() no longer needs to acquire Giant for ktrpsig(). - Catch up to new ktrace API.
# ca18d53e	06-Jun-2002	Chad David <davidc@FreeBSD.org>	s/!SIGNOTEMPTY/SIGISEMPTY/ Reviewed by: marcel, jhb, alfred
# 6ee093fb	01-Jun-2002	Mike Barcroft <mike@FreeBSD.org>	Add POSIX.1-2001 WCONTINUED option for waitpid(2). A proc flag (P_CONTINUED) is set when a stopped process receives a SIGCONT and cleared after it has notified a parent process that has requested notification via waitpid(2) with WCONTINUED specified in its options operand. The status value can be checked with the new WIFCONTINUED() macro. Reviewed by: jake
# 628855e7	29-May-2002	Julian Elischer <julian@FreeBSD.org>	CURSIG() is not a macro so rename it cursig(). Obtained from: KSE tree
# f44d9e24	18-May-2002	John Baldwin <jhb@FreeBSD.org>	Change p_can{debug,see,sched,signal}()'s first argument to be a thread pointer instead of a proc pointer and require the process pointed to by the second argument to be locked. We now use the thread ucred reference for the credential checks in p_can*() as a result. p_canfoo() should now no longer need Giant.
# 66101641	14-May-2002	Robert Watson <rwatson@FreeBSD.org>	p_cansignal() returns an errno value; at some point, the check for inter-process signalling ceased to preserve and return that value, instead always returning EPERM. This meant that it was possible to "probe" the pid space for processes that were not otherwise visible. This change reverts that reversion. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
# d8f4f6a4	08-May-2002	Jonathan Mini <mini@FreeBSD.org>	Remove trace_req(). Reviewed by: alfred, jhb, peter
# 8b43b535	08-May-2002	Alfred Perlstein <alfred@FreeBSD.org>	expand_name fixes: .) don't use MAXPATHLEN + 1, fix logic to compensate. .) style(9) function parameters. .) fix line wrapping. .) remove duplicated error and string handling code. .) don't NUL terminate already NUL terminated string. .) all string length variables changed from int to size_t. .) constify variables. .) catch when corename would be truncated. .) cast pid_t and uid_t args for format string. .) add parens around return arguments. Help and suggestions from: bde
# b2bc3101	07-May-2002	Alfred Perlstein <alfred@FreeBSD.org>	M_ZERO the temp buffer in expand_name() otherwise if an error occurs while logging we may pass a non NUL terminated string to log(9) for a %s format arg.
# f5216b9a	04-May-2002	Bruce Evans <bde@FreeBSD.org>	Return the correct error code (ENOSYS, not EINVAL) from nosys(). Getting killed by SIGSYS for unimplemented syscalls is bad enough. Obtained from: Lite2 branch The Lite2 branch has some other interesting unmerged (?) bits in this file. They are well hidden among cosmetic regressions.
# 9b3b1c5f	02-May-2002	John Baldwin <jhb@FreeBSD.org>	- Reorder execve() so that it performs blocking operations before it locks the process. - Defer other blocking operations such as vrele()'s until after we release locks. - execsigs() now requires the proc lock to be held when it is called rather than locking the process internally.
# f1320723	01-May-2002	Alfred Perlstein <alfred@FreeBSD.org>	Redo the sigio locking. Turn the sigio sx into a mutex. Sigio lock is really only needed to protect interrupts from dereferencing the sigio pointer in an object when the sigio itself is being destroyed. In order to do this in the most unintrusive manner change pgsigio's sigio * argument into a **, that way we can lock internally to the function.
# ba1551ca	27-Apr-2002	Ian Dowse <iedowse@FreeBSD.org>	Avoid the user-visible effect of setting SA_NOCLDWAIT when the SIGCHLD handler is SIG_IGN. This is a reimplementation of the problematic revision 1.131 of kern_exit.c. To avoid accessing process UPAGES, we set a new procsig flag when the SIGCHLD handler is SIG_IGN and use that instead.
# ba626c1d	16-Apr-2002	John Baldwin <jhb@FreeBSD.org>	Lock proctree_lock instead of pgrpsess_lock.
# 9c1ab3e0	13-Apr-2002	John Baldwin <jhb@FreeBSD.org>	- Change killpg1()'s first argument to be a thread instead of a process so we can use td_ucred. - In killpg1(), the proc lock is sufficient to check if p_stat is SZOMB or not. We don't need sched_lock. - Close some races in psignal(). In psignal() there is a big switch statement based on p_stat. All the different cases are assuming that the process (or thread) isn't going to change state out from under it. To ensure this is true, just lock sched_lock for the entire switch. We practically held it the entire time already anyways. This also simplifies the locking somewhat and actually results in fewer lock operations. - Allow signotify() to be called with the sched_lock held since psignal() now does that. - Use td_ucred in a couple of places.
# 79065dba	04-Apr-2002	Bruce Evans <bde@FreeBSD.org>	Moved signal handling and rescheduling from userret() to ast() so that they aren't in the usual path of execution for syscalls and traps. The main complication for this is that we have to set flags to control ast() everywhere that changes the signal mask. Avoid locking in userret() in most of the remaining cases. Submitted by: luoqi (first part only, long ago, reorganized by me) Reminded by: dillon
# 179235b3	04-Apr-2002	Bruce Evans <bde@FreeBSD.org>	Optimized the check for unmasked pending signals in CURSIG() using a new inline function sigsetmasked() and a new macro SIGPENDING(). CURSIG() will soon be moved out of the normal path of execution for syscalls and traps. Then its efficiency will be less important but the new interfaces will be useful for checking for unmasked pending signals in more places. Submitted by: luoqi (long ago, in a slightly different form) Assert that sched_lock is not held in CURSIG().
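The check for unmasked pending signals mentioned in the entry above comes down to a word-wise "pending and not masked" test. The sketch below illustrates that idea only: the `sketch_` names, the four-word layout, and the exact return conventions are assumptions, not the kernel's definitions of sigsetmasked() or SIGPENDING().

```c
#include <stdint.h>

#define SKETCH_SIG_WORDS	4	/* 4 x 32 bits = 128 signals (assumed) */

struct sketch_sigset {
	uint32_t bits[SKETCH_SIG_WORDS];
};

/* Nonzero if every signal in 'set' is also blocked by 'mask'. */
int
sketch_sigsetmasked(const struct sketch_sigset *set,
    const struct sketch_sigset *mask)
{
	int i;

	for (i = 0; i < SKETCH_SIG_WORDS; i++) {
		/* Any bit set here is pending but not masked. */
		if ((set->bits[i] & ~mask->bits[i]) != 0)
			return (0);
	}
	return (1);
}

/* Nonzero if at least one pending signal is deliverable right now. */
int
sketch_sigpending(const struct sketch_sigset *pending,
    const struct sketch_sigset *mask)
{
	return (!sketch_sigsetmasked(pending, mask));
}
```

Because the test touches only a few words and needs no per-signal loop, it is cheap enough to run wherever the signal mask or pending set changes.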
# 70f52b48	23-Mar-2002	Bruce Evans <bde@FreeBSD.org>	Fixed some style bugs in the removal of __P(()). The main ones were not removing tabs before "__P((", and not outdenting continuation lines to preserve non-KNF lining up of code with parentheses. Switch to KNF formatting and/or rewrap the whole prototype in some cases.
# 4d77a549	19-Mar-2002	Alfred Perlstein <alfred@FreeBSD.org>	Remove __P.
# 0f5c7c4b	26-Feb-2002	Poul-Henning Kamp <phk@FreeBSD.org>	Fix warning in !SMP case. Submitted by: Maxime Henrion <mux@mu.org>
# f591779b	23-Feb-2002	Seigo Tanimura <tanimura@FreeBSD.org>	Lock struct pgrp, session and sigio. New locks are: - pgrpsess_lock which locks the whole pgrps and sessions, - pg_mtx which protects the pgrp members, and - s_mtx which protects the session members. Please refer to sys/proc.h for the coverage of these locks. Changes on the pgrp/session interface: - pgfind() needs the pgrpsess_lock held. - The caller of enterpgrp() is responsible to allocate a new pgrp and session. - Call enterthispgrp() in order to enter an existing pgrp. - pgsignal() requires a pgrp lock held. Reviewed by: jhb, alfred Tested on: cvsup.jp.FreeBSD.org (which is a quad-CPU machine running -current)
# 8c3d74f4	14-Feb-2002	Bruce Evans <bde@FreeBSD.org>	Fixed a typo in rev.1.65 that gave a reference to a nonexistent variable. This was not detected by LINT because LINT is missing COMPAT_SUNOS.
# 2c100766	11-Feb-2002	Julian Elischer <julian@FreeBSD.org>	In a threaded world, different priorities become properties of different entities. Make it so. Reviewed by: jhb@freebsd.org (john baldwin)
# 5da271f5	10-Feb-2002	Robert Watson <rwatson@FreeBSD.org>	Add a comment indicating that VOP_GETATTR() is called without appropriate locking in the core dump code. This should be fixed.
# 079b7bad	07-Feb-2002	Julian Elischer <julian@FreeBSD.org>	Pre-KSE/M3 commit. This is a low-functionality change that changes the kernel to access the main thread of a process via the linked list of threads rather than assuming that it is embedded in the process. It IS still embedded there, but remove all the code that assumes that in preparation for the next commit, which will actually move it out. Reviewed by: peter@freebsd.org, gallatin@cs.duke.edu, benno rice
# 2b87b6d4	09-Jan-2002	Robert Watson <rwatson@FreeBSD.org>	o Revert kern_sig.c#1.143, as cr_cansignal() doesn't currently permit a number of desirable cases in which SIGIO/SIGURG are delivered. We'll keep tweaking. Reported by: Alexander Kabaev <ak03@gte.com>
# f8efde89	05-Jan-2002	Robert Watson <rwatson@FreeBSD.org>	- Teach SIGIO code to use cr_cansignal() instead of a custom CANSIGIO() macro. As a result, mandatory signal delivery policies will be applied consistently across the kernel. - Note that this subtly changes the protection semantics, and we should watch out for any resulting breakage. Previously, delivery of SIGIO in this circumstance was limited to situations where the subject was privileged, or where one of the subject's (ruid, euid) matched one of the object's (ruid, euid). In the new scenario, subject (ruid, euid) are matched against the object's (ruid, svuid), and the object uid's must be a subset of the subject uid's. Likewise, jail now affects delivery, and special handling for P_SUGID of the object is present. This change can always be reversed or tweaked if it proves to disrupt application behavior substantially. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
# c86b6ff5	05-Jan-2002	John Baldwin <jhb@FreeBSD.org>	Change the preemption code for software interrupt thread schedules and mutex releases to not require flags for the cases when preemption is not allowed: The purpose of the MTX_NOSWITCH and SWI_NOSWITCH flags is to prevent switching to a higher priority thread on mutex release and swi schedule, respectively when that switch is not safe. Now that the critical section API maintains a per-thread nesting count, the kernel can easily check whether or not it should switch without relying on flags from the programmer. This fixes a few bugs in that all current callers of swi_sched() used SWI_NOSWITCH, when in fact, only the ones called from fast interrupt handlers and the swi_sched of softclock needed this flag. Note that to ensure that swi_sched()'s in clock and fast interrupt handlers do not switch, these handlers have to be explicitly wrapped in critical_enter/exit pairs. Presently, just wrapping the handlers is sufficient, but in the future with the fully preemptive kernel, the interrupt must be EOI'd before critical_exit() is called. (critical_exit() can switch due to a deferred preemption in a fully preemptive kernel.) I've tested the changes to the interrupt code on i386 and alpha. I have not tested ia64, but the interrupt code is almost identical to the alpha code, so I expect it will work fine. PowerPC and ARM do not yet have interrupt code in the tree so they shouldn't be broken. Sparc64 is broken, but that's been ok'd by jake and tmm who will be fixing the interrupt code for sparc64 shortly. Reviewed by: peter Tested on: i386, alpha
# 48f1ba5b	13-Dec-2001	Robert Watson <rwatson@FreeBSD.org>	o Wording fix in comment. Submitted by: tanimura via p4
# 6c1534a7	03-Nov-2001	Peter Wemm <peter@FreeBSD.org>	_SIG_MAXSIG (128) is the highest legal signal. The arrays are offset by one - see _SIG_IDX(). Revert part of my mis-correction in kern_sig.c (but signal 0 still has to be allowed) and fix _SIG_VALID() (it was rejecting signal 128).
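The range rules stated in the entry above (signals 1 through 128 are valid, arrays are indexed from zero, and kill accepts 0 as a probe) can be written out as a small sketch. The `sketch_` helpers below are hypothetical illustrations of those rules and are not the kernel's _SIG_VALID() or _SIG_IDX() macros.

```c
#define SKETCH_SIG_MAXSIG	128	/* highest legal signal per the log */

/* A real signal number lies in 1..128 inclusive. */
int
sketch_sig_valid(int sig)
{
	return (sig > 0 && sig <= SKETCH_SIG_MAXSIG);
}

/* Arrays are offset by one: signal 1 lives in slot 0. */
int
sketch_sig_idx(int sig)
{
	return (sig - 1);
}

/*
 * kill-style entry points additionally accept 0, which requests the
 * permission and existence checks without delivering anything.
 */
int
sketch_kill_signum_ok(int sig)
{
	return (sig == 0 || sketch_sig_valid(sig));
}
```

Keeping the "0 is allowed" rule in a separate helper mirrors the fix described above: the validity check itself stays strict, and only the kill path widens it.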
# 049954de	02-Nov-2001	Peter Wemm <peter@FreeBSD.org>	Partial reversion of rev 1.138. kill and killpg allow a signal argument of 0. You cannot return EINVAL for signal 0. This broke (in 5 minutes of testing) at least ssh-agent and screen. However, there was a bug in the original code. Signal 128 is not valid. Pointy-hat to: des, jhb
# 2899d606	02-Nov-2001	Dag-Erling Smørgrav <des@FreeBSD.org>	We have a _SIG_VALID() macro, so use it instead of duplicating the test all over the place. Also replace a printf() + panic() with a KASSERT(). Reviewed by: jhb
# 80f42b55	07-Oct-2001	Ian Dowse <iedowse@FreeBSD.org>	Fix a typo in do_sigaction() where sa_sigaction and sa_handler were confused. Since sa_sigaction and sa_handler alias each other in a union, the bug was completely harmless. This had been fixed as part of the SIGCHLD changes in revision 1.125, but it was reverted when they were backed out in revision 1.126.
# 88b1d98f	25-Sep-2001	Paul Saab <ps@FreeBSD.org>	Lock the vnode while truncating the corefile. This fixes a panic with softupdates dangling deps. Submitted by: peter MFC: ASAP :)
# fdd4e5c6	17-Sep-2001	Julian Elischer <julian@FreeBSD.org>	Replace line accidentally deleted during KSE additions. Symptom.. Stopped program unable to be restarted if it was stopped while already sleeping.
# 9844fbc3	15-Sep-2001	Robert Watson <rwatson@FreeBSD.org>	o Correct authorization check in CANSIGIO(), which suffered from incorrect transcription during the (pcred,ucred) merge; this was not used for the kill() system call, so does not affect direct explicit process signalling. Pointed out by: fenner
# b40ce416	12-Sep-2001	Julian Elischer <julian@FreeBSD.org>	KSE Milestone 2 Note ALL MODULES MUST BE RECOMPILED. Make the kernel aware that there are smaller units of scheduling than the process (but only allow one thread per process at this time). This is functionally equivalent to the previous -current except that there is a thread associated with each process. Sorry john! (your next MFC will be a doosie!) Reviewed by: peter@freebsd.org, dillon@freebsd.org X-MFC after: ha ha ha ha
# 06ae1e91	08-Sep-2001	Matthew Dillon <dillon@FreeBSD.org>	This brings in a Yahoo coredump patch from Paul, with additional mods by me (addition of vn_rdwr_inchunks). The problem Yahoo is solving is that if you have large process images core dumping, or you have a large number of forked processes all core dumping at the same time, the original coredump code would leave the vnode locked throughout. This can cause the directory vnode to get locked up, which can cause the parent directory vnode to get locked up, and so on all the way to the root node, locking the entire machine up for extremely long periods of time. This patch solves the problem in two ways. First it uses an advisory non-blocking lock to abort multiple processes trying to core to the same file. Second (my contribution) it chunks up the writes and uses bwillwrite() to avoid holding the vnode locked while blocking in the buffer cache. Submitted by: ps Reviewed by: dillon MFC after: 2 weeks
# df53e91c	06-Sep-2001	John Baldwin <jhb@FreeBSD.org>	Call sendsig() with the proc lock held and return with it held.
# fb99ab88	01-Sep-2001	Matthew Dillon <dillon@FreeBSD.org>	Giant Pushdown clock_gettime() clock_settime() nanosleep() settimeofday() adjtime() getitimer() setitimer() __sysctl() ogetkerninfo() sigaction() osigaction() sigpending() osigpending() osigvec() osigblock() osigsetmask() sigsuspend() osigsuspend() osigstack() sigaltstack() kill() okillpg() trapsignal() nosys()
# 356861db	30-Aug-2001	Matthew Dillon <dillon@FreeBSD.org>	Remove the MPSAFE keyword from the parser for syscalls.master. Instead introduce the [M] prefix to existing keywords. e.g. MSTD is the MP SAFE version of STD. This is preparatory for a massive Giant lock pushdown. The old MPSAFE keyword made syscalls.master too messy. Begin commenting MP-Safe procedures with the comment: /* * MPSAFE */ This comment means that the procedure may be called without Giant held (The procedure itself may still need to obtain Giant temporarily to do its thing). sv_prepsyscall() is now MP SAFE and assumed to be MP SAFE sv_transtrap() is now MP SAFE and assumed to be MP SAFE ktrsyscall() and ktrsysret() are now MP SAFE (Giant Pushdown) trapsignal() is now MP SAFE (Giant Pushdown) Places which used to do the if (mtx_owned(&Giant)) mtx_unlock(&Giant) test in syscall[2]() in /*/trap.c now do not. Instead they explicitly unlock Giant if they previously obtained it, and then assert that it is no longer held to catch broken system calls. Rebuild syscall tables.
# ccdbd10c	24-Aug-2001	Peter Pentchev <roam@FreeBSD.org>	Prevent passing a null pointer as a filename to vn_open(), if for some reason expand_name() failed to build a core file name. PR: 29931 Submitted by: Foldi Tamas <crow@kapu.hu> Reviewed by: dd, -arch MFC after: 1 month
# e8ebc08f	20-Aug-2001	Peter Wemm <peter@FreeBSD.org>	Make COMPAT_43 optional again. XXX we need COMPAT_FBSD3 etc for this stuff.
# aa7a4dae	01-Aug-2001	Peter Wemm <peter@FreeBSD.org>	Temporarily back out kern_sig.c rev 1.125 and kern_exit.c rev 1.131. This panicked one of my machines one time too many :-( and there is no sign of a solution in the pipeline. The deltas are still easily available in cvs. The problem is that if the parent has been swapped out, the child process cannot grope around in the parent's UPAGES to see the sigact[] array or it will fault. This probably is a showstopper for this implementation anyway.
# 4fec48c6	22-Jul-2001	Matthew Dillon <dillon@FreeBSD.org>	As per further discussions on hackers redo the SIGCHLD patch to not generate an unexpected user-visible side effect with the sigaction flags. Also cleanup a minor union issue. Submitted by: Rudolf Cejka <cejkar@dcse.fee.vutbr.cz> MFC addendum: MFC will be combined w/ original commit MFC after: 3 days
# 64acb05b	02-Jul-2001	John Baldwin <jhb@FreeBSD.org>	Grab Giant around postsig() since sendsig() can call into the vm to grow the stack and we already needed Giant for KTRACE.
# 2ad7d304	22-Jun-2001	John Baldwin <jhb@FreeBSD.org>	- Change CURSIG() and postsig() to require that the proc lock is held rather than grabbing it and releasing it themselves. This allows callers of these functions to get the lock to close race conditions. - Grab Giant around ktrace in postsig. - Count the switches performed on SIGSTOP's as involuntary context switches in the resource usage stats. Reported by: tegge (signal race), bde (missing csw stats)
# 6fad32af	18-Jun-2001	John Baldwin <jhb@FreeBSD.org>	Lock Giant in postsig() for the KTRACE case as ktrpsig() needs Giant when it writes out to the trace file. Reported by: peter, gallatin, and others
# c7fd62da	11-Jun-2001	David Malone <dwmalone@FreeBSD.org>	Try to make the setting of the SIGCHLD handler the same as setting of the NOCLDWAIT flag. SUSv2 seems to require this. Submitted by: Cejka Rudolf <cejkar@dcse.fee.vutbr.cz> Reviewed by: dillon
# b1fc0ec1	25-May-2001	Robert Watson <rwatson@FreeBSD.org>	o Merge contents of struct pcred into struct ucred. Specifically, add the real uid, saved uid, real gid, and saved gid to ucred, as well as the pcred->pc_uidinfo, which was associated with the real uid, only rename it to cr_ruidinfo so as not to conflict with cr_uidinfo, which corresponds to the effective uid. o Remove p_cred from struct proc; add p_ucred to struct proc, replacing the original macro that pointed p->p_ucred to p->p_cred->pc_ucred. o Universally update code so that it makes use of ucred instead of pcred, p->p_ucred instead of p->p_pcred, cr_ruidinfo instead of p_uidinfo, cr_{r,sv}{u,g}id instead of p_*, etc. o Remove pcred0 and its initialization from init_main.c; initialize cr_ruidinfo there. o Restructure many credential modification chunks to always crdup while we figure out locking and optimizations; generally speaking, this means moving to a structure like this: newcred = crdup(oldcred); ... p->p_ucred = newcred; crfree(oldcred); It's not race-free, but better than nothing. There are also races in sys_process.c, all inter-process authorization, fork, exec, and exit. o Remove sigio->sio_ruid since sigio->sio_ucred now contains the ruid; remove comments indicating that the old arrangement was a problem. o Restructure exec1() a little to use newcred/oldcred arrangement, and use improved uid management primitives. o Clean up exit1() so as to do less work in credential cleanup due to pcred removal. o Clean up fork1() so as to do less work in credential cleanup and allocation. o Clean up ktrcanset() to take into account changes, and move to using suser_xxx() instead of performing a direct uid==0 comparison. o Improve commenting in various kern_prot.c credential modification calls to better document current behavior. In a couple of places, current behavior is a little questionable and we need to check POSIX.1 to make sure it's "right". More commenting work still remains to be done.
o Update credential management calls, such as crfree(), to take into account new ruidinfo reference. o Modify or add the following uid and gid helper routines: change_euid() change_egid() change_ruid() change_rgid() change_svuid() change_svgid() In each case, the call now acts on a credential not a process, and as such no longer requires more complicated process locking/etc. They now assume the caller will do any necessary allocation of an exclusive credential reference. Each is commented to document its reference requirements. o CANSIGIO() is simplified to require only credentials, not processes and pcreds. o Remove lots of (p_pcred==NULL) checks. o Add an XXX to authorization code in nfs_lock.c, since it's questionable, and needs to be considered carefully. o Simplify posix4 authorization code to require only credentials, not processes and pcreds. Note that this authorization, as well as CANSIGIO(), needs to be updated to use the p_cansignal() and p_cansched() centralized authorization routines, as they currently do not take into account some desirable restrictions that are handled by the centralized routines, as well as being inconsistent with other similar authorization instances. o Update libkvm to take these changes into account. Obtained from: TrustedBSD Project Reviewed by: green, bde, jhb, freebsd-arch, freebsd-audit
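The newcred/oldcred arrangement quoted in the entry above (duplicate, modify the copy, publish it, release the old reference) can be modelled in userland. This is a sketch under assumptions: the `sketch_` types and functions are hypothetical stand-ins for struct ucred, crdup() and crfree(), and like the original it makes no attempt at locking.

```c
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for struct ucred and struct proc. */
struct sketch_cred {
	int		refs;	/* reference count */
	unsigned	euid;	/* effective uid */
	unsigned	ruid;	/* real uid */
};

struct sketch_proc {
	struct sketch_cred *ucred;
};

/* Copy a credential; the copy starts with a single reference. */
struct sketch_cred *
sketch_crdup(const struct sketch_cred *old)
{
	struct sketch_cred *newcred;

	newcred = malloc(sizeof(*newcred));
	if (newcred == NULL)
		return (NULL);
	*newcred = *old;
	newcred->refs = 1;
	return (newcred);
}

/* Drop one reference and free the credential when none remain. */
void
sketch_crfree(struct sketch_cred *cred)
{
	if (--cred->refs == 0)
		free(cred);
}

/*
 * Change the effective uid without touching a credential that other
 * holders may still reference: duplicate, modify, publish, release.
 */
int
sketch_set_euid(struct sketch_proc *p, unsigned euid)
{
	struct sketch_cred *newcred, *oldcred;

	oldcred = p->ucred;
	newcred = sketch_crdup(oldcred);
	if (newcred == NULL)
		return (-1);
	newcred->euid = euid;
	p->ucred = newcred;
	sketch_crfree(oldcred);
	return (0);
}
```

The point of always duplicating is that a shared credential is never modified in place, so any other holder of the old reference keeps seeing a consistent, unchanged snapshot.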
# 9081e5e8	15-May-2001	John Baldwin <jhb@FreeBSD.org>	- Remove unneeded include of sys/ipl.h. - Require the proc lock be held for killproc() to allow for the vmdaemon to kill a process when memory is exhausted while holding the lock of the process to kill.
# 3b26be6a	07-May-2001	Akinori MUSHA <knu@FreeBSD.org>	Properly copy the P_ALTSTACK flag in struct proc::p_flag to the child process on fork(2). It is the supposed behavior stated in the manpage of sigaction(2), and Solaris, NetBSD and FreeBSD 3-STABLE correctly do so. The previous fix against libc_r/uthread/uthread_fork.c fixed the problem only for the programs linked with libc_r, so back it out and fix fork(2) itself to help those not linked with libc_r as well. PR: kern/26705 Submitted by: KUROSAWA Takahiro <fwkg7679@mb.infoweb.ne.jp> Tested by: knu, GOTOU Yuuzou <gotoyuzo@notwork.org>, and some other people Not objected by: hackers MFC in: 3 days
# 6caa8a15	27-Apr-2001	John Baldwin <jhb@FreeBSD.org>	Overhaul of the SMP code. Several portions of the SMP kernel support have been made machine independent and various other adjustments have been made to support Alpha SMP. - It splits the per-process portions of hardclock() and statclock() off into hardclock_process() and statclock_process() respectively. hardclock() and statclock() call the _process() functions for the current process so that UP systems will run as before. For SMP systems, it is simply necessary to ensure that all other processors execute the _process() functions when the main clock functions are triggered on one CPU by an interrupt. For the alpha 4100, clock interrupts are delivered in a staggered broadcast fashion, so we simply call hardclock/statclock on the boot CPU and call the _process() functions on the secondaries. For x86, we call statclock and hardclock as usual and then call forward_hardclock/statclock in the MD code to send an IPI to cause the AP's to execute forwarded_hardclock/statclock which then call the _process() functions. - forward_signal() and forward_roundrobin() have been reworked to be MI and to involve less hackery. Now the cpu doing the forward sets any flags, etc. and sends a very simple IPI_AST to the other cpu(s). AST IPIs now just basically return so that they can execute ast() and don't bother with setting the astpending or needresched flags themselves. This also removes the loop in forward_signal() as sched_lock closes the race condition that the loop worked around. - need_resched(), resched_wanted() and clear_resched() have been changed to take a process to act on rather than assuming curproc so that they can be used to implement forward_roundrobin() as described above. - Various other SMP variables have been moved to a MI subr_smp.c and a new header sys/smp.h declares MI SMP variables and API's. The IPI API's from machine/ipl.h have moved to machine/smp.h which is included by sys/smp.h.
- The globaldata_register() and globaldata_find() functions as well as the SLIST of globaldata structures has become MI and moved into subr_smp.c. Also, the globaldata list is only available if SMP support is compiled in. Reviewed by: jake, peter Looked over by: eivind
# 33a9ed9d	23-Apr-2001	John Baldwin <jhb@FreeBSD.org>	Change the pfind() and zpfind() functions to lock the process that they find before releasing the allproc lock and returning. Reviewed by: -smp, dfr, jake
# 4c5eb9c3	11-Apr-2001	Robert Watson <rwatson@FreeBSD.org>	o Replace p_cankill() with p_cansignal(), remove wrappage of p_can() from signal authorization checking. o p_cansignal() takes three arguments: subject process, object process, and signal number, unlike p_cankill(), which only took into account the processes and not the signal number, improving the abstraction such that CANSIGNAL() from kern_sig.c can now also be eliminated; previously CANSIGNAL() special-cased the handling of SIGCONT based on process session. privused is now deprecated. o The new p_cansignal() further limits the set of signals that may be delivered to processes with P_SUGID set, and restructures the access control check to allow it to be extended more easily. o These changes take into account work done by the OpenBSD Project, as well as by Robert Watson and Thomas Moestl on the TrustedBSD Project. Obtained from: TrustedBSD Project
# 5b3047d5	02-Apr-2001	John Baldwin <jhb@FreeBSD.org>	Change stop() to require the sched_lock as well as p's process lock to avoid silly lock contention on sched_lock since in 2 out of the 3 places that we call stop(), we get sched_lock right after calling it and we were locking sched_lock inside of stop() anyways.
# 13330476	02-Apr-2001	John Baldwin <jhb@FreeBSD.org>	- Move the second stop() of process 'p' in issignal() to be after we send SIGCHLD to our parent process. Otherwise, we could block while obtaining the process lock for our parent process and switch out while we were in SSTOP. Even worse, when we try to resume from the mutex being blocked on our p_stat will be SRUN, not SSTOP. - Fix a comment above stop() to indicate that it requires that the proc lock be held, not a proctree lock. Reported by: markm Sleuthing by: jake
# 1005a129	28-Mar-2001	John Baldwin <jhb@FreeBSD.org>	Convert the allproc and proctree locks from lockmgr locks to sx locks.
# c31146a1	28-Mar-2001	John Baldwin <jhb@FreeBSD.org>	- Resort some includes to deal with the new witness code coming in shortly. - Make sure we have Giant locked before calling coredump() in sigexit(). Spotted by: peter (2)
# 628d2653	06-Mar-2001	John Baldwin <jhb@FreeBSD.org>	- Proc locking. Most of signal handling is now MP safe and doesn't require Giant. The only exception is the CANSIGNAL() macro. Unlocking the proc lock around sendsig() in trapsignal() is also questionable. Note that the functions sigexit(), psignal(), and issignal() must be called with the proc lock of the process in question held. postsig() and trapsignal() should not be called with the proc lock held, but they also do not require Giant anymore either. - Remove spl's that are now no longer needed as they are fully replaced.
# d2ef4060	19-Feb-2001	Bruce Evans <bde@FreeBSD.org>	Fixed a longstanding latency bug in signal delivery. When a signal is sent to a process, psignal() needs to schedule an AST for the process if the process is runnable, not just if it is current, so that pending signals get checked for on the next return of the process to user mode. This wasn't practical until recently because the AST flag was per-cpu so setting it for a non-current process would usually just cause a bogus AST for the current process. For non-current processes looping in user mode, it took accidental (?) magic to deliver signals at all. Signals were usually delivered late as a side effect of rescheduling (need_resched() sets astpending, etc.). In pre-SMPng, delivery was delayed by at most 1 quantum (the need_resched() call in roundrobin() is certain to occur within 1 quantum for looping processes). In -current, things are complicated by normal interrupt handlers being threads. Missing handling of the complications makes roundrobin() a bogus no-op, but preemptive scheduling sort of works anyway due to even larger bogons elsewhere.
# d5a08a60	11-Feb-2001	Jake Burkholder <jake@FreeBSD.org>	Implement a unified run queue and adjust priority levels accordingly. - All processes go into the same array of queues, with different scheduling classes using different portions of the array. This allows user processes to have their priorities propagated up into interrupt thread range if need be. - I chose 64 run queues as an arbitrary number that is greater than 32. We used to have 4 separate arrays of 32 queues each, so this may not be optimal. The new run queue code was written with this in mind; changing the number of run queues only requires changing constants in runq.h and adjusting the priority levels. - The new run queue code takes the run queue as a parameter. This is intended to be used to create per-cpu run queues. Implement wrappers for compatibility with the old interface which pass in the global run queue structure. - Group the priority level, user priority, native priority (before propagation) and the scheduling class into a struct priority. - Change any hard coded priority levels that I found to use symbolic constants (TTIPRI and TTOPRI). - Remove the curpriority global variable and use that of curproc. This was used to detect when a process' priority had lowered and it should yield. We now effectively yield on every interrupt. - Activate propogate_priority(). It should now have the desired effect without needing to also propagate the scheduling class. - Temporarily comment out the call to vm_page_zero_idle() in the idle loop. It interfered with propogate_priority() because the idle process needed to do a non-blocking acquire of Giant and then other processes would try to propagate their priority onto it. The idle process should not do anything except idle. vm_page_zero_idle() will return in the form of an idle priority kernel thread which is woken up at appropriate times by the vm system. - Update struct kinfo_proc to the new priority interface.
Deliberately change its size by adjusting the spare fields. It remained the same size, but the layout has changed, so userland processes that use it would parse the data incorrectly. The size constraint should really be changed to an arbitrary version number. Also add a debug.sizeof sysctl node for struct kinfo_proc.
# 142ba5f3	09-Feb-2001	John Baldwin <jhb@FreeBSD.org>	- Make astpending and need_resched process attributes rather than CPU attributes. This is needed for AST's to be properly posted in a preemptive kernel. They are backed by two new flags in p_sflag: PS_ASTPENDING and PS_NEEDRESCHED. They are still accessed by their old macros: aston(), astoff(), etc. For completeness, an astpending() macro has been added to check for a pending AST, and clear_resched() has been added to clear need_resched(). - Rename syscall2() on the x86 back to syscall() to be consistent with other architectures.
# 9ed346ba	08-Feb-2001	Bosko Milekic <bmilekic@FreeBSD.org>	Change and clean the mutex lock interface. mtx_enter(lock, type) becomes: mtx_lock(lock) for sleep locks (MTX_DEF-initialized locks) mtx_lock_spin(lock) for spin locks (MTX_SPIN-initialized) similarly, for releasing a lock, we now have: mtx_unlock(lock) for MTX_DEF and mtx_unlock_spin(lock) for MTX_SPIN. We change the caller interface for the two different types of locks because the semantics are entirely different for each case, and this makes it explicitly clear and, at the same time, it rids us of the extra `type' argument. The enter->lock and exit->unlock change has been made with the idea that we're "locking data" and not "entering locked code" in mind. Further, remove all additional "flags" previously passed to the lock acquire/release routines with the exception of two: MTX_QUIET and MTX_NOSWITCH The functionality of these flags is preserved and they can be passed to the lock/unlock routines by calling the corresponding wrappers: mtx_{lock, unlock}_flags(lock, flag(s)) and mtx_{lock, unlock}_spin_flags(lock, flag(s)) for MTX_DEF and MTX_SPIN locks, respectively. Re-inline some lock acq/rel code; in the sleep lock case, we only inline the _obtain_lock()s in order to ensure that the inlined code fits into a cache line. In the spin lock case, we inline recursion and actually only perform a function call if we need to spin. This change has been made with the idea that we generally tend to avoid spin locks and that also the spin locks that we do have and are heavily used (i.e. sched_lock) do recurse, and therefore in an effort to reduce function call overhead for some architectures (such as alpha), we inline recursion for this case. Create a new malloc type for the witness code and retire from using the M_DEV type. The new type is called M_WITNESS and is only declared if WITNESS is enabled.
Begin cleaning up some machdep/mutex.h code - specifically updated the "optimized" inlined code in alpha/mutex.h and wrote MTX_LOCK_SPIN and MTX_UNLOCK_SPIN asm macros for the i386/mutex.h as we presently need those. Finally, caught up to the interface changes in all sys code. Contributors: jake, jhb, jasone (in no particular order)
# 40447cd4	24-Jan-2001	John Baldwin <jhb@FreeBSD.org>	- Proc locking. - Catch up to proc flag changes.
# 568ae39f	19-Jan-2001	John Baldwin <jhb@FreeBSD.org>	Revert revision 1.102. I don't think p_nice needs to be protected with sched_lock, and I'm fairly certain P_TRACED will be protected with the proc lock instead. Pointed out indirectly by: bde
# 238510fc	15-Jan-2001	Jason Evans <jasone@FreeBSD.org>	Implement condition variables.
# 5192404a	05-Jan-2001	John Baldwin <jhb@FreeBSD.org>	Protect p_nice and P_TRACED in psignal() above the switch statement with sched_lock.
# 3e6831f5	02-Jan-2001	John Baldwin <jhb@FreeBSD.org>	The previous commit wasn't entirely correct. At least one goto to the out: label in psignal() did not grab sched_lock before trying to release it. Also, the previous version had several cases where it grabbed sched_lock before jumping to out: unnecessarily, so rework this a bit. The runfast: and out: labels must be called with sched_lock released, and the run: label must be called with it held. Appropriate mtx_assert()'s have been added that should catch any bugs that may still be in this code. Noticed by: bde
# 4bfba0cf	31-Dec-2000	John Baldwin <jhb@FreeBSD.org>	Push down sched_lock in psignal(). sched_lock was being held across recursive calls into psignal() as well as calls to signotify(), forward_signal(), etc.
# ef829407	31-Dec-2000	John Baldwin <jhb@FreeBSD.org>	Add in a missing release of the proctree lock. Submitted by: Sja <sakari.jalovaara@eqonline.fi>
# 98f03f90	23-Dec-2000	Jake Burkholder <jake@FreeBSD.org>	Protect proc.p_pptr and proc.p_children/p_sibling with the proctree_lock. linprocfs not locked pending response from informal maintainer. Reviewed by: jhb, -smp@
# d96cfeae	16-Dec-2000	Marcel Moolenaar <marcel@FreeBSD.org>	Fix a typo that allowed signals caused by traps to be delivered to the process when said signal is masked. PR: 23457 Submitted by: Yasuhiko Watanabe <yasu@mrit.mei.co.jp>
# c0c25570	12-Dec-2000	Jake Burkholder <jake@FreeBSD.org>	- Change the allproc_lock to use a macro, ALLPROC_LOCK(how), instead of explicit calls to lockmgr. Also provides macros for the flags passed to specify shared, exclusive or release which map to the lockmgr flags. This is so that the use of lockmgr can be easily replaced with optimized reader-writer locks. - Add some locking that I missed the first time.
# 1c32c37c	01-Dec-2000	John Baldwin <jhb@FreeBSD.org>	Protect p_stat with sched_lock.
# d034d459	29-Nov-2000	Marcel Moolenaar <marcel@FreeBSD.org>	Don't use p->p_sigstk.ss_flags to keep state of whether the process is on the alternate stack or not. For compatibility with sigstack(2) state is being updated if such is needed. We now determine whether the process is on the alternate stack by looking at its stack pointer. This allows a process to siglongjmp from a signal handler on the alternate stack to the place of the sigsetjmp on the normal stack. When maintaining state, this would have invalidated the state information and caused a subsequent signal to be delivered on the normal stack instead of the alternate stack. PR: 22286
# 553629eb	22-Nov-2000	Jake Burkholder <jake@FreeBSD.org>	Protect the following with a lockmgr lock: allproc zombproc pidhashtbl proc.p_list proc.p_hash nextpid Reviewed by: jhb Obtained from: BSD/OS and netbsd
# 7da6f977	17-Nov-2000	Jake Burkholder <jake@FreeBSD.org>	- Split the run queue and sleep queue linkage, so that a process may block on a mutex while on the sleep queue without corrupting it. - Move dropping of Giant to after the acquire of sched_lock. Tested by: John Hay <jhay@icomtek.csir.co.za> jhb
# 20cdcc5b	15-Nov-2000	John Baldwin <jhb@FreeBSD.org>	Don't release and acquire Giant in mi_switch(). Instead, release and acquire Giant as needed in functions that call mi_switch(). The releases need to be done outside of the sched_lock to avoid potential deadlocks from trying to acquire Giant while interrupts are disabled. Submitted by: witness
# 806d7daa	09-Nov-2000	Marcel Moolenaar <marcel@FreeBSD.org>	Make MINSIGSTKSZ machine dependent, and have the sigaltstack syscall compare against a variable sv_minsigstksz in struct sysentvec as to properly take the size of the machine- and ABI dependent struct sigframe into account. The SVR4 and iBCS2 modules continue to have a minsigstksz of 8192 to preserve behavior. The real values (if different) are not known at this time. Other ABI modules use the real values. The native MINSIGSTKSZ is now defined as follows (Arch: MINSIGSTKSZ): alpha: 4096; i386: 2048; ia64: 12288. Reviewed by: mjacob Suggested by: bde
# 35e0e5b3	20-Oct-2000	John Baldwin <jhb@FreeBSD.org>	Catch up to moving headers: - machine/ipl.h -> sys/ipl.h - machine/mutex.h -> sys/mutex.h
# 33510ef1	17-Sep-2000	Bruce Evans <bde@FreeBSD.org>	Unpessimized CURSIG(). The fast path through CURSIG() was broken in the 128-bit sigset_t changes by moving conditionally (rarely) executed code to the beginning where it is always executed, and since this code now involves 3 128-bit operations, the pessimization was relatively large. This change speeds up lmbench's pipe latency benchmark by 3.5%. Fixed style bugs in CURSIG().
# fbbeeb6c	17-Sep-2000	Bruce Evans <bde@FreeBSD.org>	Uninlined CURSIG() and unpolluted <sys/signalvar.h>. CURSIG() had become very bloated, first with 128-bit sigset_t's, then with locking in the SMP case, then with locking in all cases. The space bloat was probably also time bloat, partly because the fast path through CURSIG() was pessimized by the sigset_t changes. This change speeds up lmbench's pipe-based latency benchmark by 4% on a Celeron. <sys/signalvar.h> had become very polluted to support the bloat.
# 36240ea5	10-Sep-2000	Doug Rabson <dfr@FreeBSD.org>	Move the include of <sys/systm.h> so that KTR gets a declaration for snprintf().
# 0384fff8	06-Sep-2000	Jason Evans <jasone@FreeBSD.org>	Major update to the way synchronization is done in the kernel. Highlights include: * Mutual exclusion is used instead of spl(). See mutex(9). (Note: The alpha port is still in transition and currently uses both.) Per-CPU idle processes. * Interrupts are run in their own separate kernel threads and can be preempted (i386 only). Partially contributed by: BSDi (BSD/OS) Submissions by (at least): cp, dfr, dillon, grog, jake, jhb, sheldonh
# 387d2c03	29-Aug-2000	Robert Watson <rwatson@FreeBSD.org>	o Centralize inter-process access control, introducing: int p_can(p1, p2, operation, privused) which allows specification of subject process, object process, inter-process operation, and an optional call-by-reference privused flag, allowing the caller to determine if privilege was required for the call to succeed. This allows jail, kern.ps_showallprocs and regular credential-based interaction checks to occur in one block of code. Possible operations are P_CAN_SEE, P_CAN_SCHED, P_CAN_KILL, and P_CAN_DEBUG. p_can currently breaks out as a wrapper to a series of static function checks in kern_prot, which should not be invoked directly. o Commented out capabilities entries are included for some checks. o Update most inter-process authorization to make use of p_can() instead of manual checks, PRISON_CHECK(), P_TRESPASS(), and kern.ps_showallprocs. o Modify suser{,_xxx} to use const arguments, as it no longer modifies process flags due to the disabling of ASU. o Modify some checks/errors in procfs so that ENOENT is returned instead of ESRCH, further improving concealment of processes that should not be visible to other processes. Also introduce new access checks to improve hiding of processes for procfs_lookup(), procfs_getattr(), procfs_readdir(). Correct a bug reported by bp concerning not handling the CREATE case in procfs_lookup(). Remove volatile flag in procfs that caused apparently spurious qualifier warnings (approved by bde). o Add comment noting that ktrace() has not been updated, as its access control checks are different from ptrace(), whereas they should probably be the same. Further discussion should happen on this topic. Reviewed by: bde, green, phk, freebsd-security, others Approved by: bde Obtained from: TrustedBSD Project
# 31c8f3f0	25-Aug-2000	Marcel Moolenaar <marcel@FreeBSD.org>	Make this file compile again when COMPAT_43 has not been defined. This boils down to conditionally compiling the old signal syscalls. We might want to extend the types in syscalls.master to make these syscalls conditional on something more appropriate than COMPAT_43.
# f2a2857b	11-Jul-2000	Kirk McKusick <mckusick@FreeBSD.org>	Add snapshots to the fast filesystem. Most of the changes support the gating of system calls that cause modifications to the underlying filesystem. The gating can be enabled by any filesystem that needs to consistently suspend operations by adding the vop_stdgetwritemount to their set of vnops. Once gating is enabled, the function vfs_write_suspend stops all new write operations to a filesystem, allows any filesystem modifying system calls already in progress to complete, then sync's the filesystem to disk and returns. The function vfs_write_resume allows the suspended write operations to begin again. Gating is not added by default for all filesystems as for SMP systems it adds two extra locks to such critical kernel paths as the write system call. Thus, gating should only be added as needed. Details on the use and current status of snapshots in FFS can be found in /sys/ufs/ffs/README.snapshot so for brevity and timeliness are not included here. Unless and until you create a snapshot file, these changes should have no effect on your system (famous last words).
# e6796b67	03-Jul-2000	Kirk McKusick <mckusick@FreeBSD.org>	Move the truncation code out of vn_open and into the open system call after the acquisition of any advisory locks. This fix corrects a case in which a process tries to open a file with a non-blocking exclusive lock. Even if it fails to get the lock it would still truncate the file even though its open failed. With this change, the truncation is done only after the lock is successfully acquired. Obtained from: BSD/OS
# e3975643	25-May-2000	Jake Burkholder <jake@FreeBSD.org>	Back out the previous change to the queue(3) interface. It was not discussed and should probably not happen. Requested by: msmith and others
# 740a1973	23-May-2000	Jake Burkholder <jake@FreeBSD.org>	Change the way that the queue(3) structures are declared; don't assume that the type argument to _HEAD and _ENTRY is a struct. Suggested by: phk Reviewed by: phk Approved by: mdodd
# 2c9b67a8	30-Apr-2000	Poul-Henning Kamp <phk@FreeBSD.org>	Remove unneeded #include <vm/vm_zone.h> Generated by: src/tools/tools/kerninclude
# cb679c38	16-Apr-2000	Jonathan Lemon <jlemon@FreeBSD.org>	Introduce kqueue() and kevent(), a kernel event notification facility.
# 7c8fdcbd	02-Apr-2000	Matthew Dillon <dillon@FreeBSD.org>	Make the sigprocmask() and geteuid() system calls MP SAFE. Expand commentary for copyin/copyout to indicate that they are MP SAFE as well. Reviewed by: msmith
# db6a4261	28-Mar-2000	Matthew Dillon <dillon@FreeBSD.org>	The SMP cleanup commit broke UP compiles. Make UP compiles work again.
# 36e9f877	28-Mar-2000	Matthew Dillon <dillon@FreeBSD.org>	Commit major SMP cleanups and move the BGL (big giant lock) in the syscall path inward. A system call may select whether it needs the MP lock or not (the default being that it does need it). A great deal of conditional SMP code for various deadended experiments has been removed. 'cil' and 'cml' have been removed entirely, and the locking around the cpl has been removed. The conditional separately-locked fast-interrupt code has been removed, meaning that interrupts must hold the CPL now (but they pretty much had to anyway). Another reason for doing this is that the original separate-lock for interrupts just doesn't apply to the interrupt thread mechanism being contemplated. Modifications to the cpl may now ONLY occur while holding the MP lock. For example, if an otherwise MP safe syscall needs to mess with the cpl, it must hold the MP lock for the duration and must (as usual) save/restore the cpl in a nested fashion. This is precursor work for the real meat coming later: avoiding having to hold the MP lock for common syscalls and I/O's and interrupt threads. It is expected that the spl mechanisms and new interrupt threading mechanisms will be able to run in tandem, allowing a slow piecemeal transition to occur. This patch should result in a moderate performance improvement due to the considerable amount of code that has been removed from the critical path, especially the simplification of the spl*() calls. The real performance gains will come later. Approved by: jkh Reviewed by: current, bde (exception.s) Some work taken from: luoqi's patch
# e5a28db9	21-Mar-2000	Paul Saab <ps@FreeBSD.org>	Add sysctl kern.coredump to enable/disable core dumps system wide.
# 762e6b85	15-Dec-1999	Eivind Eklund <eivind@FreeBSD.org>	Introduce NDFREE (and remove VOP_ABORTOP)
# a9e0361b	21-Nov-1999	Poul-Henning Kamp <phk@FreeBSD.org>	Introduce the new function p_trespass(struct proc *p1, struct proc *p2) which returns zero or an errno depending on the legality of p1 trespassing on p2. Replace kern_sig.c:CANSIGNAL() with call to p_trespass() and one extra signal related check. Replace procfs.h:CHECKIO() macros with calls to p_trespass(). Only show command lines to processes which can trespass on the target process.
# da654d90	20-Nov-1999	Poul-Henning Kamp <phk@FreeBSD.org>	s/p_cred->pc_ucred/p_ucred/g
# 2e3c8fcb	16-Nov-1999	Poul-Henning Kamp <phk@FreeBSD.org>	This is a partial commit of the patch from PR 14914: A lot of the code in sys/kern directly accesses the Q_HEAD and Q_ENTRY structures for list operations. This patch makes all list operations in sys/kern use the queue(3) macros, rather than directly accessing the *Q_{HEAD,ENTRY} structures. This batch of changes compiles to the same object files. Reviewed by: phk Submitted by: Jake Burkholder <jake@checker.org> PR: 14914
# 35a2598f	30-Oct-1999	Sean Eric Fagan <sef@FreeBSD.org>	Bail out of the process early if the coredumpfile limit is 0. PR: kern/14540 Reviewed by: Nate Williams
# 6f841fb7	12-Oct-1999	Marcel Moolenaar <marcel@FreeBSD.org>	Don't let osigaction and osigvec accept the new signal numbers. Fix style bugs caused by the sigset_t in general while I'm here. Submitted by: bde
# 645682fd	11-Oct-1999	Luoqi Chen <luoqi@FreeBSD.org>	Add a per-signal flag to mark handlers registered with osigaction, so we can provide the correct context to each signal handler. Fix broken sigsuspend(): don't use p_oldsigmask as a flag, use SAS_OLDMASK as we did before the linuxthreads support merge (submitted by bde). Move ps_sigstk from p_sigacts to the main proc structure since signal stack should not be shared among threads. Move SAS_OLDMASK and SAS_ALTSTACK flags from sigacts::ps_flags to proc::p_flag. Move PS_NOCLDSTOP and PS_NOCLDWAIT flags from proc::p_flag to procsig::ps_flag. Reviewed by: marcel, jdp, bde
# 2c42a146	29-Sep-1999	Marcel Moolenaar <marcel@FreeBSD.org>	sigset_t change (part 2 of 5): The core of the signalling code has been rewritten to operate on the new sigset_t. No methodological changes have been made. Most references to a sigset_t object are through macros (see signalvar.h) to create a level of abstraction and to provide a basis for further improvements. The NSIG constant has not been changed to reflect the maximum number of signals possible. The reason is that it breaks programs (especially shells) which assume that all signals have a non-null name in sys_signame. See src/bin/sh/trap.c for an example. Instead _SIG_MAXSIG has been introduced to hold the maximum signal possible with the new sigset_t. struct sigprop has been moved from signalvar.h to kern_sig.c because a) it is only used there, and b) access must be done through function sigprop(). The latter because the table doesn't hold properties for all signals, but only for the first NSIG signals. signal.h has been reorganized to make reading easier and to add the new and/or modified structures. The "old" structures are moved to signalvar.h to prevent namespace pollution. Especially the coda filesystem suffers from the change, because it contained lines like (p->p_sigmask == SIGIO), which is easy to do for integral types, but not for compound types. NOTE: kdump (and port linux_kdump) must be recompiled. Thanks to Garrett Wollman and Daniel Eischen for pressing the importance of changing sigreturn as well.
# f3a6cf70	01-Sep-1999	Sean Eric Fagan <sef@FreeBSD.org>	Make prototype match function.
# fca666a1	31-Aug-1999	Julian Elischer <julian@FreeBSD.org>	General cleanup of core-dumping code. Submitted by: Sean Fagan,
# c3aac50f	27-Aug-1999	Peter Wemm <peter@FreeBSD.org>	$Id$ -> $FreeBSD$
# 84300d62	23-Aug-1999	Martin Cracauer <cracauer@FreeBSD.org>	Fix a mistake in my last SA_SIGINFO commit. Processes could block SIGKILL and SIGSTOP. PR: kern/13293 Submitted by: dwmalone@maths.tcd.ie Obtained from: PR had correct fix
# 87f1de5f	16-Aug-1999	Bill Fumerola <billf@FreeBSD.org>	expand_name: use pid_t and uid_t in the declaration as that is what we are passed fix printf formatters accordingly. Reviewed by: green
# ce38ca0f	14-Aug-1999	Alfred Perlstein <alfred@FreeBSD.org>	Fix potential overflow, remove unnecessary bzero. Pointed out by: green remove redundant strlen, sprintf returns the length. Reviewed by: peter
# 80e907a1	18-Jul-1999	Peter Wemm <peter@FreeBSD.org>	Reset SA_NOCLDWAIT on exec(). PR: kern/12669 Submitted by: Doug Ambrisko <ambrisko@whistle.com>
# aff66c54	06-Jul-1999	Martin Cracauer <cracauer@FreeBSD.org>	Implement SA_SIGINFO for i386. Thanks to Bruce Evans for much more than a review, this was a nice puzzle. This is supposed to be binary and source compatible with older applications that access the old FreeBSD-style three arguments to a signal handler. Except those applications that access hidden signal handler arguments beyond the documented third one. If you have applications that do, please let me know so that we take the opportunity to provide the functionality they need in a documented manner. Also except applications that use 'struct sigframe' directly. You need to recompile gdb and doscmd. `make world` is recommended. Example program that demonstrates how SA_SIGINFO and old-style FreeBSD handlers (with their three args) may be used in the same process is at http://www3.cons.org/tmp/fbsd-siginfo.c Programs that use the old FreeBSD-style three arguments are easy to change to SA_SIGINFO (although they don't need to, since the old style will still work): Old args to signal handler: void handler_sn(int sig, int code, struct sigcontext *scp) New args: void handler_si(int sig, siginfo_t *si, void *third) where: old:code == new:second->si_code old:scp == &(new:si->si_scp) /* Passed by value! */ The latter is also pointed to by new:third, but accessing via si->si_scp is preferred because it is type-safe. FreeBSD implementation notes: - This is just the framework to make the interface POSIX compatible. For now, no additional functionality is provided. This is supposed to happen now, starting with floating point values. - We don't use 'sigcontext_t.si_value' for now (POSIX meant it for realtime-related values). - Documentation will be updated when new functionality is added and the exact arguments passed are determined. The comments in sys/signal.h are meant to be useful. Reviewed by: BDE
# 3d177f46	03-May-1999	Bill Fumerola <billf@FreeBSD.org>	Add sysctl descriptions to many SYSCTL_XXXs PR: kern/11197 Submitted by: Adrian Chadd <adrian@FreeBSD.org> Reviewed by: billf(spelling/style/minor nits) Looked at by: bde(style)
# 75c13541	28-Apr-1999	Poul-Henning Kamp <phk@FreeBSD.org>	This Implements the mumbled about "Jail" feature. This is a seriously beefed up chroot kind of thing. The process is jailed along the same lines as a chroot does it, but with additional tough restrictions imposed on what the superuser can do. For all I know, it is safe to hand over the root bit inside a prison to the customer living in that prison, this is what it was developed for in fact: "real virtual servers". Each prison has an ip number associated with it, which all IP communications will be coerced to use and each prison has its own hostname. Needless to say, you need more RAM this way, but the advantage is that each customer can run their own particular version of apache and not stomp on the toes of their neighbors. It generally does what one would expect, but setting up a jail still takes a little knowledge. A few notes: I have no scripts for setting up a jail, don't ask me for them. The IP number should be an alias on one of the interfaces. mount a /proc in each jail, it will make ps more useable. /proc/<pid>/status tells the hostname of the prison for jailed processes. Quotas are only sensible if you have a mountpoint per prison. There are no provisions for stopping resource-hogging. Some "#ifdef INET" and similar may be missing (send patches!) If somebody wants to take it from here and develop it into more of a "virtual machine" they should be most welcome! Tools, comments, patches & documentation most welcome. Have fun... Sponsored by: http://www.rndassociates.com/ Run for almost a year by: http://www.servetheweb.com/
# 88c5ea45	25-Jan-1999	Julian Elischer <julian@FreeBSD.org>	Enable Linux threads support by default. This takes the conditionals out of the code that has been tested by various people for a while. ps and friends (libkvm) will need a recompile as some proc structure changes are made. Submitted by: "Richard Seaman, Jr." <dick@tar.com>
# 219cbf59	09-Jan-1999	Eivind Eklund <eivind@FreeBSD.org>	KNFize, by bde.
# 5526d2d9	08-Jan-1999	Eivind Eklund <eivind@FreeBSD.org>	Split DIAGNOSTIC -> DIAGNOSTIC, INVARIANTS, and INVARIANT_SUPPORT as discussed on -hackers. Introduce 'KASSERT(assertion, ("panic message", args))' for simple check + panic. Reviewed by: msmith
# 6626c604	18-Dec-1998	Julian Elischer <julian@FreeBSD.org>	Reviewed by: Luoqi Chen, Jordan Hubbard Submitted by: "Richard Seaman, Jr." <lists@tar.com> Obtained from: linux :-) Code to allow Linux Threads to run under FreeBSD. By default not enabled This code is dependent on the conditional COMPAT_LINUX_THREADS (suggested by Garret) This is not yet a 'real' option but will be within some number of hours.
# 0bfe2990	01-Dec-1998	Eivind Eklund <eivind@FreeBSD.org>	Check return value of malloc() in expand_name. Reviewed by: sef
# 831d27a9	11-Nov-1998	Don Lewis <truckman@FreeBSD.org>	Installed the second patch attached to kern/7899 with some changes suggested by bde, a few other tweaks to get the patch to apply cleanly again and some improvements to the comments. This change closes some fairly minor security holes associated with F_SETOWN, fixes a few bugs, and removes some limitations that F_SETOWN had on tty devices. For more details, see the description on the PR. Because this patch increases the size of the proc and pgrp structures, it is necessary to re-install the includes and recompile libkvm, the vinum lkm, fstat, gcore, gdb, ipfilter, ps, top, and w. PR: kern/7899 Reviewed by: bde, elvind
# 2c2a0cf1	21-Oct-1998	John Polstra <jdp@FreeBSD.org>	Eliminate a superfluous comment.
# c2844275	14-Sep-1998	John Polstra <jdp@FreeBSD.org>	Remove includes that are no longer needed, now that the core dumping code has been moved into the respective imgact_xxx.c sources.
# 22d4b0fb	13-Sep-1998	John Polstra <jdp@FreeBSD.org>	Add provisions for variant core dump file formats, depending on the object format of the executable being dumped. This is the first step toward producing ELF core dumps in the proper format. I will commit the code to generate the ELF core dumps Real Soon Now. In the meantime, ELF executables won't dump core at all. That is probably no less useful than dumping a.out-style core dumps as they have done until now. Submitted by: Alex <garbanzo@hooked.net> (with very minor changes by me)
# 57308494	28-Jul-1998	Joerg Wunsch <joerg@FreeBSD.org>	Make the logging of abnormally exiting processes optional by a sysctl. PR: kern/1711 Submitted by: Nick Sayer <nsayer@kfu.com>
# a23d65bf	14-Jul-1998	Bruce Evans <bde@FreeBSD.org>	Cast pointers to uintptr_t/intptr_t instead of to u_long/long, respectively. Most of the longs should probably have been u_longs, but this change is just to prevent warnings about casts between pointers and integers of different sizes, not to fix poorly chosen types.
# c5edb423	08-Jul-1998	Sean Eric Fagan <sef@FreeBSD.org>	Add support for run-time configuration of core file names. In a nutshell, you can specify the corefile name by using: sysctl -w kern.corefile="format" where format is a pathname (relative or absolute -- default is "%N.core"), with "%N" (process name), "%P" (process ID), and "%U" (user ID) formats. Reviewed by: Mike Smith, with strong requests by Julian :)
# c87e2930	28-Jun-1998	David Greenman <dg@FreeBSD.org>	Added a sysctl variable kern.sugid_coredump for controlling coredump behavior of setuid/setgid binaries that defaults to 0 (coredump disabled).
# ecbb00a2	07-Jun-1998	Doug Rabson <dfr@FreeBSD.org>	This commit fixes various 64bit portability problems required for FreeBSD/alpha. The most significant item is to change the command argument to ioctl functions from int to u_long. This change brings us inline with various other BSD versions. Driver writers may like to use (__FreeBSD_version == 300003) to detect this change. The prototype FreeBSD/alpha machdep will follow in a couple of days time.
# 3163861c	03-Mar-1998	Tor Egge <tegge@FreeBSD.org>	Forward the signal if the process runs on a different CPU. This reduces the signal handling latency for cpu-bound processes that perform very few system calls. The IPI for forcing an additional software trap is no longer dependent upon BETTER_CLOCK being defined.
# 0b08f5f7	05-Feb-1998	Eivind Eklund <eivind@FreeBSD.org>	Back out DIAGNOSTIC changes.
# 47cfdb16	04-Feb-1998	Eivind Eklund <eivind@FreeBSD.org>	Turn DIAGNOSTIC into a new-style option.
# 5591b823d	16-Dec-1997	Eivind Eklund <eivind@FreeBSD.org>	Make COMPAT_43 and COMPAT_SUNOS new-style options.
# 2a024a2b	05-Dec-1997	Sean Eric Fagan <sef@FreeBSD.org>	Changes to allow event-based process monitoring and control.
# cb226aaa	06-Nov-1997	Poul-Henning Kamp <phk@FreeBSD.org>	Move the "retval" (3rd) parameter from all syscall functions and put it in struct proc instead. This fixes a boatload of compiler warnings, and removes a lot of cruft from the sources. I have not removed the /ARGSUSED/, they will require some looking at. libkvm, ps and other userland struct proc frobbing programs will need to be recompiled.
# 245f17d4	13-Sep-1997	Joerg Wunsch <joerg@FreeBSD.org>	Implement SA_NOCLDWAIT. The implementation is done (unlike what i've originally been contemplating) by reparenting kids of processes that have the appropriate bit set to PID 1, and let PID 1 handle the zombie. This is far less problematical than what would seem to be ``doing it right'', for a number of reasons. Of our currently shipping PID-1-intended programs, 50 % fail the above assumption. ;-) (Read this: sysinstall doesn't do it right. This is no problem as long as no program called by sysinstall actually uses SA_NOCLDWAIT.) ToDo: . clarify the correct SA_* flag inheritance, compared to other systems, . decide whether the compat cruft (osigvec(9)) should deal with new system additions or not, . merge OpenBSD's SA_SIGINFO implementation. ;) Reviewed by: bde
# e4ba6a82	02-Sep-1997	Bruce Evans <bde@FreeBSD.org>	Removed unused #includes.
# 8a2d9f50	25-Aug-1997	Bruce Evans <bde@FreeBSD.org>	Finished staticizing.
# 3ac4d1ef	22-Mar-1997	Bruce Evans <bde@FreeBSD.org>	Don't #include <sys/fcntl.h> in <sys/file.h> if KERNEL is defined. Fixed everything that depended on getting fcntl.h stuff from the wrong place. Most things don't depend on file.h stuff at all.
# 6875d254	22-Feb-1997	Peter Wemm <peter@FreeBSD.org>	Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not ready for it yet.
# 996c772f	09-Feb-1997	John Dyson <dyson@FreeBSD.org>	This is the kernel Lite/2 commit. There are some requisite userland changes, so don't expect to be able to run the kernel as-is (very well) without the appropriate Lite/2 userland changes. The system boots and can mount UFS filesystems. Untested: ext2fs, msdosfs, NFS Known problems: Incorrect Berkeley ID strings in some files. Mount_std mounts will not work until the getfsent library routine is changed. Reviewed by: various people Submitted by: Jeffery Hsu <hsu@freebsd.org>
# 1130b656	14-Jan-1997	Jordan K. Hubbard <jkh@FreeBSD.org>	Make the long-awaited change from $Id$ to $FreeBSD$ This will make a number of things easier in the future, as well as (finally!) avoiding the Id-smashing problem which has plagued developers for so long. Boy, I'm glad we're not using sup anymore. This update would have been insane otherwise.
# e6eeb36d	29-Nov-1996	Bruce Evans <bde@FreeBSD.org>	Fixed sigaction() for SIGKILL and SIGSTOP. Reading the old action now succeeds. Writing an action now succeeds iff the handler isn't changed. (POSIX allows attempts to change the handler to be ignored or cause an error. Changing other parts of the action is allowed (except attempts to mask unmaskable signals are silently ignored as usual).) Found by: NIST-PCTS
# 8713ad74	18-Oct-1996	David Greenman <dg@FreeBSD.org>	Kill unnecessary test in coredump() that wasn't removed in rev 1.19 when the check for P_SUGID was added.
# 3d1b21c6	09-Jul-1996	Andrey A. Chernov <ache@FreeBSD.org>	Log not only the exit signal, but also whether core was dumped (or not)
# a794e791	30-Apr-1996	Bruce Evans <bde@FreeBSD.org>	Removed unnecessary #includes from <sys/imgact.h> so that it is self-sufficient and added explicit #includes where required.
# 289ccde0	30-Mar-1996	Peter Wemm <peter@FreeBSD.org>	Correct the handling of NOCLDSTOP when using sigvec() Make the SA_NODEFER handling more correct, previously if you called sigaction to set a handler and had SA_NODEFER set, and manually masked the signal itself in sa_mask, and when you read the settings back later, you'd find SA_NODEFER incorrectly cleared. Pointed out by: bde
# dedc04fe	15-Mar-1996	Peter Wemm <peter@FreeBSD.org>	Actually implement SA_RESETHAND - some of the sigaction code recognised it but didn't actually do anything with it (blush). This should fix bde's test case where the test program set SA_RESETHAND and when reading it back, it was gone. Tweak/optimize SA_NODEFER so that the implementation is a little simpler and does not incur (slight) overhead for every signal at delivery time.
# edbfedac	11-Mar-1996	Peter Wemm <peter@FreeBSD.org>	Import 4.4BSD-Lite2 onto the vendor branch, note that in the kernel, all files are off the vendor branch, so this should not change anything. A "U" marker generally means that the file was not changed in between the 4.4Lite and Lite-2 releases, and does not need a merge. "C" generally means that there was a change. [note new unused (in this form) syscalls.conf, to be 'cvs rm'ed]
# b75356e1	10-Mar-1996	Jeffrey Hsu <hsu@FreeBSD.org>	From Lite2: proc LIST changes. Reviewed by: david & bde
# 8674077a	10-Mar-1996	Jeffrey Hsu <hsu@FreeBSD.org>	From Lite2: change code parameter to u_long and initialize ps_sig. Reviewed by: davidg & bde
# d66a5066	02-Mar-1996	Peter Wemm <peter@FreeBSD.org>	Mega-commit for Linux emulator update.. This has been stress tested under netscape-2.0 for Linux running all the Java stuff. The scrollbars are now working, at least on my machine. (whew! :-) I'm uncomfortable with the size of this commit, but it's too inter-dependent to easily separate out. The main changes: COMPAT_LINUX is GONE. Most of the code has been moved out of the i386 machine dependent section into the linux emulator itself. The int 0x80 syscall code was almost identical to the lcall 7,0 code and a minor tweak allows them to both be used with the same C code. All kernels can now just modload the lkm and it'll DTRT without having to rebuild the kernel first. Like IBCS2, you can statically compile it in with "options LINUX". A pile of new syscalls implemented, including getdents(), llseek(), readv(), writev(), msync(), personality(). The Linux-ELF libraries want to use some of these. linux_select() now obeys Linux semantics, ie: returns the time remaining of the timeout value rather than leaving it the original value. Quite a few bugs removed, including incorrect arguments being used in syscalls.. eg: mixups between passing the sigset as an int, vs passing it as a pointer and doing a copyin(), missing return values, unhandled cases, SIOC* ioctls, etc. The build for the code has changed. i386/conf/files now knows how to build linux_genassym and generate linux_assym.h on the fly. Supporting changes elsewhere in the kernel: The user-mode signal trampoline has moved from the U area to immediately below the top of the stack (below PS_STRINGS). This allows the different binary emulations to have their own signal trampoline code (which gets rid of the hardwired syscall 103 (sigreturn on BSD, syslog on Linux)) and so that the emulator can provide the exact "struct sigcontext *" argument to the program's signal handlers.
The sigstack's "ss_flags" now uses SS_DISABLE and SS_ONSTACK flags, which have the same values as the re-used SA_DISABLE and SA_ONSTACK which are intended for sigaction only. This enables the support of a SA_RESETHAND flag to sigaction to implement the gross SYSV and Linux SA_ONESHOT signal semantics where the signal handler is reset when it's triggered. makesyscalls.sh no longer appends the struct sysentvec on the end of the generated init_sysent.c code. It's a lot saner to have it in a separate file rather than trying to update the structure inside the awk script. :-) At exec time, the dozen bytes or so of signal trampoline code are copied to the top of the user's stack, rather than obtaining the trampoline code the old way by getting a clone of the parent's user area. This allows Linux and native binaries to freely exec each other without getting trampolines mixed up.
# 729b1e51	30-Jan-1996	David Greenman <dg@FreeBSD.org>	Improved killproc() log message and made it and the other similar message tolerant of p_ucred being invalid. Started using killproc() where appropriate.
# db6a20e2	03-Jan-1996	Garrett Wollman <wollman@FreeBSD.org>	Converted two options over to the new scheme: USER_LDT and KTRACE.
# 87b6de2b	14-Dec-1995	Poul-Henning Kamp <phk@FreeBSD.org>	A Major staticize sweep. Generates a couple of warnings that I'll deal with later. A number of unused vars removed. A number of unused procs removed or #ifdefed.
# efeaf95a	06-Dec-1995	David Greenman <dg@FreeBSD.org>	Untangled the vm.h include file spaghetti.
# 7bcb7905	18-Nov-1995	Bruce Evans <bde@FreeBSD.org>	Cleaned up SA_NODEFER changes. Added prototypes.
# d2d3e875	11-Nov-1995	Bruce Evans <bde@FreeBSD.org>	Included <sys/sysproto.h> to get central declarations for syscall args structs and prototypes for syscalls. Ifdefed duplicated decentralized declarations of args structs. It's convenient to have this visible but they are hard to maintain. Some are already different from the central declarations. 4.4lite2 puts them in comments in the function headers but I wanted to avoid the large changes for that.
# 1e41c1b5	19-Oct-1995	Steven Wallace <swallace@FreeBSD.org>	Implement SA_NODEFER sa_flag for sigaction(): Add SA_NODEFER define to signal.h Add ps_nodefer field to struct sigacts in signalvar.h. Add code to kern_sig.c to handle SA_NODEFER. If flag is set, when the signal is delivered, it is not masked automatically from receiving the same signal again. Reviewed by: wollman, bde
# 9b2e5354	30-May-1995	Rodney W. Grimes <rgrimes@FreeBSD.org>	Remove trailing whitespace.
# b5e8ce9f	16-Mar-1995	Bruce Evans <bde@FreeBSD.org>	Add and move declarations to fix all of the warnings from `gcc -Wimplicit' (except in netccitt, netiso and netns) and most of the warnings from `gcc -Wnested-externs'. Fix all the bugs found. There were no serious ones.
# aa98692f	28-Jan-1995	Andreas Schulz <ats@FreeBSD.org>	Correct the name of one structure member in the sigaltstack structure. Now it matches the man page and also the only other commercial implementation I have found so far (Solaris 2.x). Changed the name from ss_base to ss_sp.
# 08ee5d13	06-Nov-1994	Andrey A. Chernov <ache@FreeBSD.org>	Security nitpicking: don't make *.core world readable
# d93f860c	09-Oct-1994	Poul-Henning Kamp <phk@FreeBSD.org>	Cosmetics. related to getting prototypes into view.
# c364e17e	29-Sep-1994	Andrey A. Chernov <ache@FreeBSD.org>	Log SA_CORE signals Obtained from: FreeBSD 1.x
# bb56ec4a	25-Sep-1994	Poul-Henning Kamp <phk@FreeBSD.org>	While in the real world, I had a bad case of being swapped out for a lot of cycles. While waiting there I added a lot of the extra ()'s I have, (I have never used LISP to any extent). So I compiled the kernel with -Wall and shut up a lot of "suggest you add ()'s", removed a bunch of unused var's and added a couple of declarations here and there. Having a lap-top is highly recommended. My kernel still runs, yell at me if your kernel breaks.
# 0b53fbe8	19-Sep-1994	Bruce Evans <bde@FreeBSD.org>	Don't use SIG_DFL or SIG_IGN for case label expressions. ANSI requires such expressions to have integral type. "gcc -ansi -pedantic -W..." fails to diagnose this constraint error.
# 3c4dd356	02-Aug-1994	David Greenman <dg@FreeBSD.org>	Added $Id$
# 26f9a767	25-May-1994	Rodney W. Grimes <rgrimes@FreeBSD.org>	The big 4.4BSD Lite to FreeBSD 2.0.0 (Development) patch. Reviewed by: Rodney W. Grimes Submitted by: John Dyson and David Greenman
# df8bae1d	24-May-1994	Rodney W. Grimes <rgrimes@FreeBSD.org>	BSD 4.4 Lite Kernel Sources