History log of /freebsd-current/sys/kern/sys_process.c
Revision Date Author Comments
# fdafd315 24-Nov-2023 Warner Losh <imp@FreeBSD.org>

sys: Automated cleanup of cdefs and other formatting

Apply the following automated changes to try to eliminate
no-longer-needed sys/cdefs.h includes as well as now-empty
blank lines in a row.

Remove /^#if.*\n#endif.*\n#include\s+<sys/cdefs.h>.*\n/
Remove /\n+#include\s+<sys/cdefs.h>.*\n+#if.*\n#endif.*\n+/
Remove /\n+#if.*\n#endif.*\n+/
Remove /^#if.*\n#endif.*\n/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/types.h>/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/param.h>/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/capsicum.h>/

Sponsored by: Netflix


# 676386b5 23-Aug-2023 Andrew Turner <andrew@FreeBSD.org>

Support dynamically sized register sets

We don't always know the size of the register set at compile time,
e.g. on arm64 the size of the SVE registers need to be queried on boot.
To support register sets that needs to be calculated at run time
query the correct size when it is zero.

Reviewed by: markj, kib (earlier version)
Sponsored by: Arm Ltd
Differential Revision: https://reviews.freebsd.org/D41302


# 685dc743 16-Aug-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove $FreeBSD$: one-line .c pattern

Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/


# 653738e8 07-Jun-2023 John Baldwin <jhb@FreeBSD.org>

ptrace: Clear TDB_BORN during PT_DETACH.

If a debugger detaches from a process that has a new thread that has
not yet executed, the new thread will raise a SIGTRAP signal to report
it's thread birth event even after the detach. With the debugger
detached, this results in a SIGTRAP sent to the process and typically
a core dump. Fix this by clearing TDB_BORN from any new threads
during detach.

Bump __FreeBSD_version for debuggers to notice when the fix is
present.

Reported by: GDB's testsuite
Reviewed by: kib, markj (previous version)
Differential Revision: https://reviews.freebsd.org/D39856


# 140ceb5d 30-Nov-2022 Konstantin Belousov <kib@FreeBSD.org>

ptrace(2): add PT_SC_REMOTE remote syscall request

Reviewed by: markj
Discussed with: jhb
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D37590


# e6feeae2 30-Nov-2022 Konstantin Belousov <kib@FreeBSD.org>

sys: rename td_coredump to td_remotereq

and TDB_COREDUMPRQ to TDB_COREDUMPREQ

Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D37590


# c6d31b83 18-Jul-2022 Konstantin Belousov <kib@FreeBSD.org>

AST: rework

Make most AST handlers dynamically registered. This allows to have
subsystem-specific handler source located in the subsystem files,
instead of making subr_trap.c aware of it. For instance, signal
delivery code on return to userspace is now moved to kern_sig.c.

Also, it allows to have some handlers designated as the cleanup (kclear)
type, which are called both at AST and on thread/process exit. For
instance, ast(), exit1(), and NFS server no longer need to be aware
about UFS softdep processing.

The dynamic registration also allows third-party modules to register AST
handlers if needed. There is one caveat with loadable modules: the
code does not make any effort to ensure that the module is not unloaded
before all threads processed through AST handler in it. In fact, this
is already present behavior for hwpmc.ko and ufs.ko. I do not think it
is worth the efforts and the runtime overhead to try to fix it.

Reviewed by: markj
Tested by: emaste (arm64), pho
Discussed with: jhb
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D35888


# b1ad6a90 28-Mar-2022 Brooks Davis <brooks@FreeBSD.org>

syscallarg_t: Add a type for system call arguments

This more clearly differentiates system call arguments from integer
registers and return values. On current architectures it has no effect,
but on architectures where pointers are not integers (CHERI) and may
not even share registers (CHERI-MIPS) it is necessiary to differentiate
between system call arguments (syscallarg_t) and integer register values
(register_t).

Obtained from: CheriBSD

Reviewed by: imp, kib
Differential Revision: https://reviews.freebsd.org/D33780


# 879b0604 01-Mar-2022 Mark Johnston <markj@FreeBSD.org>

proc: Remove assertion that P_WEXIT is not set in proc_rwmem()

exit1() sets P_WEXIT before waiting for holding threads to finish,
rather than after, so this assertion is racy.

Fixes: 12fb39ec3e6b ("proc: Relax proc_rwmem()'s assertion on the process hold count")
Reported by: Jenkins


# 12fb39ec 01-Mar-2022 Mark Johnston <markj@FreeBSD.org>

proc: Relax proc_rwmem()'s assertion on the process hold count

This reference ensures that the process and its associated vmspace will
not be destroyed while proc_rwmem() is executing. If, however, the
calling thread belongs to the target process, then it is unnecessary to
hold the process. In particular, fasttrap - a module which enables
userspace dtrace - may frequently call proc_rwmem(), and we'd prefer to
avoid the overhead of locking and bumping the hold count when possible.

Thus, make the assertion conditional on "p != curproc". Also assert
that the process is not already exiting. No functional change intended.

MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation


# 949e3959 07-Feb-2022 John Baldwin <jhb@FreeBSD.org>

Trim duplicate code for copying in iovecs for PT_[GS]ETREGSET.

Reviewed by: andrew, emaste
Differential Revision: https://reviews.freebsd.org/D34177


# 548a2ec4 24-Jan-2022 Andrew Turner <andrew@FreeBSD.org>

Add PT_GETREGSET

This adds the PT_GETREGSET and PT_SETREGSET ptrace types. These can be
used to access all the registers from a specified core dump note type.
The NT_PRSTATUS and NT_FPREGSET notes are initially supported. Other
machine-dependant types are expected to be added in the future.

The ptrace addr points to a struct iovec pointing at memory to hold the
registers along with its length. On success the length in the iovec is
updated to tell userspace the actual length the kernel wrote or, if the
base address is NULL, the length the kernel would have written.

Because the data field is an int the arguments are backwards when
compared to the Linux PTRACE_GETREGSET call.

Reviewed by: kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D19831


# fe6db727 21-Jan-2022 Konstantin Belousov <kib@FreeBSD.org>

Add security.bsd.allow_ptrace sysctl

that disables any access to ptrace(2) for all processes.

Reviewed by: emaste
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D33986


# 0910a41e 12-Jan-2022 Brooks Davis <brooks@FreeBSD.org>

Revert "syscallarg_t: Add a type for system call arguments"

Missed issues in truss on at least armv7 and powerpcspe need to be
resolved before recommit.

This reverts commit 3889fb8af0b611e3126dc250ebffb01805152104.
This reverts commit 1544e0f5d1f1e3b8c10a64cb899a936976ca7ea4.


# 1544e0f5 12-Jan-2022 Brooks Davis <brooks@FreeBSD.org>

syscallarg_t: Add a type for system call arguments

This more clearly differentiates system call arguments from integer
registers and return values. On current architectures it has no effect,
but on architectures where pointers are not integers (CHERI) and may
not even share registers (CHERI-MIPS) it is necessiary to differentiate
between system call arguments (syscallarg_t) and integer register values
(register_t).

Obtained from: CheriBSD

Reviewed by: imp, kib
Differential Revision: https://reviews.freebsd.org/D33780


# f575573c 15-Sep-2021 Konstantin Belousov <kib@FreeBSD.org>

Remove PT_GET_SC_ARGS_ALL

Reimplement bdf0f24bb16d556a5b by checking for the caller' ABI in
the implementation of PT_GET_SC_ARGS, and copying out everything if
it is Linuxolator.

Also fix a minor information leak: if PT_GET_SC_ARGS_ALL is done on the
thread reused after other process, it allows to read some number of that
thread last syscall arguments. Clear td_sa.args in thread_alloc().

Reviewed by: jhb
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D31968


# bdf0f24b 12-Sep-2021 Edward Tomasz Napierala <trasz@FreeBSD.org>

linux: implement PTRACE_GET_SYSCALL_INFO

This is one of the pieces required to make modern (ie Focal)
strace(1) work.

Reviewed By: jhb (earlier version)
Sponsored by: EPSRC
Differential Revision: https://reviews.freebsd.org/D28212


# b7924341 27-Aug-2021 Andrew Turner <andrew@FreeBSD.org>

Create sys/reg.h for the common code previously in machine/reg.h

Move the common kernel function signatures from machine/reg.h to a new
sys/reg.h. This is in preperation for adding PT_GETREGSET to ptrace(2).

Reviewed by: imp, markj
Sponsored by: DARPA, AFRL (original work)
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D19830


# d7a7ea5b 18-May-2021 Konstantin Belousov <kib@FreeBSD.org>

sys_process.c: extract ptrace_unsuspend()

Reviewed by: jhb
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differrential revision: https://reviews.freebsd.org/D30351


# 87a64872 23-Apr-2021 Konstantin Belousov <kib@FreeBSD.org>

Add ptrace(PT_COREDUMP)

It writes the core of live stopped process to the file descriptor
provided as an argument.

Based on the initial version from https://reviews.freebsd.org/D29691,
submitted by Michał Górny <mgorny@gentoo.org>.

Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D29955


# 9ebf9100 24-Apr-2021 Konstantin Belousov <kib@FreeBSD.org>

ptrace: do not allow for parallel ptrace requests

Set a new P2_PTRACEREQ flag around the request Wait for the target .
process P2_PTRACEREQ flag to clear before setting ours .

Otherwise, we rely on the moment that the process lock is not dropped
until the stopped target state is important. This is going to be no
longer true after some future change.

Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D29955


# 54c8baa0 24-Apr-2021 Konstantin Belousov <kib@FreeBSD.org>

kern_ptrace(): extract code to determine ptrace eligibility into helper

Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D29955


# 2bd0506c 30-Apr-2021 Konstantin Belousov <kib@FreeBSD.org>

kern_ptrace: change type of proctree_locked to bool

Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D29955


# f1f98706 18-Apr-2021 Warner Losh <imp@FreeBSD.org>

Minor style cleanup

We prefer 'while (0)' to 'while(0)' according to grep and stlye(9)'s
space after keyword rule. Remove a few stragglers of the latter.
Many of these usages were inconsistent within the file.

MFC After: 3 days
Sponsored by: Netflix


# a091c353 10-Apr-2021 Konstantin Belousov <kib@FreeBSD.org>

ptrace: restructure comments around reparenting on PT_DETACH

style code, and use {} for both branches.

Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# 9d7e450b 10-Apr-2021 Konstantin Belousov <kib@FreeBSD.org>

ptrace: remove dead call to FIX_SSTEP()

It was an alias for procfs_fix_sstep() long time ago.

Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# 2fd1ffef 05-Mar-2021 Konstantin Belousov <kib@FreeBSD.org>

Stop arming kqueue timers on knote owner suspend or terminate

This way, even if the process specified very tight reschedule
intervals, it should be stoppable/killable.

Reported and reviewed by: markj
Tested by: markj, pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D29106


# dc47fdf1 05-Mar-2021 Konstantin Belousov <kib@FreeBSD.org>

Stop arming periodic process timers on suspend or terminate

Reported and reviewed by: markj
Tested by: markj, pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D29106


# 1e2521ff 27-Sep-2020 Edward Tomasz Napierala <trasz@FreeBSD.org>

Get rid of sa->narg. It serves no purpose; use sa->callp->sy_narg instead.

Reviewed by: kib
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D26458


# feabaaf9 24-Aug-2020 Mateusz Guzik <mjg@FreeBSD.org>

cache: drop the always curthread argument from reverse lookup routines

Note VOP_VPTOCNP keeps getting it as temporary compatibility for zfs.

Tested by: pho


# 58b552dc 09-Jun-2020 John Baldwin <jhb@FreeBSD.org>

Refactor ptrace() ABI compatibility.

Add a freebsd32_ptrace() and move as many freebsd32 shims as possible
to freebsd32_ptrace(). Aside from register sets, freebsd32 passes
pointers to native structures to kern_ptrace() and converts to/from
native/32-bit structure formats in freebsd32_ptrace() outside of
kern_ptrace().

Reviewed by: kib
Obtained from: CheriBSD
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D25195


# 59838c1a 01-Apr-2020 John Baldwin <jhb@FreeBSD.org>

Retire procfs-based process debugging.

Modern debuggers and process tracers use ptrace() rather than procfs
for debugging. ptrace() has a supserset of functionality available
via procfs and new debugging features are only added to ptrace().
While the two debugging services share some fields in struct proc,
they each use dedicated fields and separate code. This results in
extra complexity to support a feature that hasn't been enabled in the
default install for several years.

PR: 244939 (exp-run)
Reviewed by: kib, mjg (earlier version)
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D23837


# 3ff65f71 30-Jan-2020 Mateusz Guzik <mjg@FreeBSD.org>

Remove duplicated empty lines from kern/*.c

No functional changes.


# 2288078c 08-Oct-2019 Doug Moore <dougm@FreeBSD.org>

Define macro VM_MAP_ENTRY_FOREACH for enumerating the entries in a vm_map.
In case the implementation ever changes from using a chain of next pointers,
then changing the macro definition will be necessary, but changing all the
files that iterate over vm_map entries will not.

Drop a counter in vm_object.c that would have an effect only if the
vm_map entry count was wrong.

Discussed with: alc
Reviewed by: markj
Tested by: pho (earlier version)
Differential Revision: https://reviews.freebsd.org/D21882


# df08823d 27-Sep-2019 Konstantin Belousov <kib@FreeBSD.org>

Improve MD page fault handlers.

Centralize calculation of signal and ucode delivered on unhandled page
fault in new function vm_fault_trap(). MD trap_pfault() now almost
always uses the signal numbers and error codes calculated in
consistent MI way.

This introduces the protection fault compatibility sysctls to all
non-x86 architectures which did not have that bug, but apparently they
were already much more wrong in selecting delivered signals on
protection violations.

Change the delivered signal for accesses to mapped area after the
backing object was truncated. According to POSIX description for
mmap(2):
The system shall always zero-fill any partial page at the end of an
object. Further, the system shall never write out any modified
portions of the last page of an object which are beyond its
end. References within the address range starting at pa and
continuing for len bytes to whole pages following the end of an
object shall result in delivery of a SIGBUS signal.

An implementation may generate SIGBUS signals when a reference
would cause an error in the mapped object, such as out-of-space
condition.
Adjust according to the description, keeping the existing
compatibility code for SIGSEGV/SIGBUS on protection failures.

For situations where kernel cannot handle page fault due to resource
limit enforcement, SIGBUS with a new error code BUS_OBJERR is
delivered. Also, provide a new error code SEGV_PKUERR for SIGSEGV on
amd64 due to protection key access violation.

vm_fault_hold() is renamed to vm_fault(). Fixed some nits in
trap_pfault()s like mis-interpreting Mach errors as errnos. Removed
unneeded truncations of the fault addresses reported by hardware.

PR: 211924
Reviewed by: alc
Discussed with: jilles, markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D21566


# fee2a2fa 09-Sep-2019 Mark Johnston <markj@FreeBSD.org>

Change synchonization rules for vm_page reference counting.

There are several mechanisms by which a vm_page reference is held,
preventing the page from being freed back to the page allocator. In
particular, holding the page's object lock is sufficient to prevent the
page from being freed; holding the busy lock or a wiring is sufficent as
well. These references are protected by the page lock, which must
therefore be acquired for many per-page operations. This results in
false sharing since the page locks are external to the vm_page
structures themselves and each lock protects multiple structures.

Transition to using an atomically updated per-page reference counter.
The object's reference is counted using a flag bit in the counter. A
second flag bit is used to atomically block new references via
pmap_extract_and_hold() while removing managed mappings of a page.
Thus, the reference count of a page is guaranteed not to increase if the
page is unbusied, unmapped, and the object's write lock is held. As
a consequence of this, the page lock no longer protects a page's
identity; operations which move pages between objects are now
synchronized solely by the objects' locks.

The vm_page_wire() and vm_page_unwire() KPIs are changed. The former
requires that either the object lock or the busy lock is held. The
latter no longer has a return value and may free the page if it releases
the last reference to that page. vm_page_unwire_noq() behaves the same
as before; the caller is responsible for checking its return value and
freeing or enqueuing the page as appropriate. vm_page_wire_mapped() is
introduced for use in pmap_extract_and_hold(). It fails if the page is
concurrently being unmapped, typically triggering a fallback to the
fault handler. vm_page_wire() no longer requires the page lock and
vm_page_unwire() now internally acquires the page lock when releasing
the last wiring of a page (since the page lock still protects a page's
queue state). In particular, synchronization details are no longer
leaked into the caller.

The change excises the page lock from several frequently executed code
paths. In particular, vm_object_terminate() no longer bounces between
page locks as it releases an object's pages, and direct I/O and
sendfile(SF_NOCACHE) completions no longer require the page lock. In
these latter cases we now get linear scalability in the common scenario
where different threads are operating on different files.

__FreeBSD_version is bumped. The DRM ports have been updated to
accomodate the KPI changes.

Reviewed by: jeff (earlier version)
Tested by: gallatin (earlier version), pho
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D20486


# 9f5103ab 05-Aug-2019 Mariusz Zaborski <oshogbo@FreeBSD.org>

process: style

We don't need to check if the parent is already set.
This is done already in the proc_reparent.

No functional behaviour changes intended.

MFC after: 1 month


# 91898857 29-Jul-2019 Mark Johnston <markj@FreeBSD.org>

Avoid relying on header pollution from sys/refcount.h.

MFC after: 3 days
Sponsored by: The FreeBSD Foundation


# 32451fb9 15-Jul-2019 John Baldwin <jhb@FreeBSD.org>

Add ptrace op PT_GET_SC_RET.

This ptrace operation returns a structure containing the error and
return values from the current system call. It is only valid when a
thread is stopped during a system call exit (PL_FLAG_SCX is set).

The sr_error member holds the error value from the system call. Note
that this error value is the native FreeBSD error value that has _not_
been translated to an ABI-specific error value similar to the values
logged to ktrace.

If sr_error is zero, then the return values of the system call will be
set in sr_retval[0] and sr_retval[1].

Reviewed by: kib
MFC after: 1 month
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D20901


# eeacb3b0 08-Jul-2019 Mark Johnston <markj@FreeBSD.org>

Merge the vm_page hold and wire mechanisms.

The hold_count and wire_count fields of struct vm_page are separate
reference counters with similar semantics. The remaining essential
differences are that holds are not counted as a reference with respect
to LRU, and holds have an implicit free-on-last unhold semantic whereas
vm_page_unwire() callers must explicitly determine whether to free the
page once the last reference to the page is released.

This change removes the KPIs which directly manipulate hold_count.
Functions such as vm_fault_quick_hold_pages() now return wired pages
instead. Since r328977 the overhead of maintaining LRU for wired pages
is lower, and in many cases vm_fault_quick_hold_pages() callers would
swap holds for wirings on the returned pages anyway, so with this change
we remove a number of page lock acquisitions.

No functional change is intended. __FreeBSD_version is bumped.

Reviewed by: alc, kib
Discussed with: jeff
Discussed with: jhb, np (cxgbe)
Tested by: pho (previous version)
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D19247


# daec9284 21-May-2019 Conrad Meyer <cem@FreeBSD.org>

Include ktr.h in more compilation units

Similar to r348026, exhaustive search for uses of CTRn() and cross reference
ktr.h includes. Where it was obvious that an OS compat header of some kind
included ktr.h indirectly, .c files were left alone. Some of these files
clearly got ktr.h via header pollution in some scenarios, or tinderbox would
not be passing prior to this revision, but go ahead and explicitly include it
in files using it anyway.

Like r348026, these CUs did not show up in tinderbox as missing the include.

Reported by: peterj (arm64/mp_machdep.c)
X-MFC-With: r347984
Sponsored by: Dell EMC Isilon


# 02164d36 03-Dec-2018 Mark Johnston <markj@FreeBSD.org>

Add a missing definition for the !COMPAT_FREEBSD32 case.

Reported by: jenkins
MFC with: r341442
Sponsored by: The FreeBSD Foundation


# 352aaa51 03-Dec-2018 Mark Johnston <markj@FreeBSD.org>

Plug memory disclosures via ptrace(2).

On some architectures, the structures returned by PT_GET*REGS were not
fully populated and could contain uninitialized stack memory. The same
issue existed with the register files in procfs.

Reported by: Thomas Barabosch, Fraunhofer FKIE
Reviewed by: kib
MFC after: 3 days
Security: kernel stack memory disclosure
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D18421


# 2c054ce9 16-Nov-2018 Mateusz Guzik <mjg@FreeBSD.org>

proc: always store parent pid in p_oppid

Doing so removes the dependency on proctree lock from sysctl process list
export which further reduces contention during poudriere -j 128 runs.

Reviewed by: kib (previous version)
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D17825


# 2203c46d 02-Nov-2018 Mark Johnston <markj@FreeBSD.org>

Initialize the eflags field of vm_map headers.

Initializing the eflags field of the map->header entry to a value with a
unique new bit set makes a few comparisons to &map->header unnecessary.

Submitted by: Doug Moore <dougm@rice.edu>
Reviewed by: alc, kib
Tested by: pho
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D14005


# a70e9a13 04-Aug-2018 Konstantin Belousov <kib@FreeBSD.org>

Swap in WKILLED processes.

Swapped-out process that is WKILLED must be swapped in as soon as
possible. The reason is that such process can be killed by OOM and
its pages can be only freed if the process exits. To exit, the kernel
stack of the process must be mapped.

When allocating pages for the stack of the WKILLED process on swap in,
use VM_ALLOC_SYSTEM requests to increase the chance of the allocation
to succeed.

Add counter of the swapped out processes to avoid unneeded iteration
over the allprocs list when there is no work to do, reducing the
allproc_lock ownership.

Reviewed by: alc, markj (previous version)
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D16489


# ac4bc0c1 21-Jun-2018 Konstantin Belousov <kib@FreeBSD.org>

Update proc->p_ptevents annotation to reflect the actual locking.

Submitted by: Yanko Yankulov <yanko.yankulov@gmail.com>
Reviewed by: jhb
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D15954


# ac8b2d5c 18-May-2018 Matt Macy <mmacy@FreeBSD.org>

sys_process.c fix set but not used warning


# 6469bdcd 06-Apr-2018 Brooks Davis <brooks@FreeBSD.org>

Move most of the contents of opt_compat.h to opt_global.h.

opt_compat.h is mentioned in nearly 180 files. In-progress network
driver compabibility improvements may add over 100 more so this is
closer to "just about everywhere" than "only some files" per the
guidance in sys/conf/options.

Keep COMPAT_LINUX32 in opt_compat.h as it is confined to a subset of
sys/compat/linux/*.c. A fake _COMPAT_LINUX option ensure opt_compat.h
is created on all architectures.

Move COMPAT_LINUXKPI to opt_dontuse.h as it is only used to control the
set of compiled files.

Reviewed by: kib, cem, jhb, jtl
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D14941


# 8a36da99 27-Nov-2017 Pedro F. Giffuni <pfg@FreeBSD.org>

sys/kern: adoption of SPDX licensing ID tags.

Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.


# 7e3e3606 13-Nov-2017 John Baldwin <jhb@FreeBSD.org>

Move loop to clear TDB_SUSPEND into PT_DETACH case.

The PT_DETACH case above the sendsig: label already looped over all
threads clearing flags in td_dbgflags. Reuse this loop to clear
TDB_SUSPEND and move the logic out of the sendsig: block.


# 2a2b23ca 13-Nov-2017 John Baldwin <jhb@FreeBSD.org>

Pull the PT_ATTACH case out of the 'sendsig:' block.

Most of the conditionals in the 'sendsig:' block are now only different
for PT_ATTACH vs other continue requests. Pull the PT_ATTACH-specific
logic up into the PT_ATTACH case and simplify the 'sendsig:' block. This
also permits moving the unlock of proctree_lock above the sendsig: label
since PT_KILL doesn't hold the lock and and the other cases all fall
through to the label.

Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D13073


# feeaec18 13-Nov-2017 John Baldwin <jhb@FreeBSD.org>

Only clear a pending thread event if one is pending.

This fixes a panic when attaching to an already-stopped process after
r325028. While here, clean up a few other things in the control flow
of the 'sendsig' section:
- Only check for P_STOPPED_TRACE rather than either of P_STOPPED_SIG
or P_STOPPED_TRACE for most ptrace requests. The signal handling
code in kern_sig.c never sets just P_STOPPED_SIG for a traced
process, so if P_STOPPED_SIG is stopped, P_STOPPED_TRACE should be
set anyway. Remove a related debug printf. Assuming P_STOPPED_TRACE
permits simplifications in the 'sendsig:' block.
- Move the block to clear the pending thread state up into a new
block conditional on P_STOPPED_TRACE and handle delivering pending
signals to the reporting thread and clearing the reporting thread's
state in this block.
- Consolidate case to send a signal to the process in a single case
for PT_ATTACH. The only case that could have been in the else before
was a PT_ATTACH where P_STOPPED_SIG was not set, so both instances
of kern_psignal() collapse down to just PT_ATTACH.

Reported by: pho, mmel
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D12837


# 9acf7b13 08-Nov-2017 Konstantin Belousov <kib@FreeBSD.org>

Zero whole struct ptrace_lwpinfo to not leak kernel stack data.

Reported by: Ilja Van Sprundel <ivansprundel@ioactive.com>
Discussed with: secteam
Reviewed by: jhb
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
Differential revision: https://reviews.freebsd.org/D12796


# e012fe34 26-Oct-2017 John Baldwin <jhb@FreeBSD.org>

Discard the correct thread event reported for a ptrace stop.

When multiple threads wish to report a tracing event to a debugger,
both threads call ptracestop() and one thread will win the race to be
the reporting thread (p->p_xthread). The debugger uses PT_LWPINFO
with the process ID to determine which thread / LWP is reporting an
event and the details of that event. This event is cleared as a side
effect of the subsequent ptrace event that resumed the process
(PT_CONTINUE, PT_STEP, etc.). However, ptrace() was clearing the
event identified by the LWP ID passed to the resume request even if
that wasn't the 'p_xthread'. This could result in clearing an event
that had not yet been observed by the debugger and leaving the
existing event for 'p_thread' pending so that it was reported a second
time.

Specifically, if the debugger stopped due to a software breakpoint in
one thread, but then switched to another thread that was used to
resume (e.g. if the user switched to a different thread and issued a
step), the resume request (PT_STEP) cleared a pending event (if any)
for the thread being stepped. However, the process immediately
stopped and the first thread reported it's breakpoint event a second
time. The debugger decremented the PC for "both" breakpoint events
which resulted in the PC now pointing into the middle of an
instruction (on x86) and a SIGILL fault when the process was resumed a
second time.

To fix, always clear the pending event for 'p_xthread' when resuming a
process. ptrace() still honors the requested LWP ID when enabling
single-stepping (PT_STEP) or setting a different PC (PT_CONTINUE).

Reported by: GDB testsuite (gdb.threads/continue-pending-status.exp)
Reviewed by: kib
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D12794


# 09f3bb87 25-Sep-2017 John Baldwin <jhb@FreeBSD.org>

Log signal number passed to PT_STEP requests in KTR_PTRACE traces.

MFC after: 1 week


# 51645e83 29-Jun-2017 John Baldwin <jhb@FreeBSD.org>

Store a 32-bit PT_LWPINFO struct for 32-bit process core dumps.

Process core notes for a 32-bit process running on a 64-bit host need to
use 32-bit structures so that the note layout matches the layout of notes
of a core dump of a 32-bit process under a 32-bit kernel.

Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D11407


# b43ce76c 12-Jun-2017 Konstantin Belousov <kib@FreeBSD.org>

Add ptrace(PT_GET_SC_ARGS) command to return debuggee' current syscall
arguments.

Reviewed by: jhb (previous version)
Sponsored by: The FreeBSD Foundation
MFC after: 3 weeks
Differential revision: https://reviews.freebsd.org/D11080


# 2d88da2f 12-Jun-2017 Konstantin Belousov <kib@FreeBSD.org>

Move struct syscall_args syscall arguments parameters container into
struct thread.

For all architectures, the syscall trap handlers have to allocate the
structure on the stack. The structure takes 88 bytes on 64bit arches
which is not negligible. Also, it cannot be easily found by other
code, which e.g. caused duplication of some members of the structure
to struct thread already. The change removes td_dbg_sc_code and
td_dbg_sc_nargs which were directly copied from syscall_args.

The structure is put into the copied on fork part of the struct thread
to make the syscall arguments information correct in the child after
fork.

This move will also allow several more uses shortly.

Reviewed by: jhb (previous version)
Sponsored by: The FreeBSD Foundation
MFC after: 3 weeks
X-Differential revision: https://reviews.freebsd.org/D11080


# 86be94fc 30-Mar-2017 Tycho Nightingale <tychon@FreeBSD.org>

Add support for capturing 'struct ptrace_lwpinfo' for signals
resulting in a process dumping core in the corefile.

Also extend procstat to view select members of 'struct ptrace_lwpinfo'
from the contents of the note.

Sponsored by: Dell EMC Isilon


# 82a4538f 20-Feb-2017 Eric Badger <badger@FreeBSD.org>

Defer ptracestop() signals that cannot be delivered immediately

When a thread is stopped in ptracestop(), the ptrace(2) user may request
a signal be delivered upon resumption of the thread. Heretofore, those signals
were discarded unless ptracestop()'s caller was issignal(). Fix this by
modifying ptracestop() to queue up signals requested by the ptrace user that
will be delivered when possible. Take special care when the signal is SIGKILL
(usually generated from a PT_KILL request); no new stop events should be
triggered after a PT_KILL.

Add a number of tests for the new functionality. Several tests were authored
by jhb.

PR: 212607
Reviewed by: kib
Approved by: kib (mentor)
MFC after: 2 weeks
Sponsored by: Dell EMC
In collaboration with: jhb
Differential Revision: https://reviews.freebsd.org/D9260


# e5574e09 19-Aug-2016 Mark Johnston <markj@FreeBSD.org>

Don't set P2_PTRACE_FSTP in a process that invokes ptrace(PT_TRACE_ME).

Such processes are stopped synchronously by a direct call to
ptracestop(SIGTRAP) upon exec. P2_PTRACE_FSTP causes the exec()ing thread
to suspend itself while waiting for a SIGSTOP that never arrives.

Reviewed by: kib
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D7576


# b7a25e63 28-Jul-2016 Konstantin Belousov <kib@FreeBSD.org>

When a debugger attaches to the process, SIGSTOP is sent to the
target. Due to a way issignal() selects the next signal to deliver
and report, if the simultaneous or already pending another signal
exists, that signal might be reported by the next waitpid(2) call.
This causes minor annoyance for debuggers, which must be prepared to
take any signal as the first event, then filter SIGSTOP later.

More importantly, for tools like gcore(1), which attach and then
detach without processing events, SIGSTOP might leak to be delivered
after PT_DETACH. This results in the process being unintentionally
stopped after detach, which is fatal for automatic tools.

The solution is to force SIGSTOP to be the first signal reported after
the attach. Attach code is modified to set P2_PTRACE_FSTP to indicate
that the attaching ritual was not yet finished, and issignal() prefers
SIGSTOP in that condition. Also, the thread which handles
P2_PTRACE_FSTP is made to guarantee to own p_xthread during the first
waitpid(2). All that ensures that SIGSTOP is consumed first.

Additionally, if P2_PTRACE_FSTP is still set on detach, which means
that waitpid(2) was not called at all, SIGSTOP is removed from the
queue, ensuring that the process is resumed on detach.

In issignal(), when acting on STOPing signals, remove the signal from
queue before suspending. Otherwise parallel attach could result in
ptracestop() acting on that STOP as if it was the STOP signal from the
attach. Then SIGSTOP from attach leaks again.

As a minor refactoring, some bits of the common attach code is moved
to new helper proc_set_traced().

Reported by: markj
Reviewed by: jhb, markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D7256


# fc4f075a 18-Jul-2016 John Baldwin <jhb@FreeBSD.org>

Add PTRACE_VFORK to trace vfork events.

First, PL_FLAG_FORKED events now also set a PL_FLAG_VFORKED flag when
the new child was created via vfork() rather than fork(). Second, a
new PL_FLAG_VFORK_DONE event can now be enabled via the PTRACE_VFORK
event mask. This new stop is reported after the vfork parent resumes
due to the child calling exit or exec. Debuggers can use this stop to
reinsert breakpoints in the vfork parent process before it resumes.

Reviewed by: kib
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D7045


# f470cca5 15-Jul-2016 Konstantin Belousov <kib@FreeBSD.org>

In ptrace_vm_entry(), do not call vmspace_free() while owning a vm
object lock.

The vmspace_free() operations might need to lock map, object etc on
last dereference. Postpone the free until object's inspection is
done.

Reported and tested by: will
Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# 8d570f64 15-Jul-2016 John Baldwin <jhb@FreeBSD.org>

Add a mask of optional ptrace() events.

ptrace() now stores a mask of optional events in p_ptevents. Currently
this mask is a single integer, but it can be expanded into an array of
integers in the future.

Two new ptrace requests can be used to manipulate the event mask:
PT_GET_EVENT_MASK fetches the current event mask and PT_SET_EVENT_MASK
sets the current event mask.

The current set of events include:
- PTRACE_EXEC: trace calls to execve().
- PTRACE_SCE: trace system call entries.
- PTRACE_SCX: trace syscam call exits.
- PTRACE_FORK: trace forks and auto-attach to new child processes.
- PTRACE_LWP: trace LWP events.

The S_PT_SCX and S_PT_SCE events in the procfs p_stops flags have
been replaced by PTRACE_SCE and PTRACE_SCX. PTRACE_FORK replaces
P_FOLLOW_FORK and PTRACE_LWP replaces P2_LWP_EVENTS.

The PT_FOLLOW_FORK and PT_LWP_EVENTS ptrace requests remain for
compatibility but now simply toggle corresponding flags in the
event mask.

While here, document that PT_SYSCALL, PT_TO_SCE, and PT_TO_SCX both
modify the event mask and continue the traced process.

Reviewed by: kib
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D7044


# 5fcfab6e 29-Dec-2015 John Baldwin <jhb@FreeBSD.org>

Add ptrace(2) reporting for LWP events.

Add two new LWPINFO flags: PL_FLAG_BORN and PL_FLAG_EXITED for reporting
thread creation and destruction. Newly created threads will stop to report
PL_FLAG_BORN before returning to userland and exiting threads will stop to
report PL_FLAG_EXIT before exiting completely. Both of these events are
only enabled and reported if PT_LWP_EVENTS is enabled on a process.


# 711fbd17 07-Dec-2015 Mark Johnston <markj@FreeBSD.org>

Add helper functions proc_readmem() and proc_writemem().

These helper functions can be used to read in or write a buffer from or to
an arbitrary process' address space. Without them, this can only be done
using proc_rwmem(), which requires the caller to fill out a uio. This is
onerous and results in code duplication; the new functions provide a simpler
interface which is sufficient for most existing callers of proc_rwmem().

This change also adds a manual page for proc_rwmem() and the new functions.

Reviewed by: jhb, kib
Differential Revision: https://reviews.freebsd.org/D4245


# d2871337 07-Nov-2015 Mark Johnston <markj@FreeBSD.org>

- Consistently use PROC_ASSERT_HELD() to verify that a process' hold count
is non-zero.
- Include the process address in the PROC_ASSERT_HELD() and
PROC_ASSERT_NOT_HELD() assertion messages so that the corresponding
process can be found easily when debugging.

MFC after: 1 week


# 8fc3db03 20-Oct-2015 Konstantin Belousov <kib@FreeBSD.org>

Trim spaces at end of line to record the proper commit message for
r289660:

Do not allow to execute ptrace(PT_TRACE_ME) when the process is
already traced.

Do not allow to execute ptrace(PT_TRACE_ME) when there is no parent
which can trace the process, i.e. when the parent is already init.
Note that after the PT_TRACE_ME request the process is unkillable and
non-continuable until a debugger is attached, or parent is killed, the
later clears P_TRACED state. Since init clearly would not debug the
caller, and cannot be killed, disallow creation of unkillable
processes.

Reviewed by: jhb, pho
Reported and tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D3908


# 77b9bec3 20-Oct-2015 Konstantin Belousov <kib@FreeBSD.org>

Reviewed by: jhb, pho
Reported and tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D3908


# 1ed2e49b 20-Oct-2015 Konstantin Belousov <kib@FreeBSD.org>

No need to dereference struct proc to pids when comparing processes
for equality.

Reviewed by: jhb, pho
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks


# c814b868 20-Oct-2015 John Baldwin <jhb@FreeBSD.org>

Switch pl_child_pid from int to pid_t.

Reviewed by: emaste, kib
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D3857


# 3edd0fff 05-Oct-2015 John Baldwin <jhb@FreeBSD.org>

Include additional info in ptrace(2) KTR traces:
- The new PC value and signal passed to PT_CONTINUE, PT_DETACH, PT_SYSCALL,
and PT_TO_SC[EX].
- The system call code returned via PT_LWPINFO.

MFC after: 1 week


# 183b68f7 01-Sep-2015 John Baldwin <jhb@FreeBSD.org>

Export current system call code and argument count for system call entry
and exit events. procfs stop events for system call tracing report these
values (argument count for system call entry and code for system call exit),
but ptrace() does not provide this information. (Note that while the system
call code can be determined in an ABI-specific manner during system call
entry, it is not generally available during system call exit.)

The values are exported via new fields at the end of struct ptrace_lwpinfo
available via PT_LWPINFO.

Reviewed by: kib
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D3536


# 98685dc8 01-Aug-2015 John Baldwin <jhb@FreeBSD.org>

Clear P_TRACED before reparenting a detached process back to its
original parent. Otherwise the debugee will be set as an orphan of
the debugger.

Add tests for tracing forks via PT_FOLLOW_FORK.

Reviewed by: kib
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D2809


# b4490c6e 18-Jul-2015 Konstantin Belousov <kib@FreeBSD.org>

The si_status field of the siginfo_t, provided by the waitid(2) and
SIGCHLD signal, should keep full 32 bits of the status passed to the
_exit(2).

Split the combined p_xstat of the struct proc into the separate exit
status p_xexit for normal process exit, and signalled termination
information p_xsig. Kernel-visible macro KW_EXITCODE() reconstructs
old p_xstat from p_xexit and p_xsig. p_xexit contains complete status
and copied out into si_status.

Requested by: Joerg Schilling
Reviewed by: jilles (previous version), pho
Tested by: pho
Sponsored by: The FreeBSD Foundation


# 63e4c6cd 02-Jun-2015 Eric van Gyzen <vangyzen@FreeBSD.org>

Provide vnode in memory map info for files on tmpfs

When providing memory map information to userland, populate the vnode pointer
for tmpfs files. Set the memory mapping to appear as a vnode type, to match
FreeBSD 9 behavior.

This fixes the use of tmpfs files with the dtrace pid provider,
procstat -v, procfs, linprocfs, pmc (pmcstat), and ptrace (PT_VM_ENTRY).

Submitted by: Eric Badger <eric@badgerio.us> (initial revision)
Obtained from: Dell Inc.
PR: 198431
MFC after: 2 weeks
Reviewed by: jhb
Approved by: kib (mentor)


# 4c372ca2 01-Jun-2015 Xin LI <delphij@FreeBSD.org>

Clear p_stops when doing PT_DETACH.

Without this, if a process was being traced by truss(1), which
uses different p_stops bits than gdb(1), the latter would
misbehave because of the unexpected bits.

Reported by: jceel
Submitted by: sef
Sponsored by: iXsystems, Inc.
MFC after: 2 weeks


# 515b7a0b 25-May-2015 John Baldwin <jhb@FreeBSD.org>

Add KTR tracing for some MI ptrace events.

Differential Revision: https://reviews.freebsd.org/D2643
Reviewed by: kib


# 237623b0 14-Dec-2014 Konstantin Belousov <kib@FreeBSD.org>

Add a facility for non-init process to declare itself the reaper of
the orphaned descendants. Base of the API is modelled after the same
feature from the DragonFlyBSD.

Requested by: bapt
Reviewed by: jilles (previous version)
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 3 weeks


# 4bc68ed7 21-Oct-2014 Mateusz Guzik <mjg@FreeBSD.org>

Plug unnecessary PRS_NEW check in kern_procctl.

pfind does not return processes in such state.


# 7aa1071e 02-Oct-2014 John Baldwin <jhb@FreeBSD.org>

Require p_cansched() for changing a process' protection status via
procctl() rather than p_cansee().

Submitted by: rwatson
MFC after: 3 days


# d7359980 06-Aug-2014 Konstantin Belousov <kib@FreeBSD.org>

Correct the problems with the ptrace(2) making the debuggee an orphan.
One problem is inferior(9) looping due to the process tree becoming a
graph instead of tree if the parent is traced by child. Another issue
is due to the use of p_oppid to restore the original parent/child
relationship, because real parent could already exited and its pid
reused (noted by mjg).

Add the function proc_realparent(9), which calculates the parent for
given process. It uses the flag P_TREE_FIRST_ORPHAN to detect the head
element of the p_orphan list and than stepping back to its container
to find the parent process. If the parent has already exited, the
init(8) is returned.

Move the P_ORPHAN and the new helper flag from the p_flag* to new
p_treeflag field of struct proc, which is protected by proctree lock
instead of proc lock, since the orphans relationship is managed under
the proctree_lock already.

The remaining uses of p_oppid in ptrace(PT_DETACH) and process
reapping are replaced by proc_realparent(9).

Phabric: D417
Reviewed by: jhb
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks


# 55648840 19-Sep-2013 John Baldwin <jhb@FreeBSD.org>

Extend the support for exempting processes from being killed when swap is
exhausted.
- Add a new protect(1) command that can be used to set or revoke protection
from arbitrary processes. Similar to ktrace it can apply a change to all
existing descendants of a process as well as future descendants.
- Add a new procctl(2) system call that provides a generic interface for
control operations on processes (as opposed to the debugger-specific
operations provided by ptrace(2)). procctl(2) uses a combination of
idtype_t and an id to identify the set of processes on which to operate
similar to wait6().
- Add a PROC_SPROTECT control operation to manage the protection status
of a set of processes. MADV_PROTECT still works for backwards
compatability.
- Add a p_flag2 to struct proc (and a corresponding ki_flag2 to kinfo_proc)
the first bit of which is used to track if P_PROTECT should be inherited
by new child processes.

Reviewed by: kib, jilles (earlier version)
Approved by: re (delphij)
MFC after: 1 month


# be996836 05-Aug-2013 Attilio Rao <attilio@FreeBSD.org>

Revert r253939:
We cannot busy a page before doing pagefaults.
Infact, it can deadlock against vnode lock, as it tries to vget().
Other functions, right now, have an opposite lock ordering, like
vm_object_sync(), which acquires the vnode lock first and then
sleeps on the busy mechanism.

Before this patch is reinserted we need to break this ordering.

Sponsored by: EMC / Isilon storage division
Reported by: kib


# 3b6714ca 04-Aug-2013 Attilio Rao <attilio@FreeBSD.org>

The page hold mechanism is fast but it has couple of fallouts:
- It does not let pages respect the LRU policy
- It bloats the active/inactive queues of few pages

Try to avoid it as much as possible with the long-term target to
completely remove it.
Use the soft-busy mechanism to protect page content accesses during
short-term operations (like uiomove_fromphys()).

After this change only vm_fault_quick_hold_pages() is still using the
hold mechanism for page content access.
There is an additional complexity there as the quick path cannot
immediately access the page object to busy the page and the slow path
cannot however busy more than one page a time (to avoid deadlocks).

Fixing such primitive can bring to complete removal of the page hold
mechanism.

Sponsored by: EMC / Isilon storage division
Discussed with: alc
Reviewed by: jeff
Tested by: pho


# bc403f03 08-Apr-2013 Attilio Rao <attilio@FreeBSD.org>

Switch some "low-hanging fruit" to acquire read lock on vmobjects
rather than write locks.

Sponsored by: EMC / Isilon storage division
Reviewed by: alc
Tested by: pho


# 89f6b863 08-Mar-2013 Attilio Rao <attilio@FreeBSD.org>

Switch the vm_object mutex to be a rwlock. This will enable in the
future further optimizations where the vm_object lock will be held
in read mode most of the time the page cache resident pool of pages
are accessed for reading purposes.

The change is mostly mechanical but few notes are reported:
* The KPI changes as follow:
- VM_OBJECT_LOCK() -> VM_OBJECT_WLOCK()
- VM_OBJECT_TRYLOCK() -> VM_OBJECT_TRYWLOCK()
- VM_OBJECT_UNLOCK() -> VM_OBJECT_WUNLOCK()
- VM_OBJECT_LOCK_ASSERT(MA_OWNED) -> VM_OBJECT_ASSERT_WLOCKED()
(in order to avoid visibility of implementation details)
- The read-mode operations are added:
VM_OBJECT_RLOCK(), VM_OBJECT_TRYRLOCK(), VM_OBJECT_RUNLOCK(),
VM_OBJECT_ASSERT_RLOCKED(), VM_OBJECT_ASSERT_LOCKED()
* The vm/vm_pager.h namespace pollution avoidance (forcing requiring
sys/mutex.h in consumers directly to cater its inlining functions
using VM_OBJECT_LOCK()) imposes that all the vm/vm_pager.h
consumers now must include also sys/rwlock.h.
* zfs requires a quite convoluted fix to include FreeBSD rwlocks into
the compat layer because the name clash between FreeBSD and solaris
versions must be avoided.
At this purpose zfs redefines the vm_object locking functions
directly, isolating the FreeBSD components in specific compat stubs.

The KPI results heavilly broken by this commit. Thirdy part ports must
be updated accordingly (I can think off-hand of VirtualBox, for example).

Sponsored by: EMC / Isilon storage division
Reviewed by: jeff
Reviewed by: pjd (ZFS specific review)
Discussed with: alc
Tested by: pho


# 590f9303 25-Feb-2013 Attilio Rao <attilio@FreeBSD.org>

Merge from vmobj-rwlock branch:
Remove unused inclusion of vm/vm_pager.h and vm/vnode_pager.h.

Sponsored by: EMC / Isilon storage division
Tested by: pho
Reviewed by: alc


# 888d4d4f 07-Feb-2013 Konstantin Belousov <kib@FreeBSD.org>

When vforked child is traced, the debugging events are not generated
until child performs exec(). The behaviour is reasonable when a
debugger is the real parent, because the parent is stopped until
exec(), and sending a debugging event to the debugger would deadlock
both parent and child.

On the other hand, when debugger is not the parent of the vforked
child, not sending debugging signals makes it impossible to debug
across vfork.

Fix the issue by declining generating debug signals only when vfork()
was done and child called ptrace(PT_TRACEME). Set a new process flag
P_PPTRACE from the attach code for PT_TRACEME, if P_PPWAIT flag is
set, which indicates that the process was created with vfork() and
still did not execed. Check P_PPTRACE from issignal(), instead of
refusing the trace outright for the P_PPWAIT case. The scope of
P_PPTRACE is exactly contained in the scope of P_PPWAIT.

Found and tested by: zont
Reviewed by: pluknet
MFC after: 2 weeks


# 5050aa86 22-Oct-2012 Konstantin Belousov <kib@FreeBSD.org>

Remove the support for using non-mpsafe filesystem modules.

In particular, do not lock Giant conditionally when calling into the
filesystem module, remove the VFS_LOCK_GIANT() and related
macros. Stop handling buffers belonging to non-mpsafe filesystems.

The VFS_VERSION is bumped to indicate the interface change which does
not result in the interface signatures changes.

Conducted and reviewed by: attilio
Tested by: pho


# c0c6e95f 07-Aug-2012 Konstantin Belousov <kib@FreeBSD.org>

Always initialize pl_event.

Submitted by: Andrey Zonov <andrey@zonov.org>
MFC after: 3 days


# 5985d615 09-Jul-2012 David Xu <davidxu@FreeBSD.org>

If you have pressed CTRL+Z and a process is suspended, then you use gdb
to attach to the process, it is surprising that the process is resumed
without inputting any gdb commands, however ptrace manual said:
The tracing process will see the newly-traced process stop and may
then control it as if it had been traced all along.
But the current code does not work in this way, unless traced process
received a signal later, it will continue to run as a background task.
To fix this problem, just send signal SIGSTOP to the traced process after
we resumed it, this works like that you are attaching to a running process,
it is not perfect but better than nothing.


# dcd43281 23-Feb-2012 Konstantin Belousov <kib@FreeBSD.org>

Allow the parent to gather the exit status of the children reparented
to the debugger. When reparenting for debugging, keep the child in
the new orphan list of old parent. When looping over the children in
kern_wait(), iterate over both children list and orphan list to search
for the process by pid.

Submitted by: Dmitry Mikulin <dmitrym juniper.net>
MFC after: 2 weeks


# db327339 09-Feb-2012 Konstantin Belousov <kib@FreeBSD.org>

Mark the automatically attached child with PL_FLAG_CHILD in struct
lwpinfo flags, for PT_FOLLOWFORK auto-attachment.

In collaboration with: Dmitry Mikulin <dmitrym juniper net>
MFC after: 1 week


# 8451d0dd 16-Sep-2011 Kip Macy <kmacy@FreeBSD.org>

In order to maximize the re-usability of kernel code in user space this
patch modifies makesyscalls.sh to prefix all of the non-compatibility
calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel
entry points and all places in the code that use them. It also
fixes an additional name space collision between the kernel function
psignal and the libc function of the same name by renaming the kernel
psignal kern_psignal(). By introducing this change now we will ease future
MFCs that change syscalls.

Reviewed by: rwatson
Approved by: re (bz)


# 52e95a64 17-Jun-2011 David E. O'Brien <obrien@FreeBSD.org>

Add comment from CSRG rev 7.27 (1992/06/23 19:56:55; author: mckusick)


# f528c3fd 14-Jun-2011 David E. O'Brien <obrien@FreeBSD.org>

We should not return ECHILD when debugging a child and the parent does a
"wait4(-1, ..., WNOHANG, ...)". Instead wait(2) should behave as if the
child does not wish to report status at this time.

Reviewed by: jhb


# a5c1afad 26-Jan-2011 Dmitry Chagin <dchagin@FreeBSD.org>

Add macro to test the sv_flags of any process. Change some places to test
the flags instead of explicit comparing with address of known sysentvec
structures.

MFC after: 1 month


# 6fa39a73 25-Jan-2011 Konstantin Belousov <kib@FreeBSD.org>

Allow debugger to specify that children of the traced process should be
automatically traced. Extend the ptrace(PL_LWPINFO) to report that child
just forked.

Reviewed by: davidxu, jhb
MFC after: 2 weeks


# acd11c74 20-Dec-2010 Alan Cox <alc@FreeBSD.org>

Introduce vm_fault_hold() and use it to (1) eliminate a long-standing race
condition in proc_rwmem() and to (2) simplify the implementation of the
cxgb driver's vm_fault_hold_user_pages(). Specifically, in proc_rwmem()
the requested read or write could fail because the targeted page could be
reclaimed between the calls to vm_fault() and vm_page_hold().

In collaboration with: kib@
MFC after: 6 weeks


# 7f08176e 22-Nov-2010 Attilio Rao <attilio@FreeBSD.org>

Add the ability for GDB to printout the thread name along with other
thread specific informations.

In order to do that, and in order to avoid KBI breakage with existing
infrastructure the following semantic is implemented:
- For live programs, a new member to the PT_LWPINFO is added (pl_tdname)
- For cores, a new ELF note is added (NT_THRMISC) that can be used for
storing thread specific, miscellaneous, informations. Right now it is
just popluated with a thread name.

GDB, then, retrieves the correct informations from the corefile via the
BFD interface, as it groks the ELF notes and create appropriate
pseudo-sections.

Sponsored by: Sandvine Incorporated
Tested by: gianni
Discussed with: dim, kan, kib
MFC after: 2 weeks


# a7d5f7eb 19-Oct-2010 Jamie Gritton <jamie@FreeBSD.org>

A new jail(8) with a configuration file, to replace the work currently done
by /etc/rc.d/jail.


# cf7d9a8c 08-Oct-2010 David Xu <davidxu@FreeBSD.org>

Create a global thread hash table to speed up thread lookup, use
rwlock to protect the table. In old code, thread lookup is done with
process lock held, to find a thread, kernel has to iterate through
process and thread list, this is quite inefficient.
With this change, test shows in extreme case performance is
dramatically improved.

Earlier patch was reviewed by: jhb, julian


# 8a260079 04-Jul-2010 Konstantin Belousov <kib@FreeBSD.org>

Extend ptrace(PT_LWPINFO) to report siginfo for the signal that caused
debugee stop. The change should keep the ABI. Take care of compat32.

Discussed with: davidxu, jhb
MFC after: 2 weeks


# 60ae52f7 21-Jun-2010 Ed Schouten <ed@FreeBSD.org>

Use ISO C99 integer types in sys/kern where possible.

There are only about 100 occurences of the BSD-specific u_int*_t
datatypes in sys/kern. The ISO C99 integer types are used here more
often.


# af89e296 01-Jun-2010 John Baldwin <jhb@FreeBSD.org>

MFC 208555:
Ignore the 'addr' argument passed to PT_STEP (it is required to be '1'
for PT_STEP which means "ignore") and PT_DETACH.

Approved by: re (kib)


# 0bfbf4d2 25-May-2010 John Baldwin <jhb@FreeBSD.org>

Ignore the 'addr' argument passed to PT_STEP (it is required to be '1'
for PT_STEP which means "ignore") and PT_DETACH.

PR: kern/146167
MFC after: 1 week


# afe1a688 23-May-2010 Konstantin Belousov <kib@FreeBSD.org>

Reorganize syscall entry and leave handling.

Extend struct sysvec with three new elements:
sv_fetch_syscall_args - the method to fetch syscall arguments from
usermode into struct syscall_args. The structure is machine-depended
(this might be reconsidered after all architectures are converted).
sv_set_syscall_retval - the method to set a return value for usermode
from the syscall. It is a generalization of
cpu_set_syscall_retval(9) to allow ABIs to override the way to set a
return value.
sv_syscallnames - the table of syscall names.

Use sv_set_syscall_retval in kern_sigsuspend() instead of hardcoding
the call to cpu_set_syscall_retval().

The new functions syscallenter(9) and syscallret(9) are provided that
use sv_*syscall* pointers and contain the common repeated code from
the syscall() implementations for the architecture-specific syscall
trap handlers.

Syscallenter() fetches arguments, calls syscall implementation from
ABI sysent table, and set up return frame. The end of syscall
bookkeeping is done by syscallret().

Take advantage of single place for MI syscall handling code and
implement ptrace_lwpinfo pl_flags PL_FLAG_SCE, PL_FLAG_SCX and
PL_FLAG_EXEC. The SCE and SCX flags notify the debugger that the
thread is stopped at syscall entry or return point respectively. The
EXEC flag augments SCX and notifies debugger that the process address
space was changed by one of exec(2)-family syscalls.

The i386, amd64, sparc64, sun4v, powerpc and ia64 syscall()s are
changed to use syscallenter()/syscallret(). MIPS and arm are not
converted and use the mostly unchanged syscall() implementation.

Reviewed by: jhb, marcel, marius, nwhitehorn, stas
Tested by: marcel (ia64), marius (sparc64), nwhitehorn (powerpc),
stas (mips)
MFC after: 1 month


# 2965a453 29-Apr-2010 Kip Macy <kmacy@FreeBSD.org>

On Alan's advice, rather than do a wholesale conversion on a single
architecture from page queue lock to a hashed array of page locks
(based on a patch by Jeff Roberson), I've implemented page lock
support in the MI code and have only moved vm_page's hold_count
out from under page queue mutex to page lock. This changes
pmap_extract_and_hold on all pmaps.

Supported by: Bitgravity Inc.

Discussed with: alc, jeffr, and kib


# 4ccf64eb 06-Apr-2010 Nathan Whitehorn <nwhitehorn@FreeBSD.org>

MFC r205014,205015:

Provide groundwork for 32-bit binary compatibility on non-x86 platforms,
for upcoming 64-bit PowerPC and MIPS support. This renames the COMPAT_IA32
option to COMPAT_FREEBSD32, removes some IA32-specific code from MI parts
of the kernel and enhances the freebsd32 compatibility code to support
big-endian platforms.

This MFC is required for MFCs of later changes to the freebsd32
compatibility from HEAD.

Requested by: kib


# dfeca187 30-Mar-2010 Marcel Moolenaar <marcel@FreeBSD.org>

MFC rev 198341 and 198342:
o Introduce vm_sync_icache() for making the I-cache coherent with
the memory or D-cache, depending on the semantics of the platform.
vm_sync_icache() is basically a wrapper around pmap_sync_icache(),
that translates the vm_map_t argumument to pmap_t.
o Introduce pmap_sync_icache() to all PMAP implementation. For powerpc
it replaces the pmap_page_executable() function, added to solve
the I-cache problem in uiomove_fromphys().
o In proc_rwmem() call vm_sync_icache() when writing to a page that
has execute permissions. This assures that when breakpoints are
written, the I-cache will be coherent and the process will actually
hit the breakpoint.
o This also fixes the Book-E PMAP implementation that was missing
necessary locking while trying to deal with the I-cache coherency
in pmap_enter() (read: mmu_booke_enter_locked).


# 841c0c7e 11-Mar-2010 Nathan Whitehorn <nwhitehorn@FreeBSD.org>

Provide groundwork for 32-bit binary compatibility on non-x86 platforms,
for upcoming 64-bit PowerPC and MIPS support. This renames the COMPAT_IA32
option to COMPAT_FREEBSD32, removes some IA32-specific code from MI parts
of the kernel and enhances the freebsd32 compatibility code to support
big-endian platforms.

Reviewed by: kib, jhb


# d5f57f7e 06-Mar-2010 Marcel Moolenaar <marcel@FreeBSD.org>

MFC revs 203696, 203708, 203783 and 203788:
Add PT_VM_TIMESTAMP and PT_VM_ENTRY so that the tracing process can
obtain the memory map of the traced process.

Requested by: kib@


# af002ab8 11-Feb-2010 Marcel Moolenaar <marcel@FreeBSD.org>

Initialize pve_fsid and pve_fileid to VNOVAL.


# 16211027 11-Feb-2010 Marcel Moolenaar <marcel@FreeBSD.org>

o Add support for COMPAT_IA32.
o Incorporate review comments:
- Properly reference and lock the map
- Take into account that the VM map can change inbetween requests
- Add the fileid and fsid attributes

Credits: kib@
Reviewed by: kib@


# 8a25c0c7 09-Feb-2010 Marcel Moolenaar <marcel@FreeBSD.org>

Unbreak building kernels with COMPAT_32 enabled. The actual support
for the PT_VM_ENTRY request from 32-bit processes will follow.

Pointy hat: marcel


# 90b4621a 08-Feb-2010 Marcel Moolenaar <marcel@FreeBSD.org>

Add PT_VM_TIMESTAMP and PT_VM_ENTRY so that the tracing process can
obtain the memory map of the traced process. PT_VM_TIMESTAMP can be
used to check if the memory map changed since the last time to avoid
iterating over all the VM entries unnecesarily.

MFC after: 1 month


# c329abd0 07-Feb-2010 Konstantin Belousov <kib@FreeBSD.org>

MFC r202882:
For i386, amd64 and ia32 on amd64 MD syscall(), reread syscall number
and arguments after ptracestop(), if debugger modified anything in the
process environment.


# 5b1162b9 23-Jan-2010 Konstantin Belousov <kib@FreeBSD.org>

For PT_TO_SCE stop that stops the ptraced process upon syscall entry,
syscall arguments are collected before ptracestop() is called. As a
consequence, debugger cannot modify syscall or its arguments.

For i386, amd64 and ia32 on amd64 MD syscall(), reread syscall number
and arguments after ptracestop(), if debugger modified anything in the
process environment. Since procfs stopeven requires number of syscall
arguments in p_xstat, this cannot be solved by moving stop/trace point
before argument fetching.

Move the code to read arguments into separate function
fetch_syscall_args() to avoid code duplication. Note that ktrace point
for modified syscall is intentionally recorded twice, once with original
arguments, and second time with the arguments set by debugger.

PT_TO_SCX stop is executed after cpu_syscall_set_retval() already.

Reported by: Ali Polatel <alip exherbo org>
Briefly discussed with: jhb
MFC after: 3 weeks


# a6d42a0d 25-Nov-2009 Alan Cox <alc@FreeBSD.org>

Replace VM_PROT_OVERRIDE_WRITE by VM_PROT_COPY. VM_PROT_OVERRIDE_WRITE has
represented a write access that is allowed to override write protection.
Until now, VM_PROT_OVERRIDE_WRITE has been used to write breakpoints into
text pages. Text pages are not just write protected but they are also
copy-on-write. VM_PROT_OVERRIDE_WRITE overrides the write protection on the
text page and triggers the replication of the page so that the breakpoint
will be written to a private copy. However, here is where things become
confused. It is the debugger, not the process being debugged that requires
write access to the copied page. Nonetheless, the copied page is being
mapped into the process with write access enabled. In other words, once the
debugger sets a breakpoint within a text page, the program can write to its
private copy of that text page. Whereas prior to setting the breakpoint, a
SIGSEGV would have occurred upon a write access. VM_PROT_COPY addresses
this problem. The combination of VM_PROT_READ and VM_PROT_COPY forces the
replication of a copy-on-write page even though the access is only for read.
Moreover, the replicated page is only mapped into the process with read
access, and not write access.

Reviewed by: kib
MFC after: 4 weeks


# a0c703bf 24-Oct-2009 Alan Cox <alc@FreeBSD.org>

Update a comment to reflect the previous change.


# 1a4fcaeb 21-Oct-2009 Marcel Moolenaar <marcel@FreeBSD.org>

o Introduce vm_sync_icache() for making the I-cache coherent with
the memory or D-cache, depending on the semantics of the platform.
vm_sync_icache() is basically a wrapper around pmap_sync_icache(),
that translates the vm_map_t argumument to pmap_t.
o Introduce pmap_sync_icache() to all PMAP implementation. For powerpc
it replaces the pmap_page_executable() function, added to solve
the I-cache problem in uiomove_fromphys().
o In proc_rwmem() call vm_sync_icache() when writing to a page that
has execute permissions. This assures that when breakpoints are
written, the I-cache will be coherent and the process will actually
hit the breakpoint.
o This also fixes the Book-E PMAP implementation that was missing
necessary locking while trying to deal with the I-cache coherency
in pmap_enter() (read: mmu_booke_enter_locked).

The key property of this change is that the I-cache is made coherent
*after* writes have been done. Doing it in the PMAP layer when adding
or changing a mapping means that the I-cache is made coherent *before*
any writes happen. The difference is key when the I-cache prefetches.


# 2a565838 02-Jul-2009 Robert Watson <rwatson@FreeBSD.org>

Clean up a number of aspects of token generation from audit arguments to
system calls:

- Centralize generation of argument tokens for VM addresses in a macro,
ADDR_TOKEN(), and properly encode 64-bit addresses in 64-bit arguments.
- Fix up argument numbers across a large number of syscalls so that they
match the numeric argument into the system call.
- Don't audit the address argument to ioctl(2) or ptrace(2), but do keep
generating tokens for mmap(2), minherit(2), since they relate to passing
object access across execve(2).

Approved by: re (audit argument blanket)
Obtained from: TrustedBSD Project
MFC after: 1 week


# 14961ba7 27-Jun-2009 Robert Watson <rwatson@FreeBSD.org>

Replace AUDIT_ARG() with variable argument macros with a set more more
specific macros for each audit argument type. This makes it easier to
follow call-graphs, especially for automated analysis tools (such as
fxr).

In MFC, we should leave the existing AUDIT_ARG() macros as they may be
used by third-party kernel modules.

Suggested by: brooks
Approved by: re (kib)
Obtained from: TrustedBSD Project
MFC after: 1 week


# 3364c323 23-Jun-2009 Konstantin Belousov <kib@FreeBSD.org>

Implement global and per-uid accounting of the anonymous memory. Add
rlimit RLIMIT_SWAP that limits the amount of swap that may be reserved
for the uid.

The accounting information (charge) is associated with either map entry,
or vm object backing the entry, assuming the object is the first one
in the shadow chain and entry does not require COW. Charge is moved
from entry to object on allocation of the object, e.g. during the mmap,
assuming the object is allocated, or on the first page fault on the
entry. It moves back to the entry on forks due to COW setup.

The per-entry granularity of accounting makes the charge process fair
for processes that change uid during lifetime, and decrements charge
for proper uid when region is unmapped.

The interface of vm_pager_allocate(9) is extended by adding struct ucred *,
that is used to charge appropriate uid when allocation if performed by
kernel, e.g. md(4).

Several syscalls, among them is fork(2), may now return ENOMEM when
global or per-uid limits are enforced.

In collaboration with: pho
Reviewed by: alc
Approved by: re (kensmith)


# 2883703e 02-Mar-2009 Konstantin Belousov <kib@FreeBSD.org>

Use the p_sysent->sv_flags flag SV_ILP32 to detect 32bit process
executing on 64bit kernel. This eliminates the direct comparisions
of p_sysent with &ia32_freebsd_sysvec, that were left intact after
r185169.


# 7b4a950a 04-Nov-2008 David Xu <davidxu@FreeBSD.org>

Revert rev 184216 and 184199, due to the way the thread_lock works,
it may cause a lockup.

Noticed by: peter, jhb


# 3f9be10e 23-Oct-2008 David Xu <davidxu@FreeBSD.org>

Actually, for signal and thread suspension, extra process spin lock is
unnecessary, the normal process lock and thread lock are enough. The
spin lock is still needed for process and thread exiting to mimic
single sched_lock.


# d7f03759 19-Oct-2008 Ulf Lilleengen <lulf@FreeBSD.org>

- Import the HEAD csup code which is the basis for the cvsmode work.


# 904c5ec4 15-Oct-2008 David Xu <davidxu@FreeBSD.org>

Move per-thread userland debugging flags into seperated field,
this eliminates some problems of locking, e.g, a thread lock is needed
but can not be used at that time. Only the process lock is needed now
for new field.


# 374ae2a3 19-Mar-2008 Jeff Roberson <jeff@FreeBSD.org>

- Relax requirements for p_numthreads, p_threads, p_swtick, and p_nice from
requiring the per-process spinlock to only requiring the process lock.
- Reflect these changes in the proc.h documentation and consumers throughout
the kernel. This is a substantial reduction in locking cost for these
fields and was made possible by recent changes to threading support.


# 6617724c 12-Mar-2008 Jeff Roberson <jeff@FreeBSD.org>

Remove kernel support for M:N threading.

While the KSE project was quite successful in bringing threading to
FreeBSD, the M:N approach taken by the kse library was never developed
to its full potential. Backwards compatibility will be provided via
libmap.conf for dynamically linked binaries and static binaries will
be broken.


# dda7aec7 08-Nov-2007 Stephan Uphoff <ups@FreeBSD.org>

Use VM_FAULT_DIRTY to fault in pages for write access in
proc_rwmen.
Otherwise copy on write may create an anonymous page that is
not marked as dirty. Since writing data to these pages
in this function also does not dirty these pages they may be
later discarded by the pagedaemon.


# 8753688f 08-Oct-2007 Jeff Roberson <jeff@FreeBSD.org>

- Fix from pr kern/115469; Don't redeliver a signal once it has been
handled by the target process.

Contributed by: Tijl Coosemans <tijl@ulyssis.org>
Approved by: re


# b61ce5b0 16-Sep-2007 Jeff Roberson <jeff@FreeBSD.org>

- Move all of the PS_ flags into either p_flag or td_flags.
- p_sflag was mostly protected by PROC_LOCK rather than the PROC_SLOCK or
previously the sched_lock. These bugs have existed for some time.
- Allow swapout to try each thread in a process individually and then
swapin the whole process if any of these fail. This allows us to move
most scheduler related swap flags into td_flags.
- Keep ki_sflag for backwards compat but change all in source tools to
use the new and more correct location of P_INMEM.

Reported by: pho
Reviewed by: attilio, kib
Approved by: re (kensmith)


# 982d11f8 04-Jun-2007 Jeff Roberson <jeff@FreeBSD.org>

Commit 14/14 of sched_lock decomposition.
- Use thread_lock() rather than sched_lock for per-thread scheduling
sychronization.
- Use the per-process spinlock rather than the sched_lock for per-process
scheduling synchronization.

Tested by: kris, current@
Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)


# 0c14ff0e 04-Mar-2007 Robert Watson <rwatson@FreeBSD.org>

Remove 'MPSAFE' annotations from the comments above most system calls: all
system calls now enter without Giant held, and then in some cases, acquire
Giant explicitly.

Remove a number of other MPSAFE annotations in the credential code and
tweak one or two other adjacent comments.


# 8460a577 26-Oct-2006 John Birrell <jb@FreeBSD.org>

Make KSE a kernel option, turned on by default in all GENERIC
kernel configs except sun4v (which doesn't process signals properly
with KSE).

Reviewed by: davidxu@


# ff766807 25-Oct-2006 David Xu <davidxu@FreeBSD.org>

Move sigqueue_take() call into proc_reparent(), this fixed bugs where
proc_reparent() is called but sigqueue_take() is forgotten.


# f51bf07a 14-Oct-2006 Tom Rhodes <trhodes@FreeBSD.org>

Close a race condition where num can be larger than tmp, giving the user
too large of a boundary.

Reported by: Ilja Van Sprundel


# 23a28f3a 20-Aug-2006 Colin Percival <cperciva@FreeBSD.org>

Fix a signedness bug.

MFC after: 3 days
Security: Local DoS


# 06ad42b2 22-Feb-2006 John Baldwin <jhb@FreeBSD.org>

Close some races between procfs/ptrace and exit(2):
- Reorder the events in exit(2) slightly so that we trigger the S_EXIT
stop event earlier. After we have signalled that, we set P_WEXIT and
then wait for any processes with a hold on the vmspace via PHOLD to
release it. PHOLD now KASSERT()'s that P_WEXIT is clear when it is
invoked, and PRELE now does a wakeup if P_WEXIT is set and p_lock drops
to zero.
- Change proc_rwmem() to require that the processing read from has its
vmspace held via PHOLD by the caller and get rid of all the junk to
screw around with the vmspace reference count as we no longer need it.
- In ptrace() and pseudofs(), treat a process with P_WEXIT set as if it
doesn't exist.
- Only do one PHOLD in kern_ptrace() now, and do it earlier so it covers
FIX_SSTEP() (since on alpha at least this can end up calling proc_rwmem()
to clear an earlier single-step simualted via a breakpoint). We only
do one to avoid races. Also, by making the EINVAL error for unknown
requests be part of the default: case in the switch, the various
switch cases can now just break out to return which removes a _lot_ of
duplicated PRELE and proc unlocks, etc. Also, it fixes at least one bug
where a LWP ptrace command could return EINVAL with the proc lock still
held.
- Changed the locking for ptrace_single_step(), ptrace_set_pc(), and
ptrace_clear_single_step() to always be called with the proc lock
held (it was a mixed bag previously). Alpha and arm have to drop
the lock while the mess around with breakpoints, but other archs
avoid extra lock release/acquires in ptrace(). I did have to fix a
couple of other consumers in kern_kse and a few other places to
hold the proc lock and PHOLD.

Tested by: ps (1 mostly, but some bits of 2-4 as well)
MFC after: 1 week


# 085a0d43 13-Feb-2006 Wayne Salamon <wsalamon@FreeBSD.org>

Audit the arguments to the ptrace(2) system call.

Obtained from: TrustedBSD Project
Approved by: rwatson (mentor)


# ea8e65b0 06-Feb-2006 David Xu <davidxu@FreeBSD.org>

Add members pl_sigmask and pl_siglist into ptrace_lwpinfo to get lwp's
signal mask and pending signals.


# d7bc12b0 23-Dec-2005 David Xu <davidxu@FreeBSD.org>

Avoid kernel panic when attaching a process which may not be stopped
by debugger, e.g process is dumping core. Only access p_xthread if
P_STOPPED_TRACE is set, this means thread is ready to exchange signal
with debugger, print a warning if P_STOPPED_TRACE is not set due to
some bugs in other code, if there is.

The patch has been tested by Anish Mistry mistry.7 at osu dot edu, and
is slightly adjusted.


# c20cedbf 08-Nov-2005 David Xu <davidxu@FreeBSD.org>

Make sure pending SIGCHLD is removed from previous parent when process
is attached or detached.


# 8c6d7a8d 19-Aug-2005 David Xu <davidxu@FreeBSD.org>

Fix a LOR between sched_lock and sleep queue lock.


# 62919d78 30-Jun-2005 Peter Wemm <peter@FreeBSD.org>

Jumbo-commit to enhance 32 bit application support on 64 bit kernels.
This is good enough to be able to run a RELENG_4 gdb binary against
a RELENG_4 application, along with various other tools (eg: 4.x gcore).
We use this at work.

ia32_reg.[ch]: handle the 32 bit register file format, used by ptrace,
procfs and core dumps.
procfs_*regs.c: vary the format of proc/XXX/*regs depending on the client
and target application.
procfs_map.c: Don't print a 64 bit value to 32 bit consumers, or their
sscanf fails. They expect an unsigned long.
imgact_elf.c: produce a valid 32 bit coredump for 32 bit apps.
sys_process.c: handle 32 bit consumers debugging 32 bit targets. Note
that 64 bit consumers can still debug 32 bit targets.

IA64 has got stubs for ia32_reg.c.

Known limitations: a 5.x/6.x gdb uses get/setcontext(), which isn't
implemented in the 32/64 wrapper yet. We also make a tiny patch to
gdb pacify it over conflicting formats of ld-elf.so.1.

Approved by: re


# f7fdcd45 18-Mar-2005 David Schultz <das@FreeBSD.org>

Add missing cases for PT_SYSCALL.

Found by: Coverity Prevent analysis tool


# 9454b2d8 06-Jan-2005 Warner Losh <imp@FreeBSD.org>

/* -> /*- for copyright notices, minor format tweaks as necessary


# 6004362e 26-Nov-2004 David Schultz <das@FreeBSD.org>

Don't include sys/user.h merely for its side-effect of recursively
including other headers.


# 1f2eac6c 08-Aug-2004 David Xu <davidxu@FreeBSD.org>

Add pl_flags to ptrace_lwpinfo, two flags PL_FLAG_SA and PL_FLAG_BOUND
indicate that a thread is in UTS critical region.

Reviewed by: deischen
Approved by: marcel


# 1a276a3f 26-Jul-2004 Alan Cox <alc@FreeBSD.org>

- Use atomic ops for updating the vmspace's refcnt and exitingcnt.
- Push down Giant into shmexit(). (Giant is acquired only if the vmspace
contains shm segments.)
- Eliminate the acquisition of Giant from proc_rwmem().
- Reduce the scope of Giant in exit1(), uncovering the destruction of the
address space.


# c3d88cba 17-Jul-2004 David Xu <davidxu@FreeBSD.org>

Fix typo.


# ef9457be 13-Jul-2004 David Xu <davidxu@FreeBSD.org>

Implement following commands: PT_CLEARSTEP, PT_SETSTEP, PT_SUSPEND
PT_RESUME, PT_GETNUMLWPS, PT_GETLWPLIST.


# fbc3247d 11-Jul-2004 Marcel Moolenaar <marcel@FreeBSD.org>

Implement the PT_LWPINFO request. This request can be used by the
tracing process to obtain information about the LWP that caused the
traced process to stop. Debuggers can use this information to select
the thread currently running on the LWP as the current thread.

The request has been made compatible with NetBSD for as much as
possible. This implementation differs from NetBSD in the following
ways:
1. The data argument is allowed to be smaller than the size of the
ptrace_lwpinfo structure known to the kernel, but not 0. This
is opposite to what NetBSD allows. The reason for this is that
we can extend the structure without affecting older binaries.
2. On NetBSD the tracing process is to set the pl_lwpid field to
the Id of the LWP it wants information of. We don't do that.
Our ptrace interface allows passing the LWP Id instead of the
PID. The tracing process is to set the PID to the LWP Id it
wants information of.
3. When the PID is actually the PID of the tracing process, this
request returns the information about the LWP that caused the
process to stop. This was the whole purpose of the request in
the first place.

When the traced process has exited, this request will return the
LWP Id 0, indicating that the process state is not the result of
an event specific to a LWP.


# f3b929bf 02-Jul-2004 David Xu <davidxu@FreeBSD.org>

Allow ptrace to deal with lwpid.

Reviewed by: marcel


# e43257aa 01-Apr-2004 John Baldwin <jhb@FreeBSD.org>

Finish fixing up Alpha to work with an MP safe ptrace():
- ptrace_single_step() is no longer called with the proc lock held, so
don't try to unlock it and then relock it.
- Push Giant down into proc_rwmem() instead of forcing all the consumers
(including Alpha breakpoint support) to explicitly wrap calls to
proc_rwmem() with Giant.

Tested by: kensmith


# 2b63e7f3 24-Mar-2004 Alan Cox <alc@FreeBSD.org>

Use uiomove_fromphys() instead of pmap_qenter() and pmap_qremove() in
proc_rwmem().


# 8ac61436 15-Mar-2004 John Baldwin <jhb@FreeBSD.org>

Drop the proc lock around calls to the MD functions ptrace_single_step(),
ptrace_set_pc(), and cpu_ptrace() so that those functions are free to
acquire Giant, sleep, etc. We already do a PHOLD/PRELE around them so
that it is safe to sleep inside of these routines if necessary. This
allows ptrace() to be marked MP safe again as it no longer triggers lock
order reversals on Alpha.

Tested by: wilko


# cf93aa16 19-Feb-2004 Don Lewis <truckman@FreeBSD.org>

When reparenting a process in the PT_DETACH code, only set p_sigparent
to SIGCHLD if the new parent process is initproc.

MFC after: 2 weeks


# 55b5f2a2 11-Feb-2004 Don Lewis <truckman@FreeBSD.org>

When reparenting a process to init, make sure that p_sigparent is
set to SIGCHLD. This avoids the creation of orphaned Linux-threaded
zombies that init is unable to reap. This can occur when the parent
process sets its SIGCHLD to SIG_IGN. Fix a similar situation in the
PT_DETACH code.

Tested by: "Steven Hartland" <killing AT multiplay.co.uk>


# ea924c4c 09-Oct-2003 Robert Drehmel <robert@FreeBSD.org>

Implement preliminary support for the PT_SYSCALL command to ptrace(2).


# 1c843354 14-Aug-2003 Marcel Moolenaar <marcel@FreeBSD.org>

Add or finish support for machine dependent ptrace requests. When we
check for permissions, do it for all requests, not the known requests.
Later when we actually service the request we deal with the invalid
requests we previously caught earlier.

This commit changes the behaviour of the ptrace(2) interface for
boundary cases such as an unknown request without proper permissions.
Previously we would return EINVAL. Now we return EBUSY or EPERM.

Platforms need to define __HAVE_PTRACE_MACHDEP when they have MD
requests. This makes the prototype of cpu_ptrace() visible and
introduces a call to this function for all requests greater or
equal to PT_FIRSTMACH.

Silence on: audit


# 007e25d9 10-Aug-2003 Jacques Vidrine <nectar@FreeBSD.org>

Add or correct range checking of signal numbers in system calls and
ioctls.

In the particular case of ptrace(), this commit more-or-less reverts
revision 1.53 of sys_process.c, which appears to have been erroneous.

Reviewed by: iedowse, jhb


# c6eb850a 09-Aug-2003 Alan Cox <alc@FreeBSD.org>

Background: When proc_rwmem() wired and mapped a page, it also added
a reference to the containing object. The purpose of the reference
being to prevent the destruction of the object and an attempt to free
the wired page. (Wired pages can't be freed.) Unfortunately, this
approach does not work. Some operations, like fork(2) that call
vm_object_split(), can move the wired page to a difference object,
thereby making the reference pointless and opening the possibility
of the wired page being freed.

A solution is to use vm_page_hold() in place of vm_page_wire(). Held
pages can be freed. They are moved to a special hold queue until the
hold is released.

Submitted by: tegge


# 884962ae 02-Aug-2003 Alan Cox <alc@FreeBSD.org>

Use kmem_alloc_nofault() rather than kmem_alloc_pageable() in proc_rwmem().
See revision 1.140 of kern/sys_pipe.c for a detailed rationale.

Submitted by: tegge


# c40f7377 11-Jun-2003 Alan Cox <alc@FreeBSD.org>

Add vm object locking.


# 677b542e 10-Jun-2003 David E. O'Brien <obrien@FreeBSD.org>

Use __FBSDID().


# 17b8a8a7 25-Apr-2003 John Baldwin <jhb@FreeBSD.org>

Push down Giant around calls to proc_rwmem() in kern_ptrace. kern_ptrace()
should now be MP safe.


# eeec6bab 22-Apr-2003 John Baldwin <jhb@FreeBSD.org>

Prefer the proc lock to sched_lock when testing PS_INMEM now that it is
safe to do so.


# b68e0849 17-Apr-2003 John Baldwin <jhb@FreeBSD.org>

The sched_lock is not needed while clearing two of the P_STOPPED bits in
p_flag. Also, the proc lock can't be recursed, so simplify an older proc
lock assertion.


# 4e8074eb 18-Mar-2003 Dag-Erling Smørgrav <des@FreeBSD.org>

Whitespace cleanup.


# 5c0cc63c 16-Oct-2002 John Baldwin <jhb@FreeBSD.org>

Add a missing PROC_UNLOCK in ptrace() for the PT_IO case.

PR: kern/44065
Submitted by: Mark Kettenis <kettenis@chello.nl>


# 71fad9fd 11-Sep-2002 Julian Elischer <julian@FreeBSD.org>

Completely redo thread states.

Reviewed by: davidxu@freebsd.org


# 1ed8cb48 07-Sep-2002 Peter Wemm <peter@FreeBSD.org>

Remove bogus fill_kinfo_proc() before ptrace_set_pc(). There was no need
for this.

Submitted by: bde


# 1279572a 05-Sep-2002 David Xu <davidxu@FreeBSD.org>

s/SGNL/SIG/
s/SNGL/SINGLE/
s/SNGLE/SINGLE/

Fix abbreviation for P_STOPPED_* etc flags, in original code they were
inconsistent and difficult to distinguish between them.

Approved by: julian (mentor)


# 012e544f 04-Sep-2002 Ian Dowse <iedowse@FreeBSD.org>

Split up ptrace() into a wrapper that does the copying to and from
user space and a kern_ptrace() implementation. Use the kern_*()
version in the Linux emulation code to remove more stack gap uses.

Approved by: des


# 93b0017f 25-Aug-2002 Philippe Charnier <charnier@FreeBSD.org>

Replace various spelling with FALLTHROUGH which is lint()able


# 4f18efe2 20-Jul-2002 Robert Watson <rwatson@FreeBSD.org>

Do preserve the error result from calling p_cansee() and use that when
failing because of the error.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs


# a4e80b6b 12-Jul-2002 Alan Cox <alc@FreeBSD.org>

Lock accesses to the page queues.


# 5c859660 12-Jul-2002 Thomas Moestl <tmm@FreeBSD.org>

Fix ptrace(PT_READ_*, ...) for non-little-endian architectures where
sizeof(register_t) != sizeof(int).


# e602ba25 29-Jun-2002 Julian Elischer <julian@FreeBSD.org>

Part 1 of KSE-III

The ability to schedule multiple threads per process
(one one cpu) by making ALL system calls optionally asynchronous.
to come: ia64 and power-pc patches, patches for gdb, test program (in tools)

Reviewed by: Almost everyone who counts
(at various times, peter, jhb, matt, alfred, mini, bernd,
and a cast of thousands)

NOTE: this is still Beta code, and contains lots of debugging stuff.
expect slight instability in signals..


# a9b4acea 18-May-2002 Marcel Moolenaar <marcel@FreeBSD.org>

All signals can be sent to the inferior process when it's restarted,
not just the legacy ones.

PR: 33299
Submitted by: Alexander N. Kabaev <ak03@gte.com>


# f44d9e24 18-May-2002 John Baldwin <jhb@FreeBSD.org>

Change p_can{debug,see,sched,signal}()'s first argument to be a thread
pointer instead of a proc pointer and require the process pointed to
by the second argument to be locked. We now use the thread ucred reference
for the credential checks in p_can*() as a result. p_canfoo() should now
no longer need Giant.


# d8f4f6a4 08-May-2002 Jonathan Mini <mini@FreeBSD.org>

Remove trace_req().

Reviewed by: alfred, jhb, peter


# 9daa5b14 20-Apr-2002 Marcel Moolenaar <marcel@FreeBSD.org>

GCC 3.x WARNS: Add a break to the default case.


# 46e12b42 14-Apr-2002 Alfred Perlstein <alfred@FreeBSD.org>

Don't allow one to trace an ancestor when already traced.

PR: kern/29741
Submitted by: Dave Zarzycki <zarzycki@FreeBSD.org>
Fix from: Tim J. Robbins <tim@robbins.dropbear.id.au>
MFC After: 2 weeks


# 6871a6c8 12-Apr-2002 John Baldwin <jhb@FreeBSD.org>

Rework ptrace(2) to be more locking friendly. We do any needed copyin()'s
and acquire the proctree_lock if needed first. Then we lock the process
if necessary and fiddle with it as appropriate. Finally we drop locks and
do any needed copyout's. This greatly simplifies the locking.


# 65c9b430 09-Apr-2002 John Baldwin <jhb@FreeBSD.org>

- Change fill_kinfo_proc() to require that the process is locked when it
is called.
- Change sysctl_out_proc() to require that the process is locked when it
is called and to drop the lock before it returns. If this proves too
complex we can change sysctl_out_proc() to simply acquire the lock at
the very end and have the calling code drop the lock right after it
returns.
- Lock the process we are going to export before the p_cansee() in the
loop in sysctl_kern_proc() and hold the lock until we call
sysctl_out_proc().
- Don't call p_cansee() on the process about to be exported twice in
the aforementioned loop.


# ac59490b 16-Mar-2002 Jake Burkholder <jake@FreeBSD.org>

Convert all pmap_kenter/pmap_kremove pairs in MI code to use pmap_qenter/
pmap_qremove. pmap_kenter is not safe to use in MI code because it is not
guaranteed to flush the mapping from the tlb on all cpus. If the process
in question is preempted and migrates cpus between the call to pmap_kenter
and pmap_kremove, the original cpu will be left with stale mappings in its
tlb. This is currently not a problem for i386 because we do not use PG_G on
SMP, and thus all mappings are flushed from the tlb on context switches, not
just user mappings. This is not the case on all architectures, and if PG_G
is to be used with SMP on i386 it will be a problem. This was committed by
peter earlier as part of his fine grained tlb shootdown work for i386, which
was backed out for other reasons.

Reviewed by: peter


# 8bc814e6 15-Mar-2002 Dag-Erling Smørgrav <des@FreeBSD.org>

Implement PT_IO (read / write arbitrary amounts of data or text).

Submitted by: Artur Grabowski <art@{blahonga,openbsd}.org>
Obtained from: OpenBSD


# a888d317 15-Mar-2002 Dag-Erling Smørgrav <des@FreeBSD.org>

PT_[GS]ET{,DB,FP}REGS isn't really optional any more, since we have dummy
backend functions for those archs that don't support them. I meant to do
this ages ago, but never got around to it.

Inspired by: OpenBSD


# d1693e17 27-Feb-2002 Peter Wemm <peter@FreeBSD.org>

Back out all the pmap related stuff I've touched over the last few days.
There is some unresolved badness that has been eluding me, particularly
affecting uniprocessor kernels. Turning off PG_G helped (which is a bad
sign) but didn't solve it entirely. Userland programs still crashed.


# bd1e3a0f 26-Feb-2002 Peter Wemm <peter@FreeBSD.org>

Jake further reduced IPI shootdowns on sparc64 in loops by using ranged
shootdowns in a couple of key places. Do the same for i386. This also
hides some physical addresses from higher levels and has it use the
generic vm_page_t's instead. This will help for PAE down the road.

Obtained from: jake (MI code, suggestions for MD part)


# f591779b 23-Feb-2002 Seigo Tanimura <tanimura@FreeBSD.org>

Lock struct pgrp, session and sigio.

New locks are:

- pgrpsess_lock which locks the whole pgrps and sessions,
- pg_mtx which protects the pgrp members, and
- s_mtx which protects the session members.

Please refer to sys/proc.h for the coverage of these locks.

Changes on the pgrp/session interface:

- pgfind() needs the pgrpsess_lock held.

- The caller of enterpgrp() is responsible to allocate a new pgrp and
session.

- Call enterthispgrp() in order to enter an existing pgrp.

- pgsignal() requires a pgrp lock held.

Reviewed by: jhb, alfred
Tested on: cvsup.jp.FreeBSD.org
(which is a quad-CPU machine running -current)


# 19610b66 20-Feb-2002 Bruce Evans <bde@FreeBSD.org>

Fixed some style bugs. Added a comment about a bug in PT_SSTEP.

Approved by: des


# 4b1aa58b 20-Feb-2002 Bruce Evans <bde@FreeBSD.org>

Recover bits that were lost in transition in rev.1.76:
- P_INMEM checks in all the functions. P_INMEM must be checked because
PHOLD() is broken. The old bits had bogus locking (using sched_lock)
to lock P_INMEM. After removing the P_INMEM checks, we were left with
just the bogus locking.
- large comments. They were too large, but better than nothing.

Remove obfuscations that were gained in transition in rev.1.76:
- PROC_REG_ACTION() is even more of an obfuscation than PROC_ACTION().

The change copies procfs_machdep.c rev.1.22 of i386/procfs_machdep.c
verbatim except for "fixing" the old-style function headers and adjusting
function names and comments. It doesn't remove the bogus locking.

Approved by: des


# fe0d0493 08-Feb-2002 Peter Wemm <peter@FreeBSD.org>

Bah, I managed to turn cosmetic things into real bugs. Fix shadowed
variable declarations. :-( Definately not my day today.


# 2d008b44 07-Feb-2002 Peter Wemm <peter@FreeBSD.org>

Fix a whole bunch of long lines introduced by previous commit by using
td = FIRST_THREAD_IN_PROC(p) once, after we have identified the process
that we are operating on.


# 079b7bad 07-Feb-2002 Julian Elischer <julian@FreeBSD.org>

Pre-KSE/M3 commit.
this is a low-functionality change that changes the kernel to access the main
thread of a process via the linked list of threads rather than
assuming that it is embedded in the process. It IS still embeded there
but remove all teh code that assumes that in preparation for the next commit
which will actually move it out.

Reviewed by: peter@freebsd.org, gallatin@cs.duke.edu, benno rice,


# 7c629906 21-Oct-2001 Dag-Erling Smørgrav <des@FreeBSD.org>

Move procfs_* from procfs_machdep.c into sys_process.c, and rename them to
proc_* in the process; procfs_machdep.c is no longer needed.

Run-tested on i386, build-tested on Alpha, untested on other platforms.


# 3da32491 07-Oct-2001 Dag-Erling Smørgrav <des@FreeBSD.org>

Dissociate ptrace from procfs.

Until now, the ptrace syscall was implemented as a wrapper that called
various functions in procfs depending on which ptrace operation was
requested. Most of these functions were themselves wrappers around
procfs_{read,write}_{,db,fp}regs(), with only some extra error checks,
which weren't necessary in the ptrace case anyway.

This commit moves procfs_rwmem() from procfs_mem.c into sys_process.c
(renaming it to proc_rwmem() in the process), and implements ptrace()
directly in terms of procfs_{read,write}_{,db,fp}regs() instead of
having it fake up a struct uio and then call procfs_do{,db,fp}regs().

It also moves the prototypes for procfs_{read,write}_{,db,fp}regs()
and proc_rwmem() from proc.h to ptrace.h, and marks all procfs files
except procfs_machdep.c as "optional procfs" instead of "standard".


# 50f74e92 04-Oct-2001 Dag-Erling Smørgrav <des@FreeBSD.org>

Final style(9) commit: placement of opening brace; a continuation indent I
missed in the previous commit; a line that exceeded 80 characters. No
functional changes, but the object file's md5 checksum changes because some
lines have been displaced.


# 8a8d4e45 04-Oct-2001 Dag-Erling Smørgrav <des@FreeBSD.org>

More style(9) fixes: no spaces between function name and parameter list;
some indentation fixes (particularly continuation lines).

Reviewed by: md5(1)


# c5799337 04-Oct-2001 Dag-Erling Smørgrav <des@FreeBSD.org>

This file had a mixture of "return foo;" and "return (foo);"; standardize
on "return (foo);" as mandated by style(9).

Reviewed by: md5(1)


# 796ed2a6 18-Sep-2001 Mark Peek <mp@FreeBSD.org>

Set debug information on the process being traced, not the current (debugger)
process. This should allow gdb to function correctly on post-KSE kernels.


# b40ce416 12-Sep-2001 Julian Elischer <julian@FreeBSD.org>

KSE Milestone 2
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to teh previousl -current except
that there is a thread associated with each process.

Sorry john! (your next MFC will be a doosie!)

Reviewed by: peter@freebsd.org, dillon@freebsd.org

X-MFC after: ha ha ha ha


# 2aca0c28 07-Aug-2001 Peter Wemm <peter@FreeBSD.org>

Zap 'ptrace(PT_READ_U, ...)' and 'ptrace(PT_WRITE_U, ...)' since they
are a really nasty interface that should have been killed long ago
when 'ptrace(PT_[SG]ETREGS' etc came along. The entity that they
operate on (struct user) will not be around much longer since it
is part-per-process and part-per-thread in a post-KSE world.

gdb does not actually use this except for the obscure 'info udot'
command which does a hexdump of as much of the child's 'struct user'
as it can get. It carries its own #defines so it doesn't break
compiles.


# a0f75161 05-Jul-2001 Robert Watson <rwatson@FreeBSD.org>

o Replace calls to p_can(..., P_CAN_xxx) with calls to p_canxxx().
The p_can(...) construct was a premature (and, it turns out,
awkward) abstraction. The individual calls to p_canxxx() better
reflect differences between the inter-process authorization checks,
such as differing checks based on the type of signal. This has
a side effect of improving code readability.
o Replace direct credential authorization checks in ktrace() with
invocation of p_candebug(), while maintaining the special case
check of KTR_ROOT. This allows ktrace() to "play more nicely"
with new mandatory access control schemes, as well as making its
authorization checks consistent with other "debugging class"
checks.
o Eliminate "privused" construct for p_can*() calls which allowed the
caller to determine if privilege was required for successful
evaluation of the access control check. This primitive is currently
unused, and as such, serves only to complicate the API.

Approved by: ({procfs,linprocfs} changes) des
Obtained from: TrustedBSD Project


# 99d300a1 23-May-2001 Ruslan Ermilov <ru@FreeBSD.org>

- FDESC, FIFO, NULL, PORTAL, PROC, UMAP and UNION file
systems were repo-copied from sys/miscfs to sys/fs.

- Renamed the following file systems and their modules:
fdesc -> fdescfs, portal -> portalfs, union -> unionfs.

- Renamed corresponding kernel options:
FDESC -> FDESCFS, PORTAL -> PORTALFS, UNION -> UNIONFS.

- Install header files for the above file systems.

- Removed bogus -I${.CURDIR}/../../sys CFLAGS from userland
Makefiles.


# 6c49a8e2 04-May-2001 John Baldwin <jhb@FreeBSD.org>

Fix a bug in the pfind() changes due to confusing the process returned by
pfind() ('pp') with the process being detached from ptrace.

Reported by: bde


# fb919e4d 01-May-2001 Mark Murray <markm@FreeBSD.org>

Undo part of the tangle of having sys/lock.h and sys/mutex.h included in
other "system" header files.

Also help the deprecation of lockmgr.h by making it a sub-include of
sys/lock.h and removing sys/lockmgr.h form kernel .c files.

Sort sys/*.h includes where possible in affected files.

OK'ed by: bde (with reservations)


# 33a9ed9d 23-Apr-2001 John Baldwin <jhb@FreeBSD.org>

Change the pfind() and zpfind() functions to lock the process that they
find before releasing the allproc lock and returning.

Reviewed by: -smp, dfr, jake


# 1005a129 28-Mar-2001 John Baldwin <jhb@FreeBSD.org>

Convert the allproc and proctree locks from lockmgr locks to sx locks.


# 731a1aea 06-Mar-2001 John Baldwin <jhb@FreeBSD.org>

- Proc locking.
- Remove some unneeded spl()'s.


# 9ed346ba 08-Feb-2001 Bosko Milekic <bmilekic@FreeBSD.org>

Change and clean the mutex lock interface.

mtx_enter(lock, type) becomes:

mtx_lock(lock) for sleep locks (MTX_DEF-initialized locks)
mtx_lock_spin(lock) for spin locks (MTX_SPIN-initialized)

similarily, for releasing a lock, we now have:

mtx_unlock(lock) for MTX_DEF and mtx_unlock_spin(lock) for MTX_SPIN.
We change the caller interface for the two different types of locks
because the semantics are entirely different for each case, and this
makes it explicitly clear and, at the same time, it rids us of the
extra `type' argument.

The enter->lock and exit->unlock change has been made with the idea
that we're "locking data" and not "entering locked code" in mind.

Further, remove all additional "flags" previously passed to the
lock acquire/release routines with the exception of two:

MTX_QUIET and MTX_NOSWITCH

The functionality of these flags is preserved and they can be passed
to the lock/unlock routines by calling the corresponding wrappers:

mtx_{lock, unlock}_flags(lock, flag(s)) and
mtx_{lock, unlock}_spin_flags(lock, flag(s)) for MTX_DEF and MTX_SPIN
locks, respectively.

Re-inline some lock acq/rel code; in the sleep lock case, we only
inline the _obtain_lock()s in order to ensure that the inlined code
fits into a cache line. In the spin lock case, we inline recursion and
actually only perform a function call if we need to spin. This change
has been made with the idea that we generally tend to avoid spin locks
and that also the spin locks that we do have and are heavily used
(i.e. sched_lock) do recurse, and therefore in an effort to reduce
function call overhead for some architectures (such as alpha), we
inline recursion for this case.

Create a new malloc type for the witness code and retire from using
the M_DEV type. The new type is called M_WITNESS and is only declared
if WITNESS is enabled.

Begin cleaning up some machdep/mutex.h code - specifically updated the
"optimized" inlined code in alpha/mutex.h and wrote MTX_LOCK_SPIN
and MTX_UNLOCK_SPIN asm macros for the i386/mutex.h as we presently
need those.

Finally, caught up to the interface changes in all sys code.

Contributors: jake, jhb, jasone (in no particular order)


# 3897ca7c 24-Jan-2001 John Baldwin <jhb@FreeBSD.org>

- Catch up to proc flag changes.
- Update stopevent() to assert that the proc lock is held when it is
held and is not recursed. Note that the STOPEVENT() macro obtains
the proc lock when calling this function.


# e9df486f 30-Dec-2000 Paul Saab <ps@FreeBSD.org>

Backout rev 1.57 & 1.58. While the previous revisions fixed
attaching to running processes, it completely breaks normal debugging.
A better fix is in the works, but cannot be properly tested until
the problem with gdb hanging the system in -current is solved.


# 894653d6 29-Dec-2000 Paul Saab <ps@FreeBSD.org>

Pass me the pointy hat. Do not hold sched_lock over psignal.

Submitted by: alfred


# 6a10f299 28-Dec-2000 Paul Saab <ps@FreeBSD.org>

Send a SIGCONT when detaching or continuing the excution of a traced
process. This fixes a problem when attaching to a process in gdb
and the process staying in the STOP'd state after quiting gdb.
This whole process seems a bit suspect, but this seems to work.

Reviewed by: peter


# 98f03f90 23-Dec-2000 Jake Burkholder <jake@FreeBSD.org>

Protect proc.p_pptr and proc.p_children/p_sibling with the
proctree_lock.

linprocfs not locked pending response from informal maintainer.

Reviewed by: jhb, -smp@


# 1f7d2501 12-Dec-2000 Kirk McKusick <mckusick@FreeBSD.org>

Change the proc information returned from the kernel so that it
no longer contains kernel specific data structures, but rather
only scalar values and structures that are already part of the
kernel/user interface, specifically rusage and rtprio. It no
longer contains proc, session, pcred, ucred, procsig, vmspace,
pstats, mtx, sigiolst, klist, callout, pasleep, or mdproc. If
any of these changed in size, ps, w, fstat, gcore, systat, and
top would all stop working. The new structure has over 200 bytes
of unassigned space for future values to be added, yet is nearly
100 bytes smaller per entry than the structure that it replaced.


# 0ebabc93 01-Dec-2000 John Baldwin <jhb@FreeBSD.org>

Protect p_stat with sched_lock.


# 2ec40c9a 13-Oct-2000 John W. De Boskey <jwd@FreeBSD.org>

Remove the signal value check from the PT_STEP codepath. It
can cause an bogus failure.

Reviewed by: Sean Eric Fagan <sef@kithrup.com>
and no other response to the review request.


# 387d2c03 29-Aug-2000 Robert Watson <rwatson@FreeBSD.org>

o Centralize inter-process access control, introducing:

int p_can(p1, p2, operation, privused)

which allows specification of subject process, object process,
inter-process operation, and an optional call-by-reference privused
flag, allowing the caller to determine if privilege was required
for the call to succeed. This allows jail, kern.ps_showallprocs and
regular credential-based interaction checks to occur in one block of
code. Possible operations are P_CAN_SEE, P_CAN_SCHED, P_CAN_KILL,
and P_CAN_DEBUG. p_can currently breaks out as a wrapper to a
series of static function checks in kern_prot, which should not
be invoked directly.

o Commented out capabilities entries are included for some checks.

o Update most inter-process authorization to make use of p_can() instead
of manual checks, PRISON_CHECK(), P_TRESPASS(), and
kern.ps_showallprocs.

o Modify suser{,_xxx} to use const arguments, as it no longer modifies
process flags due to the disabling of ASU.

o Modify some checks/errors in procfs so that ENOENT is returned instead
of ESRCH, further improving concealment of processes that should not
be visible to other processes. Also introduce new access checks to
improve hiding of processes for procfs_lookup(), procfs_getattr(),
procfs_readdir(). Correct a bug reported by bp concerning not
handling the CREATE case in procfs_lookup(). Remove volatile flag in
procfs that caused apparently spurious qualifier warnigns (approved by
bde).

o Add comment noting that ktrace() has not been updated, as its access
control checks are different from ptrace(), whereas they should
probably be the same. Further discussion should happen on this topic.

Reviewed by: bde, green, phk, freebsd-security, others
Approved by: bde
Obtained from: TrustedBSD Project


# a9e0361b 21-Nov-1999 Poul-Henning Kamp <phk@FreeBSD.org>

Introduce the new function
p_trespass(struct proc *p1, struct proc *p2)
which returns zero or an errno depending on the legality of p1 trespassing
on p2.

Replace kern_sig.c:CANSIGNAL() with call to p_trespass() and one
extra signal related check.

Replace procfs.h:CHECKIO() macros with calls to p_trespass().

Only show command lines to process which can trespass on the target
process.


# 923502ff 29-Oct-1999 Poul-Henning Kamp <phk@FreeBSD.org>

useracc() the prequel:

Merge the contents (less some trivial bordering the silly comments)
of <vm/vm_prot.h> and <vm/vm_inherit.h> into <vm/vm.h>. This puts
the #defines for the vm_inherit_t and vm_prot_t types next to their
typedefs.

This paves the road for the commit to follow shortly: change
useracc() to use VM_PROT_{READ|WRITE} rather than B_{READ|WRITE}
as argument.


# d1f088da 11-Oct-1999 Peter Wemm <peter@FreeBSD.org>

Trim unused options (or #ifdef for undoc options).

Submitted by: phk


# c3aac50f 27-Aug-1999 Peter Wemm <peter@FreeBSD.org>

$Id$ -> $FreeBSD$


# ab001a72 08-Jul-1999 Jonathan Lemon <jlemon@FreeBSD.org>

Implement support for hardware debug registers on the i386.

Submitted by: Brian Dean <brdean@unx.sas.com>


# 7a0dde68 01-Jul-1999 Peter Wemm <peter@FreeBSD.org>

Moving the initialization for write sooner quiets a warning.


# 75c13541 28-Apr-1999 Poul-Henning Kamp <phk@FreeBSD.org>

This Implements the mumbled about "Jail" feature.

This is a seriously beefed up chroot kind of thing. The process
is jailed along the same lines as a chroot does it, but with
additional tough restrictions imposed on what the superuser can do.

For all I know, it is safe to hand over the root bit inside a
prison to the customer living in that prison, this is what
it was developed for in fact: "real virtual servers".

Each prison has an ip number associated with it, which all IP
communications will be coerced to use and each prison has its own
hostname.

Needless to say, you need more RAM this way, but the advantage is
that each customer can run their own particular version of apache
and not stomp on the toes of their neighbors.

It generally does what one would expect, but setting up a jail
still takes a little knowledge.

A few notes:

I have no scripts for setting up a jail, don't ask me for them.

The IP number should be an alias on one of the interfaces.

mount a /proc in each jail, it will make ps more useable.

/proc/<pid>/status tells the hostname of the prison for
jailed processes.

Quotas are only sensible if you have a mountpoint per prison.

There are no privisions for stopping resource-hogging.

Some "#ifdef INET" and similar may be missing (send patches!)

If somebody wants to take it from here and develop it into
more of a "virtual machine" they should be most welcome!

Tools, comments, patches & documentation most welcome.

Have fun...

Sponsored by: http://www.rndassociates.com/
Run for almost a year by: http://www.servetheweb.com/


# f711d546 27-Apr-1999 Poul-Henning Kamp <phk@FreeBSD.org>

Suser() simplification:

1:
s/suser/suser_xxx/

2:
Add new function: suser(struct proc *), prototyped in <sys/proc.h>.

3:
s/suser_xxx(\([a-zA-Z0-9_]*\)->p_ucred, \&\1->p_acflag)/suser(\1)/

The remaining suser_xxx() calls will be scrutinized and dealt with
later.

There may be some unneeded #include <sys/cred.h>, but they are left
as an exercise for Bruce.

More changes to the suser() API will come along with the "jail" code.


# 67e7cb89 29-Mar-1999 Doug Rabson <dfr@FreeBSD.org>

Call ptrace_u_check with the right size.


# d254af07 27-Jan-1999 Matthew Dillon <dillon@FreeBSD.org>

Fix warnings in preparation for adding -Wall -Wcast-qual to the
kernel compile


# dae63452 26-Dec-1998 Doug Rabson <dfr@FreeBSD.org>

Tweak ptrace(PT_READ_U) so that the last alpha register can be read.


# 8a8a13c8 29-Jul-1998 Doug Rabson <dfr@FreeBSD.org>

Only access an int for READU/WRITEU since that is what ptrace is declared to
return.


# 6a206dd9 14-Jul-1998 Bruce Evans <bde@FreeBSD.org>

Cast function pointers to uintfptr_t before casting them to u_long.
Hopefully caddr_t is large enough to hold function pointers.

Cast object pointers to uintptr_t before casting them to u_long.

Types are wronger than usual for the PT_READ_U case. ptrace() can
only return ints, but longs are accessed.


# ecbb00a2 07-Jun-1998 Doug Rabson <dfr@FreeBSD.org>

This commit fixes various 64bit portability problems required for
FreeBSD/alpha. The most significant item is to change the command
argument to ioctl functions from int to u_long. This change brings us
inline with various other BSD versions. Driver writers may like to
use (__FreeBSD_version == 300003) to detect this change.

The prototype FreeBSD/alpha machdep will follow in a couple of days
time.


# afc6ea23 18-May-1998 Tor Egge <tegge@FreeBSD.org>

Disallow reading the current kernel stack. Only the user structure and
the current registers should be accessible.
Reviewed by: David Greenman <dg@root.com>


# 0b08f5f7 05-Feb-1998 Eivind Eklund <eivind@FreeBSD.org>

Back out DIAGNOSTIC changes.


# 47cfdb16 04-Feb-1998 Eivind Eklund <eivind@FreeBSD.org>

Turn DIAGNOSTIC into a new-style option.


# 2d8acc0f 22-Jan-1998 John Dyson <dyson@FreeBSD.org>

VM level code cleanups.

1) Start using TSM.
Struct procs continue to point to upages structure, after being freed.
Struct vmspace continues to point to pte object and kva space for kstack.
u_map is now superfluous.
2) vm_map's don't need to be reference counted. They always exist either
in the kernel or in a vmspace. The vmspaces are managed by reference
counts.
3) Remove the "wired" vm_map nonsense.
4) No need to keep a cache of kernel stack kva's.
5) Get rid of strange looking ++var, and change to var++.
6) Change more data structures to use our "zone" allocator. Added
struct proc, struct vmspace and struct vnode. This saves a significant
amount of kva space and physical memory. Additionally, this enables
TSM for the zone managed memory.
7) Keep ioopt disabled for now.
8) Remove the now bogus "single use" map concept.
9) Use generation counts or id's for data structures residing in TSM, where
it allows us to avoid unneeded restart overhead during traversals, where
blocking might occur.
10) Account better for memory deficits, so the pageout daemon will be able
to make enough memory available (experimental.)
11) Fix some vnode locking problems. (From Tor, I think.)
12) Add a check in ufs_lookup, to avoid lots of unneeded calls to bcmp.
(experimental.)
13) Significantly shrink, cleanup, and make slightly faster the vm_fault.c
code. Use generation counts, get rid of unneded collpase operations,
and clean up the cluster code.
14) Make vm_zone more suitable for TSM.

This commit is partially as a result of discussions and contributions from
other people, including DG, Tor Egge, PHK, and probably others that I
have forgotten to attribute (so let me know, if I forgot.)

This is not the infamous, final cleanup of the vnode stuff, but a necessary
step. Vnode mgmt should be correct, but things might still change, and
there is still some missing stuff (like ioopt, and physical backing of
non-merged cache files, debugging of layering concepts.)


# 2a024a2b 05-Dec-1997 Sean Eric Fagan <sef@FreeBSD.org>

Changes to allow event-based process monitoring and control.


# d72ec665 11-Nov-1997 Tor Egge <tegge@FreeBSD.org>

Set return value for the correct process in ptrace().


# cb226aaa 06-Nov-1997 Poul-Henning Kamp <phk@FreeBSD.org>

Move the "retval" (3rd) parameter from all syscall functions and put
it in struct proc instead.

This fixes a boatload of compiler warning, and removes a lot of cruft
from the sources.

I have not removed the /*ARGSUSED*/, they will require some looking at.

libkvm, ps and other userland struct proc frobbing programs will need
recompiled.


# e4ba6a82 02-Sep-1997 Bruce Evans <bde@FreeBSD.org>

Removed unused #includes.


# cf72998e 27-Apr-1997 Alexander Langer <alex@FreeBSD.org>

Remove bogon from previous commit: doubly included sys/systm.h.


# ee7877df 27-Apr-1997 Alexander Langer <alex@FreeBSD.org>

Prevent debugger attachment to init when securelevel > 0.

Noticed by: Brian Buchanan <brian@wasteland.calbbs.com>


# 6875d254 22-Feb-1997 Peter Wemm <peter@FreeBSD.org>

Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not
ready for it yet.


# 996c772f 09-Feb-1997 John Dyson <dyson@FreeBSD.org>

This is the kernel Lite/2 commit. There are some requisite userland
changes, so don't expect to be able to run the kernel as-is (very well)
without the appropriate Lite/2 userland changes.

The system boots and can mount UFS filesystems.

Untested: ext2fs, msdosfs, NFS
Known problems: Incorrect Berkeley ID strings in some files.
Mount_std mounts will not work until the getfsent
library routine is changed.

Reviewed by: various people
Submitted by: Jeffery Hsu <hsu@freebsd.org>


# 1130b656 14-Jan-1997 Jordan K. Hubbard <jkh@FreeBSD.org>

Make the long-awaited change from $Id$ to $FreeBSD$

This will make a number of things easier in the future, as well as (finally!)
avoiding the Id-smashing problem which has plagued developers for so long.

Boy, I'm glad we're not using sup anymore. This update would have been
insane otherwise.


# 4ebce1e9 02-Jun-1996 John Dyson <dyson@FreeBSD.org>

Remove the now-unnecessary and incorrect wiring of the "other" processes
page table pages. The pmap layer now handles that fully.


# e911eafc 02-May-1996 Poul-Henning Kamp <phk@FreeBSD.org>

removed:
CLBYTES PD_SHIFT PGSHIFT NBPG PGOFSET CLSIZELOG2 CLSIZE pdei()
ptei() kvtopte() ptetov() ispt() ptetoav() &c &c
new:
NPDEPG

Major macro cleanup.


# 2eb80d36 30-Mar-1996 Peter Wemm <peter@FreeBSD.org>

Because of the way that ptrace() now calls procfs routines to read/write
the process's memory, it was possible for the procfs_domem() call to
return a residual leftover, but with no errno. Since this is no good for
ptrace which ignored the the residual, remap a leftover amount into an
errno rather than fooling the caller into thinking it was successful when
in fact it was not.

Submitted by: bde (a very long time ago :-)


# b0281cef 24-Jan-1996 Peter Wemm <peter@FreeBSD.org>

Major fixes for ptrace()...

PT_ATTACH/PT_DETACH implemented now and fully operational.
PT_{GET|SET}{REGS|FPREFS} implemented now, using code shared with procfs
PT_{READ|WRITE}_{I|D} now uses code shared with procfs
ptrace opcodes now fully permission checked, including ownerships.
doing an operation to the u-area on a swapped process should no longer
panic.
running gdb as root works for me now, where it didn't before.
general cleanup..

Note, that this has some tightening of permissions/access checks etc.
Some of these may be going too far.. In particular, the "owner" of the
traced process is enforced. The process that created or attached to
the traced process is now the only one that can "do" things to it.


# bd7e5f99 18-Jan-1996 John Dyson <dyson@FreeBSD.org>

Eliminated many redundant vm_map_lookup operations for vm_mmap.
Speed up for vfs_bio -- addition of a routine bqrelse to greatly diminish
overhead for merged cache.
Efficiency improvement for vfs_cluster. It used to do alot of redundant
calls to cluster_rbuild.
Correct the ordering for vrele of .text and release of credentials.
Use the selective tlb update for 486/586/P6.
Numerous fixes to the size of objects allocated for files. Additionally,
fixes in the various pagers.
Fixes for proper positioning of vnode_pager_setsize in msdosfs and ext2fs.
Fixes in the swap pager for exhausted resources. The pageout code
will not as readily thrash.
Change the page queue flags (PG_ACTIVE, PG_INACTIVE, PG_FREE, PG_CACHE) into
page queue indices (PQ_ACTIVE, PQ_INACTIVE, PQ_FREE, PQ_CACHE),
thereby improving efficiency of several routines.
Eliminate even more unnecessary vm_page_protect operations.
Significantly speed up process forks.
Make vm_object_page_clean more efficient, thereby eliminating the pause
that happens every 30seconds.
Make sequential clustered writes B_ASYNC instead of B_DELWRI even in the
case of filesystems mounted async.
Fix a panic with busy pages when write clustering is done for non-VMIO
buffers.


# 63c8f421 16-Dec-1995 Bruce Evans <bde@FreeBSD.org>

Updated to match 1TB filesize changes. Some pindexes were still offsets
and weren't converted. ptrace() was broken.


# 5add07e5 16-Dec-1995 Bruce Evans <bde@FreeBSD.org>

Removed dead debugging code.


# efeaf95a 06-Dec-1995 David Greenman <dg@FreeBSD.org>

Untangled the vm.h include file spaghetti.


# 45a4ad11 14-Nov-1995 Poul-Henning Kamp <phk@FreeBSD.org>

Move the process-table stuff to a more suitable file.
Remove filetable stuff from kern_sysctl.c


# d2d3e875 11-Nov-1995 Bruce Evans <bde@FreeBSD.org>

Included <sys/sysproto.h> to get central declarations for syscall args
structs and prototypes for syscalls.

Ifdefed duplicated decentralized declarations of args structs. It's
convenient to have this visible but they are hard to maintain. Some
are already different from the central declarations. 4.4lite2 puts
them in comments in the function headers but I wanted to avoid the
large changes for that.


# 9b2e5354 30-May-1995 Rodney W. Grimes <rgrimes@FreeBSD.org>

Remove trailing whitespace.


# c4cf09ff 12-May-1995 David Greenman <dg@FreeBSD.org>

pread/pwrite() should be static.

Submitted by: sef


# b5e8ce9f 16-Mar-1995 Bruce Evans <bde@FreeBSD.org>

Add and move declarations to fix all of the warnings from `gcc -Wimplicit'
(except in netccitt, netiso and netns) and most of the warnings from
`gcc -Wnested-externs'. Fix all the bugs found. There were no serious
ones.


# 9219d44e 18-Feb-1995 David Greenman <dg@FreeBSD.org>

Truncate the pte address to a page boundry. This probably won't fix the
panic, but at least it's more correct.


# eb2463e1 15-Feb-1995 David Greenman <dg@FreeBSD.org>

Fixed botched previous change - use 'pageno' not initialized to NULL 'kva'.

Submitted by: Lars Fredriksen


# 914a63eb 10-Feb-1995 David Greenman <dg@FreeBSD.org>

Wire the page table before doing the vm_fault(). Fixes a panic that
happens when using gdb.

Submitted by: John Dyson


# 20415301 14-Jan-1995 Bruce Evans <bde@FreeBSD.org>

Fix security holes in sigreturn(), ptrace() and procfs. sigreturn()
attempted to check for insecure and fatal eflags and segment
selectors, but missed many cases and got the IOPL check back to
front. The other syscalls didn't check at all.

sys_process.c, machdep.c:
Only allow PT_WRITE_U to write to the registers (ordinary and FP).

psl.h, locore.s, machdep.c:
Eliminate PSL_MBZ, PSL_MBO and PSL_USERCLR. We are not supposed
to assume anything about the reserved bits. Use PSL_USERCHANGE
and PSL_KERNEL instead. Rename PSL_USERSET to PSL_USER.

exception.s:
Define a private label for use by doreti when returning to user
mode fails.

machdep.c:
In syscalls, allow changing only the eflags that can be changed on
486's in user mode (no longer attempt to allow benign IOPL changes;
allow changing the nasty PSL_NT; don't allow changing the i586
bits).

Don't attempt to check all the cases involving invalid selectors
and %eip's. Just check for privilege violations and let the invalid
things cause a trap.

procfs_machdep.c:
Call the ptrace register functions to do all the work for reading
and writing ordinary registers and for single stepping.

trap.c:
Ignore traps caused by PSL_NT being set. Previously, users could
cause a fatal trap in user mode by setting PSL_NT and executing an
iret, and a fatal trap in kernel mode by setting PSL_NT and making
a syscall. PSL_NT was cleared too late and not in enough modes to
fix the problem.

Make all traps in user mode (except T_NMI) nonfatal.

Recover from traps caused by attempting to load invalid user
registers in doreti by restarting the traps so that they appear to
occur in user mode.
---

Fix bogons that I noticed while fixing the above:

psl.h:
Fix some comments.

Uniformize idempotency ifdef.

exception.s, machdep.c:
Remove rsvd[0-14]. rsvd0 hasn't been reserved since the 486 came
out. Replace rsvd0 by `align'. rsvd[0-11] used wrong (magic
non-unique) trap numbers. Replace rsvd[1-14] by rsvd.

locore.s:
Enable alignment check flag on 486's and 586's.

machdep.c:
Use a better type for kstack[].

Use TFREGP() to find the registers.

Reformat ptrace functions from SEF to something closer to KNF.

procfs_machdep.c:
The wrong pointer to the registers got fixed as a side effect.

Implement reading and writing of FP registers.

/proc/*/*regs now work (only) for processes that are in memory.

Clean up comments.

trap.c, trap.h:
Remove unused trap types.


# bb56ec4a 25-Sep-1994 Poul-Henning Kamp <phk@FreeBSD.org>

While in the real world, I had a bad case of being swapped out for a lot of
cycles. While waiting there I added a lot of the extra ()'s I have, (I have
never used LISP to any extent). So I compiled the kernel with -Wall and
shut up a lot of "suggest you add ()'s", removed a bunch of unused var's
and added a couple of declarations here and there. Having a lap-top is
highly recommended. My kernel still runs, yell at me if you kernel breaks.


# f23b4c91 18-Aug-1994 Garrett Wollman <wollman@FreeBSD.org>

Fix up some sloppy coding practices:

- Delete redundant declarations.
- Add -Wredundant-declarations to Makefile.i386 so they don't come back.
- Delete sloppy COMMON-style declarations of uninitialized data in
header files.
- Add a few prototypes.
- Clean up warnings resulting from the above.

NB: ioconf.c will still generate a redundant-declaration warning, which
is unavoidable unless somebody volunteers to make `config' smarter.


# f540b106 12-Aug-1994 Garrett Wollman <wollman@FreeBSD.org>

Change all #includes to follow the current Berkeley style. Some of these
``changes'' are actually not changes at all, but CVS sometimes has trouble
telling the difference.

This also includes support for second-directory compiles. This is not
quite complete yet, as `config' doesn't yet do the right thing. You can
still make it work trivially, however, by doing the following:

rm /sys/compile
mkdir /usr/obj/sys/compile
ln -s M-. /sys/compile
cd /sys/i386/conf
config MYKERNEL
cd ../../compile/MYKERNEL
ln -s /sys @
rm machine
ln -s @/i386/include machine
make depend
make


# 4e68ceab 08-Aug-1994 David Greenman <dg@FreeBSD.org>

Process tracing code. Written by Sean Eric Fagan.

Submitted by: Sean Eric Fagan


# 3c4dd356 02-Aug-1994 David Greenman <dg@FreeBSD.org>

Added $Id$


# 26f9a767 25-May-1994 Rodney W. Grimes <rgrimes@FreeBSD.org>

The big 4.4BSD Lite to FreeBSD 2.0.0 (Development) patch.

Reviewed by: Rodney W. Grimes
Submitted by: John Dyson and David Greenman


# df8bae1d 24-May-1994 Rodney W. Grimes <rgrimes@FreeBSD.org>

BSD 4.4 Lite Kernel Sources