History log of /freebsd-10.1-release/sys/fs/procfs/
Revision Date Author Comments
272461 03-Oct-2014 gjb

Copy stable/10@r272459 to releng/10.1 as part of
the 10.1-RELEASE process.

Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation


256281 10-Oct-2013 gjb

Copy head (r256279) to stable/10 as part of the 10.0-RELEASE cycle.

Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation


251423 05-Jun-2013 alc

Relax the vm object locking. Use a read lock.

Sponsored by: EMC / Isilon Storage Division


248084 09-Mar-2013 attilio

Switch the vm_object mutex to be a rwlock. This will enable in the
future further optimizations where the vm_object lock will be held
in read mode most of the time the page cache resident pool of pages
are accessed for reading purposes.

The change is mostly mechanical but few notes are reported:
* The KPI changes as follow:
- VM_OBJECT_LOCK() -> VM_OBJECT_WLOCK()
- VM_OBJECT_TRYLOCK() -> VM_OBJECT_TRYWLOCK()
- VM_OBJECT_UNLOCK() -> VM_OBJECT_WUNLOCK()
- VM_OBJECT_LOCK_ASSERT(MA_OWNED) -> VM_OBJECT_ASSERT_WLOCKED()
(in order to avoid visibility of implementation details)
- The read-mode operations are added:
VM_OBJECT_RLOCK(), VM_OBJECT_TRYRLOCK(), VM_OBJECT_RUNLOCK(),
VM_OBJECT_ASSERT_RLOCKED(), VM_OBJECT_ASSERT_LOCKED()
* The vm/vm_pager.h namespace pollution avoidance (forcing requiring
sys/mutex.h in consumers directly to cater its inlining functions
using VM_OBJECT_LOCK()) imposes that all the vm/vm_pager.h
consumers now must include also sys/rwlock.h.
* zfs requires a quite convoluted fix to include FreeBSD rwlocks into
the compat layer because the name clash between FreeBSD and solaris
versions must be avoided.
At this purpose zfs redefines the vm_object locking functions
directly, isolating the FreeBSD components in specific compat stubs.

The KPI results heavilly broken by this commit. Thirdy part ports must
be updated accordingly (I can think off-hand of VirtualBox, for example).

Sponsored by: EMC / Isilon storage division
Reviewed by: jeff
Reviewed by: pjd (ZFS specific review)
Discussed with: alc
Tested by: pho


241896 22-Oct-2012 kib

Remove the support for using non-mpsafe filesystem modules.

In particular, do not lock Giant conditionally when calling into the
filesystem module, remove the VFS_LOCK_GIANT() and related
macros. Stop handling buffers belonging to non-mpsafe filesystems.

The VFS_VERSION is bumped to indicate the interface change which does
not result in the interface signatures changes.

Conducted and reviewed by: attilio
Tested by: pho


232278 29-Feb-2012 mm

Add procfs to jail-mountable filesystems.

Reviewed by: jamie
MFC after: 1 week


230145 15-Jan-2012 trociny

Abrogate nchr argument in proc_getargv() and proc_getenvv(): we always want
to read strings completely to know the actual size.

As a side effect it fixes the issue with kern.proc.args and kern.proc.env
sysctls, which didn't return the size of available data when calling
sysctl(3) with the NULL argument for oldp.

Note, in get_ps_strings(), which does actual work for proc_getargv() and
proc_getenvv(), we still have a safety limit on the size of data read in
case of a corrupted procces stack.

Suggested by: kib
MFC after: 3 days


230132 15-Jan-2012 uqs

Convert files to UTF-8


227834 22-Nov-2011 trociny

In procfs_doproccmdline() if arguments are not cashed read them from
the process stack.

Suggested by: kib
Reviewed by: kib
Tested by: pho
MFC after: 2 weeks


227393 09-Nov-2011 kib

Lock the thread lock around block that retrieves td_wmesg. Otherwise,
procfs could see a thread with assigned td_wchan but still NULL td_wmesg.

Reported and tested by: pho
MFC after: 1 week


227104 05-Nov-2011 kib

Fix typo.

MFC after: 3 days


225617 16-Sep-2011 kmacy

In order to maximize the re-usability of kernel code in user space this
patch modifies makesyscalls.sh to prefix all of the non-compatibility
calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel
entry points and all places in the code that use them. It also
fixes an additional name space collision between the kernel function
psignal and the libc function of the same name by renaming the kernel
psignal kern_psignal(). By introducing this change now we will ease future
MFCs that change syscalls.

Reviewed by: rwatson
Approved by: re (bz)


224915 16-Aug-2011 kib

Do not return success and a string "unknown" when vn_fullpath() was unable
to resolve the path of the text vnode of the process. The behaviour is
very confusing for any consumer of the procfs, in particular, java.

Reported and tested by: bf
MFC after: 2 weeks
Approved by: re (bz)


217896 26-Jan-2011 dchagin

Add macro to test the sv_flags of any process. Change some places to test
the flags instead of explicit comparing with address of known sysentvec
structures.

MFC after: 1 month


216128 02-Dec-2010 trasz

Replace pointer to "struct uidinfo" with pointer to "struct ucred"
in "struct vm_object". This is required to make it possible to account
for per-jail swap usage.

Reviewed by: kib@
Tested by: pho@
Sponsored by: FreeBSD Foundation


216120 02-Dec-2010 kib

For non-stopped threads, td_frame pointer is undefined. As a
consequence, fill_regs() and fill_fpregs() access random data, usually
on the thread kernel stack. Most often the td_frame points to the
previous frame saved by last kernel entry sequence, but this is not
guaranteed.

For /proc/<pid>/{regs,fpregs} read access, require the thread to be in
stopped state. Otherwise, return EBUSY as is done for write case.

Reported and tested by: pho
Approved by: des (procfs maintainer)
MFC after: 1 week


209062 11-Jun-2010 avg

fix a few cases where a string is passed via format argument instead of
via %s

Most of the cases looked harmless, but this is done for the sake of
correctness. In one case it even allowed to drop an intermediate buffer.

Found by: clang
MFC after: 2 week


207848 10-May-2010 kib

The thread_unsuspend() requires both process mutex and process spinlock
locked. Postpone the process unlock till the thread_unsuspend() is called.

Approved by: des (procfs maintainer)
MFC after: 1 week


207847 10-May-2010 kib

For detach procfs ctl command, also clear P_STOPPED_TRACE process stop
flag, and for each thread, TDB_SUSPEND debug flag, same as it is done by
exit1() for orphaned debugee.

Approved by: des (procfs maintainer)
MFC after: 1 week


205014 11-Mar-2010 nwhitehorn

Provide groundwork for 32-bit binary compatibility on non-x86 platforms,
for upcoming 64-bit PowerPC and MIPS support. This renames the COMPAT_IA32
option to COMPAT_FREEBSD32, removes some IA32-specific code from MI parts
of the kernel and enhances the freebsd32 compatibility code to support
big-endian platforms.

Reviewed by: kib, jhb


201954 09-Jan-2010 brooks

Update the comment on printing group membership to reflect that fact
that each groupt the process is a member of is printed rather than an
entry for each group the user could be a member of.

MFC after: 3 days


197428 23-Sep-2009 kib

Add per-process osrel node to the procfs, to allow read and set p_osrel
value for the process.

Approved by: des (procfs maintainer)
MFC after: 3 weeks


195840 24-Jul-2009 jhb

Add a new type of VM object: OBJT_SG. An OBJT_SG object is very similar to
a device pager (OBJT_DEVICE) object in that it uses fictitious pages to
provide aliases to other memory addresses. The primary difference is that
it uses an sglist(9) to determine the physical addresses for a given offset
into the object instead of invoking the d_mmap() method in a device driver.

Reviewed by: alc
Approved by: re (kensmith)
MFC after: 2 weeks


194766 23-Jun-2009 kib

Implement global and per-uid accounting of the anonymous memory. Add
rlimit RLIMIT_SWAP that limits the amount of swap that may be reserved
for the uid.

The accounting information (charge) is associated with either map entry,
or vm object backing the entry, assuming the object is the first one
in the shadow chain and entry does not require COW. Charge is moved
from entry to object on allocation of the object, e.g. during the mmap,
assuming the object is allocated, or on the first page fault on the
entry. It moves back to the entry on forks due to COW setup.

The per-entry granularity of accounting makes the charge process fair
for processes that change uid during lifetime, and decrements charge
for proper uid when region is unmapped.

The interface of vm_pager_allocate(9) is extended by adding struct ucred *,
that is used to charge appropriate uid when allocation if performed by
kernel, e.g. md(4).

Several syscalls, among them is fork(2), may now return ENOMEM when
global or per-uid limits are enforced.

In collaboration with: pho
Reviewed by: alc
Approved by: re (kensmith)


192895 27-May-2009 jamie

Add hierarchical jails. A jail may further virtualize its environment
by creating a child jail, which is visible to that jail and to any
parent jails. Child jails may be restricted more than their parents,
but never less. Jail names reflect this hierarchy, being MIB-style
dot-separated strings.

Every thread now points to a jail, the default being prison0, which
contains information about the physical system. Prison0's root
directory is the same as rootvnode; its hostname is the same as the
global hostname, and its securelevel replaces the global securelevel.
Note that the variable "securelevel" has actually gone away, which
should not cause any problems for code that properly uses
securelevel_gt() and securelevel_ge().

Some jail-related permissions that were kept in global variables and
set via sysctls are now per-jail settings. The sysctls still exist for
backward compatibility, used only by the now-deprecated jail(2) system
call.

Approved by: bz (mentor)


189282 02-Mar-2009 kib

Use the p_sysent->sv_flags flag SV_ILP32 to detect 32bit process
executing on 64bit kernel. This eliminates the direct comparisions
of p_sysent with &ia32_freebsd_sysvec, that were left intact after
r185169.


188677 16-Feb-2009 des

Fix a logic bug that caused the pfs_attr method to be called only for
PFS_PROCDEP nodes.

Submitted by: Andrew Brampton <brampton@gmail.com>
MFC after: 2 weeks


186563 29-Dec-2008 kib

vm_map_lock_read() does not increment map->timestamp, so we should
compare map->timestamp with saved timestamp after map read lock is
reacquired, not with saved timestamp + 1. The only consequence of the +1
was unconditional lookup of the next map entry, though.

Tested by: pho
Approved by: des
MFC after: 2 weeks


186562 29-Dec-2008 kib

Use curproc->p_sysent->sv_flags bit SV_ILP32 for detection of the 32 bit
caller, instead of direct comparision with ia32_freebsd_sysvec.

Tested by: pho
Approved by: des
MFC after: 2 weeks


185984 12-Dec-2008 kib

Reference the vmspace of the process being inspected by procfs, linprocfs
and sysctl kern_proc_vmmap handlers.

Reported and tested by: pho
Reviewed by: rwatson, des
MFC after: 1 week


185864 10-Dec-2008 kib

Relock user map earlier, to have the lock held when break leaves the
loop earlier due to sbuf error.

Pointy hat to: me
Submitted by: dchagin


185766 08-Dec-2008 kib

Make two style changes to create new commit and document proper commit
message for r185765.

Noted by: rdivacky
Requested by: des

Commit message for r185765 should be:
In procfs map handler, and in linprocfs maps handler, do not call
vn_fullpath() while having vm map locked. This is done in anticipation
of the vop_vptocnp commit, that would make vn_fullpath sometime
acquire vnode lock.

Also, in linprocfs, maps handler already acquires vnode lock.

No objections from: des
MFC after: 2 week


185765 08-Dec-2008 kib

Change the linprocfs <pid>/maps and procfs <pid>/map handlers to use
sbuf instead of doing uiomove. This allows for reads from non-zero
offsets to work.

Patch is forward-ported des@' one, and was adopted to current code
by dchagin@ and me.

Reviewed by: des (linprocfs part)
PR: kern/101453
MFC after: 1 week


184652 04-Nov-2008 jhb

Remove unnecessary locking around vn_fullpath(). The vnode lock for the
vnode in question does not need to be held. All the data structures used
during the name lookup are protected by the global name cache lock.
Instead, the caller merely needs to ensure a reference is held on the
vnode (such as vhold()) to keep it from being freed.

In the case of procfs' <pid>/file entry, grab the process lock while we
gain a new reference (via vhold()) on p_textvp to fully close races with
execve(2).

For the kern.proc.vmmap sysctl handler, use a shared vnode lock around
the call to VOP_GETATTR() rather than an exclusive lock.

MFC after: 1 month


183600 04-Oct-2008 kib

Change the linprocfs <pid>/maps and procfs <pid>/map handlers to use
sbuf instead of doing uiomove. This allows for reads from non-zero
offsets to work.

Patch is forward-ported des@' one, and was adopted to current code
by dchagin@ and me.

Reviewed by: des (linprocfs part)
PR: kern/101453
MFC after: 1 week


177091 12-Mar-2008 jeff

Remove kernel support for M:N threading.

While the KSE project was quite successful in bringing threading to
FreeBSD, the M:N approach taken by the kse library was never developed
to its full potential. Backwards compatibility will be provided via
libmap.conf for dynamically linked binaries and static binaries will
be broken.


175294 13-Jan-2008 attilio

VOP_LOCK1() (and so VOP_LOCK()) and VOP_UNLOCK() are only used in
conjuction with 'thread' argument passing which is always curthread.
Remove the unuseful extra-argument and pass explicitly curthread to lower
layer functions, when necessary.

KPI results broken by this change, which should affect several ports, so
version bumping and manpage update will be further committed.

Tested by: kris, pho, Diego Sardina <siarodx at gmail dot com>


175202 10-Jan-2008 attilio

vn_lock() is currently only used with the 'curthread' passed as argument.
Remove this argument and pass curthread directly to underlying
VOP_LOCK1() VFS method. This modify makes the code cleaner and in
particular remove an annoying dependence helping next lockmgr() cleanup.
KPI results, obviously, changed.

Manpage and FreeBSD_version will be updated through further commits.

As a side note, would be valuable to say that next commits will address
a similar cleanup about VFS methods, in particular vop_lock1 and
vop_unlock.

Tested by: Diego Sardina <siarodx at gmail dot com>,
Andrea Di Pasquale <whyx dot it at gmail dot com>


172207 17-Sep-2007 jeff

- Move all of the PS_ flags into either p_flag or td_flags.
- p_sflag was mostly protected by PROC_LOCK rather than the PROC_SLOCK or
previously the sched_lock. These bugs have existed for some time.
- Allow swapout to try each thread in a process individually and then
swapin the whole process if any of these fail. This allows us to move
most scheduler related swap flags into td_flags.
- Keep ki_sflag for backwards compat but change all in source tools to
use the new and more correct location of P_INMEM.

Reported by: pho
Reviewed by: attilio, kib
Approved by: re (kensmith)


170587 12-Jun-2007 rwatson

Eliminate now-unused SUSER_ALLOWJAIL arguments to priv_check_cred(); in
some cases, move to priv_check() if it was an operation on a thread and
no other flags were present.

Eliminate caller-side jail exception checking (also now-unused); jail
privilege exception code now goes solely in kern_jail.c.

We can't yet eliminate suser() due to some cases in the KAME code where
a privilege check is performed and then used in many different deferred
paths. Do, however, move those prototypes to priv.h.

Reviewed by: csjp
Obtained from: TrustedBSD Project


170472 09-Jun-2007 attilio

rufetch and calcru sometimes should be called atomically together.
This patch fixes places where they should be called atomically changing
their locking requirements (both assume per-proc spinlock held) and
introducing rufetchcalc which wrappers both calls to be performed in
atomic way.

Reviewed by: jeff
Approved by: jeff (mentor)


170307 05-Jun-2007 jeff

Commit 14/14 of sched_lock decomposition.
- Use thread_lock() rather than sched_lock for per-thread scheduling
sychronization.
- Use the per-process spinlock rather than the sched_lock for per-process
scheduling synchronization.

Tested by: kris, current@
Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)


169168 01-May-2007 des

The process lock is held when procfs_ioctl() is called. Assert that this
is so, and PHOLD the process while sleeping since msleep() will release
the lock.


168968 23-Apr-2007 alc

Add synchronization. Eliminate the acquisition and release of Giant.

Reviewed by: tegge


168763 15-Apr-2007 des

Instead of stating GIANT_REQUIRED, just acquire and release Giant where
needed. This does not make a difference now, but will when procfs is
marked MPSAFE.


168759 15-Apr-2007 des

Fix the same bug as in procfs_doproc{,db}regs(): check that uio_offset is
0 upon entry, and don't reset it before returning.

MFC after: 3 weeks


168758 15-Apr-2007 des

Don't reset uio_offset to 0 before returning. Instead, refuse to service
requests where uio_offset is not 0 to begin with. This fixes a long-
standing bug where e.g. 'cat /proc/$$/regs' would loop forever.

MFC after: 3 weeks


167482 12-Mar-2007 des

Add a pn_destroy field to pfs_node. This field points to a destructor
function which is called from pfs_destroy() before the node is reclaimed.

Modify pfs_create_{dir,file,link}() to accept a pointer to a destructor
function in addition to the usual attr / fill / vis pointers.

This breaks both the programming and binary interfaces between pseudofs
and its consumers. It is believed that there are no pseudofs consumers
outside the source tree, so that the impact of this change is minimal.

Submitted by: Aniruddha Bohra <bohra@cs.rutgers.edu>


166826 19-Feb-2007 rwatson

Do allow PIOCSFL in jail for setguid processes; this is more consistent
with other debugging checks elsewhere. XXX comment on the fact that
p_candebug() is not being used here remains.


166548 07-Feb-2007 kib

Fix the race of dereferencing /proc/<pid>/file with execve(2) by caching
the value of p_textvp. This way, we always unlock the locked vnode.
While there, vhold() the vnode around the vn_lock().

Reported and tested by: Guy Helmer (ghelmer palisadesys com)
Approved by: des (procfs maintainer)
MFC after: 1 week


164936 06-Dec-2006 julian

Threading cleanup.. part 2 of several.

Make part of John Birrell's KSE patch permanent..
Specifically, remove:
Any reference of the ksegrp structure. This feature was
never fully utilised and made things overly complicated.
All code in the scheduler that tried to make threaded programs
fair to unthreaded programs. Libpthread processes will already
do this to some extent and libthr processes already disable it.

Also:
Since this makes such a big change to the scheduler(s), take the opportunity
to rename some structures and elements that had to be moved anyhow.
This makes the code a lot more readable.

The ULE scheduler compiles again but I have no idea if it works.

The 4bsd scheduler still reqires a little cleaning and some functions that now do
ALMOST nothing will go away, but I thought I'd do that as a separate commit.

Tested by David Xu, and Dan Eischen using libthr and libpthread.


164356 17-Nov-2006 kib

Wake up PIOCWAIT handler on the process exit in addition to the stop
events. &p->p_stype is explicitely woken up on process exit for us.

Now, truss /nonexistent exits with error instead of waiting until killed
by signal.

Reported by: Nikos Vassiliadis nvass at teledomenet gr
Reviewed by: jhb
MFC after: 1 week


164033 06-Nov-2006 rwatson

Sweep kernel replacing suser(9) calls with priv(9) calls, assigning
specific privilege names to a broad range of privileges. These may
require some future tweaking.

Sponsored by: nCircle Network Security, Inc.
Obtained from: TrustedBSD Project
Discussed on: arch@
Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri,
Alex Lyashkov <umka at sevcity dot net>,
Skip Ford <skip dot ford at verizon dot net>,
Antoine Brodin <antoine dot brodin at laposte dot net>


163709 26-Oct-2006 jb

Make KSE a kernel option, turned on by default in all GENERIC
kernel configs except sun4v (which doesn't process signals properly
with KSE).

Reviewed by: davidxu@


162711 27-Sep-2006 ru

Fix our ioctl(2) implementation when the argument is "int". New
ioctls passing integer arguments should use the _IOWINT() macro.
This fixes a lot of ioctl's not working on sparc64, most notable
being keyboard/syscons ioctls.

Full ABI compatibility is provided, with the bonus of fixing the
handling of old ioctls on sparc64.

Reviewed by: bde (with contributions)
Tested by: emax, marius
MFC after: 1 week


159283 05-Jun-2006 ghelmer

Upon further review, DES prefers this change over that in revision 1.13
to resolve the directory access problem for processes with P_SUGID flag
set.

Suggested by: des


158880 24-May-2006 ghelmer

Revision 1.4 set access for all sensitive files in /proc/<PID> to mode 0
if a process's uid or gid has changed, but the /proc/<PID> directory
itself was also set to mode 0. Assuming this doesn't open any
security holes, open access to the /proc/<PID> directory for users
other than root to read or search the directory.

Reviewed by: des (back in February)
MFC after: 3 weeks


155918 22-Feb-2006 jhb

Hold the proc lock while calling proc_sstep() since the function asserts
it and remove a PRELE() that didn't have a matching PHOLD(). The calling
code already has a PHOLD anyway.

MFC after: 1 week


153706 24-Dec-2005 trhodes

Make tv_sec a time_t on all platforms but alpha. Brings us more in line with
POSIX. This also makes the struct correct we ever implement an i386-time64
architecture. Not that we need too.

Reviewed by: imp, brooks
Approved by: njl (acpica), des (no objects, touches procfs)
Tested with: make universe


151316 14-Oct-2005 davidxu

1. Change prototype of trapsignal and sendsig to use ksiginfo_t *, most
changes in MD code are trivial, before this change, trapsignal and
sendsig use discrete parameters, now they uses member fields of
ksiginfo_t structure. For sendsig, this change allows us to pass
POSIX realtime signal value to user code.

2. Remove cpu_thread_siginfo, it is no longer needed because we now always
generate ksiginfo_t data and feed it to libpthread.

3. Add p_sigqueue to proc structure to hold shared signals which were
blocked by all threads in the proc.

4. Add td_sigqueue to thread structure to hold all signals delivered to
thread.

5. i386 and amd64 now return POSIX standard si_code, other arches will
be fixed.

6. In this sigqueue implementation, pending signal set is kept as before,
an extra siginfo list holds additional siginfo_t data for signals.
kernel code uses psignal() still behavior as before, it won't be failed
even under memory pressure, only exception is when deleting a signal,
we should call sigqueue_delete to remove signal from sigqueue but
not SIGDELSET. Current there is no kernel code will deliver a signal
with additional data, so kernel should be as stable as before,
a ksiginfo can carry more information, for example, allow signal to
be delivered but throw away siginfo data if memory is not enough.
SIGKILL and SIGSTOP have fast path in sigqueue_add, because they can
not be caught or masked.
The sigqueue() syscall allows user code to queue a signal to target
process, if resource is unavailable, EAGAIN will be returned as
specification said.
Just before thread exits, signal queue memory will be freed by
sigqueue_flush.
Current, all signals are allowed to be queued, not only realtime signals.

Earlier patch reviewed by: jhb, deischen
Tested on: i386, amd64


147692 30-Jun-2005 peter

Jumbo-commit to enhance 32 bit application support on 64 bit kernels.
This is good enough to be able to run a RELENG_4 gdb binary against
a RELENG_4 application, along with various other tools (eg: 4.x gcore).
We use this at work.

ia32_reg.[ch]: handle the 32 bit register file format, used by ptrace,
procfs and core dumps.
procfs_*regs.c: vary the format of proc/XXX/*regs depending on the client
and target application.
procfs_map.c: Don't print a 64 bit value to 32 bit consumers, or their
sscanf fails. They expect an unsigned long.
imgact_elf.c: produce a valid 32 bit coredump for 32 bit apps.
sys_process.c: handle 32 bit consumers debugging 32 bit targets. Note
that 64 bit consumers can still debug 32 bit targets.

IA64 has got stubs for ia32_reg.c.

Known limitations: a 5.x/6.x gdb uses get/setcontext(), which isn't
implemented in the 32/64 wrapper yet. We also make a tiny patch to
gdb pacify it over conflicting formats of ld-elf.so.1.

Approved by: re


147676 30-Jun-2005 peter

Conditionally weaken sys_generic.c rev 1.136 to allow certain dubious
ioctl numbers in backwards compatability mode. eg: an IOC_IN ioctl with
a size of zero. Traditionally this was what you did before IOC_VOID
existed, and we had some established users of this in the tree, namely
procfs. Certain 3rd party drivers with binary userland components also
have this too.

This is necessary to have 4.x and 5.x binaries use these ioctl's. We
found this at work when trying to run 4.x binaries.

Approved by: re


143629 15-Mar-2005 phk

Don't export major,minor, instead export tty name.


139776 06-Jan-2005 imp

/* -> /*- for copyright notices, minor format tweaks as necessary


138281 01-Dec-2004 cperciva

Fix unvalidated pointer dereference. This is FreeBSD-SA-04:17.procfs.


136152 05-Oct-2004 jhb

Rework how we store process times in the kernel such that we always store
the raw values including for child process statistics and only compute the
system and user timevals on demand.

- Fix the various kern_wait() syscall wrappers to only pass in a rusage
pointer if they are going to use the result.
- Add a kern_getrusage() function for the ABI syscalls to use so that they
don't have to play stackgap games to call getrusage().
- Fix the svr4_sys_times() syscall to just call calcru() to calculate the
times it needs rather than calling getrusage() twice with associated
stackgap, etc.
- Add a new rusage_ext structure to store raw time stats such as tick counts
for user, system, and interrupt time as well as a bintime of the total
runtime. A new p_rux field in struct proc replaces the same inline fields
from struct proc (i.e. p_[isu]ticks, p_[isu]u, and p_runtime). A new p_crux
field in struct proc contains the "raw" child time usage statistics.
ruadd() has been changed to handle adding the associated rusage_ext
structures as well as the values in rusage. Effectively, the values in
rusage_ext replace the ru_utime and ru_stime values in struct rusage. These
two fields in struct rusage are no longer used in the kernel.
- calcru() has been split into a static worker function calcru1() that
calculates appropriate timevals for user and system time as well as updating
the rux_[isu]u fields of a passed in rusage_ext structure. calcru() uses a
copy of the process' p_rux structure to compute the timevals after updating
the runtime appropriately if any of the threads in that process are
currently executing. It also now only locks sched_lock internally while
doing the rux_runtime fixup. calcru() now only requires the caller to
hold the proc lock and calcru1() only requires the proc lock internally.
calcru() also no longer allows callers to ask for an interrupt timeval
since none of them actually did.
- calcru() now correctly handles threads executing on other CPUs.
- A new calccru() function computes the child system and user timevals by
calling calcru1() on p_crux. Note that this means that any code that wants
child times must now call this function rather than reading from p_cru
directly. This function also requires the proc lock.
- This finishes the locking for rusage and friends so some of the Giant locks
in exit1() and kern_wait() are now gone.
- The locking in ttyinfo() has been tweaked so that a shared lock of the
proctree lock is used to protect the process group rather than the process
group lock. By holding this lock until the end of the function we now
ensure that the process/thread that we pick to dump info about will no
longer vanish while we are trying to output its info to the console.

Submitted by: bde (mostly)
MFC after: 1 month


136004 01-Oct-2004 das

Don't PHOLD() the target process in procfs, since this is already done
in pseudofs. Moreover, PHOLD() may block between the p_candebug()
access check and the actual operation.


128019 07-Apr-2004 imp

Remove advertising clause from University of California Regent's
license, per letter dated July 22, 1999 and email from Peter Wemm,
Alan Cox and Robert Watson.

Approved by: core, peter, alc, rwatson


127694 01-Apr-2004 pjd

Remove ps_argsopen from this check, because of two reasons:
1. This check if wrong, because it is true by default
(kern.ps_argsopen is 1 by default) (p_cansee() is not even checked).
2. Sysctl kern.ps_argsopen is going away.


125454 04-Feb-2004 jhb

Locking for the per-process resource limits structure.
- struct plimit includes a mutex to protect a reference count. The plimit
structure is treated similarly to struct ucred in that is is always copy
on write, so having a reference to a structure is sufficient to read from
it without needing a further lock.
- The proc lock protects the p_limit pointer and must be held while reading
limits from a process to keep the limit structure from changing out from
under you while reading from it.
- Various global limits that are ints are not protected by a lock since
int writes are atomic on all the archs we support and thus a lock
wouldn't buy us anything.
- All accesses to individual resource limits from a process are abstracted
behind a simple lim_rlimit(), lim_max(), and lim_cur() API that return
either an rlimit, or the current or max individual limit of the specified
resource from a process.
- dosetrlimit() was renamed to kern_setrlimit() to match existing style of
other similar syscall helper functions.
- The alpha OSF/1 compat layer no longer calls getrlimit() and setrlimit()
(it didn't used the stackgap when it should have) but uses lim_rlimit()
and kern_setrlimit() instead.
- The svr4 compat no longer uses the stackgap for resource limits calls,
but uses lim_rlimit() and kern_setrlimit() instead.
- The ibcs2 compat no longer uses the stackgap for resource limits. It
also no longer uses the stackgap for accessing sysctl's for the
ibcs2_sysconf() syscall but uses kernel_sysctl() instead. As a result,
ibcs2_sysconf() no longer needs Giant.
- The p_rlimit macro no longer exists.

Submitted by: mtm (mostly, I only did a few cleanups and catchups)
Tested on: i386
Compiled on: alpha, amd64


124219 07-Jan-2004 rwatson

Lock p->p_textvp before calling vn_fullpath() on it. Note the
potential lock order concern due to the vnode lock held
simultaneously by the caller into procfs.

Reported by: kuriyama
Approved by: des


123247 07-Dec-2003 des

Minor whitespace and style issues.


121247 19-Oct-2003 mux

Remove debug printf().


120665 02-Oct-2003 nectar

Introduce a uiomove_frombuf helper routine that handles computing and
validating the offset within a given memory buffer before handing the
real work off to uiomove(9).

Use uiomove_frombuf in procfs to correct several issues with
integer arithmetic that could result in underflows/overflows. As a
side-effect, the code is significantly simplified.

Add additional sanity checks when computing a memory allocation size
in pfs_read.

Submitted by: rwatson (original uiomove_frombuf -- bugs are mine :-)
Reported by: Joost Pol <joost@pine.nl> (integer underflows/overflows)


120583 29-Sep-2003 rwatson

Add a new column to the procfs map to hold the name of the mapped
file for vnode mappings. Note that this uses vn_fullpath() and may
be somewhat unreliable, although not too unreliable for shared
libraries. For non-vnode mappings, just print "-" for the field.

Obtained from: TrustedBSD Projects
Sponsored by: DARPA, AFRL, Network Associates Laboratories


118907 14-Aug-2003 rwatson

Add p_candebug() check to access a process map file in procfs; limit
access to map information for processes that you wouldn't otherwise
have debug rights on.

Tested by: bms


116361 15-Jun-2003 davidxu

Rename P_THREADED to P_SA. P_SA means a process is using scheduler
activations.


114734 05-May-2003 rwatson

Clean up proc locking in procfs: make sure the proc lock is held before
entering sys_process.c debugging primitives, or we violate assertions.
Also, be more careful about releasing the process lock around calls
to uiomove() which may sleep waiting for paging machinations or
related notions. We may want to defer the uiomove() in at least
one case, but jhb will look into that at a later date.

Reported by: Philippe Charnier <charnier@xp11.frmug.org>
Reviewed by: jhb


114434 01-May-2003 des

Instead of recording the Unix time in a process when it starts, record the
uptime. Where necessary, convert it back to Unix time by adding boottime
to it. This fixes a potential problem in the accounting code, which would
compute the elapsed time incorrectly if the Unix time was stepped during
the lifetime of the process.


113867 22-Apr-2003 jhb

- Always call faultin() in _PHOLD() if PS_INMEM is clear. This closes a
race where a thread could assume that a process was swapped in by
PHOLD() when it actually wasn't fully swapped in yet.
- In faultin(), always msleep() if PS_SWAPPINGIN is set instead of doing
this check after bumping p_lock in the PS_INMEM == 0 case. Also,
sched_lock is only needed for setting and clearning swapping PS_*
flags and the swap thread inhibitor.
- Don't set and clear the thread swap inhibitor in the same loops as the
pmap_swapin/out_thread() since we have to do it under sched_lock.
Instead, mimic the treatment of the PS_INMEM flag and use separate loops
to set the inhibitors when clearing PS_INMEM and clear the inhibitors
when setting PS_INMEM.
- swapout() now returns with the proc lock held as it holds the lock
while adjusting the swapping-related PS_* flags so that the proc lock
can be used to test those flags.
- Only use the proc lock to check the swapping-related PS_* flags in
several places.
- faultin() no longer requires sched_lock to be held by callers.
- Rename PS_SWAPPING to PS_SWAPPINGOUT to be less ambiguous now that we
have PS_SWAPPINGIN.


113620 17-Apr-2003 jhb

- Use a local variable to close a minor race when determining if the wmesg
printed out needs a prefix such as when a thread is blocked on a lock.
- Use another local variable to close another race for the td_wmesg and
td_wchan members of struct thread.


113619 17-Apr-2003 jhb

Protect p_flag with the proc lock. The sched_lock is not needed to turn
off P_STOPPED_SIG in p_flag.


113618 17-Apr-2003 jhb

- P_SHOULDSTOP just needs proc lock now, so don't acquire sched_lock unless
it is needed.
- Add a proc lock assertion.


113617 17-Apr-2003 jhb

Add a proc lock assertion and move another assertion up to the top of the
function.


111738 02-Mar-2003 des

wakeup(9) and msleep(9) take void * arguments, not caddr_t.


111585 27-Feb-2003 julian

Change the process flags P_KSES to be P_THREADED.
This is just a cosmetic change but I've been meaning to do it for about a year.


105988 26-Oct-2002 rwatson

Slightly change the semantics of vnode labels for MAC: rather than
"refreshing" the label on the vnode before use, just get the label
right from inception. For single-label file systems, set the label
in the generic VFS getnewvnode() code; for multi-label file systems,
leave the labeling up to the file system. With UFS1/2, this means
reading the extended attribute during vfs_vget() as the inode is
pulled off disk, rather than hitting the extended attributes
frequently during operations later, improving performance. This
also corrects sematics for shared vnode locks, which were not
previously present in the system. This chances the cache
coherrency properties WRT out-of-band access to label data, but in
an acceptable form. With UFS1, there is a small race condition
during automatic extended attribute start -- this is not present
with UFS2, and occurs because EAs aren't available at vnode
inception. We'll introduce a work around for this shortly.

Approved by: re
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories


105560 20-Oct-2002 phk

Remove even more '&' from pointers to functions.

Spotted by: FlexeLint


104306 01-Oct-2002 jmallett

Back our kernel support for reliable signal queues.

Requested by: rwatson, phk, and many others


104233 30-Sep-2002 jmallett

First half of implementation of ksiginfo, signal queues, and such. This
gets signals operating based on a TailQ, and is good enough to run X11,
GNOME, and do job control. There are some intricate parts which could be
more refined to match the sigset_t versions, but those require further
evaluation of directions in which our signal system can expand and contract
to fit our needs.

After this has been in the tree for a while, I will make in kernel API
changes, most notably to trapsignal(9) and sendsig(9), to use ksiginfo
more robustly, such that we can actually pass information with our
(queued) signals to the userland. That will also result in using a
struct ksiginfo pointer, rather than a signal number, in a lot of
kern_sig.c, to refer to an individual pending signal queue member, but
right now there is no defined behaviour for such.

CODAFS is unfinished in this regard because the logic is unclear in
some places.

Sponsored by: New Gold Technology
Reviewed by: bde, tjr, jake [an older version, logic similar]


103767 21-Sep-2002 jake

Use the fields in the sysentvec and in the vm map header in place of the
constants VM_MIN_ADDRESS, VM_MAXUSER_ADDRESS, USRSTACK and PS_STRINGS.
This is mainly so that they can be variable even for the native abi, based
on different machine types. Get stack protections from the sysentvec too.
This makes it trivial to map the stack non-executable for certain abis, on
machines that support it.


103216 11-Sep-2002 julian

Completely redo thread states.

Reviewed by: davidxu@freebsd.org


102950 05-Sep-2002 davidxu

s/SGNL/SIG/
s/SNGL/SINGLE/
s/SNGLE/SINGLE/

Fix abbreviation for P_STOPPED_* etc flags, in original code they were
inconsistent and difficult to distinguish between them.

Approved by: julian (mentor)


101901 15-Aug-2002 jake

Fixed 64bit big endian bugs relating to abuse of ioctl argument passing.
This makes truss work on sparc64.


101132 01-Aug-2002 rwatson

Introduce support for Mandatory Access Control and extensible
kernel access control.

Modify procfs so that (when mounted multilabel) it exports process MAC
labels as the vnode labels of procfs vnodes associated with processes.

Approved by: des
Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs


100884 29-Jul-2002 julian

Create a new thread state to describe threads that would be ready to run
except for the fact tha they are presently swapped out. Also add a process
flag to indicate that the process has started the struggle to swap
back in. This will be needed for the case where multiple threads
start the swapin action top a collision. Also add code to stop
a process fropm being swapped out if one of the threads in this
process is actually off running on another CPU.. that might hurt...

Submitted by: Seigo Tanimura <tanimura@r.dl.itc.u-tokyo.ac.jp>


99072 29-Jun-2002 julian

Part 1 of KSE-III

The ability to schedule multiple threads per process
(one one cpu) by making ALL system calls optionally asynchronous.
to come: ia64 and power-pc patches, patches for gdb, test program (in tools)

Reviewed by: Almost everyone who counts
(at various times, peter, jhb, matt, alfred, mini, bernd,
and a cast of thousands)

NOTE: this is still Beta code, and contains lots of debugging stuff.
expect slight instability in signals..


96886 19-May-2002 jhb

Change p_can{debug,see,sched,signal}()'s first argument to be a thread
pointer instead of a proc pointer and require the process pointed to
by the second argument to be locked. We now use the thread ucred reference
for the credential checks in p_can*() as a result. p_canfoo() should now
no longer need Giant.


95210 21-Apr-2002 bde

Include <sys/systm.h> for (at least) the definition of atomic functions
which are sometimes used by the macros in <sys/mutex.h>; don't depend
on not-quite-necessary namespace pollution in <sys/mutex.h>.


95090 20-Apr-2002 rwatson

Spelling fix for comment.


94624 13-Apr-2002 jhb

- Change procfs_control()'s first argument to be a thread pointer instead
of a process pointer.
- Move the p_candebug() at the start of procfs_control() a bit to make
locking feasible. We still perform the access check before doing
anything, we just now perform it after acquiring locks.
- Don't lock the sched_lock for TRACE_WAIT_P() and when checking to see if
p_stat is SSTOP. We lock the process while setting p_stat to SSTOP
so locking the process is sufficient to do a read to see if p_stat is
SSTOP or not.


94623 13-Apr-2002 jhb

Lock the target process for p_candebug().


94622 13-Apr-2002 jhb

Lock the target process in procfs_doproc*regs() for p_candebug and while
reading/writing the registers.


94620 13-Apr-2002 jhb

- p_cansee() needs the target process locked.
- We need the proc lock held for more of procfs_doprocstatus().


93593 01-Apr-2002 jhb

Change the suser() API to take advantage of td_ucred as well as do a
general cleanup of the API. The entire API now consists of two functions
similar to the pre-KSE API. The suser() function takes a thread pointer
as its only argument. The td_ucred member of this thread must be valid
so the only valid thread pointers are curthread and a few kernel threads
such as thread0. The suser_cred() function takes a pointer to a struct
ucred as its first argument and an integer flag as its second argument.
The flag is currently only used for the PRISON_ROOT flag.

Discussed on: smp@


93393 29-Mar-2002 alfred

Protect proc struct (p_args and p_comm) when doing procfs IO that pulls
data from it.

Submitted by: Jonathan Mini <mini@haikugeek.com>


92727 19-Mar-2002 alfred

Remove __P.


91140 23-Feb-2002 tanimura

Lock struct pgrp, session and sigio.

New locks are:

- pgrpsess_lock which locks the whole pgrps and sessions,
- pg_mtx which protects the pgrp members, and
- s_mtx which protects the session members.

Please refer to sys/proc.h for the coverage of these locks.

Changes on the pgrp/session interface:

- pgfind() needs the pgrpsess_lock held.

- The caller of enterpgrp() is responsible to allocate a new pgrp and
session.

- Call enterthispgrp() in order to enter an existing pgrp.

- pgsignal() requires a pgrp lock held.

Reviewed by: jhb, alfred
Tested on: cvsup.jp.FreeBSD.org
(which is a quad-CPU machine running -current)


90873 18-Feb-2002 des

Paranoia: if the process is setugid, set all sensitive files mode 0.


90717 16-Feb-2002 bde

FIxed the following style bugs:
- clobbering of jsp's $Id$ by FreeBSD's old $Id$.
- long lines in recent KSE changes (procfs_ctl.c).
- other style bugs in KSE changes (most related to an shadowed variable
in procfs_status.c -- the td in the outer scope is obfuscated by
PFS_FILL_ARGS).

Approved by: des


90716 16-Feb-2002 bde

FIxed the following style bugs:
- clobbering of jsp's $Id$ by FreeBSD's old $Id$.
- lost Berkeley id in procfs_dbregs.c
- long lines in recent KSE changes.
- various gratuitous differences between procfs_*regs.c.


90715 16-Feb-2002 bde

Fixed missing PHOLD()/PRELE().

Obtained from: procfs_dbregs.c
Approved by: des


90361 07-Feb-2002 julian

Pre-KSE/M3 commit.
this is a low-functionality change that changes the kernel to access the main
thread of a process via the linked list of threads rather than
assuming that it is embedded in the process. It IS still embeded there
but remove all teh code that assumes that in preparation for the next commit
which will actually move it out.

Reviewed by: peter@freebsd.org, gallatin@cs.duke.edu, benno rice,


87669 11-Dec-2001 des

Remove an obsolete prototype for procfs_kmemaccess().

Submitted by: rwatson


87542 09-Dec-2001 des

Fix various bugs in the debugging code and reenable it.


87538 08-Dec-2001 des

Fix a KSEfication brain-o in procfs_doprocfile(): return the path of the target process,
not the calling process. While we're here, also unstaticize procfs_doprocfile() and
procfs_docurproc() so linprocfs can call them directly instead of duplicating them.

Submitted by: Dominic Mitchell <dom@semantico.com>


87321 04-Dec-2001 des

Pseudofsize procfs(5).


87275 03-Dec-2001 rwatson

o Introduce pr_mtx into struct prison, providing protection for the
mutable contents of struct prison (hostname, securelevel, refcount,
pr_linux, ...)
o Generally introduce mtx_lock()/mtx_unlock() calls throughout kern/
so as to enforce these protections, in particular, in kern_mib.c
protection sysctl access to the hostname and securelevel, as well as
kern_prot.c access to the securelevel for access control purposes.
o Rewrite linux emulator abstractions for accessing per-jail linux
mib entries (osname, osrelease, osversion) so that they don't return
a pointer to the text in the struct linux_prison, rather, a copy
to an array passed into the calls. Likewise, update linprocfs to
use these primitives.
o Update in_pcb.c to always use prison_getip() rather than directly
accessing struct prison.

Reviewed by: jhb


86165 07-Nov-2001 peter

Fix printf format bugs introduced in rev 1.34 for printing times.
quad_t cannot be printed with %lld on 64 bit systems.

Dont waste cpu to round user and system times up to long long, it is
highly improbable that a process will have accumulated 68 years of
user or system cpu time (not wall clock time) before a reboot or
process restart.


86136 06-Nov-2001 green

Correctly unlock the target process if /proc/$foo/mem is open()ed by
another process which cannot p_candebug() it. The bug was introduced
in rev. 1.100.

Approved by: des


85644 28-Oct-2001 dillon

Adjust printfs to be time_t agnostic.


85320 22-Oct-2001 des

No, you may not /* FALLTHROUGH */. Not only will you return an incorrect
result, but you'd corrupt the kernel malloc() arena if it weren't for a
small but life-saving optimization in ioctl().

MFC after: 1 week


85297 21-Oct-2001 des

Move procfs_* from procfs_machdep.c into sys_process.c, and rename them to
proc_* in the process; procfs_machdep.c is no longer needed.

Run-tested on i386, build-tested on Alpha, untested on other platforms.


84637 07-Oct-2001 des

Dissociate ptrace from procfs.

Until now, the ptrace syscall was implemented as a wrapper that called
various functions in procfs depending on which ptrace operation was
requested. Most of these functions were themselves wrappers around
procfs_{read,write}_{,db,fp}regs(), with only some extra error checks,
which weren't necessary in the ptrace case anyway.

This commit moves procfs_rwmem() from procfs_mem.c into sys_process.c
(renaming it to proc_rwmem() in the process), and implements ptrace()
directly in terms of procfs_{read,write}_{,db,fp}regs() instead of
having it fake up a struct uio and then call procfs_do{,db,fp}regs().

It also moves the prototypes for procfs_{read,write}_{,db,fp}regs()
and proc_rwmem() from proc.h to ptrace.h, and marks all procfs files
except procfs_machdep.c as "optional procfs" instead of "standard".


84634 07-Oct-2001 des

Remove some useless preprocesor paranoia.


84633 07-Oct-2001 des

In procfs_readdir(), when the directory being read was a process directory,
the target process was being held locked during the uiomove() call. If the
process calling readdir() was the same as the target process (for instance
'ls /proc/curproc/'), and uiomove() caused a page fault, the result would
be a proc lock recursion. I have no idea how long this has been broken -
possibly ever since pfind() was changed to lock the process it returns.

Also replace the one and only call to procfs_findtextvp() with a direct
test of td->td_proc->p_textvp.


83920 25-Sep-2001 mike

A process name may contain whitespace and unprintable characters,
so convert those characters to octal notation. Also convert
backslashes to octal notation to avoid confusion.

Reviewed by: des
MFC after: 1 week


83635 18-Sep-2001 rwatson

o Remove redundant securelevel/pid1 check in procfs_rw() -- this
protection is enforced at the invidual method layer using
p_candebug().

Obtained from: TrustedBSD Project


83366 12-Sep-2001 julian

KSE Milestone 2
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to teh previousl -current except
that there is a thread associated with each process.

Sorry john! (your next MFC will be a doosie!)

Reviewed by: peter@freebsd.org, dillon@freebsd.org

X-MFC after: ha ha ha ha


81112 03-Aug-2001 rwatson

Remove dangling prototype for the now defunct procfs_kmemaccess()
call.

Obtained from: TrustedBSD Project


81109 03-Aug-2001 rwatson

Collapse a Pmem case in with the other debugging files case for procfs,
as there are now "unusual" protection properties to Pmem that differ
from the other files. While I'm at it, introduce proc locking for
the other files, which was previously present only in the Pmem case.

Obtained from: TrustedBSD Project


81108 03-Aug-2001 rwatson

Remove read permission for group on the /proc/*/mem file, since kmem
no longer requires access.

Reviewed by: tmm
Obtained from: TrustedBSD Project


81107 03-Aug-2001 rwatson

Prior to support for almost all ps activity via sysctl, ps used procfs,
and so special-casing was introduced to provide extra procfs privilege
to the kmem group. With the advent of non-setgid kmem ps, this code
is no longer required, and in fact, can is potentially harmful as it
allocates privilege to a gid that is increasingly less meaningful.
Knowledge of specific gid's in kernel is also generally bad precedent,
as the kernel security policy doesn't distinguish gid's specifically,
only uid 0.

This commit removes reference to kmem in procfs, both in terms of
access control decisions, and the applying of gid kmem to the
/proc/*/mem file, simplifying the associated code considerably.
Processes are still permitted to access the mem file based on
the debugging policy, so ps -e still works fine for normal
processes and use.

Reviewed by: tmm
Obtained from: TrustedBSD Project


79335 05-Jul-2001 rwatson

o Replace calls to p_can(..., P_CAN_xxx) with calls to p_canxxx().
The p_can(...) construct was a premature (and, it turns out,
awkward) abstraction. The individual calls to p_canxxx() better
reflect differences between the inter-process authorization checks,
such as differing checks based on the type of signal. This has
a side effect of improving code readability.
o Replace direct credential authorization checks in ktrace() with
invocation of p_candebug(), while maintaining the special case
check of KTR_ROOT. This allows ktrace() to "play more nicely"
with new mandatory access control schemes, as well as making its
authorization checks consistent with other "debugging class"
checks.
o Eliminate "privused" construct for p_can*() calls which allowed the
caller to determine if privilege was required for successful
evaluation of the access control check. This primitive is currently
unused, and as such, serves only to complicate the API.

Approved by: ({procfs,linprocfs} changes) des
Obtained from: TrustedBSD Project


79224 04-Jul-2001 dillon

With Alfred's permission, remove vm_mtx in favor of a fine-grained approach
(this commit is just the first stage). Also add various GIANT_ macros to
formalize the removal of Giant, making it easy to test in a more piecemeal
fashion. These macros will allow us to test fine-grained locks to a degree
before removing Giant, and also after, and to remove Giant in a piecemeal
fashion via sysctl's on those subsystems which the authors believe can
operate without Giant.


77799 06-Jun-2001 tanimura

Lock VM Giant prior to locking a vm map.

Spotted by: Daniel Rock <D.Rock@t-online.de>
Tested by: David Wolfskill <david@catwhisker.org>,
Sean Eric Fagan <sef@kithrup.com>


77183 25-May-2001 rwatson

o Merge contents of struct pcred into struct ucred. Specifically, add the
real uid, saved uid, real gid, and saved gid to ucred, as well as the
pcred->pc_uidinfo, which was associated with the real uid, only rename
it to cr_ruidinfo so as not to conflict with cr_uidinfo, which
corresponds to the effective uid.
o Remove p_cred from struct proc; add p_ucred to struct proc, replacing
original macro that pointed.
p->p_ucred to p->p_cred->pc_ucred.
o Universally update code so that it makes use of ucred instead of pcred,
p->p_ucred instead of p->p_pcred, cr_ruidinfo instead of p_uidinfo,
cr_{r,sv}{u,g}id instead of p_*, etc.
o Remove pcred0 and its initialization from init_main.c; initialize
cr_ruidinfo there.
o Restruction many credential modification chunks to always crdup while
we figure out locking and optimizations; generally speaking, this
means moving to a structure like this:
newcred = crdup(oldcred);
...
p->p_ucred = newcred;
crfree(oldcred);
It's not race-free, but better than nothing. There are also races
in sys_process.c, all inter-process authorization, fork, exec, and
exit.
o Remove sigio->sio_ruid since sigio->sio_ucred now contains the ruid;
remove comments indicating that the old arrangement was a problem.
o Restructure exec1() a little to use newcred/oldcred arrangement, and
use improved uid management primitives.
o Clean up exit1() so as to do less work in credential cleanup due to
pcred removal.
o Clean up fork1() so as to do less work in credential cleanup and
allocation.
o Clean up ktrcanset() to take into account changes, and move to using
suser_xxx() instead of performing a direct uid==0 comparision.
o Improve commenting in various kern_prot.c credential modification
calls to better document current behavior. In a couple of places,
current behavior is a little questionable and we need to check
POSIX.1 to make sure it's "right". More commenting work still
remains to be done.
o Update credential management calls, such as crfree(), to take into
account new ruidinfo reference.
o Modify or add the following uid and gid helper routines:
change_euid()
change_egid()
change_ruid()
change_rgid()
change_svuid()
change_svgid()
In each case, the call now acts on a credential not a process, and as
such no longer requires more complicated process locking/etc. They
now assume the caller will do any necessary allocation of an
exclusive credential reference. Each is commented to document its
reference requirements.
o CANSIGIO() is simplified to require only credentials, not processes
and pcreds.
o Remove lots of (p_pcred==NULL) checks.
o Add an XXX to authorization code in nfs_lock.c, since it's
questionable, and needs to be considered carefully.
o Simplify posix4 authorization code to require only credentials, not
processes and pcreds. Note that this authorization, as well as
CANSIGIO(), needs to be updated to use the p_cansignal() and
p_cansched() centralized authorization routines, as they currently
do not take into account some desirable restrictions that are handled
by the centralized routines, as well as being inconsistent with other
similar authorization instances.
o Update libkvm to take these changes into account.

Obtained from: TrustedBSD Project
Reviewed by: green, bde, jhb, freebsd-arch, freebsd-audit


77031 23-May-2001 ru

- FDESC, FIFO, NULL, PORTAL, PROC, UMAP and UNION file
systems were repo-copied from sys/miscfs to sys/fs.

- Renamed the following file systems and their modules:
fdesc -> fdescfs, portal -> portalfs, union -> unionfs.

- Renamed corresponding kernel options:
FDESC -> FDESCFS, PORTAL -> PORTALFS, UNION -> UNIONFS.

- Install header files for the above file systems.

- Removed bogus -I${.CURDIR}/../../sys CFLAGS from userland
Makefiles.


76827 19-May-2001 alfred

Introduce a global lock for the vm subsystem (vm_mtx).

vm_mtx does not recurse and is required for most low level
vm operations.

faults can not be taken without holding Giant.

Memory subsystems can now call the base page allocators safely.

Almost all atomic ops were removed as they are covered under the
vm mutex.

Alpha and ia64 now need to catch up to i386's trap handlers.

FFS and NFS have been tested, other filesystems will need minor
changes (grabbing the vm lock when twiddling page properties).

Reviewed (partially) by: jake, jhb


76491 11-May-2001 jhb

GC prototype for procfs_bmap() missed during a previous commit.


76166 01-May-2001 markm

Undo part of the tangle of having sys/lock.h and sys/mutex.h included in
other "system" header files.

Also help the deprecation of lockmgr.h by making it a sub-include of
sys/lock.h and removing sys/lockmgr.h form kernel .c files.

Sort sys/*.h includes where possible in affected files.

OK'ed by: bde (with reservations)


76131 29-Apr-2001 phk

Add a vop_stdbmap(), and make it part of the default vop vector.

Make 7 filesystems which don't really know about VOP_BMAP rely
on the default vector, rather than more or less complete local
vop_nopbmap() implementations.


76117 29-Apr-2001 grog

Revert consequences of changes to mount.h, part 2.

Requested by: bde


75893 24-Apr-2001 jhb

Change the pfind() and zpfind() functions to lock the process that they
find before releasing the allproc lock and returning.

Reviewed by: -smp, dfr, jake


75858 23-Apr-2001 grog

Correct #includes to work with fixed sys/mount.h.


74996 29-Mar-2001 jhb

- Various style fixes.
- Fix a silly bug so that we return the actual error code if a procfs
attach fails rather than always returning 0.

Reported by: bde


74927 28-Mar-2001 jhb

Convert the allproc and proctree locks from lockmgr locks to sx locks.


74914 28-Mar-2001 jhb

Catch up to header include changes:
- <sys/mutex.h> now requires <sys/systm.h>
- <sys/mutex.h> and <sys/sx.h> now require <sys/lock.h>


73920 07-Mar-2001 jhb

Proc locking identical to that of linprocfs' vnops except that we hold the
proc lock while calling psignal.


73919 07-Mar-2001 jhb

Protect read to p_pptr with proc lock rather than proctree lock.


73918 07-Mar-2001 jhb

Proc locking. Lock around psignal() and also ensure both an exclusive
proctree lock and the process lock are held when updating p_pptr and
p_oppid. When we are just reaading p_pptr we only need the proc lock and
not a proctree lock as well.


73906 07-Mar-2001 jhb

Protect p_flag with the proc lock.


73383 03-Mar-2001 dfr

Remove the copyinstr call which was trying to copy the pathname in from
user space. It has already been copied in and mp->mnt_stat.f_mntonname has
already been initialised by the caller.

This fixes a panic on the alpha caused by the fact that the variable
'size' wasn't initialised because the call to copyinstr() bailed out with
an EFAULT error.


72786 21-Feb-2001 rwatson

o Move per-process jail pointer (p->pr_prison) to inside of the subject
credential structure, ucred (cr->cr_prison).
o Allow jail inheritence to be a function of credential inheritence.
o Abstract prison structure reference counting behind pr_hold() and
pr_free(), invoked by the similarly named credential reference
management functions, removing this code from per-ABI fork/exit code.
o Modify various jail() functions to use struct ucred arguments instead
of struct proc arguments.
o Introduce jailed() function to determine if a credential is jailed,
rather than directly checking pointers all over the place.
o Convert PRISON_CHECK() macro to prison_check() function.
o Move jail() function prototypes to jail.h.
o Emulate the P_JAILED flag in fill_kinfo_proc() and no longer set the
flag in the process flags field itself.
o Eliminate that "const" qualifier from suser/p_can/etc to reflect
mutex use.

Notes:

o Some further cleanup of the linux/jail code is still required.
o It's now possible to consider resolving some of the process vs
credential based permission checking confusion in the socket code.
o Mutex protection of struct prison is still not present, and is
required to protect the reference count plus some fields in the
structure.

Reviewed by: freebsd-arch
Obtained from: TrustedBSD Project


72200 09-Feb-2001 bmilekic

Change and clean the mutex lock interface.

mtx_enter(lock, type) becomes:

mtx_lock(lock) for sleep locks (MTX_DEF-initialized locks)
mtx_lock_spin(lock) for spin locks (MTX_SPIN-initialized)

similarily, for releasing a lock, we now have:

mtx_unlock(lock) for MTX_DEF and mtx_unlock_spin(lock) for MTX_SPIN.
We change the caller interface for the two different types of locks
because the semantics are entirely different for each case, and this
makes it explicitly clear and, at the same time, it rids us of the
extra `type' argument.

The enter->lock and exit->unlock change has been made with the idea
that we're "locking data" and not "entering locked code" in mind.

Further, remove all additional "flags" previously passed to the
lock acquire/release routines with the exception of two:

MTX_QUIET and MTX_NOSWITCH

The functionality of these flags is preserved and they can be passed
to the lock/unlock routines by calling the corresponding wrappers:

mtx_{lock, unlock}_flags(lock, flag(s)) and
mtx_{lock, unlock}_spin_flags(lock, flag(s)) for MTX_DEF and MTX_SPIN
locks, respectively.

Re-inline some lock acq/rel code; in the sleep lock case, we only
inline the _obtain_lock()s in order to ensure that the inlined code
fits into a cache line. In the spin lock case, we inline recursion and
actually only perform a function call if we need to spin. This change
has been made with the idea that we generally tend to avoid spin locks
and that also the spin locks that we do have and are heavily used
(i.e. sched_lock) do recurse, and therefore in an effort to reduce
function call overhead for some architectures (such as alpha), we
inline recursion for this case.

Create a new malloc type for the witness code and retire from using
the M_DEV type. The new type is called M_WITNESS and is only declared
if WITNESS is enabled.

Begin cleaning up some machdep/mutex.h code - specifically updated the
"optimized" inlined code in alpha/mutex.h and wrote MTX_LOCK_SPIN
and MTX_UNLOCK_SPIN asm macros for the i386/mutex.h as we presently
need those.

Finally, caught up to the interface changes in all sys code.

Contributors: jake, jhb, jasone (in no particular order)


71999 04-Feb-2001 phk

Mechanical change to use <sys/queue.h> macro API instead of
fondling implementation details.

Created with: sed(1)
Reviewed by: md5(1)


71569 24-Jan-2001 jhb

- Catch up to proc flag changes.


70536 31-Dec-2000 phk

Use macro API to <sys/queue.h>


70317 23-Dec-2000 jake

Protect proc.p_pptr and proc.p_children/p_sibling with the
proctree_lock.

linprocfs not locked pending response from informal maintainer.

Reviewed by: jhb, -smp@


69958 13-Dec-2000 rwatson

o Tighten restrictions on use of /proc/pid/ctl and move access checks
in ctl to using centralized p_can() inter-process access control
interface.

Reviewed by: sef


69947 13-Dec-2000 jake

- Change the allproc_lock to use a macro, ALLPROC_LOCK(how), instead
of explicit calls to lockmgr. Also provides macros for the flags
pased to specify shared, exclusive or release which map to the
lockmgr flags. This is so that the use of lockmgr can be easily
replaced with optimized reader-writer locks.
- Add some locking that I missed the first time.


69798 09-Dec-2000 des

Add a module version (so that linprocfs can properly depend on procfs)


69507 02-Dec-2000 jhb

Protect p_stat with the sched_lock.

Reviewed by: jake


68505 08-Nov-2000 eivind

More paranoia against overflows


68199 01-Nov-2000 eivind

Fix overflow from jail hostname.

Bug found by: Esa Etelavuori <eetelavu@cc.hut.fi>


66701 05-Oct-2000 alfred

return correct type for process directory entries, DT_DIR not DT_REG


65445 04-Sep-2000 des

Remove a comment that has been not only obsolete but patently wrong for the
last 31 revisions (almost three years).


65339 01-Sep-2000 rwatson

o Simplify if/then clause equating ESRCH with ENOENT when hiding a process

Submitted by: des


65331 01-Sep-2000 rwatson

o Make procfs use vaccess() for procfs_access() DAC and super-user checks,
rather than implementing its own {uid,gid,other} checks against vnode
mode. Similar change to linprocfs currently under review.

Obtained from: TrustedBSD Project


65237 30-Aug-2000 rwatson

o Centralize inter-process access control, introducing:

int p_can(p1, p2, operation, privused)

which allows specification of subject process, object process,
inter-process operation, and an optional call-by-reference privused
flag, allowing the caller to determine if privilege was required
for the call to succeed. This allows jail, kern.ps_showallprocs and
regular credential-based interaction checks to occur in one block of
code. Possible operations are P_CAN_SEE, P_CAN_SCHED, P_CAN_KILL,
and P_CAN_DEBUG. p_can currently breaks out as a wrapper to a
series of static function checks in kern_prot, which should not
be invoked directly.

o Commented out capabilities entries are included for some checks.

o Update most inter-process authorization to make use of p_can() instead
of manual checks, PRISON_CHECK(), P_TRESPASS(), and
kern.ps_showallprocs.

o Modify suser{,_xxx} to use const arguments, as it no longer modifies
process flags due to the disabling of ASU.

o Modify some checks/errors in procfs so that ENOENT is returned instead
of ESRCH, further improving concealment of processes that should not
be visible to other processes. Also introduce new access checks to
improve hiding of processes for procfs_lookup(), procfs_getattr(),
procfs_readdir(). Correct a bug reported by bp concerning not
handling the CREATE case in procfs_lookup(). Remove volatile flag in
procfs that caused apparently spurious qualifier warnigns (approved by
bde).

o Add comment noting that ktrace() has not been updated, as its access
control checks are different from ptrace(), whereas they should
probably be the same. Further discussion should happen on this topic.

Reviewed by: bde, green, phk, freebsd-security, others
Approved by: bde
Obtained from: TrustedBSD Project


64819 18-Aug-2000 phk

Introduce vop_stdinactive() and make it the default if no vop_inactive
is declared.

Sort and prune a few vop_op[].


59760 29-Apr-2000 phk

Remove unneeded #include <sys/kernel.h>


59652 26-Apr-2000 green

Move procfs_fullpath() to vfs_cache.c, with a rename to textvp_fullpath().
There's no excuse to have code in synthetic filestores that allows direct
references to the textvp anymore.

Feature requested by: msmith
Feature agreed to by: warner
Move requested by: phk
Move agreed to by: bde


59522 22-Apr-2000 green

Quiet an unused variable warning by commenting out a variable declaration
that goes with a commented out statement.


59482 22-Apr-2000 green

There's no reason to make "file" 0500 rather than 0555.


59481 22-Apr-2000 green

Welcome back our old friend from procfs, "file"!


55206 29-Dec-1999 peter

Change #ifdef KERNEL to #ifdef _KERNEL in the public headers. "KERNEL"
is an application space macro and the applications are supposed to be free
to use it as they please (but cannot). This is consistant with the other
BSD's who made this change quite some time ago. More commits to come.


55153 27-Dec-1999 peter

Fix typo "," vs ";"

PR: 15696
Submitted by: Takashi Okumura <taka@cs.pitt.edu>


54908 20-Dec-1999 eivind

Include vm/vm_extern.h to get at prototypes


54803 19-Dec-1999 rwatson

Second pass commit to introduce new ACL and Extended Attribute system
calls, vnops, vfsops, both in /kern, and to individual file systems that
require a vfsop_ array entry.

Reviewed by: eivind


54655 15-Dec-1999 eivind

Introduce NDFREE (and remove VOP_ABORTOP)


54424 11-Dec-1999 peter

Don't simulate a pseudo address-space beyond VM_MAXUSER_ADDRESS that
maps onto the upages. We used to use this extensively, particularly
for ps and gdb. Both of these have been "fixed". ps gets the p_stats
via eproc along with all the other stats, and gdb uses the regs, fpregs
etc files.

Once apon a time the UPAGES were mapped here, but that changed back
in January '96. This essentially kills my revisions 1.16 and 1.17.
The 2-page "hole" above the stack can be reclaimed now.


54292 08-Dec-1999 phk

Remove unused #includes.

Obtained from: http://bogon.freebsd.dk/include


53709 26-Nov-1999 phk

Add a sysctl to control if argv is disclosed to the world:
kern.ps_argsopen
It defaults to 1 which means that all users can see all argvs in ps(1).

Reviewed by: Warner


53518 21-Nov-1999 phk

Introduce the new function
p_trespass(struct proc *p1, struct proc *p2)
which returns zero or an errno depending on the legality of p1 trespassing
on p2.

Replace kern_sig.c:CANSIGNAL() with call to p_trespass() and one
extra signal related check.

Replace procfs.h:CHECKIO() macros with calls to p_trespass().

Only show command lines to process which can trespass on the target
process.


53503 21-Nov-1999 phk

s/p_cred->pc_ucred/p_ucred/g


53467 20-Nov-1999 sef

A process should be able to examine itself.


53301 17-Nov-1999 phk

Make proc/*/cmdline use the cached argv if available.

Submitted by: Paul Saab <paul@mu.org>
Reviewed by: phk


53300 17-Nov-1999 phk

The function `procfs_getattr()' in procfs doesn't set the value of
vap->va_fsid, so we cannot get valid information about procfs.

Submitted by: SAWADA Mizuki miz@pa.aix.or.jp
Reviewed by: phk
PR: 1654


53045 09-Nov-1999 alc

Passing "0" or "FALSE" as the fourth argument to vm_fault is wrong. It
should be "VM_FAULT_NORMAL".


52990 08-Nov-1999 sef

Explain why Warner is right, and I am wrong, in the removing of the
file object. Also explain some possible directions to re-implement it --
I'm not sure it should be, given the minimal application use. (Other
than having the debugger automatically access the symbols for a process,
the main use I'd found was with some minor accounting ability, but _that_
depends on it being in the filesystem space; an ioctl access method would
be useless in that case.)

This is a code-less change; only a comment has been added.


52961 07-Nov-1999 sef

Make an incredibly stupid change because Warner threatened to do it and
continue doing it despite objections by me (the principal author).

Note that this doesn't fix the real problem -- the real problem is generally
bad setup by ignorant users, and education is the right way to fix it.

So while this doesn't actually solve the prolem mentioned in the complaint
(since it's still possible to do it via other methods, although they mostly
involve a bit more complicity), and there are better methods to do this,
nobody was willing or able to provide me with a real world example that
couldn't be worked around using the existing permissions and group
mechanism. And therefore, security by removing features is the method of
the day.

I only had three applications that used it, in any event. One of them would
have made debugging easier, but I still haven't finished it, and won't
now, so it doesn't really matter.


52635 29-Oct-1999 phk

useracc() the prequel:

Merge the contents (less some trivial bordering the silly comments)
of <vm/vm_prot.h> and <vm/vm_inherit.h> into <vm/vm.h>. This puts
the #defines for the vm_inherit_t and vm_prot_t types next to their
typedefs.

This paves the road for the commit to follow shortly: change
useracc() to use VM_PROT_{READ|WRITE} rather than B_{READ|WRITE}
as argument.


51791 29-Sep-1999 marcel

sigset_t change (part 2 of 5)
-----------------------------

The core of the signalling code has been rewritten to operate
on the new sigset_t. No methodological changes have been made.
Most references to a sigset_t object are through macros (see
signalvar.h) to create a level of abstraction and to provide
a basis for further improvements.

The NSIG constant has not been changed to reflect the maximum
number of signals possible. The reason is that it breaks
programs (especially shells) which assume that all signals
have a non-null name in sys_signame. See src/bin/sh/trap.c
for an example. Instead _SIG_MAXSIG has been introduced to
hold the maximum signal possible with the new sigset_t.

struct sigprop has been moved from signalvar.h to kern_sig.c
because a) it is only used there, and b) access must be done
though function sigprop(). The latter because the table doesn't
holds properties for all signals, but only for the first NSIG
signals.

signal.h has been reorganized to make reading easier and to
add the new and/or modified structures. The "old" structures
are moved to signalvar.h to prevent namespace polution.

Especially the coda filesystem suffers from the change, because
it contained lines like (p->p_sigmask == SIGIO), which is easy
to do for integral types, but not for compound types.

NOTE: kdump (and port linux_kdump) must be recompiled.

Thanks to Garrett Wollman and Daniel Eischen for pressing the
importance of changing sigreturn as well.


51138 11-Sep-1999 alfred

Seperate the export check in VFS_FHTOVP, exports are now checked via
VFS_CHECKEXP.

Add fh(open|stat|stafs) syscalls to allow userland to query filesystems
based on (network) filehandle.

Obtained from: NetBSD


51068 07-Sep-1999 alfred

All unimplemented VFS ops now have entries in kern/vfs_default.c that return
reasonable defaults.

This avoids confusing and ugly casting to eopnotsupp or making dummy functions.
Bogus casting of filesystem sysctls to eopnotsupp() have been removed.

This should make *_vfsops.c more readable and reduce bloat.

Reviewed by: msmith, eivind
Approved by: phk
Tested by: Jeroen Ruigrok/Asmodai <asmodai@wxs.nl>


50477 28-Aug-1999 peter

$Id$ -> $FreeBSD$


50061 19-Aug-1999 marcel

Let processes retrieve their argv through procfs. Revert to the original
behaviour in all other cases.

Submitted by: Andrew Gordon <arg@arg1.demon.co.uk>


49525 08-Aug-1999 bde

Fixed printf format errors (%qu -> %llu; the arg was already unsigned long
long to hide problems on alphas).


48719 09-Jul-1999 phk

Allow jailed proccesses to open non-process vnodes like the root of the fs.


48715 09-Jul-1999 peter

Use %q rather than rolling a custom routine.


48692 09-Jul-1999 jlemon

Support for i386 hardware breakpoints.

Submitted by: Brian Dean <brdean@unx.sas.com>


48691 09-Jul-1999 jlemon

Implement support for hardware debug registers on the i386.

Submitted by: Brian Dean <brdean@unx.sas.com>


47897 13-Jun-1999 phk

Eliminate the bogus procfs private almost struct dirent structure.

Spotted by: Lars Hamren
Reviewed by: bde


47407 22-May-1999 dt

Don't call calcru() on a swapped-out process. calcru() access p_stats, which
is in U-area.


46389 04-May-1999 phk

Make the type and map files claim 0 bytes size. Tar doesn't get confused
now, but doesn't store any data eiter.

I wonder if we shouldn't claim to be fifos instead...


46388 04-May-1999 phk

Add even more () to CHECKIO which by now feels positively LISPish.

Submitted by: bde
Reviewed by: phk


46201 30-Apr-1999 phk

Add a new "file" to procfs: "rlimit" which shows the resource limits for
the process.

PR: 11342
Submitted by: Adrian Chadd adrian@freebsd.org
Reviewed by: phk


46155 28-Apr-1999 phk

This Implements the mumbled about "Jail" feature.

This is a seriously beefed up chroot kind of thing. The process
is jailed along the same lines as a chroot does it, but with
additional tough restrictions imposed on what the superuser can do.

For all I know, it is safe to hand over the root bit inside a
prison to the customer living in that prison, this is what
it was developed for in fact: "real virtual servers".

Each prison has an ip number associated with it, which all IP
communications will be coerced to use and each prison has its own
hostname.

Needless to say, you need more RAM this way, but the advantage is
that each customer can run their own particular version of apache
and not stomp on the toes of their neighbors.

It generally does what one would expect, but setting up a jail
still takes a little knowledge.

A few notes:

I have no scripts for setting up a jail, don't ask me for them.

The IP number should be an alias on one of the interfaces.

mount a /proc in each jail, it will make ps more useable.

/proc/<pid>/status tells the hostname of the prison for
jailed processes.

Quotas are only sensible if you have a mountpoint per prison.

There are no privisions for stopping resource-hogging.

Some "#ifdef INET" and similar may be missing (send patches!)

If somebody wants to take it from here and develop it into
more of a "virtual machine" they should be most welcome!

Tools, comments, patches & documentation most welcome.

Have fun...

Sponsored by: http://www.rndassociates.com/
Run for almost a year by: http://www.servetheweb.com/


46116 27-Apr-1999 phk

Change suser_xxx() to suser() where it applies.


46112 27-Apr-1999 phk

Suser() simplification:

1:
s/suser/suser_xxx/

2:
Add new function: suser(struct proc *), prototyped in <sys/proc.h>.

3:
s/suser_xxx(\([a-zA-Z0-9_]*\)->p_ucred, \&\1->p_acflag)/suser(\1)/

The remaining suser_xxx() calls will be scrutinized and dealt with
later.

There may be some unneeded #include <sys/cred.h>, but they are left
as an exercise for Bruce.

More changes to the suser() API will come along with the "jail" code.


44146 19-Feb-1999 luoqi

Hide access to vmspace:vm_pmap with inline function vmspace_pmap(). This
is the preparation step for moving pmap storage out of vmspace proper.

Reviewed by: Alan Cox <alc@cs.rice.edu>
Matthew Dillion <dillon@apollo.backplane.com>


43748 07-Feb-1999 dillon

Remove MAP_ENTRY_IS_A_MAP 'share' maps. These maps were once used to
attempt to optimize forks but were essentially given-up on due to
problems and replaced with an explicit dup of the vm_map_entry structure.
Prior to the removal, they were entirely unused.


43634 05-Feb-1999 jdp

Correct a format mismatch on 64-bit architectures. This should
fix the erroneous values in the procfs "map" file on the Alpha.


43305 27-Jan-1999 dillon

Fix warnings in preparation for adding -Wall -Wcast-qual to the
kernel compile


42957 21-Jan-1999 dillon

This is a rather large commit that encompasses the new swapper,
changes to the VM system to support the new swapper, VM bug
fixes, several VM optimizations, and some additional revamping of the
VM code. The specific bug fixes will be documented with additional
forced commits. This commit is somewhat rough in regards to code
cleanup issues.

Reviewed by: "John S. Dyson" <root@dyson.iquest.net>, "David Greenman" <dg@root.com>


42301 05-Jan-1999 peter

A partial implementation of the procfs cmdline pseudo-file. This
is enough to satisfy things like StarOffice. This is a hack, but doing
it properly would be a LOT of work, and would require extensive grovelling
around in the user address space to find the argv[].

Obtained from: Mostly from Andrzej Bialecki <abial@nask.pl>.


41514 04-Dec-1998 archie

Examine all occurrences of sprintf(), strcat(), and str[n]cpy()
for possible buffer overflow problems. Replaced most sprintf()'s
with snprintf(); for others cases, added terminating NUL bytes where
appropriate, replaced constants like "16" with sizeof(), etc.

These changes include several bug fixes, but most changes are for
maintainability's sake. Any instance where it wasn't "immediately
obvious" that a buffer overflow could not occur was made safer.

Reviewed by: Bruce Evans <bde@zeta.org.au>
Reviewed by: Matthew Dillon <dillon@apollo.backplane.com>
Reviewed by: Mike Spengler <mks@networkcs.com>


40700 28-Oct-1998 dg

Added a second argument, "activate" to the vm_page_unwire() call so that
the caller can select either inactive or active queue to put the page on.


38909 07-Sep-1998 bde

Removed statically configured mount type numbers (MOUNT_*) and all
references to them.

The change a couple of days ago to ignore these numbers in statically
configured vfsconf structs was slightly premature because the cd9660,
cfs, devfs, ext2fs, nfs vfs's still used MOUNT_* instead of the number
in their vfsconf struct.


37898 27-Jul-1998 alex

Style fixes and a bug fix: don't remove the exit handler if unmount
fails.

Submitted by: bde


37877 27-Jul-1998 alex

A better solution to the rm_at_exit problem: Register the exit function
during first mount. Unregister the exit function at last unmount.

Concept by: sef
Reviewed by: sef
Implemented by: alex


37864 25-Jul-1998 alex

Override the default VFS LKM dispatch functions so that a module
unload function can be provided (this is necessary to unregister
the at_exit handler).


37649 15-Jul-1998 bde

Cast pointers to uintptr_t/intptr_t instead of to u_long/long,
respectively. Most of the longs should probably have been
u_longs, but this changes is just to prevent warnings about
casts between pointers and integers of different sizes, not
to fix poorly chosen types.


37555 11-Jul-1998 bde

Fixed printf format errors.


37465 07-Jul-1998 bde

Quick fix for type mismatches which were fatal if longs aren't 32
bits. We used a private, wrong, version of `struct dirent' to help
break getdirentries(), and we use a silly check that the size of this
struct is a power of 2 to help break mount() if getdirentries() would
not work. This fix just changes the struct to match `struct dirent'
(except for the name length).


37154 25-Jun-1998 dt

Remove "not hungly" panics. Cookies now used by the linux and ibcs2
emulators. The emulators assume that filesystem may just ignore cookies, and
handle this case correctly. So we just ignore cookies.

Also sync *_readdir "prototypes" with reality.


36969 14-Jun-1998 bde

Avoid a 64-bit division in procfs_readdir(). Fixed related overflows.
Check args using the same expression as in fdesc and kernfs. The check
was actually already correct, modulo overflow. It could be tightened
up to either allow huge (aligned) offsets, treating them as EOF, or
disallow all offsets beyond EOF.

Didn't fix invalid address calculation &foo[i] where i may be out of
bounds.

Didn't fix shooting of foot using a private unportable dirent struct.


36840 10-Jun-1998 peter

Don't silently accept attempts to change flags where they are not
supported.


36735 07-Jun-1998 dfr

This commit fixes various 64bit portability problems required for
FreeBSD/alpha. The most significant item is to change the command
argument to ioctl functions from int to u_long. This change brings us
inline with various other BSD versions. Driver writers may like to
use (__FreeBSD_version == 300003) to detect this change.

The prototype FreeBSD/alpha machdep will follow in a couple of days
time.


36168 19-May-1998 tegge

Disallow reading the current kernel stack. Only the user structure and
the current registers should be accessible.
Reviewed by: David Greenman <dg@root.com>


35769 06-May-1998 msmith

As described by the submitter:

Reverse the VFS_VRELE patch. Reference counting of vnodes does not need
to be done per-fs. I noticed this while fixing vfs layering violations.
Doing reference counting in generic code is also the preference cited by
John Heidemann in recent discussions with him.

The implementation of alternative vnode management per-fs is still a valid
requirement for some filesystems but will be revisited sometime later,
most likely using a different framework.

Submitted by: Michael Hancock <michaelh@cet.co.jp>


35497 29-Apr-1998 dyson

Tighten up management of memory and swap space during map allocation,
deallocation cycles. This should provide a measurable improvement
on swap and memory allocation on loaded systems. It is unlikely a
complete solution. Also, provide more map info with procfs.
Chuck Cranor spurred on this improvement.


35256 17-Apr-1998 des

Seventy-odd "its" / "it's" typos in comments fixed as per kern/6108.


34901 26-Mar-1998 phk

Add two new functions, get{micro|nano}time.

They are atomic, but return in essence what is in the "time" variable.
gettime() is now a macro front for getmicrotime().

Various patches to use the two new functions instead of the various
hacks used in their absence.

Some puntuation and grammer patches from Bruce.

A couple of XXX comments.


33964 01-Mar-1998 msmith

The intent is to get rid of WILLRELE in vnode_if.src by making
a complement to all ops that return a vpp, VFS_VRELE. This is
initially only for file systems that implement the following ops
that do a WILLRELE:

vop_create, vop_whiteout, vop_mknod, vop_remove, vop_link,
vop_rename, vop_mkdir, vop_rmdir, vop_symlink

This is initial DNA that doesn't do anything yet. VFS_VRELE is
implemented but not called.

A default vfs_vrele was created for fs implementations that use the
standard vnode management routines.

VFS_VRELE implementations were made for the following file systems:

Standard (vfs_vrele)
ffs mfs nfs msdosfs devfs ext2fs

Custom
union umapfs

Just EOPNOTSUPP
fdesc procfs kernfs portal cd9660

These implementations may change as VOP changes are implemented.

In the next phase, in the vop implementations calls to vrele and the vrele
part of vput will be moved to the top layer vfs_vnops and made visible
to all layers. vput will be replaced by unlock in these cases. Unlocking
will still be done in the per fs layer but the refcount decrement will be
triggered at the top because it doesn't hurt to hold a vnode reference a
little longer. This will have minimal impact on the structure of the
existing code.

This will only be done for vnode arguments that are released by the various
fs vop implementations.

Wider use of VFS_VRELE will likely require restructuring of the code.

Reviewed by: phk, dyson, terry et. al.
Submitted by: Michael Hancock <michaelh@cet.co.jp>


33181 09-Feb-1998 eivind

Staticize.


33134 06-Feb-1998 eivind

Back out DIAGNOSTIC changes.


33108 04-Feb-1998 eivind

Turn DIAGNOSTIC into a new-style option.


32702 22-Jan-1998 dyson

VM level code cleanups.

1) Start using TSM.
Struct procs continue to point to upages structure, after being freed.
Struct vmspace continues to point to pte object and kva space for kstack.
u_map is now superfluous.
2) vm_map's don't need to be reference counted. They always exist either
in the kernel or in a vmspace. The vmspaces are managed by reference
counts.
3) Remove the "wired" vm_map nonsense.
4) No need to keep a cache of kernel stack kva's.
5) Get rid of strange looking ++var, and change to var++.
6) Change more data structures to use our "zone" allocator. Added
struct proc, struct vmspace and struct vnode. This saves a significant
amount of kva space and physical memory. Additionally, this enables
TSM for the zone managed memory.
7) Keep ioopt disabled for now.
8) Remove the now bogus "single use" map concept.
9) Use generation counts or id's for data structures residing in TSM, where
it allows us to avoid unneeded restart overhead during traversals, where
blocking might occur.
10) Account better for memory deficits, so the pageout daemon will be able
to make enough memory available (experimental.)
11) Fix some vnode locking problems. (From Tor, I think.)
12) Add a check in ufs_lookup, to avoid lots of unneeded calls to bcmp.
(experimental.)
13) Significantly shrink, cleanup, and make slightly faster the vm_fault.c
code. Use generation counts, get rid of unneded collpase operations,
and clean up the cluster code.
14) Make vm_zone more suitable for TSM.

This commit is partially as a result of discussions and contributions from
other people, including DG, Tor Egge, PHK, and probably others that I
have forgotten to attribute (so let me know, if I forgot.)

This is not the infamous, final cleanup of the vnode stuff, but a necessary
step. Vnode mgmt should be correct, but things might still change, and
there is still some missing stuff (like ioopt, and physical backing of
non-merged cache files, debugging of layering concepts.)


32286 06-Jan-1998 dyson

Make our v_usecount vnode reference count work identically to the
original BSD code. The association between the vnode and the vm_object
no longer includes reference counts. The major difference is that
vm_object's are no longer freed gratuitiously from the vnode, and so
once an object is created for the vnode, it will last as long as the
vnode does.

When a vnode object reference count is incremented, then the underlying
vnode reference count is incremented also. The two "objects" are now
more intimately related, and so the interactions are now much less
complex.

When vnodes are now normally placed onto the free queue with an object still
attached. The rundown of the object happens at vnode rundown time, and
happens with exactly the same filesystem semantics of the original VFS
code. There is absolutely no need for vnode_pager_uncache and other
travesties like that anymore.

A side-effect of these changes is that SMP locking should be much simpler,
the I/O copyin/copyout optimizations work, NFS should be more ponderable,
and further work on layered filesystems should be less frustrating, because
of the totally coherent management of the vnode objects and vnodes.

Please be careful with your system while running this code, but I would
greatly appreciate feedback as soon a reasonably possible.


32285 06-Jan-1998 sef

Use CHECKIO in procfs_ioctl() to ensure that any changes in UID/GID result
in the expected failure.


32120 30-Dec-1997 bde

Fixed a missing/misplaced/misstyled prototype.


32011 27-Dec-1997 bde

Unspammed nested include of <vm/vm_zone.h>.


31891 20-Dec-1997 sef

Clear the p_stops field on change of user/group id, unless the correct
flag is set in the p_pfsflags field. This, essentially, prevents an SUID
proram from hanging after being traced. (E.g., "truss /usr/bin/rlogin" would
fail, but leave rlogin in a stopevent state.) Yet another case where procctl
is (hopefully ;)) no longer needed in the general case.

Reviewed by: bde (thanks bruce :))


31691 13-Dec-1997 sef

Change the ioctls for procfs around a bit; in particular, whever possible,
change from

ioctl(fd, PIOC<foo>, &i);

to

ioctl(fd, PIOC<foo>, i);

This is going from the _IOW to _IO ioctl macro. The kernel, procctl, and
truss must be in synch for it all to work (not doing so will get errors about
inappropriate ioctl's, fortunately). Hopefully I didn't forget anything :).


31674 12-Dec-1997 sef

Fix a problem with procfs_exit() that resulted in missing some procfs
nodes; this also apparantly caused a panic in some circumstances.
Also, since procfs_exit() is getting rid of the nodes when a process
exits, don't bother checking for the process' existance in procfs_inactive().


31640 09-Dec-1997 sef

Code to prevent a panic caused by procfs_exit(). Note that i don't know
what is teh root cause -- but, sometimes, a procfs vnode in pfshead is
apparantly corrupt (or a UFS vnode instead). Without this patch, I can
get it to panic by doing (in csh)

while (1)
ps auxwww
end

and it will panic when the PID's wrap. With it, it does not panic.
Yes -- I know that this is NOT the right way to fix it. But I haven't
been able to get it to panic yet (which confuses me). I am going to
be looking into the vgone() code now, as that may be a part of it.


31636 08-Dec-1997 sef

A couple of fixes from bruce: first of all, psignal is a void (stupid
me; unfortunately, also makes it hard ot check for errors); second, I had
managed to forget a change to PIOCSFL (it should be _IOW, not _IOR) I had
in my local copy, and Bruce called me on it.

Submitted by: bde


31618 08-Dec-1997 sef

Use at_exit() to invoke procfs_exit() instead of calling it directly.
Note that an unload facility should be used to call rm_at_exit() (if
procfs is being loaded as an LKM and is subsequently removed), but it
was non-obvious how to do this in the VFS framework.

Reviewed by: Julian Elischer


31595 07-Dec-1997 sef

Clear the stop events and wakeup the process on teh last close of the
procfs/mem file. While this doesn't prevent an unkillable process, it
means that a broken truss prorgam won't do it accidently now (well,
there's a small window of opportunity). Note that this requires the
change to truss I am about to commit.


31564 06-Dec-1997 sef

Changes to allow event-based process monitoring and control.


31561 05-Dec-1997 bde

Don't include <sys/lock.h> in headers when only `struct simplelock' is
required. Fixed everything that depended on the pollution.


31174 14-Nov-1997 tegge

Don't try to obtain an excluive lock on the vm map, since a deadlock might
occur if the process owning the map is wiring pages.


31016 07-Nov-1997 phk

Remove a bunch of variables which were unused both in GENERIC and LINT.

Found by: -Wunused


30785 27-Oct-1997 bde

KNFize rev.1.31.


30780 27-Oct-1997 bde

Removed unused #includes. The need for most of them went away with
recent changes (docluster* and vfs improvements).


30743 26-Oct-1997 phk

VFS interior redecoration.

Rename vn_default_error to vop_defaultop all over the place.
Move vn_bwrite from vfs_bio.c to vfs_default.c and call it vop_stdbwrite.
Use vop_null instead of nullop.
Move vop_nopoll from vfs_subr.c to vfs_default.c
Move vop_sharedlock from vfs_subr.c to vfs_default.c
Move vop_nolock from vfs_subr.c to vfs_default.c
Move vop_nounlock from vfs_subr.c to vfs_default.c
Move vop_noislocked from vfs_subr.c to vfs_default.c
Use vop_ebadf instead of *_ebadf.
Add vop_defaultop for getpages on master vnode in MFS.


30496 16-Oct-1997 phk

VFS clean up "hekto commit"

1. Add defaults for more VOPs
VOP_LOCK vop_nolock
VOP_ISLOCKED vop_noislocked
VOP_UNLOCK vop_nounlock
and remove direct reference in filesystems.

2. Rename the nfsv2 vnop tables to improve sorting order.


30492 16-Oct-1997 phk

Another VFS cleanup "kilo commit"

1. Remove VOP_UPDATE, it is (also) an UFS/{FFS,LFS,EXT2FS,MFS}
intereface function, and now lives in the ufsmount structure.

2. Remove VOP_SEEK, it was unused.

3. Add mode default vops:

VOP_ADVLOCK vop_einval
VOP_CLOSE vop_null
VOP_FSYNC vop_null
VOP_IOCTL vop_enotty
VOP_MMAP vop_einval
VOP_OPEN vop_null
VOP_PATHCONF vop_einval
VOP_READLINK vop_einval
VOP_REALLOCBLKS vop_eopnotsupp

And remove identical functionality from filesystems

4. Add vop_stdpathconf, which returns the canonical stuff. Use
it in the filesystems. (XXX: It's probably wrong that specfs
and fifofs sets this vop, shouldn't it come from the "host"
filesystem, for instance ufs or cd9660 ?)

5. Try to make system wide VOP functions have vop_* names.

6. Initialize the um_* vectors in LFS.

(Recompile your LKMS!!!)


30474 16-Oct-1997 phk

VFS mega cleanup commit (x/N)

1. Add new file "sys/kern/vfs_default.c" where default actions for
VOPs go. Implement proper defaults for ABORTOP, BWRITE, LEASE,
POLL, REVOKE and STRATEGY. Various stuff spread over the entire
tree belongs here.

2. Change VOP_BLKATOFF to a normal function in cd9660.

3. Kill VOP_BLKATOFF, VOP_TRUNCATE, VOP_VFREE, VOP_VALLOC. These
are private interface functions between UFS and the underlying
storage manager layer (FFS/LFS/MFS/EXT2FS). The functions now
live in struct ufsmount instead.

4. Remove a kludge of VOP_ functions in all filesystems, that did
nothing but obscure the simplicity and break the expandability.
If a filesystem doesn't implement VOP_FOO, it shouldn't have an
entry for it in its vnops table. The system will try to DTRT
if it is not implemented. There are still some cruft left, but
the bulk of it is done.

5. Fix another VCALL in vfs_cache.c (thanks Bruce!)


30434 15-Oct-1997 phk

Hmm, realign the vnops into two columns.


30431 15-Oct-1997 phk

Stylistic overhaul of vnops tables.
1. Remove comment stating the blatantly obvious.
2. Align in two columns.
3. Sort all but the default element alphabetically.
4. Remove XXX comments pointing out entries not needed.


29653 21-Sep-1997 dyson

Change the M_NAMEI allocations to use the zone allocator. This change
plus the previous changes to use the zone allocator decrease the useage
of malloc by half. The Zone allocator will be upgradeable to be able
to use per CPU-pools, and has more intelligent usage of SPLs. Additionally,
it has reasonable stats gathering capabilities, while making most calls
inline.


29362 14-Sep-1997 peter

Convert select -> poll.
Delete 'always succeed' select/poll handlers, replaced with generic call.
Flag missing vnode op table entries.


29179 07-Sep-1997 bde

Some staticized variables were still declared to be extern.


28270 16-Aug-1997 wollman

Fix all areas of the system (or at least all those in LINT) to avoid storing
socket addresses in mbufs. (Socket buffers are the one exception.) A number
of kernel APIs needed to get fixed in order to make this happen. Also,
fix three protocol families which kept PCBs in mbufs to not malloc them
instead. Delete some old compatibility cruft while we're at it, and add
some new routines in the in_cksum family.


28089 12-Aug-1997 sef

Check permissions for fp regs as well as normal regs.


28086 12-Aug-1997 sef

Fix procfs security hole -- check permissions on meaningful I/Os (namely,
reading/writing of mem and regs). Also have to check for the requesting
process being group KMEM -- this is a bit of a hack, but ps et al need it.

Reviewed by: davidg


27845 02-Aug-1997 bde

Removed unused #includes.


26962 26-Jun-1997 alex

Style fix my previous commit.


26769 21-Jun-1997 alex

Block all write operations to /proc/1/* when securelevel > 0.
The additional check in procfs_ctl.c could be backed out, but
I'm leaving it in for good measure.

Reviewed by: Theo de Raadt <deraadt@OpenBSD.org>


25207 27-Apr-1997 alex

Removed bogon from previous commit: doubly included sys/systm.h.


25200 27-Apr-1997 alex

Prevent debugger attachment to init when securelevel > 0.

Noticed by: Brian Buchanan <brian@wasteland.calbbs.com>


25055 20-Apr-1997 dyson

Fix both a problem with accessing backing objects, and also release
the process map on nonexistant pages.
PR: kern/3327
Submitted by: Tor Egge <Tor.Egge@idi.ntnu.no>


24666 06-Apr-1997 dyson

Fix the gdb executable modify problem. Thanks to the detective work
by Alan Cox <alc@cs.rice.edu>, and his description of the problem.

The bug was primarily in procfs_mem, but the mistake likely happened
due to the lack of vm system support for the operation. I added
better support for selective marking of page dirty flags so that
vm_map_pageable(wiring) will not cause this problem again.

The code in procfs_mem is now less bogus (but maybe still a little
so.)


24203 24-Mar-1997 bde

Don't include <sys/ioctl.h> in the kernel. Stage 1: don't include
it when it is not used. In most cases, the reasons for including it
went away when the special ioctl headers became self-sufficient.


23526 08-Mar-1997 bde

Fixed missing initialisation of vp->v_type for types Pfile and Pmem
in procfs_allocvp(). This fixes at least stat() of /proc/*/mem.

stat() of /proc/*/file already worked. I think procfs_allocvp() isn't
actually called for type Pfile.


23077 24-Feb-1997 bde

Fixed procfs's locking vops. They were missed in the Lite2 merge,
partly because the #define's for them were moved to a different
file. At least the null VOP_LOCK() no longer works, since vclean()
expects VOP_LOCK( ..., LK_DRAIN | LK_INTERLOCK, ...) to clear the
interlock. This probably only matters when simple_lock() is not
null, i.e., when there are multiple CPUs or SIMPLELOCK_DEBUG is
defined.


22975 22-Feb-1997 peter

Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not
ready for it yet.


22579 12-Feb-1997 mpp

Add function prototypes for most of the new Lite2 functions.
Also made a few of the miscfs routines static to be
consistent. Some modules simply required some additional
#includes to remove -Wall warnings.


22521 10-Feb-1997 dyson

This is the kernel Lite/2 commit. There are some requisite userland
changes, so don't expect to be able to run the kernel as-is (very well)
without the appropriate Lite/2 userland changes.

The system boots and can mount UFS filesystems.

Untested: ext2fs, msdosfs, NFS
Known problems: Incorrect Berkeley ID strings in some files.
Mount_std mounts will not work until the getfsent
library routine is changed.

Reviewed by: various people
Submitted by: Jeffery Hsu <hsu@freebsd.org>


21754 16-Jan-1997 dyson

Change the map entry flags from bitfields to bitmasks. Allows
for some code simplification.


21673 14-Jan-1997 jkh

Make the long-awaited change from $Id$ to $FreeBSD$

This will make a number of things easier in the future, as well as (finally!)
avoiding the Id-smashing problem which has plagued developers for so long.

Boy, I'm glad we're not using sup anymore. This update would have been
insane otherwise.


19261 30-Oct-1996 dyson

Fix a potential deadlock from the previous commit.


19260 30-Oct-1996 dyson

Fix the /proc/???/map file so that it is possible to read an arbitrarily
large process map. Another commit will follow to fix a problem just found
during this one... Sorry!!! :-(.


19141 24-Oct-1996 dyson

Fix setting breakpoints in shared regions.


18020 03-Sep-1996 bde

Eliminated nested include of <sys/unistd.h> in <sys/file.h> in the kernel.
Include it directly in the few places where it is used.

Reduced some #includes of <sys/file.h> to #includes of <sys/fcntl.h> or
nothing.


17974 31-Aug-1996 bde

Fixed the easy cases of const poisoning in the kernel. Cosmetic.


17306 27-Jul-1996 dyson

Modify slightly the output from the map file in /proc. Now the
executable bit is shown.


17303 27-Jul-1996 dyson

Under certain circumstances, reading the /proc/*/map file can
crash the system. Nonexistant objects were not handled correctly.


16901 02-Jul-1996 dyson

Implement locking for pfs nodes, when at the leaf. Concurrent access
to information from a single process causes hangs. Specifically, this
fixes problems (hangs) with concurrent ps commands, when the system is under
heavy memory load.
Reviewed by: davidg


16889 02-Jul-1996 dyson

Fix a serious problem, with a window where an object lock is needed,
but not there. The extent of the object lock is expanded to be over the
range that it is needed. Additionally, clean up the code so that it conforms
to better coding style.


16476 18-Jun-1996 dyson

Add procfs_type.c to the repository.


16474 18-Jun-1996 dyson

Clean-up the new VM map procfs code, and also add support for executable
format file "etype". It contains a description of the binary type for
a process.


16468 17-Jun-1996 dyson

This file is the "meat" of the process address space capability. If you
would like other things added, just ask!!! It might be pretty easy to add.


16467 17-Jun-1996 dyson

Add a feature to procfs to allow display of the process address map
with multiple entries as follows:

start address, end address, resident pages in range, private pages
in range, RW/RO, COW or not, (vnode/device/swap/default).


16312 12-Jun-1996 dg

Moved the fsnode MALLOC to before the call to getnewvnode() so that the
process won't possibly block before filling in the fsnode pointer (v_data)
which might be dereferenced during a sync since the vnode is put on the
mnt_vnodelist by getnewvnode.

Pointed out by Matt Day <mday@artisoft.com>


16308 11-Jun-1996 dyson

Properly lock the vm space when accessing the memory in a process. This
fix could solve some "interesting" problems that could happen during
process rundown.


14532 11-Mar-1996 hsu

For Lite2: proc LIST changes.
Reviewed by: davidg & bde


13838 02-Feb-1996 wosch

add ruid and rgid to file 'status'


13627 25-Jan-1996 peter

This time, really make the procfs work when reading stuff from the UPAGES.

This is a really ugly bandaid on the problem, but it works well enough for
'ps -u' to start working again. The problem was caused by the user
address space shrinking by a little bit and the UPAGES being "cast off" to
become a seperate entity rather than being at the top of the process's
vmspace. That optimization was part of John's most recent VM speedups.

Now, rather than decoding the VM space, it merely ensures the pages are
in core and accesses them the same way the ptrace(PT_READ_U..) code does,
ie: off the p->p_addr pointer.


13608 24-Jan-1996 peter

Major fixes for procfs..

Implement a "variable" directory structure. Files that do not make
sense for the given process do not "appear" and cannot be opened.
For example, "system" processes do not have "file", "regs" or "fpregs",
because they do not have a user area.

"attempt" to fill in the user area of a given process when it is being
accessed via /proc/pid/mem (the user struct is just after
VM_MAXUSER_ADDRESS in the process address space.)

Dont do IO to the U area while it's swapped, hold it in place if possible.

Lock off access to the "ctl" file if it's done a setuid like the other
pseudo-files in there.


13490 19-Jan-1996 dyson

Eliminated many redundant vm_map_lookup operations for vm_mmap.
Speed up for vfs_bio -- addition of a routine bqrelse to greatly diminish
overhead for merged cache.
Efficiency improvement for vfs_cluster. It used to do alot of redundant
calls to cluster_rbuild.
Correct the ordering for vrele of .text and release of credentials.
Use the selective tlb update for 486/586/P6.
Numerous fixes to the size of objects allocated for files. Additionally,
fixes in the various pagers.
Fixes for proper positioning of vnode_pager_setsize in msdosfs and ext2fs.
Fixes in the swap pager for exhausted resources. The pageout code
will not as readily thrash.
Change the page queue flags (PG_ACTIVE, PG_INACTIVE, PG_FREE, PG_CACHE) into
page queue indices (PQ_ACTIVE, PQ_INACTIVE, PQ_FREE, PQ_CACHE),
thereby improving efficiency of several routines.
Eliminate even more unnecessary vm_page_protect operations.
Significantly speed up process forks.
Make vm_object_page_clean more efficient, thereby eliminating the pause
that happens every 30seconds.
Make sequential clustered writes B_ASYNC instead of B_DELWRI even in the
case of filesystems mounted async.
Fix a panic with busy pages when write clustering is done for non-VMIO
buffers.


12904 17-Dec-1995 bde

Fixed 1TB filesize changes. Some pindexes had bogus names and types
but worked because vm_pindex_t is indistinuishable from vm_offset_t.


12767 11-Dec-1995 dyson

Changes to support 1Tb filesizes. Pages are now named by an
(object,index) pair instead of (object,offset) pair.


12662 07-Dec-1995 dg

Untangled the vm.h include file spaghetti.


12595 03-Dec-1995 bde

Added prototypes.

Removed some unnecessary #includes.


12337 16-Nov-1995 bde

Moved declarations for static functions to the correct place (not in a
header).

Removed stupid comments.


12336 16-Nov-1995 bde

Fixed the type of procfs_sync(). Trailing args were missing.

Fixed the type of procfs_fhtovp(). The args had little resemblance to
the correct ones.

Added prototypes.


12158 09-Nov-1995 bde

Introduced a type `vop_t' for vnode operation functions and used
it 1138 times (:-() in casts and a few more times in declarations.
This change is null for the i386.

The type has to be `typedef int vop_t(void *)' and not `typedef
int vop_t()' because `gcc -Wstrict-prototypes' warns about the
latter. Since vnode op functions are called with args of different
(struct pointer) types, neither of these function types is any use
for type checking of the arg, so it would be preferable not to use
the complete function type, especially since using the complete
type requires adding 1138 casts to avoid compiler warnings and
another 40+ casts to reverse the function pointer conversions before
calling the functions.


12143 07-Nov-1995 phk

Make a lot of private stuff static.
Should anybody out there wonder about this vendetta against global
variables, it is basically to make it more visible what our interfaces
in the kernel really are.
I'm almost convinced we should have a
#define PUBLIC /* public interface */
and use it in the #includes...


11707 23-Oct-1995 dyson

Removal of unnecessary usage of PG_COPYONWRITE.


10531 02-Sep-1995 mpp

Change procfs_lookup to not allow delete/rename operations
to prevent panics when a user tries to remove/rename the
contents of /proc/###/*.

Obtained from: 4.4BSD-lite2


10024 11-Aug-1995 dg

Be careful not to dereference NULL credentials pointers when doing the
getattr function.


9540 16-Jul-1995 bde

Don't include <sys/tty.h> in drivers that aren't tty drivers or in general
files that don't depend on the internals of <sys/tty.h>


9507 13-Jul-1995 dg

NOTE: libkvm, w, ps, 'top', and any other utility which depends on struct
proc or any VM system structure will have to be rebuilt!!!

Much needed overhaul of the VM system. Included in this first round of
changes:

1) Improved pager interfaces: init, alloc, dealloc, getpages, putpages,
haspage, and sync operations are supported. The haspage interface now
provides information about clusterability. All pager routines now take
struct vm_object's instead of "pagers".

2) Improved data structures. In the previous paradigm, there is constant
confusion caused by pagers being both a data structure ("allocate a
pager") and a collection of routines. The idea of a pager structure has
escentially been eliminated. Objects now have types, and this type is
used to index the appropriate pager. In most cases, items in the pager
structure were duplicated in the object data structure and thus were
unnecessary. In the few cases that remained, a un_pager structure union
was created in the object to contain these items.

3) Because of the cleanup of #1 & #2, a lot of unnecessary layering can now
be removed. For instance, vm_object_enter(), vm_object_lookup(),
vm_object_remove(), and the associated object hash list were some of the
things that were removed.

4) simple_lock's removed. Discussion with several people reveals that the
SMP locking primitives used in the VM system aren't likely the mechanism
that we'll be adopting. Even if it were, the locking that was in the code
was very inadequate and would have to be mostly re-done anyway. The
locking in a uni-processor kernel was a no-op but went a long way toward
making the code difficult to read and debug.

5) Places that attempted to kludge-up the fact that we don't have kernel
thread support have been fixed to reflect the reality that we are really
dealing with processes, not threads. The VM system didn't have complete
thread support, so the comments and mis-named routines were just wrong.
We now use tsleep and wakeup directly in the lock routines, for instance.

6) Where appropriate, the pagers have been improved, especially in the
pager_alloc routines. Most of the pager_allocs have been rewritten and
are now faster and easier to maintain.

7) The pagedaemon pageout clustering algorithm has been rewritten and
now tries harder to output an even number of pages before and after
the requested page. This is sort of the reverse of the ideal pagein
algorithm and should provide better overall performance.

8) Unnecessary (incorrect) casts to caddr_t in calls to tsleep & wakeup
have been removed. Some other unnecessary casts have also been removed.

9) Some almost useless debugging code removed.

10) Terminology of shadow objects vs. backing objects straightened out.
The fact that the vm_object data structure escentially had this
backwards really confused things. The use of "shadow" and "backing
object" throughout the code is now internally consistent and correct
in the Mach terminology.

11) Several minor bug fixes, including one in the vm daemon that caused
0 RSS objects to not get purged as intended.

12) A "default pager" has now been created which cleans up the transition
of objects to the "swap" type. The previous checks throughout the code
for swp->pg_data != NULL were really ugly. This change also provides
the rudiments for future backing of "anonymous" memory by something
other than the swap pager (via the vnode pager, for example), and it
allows the decision about which of these pagers to use to be made
dynamically (although will need some additional decision code to do
this, of course).

13) (dyson) MAP_COPY has been deprecated and the corresponding "copy
object" code has been removed. MAP_COPY was undocumented and non-
standard. It was furthermore broken in several ways which caused its
behavior to degrade to MAP_PRIVATE. Binaries that use MAP_COPY will
continue to work correctly, but via the slightly different semantics
of MAP_PRIVATE.

14) (dyson) Sharing maps have been removed. It's marginal usefulness in a
threads design can be worked around in other ways. Both #12 and #13
were done to simplify the code and improve readability and maintain-
ability. (As were most all of these changes)

TODO:

1) Rewrite most of the vnode pager to use VOP_GETPAGES/PUTPAGES. Doing
this will reduce the vnode pager to a mere fraction of its current size.

2) Rewrite vm_fault and the swap/vnode pagers to use the clustering
information provided by the new haspage pager interface. This will
substantially reduce the overhead by eliminating a large number of
VOP_BMAP() calls. The VOP_BMAP() filesystem interface should be
improved to provide both a "behind" and "ahead" indication of
contiguousness.

3) Implement the extended features of pager_haspage in swap_pager_haspage().
It currently just says 0 pages ahead/behind.

4) Re-implement the swap device (swstrategy) in a more elegant way, perhaps
via a much more general mechanism that could also be used for disk
striping of regular filesystems.

5) Do something to improve the architecture of vm_object_collapse(). The
fact that it makes calls into the swap pager and knows too much about
how the swap pager operates really bothers me. It also doesn't allow
for collapsing of non-swap pager objects ("unnamed" objects backed by
other pagers).


9346 28-Jun-1995 dg

Killed the "probably_never" ifdef'd code.


8876 30-May-1995 rgrimes

Remove trailing whitespace.


8740 25-May-1995 dg

Fixed panic that resulted from mmaping files in kernfs and procfs. A
regular user could panic the machine with a simple "tail /proc/curproc/mem"
command. The problem was twofold: both kernfs and procfs didn't fill in
the mnt_stat statfs struct (which would later lead to an integer divide
fault in the vnode pager), and kernfs bogusly paniced if a bmap was
attempted.

Reviewed by: John Dyson


8456 11-May-1995 rgrimes

Fix -Wformat warnings from LINT kernel.


7835 15-Apr-1995 dg

For P_SUGID processes, we must also change ownership of the mem file
to root so that group kmem can still get to it. *SIGH*


7833 15-Apr-1995 dg

Retain group kmem readability for P_SUGID processes.


7832 15-Apr-1995 dg

Made /proc/n/mem file group kmem and group readable. Needed to fix ps so
that it doesn't need to be setuid root.


7095 16-Mar-1995 wollman

Add four more filesystem flags:

VFCF_NETWORK (this FS goes over the net)
VFCF_READONLY (read-write mounts do not make any sense)
VFCF_SYNTHETIC (data in this FS is not real)
VFCF_LOOPBACK (this FS aliases something else)

cd9660 is readonly; nullfs, umapfs, and union are loopback; NFS is netowkr;
procfs, kernfs, and fdesc are synthetic.


7090 16-Mar-1995 bde

Add and move declarations to fix all of the warnings from `gcc -Wimplicit'
(except in netccitt, netiso and netns) and most of the warnings from
`gcc -Wnested-externs'. Fix all the bugs found. There were no serious
ones.


6569 20-Feb-1995 dg

Make sure process isn't swapped when messing with it.
Added missing newline to log() call.


6151 03-Feb-1995 dg

Fixed bmap run-length brokeness.
Use bmap run-length extension when doing clustered paging.

Submitted by: John Dyson


5403 05-Jan-1995 dg

Initialize map start hint to vm_map_find()...not doing so will cause it
to fail if the random thing on the stack happens to be too large.

Submitted by: David Jones <dej@qpoint.torfree.net>


5312 31-Dec-1994 ache

Fix problem when attached process detached
Submitted by: Gary Jennejohn


3687 18-Oct-1994 dg

Fixed bug I just introduced that would have allowed a user to clobber
his kernel stack.


3685 18-Oct-1994 dg

Allow upages to be paged in/accessed.

Submitted by: John Dyson


3496 10-Oct-1994 phk

Cosmetics. reduce the noise from gcc -Wall.


3396 06-Oct-1994 dg

Use tsleep() rather than sleep so that 'ps' is more informative about
the wait.


3054 24-Sep-1994 dg

1) Added "." and ".." entries.
2) Fixed directory size to return something reasonable.
3) Disabled "file" until the code is completed.
4) Corrected directory link counts.


2946 21-Sep-1994 wollman

Implemented loadable VFS modules, and made most existing filesystems
loadable. (NFS is a notable exception.)


2807 15-Sep-1994 bde

Supply prototypes for some functions that were implicitly declared and
fix the resulting warnings.


2112 18-Aug-1994 wollman

Fix up some sloppy coding practices:

- Delete redundant declarations.
- Add -Wredundant-declarations to Makefile.i386 so they don't come back.
- Delete sloppy COMMON-style declarations of uninitialized data in
header files.
- Add a few prototypes.
- Clean up warnings resulting from the above.

NB: ioconf.c will still generate a redundant-declaration warning, which
is unavoidable unless somebody volunteers to make `config' smarter.


1817 02-Aug-1994 dg

Added $Id$


1549 25-May-1994 rgrimes

The big 4.4BSD Lite to FreeBSD 2.0.0 (Development) patch.

Reviewed by: Rodney W. Grimes
Submitted by: John Dyson and David Greenman


1541 24-May-1994 rgrimes

BSD 4.4 Lite Kernel Sources