History log of /freebsd-10.1-release/sys/kern/kern_resource.c
Revision Date Author Comments
(<<< Hide modified files)
(Show modified files >>>)
# 272461 02-Oct-2014 gjb

Copy stable/10@r272459 to releng/10.1 as part of
the 10.1-RELEASE process.

Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation

# 262260 20-Feb-2014 mjg

MFC r259330,r259331:

rlimit: add and utilize lim_shared

rlimit: avoid unnecessary copying of rlimits

If refcount is 1 just modify rlimits in place.


# 256281 10-Oct-2013 gjb

Copy head (r256279) to stable/10 as part of the 10.0-RELEASE cycle.

Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation


# 247905 07-Mar-2013 ian

Call sched_prio() to immediately change the priority of the thread in
response to an rtprio_thread() call, when the priority is different
than the old priority, and either the old or the new priority class is
not RTP_PRIO_NORMAL (timeshare).

The reasoning for the second half of the test is that if it's a change in
timeshare priority, then the scheduler is going to adjust that priority
in a way that completely wipes out the requested change anyway, so
what's the point? (If that's not true, then allowing a thread to change
its own timeshare priority would subvert the scheduler's adjustments and
let a cpu-bound thread monopolize the cpu; if allowed at all, that
should require priveleges.)

On the other hand, if either the old or new priority class is not
timeshare, then the scheduler doesn't make automatic adjustments, so we
should honor the request and make the priority change right away. The
reason the old class gets caught up in this is the very reason for this
change: when thread A changes the priority of its child thread B from
idle back to timeshare, thread B never actually gets moved to a
timeshare-range run queue unless there are some idle cycles available
to allow it to first get scheduled again as an idle thread.

Reviewed by: jhb@


# 247800 04-Mar-2013 davide

MFcalloutng (r244251 with minor changes):
Specify that precision of 0.5s is enough for resource limitation.

Sponsored by: Google Summer of Code 2012, iXsystems inc.
Tested by: flo, marius, ian, markj, Fabian Keil


# 230470 22-Jan-2012 trociny

Change kern.proc.rlimit sysctl to:

- retrive only one, specified limit for a process, not the whole
array, as it was previously (the sysctl has been added recently and
has not been backported to stable yet, so this change is ok);

- allow to set a resource limit for another process.

Submitted by: Andrey Zonov <andrey at zonov.org>
Discussed with: kib
Reviewed by: kib
MFC after: 2 weeks


# 229622 05-Jan-2012 jhb

Fix a logic bug in change 228207 in the check for a thread's new user
priority being a realtime priority.

MFC after: 3 days


# 228470 13-Dec-2011 eadler

- Add a sysctl to allow non-root users the ability to set idle
priorities.

- While here fix up some style nits.

Discussed with: cperciva (breifly)
Reviewed by: pjd (earlier version)
Reviewed by: bde
Approved by: jhb
MFC after: 1 month


# 228207 02-Dec-2011 jhb

When changing the user priority of a thread, change the real priority
in addition to the user priority for threads whose current real priority
is equal to the previous user priority or if the new priority is a
real-time priority. This allows priority changes of other threads to
have an immediate effect.

MFC after: 2 weeks


# 227315 07-Nov-2011 trociny

In lim_fork() assert that processes locks are held.

Suggested by: kib


# 225617 16-Sep-2011 kmacy

In order to maximize the re-usability of kernel code in user space this
patch modifies makesyscalls.sh to prefix all of the non-compatibility
calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel
entry points and all places in the code that use them. It also
fixes an additional name space collision between the kernel function
psignal and the libc function of the same name by renaming the kernel
psignal kern_psignal(). By introducing this change now we will ease future
MFCs that change syscalls.

Reviewed by: rwatson
Approved by: re (bz)


# 224188 18-Jul-2011 jhb

- Export each thread's individual resource usage in in struct kinfo_proc's
ki_rusage member when KERN_PROC_INC_THREAD is passed to one of the
process sysctls.
- Correctly account for the current thread's cputime in the thread when
doing the runtime fixup in calcru().
- Use TIDs as the key to lookup the previous thread to compute IO stat
deltas in IO mode in top when thread display is enabled.

Reviewed by: kib
Approved by: re (kib)


# 220390 06-Apr-2011 jhb

Fix several places to ignore processes that are not yet fully constructed.

MFC after: 1 week


# 220137 29-Mar-2011 trasz

Add racct. It's an API to keep per-process, per-jail, per-loginclass
and per-loginclass resource accounting information, to be used by the new
resource limits code. It's connected to the build, but the code that
actually calls the new functions will come later.

Sponsored by: The FreeBSD Foundation
Reviewed by: kib (earlier version)


# 219968 24-Mar-2011 jhb

Fix some locking nits with the p_state field of struct proc:
- Hold the proc lock while changing the state from PRS_NEW to PRS_NORMAL
in fork to honor the locking requirements. While here, expand the scope
of the PROC_LOCK() on the new process (p2) to avoid some LORs. Previously
the code was locking the new child process (p2) after it had locked the
parent process (p1). However, when locking two processes, the safe order
is to lock the child first, then the parent.
- Fix various places that were checking p_state against PRS_NEW without
having the process locked to use PROC_LOCK(). Every place was already
locking the process, just after the PRS_NEW check.
- Remove or reduce the use of PROC_SLOCK() for places that were checking
p_state against PRS_NEW. The PROC_LOCK() alone is sufficient for reading
the current state.
- Reorder fill_kinfo_proc() slightly so it only acquires PROC_SLOCK() once.

MFC after: 1 week


# 216791 29-Dec-2010 davidxu

- Follow r216313, the sched_unlend_user_prio is no longer needed, always
use sched_lend_user_prio to set lent priority.
- Improve pthread priority-inherit mutex, when a contender's priority is
lowered, repropagete priorities, this may cause mutex owner's priority
to be lowerd, in old code, mutex owner's priority is rise-only.


# 216504 17-Dec-2010 jhb

Add back a bounds check on valid idle priorities that was lost in an
earlier commit. While here, move the thread lock down in rtp_to_pri().
It is not needed for all of the priority value checks and the computation
of newpri.

Reported by: swell.k @ gmail
MFC after: 3 days


# 214025 18-Oct-2010 emaste

We've already set p = td->td_proc, so use it.


# 213642 09-Oct-2010 davidxu

Create a global thread hash table to speed up thread lookup, use
rwlock to protect the table. In old code, thread lookup is done with
process lock held, to find a thread, kernel has to iterate through
process and thread list, this is quite inefficient.
With this change, test shows in extreme case performance is
dramatically improved.

Earlier patch was reviewed by: jhb, julian


# 210226 18-Jul-2010 trasz

Revert r210225 - turns out I was wrong; the "/*-" is not license-only
thing; it's also used to indicate that the comment should not be automatically
rewrapped.

Explained by: cperciva@


# 210225 18-Jul-2010 trasz

The "/*-" comment marker is supposed to denote copyrights. Remove non-copyright
occurences from sys/sys/ and sys/kern/.


# 210224 18-Jul-2010 trasz

Remove outdated comment and move part of it into more applicable place.


# 209390 21-Jun-2010 ed

Use ISO C99 integer types in sys/kern where possible.

There are only about 100 occurences of the BSD-specific u_int*_t
datatypes in sys/kern. The ISO C99 integer types are used here more
often.


# 208488 24-May-2010 kib

Fix the double counting of the last process thread td_incruntime
on exit, that is done once in thread_exit() and the second time in
proc_reap(), by clearing td_incruntime.

Use the opportunity to revert to the pre-RUSAGE_THREAD exporting of ruxagg()
instead of ruxagg_locked() and use it from thread_exit().

Diagnosed and tested by: neel
MFC after: 3 days


# 207602 04-May-2010 kib

Implement RUSAGE_THREAD. Add td_rux to keep extended runtime and ticks
information for thread to allow calcru1() (re)use.

Rename ruxagg()->ruxagg_locked(), ruxagg_tlock()->ruxagg() [1].
The ruxagg_locked() function no longer clears thread ticks nor
td_incruntime.

Requested by: attilio [1]
Discussed with: attilio, bde
Reviewed by: bde
Based on submission by: Alexander Krizhanovsky <ak natsys-lab com>
MFC after: 1 week
X-MFC-Note: td_rux shall be moved to the end of struct thread


# 207468 01-May-2010 kib

Extract thread_lock()/ruxagg()/thread_unlock() fragment into utility
function ruxagg_tlock().
Convert the definition of kern_getrusage() to ANSI C.

Submitted by: Alexander Krizhanovsky <ak natsys-lab com>
MFC after: 1 week


# 204670 03-Mar-2010 rrs

sched_getparam was just plain broke for time-share
processes. It did not return an error but instead
just let garbage be passed back. This I fix so
it actually properly translates the priority the
process is at to a posix's high means more priority.
I also fix it so that if the ULE scheduler has bumped
it up to a realtime process you get back a sane value
i.e. the highest priority (63 for time-share).

sched_setscheduler() had the setting of the
timeshare class priority disabled. With some notes
about rejecting the posix high numbers is greater
priority and use nice instead. This fix also
adjusts that to work, with the cavet that a t-s
process may well get bumped up or down i.e. the
setscheduler() will NOT change the nice value only
the current priority. I think this is reasonable
considering if the user wants to play with nice then
he can. At least all the posix'ish interfaces now
respond sanely.

MFC after: 3 weeks


# 194766 23-Jun-2009 kib

Implement global and per-uid accounting of the anonymous memory. Add
rlimit RLIMIT_SWAP that limits the amount of swap that may be reserved
for the uid.

The accounting information (charge) is associated with either map entry,
or vm object backing the entry, assuming the object is the first one
in the shadow chain and entry does not require COW. Charge is moved
from entry to object on allocation of the object, e.g. during the mmap,
assuming the object is allocated, or on the first page fault on the
entry. It moves back to the entry on forks due to COW setup.

The per-entry granularity of accounting makes the charge process fair
for processes that change uid during lifetime, and decrements charge
for proper uid when region is unmapped.

The interface of vm_pager_allocate(9) is extended by adding struct ucred *,
that is used to charge appropriate uid when allocation if performed by
kernel, e.g. md(4).

Several syscalls, among them is fork(2), may now return ENOMEM when
global or per-uid limits are enforced.

In collaboration with: pho
Reviewed by: alc
Approved by: re (kensmith)


# 184217 23-Oct-2008 davidxu

Don't rearm callout if the process is exiting, it may leak a callout
because callout_drain() only waits for running callout, but not disable
it if it is rearmed.


# 184205 23-Oct-2008 des

Retire the MALLOC and FREE macros. They are an abomination unto style(9).

MFC after: 3 months


# 182792 05-Sep-2008 ed

Fix a small typo in a comment in calcru1().

The word "happene" should read "happened".

Submitted by: Jille Timmermans <jille quis cx>


# 181905 20-Aug-2008 ed

Integrate the new MPSAFE TTY layer to the FreeBSD operating system.

The last half year I've been working on a replacement TTY layer for the
FreeBSD kernel. The new TTY layer was designed to improve the following:

- Improved driver model:

The old TTY layer has a driver model that is not abstract enough to
make it friendly to use. A good example is the output path, where the
device drivers directly access the output buffers. This means that an
in-kernel PPP implementation must always convert network buffers into
TTY buffers.

If a PPP implementation would be built on top of the new TTY layer
(still needs a hooks layer, though), it would allow the PPP
implementation to directly hand the data to the TTY driver.

- Improved hotplugging:

With the old TTY layer, it isn't entirely safe to destroy TTY's from
the system. This implementation has a two-step destructing design,
where the driver first abandons the TTY. After all threads have left
the TTY, the TTY layer calls a routine in the driver, which can be
used to free resources (unit numbers, etc).

The pts(4) driver also implements this feature, which means
posix_openpt() will now return PTY's that are created on the fly.

- Improved performance:

One of the major improvements is the per-TTY mutex, which is expected
to improve scalability when compared to the old Giant locking.
Another change is the unbuffered copying to userspace, which is both
used on TTY device nodes and PTY masters.

Upgrading should be quite straightforward. Unlike previous versions,
existing kernel configuration files do not need to be changed, except
when they reference device drivers that are listed in UPDATING.

Obtained from: //depot/projects/mpsafetty/...
Approved by: philip (ex-mentor)
Discussed: on the lists, at BSDCan, at the DevSummit
Sponsored by: Snow B.V., the Netherlands
dcons(4) fixed by: kan


# 177377 19-Mar-2008 pjd

Remove extra uihold() call that accidentally sneak in during perforce
change @125544.


# 177368 19-Mar-2008 jeff

- Relax requirements for p_numthreads, p_threads, p_swtick, and p_nice from
requiring the per-process spinlock to only requiring the process lock.
- Reflect these changes in the proc.h documentation and consumers throughout
the kernel. This is a substantial reduction in locking cost for these
fields and was made possible by recent changes to threading support.


# 177278 16-Mar-2008 pjd

Whitespace cleanups.


# 177277 16-Mar-2008 pjd

- Use wait-free method to manage ui_sbsize and ui_proccnt fields in the
uidinfo structure. This entirely removes contention observed on the
ui_mtxp mutex (as it is now gone).
- Convert the uihashtbl_mtx mutex to a rwlock, as most of the time we just
need to read-lock it.

Reviewed by: jhb, jeff, kris & others
Tested by: kris


# 177264 16-Mar-2008 pjd

Style fixes.


# 177263 16-Mar-2008 pjd

Fix information leak. We can find PIDs of running processes from within
a jail, etc. by simply calling setpriority(PRIO_PROCESS, <PID>, 0) and
checking the return value: 0 means that the process exists and -1 that
it doesn't exist.

Reviewed by: rwatson
MFC after: 1 week


# 177091 12-Mar-2008 jeff

Remove kernel support for M:N threading.

While the KSE project was quite successful in bringing threading to
FreeBSD, the M:N approach taken by the kse library was never developed
to its full potential. Backwards compatibility will be provided via
libmap.conf for dynamically linked binaries and static binaries will
be broken.


# 175219 10-Jan-2008 rwatson

Don't zero td_runtime when billing thread CPU usage to the process;
maintain a separate td_incruntime to hold unbilled CPU usage for
the thread that has the previous properties of td_runtime.

When thread information is requested using the thread monitoring
sysctls, export thread td_runtime instead of process rusage runtime
in kinfo_proc.

This restores the display of individual ithread and other kernel
thread CPU usage since inception in ps -H and top -SH, as well for
libthr user threads, valuable debugging information lost with the
move to try kthreads since they are no longer independent processes.

There is universal agreement that we should rewrite the process and
thread export sysctls, but this commit gets things going a bit
better in the mean time. Likewise, there are resevations about the
continued validity of statclock given the speed of modern processors.

Reviewed by: attilio, emaste, jhb, julian


# 174536 11-Dec-2007 davidxu

Fix LOR of thread lock and umtx's priority propagation mutex due
to the reworking of scheduler lock.

MFC: after 3 days


# 171468 16-Jul-2007 jeff

- Use ruxagg() in calcru() to make sure we have current tick information
from all threads.

Discussed with: bde, attilio
Approved by: re


# 171410 12-Jul-2007 jhb

Fix a couple of issues with the stack limit for 32-bit processes on 64-bit
kernels exposed by the recent fixes to resource limits for 32-bit processes
on 64-bit kernels:
- Let ABIs expose their maximum stack size via a new pointer in sysentvec
and use that in preference to maxssiz during exec() rather than always
using maxssiz for all processses.
- Apply the ABI's limit fixup to the previous stack size when adjusting
RLIMIT_STACK to determine if the existing mapping for the stack needs to
be grown or shrunk (as well as how much it should be grown or shrunk).

Approved by: re (kensmith)


# 170745 14-Jun-2007 rwatson

Remove the restriction that rtprio(2) cannot be used to set the realtime
or idle priority of another process owned by the same user. This means
that privilege in rtprio(2) (and rtprio_thread(2)) is required indirectly
via p_cansched(9) or directly to set realtime/idle privilege, rather than
directly affecting target process authorization.


# 170587 11-Jun-2007 rwatson

Eliminate now-unused SUSER_ALLOWJAIL arguments to priv_check_cred(); in
some cases, move to priv_check() if it was an operation on a thread and
no other flags were present.

Eliminate caller-side jail exception checking (also now-unused); jail
privilege exception code now goes solely in kern_jail.c.

We can't yet eliminate suser() due to some cases in the KAME code where
a privilege check is performed and then used in many different deferred
paths. Do, however, move those prototypes to priv.h.

Reviewed by: csjp
Obtained from: TrustedBSD Project


# 170472 09-Jun-2007 attilio

rufetch and calcru sometimes should be called atomically together.
This patch fixes places where they should be called atomically changing
their locking requirements (both assume per-proc spinlock held) and
introducing rufetchcalc which wrappers both calls to be performed in
atomic way.

Reviewed by: jeff
Approved by: jeff (mentor)


# 170466 09-Jun-2007 attilio

The current rusage code show peculiar problems:
- Unsafeness on ruadd() in thread_exit()
- Unatomicity of thread_exiit() in the exit1() operations

This patch addresses these problems allocating p_fd as part of the
process and modifying the way it is accessed.

A small chunk of this patch, resolves a race about p_state in kern_wait(),
since we have to be sure about the zombif-ing process.

Submitted by: jeff
Approved by: jeff (mentor)


# 170307 04-Jun-2007 jeff

Commit 14/14 of sched_lock decomposition.
- Use thread_lock() rather than sched_lock for per-thread scheduling
sychronization.
- Use the per-process spinlock rather than the sched_lock for per-process
scheduling synchronization.

Tested by: kris, current@
Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)


# 170175 31-May-2007 jeff

Forced commit to describe changes in the last revision.

- Move cpu limit handling to a callout that runs once per-second and sums
up all threads tick times to check for violations. This removes all code
from mi_switch() that touches the proc. This also cleans up ast() a bit
by removing one large case.


# 170174 31-May-2007 jeff

- Move rusage from being per-process in struct pstats to per-thread in
td_ru. This removes the requirement for per-process synchronization in
statclock() and mi_switch(). This was previously supported by
sched_lock which is going away. All modifications to rusage are now
done in the context of the owning thread. reads proceed without locks.
- Aggregate exiting threads rusage in thread_exit() such that the exiting
thread's rusage is not lost.
- Provide a new routine, rufetch() to fetch an aggregate of all rusage
structures from all threads in a process. This routine must be used
in any place requiring a rusage from a process prior to it's exit. The
exited process's rusage is still available via p_ru.
- Aggregate tick statistics only on demand via rufetch() or when a thread
exits. Tick statistics are kept in the thread and protected by sched_lock
until it exits.

Initial patch by: attilio
Reviewed by: attilio, bde (some objections), arch (mostly silent)


# 170035 27-May-2007 rwatson

Universally adopt most conventional spelling of acquire.


# 169565 14-May-2007 jhb

Rework the support for ABIs to override resource limits (used by 32-bit
processes under 64-bit kernels). Previously, each 32-bit process overwrote
its resource limits at exec() time. The problem with this approach is that
the new limits affect all child processes of the 32-bit process, including
if the child process forks and execs a 64-bit process. To fix this, don't
ovewrite the resource limits during exec(). Instead, sv_fixlimits() is
now replaced with a different function sv_fixlimit() which asks the ABI to
sanitize a single resource limit. We then use this when querying and
setting resource limits. Thus, if a 32-bit process sets a limit, then
that new limit will be inherited by future children. However, if the
32-bit process doesn't change a limit, then a future 64-bit child will
see the "full" 64-bit limit rather than the 32-bit limit.

MFC is tentative since it will break the ABI of old linux.ko modules (no
other modules are affected).

MFC after: 1 week


# 167232 05-Mar-2007 rwatson

Further system call comment cleanup:

- Remove also "MP SAFE" after prior "MPSAFE" pass. (suggested by bde)
- Remove extra blank lines in some cases.
- Add extra blank lines in some cases.
- Remove no-op comments consisting solely of the function name, the word
"syscall", or the system call name.
- Add punctuation.
- Re-wrap some comments.


# 167211 04-Mar-2007 rwatson

Remove 'MPSAFE' annotations from the comments above most system calls: all
system calls now enter without Giant held, and then in some cases, acquire
Giant explicitly.

Remove a number of other MPSAFE annotations in the credential code and
tweak one or two other adjacent comments.


# 167007 26-Feb-2007 delphij

Close race conditions between fork() and [sg]etpriority()'s
PRIO_USER case, possibly also other places that deferences
p_ucred.

In the past, we insert a new process into the allproc list right
after PID allocation, and release the allproc_lock sx. Because
most content in new proc's structure is not yet initialized,
this could lead to undefined result if we do not handle PRS_NEW
with care.

The problem with PRS_NEW state is that it does not provide fine
grained information about how much initialization is done for a
new process. By defination, after PRIO_USER setpriority(), all
processes that belongs to given user should have their nice value
set to the specified value. Therefore, if p_{start,end}copy
section was done for a PRS_NEW process, we can not safely ignore
it because p_nice is in this area. On the other hand, we should
be careful on PRS_NEW processes because we do not allow non-root
users to lower their nice values, and without a successful copy
of the copy section, we can get stale values that is inherted
from the uninitialized area of the process structure.

This commit tries to close the race condition by grabbing proc
mutex *before* we release allproc_lock xlock, and do copy as
well as zero immediately after the allproc_lock xunlock. This
guarantees that the new process would have its p_copy and p_zero
sections, as well as user credential informaion initialized. In
getpriority() case, instead of grabbing PROC_LOCK for a PRS_NEW
process, we just skip the process in question, because it does
not affect the final result of the call, as the p_nice value
would be copied from its parent, and we will see it during
allproc traverse.

Other potential solutions are still under evaluation.

Discussed with: davidxu, jhb, rwatson
PR: kern/108071
MFC after: 2 weeks


# 166828 19-Feb-2007 rwatson

Use priv_check(9) instead of suser(9) for checking the privilege to
set real-time priority on a thread. It looks like this suser(9)
call was introduced after my first pass through replacing superuser
checks with named privilege checks.


# 166073 17-Jan-2007 delphij

Use FOREACH_PROC_IN_SYSTEM instead of using its unrolled form.


# 164936 06-Dec-2006 julian

Threading cleanup.. part 2 of several.

Make part of John Birrell's KSE patch permanent..
Specifically, remove:
Any reference of the ksegrp structure. This feature was
never fully utilised and made things overly complicated.
All code in the scheduler that tried to make threaded programs
fair to unthreaded programs. Libpthread processes will already
do this to some extent and libthr processes already disable it.

Also:
Since this makes such a big change to the scheduler(s), take the opportunity
to rename some structures and elements that had to be moved anyhow.
This makes the code a lot more readable.

The ULE scheduler compiles again but I have no idea if it works.

The 4bsd scheduler still reqires a little cleaning and some functions that now do
ALMOST nothing will go away, but I thought I'd do that as a separate commit.

Tested by David Xu, and Dan Eischen using libthr and libpthread.


# 164431 20-Nov-2006 davidxu

Use scheduler API sched_user_prio() to adjust thread's userland priority,
use td_base_user_prio to get real userland priority since POSIX priority
mutex may adjust td_user_pri which is an effective priority.


# 164033 06-Nov-2006 rwatson

Sweep kernel replacing suser(9) calls with priv(9) calls, assigning
specific privilege names to a broad range of privileges. These may
require some future tweaking.

Sponsored by: nCircle Network Security, Inc.
Obtained from: TrustedBSD Project
Discussed on: arch@
Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri,
Alex Lyashkov <umka at sevcity dot net>,
Skip Ford <skip dot ford at verizon dot net>,
Antoine Brodin <antoine dot brodin at laposte dot net>


# 163709 26-Oct-2006 jb

Make KSE a kernel option, turned on by default in all GENERIC
kernel configs except sun4v (which doesn't process signals properly
with KSE).

Reviewed by: davidxu@


# 162497 21-Sep-2006 davidxu

Replace system call thr_getscheduler, thr_setscheduler, thr_setschedparam
with rtprio_thread, while rtprio system call is for process only, the new
system call rtprio_thread is responsible for LWP.


# 160964 04-Aug-2006 yar

Commit the results of the typo hunt by Darren Pilgrim.
This change affects documentation and comments only,
no real code involved.

PR: misc/101245
Submitted by: Darren Pilgrim <darren pilgrim bitfreak org>
Tested by: md5(1)
MFC after: 1 week


# 156570 11-Mar-2006 phk

Go over calcru and friends once more.

Reintroduce the monotonicity for the normal case and make the two
special cases behave in what is belived to be the most sensible fasion.


# 156484 09-Mar-2006 phk

Add slop to "backwards" cpu accounting messages, 3 usec or 1% whichever
triggers.

This should eliminate all the trivial messages which result from minor
increases in cpu_tick frequency.

Machines which don't du cpu clock fiddling shouldn't issue "backwards"
messages now.

Laptops and other machines where the initial estimate of cputicks may be
waaaay off will still issue warnings.


# 155916 22-Feb-2006 jhb

Various style and comment fixes.

Submitted by: bde


# 155882 21-Feb-2006 jhb

Split calcru() back into a calcru1() function shared with calccru() and
a calcru() wrapper that passes a local rusage_ext on the stack that is
a snapshot to do the calculations on. Now we can pass p->p_crux to
calcru1() in calccru() again which fixes the issues with runtime going
backwards messages when dead processes are harvested by init.

Reviewed by: phk
Tested by: Stefan Ehmann shoesoft at gmx dot net


# 155534 11-Feb-2006 phk

CPU time accounting speedup (step 2)

Keep accounting time (in per-cpu) cputicks and the statistics counts
in the thread and summarize into struct proc when at context switch.

Don't reach across CPUs in calcru().

Add code to calibrate the top speed of cpu_tickrate() for variable
cpu_tick hardware (like TSC on power managed machines).

Don't enforce monotonicity (at least for now) in calcru. While the
calibrated cpu_tickrate ramps up it may not be true.

Use 27MHz counter on i386/Geode.

Use TSC on amd64 & i386 if present.

Use tick counter on sparc64


# 155444 07-Feb-2006 phk

Modify the way we account for CPU time spent (step 1)

Keep track of time spent by the cpu in various contexts in units of
"cputicks" and scale to real-world microsec^H^H^H^H^H^H^H^Hclock_t
only when somebody wants to inspect the numbers.

For now "cputicks" are still derived from the current timecounter
and therefore things should by definition remain sensible also on
SMP machines. (The main reason for this first milestone commit is
to verify that hypothesis.)

On slower machines, the avoided multiplications to normalize timestams
at every context switch, comes out as a 5-7% better score on the
unixbench/context1 microbenchmark. On more modern hardware no change
in performance is seen.


# 154793 25-Jan-2006 ups

Back out changes made in rev. 1.151.
They were bogus.

Cluebat applied by: jhb@


# 154731 23-Jan-2006 ups

Hopefully fix the "calcru: runtime went backwards from ..." problem by
keeping the resource values locked (where needed) while we use them
for calculations.

MFC after: 3 days


# 151980 02-Nov-2005 ps

Calling setrlimit from 32bit apps could potentially increase certain
limits beyond what should be capiable in a 32bit process, so we
must fixup the limits.

Reviewed by: jhb


# 150633 27-Sep-2005 jhb

Use the reference count API to manage the reference counts for process
limit structures rather than using pool mutexes to protect the reference
counts.

Tested on: i386, alpha, sparc64


# 146879 01-Jun-2005 alc

Giant is no longer required in kern_setrlimit(); remove its acquisition and
release.

Reviewed by: jhb


# 139451 30-Dec-2004 jhb

Stop explicitly touching td_base_pri outside of the scheduler and simply
set a thread's priority via sched_prio() when that is the desired action.
The schedulers will start managing td_base_pri internally shortly.


# 136152 05-Oct-2004 jhb

Rework how we store process times in the kernel such that we always store
the raw values including for child process statistics and only compute the
system and user timevals on demand.

- Fix the various kern_wait() syscall wrappers to only pass in a rusage
pointer if they are going to use the result.
- Add a kern_getrusage() function for the ABI syscalls to use so that they
don't have to play stackgap games to call getrusage().
- Fix the svr4_sys_times() syscall to just call calcru() to calculate the
times it needs rather than calling getrusage() twice with associated
stackgap, etc.
- Add a new rusage_ext structure to store raw time stats such as tick counts
for user, system, and interrupt time as well as a bintime of the total
runtime. A new p_rux field in struct proc replaces the same inline fields
from struct proc (i.e. p_[isu]ticks, p_[isu]u, and p_runtime). A new p_crux
field in struct proc contains the "raw" child time usage statistics.
ruadd() has been changed to handle adding the associated rusage_ext
structures as well as the values in rusage. Effectively, the values in
rusage_ext replace the ru_utime and ru_stime values in struct rusage. These
two fields in struct rusage are no longer used in the kernel.
- calcru() has been split into a static worker function calcru1() that
calculates appropriate timevals for user and system time as well as updating
the rux_[isu]u fields of a passed in rusage_ext structure. calcru() uses a
copy of the process' p_rux structure to compute the timevals after updating
the runtime appropriately if any of the threads in that process are
currently executing. It also now only locks sched_lock internally while
doing the rux_runtime fixup. calcru() now only requires the caller to
hold the proc lock and calcru1() only requires the proc lock internally.
calcru() also no longer allows callers to ask for an interrupt timeval
since none of them actually did.
- calcru() now correctly handles threads executing on other CPUs.
- A new calccru() function computes the child system and user timevals by
calling calcru1() on p_crux. Note that this means that any code that wants
child times must now call this function rather than reading from p_cru
directly. This function also requires the proc lock.
- This finishes the locking for rusage and friends so some of the Giant locks
in exit1() and kern_wait() are now gone.
- The locking in ttyinfo() has been tweaked so that a shared lock of the
proctree lock is used to protect the process group rather than the process
group lock. By holding this lock until the end of the function we now
ensure that the process/thread that we pick to dump info about will no
longer vanish while we are trying to output its info to the console.

Submitted by: bde (mostly)
MFC after: 1 month


# 135688 23-Sep-2004 jhb

A modest collection of various and sundry style, spelling, and whitespace
fixes.

Submitted by: bde (mostly)


# 135573 22-Sep-2004 jhb

Various small style fixes.


# 133233 06-Aug-2004 rwatson

Push UIDINFO_UNLOCK() slightly earlier in chgsbize(), as it's not
needed if we print the local variable version of the limit rather
than the shared version.


# 133126 04-Aug-2004 rwatson

Remove spl's from kern_resource.c.


# 132653 26-Jul-2004 cperciva

Rename suser_cred()'s PRISON_ROOT flag to SUSER_ALLOWJAIL. This is
somewhat clearer, but more importantly allows for a consistent naming
scheme for suser_cred flags.

The old name is still defined, but will be removed in a few days (unless I
hear any complaints...)

Discussed with: rwatson, scottl
Requested by: jhb


# 130858 21-Jun-2004 bde

Turned off the "calcru: negative time" warning for certain SMP cases
where it is known to detect a problem but the problem is not very easy
to fix. The warning became very common recently after a call to calcru()
was added to fill_kinfo_thread().

Another (much older) cause of "negative times" (actually non-monotonic
times) was fixed in rev.1.237 of kern_exit.c.

Print separate messages for non-monotonic and negative times.


# 130551 15-Jun-2004 julian

Nice, is a property of a process as a whole..
I mistakenly moved it to the ksegroup when breaking up the process
structure. Put it back in the proc structure.


# 130344 11-Jun-2004 phk

Deorbit COMPAT_SUNOS.

We inherited this from the sparc32 port of BSD4.4-Lite1. We have neither
a sparc32 port nor a SunOS4.x compatibility desire these days.


# 129050 08-May-2004 julian

Fix rtprio() to do sensible things when called from threaded processes.
It's not quite correct from a posix Point Of view, but it is a lot better
than what was there before. This will be revisited later
when we decide what form our priority extensions will take. Posix doesn't
specify how a system scope thread can change its priority so you need to
add non-standard extensions to be able to do it..
For now make this slightly non standard to allow it to be done.

Submitted by: Dan Eischen originally, changed by myself.


# 128088 10-Apr-2004 mux

Remove a comment that complains about the lack of %qd, to justify
truncating a rlim_t to a long. We have %qd since some time now.
However, the correct format to use here is %jd and a cast to
intmax_t, so do this.


# 127911 05-Apr-2004 imp

Remove advertising clause from University of California Regent's license,
per letter dated July 22, 1999.

Approved by: core


# 125712 11-Feb-2004 jhb

Argh! Fix a bogon. lim_cur() was returning the hard (max) limit rather
than the soft (cur) limit.

Submitted by: bde


# 125525 06-Feb-2004 jhb

- Convert the plimit lock to a pool mutex lock.
- Hide struct plimit from userland.

Submitted by: bde (2)


# 125524 06-Feb-2004 jhb

- Correct the translation of old rlimit values to properly handle the old
RLIM_INFINITY case for ogetrlimit().
- Use %jd and intmax_t to output negative time in usec in calcru().
- Rework getrusage() to make a copy of the rusage struct into a local
variable while holding Giant and then do the copyout from the local
variable to avoid having to have the original process rusage struct
locked while doing the copyout (which would not be safe). This also
includes a few style fixes from Bruce to getrusage().

Submitted by: bde (1, parts of 3)
Suggested by: bde (2)


# 125523 06-Feb-2004 jhb

A few more style fixes from Bruce including a few I missed last time.

Submitted by: bde


# 125495 05-Feb-2004 jhb

- A lot of style and whitespace fixes.
- Update a few comments regarding locking notes.

Submitted by: bde (1, mostly)


# 125454 04-Feb-2004 jhb

Locking for the per-process resource limits structure.
- struct plimit includes a mutex to protect a reference count. The plimit
structure is treated similarly to struct ucred in that is is always copy
on write, so having a reference to a structure is sufficient to read from
it without needing a further lock.
- The proc lock protects the p_limit pointer and must be held while reading
limits from a process to keep the limit structure from changing out from
under you while reading from it.
- Various global limits that are ints are not protected by a lock since
int writes are atomic on all the archs we support and thus a lock
wouldn't buy us anything.
- All accesses to individual resource limits from a process are abstracted
behind a simple lim_rlimit(), lim_max(), and lim_cur() API that return
either an rlimit, or the current or max individual limit of the specified
resource from a process.
- dosetrlimit() was renamed to kern_setrlimit() to match existing style of
other similar syscall helper functions.
- The alpha OSF/1 compat layer no longer calls getrlimit() and setrlimit()
(it didn't used the stackgap when it should have) but uses lim_rlimit()
and kern_setrlimit() instead.
- The svr4 compat no longer uses the stackgap for resource limits calls,
but uses lim_rlimit() and kern_setrlimit() instead.
- The ibcs2 compat no longer uses the stackgap for resource limits. It
also no longer uses the stackgap for accessing sysctl's for the
ibcs2_sysconf() syscall but uses kernel_sysctl() instead. As a result,
ibcs2_sysconf() no longer needs Giant.
- The p_rlimit macro no longer exists.

Submitted by: mtm (mostly, I only did a few cleanups and catchups)
Tested on: i386
Compiled on: alpha, amd64


# 121608 27-Oct-2003 jeff

- Don't set td_priority directly here, use sched_prio().


# 117494 12-Jul-2003 truckman

Extend the mutex pool implementation to permit the creation and use of
multiple mutex pools with different options and sizes. Mutex pools can
be created with either the default sleep mutexes or with spin mutexes.
A dynamically created mutex pool can now be destroyed if it is no longer
needed.

Create two pools by default, one that matches the existing pool that
uses the MTX_NOWITNESS option that should be used for building higher
level locks, and a new pool with witness checking enabled.

Modify the users of the existing mutex pool to use the appropriate pool
in the new implementation.

Reviewed by: jhb


# 116182 10-Jun-2003 obrien

Use __FBSDID().


# 113921 23-Apr-2003 jhb

Remove Giant from [gs]etpriority().


# 113872 22-Apr-2003 jhb

Lock both the proc lock and sched_lock when calling sched_nice since
kg_nice is now protected by both. Being protected by both means that
other places in the kernel that want to read kg_nice only need one of the
two locks.


# 113684 18-Apr-2003 jhb

Add a couple of sched_lock asserts.


# 113355 11-Apr-2003 jeff

- Adjust sched hooks for fork and exec to take processes as arguments instead
of ksegs since they primarily operation on processes.
- KSEs take ticks so pass the kse through sched_clock().
- Add a sched_class() routine that adjusts a ksegrp pri class.
- Define a sched_fork_{kse,thread,ksegrp} and sched_exit_{kse,thread,ksegrp}
that will be used to tell the scheduler about new instances of these
structures within the same process. These will be used by THR and KSE.
- Change sched_4bsd to reflect this API update.


# 112169 12-Mar-2003 tjr

Back out previous. The locking here needs a rethink.


# 112140 12-Mar-2003 tjr

Acquire sched_lock around use of FOREACH_KSEGRP_IN_PROC, accesses
to kg_nice and calls to sched_nice() in getpriority() and setpriority()
(really donice()).


# 111163 20-Feb-2003 tjr

Remove the PL_SHAREMOD flag from struct plimit, which could have been
used to share resource limits between rfork threads, but never was.
Removing it makes resource limit locking much simpler -- only the current
process can change the contents of the structure that p_limit points to.


# 111119 19-Feb-2003 imp

Back out M_* changes, per decision of the TRB.

Approved by: trb


# 111024 17-Feb-2003 jeff

- Move ke_sticks, ke_iticks, ke_uticks, ke_uu, ke_su, and ke_iu back into
the proc. These counters are only examined through calcru.

Submitted by: davidxu
Tested on: x86, alpha, UP/SMP


# 110787 13-Feb-2003 tjr

Add an XXX comment noting that getrusage() accesses p_stats->p_ru
and p_stats->p_cru without holding the appropriate locks.


# 110190 01-Feb-2003 julian

Reversion of commit by Davidxu plus fixes since applied.

I'm not convinced there is anything major wrong with the patch but
them's the rules..

I am using my "David's mentor" hat to revert this as he's
offline for a while.


# 109877 26-Jan-2003 davidxu

Move UPCALL related data structure out of kse, introduce a new
data structure called kse_upcall to manage UPCALL. All KSE binding
and loaning code are gone.

A thread owns an upcall can collect all completed syscall contexts in
its ksegrp, turn itself into UPCALL mode, and takes those contexts back
to userland. Any thread without upcall structure has to export their
contexts and exit at user boundary.

Any thread running in user mode owns an upcall structure, when it enters
kernel, if the kse mailbox's current thread pointer is not NULL, then
when the thread is blocked in kernel, a new UPCALL thread is created and
the upcall structure is transfered to the new UPCALL thread. if the kse
mailbox's current thread pointer is NULL, then when a thread is blocked
in kernel, no UPCALL thread will be created.

Each upcall always has an owner thread. Userland can remove an upcall by
calling kse_exit, when all upcalls in ksegrp are removed, the group is
atomatically shutdown. An upcall owner thread also exits when process is
in exiting state. when an owner thread exits, the upcall it owns is also
removed.

KSE is a pure scheduler entity. it represents a virtual cpu. when a thread
is running, it always has a KSE associated with it. scheduler is free to
assign a KSE to thread according thread priority, if thread priority is changed,
KSE can be moved from one thread to another.

When a ksegrp is created, there is always N KSEs created in the group. the
N is the number of physical cpu in the current system. This makes it is
possible that even an userland UTS is single CPU safe, threads in kernel still
can execute on different cpu in parallel. Userland calls kse_create to add more
upcall structures into ksegrp to increase concurrent in userland itself, kernel
is not restricted by number of upcalls userland provides.

The code hasn't been tested under SMP by author due to lack of hardware.

Reviewed by: julian


# 109623 21-Jan-2003 alfred

Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0.
Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.


# 104964 12-Oct-2002 jeff

- Create a new scheduler api that is defined in sys/sched.h
- Begin moving scheduler specific functionality into sched_4bsd.c
- Replace direct manipulation of scheduler data with hooks provided by the
new api.
- Remove KSE specific state modifications and single runq assumptions from
kern_switch.c

Reviewed by: -arch


# 104719 09-Oct-2002 jhb

- Move p_cpulimit to struct proc from struct plimit and protect it with
sched_lock. This means that we no longer access p_limit in mi_switch()
and the p_limit pointer can be protected by the proc lock.
- Remove PRS_ZOMBIE check from CPU limit test in mi_switch(). PRS_ZOMBIE
processes don't call mi_switch(), and even if they did there is no longer
the danger of p_limit being NULL (which is what the original zombie check
was added for).
- When we bump the current processes soft CPU limit in ast(), just bump the
private p_cpulimit instead of the shared rlimit. This fixes an XXX for
some value of fix. There is still a (probably benign) bug in that this
code doesn't check that the new soft limit exceeds the hard limit.

Inspired by: bde (2)


# 104239 30-Sep-2002 jhb

Change p_cpulimit to be in seconds instead of microseconds. Since
p_runtime now is a bintime, it is no longer an optimization to store
p_cpulimit as microseconds.

Suggested by: phk


# 103767 21-Sep-2002 jake

Use the fields in the sysentvec and in the vm map header in place of the
constants VM_MIN_ADDRESS, VM_MAXUSER_ADDRESS, USRSTACK and PS_STRINGS.
This is mainly so that they can be variable even for the native abi, based
on different machine types. Get stack protections from the sysentvec too.
This makes it trivial to map the stack non-executable for certain abis, on
machines that support it.


# 103367 15-Sep-2002 julian

Allocate KSEs and KSEGRPs separatly and remove them from the proc structure.
next step is to allow > 1 to be allocated per process. This would give
multi-processor threads. (when the rest of the infrastructure is
in place)

While doing this I noticed libkvm and sys/kern/kern_proc.c:fill_kinfo_proc
are diverging more than they should.. corrective action needed soon.


# 100591 24-Jul-2002 jdp

Widen struct sockbuf's sb_timeo member to int from short. With
non-default but reasonable values of hz this member overflowed,
breaking NFS over UDP.

Also, as long as I'm plowing up struct sockbuf ... Change certain
members from u_long/long to u_int/int in order to reduce wasted
space on 64-bit machines. This change was requested by Andrew
Gallatin.

Netstat and systat need to be rebuilt. I am incrementing
__FreeBSD_version in case any ports need to change.


# 99012 29-Jun-2002 alfred

more caddr_t removal.


# 96886 18-May-2002 jhb

Change p_can{debug,see,sched,signal}()'s first argument to be a thread
pointer instead of a proc pointer and require the process pointed to
by the second argument to be locked. We now use the thread ucred reference
for the credential checks in p_can*() as a result. p_canfoo() should now
no longer need Giant.


# 94861 16-Apr-2002 jhb

Lock proctree_lock instead of pgrpsess_lock.


# 94625 13-Apr-2002 jhb

- Change donice() to take a thread as the first argument instead of a
process so it can use td_ucred.
- Require the target process of donice() to be locked when donice() is
called.
- Use td_ucred.
- Lock the target process of p_cansee() and while reading the credentials
of a process.
- Change the logic of rtprio() slightly so it does it's copyin() if needed
prior to locking the target process.
- rtprio() no longer needs Giant. In theory with full KSE it would still
need Giant to protect p_ucred of curproc for the p_canfoo() functions
but p_canfoo() will be changing to using td_ucred of curthread before
full KSE hits the tree.


# 93818 04-Apr-2002 jhb

Change callers of mtx_init() to pass in an appropriate lock type name. In
most cases NULL is passed, but in some cases such as network driver locks
(which use the MTX_NETWORK_LOCK macro) and UMA zone locks, a name is used.

Tested on: i386, alpha, sparc64


# 93593 01-Apr-2002 jhb

Change the suser() API to take advantage of td_ucred as well as do a
general cleanup of the API. The entire API now consists of two functions
similar to the pre-KSE API. The suser() function takes a thread pointer
as its only argument. The td_ucred member of this thread must be valid
so the only valid thread pointers are curthread and a few kernel threads
such as thread0. The suser_cred() function takes a pointer to a struct
ucred as its first argument and an integer flag as its second argument.
The flag is currently only used for the PRISON_ROOT flag.

Discussed on: smp@


# 92723 19-Mar-2002 alfred

Remove __P.


# 91292 26-Feb-2002 phk

Cast the variable, not the constant to 64 bits.


# 91140 23-Feb-2002 tanimura

Lock struct pgrp, session and sigio.

New locks are:

- pgrpsess_lock which locks the whole pgrps and sessions,
- pg_mtx which protects the pgrp members, and
- s_mtx which protects the session members.

Please refer to sys/proc.h for the coverage of these locks.

Changes on the pgrp/session interface:

- pgfind() needs the pgrpsess_lock held.

- The caller of enterpgrp() is responsible to allocate a new pgrp and
session.

- Call enterthispgrp() in order to enter an existing pgrp.

- pgsignal() requires a pgrp lock held.

Reviewed by: jhb, alfred
Tested on: cvsup.jp.FreeBSD.org
(which is a quad-CPU machine running -current)


# 91066 22-Feb-2002 phk

Convert p->p_runtime and PCPU(switchtime) to bintime format.


# 90538 11-Feb-2002 julian

In a threaded world, differnt priorirites become properties of
different entities. Make it so.

Reviewed by: jhb@freebsd.org (john baldwin)


# 89594 20-Jan-2002 alfred

use mutex pool mutexes for uidinfo locking.
replace mutex_lock calls on uidinfo with macro calls:
mtx_lock(&uidp->ui_mtx) -> UIDINFO_LOCK(uidp)

Terry Lambert <tlambert2@mindspring.com> helped with this.


# 86036 04-Nov-2001 peter

Forced commit:
Fix breakage in previous time_t related commit:
quad_t is not a 'long long' - don't use that for casting for printf %lld
(it is plain "long" on our 64 bit platforms).

Also, These timestamps are relative to boot time, not 1970. We dont need
to slow things down here with synthetic types on i386 since we have
other wraparounds that occur *long* before 68 years of uptime.


# 86034 04-Nov-2001 peter

*** empty log message ***


# 85844 01-Nov-2001 rwatson

o Move suser() calls in kern/ to using suser_xxx() with an explicit
credential selection, rather than reference via a thread or process
pointer. This is part of a gradual migration to suser() accepting
a struct ucred instead of a struct proc, simplifying the reference
and locking semantics of suser().

Obtained from: TrustedBSD Project


# 85644 28-Oct-2001 dillon

Adjust printfs to be time_t agnostic.


# 84783 10-Oct-2001 ps

Make MAXTSIZ, DFLDSIZ, MAXDSIZ, DFLSSIZ, MAXSSIZ, SGROWSIZ loader
tunable.

Reviewed by: peter
MFC after: 2 weeks


# 83366 12-Sep-2001 julian

KSE Milestone 2
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to teh previousl -current except
that there is a thread associated with each process.

Sorry john! (your next MFC will be a doosie!)

Reviewed by: peter@freebsd.org, dillon@freebsd.org

X-MFC after: ha ha ha ha


# 82776 01-Sep-2001 jhb

Use sched_lock to protect rtp_to_pri() and pri_to_rtp() when needed.


# 82749 01-Sep-2001 dillon

Giant Pushdown. Saved the worst P4 tree breakage for last.

reboot() getpriority() setpriority() rtprio() osetrlimit() ogetrlimit()
setrlimit() getrlimit() getrusage() getpid() getppid() getpgrp()
getpgid() getsid() getgid() getegid() getgroups() setsid() setpgid()
setuid() seteuid() setgid() setegid() setgroups() setreuid() setregid()
setresuid() setresgid() getresuid() getresgid () __setugid() getlogin()
setlogin() modnext() modfnext() modstat() modfind() kldload() kldunload()
kldfind() kldnext() kldstat() kldfirstmod() kldsym() getdtablesize()
dup2() dup() fcntl() close() ofstat() fstat() nfsstat() fpathconf()
flock()


# 80116 21-Jul-2001 assar

add prototype for dosetrlimit


# 79335 05-Jul-2001 rwatson

o Replace calls to p_can(..., P_CAN_xxx) with calls to p_canxxx().
The p_can(...) construct was a premature (and, it turns out,
awkward) abstraction. The individual calls to p_canxxx() better
reflect differences between the inter-process authorization checks,
such as differing checks based on the type of signal. This has
a side effect of improving code readability.
o Replace direct credential authorization checks in ktrace() with
invocation of p_candebug(), while maintaining the special case
check of KTR_ROOT. This allows ktrace() to "play more nicely"
with new mandatory access control schemes, as well as making its
authorization checks consistent with other "debugging class"
checks.
o Eliminate "privused" construct for p_can*() calls which allowed the
caller to determine if privilege was required for successful
evaluation of the access control check. This primitive is currently
unused, and as such, serves only to complicate the API.

Approved by: ({procfs,linprocfs} changes) des
Obtained from: TrustedBSD Project


# 79224 04-Jul-2001 dillon

With Alfred's permission, remove vm_mtx in favor of a fine-grained approach
(this commit is just the first stage). Also add various GIANT_ macros to
formalize the removal of Giant, making it easy to test in a more piecemeal
fashion. These macros will allow us to test fine-grained locks to a degree
before removing Giant, and also after, and to remove Giant in a piecemeal
fashion via sysctl's on those subsystems which the authors believe can
operate without Giant.


# 76827 18-May-2001 alfred

Introduce a global lock for the vm subsystem (vm_mtx).

vm_mtx does not recurse and is required for most low level
vm operations.

faults can not be taken without holding Giant.

Memory subsystems can now call the base page allocators safely.

Almost all atomic ops were removed as they are covered under the
vm mutex.

Alpha and ia64 now need to catch up to i386's trap handlers.

FFS and NFS have been tested, other filesystems will need minor
changes (grabbing the vm lock when twiddling page properties).

Reviewed (partially) by: jake, jhb


# 76141 29-Apr-2001 jake

Make rtprio work again.
- add a missing break which caused RTP_SET to always return EINVAL
- break instead of returning if p_can fails so proc_lock is always
dropped correctly
- only copyin data that is actually needed
- use break instead of goto
- make rtp_to_pri return EINVAL instead of -1 if the values are out
or range so we don't have to translate


# 75893 23-Apr-2001 jhb

Change the pfind() and zpfind() functions to lock the process that they
find before releasing the allproc lock and returning.

Reviewed by: -smp, dfr, jake


# 75450 12-Apr-2001 rwatson

o Limit process information leakage by introducing a p_can(...P_CAN_SEE...)
in rtprio()'s RTP_LOOKIP implementation.

Obtained from: TrustedBSD Project


# 74927 28-Mar-2001 jhb

Convert the allproc and proctree locks from lockmgr locks to sx locks.


# 74914 28-Mar-2001 jhb

Catch up to header include changes:
- <sys/mutex.h> now requires <sys/systm.h>
- <sys/mutex.h> and <sys/sx.h> now require <sys/lock.h>


# 74043 09-Mar-2001 alfred

Don't call malloc with M_WAITOK while holding a mutex.


# 72920 22-Feb-2001 tegge

Backout previous commit. sched_lock is held, thus interrupts are prevented
here.

Submitted by: jhb


# 72919 22-Feb-2001 tegge

Protect update of the per processor switchtime variable against
interrupts.

Protect usage of the per processor switchtime variable against
interrupts in calcru().

This seem to eliminate the "microuptime() went backwards" warnings.


# 72780 20-Feb-2001 tegge

Ensure that RLIMIT_NPROC limits are at least 1 to avoid bad interaction
with chgproccnt. MFC candiate.

Reviewed by: alfred


# 72376 11-Feb-2001 jake

Implement a unified run queue and adjust priority levels accordingly.

- All processes go into the same array of queues, with different
scheduling classes using different portions of the array. This
allows user processes to have their priorities propogated up into
interrupt thread range if need be.
- I chose 64 run queues as an arbitrary number that is greater than
32. We used to have 4 separate arrays of 32 queues each, so this
may not be optimal. The new run queue code was written with this
in mind; changing the number of run queues only requires changing
constants in runq.h and adjusting the priority levels.
- The new run queue code takes the run queue as a parameter. This
is intended to be used to create per-cpu run queues. Implement
wrappers for compatibility with the old interface which pass in
the global run queue structure.
- Group the priority level, user priority, native priority (before
propogation) and the scheduling class into a struct priority.
- Change any hard coded priority levels that I found to use
symbolic constants (TTIPRI and TTOPRI).
- Remove the curpriority global variable and use that of curproc.
This was used to detect when a process' priority had lowered and
it should yield. We now effectively yield on every interrupt.
- Activate propogate_priority(). It should now have the desired
effect without needing to also propogate the scheduling class.
- Temporarily comment out the call to vm_page_zero_idle() in the
idle loop. It interfered with propogate_priority() because
the idle process needed to do a non-blocking acquire of Giant
and then other processes would try to propogate their priority
onto it. The idle process should not do anything except idle.
vm_page_zero_idle() will return in the form of an idle priority
kernel thread which is woken up at apprioriate times by the vm
system.
- Update struct kinfo_proc to the new priority interface. Deliberately
change its size by adjusting the spare fields. It remained the same
size, but the layout has changed, so userland processes that use it
would parse the data incorrectly. The size constraint should really
be changed to an arbitrary version number. Also add a debug.sizeof
sysctl node for struct kinfo_proc.


# 72200 09-Feb-2001 bmilekic

Change and clean the mutex lock interface.

mtx_enter(lock, type) becomes:

mtx_lock(lock) for sleep locks (MTX_DEF-initialized locks)
mtx_lock_spin(lock) for spin locks (MTX_SPIN-initialized)

similarily, for releasing a lock, we now have:

mtx_unlock(lock) for MTX_DEF and mtx_unlock_spin(lock) for MTX_SPIN.
We change the caller interface for the two different types of locks
because the semantics are entirely different for each case, and this
makes it explicitly clear and, at the same time, it rids us of the
extra `type' argument.

The enter->lock and exit->unlock change has been made with the idea
that we're "locking data" and not "entering locked code" in mind.

Further, remove all additional "flags" previously passed to the
lock acquire/release routines with the exception of two:

MTX_QUIET and MTX_NOSWITCH

The functionality of these flags is preserved and they can be passed
to the lock/unlock routines by calling the corresponding wrappers:

mtx_{lock, unlock}_flags(lock, flag(s)) and
mtx_{lock, unlock}_spin_flags(lock, flag(s)) for MTX_DEF and MTX_SPIN
locks, respectively.

Re-inline some lock acq/rel code; in the sleep lock case, we only
inline the _obtain_lock()s in order to ensure that the inlined code
fits into a cache line. In the spin lock case, we inline recursion and
actually only perform a function call if we need to spin. This change
has been made with the idea that we generally tend to avoid spin locks
and that also the spin locks that we do have and are heavily used
(i.e. sched_lock) do recurse, and therefore in an effort to reduce
function call overhead for some architectures (such as alpha), we
inline recursion for this case.

Create a new malloc type for the witness code and retire from using
the M_DEV type. The new type is called M_WITNESS and is only declared
if WITNESS is enabled.

Begin cleaning up some machdep/mutex.h code - specifically updated the
"optimized" inlined code in alpha/mutex.h and wrote MTX_LOCK_SPIN
and MTX_UNLOCK_SPIN asm macros for the i386/mutex.h as we presently
need those.

Finally, caught up to the interface changes in all sys code.

Contributors: jake, jhb, jasone (in no particular order)


# 71562 24-Jan-2001 jhb

- Add a mtx_assert() for sched_lock in calcru().
- Protect calcru() with sched_lock later on in the file when it is called.


# 70861 10-Jan-2001 jake

Use PCPU_GET, PCPU_PTR and PCPU_SET to access all per-cpu variables
other then curproc.


# 69947 12-Dec-2000 jake

- Change the allproc_lock to use a macro, ALLPROC_LOCK(how), instead
of explicit calls to lockmgr. Also provides macros for the flags
pased to specify shared, exclusive or release which map to the
lockmgr flags. This is so that the use of lockmgr can be easily
replaced with optimized reader-writer locks.
- Add some locking that I missed the first time.


# 69444 01-Dec-2000 alfred

Translate alfred to english.

Submitted by: bde


# 69403 30-Nov-2000 alfred

use a oppurtunistic locking strategy with the uidinfo structures to avoid
locking the global hash on each uifree()

make struct uidinfo only visible to the kernel

make uihold() a function rather than a macro to reduce bloat

swap the order of a spl/mutex to maintain consistancy


# 69206 26-Nov-2000 alfred

Make uidinfo subsystem mpsafe
use a mutex lock when looking up/deleting entries on the hashlist
use a mutex lock on each uidinfo when updating fields

make uifree() a void function rather than 'int' since no one cares

allocate uidinfo structs with the M_ZERO flag and don't explicitly initialize
them

Assisted by: eivind, jhb, jakeb


# 69022 22-Nov-2000 jake

Protect the following with a lockmgr lock:

allproc
zombproc
pidhashtbl
proc.p_list
proc.p_hash
nextpid

Reviewed by: jhb
Obtained from: BSD/OS and netbsd


# 66035 18-Sep-2000 ps

Add new line character to debugging printf's.


# 65557 06-Sep-2000 jasone

Major update to the way synchronization is done in the kernel. Highlights
include:

* Mutual exclusion is used instead of spl*(). See mutex(9). (Note: The
alpha port is still in transition and currently uses both.)

* Per-CPU idle processes.

* Interrupts are run in their own separate kernel threads and can be
preempted (i386 only).

Partially contributed by: BSDi (BSD/OS)
Submissions by (at least): cp, dfr, dillon, grog, jake, jhb, sheldonh


# 65536 06-Sep-2000 truckman

Change the calls to panic() in uifree(), chgproccnt(), and chgsbsize()
to printf(). Any errors detected are not likely to be fatal, so it
should be safe to let things keep running.


# 65495 05-Sep-2000 truckman

Remove uidinfo hash table lookup and maintenance out of chgproccnt() and
chgsbsize(), which are called rather frequently and may be called from an
interrupt context in the case of chgsbsize(). Instead, do the hash table
lookup and maintenance when credentials are changed, which is a lot less
frequent. Add pointers to the uidinfo structures to the ucred and pcred
structures for fast access. Pass a pointer to the credential to chgproccnt()
and chgsbsize() instead of passing the uid. Add a reference count to the
uidinfo structure and use it to decide when to free the structure rather
than freeing the structure when the resource consumption drops to zero.
Move the resource tracking code from kern_proc.c to kern_resource.c. Move
some duplicate code sequences in kern_prot.c to separate helper functions.
Change KASSERTs in this code to unconditional tests and calls to panic().


# 65237 30-Aug-2000 rwatson

o Centralize inter-process access control, introducing:

int p_can(p1, p2, operation, privused)

which allows specification of subject process, object process,
inter-process operation, and an optional call-by-reference privused
flag, allowing the caller to determine if privilege was required
for the call to succeed. This allows jail, kern.ps_showallprocs and
regular credential-based interaction checks to occur in one block of
code. Possible operations are P_CAN_SEE, P_CAN_SCHED, P_CAN_KILL,
and P_CAN_DEBUG. p_can currently breaks out as a wrapper to a
series of static function checks in kern_prot, which should not
be invoked directly.

o Commented out capabilities entries are included for some checks.

o Update most inter-process authorization to make use of p_can() instead
of manual checks, PRISON_CHECK(), P_TRESPASS(), and
kern.ps_showallprocs.

o Modify suser{,_xxx} to use const arguments, as it no longer modifies
process flags due to the disabling of ASU.

o Modify some checks/errors in procfs so that ENOENT is returned instead
of ESRCH, further improving concealment of processes that should not
be visible to other processes. Also introduce new access checks to
improve hiding of processes for procfs_lookup(), procfs_getattr(),
procfs_readdir(). Correct a bug reported by bp concerning not
handling the CREATE case in procfs_lookup(). Remove volatile flag in
procfs that caused apparently spurious qualifier warnigns (approved by
bde).

o Add comment noting that ktrace() has not been updated, as its access
control checks are different from ptrace(), whereas they should
probably be the same. Further discussion should happen on this topic.

Reviewed by: bde, green, phk, freebsd-security, others
Approved by: bde
Obtained from: TrustedBSD Project


# 65038 24-Aug-2000 green

Revert the suser -> suser_xxx change made previously. It was right
before.


# 64736 16-Aug-2000 green

Fix a couple cases where p_trespass wasn't transitioned into place.

Make RTP_SET (rtprio) only accessible to real root, not root in jails.


# 61507 10-Jun-2000 phk

fix a typo


# 61235 04-Jun-2000 rwatson

o Modify jail to limit creation of sockets to UNIX domain sockets,
TCP/IP (v4) sockets, and routing sockets. Previously, interaction
with IPv6 was not well-defined, and might be inappropriate for some
environments. Similarly, sysctl MIB entries providing interface
information also give out only addresses from those protocol domains.

For the time being, this functionality is enabled by default, and
toggleable using the sysctl variable jail.socket_unixiproute_only.
In the future, protocol domains will be able to determine whether or
not they are ``jail aware''.

o Further limitations on process use of getpriority() and setpriority()
by jailed processes. Addresses problem described in kern/17878.

Reviewed by: phk, jmg


# 57228 15-Feb-2000 phk

Don't try to account for the partial quantum unless the process is
curproc. This only makes any difference on SMP, where we used a
(potentially very bogus) switchtime from our own CPU to calculate
resource usage on another CPU.

This should remove some if not all calcru() related warnings on SMP.

Approved by: jkh


# 56766 28-Jan-2000 green

Fix a bug that could crash the system if you press ^T while a slower
system is slowed down and in the right spot (a race condition in fork()).

The "previous time" fields have moved from pstat to proc. Anything which
uses KVM needs to be recompiled with a new libkvm/headers.

A couple wacky u_quad_t's in struct proc are now u_int64_t (the same, but
according to lack of 'quad's in proc.h and usage in kern_resource.c).
This will have no effect on code.

This has been make-world-and-installed-new-kernel-which-works-fine-tested.

Reviewed by: bde (previous version)


# 53880 29-Nov-1999 phk

Add a bit of sanity checking and problem avoidance in case the
timecounter hardware is bogus.

This will produce a new warning "microuptime() went backwards"
and try to not screw up the process resource accounting.


# 53212 16-Nov-1999 phk

This is a partial commit of the patch from PR 14914:

Alot of the code in sys/kern directly accesses the *Q_HEAD and *Q_ENTRY
structures for list operations. This patch makes all list operations
in sys/kern use the queue(3) macros, rather than directly accessing the
*Q_{HEAD,ENTRY} structures.

This batch of changes compile to the same object files.

Reviewed by: phk
Submitted by: Jake Burkholder <jake@checker.org>
PR: 14914


# 52635 29-Oct-1999 phk

useracc() the prequel:

Merge the contents (less some trivial bordering the silly comments)
of <vm/vm_prot.h> and <vm/vm_inherit.h> into <vm/vm.h>. This puts
the #defines for the vm_inherit_t and vm_prot_t types next to their
typedefs.

This paves the road for the commit to follow shortly: change
useracc() to use VM_PROT_{READ|WRITE} rather than B_{READ|WRITE}
as argument.


# 52128 11-Oct-1999 peter

Trim unused options (or #ifdef for undoc options).

Submitted by: phk


# 50477 27-Aug-1999 peter

$Id$ -> $FreeBSD$


# 46155 28-Apr-1999 phk

This Implements the mumbled about "Jail" feature.

This is a seriously beefed up chroot kind of thing. The process
is jailed along the same lines as a chroot does it, but with
additional tough restrictions imposed on what the superuser can do.

For all I know, it is safe to hand over the root bit inside a
prison to the customer living in that prison, this is what
it was developed for in fact: "real virtual servers".

Each prison has an ip number associated with it, which all IP
communications will be coerced to use and each prison has its own
hostname.

Needless to say, you need more RAM this way, but the advantage is
that each customer can run their own particular version of apache
and not stomp on the toes of their neighbors.

It generally does what one would expect, but setting up a jail
still takes a little knowledge.

A few notes:

I have no scripts for setting up a jail, don't ask me for them.

The IP number should be an alias on one of the interfaces.

mount a /proc in each jail, it will make ps more useable.

/proc/<pid>/status tells the hostname of the prison for
jailed processes.

Quotas are only sensible if you have a mountpoint per prison.

There are no privisions for stopping resource-hogging.

Some "#ifdef INET" and similar may be missing (send patches!)

If somebody wants to take it from here and develop it into
more of a "virtual machine" they should be most welcome!

Tools, comments, patches & documentation most welcome.

Have fun...

Sponsored by: http://www.rndassociates.com/
Run for almost a year by: http://www.servetheweb.com/


# 46116 27-Apr-1999 phk

Change suser_xxx() to suser() where it applies.


# 46112 27-Apr-1999 phk

Suser() simplification:

1:
s/suser/suser_xxx/

2:
Add new function: suser(struct proc *), prototyped in <sys/proc.h>.

3:
s/suser_xxx(\([a-zA-Z0-9_]*\)->p_ucred, \&\1->p_acflag)/suser(\1)/

The remaining suser_xxx() calls will be scrutinized and dealt with
later.

There may be some unneeded #include <sys/cred.h>, but they are left
as an exercise for Bruce.

More changes to the suser() API will come along with the "jail" code.


# 44725 13-Mar-1999 bde

Enforce monotonicity of apparent process user, system and interrupt times.

PR: 975, 10402


# 44673 11-Mar-1999 bde

Fixed runtime accounting. The time since the previous context switch
was discarded on every call to calcru(). Hacking on the `switchtime'
global for a related fix in rev.1.38 of kern_resource.c was too fragile
and broke when p_switchtime went away.

PR: 10402


# 44487 05-Mar-1999 bde

The magic "no-cpu" cpu number is 0xff. Don't misrepresent cpu
numbers as chars or use bogus casts in an attempt to unmisrepresnt
them. In top, don't assume that 0xff is the only negative cpu
number when cpu numbers are (mis)represented.


# 44327 28-Feb-1999 bde

Removed all traces of `p_switchtime'. The relevant timestamp is per-cpu,
not per-process. Keep it in `switchtime' consistently.

It is now clear that the timestamp is always valid in fork_trampoline()
except when the child is running on a previously idle cpu, which
can only happen if there are multiple cpus, so don't check or set
the timestamp in fork_trampoline except in the (i386) SMP case.
Just remove the alpha code for setting it unconditionally, since
there is no SMP case for alpha and the code had rotted.

Parts reviewed by: dfr, phk


# 44256 25-Feb-1999 bde

Don't forget to update `switchticks' in corner cases (except for
the alpha fork_trampoline(), forget it because it I believe it is
only necessary for the unsupported SMP case).


# 43448 31-Jan-1999 newton

Added comments about non-staticization so it doesn't get un-done next
time someone goes on a staticization binge.

Suggested by: eivind


# 43408 30-Jan-1999 newton

Unstaticized routines which are needed by the svr4 KLD and the streams
garbage needed to support SysVR4 networking.


# 37894 27-Jul-1998 bde

Fixed double counting of runtime after a process exits. The last
timeslice of the exiting process was counted for both the exiting
process and the next process to run if the next process runs
immediately.

Broken in: mostly in kern_clock.c rev.1.70 (1998/05/28)


# 36441 28-May-1998 phk

Some cleanups related to timecounters and weird ifdefs in <sys/time.h>.

Clean up (or if antipodic: down) some of the msgbuf stuff.

Use an inline function rather than a macro for timecounter delta.

Maintain process "on-cpu" time as 64 bits of microseconds to avoid
needless second rollover overhead.

Avoid calling microuptime the second time in mi_switch() if we do
not pass through _idle in cpu_switch()

This should reduce our context-switch overhead a bit, in particular
on pre-P5 and SMP systems.

WARNING: Programs which muck about with struct proc in userland
will have to be fixed.

Reviewed, but found imperfect by: bde


# 36119 17-May-1998 phk

s/nanoruntime/nanouptime/g
s/microruntime/microuptime/g

Reviewed by: bde


# 35036 05-Apr-1998 peter

Fix previous commit. Don't people read compiler messages or something??


# 35029 04-Apr-1998 phk

Time changes mark 2:

* Figure out UTC relative to boottime. Four new functions provide
time relative to boottime.

* move "runtime" into struct proc. This helps fix the calcru()
problem in SMP.

* kill mono_time.

* add timespec{add|sub|cmp} macros to time.h. (XXX: These may change!)

* nanosleep, select & poll takes long sleeps one day at a time

Reviewed by: bde
Tested by: ache and others


# 34029 04-Mar-1998 dufault

Reviewed by: msmith, bde long ago
Fix for RTPRIO scheduler to eliminate invalid context switches.

POSIX.4 headers and sysctl variables. Nothing should change
unless POSIX4 is defined or _POSIX_VERSION is set to 199309.


# 33181 09-Feb-1998 eivind

Staticize.


# 33102 04-Feb-1998 dg

Restrict idleprio to superuser:

Realtime priority has to be restricted for reasons which should be
obvious. However, for idle priority, there is a potential for
system deadlock if an idleprio process gains a lock on a resource
that other processes need (and the idleprio process can't run
due to a CPU-bound normal process). Fix me! XXX
PR: 5639


# 32618 19-Jan-1998 bde

Set p_retval for the correct process in getpriority(). This fixes
a null pointer panic when the pointer for the incorrect process is
NULL. getpriority() was broken in rev.1.27. Rev.1.28 broke the
warning instead of fixing the problem.

PR: 5495


# 31778 16-Dec-1997 eivind

Make COMPAT_43 and COMPAT_SUNOS new-style options.


# 31016 07-Nov-1997 phk

Remove a bunch of variables which were unused both in GENERIC and LINT.

Found by: -Wunused


# 30994 06-Nov-1997 phk

Move the "retval" (3rd) parameter from all syscall functions and put
it in struct proc instead.

This fixes a boatload of compiler warning, and removes a lot of cruft
from the sources.

I have not removed the /*ARGSUSED*/, they will require some looking at.

libkvm, ps and other userland struct proc frobbing programs will need
recompiled.


# 28768 25-Aug-1997 bde

Print more info in the "calcru: negative time" message.


# 25164 26-Apr-1997 peter

Man the liferafts! Here comes the long awaited SMP -> -current merge!

There are various options documented in i386/conf/LINT, there is more to
come over the next few days.

The kernel should run pretty much "as before" without the options to
activate SMP mode.

There are a handful of known "loose ends" that need to be fixed, but
have been put off since the SMP kernel is in a moderately good condition
at the moment.

This commit is the result of the tinkering and testing over the last 14
months by many people. A special thanks to Steve Passe for implementing
the APIC code!


# 22975 22-Feb-1997 peter

Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not
ready for it yet.


# 22521 10-Feb-1997 dyson

This is the kernel Lite/2 commit. There are some requisite userland
changes, so don't expect to be able to run the kernel as-is (very well)
without the appropriate Lite/2 userland changes.

The system boots and can mount UFS filesystems.

Untested: ext2fs, msdosfs, NFS
Known problems: Incorrect Berkeley ID strings in some files.
Mount_std mounts will not work until the getfsent
library routine is changed.

Reviewed by: various people
Submitted by: Jeffery Hsu <hsu@freebsd.org>


# 21673 14-Jan-1997 jkh

Make the long-awaited change from $Id$ to $FreeBSD$

This will make a number of things easier in the future, as well as (finally!)
avoiding the Id-smashing problem which has plagued developers for so long.

Boy, I'm glad we're not using sup anymore. This update would have been
insane otherwise.


# 20821 22-Dec-1996 joerg

Make DFLDSIZ and MAXDSIZ fully-supported options.

"Don't forget to do a ``make depend''" :-)


# 16222 08-Jun-1996 bde

Fixed accumulation of run time for processes that don't accumulate
any statclock ticks. Pretend that all the time up to the first
statclock tick is system time. . This makes a difference mainly for
benchmarks that test short-lived processes - the user and system
times for processes that each lived for about 1ms only added up to
about 10% of the real time even when there was very little interrupt
activity.

Break the printing of a quad_t variable correctly.


# 14528 11-Mar-1996 hsu

From Lite2: proc LIST changes
stylistic changes to function prototypes
Reviewed by: david & bde


# 13467 16-Jan-1996 phk

Fix a printf, well, actually break it, that is...
We don't have the ability to print 64bit things yet...


# 12662 07-Dec-1995 dg

Untangled the vm.h include file spaghetti.


# 12221 12-Nov-1995 bde

Included <sys/sysproto.h> to get central declarations for syscall args
structs and prototypes for syscalls.

Ifdefed duplicated decentralized declarations of args structs. It's
convenient to have this visible but they are hard to maintain. Some
are already different from the central declarations. 4.4lite2 puts
them in comments in the function headers but I wanted to avoid the
large changes for that.


# 12201 10-Nov-1995 bde

Fixed types of rtprio(), osetrlimit() and setrlimit(). The args struct
tag and/or member names conflicted with the machine generated ones in
<sys/sysproto.h>.


# 11731 23-Oct-1995 bde

Fix a sign extension bug that was unleashed by the previous change.
The total process time was sometimes 2^32 usec too large but that
wasn't a problem before because the time was bogusly truncated mod
2^32.


# 11610 21-Oct-1995 bde

Avoid overflow in calcru(). Fixes PR 788.

Submitted by: imdave@synet.net (Dave Bodenstab)


# 8876 30-May-1995 rgrimes

Remove trailing whitespace.


# 6577 20-Feb-1995 guido

Implement maxprocperuid and maxfilesperproc. They are tunable
via sysctl(8). The initial value of maxprocperuid is maxproc-1,
that of maxfilesperproc is maxfiles (untill maxfile will disappear)

Now it is at least possible to prohibit one user opening maxfiles

-Guido

Submitted by:
Obtained from:


# 5009 06-Dec-1994 bde

Don't allow negative limits at all. Convert them to RLIM_INFINITY instead
of returning EINVAL since something may depend on them being broken.
Allowing negative limits caused bugs almost everywhere. The recent
fixes for MAXSSIZ checked the limits too late to stop anyone defeating
limits set by root...


# 4908 02-Dec-1994 ats

Add one forgotten u_quad_t typecast in dosetrlimit.


# 4889 01-Dec-1994 ats

The values for setrlimit in the data size and stack size case are
used as an address value. Then all comparisons should be done unsigned
and not signed. Fix it with a typecast of u_quad_t.
Error can be demonstrated with the current bash in port, do a
ulimit -s unlimited and the machine hangs. bash delivers through
an internal error a large negative value for the stacksize, the
comparison saw this smaller than MAXSSIZ and then tried to expand
the stack to this size.


# 3485 09-Oct-1994 phk

Cosmetics. related to getting prototypes into view.


# 3291 02-Oct-1994 dg

"idle priority" support. Based on code from Henrik Vestergaard Draboel,
but substantially rewritten by me.


# 3098 25-Sep-1994 phk

While in the real world, I had a bad case of being swapped out for a lot of
cycles. While waiting there I added a lot of the extra ()'s I have, (I have
never used LISP to any extent). So I compiled the kernel with -Wall and
shut up a lot of "suggest you add ()'s", removed a bunch of unused var's
and added a couple of declarations here and there. Having a lap-top is
highly recommended. My kernel still runs, yell at me if you kernel breaks.


# 2441 01-Sep-1994 dg

Realtime priority scheduling support.

Submitted by: Henrik Vestergaard Draboel


# 1817 02-Aug-1994 dg

Added $Id$


# 1549 25-May-1994 rgrimes

The big 4.4BSD Lite to FreeBSD 2.0.0 (Development) patch.

Reviewed by: Rodney W. Grimes
Submitted by: John Dyson and David Greenman


# 1542 24-May-1994 rgrimes

This commit was generated by cvs2svn to compensate for changes in r1541,
which included commits to RCS files with non-trunk default branches.


# 1541 24-May-1994 rgrimes

BSD 4.4 Lite Kernel Sources