#
256281 |
|
10-Oct-2013 |
gjb |
Copy head (r256279) to stable/10 as part of the 10.0-RELEASE cycle.
Approved by: re (implicit) Sponsored by: The FreeBSD Foundation |
#
254649 |
|
22-Aug-2013 |
kib |
Remove the deprecated VM_ALLOC_RETRY flag for the vm_page_grab(9). The flag was mandatory since r209792, where vm_page_grab(9) was changed to only support the alloc retry semantic.
Suggested and reviewed by: alc Sponsored by: The FreeBSD Foundation
|
#
254025 |
|
07-Aug-2013 |
jeff |
Replace kernel virtual address space allocation with vmem. This provides transparent layering and better fragmentation.
- Normalize functions that allocate memory to use kmem_* - Those that allocate address space are named kva_* - Those that operate on maps are named kmap_* - Implement recursive allocation handling for kmem_arena in vmem.
Reviewed by: alc Tested by: pho Sponsored by: EMC / Isilon Storage Division
|
#
248084 |
|
09-Mar-2013 |
attilio |
Switch the vm_object mutex to be a rwlock. This will enable in the future further optimizations where the vm_object lock will be held in read mode most of the time the page cache resident pool of pages are accessed for reading purposes.
The change is mostly mechanical but few notes are reported: * The KPI changes as follow: - VM_OBJECT_LOCK() -> VM_OBJECT_WLOCK() - VM_OBJECT_TRYLOCK() -> VM_OBJECT_TRYWLOCK() - VM_OBJECT_UNLOCK() -> VM_OBJECT_WUNLOCK() - VM_OBJECT_LOCK_ASSERT(MA_OWNED) -> VM_OBJECT_ASSERT_WLOCKED() (in order to avoid visibility of implementation details) - The read-mode operations are added: VM_OBJECT_RLOCK(), VM_OBJECT_TRYRLOCK(), VM_OBJECT_RUNLOCK(), VM_OBJECT_ASSERT_RLOCKED(), VM_OBJECT_ASSERT_LOCKED() * The vm/vm_pager.h namespace pollution avoidance (forcing requiring sys/mutex.h in consumers directly to cater its inlining functions using VM_OBJECT_LOCK()) imposes that all the vm/vm_pager.h consumers now must include also sys/rwlock.h. * zfs requires a quite convoluted fix to include FreeBSD rwlocks into the compat layer because the name clash between FreeBSD and solaris versions must be avoided. At this purpose zfs redefines the vm_object locking functions directly, isolating the FreeBSD components in specific compat stubs.
The KPI results heavilly broken by this commit. Thirdy part ports must be updated accordingly (I can think off-hand of VirtualBox, for example).
Sponsored by: EMC / Isilon storage division Reviewed by: jeff Reviewed by: pjd (ZFS specific review) Discussed with: alc Tested by: pho
|
#
237477 |
|
23-Jun-2012 |
kib |
Move the code dealing with shared page into a dedicated kern_sharedpage.c source file from kern_exec.c.
MFC after: 29 days
|
#
237474 |
|
23-Jun-2012 |
kib |
Stop updating the struct vdso_timehands from even handler executed in the scheduled task from tc_windup(). Do it directly from tc_windup in interrupt context [1].
Establish the permanent mapping of the shared page into the kernel address space, avoiding the potential need to sleep waiting for allocation of sf buffer during vdso_timehands update. As a consequence, shared_page_write_start() and shared_page_write_end() functions are not needed anymore.
Guess and memorize the pointers to native host and compat32 sysentvec during initialization, to avoid the need to get shared_page_alloc_sx lock during the update.
In tc_fill_vdso_timehands(), do not loop waiting for timehands generation to stabilize, since vdso_timehands is written in the same interrupt context which wrote timehands.
Requested by: mav [1] MFC after: 29 days
|
#
237433 |
|
22-Jun-2012 |
kib |
Implement mechanism to export some kernel timekeeping data to usermode, using shared page. The structures and functions have vdso prefix, to indicate the intended location of the code in some future.
The versioned per-algorithm data is exported in the format of struct vdso_timehands, which mostly repeats the content of in-kernel struct timehands. Usermode reading of the structure can be lockless. Compatibility export for 32bit processes on 64bit host is also provided. Kernel also provides usermode with indication about currently used timecounter, so that libc can fall back to syscall if configured timecounter is unknown to usermode code.
The shared data updates are initiated both from the tc_windup(), where a fast task is queued to do the update, and from sysctl handlers which change timecounter. A manual override switch kern.timecounter.fast_gettime allows to turn off the mechanism.
Only x86 architectures export the real algorithm data, and there, only for tsc timecounter. HPET counters page could be exported as well, but I prefer to not further glue the kernel and libc ABI there until proper vdso-based solution is developed.
Minimal stubs neccessary for non-x86 architectures to still compile are provided.
Discussed with: bde Reviewed by: jhb Tested by: flo MFC after: 1 month
|
#
237431 |
|
22-Jun-2012 |
kib |
Enchance the shared page chunk allocator.
Do not rely on the busy state of the page from which we allocate the chunk, to protect allocator state. Use statically allocated sx lock instead.
Provide more flexible KPI. In particular, allow to allocate chunk without providing initial data, and allow writes into existing allocation. Allow to get an sf buf which temporary maps the chunk, to allow sequential updates to shared page content without unmapping in between.
Reviewed by: jhb Tested by: flo MFC after: 1 month
|
#
232700 |
|
08-Mar-2012 |
jhb |
Add a new sched_clear_name() method to the scheduler interface to clear the cached name used for KTR_SCHED traces when a thread's name changes. This way KTR_SCHED traces (and thus schedgraph) will notice when a thread's name changes, most commonly via execve().
MFC after: 2 weeks
|
#
230341 |
|
19-Jan-2012 |
kib |
Use shared lock for the executable vnode in the exec path after the VV_TEXT changes are handled. Assert that vnode is exclusively locked at the places that modify VV_TEXT.
Discussed with: alc MFC after: 3 weeks
|
#
225791 |
|
27-Sep-2011 |
kib |
Do not deliver SIGTRAP on exec as the normal signal, use ptracestop() on syscall exit path. Otherwise, if SIGTRAP is ignored, that tdsendsignal() do not want to deliver the signal, and debugger never get a notification of exec.
Found and tested by: Anton Yuzhaninov <citrin citrin ru> Discussed with: jhb MFC after: 2 weeks
|
#
225617 |
|
16-Sep-2011 |
kmacy |
In order to maximize the re-usability of kernel code in user space this patch modifies makesyscalls.sh to prefix all of the non-compatibility calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel entry points and all places in the code that use them. It also fixes an additional name space collision between the kernel function psignal and the libc function of the same name by renaming the kernel psignal kern_psignal(). By introducing this change now we will ease future MFCs that change syscalls.
Reviewed by: rwatson Approved by: re (bz)
|
#
224778 |
|
11-Aug-2011 |
rwatson |
Second-to-last commit implementing Capsicum capabilities in the FreeBSD kernel for FreeBSD 9.0:
Add a new capability mask argument to fget(9) and friends, allowing system call code to declare what capabilities are required when an integer file descriptor is converted into an in-kernel struct file *. With options CAPABILITIES compiled into the kernel, this enforces capability protection; without, this change is effectively a no-op.
Some cases require special handling, such as mmap(2), which must preserve information about the maximum rights at the time of mapping in the memory map so that they can later be enforced in mprotect(2) -- this is done by narrowing the rights in the existing max_protection field used for similar purposes with file permissions.
In namei(9), we assert that the code is not reached from within capability mode, as we're not yet ready to enforce namespace capabilities there. This will follow in a later commit.
Update two capability names: CAP_EVENT and CAP_KEVENT become CAP_POST_KEVENT and CAP_POLL_KEVENT to more accurately indicate what they represent.
Approved by: re (bz) Submitted by: jonathan Sponsored by: Google Inc
|
#
224159 |
|
17-Jul-2011 |
rwatson |
Define two new sysctl node flags: CTLFLAG_CAPRD and CTLFLAG_CAPRW, which may be jointly referenced via the mask CTLFLAG_CAPRW. Sysctls with these flags are available in Capsicum's capability mode; other sysctl nodes are not.
Flag several useful sysctls as available in capability mode, such as memory layout sysctls required by the run-time linker and malloc(3). Also expose access to randomness and available kernel features.
A few sysctls are enabled to support name->MIB conversion; these may leak information to capability mode by virtue of providing resolution on names not flagged for access in capability mode. This is, generally, not a huge problem, but might be something to resolve in the future. Flag these cases with XXX comments.
Submitted by: jonathan Sponsored by: Google, Inc.
|
#
223692 |
|
30-Jun-2011 |
jonathan |
Add some checks to ensure that Capsicum is behaving correctly, and add some more explicit comments about what's going on and what future maintainers need to do when e.g. adding a new operation to a sys_machdep.c.
Approved by: mentor(rwatson), re(bz)
|
#
219042 |
|
25-Feb-2011 |
dchagin |
Introduce preliminary support of the show description of the ABI of traced process by adding two new events which records value of process sv_flags to the trace file at process creation/execing/exiting time.
MFC after: 1 Month.
|
#
217151 |
|
08-Jan-2011 |
kib |
Create shared (readonly) page. Each ABI may specify the use of page by setting SV_SHP flag and providing pointer to the vm object and mapping address. Provide simple allocator to carve space in the page, tailored to put the code with alignment restrictions.
Enable shared page use for amd64, both native and 32bit FreeBSD binaries. Page is private mapped at the top of the user address space, moving a start of the stack one page down. Move signal trampoline code from the top of the stack to the shared page.
Reviewed by: alc
|
#
214158 |
|
21-Oct-2010 |
jhb |
- When disabling ktracing on a process, free any pending requests that may be left. This fixes a memory leak that can occur when tracing is disabled on a process via disabling tracing of a specific file (or if an I/O error occurs with the tracefile) if the process's next system call is exit(). The trace disabling code clears p_traceflag, so exit1() doesn't do any KTRACE-related cleanup leading to the leak. I chose to make the free'ing of pending records synchronous rather than patching exit1(). - Move KTRACE-specific logic out of kern_(exec|exit|fork).c and into kern_ktrace.c instead. Make ktrace_mtx private to kern_ktrace.c as a result.
MFC after: 1 month
|
#
212002 |
|
30-Aug-2010 |
jh |
execve(2) has a special check for file permissions: a file must have at least one execute bit set, otherwise execve(2) will return EACCES even for an user with PRIV_VFS_EXEC privilege.
Add the check also to vaccess(9), vaccess_acl_nfs4(9) and vaccess_acl_posix1e(9). This makes access(2) to better agree with execve(2). Because ZFS doesn't use vaccess(9) for VEXEC, add the check to zfs_freebsd_access() too. There may be other file systems which are not using vaccess*() functions and need to be handled separately.
PR: kern/125009 Reviewed by: bde, trasz Approved by: pjd (ZFS part)
|
#
211616 |
|
22-Aug-2010 |
rpaulo |
Add an extra comment to the SDT probes definition. This allows us to get use '-' in probe names, matching the probe names in Solaris.[1]
Add userland SDT probes definitions to sys/sdt.h.
Sponsored by: The FreeBSD Foundation Discussed with: rwaston [1]
|
#
211412 |
|
17-Aug-2010 |
kib |
Supply some useful information to the started image using ELF aux vectors. In particular, provide pagesize and pagesizes array, the canary value for SSP use, number of host CPUs and osreldate.
Tested by: marius (sparc64) MFC after: 1 month
|
#
210555 |
|
28-Jul-2010 |
alc |
The interpreter name should no longer be treated as a buffer that can be overwritten. (This change should have been included in r210545.)
Submitted by: kib
|
#
210545 |
|
27-Jul-2010 |
alc |
Introduce exec_alloc_args(). The objective being to encapsulate the details of the string buffer allocation in one place.
Eliminate the portion of the string buffer that was dedicated to storing the interpreter name. The pointer to the interpreter name can simply be made to point to the appropriate argument string.
Reviewed by: kib
|
#
210475 |
|
25-Jul-2010 |
alc |
Change the order in which the file name, arguments, environment, and shell command are stored in exec*()'s demand-paged string buffer. For a "buildworld" on an 8GB amd64 multiprocessor, the new order reduces the number of global TLB shootdowns by 31%. It also eliminates about 330k page faults on the kernel address space.
Change exec_shell_imgact() to use "args->begin_argv" consistently as the start of the argument and environment strings. Previously, it would sometimes use "args->buf", which is the start of the overall buffer, but no longer the start of the argument and environment strings. While I'm here, eliminate unnecessary passing of "&length" to copystr(), where we don't actually care about the length of the copied string.
Clean up the initialization of the exec map. In particular, use the correct size for an entry, and express that size in the same way that is used when an entry is allocated. The old size was one page too large. (This discrepancy originated in 2004 when I rewrote exec_map_first_page() to use sf_buf_alloc() instead of the exec map for mapping the first page of the executable.)
Reviewed by: kib
|
#
210429 |
|
23-Jul-2010 |
alc |
Eliminate a little bit of duplicated code.
|
#
209848 |
|
09-Jul-2010 |
jhb |
Accidentally committed an older version of this comment rather than the final one.
|
#
209847 |
|
09-Jul-2010 |
jhb |
Refine a comment.
Reviewed by: bde
|
#
209648 |
|
02-Jul-2010 |
alc |
Use vm_page_next() instead of vm_page_lookup() in exec_map_first_page() because vm_page_next() is faster.
|
#
209592 |
|
29-Jun-2010 |
jhb |
Tweak the in-kernel API for sending signals to threads: - Rename tdsignal() to tdsendsignal() and make it private to kern_sig.c. - Add tdsignal() and tdksignal() routines that mirror psignal() and pksignal() except that they accept a thread as an argument instead of a process. They send a signal to a specific thread rather than to an individual process.
Reviewed by: kib
|
#
208453 |
|
23-May-2010 |
kib |
Reorganize syscall entry and leave handling.
Extend struct sysvec with three new elements: sv_fetch_syscall_args - the method to fetch syscall arguments from usermode into struct syscall_args. The structure is machine-depended (this might be reconsidered after all architectures are converted). sv_set_syscall_retval - the method to set a return value for usermode from the syscall. It is a generalization of cpu_set_syscall_retval(9) to allow ABIs to override the way to set a return value. sv_syscallnames - the table of syscall names.
Use sv_set_syscall_retval in kern_sigsuspend() instead of hardcoding the call to cpu_set_syscall_retval().
The new functions syscallenter(9) and syscallret(9) are provided that use sv_*syscall* pointers and contain the common repeated code from the syscall() implementations for the architecture-specific syscall trap handlers.
Syscallenter() fetches arguments, calls syscall implementation from ABI sysent table, and set up return frame. The end of syscall bookkeeping is done by syscallret().
Take advantage of single place for MI syscall handling code and implement ptrace_lwpinfo pl_flags PL_FLAG_SCE, PL_FLAG_SCX and PL_FLAG_EXEC. The SCE and SCX flags notify the debugger that the thread is stopped at syscall entry or return point respectively. The EXEC flag augments SCX and notifies debugger that the process address space was changed by one of exec(2)-family syscalls.
The i386, amd64, sparc64, sun4v, powerpc and ia64 syscall()s are changed to use syscallenter()/syscallret(). MIPS and arm are not converted and use the mostly unchanged syscall() implementation.
Reviewed by: jhb, marcel, marius, nwhitehorn, stas Tested by: marcel (ia64), marius (sparc64), nwhitehorn (powerpc), stas (mips) MFC after: 1 month
|
#
207728 |
|
06-May-2010 |
alc |
Eliminate page queues locking around most calls to vm_page_free().
|
#
207669 |
|
05-May-2010 |
alc |
Acquire the page lock around all remaining calls to vm_page_free() on managed pages that didn't already have that lock held. (Freeing an unmanaged page, such as the various pmaps use, doesn't require the page lock.)
This allows a change in vm_page_remove()'s locking requirements. It now expects the page lock to be held instead of the page queues lock. Consequently, the page queues lock is no longer required at all by callers to vm_page_rename().
Discussed with: kib
|
#
207410 |
|
29-Apr-2010 |
kmacy |
On Alan's advice, rather than do a wholesale conversion on a single architecture from page queue lock to a hashed array of page locks (based on a patch by Jeff Roberson), I've implemented page lock support in the MI code and have only moved vm_page's hold_count out from under page queue mutex to page lock. This changes pmap_extract_and_hold on all pmaps.
Supported by: Bitgravity Inc.
Discussed with: alc, jeffr, and kib
|
#
205643 |
|
25-Mar-2010 |
nwhitehorn |
Add the ELF relocation base to struct image_params. This will be required to correctly relocate the executable entry point's function descriptor on powerpc64.
|
#
205642 |
|
25-Mar-2010 |
nwhitehorn |
Change the arguments of exec_setregs() so that it receives a pointer to the image_params struct instead of several members of that struct individually. This makes it easier to expand its arguments in the future without touching all platforms.
Reviewed by: jhb
|
#
205574 |
|
24-Mar-2010 |
nwhitehorn |
The nargvstr and nenvstr properties of arginfo are ints, not longs, so should be copied to userspace with suword32() instead of suword(). This alleviates problems on 64-bit big-endian architectures, and is a no-op on all 32-bit architectures.
Tested on: amd64, sparc64, powerpc64
|
#
198411 |
|
23-Oct-2009 |
jhb |
- Fix several off-by-one errors when using MAXCOMLEN. The p_comm[] and td_name[] arrays are actually MAXCOMLEN + 1 in size and a few places that created shadow copies of these arrays were just using MAXCOMLEN. - Prefer using sizeof() of an array type to explicit constants for the array length in a few places. - Ensure that all of p_comm[] and td_name[] is always zero'd during execve() to guard against any possible information leaks. Previously trailing garbage in p_comm[] could be leaked to userland in ktrace record headers via td_name[].
Reviewed by: bde
|
#
197711 |
|
02-Oct-2009 |
bz |
Add a mitigation feature that will prevent user mappings at virtual address 0, limiting the ability to convert a kernel NULL pointer dereference into a privilege escalation attack.
If the sysctl is set to 0 a newly started process will not be able to map anything in the address range of the first page (0 to PAGE_SIZE). This is the default. Already running processes are not affected by this.
You can either change the sysctl or the tunable from loader in case you need to map at a virtual address of 0, for example when running any of the extinct species of a set of a.out binaries, vm86 emulation, .. In that case set security.bsd.map_at_zero="1".
Superseeds: r197537 In collaboration with: jhb, kib, alc
|
#
197031 |
|
09-Sep-2009 |
kib |
Unlock the image vnode around the call of pmc PMC_FN_PROCESS_EXEC hook. The hook calls vn_fullpath(9), that should not be executed with a vnode lock held.
Reported by: Bruce Cran <bruce cran org uk> Tested by: pho MFC after: 3 days
|
#
195995 |
|
31-Jul-2009 |
jhb |
Fix some LORs between vnode locks and filedescriptor table locks. - Don't grab the filedesc lock just to read fd_cmask. - Drop vnode locks earlier when mounting the root filesystem and before sanitizing stdin/out/err file descriptors during execve().
Submitted by: kib Approved by: re (rwatson) MFC after: 1 week
|
#
195926 |
|
28-Jul-2009 |
rwatson |
Rework vnode argument auditing to follow the same structure, in order to avoid exposing ARG_ macros/flag values outside of the audit code in order to name which one of two possible vnodes will be audited for a system call.
Approved by: re (kib) Obtained from: TrustedBSD Project MFC after: 1 month
|
#
195104 |
|
27-Jun-2009 |
rwatson |
Replace AUDIT_ARG() with variable argument macros with a set more more specific macros for each audit argument type. This makes it easier to follow call-graphs, especially for automated analysis tools (such as fxr).
In MFC, we should leave the existing AUDIT_ARG() macros as they may be used by third-party kernel modules.
Suggested by: brooks Approved by: re (kib) Obtained from: TrustedBSD Project MFC after: 1 week
|
#
194498 |
|
19-Jun-2009 |
brooks |
Rework the credential code to support larger values of NGROUPS and NGROUPS_MAX, eliminate ABI dependencies on them, and raise the to 1024 and 1023 respectively. (Previously they were equal, but under a close reading of POSIX, NGROUPS_MAX was defined to be too large by 1 since it is the number of supplemental groups, not total number of groups.)
The bulk of the change consists of converting the struct ucred member cr_groups from a static array to a pointer. Do the equivalent in kinfo_proc.
Introduce new interfaces crcopysafe() and crsetgroups() for duplicating a process credential before modifying it and for setting group lists respectively. Both interfaces take care for the details of allocating groups array. crsetgroups() takes care of truncating the group list to the current maximum (NGROUPS) if necessary. In the future, crsetgroups() may be responsible for insuring invariants such as sorting the supplemental groups to allow groupmember() to be implemented as a binary search.
Because we can not change struct xucred without breaking application ABIs, we leave it alone and introduce a new XU_NGROUPS value which is always 16 and is to be used or NGRPS as appropriate for things such as NFS which need to use no more than 16 groups. When feasible, truncate the group list rather than generating an error.
Minor changes: - Reduce the number of hand rolled versions of groupmember(). - Do not assign to both cr_gid and cr_groups[0]. - Modify ipfw to cache ucreds instead of part of their contents since they are immutable once referenced by more than one entity.
Submitted by: Isilon Systems (initial implementation) X-MFC after: never PR: bin/113398 kern/133867
|
#
193643 |
|
07-Jun-2009 |
alc |
Eliminate unnecessary obfuscation when testing a page's valid bits.
|
#
193593 |
|
06-Jun-2009 |
alc |
If vm_pager_get_pages() returns VM_PAGER_OK, then there is no need to check the page's valid bits. The page is guaranteed to be fully valid. (For the record, this is documented in vm/vm_pager.h's comments.)
|
#
193511 |
|
05-Jun-2009 |
rwatson |
Move "options MAC" from opt_mac.h to opt_global.h, as it's now in GENERIC and used in a large number of files, but also because an increasing number of incorrect uses of MAC calls were sneaking in due to copy-and-paste of MAC-aware code without the associated opt_mac.h include.
Discussed with: pjd
|
#
189927 |
|
17-Mar-2009 |
kib |
Supply AT_EXECPATH auxinfo entry to the interpreter, both for native and compat32 binaries.
Tested by: pho Reviewed by: kan
|
#
189076 |
|
26-Feb-2009 |
ed |
Remove unneeded pointer `ndp'.
Inside do_execve(), we have a pointer `ndp', which always points to `&nd'. I can imagine a primitive (non-optimizing) compiler to really reserve space for such a pointer, so just remove the variable and use `&nd' directly.
|
#
189074 |
|
26-Feb-2009 |
ed |
Remove even more unneeded variable assignments.
kern_time.c: - Unused variable `p'.
kern_thr.c: - Variable `error' is always caught immediately, so no reason to initialize it. There is no way that error != 0 at the end of create_thread().
kern_sig.c: - Unused variable `code'.
kern_synch.c: - `rval' is always assigned in all different cases.
kern_rwlock.c: - `v' is always overwritten with RW_UNLOCKED further on.
kern_malloc.c: - `size' is always initialized with the proper value before being used.
kern_exit.c: - `error' is always caught and returned immediately. abort2() never returns a non-zero value.
kern_exec.c: - `len' is always assigned inside the if-statement right below it.
tty_info.c: - `td' is always overwritten by FOREACH_THREAD_IN_PROC().
Found by: LLVM's scan-build
|
#
185647 |
|
05-Dec-2008 |
kib |
Several threads in a process may do vfork() simultaneously. Then, all parent threads sleep on the parent' struct proc until corresponding child releases the vmspace. Each sleep is interlocked with proc mutex of the child, that triggers assertion in the sleepq_add(). The assertion requires that at any time, all simultaneous sleepers for the channel use the same interlock.
Silent the assertion by using conditional variable allocated in the child. Broadcast the variable event on exec() and exit().
Since struct proc * sleep wait channel is overloaded for several unrelated events, I was unable to remove wakeups from the places where cv_broadcast() is added, except exec().
Reported and tested by: ganbold Suggested and reviewed by: jhb MFC after: 2 week
|
#
184700 |
|
05-Nov-2008 |
rodrigc |
Merge latest DTrace changes from Perforce.
Approved by: jb
|
#
182905 |
|
10-Sep-2008 |
trasz |
Remove VSVTX, VSGID and VSUID. This should be a no-op, as VSVTX == S_ISVTX, VSGID == S_ISGID and VSUID == S_ISUID.
Approved by: rwatson (mentor)
|
#
182371 |
|
28-Aug-2008 |
attilio |
Decontextualize the couplet VOP_GETATTR / VOP_SETATTR as the passed thread was always curthread and totally unuseful.
Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>
|
#
182158 |
|
25-Aug-2008 |
rwatson |
More fully audit fexecve(2) and its arguments.
Obtained from: TrustedBSD Project Sponsored by: Google, Inc.
|
#
182063 |
|
23-Aug-2008 |
rwatson |
Introduce two related changes to the TrustedBSD MAC Framework:
(1) Abstract interpreter vnode labeling in execve(2) and mac_execve(2) so that the general exec code isn't aware of the details of allocating, copying, and freeing labels, rather, simply passes in a void pointer to start and stop functions that will be used by the framework. This change will be MFC'd.
(2) Introduce a new flags field to the MAC_POLICY_SET(9) interface allowing policies to declare which types of objects require label allocation, initialization, and destruction, and define a set of flags covering various supported object types (MPC_OBJECT_PROC, MPC_OBJECT_VNODE, MPC_OBJECT_INPCB, ...). This change reduces the overhead of compiling the MAC Framework into the kernel if policies aren't loaded, or if policies require labels on only a small number or even no object types. Each time a policy is loaded or unloaded, we recalculate a mask of labeled object types across all policies present in the system. Eliminate MAC_ALWAYS_LABEL_MBUF option as it is no longer required.
MFC after: 1 week ((1) only) Reviewed by: csjp Obtained from: TrustedBSD Project Sponsored by: Apple, Inc.
|
#
181647 |
|
12-Aug-2008 |
csjp |
Reduce the scope of the vnode lock such that it does not cover the various copyouts associated with initializing the process's argv/env data in userspace. It is possible that these copyout operations can fault under memory pressure, possibly resulting in dead locks. This is believed to be safe since none of the copyout_strings() operations need to interact with the vnode here.
Submitted by: Zhouyi Zhou PR: kern/111260 Discussed with: kib MFC after: 3 weeks
|
#
180799 |
|
25-Jul-2008 |
kib |
Call pargs_drop() unconditionally in do_execve(), the function correctly handles the NULL argument. Make pargs_free() static.
MFC after: 1 week
|
#
180570 |
|
17-Jul-2008 |
kib |
Pair the VOP_OPEN call from do_execve() with the reciprocal VOP_CLOSE. This was unnoticed because local filesystems usually do nothing non-trivial in the close vop.
Reported and tested by: Rick Macklem MFC after: 2 weeks
|
#
179276 |
|
24-May-2008 |
jb |
Add DTrace 'proc' provider probes using the Statically Defined Trace (sdt) mechanism.
|
#
177787 |
|
31-Mar-2008 |
kib |
Implement the fexecve(2) syscall.
Based on the submission by rdivacky, sponsored by Google Summer of Code 2007 Reviewed by: rwatson, rdivacky Tested by: pho
|
#
177091 |
|
12-Mar-2008 |
jeff |
Remove kernel support for M:N threading.
While the KSE project was quite successful in bringing threading to FreeBSD, the M:N approach taken by the kse library was never developed to its full potential. Backwards compatibility will be provided via libmap.conf for dynamically linked binaries and static binaries will be broken.
|
#
175294 |
|
13-Jan-2008 |
attilio |
VOP_LOCK1() (and so VOP_LOCK()) and VOP_UNLOCK() are only used in conjuction with 'thread' argument passing which is always curthread. Remove the unuseful extra-argument and pass explicitly curthread to lower layer functions, when necessary.
KPI results broken by this change, which should affect several ports, so version bumping and manpage update will be further committed.
Tested by: kris, pho, Diego Sardina <siarodx at gmail dot com>
|
#
175202 |
|
09-Jan-2008 |
attilio |
vn_lock() is currently only used with the 'curthread' passed as argument. Remove this argument and pass curthread directly to underlying VOP_LOCK1() VFS method. This modify makes the code cleaner and in particular remove an annoying dependence helping next lockmgr() cleanup. KPI results, obviously, changed.
Manpage and FreeBSD_version will be updated through further commits.
As a side note, would be valuable to say that next commits will address a similar cleanup about VFS methods, in particular vop_lock1 and vop_unlock.
Tested by: Diego Sardina <siarodx at gmail dot com>, Andrea Di Pasquale <whyx dot it at gmail dot com>
|
#
174982 |
|
29-Dec-2007 |
alc |
Add the superpage reservation system. This is "part 2 of 2" of the machine-independent support for superpages. (The earlier part was the rewrite of the physical memory allocator.) The remainder of the code required for superpages support is machine-dependent and will be added to the various pmap implementations at a later date.
Initially, I am only supporting one large page size per architecture. Moreover, I am only enabling the reservation system on amd64. (In an emergency, it can be disabled by setting VM_NRESERVLEVELS to 0 in amd64/include/vmparam.h or your kernel configuration file.)
|
#
174253 |
|
04-Dec-2007 |
kib |
Implement fetching of the __FreeBSD_version from the ELF ABI-tag note. The value is read into the p_osrel member of the struct proc. p_osrel is set to 0 for the binaries without the note.
MFC after: 3 days
|
#
173599 |
|
14-Nov-2007 |
julian |
Make sure there is a good default thread name for all threads.
|
#
173361 |
|
05-Nov-2007 |
kib |
Fix for the panic("vm_thread_new: kstack allocation failed") and silent NULL pointer dereference in the i386 and sparc64 pmap_pinit() when the kmem_alloc_nofault() failed to allocate address space. Both functions now return error instead of panicing or dereferencing NULL.
As consequence, vmspace_exec() and vmspace_unshare() returns the errno int. struct vmspace arg was added to vm_forkproc() to avoid dealing with failed allocation when most of the fork1() job is already done.
The kernel stack for the thread is now set up in the thread_alloc(), that itself may return NULL. Also, allocation of the first process thread is performed in the fork1() to properly deal with stack allocation failure. proc_linkup() is separated into proc_linkup() called from fork1(), and proc_linkup0(), that is used to set up the kernel process (was known as swapper).
In collaboration with: Peter Holm Reviewed by: jhb
|
#
172930 |
|
24-Oct-2007 |
rwatson |
Merge first in a series of TrustedBSD MAC Framework KPI changes from Mac OS X Leopard--rationalize naming for entry points to the following general forms:
mac_<object>_<method/action> mac_<object>_check_<method/action>
The previous naming scheme was inconsistent and mostly reversed from the new scheme. Also, make object types more consistent and remove spaces from object types that contain multiple parts ("posix_sem" -> "posixsem") to make mechanical parsing easier. Introduce a new "netinet" object type for certain IPv4/IPv6-related methods. Also simplify, slightly, some entry point names.
All MAC policy modules will need to be recompiled, and modules not updates as part of this commit will need to be modified to conform to the new KPI.
Sponsored by: SPARTA (original patches against Mac OS X) Obtained from: TrustedBSD Project, Apple Computer
|
#
172317 |
|
25-Sep-2007 |
alc |
Change the management of cached pages (PQ_CACHE) in two fundamental ways:
(1) Cached pages are no longer kept in the object's resident page splay tree and memq. Instead, they are kept in a separate per-object splay tree of cached pages. However, access to this new per-object splay tree is synchronized by the _free_ page queues lock, not to be confused with the heavily contended page queues lock. Consequently, a cached page can be reclaimed by vm_page_alloc(9) without acquiring the object's lock or the page queues lock.
This solves a problem independently reported by tegge@ and Isilon. Specifically, they observed the page daemon consuming a great deal of CPU time because of pages bouncing back and forth between the cache queue (PQ_CACHE) and the inactive queue (PQ_INACTIVE). The source of this problem turned out to be a deadlock avoidance strategy employed when selecting a cached page to reclaim in vm_page_select_cache(). However, the root cause was really that reclaiming a cached page required the acquisition of an object lock while the page queues lock was already held. Thus, this change addresses the problem at its root, by eliminating the need to acquire the object's lock.
Moreover, keeping cached pages in the object's primary splay tree and memq was, in effect, optimizing for the uncommon case. Cached pages are reclaimed far, far more often than they are reactivated. Instead, this change makes reclamation cheaper, especially in terms of synchronization overhead, and reactivation more expensive, because reactivated pages will have to be reentered into the object's primary splay tree and memq.
(2) Cached pages are now stored alongside free pages in the physical memory allocator's buddy queues, increasing the likelihood that large allocations of contiguous physical memory (i.e., superpages) will succeed.
Finally, as a result of this change long-standing restrictions on when and where a cached page can be reclaimed and returned by vm_page_alloc(9) are eliminated. Specifically, calls to vm_page_alloc(9) specifying VM_ALLOC_INTERRUPT can now reclaim and return a formerly cached page. Consequently, a call to malloc(9) specifying M_NOWAIT is less likely to fail.
Discussed with: many over the course of the summer, including jeff@, Justin Husted @ Isilon, peter@, tegge@ Tested by: an earlier version by kris@ Approved by: re (kensmith)
|
#
171410 |
|
12-Jul-2007 |
jhb |
Fix a couple of issues with the stack limit for 32-bit processes on 64-bit kernels exposed by the recent fixes to resource limits for 32-bit processes on 64-bit kernels: - Let ABIs expose their maximum stack size via a new pointer in sysentvec and use that in preference to maxssiz during exec() rather than always using maxssiz for all processses. - Apply the ABI's limit fixup to the previous stack size when adjusting RLIMIT_STACK to determine if the existing mapping for the stack needs to be grown or shrunk (as well as how much it should be grown or shrunk).
Approved by: re (kensmith)
|
#
170684 |
|
13-Jun-2007 |
jhb |
Conditionally acquire Giant when dropping a reference on the ktrace vnode during execve() when turning off tracing due to executing a setuid binary as non-root. Previously this could fail to acquire Giant and fail an assertion if the ktrace file was on a non-MPSAFE filesystem and the executable was on an MPSAFE filesystem.
MFC after: 3 days Reported by: kris
|
#
170587 |
|
11-Jun-2007 |
rwatson |
Eliminate now-unused SUSER_ALLOWJAIL arguments to priv_check_cred(); in some cases, move to priv_check() if it was an operation on a thread and no other flags were present.
Eliminate caller-side jail exception checking (also now-unused); jail privilege exception code now goes solely in kern_jail.c.
We can't yet eliminate suser() due to some cases in the KAME code where a privilege check is performed and then used in many different deferred paths. Do, however, move those prototypes to priv.h.
Reviewed by: csjp Obtained from: TrustedBSD Project
|
#
170152 |
|
31-May-2007 |
kib |
Revert UF_OPENING workaround for CURRENT. Change the VOP_OPEN(), vn_open() vnode operation and d_fdopen() cdev operation argument from being file descriptor index into the pointer to struct file.
Proposed and reviewed by: jhb Reviewed by: daichi (unionfs) Approved by: re (kensmith)
|
#
169565 |
|
14-May-2007 |
jhb |
Rework the support for ABIs to override resource limits (used by 32-bit processes under 64-bit kernels). Previously, each 32-bit process overwrote its resource limits at exec() time. The problem with this approach is that the new limits affect all child processes of the 32-bit process, including if the child process forks and execs a 64-bit process. To fix this, don't ovewrite the resource limits during exec(). Instead, sv_fixlimits() is now replaced with a different function sv_fixlimit() which asks the ABI to sanitize a single resource limit. We then use this when querying and setting resource limits. Thus, if a 32-bit process sets a limit, then that new limit will be inherited by future children. However, if the 32-bit process doesn't change a limit, then a future 64-bit child will see the "full" 64-bit limit rather than the 32-bit limit.
MFC is tentative since it will break the ABI of old linux.ko modules (no other modules are affected).
MFC after: 1 week
|
#
167876 |
|
25-Mar-2007 |
kris |
Update a comment: we usually call exec_vmspace_new with Giant not held, but sometimes it is.
|
#
167232 |
|
05-Mar-2007 |
rwatson |
Further system call comment cleanup:
- Remove also "MP SAFE" after prior "MPSAFE" pass. (suggested by bde) - Remove extra blank lines in some cases. - Add extra blank lines in some cases. - Remove no-op comments consisting solely of the function name, the word "syscall", or the system call name. - Add punctuation. - Re-wrap some comments.
|
#
167211 |
|
04-Mar-2007 |
rwatson |
Remove 'MPSAFE' annotations from the comments above most system calls: all system calls now enter without Giant held, and then in some cases, acquire Giant explicitly.
Remove a number of other MPSAFE annotations in the credential code and tweak one or two other adjacent comments.
|
#
164033 |
|
06-Nov-2006 |
rwatson |
Sweep kernel replacing suser(9) calls with priv(9) calls, assigning specific privilege names to a broad range of privileges. These may require some future tweaking.
Sponsored by: nCircle Network Security, Inc. Obtained from: TrustedBSD Project Discussed on: arch@ Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri, Alex Lyashkov <umka at sevcity dot net>, Skip Ford <skip dot ford at verizon dot net>, Antoine Brodin <antoine dot brodin at laposte dot net>
|
#
163614 |
|
22-Oct-2006 |
alc |
The page queues lock is no longer required by vm_page_busy() or vm_page_wakeup(). Reduce or eliminate its use accordingly.
|
#
163606 |
|
22-Oct-2006 |
rwatson |
Complete break-out of sys/sys/mac.h into sys/security/mac/mac_framework.h begun with a repo-copy of mac.h to mac_framework.h. sys/mac.h now contains the userspace and user<->kernel API and definitions, with all in-kernel interfaces moved to mac_framework.h, which is now included across most of the kernel instead.
This change is the first step in a larger cleanup and sweep of MAC Framework interfaces in the kernel, and will not be MFC'd.
Obtained from: TrustedBSD Project Sponsored by: SPARTA
|
#
163604 |
|
22-Oct-2006 |
alc |
Replace PG_BUSY with VPO_BUSY. In other words, changes to the page's busy flag, i.e., VPO_BUSY, are now synchronized by the per-vm object lock instead of the global page queues lock.
|
#
161813 |
|
01-Sep-2006 |
wsalamon |
Audit the argv and env vectors passed in on exec: Add the argument auditing functions for argv and env. Add kernel-specific versions of the tokenizer functions for the arg and env represented as a char array. Implement the AUDIT_ARGV and AUDIT_ARGE audit policy commands to enable/disable argv/env auditing. Call the argument auditing from the exec system calls.
Obtained from: TrustedBSD Project Approved by: rwatson (mentor)
|
#
161302 |
|
15-Aug-2006 |
netchild |
- Change process_exec function handlers prototype to include struct image_params arg. - Change struct image_params to include struct sysentvec pointer and initialize it. - Change all consumers of process_exit/process_exec eventhandlers to new prototypes (includes splitting up into distinct exec/exit functions). - Add eventhandler to userret.
Sponsored by: Google SoC 2006 Submitted by: rdivacky Parts suggested by: jhb (on hackers@)
|
#
159003 |
|
28-May-2006 |
rwatson |
In execve(), audit the path name being executed. In the future, it would also be good to audit the interpreter pathname, if any.
Obtained from: TrustedBSD Project
|
#
158324 |
|
05-May-2006 |
tegge |
Temporarily unlock vnode for new image being executed to avoid lock order reversals that can lead to deadlocks. Normally vn_close(), namei() or vrele() should not be called while holding vnode locks.
|
#
157443 |
|
03-Apr-2006 |
peter |
Remove the unused sva and eva arguments from pmap_remove_pages().
|
#
156440 |
|
08-Mar-2006 |
ups |
Fix exec_map resource leaks.
Tested by: kris@
|
#
155402 |
|
06-Feb-2006 |
jhb |
- Always call exec_free_args() in kern_execve() instead of doing it in all the callers if the exec either succeeds or fails early. - Move the code to call exit1() if the exec fails after the vmspace is gone to the bottom of kern_execve() to cut down on some code duplication.
|
#
155205 |
|
02-Feb-2006 |
jeff |
- textvp may have been from a different mountpoint than ndp->ni_vp and we may need to acquire giant to vrele it.
Found by: mjacob MFC After: 3 days
|
#
153311 |
|
11-Dec-2005 |
alc |
Remove unneeded calls to pmap_remove_all(). The given page is not mapped.
Reviewed by: tegge
|
#
153259 |
|
09-Dec-2005 |
davidxu |
Register itimers_event_hook as a kernel event handler, so I don't have to duplicate code to call it in exec() and exit1().
|
#
153158 |
|
06-Dec-2005 |
alc |
Reduce the scope of the page queues lock in exec_map_first_page(). The vm object lock is sufficient for reading a page's PG_BUSY and busy flags.
MFC after: 1 week
|
#
151993 |
|
03-Nov-2005 |
davidxu |
Cleanup some signal interfaces. Now the tdsignal function accepts both proc pointer and thread pointer, if thread pointer is NULL, tdsignal automatically finds a thread, otherwise it sends signal to given thread. Add utility function psignal_event to send a realtime sigevent to a process according to the delivery requirement specified in struct sigevent.
|
#
151980 |
|
02-Nov-2005 |
ps |
Calling setrlimit from 32bit apps could potentially increase certain limits beyond what should be capiable in a 32bit process, so we must fixup the limits.
Reviewed by: jhb
|
#
151585 |
|
23-Oct-2005 |
davidxu |
Make p_itimers as a pointer, so file sys/proc.h does not need to include sys/timers.h.
|
#
151576 |
|
23-Oct-2005 |
davidxu |
Implement POSIX timers. Current only CLOCK_REALTIME and CLOCK_MONOTONIC clock are supported. I have plan to merge XSI timer ITIMER_REAL and other two CPU timers into the new code, current three slots are available for the XSI timers. The SIGEV_THREAD notification type is not supported yet because our sigevent struct lacks of two member fields: sigev_notify_function sigev_notify_attributes I have found the sigevent is used in AIO, so I won't add the two members unless the AIO code is adjusted.
|
#
151316 |
|
14-Oct-2005 |
davidxu |
1. Change prototype of trapsignal and sendsig to use ksiginfo_t *, most changes in MD code are trivial, before this change, trapsignal and sendsig use discrete parameters, now they uses member fields of ksiginfo_t structure. For sendsig, this change allows us to pass POSIX realtime signal value to user code.
2. Remove cpu_thread_siginfo, it is no longer needed because we now always generate ksiginfo_t data and feed it to libpthread.
3. Add p_sigqueue to proc structure to hold shared signals which were blocked by all threads in the proc.
4. Add td_sigqueue to thread structure to hold all signals delivered to thread.
5. i386 and amd64 now return POSIX standard si_code, other arches will be fixed.
6. In this sigqueue implementation, pending signal set is kept as before, an extra siginfo list holds additional siginfo_t data for signals. kernel code uses psignal() still behavior as before, it won't be failed even under memory pressure, only exception is when deleting a signal, we should call sigqueue_delete to remove signal from sigqueue but not SIGDELSET. Current there is no kernel code will deliver a signal with additional data, so kernel should be as stable as before, a ksiginfo can carry more information, for example, allow signal to be delivered but throw away siginfo data if memory is not enough. SIGKILL and SIGSTOP have fast path in sigqueue_add, because they can not be caught or masked. The sigqueue() syscall allows user code to queue a signal to target process, if resource is unavailable, EAGAIN will be returned as specification said. Just before thread exits, signal queue memory will be freed by sigqueue_flush. Current, all signals are allowed to be queued, not only realtime signals.
Earlier patch reviewed by: jhb, deischen Tested on: i386, amd64
|
#
151252 |
|
12-Oct-2005 |
dds |
Move execve's access time update functionality into a new vfs_mark_atime() function, and use the new function for performing efficient atime updates in mmap().
Reviewed by: bde MFC after: 2 weeks
|
#
150895 |
|
04-Oct-2005 |
truckman |
Add missing word to comment.
|
#
150854 |
|
03-Oct-2005 |
cperciva |
If sufficiently bad things happen during a call to kern_execve(), it is possible for do_execve() to call exit1() rather than returning. As a result, the sequence "allocate memory; call kern_execve; free memory" can end up leaking memory.
This commit documents this astonishing behaviour and adds a call to exec_free_args() before the exit1() call in do_execve(). Since all the users of kern_execve() in the tree use exec_free_args() to free the command-line arguments after kern_execve() returns, this should be safe, and it fixes the memory leak which can otherwise occur.
Submitted by: Peter Holm MFC after: 3 days Security: Local denial of service
|
#
150780 |
|
01-Oct-2005 |
truckman |
Copy new process argument list in do_execve() before grabbing PROC_LOCK to avoid touching pageable memory while holding a mutex.
Simplify argument list replacement logic.
PR: kern/84935 Submitted by: "Antoine Pelisse" apelisse AT gmail.com (in a different form) MFC after: 3 days
|
#
147708 |
|
30-Jun-2005 |
jkoshy |
MFP4:
- pmcstat(8) gprof output mode fixes:
lib/libpmc/pmclog.{c,h}, sys/sys/pmclog.h: + Add a 'is_usermode' field to the PMCLOG_PCSAMPLE event + Add an 'entryaddr' field to the PMCLOG_PROCEXEC event, so that pmcstat(8) can determine where the runtime loader /libexec/ld-elf.so.1 is getting loaded.
sys/kern/kern_exec.c: + Use a local struct to group the entry address of the image being exec()'ed and the process credential changed flag to the exec handling hook inside hwpmc(4).
usr.sbin/pmcstat/*: + Support "-k kernelpath", "-D sampledir". + Implement the ELF bits of 'gmon.out' profile generation in a new file "pmcstat_log.c". Move all log related functions to this file. + Move local definitions and prototypes to "pmcstat.h"
- Other bug fixes: + lib/libpmc/pmclog.c: correctly handle EOF in pmclog_read(). + sys/dev/hwpmc_mod.c: unconditionally log a PROCEXIT event to all attached PMCs when a process exits. + sys/sys/pmc.h: correct a function prototype. + Improve usage checks in pmcstat(8).
Approved by: re (blanket hwpmc)
|
#
147565 |
|
23-Jun-2005 |
peter |
Move HWPMC_HOOKS into its own opt_hwpmc_hooks.h file. It doesn't merit being in opt_global.h and forcing a global recompile when only a few files reference it.
Approved by: re
|
#
147191 |
|
09-Jun-2005 |
jkoshy |
MFP4:
- Implement sampling modes and logging support in hwpmc(4).
- Separate MI and MD parts of hwpmc(4) and allow sharing of PMC implementations across different architectures. Add support for P4 (EMT64) style PMCs to the amd64 code.
- New pmcstat(8) options: -E (exit time counts) -W (counts every context switch), -R (print log file).
- pmc(3) API changes, improve our ability to keep ABI compatibility in the future. Add more 'alias' names for commonly used events.
- bug fixes & documentation.
|
#
146829 |
|
31-May-2005 |
kensmith |
This patch addresses a standards violation issue. The standards say a file's access time should be updated when it gets executed. A while ago the mechanism used to exec was changed to use a more mmap based mechanism and this behavior was broken as a side-effect of that.
A new vnode flag is added that gets set when the file gets executed, and the VOP_SETATTR() vnode operation gets called. The underlying filesystem is expected to handle it based on its own semantics, some filesystems don't support access time at all. Those that do should handle it in a way that does not block, does not generate I/O if possible, etc. In particular vn_start_write() has not been called. The UFS code handles it the same way as it would normally handle the access time if a file was read - the IN_ACCESS flag gets set in the inode but no other action happens at this point. The actual time update will happen later during a sync (which handles all the necessary locking).
Got me into this: cperciva Discussed with: a lot with bde, a little with kan Showed patches to: phk, jeffr, standards@, arch@ Minor discussion on: arch@
|
#
145833 |
|
03-May-2005 |
jeff |
- Initialize vfslocked correctly early enough for MAC to compile. - Fix one place where we explicitly drop Giant!
Pointy hat to: me Submitted by: Max Laier Warned by: Tinderbox
|
#
145821 |
|
03-May-2005 |
jeff |
- Use namei to acquire Giant for VFS if it is necessary. Drop the explicit Giant acquisition. - Remove GIANT_REQUIRED in the few remaining cases; the vm and vfs have both been locked.
|
#
145731 |
|
30-Apr-2005 |
jeff |
- Return EACCES if we're trying to exec on a vp with no object.
Errno supplied by: cperciva
|
#
145584 |
|
27-Apr-2005 |
jeff |
- Pass the ISOPEN flag to namei so filesystems will know we're about to open them or otherwise access the data.
|
#
145256 |
|
19-Apr-2005 |
jkoshy |
Bring a working snapshot of hwpmc(4), its associated libraries, userland utilities and documentation into -CURRENT.
Bump FreeBSD_version.
Reviewed by: alc, jhb (kernel changes)
|
#
142453 |
|
25-Feb-2005 |
sobomax |
Welcome to the 21st century: increase MAXSHELLCMDLEN from 128 bytes to PAGE_SIZE.
Unlike originator of the PR suggests retain MAXSHELLCMDLEN definition (he has been proposing to replace it with PAGE_SIZE everywhere), not only this reduced the diff significantly, but prevents code obfuscation and also allows to increase/decrease this parameter easily if needed.
PR: kern/64196 Submitted by: Magnus Bäckström <b@etek.chalmers.se>
|
#
141010 |
|
29-Jan-2005 |
sobomax |
Grrr, this committer needs to have a sleep. Remove lines from the previous delta not intended for public consumption.
MFC after: 2 weeks
|
#
141009 |
|
29-Jan-2005 |
sobomax |
Fix small non-conformance introduced in the previous commit: execve() is expected to return ENAMETOOLONG, not E2BIG if first argument doesn't fit into {PATH_MAX} bytes.
MFC after: 2 weeks
|
#
140992 |
|
29-Jan-2005 |
sobomax |
o Split out kernel part of execve(2) syscall into two parts: one that copies arguments into the kernel space and one that operates completely in the kernel space;
o use kernel-only version of execve(2) to kill another stackgap in linuxlator/i386.
Obtained from: DragonFlyBSD (partially) MFC after: 2 weeks
|
#
140782 |
|
24-Jan-2005 |
phk |
Don't use VOP_GETVOBJECT, use vp->v_object directly.
|
#
139804 |
|
06-Jan-2005 |
imp |
/* -> /*- for copyright notices, minor format tweaks as necessary
|
#
138832 |
|
14-Dec-2004 |
phk |
Add new function fdunshare() which encapsulates the necessary light magic for ensuring that a process' filedesc is not shared with anybody.
Use it in the two places which previously had private implmentations.
This collects all fd_refcnt handling in kern_descrip.c
|
#
138129 |
|
27-Nov-2004 |
das |
Don't include sys/user.h merely for its side-effect of recursively including other headers.
|
#
137647 |
|
13-Nov-2004 |
phk |
Introduce an alias for FILEDESC_{UN}LOCK() with the suffix _FAST.
Use this in all the places where sleeping with the lock held is not an issue.
The distinction will become significant once we finalize the exact lock-type to use for this kind of case.
|
#
137384 |
|
08-Nov-2004 |
phk |
Use more intuitive pointer for fdinit() and fdcopy().
Change fdcopy() to take unlocked filedesc.
|
#
136404 |
|
11-Oct-2004 |
peter |
Put on my peril sensitive sunglasses and add a flags field to the internal sysctl routines and state. Add some code to use it for signalling the need to downconvert a data structure to 32 bits on a 64 bit OS when requested by a 32 bit app.
I tried to do this in a generic abi wrapper that intercepted the sysctl oid's, or looked up the format string etc, but it was a real can of worms that turned into a fragile mess before I even got it partially working.
With this, we can now run 'sysctl -a' on a 32 bit sysctl binary and have it not abort. Things like netstat, ps, etc have a long way to go.
This also fixes a bug in the kern.ps_strings and kern.usrstack hacks. These do matter very much because they are used by libc_r and other things.
|
#
136221 |
|
07-Oct-2004 |
davidxu |
Add an execve command for kse_thr_interrupt to allow libpthread to restore signal mask correctly, this is required by POSIX.
Reviewed by: deischen
|
#
136177 |
|
05-Oct-2004 |
davidxu |
In original kern_execve() code, at the start of the function, it forces all other threads to suicide, problem is execve() could be failed, and a failed execve() would change threaded process to unthreaded, this side effect is unexpected. The new code introduces a new single threading mode SINGLE_BOUNDARY, in the mode, all threads should suspend themself at user boundary except the singler. we can not use SINGLE_NO_EXIT because we want to start from a clean state if execve() is successful, suspending other threads at unknown point and later resuming them from there and forcing them to exit at user boundary may cause the process to start from a dirty state. If execve() is successful, current thread upgrades to SINGLE_EXIT mode and forces other threads to suicide at user boundary, otherwise, other threads will be resumed and their interrupted syscall will be restarted.
Reviewed by: julian
|
#
135633 |
|
23-Sep-2004 |
jhb |
- Don't try to unlock Giant if single threading fails since we don't have it locked. - Unlock Giant before calling exit1() since exit1() does not require Giant.
|
#
135562 |
|
21-Sep-2004 |
julian |
Revert the last change.. Better to kill all other threads than to panic the system if 2 threads call execve() at the same time. A better fix will be committed later.
Note that this only affects the case where the execve fails.
|
#
135552 |
|
21-Sep-2004 |
julian |
In a threaded process, don't kill off all the other threads until we have a reasonable chance that the eceve() is going to succeeed. I.e. wait until we've done the permission checks etc.
MFC after: 1 week
|
#
134791 |
|
05-Sep-2004 |
julian |
Refactor a bunch of scheduler code to give basically the same behaviour but with slightly cleaned up interfaces.
The KSE structure has become the same as the "per thread scheduler private data" structure. In order to not make the diffs too great one is #defined as the other at this time.
The KSE (or td_sched) structure is now allocated per thread and has no allocation code of its own.
Concurrency for a KSEGRP is now kept track of via a simple pair of counters rather than using KSE structures as tokens.
Since the KSE structure is different in each scheduler, kern_switch.c is now included at the end of each scheduler. Nothing outside the scheduler knows the contents of the KSE (aka td_sched) structure.
The fields in the ksegrp structure that are to do with the scheduler's queueing mechanisms are now moved to the kg_sched structure. (per ksegrp scheduler private data structure). In other words how the scheduler queues and keeps track of threads is no-one's business except the scheduler's. This should allow people to write experimental schedulers with completely different internal structuring.
A scheduler call sched_set_concurrency(kg, N) has been added that notifies teh scheduler that no more than N threads from that ksegrp should be allowed to be on concurrently scheduled. This is also used to enforce 'fainess' at this time so that a ksegrp with 10000 threads can not swamp a the run queue and force out a process with 1 thread, since the current code will not set the concurrency above NCPU, and both schedulers will not allow more than that many onto the system run queue at a time. Each scheduler should eventualy develop their own methods to do this now that they are effectively separated.
Rejig libthr's kernel interface to follow the same code paths as linkse for scope system threads. This has slightly hurt libthr's performance but I will work to recover as much of it as I can.
Thread exit code has been cleaned up greatly. exit and exec code now transitions a process back to 'standard non-threaded mode' before taking the next step. Reviewed by: scottl, peter MFC after: 1 week
|
#
133741 |
|
15-Aug-2004 |
jmg |
Add locking to the kqueue subsystem. This also makes the kqueue subsystem a more complete subsystem, and removes the knowlege of how things are implemented from the drivers. Include locking around filter ops, so a module like aio will know when not to be unloaded if there are outstanding knotes using it's filter ops.
Currently, it uses the MTX_DUPOK even though it is not always safe to aquire duplicate locks. Witness currently doesn't support the ability to discover if a dup lock is ok (in some cases).
Reviewed by: green, rwatson (both earlier versions)
|
#
132653 |
|
26-Jul-2004 |
cperciva |
Rename suser_cred()'s PRISON_ROOT flag to SUSER_ALLOWJAIL. This is somewhat clearer, but more importantly allows for a consistent naming scheme for suser_cred flags.
The old name is still defined, but will be removed in a few days (unless I hear any complaints...)
Discussed with: rwatson, scottl Requested by: jhb
|
#
132592 |
|
24-Jul-2004 |
julian |
White space fix.. diff reduction for upcoming commit.
|
#
132082 |
|
13-Jul-2004 |
alc |
Push down the acquisition and release of the page queues lock into pmap_remove_pages(). (The implementation of pmap_remove_pages() is optional. If pmap_remove_pages() is unimplemented, the acquisition and release of the page queues lock is unnecessary.)
Remove spl calls from the alpha, arm, and ia64 pmap_remove_pages().
|
#
129989 |
|
02-Jun-2004 |
tjr |
Move TDF_SA from td_flags to td_pflags (and rename it accordingly) so that it is no longer necessary to hold sched_lock while manipulating it.
Reviewed by: davidxu
|
#
129547 |
|
21-May-2004 |
davidxu |
Clear KSE thread flags after KSE thread mode is ended. The side effect of not clearing the flags for execv() syscall will result that a new program runs in KSE thread mode without enabling it.
Submitted by: tjr Modified by: davidxu
|
#
128568 |
|
23-Apr-2004 |
alc |
Utilize sf_buf_alloc() rather than pmap_qenter() (and sometimes kmem_alloc_wait()) for mapping the image header. On all machines with a direct virtual-to-physical mapping and SMP/HTT i386s, this is a clear win.
|
#
128134 |
|
11-Apr-2004 |
alc |
Use vm_page_hold() rather than vm_page_wire() for short-duration page wiring. The reason being that vm_page_hold() is cheaper.
|
#
127696 |
|
31-Mar-2004 |
pjd |
Remove sysctl kern.ps_argsopen, it is not very useful, one should use security.bsd.see_other_uids instead.
Discussed with: phk, rwatson
|
#
126941 |
|
14-Mar-2004 |
peter |
Make the process_exit eventhandler run without Giant. Add Giant hooks in the two consumers that need it.. processes using AIO and netncp. Update docs. Say that process_exec is called with Giant, but not to depend on it. All our consumers can handle it without Giant.
|
#
126932 |
|
13-Mar-2004 |
peter |
Push Giant down a little further: - no longer serialize on Giant for thread_single*() and family in fork, exit and exec - thread_wait() is mpsafe, assert no Giant - reduce scope of Giant in exit to not cover thread_wait and just do vm_waitproc(). - assert that thread_single() family are not called with Giant - remove the DROP/PICKUP_GIANT macros from thread_single() family - assert that thread_suspend_check() s not called with Giant - remove manual drop_giant hack in thread_suspend_check since we know it isn't held. - remove the DROP/PICKUP_GIANT macros from thread_suspend_check() family - mark kse_create() mpsafe
|
#
126889 |
|
12-Mar-2004 |
ru |
Do what the execve(2) manpage says and enforce what a Strictly Conforming POSIX application should do by disallowing the argv argument to be NULL.
PR: kern/33738 Submitted by: Marc Olzheim, Serge van den Boom OK'ed by: nectar
|
#
126671 |
|
05-Mar-2004 |
jhb |
Lock Giant around the single threading code in exec() to satisfy an assertion in the single threading code.
|
#
125953 |
|
17-Feb-2004 |
peter |
Checkpoint a hack to enable running i386 libc_r binaries on a 64 bit kernel. I'm not happy with it yet - refinements are to come. This hack allows the kern.ps_strings and kern.usrstack sysctls to respond to a 32 bit request, such as those coming from emulated i386 binaries.
|
#
123924 |
|
28-Dec-2003 |
bde |
Fixed some style bugs (mainly, try to always use explicit comparisons with NULL when checking for null pointers).
|
#
123923 |
|
28-Dec-2003 |
bde |
Fixed some disordering in revs.1.194 and 1,196. Moved the exceve() syscall function back to near the beginning of the file. Rev.1.194 moved it into the middle of auxiliary functions following kern_execve(). Moved the __mac_execve() syscall function up together with execve(). It was new in rev1.1.196 and perfectly misplaced after execve().
|
#
123909 |
|
27-Dec-2003 |
alc |
Remove GIANT_REQUIRED from exec_unmap_first_page().
|
#
122524 |
|
12-Nov-2003 |
rwatson |
Modify the MAC Framework so that instead of embedding a (struct label) in various kernel objects to represent security data, we embed a (struct label *) pointer, which now references labels allocated using a UMA zone (mac_label.c). This allows the size and shape of struct label to be varied without changing the size and shape of these kernel objects, which become part of the frozen ABI with 5-STABLE. This opens the door for boot-time selection of the number of label slots, and hence changes to the bound on the number of simultaneous labeled policies at boot-time instead of compile-time. This also makes it easier to embed label references in new objects as required for locking/caching with fine-grained network stack locking, such as inpcb structures.
This change also moves us further in the direction of hiding the structure of kernel objects from MAC policy modules, not to mention dramatically reducing the number of '&' symbols appearing in both the MAC Framework and MAC policy modules, and improving readability.
While this results in minimal performance change with MAC enabled, it will observably shrink the size of a number of critical kernel data structures for the !MAC case, and should have a small (but measurable) performance benefit (i.e., struct vnode, struct socket) do to memory conservation and reduced cost of zeroing memory.
NOTE: Users of MAC must recompile their kernel and all MAC modules as a result of this change. Because this is an API change, third party MAC modules will also need to be updated to make less use of the '&' symbol.
Suggestions from: bmilekic Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories
|
#
121294 |
|
20-Oct-2003 |
marcel |
Remove md_bspstore from the MD fields of struct thread. Now that the backing store is at a fixed address, there's no need for a per-thread variable.
|
#
121268 |
|
20-Oct-2003 |
marcel |
Put the RSE backing store at a fixed address. This change is triggered by libguile that needs to know the base of the RSE backing store. We currently do not export the fixed address to userland by means of a sysctl so user code needs to hardcode it for now. This will be revisited later.
The RSE backing store is now at the bottom of region 4. The memory stack is at the top of region 4. This means that the whole region is usable for the stacks, giving a 61-bit stack space.
Port: lang/guile (depended of x11/gnome2)
|
#
120769 |
|
04-Oct-2003 |
alc |
Eliminate some unnecessary uses of the vm page queues lock around the vm page's valid field. This field is being synchronized using the containing vm object's lock.
|
#
120532 |
|
27-Sep-2003 |
marcel |
Remove the regstkpages sysctl variable. We have a growable register stack now.
|
#
120531 |
|
27-Sep-2003 |
marcel |
Part 2 of implementing rstacks: add the ability to create rstacks and use the ability on ia64 to map the register stack. The orientation of the stack (i.e. its grow direction) is passed to vm_map_stack() in the overloaded cow argument. Since the grow direction is represented by bits, it is possible and allowed to create bi-directional stacks. This is not an advertised feature, more of a side-effect.
Fix a bug in vm_map_growstack() that's specific to rstacks and which we could only find by having the ability to create rstacks: when the mapped stack ends at the faulting address, we have not actually mapped the faulting address. we need to include or cover the faulting address.
Note that at this time mmap(2) has not been extended to allow the creation of rstacks by processes. If such a need arises, this can be done.
Tested on: alpha, i386, ia64, sparc64
|
#
120422 |
|
24-Sep-2003 |
peter |
Add sysentvec->sv_fixlimits() hook so that we can catch cases on 64 bit systems where the data/stack/etc limits are too big for a 32 bit process.
Move the 5 or so identical instances of ELF_RTLD_ADDR() into imgact_elf.c.
Supply an ia32_fixlimits function. Export the clip/default values to sysctl under the compat.ia32 heirarchy.
Have mmap(0, ...) respect the current p->p_limits[RLIMIT_DATA].rlim_max value rather than the sysctl tweakable variable. This allows mmap to place mappings at sensible locations when limits have been reduced.
Have the imgact_elf.c ld-elf.so.1 placement algorithm use the same method as mmap(0, ...) now does.
Note that we cannot remove all references to the sysctl tweakable maxdsiz etc variables because /etc/login.conf specifies a datasize of 'unlimited'. And that causes exec etc to fail since it can no longer find space to mmap things.
|
#
118047 |
|
26-Jul-2003 |
phk |
Add a "int fd" argument to VOP_OPEN() which in the future will contain the filedescriptor number on opens from userland.
The index is used rather than a "struct file *" since it conveys a bit more information, which may be useful to in particular fdescfs and /dev/fd/*
For now pass -1 all over the place.
|
#
116361 |
|
14-Jun-2003 |
davidxu |
Rename P_THREADED to P_SA. P_SA means a process is using scheduler activations.
|
#
116279 |
|
13-Jun-2003 |
alc |
Add vm object locking to various pagers' "get pages" methods, i386 stack management functions, and a u area management function.
|
#
116182 |
|
10-Jun-2003 |
obrien |
Use __FBSDID().
|
#
116114 |
|
09-Jun-2003 |
alc |
Update the vm object and page locking in exec_map_first_page(). Mark the one still anticipated change with XXX. Otherwise, this function is done.
|
#
116013 |
|
08-Jun-2003 |
alc |
Lock the vm object when performing vm_page_grab().
|
#
114983 |
|
13-May-2003 |
jhb |
- Merge struct procsig with struct sigacts. - Move struct sigacts out of the u-area and malloc() it using the M_SUBPROC malloc bucket. - Add a small sigacts_*() API for managing sigacts structures: sigacts_alloc(), sigacts_free(), sigacts_copy(), sigacts_share(), and sigacts_shared(). - Remove the p_sigignore, p_sigacts, and p_sigcatch macros. - Add a mutex to struct sigacts that protects all the members of the struct. - Add sigacts locking. - Remove Giant from nosys(), kill(), killpg(), and kern_sigaction() now that sigacts is locked. - Several in-kernel functions such as psignal(), tdsignal(), trapsignal(), and thread_stopped() are now MP safe.
Reviewed by: arch@ Approved by: re (rwatson)
|
#
112910 |
|
31-Mar-2003 |
jeff |
- Borrow the KSE single threading code for exec and exit. We use the check if (p->p_numthreads > 1) and not a flag because action is only necessary if there are other threads. The rest of the system has no need to identify thr threaded processes. - In kern_thread.c use thr_exit1() instead of thread_exit() if P_THREADED is not set.
|
#
112564 |
|
24-Mar-2003 |
jhb |
Replace the at_fork, at_exec, and at_exit functions with the slightly more flexible process_fork, process_exec, and process_exit eventhandlers. This reduces code duplication and also means that I don't have to go duplicate the eventhandler locking three more times for each of at_fork, at_exec, and at_exit.
Reviewed by: phk, jake, almost complete silence on arch@
|
#
112198 |
|
13-Mar-2003 |
jhb |
- Cache a reference to the credential of the thread that starts a ktrace in struct proc as p_tracecred alongside the current cache of the vnode in p_tracep. This credential is then used for all later ktrace operations on this file rather than using the credential of the current thread at the time of each ktrace event. - Now that we have multiple ktrace-related items in struct proc that are pointers, rename p_tracep to p_tracevp to make it less ambiguous.
Requested by: rwatson (1)
|
#
111585 |
|
27-Feb-2003 |
julian |
Change the process flags P_KSES to be P_THREADED. This is just a cosmetic change but I've been meaning to do it for about a year.
|
#
111119 |
|
19-Feb-2003 |
imp |
Back out M_* changes, per decision of the TRB.
Approved by: trb
|
#
111028 |
|
17-Feb-2003 |
jeff |
- Split the struct kse into struct upcall and struct kse. struct kse will soon be visible only to schedulers. This greatly simplifies much the KSE code.
Submitted by: davidxu
|
#
110190 |
|
01-Feb-2003 |
julian |
Reversion of commit by Davidxu plus fixes since applied.
I'm not convinced there is anything major wrong with the patch but them's the rules..
I am using my "David's mentor" hat to revert this as he's offline for a while.
|
#
109877 |
|
26-Jan-2003 |
davidxu |
Move UPCALL related data structure out of kse, introduce a new data structure called kse_upcall to manage UPCALL. All KSE binding and loaning code are gone.
A thread owns an upcall can collect all completed syscall contexts in its ksegrp, turn itself into UPCALL mode, and takes those contexts back to userland. Any thread without upcall structure has to export their contexts and exit at user boundary.
Any thread running in user mode owns an upcall structure, when it enters kernel, if the kse mailbox's current thread pointer is not NULL, then when the thread is blocked in kernel, a new UPCALL thread is created and the upcall structure is transfered to the new UPCALL thread. if the kse mailbox's current thread pointer is NULL, then when a thread is blocked in kernel, no UPCALL thread will be created.
Each upcall always has an owner thread. Userland can remove an upcall by calling kse_exit, when all upcalls in ksegrp are removed, the group is atomatically shutdown. An upcall owner thread also exits when process is in exiting state. when an owner thread exits, the upcall it owns is also removed.
KSE is a pure scheduler entity. it represents a virtual cpu. when a thread is running, it always has a KSE associated with it. scheduler is free to assign a KSE to thread according thread priority, if thread priority is changed, KSE can be moved from one thread to another.
When a ksegrp is created, there is always N KSEs created in the group. the N is the number of physical cpu in the current system. This makes it is possible that even an userland UTS is single CPU safe, threads in kernel still can execute on different cpu in parallel. Userland calls kse_create to add more upcall structures into ksegrp to increase concurrent in userland itself, kernel is not restricted by number of upcalls userland provides.
The code hasn't been tested under SMP by author due to lack of hardware.
Reviewed by: julian
|
#
109623 |
|
21-Jan-2003 |
alfred |
Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0. Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.
|
#
109606 |
|
21-Jan-2003 |
rwatson |
Perform VOP_GETATTR() before mac_check_vnode_exec() so that the cached attributes are available to MAC modules.
Submitted by: mike halderman <mrh@nosc.mil> Obtained from: TrustedBSD Project
|
#
109205 |
|
13-Jan-2003 |
dillon |
It is possible for an active aio to prevent shared memory from being dereferenced when a process exits due to the vmspace ref-count being bumped. Change shmexit() and shmexit_myhook() to take a vmspace instead of a process and call it in vmspace_dofree(). This way if it is missed in exit1()'s early-resource-free it will still be caught when the zombie is reaped.
Also fix a potential race in shmexit_myhook() by NULLing out vmspace->vm_shm prior to calling shm_delete_mapping() and free().
MFC after: 7 days
|
#
108869 |
|
07-Jan-2003 |
davidxu |
Clear some KSE fields after kse mode was turned off.
|
#
108645 |
|
04-Jan-2003 |
jake |
Add a sysctl to get the vm protections for the stack of the current process. On architectures with a non-executable stack, eg sparc64, this is used by libgcc to determine at runtime if its necessary to enable execute permissions on a region of the stack which will be used to execute code, allowing the call to mprotect to be avoided if the kernel is configured to map the stack executable.
|
#
108522 |
|
31-Dec-2002 |
alfred |
fdcopy() only needs a filedesc pointer.
|
#
108053 |
|
18-Dec-2002 |
alc |
Hold the page queues lock when performing vm_page_busy().
|
#
107850 |
|
14-Dec-2002 |
alfred |
remove syscallarg().
Suggested by: peter
|
#
107274 |
|
26-Nov-2002 |
robert |
To avoid sleeping with all sorts of resources acquired (the reported problem was a locked directory vnode), do not give the process a chance to sleep in state "stopevent" (depends on the S_EXEC bit being set in p_stops) until most resources have been released again.
Approved by: re
|
#
107216 |
|
25-Nov-2002 |
alc |
Acquire and release the page queues lock around pmap_remove_pages() because it updates several of vm_page's fields.
|
#
107001 |
|
17-Nov-2002 |
jeff |
- Release the imgp vnode prior to freeing exec_map resources to avoid deadlock.
|
#
106981 |
|
16-Nov-2002 |
alc |
Now that pmap_remove_all() is exported by our pmap implementations use it directly.
|
#
106720 |
|
10-Nov-2002 |
alc |
When prot is VM_PROT_NONE, call pmap_page_protect() directly rather than indirectly through vm_page_protect(). The one remaining page flag that is updated by vm_page_protect() is already being updated by our various pmap implementations.
Note: A later commit will similarly change the VM_PROT_READ case and eliminate vm_page_protect().
|
#
106470 |
|
05-Nov-2002 |
rwatson |
Correct merge-o: disable the right execve() variation if !MAC
|
#
106468 |
|
05-Nov-2002 |
rwatson |
Bring in two sets of changes:
(1) Permit userland applications to request a change of label atomic with an execve() via mac_execve(). This is required for the SEBSD port of SELinux/FLASK. Attempts to invoke this without MAC compiled in result in ENOSYS, as with all other MAC system calls. Complexity, if desired, is present in policy modules, rather than the framework.
(2) Permit policies to have access to both the label of the vnode being executed as well as the interpreter if it's a shell script or related UNIX nonsense. Because we can't hold both vnode locks at the same time, cache the interpreter label. SEBSD relies on this because it supports secure transitioning via shell script executables. Other policies might want to take both labels into account during an integrity or confidentiality decision at execve()-time.
Approved by: re Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories
|
#
106459 |
|
05-Nov-2002 |
rwatson |
Hook up the mac_will_execve_transition() and mac_execve_transition() entrypoints, #ifdef MAC. The supporting logic already existed in kern_mac.c, so no change there. This permits MAC policies to cause a process label change as the result of executing a binary -- typically, as a result of executing a specially labeled binary.
For example, the SEBSD port of SELinux/FLASK uses this functionality to implement TE type transitions on processes using transitioning binaries, in a manner similar to setuid. Policies not implementing a notion of transition (all the ones in the tree right now) require no changes, since the old label data is copied to the new label via mac_create_cred() even if a transition does occur.
Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories
|
#
106437 |
|
04-Nov-2002 |
rwatson |
Remove reference to struct execve_args from struct imgact, which describes an image activation instance. Instead, make use of the existing fname structure entry, and introduce two new entries, userspace_argv, and userspace_envv. With the addition of mac_execve(), this divorces the image structure from the specifics of the execve() system call, removes a redundant pointer, etc. No semantic change from current behavior, but it means that the structure doesn't depend on syscalls.master-generated includes.
There seems to be some redundant initialization of imgact entries, which I have maintained, but which could probably use some cleaning up at some point.
Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories
|
#
104937 |
|
11-Oct-2002 |
jhb |
- Move the 'done1' label down below the unlock of the proc lock and move the locking of the proc lock after the goto to done1 to avoid locking the lock in an error case just so we can turn around and unlock it. - Move the exec_setregs() stuff out from under the proc lock and after the p_args stuff. This allows exec_setregs() to be able to sleep or write things out to userland, etc. which ia64 does.
Tested by: peter
|
#
103767 |
|
21-Sep-2002 |
jake |
Use the fields in the sysentvec and in the vm map header in place of the constants VM_MIN_ADDRESS, VM_MAXUSER_ADDRESS, USRSTACK and PS_STRINGS. This is mainly so that they can be variable even for the native abi, based on different machine types. Get stack protections from the sysentvec too. This makes it trivial to map the stack non-executable for certain abis, on machines that support it.
|
#
103326 |
|
14-Sep-2002 |
njl |
Move setugidsafety() call outside of process lock. This prevents a lock recursion when closef() calls pfind() which also wants the proc lock. This case only occurred when setugidsafety() needed to close unsafe files.
Reviewed by: truckman
|
#
103277 |
|
13-Sep-2002 |
truckman |
Drop the proc lock while calling fdcheckstd() which may block to allocate memory.
Reviewed by: jhb
|
#
102950 |
|
05-Sep-2002 |
davidxu |
s/SGNL/SIG/ s/SNGL/SINGLE/ s/SNGLE/SINGLE/
Fix abbreviation for P_STOPPED_* etc flags, in original code they were inconsistent and difficult to distinguish between them.
Approved by: julian (mentor)
|
#
102808 |
|
01-Sep-2002 |
jake |
Added fields for VM_MIN_ADDRESS, PS_STRINGS and stack protections to sysentvec. Initialized all fields of all sysentvecs, which will allow them to be used instead of constants in more places. Provided stack fixup routines for emulations that previously used the default.
|
#
102561 |
|
29-Aug-2002 |
jake |
Renamed poorly named setregs to exec_setregs. Moved its prototype to imgact.h with the other exec support functions.
|
#
102548 |
|
28-Aug-2002 |
jake |
Don't require that sysentvec.sv_szsigcode be non-NULL.
|
#
102424 |
|
25-Aug-2002 |
jake |
Fixed most indentation bugs.
|
#
102423 |
|
25-Aug-2002 |
jake |
Fixed placement of operators. Wrapped long lines.
|
#
102381 |
|
24-Aug-2002 |
jake |
Fixed white space around operators, casts and reserved words.
Reviewed by: md5
|
#
102377 |
|
24-Aug-2002 |
jake |
return x; -> return (x); return(x); -> return (x);
Reviewed by: md5
|
#
102292 |
|
22-Aug-2002 |
julian |
slight cleanup of single-threading code for KSE processes
|
#
101771 |
|
13-Aug-2002 |
jeff |
- Hold the vnode lock throughout execve. - Set VV_TEXT in the top level execve code. - Fixup the image activators to deal with the newly locked vnode.
|
#
101158 |
|
01-Aug-2002 |
rwatson |
Introduce support for Mandatory Access Control and extensible kernel access control.
Invoke an appropriate MAC entry point to authorize execution of a file by a process. The check is placed slightly differently than it appears in the trustedbsd_mac tree so that it prevents a little more information leakage about the target of the execve() operation.
Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
|
#
100950 |
|
30-Jul-2002 |
nectar |
For processes which are set-user-ID or set-group-ID, the kernel performs a few special actions for safety. One of these is to make sure that file descriptors 0..2 are in use, by opening /dev/null for those that are not already open. Another is to close any file descriptors 0..2 that reference procfs. However, these checks were made out of order, so that it was still possible for a set-user-ID or set-group-ID process to be started with some of the file descriptors 0..2 unused.
Submitted by: Georgi Guninski <guninski@guninski.com>
|
#
100759 |
|
27-Jul-2002 |
rwatson |
Slight restructuring of the logic for credential change case identification during execve() to use a 'credential_changing' variable. This makes it easier to have outstanding patchsets against this code, as well as to add conditionally defined clauses.
Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
|
#
100384 |
|
20-Jul-2002 |
peter |
Infrastructure tweaks to allow having both an Elf32 and an Elf64 executable handler in the kernel at the same time. Also, allow for the exec_new_vmspace() code to build a different sized vmspace depending on the executable environment. This is a big help for execing i386 binaries on ia64. The ELF exec code grows the ability to map partial pages when there is a page size difference, eg: emulating 4K pages on 8K or 16K hardware pages.
Flesh out the i386 emulation support for ia64. At this point, the only binary that I know of that fails is cvsup, because the cvsup runtime tries to execute code in pages not marked executable.
Obtained from: dfr (mostly, many tweaks from me).
|
#
99981 |
|
14-Jul-2002 |
alc |
In execve(), delay the acquisition of Giant until after kmem_alloc_wait(). (Operations on the exec_map don't require Giant.)
|
#
99895 |
|
13-Jul-2002 |
jhb |
We don't need to clear oldcred here since newcred is not NULL yet.
|
#
99805 |
|
11-Jul-2002 |
alc |
o Lock accesses to the page queues.
|
#
99487 |
|
06-Jul-2002 |
jeff |
Clean up execve locking:
- Grab the vnode object early in exec when we still have the vnode lock. - Cache the object in the image_params. - Make use of the cached object in imgact_*.c
|
#
99232 |
|
01-Jul-2002 |
peter |
#include <sys/ktrace.h> would be useful too. (for ktrace_mtx)
|
#
99226 |
|
01-Jul-2002 |
peter |
Add #include "opt_ktrace.h"
|
#
99072 |
|
29-Jun-2002 |
julian |
Part 1 of KSE-III
The ability to schedule multiple threads per process (one one cpu) by making ALL system calls optionally asynchronous. to come: ia64 and power-pc patches, patches for gdb, test program (in tools)
Reviewed by: Almost everyone who counts (at various times, peter, jhb, matt, alfred, mini, bernd, and a cast of thousands)
NOTE: this is still Beta code, and contains lots of debugging stuff. expect slight instability in signals..
|
#
99009 |
|
28-Jun-2002 |
alfred |
More caddr_t removal, make fo_ioctl take a void * instead of a caddr_t.
|
#
98818 |
|
25-Jun-2002 |
alc |
o Eliminate vmspace::vm_minsaddr. It's initialized but never used. o Replace stale comments in vmspace by "const until freed" annotations on some fields.
|
#
98497 |
|
20-Jun-2002 |
alfred |
Don't leak resources if fdcheckstd() fails during exec.
Submitted by: Mike Makonnen <makonnen@pacbell.net>
|
#
98417 |
|
19-Jun-2002 |
alfred |
Squish the "could sleep with process lock" messages caused by calling uifind() with a proc lock held.
change_ruid() and change_euid() have been modified to take a uidinfo structure which will be pre-allocated by callers, they will then call uihold() on the uidinfo structure so that the caller's logic is simplified.
This allows one to call uifind() before locking the proc struct and thereby avoid a potential blocking allocation with the proc lock held.
This may need revisiting, perhaps keeping a spare uidinfo allocated per process to handle this situation or re-examining if the proc lock needs to be held over the entire operation of changing real or effective user id.
Submitted by: Don Lewis <dl-freebsd@catspoiler.org>
|
#
97997 |
|
07-Jun-2002 |
jhb |
Properly lock accesses to p_tracep and p_traceflag. Also make a few ktrace-only things #ifdef KTRACE that were not before.
|
#
95936 |
|
02-May-2002 |
jhb |
- Reorder execve() so that it performs blocking operations before it locks the process. - Defer other blocking operations such as vrele()'s until after we release locks. - execsigs() now requires the proc lock to be held when it is called rather than locking the process internally.
|
#
95017 |
|
18-Apr-2002 |
nectar |
When exec'ing a set[ug]id program, make sure that the stdio file descriptors (0, 1, 2) are allocated by opening /dev/null for any which are not already open.
Reviewed by: alfred, phk MFC after: 2 days
|
#
93850 |
|
04-Apr-2002 |
peter |
Increase the size of the register stack storage on ia64 from 32K to 2MB so that we can compile gcc. This is a hack because it adds a fixed 2MB to each process's VSIZE regardless of how much is really being used since there is no grow-up stack support. At least it isn't physical memory. Sigh.
Add a sysctl to enable tweaking it for new processes.
|
#
93593 |
|
01-Apr-2002 |
jhb |
Change the suser() API to take advantage of td_ucred as well as do a general cleanup of the API. The entire API now consists of two functions similar to the pre-KSE API. The suser() function takes a thread pointer as its only argument. The td_ucred member of this thread must be valid so the only valid thread pointers are curthread and a few kernel threads such as thread0. The suser_cred() function takes a pointer to a struct ucred as its first argument and an integer flag as its second argument. The flag is currently only used for the PRISON_ROOT flag.
Discussed on: smp@
|
#
93460 |
|
30-Mar-2002 |
alc |
Add a local proc *p in exec_new_vmspace() to avoid repeated dereferencing to obtain it.
|
#
93295 |
|
27-Mar-2002 |
alfred |
Make the reference counting of 'struct pargs' SMP safe.
There is still some locations where the PROC lock should be held in order to prevent inconsistent views from outside (like the proc->p_fd fix for kern/vfs_syscalls.c:checkdirs()) that can be fixed later.
Submitted by: Jonathan Mini <mini@haikugeek.com>
|
#
93240 |
|
26-Mar-2002 |
alc |
Remove an unnecessary and inconsistently used variable from exec_new_vmspace().
|
#
92723 |
|
19-Mar-2002 |
alfred |
Remove __P.
|
#
92461 |
|
16-Mar-2002 |
jake |
Convert all pmap_kenter/pmap_kremove pairs in MI code to use pmap_qenter/ pmap_qremove. pmap_kenter is not safe to use in MI code because it is not guaranteed to flush the mapping from the tlb on all cpus. If the process in question is preempted and migrates cpus between the call to pmap_kenter and pmap_kremove, the original cpu will be left with stale mappings in its tlb. This is currently not a problem for i386 because we do not use PG_G on SMP, and thus all mappings are flushed from the tlb on context switches, not just user mappings. This is not the case on all architectures, and if PG_G is to be used with SMP on i386 it will be a problem. This was committed by peter earlier as part of his fine grained tlb shootdown work for i386, which was backed out for other reasons.
Reviewed by: peter
|
#
91424 |
|
27-Feb-2002 |
imp |
Remove now unused struct proc *p.
Approved by: jhb
|
#
91406 |
|
27-Feb-2002 |
jhb |
Simple p_ucred -> td_ucred changes to start using the per-thread ucred reference.
|
#
91367 |
|
27-Feb-2002 |
peter |
Back out all the pmap related stuff I've touched over the last few days. There is some unresolved badness that has been eluding me, particularly affecting uniprocessor kernels. Turning off PG_G helped (which is a bad sign) but didn't solve it entirely. Userland programs still crashed.
|
#
91344 |
|
27-Feb-2002 |
peter |
Jake further reduced IPI shootdowns on sparc64 in loops by using ranged shootdowns in a couple of key places. Do the same for i386. This also hides some physical addresses from higher levels and has it use the generic vm_page_t's instead. This will help for PAE down the road.
Obtained from: jake (MI code, suggestions for MD part)
|
#
90361 |
|
07-Feb-2002 |
julian |
Pre-KSE/M3 commit. this is a low-functionality change that changes the kernel to access the main thread of a process via the linked list of threads rather than assuming that it is embedded in the process. It IS still embeded there but remove all teh code that assumes that in preparation for the next commit which will actually move it out.
Reviewed by: peter@freebsd.org, gallatin@cs.duke.edu, benno rice,
|
#
89314 |
|
13-Jan-2002 |
alc |
o Call the functions registered with at_exec() from exec_new_vmspace() instead of execve(). Otherwise, the possibility still exists for a pending AIO to modify the new address space.
Reviewed by: alfred
|
#
89306 |
|
13-Jan-2002 |
alfred |
SMP Lock struct file, filedesc and the global file list.
Seigo Tanimura (tanimura) posted the initial delta.
I've polished it quite a bit reducing the need for locking and adapting it for KSE.
Locks:
1 mutex in each filedesc protects all the fields. protects "struct file" initialization, while a struct file is being changed from &badfileops -> &pipeops or something the filedesc should be locked.
1 mutex in each struct file protects the refcount fields. doesn't protect anything else. the flags used for garbage collection have been moved to f_gcflag which was the FILLER short, this doesn't need locking because the garbage collection is a single threaded container. could likely be made to use a pool mutex.
1 sx lock for the global filelist.
struct file * fhold(struct file *fp); /* increments reference count on a file */
struct file * fhold_locked(struct file *fp); /* like fhold but expects file to locked */
struct file * ffind_hold(struct thread *, int fd); /* finds the struct file in thread, adds one reference and returns it unlocked */
struct file * ffind_lock(struct thread *, int fd); /* ffind_hold, but returns file locked */
I still have to smp-safe the fget cruft, I'll get to that asap.
|
#
88633 |
|
29-Dec-2001 |
alfred |
Make AIO a loadable module.
Remove the explicit call to aio_proc_rundown() from exit1(), instead AIO will use at_exit(9).
Add functions at_exec(9), rm_at_exec(9) which function nearly the same as at_exec(9) and rm_at_exec(9), these functions are called on behalf of modules at the time of execve(2) after the image activator has run.
Use a modified version of tegge's suggestion via at_exec(9) to close an exploitable race in AIO.
Fix SYSCALL_MODULE_HELPER such that it's archetecuterally neutral, the problem was that one had to pass it a paramater indicating the number of arguments which were actually the number of "int". Fix it by using an inline version of the AS macro against the syscall arguments. (AS should be available globally but we'll get to that later.)
Add a primative system for dynamically adding kqueue ops, it's really not as sophisticated as it should be, but I'll discuss with jlemon when he's around.
|
#
87593 |
|
10-Dec-2001 |
obrien |
Repeat after me -- "Use of ANSI string concatenation can be bad." In this case, C99's __func__ is properly defined as:
static const char __func__[] = "function-name";
and GCC 3.1 will not allow it to be used in bogus string concatenation.
|
#
86180 |
|
07-Nov-2001 |
peter |
For what its worth, sync up the type of ps_arg_cache_max (unsigned long) with the sysctl type (signed long).
|
#
85598 |
|
27-Oct-2001 |
des |
Add a P_INEXEC flag that indicates that the process has called execve() and it has not yet returned. Use this flag to deny debugging requests while the process is execve()ing, and close once and for all any race conditions that might occur between execve() and various debugging interfaces.
Reviewed by: jhb, rwatson
|
#
85410 |
|
24-Oct-2001 |
robert |
Use vm_offset_t instead of caddr_t to fix a warning and remove two casts.
|
#
85397 |
|
23-Oct-2001 |
dillon |
Fix ktrace enablement/disablement races that can result in a vnode ref count panic.
Bug noticed by: ps Reviewed by: ps MFC after: 1 day
|
#
84783 |
|
10-Oct-2001 |
ps |
Make MAXTSIZ, DFLDSIZ, MAXDSIZ, DFLSSIZ, MAXSSIZ, SGROWSIZ loader tunable.
Reviewed by: peter MFC after: 2 weeks
|
#
84778 |
|
10-Oct-2001 |
dfr |
Move setregs() out from under the PROC_LOCK so that it can use functions list suword() which may trap.
|
#
84729 |
|
09-Oct-2001 |
jhb |
proces -> process in a comment.
|
#
83366 |
|
12-Sep-2001 |
julian |
KSE Milestone 2 Note ALL MODULES MUST BE RECOMPILED make the kernel aware that there are smaller units of scheduling than the process. (but only allow one thread per process at this time). This is functionally equivalent to teh previousl -current except that there is a thread associated with each process.
Sorry john! (your next MFC will be a doosie!)
Reviewed by: peter@freebsd.org, dillon@freebsd.org
X-MFC after: ha ha ha ha
|
#
82710 |
|
01-Sep-2001 |
dillon |
Pushdown Giant for acct(), kqueue(), kevent(), execve(), fork(), vfork(), rfork(), jail().
|
#
82147 |
|
22-Aug-2001 |
alex |
Fix a simple typo I just happened to find.
|
#
79569 |
|
11-Jul-2001 |
dd |
Correct spelling in a comment and remove trailing newline from a panic() call (panic() adds it itself).
|
#
79480 |
|
09-Jul-2001 |
guido |
Don't share sig handlers after an exec
Reviewed by: Alfred Perlstein
|
#
79224 |
|
04-Jul-2001 |
dillon |
With Alfred's permission, remove vm_mtx in favor of a fine-grained approach (this commit is just the first stage). Also add various GIANT_ macros to formalize the removal of Giant, making it easy to test in a more piecemeal fashion. These macros will allow us to test fine-grained locks to a degree before removing Giant, and also after, and to remove Giant in a piecemeal fashion via sysctl's on those subsystems which the authors believe can operate without Giant.
|
#
78519 |
|
20-Jun-2001 |
jhb |
Fix some lock order reversals where we called free() while holding a proc lock. We now use temporary variables to save the process argument pointer and just update the pointer while holding the lock. We then perform the free on the cached pointer after releasing the lock.
|
#
78371 |
|
16-Jun-2001 |
peter |
Move setugid() a little sooner to before we release tracing in case crdup() or change_e*id() block on malloc() or mutex.
|
#
77242 |
|
26-May-2001 |
rwatson |
o pcred-removal changes included modifications to optimize the setting of the saved uid and gid during execve(). Unfortunately, the optimizations were incorrect in the case where the credential was updated, skipping the setting of the saved uid and gid when new credentials were generated. This change corrects that problem by handling the newcred!=NULL case correctly.
Reported/tested by: David Malone <dwmalone@maths.tcd.ie>
Obtained from: TrustedBSD Project
|
#
77183 |
|
25-May-2001 |
rwatson |
o Merge contents of struct pcred into struct ucred. Specifically, add the real uid, saved uid, real gid, and saved gid to ucred, as well as the pcred->pc_uidinfo, which was associated with the real uid, only rename it to cr_ruidinfo so as not to conflict with cr_uidinfo, which corresponds to the effective uid. o Remove p_cred from struct proc; add p_ucred to struct proc, replacing original macro that pointed. p->p_ucred to p->p_cred->pc_ucred. o Universally update code so that it makes use of ucred instead of pcred, p->p_ucred instead of p->p_pcred, cr_ruidinfo instead of p_uidinfo, cr_{r,sv}{u,g}id instead of p_*, etc. o Remove pcred0 and its initialization from init_main.c; initialize cr_ruidinfo there. o Restruction many credential modification chunks to always crdup while we figure out locking and optimizations; generally speaking, this means moving to a structure like this: newcred = crdup(oldcred); ... p->p_ucred = newcred; crfree(oldcred); It's not race-free, but better than nothing. There are also races in sys_process.c, all inter-process authorization, fork, exec, and exit. o Remove sigio->sio_ruid since sigio->sio_ucred now contains the ruid; remove comments indicating that the old arrangement was a problem. o Restructure exec1() a little to use newcred/oldcred arrangement, and use improved uid management primitives. o Clean up exit1() so as to do less work in credential cleanup due to pcred removal. o Clean up fork1() so as to do less work in credential cleanup and allocation. o Clean up ktrcanset() to take into account changes, and move to using suser_xxx() instead of performing a direct uid==0 comparision. o Improve commenting in various kern_prot.c credential modification calls to better document current behavior. In a couple of places, current behavior is a little questionable and we need to check POSIX.1 to make sure it's "right". More commenting work still remains to be done. o Update credential management calls, such as crfree(), to take into account new ruidinfo reference. o Modify or add the following uid and gid helper routines: change_euid() change_egid() change_ruid() change_rgid() change_svuid() change_svgid() In each case, the call now acts on a credential not a process, and as such no longer requires more complicated process locking/etc. They now assume the caller will do any necessary allocation of an exclusive credential reference. Each is commented to document its reference requirements. o CANSIGIO() is simplified to require only credentials, not processes and pcreds. o Remove lots of (p_pcred==NULL) checks. o Add an XXX to authorization code in nfs_lock.c, since it's questionable, and needs to be considered carefully. o Simplify posix4 authorization code to require only credentials, not processes and pcreds. Note that this authorization, as well as CANSIGIO(), needs to be updated to use the p_cansignal() and p_cansched() centralized authorization routines, as they currently do not take into account some desirable restrictions that are handled by the centralized routines, as well as being inconsistent with other similar authorization instances. o Update libkvm to take these changes into account.
Obtained from: TrustedBSD Project Reviewed by: green, bde, jhb, freebsd-arch, freebsd-audit
|
#
76939 |
|
21-May-2001 |
jhb |
Axe unneeded spl()'s.
|
#
76827 |
|
18-May-2001 |
alfred |
Introduce a global lock for the vm subsystem (vm_mtx).
vm_mtx does not recurse and is required for most low level vm operations.
faults can not be taken without holding Giant.
Memory subsystems can now call the base page allocators safely.
Almost all atomic ops were removed as they are covered under the vm mutex.
Alpha and ia64 now need to catch up to i386's trap handlers.
FFS and NFS have been tested, other filesystems will need minor changes (grabbing the vm lock when twiddling page properties).
Reviewed (partially) by: jake, jhb
|
#
76166 |
|
01-May-2001 |
markm |
Undo part of the tangle of having sys/lock.h and sys/mutex.h included in other "system" header files.
Also help the deprecation of lockmgr.h by making it a sub-include of sys/lock.h and removing sys/lockmgr.h form kernel .c files.
Sort sys/*.h includes where possible in affected files.
OK'ed by: bde (with reservations)
|
#
76117 |
|
29-Apr-2001 |
grog |
Revert consequences of changes to mount.h, part 2.
Requested by: bde
|
#
75858 |
|
23-Apr-2001 |
grog |
Correct #includes to work with fixed sys/mount.h.
|
#
73926 |
|
07-Mar-2001 |
jhb |
Proc locking.
|
#
72091 |
|
06-Feb-2001 |
asmodai |
Fix typo: seperate -> separate.
Seperate does not exist in the english language.
|
#
70317 |
|
23-Dec-2000 |
jake |
Protect proc.p_pptr and proc.p_children/p_sibling with the proctree_lock.
linprocfs not locked pending response from informal maintainer.
Reviewed by: jhb, -smp@
|
#
69411 |
|
30-Nov-2000 |
rwatson |
o Add a comment to exec_check_permissions() to indicate that the passed vnode must be locked; this is the case because of calls to VOP_GETATTR(), VOP_ACCESS(), and VOP_OPEN(). This becomes more of an issue when VOP_ACCESS() gets a bit more complicated, which it does when you introduce ACL, Capability, and MAC support.
Obtained from: TrustedBSD Project
|
#
67365 |
|
20-Oct-2000 |
jhb |
Catch up to moving headers: - machine/ipl.h -> sys/ipl.h - machine/mutex.h -> sys/mutex.h
|
#
67015 |
|
12-Oct-2000 |
dfr |
Add a gross hack for ia64 to allocate the backing store for a new program.
|
#
66379 |
|
26-Sep-2000 |
takawata |
Make size of dynamic loader argument variable to support various executable file format.
Reviewed by: peter
|
#
66162 |
|
21-Sep-2000 |
truckman |
Remove unneeded #include that was a remnant of an earlier version of my uidinfo patch.
Found by: phk
|
#
65980 |
|
17-Sep-2000 |
bde |
Added used include of <sys/mutex.h> (don't depend on pollution in <sys/signalvar.h>).
|
#
65770 |
|
12-Sep-2000 |
bp |
Add three new VOPs: VOP_CREATEVOBJECT, VOP_DESTROYVOBJECT and VOP_GETVOBJECT. They will be used by nullfs and other stacked filesystems to support full cache coherency.
Reviewed in general by: mckusick, dillon
|
#
65495 |
|
05-Sep-2000 |
truckman |
Remove uidinfo hash table lookup and maintenance out of chgproccnt() and chgsbsize(), which are called rather frequently and may be called from an interrupt context in the case of chgsbsize(). Instead, do the hash table lookup and maintenance when credentials are changed, which is a lot less frequent. Add pointers to the uidinfo structures to the ucred and pcred structures for fast access. Pass a pointer to the credential to chgproccnt() and chgsbsize() instead of passing the uid. Add a reference count to the uidinfo structure and use it to decide when to free the structure rather than freeing the structure when the resource consumption drops to zero. Move the resource tracking code from kern_proc.c to kern_resource.c. Move some duplicate code sequences in kern_prot.c to separate helper functions. Change KASSERTs in this code to unconditional tests and calls to panic().
|
#
62622 |
|
05-Jul-2000 |
jhb |
Support for unsigned integer and long sysctl variables. Update the SYSCTL_LONG macro to be consistent with other integer sysctl variables and require an initial value instead of assuming 0. Update several sysctl variables to use the unsigned types.
PR: 15251 Submitted by: Kelly Yancey <kbyanc@posi.net>
|
#
59794 |
|
30-Apr-2000 |
phk |
Remove unneeded #include <vm/vm_zone.h>
Generated by: src/tools/tools/kerninclude
|
#
59663 |
|
26-Apr-2000 |
dillon |
Fix #! script exec under linux emulation. If a script is exec'd from a program running under linux emulation, the script binary is checked for in /compat/linux first. Without this patch the wrong script binary (i.e. the FreeBSD binary) will be run instead of the linux binary. For example, #!/bin/sh, thus breaking out of linux compatibility mode.
This solves a number of problems people have had installing linux software on FreeBSD boxes.
|
#
59368 |
|
18-Apr-2000 |
phk |
Remove unneeded <sys/buf.h> includes.
Due to some interesting cpp tricks in lockmgr, the LINT kernel shrinks by 924 bytes.
|
#
59288 |
|
16-Apr-2000 |
jlemon |
Introduce kqueue() and kevent(), a kernel event notification facility.
|
#
56313 |
|
20-Jan-2000 |
imp |
When we are execing a setugid program, and we have a procfs filesystem file open in one of the special file descriptors (0, 1, or 2), close it before completing the exec.
Submitted by: nergal@idea.avet.com.pl Constructive comments: deraadt@openbsd.org, sef, peter, jkh
|
#
55141 |
|
27-Dec-1999 |
bde |
Changed the type used to represent the user stack pointer from `long *' to `register_t *'. This fixes bugs like misplacement of argc and argv on the user stack on i386's with 64-bit longs. We still use longs to represent "words" like argc and argv, and assume that they are on the stack (and that there is stack). The suword() and fuword() families should also use register_t.
|
#
54655 |
|
15-Dec-1999 |
eivind |
Introduce NDFREE (and remove VOP_ABORTOP)
|
#
53709 |
|
26-Nov-1999 |
phk |
Add a sysctl to control if argv is disclosed to the world: kern.ps_argsopen It defaults to 1 which means that all users can see all argvs in ps(1).
Reviewed by: Warner
|
#
53239 |
|
16-Nov-1999 |
phk |
Introduce commandline caching in the kernel.
This fixes some nasty procfs problems for SMP, makes ps(1) run much faster, and makes ps(1) even less dependent on /proc which will aid chroot and jails alike.
To disable this facility and revert to previous behaviour: sysctl -w kern.ps_arg_cache_limit=0
For full details see the current@FreeBSD.org mail-archives.
|
#
52635 |
|
29-Oct-1999 |
phk |
useracc() the prequel:
Merge the contents (less some trivial bordering the silly comments) of <vm/vm_prot.h> and <vm/vm_inherit.h> into <vm/vm.h>. This puts the #defines for the vm_inherit_t and vm_prot_t types next to their typedefs.
This paves the road for the commit to follow shortly: change useracc() to use VM_PROT_{READ|WRITE} rather than B_{READ|WRITE} as argument.
|
#
50477 |
|
27-Aug-1999 |
peter |
$Id$ -> $FreeBSD$
|
#
49636 |
|
11-Aug-1999 |
imp |
Stop profiling on exec.
Obtained from: NetBSD
|
#
46112 |
|
27-Apr-1999 |
phk |
Suser() simplification:
1: s/suser/suser_xxx/
2: Add new function: suser(struct proc *), prototyped in <sys/proc.h>.
3: s/suser_xxx(\([a-zA-Z0-9_]*\)->p_ucred, \&\1->p_acflag)/suser(\1)/
The remaining suser_xxx() calls will be scrutinized and dealt with later.
There may be some unneeded #include <sys/cred.h>, but they are left as an exercise for Bruce.
More changes to the suser() API will come along with the "jail" code.
|
#
45821 |
|
19-Apr-1999 |
peter |
unifdef -DVM_STACK - it's been on for a while for x86 and was checked and appeared to be working for the Alpha some time ago.
|
#
45270 |
|
03-Apr-1999 |
jdp |
Restore support for executing BSD/OS binaries on the i386 by passing the address of the ps_strings structure to the process via %ebx. For other kinds of binaries, %ebx is still zeroed as before.
Submitted by: Thomas Stephens <tas@stephens.org> Reviewed by: jdp
|
#
44146 |
|
19-Feb-1999 |
luoqi |
Hide access to vmspace:vm_pmap with inline function vmspace_pmap(). This is the preparation step for moving pmap storage out of vmspace proper.
Reviewed by: Alan Cox <alc@cs.rice.edu> Matthew Dillion <dillon@apollo.backplane.com>
|
#
43311 |
|
27-Jan-1999 |
dillon |
Fix warnings in preparation for adding -Wall -Wcast-qual to the kernel compile
|
#
43301 |
|
27-Jan-1999 |
dillon |
Fix warnings in preparation for adding -Wall -Wcast-qual to the kernel compile
|
#
42360 |
|
06-Jan-1999 |
julian |
Add (but don't activate) code for a special VM option to make downward growing stacks more general. Add (but don't activate) code to use the new stack facility when running threads, (specifically the linux threads support). This allows people to use both linux compiled linuxthreads, and also the native FreeBSD linux-threads port.
The code is conditional on VM_STACK. Not using this will produce the old heavily tested system.
Submitted by: Richard Seaman <dick@tar.com>
|
#
42175 |
|
30-Dec-1998 |
dfr |
Various changes to support OSF1 emulation:
* Move the user stack from VM_MAXUSER_ADDRESS to a place below the 32bit boundary (needed to support 32bit OSF programs). This should also save one pagetable per process. * Add cvtqlsv to the set of instructions handled by the floating point software completion code. * Disable all floating point exceptions by default. * A minor change to execve to allow the OSF1 image activator to support dynamic loading.
|
#
42095 |
|
27-Dec-1998 |
dfr |
Fix some 64bit truncation problems which crept into SYSCTL_LONG() with the last cleanup. Since the oid_arg2 field of struct sysctl_oid is not wide enough to hold a long, the SYSCTL_LONG() macro has been modified to only support exporting long variables by pointer instead of by value.
Reviewed by: bde
|
#
41871 |
|
16-Dec-1998 |
bde |
Removed the cast to a pointer in the definition of PS_STRINGS and adjusted related casts to match (only in the kernel in this commit). The pointer was only wanted in one place in kern_exec.c. Applications should use the kern.ps_strings sysctl instead of PS_STRINGS, so they shouldn't notice this change.
|
#
41870 |
|
16-Dec-1998 |
bde |
Removed all traces of SYSCTL_INTPTR(). Pointers can't really be passed across the kernel -> application interface, and for the one sysctl where they were passed and actually used (kern.ps_strings), the applications want addresses represented as u_longs anyway (the other sysctl that passed them, kern.usrstack, has never been used).
Agreed to by: dfr, phk
|
#
40700 |
|
28-Oct-1998 |
dg |
Added a second argument, "activate" to the vm_page_unwire() call so that the caller can select either inactive or active queue to put the page on.
|
#
40435 |
|
16-Oct-1998 |
peter |
*gulp*. Jordan specifically OK'ed this..
This is the bulk of the support for doing kld modules. Two linker_sets were replaced by SYSINIT()'s. VFS's and exec handlers are self registered. kld is now a superset of lkm. I have converted most of them, they will follow as a seperate commit as samples. This all still works as a static a.out kernel using LKM's.
|
#
38799 |
|
04-Sep-1998 |
dfr |
Cosmetic changes to the PAGE_XXX macros to make them consistent with the other objects in vm.
|
#
38517 |
|
24-Aug-1998 |
dfr |
Change various syscalls to use size_t arguments instead of u_int.
Add some overflow checks to read/write (from bde).
Change all modifications to vm_page::flags, vm_page::busy, vm_object::flags and vm_object::paging_in_progress to use operations which are not interruptable.
Reviewed by: Bruce Evans <bde@zeta.org.au>
|
#
37662 |
|
15-Jul-1998 |
bde |
Cast between longs and pointers via intptr_t. The results of fuword() should be checked before casting. The results of suword() should be checked.
|
#
36735 |
|
07-Jun-1998 |
dfr |
This commit fixes various 64bit portability problems required for FreeBSD/alpha. The most significant item is to change the command argument to ioctl functions from int to u_long. This change brings us inline with various other BSD versions. Driver writers may like to use (__FreeBSD_version == 300003) to detect this change.
The prototype FreeBSD/alpha machdep will follow in a couple of days time.
|
#
35256 |
|
17-Apr-1998 |
des |
Seventy-odd "its" / "it's" typos in comments fixed as per kern/6108.
|
#
34234 |
|
08-Mar-1998 |
dyson |
Free the first page also if it is not valid.
|
#
34206 |
|
07-Mar-1998 |
dyson |
This mega-commit is meant to fix numerous interrelated problems. There has been some bitrot and incorrect assumptions in the vfs_bio code. These problems have manifest themselves worse on NFS type filesystems, but can still affect local filesystems under certain circumstances. Most of the problems have involved mmap consistancy, and as a side-effect broke the vfs.ioopt code. This code might have been committed seperately, but almost everything is interrelated.
1) Allow (pmap_object_init_pt) prefaulting of buffer-busy pages that are fully valid. 2) Rather than deactivating erroneously read initial (header) pages in kern_exec, we now free them. 3) Fix the rundown of non-VMIO buffers that are in an inconsistent (missing vp) state. 4) Fix the disassociation of pages from buffers in brelse. The previous code had rotted and was faulty in a couple of important circumstances. 5) Remove a gratuitious buffer wakeup in vfs_vmio_release. 6) Remove a crufty and currently unused cluster mechanism for VBLK files in vfs_bio_awrite. When the code is functional, I'll add back a cleaner version. 7) The page busy count wakeups assocated with the buffer cache usage were incorrectly cleaned up in a previous commit by me. Revert to the original, correct version, but with a cleaner implementation. 8) The cluster read code now tries to keep data associated with buffers more aggressively (without breaking the heuristics) when it is presumed that the read data (buffers) will be soon needed. 9) Change to filesystem lockmgr locks so that they use LK_NOPAUSE. The delay loop waiting is not useful for filesystem locks, due to the length of the time intervals. 10) Correct and clean-up spec_getpages. 11) Implement a fully functional nfs_getpages, nfs_putpages. 12) Fix nfs_write so that modifications are coherent with the NFS data on the server disk (at least as well as NFS seems to allow.) 13) Properly support MS_INVALIDATE on NFS. 14) Properly pass down MS_INVALIDATE to lower levels of the VM code from vm_map_clean. 15) Better support the notion of pages being busy but valid, so that fewer in-transit waits occur. (use p->busy more for pageouts instead of PG_BUSY.) Since the page is fully valid, it is still usable for reads. 16) It is possible (in error) for cached pages to be busy. Make the page allocation code handle that case correctly. (It should probably be a printf or panic, but I want the system to handle coding errors robustly. I'll probably add a printf.) 17) Correct the design and usage of vm_page_sleep. It didn't handle consistancy problems very well, so make the design a little less lofty. After vm_page_sleep, if it ever blocked, it is still important to relookup the page (if the object generation count changed), and verify it's status (always.) 18) In vm_pageout.c, vm_pageout_clean had rotted, so clean that up. 19) Push the page busy for writes and VM_PROT_READ into vm_pageout_flush. 20) Fix vm_pager_put_pages and it's descendents to support an int flag instead of a boolean, so that we can pass down the invalidate bit.
|
#
33983 |
|
02-Mar-1998 |
peter |
Update the ELF image activator to use some of the exec resources rather than rolling it's own. This means that it now uses the "safe" exec_map_first_page() to get the ld.so headers rather than risking a panic on a page fault failure (eg: NFS server goes down). Since all the ELF tools go to a lot of trouble to make sure everything lives in the first page for executables, this is a win. I have not seen any ELF executable on any system where all the headers didn't fit in the first page with lots of room to spare. I have been running variations of this code for some time on my pure ELF systems.
|
#
33834 |
|
25-Feb-1998 |
bde |
Removed unused #includes.
|
#
33134 |
|
06-Feb-1998 |
eivind |
Back out DIAGNOSTIC changes.
|
#
33109 |
|
05-Feb-1998 |
dyson |
1) Start using a cleaner and more consistant page allocator instead of the various ad-hoc schemes. 2) When bringing in UPAGES, the pmap code needs to do another vm_page_lookup. 3) When appropriate, set the PG_A or PG_M bits a-priori to both avoid some processor errata, and to minimize redundant processor updating of page tables. 4) Modify pmap_protect so that it can only remove permissions (as it originally supported.) The additional capability is not needed. 5) Streamline read-only to read-write page mappings. 6) For pmap_copy_page, don't enable write mapping for source page. 7) Correct and clean-up pmap_incore. 8) Cluster initial kern_exec pagin. 9) Removal of some minor lint from kern_malloc. 10) Correct some ioopt code. 11) Remove some dead code from the MI swapout routine. 12) Correct vm_object_deallocate (to remove backing_object ref.) 13) Fix dead object handling, that had problems under heavy memory load. 14) Add minor vm_page_lookup improvements. 15) Some pages are not in objects, and make sure that the vm_page.c can properly support such pages. 16) Add some more page deficit handling. 17) Some minor code readability improvements.
|
#
33108 |
|
04-Feb-1998 |
eivind |
Turn DIAGNOSTIC into a new-style option.
|
#
32446 |
|
11-Jan-1998 |
dyson |
Implement the first page access for object type determination more VM clean. Also, use vm_map_insert instead of vm_mmap. Reviewed by: dg@freebsd.org
|
#
32286 |
|
06-Jan-1998 |
dyson |
Make our v_usecount vnode reference count work identically to the original BSD code. The association between the vnode and the vm_object no longer includes reference counts. The major difference is that vm_object's are no longer freed gratuitiously from the vnode, and so once an object is created for the vnode, it will last as long as the vnode does.
When a vnode object reference count is incremented, then the underlying vnode reference count is incremented also. The two "objects" are now more intimately related, and so the interactions are now much less complex.
When vnodes are now normally placed onto the free queue with an object still attached. The rundown of the object happens at vnode rundown time, and happens with exactly the same filesystem semantics of the original VFS code. There is absolutely no need for vnode_pager_uncache and other travesties like that anymore.
A side-effect of these changes is that SMP locking should be much simpler, the I/O copyin/copyout optimizations work, NFS should be more ponderable, and further work on layered filesystems should be less frustrating, because of the totally coherent management of the vnode objects and vnodes.
Please be careful with your system while running this code, but I would greatly appreciate feedback as soon a reasonably possible.
|
#
32011 |
|
27-Dec-1997 |
bde |
Unspammed nested include of <vm/vm_zone.h>.
|
#
31891 |
|
20-Dec-1997 |
sef |
Clear the p_stops field on change of user/group id, unless the correct flag is set in the p_pfsflags field. This, essentially, prevents an SUID proram from hanging after being traced. (E.g., "truss /usr/bin/rlogin" would fail, but leave rlogin in a stopevent state.) Yet another case where procctl is (hopefully ;)) no longer needed in the general case.
Reviewed by: bde (thanks bruce :))
|
#
31774 |
|
16-Dec-1997 |
dg |
Fix bug where a struct buf was free()'d back to the system malloc pool. Quite amazing that the system runs at all with this bug. Also present in 2.2.5. The bug appears to have come in with changes in rev 1.53.
PR: might fix PR#5313 Submitted by: bde
|
#
31564 |
|
06-Dec-1997 |
sef |
Changes to allow event-based process monitoring and control.
|
#
30994 |
|
06-Nov-1997 |
phk |
Move the "retval" (3rd) parameter from all syscall functions and put it in struct proc instead.
This fixes a boatload of compiler warning, and removes a lot of cruft from the sources.
I have not removed the /*ARGSUSED*/, they will require some looking at.
libkvm, ps and other userland struct proc frobbing programs will need recompiled.
|
#
30451 |
|
15-Oct-1997 |
guido |
On execing a sgid program, do not set P_SUGID when cr_gid and cr)_uid do not change. PR: 4755 Reviewed by: Bruce Evans
|
#
29653 |
|
21-Sep-1997 |
dyson |
Change the M_NAMEI allocations to use the zone allocator. This change plus the previous changes to use the zone allocator decrease the useage of malloc by half. The Zone allocator will be upgradeable to be able to use per CPU-pools, and has more intelligent usage of SPLs. Additionally, it has reasonable stats gathering capabilities, while making most calls inline.
|
#
29041 |
|
02-Sep-1997 |
bde |
Removed unused #includes.
|
#
27883 |
|
04-Aug-1997 |
dg |
Fixed security hole with sharing the file descriptor table (via rfork) when execing a setuid/setgid binary. Code submitted by Sean Eric Fagan (sef@freebsd.org). Also consolidated the setuid/setgid checks into one place. Reviewed by: dyson,sef
|
#
25115 |
|
23-Apr-1997 |
ache |
Don't clobber user space argv0 memory on shell exec, mainly for vfork() Fix another bug: if argv[0] is NULL, garbadge args might be added for shell script Submitted by: Tor Egge <Tor.Egge@idi.ntnu.no> (with yet one fault detect from me)
|
#
24994 |
|
18-Apr-1997 |
dg |
Brought fix from the 2.2 branch forward (see rev 1.47.2.7): serious bugs with reading the image header.
|
#
24849 |
|
13-Apr-1997 |
dyson |
Correct the previous thread-fix commit. I made a clerical error.
|
#
24848 |
|
12-Apr-1997 |
dyson |
Fully implement vfork. Vfork is now much much faster than even our fork. (On my machine, fork is about 240usecs, vfork is 78usecs.)
Implement rfork(!RFPROC !RFMEM), which allows a thread to divorce its memory from the other threads of a group.
Implement rfork(!RFPROC RFCFDG), which closes all file descriptors, eliminating possible existing shares with other threads/processes.
Implement rfork(!RFPROC RFFDG), which divorces the file descriptors for a thread from the rest of the group.
Fix the case where a thread does an exec. It is almost nonsense for a thread to modify the other threads address space by an exec, so we now automatically divorce the address space before modifying it.
|
#
24830 |
|
12-Apr-1997 |
dyson |
Effectively remove the previous commit to fix threads forking. The change was a false-start, and needs more work.
|
#
24828 |
|
11-Apr-1997 |
dyson |
Allow a kernel-supported process thread to do an exec without blasting away the VM space of all of the other, associated threads.
|
#
24617 |
|
04-Apr-1997 |
dg |
Killed unnecessary vp == NULL check after namei.
|
#
24614 |
|
04-Apr-1997 |
dg |
Oops, only free component name buffer if namei() didn't. This bug has been in here since I wrote the code 3 years ago! Thanks, Bruce!
Submitted by: bde
|
#
24609 |
|
04-Apr-1997 |
dg |
Various fixes:
1. imgp->image_header needs to be cleared for the bp == NULL && `goto interpret' case, else exec_fail_dealloc would free it twice after an error.
2. Moved the vp->v_writecount check in exec_check_permissions() to near the end. This fixes execve("/dev/null", ...) returning the bogus errno ETXTBSY. ETXTBSY is still returned for attempts to exec interpreted files that are open for writing. The man page is very old and wrong here. It says that ETXTBSY is for pure procedure (shared text) files that are open for writing or reading.
3. Moved the setuid disabling in exec_check_permissions() to the end. Cosmetic. It's more natural to dispose of all the error cases first.
...plus a couple of other cosmetic changes.
Submitted by: bde
|
#
24603 |
|
03-Apr-1997 |
dg |
Lose the vnode lock on a permissions failure.
Submitted by: Tor Egge <Tor.Egge@idi.ntnu.no>
|
#
24437 |
|
31-Mar-1997 |
dg |
Changed the way that the exec image header is read to be filesystem- centric rather than VM-centric to fix a problem with errors not being detectable when the header is read. Killed exech_map as a result of these changes. There appears to be no performance difference with this change.
|
#
22975 |
|
22-Feb-1997 |
peter |
Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not ready for it yet.
|
#
22913 |
|
19-Feb-1997 |
dg |
Fix from PR #2757:
execve() clears the P_SUGID process flag in execve() if the binary executed does not have suid or sgid permission bits set.
This also happens when the effective uid is different from the real uid or the effective gid is different from the real gid. Under these circumstances, the process still has set id privileges and the P_SUGID flag should not be cleared.
Submitted by: Tor Egge <Tor.Egge@idt.ntnu.no>
|
#
22521 |
|
10-Feb-1997 |
dyson |
This is the kernel Lite/2 commit. There are some requisite userland changes, so don't expect to be able to run the kernel as-is (very well) without the appropriate Lite/2 userland changes.
The system boots and can mount UFS filesystems.
Untested: ext2fs, msdosfs, NFS Known problems: Incorrect Berkeley ID strings in some files. Mount_std mounts will not work until the getfsent library routine is changed.
Reviewed by: various people Submitted by: Jeffery Hsu <hsu@freebsd.org>
|
#
21673 |
|
14-Jan-1997 |
jkh |
Make the long-awaited change from $Id$ to $FreeBSD$
This will make a number of things easier in the future, as well as (finally!) avoiding the Id-smashing problem which has plagued developers for so long.
Boy, I'm glad we're not using sup anymore. This update would have been insane otherwise.
|
#
19556 |
|
09-Nov-1996 |
dyson |
Fix an ordering bug -- pmap_remove_pages should be called BEFORE vm_map_remove, not after...
2.2-RELEASE candidate.
|
#
18897 |
|
12-Oct-1996 |
dyson |
Performance optimizations. One of which was meant to go in before the previous snap. Specifically, kern_exit and kern_exec now makes a call into the pmap module to do a very fast removal of pages from the address space. Additionally, the pmap module now updates the PG_MAPPED and PG_WRITABLE flags. This is an optional optimization, but helpful on the X86.
|
#
17334 |
|
30-Jul-1996 |
dyson |
Backed out the recent changes/enhancements to the VM code. The problem with the 'shell scripts' was found, but there was a 'strange' problem found with a 486 laptop that we could not find. This commit backs the code back to 25-jul, and will be re-entered after the snapshot in smaller (more easily tested) chunks.
|
#
17294 |
|
27-Jul-1996 |
dyson |
This commit is meant to solve a couple of VM system problems or performance issues.
1) The pmap module has had too many inlines, and so the object file is simply bigger than it needs to be. Some common code is also merged into subroutines. 2) Removal of some *evil* PHYS_TO_VM_PAGE macro calls. Unfortunately, a few have needed to be added also. The removal caused the need for more vm_page_lookups. I added lookup hints to minimize the need for the page table lookup operations. 3) Removal of some bogus performance improvements, that mostly made the code more complex (tracking individual page table page updates unnecessarily). Those improvements actually hurt 386 processors perf (not that people who worry about perf use 386 processors anymore :-)). 4) Changed pv queue manipulations/structures to be TAILQ's. 5) The pv queue code has had some performance problems since day one. Some significant scalability issues are resolved by threading the pv entries from the pmap AND the physical address instead of just the physical address. This makes certain pmap operations run much faster. This does not affect most micro-benchmarks, but should help loaded system performance *significantly*. DG helped and came up with most of the solution for this one. 6) Most if not all pmap bit operations follow the pattern: pmap_test_bit(); pmap_clear_bit(); That made for twice the necessary pv list traversal. The pmap interface now supports only pmap_tc_bit type operations: pmap_[test/clear]_modified, pmap_[test/clear]_referenced. Additionally, the modified routine now takes a vm_page_t arg instead of a phys address. This eliminates a PHYS_TO_VM_PAGE operation. 7) Several rewrites of routines that contain redundant code to use common routines, so that there is a greater likelihood of keeping the cache footprint smaller.
|
#
17108 |
|
12-Jul-1996 |
bde |
Don't use NULL in non-pointer contexts.
|
#
16085 |
|
03-Jun-1996 |
dg |
Use kmem_alloc_wait/kmem_free_wakeup() to avoid allocation failures from running out of string space in the exec_map.
|
#
16084 |
|
03-Jun-1996 |
dg |
Fix declaration of ps_strings.
|
#
15809 |
|
18-May-1996 |
dyson |
This set of commits to the VM system does the following, and contain contributions or ideas from Stephen McKay <syssgm@devetir.qld.gov.au>, Alan Cox <alc@cs.rice.edu>, David Greenman <davidg@freebsd.org> and me:
More usage of the TAILQ macros. Additional minor fix to queue.h. Performance enhancements to the pageout daemon. Addition of a wait in the case that the pageout daemon has to run immediately. Slightly modify the pageout algorithm. Significant revamp of the pmap/fork code: 1) PTE's and UPAGES's are NO LONGER in the process's map. 2) PTE's and UPAGES's reside in their own objects. 3) TOTAL elimination of recursive page table pagefaults. 4) The page directory now resides in the PTE object. 5) Implemented pmap_copy, thereby speeding up fork time. 6) Changed the pv entries so that the head is a pointer and not an entire entry. 7) Significant cleanup of pmap_protect, and pmap_remove. 8) Removed significant amounts of machine dependent fork code from vm_glue. Pushed much of that code into the machine dependent pmap module. 9) Support more completely the reuse of already zeroed pages (Page table pages and page directories) as being already zeroed. Performance and code cleanups in vm_map: 1) Improved and simplified allocation of map entries. 2) Improved vm_map_copy code. 3) Corrected some minor problems in the simplify code. Implemented splvm (combo of splbio and splimp.) The VM code now seldom uses splhigh. Improved the speed of and simplified kmem_malloc. Minor mod to vm_fault to avoid using pre-zeroed pages in the case of objects with backing objects along with the already existant condition of having a vnode. (If there is a backing object, there will likely be a COW... With a COW, it isn't necessary to start with a pre-zeroed page.) Minor reorg of source to perhaps improve locality of ref.
|
#
15494 |
|
01-May-1996 |
bde |
Removed unnecessary #includes from <sys/imgact.h> so that it is self-sufficient and added explicit #includes where required.
|
#
15448 |
|
29-Apr-1996 |
smpatel |
Fixed two typos in the comment.
Pointed out by: davidg
|
#
15129 |
|
07-Apr-1996 |
dg |
Killed sections 3 and 4 of my copyright as I don't agree with it (I believe it to be unnecessarily restrictive). For tty_subr.c, update to my standard copyright.
|
#
14456 |
|
10-Mar-1996 |
sos |
First attempt at FreeBSD & Linux ELF support.
Compile and link a new kernel, that will give native ELF support, and provide the hooks for other ELF interpreters as well.
To make native ELF binaries use John Polstras elf-kit-1.0.1.. For the time being also use his ld-elf.so.1 and put it in /usr/libexec.
The Linux emulator has been enhanced to also run ELF binaries, it is however in its very first incarnation. Just get some Linux ELF libs (Slackware-3.0) and put them in the prober place (/compat/linux/...). I've ben able to run all the Slackware-3.0 binaries I've tried so far. (No it won't run quake yet :)
|
#
14331 |
|
02-Mar-1996 |
peter |
Mega-commit for Linux emulator update.. This has been stress tested under netscape-2.0 for Linux running all the Java stuff. The scrollbars are now working, at least on my machine. (whew! :-)
I'm uncomfortable with the size of this commit, but it's too inter-dependant to easily seperate out.
The main changes:
COMPAT_LINUX is *GONE*. Most of the code has been moved out of the i386 machine dependent section into the linux emulator itself. The int 0x80 syscall code was almost identical to the lcall 7,0 code and a minor tweak allows them to both be used with the same C code. All kernels can now just modload the lkm and it'll DTRT without having to rebuild the kernel first. Like IBCS2, you can statically compile it in with "options LINUX".
A pile of new syscalls implemented, including getdents(), llseek(), readv(), writev(), msync(), personality(). The Linux-ELF libraries want to use some of these.
linux_select() now obeys Linux semantics, ie: returns the time remaining of the timeout value rather than leaving it the original value.
Quite a few bugs removed, including incorrect arguments being used in syscalls.. eg: mixups between passing the sigset as an int, vs passing it as a pointer and doing a copyin(), missing return values, unhandled cases, SIOC* ioctls, etc.
The build for the code has changed. i386/conf/files now knows how to build linux_genassym and generate linux_assym.h on the fly.
Supporting changes elsewhere in the kernel:
The user-mode signal trampoline has moved from the U area to immediately below the top of the stack (below PS_STRINGS). This allows the different binary emulations to have their own signal trampoline code (which gets rid of the hardwired syscall 103 (sigreturn on BSD, syslog on Linux)) and so that the emulator can provide the exact "struct sigcontext *" argument to the program's signal handlers.
The sigstack's "ss_flags" now uses SS_DISABLE and SS_ONSTACK flags, which have the same values as the re-used SA_DISABLE and SA_ONSTACK which are intended for sigaction only. This enables the support of a SA_RESETHAND flag to sigaction to implement the gross SYSV and Linux SA_ONESHOT signal semantics where the signal handler is reset when it's triggered.
makesyscalls.sh no longer appends the struct sysentvec on the end of the generated init_sysent.c code. It's a lot saner to have it in a seperate file rather than trying to update the structure inside the awk script. :-)
At exec time, the dozen bytes or so of signal trampoline code are copied to the top of the user's stack, rather than obtaining the trampoline code the old way by getting a clone of the parent's user area. This allows Linux and native binaries to freely exec each other without getting trampolines mixed up.
|
#
14235 |
|
24-Feb-1996 |
peter |
Add two sysctl variables that can be read by libutil and libkvm so that they can adapt to simple kernel VM layout changes.
|
#
13523 |
|
20-Jan-1996 |
bde |
Removed stale #includes of "opt_sysvipc.h".
|
#
13490 |
|
19-Jan-1996 |
dyson |
Eliminated many redundant vm_map_lookup operations for vm_mmap. Speed up for vfs_bio -- addition of a routine bqrelse to greatly diminish overhead for merged cache. Efficiency improvement for vfs_cluster. It used to do alot of redundant calls to cluster_rbuild. Correct the ordering for vrele of .text and release of credentials. Use the selective tlb update for 486/586/P6. Numerous fixes to the size of objects allocated for files. Additionally, fixes in the various pagers. Fixes for proper positioning of vnode_pager_setsize in msdosfs and ext2fs. Fixes in the swap pager for exhausted resources. The pageout code will not as readily thrash. Change the page queue flags (PG_ACTIVE, PG_INACTIVE, PG_FREE, PG_CACHE) into page queue indices (PQ_ACTIVE, PQ_INACTIVE, PQ_FREE, PQ_CACHE), thereby improving efficiency of several routines. Eliminate even more unnecessary vm_page_protect operations. Significantly speed up process forks. Make vm_object_page_clean more efficient, thereby eliminating the pause that happens every 30seconds. Make sequential clustered writes B_ASYNC instead of B_DELWRI even in the case of filesystems mounted async. Fix a panic with busy pages when write clustering is done for non-VMIO buffers.
|
#
13332 |
|
08-Jan-1996 |
peter |
(gulp!) reran makesyscalls..
sysv_ipc.c: add stub functions that either simply return (for the hooks in kern_fork/kern_exit) or log() a messgae and call enosys() (for the syscalls). sysv_ipc.c will become "standard" in conf/files and has #ifs for all the permutations.
|
#
13226 |
|
04-Jan-1996 |
wollman |
Convert SYSV IPC to new-style options. (I hope I got everything...) The LKMs will need an extra file, to come later.
|
#
12819 |
|
14-Dec-1995 |
phk |
A Major staticize sweep. Generates a couple of warnings that I'll deal with later. A number of unused vars removed. A number of unused procs removed or #ifdefed.
|
#
12679 |
|
09-Dec-1995 |
peter |
Reorganise ps_strings in order to gain BSD/OS 2.0 binary compatability. This is now in line with NetBSD as well..
Note that once this series of commits is finished, you must recompile libkvm, then ps and maybe 'w'. If you are running the recently imported sendmail-8.7, you should recompile that too (src/conf.c at least).
|
#
12662 |
|
07-Dec-1995 |
dg |
Untangled the vm.h include file spaghetti.
|
#
12258 |
|
13-Nov-1995 |
dg |
Use kmem_alloc_pageable/kmem_free to allocate memory instead of individual VM map functions.
|
#
12221 |
|
12-Nov-1995 |
bde |
Included <sys/sysproto.h> to get central declarations for syscall args structs and prototypes for syscalls.
Ifdefed duplicated decentralized declarations of args structs. It's convenient to have this visible but they are hard to maintain. Some are already different from the central declarations. 4.4lite2 puts them in comments in the function headers but I wanted to avoid the large changes for that.
|
#
12130 |
|
06-Nov-1995 |
dg |
All: Changed vnodep -> vp for consistency with the rest of the kernel, and changed iparams -> imgp for brevity.
kern_exec.c: Explicitly initialized some additional parts of the image_params struct to avoid bzeroing it. Rewrote the set-id code to reduce the number of logical tests. The rewrite exposed a mostly benign bug in the algorithm: traced set-id images would get ktracing disabled even if the set-id didn't happen for other reasons.
|
#
11607 |
|
21-Oct-1995 |
dg |
Killed a few gratuitous #include's.
|
#
11332 |
|
07-Oct-1995 |
swallace |
Remove prototype definitions from <sys/systm.h>. Prototypes are located in <sys/sysproto.h>.
Add appropriate #include <sys/sysproto.h> to files that needed protos from systm.h.
Add structure definitions to appropriate files that relied on sys/systm.h, right before system call definition, as in the rest of the kernel source.
In kern_prot.c, instead of using the dummy structure "args", create individual dummy structures named <syscall>_args. This makes life easier for prototype generation.
|
#
10221 |
|
24-Aug-1995 |
dg |
Moved setting of VTEXT flag into the appropriate image activators. This fixes a bug where linux binaries would get the flag set inappropriately.
|
#
8876 |
|
30-May-1995 |
rgrimes |
Remove trailing whitespace.
|
#
7342 |
|
24-Mar-1995 |
dg |
Use 'p' rather than 'curproc' when appropriate.
|
#
7341 |
|
24-Mar-1995 |
dg |
Use NDINIT macro to initialize fields for namei.
|
#
7177 |
|
19-Mar-1995 |
dg |
Fixed bug introduced in the previous commit - the lock must be held until after the call to exec_check_permissions().
|
#
7176 |
|
19-Mar-1995 |
dg |
Lose the lock on the vnode. Changes to implement proper locking in the vnode pager now require this.
Submitted by: John Dyson
|
#
7090 |
|
16-Mar-1995 |
bde |
Add and move declarations to fix all of the warnings from `gcc -Wimplicit' (except in netccitt, netiso and netns) and most of the warnings from `gcc -Wnested-externs'. Fix all the bugs found. There were no serious ones.
|
#
6983 |
|
10-Mar-1995 |
dg |
Removed some #include's of unnecessary include files.
|
#
6792 |
|
01-Mar-1995 |
dg |
No longer assume that a process's address space can be directly written to.
|
#
6579 |
|
20-Feb-1995 |
dg |
Use of vm_allocate() and vm_deallocate() has been deprecated.
|
#
6380 |
|
14-Feb-1995 |
sos |
First attempt to run linux binaries. This is only the changes needed to the generic kernel. The actual emulator is a separate LKM. (not finished yet, sorry). Submitted by: sos@freebsd.org & sef@kithrup.com
|
#
5455 |
|
09-Jan-1995 |
dg |
These changes embody the support of the fully coherent merged VM buffer cache, much higher filesystem I/O performance, and much better paging performance. It represents the culmination of over 6 months of R&D.
The majority of the merged VM/cache work is by John Dyson.
The following highlights the most significant changes. Additionally, there are (mostly minor) changes to the various filesystem modules (nfs, msdosfs, etc) to support the new VM/buffer scheme.
vfs_bio.c: Significant rewrite of most of vfs_bio to support the merged VM buffer cache scheme. The scheme is almost fully compatible with the old filesystem interface. Significant improvement in the number of opportunities for write clustering.
vfs_cluster.c, vfs_subr.c Upgrade and performance enhancements in vfs layer code to support merged VM/buffer cache. Fixup of vfs_cluster to eliminate the bogus pagemove stuff.
vm_object.c: Yet more improvements in the collapse code. Elimination of some windows that can cause list corruption.
vm_pageout.c: Fixed it, it really works better now. Somehow in 2.0, some "enhancements" broke the code. This code has been reworked from the ground-up.
vm_fault.c, vm_page.c, pmap.c, vm_object.c Support for small-block filesystems with merged VM/buffer cache scheme.
pmap.c vm_map.c Dynamic kernel VM size, now we dont have to pre-allocate excessive numbers of kernel PTs.
vm_glue.c Much simpler and more effective swapping code. No more gratuitous swapping.
proc.h Fixed the problem that the p_lock flag was not being cleared on a fork.
swap_pager.c, vnode_pager.c Removal of old vfs_bio cruft to support the past pseudo-coherency. Now the code doesn't need it anymore.
machdep.c Changes to better support the parameter values for the merged VM/buffer cache scheme.
machdep.c, kern_exec.c, vm_glue.c Implemented a seperate submap for temporary exec string space and another one to contain process upages. This eliminates all map fragmentation problems that previously existed.
ffs_inode.c, ufs_inode.c, ufs_readwrite.c Changes for merged VM/buffer cache. Add "bypass" support for sneaking in on busy buffers.
Submitted by: John Dyson and David Greenman
|
#
3308 |
|
02-Oct-1994 |
phk |
All of this is cosmetic. prototypes, #includes, printfs and so on. Makes GCC a lot more silent.
|
#
3098 |
|
25-Sep-1994 |
phk |
While in the real world, I had a bad case of being swapped out for a lot of cycles. While waiting there I added a lot of the extra ()'s I have, (I have never used LISP to any extent). So I compiled the kernel with -Wall and shut up a lot of "suggest you add ()'s", removed a bunch of unused var's and added a couple of declarations here and there. Having a lap-top is highly recommended. My kernel still runs, yell at me if you kernel breaks.
|
#
3053 |
|
24-Sep-1994 |
dg |
Added support for p_textvp which stores the vnode pointer of the execed binary.
|
#
2756 |
|
14-Sep-1994 |
dg |
Added missing #ifdef SYSVSHM.
|
#
2729 |
|
13-Sep-1994 |
dfr |
Added SYSV ipcs.
Obtained from: NetBSD and FreeBSD-1.1.5
|
#
2252 |
|
24-Aug-1994 |
dg |
Pay attention to *all* errors from copyinstr(). This patch fixes a bug that causes a no-panic instant reboot when bogus argv/envvs are fed to execve().
|
#
2112 |
|
18-Aug-1994 |
wollman |
Fix up some sloppy coding practices:
- Delete redundant declarations. - Add -Wredundant-declarations to Makefile.i386 so they don't come back. - Delete sloppy COMMON-style declarations of uninitialized data in header files. - Add a few prototypes. - Clean up warnings resulting from the above.
NB: ioconf.c will still generate a redundant-declaration warning, which is unavoidable unless somebody volunteers to make `config' smarter.
|
#
1886 |
|
06-Aug-1994 |
dg |
Implemented support for the "ps_strings" structure (grrrr...) for use in the userland library libkvm.
|
#
1549 |
|
25-May-1994 |
rgrimes |
The big 4.4BSD Lite to FreeBSD 2.0.0 (Development) patch.
Reviewed by: Rodney W. Grimes Submitted by: John Dyson and David Greenman
|
#
1542 |
|
24-May-1994 |
rgrimes |
This commit was generated by cvs2svn to compensate for changes in r1541, which included commits to RCS files with non-trunk default branches.
|
#
1541 |
|
24-May-1994 |
rgrimes |
BSD 4.4 Lite Kernel Sources
|