Cross Reference: /freebsd-10.0-release/sys/sys/syscallsubr.h

History log of /freebsd-10.0-release/sys/sys/syscallsubr.h
Revision	Date	Author	Comments (<<< Hide modified files) (Show modified files >>>)
# 259065	07-Dec-2013	gjb	- Copy stable/10 (r259064) to releng/10.0 as part of the 10.0-RELEASE cycle. - Update __FreeBSD_version [1] - Set branch name to -RC1 [1] 10.0-CURRENT __FreeBSD_version value ended at '55', so start releng/10.0 at '100' so the branch is started with a value ending in zero. Approved by: re (implicit) Sponsored by: The FreeBSD Foundation /freebsd-10.0-release /freebsd-10.0-release/sys/conf/newvers.sh /freebsd-10.0-release/sys/sys/param.h
# 256281	10-Oct-2013	gjb	Copy head (r256279) to stable/10 as part of the 10.0-RELEASE cycle. Approved by: re (implicit) Sponsored by: The FreeBSD Foundation
# 255708	19-Sep-2013	jhb	Extend the support for exempting processes from being killed when swap is exhausted. - Add a new protect(1) command that can be used to set or revoke protection from arbitrary processes. Similar to ktrace it can apply a change to all existing descendants of a process as well as future descendants. - Add a new procctl(2) system call that provides a generic interface for control operations on processes (as opposed to the debugger-specific operations provided by ptrace(2)). procctl(2) uses a combination of idtype_t and an id to identify the set of processes on which to operate similar to wait6(). - Add a PROC_SPROTECT control operation to manage the protection status of a set of processes. MADV_PROTECT still works for backwards compatability. - Add a p_flag2 to struct proc (and a corresponding ki_flag2 to kinfo_proc) the first bit of which is used to track if P_PROTECT should be inherited by new child processes. Reviewed by: kib, jilles (earlier version) Approved by: re (delphij) MFC after: 1 month
# 255675	18-Sep-2013	rdivacky	Revert r255672, it has some serious flaws, leaking file references etc. Approved by: re (delphij)
# 255672	18-Sep-2013	rdivacky	Implement epoll support in Linuxulator. This is a tiny wrapper around kqueue to implement epoll subset of functionality. The kqueue user data are 32bit on i386 which is not enough for epoll user data so this patch overrides kqueue fileops to maintain enough space in struct file. Initial patch developed by me in 2007 and then extended and finished by Yuri Victorovich. Approved by: re (delphij) Sponsored by: Google Summer of Code Submitted by: Yuri Victorovich <yuri at rawbw dot com> Tested by: Yuri Victorovich <yuri at rawbw dot com>
# 254481	18-Aug-2013	pjd	Implement 32bit versions of the cap_ioctls_limit(2) and cap_ioctls_get(2) system calls as unsigned longs have different size on i386 and amd64. Reported by: jilles Sponsored by: The FreeBSD Foundation
# 253530	21-Jul-2013	kib	Implement compat32 wrappers for the ktimer_* syscalls. Reported, reviewed and tested by: Petr Salinger <Petr.Salinger@seznam.cz> Sponsored by: The FreeBSD Foundation MFC after: 1 week
# 253494	20-Jul-2013	kib	id_t is 64bit, provide the compat32 wrapper for clock_getcpuclockid2(2). Reported and tested by: Petr Salinger <Petr.Salinger@seznam.cz> PR: threads/180652 Sponsored by: The FreeBSD Foundation
# 250154	01-May-2013	jilles	Add accept4() system call. The accept4() function, compared to accept(), allows setting the new file descriptor atomically close-on-exec and explicitly controlling the non-blocking status on the new socket. (Note that the latter point means that accept() is not equivalent to any form of accept4().) The linuxulator's accept4 implementation leaves a race window where the new file descriptor is not close-on-exec because it calls sys_accept(). This implementation leaves no such race window (by using falloc() flags). The linuxulator could be fixed and simplified by using the new code. Like accept(), accept4() is async-signal-safe, a cancellation point and permitted in capability mode.
# 248951	31-Mar-2013	jilles	Rename do_pipe() to kern_pipe2() and declare it properly.
# 243135	16-Nov-2012	kib	Move the definition of the idtype_t from sys/types.h to sys/wait.h. Fix the bug, use #if __BSD_VISIBLE instead of #if defined(__BSD_VISIBLE), since __BSD_VISIBLE is always defined. Reformat the comments from the Solaris style to KNF. Reported and reviewed by: bde MFC after: 28 days
# 243134	16-Nov-2012	kib	Alphabetically reorder the forward-declarations of the structures. Add the declaration for enum idtype, to be used later. Reported and reviewed by: bde MFC after: 28 days
# 242958	13-Nov-2012	kib	Add the wait6(2) system call. It takes POSIX waitid()-like process designator to select a process which is waited for. The system call optionally returns siginfo_t which would be otherwise provided to SIGCHLD handler, as well as extended structure accounting for child and cumulative grandchild resource usage. Allow to get the current rusage information for non-exited processes as well, similar to Solaris. The explicit WEXITED flag is required to wait for exited processes, allowing for more fine-grained control of the events the waiter is interested in. Fix the handling of siginfo for WNOWAIT option for all wait*(2) family, by not removing the queued signal state. PR: standards/170346 Submitted by: "Jukka A. Ukkonen" <jau@iki.fi> MFC after: 1 month
# 235886	24-May-2012	gleb	Add kern_fhstat(), adjust sys_fhstat() to use it. Extend kern_getdirentries() to accept uio segflag and optionally return buffer residue. Sponsored by: Google Summer of Code 2011
# 227502	14-Nov-2011	jhb	- Split out a kern_posix_fadvise() from the posix_fadvise() system call so it can be used by in-kernel consumers. - Make kern_posix_fallocate() public. - Use kern_posix_fadvise() and kern_posix_fallocate() to implement the freebsd32 wrappers for the two system calls.
# 220238	01-Apr-2011	kib	Add support for executing the FreeBSD 1/i386 a.out binaries on amd64. In particular: - implement compat shims for old stat(2) variants and ogetdirentries(2); - implement delivery of signals with ancient stack frame layout and corresponding sigreturn(2); - implement old getpagesize(2); - provide a user-mode trampoline and LDT call gate for lcall $7,$0; - port a.out image activator and connect it to the build as a module on amd64. The changes are hidden under COMPAT_43. MFC after: 1 month
# 220158	30-Mar-2011	kib	Provide compat32 shims for kldstat(2). Requested and tested by: jpaetzel MFC after: 1 week
# 209613	30-Jun-2010	jhb	Move prototypes for kern_sigtimedwait() and kern_sigprocmask() to <sys/syscallsubr.h> where all other kern_<syscall> prototypes live.
# 202113	11-Jan-2010	mckusick	Background: When renaming a directory it passes through several intermediate states. First its new name will be created causing it to have two names (from possibly different parents). Next, if it has different parents, its value of ".." will be changed from pointing to the old parent to pointing to the new parent. Concurrently, its old name will be removed bringing it back into a consistent state. When fsck encounters an extra name for a directory, it offers to remove the "extraneous hard link"; when it finds that the names have been changed but the update to ".." has not happened, it offers to rewrite ".." to point at the correct parent. Both of these changes were considered unexpected so would cause fsck in preen mode or fsck in background mode to fail with the need to run fsck manually to fix these problems. Fsck running in preen mode or background mode now corrects these expected inconsistencies that arise during directory rename. The functionality added with this update is used by fsck running in background mode to make these fixes. Solution: This update adds three new fsck sysctl commands to support background fsck in correcting expected inconsistencies that arise from incomplete directory rename operations. They are: setcwd(dirinode) - set the current directory to dirinode in the filesystem associated with the snapshot. setdotdot(oldvalue, newvalue) - Verify that the inode number for ".." in the current directory is oldvalue then change it to newvalue. unlink(nameptr, oldvalue) - Verify that the inode number associated with nameptr in the current directory is oldvalue then unlink it. As with all other fsck sysctls, these new ones may only be used by processes with appropriate priviledge. Reported by: jeff Security issues: rwatson
# 198508	27-Oct-2009	kib	Current pselect(3) is implemented in usermode and thus vulnerable to well-known race condition, which elimination was the reason for the function appearance in first place. If sigmask supplied as argument to pselect() enables a signal, the signal might be delivered before thread called select(2), causing lost wakeup. Reimplement pselect() in kernel, making change of sigmask and sleep atomic. Since signal shall be delivered to the usermode, but sigmask restored, set TDP_OLDMASK and save old mask in td_oldsigmask. The TDP_OLDMASK should be cleared by ast() in case signal was not gelivered during syscall execution. Reviewed by: davidxu Tested by: pho MFC after: 1 month
# 198506	27-Oct-2009	kib	In kern_sigsuspend(), better manipulate thread signal mask using kern_sigprocmask() to properly notify other possible candidate threads for signal delivery. Since sigsuspend() shall only return to usermode after a signal was delivered, do cursig/postsig loop immediately after waiting for signal, repeating the wait if wakeup was spurious due to race with other thread fetching signal from the process queue before us. Add thread_suspend_check() call to allow the thread to be stopped or killed while in loop. Modify last argument of kern_sigprocmask() from boolean to flags, allowing the function to be called with locked proc. Convertion of the callers that supplied 1 to the old argument will be done in the next commit, and due to SIGPROCMASK_OLD value equial to 1, code is formally correct in between. Reviewed by: davidxu Tested by: pho MFC after: 1 month
# 197049	09-Sep-2009	kib	kern_select(9) copies fd_set in and out of userspace in quantities of longs. Since 32bit processes longs are 4 bytes, 64bit kernel may copy in or out 4 bytes more then the process expected. Calculate the amount of bytes to copy taking into account size of fd_set for the current process ABI. Diagnosed and tested by: Peter Jeremy <peterjeremy acm org> Reviewed by: jhb MFC after: 1 week
# 195458	08-Jul-2009	trasz	There is an optimization in chmod(1), that makes it not to call chmod(2) if the new file mode is the same as it was before; however, this optimization must be disabled for filesystems that support NFSv4 ACLs. Chmod uses pathconf(2) to determine whether this is the case - however, pathconf(2) always follows symbolic links, while the 'chmod -h' doesn't. This change adds lpathconf(3) to make it possible to solve that problem in a clean way. Reviewed by: rwatson (earlier version) Approved by: re (kib)
# 193167	31-May-2009	dchagin	Split native socketpair() syscall onto kern_socketpair() which should be used by kernel consumers and socketpair() itself. Approved by: kib (mentor) MFC after: 1 month
# 192895	27-May-2009	jamie	Add hierarchical jails. A jail may further virtualize its environment by creating a child jail, which is visible to that jail and to any parent jails. Child jails may be restricted more than their parents, but never less. Jail names reflect this hierarchy, being MIB-style dot-separated strings. Every thread now points to a jail, the default being prison0, which contains information about the physical system. Prison0's root directory is the same as rootvnode; its hostname is the same as the global hostname, and its securelevel replaces the global securelevel. Note that the variable "securelevel" has actually gone away, which should not cause any problems for code that properly uses securelevel_gt() and securelevel_ge(). Some jail-related permissions that were kept in global variables and set via sysctls are now per-jail settings. The sysctls still exist for backward compatibility, used only by the now-deprecated jail(2) system call. Approved by: bz (mentor)
# 191673	29-Apr-2009	jamie	Introduce the extensible jail framework, using the same "name=value" interface as nmount(2). Three new system calls are added: * jail_set, to create jails and change the parameters of existing jails. This replaces jail(2). * jail_get, to read the parameters of existing jails. This replaces the security.jail.list sysctl. * jail_remove to kill off a jail's processes and remove the jail. Most jail parameters may now be changed after creation, and jails may be set to exist without any attached processes. The current jail(2) system call still exists, though it is now a stub to jail_set(2). Approved by: bz (mentor)
# 188849	20-Feb-2009	ed	Don't make Linux stat() open character devices to resolve its name. The existing code calls kern_open() to resolve the vnode of a pathname right after a stat(). This is not correct, because it causes random character devices to be opened in /dev. This means ls'ing a tape streamer will cause it to rewind, for example. Changes I have made: - Add kern_statat_vnhook() to allow binary emulators to `post-process' struct stat, using the proper vnode. - Remove unneeded printf's from stat() and statfs(). - Make the Linuxolator use kern_statat_vnhook(), replacing translate_path_major_minor_at(). - Let translate_fd_major_minor() use vp->v_rdev instead of vp->v_un.vu_cdev. Result: crw-rw-rw- 1 root root 0, 14 Feb 20 13:54 /dev/ptmx crw--w---- 1 root adm 136, 0 Feb 20 14:03 /dev/pts/0 crw--w---- 1 root adm 136, 1 Feb 20 14:02 /dev/pts/1 crw--w---- 1 ed tty 136, 2 Feb 20 14:03 /dev/pts/2 Before this commit, ptmx also had a major number of 136, because it silently allocated and deallocated a pseudo-terminal. Device nodes that cannot be opened now have proper major/minor-numbers. Reviewed by: kib, netchild, rdivacky (thanks!)
# 184849	11-Nov-2008	ed	Several cleanups related to pipe(2). - Use `fildes[2]' instead of `*fildes' to make more clear that pipe(2) fills an array with two descriptors. - Remove EFAULT from the manual page. Because of the current calling convention, pipe(2) raises a segmentation fault when an invalid address is passed. - Introduce kern_pipe() to make it easier for binary emulations to implement pipe(2). - Make Linux binary emulation use kern_pipe(), which means we don't have to recover td_retval after calling the FreeBSD system call. Approved by: rdivacky Discussed on: arch
# 184183	22-Oct-2008	jhb	Split the copyout of *base at the end of getdirentries() out leaving the rest in kern_getdirentries(). Use kern_getdirentries() to implement freebsd32_getdirentries(). This fixes a bug where calls to getdirentries() in 32-bit binaries would trash the 4 bytes after the 'long base' in userland. Submitted by: ups MFC after: 1 week
# 177997	08-Apr-2008	kib	Implement the linux syscalls openat, mkdirat, mknodat, fchownat, futimesat, fstatat, unlinkat, renameat, linkat, symlinkat, readlinkat, fchmodat, faccessat. Submitted by: rdivacky Sponsored by: Google Summer of Code 2007 Tested by: pho
# 177786	31-Mar-2008	kib	Implement the openat(2), faccessat(2), fchmodat(2), fchownat(2), fstatat(2), futimesat(2), linkat(2), mkdirat(2), mkfifoat(2), mknodat(2), readlinkat(2), renameat(2), symlinkat(2) syscalls. Based on the submission by rdivacky, sponsored by Google Summer of Code 2007 Reviewed by: rwatson, rdivacky Tested by: pho
# 176215	12-Feb-2008	ru	Change readlink(2)'s return type and type of the last argument to match POSIX. Prodded by: Alexey Lyashkov
# 175140	07-Jan-2008	jhb	Make ftruncate a 'struct file' operation rather than a vnode operation. This makes it possible to support ftruncate() on non-vnode file types in the future. - 'struct fileops' grows a 'fo_truncate' method to handle an ftruncate() on a given file descriptor. - ftruncate() moves to kern/sys_generic.c and now just fetches a file object and invokes fo_truncate(). - The vnode-specific portions of ftruncate() move to vn_truncate() in vfs_vnops.c which implements fo_truncate() for vnode file types. - Non-vnode file types return EINVAL in their fo_truncate() method. Submitted by: rwatson
# 170404	07-Jun-2007	jhb	- Remove unused variable from create_thread(). - Move kern_thr_() prototype to <sys/syscallsubr.h> where all the other kern_() prototypes live.
# 165403	20-Dec-2006	jkim	MFP4: (part of) 110058 copyin()/copyout() for message type is separated from msgsnd()/msgrcv() and it is done from its wrapper functions to support 32-bit emulations. After I implemented this, I have briefly referenced NetBSD and Darwin. NetBSD passes copyin()/copyout() function pointers from wrappers. Darwin passes size of message type as an argument, which is actually similar to my first implementation (P4 109706). We may revisit these implementations later.
# 160765	27-Jul-2006	jhb	Fix a file descriptor race I reintroduced when I split accept1() up into kern_accept() and accept1(). If another thread closed the new file descriptor and the first thread later got an error trying to copyout the socket address, then it would attempt to close the wrong file object. To fix, add a struct file ** argument to kern_accept(). If it is non-NULL, then on success kern_accept() will store a pointer to the new file object there and not release any of the references. It is up to the calling code to drop the references appropriately (including a call to fdclose() in case of error to safely handle the aforementioned race). While I'm at it, go ahead and fix the svr4 streams code to not leak the accept fd if it gets an error trying to copyout the streams structures.
# 160249	10-Jul-2006	jhb	- Split out kern_accept(), kern_getpeername(), and kern_getsockname() for use by ABI emulators. - Alter the interface of kern_recvit() somewhat. Specifically, go ahead and hard code UIO_USERSPACE in the uio as that's what all the callers specify. In place, add a new uioseg to indicate what type of pointer is in mp->msg_name. Previously it was always a userland address, but ABI emulators may pass in kernel-side sockaddrs. Also, remove the namelenp field and instead require the two places that used it to explicitly copy mp->msg_namelen out to userland. - Use the patched kern_recvit() to replace svr4_recvit() and the stock kern_sendit() to replace svr4_sendit(). - Use kern_bind() instead of stackgap use in ti_bind(). - Use kern_getpeername() and kern_getsockname() instead of stackgap in svr4_stream_ti_ioctl(). - Use kern_connect() instead of stackgap in svr4_do_putmsg(). - Use kern_getpeername() and kern_accept() instead of stackgap in svr4_do_getmsg(). - Retire the stackgap from SVR4 compat as it is no longer used.
# 160192	08-Jul-2006	jhb	- Split ioctl() up into ioctl() and kern_ioctl(). The kern_ioctl() assumes that the 'data' pointer is already setup to point to a valid KVM buffer or contains the copied-in data from userland as appropriate (ioctl(2) still does this). kern_ioctl() takes care of looking up a file pointer, implementing FIONCLEX and FIOCLEX, and calling fi_ioctl(). - Use kern_ioctl() to implement xenix_rdchk() instead of using the stackgap and mark xenix_rdchk() MPSAFE.
# 160190	08-Jul-2006	jhb	Add a kern_close() so that the ABIs can close a file descriptor w/o having to populate a close_args struct and change some of the places that do.
# 160187	08-Jul-2006	jhb	Rework kern_semctl a bit to always assume the UIO_SYSSPACE case. This mostly consists of pushing a few copyin's and copyout's up into __semctl() as all the other callers were already doing the UIO_SYSSPACE case. This also changes kern_semctl() to set the return value in a passed in pointer to a register_t rather than td->td_retval[0] directly so that callers can only set td->td_retval[0] if all the various copyout's succeed. As a result of these changes, kern_semctl() no longer does copyin/copyout (except for GETALL/SETALL) so simplify the locking to acquire the semakptr mutex before the MAC check and hold it all the way until the end of the big switch statement. The GETALL/SETALL cases have to temporarily drop it while they do copyin/malloc and copyout. Also, simplify the SETALL case to remove handling for a non-existent race condition.
# 160139	06-Jul-2006	jhb	Add kern_setgroups() and kern_getgroups() and use them to implement ibcs2_[gs]etgroups() rather than using the stackgap. This also makes ibcs2_[gs]etgroups() MPSAFE. Also, it cleans up one bit of weirdness in the old setgroups() where it allocated an entire credential just so it had a place to copy the group list into. Now setgroups just allocates a NGROUPS_MAX array on the stack that it copies into and then passes to kern_setgroups().
# 159991	27-Jun-2006	jhb	- Add a kern_semctl() helper function for __semctl(). It accepts a pointer to a copied-in copy of the 'union semun' and a uioseg to indicate which memory space the 'buf' pointer of the union points to. This is then used in linux_semctl() and svr4_sys_semctl() to eliminate use of the stackgap. - Mark linux_ipc() and svr4_sys_semsys() MPSAFE.
# 159588	13-Jun-2006	jhb	- Add a kern_kldload() that is most of the previous kldload() and push Giant down in it. - Push Giant down in kern_kldunload() and reorganize it slightly to avoid using gotos. Also, expose this function to the rest of the kernel.
# 156114	28-Feb-2006	ps	Fix 32bit sendfile by implementing kern_sendfile so that it takes the header and trailers as iovec arguments instead of copying them in inside of sendfile. Reviewed by: jhb MFC after: 3 weeks
# 155401	06-Feb-2006	jhb	Add a kern_eaccess() function and use it to implement xenix_eaccess() rather than kern_access(). Suggested by: rwatson
# 151909	31-Oct-2005	ps	Reformat socket control messages on input/output for 32bit compatibility on 64bit systems. Submitted by: ps, ups Reviewed by: jhb
# 151359	15-Oct-2005	ps	Implement the 32bit versions of recvmsg, recvfrom, sendmsg Partially obtained from: jhb
# 151357	15-Oct-2005	ps	Implement 32bit wrappers for clock_gettime, clock_settime, and clock_getres.
# 147813	07-Jul-2005	jhb	- Add two new system calls: preadv() and pwritev() which are like readv() and writev() except that they take an additional offset argument and do not change the current file position. In SAT speak: preadv:readv::pread:read and pwritev:writev::pwrite:write. - Try to reduce code duplication some by merging most of the old kern_foov() and dofilefoo() functions into new dofilefoo() functions that are called by kern_foov() and kern_pfoov(). The non-v functions now all generate a simple uio on the stack from the passed in arguments and then call kern_foov(). For example, read() now just builds a uio and calls kern_readv() and pwrite() just builds a uio and calls kern_pwritev(). PR: kern/80362 Submitted by: Marc Olzheim marcolz at stack dot nl (1) Approved by: re (scottl) MFC after: 1 week
# 147302	11-Jun-2005	pjd	Do not allocate memory based on not-checked argument from userland. It can be used to panic the kernel by giving too big value. Fix it by moving allocation and size verification into kern_getfsstat(). This even simplifies kern_getfsstat() consumers, but destroys symmetry - memory is allocated inside kern_getfsstat(), but has to be freed by the caller. Found by: FreeBSD Kernel Stress Test Suite: http://www.holm.cc/stress/ Reported by: Peter Holm <peter@holm.cc>
# 147178	09-Jun-2005	pjd	Avoid code duplication in serval places by introducing universal kern_getfsstat() function. Obtained from: jhb
# 146950	03-Jun-2005	ps	Wrap copyin/copyout for kevent so the 32bit wrapper does not have to malloc nchanges * sizeof(struct kevent) AND/OR nevents * sizeof(struct kevent) on every syscall. Glanced at by: peter, jmg Obtained from: Yahoo! MFC after: 2 weeks
# 144445	31-Mar-2005	jhb	Implement kern_adjtime(), kern_readv(), kern_sched_rr_get_interval(), kern_settimeofday(), and kern_writev() to allow for further stackgap reduction in the compat ABIs.
# 142933	01-Mar-2005	ps	regen
# 141815	13-Feb-2005	sobomax	Backout previous change (disabling of security checks for signals delivered in emulation layers), since it appears to be too broad. Requested by: rwatson
# 141812	13-Feb-2005	sobomax	Split out kill(2) syscall service routine into user-level and kernel part, the former is callable from user space and the latter from the kernel one. Make kernel version take additional argument which tells if the respective call should check for additional restrictions for sending signals to suid/sugid applications or not. Make all emulation layers using non-checked version, since signal numbers in emulation layers can have different meaning that in native mode and such protection can cause misbehaviour. As a result remove LIBTHR from the signals allowed to be delivered to a suid/sugid application. Requested (sorta) by: rwatson MFC after: 2 weeks
# 141484	07-Feb-2005	jhb	Implement a kern_pathconf() wrapper for pathconf() which can take the filename from either a user space or a kernel space pointer.
# 141471	07-Feb-2005	jhb	- Tweak kern_msgctl() to return a copy of the requested message queue id structure in the struct pointed to by the 3rd argument for IPC_STAT and get rid of the 4th argument. The old way returned a pointer into the kernel array that the calling function would then access afterwards without holding the appropriate locks and doing non-lock-safe things like copyout() with the data anyways. This change removes that unsafeness and resulting race conditions as well as simplifying the interface. - Implement kern_foo wrappers for stat(), lstat(), fstat(), statfs(), fstatfs(), and fhstatfs(). Use these wrappers to cut out a lot of code duplication for freebsd4 and netbsd compatability system calls. - Add a new lookup function kern_alternate_path() that looks up a filename under an alternate prefix and determines which filename should be used. This is basically a more general version of linux_emul_convpath() that can be shared by all the ABIs thus allowing for further reduction of code duplication.
# 141029	30-Jan-2005	sobomax	Extend kern_sendit() to take another enum uio_seg argument, which specifies where the buffer to send lies and use it to eliminate yet another stackgap in linuxlator. MFC after: 2 weeks
# 140992	29-Jan-2005	sobomax	o Split out kernel part of execve(2) syscall into two parts: one that copies arguments into the kernel space and one that operates completely in the kernel space; o use kernel-only version of execve(2) to kill another stackgap in linuxlator/i386. Obtained from: DragonFlyBSD (partially) MFC after: 2 weeks
# 140839	25-Jan-2005	sobomax	Split out kernel side of msgctl(2) into two parts: the first that pops data from the userland and pushes results back and the second which does actual processing. Use the latter to eliminate stackgap in the linux wrapper of that syscall. MFC after: 2 weeks
# 140836	25-Jan-2005	sobomax	More kern_{get,set}itiver() where they belong. Submitted by: dwmalone MFC after: 2 weeks
# 140483	19-Jan-2005	ps	move kern_nanosleep to sys/syscallsubr.h Requested by: jhb
# 139825	07-Jan-2005	imp	/* -> /*- for license, minor formatting changes
# 139739	05-Jan-2005	jhb	- Move the function prototypes for kern_setrlimit() and kern_wait() to sys/syscallsubr.h where all the other kern_foo() prototypes live. - Resort kern_execve() while I'm there.
# 136221	07-Oct-2004	davidxu	Add an execve command for kse_thr_interrupt to allow libpthread to restore signal mask correctly, this is required by POSIX. Reviewed by: deischen
# 136152	05-Oct-2004	jhb	Rework how we store process times in the kernel such that we always store the raw values including for child process statistics and only compute the system and user timevals on demand. - Fix the various kern_wait() syscall wrappers to only pass in a rusage pointer if they are going to use the result. - Add a kern_getrusage() function for the ABI syscalls to use so that they don't have to play stackgap games to call getrusage(). - Fix the svr4_sys_times() syscall to just call calcru() to calculate the times it needs rather than calling getrusage() twice with associated stackgap, etc. - Add a new rusage_ext structure to store raw time stats such as tick counts for user, system, and interrupt time as well as a bintime of the total runtime. A new p_rux field in struct proc replaces the same inline fields from struct proc (i.e. p_[isu]ticks, p_[isu]u, and p_runtime). A new p_crux field in struct proc contains the "raw" child time usage statistics. ruadd() has been changed to handle adding the associated rusage_ext structures as well as the values in rusage. Effectively, the values in rusage_ext replace the ru_utime and ru_stime values in struct rusage. These two fields in struct rusage are no longer used in the kernel. - calcru() has been split into a static worker function calcru1() that calculates appropriate timevals for user and system time as well as updating the rux_[isu]u fields of a passed in rusage_ext structure. calcru() uses a copy of the process' p_rux structure to compute the timevals after updating the runtime appropriately if any of the threads in that process are currently executing. It also now only locks sched_lock internally while doing the rux_runtime fixup. calcru() now only requires the caller to hold the proc lock and calcru1() only requires the proc lock internally. calcru() also no longer allows callers to ask for an interrupt timeval since none of them actually did. - calcru() now correctly handles threads executing on other CPUs. - A new calccru() function computes the child system and user timevals by calling calcru1() on p_crux. Note that this means that any code that wants child times must now call this function rather than reading from p_cru directly. This function also requires the proc lock. - This finishes the locking for rusage and friends so some of the Giant locks in exit1() and kern_wait() are now gone. - The locking in ttyinfo() has been tweaked so that a shared lock of the proctree lock is used to protect the process group rather than the process group lock. By holding this lock until the end of the function we now ensure that the process/thread that we pick to dump info about will no longer vanish while we are trying to output its info to the console. Submitted by: bde (mostly) MFC after: 1 month
# 135764	24-Sep-2004	jhb	Sort forward declared structures.
# 132313	17-Jul-2004	dwmalone	Add a kern_setsockopt and kern_getsockopt which can read the option values from either user land or from the kernel. Use them for [gs]etsockopt and to clean up some calls to [gs]etsockopt in the Linux emulation code that uses the stackgap.
# 122088	04-Nov-2003	fjoe	Back out the following revisions: 1.36 +73 -60 src/sys/compat/linux/linux_ipc.c 1.83 +102 -48 src/sys/kern/sysv_shm.c 1.8 +4 -0 src/sys/sys/syscallsubr.h That change was intended to support vmware3, but wantrem parameter is useless because vmware3 uses SYSV shared memory to talk with X server and X server is native application. The patch worked because check for wantrem was not valid (wantrem and SHMSEG_REMOVED was never checked for SHMSEG_ALLOCATED segments). Add kern.ipc.shm_allow_removed (integer, rw) sysctl (default 0) which when set to 1 allows to return removed segments in shm_find_segment_by_shmid() and shm_find_segment_by_shmidx(). MFC after: 1 week
# 114749	05-May-2003	dwmalone	Split sendit into two parts. The first part, still called sendit, that does the copyin stuff and then calls the second part kern_sendit to do the hard work. Don't bother holding Giant during the copyin phase. The intent of this is to allow the Linux emulator to impliment send* syscalls without using the stackgap.
# 114724	05-May-2003	mbr	Change the semantics of sysv shm emulation to take a additional argument to the functions shm{at,ctl}1 and shm_find_segment_by_shmid{x}. The BSD semantics didn't allow the usage of shared segment after being marked for removal through IPC_RMID. The patch involves the following functions: - shmat - shmctl - shm_find_segment_by_shmid - shm_find_segment_by_shmidx - linux_shmat - linux_shmctl Submitted by: Orlando Bassotto <orlando.bassotto@ieo-research.it> Reviewed by: marcel
# 113685	18-Apr-2003	jhb	Rename do_sigprocmask() to kern_sigprocmask() and make it a global symbol so that it can be used by binary emulators.
# 110294	03-Feb-2003	ume	Break out the bind and connect syscalls to intend to make calling these syscalls internally easy. This is preparation for force coming IPv6 support for Linuxlator. Submitted by: dwmalone MFC after: 10 days
# 105950	25-Oct-2002	peter	Split 4.x and 5.x signal handling so that we can keep 4.x signal handling clean and functional as 5.x evolves. This allows some of the nasty bandaids in the 5.x codepaths to be unwound. Encapsulate 4.x signal handling under COMPAT_FREEBSD4 (there is an anti-foot-shooting measure in place, 5.x folks need this for a while) and finish encapsulating the older stuff under COMPAT_43. Since the ancient stuff is required on alpha (longjmp(3) passes a 'struct osigcontext ' to the current sigreturn(2), instead of the 'ucontext_t ' that sigreturn is supposed to take), add a compile time check to prevent foot shooting there too. Add uniform COMPAT_43 stubs for ia64/sparc64/powerpc. Tested on: i386, alpha, ia64. Compiled on sparc64 (a few days ago). Approved by: re
# 102946	04-Sep-2002	iedowse	Split up ptrace() into a wrapper that does the copying to and from user space and a kern_ptrace() implementation. Use the kern_*() version in the Linux emulation code to remove more stack gap uses. Approved by: des
# 102870	02-Sep-2002	iedowse	Split up __getcwd so that kernel callers of the internal version can specify whether the buffer is in user or system space.
# 102868	02-Sep-2002	iedowse	Split fcntl() into a wrapper and a kernel-callable kern_fcntl() implementation. The wrapper is responsible for copying additional structure arguments (struct flock) to and from userland.
# 102779	01-Sep-2002	iedowse	Split out a number of mostly VFS and signal related syscalls into a kernel-internal kern_*() version and a wrapper that is called via the syscall vector table. For paths and structure pointers, the internal version either takes a uio_seg parameter or requires the caller to copyin() the data to kernel memory as appropiate. This will permit emulation layers to use these syscalls without having to copy out translated arguments to the stack gap. Discussed on: -arch Review/suggestions: bde, jhb, peter, marcel