History log of /freebsd-10.1-release/sys/kern/syscalls.master
Revision	Date	Author	Comments
# 272461	02-Oct-2014	gjb	Copy stable/10@r272459 to releng/10.1 as part of the 10.1-RELEASE process. Approved by: re (implicit) Sponsored by: The FreeBSD Foundation
# 256281	10-Oct-2013	gjb	Copy head (r256279) to stable/10 as part of the 10.0-RELEASE cycle. Approved by: re (implicit) Sponsored by: The FreeBSD Foundation
# 255708	19-Sep-2013	jhb	Extend the support for exempting processes from being killed when swap is exhausted. - Add a new protect(1) command that can be used to set or revoke protection from arbitrary processes. Similar to ktrace it can apply a change to all existing descendants of a process as well as future descendants. - Add a new procctl(2) system call that provides a generic interface for control operations on processes (as opposed to the debugger-specific operations provided by ptrace(2)). procctl(2) uses a combination of idtype_t and an id to identify the set of processes on which to operate similar to wait6(). - Add a PROC_SPROTECT control operation to manage the protection status of a set of processes. MADV_PROTECT still works for backwards compatibility. - Add a p_flag2 to struct proc (and a corresponding ki_flag2 to kinfo_proc) the first bit of which is used to track if P_PROTECT should be inherited by new child processes. Reviewed by: kib, jilles (earlier version) Approved by: re (delphij) MFC after: 1 month
# 255490	12-Sep-2013	jhb	Fix the type of the idtype argument to wait6() in syscalls.master. Approved by: re (kib) MFC after: 1 week
# 255219	04-Sep-2013	pjd	Change the cap_rights_t type from uint64_t to a structure that we can extend in the future in a backward compatible (API and ABI) way. The cap_rights_t represents capability rights. We used to use one bit to represent one right, but we are running out of spare bits. Currently the new structure provides place for 114 rights (so 50 more than the previous cap_rights_t), but it is possible to grow the structure to hold at least 285 rights, although we can make it even larger if 285 rights won't be enough. The structure definition looks like this: struct cap_rights { uint64_t cr_rights[CAP_RIGHTS_VERSION + 2]; }; The initial CAP_RIGHTS_VERSION is 0. The top two bits in the first element of the cr_rights[] array contain total number of elements in the array - 2. This means if those two bits are equal to 0, we have 2 array elements. The top two bits in all remaining array elements should be 0. The next five bits in all array elements contain array index. Only one bit is used and bit position in this five-bit range defines array index. This means there can be at most five array elements in the future. To define new right the CAPRIGHT() macro must be used. The macro takes two arguments - an array index and a bit to set, eg.
#define CAP_PDKILL CAPRIGHT(1, 0x0000000000000800ULL) We still support aliases that combine few rights, but the rights have to belong to the same array element, eg: #define CAP_LOOKUP CAPRIGHT(0, 0x0000000000000400ULL) #define CAP_FCHMOD CAPRIGHT(0, 0x0000000000002000ULL) #define CAP_FCHMODAT (CAP_FCHMOD | CAP_LOOKUP) There is new API to manage the new cap_rights_t structure: cap_rights_t *cap_rights_init(cap_rights_t *rights, ...); void cap_rights_set(cap_rights_t *rights, ...); void cap_rights_clear(cap_rights_t *rights, ...); bool cap_rights_is_set(const cap_rights_t *rights, ...); bool cap_rights_is_valid(const cap_rights_t *rights); void cap_rights_merge(cap_rights_t *dst, const cap_rights_t *src); void cap_rights_remove(cap_rights_t *dst, const cap_rights_t *src); bool cap_rights_contains(const cap_rights_t *big, const cap_rights_t *little); Capability rights to the cap_rights_init(), cap_rights_set(), cap_rights_clear() and cap_rights_is_set() functions are provided by separating them with commas, eg: cap_rights_t rights; cap_rights_init(&rights, CAP_READ, CAP_WRITE, CAP_FSTAT); There is no need to terminate the list of rights, as those functions are actually macros that take care of the termination, eg: #define cap_rights_set(rights, ...) \ __cap_rights_set((rights), __VA_ARGS__, 0ULL) void __cap_rights_set(cap_rights_t *rights, ...); Thanks to using one bit as an array index we can assert in those functions that there are no two rights belonging to different array elements provided together. For example this is illegal and will be detected, because CAP_LOOKUP belongs to element 0 and CAP_PDKILL to element 1: cap_rights_init(&rights, CAP_LOOKUP | CAP_PDKILL); Providing several rights that belong to the same array element this way is correct, but is not advised. It should only be used for alias definitions. This commit also breaks compatibility with some existing Capsicum system calls, but I see no other way to do that.
This should be fine as Capsicum is still experimental and this change is not going to 9.x. Sponsored by: The FreeBSD Foundation
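The bit layout and the self-terminating variadic macro described in r255219 can be sketched in plain C. This is only an illustration under stated assumptions: every `SK_*` and `sk_*` name is hypothetical and does not come from the FreeBSD headers, the index field is assumed to occupy bits 61 to 57 (the five bits below the top two), and the sketch ignores the element count stored in the top two bits of element 0.

```c
#include <stdarg.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical sketch, not the FreeBSD implementation.
 * Assumed layout of one 64-bit element, following the commit message:
 * bits 63-62: element count minus 2 (element 0 only, ignored here),
 * bits 61-57: one-hot array index, remaining bits: the rights.
 */
#define SK_INDEX_SHIFT 57
#define SK_INDEX_MASK  (0x1FULL << SK_INDEX_SHIFT)
#define SK_RIGHT(idx, bit) ((1ULL << (SK_INDEX_SHIFT + (idx))) | (bit))

#define SK_LOOKUP SK_RIGHT(0, 0x0000000000000400ULL)
#define SK_FCHMOD SK_RIGHT(0, 0x0000000000002000ULL)
#define SK_PDKILL SK_RIGHT(1, 0x0000000000000800ULL)

#define SK_NELEMS 2 /* version 0: CAP_RIGHTS_VERSION + 2 elements */

struct sk_rights {
    uint64_t cr_rights[SK_NELEMS];
};

/* Return the array index encoded in a right, or -1 if the field is not one-hot. */
static int
sk_right_index(unsigned long long right)
{
    unsigned long long field = (right & SK_INDEX_MASK) >> SK_INDEX_SHIFT;
    int idx = 0;

    if (field == 0 || (field & (field - 1)) != 0)
        return -1; /* no index bit, or rights from different elements mixed */
    while ((field >>= 1) != 0)
        idx++;
    return idx;
}

/* Set every right in a 0ULL-terminated list; false if any right is malformed. */
static bool
sk_rights_vset(struct sk_rights *r, ...)
{
    va_list ap;
    bool ok = true;
    unsigned long long right;

    va_start(ap, r);
    while ((right = va_arg(ap, unsigned long long)) != 0) {
        int idx = sk_right_index(right);

        if (idx < 0 || idx >= SK_NELEMS) {
            ok = false;
            break;
        }
        r->cr_rights[idx] |= right;
    }
    va_end(ap);
    return ok;
}

/* The macro appends the terminator, so callers never write it themselves. */
#define sk_rights_set(r, ...) sk_rights_vset((r), __VA_ARGS__, 0ULL)
```

With these definitions, `sk_rights_set(&r, SK_LOOKUP, SK_PDKILL)` stores each right in its own element, while `sk_rights_set(&r, SK_LOOKUP | SK_PDKILL)` is rejected because two index bits are set at once, which mirrors the check the commit message describes.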
# 251526	08-Jun-2013	glebius	Add new system call - aio_mlock(). The name speaks for itself. It allows to perform the mlock(2) operation, which can consume a lot of time, under control of aio(4). Reviewed by: kib, jilles Sponsored by: Nginx, Inc.
# 250853	21-May-2013	kib	Fix the wait6(2) on 32bit architectures and for the compat32, by using the right type for the argument in syscalls.master. Also fix the posix_fallocate(2) and posix_fadvise(2) compat32 syscalls on the architectures which require padding of the 64bit argument. Noted and reviewed by: jhb Pointy hat to: kib MFC after: 1 week
# 250159	01-May-2013	jilles	Add pipe2() system call. The pipe2() function is similar to pipe() but allows setting FD_CLOEXEC and O_NONBLOCK (on both sides) as part of the function. If p points to two writable ints, pipe2(p, 0) is equivalent to pipe(p). If the pointer is not valid, behaviour differs: pipe2() writes into the array from the kernel like socketpair() does, while pipe() writes into the array from an architecture-specific assembler wrapper. Reviewed by: kan, kib
# 250154	01-May-2013	jilles	Add accept4() system call. The accept4() function, compared to accept(), allows setting the new file descriptor atomically close-on-exec and explicitly controlling the non-blocking status on the new socket. (Note that the latter point means that accept() is not equivalent to any form of accept4().) The linuxulator's accept4 implementation leaves a race window where the new file descriptor is not close-on-exec because it calls sys_accept(). This implementation leaves no such race window (by using falloc() flags). The linuxulator could be fixed and simplified by using the new code. Like accept(), accept4() is async-signal-safe, a cancellation point and permitted in capability mode.
# 248995	02-Apr-2013	mdf	Fix return type of extattr_set_* and fix rmextattr(8) utility. extattr_set_{fd,file,link} is logically a write(2)-like operation and should return ssize_t, just like extattr_get_*. Also, the user-space utility was using an int for the return value of extattr_get_* and extattr_list_*, both of which return an ssize_t. MFC after: 1 week
# 248599	21-Mar-2013	pjd	Implement chflagsat(2) system call, similar to fchmodat(2), but operates on file flags. Reviewed by: kib, jilles Sponsored by: The FreeBSD Foundation
# 248597	21-Mar-2013	pjd	- Make 'flags' argument to chflags(2), fchflags(2) and lchflags(2) of type u_long. Before this change it was of type int for syscalls, but prototypes in sys/stat.h and documentation for chflags(2) and fchflags(2) (but not for lchflags(2)) stated that it was u_long. Now some related functions use u_long type for flags (strtofflags(3), fflagstostr(3)). - Make path argument of type 'const char *' for consistency. Discussed on: arch Sponsored by: The FreeBSD Foundation
# 247667	02-Mar-2013	pjd	- Implement two new system calls: int bindat(int fd, int s, const struct sockaddr *addr, socklen_t addrlen); int connectat(int fd, int s, const struct sockaddr *name, socklen_t namelen); which allow to bind and connect respectively to a UNIX domain socket with a path relative to the directory associated with the given file descriptor 'fd'. - Add manual pages for the new syscalls. - Make the new syscalls available for processes in capability mode sandbox. - Add capability rights CAP_BINDAT and CAP_CONNECTAT that have to be present on the directory descriptor for the syscalls to work. - Update audit(4) to support those two new syscalls and to handle path in sockaddr_un structure relative to the given directory descriptor. - Update procstat(1) to recognize the new capability rights. - Document the new capability rights in cap_rights_limit(2). Sponsored by: The FreeBSD Foundation Discussed with: rwatson, jilles, kib, des
# 247602	01-Mar-2013	pjd	Merge Capsicum overhaul: - Capability is no longer separate descriptor type. Now every descriptor has set of its own capability rights. - The cap_new(2) system call is left, but it is no longer documented and should not be used in new code. - The new syscall cap_rights_limit(2) should be used instead of cap_new(2), which limits capability rights of the given descriptor without creating a new one. - The cap_getrights(2) syscall is renamed to cap_rights_get(2). - If CAP_IOCTL capability right is present we can further reduce allowed ioctls list with the new cap_ioctls_limit(2) syscall. List of allowed ioctls can be retrieved with cap_ioctls_get(2) syscall. - If CAP_FCNTL capability right is present we can further reduce fcntls that can be used with the new cap_fcntls_limit(2) syscall and retrieve them with cap_fcntls_get(2). - To support ioctl and fcntl white-listing the filedesc structure was heavily modified. - The audit subsystem, kdump and procstat tools were updated to recognize new syscalls. - Capability rights were revised and even though I tried hard to provide backward API and ABI compatibility there are some incompatible changes that are described in detail below: CAP_CREATE old behaviour: - Allow for openat(2)+O_CREAT. - Allow for linkat(2). - Allow for symlinkat(2). CAP_CREATE new behaviour: - Allow for openat(2)+O_CREAT. Added CAP_LINKAT: - Allow for linkat(2). ABI: Reuses CAP_RMDIR bit. - Allow to be target for renameat(2). Added CAP_SYMLINKAT: - Allow for symlinkat(2). Removed CAP_DELETE. Old behaviour: - Allow for unlinkat(2) when removing non-directory object. - Allow to be source for renameat(2). Removed CAP_RMDIR. Old behaviour: - Allow for unlinkat(2) when removing directory. Added CAP_RENAMEAT: - Required for source directory for the renameat(2) syscall. Added CAP_UNLINKAT (effectively it replaces CAP_DELETE and CAP_RMDIR): - Allow for unlinkat(2) on any object.
- Required if target of renameat(2) exists and will be removed by this call. Removed CAP_MAPEXEC. CAP_MMAP old behaviour: - Allow for mmap(2) with any combination of PROT_NONE, PROT_READ and PROT_WRITE. CAP_MMAP new behaviour: - Allow for mmap(2)+PROT_NONE. Added CAP_MMAP_R: - Allow for mmap(PROT_READ). Added CAP_MMAP_W: - Allow for mmap(PROT_WRITE). Added CAP_MMAP_X: - Allow for mmap(PROT_EXEC). Added CAP_MMAP_RW: - Allow for mmap(PROT_READ | PROT_WRITE). Added CAP_MMAP_RX: - Allow for mmap(PROT_READ | PROT_EXEC). Added CAP_MMAP_WX: - Allow for mmap(PROT_WRITE | PROT_EXEC). Added CAP_MMAP_RWX: - Allow for mmap(PROT_READ | PROT_WRITE | PROT_EXEC). Renamed CAP_MKDIR to CAP_MKDIRAT. Renamed CAP_MKFIFO to CAP_MKFIFOAT. Renamed CAP_MKNODE to CAP_MKNODEAT. CAP_READ old behaviour: - Allow pread(2). - Disallow read(2), readv(2) (if there is no CAP_SEEK). CAP_READ new behaviour: - Allow read(2), readv(2). - Disallow pread(2) (CAP_SEEK was also required). CAP_WRITE old behaviour: - Allow pwrite(2). - Disallow write(2), writev(2) (if there is no CAP_SEEK). CAP_WRITE new behaviour: - Allow write(2), writev(2). - Disallow pwrite(2) (CAP_SEEK was also required).
Added convenient defines: #define CAP_PREAD (CAP_SEEK | CAP_READ) #define CAP_PWRITE (CAP_SEEK | CAP_WRITE) #define CAP_MMAP_R (CAP_MMAP | CAP_SEEK | CAP_READ) #define CAP_MMAP_W (CAP_MMAP | CAP_SEEK | CAP_WRITE) #define CAP_MMAP_X (CAP_MMAP | CAP_SEEK | 0x0000000000000008ULL) #define CAP_MMAP_RW (CAP_MMAP_R | CAP_MMAP_W) #define CAP_MMAP_RX (CAP_MMAP_R | CAP_MMAP_X) #define CAP_MMAP_WX (CAP_MMAP_W | CAP_MMAP_X) #define CAP_MMAP_RWX (CAP_MMAP_R | CAP_MMAP_W | CAP_MMAP_X) #define CAP_RECV CAP_READ #define CAP_SEND CAP_WRITE #define CAP_SOCK_CLIENT \ (CAP_CONNECT | CAP_GETPEERNAME | CAP_GETSOCKNAME | CAP_GETSOCKOPT | \ CAP_PEELOFF | CAP_RECV | CAP_SEND | CAP_SETSOCKOPT | CAP_SHUTDOWN) #define CAP_SOCK_SERVER \ (CAP_ACCEPT | CAP_BIND | CAP_GETPEERNAME | CAP_GETSOCKNAME | \ CAP_GETSOCKOPT | CAP_LISTEN | CAP_PEELOFF | CAP_RECV | CAP_SEND | \ CAP_SETSOCKOPT | CAP_SHUTDOWN) Added defines for backward API compatibility: #define CAP_MAPEXEC CAP_MMAP_X #define CAP_DELETE CAP_UNLINKAT #define CAP_MKDIR CAP_MKDIRAT #define CAP_RMDIR CAP_UNLINKAT #define CAP_MKFIFO CAP_MKFIFOAT #define CAP_MKNOD CAP_MKNODAT #define CAP_SOCK_ALL (CAP_SOCK_CLIENT | CAP_SOCK_SERVER) Sponsored by: The FreeBSD Foundation Reviewed by: Christoph Mallon <christoph.mallon@gmx.de> Many aspects discussed with: rwatson, benl, jonathan ABI compatibility discussed with: kib
# 242958	13-Nov-2012	kib	Add the wait6(2) system call. It takes POSIX waitid()-like process designator to select a process which is waited for. The system call optionally returns siginfo_t which would be otherwise provided to SIGCHLD handler, as well as extended structure accounting for child and cumulative grandchild resource usage. Allow to get the current rusage information for non-exited processes as well, similar to Solaris. The explicit WEXITED flag is required to wait for exited processes, allowing for more fine-grained control of the events the waiter is interested in. Fix the handling of siginfo for WNOWAIT option for all wait*(2) family, by not removing the queued signal state. PR: standards/170346 Submitted by: "Jukka A. Ukkonen" <jau@iki.fi> MFC after: 1 month
# 239347	17-Aug-2012	davidxu	Implement syscall clock_getcpuclockid2, so we can get a clock id for process, thread or others we want to support. Use the syscall to implement POSIX API clock_getcpuclockid and pthread_getcpuclockid. PR: 168417
# 236026	25-May-2012	ed	Remove use of non-ISO-C integer types from system call tables. These files already use ISO-C-style integer types, so make them less inconsistent by preferring the standard types.
# 227776	20-Nov-2011	lstewart	- Add the ffclock_getcounter(), ffclock_getestimate() and ffclock_setestimate() system calls to provide feed-forward clock management capabilities to userspace processes. ffclock_getcounter() returns the current value of the kernel's feed-forward clock counter. ffclock_getestimate() returns the current feed-forward clock parameter estimates and ffclock_setestimate() updates the feed-forward clock parameter estimates. - Document the syscalls in the ffclock.2 man page. - Regenerate the script-derived syscall related files. Committed on behalf of Julien Ridoux and Darryl Veitch from the University of Melbourne, Australia, as part of the FreeBSD Foundation funded "Feed-Forward Clock Synchronization Algorithms" project. For more information, see http://www.synclab.org/radclock/ Submitted by: Julien Ridoux (jridoux at unimelb edu au)
# 227691	19-Nov-2011	ed	Improve access() parameter name consistency. The current code mixes the use of `flags' and `mode'. This is a bit confusing, since the faccessat() function has a `flag' parameter to store the AT_ flag. Make this less confusing by using the same name as used in the POSIX specification -- `amode'.
# 227070	04-Nov-2011	jhb	Add the posix_fadvise(2) system call. It is somewhat similar to madvise(2) except that it operates on a file descriptor instead of a memory region. It is currently only supported on regular files. Just as with madvise(2), the advice given to posix_fadvise(2) can be divided into two types. The first type provide hints about data access patterns and are used in the file read and write routines to modify the I/O flags passed down to VOP_READ() and VOP_WRITE(). These modes are thus filesystem independent. Note that to ease implementation (and since this API is only advisory anyway), only a single non-normal range is allowed per file descriptor. The second type of hints are used to hint to the OS that data will or will not be used. These hints are implemented via a new VOP_ADVISE(). A default implementation is provided which does nothing for the WILLNEED request and attempts to move any clean pages to the cache page queue for the DONTNEED request. This latter case required two other changes. First, a new V_CLEANONLY flag was added to vinvalbuf(). This requests vinvalbuf() to only flush clean buffers for the vnode from the buffer cache and to not remove any backing pages from the vnode. This is used to ensure clean pages are not wired into the buffer cache before attempting to move them to the cache page queue. The second change adds a new vm_object_page_cache() method. This method is somewhat similar to vm_object_page_remove() except that instead of freeing each page in the specified range, it attempts to move clean pages to the cache queue if possible. To preserve the ABI of struct file, the f_cdevpriv pointer is now reused in a union to point to the currently active advice region if one is present for regular files. Reviewed by: jilles, kib, arch@ Approved by: re (kib) MFC after: 1 month
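The two kinds of advice described in r227070 map onto ordinary posix_fadvise(2) calls. The following is a minimal sketch; the helper names `advise_sequential` and `advise_dontneed` are hypothetical and chosen only for illustration. POSIX_FADV_SEQUENTIAL stands in for the access-pattern hints and POSIX_FADV_DONTNEED for the "data will not be used" hints.

```c
#define _POSIX_C_SOURCE 200809L /* request POSIX.1-2008 declarations */
#include <fcntl.h>
#include <sys/types.h>

/*
 * First kind of hint: describe the access pattern. A length of 0 means
 * "through the end of the file". posix_fadvise() returns 0 on success
 * or an error number directly; it does not set errno.
 */
static int
advise_sequential(int fd)
{
    return posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);
}

/* Second kind of hint: this byte range will not be needed again soon. */
static int
advise_dontneed(int fd, off_t offset, off_t len)
{
    return posix_fadvise(fd, offset, len, POSIX_FADV_DONTNEED);
}
```

Because the interface is advisory, a caller would normally ignore a non-zero return rather than treat it as fatal.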
# 224987	18-Aug-2011	jonathan	Add experimental support for process descriptors A "process descriptor" file descriptor is used to manage processes without using the PID namespace. This is required for Capsicum's Capability Mode, where the PID namespace is unavailable. New system calls pdfork(2) and pdkill(2) offer the functional equivalents of fork(2) and kill(2). pdgetpid(2) allows querying the PID of the remote process for debugging purposes. The currently-unimplemented pdwait(2) will, in the future, allow querying rusage/exit status. In the interim, poll(2) may be used to check (and wait for) process termination. When a process is referenced by a process descriptor, it does not issue SIGCHLD to the parent, making it suitable for use in libraries---a common scenario when using library compartmentalisation from within large applications (such as web browsers). Some observers may note a similarity to Mach task ports; process descriptors provide a subset of this behaviour, but in a UNIX style. This feature is enabled by "options PROCDESC", but as with several other Capsicum kernel features, is not enabled by default in GENERIC 9.0. Reviewed by: jhb, kib Approved by: re (kib), mentor (rwatson) Sponsored by: Google Inc
# 224066	15-Jul-2011	jonathan	Add cap_new() and cap_getrights() system calls. Implement two previously-reserved Capsicum system calls: - cap_new() creates a capability to wrap an existing file descriptor - cap_getrights() queries the rights mask of a capability. Approved by: mentor (rwatson), re (Capsicum blanket) Sponsored by: Google Inc
# 220791	18-Apr-2011	mdf	Add the posix_fallocate(2) syscall. The default implementation in vop_stdallocate() is filesystem agnostic and will run as slow as a read/write loop in userspace; however, it serves to correctly implement the functionality for filesystems that do not implement a VOP_ALLOCATE. Note that __FreeBSD_version was already bumped today to 900036 for any ports which would like to use this function. Also reserve space in the syscall table for posix_fadvise(2). Reviewed by: -arch (previous version)
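From userland, the call added in r220791 is used roughly as in the sketch below. The helper names `reserve_space` and `file_size` are hypothetical; whether the allocation is fast depends on the filesystem, as the commit message notes.

```c
#define _POSIX_C_SOURCE 200809L /* request POSIX.1-2008 declarations */
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Reserve `len` bytes starting at offset 0. posix_fallocate() returns 0
 * on success or an error number directly; it does not set errno.
 */
static int
reserve_space(int fd, off_t len)
{
    return posix_fallocate(fd, 0, len);
}

/* Report the current file size, or -1 if fstat() fails. */
static off_t
file_size(int fd)
{
    struct stat st;

    return fstat(fd, &st) == 0 ? st.st_size : (off_t)-1;
}
```

After a successful `reserve_space(fd, n)` on an empty file, `file_size(fd)` reports `n`, since the call extends the file when the requested range lies past its end.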
# 220163	30-Mar-2011	trasz	Add rctl. It's used by racct to take user-configurable actions based on the set of rules it maintains and the current resource usage. It also provides userland API to manage that ruleset. Sponsored by: The FreeBSD Foundation Reviewed by: kib (earlier version)
# 219304	05-Mar-2011	trasz	Add two new system calls, setloginclass(2) and getloginclass(2). This makes it possible for the kernel to track login class the process is assigned to, which is required for RCTL. This change also makes setusercontext(3) call setloginclass(2) and makes it possible to retrieve current login class using id(1). Reviewed by: kib (as part of a larger patch)
# 219129	01-Mar-2011	rwatson	Add initial support for Capsicum's Capability Mode to the FreeBSD kernel, compiled conditionally on options CAPABILITIES: Add a new credential flag, CRED_FLAG_CAPMODE, which indicates that a subject (typically a process) is in capability mode. Add two new system calls, cap_enter(2) and cap_getmode(2), which allow setting and querying (but never clearing) the flag. Export the capability mode flag via process information sysctls. Sponsored by: Google, Inc. Reviewed by: anderson Discussed with: benl, kris, pjd Obtained from: Capsicum Project MFC after: 3 months
# 211998	30-Aug-2010	kib	Make the syscalls reserved for AFS usable by OpenAFS port. Submitted by: Benjamin Kaduk <kaduk mit edu> MFC after: 2 weeks
# 211838	26-Aug-2010	kib	Fix typo. Submitted by: Ben Kaduk <minimarmot gmail com>
# 209579	28-Jun-2010	kib	Count number of threads that enter and leave dynamically registered syscalls. On the dynamic syscall deregistration, wait until all threads leave the syscall code. This somewhat increases the safety of the loadable modules unloading. Reviewed by: jhb Tested by: pho MFC after: 1 month
# 203660	08-Feb-2010	ed	Remove unused LIBCOMPAT keyword from syscalls.master.
# 198508	27-Oct-2009	kib	Current pselect(3) is implemented in usermode and thus vulnerable to a well-known race condition, whose elimination was the reason the function was introduced in the first place. If the sigmask supplied as argument to pselect() enables a signal, the signal might be delivered before the thread calls select(2), causing a lost wakeup. Reimplement pselect() in kernel, making change of sigmask and sleep atomic. Since signal shall be delivered to the usermode, but sigmask restored, set TDP_OLDMASK and save old mask in td_oldsigmask. The TDP_OLDMASK should be cleared by ast() in case signal was not delivered during syscall execution. Reviewed by: davidxu Tested by: pho MFC after: 1 month
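The race fixed by r198508 is easiest to see from the caller's side. In the sketch below (hypothetical helper names, single-threaded process assumed), SIGUSR1 stays blocked except inside pselect(), so a signal that arrives before the wait begins remains pending and interrupts the call instead of being lost. The `raise_first` parameter exists only to demonstrate that case.

```c
#define _POSIX_C_SOURCE 200809L /* request POSIX.1-2008 declarations */
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/select.h>
#include <time.h>

static volatile sig_atomic_t got_signal;

static void
on_sigusr1(int signo)
{
    (void)signo;
    got_signal = 1;
}

/*
 * Wait up to `seconds` for SIGUSR1. Returns 1 if the handler ran,
 * 0 on timeout, -1 on error. If `raise_first` is non-zero the signal
 * is sent before the wait starts, which is the case an "unblock, then
 * select()" sequence in userland can miss.
 */
static int
wait_for_sigusr1(long seconds, int raise_first)
{
    struct sigaction sa;
    struct timespec ts;
    sigset_t block, old, wait_mask;
    int rc, saved_errno;

    got_signal = 0;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sigusr1;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) == -1)
        return -1;

    /* Keep SIGUSR1 blocked everywhere except inside pselect(). */
    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    if (sigprocmask(SIG_BLOCK, &block, &old) == -1)
        return -1;

    if (raise_first)
        raise(SIGUSR1); /* stays pending while blocked */

    wait_mask = old;
    sigdelset(&wait_mask, SIGUSR1);
    ts.tv_sec = seconds;
    ts.tv_nsec = 0;

    /* The mask swap and the sleep happen as one atomic step. */
    rc = pselect(0, NULL, NULL, NULL, &ts, &wait_mask);
    saved_errno = errno;

    sigprocmask(SIG_SETMASK, &old, NULL);
    if (rc == 0)
        return 0;
    if (rc == -1 && saved_errno == EINTR && got_signal)
        return 1;
    return -1;
}
```

Calling `wait_for_sigusr1(1, 1)` returns as soon as the pending signal is delivered inside pselect(), without waiting for the timeout.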
# 197636	30-Sep-2009	rwatson	Reserve system call numbers for Capsicum security framework capabilities, capability mode, and process descriptors: cap_new, cap_getrights, cap_enter, cap_getmode, pdfork, pdkill, pdgetpid, and pdwait. Obtained from: TrustedBSD Project Sponsored by: Google MFC after: 3 weeks
# 195458	08-Jul-2009	trasz	There is an optimization in chmod(1), that makes it not to call chmod(2) if the new file mode is the same as it was before; however, this optimization must be disabled for filesystems that support NFSv4 ACLs. Chmod uses pathconf(2) to determine whether this is the case - however, pathconf(2) always follows symbolic links, while the 'chmod -h' doesn't. This change adds lpathconf(3) to make it possible to solve that problem in a clean way. Reviewed by: rwatson (earlier version) Approved by: re (kib)
# 194910	24-Jun-2009	jhb	Change the ABI of some of the structures used by the SYSV IPC API: - The uid/cuid members of struct ipc_perm are now uid_t instead of unsigned short. - The gid/cgid members of struct ipc_perm are now gid_t instead of unsigned short. - The mode member of struct ipc_perm is now mode_t instead of unsigned short (this is merely a style bug). - The rather dubious padding fields for ABI compat with SV/I386 have been removed from struct msqid_ds and struct semid_ds. - The shm_segsz member of struct shmid_ds is now a size_t instead of an int. This removes the need for the shm_bsegsz member in struct shmid_kernel and should allow for complete support of SYSV SHM regions >= 2GB. - The shm_nattch member of struct shmid_ds is now an int instead of a short. - The shm_internal member of struct shmid_ds is now gone. The internal VM object pointer for SHM regions has been moved into struct shmid_kernel. - The existing __semctl(), msgctl(), and shmctl() system call entries are now marked COMPAT7 and new versions of those system calls which support the new ABI are now present. - The new system calls are assigned to the FBSD-1.1 version in libc. The FBSD-1.0 symbols in libc now refer to the old COMPAT7 system calls. - A simplistic framework for tagging system calls with compatibility symbol versions has been added to libc. Version tags are added to system calls by adding an appropriate __sym_compat() entry to src/lib/libc/include/compat.h. [1] PR: kern/16195 kern/113218 bin/129855 Reviewed by: arch@, rwatson Discussed with: kan, kib [1]
# 194894	24-Jun-2009	jhb	Deprecate the msgsys(), semsys(), and shmsys() system calls by moving them under COMPAT_FREEBSD[4567]. Starting with FreeBSD 5.0 the SYSV IPC API was implemented via direct system calls (e.g. msgctl(), msgget(), etc.) rather than indirecting through the var-args *sys() system calls. The shmsys() system call was already effectively deprecated for all but COMPAT_FREEBSD4 already as its implementation for the !COMPAT_FREEBSD4 case was to simply invoke nosys().
# 194833	24-Jun-2009	jhb	Add a new COMPAT7 flag for FreeBSD 7.x compatibility system calls.
# 194645	22-Jun-2009	jhb	Fix a typo in a comment.
# 194390	17-Jun-2009	jhb	- Add the ability to mix multiple flags separated by pipe ('|') characters in the type field of system call tables. Specifically, one can now use the 'NO' types as flags in addition to the 'COMPAT' types. For example, to tag 'COMPAT' system calls as living in a KLD via NOSTD. The COMPAT type is required to be listed first in this case. - Add new functions 'type()' and 'flag()' to the embedded awk script in makesyscalls.sh that return true if a requested flag is found in the type field ($3). The flag() function checks all of the flags in the field, but type() only checks the first flag. type() is meant to be used in the top-level "switch" statement and flag() should be used otherwise. - Retire the CPT_NOA type, it is now replaced with "COMPAT|NOARGS" using the flags approach. - Tweak the comment descriptions of COMPAT[46] system calls so that they say "freebsd[46] foo" rather than "old foo". - Document the COMPAT6 type. - Sync comments in compat32 syscall table with the master table.
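The difference between type() and flag() described in r194390 can be illustrated in C. The real helpers are awk functions inside makesyscalls.sh; the functions below are a hypothetical re-creation of the described behaviour, not a translation of that script, and the names `field_has_type` and `field_has_flag` are invented for this sketch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True only if `name` is the first '|'-separated entry in `field`. */
static bool
field_has_type(const char *field, const char *name)
{
    size_t len = strcspn(field, "|");

    return len == strlen(name) && strncmp(field, name, len) == 0;
}

/* True if `name` matches any '|'-separated entry in `field`. */
static bool
field_has_flag(const char *field, const char *name)
{
    size_t want = strlen(name);
    const char *p = field;

    while (*p != '\0') {
        size_t len = strcspn(p, "|");

        if (len == want && strncmp(p, name, want) == 0)
            return true;
        p += len;
        if (*p == '|')
            p++; /* step over the separator */
    }
    return false;
}
```

For the field `COMPAT|NOARGS`, `field_has_type` matches only `COMPAT`, while `field_has_flag` matches both `COMPAT` and `NOARGS`. Whole entries are compared, so `NO` does not match `NOSTD`.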
# 194384	17-Jun-2009	jhb	Remove the now-unused NOIMPL flag. It serves no useful purpose given the existing UNIMPL and NOSTD types.
# 194383	17-Jun-2009	jhb	- NOSTD results in lkmressys being used instead of lkmssys. - Mark nfsclnt as UNIMPL. It should have been NOSTD instead of NOIMPL back when it lived in nfsclient.ko, but it was removed from that a long time ago.
# 194262	15-Jun-2009	jhb	Add a new 'void closefrom(int lowfd)' system call. When called, it closes any open file descriptors >= 'lowfd'. It is largely identical to the same function on other operating systems such as Solaris, DFly, NetBSD, and OpenBSD. One difference from other *BSD is that this closefrom() does not fail with any errors. In practice, while the manpages for NetBSD and OpenBSD claim that they return EINTR, they ignore internal errors from close() and never return EINTR. DFly does return EINTR, but for the common use case (closing fd's prior to execve()), the caller really wants all fd's closed and returning EINTR just forces callers to call closefrom() in a loop until it stops failing. Note that this implementation of closefrom(2) does not make any effort to resolve userland races with open(2) in other threads. As such, it is not multithread safe. Submitted by: rwatson (initial version) Reviewed by: rwatson MFC after: 2 weeks
# 191673	29-Apr-2009	jamie	Introduce the extensible jail framework, using the same "name=value" interface as nmount(2). Three new system calls are added: * jail_set, to create jails and change the parameters of existing jails. This replaces jail(2). * jail_get, to read the parameters of existing jails. This replaces the security.jail.list sysctl. * jail_remove to kill off a jail's processes and remove the jail. Most jail parameters may now be changed after creation, and jails may be set to exist without any attached processes. The current jail(2) system call still exists, though it is now a stub to jail_set(2). Approved by: bz (mentor)
# 184789	09-Nov-2008	ed	Mark uname(), getdomainname() and setdomainname() with COMPAT_FREEBSD4. Looking at our source code history, it seems the uname(), getdomainname() and setdomainname() system calls got deprecated somewhere after FreeBSD 1.1, but they have never been phased out properly. Because we don't have a COMPAT_FREEBSD1, just use COMPAT_FREEBSD4. Also fix the Linuxolator to build without the setdomainname() routine by just making it call userland_sysctl on kern.domainname. Also replace the setdomainname()'s implementation to use this approach, because we're duplicating code with sysctl_domainname(). I wasn't able to keep these three routines working in our COMPAT_FREEBSD32, because that would require yet another keyword for syscalls.master (COMPAT4+NOPROTO). Because this routine is probably unused already, this won't be a problem in practice. If it turns out to be a problem, we'll just restore this functionality. Reviewed by: rdivacky, kib
# 184588	03-Nov-2008	dfr	Implement support for RPCSEC_GSS authentication to both the NFS client and server. This replaces the RPC implementation of the NFS client and server with the newer RPC implementation originally developed (actually ported from the userland sunrpc code) to support the NFS Lock Manager. I have tested this code extensively and I believe it is stable and that performance is at least equal to the legacy RPC implementation. The NFS code currently contains support for both the new RPC implementation and the older legacy implementation inherited from the original NFS codebase. The default is to use the new implementation - add the NFS_LEGACYRPC option to fall back to the old code. When I merge this support back to RELENG_7, I will probably change this so that users have to 'opt in' to get the new code. To use RPCSEC_GSS on either client or server, you must build a kernel which includes the KGSSAPI option and the crypto device. On the userland side, you must build at least a new libc, mountd, mount_nfs and gssd. You must install new versions of /etc/rc.d/gssd and /etc/rc.d/nfsd and add 'gssd_enable=YES' to /etc/rc.conf. As long as gssd is running, you should be able to mount an NFS filesystem from a server that requires RPCSEC_GSS authentication. The mount itself can happen without any kerberos credentials but all access to the filesystem will be denied unless the accessing user has a valid ticket file in the standard place (/tmp/krb5cc_<uid>). There is currently no support for situations where the ticket file is in a different place, such as when the user logged in via SSH and has delegated credentials from that login. This restriction is also present in Solaris and Linux. In theory, we could improve this in future, possibly using Brooks Davis' implementation of variant symlinks. Supporting RPCSEC_GSS on a server is nearly as simple. You must create service creds for the server in the form 'nfs/<fqdn>@<REALM>' and install them in /etc/krb5.keytab. 
The standard heimdal utility ktutil makes this fairly easy. After the service creds have been created, you can add a '-sec=krb5' option to /etc/exports and restart both mountd and nfsd. The only other difference an administrator should notice is that nfsd doesn't fork to create service threads any more. In normal operation, there will be two nfsd processes, one in userland waiting for TCP connections and one in the kernel handling requests. The latter process will create as many kthreads as required - these should be visible via 'top -H'. The code has some support for varying the number of service threads according to load but initially at least, nfsd uses a fixed number of threads according to the value supplied to its '-n' option. Sponsored by: Isilon Systems MFC after: 1 month
# 183361	25-Sep-2008	jhb	Tidy up a few things with syscall generation: - Instead of using a syscall slot (370) just to get a function prototype for lkmressys(), add an explicit function prototype to <sys/sysent.h>. This also removes unused special case checks for 'lkmressys' from makesyscalls.sh. - Instead of having magic logic in makesyscalls.sh to only generate a function prototype the first time 'lkmnosys' is seen, make 'NODEF' always not generate a function prototype and include an explicit prototype for 'lkmnosys' in <sys/sysent.h>. - As a result of the fix in (2), update the LKM syscall entries in the freebsd32 syscall table to use 'lkmnosys' rather than 'nosys'. - Use NOPROTO for the __syscall() entry (198) in the native ABI. This avoids the need for magic logic in makesyscalls.sh to only generate a function prototype the first time 'nosys' is encountered.
# 182123	24-Aug-2008	rwatson	When MPSAFE ttys were merged, a new BSM audit event identifier was allocated for posix_openpt(2). Unfortunately, that identifier conflicts with other events already allocated to other systems in OpenBSM. Assign a new globally unique identifier and conform better to the AUE_ event naming scheme. This is a stopgap until a new OpenBSM import is done with the correct identifier, so we'll maintain this as a local diff in svn until then. Discussed with: ed Obtained from: TrustedBSD Project
# 181972	21-Aug-2008	obrien	Add comments on NOARGS, NODEF, and NOPROTO.
# 181905	20-Aug-2008	ed	Integrate the new MPSAFE TTY layer to the FreeBSD operating system. The last half year I've been working on a replacement TTY layer for the FreeBSD kernel. The new TTY layer was designed to improve the following: - Improved driver model: The old TTY layer has a driver model that is not abstract enough to make it friendly to use. A good example is the output path, where the device drivers directly access the output buffers. This means that an in-kernel PPP implementation must always convert network buffers into TTY buffers. If a PPP implementation would be built on top of the new TTY layer (still needs a hooks layer, though), it would allow the PPP implementation to directly hand the data to the TTY driver. - Improved hotplugging: With the old TTY layer, it isn't entirely safe to destroy TTY's from the system. This implementation has a two-step destructing design, where the driver first abandons the TTY. After all threads have left the TTY, the TTY layer calls a routine in the driver, which can be used to free resources (unit numbers, etc). The pts(4) driver also implements this feature, which means posix_openpt() will now return PTY's that are created on the fly. - Improved performance: One of the major improvements is the per-TTY mutex, which is expected to improve scalability when compared to the old Giant locking. Another change is the unbuffered copying to userspace, which is both used on TTY device nodes and PTY masters. Upgrading should be quite straightforward. Unlike previous versions, existing kernel configuration files do not need to be changed, except when they reference device drivers that are listed in UPDATING. Obtained from: //depot/projects/mpsafetty/... Approved by: philip (ex-mentor) Discussed: on the lists, at BSDCan, at the DevSummit Sponsored by: Snow B.V., the Netherlands dcons(4) fixed by: kan
# 178888	09-May-2008	julian	Add code to allow the system to handle multiple routing tables. This particular implementation is designed to be fully backwards compatible and to be MFC-able to 7.x (and 6.x) Currently the only protocol that can make use of the multiple tables is IPv4 Similar functionality exists in OpenBSD and Linux. From my notes: ----- One thing where FreeBSD has been falling behind, and which by chance I have some time to work on is "policy based routing", which allows different packet streams to be routed by more than just the destination address. Constraints: ------------ I want to make some form of this available in the 6.x tree (and by extension 7.x) , but FreeBSD in general needs it so I might as well do it in -current and back port the portions I need. One of the ways that this can be done is to have the ability to instantiate multiple kernel routing tables (which I will now refer to as "Forwarding Information Bases" or "FIBs" for political correctness reasons). Which FIB a particular packet uses to make the next hop decision can be decided by a number of mechanisms. The policies these mechanisms implement are the "Policies" referred to in "Policy based routing". One of the constraints I have if I try to back port this work to 6.x is that it must be implemented as a EXTENSION to the existing ABIs in 6.x so that third party applications do not need to be recompiled in timespan of the branch. This first version will not have some of the bells and whistles that will come with later versions. It will, for example, be limited to 16 tables in the first commit. Implementation method, Compatible version. (part 1) ------------------------------- For this reason I have implemented a "sufficient subset" of a multiple routing table solution in Perforce, and back-ported it to 6.x. (also in Perforce though not always caught up with what I have done in -current/P4). 
The subset allows a number of FIBs to be defined at compile time (8 is sufficient for my purposes in 6.x) and implements the changes needed to allow IPV4 to use them. I have not done the changes for ipv6 simply because I do not need it, and I do not have enough knowledge of ipv6 (e.g. neighbor discovery) needed to do it. Other protocol families are left untouched and should there be users with proprietary protocol families, they should continue to work and be oblivious to the existence of the extra FIBs. To understand how this is done, one must know that the current FIB code starts everything off with a single dimensional array of pointers to FIB head structures (One per protocol family), each of which in turn points to the trie of routes available to that family. The basic change in the ABI compatible version of the change is to extent that array to be a 2 dimensional array, so that instead of protocol family X looking at rt_tables[X] for the table it needs, it looks at rt_tables[Y][X] when for all protocol families except ipv4 Y is always 0. Code that is unaware of the change always just sees the first row of the table, which of course looks just like the one dimensional array that existed before. The entry points rtrequest(), rtalloc(), rtalloc1(), rtalloc_ign() are all maintained, but refer only to the first row of the array, so that existing callers in proprietary protocols can continue to do the "right thing". Some new entry points are added, for the exclusive use of ipv4 code called in_rtrequest(), in_rtalloc(), in_rtalloc1() and in_rtalloc_ign(), which have an extra argument which refers the code to the correct row. In addition, there are some new entry points (currently called rtalloc_fib() and friends) that check the Address family being looked up and call either rtalloc() (and friends) if the protocol is not IPv4 forcing the action to row 0 or to the appropriate row if it IS IPv4 (and that info is available). 
These are for calling from code that is not specific to any particular protocol. The way these are implemented would change in the non ABI preserving code to be added later. One feature of the first version of the code is that for ipv4, the interface routes show up automatically on all the FIBs, so that no matter what FIB you select you always have the basic direct attached hosts available to you. (rtinit() does this automatically). You CAN delete an interface route from one FIB should you want to but by default it's there. ARP information is also available in each FIB. It's assumed that the same machine would have the same MAC address, regardless of which FIB you are using to get to it. This brings us as to how the correct FIB is selected for an outgoing IPV4 packet. Firstly, all packets have a FIB associated with them. if nothing has been done to change it, it will be FIB 0. The FIB is changed in the following ways. Packets fall into one of a number of classes. 1/ locally generated packets, coming from a socket/PCB. Such packets select a FIB from a number associated with the socket/PCB. This in turn is inherited from the process, but can be changed by a socket option. The process in turn inherits it on fork. I have written a utility call setfib that acts a bit like nice.. setfib -3 ping target.example.com # will use fib 3 for ping. It is an obvious extension to make it a property of a jail but I have not done so. It can be achieved by combining the setfib and jail commands. 2/ packets received on an interface for forwarding. By default these packets would use table 0, (or possibly a number settable in a sysctl(not yet)). but prior to routing the firewall can inspect them (see below). (possibly in the future you may be able to associate a FIB with packets received on an interface.. An ifconfig arg, but not yet.) 3/ packets inspected by a packet classifier, which can arbitrarily associate a fib with it on a packet by packet basis. 
A fib assigned to a packet by a packet classifier (such as ipfw) would over-ride a fib associated by a more default source. (such as cases 1 or 2). 4/ a tcp listen socket associated with a fib will generate accept sockets that are associated with that same fib. 5/ Packets generated in response to some other packet (e.g. reset or icmp packets). These should use the FIB associated with the packet being responded to. 6/ Packets generated during encapsulation. gif, tun and other tunnel interfaces will encapsulate using the FIB that was in effect with the process that set up the tunnel. thus setfib 1 ifconfig gif0 [tunnel instructions] will set the fib for the tunnel to use to be fib 1. Routing messages would be associated with their process, and thus select one FIB or another. messages from the kernel would be associated with the fib they refer to and would only be received by a routing socket associated with that fib. (not yet implemented) In addition Netstat has been edited to be able to cope with the fact that the array is now 2 dimensional. (It looks in system memory using libkvm (!)). Old versions of netstat see only the first FIB. In addition two sysctls are added to give: a) the number of FIBs compiled in (active) b) the default FIB of the calling process. Early testing experience: ------------------------- Basically our (IronPort's) appliance does this functionality already using ipfw fwd but that method has some drawbacks. For example, it can't fully simulate a routing table because it can't influence the socket's choice of local address when a connect() is done. Testing during the generating of these changes has been remarkably smooth so far. Multiple tables have co-existed with no notable side effects, and packets have been routed accordingly. ipfw has grown 2 new keywords: setfib N ip from any to any count ip from any to any fib N In pf there seems to be a requirement to be able to give symbolic names to the fibs but I do not have that capacity. 
I am not sure if it is required. SCTP has interestingly enough built in support for this, called VRFs in Cisco parlance. It will be interesting to see how that handles it when it suddenly actually does something. Where to next: -------------------- After committing the ABI compatible version and MFCing it, I'd like to proceed in a forward direction in -current. This will result in some roto-tilling in the routing code. Firstly: the current code's idea of having a separate tree per protocol family, all of the same format, and pointed to by the 1 dimensional array is a bit silly. Especially when one considers that there is code that makes assumptions about every protocol having the same internal structures there. Some protocols don't WANT that sort of structure. (for example the whole idea of a netmask is foreign to appletalk). This needs to be made opaque to the external code. My suggested first change is to add routing method pointers to the 'domain' structure, along with information pointing the data. Instead of having an array of pointers to uniform structures, there would be an array pointing to the 'domain' structures for each protocol address domain (protocol family), and the methods this reached would be called. The methods would have an argument that gives FIB number, but the protocol would be free to ignore it. When the ABI can be changed it raises the possibility of the addition of a fib entry into the "struct route". Currently, the structure contains the sockaddr of the destination, and the resulting fib entry. To make this work fully, one could add a fib number so that given an address and a fib, one can find the third element, the fib entry. Interaction with the ARP layer/ LL layer would need to be revisited as well. Qing Li has been working on this already. This work was sponsored by Ironport Systems/Cisco Reviewed by: several including rwatson, bz and mlair (parts each) Obtained from: Ironport systems/Cisco
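The two-dimensional lookup described in r178888 (rt_tables[Y][X], with Y forced to 0 for every family except IPv4) can be sketched in C as below. This is a minimal illustration, not the kernel code: every name prefixed with sketch_ or SKETCH_ is hypothetical, the family count and the IPv4 family number are placeholder values, and only the limit of 16 tables comes from the commit message.

```c
#include <stddef.h>

/* Hypothetical sizes and names, for illustration only. */
#define SKETCH_NUM_FIBS 16 /* the commit limits the first version to 16 tables */
#define SKETCH_AF_MAX    8 /* placeholder count of protocol families */
#define SKETCH_AF_INET   2 /* placeholder index for IPv4 */

struct sketch_fib_head {
	int fib;    /* row (FIB number) this head belongs to */
	int family; /* column (protocol family) this head belongs to */
};

static struct sketch_fib_head sketch_heads[SKETCH_NUM_FIBS][SKETCH_AF_MAX];
static struct sketch_fib_head *sketch_rt_tables[SKETCH_NUM_FIBS][SKETCH_AF_MAX];

/* Populate rows: IPv4 gets a head in every row, other families only in row 0. */
static void
sketch_rt_init(void)
{
	for (int fib = 0; fib < SKETCH_NUM_FIBS; fib++) {
		for (int af = 0; af < SKETCH_AF_MAX; af++) {
			if (fib != 0 && af != SKETCH_AF_INET)
				continue; /* unused slot stays NULL */
			sketch_heads[fib][af].fib = fib;
			sketch_heads[fib][af].family = af;
			sketch_rt_tables[fib][af] = &sketch_heads[fib][af];
		}
	}
}

/* Only IPv4 honours the requested FIB; every other family is forced to row 0. */
static struct sketch_fib_head *
sketch_fib_lookup(int fib, int family)
{
	if (family < 0 || family >= SKETCH_AF_MAX)
		return NULL;
	if (family != SKETCH_AF_INET)
		fib = 0;
	else if (fib < 0 || fib >= SKETCH_NUM_FIBS)
		return NULL;
	return sketch_rt_tables[fib][family];
}
```

After sketch_rt_init(), sketch_fib_lookup(3, SKETCH_AF_INET) returns the head in row 3, while the same call for any other family returns the row 0 head, which is how code unaware of the extra FIBs keeps seeing the original one-dimensional view.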
# 177788	31-Mar-2008	kib	Add the openat(), fexecve() and other *at() syscalls to the table. Based on the submission by rdivacky, sponsored by Google Summer of Code 2007 Reviewed by: rwatson, rdivacky Tested by: pho
# 177633	26-Mar-2008	dfr	Add the new kernel-mode NFS Lock Manager. To use it instead of the user-mode lock manager, build a kernel with the NFSLOCKD option and add '-k' to 'rpc_lockd_flags' in rc.conf. Highlights include: * Thread-safe kernel RPC client - many threads can use the same RPC client handle safely with replies being de-multiplexed at the socket upcall (typically driven directly by the NIC interrupt) and handed off to whichever thread matches the reply. For UDP sockets, many RPC clients can share the same socket. This allows the use of a single privileged UDP port number to talk to an arbitrary number of remote hosts. * Single-threaded kernel RPC server. Adding support for multi-threaded server would be relatively straightforward and would follow approximately the Solaris KPI. A single thread should be sufficient for the NLM since it should rarely block in normal operation. * Kernel mode NLM server supporting cancel requests and granted callbacks. I've tested the NLM server reasonably extensively - it passes both my own tests and the NFS Connectathon locking tests running on Solaris, Mac OS X and Ubuntu Linux. * Userland NLM client supported. While the NLM server doesn't have support for the local NFS client's locking needs, it does have to field async replies and granted callbacks from remote NLMs that the local client has contacted. We relay these replies to the userland rpc.lockd over a local domain RPC socket. * Robust deadlock detection for the local lock manager. In particular it will detect deadlocks caused by a lock request that covers more than one blocking request. As required by the NLM protocol, all deadlock detection happens synchronously - a user is guaranteed that if a lock request isn't rejected immediately, the lock will eventually be granted. The old system allowed for a 'deferred deadlock' condition where a blocked lock request could wake up and find that some other deadlock-causing lock owner had beaten them to the lock. 
* Since both local and remote locks are managed by the same kernel locking code, local and remote processes can safely use file locks for mutual exclusion. Local processes have no fairness advantage compared to remote processes when contending to lock a region that has just been unlocked - the local lock manager enforces a strict first-come first-served model for both local and remote lockers. Sponsored by: Isilon Systems PR: 95247 107555 115524 116679 MFC after: 2 weeks
# 177597	25-Mar-2008	ru	Fixed type of the fourth argument of cpuset_{get,set}affinity(2) to be size_t. Prodded by: davidxu
# 177091	12-Mar-2008	jeff	Remove kernel support for M:N threading. While the KSE project was quite successful in bringing threading to FreeBSD, the M:N approach taken by the kse library was never developed to its full potential. Backwards compatibility will be provided via libmap.conf for dynamically linked binaries and static binaries will be broken.
# 176730	02-Mar-2008	jeff	Add cpuset, an api for thread to cpu binding and cpu resource grouping and assignment. - Add a reference to a struct cpuset in each thread that is inherited from the thread that created it. - Release the reference when the thread is destroyed. - Add prototypes for syscalls and macros for manipulating cpusets in sys/cpuset.h - Add syscalls to create, get, and set new numbered cpusets: cpuset(), cpuset_{get,set}id() - Add syscalls for getting and setting affinity masks for cpusets or individual threads: cpuset_{get,set}affinity() - Add types for the 'level' and 'which' parameters for the cpuset. This will permit expansion of the api to cover cpu masks for other objects identifiable with an id_t integer. For example, IRQs and Jails may be coming soon. - The root set 0 contains all valid cpus. All threads initially belong to cpuset 1. This permits migrating all threads off of certain cpus to reserve them for special applications. Sponsored by: Nokia Discussed with: arch, rwatson, brooks, davidxu, deischen Reviewed by: antoine
# 176215	12-Feb-2008	ru	Change readlink(2)'s return type and type of the last argument to match POSIX. Prodded by: Alexey Lyashkov
# 175517	20-Jan-2008	rwatson	Use audit events AUE_SHMOPEN and AUE_SHMUNLINK with new system calls shm_open() and shm_unlink(). More auditing will need to be done for these calls to capture arguments properly.
# 175164	08-Jan-2008	jhb	Add a new file descriptor type for IPC shared memory objects and use it to implement shm_open(2) and shm_unlink(2) in the kernel: - Each shared memory file descriptor is associated with a swap-backed vm object which provides the backing store. Each descriptor starts off with a size of zero, but the size can be altered via ftruncate(2). The shared memory file descriptors also support fstat(2). read(2), write(2), ioctl(2), select(2), poll(2), and kevent(2) are not supported on shared memory file descriptors. - shm_open(2) and shm_unlink(2) are now implemented as system calls that manage shared memory file descriptors. The virtual namespace that maps pathnames to shared memory file descriptors is implemented as a hash table where the hash key is generated via the 32-bit Fowler/Noll/Vo hash of the pathname. - As an extension, the constant 'SHM_ANON' may be specified in place of the path argument to shm_open(2). In this case, an unnamed shared memory file descriptor will be created similar to the IPC_PRIVATE key for shmget(2). Note that the shared memory object can still be shared among processes by sharing the file descriptor via fork(2) or sendmsg(2), but it is unnamed. This effectively serves to implement the getmemfd() idea bandied about the lists several times over the years. - The backing store for shared memory file descriptors are garbage collected when they are not referenced by any open file descriptors or the shm_open(2) virtual namespace. Submitted by: dillon, peter (previous versions) Submitted by: rwatson (I based this on his version) Reviewed by: alc (suggested converting getmemfd() to shm_open())
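r175164 says the shm_open(2) namespace is a hash table keyed by the 32-bit Fowler/Noll/Vo hash of the pathname. A minimal sketch of such a hash is shown below. The log does not say which FNV variant or initial value the kernel uses, so this assumes plain FNV-1 with the widely published offset basis; the sketch_ names and the power-of-two bucket mask are hypothetical.

```c
#include <stdint.h>

/* Assumed constants: the commonly published 32-bit FNV-1 parameters. */
#define SKETCH_FNV1_32_INIT 0x811c9dc5u
#define SKETCH_FNV_32_PRIME 0x01000193u

/* FNV-1: multiply by the prime, then XOR in the next byte of the string. */
static uint32_t
sketch_fnv1_32_str(const char *s)
{
	uint32_t h = SKETCH_FNV1_32_INIT;

	while (*s != '\0') {
		h *= SKETCH_FNV_32_PRIME;
		h ^= (uint8_t)*s++;
	}
	return h;
}

/* Map a pathname to a bucket; mask must be (power of two) - 1. */
static uint32_t
sketch_shm_bucket(const char *path, uint32_t mask)
{
	return sketch_fnv1_32_str(path) & mask;
}
```

A lookup would hash the path, walk the chain in that bucket, and compare the stored pathname to resolve collisions; that part is omitted here because the log does not describe it.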
# 172819	19-Oct-2007	emaste	Put comments about syscalls by the correct ones, and use the correct syscall number in the comment.
# 171859	16-Aug-2007	davidxu	Add thr_kill2 syscall which sends a signal to a thread in another process. Submitted by: Tijl Coosemans tijl at ulyssis dot org Approved by: re (kensmith)
# 171209	04-Jul-2007	peter	Create new syscalls for mmap(), lseek(), pread(), pwrite(), truncate() and ftruncate(), but without the pad arg. There are several reasons for this. Consider 'mmap()'. On AMD64, the function call (and syscall) ABI allow for 6 register arguments. Additional arguments go on the stack. mmap(2) has 6 arguments. However, the syscall definition has an extra 'int pad' argument. This pushes it to 7 arguments, which means one must spill into the memory stack. Since the kernel API doesn't match userland API, we have a hack in libc - libc/sys/mmap.c. This implements the userland API by calling __syscall() with an extra argument and the pad argument, for a total of 8 args. This is all unnecessary and inconvenient for several things, including the kernel's syscall handler code which now has to handle merging stack arguments with register arguments. It is a big deal for certain 3rd party code. I'm adding libc glue to make the transition totally painless. I had intended to mark the old syscalls as COMPAT6, but the potential to shoot your feet by building a new kernel without COMPAT_FREEBSD6 but with a slightly older userland was too great. For now, they have manual "freebsd6_" prefixes rather than being COMPAT6. They will go back to being marked 'COMPAT6' after 7-stable starts. Approved by: re (kensmith)
# 163953	03-Nov-2006	rrs	Ok, here it is, we finally add SCTP to current. Note that this work is not just mine, but it is also the works of Peter Lei and Michael Tuexen. They both are my two key other developers working on the project.. and they need ata-boy's too: ** peterlei@cisco.com tuexen@fh-muenster.de ** I did do a make sysent which updated the syscall's and sysproto.. I hope that is correct... without it you don't build since we have new syscalls for SCTP :-0 So go out and look at the NOTES, add option SCTP (make sure inet and inet6 are present too) and play with SCTP. I will see about committing some test tools I have after I figure out where I should place them. I also have a lib (libsctp.a) that adds some of the missing socketapi functions that I need to put into lib's.. I will talk to George about this :-) There may still be some 64 bit issues in here, none of us have a 64 bit processor to test with yet.. Michael may have a MAC but that's another beast too.. If you have a mac and want to use SCTP contact Michael he maintains a web site with a loadable module with this code :-) Reviewed by: gnn Approved by: gnn
# 163449	17-Oct-2006	davidxu	o Add keyword volatile for user mutex owner field. o Fix type consistency problem by using type long for old umtx and wait channel. o Rename casuptr to casuword.
# 162991	03-Oct-2006	rwatson	Audit creat() system call (compat code), and change type for getpagesize(), which isn't actually being audited anyway. MFC after: 3 days Obtained from: TrustedBSD Project
# 162497	21-Sep-2006	davidxu	Replace system call thr_getscheduler, thr_setscheduler, thr_setschedparam with rtprio_thread, while rtprio system call is for process only, the new system call rtprio_thread is responsible for LWP.
# 162373	17-Sep-2006	rwatson	AUE_SIGALTSTACK instead of AUE_SIGPENDING for sigaltstack(). Obtained from: TrustedBSD Project MFC after: 3 days
# 161952	03-Sep-2006	rwatson	Assign proper audit event identifiers to a number of system calls not covered in previous passes: - sysarch, rtprio - clock_settime - preadv/pwritev - __getcwd - kqueue - fhstatfs - kldunloadf Obtained from: TrustedBSD Project
# 161946	03-Sep-2006	rwatson	Use AUE_NTP_ADJTIME for ntp_adjtime() instead of AUE_ADJTIME. Obtained from: TrustedBSD Project
# 161678	28-Aug-2006	davidxu	This is the initial version of POSIX priority mutex support. A new userland mutex structure is added as follows: struct umutex { __lwpid_t m_owner; uint32_t m_flags; uint32_t m_ceilings[2]; uint32_t m_spare[4]; }; The m_owner field represents the owner thread; it is a thread id. In the non-contested case, userland can simply use atomic_cmpset_int to lock the mutex. If the mutex is contested, the high order bit will be set, and userland should do locking and unlocking via a kernel syscall. Flag UMUTEX_PRIO_INHERIT represents pthread's PTHREAD_PRIO_INHERIT mutex; when contention happens, the kernel should do priority propagating. Flag UMUTEX_PRIO_PROTECT indicates it is pthread's PTHREAD_PRIO_PROTECT mutex. Userland should initialize m_owner to the contested state UMUTEX_CONTESTED, so that atomic_cmpset_int fails and the kernel syscall is invoked to do locking. This is because, for such a mutex, the kernel should always boost the thread's priority before it can lock the mutex. m_ceilings is used by PTHREAD_PRIO_PROTECT mutexes: the first element is used to boost the thread's priority when it locks the mutex, and the second element is used when the mutex is unlocked. The PTHREAD_PRIO_PROTECT mutex's link list is kept in userland, and m_ceilings[1] is managed by the thread library so the kernel needn't allocate memory to keep the link list. When such a mutex is unlocked, the kernel resets m_owner to UMUTEX_CONTESTED. Flag USYNC_PROCESS_SHARED indicates whether the synchronization object is process shared; if the flag is not set, it saves a vm_map_lookup() call. 
The umtx chain is still used as a sleep queue. When a thread is blocked on a PTHREAD_PRIO_INHERIT mutex, a umtx_pi is allocated to support priority propagating; it is dynamically allocated and a reference count is used. It is not optimized but works well in my tests. While the umtx chain has its own locking protocol, the priority propagating protocol is protected entirely by sched_lock, because the priority propagating function is called with sched_lock held from the scheduler. No visible performance degradation is found with these changes. Some parameter names in the _umtx_op syscall are renamed.
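The uncontested fast path that r161678 describes (a compare-and-set on m_owner, falling back to the kernel when it fails) might look roughly like the sketch below. It is an assumption-laden illustration: it uses C11 atomics instead of atomic_cmpset_int, stores the owner as uint32_t rather than __lwpid_t, treats 0 as the unowned value, and leaves the kernel slow path out entirely. All sketch_ and SKETCH_ names are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_UMUTEX_UNOWNED   0x0u        /* assumed "no owner" value */
#define SKETCH_UMUTEX_CONTESTED 0x80000000u /* high order bit, per the commit text */

/* Same field layout as the struct in the commit message; owner type simplified. */
struct sketch_umutex {
	_Atomic uint32_t m_owner;
	uint32_t m_flags;
	uint32_t m_ceilings[2];
	uint32_t m_spare[4];
};

/*
 * Try to take the mutex without entering the kernel.
 * Returns false if it is owned or marked contested; a real caller
 * would then ask the kernel to do the locking.
 */
static bool
sketch_umutex_trylock_fast(struct sketch_umutex *m, uint32_t tid)
{
	uint32_t expected = SKETCH_UMUTEX_UNOWNED;

	return atomic_compare_exchange_strong_explicit(&m->m_owner, &expected,
	    tid, memory_order_acquire, memory_order_relaxed);
}

/*
 * Try to release without entering the kernel.
 * Fails if the caller is not the owner or if the contested bit is set,
 * in which case waiters exist and the kernel must do the unlock.
 */
static bool
sketch_umutex_unlock_fast(struct sketch_umutex *m, uint32_t tid)
{
	uint32_t expected = tid;

	return atomic_compare_exchange_strong_explicit(&m->m_owner, &expected,
	    SKETCH_UMUTEX_UNOWNED, memory_order_release, memory_order_relaxed);
}
```

A mutex whose m_owner starts at SKETCH_UMUTEX_CONTESTED can never be taken by the fast path, which matches the commit's description of how PTHREAD_PRIO_PROTECT mutexes are forced through the kernel.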
# 161367	16-Aug-2006	peter	Grab two syscall numbers. One is used to emulate functionality that linux has in its procfs (do a readlink of /proc/self/fd/<nn> to find the pathname that corresponds to a given file descriptor). Valgrind-3.x needs this functionality. This is a placeholder only at this time.
# 161325	15-Aug-2006	jhb	- Use NOSTD rather than NOIMPL for nfssvc() to match other syscalls provided via klds. - Correct audit identifier for nfssvc().
# 160798	28-Jul-2006	jhb	Now that all system calls are MPSAFE, retire the SYF_MPSAFE flag used to mark system calls as being MPSAFE: - Stop conditionally acquiring Giant around system call invocations. - Remove all of the 'M' prefixes from the master system call files. - Remove support for the 'M' prefix from the script that generates the syscall-related files from the master system call files. - Don't explicitly set SYF_MPSAFE when registering nfssvc.
# 160797	28-Jul-2006	jhb	Various fixes to comments in the syscall master files including removing cruft from the audit import and adding mention of COMPAT4 to freebsd32.
# 160319	13-Jul-2006	davidxu	Add syscalls thr_setscheduler, thr_getscheduler, and thr_setschedparam. These syscalls are designed to set a thread's scheduling parameters and policy. Because each syscall contains a size parameter, it is possible to support future scheduling options, e.g. SCHED_SPORADIC; this option needs other fields in structure sched_param that are currently not available.
# 160276	11-Jul-2006	jhb	- Add conditional VFS Giant locking to getdents_common() (linux ABIs), ibcs2_getdents(), ibcs2_read(), ogetdirentries(), svr4_sys_getdents(), and svr4_sys_getdents64() similar to that in getdirentries(). - Mark ibcs2_getdents(), ibcs2_read(), linux_getdents(), linux_getdents64(), linux_readdir(), ogetdirentries(), svr4_sys_getdents(), and svr4_sys_getdents64() MPSAFE.
# 160111	05-Jul-2006	wsalamon	Add audit events for the extended attribute system calls. Obtained from: TrustedBSD Project Approved by: rwatson (mentor)
# 159982	27-Jun-2006	jhb	- Expand the scope of Giant some in mount(2) to protect the vfsp structure from going away. mount(2) is now MPSAFE. - Expand the scope of Giant some in unmount(2) to protect the mp structure (or rather, to handle concurrent unmount races) from going away. unmount(2) is now MPSAFE, as well as linux_umount() and linux_oldumount(). - nmount(2) and linux_mount() were already MPSAFE.
# 157211	28-Mar-2006	des	Revert previous commit at davidxu's insistence. Instead, use __DECONST (argh!) and rearrange the prototypes to make it clear that _umtx_op() is not deprecated.
# 157206	28-Mar-2006	des	The undocumented and deprecated system call _umtx_op() takes two pointer arguments. The first one is never used (all callers pass in 0); the second is sometimes used to pass in a struct timespec * which is used as a timeout and never modified. Constify that argument so callers can pass a const struct timespec * without jumping through hoops.
# 157037	23-Mar-2006	davidxu	Implement aio_fsync() syscall.
# 156134	01-Mar-2006	davidxu	Let the kernel POSIX timer code and mqueue code use an integer as a resource handle; the timer_t and mqd_t types will be pointers that userland defines.
# 155377	06-Feb-2006	rwatson	Prefer AUE_FOO audit identifiers to AUE_O_FOO, which are largely left over from the Darwin implementation. When we implement a system call as a wrapper to sysctl(), audit it as AUE_SYSCTL. This leads to greater compatibility with Solaris audit trails as sysctl() argument tokens are not the same as the ones for the original system calls (i.e., setdomainname()). Replace references to AUE_ events that are equivalent to AUE_NULL with AUE_NULL. In the case of process signal configuration, this is because these events do not require auditing. Move from the Darwin spelling of getsockopt() to the FreeBSD/Solaris one. Audit nmount(). Obtained from: TrustedBSD Project
# 155327	05-Feb-2006	davidxu	Implement thr_set_name to set a name for thread. Reviewed by: julian
# 155249	03-Feb-2006	rwatson	Assign audit event identifiers to many system calls. Much work by: wsalamon Obtained from: TrustedBSD Project
# 155199	01-Feb-2006	rwatson	Map audit-related system calls to audit event identifiers. Much work by: wsalamon Obtained from: TrustedBSD Project
# 154669	22-Jan-2006	davidxu	Make aio code MP safe.
# 153679	23-Dec-2005	phk	Add abort2() systemcall.
# 152845	26-Nov-2005	davidxu	Don't use OpenBSD syscall numbers, instead, use new syscall numbers for POSIX message queue. Suggested by: rwatson
# 152825	26-Nov-2005	davidxu	Bring in experimental kernel support for POSIX message queue.
# 151867	30-Oct-2005	davidxu	Fix sigevent's POSIX incompatible problem by adding member fields sigev_notify_function and sigev_notify_attributes. AIO syscalls use sigevent, so they have to be adjusted. Reviewed by: alc
# 151576	23-Oct-2005	davidxu	Implement POSIX timers. Currently only the CLOCK_REALTIME and CLOCK_MONOTONIC clocks are supported. I plan to merge the XSI timer ITIMER_REAL and the other two CPU timers into the new code; currently three slots are available for the XSI timers. The SIGEV_THREAD notification type is not supported yet because our sigevent struct lacks two member fields: sigev_notify_function sigev_notify_attributes I have found that sigevent is used in AIO, so I won't add the two members unless the AIO code is adjusted.
# 151445	18-Oct-2005	stefanf	Const-qualify ksem_timedwait's parameter abstime as it's only passed in.
# 151316	14-Oct-2005	davidxu	1. Change prototype of trapsignal and sendsig to use ksiginfo_t *, most changes in MD code are trivial. Before this change, trapsignal and sendsig used discrete parameters; now they use member fields of the ksiginfo_t structure. For sendsig, this change allows us to pass a POSIX realtime signal value to user code. 2. Remove cpu_thread_siginfo, it is no longer needed because we now always generate ksiginfo_t data and feed it to libpthread. 3. Add p_sigqueue to proc structure to hold shared signals which were blocked by all threads in the proc. 4. Add td_sigqueue to thread structure to hold all signals delivered to thread. 5. i386 and amd64 now return POSIX standard si_code, other arches will be fixed. 6. In this sigqueue implementation, the pending signal set is kept as before, and an extra siginfo list holds additional siginfo_t data for signals. Kernel code that uses psignal() still behaves as before; it won't fail even under memory pressure. The only exception is when deleting a signal: we should call sigqueue_delete to remove the signal from the sigqueue rather than use SIGDELSET. Currently no kernel code delivers a signal with additional data, so the kernel should be as stable as before. A ksiginfo can carry more information, for example, to allow a signal to be delivered but throw away the siginfo data if memory is not enough. SIGKILL and SIGSTOP have a fast path in sigqueue_add, because they can not be caught or masked. The sigqueue() syscall allows user code to queue a signal to a target process; if resources are unavailable, EAGAIN will be returned as the specification says. Just before a thread exits, signal queue memory will be freed by sigqueue_flush. Currently, all signals are allowed to be queued, not only realtime signals. Earlier patch reviewed by: jhb, deischen Tested on: i386, amd64
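From userland, the sigqueue() call mentioned in r151316 queues a signal together with a value that the receiver can read back from siginfo_t. The sketch below uses only standard POSIX calls (sigprocmask, sigqueue, sigwaitinfo); the helper name sketch_sigqueue_roundtrip and the choice of SIGUSR1 are illustrative assumptions, and the helper assumes a single-threaded caller.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <signal.h>
#include <unistd.h>

/*
 * Queue SIGUSR1 to the calling process with an integer payload, then
 * collect it synchronously. Returns 0 and stores the payload in *out
 * on success, -1 on any failure.
 */
static int
sketch_sigqueue_roundtrip(int value, int *out)
{
	sigset_t set, old;
	siginfo_t info;
	union sigval sv;
	int rc = -1;

	sigemptyset(&set);
	sigaddset(&set, SIGUSR1);

	/* Block the signal so it stays queued until sigwaitinfo() takes it. */
	if (sigprocmask(SIG_BLOCK, &set, &old) != 0)
		return -1;

	sv.sival_int = value;
	if (sigqueue(getpid(), SIGUSR1, sv) == 0 &&
	    sigwaitinfo(&set, &info) == SIGUSR1) {
		*out = info.si_value.sival_int;
		rc = 0;
	}

	/* Restore the caller's original signal mask. */
	(void)sigprocmask(SIG_SETMASK, &old, NULL);
	return rc;
}
```

A caller that gets -1 can inspect errno; the commit notes that EAGAIN is what sigqueue() reports when queue resources are unavailable.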
# 150619	27-Sep-2005	csjp	Mark the extended attribute syscalls as being MP safe. Requested by: jhb
# 147831	08-Jul-2005	jhb	Mark second instance of lchown() MP safe just like the first. Approved by: re (scottl)
# 147813	07-Jul-2005	jhb	- Add two new system calls: preadv() and pwritev() which are like readv() and writev() except that they take an additional offset argument and do not change the current file position. In SAT speak: preadv:readv::pread:read and pwritev:writev::pwrite:write. - Try to reduce code duplication some by merging most of the old kern_foov() and dofilefoo() functions into new dofilefoo() functions that are called by kern_foov() and kern_pfoov(). The non-v functions now all generate a simple uio on the stack from the passed in arguments and then call kern_foov(). For example, read() now just builds a uio and calls kern_readv() and pwrite() just builds a uio and calls kern_pwritev(). PR: kern/80362 Submitted by: Marc Olzheim marcolz at stack dot nl (1) Approved by: re (scottl) MFC after: 1 week
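The positional semantics described in r147813 can be observed from userland. The following is a minimal sketch, assuming a Python 3.7+ interpreter on a platform that exposes these calls as `os.preadv` and `os.pwritev`; the helper name `positional_vector_io` is illustrative and does not come from the commit.

```python
import os
import tempfile


def positional_vector_io():
    """Scatter/gather I/O at an explicit offset, leaving the file position alone."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"0123456789")                  # file position is now 10
        before = os.lseek(fd, 0, os.SEEK_CUR)

        # Gather two buffers and write them at offset 2 (like writev + an offset).
        written = os.pwritev(fd, [b"AB", b"CD"], 2)  # file becomes 01ABCD6789

        # Scatter six bytes starting at offset 1 into two 3-byte buffers.
        head, tail = bytearray(3), bytearray(3)
        read = os.preadv(fd, [head, tail], 1)

        after = os.lseek(fd, 0, os.SEEK_CUR)
        return written, read, bytes(head), bytes(tail), before == after
    finally:
        os.close(fd)
        os.unlink(path)


if __name__ == "__main__":
    print(positional_vector_io())
```

The final element of the returned tuple compares the file position before and after the calls, which is the property that distinguishes these calls from readv() and writev().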
# 146806	30-May-2005	rwatson	Introduce a new field in the syscalls.master file format to hold the audit event identifier associated with each system call, which will be stored by makesyscalls.sh in the sy_auevent field of struct sysent. For now, default the audit identifier on all system calls to AUE_NULL, but in the near future, other BSM event identifiers will be used. The mapping of system calls to event identifiers is many:one due to multiple system calls that map to the same end functionality across compatibility wrappers, ABI wrappers, etc. Submitted by: wsalamon Obtained from: TrustedBSD Project
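As a rough illustration of the extra field introduced in r146806, the sketch below splits a single-line entry into its columns. It assumes the column order number, audit event, type, prototype; the sample lines are made up for illustration rather than copied from the file, and `parse_entry` and `audit_events` are hypothetical helper names. Entries continued across several lines are not handled.

```python
import re

# Assumed single-line layout: number, audit event, type, then "{ prototype }".
ENTRY_RE = re.compile(r"^(\d+)\s+(\S+)\s+(\S+)\s+\{\s*(.*?)\s*\}\s*$")


def parse_entry(line):
    """Split one syscalls.master-style line into its four fields."""
    match = ENTRY_RE.match(line)
    if match is None:
        raise ValueError("not a single-line syscall entry: %r" % line)
    number, audit, kind, prototype = match.groups()
    return {"number": int(number), "audit": audit,
            "type": kind, "prototype": prototype}


def audit_events(lines):
    """Map syscall number -> audit event; several numbers may share one event."""
    return {entry["number"]: entry["audit"]
            for entry in map(parse_entry, lines)}


if __name__ == "__main__":
    sample = [  # illustrative lines only
        "3\tAUE_NULL\tMSTD\t{ ssize_t read(int fd, void *buf, size_t nbyte); }",
        "4\tAUE_NULL\tMSTD\t{ ssize_t write(int fd, const void *buf, size_t nbyte); }",
    ]
    print(audit_events(sample))
```

Because the mapping runs from syscall number to event, the many:one relationship mentioned in the commit falls out naturally: distinct keys can carry the same event value.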
# 146783	29-May-2005	rwatson	Normalize white space in syscalls.master: try to use tabs before system call types.
# 146723	28-May-2005	rwatson	Mark ntp_gettime() as MSTD, since its system call path will acquire Giant if required.
# 146719	28-May-2005	rwatson	Mark the following compatibility system calls as MCOMPAT or MCOMPAT4, since they simply wrap MPSAFE implementations of existing MPSAFE system calls: getfsstat() lseek() stat() lstat() truncate() ftruncate() statfs() fstatfs() Note that ogetdirentries() is not marked MPSAFE because it does not share the MPSAFE implementation used for getdirentries(), and requires separate locking to be implemented.
# 146716	28-May-2005	rwatson	Mark quotactl() as MSTD.
# 146713	28-May-2005	rwatson	Mark kenv(2) as MPSAFE, since it appears to be properly locked down.
# 146711	28-May-2005	rwatson	Also mark the COMPAT4 version of fhstatfs() as MPSAFE.
# 146710	28-May-2005	rwatson	Mark fhopen(), fhstat(), and fhstatfs() as MSTD, since they now acquire Giant themselves.
# 145434	23-Apr-2005	davidxu	Add new syscall thr_new to create a thread atomically; the new thread inherits the signal mask from the parent thread, and the call sets up TLS, the stack, and the user entry address. Also support POSIX threads' PTHREAD_SCOPE_PROCESS and PTHREAD_SCOPE_SYSTEM; a sysctl is also provided to control the scheduler scope.
# 143317	09-Mar-2005	stefanf	Fix typo in comment.
# 142932	01-Mar-2005	ps	Change the prototype of kevent to remove the const from the changelist. Reviewed by: jhb
# 140840	26-Jan-2005	jeff	- Struct mount is not yet locked well enough to allow mount/nmount/unmount to run without Giant. Mark them as STD here.
# 140724	24-Jan-2005	jeff	- Change all VFS syscalls to MSTD as they all manually deal with giant or the appropriate filesystem locks. Sponsored By: Isilon Systems, Inc.
# 139598	02-Jan-2005	marcel	uuidgen(2) is MP safe.
# 139292	25-Dec-2004	davidxu	Make _umtx_op() a more general interface: the final parameter needn't be a timespec pointer; every parameter is interpreted according to the opcode.
# 139013	18-Dec-2004	davidxu	1. Make umtx sharable between processes: two or more processes call mmap() to create a shared space and then initialize a umtx in it; after that, threads in different processes can use the umtx the same way as threads in the same process. 2. Introduce a new syscall _umtx_op to support timed lock and condition variable semantics. Also, the original umtx_lock and umtx_unlock inline functions are now reimplemented using _umtx_op; _umtx_op can use an arbitrary id, not just a thread id.
# 138088	25-Nov-2004	phk	Mark mount, unmount and nmount MPSAFE
# 137874	18-Nov-2004	marks	Add ntp_gettime(2) system call. Reviewed by: imp, phk, njl, peter Approved by: njl
# 136830	23-Oct-2004	rwatson	Add system call place-holders for the following system calls implementing Sun's BSM Audit API on FreeBSD: audit() auditon() getauid() setauid() getaudit() setaudit() getaudit_addr() setaudit_addr() auditctl() Submitted by: Wayne Salamon <wsalamon at computer dot org> Obtained from: TrustedBSD Project
# 136207	06-Oct-2004	davidxu	Regen to unbreak world. Pointy hat to: mtm
# 132116	13-Jul-2004	phk	Add kldunloadf() system call. Stay tuned for following commit messages.
# 132020	12-Jul-2004	davidxu	Change kse_switchin to accept a kse_thr_mailbox pointer; the syscall will be used heavily in debugging KSE threads. This breaks libpthread on IA64, but because libpthread was not in the 5.2.1 release, I would like to change it so we needn't introduce another syscall.
# 131431	01-Jul-2004	marcel	Change the thread ID (thr_id_t) used for 1:1 threading from being a pointer to the corresponding struct thread to the thread ID (lwpid_t) assigned to that thread. The primary reason for this change is that libthr now internally uses the same ID as the debugger and the kernel when referencing to a kernel thread. This allows us to implement the support for debugging without additional translations and/or mappings. To preserve the ABI, the 1:1 threading syscalls, including the umtx locking API have not been changed to work on a lwpid_t. Instead the 1:1 threading syscalls operate on long and the umtx locking API has not been changed except for the contested bit. Previously this was the least significant bit. Now it's the most significant bit. Since the contested bit should not be tested by userland, this change is not expected to be visible. Just to be sure, UMTX_CONTESTED has been removed from <sys/umtx.h>. Reviewed by: mtm@ ABI preservation tested on: i386, ia64
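One way to see why the contested bit had to move in r131431: when the owner field held an aligned pointer, the least significant bit was free for a flag, but a small integer thread ID can be odd. The sketch below is illustrative only; it assumes a 32-bit lock word, and the names `LSB_FLAG`, `MSB_FLAG`, and the pack/unpack helpers are hypothetical rather than taken from <sys/umtx.h>.

```python
WORD_BITS = 32                     # assumed width of the lock word
LSB_FLAG = 1                       # hypothetical name: old contested bit
MSB_FLAG = 1 << (WORD_BITS - 1)    # hypothetical name: new contested bit


def pack_lsb(owner, contested):
    """Old scheme: flag in bit 0, which is safe only for even (aligned) owners."""
    return owner | (LSB_FLAG if contested else 0)


def unpack_lsb(word):
    return word & ~LSB_FLAG, bool(word & LSB_FLAG)


def pack_msb(owner, contested):
    """New scheme: flag in the top bit, so every low bit belongs to the ID."""
    if not 0 <= owner < MSB_FLAG:
        raise ValueError("owner ID must fit below the flag bit")
    return owner | (MSB_FLAG if contested else 0)


def unpack_msb(word):
    return word & (MSB_FLAG - 1), bool(word & MSB_FLAG)


if __name__ == "__main__":
    odd_id = 100001                                  # arbitrary odd thread ID
    print(unpack_lsb(pack_lsb(odd_id, False)))       # ID is corrupted
    print(unpack_msb(pack_msb(odd_id, False)))       # ID survives intact
```

With the flag in bit 0, an uncontested lock owned by an odd ID is indistinguishable from a contested lock owned by the next lower even value, which is the ambiguity the top-bit layout avoids.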
# 130907	22-Jun-2004	rwatson	Mark unlink() as MPSAFE as we now acquire Giant in the unlink() system call.
# 130904	22-Jun-2004	rwatson	Mark link() system call as MPSAFE.
# 127890	05-Apr-2004	dfr	Add lgetfh(2) which is like getfh(2) but doesn't follow symlinks.
# 127482	27-Mar-2004	mtm	Separate thread synchronization from signals in libthr. Instead use msleep() and wakeup_one(). Discussed with: jhb, peter, tjr
# 127061	16-Mar-2004	dwmalone	Get ready to mark open, creat and nosys as MPSAFE.
# 127034	15-Mar-2004	jhb	Drop the proc lock around calls to the MD functions ptrace_single_step(), ptrace_set_pc(), and cpu_ptrace() so that those functions are free to acquire Giant, sleep, etc. We already do a PHOLD/PRELE around them so that it is safe to sleep inside of these routines if necessary. This allows ptrace() to be marked MP safe again as it no longer triggers lock order reversals on Alpha. Tested by: wilko
# 126932	13-Mar-2004	peter	Push Giant down a little further: - no longer serialize on Giant for thread_single*() and family in fork, exit and exec - thread_wait() is mpsafe, assert no Giant - reduce scope of Giant in exit to not cover thread_wait and just do vm_waitproc(). - assert that thread_single() family are not called with Giant - remove the DROP/PICKUP_GIANT macros from thread_single() family - assert that thread_suspend_check() is not called with Giant - remove manual drop_giant hack in thread_suspend_check since we know it isn't held. - remove the DROP/PICKUP_GIANT macros from thread_suspend_check() family - mark kse_create() mpsafe
# 125368	03-Feb-2004	deischen	Add ksem_timedwait() to complement ksem_wait(). Glanced at by: alfred
# 123853	26-Dec-2003	alfred	Put restrict back in, the compilation failure was my fault when I did a bad merge from the PR. Thanks to Bruce Evans for explaining.
# 123817	24-Dec-2003	alfred	We're not ready for restrict qualifiers here.
# 123811	24-Dec-2003	alfred	Add restrict qualifiers. PR: 44394 Submitted by: Craig Rodrigues <rodrige@attbi.com>
# 123750	23-Dec-2003	peter	Remove namespc column and attempt to un-fold some of the longer lines that now fit.
# 123412	10-Dec-2003	peter	Previous commit also changed the sendmsg prototype to something more closely matching reality. I did not actually mean to commit that yet.
# 123408	10-Dec-2003	peter	Update file locations for syscall tables to copy to.
# 123252	07-Dec-2003	marcel	Add kse_switchin(2). This syscall can be used by KSE implementations to have the kernel switch to a new thread, instead of doing it in userland. It is in fact needed on ia64 where syscall restarts do not return to userland first. It's completely handled inside the kernel. As such, any context created by the kernel as part of an upcall and caused by some syscall needs to be restored by the kernel.
# 122635	14-Nov-2003	jeff	- Revision 1.156 marked ptrace() SMP safe. Unfortunately, alpha implements parts of ptrace using proc_rwmem(). proc_rwmem() requires giant, and giant must be acquired prior to the proc lock, so ptrace must require giant still.
# 122537	12-Nov-2003	mckusick	Update the statfs structure with 64-bit fields to allow accurate reporting of multi-terabyte filesystem sizes. You should build and boot a new kernel BEFORE doing a `make world' as the new kernel will know about binaries using the old statfs structure, but an old kernel will not know about the new system calls that support the new statfs structure. Running an old kernel after a `make world' will cause programs such as `df' that do a statfs system call to fail with a bad system call. Reviewed by: Bruce Evans <bde@zeta.org.au> Reviewed by: Tim Robbins <tjr@freebsd.org> Reviewed by: Julian Elischer <julian@elischer.org> Reviewed by: the hoards of <arch@freebsd.org> Sponsored by: DARPA & NAI Labs.
# 122241	07-Nov-2003	jhb	Mark ptrace(), ktrace(), utrace(), sysarch(), and issetugid() as MP safe. The parts of these calls that are not yet MP safe acquire Giant explicitly.
# 121298	21-Oct-2003	scottl	Don peril-sensitive sunglasses and mark pipe(2) as MPSAFE. I've beaten up on it for the last 15 hours with no signs of problems. It gives a small (1%) gain on buildworld since pipe_read/pipe_write are already free of Giant.
# 121284	20-Oct-2003	dwmalone	Mark dup as MPSAFE. Giant was pushed into dup ages ago, but it looks like it was missed in syscalls.master. Spotted by: alc
# 119827	07-Sep-2003	alc	msync(2) should be declared MP-safe.
# 117704	17-Jul-2003	davidxu	o Refine kse_thr_interrupt to allow it to handle different commands. o Remove TDF_NOSIGPOST. o Add a member td_waitset to proc structure, it will be used for sigwait. Tested by: deischen
# 116963	28-Jun-2003	davidxu	o Change kse_thr_interrupt to allow send a signal to a specified thread, or unblock a thread in kernel, and allow UTS to specify whether syscall should be restarted. o Add ability for UTS to monitor signal comes in and removed from process, the flag PS_SIGEVENT is used to indicate the events. o Add a KMF_WAITSIGEVENT for KSE mailbox flag, UTS call kse_release with this flag set to wait for above signal event. o For SA based thread, kernel masks all signal in its signal mask, let UTS to use kse_thr_interrupt interrupt a thread, and install a signal frame in userland for the thread. o Add a tm_syncsig in thread mailbox, when a hardware trap occurs, it is used to deliver synchronous signal to userland, and upcall is schedule, so UTS can process the synchronous signal for the thread. Reviewed by: julian (mentor)
# 115799	04-Jun-2003	rwatson	Add system calls to explicitly list extended attributes on a file/directory/link, rather than using a less explicit hack on the extattr retrieval API: extattr_list_fd() extattr_list_file() extattr_list_link() The existing API was counter-intuitive, and poorly documented. The prototypes for these system calls are identical to extattr_get_*(), but without a specific attribute name to leave NULL. Pointed out by: Dominic Giampaolo <dbg@apple.com> Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories
# 113275	09-Apr-2003	mike	o In struct prison, add an allprison linked list of prisons (protected by allprison_mtx), a unique prison/jail identifier field, two path fields (pr_path for reporting and pr_root vnode instance) to store the chroot() point of each jail. o Add jail_attach(2) to allow a process to bind to an existing jail. o Add change_root() to perform the chroot operation on a specified vnode. o Generalize change_dir() to accept a vnode, and move namei() calls to callers of change_dir(). o Add a new sysctl (security.jail.list) which is a group of struct xprison instances that represent a snapshot of active jails. Reviewed by: rwatson, tjr
# 112911	01-Apr-2003	jeff	- Mark the various thr syscalls as MP safe. Previously there was a bug if this was not done since thr_exit() unwinds giant.
# 112906	31-Mar-2003	jeff	- Include umtx.h in files generated by makesyscalls.sh - Add system calls for umtx.
# 112901	31-Mar-2003	jeff	- Add the four thr related system calls.
# 112893	31-Mar-2003	jeff	- Define sigwait, sigtimedwait, and sigwaitinfo in terms of kern_sigtimedwait() which is capable of supporting all of their semantics. - These should be POSIX compliant but more careful review is needed before we announce this.
# 111169	20-Feb-2003	davidxu	Add a timeout parameter to kse_release.
# 109895	26-Jan-2003	alfred	Add const qualifier to data argument for msgsnd. PR: standards/45274 Submitted by: Craig Rodrigues <rodrigc@attbi.com>
# 109831	25-Jan-2003	alfred	Bring shm functions closer to the opengroup standards. PR: 47469 Submitted by: Craig Rodrigues <rodrigc@attbi.com>
# 109829	25-Jan-2003	alfred	Bring semop() closer to the opengroup standards. PR: 47471 Submitted by: Craig Rodrigues <rodrigc@attbi.com>
# 108659	04-Jan-2003	davidxu	Some KSE syscalls are MPSAFE.
# 108405	29-Dec-2002	rwatson	Add definitions for four new system calls: __acl_get_link() Retrieve an ACL by name without following symbolic links. __acl_set_link() Set an ACL by name without following symbolic links. __acl_delete_link() Delete an ACL by name without following symbolic links. __acl_aclcheck_link() Check an ACL against a file by name without following symbolic links. These calls are similar in spirit to lstat(), lchown(), lchmod(), etc, and will be used under similar circumstances. Obtained from: TrustedBSD Project
# 107913	15-Dec-2002	dillon	This is David Schultz's swapoff code which I am finally able to commit. This should be considered highly experimental for the moment. Submitted by: David Schultz <dschultz@uclink.Berkeley.EDU> MFC after: 3 weeks
# 106977	16-Nov-2002	deischen	Add getcontext, setcontext, and swapcontext as system calls. Previously these were libc functions but were requested to be made into system calls for atomicity and to coalesce what might be two entrances into the kernel (signal mask setting and floating point trap) into one. A few style nits and comments from bde are also included. Tested on alpha by: gallatin
# 106466	05-Nov-2002	rwatson	Flesh out the definition of __mac_execve(): per earlier discussion, it's essentially execve() with an optional MAC label argument. Approved by: re Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories
# 106312	01-Nov-2002	rwatson	Rename __execve_mac() to __mac_execve() for increased consistency with other MAC system calls. Requested by: various (phk, gordont, jake, ...)
# 105950	25-Oct-2002	peter	Split 4.x and 5.x signal handling so that we can keep 4.x signal handling clean and functional as 5.x evolves. This allows some of the nasty bandaids in the 5.x codepaths to be unwound. Encapsulate 4.x signal handling under COMPAT_FREEBSD4 (there is an anti-foot-shooting measure in place, 5.x folks need this for a while) and finish encapsulating the older stuff under COMPAT_43. Since the ancient stuff is required on alpha (longjmp(3) passes a 'struct osigcontext *' to the current sigreturn(2), instead of the 'ucontext_t *' that sigreturn is supposed to take), add a compile time check to prevent foot shooting there too. Add uniform COMPAT_43 stubs for ia64/sparc64/powerpc. Tested on: i386, alpha, ia64. Compiled on sparc64 (a few days ago). Approved by: re
# 105691	22-Oct-2002	rwatson	Flesh out prototypes for __mac_get_pid, __mac_get_link, and __mac_set_link, based on __mac_get_proc() except with a pid, and __mac_get_file(), __mac_set_file() except that they do not follow symlinks. First in a series of commits to flesh out the user API. Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories
# 105490	19-Oct-2002	peter	Stake a claim on 418 (__xstat), 419 (__xfstat), 420 (__xlstat)
# 105486	19-Oct-2002	peter	Grab 416/417 real estate before I get burned while testing again. This is for the not-quite-ready signal/fpu abi stuff. It may not see the light of day, but I'm certainly not going to be able to validate it when getting shot in the foot due to syscall number conflicts.
# 105476	19-Oct-2002	rwatson	Add a placeholder for the execve_mac() system call, similar to SELinux's execve_secure() system call, which permits a process to pass in a label for a label change during exec. This permits SELinux to change the label for the resulting exec without a race following a manual label change on the process. Because this interface uses our general purpose MAC label abstraction, we call it execve_mac(), and wrap our port of SELinux's execve_secure() around it with appropriate sid mappings. Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories
# 105144	14-Oct-2002	peter	Restore pointer that was removed in 1.128. This wasn't a merge-o.
# 104747	10-Oct-2002	rwatson	Fix what looks like a merge-o from a conflict in the last commit to syscalls.master.
# 104734	09-Oct-2002	peter	Add a pointer to the alternate syscall tables on 64 bit platforms.
# 104730	09-Oct-2002	rwatson	Flesh out the extattr_{delete,get,set}_link() system calls: variations on the _file() theme that do not follow symlinks. Sync to MAC tree. Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories
# 104379	02-Oct-2002	archie	Let kse_wakeup() take a KSE mailbox pointer argument. Reviewed by: julian
# 104262	01-Oct-2002	rwatson	Reserve system call numbers for the following system calls: __mac_get_pid Retrieve MAC label of a process by pid Similar to __mac_get_proc() except that the target process of the operation is explicitly specified rather than assuming curthread. __mac_get_link Retrieve MAC label of a path with NOFOLLOW __mac_set_link Set MAC label of a path with NOFOLLOW extattr_set_link Set EAs on a path with NOFOLLOW extattr_get_link Retrieve EAs on a path with NOFOLLOW extattr_delete_link Delete EAs on a path with NOFOLLOW These calls are similar to __mac_get_file(), __mac_set_file(), extattr_set_file(), extattr_get_file(), and extattr_delete_file(), except that they do not follow symlinks. The distinction between these calls is similar to lchown() vs chown(). Implementations to follow. Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories
# 103972	25-Sep-2002	archie	Make the following name changes to KSE related functions, etc., to better represent their purpose and minimize namespace conflicts: kse_fn_t -> kse_func_t struct thread_mailbox -> struct kse_thr_mailbox thread_interrupt() -> kse_thr_interrupt() kse_yield() -> kse_release() kse_new() -> kse_create() Add missing declaration of kse_thr_interrupt() to <sys/kse.h>. Regenerate the various generated syscall files. Minor style fixes. Reviewed by: julian
# 103574	18-Sep-2002	alfred	Add the rest of the kernel support for the sem_ API in kern/uipc_sem.c. Option 'P1003_1B_SEMAPHORES' to compile them in, or load the "sem" module to activate them. Have kern/makesyscalls.sh emit an include for sys/_semaphore.h into sysproto.h to pull in the typedef for semid_t. Add the syscalls to the syscall table as module stubs.
# 102132	19-Aug-2002	rwatson	mac_syscall is now implemented, switch to MSTD. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
# 101425	06-Aug-2002	rwatson	Rename mac_policy() to mac_syscall() to be more reflective of its purpose. Submitted by: cvance@tislabs.com Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
# 100990	30-Jul-2002	rwatson	Introduce support for Mandatory Access Control and extensible kernel access control. Replace 'void *' with 'struct mac *' now that mac.h is in the base tree. The current POSIX.1e-derived userland MAC interface is scheduled for replacement, but will act as a functional placeholder until the replacement is done. These system calls allow userland processes to get and set labels on both the current process, as well as file system objects and file descriptor backed objects.
# 100954	30-Jul-2002	rwatson	Introduce a mac_policy() system call that will provide MAC policies with a general purpose front end entry point for user applications to invoke. The MAC framework will route the system call to the appropriate policy by name. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
# 100896	30-Jul-2002	rwatson	Prototype function arguments, only with MAC-specific structures replaced with void until we bring in the actual structure definitions. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
# 99916	13-Jul-2002	alfred	Remove incorrect comment about now corrected manpage.
# 99855	12-Jul-2002	alfred	Create a bug-for-bug FreeBSD4 compatible version of sendfile and move the fixed sendfile over. This is needed to preserve binary compatibility from 4.x to 5.x.
# 99072	29-Jun-2002	julian	Part 1 of KSE-III The ability to schedule multiple threads per process (on one cpu) by making ALL system calls optionally asynchronous. to come: ia64 and power-pc patches, patches for gdb, test program (in tools) Reviewed by: Almost everyone who counts (at various times, peter, jhb, matt, alfred, mini, bernd, and a cast of thousands) NOTE: this is still Beta code, and contains lots of debugging stuff. expect slight instability in signals..
# 98197	13-Jun-2002	rwatson	Keep POSIX.1e capabilities system call placeholders, but remove definitions.
# 97369	28-May-2002	marcel	Add syscall uuidgen() for generating Universally Unique Identifiers (UUIDs). On ia64 UUIDs, aka GUIDs, are used by EFI and the firmware among others. To create GUID Partition Tables (GPTs), we need to be able to generate UUIDs.
# 96083	05-May-2002	mux	Add an entry for the lchflags(2) syscall. It's useful to prevent a symlink deletion. Reviewed by: rwatson
# 94935	17-Apr-2002	mux	Add an entry for the kenv(2) syscall (code to follow). Reviewed by: peter
# 94640	14-Apr-2002	alc	Remove the requirement that Giant be held around sigreturn().
# 94446	11-Apr-2002	alc	Remove the requirement that Giant be held around osigreturn(). All platform- specific implementations are MPSAFE.
# 91692	05-Mar-2002	rwatson	Reserve system call numbers for the MAC framework. This will prevent people working on the MAC tree from getting toasted whenever system call numbers are allocated in the main tree (for example, for KSE :-). Calls allocated: __mac_{get,set}_proc, __mac_{get,set}_{fd,file}(). Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
# 90889	19-Feb-2002	julian	Add stub syscalls and definitions for KSE calls. "Book'em Danno"
# 90886	19-Feb-2002	julian	Add 5 KSE syscalls. Two will be implemented with the next KSE step and the others are reservations for coming code. All will be stubbed in this kernel in the next commit. This will allow people to easily make KSE binaries for userland testing (the syscalls will be in libc) but they will still need a real KSE kernel to test it. (libc looks in /sys to decide what it should add stubs for).
# 90777	17-Feb-2002	deischen	Fix prototype to sigreturn to use struct __ucontext instead of ucontext_t.
# 90448	10-Feb-2002	rwatson	Part I: Update extended attribute API and ABI: o Modify the system call syntax for extattr_{get,set}_{fd,file}() so as not to use the scatter gather API (which appeared not to be used by any consumers, and be less portable), rather, accepts 'data' and 'nbytes' in the style of other simple read/write interfaces. This changes the API and ABI. o Modify system call semantics so that extattr_get_{fd,file}() return a size_t. When performing a read, the number of bytes read will be returned, unless the data pointer is NULL, in which case the number of bytes of data are returned. This changes the API only. o Modify the VOP_GETEXTATTR() vnode operation to accept a *size_t argument so as to return the size, if desirable. If set to NULL, the size will not be returned. o Update various filesystems (pseudofs, ufs) to DTRT. These changes should make extended attributes more useful and more portable. More commits to rebuild the system call files, as well as update userland utilities to follow. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
# 90073	01-Feb-2002	bde	Made osigreturn(2) standard so that SYS_osigreturn can be used in the signal trampoline for old signals. The arches that support old signals currently abuse sigreturn(2) instead. This mainly complicates things and slightly breaks the new sigreturn(2). COMPAT is too limited to support the correct configuration of osigreturn, and this commit doesn't attempt to fix it; it just moves the bogusness: osigreturn() must now be provided unconditionally even on arches that don't really need it; previously it had to be provided under the bogus condition defined(COMPAT_43).
# 88633	29-Dec-2001	alfred	Make AIO a loadable module. Remove the explicit call to aio_proc_rundown() from exit1(), instead AIO will use at_exit(9). Add functions at_exec(9), rm_at_exec(9) which function nearly the same as at_exit(9) and rm_at_exit(9), these functions are called on behalf of modules at the time of execve(2) after the image activator has run. Use a modified version of tegge's suggestion via at_exec(9) to close an exploitable race in AIO. Fix SYSCALL_MODULE_HELPER such that it's architecturally neutral, the problem was that one had to pass it a parameter indicating the number of arguments which were actually the number of "int". Fix it by using an inline version of the AS macro against the syscall arguments. (AS should be available globally but we'll get to that later.) Add a primitive system for dynamically adding kqueue ops, it's really not as sophisticated as it should be, but I'll discuss with jlemon when he's around.
# 85890	02-Nov-2001	phk	Reserve 378 for the new mount syscall Maxime Henrion <mux@qualys.com> is working on. (This is to get us more than 32 mountoptions).
# 84883	13-Oct-2001	rwatson	o Reserve system call 377 for afs_syscall; by reserving a system call number, portable OpenAFS applications don't have to attempt to determine what system call number was dynamically allocated. No system call prototype or implementation is defined. Requested by: Tom Maher <tardis@watson.org>
# 83795	21-Sep-2001	rwatson	o Introduce eaccess(2), a version of access(2) that uses the effective credentials rather than the real credentials. This is useful for implementing GUI's which need to modify icons based on access rights, but where use of open(2) is too expensive, use of stat(2) doesn't reflect the file system's real protection model, and use of access() suffers from real/effective credential confusion. This implementation provides the same semantics as the call of the same name on SCO OpenServer. Note: using this call improperly can leave you subject to some of the same races present in the access(2) call. o To implement this, break out the basic logic of access(2) into vpaccess(), which accepts a passed credential to perform the invocation of VOP_ACCESS(). Add eaccess(2) to invoke vpaccess(), and modify access(2) to use vpaccess(). Obtained from: TrustedBSD Project
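The real-versus-effective distinction from r83795 can be tried from a script. This is a minimal sketch, assuming a Python 3 interpreter on a platform where `os.access` supports the `effective_ids` argument; it asks the platform for an effective-ID check and makes no claim about which underlying system call services it. The helper names are illustrative.

```python
import os
import tempfile


def real_access(path, mode):
    """access(2)-style check using the real user and group IDs."""
    return os.access(path, mode)


def effective_access(path, mode):
    """Same check, but against the effective IDs where the platform allows it."""
    if os.access not in os.supports_effective_ids:
        raise NotImplementedError("effective-ID access checks are unavailable")
    return os.access(path, mode, effective_ids=True)


if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        # The two answers differ only when real and effective IDs differ,
        # for example inside a set-user-ID program.
        print(real_access(path, os.R_OK), effective_access(path, os.R_OK))
    finally:
        os.unlink(path)
```

As the commit notes for access(2), the answer can be stale by the time it is acted upon, so a check of this kind suits advisory uses such as choosing an icon rather than enforcing a security decision.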
# 83651	18-Sep-2001	peter	Cleanup and split of nfs client and server code. This builds on the top of several repo-copies.
# 82753	01-Sep-2001	dillon	Synchronize syscalls.master(s) with recent Giant pushdown work
# 82711	01-Sep-2001	dillon	Make yield() MPSAFE. Synchronize syscalls.master with all MPSAFE changes to date. Synchronize new syscall generation follows because yield() will panic if it is out of sync with syscalls.master.
# 82610	30-Aug-2001	dillon	Giant pushdown syscalls in kern/uipc_syscalls.c. Affected calls: recvmsg(), sendmsg(), recvfrom(), accept(), getpeername(), getsockname(), socket(), connect(), accept(), send(), recv(), bind(), setsockopt(), listen(), sendto(), shutdown(), socketpair(), sendfile()
# 82607	30-Aug-2001	dillon	Giant Pushdown: sysv shm, sem, and msg calls.
# 82585	30-Aug-2001	dillon	Remove the MPSAFE keyword from the parser for syscalls.master. Instead introduce the [M] prefix to existing keywords. e.g. MSTD is the MP SAFE version of STD. This is preparatory for a massive Giant lock pushdown. The old MPSAFE keyword made syscalls.master too messy. Begin comments MP-Safe procedures with the comment: /* * MPSAFE */ This comment means that the procedure may be called without Giant held (The procedure itself may still need to obtain Giant temporarily to do its thing). sv_prepsyscall() is now MP SAFE and assumed to be MP SAFE sv_transtrap() is now MP SAFE and assumed to be MP SAFE ktrsyscall() and ktrsysret() are now MP SAFE (Giant Pushdown) trapsignal() is now MP SAFE (Giant Pushdown) Places which used to do the if (mtx_owned(&Giant)) mtx_unlock(&Giant) test in syscall[2]() in /*/trap.c now do not. Instead they explicitly unlock Giant if they previously obtained it, and then assert that it is no longer held to catch broken system calls. Rebuild syscall tables.
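The [M] prefix convention from r82585 amounts to a small classification rule. The sketch below is an illustration under assumptions: `BASE_TYPES` lists only keywords that appear elsewhere in this log (the real parser recognizes more, and some postdate this revision), and `classify` is a hypothetical helper rather than code from makesyscalls.sh.

```python
# Only base keywords named in this log; the real parser knows others.
BASE_TYPES = {"STD", "COMPAT", "COMPAT4"}


def classify(keyword):
    """Return (base_type, mp_safe) for a type keyword such as STD or MSTD."""
    if keyword in BASE_TYPES:
        return keyword, False
    if keyword.startswith("M") and keyword[1:] in BASE_TYPES:
        return keyword[1:], True       # M prefix marks the MP-safe variant
    raise ValueError("unknown type keyword: %r" % keyword)


if __name__ == "__main__":
    for kw in ("STD", "MSTD", "MCOMPAT4"):
        print(kw, classify(kw))
```

Checking the bare keyword before stripping the prefix keeps the rule from misreading any base keyword that happens to begin with M.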
# 77386	29-May-2001	phk	Remove a comment which was past its shelf life. PR: 18750 Submitted by: Tony Finch <dot@dotat.at>
# 76827	18-May-2001	alfred	Introduce a global lock for the vm subsystem (vm_mtx). vm_mtx does not recurse and is required for most low level vm operations. faults can not be taken without holding Giant. Memory subsystems can now call the base page allocators safely. Almost all atomic ops were removed as they are covered under the vm mutex. Alpha and ia64 now need to catch up to i386's trap handlers. FFS and NFS have been tested, other filesystems will need minor changes (grabbing the vm lock when twiddling page properties). Reviewed (partially) by: jake, jhb
# 76472	11-May-2001	tegge	gettimeofday() is MP safe on both -current and -stable.
# 75426	11-Apr-2001	rwatson	o Introduce a new system call, __setsugid(), which allows a process to toggle the P_SUGID bit explicitly, rather than relying on it being set implicitly by other protection and credential logic. This feature is introduced to support inter-process authorization regression testing by simplifying userland credential management allowing the easy isolation and reproduction of authorization events with specific security contexts. This feature is enabled only by "options REGRESSION" and is not intended to be used by applications. While the feature is not known to introduce security vulnerabilities, it does allow processes to enter previously inaccessible parts of the credential state machine, and is therefore disabled by default. It may not constitute a risk, and therefore in the future pending further analysis (and appropriate need) may become a published interface. Obtained from: TrustedBSD Project
# 75038	31-Mar-2001	rwatson	o Introduce extattr_{delete,get,set}_fd() to allow extended attribute operations on file descriptors, which complement the existing set of calls, extattr_{delete,get,set}_file() which act on paths. In doing so, restructure the system call implementation such that the two sets of functions share most of the relevant code, rather than duplicating it. This pushes the vnode locking into the shared code, but keeps the copying in of some arguments in the system call code. Allowing access via file descriptors reduces the opportunity for race conditions when managing extended attributes. Obtained from: TrustedBSD Project
# 74437	19-Mar-2001	rwatson	o Rename "namespace" argument to "attrnamespace" as namespace is a C++ reserved word. Submitted by: jkh Obtained from: TrustedBSD Project
# 74273	15-Mar-2001	rwatson	o Change the API and ABI of the Extended Attribute kernel interfaces to introduce a new argument, "namespace", rather than relying on a first- character namespace indicator. This is in line with more recent thinking on EA interfaces on various mailing lists, including the posix1e, Linux acl-devel, and trustedbsd-discuss forums. Two namespaces are defined by default, EXTATTR_NAMESPACE_SYSTEM and EXTATTR_NAMESPACE_USER, where the primary distinction lies in the access control model: user EAs are accessible based on the normal MAC and DAC file/directory protections, and system attributes are limited to kernel-originated or appropriately privileged userland requests. o These API changes occur at several levels: the namespace argument is introduced in the extattr_{get,set}_file() system call interfaces, at the vnode operation level in the vop_{get,set}extattr() interfaces, and in the UFS extended attribute implementation. Changes are also introduced in the VFS extattrctl() interface (system call, VFS, and UFS implementation), where the arguments are modified to include a namespace field, as well as modified to avoid direct access to userspace variables from below the VFS layer (in the style of recent changes to mount by adrian@FreeBSD.org). This required some cleanup and bug fixing regarding VFS locks and the VFS interface, as a vnode pointer may now be optionally submitted to the VFS_EXTATTRCTL() call. Updated documentation for the VFS interface will be committed shortly. o In the near future, the auto-starting feature will be updated to search two sub-directories to the ".attribute" directory in appropriate file systems: "user" and "system" to locate attributes intended for those namespaces, as the single filename is no longer sufficient to indicate what namespace the attribute is intended for. Until this is committed, all attributes auto-started by UFS will be placed in the EXTATTR_NAMESPACE_SYSTEM namespace.
o The default POSIX.1e attribute names for ACLs and Capabilities have been updated to no longer include the '$' in their filename. As such, if you're using these features, you'll need to rename the attribute backing files to the same names without '$' symbols in front. o Note that these changes will require changes in userland, which will be committed shortly. These include modifications to the extended attribute utilities, as well as to libutil for new namespace string conversion routines. Once the matching userland changes are committed, a buildworld is recommended to update all the necessary include files and verify that the kernel and userland environments are in sync. Note: If you do not use extended attributes (most people won't), upgrading is not imperative although since the system call API has changed, the new userland extended attribute code will no longer compile with old include files. o Couple of minor cleanups while I'm there: make more code compilation conditional on FFS_EXTATTR, which should recover a bit of space on kernels running without EA's, as well as update copyright dates. Obtained from: TrustedBSD Project
# 69512	02-Dec-2000	jake	Remove thr_sleep and thr_wakeup. Remove fields p_nthread and p_wakeup from struct proc, which are now unused (p_nthread already was). Remove process flag P_KTHREADP which was untested and only set in vfs_aio.c (it should use kthread_create). Move the yield system call to kern_synch.c as kern_threads.c has been removed completely. moral support from: alfred, jhb
# 69449	01-Dec-2000	alfred	sysvipc loadable. new syscall entry lkmressys - "reserved loadable syscall" Make syscall_register allow overwriting of such entries (lkmressys).
# 65150	28-Aug-2000	marcel	Fix prototypes for {o|}{g|s}etrlimit. A recent change in the Linuxulator caused this bug to trigger.
# 64001	29-Jul-2000	peter	Sigh. Fix SYS_exit problems. I misunderstood the significance of these trailing options.
# 63986	28-Jul-2000	peter	Change the 'exit()' system call to 'sys_exit()'. This avoids overlapping gcc's internal exit() prototypes and the (futile) hackery that we did to try and avoid warnings. main() was renamed for similar reasons. Remove an exit related hack from makesyscalls.sh.
# 63452	18-Jul-2000	jlemon	Simplify kqueue API slightly. Discussed on: -arch
# 63082	13-Jul-2000	rwatson	o Introduce syscall prototypes, stubs for __cap_{get,set}_{fd,file}, syscalls to manage capability sets on files. First of two commits. Obtained from: TrustedBSD Project
# 61718	15-Jun-2000	rwatson	Introduce syscalls for process capability manipulation. Currently backs onto already committed stubs. Commit one of two. Reviewed by: Damned if I can remember. Many people. Obtained from: TrustedBSD Project
# 60247	09-May-2000	bde	Fixed the declaration of mmap(). The crufty padding arg had the wrong type. This gave an inconsistent amount of crufty padding on i386's with 64-bit longs (8 bytes instead of 4). On alphas it gives a consistent amount of crufty padding (8 bytes) in addition to the 4 bytes of normal padding caused by passing int args as register_t's. Fixed the args struct tag for the NOPROTO syscalls (netbsd_lchown() and netbsd_msync()). The tag is currently unused for NOPROTO syscalls, so the bug has no effect, but it will be used even in the NOPROTO case to calculate sy_nargs correctly.
# 59827	01-May-2000	peter	Remove undocumented broken-as-designed semconfig() syscall.
# 59288	16-Apr-2000	jlemon	Introduce kqueue() and kevent(), a kernel event notification facility.
# 58963	03-Apr-2000	alfred	Make makesyscalls.sh parse an optional field 'MPSAFE' that specifies that a syscall does not want the BGL to be grabbed automatically. Add the new MPSAFE flag to the syscalls that dillon has determined to be MPSAFE.
# 56270	19-Jan-2000	rwatson	Fix bde'isms in acl/extattr syscall interface, renaming syscalls to prettier (?) names, adding some const's around here, et al. Commit 1 out of 3. Reviewed by: bde
# 56115	16-Jan-2000	peter	Implement setres[ug]id() and getres[ug]id(). This has been sitting in my tree for ages (~2 years) waiting for an excuse to commit it. Now Linux has implemented it and it seems that Staroffice (when using the linux_base6.1 port's libc) calls this in the linux emulator and dies in setup. The Linux emulator can call these now.
# 55943	14-Jan-2000	jasone	Add aio_waitcomplete(). Make aio work correctly for socket descriptors. Make gratuitous style(9) fixes (me, not the submitter) to make the aio code more readable. PR: kern/12053 Submitted by: Chris Sedore <cmsedore@maxwell.syr.edu>
# 54970	21-Dec-1999	alfred	Make getfh a standard syscall instead of dependent on having NFSSERVER defined; useful for userland fileservers that want to use a filehandle-type interface to the filesystem. Submitted by: Assar Westerlund assar@stacken.kth.se PR: kern/15452
# 54802	19-Dec-1999	rwatson	First pass commit to introduce new ACL and Extended Attribute system calls. The second pass commit with all the supporting code will happen shortly afterwards. Reviewed by: eivind
# 53299	17-Nov-1999	brian	modfind(char *) -> modfind(const char *) Reminded by: dfr
# 52149	12-Oct-1999	marcel	Now that userland including modules don't use the osig* syscalls, make them of type COMPAT.
# 51790	29-Sep-1999	marcel	sigset_t change (part 1 of 5) ----------------------------- Rename sigaction, sigprocmask, sigpending and sigsuspend to osigaction, osigprocmask, osigpending and osigsuspend (resp) and add new syscalls for them to support the new sigset_t without breaking existing binaries. Change the prototype of sigaltstack to use the typedef stack_t instead of struct sigaltstack to reflect that it is SUSv2 compliant. Also, rename sigreturn to osigreturn and add a new syscall to support the modified stackframe. The change is caused by sigreturn operating on ucontext_t now and the fact that siginfo_t has been updated to conform to SUSv2.
# 51138	10-Sep-1999	alfred	Separate the export check in VFS_FHTOVP, exports are now checked via VFS_CHECKEXP. Add fh(open|stat|statfs) syscalls to allow userland to query filesystems based on (network) filehandle. Obtained from: NetBSD
# 50477	27-Aug-1999	peter	$Id$ -> $FreeBSD$
# 49641	11-Aug-1999	nik	Add CPT_NOA, LIBCOMPAT, NODEF, NOARGS, NOPROTO, and NOIMPL to the commented list of available types. PR: docs/13007 Submitted by: Assar Westerlund <assar@sics.se>
# 49428	05-Aug-1999	jkh	Move syscall 180 back to where it was before and fix the incorrect comment which led me to move it in the first place.
# 49420	04-Aug-1999	jkh	Reserve a syscall for the arla folks. I'm assuming that since syscalls.c and init_sysent.c are checked into CVS, I should also commit the regenerated copies even though they're built by syscalls.master. Correct? Bruce? :)
# 47103	13-May-1999	bde	Fixed nonsense arg type `const caddr_t' in the prototype() for utrace(). Changed to `const void *'. utrace() is undocumented, so nothing should notice. Fixed missing consts for utrace() and ktrace() in syscalls.master. sys/ktrace.h is missing some Lite2 changes of shorts to ints.
# 46154	28-Apr-1999	phk	Add the jail system call.
# 45311	04-Apr-1999	dt	Add standard padding argument to pread and pwrite syscall. That should make them NetBSD compatible. Add parameter to fo_read and fo_write. (The only flag, FOF_OFFSET, means that the offset is set in the struct uio). Factor out some common code from read/pread/write/pwrite syscalls.
# 45065	27-Mar-1999	alc	Added pread and pwrite. These functions are defined by the X/Open Threads Extension. (Note: We use the same syscall numbers as NetBSD.) Submitted by: John Plevyak <jplevyak@inktomi.com>
# 41088	11-Nov-1998	peter	A kldsym(2) syscall prototype for extracting information from the in-kernel linker. This is intended to replace kvm_mkdb etc. The first version only does name->value lookups, but it's open ended. value->name lookups would probably be a good thing to do too. It's been suggested to try and connect the symbol tables to sysctl (which is probably a more flexible way of doing it if it's done right), but that is far more complex and difficult than I was ready to have a shot at.
# 40931	05-Nov-1998	dg	Implemented zero-copy TCP/IP extensions via sendfile(2) - send a file to a stream socket. sendfile(2) is similar to implementations in HP-UX, Linux, and other systems, but the API is more extensive and addresses many of the complaints that the Apache Group and others have had with those other implementations. Thanks to Marc Slemko of the Apache Group for helping me work out the best API for this. Anyway, this has the "net" result of speeding up sends of files over TCP/IP sockets by about 10X (that is to say, uses 1/10th of the CPU cycles) when compared to a traditional read/write loop.
# 38515	24-Aug-1998	dfr	Fix a few syscall arguments to use size_t instead of u_int.
# 36735	07-Jun-1998	dfr	This commit fixes various 64bit portability problems required for FreeBSD/alpha. The most significant item is to change the command argument to ioctl functions from int to u_long. This change brings us in line with various other BSD versions. Driver writers may like to use (__FreeBSD_version == 300003) to detect this change. The prototype FreeBSD/alpha machdep will follow in a couple of days time.
# 36033	14-May-1998	peter	deep-six signanosleep(). It sounded like a good idea at the time.
# 35938	11-May-1998	dyson	Fix the futimes/undelete/utrace conflict with other BSD's. Note that the only common usage of utrace (the possible problem with this commit) is with malloc, so this should not be a real problem. Add the various NetBSD syscalls that allow full emulation of their development environment.
# 34925	28-Mar-1998	dufault	Finish _POSIX_PRIORITY_SCHEDULING. Needs P1003_1B and _KPOSIX_PRIORITY_SCHEDULING options to work. Changes: Change all "posix4" to "p1003_1b". Misnamed files are left as "posix4" until I'm told if I can simply delete them and add new ones; Add _POSIX_PRIORITY_SCHEDULING system calls for FreeBSD and Linux; Add man pages for _POSIX_PRIORITY_SCHEDULING system calls; Add options to LINT; Minor fixes to P1003_1B code during testing.
# 33040	03-Feb-1998	bde	Fixed type of mincore().
# 32889	30-Jan-1998	phk	Retire LFS. If you want to play with it, you can find the final version of the code in the repository under the tag LFS_RETIREMENT. If somebody makes LFS work again, adding it back is certainly desirable, but as it is now nobody seems to care much about it, and it has suffered considerable bitrot since its somewhat haphazard integration. R.I.P
# 32726	24-Jan-1998	eivind	Make all file-system (MFS, FFS, NFS, LFS, DEVFS) related options new-style. This introduces an xxxFS_BOOT for each of the rootable filesystems. (Presently not required, but encouraged to allow a smooth move of option *FS to opt_dontuse.h later.) LFS is temporarily disabled, and will be re-enabled tomorrow.
# 32166	01-Jan-1998	alex	Added missing caddr_t --> void * conversions for sys/mman.h functions. Submitted by: bde
# 30740	26-Oct-1997	phk	Add "NOIMPL" for syscalls we know what is, but don't implement as "STD". Use this for getfh & nfssvc.
# 29391	14-Sep-1997	phk	Add a __getcwd() syscall. This is intentionally undocumented, but all it does is to try to figure the pwd out from the vfs namecache, and return a reversed string to it. libc:getcwd() is responsible for flipping it back.
# 29348	14-Sep-1997	peter	Activate poll(2) syscall
# 28399	19-Aug-1997	peter	SVR4/XPG-style getpgid()/getsid() syscalls.
# 26671	15-Jun-1997	dyson	Modifications to existing files to support the initial AIO/LIO and kernel based threading support.
# 26333	01-Jun-1997	peter	New syscall, signanosleep(), which is a hybrid of sigsuspend(2) and nanosleep(2). It sleeps until either the time expires, or a signal permitted by the supplied mask arrives (eg: SIGALRM if appropriate)
# 25581	08-May-1997	peter	oops. NODIDE -> NOHIDE
# 25580	08-May-1997	peter	Define entries for the posix-style clock/timer syscalls including nanosleep(). Also, note some syscall conflicts with other systems and indicate slots tagged for use with other syscalls some day.
# 25537	07-May-1997	dfr	This is the kernel linker. To use it, you will first need to apply the patches in freefall:/home/dfr/ld.diffs to your ld sources and set BINFORMAT to aoutkld when linking the kernel. Library changes and userland utilities will appear in a later commit.
# 24451	31-Mar-1997	peter	issetugid is now implemented rather than reserved
# 24439	31-Mar-1997	peter	Reserve 252 (poll, first in OpenBSD) Reserve 253 (issetugid, as in OpenBSD) Allocate 254 for lchown(2)
# 22975	22-Feb-1997	peter	Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not ready for it yet.
# 22521	10-Feb-1997	dyson	This is the kernel Lite/2 commit. There are some requisite userland changes, so don't expect to be able to run the kernel as-is (very well) without the appropriate Lite/2 userland changes. The system boots and can mount UFS filesystems. Untested: ext2fs, msdosfs, NFS Known problems: Incorrect Berkeley ID strings in some files. Mount_std mounts will not work until the getfsent library routine is changed. Reviewed by: various people Submitted by: Jeffery Hsu <hsu@freebsd.org>
# 21776	16-Jan-1997	bde	Reduced #include spam in <sys/sysproto.h> and fixed things that depended on it. makesyscalls.sh: This parsed $Id$. Fixed(?) to parse $FreeBSD$. The output is wrong when the id is not expanded in the source file. syscalls.master: Fixed declaration of sigsuspend(). There are still some bogons and spam involving sigset_t. Use `struct foo *' instead of the equivalent `foo_t *' for some nfs and lfs syscalls so that <sys/sysproto.h> doesn't depend on <sys/mount.h>.
# 21673	14-Jan-1997	jkh	Make the long-awaited change from $Id$ to $FreeBSD$ This will make a number of things easier in the future, as well as (finally!) avoiding the Id-smashing problem which has plagued developers for so long. Boy, I'm glad we're not using sup anymore. This update would have been insane otherwise.
# 18398	19-Sep-1996	phk	Add the utrace(caddr_t addr,size_t len) syscall, that will store the data pointed at in a ktrace file, if this process is being ktrace'ed. I'm using this to profile malloc usage. The advantage is that there is no context around this call, ie, no open file or socket, so it will work in any process, and you can decide if you want it to collect data or not.
# 17702	20-Aug-1996	smpatel	Remove the kernel FD_SETSIZE limit for select(). Make select()'s first argument 'int' not 'u_int'. Reviewed by: bde
# 14322	02-Mar-1996	peter	Change the 'int len' args in the mmap/msync/mincore/etc class syscalls to 'size_t' as per bde's request.
# 14219	23-Feb-1996	peter	Add hooks for rfork/minherit pair, and reset args of vfork in preparation for adding the syscalls.
# 14215	23-Feb-1996	peter	Note the syscall numbers used in BSD/OS 2.x. We don't want to accidentally use one of these ourselves as it'd make it harder to run their binaries. Also, remove the now-defunct #include "opt_sysvipc.h".
# 13416	13-Jan-1996	phk	Add an option NFS_NOSERVER which saves 100K in the install kernel (or any other kernel that uses it). Use with option NFS.
# 13331	08-Jan-1996	peter	Remove the #ifdef SYSVSHM etc. Always call the functions, some stubs are about to go in. This is to fix the problem with the ibcs2 and linux lkm's not being able to call the sysv ipc functions unless the build is modified.
# 13226	04-Jan-1996	wollman	Convert SYSV IPC to new-style options. (I hope I got everything...) The LKMs will need an extra file, to come later.
# 13203	03-Jan-1996	wollman	Converted two options over to the new scheme: USER_LDT and KTRACE.
# 12864	15-Dec-1995	peter	Add the direct sysv shm/sem/msg system calls, in the same way as NetBSD. This costs very little, we gain prototypes for the calls from the linux emulator, and this is one less thing in the way of NetBSD binary support.
# 12216	12-Nov-1995	bde	Fixed the args list for mount(). We're not ready for the BSD4.4lite2/ NetBSD interface. Increased the bogusness of the args list for mmap(). The args lists for most of the memory mapping functions are bogus. The args lists in syscalls.master are a little better than the ones in the args structs currently being used, but the improvement for mmap() changed the object code and I don't want to worry about that now. Increased the bogusness of the args list for fcntl. BSD4.4lite2/NetBSD uses `void *' instead of int for the third arg. This has the advantage of working when `void *'s are longer than ints, but requires extra bogus casts that I hope to avoid. Fixed the args list for uname. `struct outsname' seems to be a typo, not an old interface. Added comments about bogus args lists for open, mount, msync, munmap, mprotect, madvise, mincore, fcntl, semsys, msgsys and shmsys.
# 11330	07-Oct-1995	swallace	Fix misc formatting errors in makesyscalls.sh. Add CPT_NOA type which is COMPAT with NOARGS -- do not produce argument struct in sysproto. Change accept, recvfrom, getsockname to CPT_NOA type. Fix getrlimit, setrlimit argument #2 name to struct rlimit.
# 11294	07-Oct-1995	swallace	Add new functionality to makesyscalls.sh: o optional config-file to set vars: sysnames, sysproto, sysproto_h, syshdr, syssw, syshide, syscallprefix, switchname, namesname, sysvec. o change syntax of syscalls.master entry: remove argument count. add pseudo-prototype field defining function name and arguments. o generates correct structure definitions for all system calls in sys/sysproto.h o add type NOARGS: same as STD except do not create structure in sys/sysproto.h o add type NOPROTO: same as STD except do not create structure or function prototype in sys/sysproto.h New functionality provides complete prototype definitions. Useful for generating files for emulated systems like my new ibcs2 code. Update syscalls.master to reflect new changes. For example, read() entry now looks like: 3 STD POSIX { int ibcs2_read(int fd, char *buf, u_int nbytes); } This is similar to how NetBSD generates these files.
# 10905	19-Sep-1995	bde	Generate prototypes for syscall-implementing functions. Put them in <sys/sysproto.h> and use them (so far only) in kern/init_sysent.c. Don't put $Id in generated files. kern/syscalls.master: I had to add some new fields to describe some non-orthogonal names. E.g., the args struct for the syscall-implementing function foo() is usually named `foo_args', but for getpid() it is named `args'. sys/sysent.h: sy_call_t is still incomplete to hide a couple of warnings.
# 8019	23-Apr-1995	ache	Make setreuid/setregid active syscalls
# 7359	25-Mar-1995	dg	Added a third "flags" argument to msync() ...as other systems have.
# 6875	04-Mar-1995	dg	Removed obsolete vtrace() remnants.
# 5107	14-Dec-1994	wollman	Actually enable NTP kernel PLL. (Oops!) Noticed by Pete Carah.
# 3291	02-Oct-1994	dg	"idle priority" support. Based on code from Henrik Vestergaard Draboel, but substantially rewritten by me.
# 3178	28-Sep-1994	wollman	LKM support is no longer optional.
# 2858	18-Sep-1994	wollman	Redo Kernel NTP PLL support, kernel side. This code is mostly taken from the 1.1 port (which was in turn taken from Dave Mills's kern.tar.Z example). A few significant differences: 1) ntp_gettime() is now a MIB variable rather than a system call. A few fiddles are done in libc to make it behave the same. 2) mono_time does not participate in the PLL adjustments. 3) A new interface has been defined (in <machine/clock.h>) for doing possibly machine-dependent things around the time of the clock update. This is used in Pentium kernels to disable interrupts, set `time', and reset the CPU cycle counter as quickly as possible to avoid jitter in microtime(). Measurements show an apparent resolution of a bit more than 8.14usec, which is reasonable given system-call overhead.
# 2729	13-Sep-1994	dfr	Added SYSV ipcs. Obtained from: NetBSD and FreeBSD-1.1.5
# 2696	12-Sep-1994	wollman	Added namespace information for future pollution-control measures.
# 2441	01-Sep-1994	dg	Realtime priority scheduling support. Submitted by: Henrik Vestergaard Draboel
# 2297	26-Aug-1994	wollman	Added ntp_gettime and ntp_adjtime syscalls, both nosys'ed out until someone gets to re-integrating the code. ntp_gettime() should be turned into a sysctl variable and emulated in the library.
# 2124	19-Aug-1994	dg	Terry Lambert's loadable kernel module support w/improvements from the NetBSD group.
# 1817	02-Aug-1994	dg	Added $Id$
# 1549	25-May-1994	rgrimes	The big 4.4BSD Lite to FreeBSD 2.0.0 (Development) patch. Reviewed by: Rodney W. Grimes Submitted by: John Dyson and David Greenman
# 1542	24-May-1994	rgrimes	This commit was generated by cvs2svn to compensate for changes in r1541, which included commits to RCS files with non-trunk default branches.
# 1541	24-May-1994	rgrimes	BSD 4.4 Lite Kernel Sources