#
272461 |
|
02-Oct-2014 |
gjb |
Copy stable/10@r272459 to releng/10.1 as part of the 10.1-RELEASE process.
Approved by: re (implicit) Sponsored by: The FreeBSD Foundation |
#
256281 |
|
10-Oct-2013 |
gjb |
Copy head (r256279) to stable/10 as part of the 10.0-RELEASE cycle.
Approved by: re (implicit) Sponsored by: The FreeBSD Foundation
|
#
255708 |
|
19-Sep-2013 |
jhb |
Extend the support for exempting processes from being killed when swap is exhausted. - Add a new protect(1) command that can be used to set or revoke protection from arbitrary processes. Similar to ktrace it can apply a change to all existing descendants of a process as well as future descendants. - Add a new procctl(2) system call that provides a generic interface for control operations on processes (as opposed to the debugger-specific operations provided by ptrace(2)). procctl(2) uses a combination of idtype_t and an id to identify the set of processes on which to operate similar to wait6(). - Add a PROC_SPROTECT control operation to manage the protection status of a set of processes. MADV_PROTECT still works for backwards compatability. - Add a p_flag2 to struct proc (and a corresponding ki_flag2 to kinfo_proc) the first bit of which is used to track if P_PROTECT should be inherited by new child processes.
Reviewed by: kib, jilles (earlier version) Approved by: re (delphij) MFC after: 1 month
|
#
255490 |
|
12-Sep-2013 |
jhb |
Fix the type of the idtype argument to wait6() in syscalls.master.
Approved by: re (kib) MFC after: 1 week
|
#
255219 |
|
04-Sep-2013 |
pjd |
Change the cap_rights_t type from uint64_t to a structure that we can extend in the future in a backward compatible (API and ABI) way.
The cap_rights_t represents capability rights. We used to use one bit to represent one right, but we are running out of spare bits. Currently the new structure provides place for 114 rights (so 50 more than the previous cap_rights_t), but it is possible to grow the structure to hold at least 285 rights, although we can make it even larger if 285 rights won't be enough.
The structure definition looks like this:
struct cap_rights { uint64_t cr_rights[CAP_RIGHTS_VERSION + 2]; };
The initial CAP_RIGHTS_VERSION is 0.
The top two bits in the first element of the cr_rights[] array contain total number of elements in the array - 2. This means if those two bits are equal to 0, we have 2 array elements.
The top two bits in all remaining array elements should be 0. The next five bits in all array elements contain array index. Only one bit is used and bit position in this five-bits range defines array index. This means there can be at most five array elements in the future.
To define new right the CAPRIGHT() macro must be used. The macro takes two arguments - an array index and a bit to set, eg.
#define CAP_PDKILL CAPRIGHT(1, 0x0000000000000800ULL)
We still support aliases that combine few rights, but the rights have to belong to the same array element, eg:
#define CAP_LOOKUP CAPRIGHT(0, 0x0000000000000400ULL) #define CAP_FCHMOD CAPRIGHT(0, 0x0000000000002000ULL)
#define CAP_FCHMODAT (CAP_FCHMOD | CAP_LOOKUP)
There is new API to manage the new cap_rights_t structure:
cap_rights_t *cap_rights_init(cap_rights_t *rights, ...); void cap_rights_set(cap_rights_t *rights, ...); void cap_rights_clear(cap_rights_t *rights, ...); bool cap_rights_is_set(const cap_rights_t *rights, ...);
bool cap_rights_is_valid(const cap_rights_t *rights); void cap_rights_merge(cap_rights_t *dst, const cap_rights_t *src); void cap_rights_remove(cap_rights_t *dst, const cap_rights_t *src); bool cap_rights_contains(const cap_rights_t *big, const cap_rights_t *little);
Capability rights to the cap_rights_init(), cap_rights_set(), cap_rights_clear() and cap_rights_is_set() functions are provided by separating them with commas, eg:
cap_rights_t rights;
cap_rights_init(&rights, CAP_READ, CAP_WRITE, CAP_FSTAT);
There is no need to terminate the list of rights, as those functions are actually macros that take care of the termination, eg:
#define cap_rights_set(rights, ...) \ __cap_rights_set((rights), __VA_ARGS__, 0ULL) void __cap_rights_set(cap_rights_t *rights, ...);
Thanks to using one bit as an array index we can assert in those functions that there are no two rights belonging to different array elements provided together. For example this is illegal and will be detected, because CAP_LOOKUP belongs to element 0 and CAP_PDKILL to element 1:
cap_rights_init(&rights, CAP_LOOKUP | CAP_PDKILL);
Providing several rights that belongs to the same array's element this way is correct, but is not advised. It should only be used for aliases definition.
This commit also breaks compatibility with some existing Capsicum system calls, but I see no other way to do that. This should be fine as Capsicum is still experimental and this change is not going to 9.x.
Sponsored by: The FreeBSD Foundation
|
#
251526 |
|
08-Jun-2013 |
glebius |
Add new system call - aio_mlock(). The name speaks for itself. It allows to perform the mlock(2) operation, which can consume a lot of time, under control of aio(4).
Reviewed by: kib, jilles Sponsored by: Nginx, Inc.
|
#
250853 |
|
21-May-2013 |
kib |
Fix the wait6(2) on 32bit architectures and for the compat32, by using the right type for the argument in syscalls.master. Also fix the posix_fallocate(2) and posix_fadvise(2) compat32 syscalls on the architectures which require padding of the 64bit argument.
Noted and reviewed by: jhb Pointy hat to: kib MFC after: 1 week
|
#
250159 |
|
01-May-2013 |
jilles |
Add pipe2() system call.
The pipe2() function is similar to pipe() but allows setting FD_CLOEXEC and O_NONBLOCK (on both sides) as part of the function.
If p points to two writable ints, pipe2(p, 0) is equivalent to pipe(p).
If the pointer is not valid, behaviour differs: pipe2() writes into the array from the kernel like socketpair() does, while pipe() writes into the array from an architecture-specific assembler wrapper.
Reviewed by: kan, kib
|
#
250154 |
|
01-May-2013 |
jilles |
Add accept4() system call.
The accept4() function, compared to accept(), allows setting the new file descriptor atomically close-on-exec and explicitly controlling the non-blocking status on the new socket. (Note that the latter point means that accept() is not equivalent to any form of accept4().)
The linuxulator's accept4 implementation leaves a race window where the new file descriptor is not close-on-exec because it calls sys_accept(). This implementation leaves no such race window (by using falloc() flags). The linuxulator could be fixed and simplified by using the new code.
Like accept(), accept4() is async-signal-safe, a cancellation point and permitted in capability mode.
|
#
248995 |
|
02-Apr-2013 |
mdf |
Fix return type of extattr_set_* and fix rmextattr(8) utility.
extattr_set_{fd,file,link} is logically a write(2)-like operation and should return ssize_t, just like extattr_get_*. Also, the user-space utility was using an int for the return value of extattr_get_* and extattr_list_*, both of which return an ssize_t.
MFC after: 1 week
|
#
248599 |
|
21-Mar-2013 |
pjd |
Implement chflagsat(2) system call, similar to fchmodat(2), but operates on file flags.
Reviewed by: kib, jilles Sponsored by: The FreeBSD Foundation
|
#
248597 |
|
21-Mar-2013 |
pjd |
- Make 'flags' argument to chflags(2), fchflags(2) and lchflags(2) of type u_long. Before this change it was of type int for syscalls, but prototypes in sys/stat.h and documentation for chflags(2) and fchflags(2) (but not for lchflags(2)) stated that it was u_long. Now some related functions use u_long type for flags (strtofflags(3), fflagstostr(3)). - Make path argument of type 'const char *' for consistency.
Discussed on: arch Sponsored by: The FreeBSD Foundation
|
#
247667 |
|
02-Mar-2013 |
pjd |
- Implement two new system calls:
int bindat(int fd, int s, const struct sockaddr *addr, socklen_t addrlen); int connectat(int fd, int s, const struct sockaddr *name, socklen_t namelen);
which allow to bind and connect respectively to a UNIX domain socket with a path relative to the directory associated with the given file descriptor 'fd'.
- Add manual pages for the new syscalls.
- Make the new syscalls available for processes in capability mode sandbox.
- Add capability rights CAP_BINDAT and CAP_CONNECTAT that has to be present on the directory descriptor for the syscalls to work.
- Update audit(4) to support those two new syscalls and to handle path in sockaddr_un structure relative to the given directory descriptor.
- Update procstat(1) to recognize the new capability rights.
- Document the new capability rights in cap_rights_limit(2).
Sponsored by: The FreeBSD Foundation Discussed with: rwatson, jilles, kib, des
|
#
247602 |
|
01-Mar-2013 |
pjd |
Merge Capsicum overhaul:
- Capability is no longer separate descriptor type. Now every descriptor has set of its own capability rights.
- The cap_new(2) system call is left, but it is no longer documented and should not be used in new code.
- The new syscall cap_rights_limit(2) should be used instead of cap_new(2), which limits capability rights of the given descriptor without creating a new one.
- The cap_getrights(2) syscall is renamed to cap_rights_get(2).
- If CAP_IOCTL capability right is present we can further reduce allowed ioctls list with the new cap_ioctls_limit(2) syscall. List of allowed ioctls can be retrived with cap_ioctls_get(2) syscall.
- If CAP_FCNTL capability right is present we can further reduce fcntls that can be used with the new cap_fcntls_limit(2) syscall and retrive them with cap_fcntls_get(2).
- To support ioctl and fcntl white-listing the filedesc structure was heavly modified.
- The audit subsystem, kdump and procstat tools were updated to recognize new syscalls.
- Capability rights were revised and eventhough I tried hard to provide backward API and ABI compatibility there are some incompatible changes that are described in detail below:
CAP_CREATE old behaviour: - Allow for openat(2)+O_CREAT. - Allow for linkat(2). - Allow for symlinkat(2). CAP_CREATE new behaviour: - Allow for openat(2)+O_CREAT.
Added CAP_LINKAT: - Allow for linkat(2). ABI: Reuses CAP_RMDIR bit. - Allow to be target for renameat(2).
Added CAP_SYMLINKAT: - Allow for symlinkat(2).
Removed CAP_DELETE. Old behaviour: - Allow for unlinkat(2) when removing non-directory object. - Allow to be source for renameat(2).
Removed CAP_RMDIR. Old behaviour: - Allow for unlinkat(2) when removing directory.
Added CAP_RENAMEAT: - Required for source directory for the renameat(2) syscall.
Added CAP_UNLINKAT (effectively it replaces CAP_DELETE and CAP_RMDIR): - Allow for unlinkat(2) on any object. - Required if target of renameat(2) exists and will be removed by this call.
Removed CAP_MAPEXEC.
CAP_MMAP old behaviour: - Allow for mmap(2) with any combination of PROT_NONE, PROT_READ and PROT_WRITE. CAP_MMAP new behaviour: - Allow for mmap(2)+PROT_NONE.
Added CAP_MMAP_R: - Allow for mmap(PROT_READ). Added CAP_MMAP_W: - Allow for mmap(PROT_WRITE). Added CAP_MMAP_X: - Allow for mmap(PROT_EXEC). Added CAP_MMAP_RW: - Allow for mmap(PROT_READ | PROT_WRITE). Added CAP_MMAP_RX: - Allow for mmap(PROT_READ | PROT_EXEC). Added CAP_MMAP_WX: - Allow for mmap(PROT_WRITE | PROT_EXEC). Added CAP_MMAP_RWX: - Allow for mmap(PROT_READ | PROT_WRITE | PROT_EXEC).
Renamed CAP_MKDIR to CAP_MKDIRAT. Renamed CAP_MKFIFO to CAP_MKFIFOAT. Renamed CAP_MKNODE to CAP_MKNODEAT.
CAP_READ old behaviour: - Allow pread(2). - Disallow read(2), readv(2) (if there is no CAP_SEEK). CAP_READ new behaviour: - Allow read(2), readv(2). - Disallow pread(2) (CAP_SEEK was also required).
CAP_WRITE old behaviour: - Allow pwrite(2). - Disallow write(2), writev(2) (if there is no CAP_SEEK). CAP_WRITE new behaviour: - Allow write(2), writev(2). - Disallow pwrite(2) (CAP_SEEK was also required).
Added convinient defines:
#define CAP_PREAD (CAP_SEEK | CAP_READ) #define CAP_PWRITE (CAP_SEEK | CAP_WRITE) #define CAP_MMAP_R (CAP_MMAP | CAP_SEEK | CAP_READ) #define CAP_MMAP_W (CAP_MMAP | CAP_SEEK | CAP_WRITE) #define CAP_MMAP_X (CAP_MMAP | CAP_SEEK | 0x0000000000000008ULL) #define CAP_MMAP_RW (CAP_MMAP_R | CAP_MMAP_W) #define CAP_MMAP_RX (CAP_MMAP_R | CAP_MMAP_X) #define CAP_MMAP_WX (CAP_MMAP_W | CAP_MMAP_X) #define CAP_MMAP_RWX (CAP_MMAP_R | CAP_MMAP_W | CAP_MMAP_X) #define CAP_RECV CAP_READ #define CAP_SEND CAP_WRITE
#define CAP_SOCK_CLIENT \ (CAP_CONNECT | CAP_GETPEERNAME | CAP_GETSOCKNAME | CAP_GETSOCKOPT | \ CAP_PEELOFF | CAP_RECV | CAP_SEND | CAP_SETSOCKOPT | CAP_SHUTDOWN) #define CAP_SOCK_SERVER \ (CAP_ACCEPT | CAP_BIND | CAP_GETPEERNAME | CAP_GETSOCKNAME | \ CAP_GETSOCKOPT | CAP_LISTEN | CAP_PEELOFF | CAP_RECV | CAP_SEND | \ CAP_SETSOCKOPT | CAP_SHUTDOWN)
Added defines for backward API compatibility:
#define CAP_MAPEXEC CAP_MMAP_X #define CAP_DELETE CAP_UNLINKAT #define CAP_MKDIR CAP_MKDIRAT #define CAP_RMDIR CAP_UNLINKAT #define CAP_MKFIFO CAP_MKFIFOAT #define CAP_MKNOD CAP_MKNODAT #define CAP_SOCK_ALL (CAP_SOCK_CLIENT | CAP_SOCK_SERVER)
Sponsored by: The FreeBSD Foundation Reviewed by: Christoph Mallon <christoph.mallon@gmx.de> Many aspects discussed with: rwatson, benl, jonathan ABI compatibility discussed with: kib
|
#
242958 |
|
13-Nov-2012 |
kib |
Add the wait6(2) system call. It takes POSIX waitid()-like process designator to select a process which is waited for. The system call optionally returns siginfo_t which would be otherwise provided to SIGCHLD handler, as well as extended structure accounting for child and cumulative grandchild resource usage.
Allow to get the current rusage information for non-exited processes as well, similar to Solaris.
The explicit WEXITED flag is required to wait for exited processes, allowing for more fine-grained control of the events the waiter is interested in.
Fix the handling of siginfo for WNOWAIT option for all wait*(2) family, by not removing the queued signal state.
PR: standards/170346 Submitted by: "Jukka A. Ukkonen" <jau@iki.fi> MFC after: 1 month
|
#
239347 |
|
17-Aug-2012 |
davidxu |
Implement syscall clock_getcpuclockid2, so we can get a clock id for process, thread or others we want to support. Use the syscall to implement POSIX API clock_getcpuclock and pthread_getcpuclockid.
PR: 168417
|
#
236026 |
|
25-May-2012 |
ed |
Remove use of non-ISO-C integer types from system call tables.
These files already use ISO-C-style integer types, so make them less inconsistent by preferring the standard types.
|
#
227776 |
|
20-Nov-2011 |
lstewart |
- Add the ffclock_getcounter(), ffclock_getestimate() and ffclock_setestimate() system calls to provide feed-forward clock management capabilities to userspace processes. ffclock_getcounter() returns the current value of the kernel's feed-forward clock counter. ffclock_getestimate() returns the current feed-forward clock parameter estimates and ffclock_setestimate() updates the feed-forward clock parameter estimates.
- Document the syscalls in the ffclock.2 man page.
- Regenerate the script-derived syscall related files.
Committed on behalf of Julien Ridoux and Darryl Veitch from the University of Melbourne, Australia, as part of the FreeBSD Foundation funded "Feed-Forward Clock Synchronization Algorithms" project.
For more information, see http://www.synclab.org/radclock/
Submitted by: Julien Ridoux (jridoux at unimelb edu au)
|
#
227691 |
|
19-Nov-2011 |
ed |
Improve *access*() parameter name consistency.
The current code mixes the use of `flags' and `mode'. This is a bit confusing, since the faccessat() function as a `flag' parameter to store the AT_ flag.
Make this less confusing by using the same name as used in the POSIX specification -- `amode'.
|
#
227070 |
|
04-Nov-2011 |
jhb |
Add the posix_fadvise(2) system call. It is somewhat similar to madvise(2) except that it operates on a file descriptor instead of a memory region. It is currently only supported on regular files.
Just as with madvise(2), the advice given to posix_fadvise(2) can be divided into two types. The first type provide hints about data access patterns and are used in the file read and write routines to modify the I/O flags passed down to VOP_READ() and VOP_WRITE(). These modes are thus filesystem independent. Note that to ease implementation (and since this API is only advisory anyway), only a single non-normal range is allowed per file descriptor.
The second type of hints are used to hint to the OS that data will or will not be used. These hints are implemented via a new VOP_ADVISE(). A default implementation is provided which does nothing for the WILLNEED request and attempts to move any clean pages to the cache page queue for the DONTNEED request. This latter case required two other changes. First, a new V_CLEANONLY flag was added to vinvalbuf(). This requests vinvalbuf() to only flush clean buffers for the vnode from the buffer cache and to not remove any backing pages from the vnode. This is used to ensure clean pages are not wired into the buffer cache before attempting to move them to the cache page queue. The second change adds a new vm_object_page_cache() method. This method is somewhat similar to vm_object_page_remove() except that instead of freeing each page in the specified range, it attempts to move clean pages to the cache queue if possible.
To preserve the ABI of struct file, the f_cdevpriv pointer is now reused in a union to point to the currently active advice region if one is present for regular files.
Reviewed by: jilles, kib, arch@ Approved by: re (kib) MFC after: 1 month
|
#
224987 |
|
18-Aug-2011 |
jonathan |
Add experimental support for process descriptors
A "process descriptor" file descriptor is used to manage processes without using the PID namespace. This is required for Capsicum's Capability Mode, where the PID namespace is unavailable.
New system calls pdfork(2) and pdkill(2) offer the functional equivalents of fork(2) and kill(2). pdgetpid(2) allows querying the PID of the remote process for debugging purposes. The currently-unimplemented pdwait(2) will, in the future, allow querying rusage/exit status. In the interim, poll(2) may be used to check (and wait for) process termination.
When a process is referenced by a process descriptor, it does not issue SIGCHLD to the parent, making it suitable for use in libraries---a common scenario when using library compartmentalisation from within large applications (such as web browsers). Some observers may note a similarity to Mach task ports; process descriptors provide a subset of this behaviour, but in a UNIX style.
This feature is enabled by "options PROCDESC", but as with several other Capsicum kernel features, is not enabled by default in GENERIC 9.0.
Reviewed by: jhb, kib Approved by: re (kib), mentor (rwatson) Sponsored by: Google Inc
|
#
224066 |
|
15-Jul-2011 |
jonathan |
Add cap_new() and cap_getrights() system calls.
Implement two previously-reserved Capsicum system calls: - cap_new() creates a capability to wrap an existing file descriptor - cap_getrights() queries the rights mask of a capability.
Approved by: mentor (rwatson), re (Capsicum blanket) Sponsored by: Google Inc
|
#
220791 |
|
18-Apr-2011 |
mdf |
Add the posix_fallocate(2) syscall. The default implementation in vop_stdallocate() is filesystem agnostic and will run as slow as a read/write loop in userspace; however, it serves to correctly implement the functionality for filesystems that do not implement a VOP_ALLOCATE.
Note that __FreeBSD_version was already bumped today to 900036 for any ports which would like to use this function.
Also reserve space in the syscall table for posix_fadvise(2).
Reviewed by: -arch (previous version)
|
#
220163 |
|
30-Mar-2011 |
trasz |
Add rctl. It's used by racct to take user-configurable actions based on the set of rules it maintains and the current resource usage. It also privides userland API to manage that ruleset.
Sponsored by: The FreeBSD Foundation Reviewed by: kib (earlier version)
|
#
219304 |
|
05-Mar-2011 |
trasz |
Add two new system calls, setloginclass(2) and getloginclass(2). This makes it possible for the kernel to track login class the process is assigned to, which is required for RCTL. This change also make setusercontext(3) call setloginclass(2) and makes it possible to retrieve current login class using id(1).
Reviewed by: kib (as part of a larger patch)
|
#
219129 |
|
01-Mar-2011 |
rwatson |
Add initial support for Capsicum's Capability Mode to the FreeBSD kernel, compiled conditionally on options CAPABILITIES:
Add a new credential flag, CRED_FLAG_CAPMODE, which indicates that a subject (typically a process) is in capability mode.
Add two new system calls, cap_enter(2) and cap_getmode(2), which allow setting and querying (but never clearing) the flag.
Export the capability mode flag via process information sysctls.
Sponsored by: Google, Inc. Reviewed by: anderson Discussed with: benl, kris, pjd Obtained from: Capsicum Project MFC after: 3 months
|
#
211998 |
|
30-Aug-2010 |
kib |
Make the syscalls reserved for AFS usable by OpenAFS port.
Submitted by: Benjamin Kaduk <kaduk mit edu> MFC after: 2 weeks
|
#
211838 |
|
26-Aug-2010 |
kib |
Fix typo.
Submitted by: Ben Kaduk <minimarmot gmail com>
|
#
209579 |
|
28-Jun-2010 |
kib |
Count number of threads that enter and leave dynamically registered syscalls. On the dynamic syscall deregistration, wait until all threads leave the syscall code. This somewhat increases the safety of the loadable modules unloading.
Reviewed by: jhb Tested by: pho MFC after: 1 month
|
#
203660 |
|
08-Feb-2010 |
ed |
Remove unused LIBCOMPAT keyword from syscalls.master.
|
#
198508 |
|
27-Oct-2009 |
kib |
Current pselect(3) is implemented in usermode and thus vulnerable to well-known race condition, which elimination was the reason for the function appearance in first place. If sigmask supplied as argument to pselect() enables a signal, the signal might be delivered before thread called select(2), causing lost wakeup. Reimplement pselect() in kernel, making change of sigmask and sleep atomic.
Since signal shall be delivered to the usermode, but sigmask restored, set TDP_OLDMASK and save old mask in td_oldsigmask. The TDP_OLDMASK should be cleared by ast() in case signal was not gelivered during syscall execution.
Reviewed by: davidxu Tested by: pho MFC after: 1 month
|
#
197636 |
|
30-Sep-2009 |
rwatson |
Reserve system call numbers for Capsicum security framework capabilities, capability mode, and process descriptors: cap_new, cap_getrights, cap_enter, cap_getmode, pdfork, pdkill, pdgetpid, and pdwait.
Obtained from: TrustedBSD Project Sponsored by: Google MFC after: 3 weeks
|
#
195458 |
|
08-Jul-2009 |
trasz |
There is an optimization in chmod(1), that makes it not to call chmod(2) if the new file mode is the same as it was before; however, this optimization must be disabled for filesystems that support NFSv4 ACLs. Chmod uses pathconf(2) to determine whether this is the case - however, pathconf(2) always follows symbolic links, while the 'chmod -h' doesn't.
This change adds lpathconf(3) to make it possible to solve that problem in a clean way.
Reviewed by: rwatson (earlier version) Approved by: re (kib)
|
#
194910 |
|
24-Jun-2009 |
jhb |
Change the ABI of some of the structures used by the SYSV IPC API: - The uid/cuid members of struct ipc_perm are now uid_t instead of unsigned short. - The gid/cgid members of struct ipc_perm are now gid_t instead of unsigned short. - The mode member of struct ipc_perm is now mode_t instead of unsigned short (this is merely a style bug). - The rather dubious padding fields for ABI compat with SV/I386 have been removed from struct msqid_ds and struct semid_ds. - The shm_segsz member of struct shmid_ds is now a size_t instead of an int. This removes the need for the shm_bsegsz member in struct shmid_kernel and should allow for complete support of SYSV SHM regions >= 2GB. - The shm_nattch member of struct shmid_ds is now an int instead of a short. - The shm_internal member of struct shmid_ds is now gone. The internal VM object pointer for SHM regions has been moved into struct shmid_kernel. - The existing __semctl(), msgctl(), and shmctl() system call entries are now marked COMPAT7 and new versions of those system calls which support the new ABI are now present. - The new system calls are assigned to the FBSD-1.1 version in libc. The FBSD-1.0 symbols in libc now refer to the old COMPAT7 system calls. - A simplistic framework for tagging system calls with compatibility symbol versions has been added to libc. Version tags are added to system calls by adding an appropriate __sym_compat() entry to src/lib/libc/incldue/compat.h. [1]
PR: kern/16195 kern/113218 bin/129855 Reviewed by: arch@, rwatson Discussed with: kan, kib [1]
|
#
194894 |
|
24-Jun-2009 |
jhb |
Deprecate the msgsys(), semsys(), and shmsys() system calls by moving them under COMPAT_FREEBSD[4567]. Starting with FreeBSD 5.0 the SYSV IPC API was implemented via direct system calls (e.g. msgctl(), msgget(), etc.) rather than indirecting through the var-args *sys() system calls. The shmsys() system call was already effectively deprecated for all but COMPAT_FREEBSD4 already as its implementation for the !COMPAT_FREEBSD4 case was to simply invoke nosys().
|
#
194833 |
|
24-Jun-2009 |
jhb |
Add a new COMPAT7 flag for FreeBSD 7.x compatibility system calls.
|
#
194645 |
|
22-Jun-2009 |
jhb |
Fix a typo in a comment.
|
#
194390 |
|
17-Jun-2009 |
jhb |
- Add the ability to mix multiple flags seperated by pipe ('|') characters in the type field of system call tables. Specifically, one can now use the 'NO*' types as flags in addition to the 'COMPAT*' types. For example, to tag 'COMPAT*' system calls as living in a KLD via NOSTD. The COMPAT* type is required to be listed first in this case. - Add new functions 'type()' and 'flag()' to the embedded awk script in makesyscalls.sh that return true if a requested flag is found in the type field ($3). The flag() function checks all of the flags in the field, but type() only checks the first flag. type() is meant to be used in the top-level "switch" statement and flag() should be used otherwise. - Retire the CPT_NOA type, it is now replaced with "COMPAT|NOARGS" using the flags approach. - Tweak the comment descriptions of COMPAT[46] system calls so that they say "freebsd[46] foo" rather than "old foo". - Document the COMPAT6 type. - Sync comments in compat32 syscall table with the master table.
|
#
194384 |
|
17-Jun-2009 |
jhb |
Remove the now-unused NOIMPL flag. It serves no useful purpose given the existing UNIMPL and NOSTD types.
|
#
194383 |
|
17-Jun-2009 |
jhb |
- NOSTD results in lkmressys being used instead of lkmssys. - Mark nfsclnt as UNIMPL. It should have been NOSTD instead of NOIMPL back when it lived in nfsclient.ko, but it was removed from that a long time ago.
|
#
194262 |
|
15-Jun-2009 |
jhb |
Add a new 'void closefrom(int lowfd)' system call. When called, it closes any open file descriptors >= 'lowfd'. It is largely identical to the same function on other operating systems such as Solaris, DFly, NetBSD, and OpenBSD. One difference from other *BSD is that this closefrom() does not fail with any errors. In practice, while the manpages for NetBSD and OpenBSD claim that they return EINTR, they ignore internal errors from close() and never return EINTR. DFly does return EINTR, but for the common use case (closing fd's prior to execve()), the caller really wants all fd's closed and returning EINTR just forces callers to call closefrom() in a loop until it stops failing.
Note that this implementation of closefrom(2) does not make any effort to resolve userland races with open(2) in other threads. As such, it is not multithread safe.
Submitted by: rwatson (initial version) Reviewed by: rwatson MFC after: 2 weeks
|
#
191673 |
|
29-Apr-2009 |
jamie |
Introduce the extensible jail framework, using the same "name=value" interface as nmount(2). Three new system calls are added: * jail_set, to create jails and change the parameters of existing jails. This replaces jail(2). * jail_get, to read the parameters of existing jails. This replaces the security.jail.list sysctl. * jail_remove to kill off a jail's processes and remove the jail. Most jail parameters may now be changed after creation, and jails may be set to exist without any attached processes. The current jail(2) system call still exists, though it is now a stub to jail_set(2).
Approved by: bz (mentor)
|
#
184789 |
|
09-Nov-2008 |
ed |
Mark uname(), getdomainname() and setdomainname() with COMPAT_FREEBSD4.
Looking at our source code history, it seems the uname(), getdomainname() and setdomainname() system calls got deprecated somewhere after FreeBSD 1.1, but they have never been phased out properly. Because we don't have a COMPAT_FREEBSD1, just use COMPAT_FREEBSD4.
Also fix the Linuxolator to build without the setdomainname() routine by just making it call userland_sysctl on kern.domainname. Also replace the setdomainname()'s implementation to use this approach, because we're duplicating code with sysctl_domainname().
I wasn't able to keep these three routines working in our COMPAT_FREEBSD32, because that would require yet another keyword for syscalls.master (COMPAT4+NOPROTO). Because this routine is probably unused already, this won't be a problem in practice. If it turns out to be a problem, we'll just restore this functionality.
Reviewed by: rdivacky, kib
|
#
184588 |
|
03-Nov-2008 |
dfr |
Implement support for RPCSEC_GSS authentication to both the NFS client and server. This replaces the RPC implementation of the NFS client and server with the newer RPC implementation originally developed (actually ported from the userland sunrpc code) to support the NFS Lock Manager. I have tested this code extensively and I believe it is stable and that performance is at least equal to the legacy RPC implementation.
The NFS code currently contains support for both the new RPC implementation and the older legacy implementation inherited from the original NFS codebase. The default is to use the new implementation - add the NFS_LEGACYRPC option to fall back to the old code. When I merge this support back to RELENG_7, I will probably change this so that users have to 'opt in' to get the new code.
To use RPCSEC_GSS on either client or server, you must build a kernel which includes the KGSSAPI option and the crypto device. On the userland side, you must build at least a new libc, mountd, mount_nfs and gssd. You must install new versions of /etc/rc.d/gssd and /etc/rc.d/nfsd and add 'gssd_enable=YES' to /etc/rc.conf.
As long as gssd is running, you should be able to mount an NFS filesystem from a server that requires RPCSEC_GSS authentication. The mount itself can happen without any kerberos credentials but all access to the filesystem will be denied unless the accessing user has a valid ticket file in the standard place (/tmp/krb5cc_<uid>). There is currently no support for situations where the ticket file is in a different place, such as when the user logged in via SSH and has delegated credentials from that login. This restriction is also present in Solaris and Linux. In theory, we could improve this in future, possibly using Brooks Davis' implementation of variant symlinks.
Supporting RPCSEC_GSS on a server is nearly as simple. You must create service creds for the server in the form 'nfs/<fqdn>@<REALM>' and install them in /etc/krb5.keytab. The standard heimdal utility ktutil makes this fairly easy. After the service creds have been created, you can add a '-sec=krb5' option to /etc/exports and restart both mountd and nfsd.
The only other difference an administrator should notice is that nfsd doesn't fork to create service threads any more. In normal operation, there will be two nfsd processes, one in userland waiting for TCP connections and one in the kernel handling requests. The latter process will create as many kthreads as required - these should be visible via 'top -H'. The code has some support for varying the number of service threads according to load but initially at least, nfsd uses a fixed number of threads according to the value supplied to its '-n' option.
Sponsored by: Isilon Systems MFC after: 1 month
|
#
183361 |
|
25-Sep-2008 |
jhb |
Tidy up a few things with syscall generation: - Instead of using a syscall slot (370) just to get a function prototype for lkmressys(), add an explicit function prototype to <sys/sysent.h>. This also removes unused special case checks for 'lkmressys' from makesyscalls.sh. - Instead of having magic logic in makesyscalls.sh to only generate a function prototype the first time 'lkmnosys' is seen, make 'NODEF' always not generate a function prototype and include an explicit prototype for 'lkmnosys' in <sys/sysent.h>. - As a result of the fix in (2), update the LKM syscall entries in the freebsd32 syscall table to use 'lkmnosys' rather than 'nosys'. - Use NOPROTO for the __syscall() entry (198) in the native ABI. This avoids the need for magic logic in makesyscalls.h to only generate a function prototype the first time 'nosys' is encountered.
|
#
182123 |
|
24-Aug-2008 |
rwatson |
When MPSAFE ttys were merged, a new BSM audit event identifier was allocated for posix_openpt(2). Unfortunately, that identifier conflicts with other events already allocated to other systems in OpenBSM. Assign a new globally unique identifier and conform better to the AUE_ event naming scheme.
This is a stopgap until a new OpenBSM import is done with the correct identifier, so we'll maintain this as a local diff in svn until then.
Discussed with: ed Obtained from: TrustedBSD Project
|
#
181972 |
|
21-Aug-2008 |
obrien |
Add comments on NOARGS, NODEF, and NOPROTO.
|
#
181905 |
|
20-Aug-2008 |
ed |
Integrate the new MPSAFE TTY layer to the FreeBSD operating system.
The last half year I've been working on a replacement TTY layer for the FreeBSD kernel. The new TTY layer was designed to improve the following:
- Improved driver model:
The old TTY layer has a driver model that is not abstract enough to make it friendly to use. A good example is the output path, where the device drivers directly access the output buffers. This means that an in-kernel PPP implementation must always convert network buffers into TTY buffers.
If a PPP implementation would be built on top of the new TTY layer (still needs a hooks layer, though), it would allow the PPP implementation to directly hand the data to the TTY driver.
- Improved hotplugging:
With the old TTY layer, it isn't entirely safe to destroy TTY's from the system. This implementation has a two-step destructing design, where the driver first abandons the TTY. After all threads have left the TTY, the TTY layer calls a routine in the driver, which can be used to free resources (unit numbers, etc).
The pts(4) driver also implements this feature, which means posix_openpt() will now return PTY's that are created on the fly.
- Improved performance:
One of the major improvements is the per-TTY mutex, which is expected to improve scalability when compared to the old Giant locking. Another change is the unbuffered copying to userspace, which is both used on TTY device nodes and PTY masters.
Upgrading should be quite straightforward. Unlike previous versions, existing kernel configuration files do not need to be changed, except when they reference device drivers that are listed in UPDATING.
Obtained from: //depot/projects/mpsafetty/... Approved by: philip (ex-mentor) Discussed: on the lists, at BSDCan, at the DevSummit Sponsored by: Snow B.V., the Netherlands dcons(4) fixed by: kan
|
#
178888 |
|
09-May-2008 |
julian |
Add code to allow the system to handle multiple routing tables. This particular implementation is designed to be fully backwards compatible and to be MFC-able to 7.x (and 6.x)
Currently the only protocol that can make use of the multiple tables is IPv4 Similar functionality exists in OpenBSD and Linux.
From my notes:
-----
One thing where FreeBSD has been falling behind, and which by chance I have some time to work on is "policy based routing", which allows different packet streams to be routed by more than just the destination address.
Constraints: ------------
I want to make some form of this available in the 6.x tree (and by extension 7.x) , but FreeBSD in general needs it so I might as well do it in -current and back port the portions I need.
One of the ways that this can be done is to have the ability to instantiate multiple kernel routing tables (which I will now refer to as "Forwarding Information Bases" or "FIBs" for political correctness reasons). Which FIB a particular packet uses to make the next hop decision can be decided by a number of mechanisms. The policies these mechanisms implement are the "Policies" referred to in "Policy based routing".
One of the constraints I have if I try to back port this work to 6.x is that it must be implemented as a EXTENSION to the existing ABIs in 6.x so that third party applications do not need to be recompiled in timespan of the branch.
This first version will not have some of the bells and whistles that will come with later versions. It will, for example, be limited to 16 tables in the first commit. Implementation method, Compatible version. (part 1) ------------------------------- For this reason I have implemented a "sufficient subset" of a multiple routing table solution in Perforce, and back-ported it to 6.x. (also in Perforce though not always caught up with what I have done in -current/P4). The subset allows a number of FIBs to be defined at compile time (8 is sufficient for my purposes in 6.x) and implements the changes needed to allow IPV4 to use them. I have not done the changes for ipv6 simply because I do not need it, and I do not have enough knowledge of ipv6 (e.g. neighbor discovery) needed to do it.
Other protocol families are left untouched and should there be users with proprietary protocol families, they should continue to work and be oblivious to the existence of the extra FIBs.
To understand how this is done, one must know that the current FIB code starts everything off with a single dimensional array of pointers to FIB head structures (One per protocol family), each of which in turn points to the trie of routes available to that family.
The basic change in the ABI compatible version of the change is to extent that array to be a 2 dimensional array, so that instead of protocol family X looking at rt_tables[X] for the table it needs, it looks at rt_tables[Y][X] when for all protocol families except ipv4 Y is always 0. Code that is unaware of the change always just sees the first row of the table, which of course looks just like the one dimensional array that existed before.
The entry points rtrequest(), rtalloc(), rtalloc1(), rtalloc_ign() are all maintained, but refer only to the first row of the array, so that existing callers in proprietary protocols can continue to do the "right thing". Some new entry points are added, for the exclusive use of ipv4 code called in_rtrequest(), in_rtalloc(), in_rtalloc1() and in_rtalloc_ign(), which have an extra argument which refers the code to the correct row.
In addition, there are some new entry points (currently called rtalloc_fib() and friends) that check the Address family being looked up and call either rtalloc() (and friends) if the protocol is not IPv4 forcing the action to row 0 or to the appropriate row if it IS IPv4 (and that info is available). These are for calling from code that is not specific to any particular protocol. The way these are implemented would change in the non ABI preserving code to be added later.
One feature of the first version of the code is that for ipv4, the interface routes show up automatically on all the FIBs, so that no matter what FIB you select you always have the basic direct attached hosts available to you. (rtinit() does this automatically).
You CAN delete an interface route from one FIB should you want to but by default it's there. ARP information is also available in each FIB. It's assumed that the same machine would have the same MAC address, regardless of which FIB you are using to get to it.
This brings us as to how the correct FIB is selected for an outgoing IPV4 packet.
Firstly, all packets have a FIB associated with them. if nothing has been done to change it, it will be FIB 0. The FIB is changed in the following ways.
Packets fall into one of a number of classes.
1/ locally generated packets, coming from a socket/PCB. Such packets select a FIB from a number associated with the socket/PCB. This in turn is inherited from the process, but can be changed by a socket option. The process in turn inherits it on fork. I have written a utility call setfib that acts a bit like nice..
setfib -3 ping target.example.com # will use fib 3 for ping.
It is an obvious extension to make it a property of a jail but I have not done so. It can be achieved by combining the setfib and jail commands.
2/ packets received on an interface for forwarding. By default these packets would use table 0, (or possibly a number settable in a sysctl(not yet)). but prior to routing the firewall can inspect them (see below). (possibly in the future you may be able to associate a FIB with packets received on an interface.. An ifconfig arg, but not yet.)
3/ packets inspected by a packet classifier, which can arbitrarily associate a fib with it on a packet by packet basis. A fib assigned to a packet by a packet classifier (such as ipfw) would over-ride a fib associated by a more default source. (such as cases 1 or 2).
4/ a tcp listen socket associated with a fib will generate accept sockets that are associated with that same fib.
5/ Packets generated in response to some other packet (e.g. reset or icmp packets). These should use the FIB associated with the packet being reponded to.
6/ Packets generated during encapsulation. gif, tun and other tunnel interfaces will encapsulate using the FIB that was in effect withthe proces that set up the tunnel. thus setfib 1 ifconfig gif0 [tunnel instructions] will set the fib for the tunnel to use to be fib 1.
Routing messages would be associated with their process, and thus select one FIB or another. messages from the kernel would be associated with the fib they refer to and would only be received by a routing socket associated with that fib. (not yet implemented)
In addition Netstat has been edited to be able to cope with the fact that the array is now 2 dimensional. (It looks in system memory using libkvm (!)). Old versions of netstat see only the first FIB.
In addition two sysctls are added to give: a) the number of FIBs compiled in (active) b) the default FIB of the calling process.
Early testing experience: -------------------------
Basically our (IronPort's) appliance does this functionality already using ipfw fwd but that method has some drawbacks.
For example, It can't fully simulate a routing table because it can't influence the socket's choice of local address when a connect() is done.
Testing during the generating of these changes has been remarkably smooth so far. Multiple tables have co-existed with no notable side effects, and packets have been routes accordingly.
ipfw has grown 2 new keywords:
setfib N ip from anay to any count ip from any to any fib N
In pf there seems to be a requirement to be able to give symbolic names to the fibs but I do not have that capacity. I am not sure if it is required.
SCTP has interestingly enough built in support for this, called VRFs in Cisco parlance. it will be interesting to see how that handles it when it suddenly actually does something.
Where to next: --------------------
After committing the ABI compatible version and MFCing it, I'd like to proceed in a forward direction in -current. this will result in some roto-tilling in the routing code.
Firstly: the current code's idea of having a separate tree per protocol family, all of the same format, and pointed to by the 1 dimensional array is a bit silly. Especially when one considers that there is code that makes assumptions about every protocol having the same internal structures there. Some protocols don't WANT that sort of structure. (for example the whole idea of a netmask is foreign to appletalk). This needs to be made opaque to the external code.
My suggested first change is to add routing method pointers to the 'domain' structure, along with information pointing the data. instead of having an array of pointers to uniform structures, there would be an array pointing to the 'domain' structures for each protocol address domain (protocol family), and the methods this reached would be called. The methods would have an argument that gives FIB number, but the protocol would be free to ignore it.
When the ABI can be changed it raises the possibilty of the addition of a fib entry into the "struct route". Currently, the structure contains the sockaddr of the desination, and the resulting fib entry. To make this work fully, one could add a fib number so that given an address and a fib, one can find the third element, the fib entry.
Interaction with the ARP layer/ LL layer would need to be revisited as well. Qing Li has been working on this already.
This work was sponsored by Ironport Systems/Cisco
Reviewed by: several including rwatson, bz and mlair (parts each) Obtained from: Ironport systems/Cisco
|
#
177788 |
|
31-Mar-2008 |
kib |
Add the openat(), fexecve() and other *at() syscalls to the table.
Based on the submission by rdivacky, sponsored by Google Summer of Code 2007 Reviewed by: rwatson, rdivacky Tested by: pho
|
#
177633 |
|
26-Mar-2008 |
dfr |
Add the new kernel-mode NFS Lock Manager. To use it instead of the user-mode lock manager, build a kernel with the NFSLOCKD option and add '-k' to 'rpc_lockd_flags' in rc.conf.
Highlights include:
* Thread-safe kernel RPC client - many threads can use the same RPC client handle safely with replies being de-multiplexed at the socket upcall (typically driven directly by the NIC interrupt) and handed off to whichever thread matches the reply. For UDP sockets, many RPC clients can share the same socket. This allows the use of a single privileged UDP port number to talk to an arbitrary number of remote hosts.
* Single-threaded kernel RPC server. Adding support for multi-threaded server would be relatively straightforward and would follow approximately the Solaris KPI. A single thread should be sufficient for the NLM since it should rarely block in normal operation.
* Kernel mode NLM server supporting cancel requests and granted callbacks. I've tested the NLM server reasonably extensively - it passes both my own tests and the NFS Connectathon locking tests running on Solaris, Mac OS X and Ubuntu Linux.
* Userland NLM client supported. While the NLM server doesn't have support for the local NFS client's locking needs, it does have to field async replies and granted callbacks from remote NLMs that the local client has contacted. We relay these replies to the userland rpc.lockd over a local domain RPC socket.
* Robust deadlock detection for the local lock manager. In particular it will detect deadlocks caused by a lock request that covers more than one blocking request. As required by the NLM protocol, all deadlock detection happens synchronously - a user is guaranteed that if a lock request isn't rejected immediately, the lock will eventually be granted. The old system allowed for a 'deferred deadlock' condition where a blocked lock request could wake up and find that some other deadlock-causing lock owner had beaten them to the lock.
* Since both local and remote locks are managed by the same kernel locking code, local and remote processes can safely use file locks for mutual exclusion. Local processes have no fairness advantage compared to remote processes when contending to lock a region that has just been unlocked - the local lock manager enforces a strict first-come first-served model for both local and remote lockers.
Sponsored by: Isilon Systems PR: 95247 107555 115524 116679 MFC after: 2 weeks
|
#
177597 |
|
25-Mar-2008 |
ru |
Fixed type of the fourth argument of cpuset_{get,set}affinity(2) to be size_t.
Prodded by: davidxu
|
#
177091 |
|
12-Mar-2008 |
jeff |
Remove kernel support for M:N threading.
While the KSE project was quite successful in bringing threading to FreeBSD, the M:N approach taken by the kse library was never developed to its full potential. Backwards compatibility will be provided via libmap.conf for dynamically linked binaries and static binaries will be broken.
|
#
176730 |
|
02-Mar-2008 |
jeff |
Add cpuset, an api for thread to cpu binding and cpu resource grouping and assignment. - Add a reference to a struct cpuset in each thread that is inherited from the thread that created it. - Release the reference when the thread is destroyed. - Add prototypes for syscalls and macros for manipulating cpusets in sys/cpuset.h - Add syscalls to create, get, and set new numbered cpusets: cpuset(), cpuset_{get,set}id() - Add syscalls for getting and setting affinity masks for cpusets or individual threads: cpuid_{get,set}affinity() - Add types for the 'level' and 'which' parameters for the cpuset. This will permit expansion of the api to cover cpu masks for other objects identifiable with an id_t integer. For example, IRQs and Jails may be coming soon. - The root set 0 contains all valid cpus. All thread initially belong to cpuset 1. This permits migrating all threads off of certain cpus to reserve them for special applications.
Sponsored by: Nokia Discussed with: arch, rwatson, brooks, davidxu, deischen Reviewed by: antoine
|
#
176215 |
|
12-Feb-2008 |
ru |
Change readlink(2)'s return type and type of the last argument to match POSIX.
Prodded by: Alexey Lyashkov
|
#
175517 |
|
20-Jan-2008 |
rwatson |
Use audit events AUE_SHMOPEN and AUE_SHMUNLINK with new system calls shm_open() and shm_unlink(). More auditing will need to be done for these calls to capture arguments properly.
|
#
175164 |
|
08-Jan-2008 |
jhb |
Add a new file descriptor type for IPC shared memory objects and use it to implement shm_open(2) and shm_unlink(2) in the kernel: - Each shared memory file descriptor is associated with a swap-backed vm object which provides the backing store. Each descriptor starts off with a size of zero, but the size can be altered via ftruncate(2). The shared memory file descriptors also support fstat(2). read(2), write(2), ioctl(2), select(2), poll(2), and kevent(2) are not supported on shared memory file descriptors. - shm_open(2) and shm_unlink(2) are now implemented as system calls that manage shared memory file descriptors. The virtual namespace that maps pathnames to shared memory file descriptors is implemented as a hash table where the hash key is generated via the 32-bit Fowler/Noll/Vo hash of the pathname. - As an extension, the constant 'SHM_ANON' may be specified in place of the path argument to shm_open(2). In this case, an unnamed shared memory file descriptor will be created similar to the IPC_PRIVATE key for shmget(2). Note that the shared memory object can still be shared among processes by sharing the file descriptor via fork(2) or sendmsg(2), but it is unnamed. This effectively serves to implement the getmemfd() idea bandied about the lists several times over the years. - The backing store for shared memory file descriptors are garbage collected when they are not referenced by any open file descriptors or the shm_open(2) virtual namespace.
Submitted by: dillon, peter (previous versions) Submitted by: rwatson (I based this on his version) Reviewed by: alc (suggested converting getmemfd() to shm_open())
|
#
172819 |
|
19-Oct-2007 |
emaste |
Put comments about syscalls by the correct ones, and use the correct syscall number in the comment.
|
#
171859 |
|
16-Aug-2007 |
davidxu |
Add thr_kill2 syscall which sends a signal to a thread in another process.
Submitted by: Tijl Coosemans tijl at ulyssis dot org Approved by: re (kensmith)
|
#
171209 |
|
04-Jul-2007 |
peter |
Create new syscalls for mmap(), lseek(), pread(), pwrite(), truncate() and ftruncate(), but without the pad arg.
There are several reasons for this. Consider 'mmap()'. On AMD64, the function call (and syscall) ABI allow for 6 register arguments. Additional arguments go on the stack. mmap(2) has 6 arguments. However, the syscall definition has an extra 'int pad' argument. This pushes it to 7 arguments, which means one must spill into the memory stack. Since the kernel API doesn't match userland API, we have a hack in libc - libc/sys/mmap.c. This implements the userland API by calling __syscall() with an extra argument and the pad argument, for a total of 8 args. This is all unnecessary and inconvenient for several things, including the kernel's syscall handler code which now has to handle merging stack arguments with register arguments. It is a big deal for certain 3rd party code.
I'm adding libc glue to make the transition totally painless. I had intended to mark the old syscalls as COMPAT6, but the potential to shoot your feet by building a new kernel without COMPAT_FREEBSD6 but with a slighly older userland was too great. For now, they have manual "freebsd6_" prefixes rather than being COMPAT6. They will go back to being marked 'COMPAT6' after 7-stable starts.
Approved by: re (kensmith)
|
#
163953 |
|
03-Nov-2006 |
rrs |
Ok, here it is, we finally add SCTP to current. Note that this work is not just mine, but it is also the works of Peter Lei and Michael Tuexen. They both are my two key other developers working on the project.. and they need ata-boy's too: **** peterlei@cisco.com tuexen@fh-muenster.de **** I did do a make sysent which updated the syscall's and sysproto.. I hope that is correct... without it you don't build since we have new syscalls for SCTP :-0
So go out and look at the NOTES, add option SCTP (make sure inet and inet6 are present too) and play with SCTP.
I will see about comitting some test tools I have after I figure out where I should place them. I also have a lib (libsctp.a) that adds some of the missing socketapi functions that I need to put into lib's.. I will talk to George about this :-)
There may still be some 64 bit issues in here, none of us have a 64 bit processor to test with yet.. Michael may have a MAC but thats another beast too..
If you have a mac and want to use SCTP contact Michael he maintains a web site with a loadable module with this code :-)
Reviewed by: gnn Approved by: gnn
|
#
163449 |
|
17-Oct-2006 |
davidxu |
o Add keyword volatile for user mutex owner field. o Fix type consistent problem by using type long for old umtx and wait channel. o Rename casuptr to casuword.
|
#
162991 |
|
03-Oct-2006 |
rwatson |
Audit creat() system call (compat code), and change type for getpagesize(), which isn't actually being audited anyway.
MFC after: 3 days Obtained from: TrustedBSD Project
|
#
162497 |
|
21-Sep-2006 |
davidxu |
Replace system call thr_getscheduler, thr_setscheduler, thr_setschedparam with rtprio_thread, while rtprio system call is for process only, the new system call rtprio_thread is responsible for LWP.
|
#
162373 |
|
17-Sep-2006 |
rwatson |
AUE_SIGALTSTACK instead of AUE_SIGPENDING for sigaltstack().
Obtained from: TrustedBSD Project MFC after: 3 days
|
#
161952 |
|
03-Sep-2006 |
rwatson |
Assign proper audit event identifiers to a number of system calls not covered in previous passes:
- sysarch, rtprio - clock_settime - preadv/pwritev - __getcwd - kqueue - fhstatfs - kldunloadf
Obtained from: TrustedBSD Project
|
#
161946 |
|
03-Sep-2006 |
rwatson |
Use AUE_NTP_ADJTIME for ntp_adjtime() instead of AUE_ADJTIME.
Obtained from: TrustedBSD Project
|
#
161678 |
|
28-Aug-2006 |
davidxu |
This is initial version of POSIX priority mutex support, a new userland mutex structure is added as following: struct umutex { __lwpid_t m_owner; uint32_t m_flags; uint32_t m_ceilings[2]; uint32_t m_spare[4]; }; The m_owner represents owner thread, it is a thread id, in non-contested case, userland can simply use atomic_cmpset_int to lock the mutex, if the mutex is contested, high order bit will be set, and userland should do locking and unlocking via kernel syscall. Flag UMUTEX_PRIO_INHERIT represents pthread's PTHREAD_PRIO_INHERIT mutex, which when contention happens, kernel should do priority propagating. Flag UMUTEX_PRIO_PROTECT indicates it is pthread's PTHREAD_PRIO_PROTECT mutex, userland should initialize m_owner to contested state UMUTEX_CONTESTED, then atomic_cmpset_int will be failure and kernel syscall should be invoked to do locking, this becauses for such a mutex, kernel should always boost the thread's priority before it can lock the mutex, m_ceilings is used by PTHREAD_PRIO_PROTECT mutex, the first element is used to boost thread's priority when it locked the mutex, second element is used when the mutex is unlocked, the PTHREAD_PRIO_PROTECT mutex's link list is kept in userland, the m_ceiling[1] is managed by thread library so kernel needn't allocate memory to keep the link list, when such a mutex is unlocked, kernel reset m_owner to UMUTEX_CONTESTED. Flag USYNC_PROCESS_SHARED indicate if the synchronization object is process shared, if the flag is not set, it saves a vm_map_lookup() call.
The umtx chain is still used as a sleep queue, when a thread is blocked on PTHREAD_PRIO_INHERIT mutex, a umtx_pi is allocated to support priority propagating, it is dynamically allocated and reference count is used, it is not optimized but works well in my tests, while the umtx chain has its own locking protocol, the priority propagating protocol are all protected by sched_lock because priority propagating function is called with sched_lock held from scheduler.
No visible performance degradation is found which these changes. Some parameter names in _umtx_op syscall are renamed.
|
#
161367 |
|
16-Aug-2006 |
peter |
Grab two syscall numbers. One is used to emulate functionality that linux has in its procfs (do a readlink of /proc/self/fd/<nn> to find the pathname that corresponds to a given file descriptor). Valgrind-3.x needs this functionality. This is a placeholder only at this time.
|
#
161325 |
|
15-Aug-2006 |
jhb |
- Use NOSTD rather than NOIMPL for nfssvc() to match other syscalls provided via klds. - Correct audit identifier for nfssvc().
|
#
160798 |
|
28-Jul-2006 |
jhb |
Now that all system calls are MPSAFE, retire the SYF_MPSAFE flag used to mark system calls as being MPSAFE: - Stop conditionally acquiring Giant around system call invocations. - Remove all of the 'M' prefixes from the master system call files. - Remove support for the 'M' prefix from the script that generates the syscall-related files from the master system call files. - Don't explicitly set SYF_MPSAFE when registering nfssvc.
|
#
160797 |
|
28-Jul-2006 |
jhb |
Various fixes to comments in the syscall master files including removing cruft from the audit import and adding mention of COMPAT4 to freebsd32.
|
#
160319 |
|
13-Jul-2006 |
davidxu |
Add syscalls thr_setscheduler, thr_getscheduler, and thr_setschedparam, these syscalls are designed to set thread's scheduling parameters and policy, because each syscall contains a size parameter, it is possible to support future scheduling option, e.g SCHED_SPORADIC, this option needs other fields in structure sched_param, current they are not avaiblable.
|
#
160276 |
|
11-Jul-2006 |
jhb |
- Add conditional VFS Giant locking to getdents_common() (linux ABIs), ibcs2_getdents(), ibcs2_read(), ogetdirentries(), svr4_sys_getdents(), and svr4_sys_getdents64() similar to that in getdirentries(). - Mark ibcs2_getdents(), ibcs2_read(), linux_getdents(), linux_getdents64(), linux_readdir(), ogetdirentries(), svr4_sys_getdents(), and svr4_sys_getdents64() MPSAFE.
|
#
160111 |
|
05-Jul-2006 |
wsalamon |
Add audit events for the extended attribute system calls.
Obtained from: TrustedBSD Project Approved by: rwatson (mentor)
|
#
159982 |
|
27-Jun-2006 |
jhb |
- Expand the scope of Giant some in mount(2) to protect the vfsp structure from going away. mount(2) is now MPSAFE. - Expand the scope of Giant some in unmount(2) to protect the mp structure (or rather, to handle concurrent unmount races) from going away. umount(2) is now MPSAFE, as well as linux_umount() and linux_oldumount(). - nmount(2) and linux_mount() were already MPSAFE.
|
#
157211 |
|
28-Mar-2006 |
des |
Revert previous commit at davidxu's insistance. Instead, use __DECONST (argh!) and rearrange the prototypes to make it clear that _umtx_op() is not deprecated.
|
#
157206 |
|
28-Mar-2006 |
des |
The undocumented and deprecated system call _umtx_op() takes two pointer arguments. The first one is never used (all callers pass in 0); the second is sometimes used to pass in a struct timespec * which is used as a timeout and never modified. Constify that argument so callers can pass a const struct timespec * without jumping through hoops.
|
#
157037 |
|
23-Mar-2006 |
davidxu |
Implement aio_fsync() syscall.
|
#
156134 |
|
01-Mar-2006 |
davidxu |
Let kernel POSIX timer code and mqueue code to use integer as a resource handle, the timer_t and mqd_t types will be a pointer which userland will define it.
|
#
155377 |
|
06-Feb-2006 |
rwatson |
Prefer AUE_FOO audit identifiers to AUE_O_FOO, which are largely left over from the Darwin implementation.
When we implement a system call as a wrapper to sysctl(), audit it as AUE_SYSCTL. This leads to greater compatibility with Solaris audit trails as sysctl() argument tokens are not the same as the ones for the originaly system calls (i.e., setdomainname()).
Replace references to AUE_ events that are equivilent to AUE_NULL with AUE_NULL. In the case of process signal configuration, this is because these events do not require auditing.
Move from the Darwin spelling of getsockopt() to the FreeBSD/Solaris one.
Audit nmount().
Obtained from: TrustedBSD Project
|
#
155327 |
|
05-Feb-2006 |
davidxu |
Implement thr_set_name to set a name for thread.
Reviewed by: julian
|
#
155249 |
|
03-Feb-2006 |
rwatson |
Assign audit event identifiers to many system calls.
Much work by: wsalamon Obtained from: TrustedBSD Project
|
#
155199 |
|
01-Feb-2006 |
rwatson |
Map audit-related system calls to audit event identifiers.
Much work by: wsalamon Obtained from: TrustedBSD Project
|
#
154669 |
|
22-Jan-2006 |
davidxu |
Make aio code MP safe.
|
#
153679 |
|
23-Dec-2005 |
phk |
Add abort2() systemcall.
|
#
152845 |
|
26-Nov-2005 |
davidxu |
Don't use OpenBSD syscall numbers, instead, use new syscall numbers for POSIX message queue.
Suggested by: rwatson
|
#
152825 |
|
26-Nov-2005 |
davidxu |
Bring in experimental kernel support for POSIX message queue.
|
#
151867 |
|
30-Oct-2005 |
davidxu |
Fix sigevent's POSIX incompatible problem by adding member fields sigev_notify_function and sigev_notify_attributes. AIO syscalls use sigevent, so they have to be adjusted.
Reviewed by: alc
|
#
151576 |
|
23-Oct-2005 |
davidxu |
Implement POSIX timers. Current only CLOCK_REALTIME and CLOCK_MONOTONIC clock are supported. I have plan to merge XSI timer ITIMER_REAL and other two CPU timers into the new code, current three slots are available for the XSI timers. The SIGEV_THREAD notification type is not supported yet because our sigevent struct lacks of two member fields: sigev_notify_function sigev_notify_attributes I have found the sigevent is used in AIO, so I won't add the two members unless the AIO code is adjusted.
|
#
151445 |
|
18-Oct-2005 |
stefanf |
Const-qualify ksem_timedwait's parameter abstime as it's only passed in.
|
#
151316 |
|
14-Oct-2005 |
davidxu |
1. Change prototype of trapsignal and sendsig to use ksiginfo_t *, most changes in MD code are trivial, before this change, trapsignal and sendsig use discrete parameters, now they uses member fields of ksiginfo_t structure. For sendsig, this change allows us to pass POSIX realtime signal value to user code.
2. Remove cpu_thread_siginfo, it is no longer needed because we now always generate ksiginfo_t data and feed it to libpthread.
3. Add p_sigqueue to proc structure to hold shared signals which were blocked by all threads in the proc.
4. Add td_sigqueue to thread structure to hold all signals delivered to thread.
5. i386 and amd64 now return POSIX standard si_code, other arches will be fixed.
6. In this sigqueue implementation, pending signal set is kept as before, an extra siginfo list holds additional siginfo_t data for signals. kernel code uses psignal() still behavior as before, it won't be failed even under memory pressure, only exception is when deleting a signal, we should call sigqueue_delete to remove signal from sigqueue but not SIGDELSET. Current there is no kernel code will deliver a signal with additional data, so kernel should be as stable as before, a ksiginfo can carry more information, for example, allow signal to be delivered but throw away siginfo data if memory is not enough. SIGKILL and SIGSTOP have fast path in sigqueue_add, because they can not be caught or masked. The sigqueue() syscall allows user code to queue a signal to target process, if resource is unavailable, EAGAIN will be returned as specification said. Just before thread exits, signal queue memory will be freed by sigqueue_flush. Current, all signals are allowed to be queued, not only realtime signals.
Earlier patch reviewed by: jhb, deischen Tested on: i386, amd64
|
#
150619 |
|
27-Sep-2005 |
csjp |
Mark the extended attribute syscalls as being MP safe.
Requested by: jhb
|
#
147831 |
|
08-Jul-2005 |
jhb |
Mark second instance of lchown() MP safe just like the first.
Approved by: re (scottl)
|
#
147813 |
|
07-Jul-2005 |
jhb |
- Add two new system calls: preadv() and pwritev() which are like readv() and writev() except that they take an additional offset argument and do not change the current file position. In SAT speak: preadv:readv::pread:read and pwritev:writev::pwrite:write. - Try to reduce code duplication some by merging most of the old kern_foov() and dofilefoo() functions into new dofilefoo() functions that are called by kern_foov() and kern_pfoov(). The non-v functions now all generate a simple uio on the stack from the passed in arguments and then call kern_foov(). For example, read() now just builds a uio and calls kern_readv() and pwrite() just builds a uio and calls kern_pwritev().
PR: kern/80362 Submitted by: Marc Olzheim marcolz at stack dot nl (1) Approved by: re (scottl) MFC after: 1 week
|
#
146806 |
|
30-May-2005 |
rwatson |
Introduce a new field in the syscalls.master file format to hold the audit event identifier associated with each system call, which will be stored by makesyscalls.sh in the sy_auevent field of struct sysent. For now, default the audit identifier on all system calls to AUE_NULL, but in the near future, other BSM event identifiers will be used. The mapping of system calls to event identifiers is many:one due to multiple system calls that map to the same end functionality across compatibility wrappers, ABI wrappers, etc.
Submitted by: wsalamon Obtained from: TrustedBSD Project
|
#
146783 |
|
29-May-2005 |
rwatson |
Normalize white space in syscalls.master: try to use tabs before system call types.
|
#
146723 |
|
28-May-2005 |
rwatson |
Mark ntp_gettime() as MSTD, since its system call path will acquire Giant if required.
|
#
146719 |
|
28-May-2005 |
rwatson |
Mark the following compatability system calls as MCOMPAT or MCOMPAT4 based on the their simply wrapping MPSAFE implementations of existing MPSAFE system calls:
getfsstat() lseek() stat() lstat() truncate() ftruncate() statfs() fstatfs()
Note that ogetdirentries() is not marked MPSAFE because it does not share the MPSAFE implementation used for getdirentries(), and requires separate locking to be implemented.
|
#
146716 |
|
28-May-2005 |
rwatson |
Mark quotactl() as MSTD.
|
#
146713 |
|
28-May-2005 |
rwatson |
Mark kenv(2) as MPSAFE, since it appears to be properly locked down.
|
#
146711 |
|
28-May-2005 |
rwatson |
Also mark the COMPAT4 version of fhstatfs() as MPSAFE.
|
#
146710 |
|
28-May-2005 |
rwatson |
Mark fhopen(), fhstat(), and fhstatfs() as MSTD, since they now acquire Giant themselves.
|
#
145434 |
|
23-Apr-2005 |
davidxu |
Add new syscall thr_new to create thread in atomic, it will inherit signal mask from parent thread, setup TLS and stack, and user entry address. Also support POSIX thread's PTHREAD_SCOPE_PROCESS and PTHREAD_SCOPE_SYSTEM, sysctl is also provided to control the scheduler scope.
|
#
143317 |
|
09-Mar-2005 |
stefanf |
Fix typo in comment.
|
#
142932 |
|
01-Mar-2005 |
ps |
Change the prototype of kevent to remove the const from the changelist.
Reviewed by: jhb
|
#
140840 |
|
26-Jan-2005 |
jeff |
- Struct mount is not yet locked well enough to allow mount/nmount/unmount to run without Giant. Mark them as STD here.
|
#
140724 |
|
24-Jan-2005 |
jeff |
- Change all VFS syscalls to MSTD as they all manually deal with giant or the appropriate filesystem locks.
Sponsored By: Isilon Systems, Inc.
|
#
139598 |
|
02-Jan-2005 |
marcel |
uuidgen(2) is MP safe.
|
#
139292 |
|
25-Dec-2004 |
davidxu |
Make _umtx_op() as more general interface, the final parameter needn't be timespec pointer, every parameter will be interpreted by its opcode.
|
#
139013 |
|
18-Dec-2004 |
davidxu |
1. make umtx sharable between processes, the way is two or more processes call mmap() to create a shared space, and then initialize umtx on it, after that, each thread in different processes can use the umtx same as threads in same process. 2. introduce a new syscall _umtx_op to support timed lock and condition variable semantics. also, orignal umtx_lock and umtx_unlock inline functions now are reimplemented by using _umtx_op, the _umtx_op can use arbitrary id not just a thread id.
|
#
138088 |
|
25-Nov-2004 |
phk |
Mark mount, unmount and nmount MPSAFE
|
#
137874 |
|
18-Nov-2004 |
marks |
Add ntp_gettime(2) system call.
Reviewed by: imp, phk, njl, peter Approved by: njl
|
#
136830 |
|
23-Oct-2004 |
rwatson |
Add system call place-holders for the following system calls implementing Sun's BSM Audit API on FreeBSD:
audit() auditon() getauid() setauid() getaudit() setaudit() getaudit_addr() setaudit_addr() auditctl()
Submitted by: Wayne Salamon <wsalamon at computer dot org> Obtained from: TrustedBSD Project
|
#
136207 |
|
06-Oct-2004 |
davidxu |
Regen to unbreak world.
Pointy hat to: mtm
|
#
132116 |
|
13-Jul-2004 |
phk |
Add kldunloadf() system call. Stay tuned for follwing commit messages.
|
#
132020 |
|
12-Jul-2004 |
davidxu |
Change kse_switchin to accept kse_thr_mailbox pointer, the syscall will be used heavily in debugging KSE threads. This breaks libpthread on IA64, but because libpthread was not in 5.2.1 release, I would like to change it so we needn't to introduce another syscall.
|
#
131431 |
|
01-Jul-2004 |
marcel |
Change the thread ID (thr_id_t) used for 1:1 threading from being a pointer to the corresponding struct thread to the thread ID (lwpid_t) assigned to that thread. The primary reason for this change is that libthr now internally uses the same ID as the debugger and the kernel when referencing to a kernel thread. This allows us to implement the support for debugging without additional translations and/or mappings.
To preserve the ABI, the 1:1 threading syscalls, including the umtx locking API have not been changed to work on a lwpid_t. Instead the 1:1 threading syscalls operate on long and the umtx locking API has not been changed except for the contested bit. Previously this was the least significant bit. Now it's the most significant bit. Since the contested bit should not be tested by userland, this change is not expected to be visible. Just to be sure, UMTX_CONTESTED has been removed from <sys/umtx.h>.
Reviewed by: mtm@ ABI preservation tested on: i386, ia64
|
#
130907 |
|
22-Jun-2004 |
rwatson |
Mark unlink() as MPSAFE as we now acquire Giant in the unlink() system call.
|
#
130904 |
|
22-Jun-2004 |
rwatson |
Mark link() system call as MPSAFE.
|
#
127890 |
|
05-Apr-2004 |
dfr |
Add lgetfh(2) which is like getfh(2) but doesn't follow symlinks.
|
#
127482 |
|
27-Mar-2004 |
mtm |
Separate thread synchronization from signals in libthr. Instead use msleep() and wakeup_one().
Discussed with: jhb, peter, tjr
|
#
127061 |
|
16-Mar-2004 |
dwmalone |
Get ready to mark open, creat and nosys as MPSAFE.
|
#
127034 |
|
15-Mar-2004 |
jhb |
Drop the proc lock around calls to the MD functions ptrace_single_step(), ptrace_set_pc(), and cpu_ptrace() so that those functions are free to acquire Giant, sleep, etc. We already do a PHOLD/PRELE around them so that it is safe to sleep inside of these routines if necessary. This allows ptrace() to be marked MP safe again as it no longer triggers lock order reversals on Alpha.
Tested by: wilko
|
#
126932 |
|
13-Mar-2004 |
peter |
Push Giant down a little further: - no longer serialize on Giant for thread_single*() and family in fork, exit and exec - thread_wait() is mpsafe, assert no Giant - reduce scope of Giant in exit to not cover thread_wait and just do vm_waitproc(). - assert that thread_single() family are not called with Giant - remove the DROP/PICKUP_GIANT macros from thread_single() family - assert that thread_suspend_check() s not called with Giant - remove manual drop_giant hack in thread_suspend_check since we know it isn't held. - remove the DROP/PICKUP_GIANT macros from thread_suspend_check() family - mark kse_create() mpsafe
|
#
125368 |
|
03-Feb-2004 |
deischen |
Add ksem_timedwait() to complement ksem_wait().
Glanced at by: alfred
|
#
123853 |
|
26-Dec-2003 |
alfred |
Put restrict back in, the compilation failure was my fault when I did a bad merge from the PR.
Thanks to Bruce Evans for explaining.
|
#
123817 |
|
24-Dec-2003 |
alfred |
We're not ready for restrict qualifiers here.
|
#
123811 |
|
24-Dec-2003 |
alfred |
Add restrict qualifiers.
PR: 44394 Submitted by: Craig Rodrigues <rodrige@attbi.com>
|
#
123750 |
|
23-Dec-2003 |
peter |
Remove namespc column and attempt to un-fold some of the longer lines that now fit.
|
#
123412 |
|
10-Dec-2003 |
peter |
Previous commit also changed the sendmsg prototype to something more closely matching reality. I did not actually mean to commit that yet.
|
#
123408 |
|
10-Dec-2003 |
peter |
Update file locations for syscall tables to copy to.
|
#
123252 |
|
07-Dec-2003 |
marcel |
Add kse_switchin(2). This syscall can be used by KSE implementations to have the kernel switch to a new thread, instead of doing it in userland. It is in fact needed on ia64 where syscall restarts do not return to userland first. It's completely handled inside the kernel. As such, any context created by the kernel as part of an upcall and caused by some syscall needs to be restored by the kernel.
|
#
122635 |
|
14-Nov-2003 |
jeff |
- Revision 1.156 marked ptrace() SMP safe. Unfortunately, alpha implements parts of ptrace using proc_rwmem(). proc_rwmem() requires giant, and giant must be acquired prior to the proc lock, so ptrace must require giant still.
|
#
122537 |
|
12-Nov-2003 |
mckusick |
Update the statfs structure with 64-bit fields to allow accurate reporting of multi-terabyte filesystem sizes.
You should build and boot a new kernel BEFORE doing a `make world' as the new kernel will know about binaries using the old statfs structure, but an old kernel will not know about the new system calls that support the new statfs structure. Running an old kernel after a `make world' will cause programs such as `df' that do a statfs system call to fail with a bad system call.
Reviewed by: Bruce Evans <bde@zeta.org.au> Reviewed by: Tim Robbins <tjr@freebsd.org> Reviewed by: Julian Elischer <julian@elischer.org> Reviewed by: the hoards of <arch@freebsd.org> Sponsored by: DARPA & NAI Labs.
|
#
122241 |
|
07-Nov-2003 |
jhb |
Mark ptrace(), ktrace(), utrace(), sysarch(), and issetugid() as MP safe. The parts of these calls that are not yet MP safe acquire Giant explicitly.
|
#
121298 |
|
21-Oct-2003 |
scottl |
Don peril-sensitive sunglasses and mark pipe(2) as MPSAFE. I've beaten up on it for the last 15 hours with no signs of problems. It gives a small (1%) gain on buildworld since pipe_read/pipe_write are already free of Giant.
|
#
121284 |
|
20-Oct-2003 |
dwmalone |
Mark dup as MPSAFE. Giant was pushed into dup ages ago, but it looks like it was missed in syscalls.master.
Spotted by: alc
|
#
119827 |
|
07-Sep-2003 |
alc |
msync(2) should be declared MP-safe.
|
#
117704 |
|
17-Jul-2003 |
davidxu |
o Refine kse_thr_interrupt to allow it to handle different commands. o Remove TDF_NOSIGPOST. o Add a member td_waitset to proc structure, it will be used for sigwait.
Tested by: deischen
|
#
116963 |
|
28-Jun-2003 |
davidxu |
o Change kse_thr_interrupt to allow send a signal to a specified thread, or unblock a thread in kernel, and allow UTS to specify whether syscall should be restarted. o Add ability for UTS to monitor signal comes in and removed from process, the flag PS_SIGEVENT is used to indicate the events. o Add a KMF_WAITSIGEVENT for KSE mailbox flag, UTS call kse_release with this flag set to wait for above signal event. o For SA based thread, kernel masks all signal in its signal mask, let UTS to use kse_thr_interrupt interrupt a thread, and install a signal frame in userland for the thread. o Add a tm_syncsig in thread mailbox, when a hardware trap occurs, it is used to deliver synchronous signal to userland, and upcall is schedule, so UTS can process the synchronous signal for the thread.
Reviewed by: julian (mentor)
|
#
115799 |
|
04-Jun-2003 |
rwatson |
Add system calls to explicitly list extended attributes on a file/directory/link, rather than using a less explicit hack on the extattr retrieval API:
extattr_list_fd() extattr_list_file() extattr_list_link()
The existing API was counter-intuitive, and poorly documented. The prototypes for these system calls are identical to extattr_get_*(), but without a specific attribute name to leave NULL.
Pointed out by: Dominic Giampaolo <dbg@apple.com> Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories
|
#
113275 |
|
09-Apr-2003 |
mike |
o In struct prison, add an allprison linked list of prisons (protected by allprison_mtx), a unique prison/jail identifier field, two path fields (pr_path for reporting and pr_root vnode instance) to store the chroot() point of each jail. o Add jail_attach(2) to allow a process to bind to an existing jail. o Add change_root() to perform the chroot operation on a specified vnode. o Generalize change_dir() to accept a vnode, and move namei() calls to callers of change_dir(). o Add a new sysctl (security.jail.list) which is a group of struct xprison instances that represent a snapshot of active jails.
Reviewed by: rwatson, tjr
|
#
112911 |
|
01-Apr-2003 |
jeff |
- Mark the various thr syscalls as MP safe. Previously there was a bug if this was not done since thr_exit() unwinds giant.
|
#
112906 |
|
31-Mar-2003 |
jeff |
- Include umtx.h in files generated by makesyscalls.sh - Add system calls for umtx.
|
#
112901 |
|
31-Mar-2003 |
jeff |
- Add the four thr related system calls.
|
#
112893 |
|
31-Mar-2003 |
jeff |
- Define sigwait, sigtimedwait, and sigwaitinfo in terms of kern_sigtimedwait() which is capable of supporting all of their semantics. - These should be POSIX compliant but more careful review is needed before we announce this.
|
#
111169 |
|
20-Feb-2003 |
davidxu |
Add a timeout parameter to kse_release.
|
#
109895 |
|
26-Jan-2003 |
alfred |
Add const qualifier to data argument for msgsnd.
PR: standards/45274 Submitted by: Craig Rodrigues <rodrigc@attbi.com>
|
#
109831 |
|
25-Jan-2003 |
alfred |
Bring shm functions closer the the opengroup standards.
PR: 47469 Submitted by: Craig Rodrigues <rodrigc@attbi.com>
|
#
109829 |
|
25-Jan-2003 |
alfred |
Bring semop() closer the the opengroup standards.
PR: 47471 Submitted by: Craig Rodrigues <rodrigc@attbi.com>
|
#
108659 |
|
04-Jan-2003 |
davidxu |
Some KSE syscalls are MPSAFE.
|
#
108405 |
|
29-Dec-2002 |
rwatson |
Add definitions for four new system calls:
__acl_get_link() Retrieve an ACL by name without following symbolic links. __acl_set_link() Set an ACL by name without following symbolic links. __acl_delete_link() Delete an ACL by name without following symbolic links. __acl_aclcheck_link() Check an ACL against a file by name without following symbolic links.
These calls are similar in spirit to lstat(), lchown(), lchmod(), etc, and will be used under similar circumstances.
Obtained from: TrustedBSD Project
|
#
107913 |
|
15-Dec-2002 |
dillon |
This is David Schultz's swapoff code which I am finally able to commit. This should be considered highly experimental for the moment.
Submitted by: David Schultz <dschultz@uclink.Berkeley.EDU> MFC after: 3 weeks
|
#
106977 |
|
16-Nov-2002 |
deischen |
Add getcontext, setcontext, and swapcontext as system calls. Previously these were libc functions but were requested to be made into system calls for atomicity and to coalesce what might be two entrances into the kernel (signal mask setting and floating point trap) into one.
A few style nits and comments from bde are also included.
Tested on alpha by: gallatin
|
#
106466 |
|
05-Nov-2002 |
rwatson |
Flesh out the definition of __mac_execve(): per earlier discussion, it's essentially execve() with an optional MAC label argument.
Approved by: re Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories
|
#
106312 |
|
01-Nov-2002 |
rwatson |
Rename __execve_mac() to __mac_execve() for increased consistency with other MAC system calls.
Requested by: various (phk, gordont, jake, ...)
|
#
105950 |
|
25-Oct-2002 |
peter |
Split 4.x and 5.x signal handling so that we can keep 4.x signal handling clean and functional as 5.x evolves. This allows some of the nasty bandaids in the 5.x codepaths to be unwound.
Encapsulate 4.x signal handling under COMPAT_FREEBSD4 (there is an anti-foot-shooting measure in place, 5.x folks need this for a while) and finish encapsulating the older stuff under COMPAT_43. Since the ancient stuff is required on alpha (longjmp(3) passes a 'struct osigcontext *' to the current sigreturn(2), instead of the 'ucontext_t *' that sigreturn is supposed to take), add a compile time check to prevent foot shooting there too. Add uniform COMPAT_43 stubs for ia64/sparc64/powerpc.
Tested on: i386, alpha, ia64. Compiled on sparc64 (a few days ago). Approved by: re
|
#
105691 |
|
22-Oct-2002 |
rwatson |
Flesh out prototypes for __mac_get_pid, __mac_get_link, and __mac_set_link, based on __mac_get_proc() except with a pid, and __mac_get_file(), __mac_set_file() except that they do not follow symlinks. First in a series of commits to flesh out the user API.
Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories
|
#
105490 |
|
19-Oct-2002 |
peter |
Stake a claim on 418 (__xstat), 419 (__xfstat), 420 (__xlstat)
|
#
105486 |
|
19-Oct-2002 |
peter |
Grab 416/417 real estate before I get burned while testing again. This is for the not-quite-ready signal/fpu abi stuff. It may not see the light of day, but I'm certainly not going to be able to validate it when getting shot in the foot due to syscall number conflicts.
|
#
105476 |
|
19-Oct-2002 |
rwatson |
Add a placeholder for the execve_mac() system call, similar to SELinux's execve_secure() system call, which permits a process to pass in a label for a label change during exec. This permits SELinux to change the label for the resulting exec without a race following a manual label change on the process. Because this interface uses our general purpose MAC label abstraction, we call it execve_mac(), and wrap our port of SELinux's execve_secure() around it with appropriate sid mappings.
Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories
|
#
105144 |
|
14-Oct-2002 |
peter |
Restore pointer that was removed in 1.128. This wasn't a merge-o.
|
#
104747 |
|
10-Oct-2002 |
rwatson |
Fix what looks like a merge-o from a conflict in the last commit to syscalls.master.
|
#
104734 |
|
09-Oct-2002 |
peter |
Add a pointer to the alternate syscall tables on 64 bit platforms.
|
#
104730 |
|
09-Oct-2002 |
rwatson |
Flesh out the extattr_{delete,get,set}_link() system calls: variations on the _file() theme that do not follow symlinks. Sync to MAC tree.
Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories
|
#
104379 |
|
02-Oct-2002 |
archie |
Let kse_wakeup() take a KSE mailbox pointer argument.
Reviewed by: julian
|
#
104262 |
|
01-Oct-2002 |
rwatson |
Reserve system call numbers for the following system calls:
__mac_get_pid Retrieve MAC label of a process by pid
Similar to __mac_get_proc() except that the target process of the operation is explicitly specified rather than assuming curthread.
__mac_get_link Retrieve MAC label of a path with NOFOLLOW __mac_set_link Set MAC label of a path with NOFOLLOW extattr_set_link Set EAs on a path with NOFOLLOW extattr_get_link Retrieve EAs on a path with NOFOLLOW extattr_delete_link Delete EAs on a path with NOFOLLOW
These calls are similar to __mac_get_file(), __mac_set_file(), extattr_set_file(), extattr_get_file(), and extattr_delete_file(), except that they do not follow symlinks. The distinction between these calls is similar to lchown() vs chown().
Implementations to follow.
Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories
|
#
103972 |
|
25-Sep-2002 |
archie |
Make the following name changes to KSE related functions, etc., to better represent their purpose and minimize namespace conflicts:
kse_fn_t -> kse_func_t struct thread_mailbox -> struct kse_thr_mailbox thread_interrupt() -> kse_thr_interrupt() kse_yield() -> kse_release() kse_new() -> kse_create()
Add missing declaration of kse_thr_interrupt() to <sys/kse.h>. Regenerate the various generated syscall files. Minor style fixes.
Reviewed by: julian
|
#
103574 |
|
18-Sep-2002 |
alfred |
Add the rest of the kernel support for the sem_ API in kern/uipc_sem.c.
Option 'P1003_1B_SEMAPHORES' to compile them in, or load the "sem" module to activate them.
Have kern/makesyscalls.sh emit an include for sys/_semaphore.h into sysproto.h to pull in the typedef for semid_t.
Add the syscalls to the syscall table as module stubs.
|
#
102132 |
|
19-Aug-2002 |
rwatson |
mac_syscall is now implemented, switch to MSTD.
Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
|
#
101425 |
|
06-Aug-2002 |
rwatson |
Rename mac_policy() to mac_syscall() to be more reflective of its purpose.
Submitted by: cvance@tislabs.com Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
|
#
100990 |
|
30-Jul-2002 |
rwatson |
Introduce support for Mandatory Access Control and extensible kernel access control.
Replace 'void *' with 'struct mac *' now that mac.h is in the base tree. The current POSIX.1e-derived userland MAC interface is schedule for replacement, but will act as a functional placeholder until the replacement is done. These system calls allow userland processes to get and set labels on both the current process, as well as file system objects and file descriptor backed objects.
|
#
100954 |
|
30-Jul-2002 |
rwatson |
Introduce a mac_policy() system call that will provide MAC policies with a general purpose front end entry point for user applications to invoke. The MAC framework will route the system call to the appropriate policy by name.
Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
|
#
100896 |
|
30-Jul-2002 |
rwatson |
Prototype function arguments, only with MAC-specific structures replaced with void until we bring in the actual structure definitions.
Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
|
#
99916 |
|
13-Jul-2002 |
alfred |
Remove incorrect comment about now corrected manpage.
|
#
99855 |
|
12-Jul-2002 |
alfred |
Create a bug-for-bug FreeBSD4 compatible version of sendfile and move the fixed sendfile over. This is needed to preserve binary compatibility from 4.x to 5.x.
|
#
99072 |
|
29-Jun-2002 |
julian |
Part 1 of KSE-III
The ability to schedule multiple threads per process (one one cpu) by making ALL system calls optionally asynchronous. to come: ia64 and power-pc patches, patches for gdb, test program (in tools)
Reviewed by: Almost everyone who counts (at various times, peter, jhb, matt, alfred, mini, bernd, and a cast of thousands)
NOTE: this is still Beta code, and contains lots of debugging stuff. expect slight instability in signals..
|
#
98197 |
|
13-Jun-2002 |
rwatson |
Keep POSIX.1e capabilities system call placeholders, but remove definitions.
|
#
97369 |
|
28-May-2002 |
marcel |
Add syscall uuidgen() for generating Univerally Unique Identifiers (UUIDs). On ia64 UUIDs, aka GUIDs, are used by EFI and the firmware among others. To create GUID Partition Tables (GPTs), we need to be able to generate UUIDs.
|
#
96083 |
|
05-May-2002 |
mux |
Add an entry for the lchflags(2) syscall. It's useful to prevent a symlink deletion.
Reviewed by: rwatson
|
#
94935 |
|
17-Apr-2002 |
mux |
Add an entry for the kenv(2) syscall (code to follow).
Reviewed by: peter
|
#
94640 |
|
14-Apr-2002 |
alc |
Remove the requirement that Giant be held around sigreturn().
|
#
94446 |
|
11-Apr-2002 |
alc |
Remove the requirement that Giant be held around osigreturn(). All platform- specific implementations are MPSAFE.
|
#
91692 |
|
05-Mar-2002 |
rwatson |
Reserve system call numbers for the MAC framework. This will prevent people working on the MAC tree from getting toasted whenever system call numbers are allocated in the main tree (for example, for KSE :-). Calls allocated: __mac_{get,set}_proc, __mac_{get,set}_{fd,file}().
Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
|
#
90889 |
|
19-Feb-2002 |
julian |
Add stub syscalls and definitions for KSE calls. "Book'em Danno"
|
#
90886 |
|
19-Feb-2002 |
julian |
Add 5 KSE syscalls. Two will be implemented with the next KSE step and the others are reservations for coming code. All will be stubbed in this kernel in the next commit. This will allow people to easily make KSE binaries for userland testing (the syscalls will be in libc) but they will still need a real KSE kernel to test it. (libc looks in /sys to decide what it should add stubs for).
|
#
90777 |
|
17-Feb-2002 |
deischen |
Fix prototype to sigreturn to use struct __ucontext instead of ucontext_t.
|
#
90448 |
|
10-Feb-2002 |
rwatson |
Part I: Update extended attribute API and ABI:
o Modify the system call syntax for extattr_{get,set}_{fd,file}() so as not to use the scatter gather API (which appeared not to be used by any consumers, and be less portable), rather, accepts 'data' and 'nbytes' in the style of other simple read/write interfaces. This changes the API and ABI.
o Modify system call semantics so that extattr_get_{fd,file}() return a size_t. When performing a read, the number of bytes read will be returned, unless the data pointer is NULL, in which case the number of bytes of data are returned. This changes the API only.
o Modify the VOP_GETEXTATTR() vnode operation to accept a *size_t argument so as to return the size, if desirable. If set to NULL, the size will not be returned.
o Update various filesystems (pseodofs, ufs) to DTRT.
These changes should make extended attributes more useful and more portable. More commits to rebuild the system call files, as well as update userland utilities to follow.
Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
|
#
90073 |
|
01-Feb-2002 |
bde |
Made osigreturn(2) standard so that SYS_osigreturn can be used in the signal trampoline for old signals. The arches that support old signals currently abuse sigreturn(2) instead. This mainly complicates things and slightly breaks the the new sigreturn(2).
COMPAT is too limited to support the correct configuration of osigreturn, and this commit doesn't attempt to fix it; it just moves the bogusness: osigreturn() must now be provided unconditionally even on arches that don't really need it; previously it had to be provided under the bogus condition defined(COMPAT_43).
|
#
88633 |
|
29-Dec-2001 |
alfred |
Make AIO a loadable module.
Remove the explicit call to aio_proc_rundown() from exit1(), instead AIO will use at_exit(9).
Add functions at_exec(9), rm_at_exec(9) which function nearly the same as at_exec(9) and rm_at_exec(9), these functions are called on behalf of modules at the time of execve(2) after the image activator has run.
Use a modified version of tegge's suggestion via at_exec(9) to close an exploitable race in AIO.
Fix SYSCALL_MODULE_HELPER such that it's archetecuterally neutral, the problem was that one had to pass it a paramater indicating the number of arguments which were actually the number of "int". Fix it by using an inline version of the AS macro against the syscall arguments. (AS should be available globally but we'll get to that later.)
Add a primative system for dynamically adding kqueue ops, it's really not as sophisticated as it should be, but I'll discuss with jlemon when he's around.
|
#
85890 |
|
02-Nov-2001 |
phk |
Reserve 378 for the new mount syscall Maxime Henrion <mux@qualys.com> is working on. (This is to get us more than 32 mountoptions).
|
#
84883 |
|
13-Oct-2001 |
rwatson |
o Reserve system call 377 for afs_syscall; by reserving a system call number, portable OpenAFS applications don't have to attempt to determine what system call number was dynamically allocated. No system call prototype or implementation is defined.
Requested by: Tom Maher <tardis@watson.org>
|
#
83795 |
|
21-Sep-2001 |
rwatson |
o Introduce eaccess(2), a version of access(2) that uses the effective credentials rather than the real credentials. This is useful for implementing GUI's which need to modify icons based on access rights, but where use of open(2) is too expensive, use of stat(2) doesn't reflect the file system's real protection model, and use of access() suffers from real/effective credential confusion. This implementation provides the same semantics as the call of the same name on SCO OpenServer. Note: using this call improperly can leave you subject to some of the same races present in the access(2) call. o To implement this, break out the basic logic of access(2) into vpaccess(), which accepts a passed credential to perform the invocation of VOP_ACCESS(). Add eaccess(2) to invoke vpaccess(), and modify access(2) to use vpaccess().
Obtained from: TrustedBSD Project
|
#
83651 |
|
18-Sep-2001 |
peter |
Cleanup and split of nfs client and server code. This builds on the top of several repo-copies.
|
#
82753 |
|
01-Sep-2001 |
dillon |
Synchronize syscalls.master(s) with recent Giant pushdown work
|
#
82711 |
|
01-Sep-2001 |
dillon |
Make yield() MPSAFE.
Synchronize syscalls.master with all MPSAFE changes to date. Synchronize new syscall generation follows because yield() will panic if it is out of sync with syscalls.master.
|
#
82610 |
|
30-Aug-2001 |
dillon |
Giant pushdown syscalls in kern/uipc_syscalls.c. Affected calls:
recvmsg(), sendmsg(), recvfrom(), accept(), getpeername(), getsockname(), socket(), connect(), accept(), send(), recv(), bind(), setsockopt(), listen(), sendto(), shutdown(), socketpair(), sendfile()
|
#
82607 |
|
30-Aug-2001 |
dillon |
Giant Pushdown: sysv shm, sem, and msg calls.
|
#
82585 |
|
30-Aug-2001 |
dillon |
Remove the MPSAFE keyword from the parser for syscalls.master. Instead introduce the [M] prefix to existing keywords. e.g. MSTD is the MP SAFE version of STD. This is prepatory for a massive Giant lock pushdown. The old MPSAFE keyword made syscalls.master too messy.
Begin comments MP-Safe procedures with the comment: /* * MPSAFE */ This comments means that the procedure may be called without Giant held (The procedure itself may still need to obtain Giant temporarily to do its thing).
sv_prepsyscall() is now MP SAFE and assumed to be MP SAFE sv_transtrap() is now MP SAFE and assumed to be MP SAFE
ktrsyscall() and ktrsysret() are now MP SAFE (Giant Pushdown) trapsignal() is now MP SAFE (Giant Pushdown)
Places which used to do the if (mtx_owned(&Giant)) mtx_unlock(&Giant) test in syscall[2]() in */*/trap.c now do not. Instead they explicitly unlock Giant if they previously obtained it, and then assert that it is no longer held to catch broken system calls.
Rebuild syscall tables.
|
#
77386 |
|
29-May-2001 |
phk |
Remove a comment which was past its shelf life.
PR: 18750 Submitted by: Tony Finch <dot@dotat.at>
|
#
76827 |
|
18-May-2001 |
alfred |
Introduce a global lock for the vm subsystem (vm_mtx).
vm_mtx does not recurse and is required for most low level vm operations.
faults can not be taken without holding Giant.
Memory subsystems can now call the base page allocators safely.
Almost all atomic ops were removed as they are covered under the vm mutex.
Alpha and ia64 now need to catch up to i386's trap handlers.
FFS and NFS have been tested, other filesystems will need minor changes (grabbing the vm lock when twiddling page properties).
Reviewed (partially) by: jake, jhb
|
#
76472 |
|
11-May-2001 |
tegge |
gettimeofday() is MP safe on both -current and -stable.
|
#
75426 |
|
11-Apr-2001 |
rwatson |
o Introduce a new system call, __setsugid(), which allows a process to toggle the P_SUGID bit explicitly, rather than relying on it being set implicitly by other protection and credential logic. This feature is introduced to support inter-process authorization regression testing by simplifying userland credential management allowing the easy isolation and reproduction of authorization events with specific security contexts. This feature is enabled only by "options REGRESSION" and is not intended to be used by applications. While the feature is not known to introduce security vulnerabilities, it does allow processes to enter previously inaccessible parts of the credential state machine, and is therefore disabled by default. It may not constitute a risk, and therefore in the future pending further analysis (and appropriate need) may become a published interface.
Obtained from: TrustedBSD Project
|
#
75038 |
|
31-Mar-2001 |
rwatson |
o Introduce extattr_{delete,get,set}_fd() to allow extended attribute operations on file descriptors, which complement the existing set of calls, extattr_{delete,get,set}_file() which act on paths. In doing so, restructure the system call implementation such that the two sets of functions share most of the relevant code, rather than duplicating it. This pushes the vnode locking into the shared code, but keeps the copying in of some arguments in the system call code. Allowing access via file descriptors reduces the opportunity for race conditions when managing extended attributes.
Obtained from: TrustedBSD Project
|
#
74437 |
|
19-Mar-2001 |
rwatson |
o Rename "namespace" argument to "attrnamespace" as namespace is a C++ reserved word.
Submitted by: jkh Obtained from: TrustedBSD Project
|
#
74273 |
|
15-Mar-2001 |
rwatson |
o Change the API and ABI of the Extended Attribute kernel interfaces to introduce a new argument, "namespace", rather than relying on a first- character namespace indicator. This is in line with more recent thinking on EA interfaces on various mailing lists, including the posix1e, Linux acl-devel, and trustedbsd-discuss forums. Two namespaces are defined by default, EXTATTR_NAMESPACE_SYSTEM and EXTATTR_NAMESPACE_USER, where the primary distinction lies in the access control model: user EAs are accessible based on the normal MAC and DAC file/directory protections, and system attributes are limited to kernel-originated or appropriately privileged userland requests.
o These API changes occur at several levels: the namespace argument is introduced in the extattr_{get,set}_file() system call interfaces, at the vnode operation level in the vop_{get,set}extattr() interfaces, and in the UFS extended attribute implementation. Changes are also introduced in the VFS extattrctl() interface (system call, VFS, and UFS implementation), where the arguments are modified to include a namespace field, as well as modified to advoid direct access to userspace variables from below the VFS layer (in the style of recent changes to mount by adrian@FreeBSD.org). This required some cleanup and bug fixing regarding VFS locks and the VFS interface, as a vnode pointer may now be optionally submitted to the VFS_EXTATTRCTL() call. Updated documentation for the VFS interface will be committed shortly.
o In the near future, the auto-starting feature will be updated to search two sub-directories to the ".attribute" directory in appropriate file systems: "user" and "system" to locate attributes intended for those namespaces, as the single filename is no longer sufficient to indicate what namespace the attribute is intended for. Until this is committed, all attributes auto-started by UFS will be placed in the EXTATTR_NAMESPACE_SYSTEM namespace.
o The default POSIX.1e attribute names for ACLs and Capabilities have been updated to no longer include the '$' in their filename. As such, if you're using these features, you'll need to rename the attribute backing files to the same names without '$' symbols in front.
o Note that these changes will require changes in userland, which will be committed shortly. These include modifications to the extended attribute utilities, as well as to libutil for new namespace string conversion routines. Once the matching userland changes are committed, a buildworld is recommended to update all the necessary include files and verify that the kernel and userland environments are in sync. Note: If you do not use extended attributes (most people won't), upgrading is not imperative although since the system call API has changed, the new userland extended attribute code will no longer compile with old include files.
o Couple of minor cleanups while I'm there: make more code compilation conditional on FFS_EXTATTR, which should recover a bit of space on kernels running without EA's, as well as update copyright dates.
Obtained from: TrustedBSD Project
|
#
69512 |
|
02-Dec-2000 |
jake |
Remove thr_sleep and thr_wakeup. Remove fields p_nthread and p_wakeup from struct proc, which are now unused (p_nthread already was). Remove process flag P_KTHREADP which was untested and only set in vfs_aio.c (it should use kthread_create). Move the yield system call to kern_synch.c as kern_threads.c has been removed completely.
moral support from: alfred, jhb
|
#
69449 |
|
01-Dec-2000 |
alfred |
sysvipc loadable.
new syscall entry lkmressys - "reserved loadable syscall"
Make syscall_register allow overwriting of such entries (lkmressys).
|
#
65150 |
|
28-Aug-2000 |
marcel |
Fix prototypes for {o|}{g|s}etrlimit. A recent change in the Linuxulator caused this bug to trigger.
|
#
64001 |
|
29-Jul-2000 |
peter |
Sigh. Fix SYS_exit problems. I misunderstood the significance of these trailing options.
|
#
63986 |
|
28-Jul-2000 |
peter |
Change the 'exit()' system call to 'sys_exit()'. This avoids overlapping gcc's internal exit() prototypes and the (futile) hackery that we did to try and avoid warnings. main() was renamed for similar reasons. Remove an exit related hack from makesyscalls.sh.
|
#
63452 |
|
18-Jul-2000 |
jlemon |
Simplify kqueue API slightly.
Discussed on: -arch
|
#
63082 |
|
13-Jul-2000 |
rwatson |
o Introduce syscall prototypes, stubs for __cap_{get,set}_{fd,file}, syscalls to manage capability sets on files. First of two commits.
Obtained from: TrustedBSD Project
|
#
61718 |
|
15-Jun-2000 |
rwatson |
Introduce syscalls for process capability manipulation. Currently backs onto already committed stubs. Commit one of two.
Reviewed by: Damned if I can remember. Many people. Obtained from: TrustedBSD Project
|
#
60247 |
|
09-May-2000 |
bde |
Fixed the declaration of mmap(). The crufty padding arg had the wrong type. This gave an inconsistent amount of crufty padding on i386's with 64-bit longs (8 bytes instead of 4). On alphas it gives a consistent amount of crufty padding (8 bytes) in addition to the 4 bytes of normal padding caused by passing int args as register_t's.
Fixed the args struct tag for the NOPROTO syscalls (netbsd_lchown() and netbsd_msync()). The tag is currently unused for NOPROTO syscalls, so the bug has no effect, but it will be used even in the NOPROTO case to calculate sy_nargs correctly.
|
#
59827 |
|
01-May-2000 |
peter |
Remove undocumented broken-as-designed semconfig() syscall.
|
#
59288 |
|
16-Apr-2000 |
jlemon |
Introduce kqueue() and kevent(), a kernel event notification facility.
|
#
58963 |
|
03-Apr-2000 |
alfred |
Make makesyscalls.sh parse an optional field 'MPSAFE' that specifies that a syscall does not want the BGL to be grabbed automatically.
Add the new MPSAFE flag to the syscalls that dillon has determined to be MPSAFE.
|
#
56270 |
|
19-Jan-2000 |
rwatson |
Fix bde'isms in acl/extattr syscall interface, renaming syscalls to prettier (?) names, adding some const's around here, et al.
Commit 1 out of 3.
Reviewed by: bde
|
#
56115 |
|
16-Jan-2000 |
peter |
Implement setres[ug]id() and getres[ug]id(). This has been sitting in my tree for ages (~2 years) waiting for an excuse to commit it. Now Linux has implemented it and it seems that Staroffice (when using the linux_base6.1 port's libc) calls this in the linux emulator and dies in setup. The Linux emulator can call these now.
|
#
55943 |
|
14-Jan-2000 |
jasone |
Add aio_waitcomplete(). Make aio work correctly for socket descriptors. Make gratuitous style(9) fixes (me, not the submitter) to make the aio code more readable.
PR: kern/12053 Submitted by: Chris Sedore <cmsedore@maxwell.syr.edu>
|
#
54970 |
|
21-Dec-1999 |
alfred |
make getfh a standard syscall instead of dependant on having NFSSERVER defined, useful for userland fileservers that want to use a filehandle type interface to the filesystem.
Submitted by: Assar Westerlund assar@stacken.kth.se PR: kern/15452
|
#
54802 |
|
19-Dec-1999 |
rwatson |
First pass commit to introduce new ACL and Extended Attribute system calls. The second pass commit with all the supporting code will happen shortly afterwards.
Reviewed by: eivind
|
#
53299 |
|
17-Nov-1999 |
brian |
modfind(char *) -> modfind(const char *)
Reminded by: dfr
|
#
52149 |
|
12-Oct-1999 |
marcel |
Now that userland including modules don't use the osig* syscalls, make them of type COMPAT.
|
#
51790 |
|
29-Sep-1999 |
marcel |
sigset_t change (part 1 of 5) -----------------------------
Rename sigaction, sigprocmask, sigpending and sigsuspend to osigaction, osigprocmask, osigpending and osigsuspend (resp) and add new syscalls for them to support the new sisgset_t without breaking existing binaries.
Change the prototype of sigaltstack to use the typedef stack_t instead of struct sigaltstack to reflect that it is SUSv2 compliant.
Also, rename sigreturn to osigreturn and add a new syscall to support the modified stackframe. The change is caused by sigreturn operating on ucontext_t now and the fact that siginfo_t has been updated to conform to SUSv2.
|
#
51138 |
|
10-Sep-1999 |
alfred |
Seperate the export check in VFS_FHTOVP, exports are now checked via VFS_CHECKEXP.
Add fh(open|stat|stafs) syscalls to allow userland to query filesystems based on (network) filehandle.
Obtained from: NetBSD
|
#
50477 |
|
27-Aug-1999 |
peter |
$Id$ -> $FreeBSD$
|
#
49641 |
|
11-Aug-1999 |
nik |
Add CPT_NOA, LIBCOMPAT, NODEF, NOARGS, NOPROTO, and NOIMPL to the commented list of available types.
PR: docs/13007 Submitted by: Assar Westerlund <assar@sics.se>
|
#
49428 |
|
05-Aug-1999 |
jkh |
Move syscall 180 back to where it was before and fix the incorrect comment which led me to move it in the first place.
|
#
49420 |
|
04-Aug-1999 |
jkh |
Reserve a syscall for the arla folks. I'm assuming that since syscalls.c and init_sysent.c are checked into CVS, I should also commit the regenerated copies even though they're built by syscalls.master. Correct? Bruce? :)
|
#
47103 |
|
13-May-1999 |
bde |
Fixed nonsense arg type `const caddr_t' in the prototype() for utrace(). Changed to `const void *'. utrace() is undocumented, so nothing should notice.
Fixed missing consts for utrace() and ktrace() in syscalls.master.
sys/ktrace.h is missing some Lite2 changes of shorts to ints.
|
#
46154 |
|
28-Apr-1999 |
phk |
Add the jail system call.
|
#
45311 |
|
04-Apr-1999 |
dt |
Add standard padding argument to pread and pwrite syscall. That should make them NetBSD compatible.
Add parameter to fo_read and fo_write. (The only flag FOF_OFFSET mean that the offset is set in the struct uio).
Factor out some common code from read/pread/write/pwrite syscalls.
|
#
45065 |
|
27-Mar-1999 |
alc |
Added pread and pwrite. These functions are defined by the X/Open Threads Extension. (Note: We use the same syscall numbers as NetBSD.)
Submitted by: John Plevyak <jplevyak@inktomi.com>
|
#
41088 |
|
11-Nov-1998 |
peter |
A kldsym(2) syscall prototype for extracting information from the in-kernel linker. This is intended to replace kvm_mkdb etc. The first version only does name->value lookups, but it's open ended. value->name lookups would probably be a good thing to do too.
It's been suggested to try and connect the symbol tables to sysctl (which is probably a more flexible way of doing it if it's done right), but that is far more complex and difficult than I was ready to have a shot at.
|
#
40931 |
|
05-Nov-1998 |
dg |
Implemented zero-copy TCP/IP extensions via sendfile(2) - send a file to a stream socket. sendfile(2) is similar to implementations in HP-UX, Linux, and other systems, but the API is more extensive and addresses many of the complaints that the Apache Group and others have had with those other implementations. Thanks to Marc Slemko of the Apache Group for helping me work out the best API for this. Anyway, this has the "net" result of speeding up sends of files over TCP/IP sockets by about 10X (that is to say, uses 1/10th of the CPU cycles) when compared to a traditional read/write loop.
|
#
38515 |
|
24-Aug-1998 |
dfr |
Fix a few syscall arguments to use size_t instead of u_int.
|
#
36735 |
|
07-Jun-1998 |
dfr |
This commit fixes various 64bit portability problems required for FreeBSD/alpha. The most significant item is to change the command argument to ioctl functions from int to u_long. This change brings us inline with various other BSD versions. Driver writers may like to use (__FreeBSD_version == 300003) to detect this change.
The prototype FreeBSD/alpha machdep will follow in a couple of days time.
|
#
36033 |
|
14-May-1998 |
peter |
deep-six signanosleep(). It sounded like a good idea at the time.
|
#
35938 |
|
11-May-1998 |
dyson |
Fix the futimes/undelete/utrace conflict with other BSD's. Note that the only common usage of utrace (the possible problem with this commit) is with malloc, so this should be a real problem. Add the various NetBSD syscalls that allow full emulation of their development environment.
|
#
34925 |
|
28-Mar-1998 |
dufault |
Finish _POSIX_PRIORITY_SCHEDULING. Needs P1003_1B and _KPOSIX_PRIORITY_SCHEDULING options to work. Changes:
Change all "posix4" to "p1003_1b". Misnamed files are left as "posix4" until I'm told if I can simply delete them and add new ones;
Add _POSIX_PRIORITY_SCHEDULING system calls for FreeBSD and Linux;
Add man pages for _POSIX_PRIORITY_SCHEDULING system calls;
Add options to LINT;
Minor fixes to P1003_1B code during testing.
|
#
33040 |
|
03-Feb-1998 |
bde |
Fixed type of mincore().
|
#
32889 |
|
30-Jan-1998 |
phk |
Retire LFS.
If you want to play with it, you can find the final version of the code in the repository the tag LFS_RETIREMENT.
If somebody makes LFS work again, adding it back is certainly desireable, but as it is now nobody seems to care much about it, and it has suffered considerable bitrot since its somewhat haphazard integration.
R.I.P
|
#
32726 |
|
24-Jan-1998 |
eivind |
Make all file-system (MFS, FFS, NFS, LFS, DEVFS) related option new-style.
This introduce an xxxFS_BOOT for each of the rootable filesystems. (Presently not required, but encouraged to allow a smooth move of option *FS to opt_dontuse.h later.)
LFS is temporarily disabled, and will be re-enabled tomorrow.
|
#
32166 |
|
01-Jan-1998 |
alex |
Added missing caddr_t --> void * conversions for sys/mman.h functions.
Submitted by: bde
|
#
30740 |
|
26-Oct-1997 |
phk |
Add "NOIMPL" for syscalls we know what is, but don't implement as "STD". Use this for getfh & nfssvc.
|
#
29391 |
|
14-Sep-1997 |
phk |
Add a __getcwd() syscall. This is intentionally undocumented, but all it does is to try to figure the pwd out from the vfs namecache, and return a reversed string to it. libc:getcwd() is responsible for flipping it back.
|
#
29348 |
|
14-Sep-1997 |
peter |
Activate poll(2) syscall
|
#
28399 |
|
19-Aug-1997 |
peter |
SVR4/XPG-style getpgid()/getsid() syscalls.
|
#
26671 |
|
15-Jun-1997 |
dyson |
Modifications to existing files to support the initial AIO/LIO and kernel based threading support.
|
#
26333 |
|
01-Jun-1997 |
peter |
New syscall, signanosleep(), which is a hybrid of sigsuspend(2) and nanosleep(2). It sleeps until either the time expires, or a signal permitted by the supplied mask arrives (eg: SIGALRM if appropriate)
|
#
25581 |
|
08-May-1997 |
peter |
oops. NODIDE -> NOHIDE
|
#
25580 |
|
08-May-1997 |
peter |
Define entries for the posix-style clock/timer syscalls including nanosleep(). Also, note some syscall conflicts with other systems and indicate slots tagged for use with other syscalls some day.
|
#
25537 |
|
07-May-1997 |
dfr |
This is the kernel linker. To use it, you will first need to apply the patches in freefall:/home/dfr/ld.diffs to your ld sources and set BINFORMAT to aoutkld when linking the kernel.
Library changes and userland utilities will appear in a later commit.
|
#
24451 |
|
31-Mar-1997 |
peter |
issetugid is now implemented rather than reserved
|
#
24439 |
|
31-Mar-1997 |
peter |
Reserve 252 (poll, first in OpenBSD) Reserve 253 (issetugid, as in OpenBSD) Allocate 254 for lchown(2)
|
#
22975 |
|
22-Feb-1997 |
peter |
Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not ready for it yet.
|
#
22521 |
|
10-Feb-1997 |
dyson |
This is the kernel Lite/2 commit. There are some requisite userland changes, so don't expect to be able to run the kernel as-is (very well) without the appropriate Lite/2 userland changes.
The system boots and can mount UFS filesystems.
Untested: ext2fs, msdosfs, NFS Known problems: Incorrect Berkeley ID strings in some files. Mount_std mounts will not work until the getfsent library routine is changed.
Reviewed by: various people Submitted by: Jeffery Hsu <hsu@freebsd.org>
|
#
21776 |
|
16-Jan-1997 |
bde |
Reduced #include spam in <sys/sysproto.h> and fixed things that depended on it.
makesyscalls.sh: This parsed $Id$. Fixed(?) to parse $FreeBSD$. The output is wrong when the id is not expanded in the source file.
syscalls.master: Fixed declaration of sigsuspend(). There are still some bogons and spam involving sigset_t. Use `struct foo *' instead of the equivalent `foo_t *' for some nfs and lfs syscalls so that <sys/sysproto.h> doesn't depend on <sys/mount.h>.
|
#
21673 |
|
14-Jan-1997 |
jkh |
Make the long-awaited change from $Id$ to $FreeBSD$
This will make a number of things easier in the future, as well as (finally!) avoiding the Id-smashing problem which has plagued developers for so long.
Boy, I'm glad we're not using sup anymore. This update would have been insane otherwise.
|
#
18398 |
|
19-Sep-1996 |
phk |
Add the utrace(caddr_t addr,size_t len) syscall, that will store the data pointed at in a ktrace file, if this process is being ktrace'ed. I'm using this to profile malloc usage. The advantage is that there is no context around this call, ie, no open file or socket, so it will work in any process, and you can decide if you want it to collect data or not.
|
#
17702 |
|
20-Aug-1996 |
smpatel |
Remove the kernel FD_SETSIZE limit for select(). Make select()'s first argument 'int' not 'u_int'.
Reviewed by: bde
|
#
14322 |
|
02-Mar-1996 |
peter |
Change the 'int len' args in the mmap/msync/mincore/etc class syscalls to 'size_t' as per bde's request.
|
#
14219 |
|
23-Feb-1996 |
peter |
Add hooks for rfork/minherit pair, and reset args of vfork in preperation for adding the syscalls.
|
#
14215 |
|
23-Feb-1996 |
peter |
Note the syscall numbers used in BSD/OS 2.x. We dont want to accidently use one of these ourselves as it'd make it harder to run their binaries. Also, remove the now-defunct #include "opt_sysvipc.h".
|
#
13416 |
|
13-Jan-1996 |
phk |
Add an option NFS_NOSERVER which saves 100K in the install kernel (or any other kernel that uses it). Use with option NFS.
|
#
13331 |
|
08-Jan-1996 |
peter |
Remove the #ifdef SYSVSHM etc. Always call the functions, some stubs are about to go in. This is to fix the problem with the ibcs2 and linux lkm's not being able to call the sysv ipc functions unless the build is modified.
|
#
13226 |
|
04-Jan-1996 |
wollman |
Convert SYSV IPC to new-style options. (I hope I got everything...) The LKMs will need an extra file, to come later.
|
#
13203 |
|
03-Jan-1996 |
wollman |
Converted two options over to the new scheme: USER_LDT and KTRACE.
|
#
12864 |
|
15-Dec-1995 |
peter |
Add the direct sysv shm/sem/msg system calls, in the same way as NetBSD. This costs very little, we gain prototypes for the calls from the linux emulator, and this is one less thing in the way of NetBSD binary support.
|
#
12216 |
|
12-Nov-1995 |
bde |
Fixed the args list for mount(). We're not ready for the BSD4.4lite2/ NetBSD interface.
Increased the bogusness of the args list for mmap(). The args lists for most of the memory mapping functions are bogus. The args lists in syscalls.master are a little better than the ones in the args structs currently being used, but the improvement for mmap() changed the object code and I don't want to worry about that now.
Increased the bogusness of the args list for fcntl. BSD4.4lite2/NetBSD uses `void *' instead of int for the third arg. This has the advantage of working when `void *'s are longer than ints, but requires extra bogus casts that I hope to avoid.
Fixed the args list for uname. `struct outsname' seems to be a typo, not an old interface.
Added comments about bogus args lists for open, mount, msync, munmap, mprotect, madvise, mincore, fcntl, semsys, msgsys and shmsys.
|
#
11330 |
|
07-Oct-1995 |
swallace |
Fix misc formatting errors in makesyscalls.sh.
Add CPT_NOA type which is COMPAT with NOARGS -- do not produce argument struct in sysproto.
Change accept, recvfrom, getsockname to CPT_NOA type. Fix getrlimit, setrlimit argument #2 name to struct rlimit.
|
#
11294 |
|
07-Oct-1995 |
swallace |
Add new functionality to makesyscalls.sh: o optional config-file to set vars: sysnames, sysproto, sysproto_h, syshdr, syssw, syshide, syscallprefix, switchname, namesname, sysvec. o change syntax of syscalls.master entry: remove argument count. add pseudo-prototype field defining function name and arguments. o generates correct structure definitions for all system calls in sys/sysproto.h o add type NOARGS: same as STD except do not create structure in sys/sysproto.h o add type NOPROTO: same as STD except do not create structure or function prototype in sys/sysproto.h
New functionality provides complete prototype definitions. Usefull for generating files for emulated systems like my new ibcs2 code.
Update syscalls.master to reflect new changes. For example, read() entry now looks like:
3 STD POSIX { int ibcs2_read(int fd, char *buf, u_int nbytes); }
This is similar to how NetBSD generates these files.
|
#
10905 |
|
19-Sep-1995 |
bde |
Generate prototypes for syscall-implementing functions. Put them in <sys/sysproto.h> and use them (so far only) in kern/init_sysent.c.
Don't put $Id in generated files.
kern/syscalls.master: I had to add some new fields to describe some non-orthogonal names. E.g., the args struct for the syscall-implementing function foo() is usually named `foo_args', but for getpid() it is named `args'.
sys/sysent.h: sy_call_t is still incomplete to hide a couple of warnings.
|
#
8019 |
|
23-Apr-1995 |
ache |
Make setreuid/setregid active syscalls
|
#
7359 |
|
25-Mar-1995 |
dg |
Added a third "flags" argument to msync() ...as other systems have.
|
#
6875 |
|
04-Mar-1995 |
dg |
Removed obsolete vtrace() remnants.
|
#
5107 |
|
14-Dec-1994 |
wollman |
Actually enable NTP kernel PLL. (Oops!) Noticed by Pete Carah.
|
#
3291 |
|
02-Oct-1994 |
dg |
"idle priority" support. Based on code from Henrik Vestergaard Draboel, but substantially rewritten by me.
|
#
3178 |
|
28-Sep-1994 |
wollman |
LKM support is no longer optional.
|
#
2858 |
|
18-Sep-1994 |
wollman |
Redo Kernel NTP PLL support, kernel side.
This code is mostly taken from the 1.1 port (which was in turn taken from Dave Mills's kern.tar.Z example). A few significant differences:
1) ntp_gettime() is now a MIB variable rather than a system call. A few fiddles are done in libc to make it behave the same.
2) mono_time does not participate in the PLL adjustments.
3) A new interface has been defined (in <machine/clock.h>) for doing possibly machine-dependent things around the time of the clock update. This is used in Pentium kernels to disable interrupts, set `time', and reset the CPU cycle counter as quickly as possible to avoid jitter in microtime(). Measurements show an apparent resolution of a bit more than 8.14usec, which is reasonable given system-call overhead.
|
#
2729 |
|
13-Sep-1994 |
dfr |
Added SYSV ipcs.
Obtained from: NetBSD and FreeBSD-1.1.5
|
#
2696 |
|
12-Sep-1994 |
wollman |
Added namespace information for future pollution-control measures.
|
#
2441 |
|
01-Sep-1994 |
dg |
Realtime priority scheduling support.
Submitted by: Henrik Vestergaard Draboel
|
#
2297 |
|
26-Aug-1994 |
wollman |
Added ntp_gettime and ntp_adjtime syscalls, both nosys'ed out until someone gets to re-integrating the code. ntp_gettime() should be turned into a sysctl variable and emulated in the library.
|
#
2124 |
|
19-Aug-1994 |
dg |
Terry Lambert's loadable kernel module support w/improvements from the NetBSD group.
|
#
1817 |
|
02-Aug-1994 |
dg |
Added $Id$
|
#
1549 |
|
25-May-1994 |
rgrimes |
The big 4.4BSD Lite to FreeBSD 2.0.0 (Development) patch.
Reviewed by: Rodney W. Grimes Submitted by: John Dyson and David Greenman
|
#
1542 |
|
24-May-1994 |
rgrimes |
This commit was generated by cvs2svn to compensate for changes in r1541, which included commits to RCS files with non-trunk default branches.
|
#
1541 |
|
24-May-1994 |
rgrimes |
BSD 4.4 Lite Kernel Sources
|