#
259065 |
|
07-Dec-2013 |
gjb |
- Copy stable/10 (r259064) to releng/10.0 as part of the 10.0-RELEASE cycle. - Update __FreeBSD_version [1] - Set branch name to -RC1
[1] 10.0-CURRENT __FreeBSD_version value ended at '55', so start releng/10.0 at '100' so the branch is started with a value ending in zero.
Approved by: re (implicit) Sponsored by: The FreeBSD Foundation |
#
256281 |
|
10-Oct-2013 |
gjb |
Copy head (r256279) to stable/10 as part of the 10.0-RELEASE cycle.
Approved by: re (implicit) Sponsored by: The FreeBSD Foundation
|
#
255708 |
|
19-Sep-2013 |
jhb |
Extend the support for exempting processes from being killed when swap is exhausted. - Add a new protect(1) command that can be used to set or revoke protection from arbitrary processes. Similar to ktrace it can apply a change to all existing descendants of a process as well as future descendants. - Add a new procctl(2) system call that provides a generic interface for control operations on processes (as opposed to the debugger-specific operations provided by ptrace(2)). procctl(2) uses a combination of idtype_t and an id to identify the set of processes on which to operate similar to wait6(). - Add a PROC_SPROTECT control operation to manage the protection status of a set of processes. MADV_PROTECT still works for backwards compatability. - Add a p_flag2 to struct proc (and a corresponding ki_flag2 to kinfo_proc) the first bit of which is used to track if P_PROTECT should be inherited by new child processes.
Reviewed by: kib, jilles (earlier version) Approved by: re (delphij) MFC after: 1 month
|
#
255675 |
|
18-Sep-2013 |
rdivacky |
Revert r255672, it has some serious flaws, leaking file references etc.
Approved by: re (delphij)
|
#
255672 |
|
18-Sep-2013 |
rdivacky |
Implement epoll support in Linuxulator. This is a tiny wrapper around kqueue to implement epoll subset of functionality. The kqueue user data are 32bit on i386 which is not enough for epoll user data so this patch overrides kqueue fileops to maintain enough space in struct file.
Initial patch developed by me in 2007 and then extended and finished by Yuri Victorovich.
Approved by: re (delphij) Sponsored by: Google Summer of Code Submitted by: Yuri Victorovich <yuri at rawbw dot com> Tested by: Yuri Victorovich <yuri at rawbw dot com>
|
#
254481 |
|
18-Aug-2013 |
pjd |
Implement 32bit versions of the cap_ioctls_limit(2) and cap_ioctls_get(2) system calls as unsigned longs have different size on i386 and amd64.
Reported by: jilles Sponsored by: The FreeBSD Foundation
|
#
253530 |
|
21-Jul-2013 |
kib |
Implement compat32 wrappers for the ktimer_* syscalls.
Reported, reviewed and tested by: Petr Salinger <Petr.Salinger@seznam.cz> Sponsored by: The FreeBSD Foundation MFC after: 1 week
|
#
253494 |
|
20-Jul-2013 |
kib |
id_t is 64bit, provide the compat32 wrapper for clock_getcpuclockid2(2).
Reported and tested by: Petr Salinger <Petr.Salinger@seznam.cz> PR: threads/180652 Sponsored by: The FreeBSD Foundation
|
#
250154 |
|
01-May-2013 |
jilles |
Add accept4() system call.
The accept4() function, compared to accept(), allows setting the new file descriptor atomically close-on-exec and explicitly controlling the non-blocking status on the new socket. (Note that the latter point means that accept() is not equivalent to any form of accept4().)
The linuxulator's accept4 implementation leaves a race window where the new file descriptor is not close-on-exec because it calls sys_accept(). This implementation leaves no such race window (by using falloc() flags). The linuxulator could be fixed and simplified by using the new code.
Like accept(), accept4() is async-signal-safe, a cancellation point and permitted in capability mode.
|
#
248951 |
|
31-Mar-2013 |
jilles |
Rename do_pipe() to kern_pipe2() and declare it properly.
|
#
243135 |
|
16-Nov-2012 |
kib |
Move the definition of the idtype_t from sys/types.h to sys/wait.h. Fix the bug, use #if __BSD_VISIBLE instead of #if defined(__BSD_VISIBLE), since __BSD_VISIBLE is always defined. Reformat the comments from the Solaris style to KNF.
Reported and reviewed by: bde MFC after: 28 days
|
#
243134 |
|
16-Nov-2012 |
kib |
Alphabetically reorder the forward-declarations of the structures. Add the declaration for enum idtype, to be used later.
Reported and reviewed by: bde MFC after: 28 days
|
#
242958 |
|
13-Nov-2012 |
kib |
Add the wait6(2) system call. It takes POSIX waitid()-like process designator to select a process which is waited for. The system call optionally returns siginfo_t which would be otherwise provided to SIGCHLD handler, as well as extended structure accounting for child and cumulative grandchild resource usage.
Allow to get the current rusage information for non-exited processes as well, similar to Solaris.
The explicit WEXITED flag is required to wait for exited processes, allowing for more fine-grained control of the events the waiter is interested in.
Fix the handling of siginfo for WNOWAIT option for all wait*(2) family, by not removing the queued signal state.
PR: standards/170346 Submitted by: "Jukka A. Ukkonen" <jau@iki.fi> MFC after: 1 month
|
#
235886 |
|
24-May-2012 |
gleb |
Add kern_fhstat(), adjust sys_fhstat() to use it.
Extend kern_getdirentries() to accept uio segflag and optionally return buffer residue.
Sponsored by: Google Summer of Code 2011
|
#
227502 |
|
14-Nov-2011 |
jhb |
- Split out a kern_posix_fadvise() from the posix_fadvise() system call so it can be used by in-kernel consumers. - Make kern_posix_fallocate() public. - Use kern_posix_fadvise() and kern_posix_fallocate() to implement the freebsd32 wrappers for the two system calls.
|
#
220238 |
|
01-Apr-2011 |
kib |
Add support for executing the FreeBSD 1/i386 a.out binaries on amd64.
In particular: - implement compat shims for old stat(2) variants and ogetdirentries(2); - implement delivery of signals with ancient stack frame layout and corresponding sigreturn(2); - implement old getpagesize(2); - provide a user-mode trampoline and LDT call gate for lcall $7,$0; - port a.out image activator and connect it to the build as a module on amd64.
The changes are hidden under COMPAT_43.
MFC after: 1 month
|
#
220158 |
|
30-Mar-2011 |
kib |
Provide compat32 shims for kldstat(2).
Requested and tested by: jpaetzel MFC after: 1 week
|
#
209613 |
|
30-Jun-2010 |
jhb |
Move prototypes for kern_sigtimedwait() and kern_sigprocmask() to <sys/syscallsubr.h> where all other kern_<syscall> prototypes live.
|
#
202113 |
|
11-Jan-2010 |
mckusick |
Background:
When renaming a directory it passes through several intermediate states. First its new name will be created causing it to have two names (from possibly different parents). Next, if it has different parents, its value of ".." will be changed from pointing to the old parent to pointing to the new parent. Concurrently, its old name will be removed bringing it back into a consistent state. When fsck encounters an extra name for a directory, it offers to remove the "extraneous hard link"; when it finds that the names have been changed but the update to ".." has not happened, it offers to rewrite ".." to point at the correct parent. Both of these changes were considered unexpected so would cause fsck in preen mode or fsck in background mode to fail with the need to run fsck manually to fix these problems. Fsck running in preen mode or background mode now corrects these expected inconsistencies that arise during directory rename. The functionality added with this update is used by fsck running in background mode to make these fixes.
Solution:
This update adds three new fsck sysctl commands to support background fsck in correcting expected inconsistencies that arise from incomplete directory rename operations. They are:
setcwd(dirinode) - set the current directory to dirinode in the filesystem associated with the snapshot. setdotdot(oldvalue, newvalue) - Verify that the inode number for ".." in the current directory is oldvalue then change it to newvalue. unlink(nameptr, oldvalue) - Verify that the inode number associated with nameptr in the current directory is oldvalue then unlink it.
As with all other fsck sysctls, these new ones may only be used by processes with appropriate priviledge.
Reported by: jeff Security issues: rwatson
|
#
198508 |
|
27-Oct-2009 |
kib |
Current pselect(3) is implemented in usermode and thus vulnerable to well-known race condition, which elimination was the reason for the function appearance in first place. If sigmask supplied as argument to pselect() enables a signal, the signal might be delivered before thread called select(2), causing lost wakeup. Reimplement pselect() in kernel, making change of sigmask and sleep atomic.
Since signal shall be delivered to the usermode, but sigmask restored, set TDP_OLDMASK and save old mask in td_oldsigmask. The TDP_OLDMASK should be cleared by ast() in case signal was not gelivered during syscall execution.
Reviewed by: davidxu Tested by: pho MFC after: 1 month
|
#
198506 |
|
27-Oct-2009 |
kib |
In kern_sigsuspend(), better manipulate thread signal mask using kern_sigprocmask() to properly notify other possible candidate threads for signal delivery.
Since sigsuspend() shall only return to usermode after a signal was delivered, do cursig/postsig loop immediately after waiting for signal, repeating the wait if wakeup was spurious due to race with other thread fetching signal from the process queue before us. Add thread_suspend_check() call to allow the thread to be stopped or killed while in loop.
Modify last argument of kern_sigprocmask() from boolean to flags, allowing the function to be called with locked proc. Convertion of the callers that supplied 1 to the old argument will be done in the next commit, and due to SIGPROCMASK_OLD value equial to 1, code is formally correct in between.
Reviewed by: davidxu Tested by: pho MFC after: 1 month
|
#
197049 |
|
09-Sep-2009 |
kib |
kern_select(9) copies fd_set in and out of userspace in quantities of longs. Since 32bit processes longs are 4 bytes, 64bit kernel may copy in or out 4 bytes more then the process expected.
Calculate the amount of bytes to copy taking into account size of fd_set for the current process ABI.
Diagnosed and tested by: Peter Jeremy <peterjeremy acm org> Reviewed by: jhb MFC after: 1 week
|
#
195458 |
|
08-Jul-2009 |
trasz |
There is an optimization in chmod(1), that makes it not to call chmod(2) if the new file mode is the same as it was before; however, this optimization must be disabled for filesystems that support NFSv4 ACLs. Chmod uses pathconf(2) to determine whether this is the case - however, pathconf(2) always follows symbolic links, while the 'chmod -h' doesn't.
This change adds lpathconf(3) to make it possible to solve that problem in a clean way.
Reviewed by: rwatson (earlier version) Approved by: re (kib)
|
#
193167 |
|
31-May-2009 |
dchagin |
Split native socketpair() syscall onto kern_socketpair() which should be used by kernel consumers and socketpair() itself.
Approved by: kib (mentor) MFC after: 1 month
|
#
192895 |
|
27-May-2009 |
jamie |
Add hierarchical jails. A jail may further virtualize its environment by creating a child jail, which is visible to that jail and to any parent jails. Child jails may be restricted more than their parents, but never less. Jail names reflect this hierarchy, being MIB-style dot-separated strings.
Every thread now points to a jail, the default being prison0, which contains information about the physical system. Prison0's root directory is the same as rootvnode; its hostname is the same as the global hostname, and its securelevel replaces the global securelevel. Note that the variable "securelevel" has actually gone away, which should not cause any problems for code that properly uses securelevel_gt() and securelevel_ge().
Some jail-related permissions that were kept in global variables and set via sysctls are now per-jail settings. The sysctls still exist for backward compatibility, used only by the now-deprecated jail(2) system call.
Approved by: bz (mentor)
|
#
191673 |
|
29-Apr-2009 |
jamie |
Introduce the extensible jail framework, using the same "name=value" interface as nmount(2). Three new system calls are added: * jail_set, to create jails and change the parameters of existing jails. This replaces jail(2). * jail_get, to read the parameters of existing jails. This replaces the security.jail.list sysctl. * jail_remove to kill off a jail's processes and remove the jail. Most jail parameters may now be changed after creation, and jails may be set to exist without any attached processes. The current jail(2) system call still exists, though it is now a stub to jail_set(2).
Approved by: bz (mentor)
|
#
188849 |
|
20-Feb-2009 |
ed |
Don't make Linux stat() open character devices to resolve its name.
The existing code calls kern_open() to resolve the vnode of a pathname right after a stat(). This is not correct, because it causes random character devices to be opened in /dev. This means ls'ing a tape streamer will cause it to rewind, for example. Changes I have made:
- Add kern_statat_vnhook() to allow binary emulators to `post-process' struct stat, using the proper vnode.
- Remove unneeded printf's from stat() and statfs().
- Make the Linuxolator use kern_statat_vnhook(), replacing translate_path_major_minor_at().
- Let translate_fd_major_minor() use vp->v_rdev instead of vp->v_un.vu_cdev.
Result:
crw-rw-rw- 1 root root 0, 14 Feb 20 13:54 /dev/ptmx crw--w---- 1 root adm 136, 0 Feb 20 14:03 /dev/pts/0 crw--w---- 1 root adm 136, 1 Feb 20 14:02 /dev/pts/1 crw--w---- 1 ed tty 136, 2 Feb 20 14:03 /dev/pts/2
Before this commit, ptmx also had a major number of 136, because it silently allocated and deallocated a pseudo-terminal. Device nodes that cannot be opened now have proper major/minor-numbers.
Reviewed by: kib, netchild, rdivacky (thanks!)
|
#
184849 |
|
11-Nov-2008 |
ed |
Several cleanups related to pipe(2).
- Use `fildes[2]' instead of `*fildes' to make more clear that pipe(2) fills an array with two descriptors.
- Remove EFAULT from the manual page. Because of the current calling convention, pipe(2) raises a segmentation fault when an invalid address is passed.
- Introduce kern_pipe() to make it easier for binary emulations to implement pipe(2).
- Make Linux binary emulation use kern_pipe(), which means we don't have to recover td_retval after calling the FreeBSD system call.
Approved by: rdivacky Discussed on: arch
|
#
184183 |
|
22-Oct-2008 |
jhb |
Split the copyout of *base at the end of getdirentries() out leaving the rest in kern_getdirentries(). Use kern_getdirentries() to implement freebsd32_getdirentries(). This fixes a bug where calls to getdirentries() in 32-bit binaries would trash the 4 bytes after the 'long base' in userland.
Submitted by: ups MFC after: 1 week
|
#
177997 |
|
08-Apr-2008 |
kib |
Implement the linux syscalls openat, mkdirat, mknodat, fchownat, futimesat, fstatat, unlinkat, renameat, linkat, symlinkat, readlinkat, fchmodat, faccessat.
Submitted by: rdivacky Sponsored by: Google Summer of Code 2007 Tested by: pho
|
#
177786 |
|
31-Mar-2008 |
kib |
Implement the openat(2), faccessat(2), fchmodat(2), fchownat(2), fstatat(2), futimesat(2), linkat(2), mkdirat(2), mkfifoat(2), mknodat(2), readlinkat(2), renameat(2), symlinkat(2) syscalls.
Based on the submission by rdivacky, sponsored by Google Summer of Code 2007 Reviewed by: rwatson, rdivacky Tested by: pho
|
#
176215 |
|
12-Feb-2008 |
ru |
Change readlink(2)'s return type and type of the last argument to match POSIX.
Prodded by: Alexey Lyashkov
|
#
175140 |
|
07-Jan-2008 |
jhb |
Make ftruncate a 'struct file' operation rather than a vnode operation. This makes it possible to support ftruncate() on non-vnode file types in the future. - 'struct fileops' grows a 'fo_truncate' method to handle an ftruncate() on a given file descriptor. - ftruncate() moves to kern/sys_generic.c and now just fetches a file object and invokes fo_truncate(). - The vnode-specific portions of ftruncate() move to vn_truncate() in vfs_vnops.c which implements fo_truncate() for vnode file types. - Non-vnode file types return EINVAL in their fo_truncate() method.
Submitted by: rwatson
|
#
170404 |
|
07-Jun-2007 |
jhb |
- Remove unused variable from create_thread(). - Move kern_thr_*() prototype to <sys/syscallsubr.h> where all the other kern_*() prototypes live.
|
#
165403 |
|
20-Dec-2006 |
jkim |
MFP4: (part of) 110058
copyin()/copyout() for message type is separated from msgsnd()/msgrcv() and it is done from its wrapper functions to support 32-bit emulations. After I implemented this, I have briefly referenced NetBSD and Darwin. NetBSD passes copyin()/copyout() function pointers from wrappers. Darwin passes size of message type as an argument, which is actually similar to my first implementation (P4 109706). We may revisit these implementations later.
|
#
160765 |
|
27-Jul-2006 |
jhb |
Fix a file descriptor race I reintroduced when I split accept1() up into kern_accept() and accept1(). If another thread closed the new file descriptor and the first thread later got an error trying to copyout the socket address, then it would attempt to close the wrong file object. To fix, add a struct file ** argument to kern_accept(). If it is non-NULL, then on success kern_accept() will store a pointer to the new file object there and not release any of the references. It is up to the calling code to drop the references appropriately (including a call to fdclose() in case of error to safely handle the aforementioned race). While I'm at it, go ahead and fix the svr4 streams code to not leak the accept fd if it gets an error trying to copyout the streams structures.
|
#
160249 |
|
10-Jul-2006 |
jhb |
- Split out kern_accept(), kern_getpeername(), and kern_getsockname() for use by ABI emulators. - Alter the interface of kern_recvit() somewhat. Specifically, go ahead and hard code UIO_USERSPACE in the uio as that's what all the callers specify. In place, add a new uioseg to indicate what type of pointer is in mp->msg_name. Previously it was always a userland address, but ABI emulators may pass in kernel-side sockaddrs. Also, remove the namelenp field and instead require the two places that used it to explicitly copy mp->msg_namelen out to userland. - Use the patched kern_recvit() to replace svr4_recvit() and the stock kern_sendit() to replace svr4_sendit(). - Use kern_bind() instead of stackgap use in ti_bind(). - Use kern_getpeername() and kern_getsockname() instead of stackgap in svr4_stream_ti_ioctl(). - Use kern_connect() instead of stackgap in svr4_do_putmsg(). - Use kern_getpeername() and kern_accept() instead of stackgap in svr4_do_getmsg(). - Retire the stackgap from SVR4 compat as it is no longer used.
|
#
160192 |
|
08-Jul-2006 |
jhb |
- Split ioctl() up into ioctl() and kern_ioctl(). The kern_ioctl() assumes that the 'data' pointer is already setup to point to a valid KVM buffer or contains the copied-in data from userland as appropriate (ioctl(2) still does this). kern_ioctl() takes care of looking up a file pointer, implementing FIONCLEX and FIOCLEX, and calling fi_ioctl(). - Use kern_ioctl() to implement xenix_rdchk() instead of using the stackgap and mark xenix_rdchk() MPSAFE.
|
#
160190 |
|
08-Jul-2006 |
jhb |
Add a kern_close() so that the ABIs can close a file descriptor w/o having to populate a close_args struct and change some of the places that do.
|
#
160187 |
|
08-Jul-2006 |
jhb |
Rework kern_semctl a bit to always assume the UIO_SYSSPACE case. This mostly consists of pushing a few copyin's and copyout's up into __semctl() as all the other callers were already doing the UIO_SYSSPACE case. This also changes kern_semctl() to set the return value in a passed in pointer to a register_t rather than td->td_retval[0] directly so that callers can only set td->td_retval[0] if all the various copyout's succeed.
As a result of these changes, kern_semctl() no longer does copyin/copyout (except for GETALL/SETALL) so simplify the locking to acquire the semakptr mutex before the MAC check and hold it all the way until the end of the big switch statement. The GETALL/SETALL cases have to temporarily drop it while they do copyin/malloc and copyout. Also, simplify the SETALL case to remove handling for a non-existent race condition.
|
#
160139 |
|
06-Jul-2006 |
jhb |
Add kern_setgroups() and kern_getgroups() and use them to implement ibcs2_[gs]etgroups() rather than using the stackgap. This also makes ibcs2_[gs]etgroups() MPSAFE. Also, it cleans up one bit of weirdness in the old setgroups() where it allocated an entire credential just so it had a place to copy the group list into. Now setgroups just allocates a NGROUPS_MAX array on the stack that it copies into and then passes to kern_setgroups().
|
#
159991 |
|
27-Jun-2006 |
jhb |
- Add a kern_semctl() helper function for __semctl(). It accepts a pointer to a copied-in copy of the 'union semun' and a uioseg to indicate which memory space the 'buf' pointer of the union points to. This is then used in linux_semctl() and svr4_sys_semctl() to eliminate use of the stackgap. - Mark linux_ipc() and svr4_sys_semsys() MPSAFE.
|
#
159588 |
|
13-Jun-2006 |
jhb |
- Add a kern_kldload() that is most of the previous kldload() and push Giant down in it. - Push Giant down in kern_kldunload() and reorganize it slightly to avoid using gotos. Also, expose this function to the rest of the kernel.
|
#
156114 |
|
28-Feb-2006 |
ps |
Fix 32bit sendfile by implementing kern_sendfile so that it takes the header and trailers as iovec arguments instead of copying them in inside of sendfile.
Reviewed by: jhb MFC after: 3 weeks
|
#
155401 |
|
06-Feb-2006 |
jhb |
Add a kern_eaccess() function and use it to implement xenix_eaccess() rather than kern_access().
Suggested by: rwatson
|
#
151909 |
|
31-Oct-2005 |
ps |
Reformat socket control messages on input/output for 32bit compatibility on 64bit systems.
Submitted by: ps, ups Reviewed by: jhb
|
#
151359 |
|
15-Oct-2005 |
ps |
Implement the 32bit versions of recvmsg, recvfrom, sendmsg
Partially obtained from: jhb
|
#
151357 |
|
15-Oct-2005 |
ps |
Implement 32bit wrappers for clock_gettime, clock_settime, and clock_getres.
|
#
147813 |
|
07-Jul-2005 |
jhb |
- Add two new system calls: preadv() and pwritev() which are like readv() and writev() except that they take an additional offset argument and do not change the current file position. In SAT speak: preadv:readv::pread:read and pwritev:writev::pwrite:write. - Try to reduce code duplication some by merging most of the old kern_foov() and dofilefoo() functions into new dofilefoo() functions that are called by kern_foov() and kern_pfoov(). The non-v functions now all generate a simple uio on the stack from the passed in arguments and then call kern_foov(). For example, read() now just builds a uio and calls kern_readv() and pwrite() just builds a uio and calls kern_pwritev().
PR: kern/80362 Submitted by: Marc Olzheim marcolz at stack dot nl (1) Approved by: re (scottl) MFC after: 1 week
|
#
147302 |
|
11-Jun-2005 |
pjd |
Do not allocate memory based on not-checked argument from userland. It can be used to panic the kernel by giving too big value. Fix it by moving allocation and size verification into kern_getfsstat(). This even simplifies kern_getfsstat() consumers, but destroys symmetry - memory is allocated inside kern_getfsstat(), but has to be freed by the caller.
Found by: FreeBSD Kernel Stress Test Suite: http://www.holm.cc/stress/ Reported by: Peter Holm <peter@holm.cc>
|
#
147178 |
|
09-Jun-2005 |
pjd |
Avoid code duplication in serval places by introducing universal kern_getfsstat() function.
Obtained from: jhb
|
#
146950 |
|
03-Jun-2005 |
ps |
Wrap copyin/copyout for kevent so the 32bit wrapper does not have to malloc nchanges * sizeof(struct kevent) AND/OR nevents * sizeof(struct kevent) on every syscall.
Glanced at by: peter, jmg Obtained from: Yahoo! MFC after: 2 weeks
|
#
144445 |
|
31-Mar-2005 |
jhb |
Implement kern_adjtime(), kern_readv(), kern_sched_rr_get_interval(), kern_settimeofday(), and kern_writev() to allow for further stackgap reduction in the compat ABIs.
|
#
142933 |
|
01-Mar-2005 |
ps |
regen
|
#
141815 |
|
13-Feb-2005 |
sobomax |
Backout previous change (disabling of security checks for signals delivered in emulation layers), since it appears to be too broad.
Requested by: rwatson
|
#
141812 |
|
13-Feb-2005 |
sobomax |
Split out kill(2) syscall service routine into user-level and kernel part, the former is callable from user space and the latter from the kernel one. Make kernel version take additional argument which tells if the respective call should check for additional restrictions for sending signals to suid/sugid applications or not.
Make all emulation layers using non-checked version, since signal numbers in emulation layers can have different meaning that in native mode and such protection can cause misbehaviour.
As a result remove LIBTHR from the signals allowed to be delivered to a suid/sugid application.
Requested (sorta) by: rwatson MFC after: 2 weeks
|
#
141484 |
|
07-Feb-2005 |
jhb |
Implement a kern_pathconf() wrapper for pathconf() which can take the filename from either a user space or a kernel space pointer.
|
#
141471 |
|
07-Feb-2005 |
jhb |
- Tweak kern_msgctl() to return a copy of the requested message queue id structure in the struct pointed to by the 3rd argument for IPC_STAT and get rid of the 4th argument. The old way returned a pointer into the kernel array that the calling function would then access afterwards without holding the appropriate locks and doing non-lock-safe things like copyout() with the data anyways. This change removes that unsafeness and resulting race conditions as well as simplifying the interface. - Implement kern_foo wrappers for stat(), lstat(), fstat(), statfs(), fstatfs(), and fhstatfs(). Use these wrappers to cut out a lot of code duplication for freebsd4 and netbsd compatability system calls. - Add a new lookup function kern_alternate_path() that looks up a filename under an alternate prefix and determines which filename should be used. This is basically a more general version of linux_emul_convpath() that can be shared by all the ABIs thus allowing for further reduction of code duplication.
|
#
141029 |
|
30-Jan-2005 |
sobomax |
Extend kern_sendit() to take another enum uio_seg argument, which specifies where the buffer to send lies and use it to eliminate yet another stackgap in linuxlator.
MFC after: 2 weeks
|
#
140992 |
|
29-Jan-2005 |
sobomax |
o Split out kernel part of execve(2) syscall into two parts: one that copies arguments into the kernel space and one that operates completely in the kernel space;
o use kernel-only version of execve(2) to kill another stackgap in linuxlator/i386.
Obtained from: DragonFlyBSD (partially) MFC after: 2 weeks
|
#
140839 |
|
25-Jan-2005 |
sobomax |
Split out kernel side of msgctl(2) into two parts: the first that pops data from the userland and pushes results back and the second which does actual processing. Use the latter to eliminate stackgap in the linux wrapper of that syscall.
MFC after: 2 weeks
|
#
140836 |
|
25-Jan-2005 |
sobomax |
More kern_{get,set}itiver() where they belong.
Submitted by: dwmalone MFC after: 2 weeks
|
#
140483 |
|
19-Jan-2005 |
ps |
move kern_nanosleep to sys/syscallsubr.h
Requested by: jhb
|
#
139825 |
|
07-Jan-2005 |
imp |
/* -> /*- for license, minor formatting changes
|
#
139739 |
|
05-Jan-2005 |
jhb |
- Move the function prototypes for kern_setrlimit() and kern_wait() to sys/syscallsubr.h where all the other kern_foo() prototypes live. - Resort kern_execve() while I'm there.
|
#
136221 |
|
07-Oct-2004 |
davidxu |
Add an execve command for kse_thr_interrupt to allow libpthread to restore signal mask correctly, this is required by POSIX.
Reviewed by: deischen
|
#
136152 |
|
05-Oct-2004 |
jhb |
Rework how we store process times in the kernel such that we always store the raw values including for child process statistics and only compute the system and user timevals on demand.
- Fix the various kern_wait() syscall wrappers to only pass in a rusage pointer if they are going to use the result. - Add a kern_getrusage() function for the ABI syscalls to use so that they don't have to play stackgap games to call getrusage(). - Fix the svr4_sys_times() syscall to just call calcru() to calculate the times it needs rather than calling getrusage() twice with associated stackgap, etc. - Add a new rusage_ext structure to store raw time stats such as tick counts for user, system, and interrupt time as well as a bintime of the total runtime. A new p_rux field in struct proc replaces the same inline fields from struct proc (i.e. p_[isu]ticks, p_[isu]u, and p_runtime). A new p_crux field in struct proc contains the "raw" child time usage statistics. ruadd() has been changed to handle adding the associated rusage_ext structures as well as the values in rusage. Effectively, the values in rusage_ext replace the ru_utime and ru_stime values in struct rusage. These two fields in struct rusage are no longer used in the kernel. - calcru() has been split into a static worker function calcru1() that calculates appropriate timevals for user and system time as well as updating the rux_[isu]u fields of a passed in rusage_ext structure. calcru() uses a copy of the process' p_rux structure to compute the timevals after updating the runtime appropriately if any of the threads in that process are currently executing. It also now only locks sched_lock internally while doing the rux_runtime fixup. calcru() now only requires the caller to hold the proc lock and calcru1() only requires the proc lock internally. calcru() also no longer allows callers to ask for an interrupt timeval since none of them actually did. - calcru() now correctly handles threads executing on other CPUs. - A new calccru() function computes the child system and user timevals by calling calcru1() on p_crux. Note that this means that any code that wants child times must now call this function rather than reading from p_cru directly. This function also requires the proc lock. - This finishes the locking for rusage and friends so some of the Giant locks in exit1() and kern_wait() are now gone. - The locking in ttyinfo() has been tweaked so that a shared lock of the proctree lock is used to protect the process group rather than the process group lock. By holding this lock until the end of the function we now ensure that the process/thread that we pick to dump info about will no longer vanish while we are trying to output its info to the console.
Submitted by: bde (mostly) MFC after: 1 month
|
#
135764 |
|
24-Sep-2004 |
jhb |
Sort forward declared structures.
|
#
132313 |
|
17-Jul-2004 |
dwmalone |
Add a kern_setsockopt and kern_getsockopt which can read the option values from either user land or from the kernel. Use them for [gs]etsockopt and to clean up some calls to [gs]etsockopt in the Linux emulation code that uses the stackgap.
|
#
122088 |
|
04-Nov-2003 |
fjoe |
Back out the following revisions:
1.36 +73 -60 src/sys/compat/linux/linux_ipc.c 1.83 +102 -48 src/sys/kern/sysv_shm.c 1.8 +4 -0 src/sys/sys/syscallsubr.h
That change was intended to support vmware3, but wantrem parameter is useless because vmware3 uses SYSV shared memory to talk with X server and X server is native application. The patch worked because check for wantrem was not valid (wantrem and SHMSEG_REMOVED was never checked for SHMSEG_ALLOCATED segments).
Add kern.ipc.shm_allow_removed (integer, rw) sysctl (default 0) which when set to 1 allows to return removed segments in shm_find_segment_by_shmid() and shm_find_segment_by_shmidx().
MFC after: 1 week
|
#
114749 |
|
05-May-2003 |
dwmalone |
Split sendit into two parts. The first part, still called sendit, that does the copyin stuff and then calls the second part kern_sendit to do the hard work. Don't bother holding Giant during the copyin phase.
The intent of this is to allow the Linux emulator to impliment send* syscalls without using the stackgap.
|
#
114724 |
|
05-May-2003 |
mbr |
Change the semantics of sysv shm emulation to take a additional argument to the functions shm{at,ctl}1 and shm_find_segment_by_shmid{x}. The BSD semantics didn't allow the usage of shared segment after being marked for removal through IPC_RMID.
The patch involves the following functions: - shmat - shmctl - shm_find_segment_by_shmid - shm_find_segment_by_shmidx - linux_shmat - linux_shmctl
Submitted by: Orlando Bassotto <orlando.bassotto@ieo-research.it> Reviewed by: marcel
|
#
113685 |
|
18-Apr-2003 |
jhb |
Rename do_sigprocmask() to kern_sigprocmask() and make it a global symbol so that it can be used by binary emulators.
|
#
110294 |
|
03-Feb-2003 |
ume |
Break out the bind and connect syscalls to intend to make calling these syscalls internally easy. This is preparation for force coming IPv6 support for Linuxlator.
Submitted by: dwmalone MFC after: 10 days
|
#
105950 |
|
25-Oct-2002 |
peter |
Split 4.x and 5.x signal handling so that we can keep 4.x signal handling clean and functional as 5.x evolves. This allows some of the nasty bandaids in the 5.x codepaths to be unwound.
Encapsulate 4.x signal handling under COMPAT_FREEBSD4 (there is an anti-foot-shooting measure in place, 5.x folks need this for a while) and finish encapsulating the older stuff under COMPAT_43. Since the ancient stuff is required on alpha (longjmp(3) passes a 'struct osigcontext *' to the current sigreturn(2), instead of the 'ucontext_t *' that sigreturn is supposed to take), add a compile time check to prevent foot shooting there too. Add uniform COMPAT_43 stubs for ia64/sparc64/powerpc.
Tested on: i386, alpha, ia64. Compiled on sparc64 (a few days ago). Approved by: re
|
#
102946 |
|
04-Sep-2002 |
iedowse |
Split up ptrace() into a wrapper that does the copying to and from user space and a kern_ptrace() implementation. Use the kern_*() version in the Linux emulation code to remove more stack gap uses.
Approved by: des
|
#
102870 |
|
02-Sep-2002 |
iedowse |
Split up __getcwd so that kernel callers of the internal version can specify whether the buffer is in user or system space.
|
#
102868 |
|
02-Sep-2002 |
iedowse |
Split fcntl() into a wrapper and a kernel-callable kern_fcntl() implementation. The wrapper is responsible for copying additional structure arguments (struct flock) to and from userland.
|
#
102779 |
|
01-Sep-2002 |
iedowse |
Split out a number of mostly VFS and signal related syscalls into a kernel-internal kern_*() version and a wrapper that is called via the syscall vector table. For paths and structure pointers, the internal version either takes a uio_seg parameter or requires the caller to copyin() the data to kernel memory as appropiate. This will permit emulation layers to use these syscalls without having to copy out translated arguments to the stack gap.
Discussed on: -arch Review/suggestions: bde, jhb, peter, marcel
|