#
5b3e5c6c |
|
29-Apr-2024 |
Konstantin Belousov <kib@FreeBSD.org> |
kcmp_pget(): do not accept TIDs Otherwise pget() might still look up and hold the current process. Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 3 days
|
#
1e01650a |
|
29-Apr-2024 |
Konstantin Belousov <kib@FreeBSD.org> |
kcmp_pget(): add an assert that we did not hold the current process Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 3 days
|
#
47ad4f2d |
|
04-Mar-2024 |
Kyle Evans <kevans@FreeBSD.org> |
ktrace: log genio events on failed write Visibility into the contents of the buffer when a write(2) has failed can be immensely useful in debugging IPC issues -- pushing this to discuss the idea, or maybe an alternative where we can set a flag like KTRFAC_ERRIO to enable it. When a genio event is potentially raised after an error, currently we'll just free the uio and return. However, such data can be useful when debugging communication between processes to, e.g., understand what the remote side should have grabbed before closing a pipe. Tap out the entire buffer on failure rather than simply discarding it. Reviewed by: kib, markj Differential Revision: https://reviews.freebsd.org/D43799
|
#
b5d2165b |
|
04-Mar-2024 |
Kyle Evans <kevans@FreeBSD.org> |
kern: poll: tap out the pollfd array on successful return We do this in kern_poll() to include freebsd32 but exclude the linux compat layer. The ABI should be the same, but the POLL constants are probably different or should be assumed so. Reviewed by: bapt, jhb Differential Revision: https://reviews.freebsd.org/D44158
|
#
61cc4830 |
|
18-Jan-2024 |
Alfredo Mazzinghi <am2419@cl.cam.ac.uk> |
Abstract UIO allocation and deallocation. Introduce the allocuio() and freeuio() functions to allocate and deallocate struct uio. This hides the actual allocator interface, so it is easier to modify the sub-allocation layout of struct uio and the corresponding iovec array. Obtained from: CheriBSD Reviewed by: kib, markj MFC after: 2 weeks Sponsored by: CHaOS, EPSRC grant EP/V000292/1 Differential Revision: https://reviews.freebsd.org/D43711
|
#
f28526e9 |
|
19-Jan-2024 |
Konstantin Belousov <kib@FreeBSD.org> |
kcmp(2): implement for generic file types Reviewed by: brooks, markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D43518
|
#
d8decc9a |
|
19-Jan-2024 |
Konstantin Belousov <kib@FreeBSD.org> |
Add kcmp(2) kernel bits This is based purely on reading the Linux kcmp(2) man page. In addition to the Linux set of comparators, I also added KCMP_FILEOBJ to compare the underlying file objects. Tested by: manu Reviewed by: brooks, markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D43518
|
#
29363fb4 |
|
23-Nov-2023 |
Warner Losh <imp@FreeBSD.org> |
sys: Remove ancient SCCS tags. Remove ancient SCCS tags from the tree, automated scripting, with two minor fixups to keep things compiling. All the common forms in the tree were removed with a perl script. Sponsored by: Netflix
|
#
685dc743 |
|
16-Aug-2023 |
Warner Losh <imp@FreeBSD.org> |
sys: Remove $FreeBSD$: one-line .c pattern Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/
|
#
7a2c93b8 |
|
14-Dec-2022 |
Gleb Smirnoff <glebius@FreeBSD.org> |
sockets: provide sousrsend() that does socket specific error handling Sockets have special handling for EPIPE on a write, that was spread out into several places. Treating transient errors is also special - if the protocol is atomic, then we should ignore any changes to uio_resid, a transient error means the write had completely failed (see d2b3a0ed31e). - Provide sousrsend() that expects a valid uio, and leave sosend() for kernel consumers only. Do all special error handling right here. - In dofilewrite() don't do special handling of error for DTYPE_SOCKET. - For send(2), write(2) and aio_write(2) call into sousrsend() and remove error handling for kern_sendit(), soo_write() and soaio_process_job(). PR: 265087 Reported by: rz-rpi03 at h-ka.de Reviewed by: markj Differential revision: https://reviews.freebsd.org/D35863
|
#
c6d31b83 |
|
18-Jul-2022 |
Konstantin Belousov <kib@FreeBSD.org> |
AST: rework Make most AST handlers dynamically registered. This allows subsystem-specific handler source to be located in the subsystem files, instead of making subr_trap.c aware of it. For instance, signal delivery code on return to userspace is now moved to kern_sig.c. It also allows some handlers to be designated as the cleanup (kclear) type, which are called both at AST and on thread/process exit. For instance, ast(), exit1(), and NFS server no longer need to be aware of UFS softdep processing. The dynamic registration also allows third-party modules to register AST handlers if needed. There is one caveat with loadable modules: the code does not make any effort to ensure that the module is not unloaded before all threads have been processed through the AST handlers in it. In fact, this is already the existing behavior for hwpmc.ko and ufs.ko. I do not think it is worth the effort and the runtime overhead to try to fix it. Reviewed by: markj Tested by: emaste (arm64), pho Discussed with: jhb Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D35888
|
#
91e7bdcd |
|
25-Apr-2022 |
Dmitry Chagin <dchagin@FreeBSD.org> |
Add timespecvalid_interval macro and use it. Reviewed by: jhb, imp (early rev) Differential revision: https://reviews.freebsd.org/D34848 MFC after: 2 weeks
|
#
f17ef286 |
|
22-Feb-2022 |
Mateusz Guzik <mjg@FreeBSD.org> |
fd: rename fget*_locked to fget*_noref This gets rid of the error prone naming where fget_unlocked returns with a ref held, while fget_locked requires a lock but provides nothing in terms of making sure the file lives past unlock. No functional changes.
|
#
513c7a6e |
|
10-Feb-2022 |
Mateusz Guzik <mjg@FreeBSD.org> |
fd: make fget_unlocked take a thread argument Just like other fget routines. This enables embedding fd table pointer in struct thread, avoiding taking a trip through proc.
|
#
04c91ac4 |
|
13-Oct-2021 |
Brooks Davis <brooks@FreeBSD.org> |
selsocket: handle sopoll() errors correctly Without this change, unmounting smbfs filesystems with an INVARIANTS kernel would panic after 10e64782ed59727e8c9fe4a5c7e17f497903c8eb. Found by: markj Reviewed by: markj, jhb Obtained from: CheriBSD MFC after: 3 days Sponsored by: DARPA Differential Revision: https://reviews.freebsd.org/D32492
|
#
0dc332bf |
|
05-Aug-2021 |
Ka Ho Ng <khng@FreeBSD.org> |
Add fspacectl(2), vn_deallocate(9) and VOP_DEALLOCATE(9). fspacectl(2) is a system call to provide space management support to userspace applications. VOP_DEALLOCATE(9) is a VOP call to perform the deallocation. vn_deallocate(9) is a public KPI for kmods' use. The purpose of proposing a new system call, a KPI and a VOP call is to allow bhyve or other hypervisor monitors to emulate the behavior of SCSI UNMAP/NVMe DEALLOCATE on a plain file. fspacectl(2) takes cmd and flags parameters that specify the space management operation to be performed. Currently cmd has to be SPACECTL_DEALLOC, and flags has to be 0. fo_fspacectl is added to fileops. VOP_DEALLOCATE(9) is added as a new VOP call. A trivial implementation of VOP_DEALLOCATE(9) is provided. Sponsored by: The FreeBSD Foundation Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D28347
|
#
cae3f9dd |
|
23-Jul-2021 |
Mark Johnston <markj@FreeBSD.org> |
select: Define select_flags[] as const MFC after: 1 week Sponsored by: The FreeBSD Foundation
|
#
e884512a |
|
10-Jun-2021 |
Dmitry Chagin <dchagin@FreeBSD.org> |
Split kern_poll() into two counterparts. The kern_poll_kfds() operates on clear kernel data, kfds points to an array in the kernel, while kern_poll() operates on user supplied pollfd. Move nfds check to kern_poll_maxfds(). No functional changes, it's for future use in the Linux emulation layer. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D30690 MFC after: 2 weeks
|
#
45e1f854 |
|
28-Jan-2021 |
Mateusz Guzik <mjg@FreeBSD.org> |
poll: use fget_unlocked or fget_only_user when feasible This follows select by eliminating the use of filedesc lock. This is a win for single-threaded processes and a mixed bag for others as at small concurrency it is faster to take the lock instead of refing/unrefing each file descriptor. Nonetheless, removal of shared lock usage is a step towards a mtx-protected fd table.
|
#
6affe1b7 |
|
28-Jan-2021 |
Mateusz Guzik <mjg@FreeBSD.org> |
select: employ fget_only_user Since most select users are single-threaded this avoids a lot of work in the common case. For example select of 16 fds (ops/s): before: 2114536 after: 2991010
|
#
7a202823 |
|
23-Dec-2020 |
Konstantin Belousov <kib@FreeBSD.org> |
Expose eventfd in the native API/ABI using a new __specialfd syscall eventfd is a Linux system call that produces special file descriptors for event notification. When porting Linux software, it is currently usually emulated by epoll-shim on top of kqueues. Unfortunately, kqueues are not passable between processes. And, as noted by the author of epoll-shim, even if they were, the library state would also have to be passed somehow. This came up when debugging strange HW video decode failures in Firefox. A native implementation would avoid these problems and help with porting Linux software. Since we now already have an eventfd implementation in the kernel (for the Linuxulator), it's pretty easy to expose it natively, which is what this patch does. Submitted by: greg@unrelenting.technology Reviewed by: markj (previous version) MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D26668
|
#
10e64782 |
|
01-Dec-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
select: make sure there are no wakeup attempts after selfdfree returns Prior to the patch, a returning selfdfree could still be racing against doselwakeup, which set sf_si = NULL and now locks stp to wake up the other thread. A sufficiently unlucky pair can end up going all the way down to freeing select-related structures before the lock/wakeup/unlock finishes. This started manifesting itself as crashes since select data started getting freed in r367714.
|
#
31b2ac4b |
|
15-Nov-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
select: replace reference counting with memory barriers in selfd Refcounting was added to combat a race between selfdfree and doselwakeup, but it adds avoidable overhead. selfdfree detects it can free the object by ->sf_si == NULL, thus we can ensure that the condition only holds after all accesses are completed.
|
#
ea33cca9 |
|
04-Nov-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
poll/select: change selfd_zone into a malloc type On a sample box vmstat -z shows: ITEM SIZE LIMIT USED FREE REQ 64: 64, 0, 1043784, 4367538,3698187229 selfd: 64, 0, 1520, 13726,182729008 But at the same time: vm.uma.selfd.keg.domain.1.pages: 121 vm.uma.selfd.keg.domain.0.pages: 121 Thus 242 pages got pulled even though the malloc zone would likely accommodate the load without using extra memory.
|
#
6fed89b1 |
|
01-Sep-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
kern: clean up empty lines in .c and .h files
|
#
b1607c87 |
|
15-Jul-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
poll: factor fd lookup out of scan and rescan
|
#
d8bc2a17 |
|
15-Jul-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
fd: remove fd_lastfile It keeps getting recalculated way more often than is needed. Provide a routine (fdlastfile) to get it if necessary. Consumers may be better off with a bitmap iterator instead.
|
#
a90fb6cf |
|
15-Apr-2020 |
Hans Petter Selasky <hselasky@FreeBSD.org> |
Cast all ioctl command arguments through uint32_t internally. Hide debug print showing use of sign extended ioctl command argument under INVARIANTS. The print is available to all and can easily fill up the logs. No functional change intended. MFC after: 1 week Sponsored by: Mellanox Technologies
|
#
52604ed7 |
|
03-Feb-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
fd: remove the seq argument from fget_unlocked It is almost always NULL.
|
#
3ff65f71 |
|
30-Jan-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
Remove duplicated empty lines from kern/*.c No functional changes.
|
#
2856d85e |
|
08-Jan-2020 |
Kyle Evans <kevans@FreeBSD.org> |
posix_fallocate: push vnop implementation into the fileop layer This opens the door for other descriptor types to implement posix_fallocate(2) as needed. Reviewed by: kib, bcr (manpages) Differential Revision: https://reviews.freebsd.org/D23042
|
#
dfa8dae4 |
|
05-Oct-2019 |
Mateusz Guzik <mjg@FreeBSD.org> |
devfs: plug redundant bwillwrite avoidance vn_write already checks for vnode type to see if bwillwrite should be called. This effectively reverts r244643. Reviewed by: kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21905
|
#
f1cf2b9d |
|
21-Jul-2019 |
Konstantin Belousov <kib@FreeBSD.org> |
Check and avoid overflow when incrementing fp->f_count in fget_unlocked() and fhold(). On a sufficiently large machine, f_count can be legitimately very large, e.g. malicious code can dup the same fd up to the per-process file descriptor limit, and then fork as much as it can. On some smaller machine, I see kern.maxfilesperproc: 939132 kern.maxprocperuid: 34203 which already overflows u_int. Moreover, the malicious code can create transient references by sending fds over unix sockets. I realized that this check was missing after reading https://secfault-security.com/blog/FreeBSD-SA-1902.fd.html Reviewed by: markj (previous version), mjg Tested by: pho (previous version) Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D20947
|
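The overflow check described in f1cf2b9d can be illustrated outside the kernel. What follows is a minimal userspace sketch, not the FreeBSD code: the helper name `ref_acquire_checked` and the use of C11 atomics are assumptions made for illustration, and the kernel's own refcount primitives differ.

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical helper (not a FreeBSD KPI): take one reference on an
 * unsigned counter, but refuse when the increment would wrap to zero.
 */
static bool
ref_acquire_checked(atomic_uint *count)
{
	unsigned int old = atomic_load(count);

	do {
		if (old == UINT_MAX)
			return (false);	/* would overflow; caller must fail */
		/* On failure the CAS reloads `old`, so the check is redone. */
	} while (!atomic_compare_exchange_weak(count, &old, old + 1));
	return (true);
}
```

A caller would treat a `false` return as a failed lookup rather than proceeding with a counter that has silently wrapped.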
#
78c2a980 |
|
01-Nov-2018 |
Conrad Meyer <cem@FreeBSD.org> |
kern_poll: Restore explanatory comment removed in r177374 The comment isn't stale. The check is bogus in the sense that poll(2) does not require pollfd entries to be unique in fd space, so there is no reason there cannot be more pollfd entries than open or even allowed fds. The check is mostly a seatbelt against accidental misuse or abuse. FD_SETSIZE, while usually unrelated to poll, is used as an arbitrary floor for systems with very low kern.maxfilesperproc. Additionally, document this possible EINVAL condition in the poll.2 manual. No functional change. Reviewed by: markj Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D17671
|
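The point made in 78c2a980, that poll(2) does not require pollfd entries to be unique in fd space, can be seen from userspace. The sketch below is illustrative only: `poll_same_fd` is a hypothetical helper, and it does not exercise the kernel's nfds limit check that the restored comment describes.

```c
#include <poll.h>
#include <unistd.h>

/*
 * Poll the same descriptor through `n` separate pollfd entries and
 * return how many entries poll(2) reports as ready (or -1 on error).
 * `n` is capped at 8 to keep the sketch simple.
 */
static int
poll_same_fd(int fd, int n)
{
	struct pollfd pfds[8];
	int i;

	if (n < 0 || n > 8)
		return (-1);
	for (i = 0; i < n; i++) {
		pfds[i].fd = fd;	/* every entry names the same fd */
		pfds[i].events = POLLIN;
		pfds[i].revents = 0;
	}
	return (poll(pfds, (nfds_t)n, 0));	/* timeout 0: do not block */
}
```

Called on the read end of a pipe(2) that has data pending, each duplicate entry is evaluated on its own, so the array can legitimately be longer than the number of open descriptors.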
#
2384981b |
|
26-Oct-2018 |
Conrad Meyer <cem@FreeBSD.org> |
poll: Unify userspace pollfd pointer name Some of the poll code used 'fds' and some used 'ufds' to refer to the uap->fds userspace pointer that was passed around to subroutines. Some of the poll code used 'fds' to refer to the kernel memory pollfd arrays, which seemed unnecessarily confusing. Unify on 'ufds' to refer to the userspace pollfd array. Additionally, 'bits' is not an accurate description of the kernel pollfd array in kern_poll, so rename that to 'kfds'. Finally, clean up some logic with mallocarray() and nitems(). No functional change. Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D17670
|
#
d6fda03a |
|
21-Sep-2018 |
Mateusz Guzik <mjg@FreeBSD.org> |
select: stop doing zero-sized memsets Approved by: re (kib)
|
#
d5cdcc3a |
|
10-May-2018 |
Andrew Gallatin <gallatin@FreeBSD.org> |
Fix the build after r333457 In r333457, the arguments to kern_pwritev() were accidentally re-ordered as part of ANSIfication, breaking the build.
|
#
cc3c9df8 |
|
10-May-2018 |
Ed Maste <emaste@FreeBSD.org> |
ANSIfy sys_generic.c
|
#
cbd92ce6 |
|
09-May-2018 |
Matt Macy <mmacy@FreeBSD.org> |
Eliminate the overhead of gratuitous repeated reinitialization of cap_rights - Add macros to allow preinitialization of cap_rights_t. - Convert most commonly used code paths to use preinitialized cap_rights_t. A 3.6% speedup in fstat was measured with this change. Reported by: mjg Reviewed by: oshogbo Approved by: sbruno MFC after: 1 month
|
#
6469bdcd |
|
06-Apr-2018 |
Brooks Davis <brooks@FreeBSD.org> |
Move most of the contents of opt_compat.h to opt_global.h. opt_compat.h is mentioned in nearly 180 files. In-progress network driver compatibility improvements may add over 100 more so this is closer to "just about everywhere" than "only some files" per the guidance in sys/conf/options. Keep COMPAT_LINUX32 in opt_compat.h as it is confined to a subset of sys/compat/linux/*.c. A fake _COMPAT_LINUX option ensures opt_compat.h is created on all architectures. Move COMPAT_LINUXKPI to opt_dontuse.h as it is only used to control the set of compiled files. Reviewed by: kib, cem, jhb, jtl Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D14941
|
#
63901c01 |
|
23-Feb-2018 |
Conrad Meyer <cem@FreeBSD.org> |
kern/sys_generic.c: style(9) return(foo) -> return (foo) No functional change. Sponsored by: Dell EMC Isilon
|
#
36bce27b |
|
01-Dec-2017 |
Konstantin Belousov <kib@FreeBSD.org> |
Destroy seltd st_mtx and st_wait in seltdfini(). A correct destruction is important for WITNESS(4) and LOCK_PROFILING(9). Submitted by: Sebastian Huber <sebastian.huber@embedded-brains.de> MFC after: 1 week
|
#
51369649 |
|
20-Nov-2017 |
Pedro F. Giffuni <pfg@FreeBSD.org> |
sys: further adoption of SPDX licensing ID tags. Mainly focus on files that use BSD 3-Clause license. The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known opensource licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, supersede or replace the license texts. Special thanks to Wind River for providing access to "The Duke of Highlander" tool: an older (2014) run over FreeBSD tree was useful as a starting point.
|
#
32a1fb0d |
|
10-Mar-2017 |
Mahdi Mokhtari <mmokhi@FreeBSD.org> |
Fix NULL pointer dereference and panic with shm file pread/pwrite. PR: 217429 Reported by: Tim Newsham <tim.newsham@nccgroup.trust> Reviewed by: kib Approved by: dchagin MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D9844
|
#
b38b22b0 |
|
31-Jan-2017 |
Edward Tomasz Napierala <trasz@FreeBSD.org> |
Add kern_pread() and kern_pwrite(), and use them in compats instead of their sys_*() counterparts. The svr4 is left unchanged. Reviewed by: kib@ MFC after: 2 weeks Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D9379
|
#
69a28758 |
|
15-Sep-2016 |
Ed Maste <emaste@FreeBSD.org> |
Renumber license clauses in sys/kern to avoid skipping #3
|
#
5cba398b |
|
18-Aug-2016 |
George V. Neville-Neil <gnn@FreeBSD.org> |
Remove unused and obsolete openbsd_poll system call. (Phase 1) Reported by: brooks Reviewed by: brooks, jhb Differential Revision: https://reviews.freebsd.org/D7548
|
#
51d1f690 |
|
10-Jul-2016 |
Robert Watson <rwatson@FreeBSD.org> |
Audit file-descriptor arguments to I/O system calls such as read(2), write(2), dup(2), and mmap(2). This auditing is not required by the Common Criteria (and hence was not being performed), but is valuable in both contemporary live analysis and forensic use cases. MFC after: 3 days Sponsored by: DARPA, AFRL
|
#
611fcff9 |
|
01-Apr-2016 |
John Baldwin <jhb@FreeBSD.org> |
Cap IOSIZE_MAX to INT_MAX for 32-bit processes. Previously, freebsd32 binaries could submit read/write requests with lengths greater than INT_MAX that a native kernel would have rejected. Reviewed by: kib Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D5788
|
#
0acf5d0b |
|
25-Feb-2016 |
Mark Johnston <markj@FreeBSD.org> |
Improve error handling for posix_fallocate(2) and posix_fadvise(2). - Set td_errno so that ktrace and dtrace can obtain the syscall error number in the usual way. - Pass negative error numbers directly to the syscall layer, as they're not intended to be returned to userland. Reviewed by: kib Sponsored by: EMC / Isilon Storage Division Differential Revision: https://reviews.freebsd.org/D5425
|
#
fcb5b3a4 |
|
09-Jul-2015 |
Konstantin Belousov <kib@FreeBSD.org> |
Cover a race between doselwakeup() and selfdfree(). If doselwakeup() loop finds the selfd entry and clears its sf_si pointer, which is handled by selfdfree() in parallel, NULL sf_si makes selfdfree() free the memory. The result is a race and accesses to freed memory. Refcount the selfd ownership. One reference is for the sf_link linkage, which is unconditionally dereferenced by selfdfree(). Another reference is for sf_threads, both selfdfree() and doselwakeup() race to deref it, the winner unlinks and then frees the selfd entry. Reported by: Larry Rosenman <ler@lerctr.org> Tested by: Larry Rosenman <ler@lerctr.org>, pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks
|
#
0538aafc |
|
18-Apr-2015 |
Konstantin Belousov <kib@FreeBSD.org> |
The lseek(2), mmap(2), truncate(2), ftruncate(2), pread(2), and pwrite(2) syscalls are wrapped to provide compatibility with pre-7.x kernels which required padding before the off_t parameter. The fcntl(2) contains compatibility code to handle kernels before the struct flock was changed during the 8.x CURRENT development. The shims were reasonable to allow easier revert to the older kernel at that time. Now, two or three major releases later, shims do not serve any purpose. Such old kernels cannot handle current libc, so revert the compatibility code. Make padded syscalls support conditional under the COMPAT6 config option. For COMPAT32, the syscalls were under COMPAT6 already. Remove WITHOUT_SYSCALL_COMPAT build option, whose only purpose was to (partially) disable the removed shims. Reviewed by: jhb, imp (previous versions) Discussed with: peter Sponsored by: The FreeBSD Foundation MFC after: 1 week
|
#
b7a39e9e |
|
17-Feb-2015 |
Mateusz Guzik <mjg@FreeBSD.org> |
filedesc: simplify fget_unlocked & friends Introduce fget_fcntl which performs appropriate checks when needed. This removes a branch from fget_unlocked. Introduce fget_mmap dealing with cap_rights_to_vmprot conversion. This removes a branch from _fget. Modify fget_unlocked to pass sequence counter to interested callers so that they can perform their own checks and make sure the result was obtained from stable & current state. Reviewed by: silence on -hackers
|
#
50ae6690 |
|
28-Nov-2014 |
Hans Petter Selasky <hselasky@FreeBSD.org> |
Style changes: - Move two IOCTL related defines to the top of the C-file - Add more comments describing the recently added IOCTL small size and small align macros
|
#
186d9c34 |
|
12-Nov-2014 |
Dmitry Chagin <dchagin@FreeBSD.org> |
Add the ppoll() system call. Export kern_poll() needed by an upcoming Linuxulator change. Differential Revision: https://reviews.freebsd.org/D1133 Reviewed by: kib, wblock MFC after: 1 month
|
#
0ecd606b |
|
04-Nov-2014 |
Hans Petter Selasky <hselasky@FreeBSD.org> |
Simplify logic a bit. Ensure data buffer is properly aligned, especially for platforms where unaligned access is not allowed. Make it possible to override the small buffer size. A simple continuous read string test using libusb showed a reduction in CPU usage from roughly 10% to less than 1% using a dual-core GHz CPU, when the malloc() operation was skipped for small buffers. MFC after: 2 weeks
|
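The small-buffer path from 5cbf44bf and 0ecd606b (an aligned on-stack buffer for small ioctl requests, with malloc() only for larger ones) can be sketched in portable C. Everything named here is hypothetical: `SMALL_SIZE`, `SMALL_ALIGN`, `run_with_buffer` and `sum_bytes` are illustration-only, and the threshold and alignment values are assumptions rather than the kernel's settings.

```c
#include <stdalign.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define SMALL_SIZE	128	/* hypothetical threshold for the stack path */
#define SMALL_ALIGN	8	/* hypothetical alignment of the stack buffer */

/* Callback standing in for a device-specific ioctl handler. */
typedef int (*handler_t)(void *data, size_t size);

/* Example handler: sum the bytes of the request. */
static int
sum_bytes(void *data, size_t size)
{
	const unsigned char *p = data;
	int sum = 0;

	for (size_t i = 0; i < size; i++)
		sum += p[i];
	return (sum);
}

/*
 * Run `handler` on a private copy of `arg`.  Small requests use an
 * aligned on-stack buffer; larger ones fall back to malloc().
 * `*used_heap` reports which path was taken.
 */
static int
run_with_buffer(const void *arg, size_t size, handler_t handler,
    bool *used_heap)
{
	alignas(SMALL_ALIGN) unsigned char smalldata[SMALL_SIZE];
	void *data;
	int error;

	*used_heap = size > SMALL_SIZE;
	data = *used_heap ? malloc(size) : (void *)smalldata;
	if (data == NULL)
		return (-1);
	memcpy(data, arg, size);
	error = handler(data, size);
	if (*used_heap)
		free(data);
	return (error);
}
```

The explicit alignment matters because the handler may cast the buffer to a structure type, which is unsafe on platforms that do not allow unaligned access.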
#
5cbf44bf |
|
03-Nov-2014 |
Mateusz Guzik <mjg@FreeBSD.org> |
Provide an on-stack temporary buffer for small ioctl requests.
|
#
ffc5ce7b |
|
23-Oct-2014 |
Mateusz Guzik <mjg@FreeBSD.org> |
In selfdfree re-evaluate sf_si after taking the lock. Otherwise we can race with doselwakeup. This is a fixup to r273549 Reviewed by: jhb Reported by: everyone and their dog
|
#
73f2e5f7 |
|
23-Oct-2014 |
Mateusz Guzik <mjg@FreeBSD.org> |
Avoid taking the lock in selfdfree when not needed.
|
#
adf87ab0 |
|
21-Jun-2014 |
Mateusz Guzik <mjg@FreeBSD.org> |
fd: replace fd_nfiles with fd_lastfile where appropriate fd_lastfile is guaranteed to be the biggest open fd, so when the intent is to iterate over active fds or lookup one, there is no point in looking beyond that limit. A few places are left unpatched for now. MFC after: 1 week
|
#
4bc38a5a |
|
12-Apr-2014 |
Davide Italiano <davide@FreeBSD.org> |
Hide internal details of sbintime_t implementation wrapping INT64_MAX into SBT_MAX, to make it more robust in case the internal type representation changes in the future. All the consumers were migrated to SBT_MAX and every new consumer (if any) should from now on use this interface. Requested by: bapt, jmg, Ryan Lortie (implicitly) Reviewed by: mav, bde
|
#
4a144410 |
|
16-Mar-2014 |
Robert Watson <rwatson@FreeBSD.org> |
Update kernel inclusions of capability.h to use capsicum.h instead; some further refinement is required as some device drivers intended to be portable over FreeBSD versions rely on __FreeBSD_version to decide whether to include capability.h. MFC after: 3 weeks
|
#
ed5848c8 |
|
15-Nov-2013 |
Pawel Jakub Dawidek <pjd@FreeBSD.org> |
Replace CAP_POLL_EVENT and CAP_POST_EVENT capability rights (which I had a very hard time to fully understand) with much more intuitive rights: CAP_EVENT - when set on descriptor, the descriptor can be monitored with syscalls like select(2), poll(2), kevent(2). CAP_KQUEUE_EVENT - When set on a kqueue descriptor, the kevent(2) syscall can be called on this kqueue with the eventlist argument set to non-NULL value; in other words the given kqueue descriptor can be used to monitor other descriptors. CAP_KQUEUE_CHANGE - When set on a kqueue descriptor, the kevent(2) syscall can be called on this kqueue with the changelist argument set to non-NULL value; in other words it allows modifying events monitored with the given kqueue descriptor. Add alias CAP_KQUEUE, which allows for both CAP_KQUEUE_EVENT and CAP_KQUEUE_CHANGE. Add backward compatibility define CAP_POLL_EVENT which is equal to CAP_EVENT. Sponsored by: The FreeBSD Foundation MFC after: 3 days
|
#
cd4dd444 |
|
15-Oct-2013 |
Konstantin Belousov <kib@FreeBSD.org> |
By default, allow up to SSIZE_MAX i/o for non-devfs files. Sponsored by: The FreeBSD Foundation Reminded by: Dmitry Sivachenko <trtrmitya@gmail.com> MFC after: 1 month X-MFC-note: stable/10 only
|
#
bf3e483b |
|
15-Oct-2013 |
Konstantin Belousov <kib@FreeBSD.org> |
Similar to debug.iosize_max_clamp sysctl, introduce devfs_iosize_max_clamp sysctl, which allows/disables SSIZE_MAX-sized i/o requests on the devfs files. Sponsored by: The FreeBSD Foundation Reminded by: Dmitry Sivachenko <trtrmitya@gmail.com> MFC after: 1 week
|
#
8e076eff |
|
04-Sep-2013 |
Sean Bruno <sbruno@FreeBSD.org> |
Restore builds on architectures that don't support CAPABILITIES (mips).
|
#
7008be5b |
|
04-Sep-2013 |
Pawel Jakub Dawidek <pjd@FreeBSD.org> |
Change the cap_rights_t type from uint64_t to a structure that we can extend in the future in a backward compatible (API and ABI) way.

The cap_rights_t represents capability rights. We used to use one bit to represent one right, but we are running out of spare bits. Currently the new structure provides place for 114 rights (so 50 more than the previous cap_rights_t), but it is possible to grow the structure to hold at least 285 rights, although we can make it even larger if 285 rights won't be enough.

The structure definition looks like this:

	struct cap_rights {
		uint64_t cr_rights[CAP_RIGHTS_VERSION + 2];
	};

The initial CAP_RIGHTS_VERSION is 0. The top two bits in the first element of the cr_rights[] array contain total number of elements in the array - 2. This means if those two bits are equal to 0, we have 2 array elements. The top two bits in all remaining array elements should be 0. The next five bits in all array elements contain array index. Only one bit is used and bit position in this five-bits range defines array index. This means there can be at most five array elements in the future.

To define new right the CAPRIGHT() macro must be used. The macro takes two arguments - an array index and a bit to set, e.g.

	#define CAP_PDKILL	CAPRIGHT(1, 0x0000000000000800ULL)

We still support aliases that combine few rights, but the rights have to belong to the same array element, e.g.:

	#define CAP_LOOKUP	CAPRIGHT(0, 0x0000000000000400ULL)
	#define CAP_FCHMOD	CAPRIGHT(0, 0x0000000000002000ULL)
	#define CAP_FCHMODAT	(CAP_FCHMOD | CAP_LOOKUP)

There is new API to manage the new cap_rights_t structure:

	cap_rights_t *cap_rights_init(cap_rights_t *rights, ...);
	void cap_rights_set(cap_rights_t *rights, ...);
	void cap_rights_clear(cap_rights_t *rights, ...);
	bool cap_rights_is_set(const cap_rights_t *rights, ...);
	bool cap_rights_is_valid(const cap_rights_t *rights);
	void cap_rights_merge(cap_rights_t *dst, const cap_rights_t *src);
	void cap_rights_remove(cap_rights_t *dst, const cap_rights_t *src);
	bool cap_rights_contains(const cap_rights_t *big, const cap_rights_t *little);

Capability rights to the cap_rights_init(), cap_rights_set(), cap_rights_clear() and cap_rights_is_set() functions are provided by separating them with commas, e.g.:

	cap_rights_t rights;
	cap_rights_init(&rights, CAP_READ, CAP_WRITE, CAP_FSTAT);

There is no need to terminate the list of rights, as those functions are actually macros that take care of the termination, e.g.:

	#define cap_rights_set(rights, ...) \
		__cap_rights_set((rights), __VA_ARGS__, 0ULL)
	void __cap_rights_set(cap_rights_t *rights, ...);

Thanks to using one bit as an array index we can assert in those functions that there are no two rights belonging to different array elements provided together. For example this is illegal and will be detected, because CAP_LOOKUP belongs to element 0 and CAP_PDKILL to element 1:

	cap_rights_init(&rights, CAP_LOOKUP | CAP_PDKILL);

Providing several rights that belong to the same array element this way is correct, but is not advised. It should only be used for alias definitions.

This commit also breaks compatibility with some existing Capsicum system calls, but I see no other way to do that. This should be fine as Capsicum is still experimental and this change is not going to 9.x.

Sponsored by: The FreeBSD Foundation
|
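The index encoding described in 7008be5b can be sketched as follows. The three right values are taken from the commit message; the rest is an assumption made for illustration: that the five-bit index field occupies bits 57 to 61 with index 0 at bit 57, and the helper name `sketch_right_index`, which is hypothetical and not part of the kernel API.

```c
#include <stdint.h>

/* Assumed layout: bits 63-62 = version, bits 61-57 = one-hot array index. */
#define CAPRIGHT(idx, bit)	((1ULL << (57 + (idx))) | (bit))

/* Values as given in the commit message. */
#define CAP_LOOKUP	CAPRIGHT(0, 0x0000000000000400ULL)
#define CAP_FCHMOD	CAPRIGHT(0, 0x0000000000002000ULL)
#define CAP_PDKILL	CAPRIGHT(1, 0x0000000000000800ULL)

#define IDX_MASK	(0x1fULL << 57)

/*
 * Hypothetical helper: return the array index encoded in `right`, or
 * -1 unless exactly one index bit is set.  OR-ing rights from
 * different array elements sets two index bits and is rejected.
 */
static int
sketch_right_index(uint64_t right)
{
	uint64_t idxbits = (right & IDX_MASK) >> 57;
	int i;

	if (idxbits == 0 || (idxbits & (idxbits - 1)) != 0)
		return (-1);
	for (i = 0; (idxbits >> i) != 1; i++)
		;
	return (i);
}
```

Under these assumptions, `CAP_FCHMOD | CAP_LOOKUP` still decodes to index 0, while `CAP_LOOKUP | CAP_PDKILL` is detected as invalid, which is the property the commit message relies on for its assertion.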
#
76665df9 |
|
28-Jun-2013 |
Peter Wemm <peter@FreeBSD.org> |
Help out gcc. clang understands. sys_generic.c:1510: warning: 'precision' may be used uninitialized *** [sys_generic.o] Error code 1
|
#
237abf0c |
|
28-Jun-2013 |
Davide Italiano <davide@FreeBSD.org> |
- Trim an unused and bogus Makefile for mount_smbfs. - Reconnect with some minor modifications, in particular now selsocket() internals are adapted to use sbintime units after recent'ish calloutng switch.
|
#
21a37a71 |
|
09-Mar-2013 |
Alexander Motin <mav@FreeBSD.org> |
Rework overflow checks of r247898 so that an overly "intelligent" compiler does not optimize them out. Submitted by: bde
|
#
980c545d |
|
06-Mar-2013 |
Alexander Motin <mav@FreeBSD.org> |
Fix time math overflows and improve zero intervals handling in poll(), select(), nanosleep() and kevent() functions after calloutng changes. Reported by: bde
|
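The kind of time math overflow addressed in 980c545d can be illustrated with a saturating conversion from a timeval to a signed 64-bit fixed-point value. This is a sketch under stated assumptions, not the committed code: the 32.32 fixed-point layout, the `SK_SBT_MAX` constant and the function `sk_timeval_to_sbt` are all hypothetical names chosen for the example.

```c
#include <stdint.h>
#include <sys/time.h>

#define SK_SBT_MAX	INT64_MAX	/* saturation value for the sketch */

/*
 * Convert a timeout to 32.32 fixed point (seconds in the upper half,
 * binary fraction in the lower half), saturating instead of
 * overflowing.  Returns -1 for an invalid timeval.
 */
static int64_t
sk_timeval_to_sbt(const struct timeval *tv)
{
	int64_t frac;

	if (tv->tv_sec < 0 || tv->tv_usec < 0 || tv->tv_usec >= 1000000)
		return (-1);
	if (tv->tv_sec > INT32_MAX)
		return (SK_SBT_MAX);	/* seconds do not fit in 31 bits */
	frac = ((int64_t)tv->tv_usec << 32) / 1000000;
	return (((int64_t)tv->tv_sec << 32) | frac);
}
```

Checking the range before shifting keeps the test in terms the compiler cannot discard, since no signed overflow has happened yet when the comparison runs.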
#
cf5e4fe6 |
|
04-Mar-2013 |
Davide Italiano <davide@FreeBSD.org> |
MFcalloutng: Fix kern_select() and sys_poll() so that they can handle sub-tick precision for timeouts (in the same fashion it was done for nanosleep() in r247797). Sponsored by: Google Summer of Code 2012, iXsystems inc. Tested by: flo, marius, ian, markj, Fabian Keil
|
#
2609222a |
|
01-Mar-2013 |
Pawel Jakub Dawidek <pjd@FreeBSD.org> |
Merge Capsicum overhaul: - Capability is no longer separate descriptor type. Now every descriptor has set of its own capability rights. - The cap_new(2) system call is left, but it is no longer documented and should not be used in new code. - The new syscall cap_rights_limit(2) should be used instead of cap_new(2), which limits capability rights of the given descriptor without creating a new one. - The cap_getrights(2) syscall is renamed to cap_rights_get(2). - If CAP_IOCTL capability right is present we can further reduce allowed ioctls list with the new cap_ioctls_limit(2) syscall. List of allowed ioctls can be retrived with cap_ioctls_get(2) syscall. - If CAP_FCNTL capability right is present we can further reduce fcntls that can be used with the new cap_fcntls_limit(2) syscall and retrive them with cap_fcntls_get(2). - To support ioctl and fcntl white-listing the filedesc structure was heavly modified. - The audit subsystem, kdump and procstat tools were updated to recognize new syscalls. - Capability rights were revised and eventhough I tried hard to provide backward API and ABI compatibility there are some incompatible changes that are described in detail below: CAP_CREATE old behaviour: - Allow for openat(2)+O_CREAT. - Allow for linkat(2). - Allow for symlinkat(2). CAP_CREATE new behaviour: - Allow for openat(2)+O_CREAT. Added CAP_LINKAT: - Allow for linkat(2). ABI: Reuses CAP_RMDIR bit. - Allow to be target for renameat(2). Added CAP_SYMLINKAT: - Allow for symlinkat(2). Removed CAP_DELETE. Old behaviour: - Allow for unlinkat(2) when removing non-directory object. - Allow to be source for renameat(2). Removed CAP_RMDIR. Old behaviour: - Allow for unlinkat(2) when removing directory. Added CAP_RENAMEAT: - Required for source directory for the renameat(2) syscall. Added CAP_UNLINKAT (effectively it replaces CAP_DELETE and CAP_RMDIR): - Allow for unlinkat(2) on any object. - Required if target of renameat(2) exists and will be removed by this call. 
Removed CAP_MAPEXEC. CAP_MMAP old behaviour: - Allow for mmap(2) with any combination of PROT_NONE, PROT_READ and PROT_WRITE. CAP_MMAP new behaviour: - Allow for mmap(2)+PROT_NONE. Added CAP_MMAP_R: - Allow for mmap(PROT_READ). Added CAP_MMAP_W: - Allow for mmap(PROT_WRITE). Added CAP_MMAP_X: - Allow for mmap(PROT_EXEC). Added CAP_MMAP_RW: - Allow for mmap(PROT_READ | PROT_WRITE). Added CAP_MMAP_RX: - Allow for mmap(PROT_READ | PROT_EXEC). Added CAP_MMAP_WX: - Allow for mmap(PROT_WRITE | PROT_EXEC). Added CAP_MMAP_RWX: - Allow for mmap(PROT_READ | PROT_WRITE | PROT_EXEC). Renamed CAP_MKDIR to CAP_MKDIRAT. Renamed CAP_MKFIFO to CAP_MKFIFOAT. Renamed CAP_MKNODE to CAP_MKNODEAT. CAP_READ old behaviour: - Allow pread(2). - Disallow read(2), readv(2) (if there is no CAP_SEEK). CAP_READ new behaviour: - Allow read(2), readv(2). - Disallow pread(2) (CAP_SEEK was also required). CAP_WRITE old behaviour: - Allow pwrite(2). - Disallow write(2), writev(2) (if there is no CAP_SEEK). CAP_WRITE new behaviour: - Allow write(2), writev(2). - Disallow pwrite(2) (CAP_SEEK was also required). 
Added convenient defines: #define CAP_PREAD (CAP_SEEK | CAP_READ) #define CAP_PWRITE (CAP_SEEK | CAP_WRITE) #define CAP_MMAP_R (CAP_MMAP | CAP_SEEK | CAP_READ) #define CAP_MMAP_W (CAP_MMAP | CAP_SEEK | CAP_WRITE) #define CAP_MMAP_X (CAP_MMAP | CAP_SEEK | 0x0000000000000008ULL) #define CAP_MMAP_RW (CAP_MMAP_R | CAP_MMAP_W) #define CAP_MMAP_RX (CAP_MMAP_R | CAP_MMAP_X) #define CAP_MMAP_WX (CAP_MMAP_W | CAP_MMAP_X) #define CAP_MMAP_RWX (CAP_MMAP_R | CAP_MMAP_W | CAP_MMAP_X) #define CAP_RECV CAP_READ #define CAP_SEND CAP_WRITE #define CAP_SOCK_CLIENT \ (CAP_CONNECT | CAP_GETPEERNAME | CAP_GETSOCKNAME | CAP_GETSOCKOPT | \ CAP_PEELOFF | CAP_RECV | CAP_SEND | CAP_SETSOCKOPT | CAP_SHUTDOWN) #define CAP_SOCK_SERVER \ (CAP_ACCEPT | CAP_BIND | CAP_GETPEERNAME | CAP_GETSOCKNAME | \ CAP_GETSOCKOPT | CAP_LISTEN | CAP_PEELOFF | CAP_RECV | CAP_SEND | \ CAP_SETSOCKOPT | CAP_SHUTDOWN) Added defines for backward API compatibility: #define CAP_MAPEXEC CAP_MMAP_X #define CAP_DELETE CAP_UNLINKAT #define CAP_MKDIR CAP_MKDIRAT #define CAP_RMDIR CAP_UNLINKAT #define CAP_MKFIFO CAP_MKFIFOAT #define CAP_MKNOD CAP_MKNODAT #define CAP_SOCK_ALL (CAP_SOCK_CLIENT | CAP_SOCK_SERVER) Sponsored by: The FreeBSD Foundation Reviewed by: Christoph Mallon <christoph.mallon@gmx.de> Many aspects discussed with: rwatson, benl, jonathan ABI compatibility discussed with: kib
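The CAP_READ versus CAP_PREAD distinction described in this entry comes down to a subset test on a rights mask. A minimal sketch in C, assuming made-up bit values: the `DEMO_` constants and `demo_has_rights()` are hypothetical names for illustration, not the kernel's definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit assignments, for illustration only. */
#define DEMO_CAP_READ   UINT64_C(0x1)
#define DEMO_CAP_WRITE  UINT64_C(0x2)
#define DEMO_CAP_SEEK   UINT64_C(0x4)

/* Composite rights with the same shape as CAP_PREAD and CAP_PWRITE. */
#define DEMO_CAP_PREAD  (DEMO_CAP_SEEK | DEMO_CAP_READ)
#define DEMO_CAP_PWRITE (DEMO_CAP_SEEK | DEMO_CAP_WRITE)

static_assert(DEMO_CAP_PREAD == UINT64_C(0x5), "PREAD is SEEK plus READ");

/* Return 1 if every bit in 'need' is also present in 'have'. */
static int
demo_has_rights(uint64_t have, uint64_t need)
{
	return ((have & need) == need);
}
```

Under this model a descriptor limited to `DEMO_CAP_READ` passes the check for read(2)-style access but fails the check for `DEMO_CAP_PREAD`, which matches the "CAP_SEEK was also required" note in the message.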
|
#
ad9789f6 |
|
23-Dec-2012 |
Konstantin Belousov <kib@FreeBSD.org> |
Do not force a writer to the devfs file to drain the buffer writes. Requested and tested by: Ian Lepore <freebsd@damnhippie.dyndns.org> MFC after: 2 weeks
|
#
2e564269 |
|
17-Oct-2012 |
Attilio Rao <attilio@FreeBSD.org> |
Disconnect non-MPSAFE SMBFS from the build in preparation for dropping GIANT from VFS. In addition, also disconnect netsmb, which is a base requirement for SMBFS. In the meantime, regular SMBFS users can use the FUSE interface and the smbnetfs port to work with their SMBFS partitions. Also, there are ongoing efforts by the vendor to support in-kernel smbfs, so there is a good chance that it will get relinked once properly locked. This is not targeted for MFC.
|
#
68cefd64 |
|
17-Jun-2012 |
Davide Italiano <davide@FreeBSD.org> |
The variable 'error' in sys_poll() is initialized in declaration to value zero but in any case is overwritten by successive copyin(), making the previous initialization useless. Remove this. As an added bonus this fixes a style(9) bug. Discussed with: kib Approved by: gnn (mentor) MFC after: 3 days
|
#
8bb9a904 |
|
04-Mar-2012 |
Konstantin Belousov <kib@FreeBSD.org> |
Instead of incomplete handling of read(2)/write(2) return values that do not fit into registers, declare that we do not support this case using CTASSERT(), and remove endianness-unsafe code to split return value into td_retval. While there, change the style of the sysctl debug.iosize_max_clamp definition. Requested by: bde MFC after: 3 weeks
|
#
526d0bd5 |
|
20-Feb-2012 |
Konstantin Belousov <kib@FreeBSD.org> |
Fix found places where uio_resid is truncated to int. Add the sysctl debug.iosize_max_clamp, enabled by default. Setting the sysctl to zero allows usermode to perform SSIZE_MAX-sized i/o requests. Discussed with: bde, das (previous versions) MFC after: 1 month
|
#
56be1b9a |
|
13-Nov-2011 |
Konstantin Belousov <kib@FreeBSD.org> |
To limit amount of the kernel memory allocated, and to optimize the iteration over the fdsets, kern_select() limits the length of the fdsets copied in by the last valid file descriptor index. If any bit is set in a mask above the limit, current implementation ignores the filedescriptor, instead of returning EBADF. Fix the issue by scanning the tails of fdset before entering the select loop and returning EBADF if any bit above last valid filedescriptor index is set. The performance impact of the additional check is only imposed on the (somewhat) buggy applications that pass bad file descriptors to select(2) or pselect(2). PR: kern/155606, kern/162379 Discussed with: cognet, glebius Tested by: andreast (powerpc, all 64/32bit ABI combinations, big-endian), marius (sparc64, big-endian) MFC after: 2 weeks
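The tail scan described in this entry can be pictured as a check for any set bit at or beyond the last valid descriptor index. A minimal sketch in C, not the kernel's code: `demo_mask_tail_has_bits()` and `DEMO_BITS_PER_WORD` are hypothetical names, and the mask is modelled as a plain array of `unsigned long`.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define DEMO_BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/*
 * Return 1 if any bit with index >= nvalid is set in a mask holding nbits
 * bits. A caller modelling select(2) would turn a 1 into EBADF.
 */
static int
demo_mask_tail_has_bits(const unsigned long *mask, size_t nbits, size_t nvalid)
{
	size_t i;

	assert(mask != NULL && nvalid <= nbits);
	for (i = nvalid; i < nbits; i++) {
		if (mask[i / DEMO_BITS_PER_WORD] &
		    (1UL << (i % DEMO_BITS_PER_WORD)))
			return (1);
	}
	return (0);
}
```

The bit-by-bit loop is chosen for clarity; a real implementation would more likely compare whole words and mask only the partial first word.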
|
#
8451d0dd |
|
16-Sep-2011 |
Kip Macy <kmacy@FreeBSD.org> |
In order to maximize the re-usability of kernel code in user space this patch modifies makesyscalls.sh to prefix all of the non-compatibility calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel entry points and all places in the code that use them. It also fixes an additional name space collision between the kernel function psignal and the libc function of the same name by renaming the kernel psignal kern_psignal(). By introducing this change now we will ease future MFCs that change syscalls. Reviewed by: rwatson Approved by: re (bz)
|
#
6aba400a |
|
25-Aug-2011 |
Attilio Rao <attilio@FreeBSD.org> |
Fix a deficiency in the selinfo interface: If a selinfo object is recorded (via selrecord()) and then it is quickly destroyed, with the waiters missing the opportunity to awake, at the next iteration they will find the selinfo object destroyed, causing a PF#. That happens because the selinfo interface has no way to drain the waiters before destroying the registered selinfo object. This race is quite rare in practice, because it requires a selrecord(), a poll request by another thread and a quick destruction of the selrecord()'ed selinfo object. Fix this by adding the seldrain() routine, which should be called before destroying selinfo objects (in order to avoid such a case), and fix the present cases where it should already have been called. Sometimes, the context is safe enough to prevent this type of race, as happens in device drivers which install selinfo objects on poll callbacks. There, the destruction of the selinfo object happens at driver detach time, when all the filedescriptors should be already closed, thus there cannot be a race. For this case, the mfi(4) device driver can be taken as an example, as it implements fully correct logic for preventing this from happening. Sponsored by: Sandvine Incorporated Reported by: rstone Tested by: pluknet Reviewed by: jhb, kib Approved by: re (bz) MFC after: 3 weeks
|
#
d6f72489 |
|
16-Aug-2011 |
Jonathan Anderson <jonathan@FreeBSD.org> |
poll(2) implementation for capabilities. When calling poll(2) on a capability, unwrap first and then poll the underlying object. Approved by: re (kib), mentor (rwatson) Sponsored by: Google Inc
|
#
d1b6899e |
|
12-Aug-2011 |
Jonathan Anderson <jonathan@FreeBSD.org> |
Rename CAP_*_KEVENT to CAP_*_EVENT. Change the names of a couple of capability rights to be less FreeBSD-specific. Approved by: re (kib), mentor (rwatson) Sponsored by: Google Inc
|
#
a9d2f8d8 |
|
10-Aug-2011 |
Robert Watson <rwatson@FreeBSD.org> |
Second-to-last commit implementing Capsicum capabilities in the FreeBSD kernel for FreeBSD 9.0: Add a new capability mask argument to fget(9) and friends, allowing system call code to declare what capabilities are required when an integer file descriptor is converted into an in-kernel struct file *. With options CAPABILITIES compiled into the kernel, this enforces capability protection; without, this change is effectively a no-op. Some cases require special handling, such as mmap(2), which must preserve information about the maximum rights at the time of mapping in the memory map so that they can later be enforced in mprotect(2) -- this is done by narrowing the rights in the existing max_protection field used for similar purposes with file permissions. In namei(9), we assert that the code is not reached from within capability mode, as we're not yet ready to enforce namespace capabilities there. This will follow in a later commit. Update two capability names: CAP_EVENT and CAP_KEVENT become CAP_POST_KEVENT and CAP_POLL_KEVENT to more accurately indicate what they represent. Approved by: re (bz) Submitted by: jonathan Sponsored by: Google Inc
|
#
a7d5f7eb |
|
19-Oct-2010 |
Jamie Gritton <jamie@FreeBSD.org> |
A new jail(8) with a configuration file, to replace the work currently done by /etc/rc.d/jail.
|
#
6d8fedda |
|
28-Aug-2010 |
Konstantin Belousov <kib@FreeBSD.org> |
For some file types, select code registers two selfd structures. E.g., for socket, when specified POLLIN|POLLOUT in events, you would have one selfd registered for receiving socket buffer, and one for sending. Now, if both events are not ready to fire at the time of the initial scan, but are simultaneously ready after the sleep, pollrescan() would iterate over the pollfd struct twice. Since both times revents is not zero, returned value would be off by one. Fix this by recalculating the return value in pollout(). PR: kern/143029 MFC after: 2 weeks
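The fix described in this entry amounts to deriving the return value from the final state of the array instead of counting wakeups. A minimal sketch in C of that recount, assuming only the standard `struct pollfd`; `demo_count_ready()` is a hypothetical name, not the kernel's pollout().

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>

/* Count each entry with a non-zero revents exactly once. */
static int
demo_count_ready(const struct pollfd *fds, size_t nfds)
{
	size_t i;
	int n = 0;

	assert(fds != NULL || nfds == 0);
	for (i = 0; i < nfds; i++) {
		if (fds[i].revents != 0)
			n++;
	}
	return (n);
}
```

Because the count is taken per array entry, a socket that fired both its receive and send sides still contributes one to the result.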
|
#
7a6f3d78 |
|
29-Jun-2010 |
John Baldwin <jhb@FreeBSD.org> |
Send SIGPIPE to the thread that issued the offending system call rather than to the entire process. Reported by: Anit Chakraborty Reviewed by: kib, deischen (concept) MFC after: 1 week
|
#
f0f2f0a9 |
|
04-Jun-2010 |
Konstantin Belousov <kib@FreeBSD.org> |
MFC r208374: Remove POLLHUP from the flags tested when setting exceptfds fd_set bits in select(2). The historical behaviour appears to be not to report an exception on EOF, and several applications are broken. Approved by: re (kensmith)
|
#
61e53a38 |
|
21-May-2010 |
Konstantin Belousov <kib@FreeBSD.org> |
Remove POLLHUP from the flags tested when setting exceptfds fd_set bits in select(2). The historical behaviour appears to be not to report an exception on EOF, and several applications are broken. Reported by: Yoshihiko Sarumaru <ysarumaru gmail com> Discussed with: bde PR: ports/140934 MFC after: 2 weeks
|
#
4ccf64eb |
|
06-Apr-2010 |
Nathan Whitehorn <nwhitehorn@FreeBSD.org> |
MFC r205014,205015: Provide groundwork for 32-bit binary compatibility on non-x86 platforms, for upcoming 64-bit PowerPC and MIPS support. This renames the COMPAT_IA32 option to COMPAT_FREEBSD32, removes some IA32-specific code from MI parts of the kernel and enhances the freebsd32 compatibility code to support big-endian platforms. This MFC is required for MFCs of later changes to the freebsd32 compatibility from HEAD. Requested by: kib
|
#
841c0c7e |
|
11-Mar-2010 |
Nathan Whitehorn <nwhitehorn@FreeBSD.org> |
Provide groundwork for 32-bit binary compatibility on non-x86 platforms, for upcoming 64-bit PowerPC and MIPS support. This renames the COMPAT_IA32 option to COMPAT_FREEBSD32, removes some IA32-specific code from MI parts of the kernel and enhances the freebsd32 compatibility code to support big-endian platforms. Reviewed by: kib, jhb
|
#
7e767511 |
|
19-Dec-2009 |
Konstantin Belousov <kib@FreeBSD.org> |
MFC r198508, r198509: Reimplement pselect() in kernel, making change of sigmask and sleep atomic. MFC r198538: Move pselect(3) man page to section 2.
|
#
066d836b |
|
27-Oct-2009 |
Konstantin Belousov <kib@FreeBSD.org> |
Current pselect(3) is implemented in usermode and is thus vulnerable to a well-known race condition, whose elimination was the reason the function was introduced in the first place. If the sigmask supplied as an argument to pselect() enables a signal, the signal might be delivered before the thread calls select(2), causing a lost wakeup. Reimplement pselect() in kernel, making change of sigmask and sleep atomic. Since the signal shall be delivered to usermode, but the sigmask restored, set TDP_OLDMASK and save the old mask in td_oldsigmask. The TDP_OLDMASK should be cleared by ast() in case the signal was not delivered during syscall execution. Reviewed by: davidxu Tested by: pho MFC after: 1 month
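From the caller's side, the atomic mask change is used by keeping the signal blocked everywhere except inside the pselect() sleep. A minimal usage sketch in C, assuming a single-threaded program and standard POSIX calls only; `demo_wait_readable()` and `demo_pipe_ready()` are hypothetical helper names.

```c
#define _DEFAULT_SOURCE 1 /* assumption: exposes pselect() on glibc; ignored elsewhere */
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/select.h>
#include <time.h>
#include <unistd.h>

/*
 * Wait until fd is readable, signo arrives, or the timeout expires.
 * signo stays blocked except during the pselect() sleep itself, so a
 * signal cannot slip in between a flag check and the sleep.
 */
static int
demo_wait_readable(int fd, int signo, const struct timespec *timeout)
{
	sigset_t block, saved, during;
	fd_set rfds;
	int ready;

	assert(fd >= 0 && fd < FD_SETSIZE);

	sigemptyset(&block);
	sigaddset(&block, signo);
	if (sigprocmask(SIG_BLOCK, &block, &saved) == -1)
		return (-1);

	/* A real program would test its "signal seen" flag here. */

	during = saved;
	sigdelset(&during, signo);

	FD_ZERO(&rfds);
	FD_SET(fd, &rfds);
	ready = pselect(fd + 1, &rfds, NULL, NULL, timeout, &during);

	(void)sigprocmask(SIG_SETMASK, &saved, NULL);
	return (ready);
}

/* Usage sketch: a pipe with one byte queued is reported as readable. */
static int
demo_pipe_ready(void)
{
	struct timespec zero = { 0, 0 };
	int p[2], ready;

	if (pipe(p) == -1)
		return (-1);
	if (write(p[1], "x", 1) != 1)
		ready = -1;
	else
		ready = demo_wait_readable(p[0], SIGUSR1, &zero);
	(void)close(p[0]);
	(void)close(p[1]);
	return (ready);
}
```

A threaded program would use pthread_sigmask() in place of sigprocmask(); the pattern is otherwise the same.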
|
#
9f1fab50 |
|
16-Sep-2009 |
Konstantin Belousov <kib@FreeBSD.org> |
MFC r197049: Calculate the amount of bytes to copy for select filedescriptor masks taking into account size of fd_set for the current process ABI. Approved by: re (kensmith)
|
#
b55ef216 |
|
09-Sep-2009 |
Konstantin Belousov <kib@FreeBSD.org> |
kern_select(9) copies fd_set in and out of userspace in quantities of longs. Since longs are 4 bytes in 32bit processes, a 64bit kernel may copy in or out 4 bytes more than the process expected. Calculate the amount of bytes to copy taking into account size of fd_set for the current process ABI. Diagnosed and tested by: Peter Jeremy <peterjeremy acm org> Reviewed by: jhb MFC after: 1 week
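The size mismatch in this entry can be reproduced with a little arithmetic: the number of bytes covering `nd` descriptor bits depends on the word size of the process ABI. A minimal sketch in C; `demo_fdset_bytes()` is a hypothetical helper, not the kernel's calculation.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Bytes needed to hold nd descriptor bits when the ABI's mask word is
 * word_bytes wide (4 for a 32-bit process, 8 for a 64-bit one).
 */
static size_t
demo_fdset_bytes(size_t nd, size_t word_bytes)
{
	size_t word_bits = word_bytes * 8;

	assert(word_bytes == 4 || word_bytes == 8);
	/* Round up to a whole number of words. */
	return ((nd + word_bits - 1) / word_bits * word_bytes);
}
```

For `nd = 1` the 32-bit layout needs 4 bytes while the 64-bit layout needs 8, which is the 4-byte overrun the message describes.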
|
#
33656b91 |
|
01-Sep-2009 |
Jilles Tjoelker <jilles@FreeBSD.org> |
MFC r196460 Fix the conformance of poll(2) for sockets after r195423 by returning POLLHUP instead of POLLIN for several cases. Now, the tools/regression/poll results for FreeBSD are closer to that of the Solaris and Linux. Also, improve the POSIX conformance by explicitly clearing POLLOUT when POLLHUP is reported in pollscan(), making the fix global. Submitted by: bde Reviewed by: rwatson MFC r196556 Fix poll() on half-closed sockets, while retaining POLLHUP for fifos. This reverts part of r196460, so that sockets only return POLLHUP if both directions are closed/error. Fifos get POLLHUP by closing the unused direction immediately after creating the sockets. The tools/regression/poll/*poll.c tests now pass except for two other things: - if POLLHUP is returned, POLLIN is always returned as well instead of only when there is data left in the buffer to be read - fifo old/new reader distinction does not work the way POSIX specs it Reviewed by: kib, bde MFC r196554 Add some tests for poll(2)/shutdown(2) interaction. Approved by: re (kensmith)
|
#
f2159cc7 |
|
22-Aug-2009 |
Konstantin Belousov <kib@FreeBSD.org> |
Fix the conformance of poll(2) for sockets after r195423 by returning POLLHUP instead of POLLIN for several cases. Now, the tools/regression/poll results for FreeBSD are closer to that of the Solaris and Linux. Also, improve the POSIX conformance by explicitly clearing POLLOUT when POLLHUP is reported in pollscan(), making the fix global. Submitted by: bde Reviewed by: rwatson MFC after: 1 week
|
#
64c9a4d9 |
|
02-Jul-2009 |
Robert Watson <rwatson@FreeBSD.org> |
Audit file descriptor and command arguments to ioctl(2). Approved by: re (audit argument blanket) MFC after: 1 week
|
#
2141453e |
|
01-Jul-2009 |
Jeff Roberson <jeff@FreeBSD.org> |
- Use fd_lastfile + 1 as the upper bound on nd. This is more correct than using the size of the descriptor array. - A lock is not needed to fetch fd_lastfile. The results are stale the instant it is dropped. - Use a private mutex pool for select since the pool mutex is not used as a leaf. - Fetch the si_mtx pointer first before resorting to hashing to compute the mutex address. Reviewed by: McKusick Approved by: re (kib)
|
#
14961ba7 |
|
27-Jun-2009 |
Robert Watson <rwatson@FreeBSD.org> |
Replace AUDIT_ARG() with variable argument macros with a set of more specific macros for each audit argument type. This makes it easier to follow call-graphs, especially for automated analysis tools (such as fxr). In MFC, we should leave the existing AUDIT_ARG() macros as they may be used by third-party kernel modules. Suggested by: brooks Approved by: re (kib) Obtained from: TrustedBSD Project MFC after: 1 week
|
#
bf422e5f |
|
13-May-2009 |
Jeff Roberson <jeff@FreeBSD.org> |
- Implement a lockless file descriptor lookup algorithm in fget_unlocked(). - Save old file descriptor tables created on expansion until the entire descriptor table is freed so that pointers may be followed without regard for expanders. - Mark the file zone as NOFREE so we may attempt to reference potentially freed files. - Convert several fget_locked() users to fget_unlocked(). This requires us to manage reference counts explicitly but reduces locking overhead in the common case.
|
#
ae81968f |
|
11-Mar-2009 |
Robert Watson <rwatson@FreeBSD.org> |
When writing out updated pollfd records when returning from poll(), only copy out the revents field, not the whole pollfd structure. Otherwise, if the events field is updated concurrently by another thread, that update may be lost. This issue apparently causes problems for the JDK on FreeBSD, which expects the Linux behavior of not updating all fields (somewhat oddly, Solaris does not implement the required behavior, but presumably our adaptation of the JDK is based on the Linux port?). MFC after: 2 weeks PR: kern/130924 Submitted by: Kurt Miller <kurt @ intricatesoftware.com> Discussed with: kib
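The difference between copying the whole structure and copying only `revents` can be shown with two arrays standing in for the kernel's snapshot and the caller's memory. A minimal sketch in C under that assumption; `demo_copyout_revents()` is a hypothetical stand-in, not the kernel's copy-out path.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>

/*
 * Update only the revents field of each caller-visible entry, leaving fd
 * and events as the caller last wrote them.
 */
static void
demo_copyout_revents(struct pollfd *user, const struct pollfd *kern,
    size_t nfds)
{
	size_t i;

	assert(user != NULL && kern != NULL);
	for (i = 0; i < nfds; i++)
		user[i].revents = kern[i].revents;
}
```

If another thread changes `events` in the caller's array after the snapshot was taken, that change survives, whereas a whole-structure copy would overwrite it with the stale value.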
|
#
125dcf8c |
|
06-Mar-2009 |
Konstantin Belousov <kib@FreeBSD.org> |
Extract the no_poll() and vop_nopoll() code into the common routine poll_no_poll(). Return a poll_no_poll() result from devfs_poll_f() when filedescriptor does not reference the live cdev, instead of ENXIO. Noted and tested by: hps MFC after: 1 week
|
#
60b7f468 |
|
01-Feb-2009 |
Stephane E. Potvin <sepotvin@FreeBSD.org> |
Fix select on platforms where sizeof(long) != sizeof(int). This used to work by accident before the cleanup done in revision 187693. Approved by: kan (mentor)
|
#
9cdacff1 |
|
25-Jan-2009 |
Jeff Roberson <jeff@FreeBSD.org> |
- bit has to be fd_mask to work properly on 64bit platforms. Constants must also be cast even though the result ultimately is promoted to 64bit. - Correct a loop index upper bound in selscan().
|
#
748b9df6 |
|
25-Jan-2009 |
Jeff Roberson <jeff@FreeBSD.org> |
- Correct a typo in a comment. Noticed by: danger
|
#
11b763df |
|
25-Jan-2009 |
Jeff Roberson <jeff@FreeBSD.org> |
Fix errors introduced when I rewrote select. - Restructure selscan() and selrescan() to avoid producing extra selfds when we have a fd in multiple sets. As described below multiple selfds may still exist for other reasons. - Make selrescan() tolerate multiple selfds for a given descriptor set since sockets use two selinfos per fd. If an event on each selinfo fires selrescan() will see the descriptor twice. This could result in select() returning 2x the number of fds actually existing in fd sets. Reported by: mgleason@ncftp.com
|
#
d7f03759 |
|
19-Oct-2008 |
Ulf Lilleengen <lulf@FreeBSD.org> |
- Import the HEAD csup code which is the basis for the cvsmode work.
|
#
715457f6 |
|
23-Sep-2008 |
David E. O'Brien <obrien@FreeBSD.org> |
Reverse if() logic to improve readability. Reviewed by: ru
|
#
bd4e1535 |
|
19-Mar-2008 |
Jeff Roberson <jeff@FreeBSD.org> |
- Remove stale comment. - In the last revision the code was changed to use maxfilesperproc rather than the per-process file limit to restrict the size of the poll array. This eliminates a significant source of process lock contention in multithreaded programs and is cheaper. This had been committed with the wrong batch of changes.
|
#
374ae2a3 |
|
19-Mar-2008 |
Jeff Roberson <jeff@FreeBSD.org> |
- Relax requirements for p_numthreads, p_threads, p_swtick, and p_nice from requiring the per-process spinlock to only requiring the process lock. - Reflect these changes in the proc.h documentation and consumers throughout the kernel. This is a substantial reduction in locking cost for these fields and was made possible by recent changes to threading support.
|
#
e4650294 |
|
07-Jan-2008 |
John Baldwin <jhb@FreeBSD.org> |
Make ftruncate a 'struct file' operation rather than a vnode operation. This makes it possible to support ftruncate() on non-vnode file types in the future. - 'struct fileops' grows a 'fo_truncate' method to handle an ftruncate() on a given file descriptor. - ftruncate() moves to kern/sys_generic.c and now just fetches a file object and invokes fo_truncate(). - The vnode-specific portions of ftruncate() move to vn_truncate() in vfs_vnops.c which implements fo_truncate() for vnode file types. - Non-vnode file types return EINVAL in their fo_truncate() method. Submitted by: rwatson
|
#
397c19d1 |
|
29-Dec-2007 |
Jeff Roberson <jeff@FreeBSD.org> |
Remove explicit locking of struct file. - Introduce a finit() which is used to initialize the fields of struct file in such a way that the ops vector is only valid after the data, type, and flags are valid. - Protect f_flag and f_count with atomic operations. - Remove the global list of all files and associated accounting. - Rewrite the unp garbage collection such that it no longer requires the global list of all files and instead uses a list of all unp sockets. - Mark sockets in the accept queue so we don't incorrectly gc them. Tested by: kris, pho
|
#
ace8398d |
|
15-Dec-2007 |
Jeff Roberson <jeff@FreeBSD.org> |
Refactor select to reduce contention and hide internal implementation details from consumers. - Track individual selecters on a per-descriptor basis such that there are no longer collisions and after sleeping for events only those descriptors which triggered events must be rescanned. - Protect the selinfo (per descriptor) structure with a mtx pool mutex. mtx pool mutexes were chosen to preserve api compatibility with existing code which does nothing but bzero() to setup selinfo structures. - Use a per-thread wait channel rather than a global wait channel. - Hide select implementation details in a seltd structure which is opaque to the rest of the kernel. - Provide a 'selsocket' interface for those kernel consumers who wish to select on a socket when they have no fd so they no longer have to be aware of select implementation details. Tested by: kris Reviewed on: arch
|
#
431f8906 |
|
13-Nov-2007 |
Julian Elischer <julian@FreeBSD.org> |
generally we are interested in what thread did something as opposed to what process. Since threads by default have the name of the process unless over-written with more useful information, just print the thread name instead.
|
#
c2815ad5 |
|
04-Jul-2007 |
Peter Wemm <peter@FreeBSD.org> |
Add freebsd6_ wrappers for mmap/lseek/pread/pwrite/truncate/ftruncate Approved by: re (kensmith)
|
#
982d11f8 |
|
04-Jun-2007 |
Jeff Roberson <jeff@FreeBSD.org> |
Commit 14/14 of sched_lock decomposition. - Use thread_lock() rather than sched_lock for per-thread scheduling synchronization. - Use the per-process spinlock rather than the sched_lock for per-process scheduling synchronization. Tested by: kris, current@ Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc. Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
|
#
fa75abb0 |
|
01-May-2007 |
Alan Cox <alc@FreeBSD.org> |
Remove unneeded include files.
|
#
5e3f7694 |
|
04-Apr-2007 |
Robert Watson <rwatson@FreeBSD.org> |
Replace custom file descriptor array sleep lock constructed using a mutex and flags with an sxlock. This leads to a significant and measurable performance improvement as a result of access to shared locking for frequent lookup operations, reduced general overhead, and reduced overhead in the event of contention. All of these are important for threaded applications where simultaneous access to a shared file descriptor array occurs frequently. Kris has reported 2x-4x transaction rate improvements on 8-core MySQL benchmarks; smaller improvements can be expected for many workloads as a result of reduced overhead. - Generally eliminate the distinction between "fast" and regular acquisition of the filedesc lock; the plan is that they will now all be fast. Change all locking instances to either shared or exclusive locks. - Correct a bug (pointed out by kib) in fdfree() where previously msleep() was called without the mutex held; sx_sleep() is now always called with the sxlock held exclusively. - Universally hold the struct file lock over changes to struct file, rather than the filedesc lock or no lock. Always update the f_ops field last. A further memory barrier is required here in the future (discussed with jhb). - Improve locking and reference management in linux_at(), which fails to properly acquire vnode references before using vnode pointers. Annotate improper use of vn_fullpath(), which will be replaced at a future date. In fcntl(), we conservatively acquire an exclusive lock, even though in some cases a shared lock may be sufficient, which should be revisited. The dropping of the filedesc lock in fdgrowtable() is no longer required as the sxlock can be held over the sleep operation; we should consider removing that (pointed out by attilio). Tested by: kris Discussed with: jhb, kris, attilio, jeff
|
#
873fbcd7 |
|
05-Mar-2007 |
Robert Watson <rwatson@FreeBSD.org> |
Further system call comment cleanup: - Remove also "MP SAFE" after prior "MPSAFE" pass. (suggested by bde) - Remove extra blank lines in some cases. - Add extra blank lines in some cases. - Remove no-op comments consisting solely of the function name, the word "syscall", or the system call name. - Add punctuation. - Re-wrap some comments.
|
#
0c14ff0e |
|
04-Mar-2007 |
Robert Watson <rwatson@FreeBSD.org> |
Remove 'MPSAFE' annotations from the comments above most system calls: all system calls now enter without Giant held, and then in some cases, acquire Giant explicitly. Remove a number of other MPSAFE annotations in the credential code and tweak one or two other adjacent comments.
|
#
6f7ca813 |
|
01-Mar-2007 |
Bruce M Simpson <bms@FreeBSD.org> |
Do not dispatch SIGPIPE from the generic write path for a socket; with this patch the code behaves according to the comment on the line above. Without this patch, a socket could cause SIGPIPE to be delivered to its process, once with SO_NOSIGPIPE set, and twice without. With this patch, the kernel now passes the sigpipe regression test. Tested by: Anton Yuzhaninov MFC after: 1 week
|
#
a1b0a180 |
|
14-Oct-2006 |
Ruslan Ermilov <ru@FreeBSD.org> |
Prevent IOC_IN with zero size argument (this is only supported if backward compatibility options are present) from attempting to free memory that wasn't allocated. This is an old bug, and previously it would attempt to free a null pointer. I noticed this bug when working on the previous revision, but forgot to fix it. Security: local DoS Reported by: Peter Holm MFC after: 3 days
|
#
9fddcc66 |
|
27-Sep-2006 |
Ruslan Ermilov <ru@FreeBSD.org> |
Fix our ioctl(2) implementation when the argument is "int". New ioctls passing integer arguments should use the _IOWINT() macro. This fixes a lot of ioctl's not working on sparc64, most notable being keyboard/syscons ioctls. Full ABI compatibility is provided, with the bonus of fixing the handling of old ioctls on sparc64. Reviewed by: bde (with contributions) Tested by: emax, marius MFC after: 1 week
|
#
d9f46233 |
|
08-Jul-2006 |
John Baldwin <jhb@FreeBSD.org> |
- Split ioctl() up into ioctl() and kern_ioctl(). The kern_ioctl() assumes that the 'data' pointer is already setup to point to a valid KVM buffer or contains the copied-in data from userland as appropriate (ioctl(2) still does this). kern_ioctl() takes care of looking up a file pointer, implementing FIONCLEX and FIOCLEX, and calling fo_ioctl(). - Use kern_ioctl() to implement xenix_rdchk() instead of using the stackgap and mark xenix_rdchk() MPSAFE.
|
#
af56abaa |
|
06-Jan-2006 |
John Baldwin <jhb@FreeBSD.org> |
Return error from fget_write() rather than hardcoding EBADF now that fget_write() DTRT. Requested by: bde
|
#
e730167f |
|
05-Jan-2006 |
John Baldwin <jhb@FreeBSD.org> |
Remove XXX comments complaining that write(2) on a read-only descriptor returns EBADF. That errno is correct and is mandated by POSIX. It also goes back to revision 1.1 of our CVS history (i.e. 4.4BSD). The _fget() function should probably also be updated as it currently returns EINVAL in that case rather than EBADF. (It does return EBADF for reads on a write-only descriptor without any XXX comments oddly enough.) Discussed with: scottl, grog, mjacob, bde
|
#
bcd9e0dd |
|
07-Jul-2005 |
John Baldwin <jhb@FreeBSD.org> |
- Add two new system calls: preadv() and pwritev() which are like readv() and writev() except that they take an additional offset argument and do not change the current file position. In SAT speak: preadv:readv::pread:read and pwritev:writev::pwrite:write. - Try to reduce code duplication some by merging most of the old kern_foov() and dofilefoo() functions into new dofilefoo() functions that are called by kern_foov() and kern_pfoov(). The non-v functions now all generate a simple uio on the stack from the passed in arguments and then call kern_foov(). For example, read() now just builds a uio and calls kern_readv() and pwrite() just builds a uio and calls kern_pwritev(). PR: kern/80362 Submitted by: Marc Olzheim marcolz at stack dot nl (1) Approved by: re (scottl) MFC after: 1 week
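The relationship "preadv is to readv as pread is to read" shows up in use: the call scatters into several buffers from a given offset and leaves the file position alone. A minimal usage sketch in C, assuming a writable `/tmp`; `demo_read_at()` and `demo_preadv()` are hypothetical helper names.

```c
#define _DEFAULT_SOURCE 1 /* assumption: exposes preadv()/mkstemp() on glibc */
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Scatter-read hlen + blen bytes at 'off' without moving the file offset. */
static ssize_t
demo_read_at(int fd, off_t off, void *hdr, size_t hlen, void *body,
    size_t blen)
{
	struct iovec iov[2];

	assert(hdr != NULL && body != NULL);
	iov[0].iov_base = hdr;
	iov[0].iov_len = hlen;
	iov[1].iov_base = body;
	iov[1].iov_len = blen;
	return (preadv(fd, iov, 2, off));
}

/* Usage sketch: returns 0 if the data and the untouched offset match. */
static int
demo_preadv(void)
{
	char path[] = "/tmp/preadv_demo.XXXXXX"; /* assumes /tmp is writable */
	char hdr[2], body[3];
	int fd, ok;

	fd = mkstemp(path);
	if (fd == -1)
		return (-1);
	(void)unlink(path);
	ok = write(fd, "0123456789", 10) == 10 &&
	    demo_read_at(fd, 2, hdr, sizeof(hdr), body, sizeof(body)) == 5 &&
	    memcmp(hdr, "23", 2) == 0 && memcmp(body, "456", 3) == 0 &&
	    lseek(fd, 0, SEEK_CUR) == 10; /* still where write() left it */
	(void)close(fd);
	return (ok ? 0 : -1);
}
```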
|
#
2de92a38 |
|
29-Jun-2005 |
Peter Wemm <peter@FreeBSD.org> |
Conditionally weaken sys_generic.c rev 1.136 to allow certain dubious ioctl numbers in backwards compatibility mode. eg: an IOC_IN ioctl with a size of zero. Traditionally this was what you did before IOC_VOID existed, and we had some established users of this in the tree, namely procfs. Certain 3rd party drivers with binary userland components also have this too. This is necessary to have 4.x and 5.x binaries use these ioctl's. We found this at work when trying to run 4.x binaries. Approved by: re
|
#
b88ec951 |
|
31-Mar-2005 |
John Baldwin <jhb@FreeBSD.org> |
Implement kern_adjtime(), kern_readv(), kern_sched_rr_get_interval(), kern_settimeofday(), and kern_writev() to allow for further stackgap reduction in the compat ABIs.
|
#
e5e6a464 |
|
10-Feb-2005 |
Colin Percival <cperciva@FreeBSD.org> |
Declare "cnt" (a number of bytes to read or write) as an "ssize_t", not as a "long" in dofileread() and dofilewrite(). Discussed with: jhb
|
#
4f8d23d6 |
|
25-Jan-2005 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Previously a read of zero bytes got handled in devfs:vop_read() but I missed that when the vnode bypass was introduced. Deal with zero length transfers before we even get to fo_ops->fo_read(). Found by: Slawa Olhovchenkov <slwzxy.spb.ru@zxy.spb.ru> PR: 75758
|
#
9fc6aa06 |
|
18-Jan-2005 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Detect sign-extension bugs in the ioctl(2) command argument: Truncate to 32 bits and print warning.
|
#
9454b2d8 |
|
06-Jan-2005 |
Warner Losh <imp@FreeBSD.org> |
/* -> /*- for copyright notices, minor format tweaks as necessary
|
#
a0fbccc9 |
|
17-Nov-2004 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Push Giant down through ioctl. Don't grab Giant in the upper syscall/wrapper code NET_LOCK_GIANT in the socket code (sockets/fifos). mtx_lock(&Giant) in the vnode code. mtx_lock(&Giant) in the opencrypto code. (This may actually not be needed, but better safe than sorry). Devfs grabs Giant if the driver is marked as needing Giant.
|
#
db446e30 |
|
17-Nov-2004 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Push Giant down through select and poll. Don't grab Giant in the upper syscall/wrapper code NET_LOCK_GIANT in the socket code (sockets/fifos). mtx_lock(&Giant) in the vnode code. Devfs grabs Giant if the driver is marked as needing Giant.
|
#
8ccf264f |
|
16-Nov-2004 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Polish code to correctly reflect structure.
|
#
ca51b19b |
|
14-Nov-2004 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Rearrange memory management for ioctl arguments to use stronger checks for illegal values and don't store them on the stack any more.
|
#
3e15c66f |
|
13-Nov-2004 |
Poul-Henning Kamp <phk@FreeBSD.org> |
style polish.
|
#
124e4c3b |
|
13-Nov-2004 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Introduce an alias for FILEDESC_{UN}LOCK() with the suffix _FAST. Use this in all the places where sleeping with the lock held is not an issue. The distinction will become significant once we finalize the exact lock-type to use for this kind of case.
|
#
2580f4e5 |
|
27-Aug-2004 |
Andre Oppermann <andre@FreeBSD.org> |
Poll() uses the array smallbits that is big enough to hold 32 struct pollfd's to avoid calling malloc() on small numbers of fd's. Because smallbits's members have type char, its address might be misaligned for a struct pollfd. Change the array of char to an array of struct pollfd. PR: kern/58214 Submitted by: Stefan Farfeleder <stefan@fafoe.narf.at> Reviewed by: bde (a long time ago) MFC after: 3 days
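A minimal sketch of why the typed array helps, assuming a C11 compiler and a POSIX `<poll.h>`. The names `SMALL_NFDS`, `pollfd_aligned` and `small_buffer_is_aligned` are made up for the example; only the array name `smallbits` and the count of 32 come from the commit message.

```c
#include <poll.h>
#include <stdalign.h>
#include <stdint.h>

#define SMALL_NFDS 32	/* "big enough to hold 32 struct pollfd's" */

/* Nonzero if p satisfies the alignment requirement of struct pollfd. */
static int
pollfd_aligned(const void *p)
{
	return (((uintptr_t)p % alignof(struct pollfd)) == 0);
}

/*
 * Declaring the on-stack buffer as struct pollfd[] (rather than char[])
 * makes the compiler align it for struct pollfd.
 */
static int
small_buffer_is_aligned(void)
{
	struct pollfd smallbits[SMALL_NFDS];

	return (pollfd_aligned(smallbits));
}
```

A `char` array of the same size carries only byte alignment, so casting its address to `struct pollfd *` is not guaranteed to be safe.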
|
#
552afd9c |
|
10-Jul-2004 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Clean up and wash struct iovec and struct uio handling. Add copyiniov() which copies a struct iovec array in from userland into a malloc'ed struct iovec. Caller frees. Change uiofromiov() to malloc the uio (caller frees) and name it copyinuio() which is more appropriate. Add cloneuio() which returns a malloc'ed copy. Caller frees. Use them throughout.
|
#
7f8a436f |
|
05-Apr-2004 |
Warner Losh <imp@FreeBSD.org> |
Remove advertising clause from University of California Regent's license, per letter dated July 22, 1999. Approved by: core
|
#
5d8dd01d |
|
12-Mar-2004 |
Robert Watson <rwatson@FreeBSD.org> |
Add annotations to mtx_lock(&Giant) in kern_select() and poll() that we always grab Giant, even if we're actually only polling objects that don't require giant. Once socket locking is merged, there will be strong motivation to fix this.
|
#
44f3b092 |
|
27-Feb-2004 |
John Baldwin <jhb@FreeBSD.org> |
Switch the sleep/wakeup and condition variable implementations to use the sleep queue interface: - Sleep queues attempt to merge some of the benefits of both sleep queues and condition variables. Having sleep queues in a hash table avoids having to allocate a queue head for each wait channel. Thus, struct cv has shrunk down to just a single char * pointer now. However, the hash table does not hold threads directly, but queue heads. This means that once you have located a queue in the hash bucket, you no longer have to walk the rest of the hash chain looking for threads. Instead, you have a list of all the threads sleeping on that wait channel. - Outside of the sleepq code and the sleep/cv code the kernel no longer differentiates between cv's and sleep/wakeup. For example, calls to abortsleep() and cv_abort() are replaced with a call to sleepq_abort(). Thus, the TDF_CVWAITQ flag is removed. Also, calls to unsleep() and cv_waitq_remove() have been replaced with calls to sleepq_remove(). - The sched_sleep() function no longer accepts a priority argument as sleeps no longer inherently bump the priority. Instead, this is solely a property of msleep() which explicitly calls sched_prio() before blocking. - The TDF_ONSLEEPQ flag has been dropped as it was never used. The associated TDF_SET_ONSLEEPQ and TDF_CLR_ON_SLEEPQ macros have also been dropped and replaced with a single explicit clearing of td_wchan. TD_SET_ONSLEEPQ() would really have only made sense if it had taken the wait channel and message as arguments anyway. Now that that only happens in one place, a macro would be overkill.
|
#
91d5354a |
|
04-Feb-2004 |
John Baldwin <jhb@FreeBSD.org> |
Locking for the per-process resource limits structure. - struct plimit includes a mutex to protect a reference count. The plimit structure is treated similarly to struct ucred in that it is always copy on write, so having a reference to a structure is sufficient to read from it without needing a further lock. - The proc lock protects the p_limit pointer and must be held while reading limits from a process to keep the limit structure from changing out from under you while reading from it. - Various global limits that are ints are not protected by a lock since int writes are atomic on all the archs we support and thus a lock wouldn't buy us anything. - All accesses to individual resource limits from a process are abstracted behind a simple lim_rlimit(), lim_max(), and lim_cur() API that return either an rlimit, or the current or max individual limit of the specified resource from a process. - dosetrlimit() was renamed to kern_setrlimit() to match existing style of other similar syscall helper functions. - The alpha OSF/1 compat layer no longer calls getrlimit() and setrlimit() (it didn't use the stackgap when it should have) but uses lim_rlimit() and kern_setrlimit() instead. - The svr4 compat no longer uses the stackgap for resource limits calls, but uses lim_rlimit() and kern_setrlimit() instead. - The ibcs2 compat no longer uses the stackgap for resource limits. It also no longer uses the stackgap for accessing sysctl's for the ibcs2_sysconf() syscall but uses kernel_sysctl() instead. As a result, ibcs2_sysconf() no longer needs Giant. - The p_rlimit macro no longer exists. Submitted by: mtm (mostly, I only did a few cleanups and catchups) Tested on: i386 Compiled on: alpha, amd64
|
#
9bbee259 |
|
19-Jan-2004 |
Andrey A. Chernov <ache@FreeBSD.org> |
pread/pwrite: follow lseek spirit - return EINVAL on negative offset for non-VCHR
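The rule stated above can be written as a small predicate. This is an illustrative sketch only: `check_pread_offset` and its `is_chardev` flag are hypothetical stand-ins for the kernel's vnode-type test (VCHR), not the actual code.

```c
#include <errno.h>
#include <sys/types.h>

/*
 * Hypothetical check: a negative offset is rejected with EINVAL unless
 * the target is a character device, where such offsets may be meaningful.
 */
static int
check_pread_offset(off_t offset, int is_chardev)
{
	if (offset < 0 && !is_chardev)
		return (EINVAL);
	return (0);
}
```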
|
#
512824f8 |
|
09-Nov-2003 |
Seigo Tanimura <tanimura@FreeBSD.org> |
- Implement selwakeuppri() which allows raising the priority of a thread being woken up. The thread woken up can run at a priority as high as after tsleep(). - Replace selwakeup()s with selwakeuppri()s and pass appropriate priorities. - Add cv_broadcastpri() which raises the priority of the broadcast threads. Used by selwakeuppri() if collision occurs. Not objected in: -arch, -current
|
#
b2941431 |
|
26-Sep-2003 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Introduce no_poll() default method for device drivers. Have it do exactly the same as vop_nopoll() for consistency and put a comment in the two pointing at each other. Retire seltrue() in favour of no_poll(). Create private default functions in kern_conf.c instead of public ones. Change default strategy to return the bio with ENODEV instead of doing nothing which would lead the bio stranded. Retire public nullopen() and nullclose() as well as the entire band of public no{read,write,ioctl,mmap,kqfilter,strategy,poll,dump} functions, they are the default actions now. Move the final two trivial functions from subr_xxx.c to kern_conf.c and retire the now empty subr_xxx.c
|
#
882d8469 |
|
31-Jul-2003 |
Alan Cox <alc@FreeBSD.org> |
Remove Giant from writev(2). Eliminate trivial style differences between writev(2) and readv(2).
|
#
2db4b023 |
|
18-Jun-2003 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Introduce a new flag on a file descriptor: DFLAG_SEEKABLE and use that rather than assume that only DTYPE_VNODE is seekable.
|
#
677b542e |
|
10-Jun-2003 |
David E. O'Brien <obrien@FreeBSD.org> |
Use __FBSDID().
|
#
104a9b7e |
|
29-Apr-2003 |
Alexander Kabaev <kan@FreeBSD.org> |
Deprecate machine/limits.h in favor of new sys/limits.h. Change all in-tree consumers to include <sys/limits.h> Discussed on: standards@ Partially submitted by: Craig Rodrigues <rodrigc@attbi.com>
|
#
a163d034 |
|
18-Feb-2003 |
Warner Losh <imp@FreeBSD.org> |
Back out M_* changes, per decision of the TRB. Approved by: trb
|
#
44956c98 |
|
21-Jan-2003 |
Alfred Perlstein <alfred@FreeBSD.org> |
Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0. Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.
|
#
d1e405c5 |
|
13-Dec-2002 |
Alfred Perlstein <alfred@FreeBSD.org> |
SCARGS removal take II.
|
#
bc9e75d7 |
|
13-Dec-2002 |
Alfred Perlstein <alfred@FreeBSD.org> |
Backout removal SCARGS, the code freeze is only "selectively" over.
|
#
0bbe7292 |
|
13-Dec-2002 |
Alfred Perlstein <alfred@FreeBSD.org> |
Remove SCARGS. Reviewed by: md5
|
#
37c84183 |
|
28-Sep-2002 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Be consistent about "static" functions: if the function is marked static in its prototype, mark it static at the definition too. Inspired by: FlexeLint warning #512
|
#
2e45c1b1 |
|
20-Sep-2002 |
Poul-Henning Kamp <phk@FreeBSD.org> |
We don't need the <sys/disklabel.h> include for alpha anymore. Sponsored by: DARPA & NAI Labs.
|
#
71fad9fd |
|
11-Sep-2002 |
Julian Elischer <julian@FreeBSD.org> |
Completely redo thread states. Reviewed by: davidxu@freebsd.org
|
#
8f19eb88 |
|
01-Sep-2002 |
Ian Dowse <iedowse@FreeBSD.org> |
Split out a number of mostly VFS and signal related syscalls into a kernel-internal kern_*() version and a wrapper that is called via the syscall vector table. For paths and structure pointers, the internal version either takes a uio_seg parameter or requires the caller to copyin() the data to kernel memory as appropriate. This will permit emulation layers to use these syscalls without having to copy out translated arguments to the stack gap. Discussed on: -arch Review/suggestions: bde, jhb, peter, marcel
|
#
2149c527 |
|
23-Aug-2002 |
Peter Wemm <peter@FreeBSD.org> |
Move the TAILQ_INIT(&td->td_selq) before the retry: label. Otherwise in some circumstances when we get a select collision, we can end up with cases where we do not clear some sip->si_thread on the way out, leading to page faults in selwakeup(). This should solve the problem where postfix can crash the kernel during select collisions. Reviewed by: alfred
|
#
d49fa1ca |
|
16-Aug-2002 |
Robert Watson <rwatson@FreeBSD.org> |
In continuation of early fileop credential changes, modify fo_ioctl() to accept an 'active_cred' argument reflecting the credential of the thread initiating the ioctl operation. - Change fo_ioctl() to accept active_cred; change consumers of the fo_ioctl() interface to generally pass active_cred from td->td_ucred. - In fifofs, initialize filetmp.f_cred to ap->a_cred so that the invocations of soo_ioctl() are provided access to the calling f_cred. Pass ap->a_td->td_ucred as the active_cred, but note that this is required because we don't yet distinguish file_cred and active_cred in invoking VOP's. - Update kqueue_ioctl() for its new argument. - Update pipe_ioctl() for its new argument, pass active_cred rather than td_ucred to MAC for authorization. - Update soo_ioctl() for its new argument. - Update vn_ioctl() for its new argument, use active_cred rather than td->td_ucred to authorize VOP_IOCTL() and the associated VOP_GETATTR(). Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
|
#
ea6027a8 |
|
15-Aug-2002 |
Robert Watson <rwatson@FreeBSD.org> |
Make similar changes to fo_stat() and fo_poll() as made earlier to fo_read() and fo_write(): explicitly use the cred argument to fo_poll() as "active_cred" using the passed file descriptor's f_cred reference to provide access to the file credential. Add an active_cred argument to fo_stat() so that implementers have access to the active credential as well as the file credential. Generally modify callers of fo_stat() to pass in td->td_ucred rather than fp->f_cred, which was redundantly provided via the fp argument. This set of modifications also permits threads to perform these operations on behalf of another thread without modifying their credential. Trickle this change down into fo_stat/poll() implementations: - badfo_poll(), badfo_stat(): modify/add arguments. - kqueue_poll(), kqueue_stat(): modify arguments. - pipe_poll(), pipe_stat(): modify/add arguments, pass active_cred to MAC checks rather than td->td_ucred. - soo_poll(), soo_stat(): modify/add arguments, pass fp->f_cred rather than cred to pru_sopoll() to maintain current semantics. - sopoll(): modify arguments. - vn_poll(), vn_statfile(): modify/add arguments, pass new arguments to vn_stat(). Pass active_cred to MAC and fp->f_cred to VOP_POLL() to maintain current semantics. - vn_close(): rename cred to file_cred to reflect reality while I'm here. - vn_stat(): Add active_cred and file_cred arguments to vn_stat() and consumers so that this distinction is maintained at the VFS as well as 'struct file' layer. Pass active_cred instead of td->td_ucred to MAC and to VOP_GETATTR() to maintain current semantics. - fifofs: modify the creation of a "filetemp" so that the file credential is properly initialized and can be used in the socket code if desired. Pass ap->a_td->td_ucred as the active credential to soo_poll(). If we teach the vnop interface about the distinction between file and active credentials, we would use the active credential here. Note that current inconsistent passing of active_cred vs. 
file_cred to VOP's is maintained. It's not clear why GETATTR would be authorized using active_cred while POLL would be authorized using file_cred at the file system level. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
|
#
9ca43589 |
|
15-Aug-2002 |
Robert Watson <rwatson@FreeBSD.org> |
In order to better support flexible and extensible access control, make a series of modifications to the credential arguments relating to file read and write operations to clarify which credential is used for what: - Change fo_read() and fo_write() to accept "active_cred" instead of "cred", and change the semantics of consumers of fo_read() and fo_write() to pass the active credential of the thread requesting an operation rather than the cached file cred. The cached file cred is still available in fo_read() and fo_write() consumers via fp->f_cred. These changes are largely in sys_generic.c. For each implementation of fo_read() and fo_write(), update cred usage to reflect this change and maintain current semantics: - badfo_readwrite() unchanged - kqueue_read/write() unchanged - pipe_read/write() now authorize MAC using active_cred rather than td->td_ucred - soo_read/write() unchanged - vn_read/write() now authorize MAC using active_cred but VOP_READ/WRITE() with fp->f_cred Modify vn_rdwr() to accept two credential arguments instead of a single credential: active_cred and file_cred. Use active_cred for MAC authorization, and select a credential for use in VOP_READ/WRITE() based on whether file_cred is NULL or not. If file_cred is provided, authorize the VOP using that cred, otherwise the active credential, matching current semantics. Modify current vn_rdwr() consumers to pass a file_cred if used in the context of a struct file, and to always pass active_cred. When vn_rdwr() is used without a file_cred, pass NOCRED. These changes should maintain current semantics for read/write, but avoid a redundant passing of fp->f_cred, as well as making it more clear what the origin of each credential is in file descriptor read/write operations. Follow-up commits will make similar changes to other file descriptor operations, and modify the MAC framework to pass both credentials to MAC policy modules so they can implement either semantic for revocation. 
Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
|
#
b605b54c |
|
23-Jul-2002 |
Alfred Perlstein <alfred@FreeBSD.org> |
Attempt to clarify comment in selrecord.
|
#
d452ec95 |
|
22-Jul-2002 |
Alfred Perlstein <alfred@FreeBSD.org> |
remove caddr_t from fo_ioctl calls
|
#
0a3e28cf |
|
22-Jul-2002 |
Alfred Perlstein <alfred@FreeBSD.org> |
remove caddr_t
|
#
e602ba25 |
|
29-Jun-2002 |
Julian Elischer <julian@FreeBSD.org> |
Part 1 of KSE-III The ability to schedule multiple threads per process (on one cpu) by making ALL system calls optionally asynchronous. to come: ia64 and power-pc patches, patches for gdb, test program (in tools) Reviewed by: Almost everyone who counts (at various times, peter, jhb, matt, alfred, mini, bernd, and a cast of thousands) NOTE: this is still Beta code, and contains lots of debugging stuff. expect slight instability in signals..
|
#
c33c8251 |
|
20-Jun-2002 |
Alfred Perlstein <alfred@FreeBSD.org> |
Implement SO_NOSIGPIPE option for sockets. This allows one to request that an EPIPE error return not generate SIGPIPE on sockets. Submitted by: lioux Inspired by: Darwin
|
#
c4bacc18 |
|
19-Jun-2002 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Remove the compat bits for the mis-aligned struct disklabel on alpha, people got three times longer than I promised. Sponsored by: DARPA & NAI Labs.
|
#
9ae6d334 |
|
11-Jun-2002 |
Kelly Yancey <kbyanc@FreeBSD.org> |
Make nselcol, the number of select collisions since boot, unsigned as a negative collision count simply doesn't make sense. PR: (one small part of) 19720 Approved by: alfred
|
#
60a9bb19 |
|
06-Jun-2002 |
John Baldwin <jhb@FreeBSD.org> |
Catch up to changes in ktrace API.
|
#
82641acd |
|
08-May-2002 |
Alan Cox <alc@FreeBSD.org> |
o Correct an error made in revision 1.65: In readv(), if uap->iovcnt is out-of-range, drop the file reference before returning. (This error also exists in the RELENG_4 branch.) o Eliminate the acquisition and release of Giant in readv() now that malloc() and free() are callable without Giant.
|
#
0b5d880d |
|
02-May-2002 |
Poul-Henning Kamp <phk@FreeBSD.org> |
As promised make the hack for sizeof(struct disklabel) on alpha annoying. Run make world (or recompile whatever program whines) to get rid of warning. Compat bits will be removed entirely in about two weeks.
|
#
6008862b |
|
04-Apr-2002 |
John Baldwin <jhb@FreeBSD.org> |
Change callers of mtx_init() to pass in an appropriate lock type name. In most cases NULL is passed, but in some cases such as network driver locks (which use the MTX_NETWORK_LOCK macro) and UMA zone locks, a name is used. Tested on: i386, alpha, sparc64
|
#
f67ad03a |
|
04-Apr-2002 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Delete the bogus d_boot[01] fields from struct disklabel. This shrinks the size 4 bytes on alpha, down to the same 276 bytes as all other platforms. Construct a hack to make old ioctls work on new kernels. Once world is recompiled only the new and correct sysctls will be used. This hack will become annoying around 1st of may to make people rebuild their worlds and it will be gone before 5.0.
|
#
4d77a549 |
|
19-Mar-2002 |
Alfred Perlstein <alfred@FreeBSD.org> |
Remove __P.
|
#
628abf6c |
|
15-Mar-2002 |
Alfred Perlstein <alfred@FreeBSD.org> |
Giant pushdown for read/write/pread/pwrite syscalls. kern/kern_descrip.c: Acquire Giant in fdrop_locked when file refcount hits zero, this removes the requirement for the caller to own Giant for the most part. kern/kern_ktrace.c: Acquire Giant in ktrgenio, simplifies locking in upper read/write syscalls. kern/vfs_bio.c: Acquire Giant in bwillwrite if needed. kern/sys_generic.c Giant pushdown, remove Giant for: read, pread, write and pwrite. readv and writev aren't done yet because of the possible malloc calls for iov to uio processing. kern/sys_socket.c Grab giant in the socket fo_read/write functions. kern/vfs_vnops.c Grab giant in the vnode fo_read/write functions.
|
#
85f190e4 |
|
13-Mar-2002 |
Alfred Perlstein <alfred@FreeBSD.org> |
Fixes to make select/poll mpsafe. Problem: selwakeup required calling pfind which would cause lock order reversals with the allproc_lock and the per-process filedesc lock. Solution: Instead of recording the pid of the select()'ing process into the selinfo structure, actually record a pointer to the thread. To avoid dereferencing a bad address all the selinfo structures that are in use by a thread are kept in a list hung off the thread (protected by sellock). When a selwakeup occurs the selinfo is removed from that thread's list, it is also removed on the way out of select or poll where the thread will traverse its list removing all the selinfos from its own list. Problem: Previously the PROC_LOCK was used to provide the mutual exclusion needed to ensure proper locking, this couldn't work because there was a single condvar used for select and poll and condvars can only be used with a single mutex. Solution: Introduce a global mutex 'sellock' which is used to provide mutual exclusion when recording events to wait on as well as performing notification when an event occurs. Interesting note: schedlock is required to manipulate the per-thread TDF_SELECT flag, however if given its own field it would not need schedlock, also because TDF_SELECT is only manipulated under sellock one doesn't actually use schedlock for synchronization, only to protect against corruption. Proc locks are no longer used in select/poll. Portions contributed by: davidc
|
#
bbbb04ce |
|
09-Mar-2002 |
Alfred Perlstein <alfred@FreeBSD.org> |
Remove __P
|
#
4658f926 |
|
30-Jan-2002 |
Alfred Perlstein <alfred@FreeBSD.org> |
Remove unused variables in select(2) from previous delta. Pointed out by: bde
|
#
eb209311 |
|
29-Jan-2002 |
Alfred Perlstein <alfred@FreeBSD.org> |
Attempt to fixup select(2) and poll(2), this should fix some races with other threads as well as speed up the interfaces. To fix the race and accomplish the speedup, remove selholddrop and pollholddrop. The entire concept is somewhat bogus because holding the individual struct file pointers offers us no guarantees that another thread context won't close it on us thereby removing our access to our own reference. Selholddrop and pollholddrop also would do multiple locks and unlocks of mutexes _per-file_ in the fd arrays to be scanned, this needed to be sped up. Instead of using selholddrop and pollholddrop, simply hold the filedesc lock over the selscan and pollscan functions. This should protect us against close(2)'s on the files as well as reduce the multiple lock/unlock pairs per fd into a single lock over the filedesc.
|
#
97fa4397 |
|
23-Jan-2002 |
Alfred Perlstein <alfred@FreeBSD.org> |
make pread use fget_read instead of holdfp.
|
#
aa11a498 |
|
18-Jan-2002 |
Alfred Perlstein <alfred@FreeBSD.org> |
undo a bit of the Giant pushdown. fdrop isn't SMP safe as it may call into the file's close routine which definitely is not SMP safe right now, so we hold Giant over calls to fdrop now.
|
#
b5c93a56 |
|
16-Jan-2002 |
Alfred Perlstein <alfred@FreeBSD.org> |
Fix giant handling in pwrite(2), I forgot to release it when finishing the syscall.
|
#
a4db4953 |
|
13-Jan-2002 |
Alfred Perlstein <alfred@FreeBSD.org> |
Replace ffind_* with fget calls. Make fget MPsafe. Make fgetvp and fgetsock use the fget subsystem to reduce code bloat. Push giant down in fpathconf().
|
#
426da3bc |
|
13-Jan-2002 |
Alfred Perlstein <alfred@FreeBSD.org> |
SMP Lock struct file, filedesc and the global file list. Seigo Tanimura (tanimura) posted the initial delta. I've polished it quite a bit reducing the need for locking and adapting it for KSE. Locks: 1 mutex in each filedesc protects all the fields. protects "struct file" initialization, while a struct file is being changed from &badfileops -> &pipeops or something the filedesc should be locked. 1 mutex in each struct file protects the refcount fields. doesn't protect anything else. the flags used for garbage collection have been moved to f_gcflag which was the FILLER short, this doesn't need locking because the garbage collection is a single threaded container. could likely be made to use a pool mutex. 1 sx lock for the global filelist. struct file * fhold(struct file *fp); /* increments reference count on a file */ struct file * fhold_locked(struct file *fp); /* like fhold but expects file to be locked */ struct file * ffind_hold(struct thread *, int fd); /* finds the struct file in thread, adds one reference and returns it unlocked */ struct file * ffind_lock(struct thread *, int fd); /* ffind_hold, but returns file locked */ I still have to smp-safe the fget cruft, I'll get to that asap.
|
#
b064d43d |
|
13-Nov-2001 |
Matthew Dillon <dillon@FreeBSD.org> |
remove holdfp() Replace uses of holdfp() with fget*() or fgetvp*() calls as appropriate introduce fget(), fget_read(), fget_write() - these functions will take a thread and file descriptor and return a file pointer with its ref count bumped. introduce fgetvp(), fgetvp_read(), fgetvp_write() - these functions will take a thread and file descriptor and return a vref()'d vnode. *_read() requires that the file pointer be FREAD, *_write that it be FWRITE. This continues the cleanup of struct filedesc and struct file access routines which, when we are all through with it, will allow us to then make the API calls MP safe and be able to move Giant down into the fo_* functions.
|
#
fea2ab83 |
|
21-Sep-2001 |
John Baldwin <jhb@FreeBSD.org> |
The P_SELECT flag was moved from p->p_flag to td->td_flags, but p_flag was locked by the proc lock and td_flags is locked by the sched_lock. The places that read, set, and cleared TDF_SELECT weren't updated, so they read and modified td_flags w/o holding the sched_lock, meaning that they could corrupt the per-thread flags field. As an immediate band-aid, grab sched_lock while reading and manipulating td_flags in relation to TDF_SELECT. This will probably be cleaned up some later on.
|
#
b40ce416 |
|
12-Sep-2001 |
Julian Elischer <julian@FreeBSD.org> |
KSE Milestone 2 Note ALL MODULES MUST BE RECOMPILED make the kernel aware that there are smaller units of scheduling than the process. (but only allow one thread per process at this time). This is functionally equivalent to the previous -current except that there is a thread associated with each process. Sorry john! (your next MFC will be a doozy!) Reviewed by: peter@freebsd.org, dillon@freebsd.org X-MFC after: ha ha ha ha
|
#
ad2edad9 |
|
01-Sep-2001 |
Matthew Dillon <dillon@FreeBSD.org> |
Giant Pushdown: read() pread() readv() write() pwrite() writev() ioctl() select() poll() openbsd_poll()
|
#
1b369704 |
|
15-May-2001 |
Seigo Tanimura <tanimura@FreeBSD.org> |
Back out scanning file descriptors with holding a process lock. selrecord() requires allproc sx in pfind(), resulting in lock order reversal between allproc and a process lock.
|
#
265fc98f |
|
13-May-2001 |
Seigo Tanimura <tanimura@FreeBSD.org> |
- Convert msleep(9) in select(2) and poll(2) to cv_*wait*(9). - Since polling should not involve sleeping, keep holding a process lock upon scanning file descriptors. - Hold a reference to every file descriptor prior to entering polling loop in order to avoid lock order reversal between lockmgr and p_mtx upon calling fdrop() in fo_poll(). (NOTE: this work has not been done for netncp and netsmb yet because a socket itself has no reference counts.) Reviewed by: jhb
|
#
33a9ed9d |
|
23-Apr-2001 |
John Baldwin <jhb@FreeBSD.org> |
Change the pfind() and zpfind() functions to lock the process that they find before releasing the allproc lock and returning. Reviewed by: -smp, dfr, jake
|
#
19eb87d2 |
|
06-Mar-2001 |
John Baldwin <jhb@FreeBSD.org> |
Grab the process lock while calling psignal and before calling psignal.
|
#
ea0237ed |
|
27-Feb-2001 |
Jonathan Lemon <jlemon@FreeBSD.org> |
Correctly declare variables as u_int rather than doing typecasts. Kill some register declarations while I'm here. Submitted by: bde (1)
|
#
0b7088c4 |
|
26-Feb-2001 |
Jonathan Lemon <jlemon@FreeBSD.org> |
Cast nfds to u_int before range checking it in order to catch negative values. PR: 25393
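The effect of the cast can be seen in a few lines of C. This is a sketch of the idiom, not the kernel's code; `nfds_in_range` and its `limit` parameter are hypothetical.

```c
/*
 * Hypothetical range check. Converting nfds to unsigned turns any
 * negative value into a very large one, so a single comparison against
 * the upper bound rejects both negative and oversized counts.
 */
static int
nfds_in_range(int nfds, unsigned int limit)
{
	return ((unsigned int)nfds <= limit);
}
```

With a signed comparison, `nfds = -1` would pass an `nfds > limit` test and go on to be used as a size.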
|
#
2bd5ac33 |
|
09-Feb-2001 |
Peter Wemm <peter@FreeBSD.org> |
poll(2) array limits (take 2) - after some input from bde.
|
#
9ed346ba |
|
08-Feb-2001 |
Bosko Milekic <bmilekic@FreeBSD.org> |
Change and clean the mutex lock interface. mtx_enter(lock, type) becomes: mtx_lock(lock) for sleep locks (MTX_DEF-initialized locks) mtx_lock_spin(lock) for spin locks (MTX_SPIN-initialized) similarly, for releasing a lock, we now have: mtx_unlock(lock) for MTX_DEF and mtx_unlock_spin(lock) for MTX_SPIN. We change the caller interface for the two different types of locks because the semantics are entirely different for each case, and this makes it explicitly clear and, at the same time, it rids us of the extra `type' argument. The enter->lock and exit->unlock change has been made with the idea that we're "locking data" and not "entering locked code" in mind. Further, remove all additional "flags" previously passed to the lock acquire/release routines with the exception of two: MTX_QUIET and MTX_NOSWITCH The functionality of these flags is preserved and they can be passed to the lock/unlock routines by calling the corresponding wrappers: mtx_{lock, unlock}_flags(lock, flag(s)) and mtx_{lock, unlock}_spin_flags(lock, flag(s)) for MTX_DEF and MTX_SPIN locks, respectively. Re-inline some lock acq/rel code; in the sleep lock case, we only inline the _obtain_lock()s in order to ensure that the inlined code fits into a cache line. In the spin lock case, we inline recursion and actually only perform a function call if we need to spin. This change has been made with the idea that we generally tend to avoid spin locks and that also the spin locks that we do have and are heavily used (i.e. sched_lock) do recurse, and therefore in an effort to reduce function call overhead for some architectures (such as alpha), we inline recursion for this case. Create a new malloc type for the witness code and retire from using the M_DEV type. The new type is called M_WITNESS and is only declared if WITNESS is enabled. 
Begin cleaning up some machdep/mutex.h code - specifically updated the "optimized" inlined code in alpha/mutex.h and wrote MTX_LOCK_SPIN and MTX_UNLOCK_SPIN asm macros for the i386/mutex.h as we presently need those. Finally, caught up to the interface changes in all sys code. Contributors: jake, jhb, jasone (in no particular order)
|
#
89b71647 |
|
07-Feb-2001 |
Peter Wemm <peter@FreeBSD.org> |
The code I picked up from NetBSD in '97 had a nasty bug. It limited the index of the pollfd array to the number of fd's currently open, not the maximum number of fd's. ie: if you had 0,1,2 open, you could not use pollfd slots higher than 20. The specs say we only have to support OPEN_MAX [64] entries but we allow way more than that.
|
#
e04ac2fe |
|
24-Jan-2001 |
John Baldwin <jhb@FreeBSD.org> |
- Catch up to proc flag changes. - Add proc locking for selwakeup() and selrecord().
|
#
0a2c3d48 |
|
08-Jan-2001 |
Garrett Wollman <wollman@FreeBSD.org> |
select() DKI is now in <sys/selinfo.h>.
|
#
a41ce5d3 |
|
07-Dec-2000 |
Matthew Dillon <dillon@FreeBSD.org> |
Only call bwillwrite() for vnodes. Do not penalize devices or pipes.
|
#
9440653d |
|
06-Dec-2000 |
Matthew Dillon <dillon@FreeBSD.org> |
Add necessary bwillwrite() in writev() entry point. Deal with excessive dirty buffers when msync() syncs non-contiguous dirty buffers by checking for the case in UFS *before* checking for clusterability.
|
#
c6ab5768 |
|
30-Nov-2000 |
Alfred Perlstein <alfred@FreeBSD.org> |
only call bwillwrite() to stall on IO when dealing with VNODEs otherwise we will stall on non-disk IO for things like fifos and sockets
|
#
4a476efa |
|
21-Nov-2000 |
Jonathan Lemon <jlemon@FreeBSD.org> |
Protect p_wchan with sched_lock in selwakeup().
|
#
279d7226 |
|
18-Nov-2000 |
Matthew Dillon <dillon@FreeBSD.org> |
This patchset fixes a large number of file descriptor race conditions. Pre-rfork code assumed inherent locking of a process's file descriptor array. However, with the advent of rfork() the file descriptor table could be shared between processes. This patch closes over a dozen serious race conditions related to one thread manipulating the table (e.g. closing or dup()ing a descriptor) while another is blocked in an open(), close(), fcntl(), read(), write(), etc... PR: kern/11629 Discussed with: Alexander Viro <viro@math.psu.edu>
|
#
b31ae1ad |
|
28-Jul-2000 |
Peter Wemm <peter@FreeBSD.org> |
Fix a warning that has been annoying me for some time: "kern/sys_generic.c:358: warning: cast discards qualifiers from pointer target type" The idea for using the uintptr_t intermediate cast for de-constifying a pointer was hinted at by bde some time ago.
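The intermediate-cast idiom mentioned above looks roughly like this. It is a generic sketch; the helper name `deconst` is chosen for the example and is not taken from the commit.

```c
#include <stdint.h>

/*
 * Strip const from a pointer by going through uintptr_t. The integer
 * round trip avoids the compiler's "cast discards qualifiers" warning
 * that a direct (void *) cast of a const pointer can trigger.
 * The caller must still only write through the result if the underlying
 * object is not actually const.
 */
static void *
deconst(const void *p)
{
	return ((void *)(uintptr_t)p);
}
```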
|
#
3c89e357 |
|
26-Jul-2000 |
Brian Feldman <green@FreeBSD.org> |
Distinguish between whether ktraceing was enabled before an IO operation or after it. If the ktrace operation was enabled while the process was blocked doing IO, the race would allow it to pass down invalid (uninitialized) data and panic later down the call stack.
|
#
9c386f6b |
|
12-Jul-2000 |
John Baldwin <jhb@FreeBSD.org> |
For infinite timeouts, set both the tv_sec and tv_usec fields to zero in poll() and select(). Noticed by: Wesley Morgan <morganw@chemicals.tacorp.com>
|
#
4da144c0 |
|
12-Jul-2000 |
John Baldwin <jhb@FreeBSD.org> |
Fix a very obscure bug in select() and poll() where the timeout would never expire if poll() or select() was called before the system had been in multiuser for 1 second. This was caused by only checking to see if tv_sec was zero rather than checking both tv_sec and tv_usec.
|
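A minimal sketch of the check described in the entry above, assuming a plain `struct timeval`; `demo_tv_isset` is a hypothetical name used only for illustration.

```c
#include <sys/time.h>

/*
 * Hypothetical helper: a timeval counts as "set" if either field is
 * nonzero. Testing tv_sec alone would treat a value such as
 * {0, 500000} as unset, which is the kind of mistake the entry
 * describes.
 */
static int
demo_tv_isset(const struct timeval *tv)
{
	return (tv->tv_sec != 0 || tv->tv_usec != 0);
}
```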
#
7ceba2d7 |
|
07-Jul-2000 |
Brian Feldman <green@FreeBSD.org> |
Remove two micro-pessimizations I made. Bruce is teaching me well :) KTRPOINT(p, KTR_GENIO) is more uncommon than error == 0, so it should be first in the && statement.
|
#
42ebfbf2 |
|
02-Jul-2000 |
Brian Feldman <green@FreeBSD.org> |
Modify ktrace's general I/O tracing, ktrgenio(), to use a struct uio * instead of a struct iovec * array and int len. Get rid of stupidly trying to allocate all of the memory and copyin()ing the entire iovec[], and instead just do the proper VOP_WRITE() in ktrwrite() using a copy of the struct uio that the syscall originally used. This solves the DoS which could easily be performed; to work around the DoS, one could also remove "options KTRACE" from the kernel. This is a very strong MFC candidate for 4.1. Found by: art@OpenBSD.org
|
#
8757e5bb |
|
12-Jun-2000 |
Alfred Perlstein <alfred@FreeBSD.org> |
unstatic getfp() so that other subsystems can use it. make sendfile() use it. Approved by: dg
|
#
d2ba455c |
|
09-May-2000 |
Matthew Dillon <dillon@FreeBSD.org> |
Some ioctl routines assume that the ioctl buffer is aligned, but a char[] declaration makes no such guarantee. A union is used to force alignment of the char buffer.
|
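The union technique from the entry above might look roughly like this. The type name, member names, and buffer size are assumptions for illustration, not the committed code.

```c
/* Hypothetical buffer size, chosen only for the example. */
#define DEMO_BUF_SIZE 128

/*
 * A bare char array only needs byte alignment. Overlaying it with a
 * long in a union gives the whole object at least the alignment of
 * long, so code that casts the buffer to a wider type is safe.
 */
union demo_ioctl_buf {
	char data[DEMO_BUF_SIZE];
	long align;
};
```

The `align` member is never read or written; it exists only to raise the alignment requirement of the union.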
#
f082218c |
|
20-Feb-2000 |
Peter Wemm <peter@FreeBSD.org> |
Fix select(2) for the Alpha. (!!) It was never returning true for fd's in the range of 32-63, 96-127 etc. The first problem was the FD_*() macros were shifting a 32 bit integer "1" left by more than 32 bits. The same problem happened in selscan(). ffs() also takes an int argument and causes failure. For cases where int == long (ie: the usual case for x86, but not always as gcc can have long being a 64 bit quantity) ffs() could be used. Reported by: Marian Stagarescu <marian@bile.skycache.com> Reviewed by: dfr, gallatin (sys/types.h only) Approved by: jkh
|
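The shift problem described in the entry above can be illustrated with a small sketch. The names below are hypothetical and are not the kernel's `FD_*()` macros; the point is only that the constant being shifted must have the same width as the mask word.

```c
#include <limits.h>

/* Hypothetical mask word type and helpers for illustration. */
typedef unsigned long demo_mask_t;
#define DEMO_NBITS ((unsigned)(sizeof(demo_mask_t) * CHAR_BIT))

static void
demo_set(demo_mask_t *set, unsigned fd)
{
	/*
	 * Cast the 1 to the mask type before shifting. A plain int 1
	 * shifted by 32 or more bits is undefined when the mask word
	 * is 64 bits wide.
	 */
	set[fd / DEMO_NBITS] |= (demo_mask_t)1 << (fd % DEMO_NBITS);
}

static int
demo_isset(const demo_mask_t *set, unsigned fd)
{
	return ((int)((set[fd / DEMO_NBITS] >> (fd % DEMO_NBITS)) & 1));
}
```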
#
bfbbc4aa |
|
13-Jan-2000 |
Jason Evans <jasone@FreeBSD.org> |
Add aio_waitcomplete(). Make aio work correctly for socket descriptors. Make gratuitous style(9) fixes (me, not the submitter) to make the aio code more readable. PR: kern/12053 Submitted by: Chris Sedore <cmsedore@maxwell.syr.edu>
|
#
8cb96f20 |
|
05-Jan-2000 |
Peter Wemm <peter@FreeBSD.org> |
Export the nselcoll counter via the kern.nselcoll sysctl so we can see just how bad it gets in various situations. Reminded by: adrian
|
#
d8177437 |
|
14-Oct-1999 |
Brian Feldman <green@FreeBSD.org> |
Missed the second argument of fdrop(). Submitted by: jhay
|
#
1aa3e7dd |
|
13-Oct-1999 |
Brian Feldman <green@FreeBSD.org> |
Fix a race condition with shared fd tables and writev(). It's still not safe to consider file table sharing secure. Submitted by: Ville-Pertti Keinonen <will@iki.fi>
|
#
d1f088da |
|
11-Oct-1999 |
Peter Wemm <peter@FreeBSD.org> |
Trim unused options (or #ifdef for undoc options). Submitted by: phk
|
#
13ccadd4 |
|
19-Sep-1999 |
Brian Feldman <green@FreeBSD.org> |
This is what was "fdfix2.patch," a fix for fd sharing. It's pretty far-reaching in fd-land, so you'll want to consult the code for changes. The biggest change is that now, you don't use fp->f_ops->fo_foo(fp, bar) but instead fo_foo(fp, bar), which increments and decrements the fp refcount upon entry and exit. Two new calls, fhold() and fdrop(), are provided. Each does what it seems like it should, and if fdrop() brings the refcount to zero, the fd is freed as well. Thanks to peter ("to hell with it, it looks ok to me.") for his review. Thanks to msmith for keeping me from putting locks everywhere :) Reviewed by: peter
|
#
c3aac50f |
|
27-Aug-1999 |
Peter Wemm <peter@FreeBSD.org> |
$Id$ -> $FreeBSD$
|
#
8fe387ab |
|
04-Apr-1999 |
Dmitrij Tejblum <dt@FreeBSD.org> |
Add standard padding argument to pread and pwrite syscalls. That should make them NetBSD compatible. Add parameter to fo_read and fo_write. (The only flag, FOF_OFFSET, means that the offset is set in the struct uio). Factor out some common code from read/pread/write/pwrite syscalls.
|
#
4160ccd9 |
|
27-Mar-1999 |
Alan Cox <alc@FreeBSD.org> |
Added pread and pwrite. These functions are defined by the X/Open Threads Extension. (Note: We use the same syscall numbers as NetBSD.) Submitted by: John Plevyak <jplevyak@inktomi.com>
|
#
425c50cf |
|
29-Jan-1999 |
Bruce Evans <bde@FreeBSD.org> |
Removed a bogus cast to c_caddr_t. This is part of terminating c_caddr_t with extreme prejudice. Here the point of the original cast to caddr_t was to break the warning about the const mismatch between write(2)'s `const void *buf' and `struct uio's `char *iov_base' (previous bitrot gave a gratuitous dependency on caddr_t being char *). Compiling with -Wcast-qual made the cast a full no-op. This change has no effect on the warning for discarding `const' on assignment to iov_base. The warning should not be fixed by splitting `struct iovec' into a non-const version for read() and a const version for write(), since correct const poisoning would affect all pointers to i/o addresses. Const'ness should probably be forgotten by not declaring it in syscalls.master.
|
#
d254af07 |
|
27-Jan-1999 |
Matthew Dillon <dillon@FreeBSD.org> |
Fix warnings in preparation for adding -Wall -Wcast-qual to the kernel compile
|
#
337c9691 |
|
09-Dec-1998 |
Jordan K. Hubbard <jkh@FreeBSD.org> |
poll(2) sets POLLNVAL for descriptors passed in that are less than 0. This makes it difficult to do efficient manipulation of the struct pollfd since you can't leave a slot empty. PR: 8599 Submitted-by: Marc Slemko <marcs@znep.com>
|
#
831d27a9 |
|
11-Nov-1998 |
Don Lewis <truckman@FreeBSD.org> |
Installed the second patch attached to kern/7899 with some changes suggested by bde, a few other tweaks to get the patch to apply cleanly again and some improvements to the comments. This change closes some fairly minor security holes associated with F_SETOWN, fixes a few bugs, and removes some limitations that F_SETOWN had on tty devices. For more details, see the description on the PR. Because this patch increases the size of the proc and pgrp structures, it is necessary to re-install the includes and recompile libkvm, the vinum lkm, fstat, gcore, gdb, ipfilter, ps, top, and w. PR: kern/7899 Reviewed by: bde, elvind
|
#
134e06fe |
|
05-Sep-1998 |
Bruce Evans <bde@FreeBSD.org> |
Fixed bogotification of pseudocode for syscall args by rev.1.53 of syscalls.master.
|
#
069e9bc1 |
|
24-Aug-1998 |
Doug Rabson <dfr@FreeBSD.org> |
Change various syscalls to use size_t arguments instead of u_int. Add some overflow checks to read/write (from bde). Change all modifications to vm_page::flags, vm_page::busy, vm_object::flags and vm_object::paging_in_progress to use operations which are not interruptible. Reviewed by: Bruce Evans <bde@zeta.org.au>
|
#
831b9ef2 |
|
10-Jun-1998 |
Doug Rabson <dfr@FreeBSD.org> |
64bit fixes: use u_long not int for ioctl command.
|
#
c21410e1 |
|
17-May-1998 |
Poul-Henning Kamp <phk@FreeBSD.org> |
s/nanoruntime/nanouptime/g s/microruntime/microuptime/g Reviewed by: bde
|
#
80a39463 |
|
05-Apr-1998 |
Andrey A. Chernov <ache@FreeBSD.org> |
Remove unused atv.tv_usec = 0; from select/poll code
|
#
00af9731 |
|
04-Apr-1998 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Time changes mark 2: * Figure out UTC relative to boottime. Four new functions provide time relative to boottime. * move "runtime" into struct proc. This helps fix the calcru() problem in SMP. * kill mono_time. * add timespec{add|sub|cmp} macros to time.h. (XXX: These may change!) * nanosleep, select & poll takes long sleeps one day at a time Reviewed by: bde Tested by: ache and others
|
#
4ff16568 |
|
02-Apr-1998 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Try to fix poll & select after I broke them.
|
#
227ee8a1 |
|
30-Mar-1998 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Eradicate the variable "time" from the kernel, using various measures. "time" wasn't an atomic variable, so splfoo() protection was needed around any access to it, unless you just wanted the seconds part. Most uses of time.tv_sec now use the new variable time_second instead. gettime() changed to getmicrotime(). Remove a couple of unneeded splfoo() protections, the new getmicrotime() is atomic, (until Bruce sets a breakpoint in it). A couple of places needed random data, so use read_random() instead of mucking about with time which isn't random. Add a new nfs_curusec() function. Mark a couple of bogosities involving the now disappeared time variable. Update ffs_update() to avoid the weird "== &time" checks, by fixing the one remaining call that passed &time as args. Change profiling in ncr.c to use ticks instead of time. Resolution is the same. Add new function "tvtohz()" to avoid the bogus "splfoo(), add time, call hzto() which subtracts time" sequences. Reviewed by: bde
|
#
2087c896 |
|
23-Nov-1997 |
Bruce Evans <bde@FreeBSD.org> |
Fixed some style bugs in the poll() code. Removed dead code to "Avoid inadvertently sleeping forever". hzto() never returns 0.
|
#
cb226aaa |
|
06-Nov-1997 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Move the "retval" (3rd) parameter from all syscall functions and put it in struct proc instead. This fixes a boatload of compiler warnings, and removes a lot of cruft from the sources. I have not removed the /*ARGSUSED*/, they will require some looking at. libkvm, ps and other userland struct proc frobbing programs will need to be recompiled.
|
#
a1c995b6 |
|
12-Oct-1997 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Last major round (Unless Bruce thinks of something :-) of malloc changes. Distribute all but the most fundamental malloc types. This time I also remembered the trick to making things static: Put "static" in front of them. A couple of finer points by: bde
|
#
55166637 |
|
11-Oct-1997 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Distribute and staticize a lot of the malloc M_* types. Substantial input from: bde
|
#
42d11757 |
|
13-Sep-1997 |
Peter Wemm <peter@FreeBSD.org> |
Implement poll(2). This is mostly taken from the NetBSD implementation (from some time ago) but with a few tweaks along the way. Obtained from: NetBSD
|
#
e4ba6a82 |
|
02-Sep-1997 |
Bruce Evans <bde@FreeBSD.org> |
Removed unused #includes.
|
#
2c1011f7 |
|
15-Jun-1997 |
John Dyson <dyson@FreeBSD.org> |
Modifications to existing files to support the initial AIO/LIO and kernel based threading support.
|
#
20982410 |
|
24-Mar-1997 |
Bruce Evans <bde@FreeBSD.org> |
Don't include <sys/ioctl.h> in the kernel. Stage 4: include <sys/ttycom.h> and sometimes <sys/filio.h> instead of <sys/ioctl.h> in miscellaneous files. Most of these files have nothing to do with ttys but need to include <sys/ttycom.h> to get the definitions of TIOC[SG]PGRP which are (ab)used to convert F[SG]ETOWN fcntls into ioctls.
|
#
3ac4d1ef |
|
22-Mar-1997 |
Bruce Evans <bde@FreeBSD.org> |
Don't #include <sys/fcntl.h> in <sys/file.h> if KERNEL is defined. Fixed everything that depended on getting fcntl.h stuff from the wrong place. Most things don't depend on file.h stuff at all.
|
#
774fce94 |
|
22-Mar-1997 |
Bruce Evans <bde@FreeBSD.org> |
Removed `volatile' from declaration of `time', and removed the resulting null casts. `time' is nonvolatile for accesses within a region locked by splclock()/splx(). Accesses outside such a region are invalid, and splx() must have the side effect of potentially changing all global variables (since there are hundreds of sort of volatile variables like `time'), so declaring `time' as volatile didn't have any real benefits.
|
#
6875d254 |
|
22-Feb-1997 |
Peter Wemm <peter@FreeBSD.org> |
Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not ready for it yet.
|
#
d5e4d7e1 |
|
20-Feb-1997 |
Bruce Evans <bde@FreeBSD.org> |
Improved select(): - avoid malloc() if the number of fds is small. - pack the bits better so that `small' is quite large. - don't waste time generating zero bits for null fd_set pointers or scanning these bits. Possibly improved select(): - free malloc()ed storage before returning. This is simpler and I think huge select()s aren't worth optimizing since they are rare, relative gain would be small and there would be tiny costs for all selects(). Reviewed by: ache (first version by him too)
|
#
1130b656 |
|
14-Jan-1997 |
Jordan K. Hubbard <jkh@FreeBSD.org> |
Make the long-awaited change from $Id$ to $FreeBSD$ This will make a number of things easier in the future, as well as (finally!) avoiding the Id-smashing problem which has plagued developers for so long. Boy, I'm glad we're not using sup anymore. This update would have been insane otherwise.
|
#
acbfbfea |
|
20-Aug-1996 |
Sujal Patel <smpatel@FreeBSD.org> |
Fix a minor style error in my code.
|
#
b08f7993 |
|
20-Aug-1996 |
Sujal Patel <smpatel@FreeBSD.org> |
Remove the kernel FD_SETSIZE limit for select(). Make select()'s first argument 'int' not 'u_int'. Reviewed by: bde
|
#
edbfedac |
|
11-Mar-1996 |
Peter Wemm <peter@FreeBSD.org> |
Import 4.4BSD-Lite2 onto the vendor branch, note that in the kernel, all files are off the vendor branch, so this should not change anything. A "U" marker generally means that the file was not changed in between the 4.4Lite and Lite-2 releases, and does not need a merge. "C" generally means that there was a change. [note new unused (in this form) syscalls.conf, to be 'cvs rm'ed]
|
#
db6a20e2 |
|
03-Jan-1996 |
Garrett Wollman <wollman@FreeBSD.org> |
Converted two options over to the new scheme: USER_LDT and KTRACE.
|
#
87b6de2b |
|
14-Dec-1995 |
Poul-Henning Kamp <phk@FreeBSD.org> |
A Major staticize sweep. Generates a couple of warnings that I'll deal with later. A number of unused vars removed. A number of unused procs removed or #ifdefed.
|
#
d2d3e875 |
|
11-Nov-1995 |
Bruce Evans <bde@FreeBSD.org> |
Included <sys/sysproto.h> to get central declarations for syscall args structs and prototypes for syscalls. Ifdefed duplicated decentralized declarations of args structs. It's convenient to have this visible but they are hard to maintain. Some are already different from the central declarations. 4.4lite2 puts them in comments in the function headers but I wanted to avoid the large changes for that.
|
#
7147b19d |
|
10-Nov-1995 |
Bruce Evans <bde@FreeBSD.org> |
Fixed the type of readv(). An args struct member name conflicted with the machine-generated one in <sys/sysproto.h>.
|
#
c3e5be20 |
|
10-Oct-1995 |
Steven Wallace <swallace@FreeBSD.org> |
Remove the ugly COMPAT_IBCS2 hack to hide a return value through magic numbers. The new socksys support does not need this hack. I am against any magic practicing.
|
#
9b2e5354 |
|
30-May-1995 |
Rodney W. Grimes <rgrimes@FreeBSD.org> |
Remove trailing whitespace.
|
#
f153fb6e |
|
13-Apr-1995 |
David Greenman <dg@FreeBSD.org> |
Backed out previous change - it reduces performance. (oops).
|
#
cf0ec51a |
|
13-Apr-1995 |
David Greenman <dg@FreeBSD.org> |
Slight optimization to select().
|
#
2b101991 |
|
13-Oct-1994 |
Søren Schmidt <sos@FreeBSD.org> |
Damn, checked in the wrong version, fixed.
|
#
8117efaa |
|
13-Oct-1994 |
Søren Schmidt <sos@FreeBSD.org> |
Made it possible for ioctl to return a value. Ifdef by COMPAT_IBCS2 (used by the socksys system). Submitted by: Mostyn Lewis (mostyn@mrl.com)
|
#
d93f860c |
|
09-Oct-1994 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Cosmetics related to getting prototypes into view.
|
#
797f2d22 |
|
02-Oct-1994 |
Poul-Henning Kamp <phk@FreeBSD.org> |
All of this is cosmetic. prototypes, #includes, printfs and so on. Makes GCC a lot more silent.
|
#
bb56ec4a |
|
25-Sep-1994 |
Poul-Henning Kamp <phk@FreeBSD.org> |
While in the real world, I had a bad case of being swapped out for a lot of cycles. While waiting there I added a lot of the extra ()'s I have, (I have never used LISP to any extent). So I compiled the kernel with -Wall and shut up a lot of "suggest you add ()'s", removed a bunch of unused var's and added a couple of declarations here and there. Having a lap-top is highly recommended. My kernel still runs, yell at me if your kernel breaks.
|
#
90324b07 |
|
02-Sep-1994 |
David Greenman <dg@FreeBSD.org> |
Whoops, accidentally left out some pieces of the munmapfd patch.
|
#
dd968eae |
|
02-Sep-1994 |
David Greenman <dg@FreeBSD.org> |
Make sure that uio_resid isn't negative in read().
|
#
3c4dd356 |
|
02-Aug-1994 |
David Greenman <dg@FreeBSD.org> |
Added $Id$
|
#
26f9a767 |
|
25-May-1994 |
Rodney W. Grimes <rgrimes@FreeBSD.org> |
The big 4.4BSD Lite to FreeBSD 2.0.0 (Development) patch. Reviewed by: Rodney W. Grimes Submitted by: John Dyson and David Greenman
|
#
df8bae1d |
|
24-May-1994 |
Rodney W. Grimes <rgrimes@FreeBSD.org> |
BSD 4.4 Lite Kernel Sources
|