#
f28526e9 |
|
19-Jan-2024 |
Konstantin Belousov <kib@FreeBSD.org> |
kcmp(2): implement for generic file types Reviewed by: brooks, markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D43518
|
#
168c7580 |
|
19-Jan-2024 |
Konstantin Belousov <kib@FreeBSD.org> |
file: add fo_cmp method The method should return 0 if the files' underlying objects are the same. In other words, if 0 is returned, I/O on either file modifies the same object. Reviewed by: brooks, markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D43518
|
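The fo_cmp contract from 168c7580 can be pictured with a small userspace sketch. Everything here is hypothetical (`struct xfile`, `struct xfileops`, `xobj_cmp`, `xfile_cmp`, and the nonzero return value 1); only the rule "0 means the same underlying object" is taken from the commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct file; not the kernel definition. */
struct xfile;

/* Comparison method: returns 0 when both files refer to the same object. */
typedef int xfo_cmp_t(const struct xfile *a, const struct xfile *b);

struct xfileops {
	xfo_cmp_t *fo_cmp;	/* may be NULL if the type has no comparator */
};

struct xfile {
	const struct xfileops *f_ops;
	void *f_data;		/* underlying object, e.g. a vnode or socket */
};

/* Example comparator: two files match when they share the same object. */
static int
xobj_cmp(const struct xfile *a, const struct xfile *b)
{
	return (a->f_data == b->f_data ? 0 : 1);
}

/*
 * Dispatcher: files of different types never match; otherwise defer to
 * the type's method.  Returning 1 for "different" is an assumption of
 * this sketch, not something the commit message specifies.
 */
static int
xfile_cmp(const struct xfile *a, const struct xfile *b)
{
	assert(a != NULL && b != NULL);
	if (a->f_ops != b->f_ops || a->f_ops->fo_cmp == NULL)
		return (1);
	return (a->f_ops->fo_cmp(a, b));
}
```

Two descriptors that were dup'ed from each other would share `f_data` in this model and compare equal; two independent opens would not.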
#
58d31716 |
|
22-Jan-2024 |
Konstantin Belousov <kib@FreeBSD.org> |
Add fget_remote() The function holds and returns struct file for a file descriptor index in the given process. Reviewed by: brooks, markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D43518
|
#
24fee977 |
|
20-Jan-2024 |
Konstantin Belousov <kib@FreeBSD.org> |
sys/file.h: style Reviewed by: brooks, markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D43518
|
#
55edc40e |
|
04-Jan-2024 |
Mark Johnston <markj@FreeBSD.org> |
file: Remove the fd parameter to fgetvp_lookup() and fgetvp_lookup_smr() The fd is always obtained from nameidata, so just fetch it from there instead. No functional change intended. Reviewed by: kib MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D43257
|
#
29363fb4 |
|
23-Nov-2023 |
Warner Losh <imp@FreeBSD.org> |
sys: Remove ancient SCCS tags. Remove ancient SCCS tags from the tree, automated scripting, with two minor fixups to keep things compiling. All the common forms in the tree were removed with a perl script. Sponsored by: Netflix
|
#
af93fea7 |
|
23-Aug-2023 |
Jake Freeland <jfree@freebsd.org> |
timerfd: Move implementation from linux compat to sys/kern Move the timerfd implementation from linux compat code to sys/kern. Use it to implement the new system calls for timerfd. Add a hook to kern_tc to allow timerfd to know when the system time has stepped. Add kqueue support to timerfd. Adjust a few names to be less Linux centric. RelNotes: YES Reviewed by: markj (on irc), imp, kib (with reservations), jhb (slack) Differential Revision: https://reviews.freebsd.org/D38459
|
#
2ff63af9 |
|
16-Aug-2023 |
Warner Losh <imp@FreeBSD.org> |
sys: Remove $FreeBSD$: one-line .h pattern Remove /^\s*\*+\s*\$FreeBSD\$.*$\n/
|
#
e28932c6 |
|
08-Dec-2022 |
Ka Ho Ng <khng@FreeBSD.org> |
vfs: Add spare fileops function pointer slots This allows backporting of new fileops function pointers while preserving KBI. Bump __FreeBSD_version. Sponsored by: Juniper Networks, Inc. Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D37636
|
#
0c805718 |
|
24-Mar-2022 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: fix memory leak on lookup with fds with ioctl caps Reviewed by: markj PR: 262515 Noted by: firk@cantconnect.ru Differential Revision: https://reviews.freebsd.org/D34667
|
#
a2b53e53 |
|
24-Nov-2021 |
Warner Losh <imp@FreeBSD.org> |
sys/file.h: Allow inclusion when compiling for a strict namespace Although not part of the standard, this file is sometimes included with -D_POSIX_C_SOURCE=<value> or -D_XOPEN_SOURCE=<value>. Limit those structures that use types hidden by __BSD_VISIBLE to when they are visible. PR: 259975, 234205 Sponsored by: Netflix
|
#
2b68eb8e |
|
01-Oct-2021 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: remove thread argument from VOP_STAT and fo_stat.
|
#
0dc332bf |
|
05-Aug-2021 |
Ka Ho Ng <khng@FreeBSD.org> |
Add fspacectl(2), vn_deallocate(9) and VOP_DEALLOCATE(9). fspacectl(2) is a system call to provide space management support to userspace applications. VOP_DEALLOCATE(9) is a VOP call to perform the deallocation. vn_deallocate(9) is a public KPI for kmods' use. The purpose of proposing a new system call, a KPI and a VOP call is to allow bhyve or other hypervisor monitors to emulate the behavior of SCSI UNMAP/NVMe DEALLOCATE on a plain file. fspacectl(2) comprises of cmd and flags parameters to specify the space management operation to be performed. Currently cmd has to be SPACECTL_DEALLOC, and flags has to be 0. fo_fspacectl is added to fileops. VOP_DEALLOCATE(9) is added as a new VOP call. A trivial implementation of VOP_DEALLOCATE(9) is provided. Sponsored by: The FreeBSD Foundation Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D28347
|
#
bbf7a4e8 |
|
07-Apr-2021 |
Konstantin Belousov <kib@FreeBSD.org> |
O_PATH: allow vnode kevent filter on such files if VREAD access is checked as allowed during open Requested by: wulf Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D29323
|
#
8d9ed174 |
|
17-Mar-2021 |
Konstantin Belousov <kib@FreeBSD.org> |
open(2): Implement O_PATH Reviewed by: markj Tested by: pho Discussed with: walker.aj325_gmail.com, wulf Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D29323
|
#
437c241d |
|
17-Mar-2021 |
Konstantin Belousov <kib@FreeBSD.org> |
vfs_vnops.c: Make vn_statfile() non-static Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D29323
|
#
7a202823 |
|
23-Dec-2020 |
Konstantin Belousov <kib@FreeBSD.org> |
Expose eventfd in the native API/ABI using a new __specialfd syscall eventfd is a Linux system call that produces special file descriptors for event notification. When porting Linux software, it is currently usually emulated by epoll-shim on top of kqueues. Unfortunately, kqueues are not passable between processes. And, as noted by the author of epoll-shim, even if they were, the library state would also have to be passed somehow. This came up when debugging strange HW video decode failures in Firefox. A native implementation would avoid these problems and help with porting Linux software. Since we now already have an eventfd implementation in the kernel (for the Linuxulator), it's pretty easy to expose it natively, which is what this patch does. Submitted by: greg@unrelenting.technology Reviewed by: markj (previous version) MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D26668
|
#
edcdcefb |
|
13-Dec-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
fd: fix fdrop prediction when closing a fd Most of the time this is the last reference, contrary to typical fdrop use.
|
#
1463aa8c |
|
18-Nov-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
fd: reorder struct file to reduce false sharing The size on LP64 is 80 bytes, which is just more than a cacheline, does not lend itself to easy shrinking and rounding up to 2 would be a huge waste given NOFREE marker. The least which can be done is to reorder it so that most commonly used fields are less likely to span different lines, and consequently suffer less false sharing. With the change at hand most commonly used fields land in the same line about 3/4 of the time, as opposed to 2/4.
|
#
dd28b379 |
|
09-Oct-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: support lockless dirfd lookups
|
#
96474d2a |
|
15-Sep-2020 |
Konstantin Belousov <kib@FreeBSD.org> |
Do not copy vp into f_data for DTYPE_VNODE files. The pointer to vnode is already stored into f_vnode, so f_data can be reused. Fix all found users of f_data for DTYPE_VNODE. Provide finit_vnode() helper to initialize file of DTYPE_VNODE type. Reviewed by: markj (previous version) Discussed with: freqlabs (openzfs chunk) Tested by: pho (previous version) Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D26346
|
#
551cf5ab |
|
30-Jul-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
fd: predict in fdrop
|
#
f2706588 |
|
21-Jun-2020 |
Thomas Munro <tmunro@FreeBSD.org> |
vfs: track sequential reads and writes separately For software like PostgreSQL and SQLite that sometimes reads sequentially while also writing sequentially some distance behind with interleaved syscalls on the same fd, performance is better on UFS if we do sequential access heuristics separately for reads and writes. Patch originally by Andrew Gierth in 2008, updated and proposed by me with his permission. Reviewed by: mjg, kib, tmunro Approved by: mjg (mentor) Obtained from: Andrew Gierth <andrew@tao11.riddles.org.uk> Differential Revision: https://reviews.freebsd.org/D25024
|
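The idea behind f2706588, keeping one sequential-access state for reads and another for writes, can be sketched as follows. This is a simplified model with hypothetical names (`struct xseq`, `xseq_update`, `XSEQMAX`), not the heuristic the kernel actually uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define XSEQMAX	127	/* assumed cap on the heuristic counter */

struct xseq {
	int64_t	next_off;	/* where the next sequential access would start */
	int	count;		/* length of the current sequential run */
};

/* One state per direction, so interleaved writes do not reset the read run. */
struct xfile {
	struct xseq rd;
	struct xseq wr;
};

/* Record an access of len bytes at off and return the updated run length. */
static int
xseq_update(struct xseq *s, int64_t off, int64_t len)
{
	assert(s != NULL && len >= 0);
	if (off == s->next_off) {
		if (s->count < XSEQMAX)
			s->count++;
	} else {
		s->count = 0;
	}
	s->next_off = off + len;
	return (s->count);
}
```

With a single shared state, a write at a distant offset between two reads would reset the run every time; with two states each stream keeps its own run.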
#
b4ee4ce1 |
|
01-Jun-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
Remove ->f_label from struct file The field was added in r141137 in 2005 and is unused. It avoidably grows a struct which is NOFREE and easily gets hundreds of thousands of instances. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D25036
|
#
3fc40153 |
|
23-May-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: scale foffset_lock by using atomics instead of serializing on mtx pool Contending cases still serialize on sleepq (which would be taken anyway). Reviewed by: kib (previous version) Differential Revision: https://reviews.freebsd.org/D21626
|
#
0f5f49ef |
|
13-Feb-2020 |
Kyle Evans <kevans@FreeBSD.org> |
u_char -> vm_prot_t in a couple of places, NFC The latter is a typedef of the former; the typedef exists and these bits are representing vmprot values, so use the correct type. Submitted by: sigsys@gmail.com MFC after: 3 days
|
#
2856d85e |
|
08-Jan-2020 |
Kyle Evans <kevans@FreeBSD.org> |
posix_fallocate: push vnop implementation into the fileop layer This opens the door for other descriptor types to implement posix_fallocate(2) as needed. Reviewed by: kib, bcr (manpages) Differential Revision: https://reviews.freebsd.org/D23042
|
#
55eb92db |
|
11-Dec-2019 |
Mateusz Guzik <mjg@FreeBSD.org> |
fd: static-ize and devolatile openfiles Almost all access is using atomics. The only read is sysctl which should use a whole-int-at-a-time friendly read internally.
|
#
dfa8dae4 |
|
05-Oct-2019 |
Mateusz Guzik <mjg@FreeBSD.org> |
devfs: plug redundant bwillwrite avoidance vn_write already checks for vnode type to see if bwillwrite should be called. This effectively reverts r244643. Reviewed by: kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21905
|
#
af755d3e |
|
25-Sep-2019 |
Kyle Evans <kevans@FreeBSD.org> |
[1/3] Add mostly Linux-compatible file sealing support File sealing applies protections against certain actions (currently: write, growth, shrink) at the inode level. New fileops are added to accommodate seals - EINVAL is returned by fcntl(2) if they are not implemented. Reviewed by: markj, kib Differential Revision: https://reviews.freebsd.org/D21391
|
#
f1cf2b9d |
|
21-Jul-2019 |
Konstantin Belousov <kib@FreeBSD.org> |
Check and avoid overflow when incrementing fp->f_count in fget_unlocked() and fhold(). On a sufficiently large machine, f_count can be legitimately very large, e.g. malicious code can dup the same fd up to the per-process file descriptor limit, and then fork as much as it can. On a smaller machine, I see kern.maxfilesperproc: 939132 kern.maxprocperuid: 34203 which already overflows u_int. Moreover, the malicious code can create transient references by sending fds over unix sockets. I realized that this check was missing after reading https://secfault-security.com/blog/FreeBSD-SA-1902.fd.html Reviewed by: markj (previous version), mjg Tested by: pho (previous version) Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D20947
|
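An overflow-checked reference acquire of the kind f1cf2b9d describes might look like the sketch below. It uses C11 atomics in userspace; `struct xfile` and `xfhold` are hypothetical names, and the kernel's own primitives are not reproduced here.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical reference-counted object; only the field name mirrors f_count. */
struct xfile {
	atomic_uint f_count;
};

/*
 * Try to take one more reference.  Returns false instead of wrapping
 * the counter when it has already reached UINT_MAX.
 */
static bool
xfhold(struct xfile *fp)
{
	unsigned int old;

	assert(fp != NULL);
	old = atomic_load(&fp->f_count);
	for (;;) {
		if (old == UINT_MAX)
			return (false);
		/* On failure, old is refreshed with the current value. */
		if (atomic_compare_exchange_weak(&fp->f_count, &old, old + 1))
			return (true);
	}
}
```

A caller that gets `false` has to fail the operation rather than use the file, since no reference was taken.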
#
38b06f8a |
|
20-Jun-2019 |
Alan Somers <asomers@FreeBSD.org> |
fcntl: fix overflow when setting F_READAHEAD VOP_READ and VOP_WRITE take the seqcount in blocks in a 16-bit field. However, fcntl allows you to set the seqcount in bytes to any nonnegative 31-bit value. The result can be a 16-bit overflow, which will be sign-extended in functions like ffs_read. Fix this by sanitizing the argument in kern_fcntl. As a matter of policy, limit to IO_SEQMAX rather than INT16_MAX. Also, fifos have overloaded the f_seqcount field for a completely different purpose ever since r238936. Formalize that by using a union type. Reviewed by: cem MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D20710
|
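The sanitizing step from 38b06f8a (a byte count from userspace converted to blocks and capped before it reaches a 16-bit field) can be sketched like this. `XIO_SEQMAX`, its value, and `xreadahead_blocks` are assumptions made for illustration; the real limit is whatever IO_SEQMAX is defined as in the tree.

```c
#include <assert.h>
#include <stdint.h>

#define XIO_SEQMAX	0x7F	/* assumed cap, in blocks */

/*
 * Convert a nonnegative byte count into a block count that is safe to
 * store in a signed 16-bit field.
 */
static int16_t
xreadahead_blocks(int bytes, int blksize)
{
	int blocks;

	assert(bytes >= 0 && blksize > 0);
	/* Round up to whole blocks without overflowing int. */
	blocks = bytes / blksize + (bytes % blksize != 0);
	if (blocks > XIO_SEQMAX)
		blocks = XIO_SEQMAX;
	return ((int16_t)blocks);
}
```

Clamping before the narrowing cast is the point: the cast can no longer produce a negative value that later gets sign-extended.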
#
f38b68ae |
|
05-Jul-2018 |
Brooks Davis <brooks@FreeBSD.org> |
Make struct xinpcb and friends word-size independent. Replace size_t members with ksize_t (uint64_t) and pointer members (never used as pointers in userspace, but instead as unique identifiers) with kvaddr_t (uint64_t). This makes the structs identical between 32-bit and 64-bit ABIs. On 64-bit systems, the ABI is maintained. On 32-bit systems, this is an ABI breaking change. The ABI of most of these structs was previously broken in r315662. This also imposes a small API change on userspace consumers who must handle kernel pointers becoming virtual addresses. PR: 228301 (exp-run by antoine) Reviewed by: jtl, kib, rwatson (various versions) Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D15386
|
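The word-size-independent layout described in f38b68ae can be illustrated with a hypothetical exported structure. `xksize_t`, `xkvaddr_t`, `struct xexport`, and `xexport_addr` are made-up names; the commit message only states that ksize_t and kvaddr_t are uint64_t.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t xksize_t;	/* fixed-width replacement for size_t */
typedef uint64_t xkvaddr_t;	/* kernel address exported as an opaque ID */

/* Hypothetical exported structure; every member has a fixed width. */
struct xexport {
	xksize_t  xe_len;
	xkvaddr_t xe_addr;
	uint32_t  xe_flags;
	uint32_t  xe_spare;	/* explicit padding keeps the size stable */
};

static_assert(sizeof(struct xexport) == 24,
    "layout must not depend on the word size");

/* Export a pointer as a 64-bit identifier rather than as a pointer type. */
static xkvaddr_t
xexport_addr(const void *p)
{
	return ((xkvaddr_t)(uintptr_t)p);
}
```

Consumers treat `xe_addr` purely as an identifier to compare, never as something to dereference.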
#
51369649 |
|
20-Nov-2017 |
Pedro F. Giffuni <pfg@FreeBSD.org> |
sys: further adoption of SPDX licensing ID tags. Mainly focus on files that use BSD 3-Clause license. The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known opensource licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, supersede or replace the license texts. Special thanks to Wind River for providing access to "The Duke of Highlander" tool: an older (2014) run over FreeBSD tree was useful as a starting point.
|
#
fbbd9655 |
|
28-Feb-2017 |
Warner Losh <imp@FreeBSD.org> |
Renumber copyright clause 4 Renumber clause 4 to 3, per what everybody else did when BSD granted them permission to remove clause 3. My insistence on keeping the same numbering for legal reasons is too pedantic, so give up on that point. Submitted by: Jan Schaumann <jschauma@stevens.edu> Pull Request: https://github.com/freebsd/freebsd/pull/96
|
#
dd93b628 |
|
26-Feb-2017 |
Dmitry Chagin <dchagin@FreeBSD.org> |
Implement timerfd family syscalls. MFC after: 1 month
|
#
e83a71c6 |
|
10-Feb-2017 |
Konstantin Belousov <kib@FreeBSD.org> |
Fix r313495. The file type DTYPE_VNODE can be assigned as a fallback if VOP_OPEN() did not initialize the file type. This is a typical code path used by normal file systems. Also, change error returned for inappropriate file type used for O_EXLOCK to EOPNOTSUPP, as declared in the open(2) man page. Reported by: cy, dhw, Iblis Lin <iblis@hs.ntnu.edu.tw> Tested by: dhw Sponsored by: The FreeBSD Foundation MFC after: 13 days
|
#
4fce19da |
|
13-Jan-2017 |
Gleb Smirnoff <glebius@FreeBSD.org> |
Remove deprecated fgetsock() and fputsock().
|
#
9c64cfe5 |
|
29-Mar-2016 |
Gleb Smirnoff <glebius@FreeBSD.org> |
sendfile(2) allows sending extra data from userspace before the file data (headers). Historically the size of the headers was not checked against the socket buffer space. An application could easily overcommit the socket buffer space. With the new sendfile (r293439) the problem remained, but a KASSERT was inserted that checked that the amount of data written to the socket matches its space. When the size of the headers is bigger than the socket space, the KASSERT fires. Without INVARIANTS the new sendfile won't panic, but would report an incorrect number of bytes sent. o With this change, the headers copyin is moved down into the cycle, after the sbspace() check. The uio size is trimmed by socket space there, which fixes the overcommit problem and its consequences. o The compatibility handling for the FreeBSD 4 sendfile headers API is pushed up the stack to the syscall wrappers. This required a copy and paste of the code, but in turn this allowed removing an extra stack-carried parameter from fo_sendfile_t and wrapping the entire compat code in #ifdef. If more fo_sendfile_t functions are added in the future, the copy and paste overhead would shrink even further. Reviewed by: emax, gallatin, Maxim Dounin <mdounin mdounin.ru> Tested by: Vitalij Satanivskij <satan ukr.net> Sponsored by: Netflix
|
#
f3215338 |
|
01-Mar-2016 |
John Baldwin <jhb@FreeBSD.org> |
Refactor the AIO subsystem to permit file-type-specific handling and improve cancellation robustness. Introduce a new file operation, fo_aio_queue, which is responsible for queueing and completing an asynchronous I/O request for a given file. The AIO subsystem now exports a library of routines to manipulate AIO requests as well as the ability to run a handler function in the "default" pool of AIO daemons to service a request. A default implementation for file types which do not include an fo_aio_queue method queues requests to the "default" pool invoking the fo_read or fo_write methods as before. The AIO subsystem permits file types to install a private "cancel" routine when a request is queued to permit safe dequeueing and cleanup of cancelled requests. Sockets now use their own pool of AIO daemons and service per-socket requests in FIFO order. Socket requests will not block indefinitely permitting timely cancellation of all requests. Due to the now-tight coupling of the AIO subsystem with file types, the AIO subsystem is now a standard part of all kernels. The VFS_AIO kernel option and aio.ko module are gone. Many file types may block indefinitely in their fo_read or fo_write callbacks resulting in a hung AIO daemon. This can result in hung user processes (when processes attempt to cancel all outstanding requests during exit) or a hung system. To protect against this, AIO requests are only permitted for known "safe" files by default. AIO requests for all file types can be enabled by setting the new vfs.aio.enable_unsafe sysctl to a non-zero value. The AIO tests have been updated to skip operations on unsafe file types if the sysctl is zero. Currently, AIO requests on sockets and raw disks are considered safe and are enabled by default. aio_mlock() is also enabled by default. Reviewed by: cem, jilles Discussed with: kib (earlier version) Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D5289
|
#
3138cd36 |
|
30-Sep-2015 |
Mark Johnston <markj@FreeBSD.org> |
As a step towards the elimination of PG_CACHED pages, rework the handling of POSIX_FADV_DONTNEED so that it causes the backing pages to be moved to the head of the inactive queue instead of being cached. This affects the implementation of POSIX_FADV_NOREUSE as well, since it works by applying POSIX_FADV_DONTNEED to file ranges after they have been read or written. At that point the corresponding buffers may still be dirty, so the previous implementation would coalesce successive ranges and apply POSIX_FADV_DONTNEED to the result, ensuring that pages backing the dirty buffers would eventually be cached. To preserve this behaviour in an efficient manner, this change adds a new buf flag, B_NOREUSE, which causes the pages backing a VMIO buf to be placed at the head of the inactive queue when the buf is released. POSIX_FADV_NOREUSE then works by setting this flag in bufs that underlie the specified range. Reviewed by: alc, kib Sponsored by: EMC / Isilon Storage Division Differential Revision: https://reviews.freebsd.org/D3726
|
#
7077c426 |
|
04-Jun-2015 |
John Baldwin <jhb@FreeBSD.org> |
Add a new file operations hook for mmap operations. File type-specific logic is now placed in the mmap hook implementation rather than requiring it to be placed in sys/vm/vm_mmap.c. This hook allows new file types to support mmap() as well as potentially allowing mmap() for existing file types that do not currently support any mapping. The vm_mmap() function is now split up into two functions. A new vm_mmap_object() function handles the "back half" of vm_mmap() and accepts a referenced VM object to map rather than a (handle, handle_type) tuple. vm_mmap() is now reduced to converting a (handle, handle_type) tuple to a VM object and then calling vm_mmap_object() to handle the actual mapping. The vm_mmap() function remains for use by other parts of the kernel (e.g. device drivers and exec) but now only supports mapping vnodes, character devices, and anonymous memory. The mmap() system call invokes vm_mmap_object() directly with a NULL object for anonymous mappings. For mappings using a file descriptor, the descriptor's fo_mmap() hook is invoked instead. The fo_mmap() hook is responsible for performing type-specific checks and adjustments to arguments as well as possibly modifying mapping parameters such as flags or the object offset. The fo_mmap() hook routines then call vm_mmap_object() to handle the actual mapping. The fo_mmap() hook is optional. If it is not set, then fo_mmap() will fail with ENODEV. A fo_mmap() hook is implemented for regular files, character devices, and shared memory objects (created via shm_open()). While here, consistently use the VM_PROT_* constants for the vm_prot_t type for the 'prot' variable passed to vm_mmap() and vm_mmap_object() as well as the vm_mmap_vnode() and vm_mmap_cdev() helper routines. Previously some places were using the mmap()-specific PROT_* constants instead. While this happens to work because PROT_xx == VM_PROT_xx, using VM_PROT_* is more correct. 
Differential Revision: https://reviews.freebsd.org/D2658 Reviewed by: alc (glanced over), kib MFC after: 1 month Sponsored by: Chelsio
|
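The "optional hook, ENODEV when unset" behaviour described in 7077c426 can be sketched as a small dispatcher. The types and names below (`struct xfile`, `struct xfileops`, `xfo_mmap`, `xshm_mmap`) and the simplified argument list are hypothetical; the real hook takes more parameters.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct xfile;

/* Simplified hook signature; the real fo_mmap takes more arguments. */
typedef int xfo_mmap_t(struct xfile *fp, size_t len, int prot);

struct xfileops {
	xfo_mmap_t *fo_mmap;	/* NULL when the file type cannot be mapped */
};

struct xfile {
	const struct xfileops *f_ops;
};

/* Dispatcher: file types without a hook fail with ENODEV. */
static int
xfo_mmap(struct xfile *fp, size_t len, int prot)
{
	assert(fp != NULL && fp->f_ops != NULL);
	if (fp->f_ops->fo_mmap == NULL)
		return (ENODEV);
	return (fp->f_ops->fo_mmap(fp, len, prot));
}

/* Example hook: performs a type-specific check before "mapping". */
static int
xshm_mmap(struct xfile *fp, size_t len, int prot)
{
	(void)fp;
	(void)prot;
	return (len == 0 ? EINVAL : 0);
}
```

Keeping the NULL check in one dispatcher means a new file type opts in to mmap() simply by filling in the slot.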
#
a31d7686 |
|
24-May-2015 |
Dmitry Chagin <dchagin@FreeBSD.org> |
Implement eventfd system call. Differential Revision: https://reviews.freebsd.org/D1094 In collaboration with: Jilles Tjoelker
|
#
b7a39e9e |
|
17-Feb-2015 |
Mateusz Guzik <mjg@FreeBSD.org> |
filedesc: simplify fget_unlocked & friends Introduce fget_fcntl which performs appropriate checks when needed. This removes a branch from fget_unlocked. Introduce fget_mmap dealing with cap_rights_to_vmprot conversion. This removes a branch from _fget. Modify fget_unlocked to pass sequence counter to interested callers so that they can perform their own checks and make sure the result was obtained from stable & current state. Reviewed by: silence on -hackers
|
#
0e87b36e |
|
11-Nov-2014 |
Gleb Smirnoff <glebius@FreeBSD.org> |
Remove SF_KQUEUE code. This code was developed at Netflix, but was never used. It didn't go into stable/10, nor was it documented. It might be useful, but we collectively decided to remove it, rather than leave it abandoned and unmaintained. It is removed in one single commit, so restoring it should be easy, if anyone wants to reopen this idea. Sponsored by: Netflix
|
#
9696feeb |
|
22-Sep-2014 |
John Baldwin <jhb@FreeBSD.org> |
Add a new fo_fill_kinfo fileops method to add type-specific information to struct kinfo_file. - Move the various fill_*_info() methods out of kern_descrip.c and into the various file type implementations. - Rework the support for kinfo_ofile to generate a suitable kinfo_file object for each file and then convert that to a kinfo_ofile structure rather than keeping a second, different set of code that directly manipulates type-specific file information. - Remove the shm_path() and ksem_info() layering violations. Differential Revision: https://reviews.freebsd.org/D775 Reviewed by: kib, glebius (earlier version)
|
#
2d69d0dc |
|
12-Sep-2014 |
John Baldwin <jhb@FreeBSD.org> |
Fix various issues with invalid file operations: - Add invfo_rdwr() (for read and write), invfo_ioctl(), invfo_poll(), and invfo_kqfilter() for use by file types that do not support the respective operations. Home-grown versions of invfo_poll() were universally broken (they returned an errno value, invfo_poll() uses poll_no_poll() to return an appropriate event mask). Home-grown ioctl routines also tended to return an incorrect errno (invfo_ioctl returns ENOTTY). - Use the invfo_*() functions instead of local versions for unsupported file operations. - Reorder fileops members to match the order in the structure definition to make it easier to spot missing members. - Add several missing methods to linuxfileops used by the OFED shim layer: fo_write(), fo_truncate(), fo_kqfilter(), and fo_stat(). Most of these used invfo_*(), but a dummy fo_stat() implementation was added.
|
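The distinction 2d69d0dc draws between the two kinds of "unsupported" stubs, an errno for ioctl versus an event mask for poll, can be sketched in userspace. `xinvfo_ioctl` and `xinvfo_poll` are hypothetical, and the poll stub is a simplification that does not reproduce the kernel's poll_no_poll() exactly.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>

/* Unsupported ioctl: report ENOTTY, the conventional errno for this case. */
static int
xinvfo_ioctl(unsigned long cmd)
{
	(void)cmd;
	return (ENOTTY);
}

/*
 * Unsupported poll: the return value is an event mask, not an errno.
 * This simplified stub reports the requested read/write events as
 * ready and drops everything else.
 */
static int
xinvfo_poll(int events)
{
	assert(events >= 0);
	return (events & (POLLIN | POLLOUT));
}
```

Returning an errno from a poll method is the bug the commit mentions: the caller would interpret the number as a set of ready events.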
#
e86447ca |
|
26-Aug-2014 |
Gleb Smirnoff <glebius@FreeBSD.org> |
- Remove socket file operations declaration from sys/file.h. - Make them static in sys_socket.c. - Provide generic invfo_truncate() instead of soo_truncate(). Sponsored by: Netflix Sponsored by: Nginx, Inc.
|
#
037755fd |
|
26-Aug-2014 |
Mateusz Guzik <mjg@FreeBSD.org> |
Fix up races with f_seqcount handling. It was possible that the kernel would overwrite user-supplied hint. Abuse vnode lock for this purpose. In collaboration with: kib MFC after: 1 week
|
#
79750e3b |
|
30-Nov-2013 |
Adrian Chadd <adrian@FreeBSD.org> |
Migrate the sendfile_sync structure into a public(ish) API in preparation for extending and reusing it. The sendfile_sync wrapper is mostly just a "mbuf transaction" wrapper, used to indicate that the backing store for a group of mbufs has completed. It's only being used by sendfile for now and it's only implementing a sleep/wakeup rendezvous. However, there are other potential signaling paths (kqueue) and other potential uses (socket zero-copy write) where the same mechanism would also be useful. So, with that in mind: * extract the sendfile_sync code out into sf_sync_*() methods * teach the sf_sync_alloc method about the current config flag - it will eventually know about kqueue. * move the sendfile_sync code out of do_sendfile() - the only thing it now knows about is the sfs pointer. The guts of the sync rendezvous (setup, rendezvous/wait, free) is now done in the syscall wrapper. * .. and teach the 32-bit compat sendfile call the same. This should be a no-op. It's primarily preparation work for teaching the sendfile_sync about kqueue notification. Tested: * Peter Holm's sendfile stress / regression scripts Sponsored by: Netflix, Inc.
|
#
b12698e1 |
|
18-Sep-2013 |
Roman Divacky <rdivacky@FreeBSD.org> |
Revert r255672, it has some serious flaws, leaking file references etc. Approved by: re (delphij)
|
#
253c75c0 |
|
18-Sep-2013 |
Roman Divacky <rdivacky@FreeBSD.org> |
Implement epoll support in Linuxulator. This is a tiny wrapper around kqueue to implement epoll subset of functionality. The kqueue user data are 32bit on i386 which is not enough for epoll user data so this patch overrides kqueue fileops to maintain enough space in struct file. Initial patch developed by me in 2007 and then extended and finished by Yuri Victorovich. Approved by: re (delphij) Sponsored by: Google Summer of Code Submitted by: Yuri Victorovich <yuri at rawbw dot com> Tested by: Yuri Victorovich <yuri at rawbw dot com>
|
#
7008be5b |
|
04-Sep-2013 |
Pawel Jakub Dawidek <pjd@FreeBSD.org> |
Change the cap_rights_t type from uint64_t to a structure that we can extend in the future in a backward compatible (API and ABI) way. The cap_rights_t represents capability rights. We used to use one bit to represent one right, but we are running out of spare bits. Currently the new structure provides place for 114 rights (so 50 more than the previous cap_rights_t), but it is possible to grow the structure to hold at least 285 rights, although we can make it even larger if 285 rights won't be enough. The structure definition looks like this: struct cap_rights { uint64_t cr_rights[CAP_RIGHTS_VERSION + 2]; }; The initial CAP_RIGHTS_VERSION is 0. The top two bits in the first element of the cr_rights[] array contain total number of elements in the array - 2. This means if those two bits are equal to 0, we have 2 array elements. The top two bits in all remaining array elements should be 0. The next five bits in all array elements contain array index. Only one bit is used and bit position in this five-bits range defines array index. This means there can be at most five array elements in the future. To define new right the CAPRIGHT() macro must be used. The macro takes two arguments - an array index and a bit to set, eg. 
#define CAP_PDKILL CAPRIGHT(1, 0x0000000000000800ULL) We still support aliases that combine a few rights, but the rights have to belong to the same array element, eg: #define CAP_LOOKUP CAPRIGHT(0, 0x0000000000000400ULL) #define CAP_FCHMOD CAPRIGHT(0, 0x0000000000002000ULL) #define CAP_FCHMODAT (CAP_FCHMOD | CAP_LOOKUP) There is new API to manage the new cap_rights_t structure: cap_rights_t *cap_rights_init(cap_rights_t *rights, ...); void cap_rights_set(cap_rights_t *rights, ...); void cap_rights_clear(cap_rights_t *rights, ...); bool cap_rights_is_set(const cap_rights_t *rights, ...); bool cap_rights_is_valid(const cap_rights_t *rights); void cap_rights_merge(cap_rights_t *dst, const cap_rights_t *src); void cap_rights_remove(cap_rights_t *dst, const cap_rights_t *src); bool cap_rights_contains(const cap_rights_t *big, const cap_rights_t *little); Capability rights to the cap_rights_init(), cap_rights_set(), cap_rights_clear() and cap_rights_is_set() functions are provided by separating them with commas, eg: cap_rights_t rights; cap_rights_init(&rights, CAP_READ, CAP_WRITE, CAP_FSTAT); There is no need to terminate the list of rights, as those functions are actually macros that take care of the termination, eg: #define cap_rights_set(rights, ...) \ __cap_rights_set((rights), __VA_ARGS__, 0ULL) void __cap_rights_set(cap_rights_t *rights, ...); Thanks to using one bit as an array index we can assert in those functions that there are no two rights belonging to different array elements provided together. For example this is illegal and will be detected, because CAP_LOOKUP belongs to element 0 and CAP_PDKILL to element 1: cap_rights_init(&rights, CAP_LOOKUP | CAP_PDKILL); Providing several rights that belong to the same array element this way is correct, but is not advised. It should only be used for alias definitions. This commit also breaks compatibility with some existing Capsicum system calls, but I see no other way to do that. 
This should be fine as Capsicum is still experimental and this change is not going to 9.x. Sponsored by: The FreeBSD Foundation
|
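The encoding described in 7008be5b (two top bits reserved, then five bits holding a one-hot array index) can be sketched as below. The `X`-prefixed macros and `xright_to_index` are hypothetical; the shift of 57 is inferred from the "top two bits, next five bits" description, and the three right values are the ones quoted in the commit message.

```c
#include <assert.h>
#include <stdint.h>

/* Bits 62..63 are reserved; bits 57..61 hold a one-hot array index. */
#define XCAPRIGHT(idx, bit)	((1ULL << (57 + (idx))) | (bit))
#define XCAPIDXMASK		(0x1FULL << 57)

#define XCAP_LOOKUP	XCAPRIGHT(0, 0x0000000000000400ULL)
#define XCAP_FCHMOD	XCAPRIGHT(0, 0x0000000000002000ULL)
#define XCAP_PDKILL	XCAPRIGHT(1, 0x0000000000000800ULL)

/*
 * Return the array element a right (or an OR of rights) belongs to,
 * or -1 when the index bits are not one-hot, i.e. when rights from
 * different array elements were combined.
 */
static int
xright_to_index(uint64_t right)
{
	uint64_t idx;
	int i;

	idx = (right & XCAPIDXMASK) >> 57;
	for (i = 0; i < 5; i++) {
		if (idx == (1ULL << i))
			return (i);
	}
	return (-1);
}
```

OR-ing `XCAP_LOOKUP` with `XCAP_PDKILL` sets two index bits at once, which is exactly what lets the mixed case be detected.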
#
c0a46535 |
|
21-Aug-2013 |
Konstantin Belousov <kib@FreeBSD.org> |
Make the seek a method of the struct fileops. Tested by: pho Sponsored by: The FreeBSD Foundation
|
#
b1dd38f4 |
|
16-Aug-2013 |
Konstantin Belousov <kib@FreeBSD.org> |
Restore the previous sendfile(2) behaviour on the block devices. Provide valid .fo_sendfile method for several missed struct fileops. Reviewed by: glebius Sponsored by: The FreeBSD Foundation
|
#
ca04d21d |
|
15-Aug-2013 |
Gleb Smirnoff <glebius@FreeBSD.org> |
Make sendfile() a method in the struct fileops. Currently only vnode backed file descriptors have this method implemented. Reviewed by: kib Sponsored by: Nginx, Inc. Sponsored by: Netflix
|
#
2609222a |
|
01-Mar-2013 |
Pawel Jakub Dawidek <pjd@FreeBSD.org> |
Merge Capsicum overhaul: - Capability is no longer separate descriptor type. Now every descriptor has set of its own capability rights. - The cap_new(2) system call is left, but it is no longer documented and should not be used in new code. - The new syscall cap_rights_limit(2) should be used instead of cap_new(2), which limits capability rights of the given descriptor without creating a new one. - The cap_getrights(2) syscall is renamed to cap_rights_get(2). - If CAP_IOCTL capability right is present we can further reduce allowed ioctls list with the new cap_ioctls_limit(2) syscall. List of allowed ioctls can be retrieved with cap_ioctls_get(2) syscall. - If CAP_FCNTL capability right is present we can further reduce fcntls that can be used with the new cap_fcntls_limit(2) syscall and retrieve them with cap_fcntls_get(2). - To support ioctl and fcntl white-listing the filedesc structure was heavily modified. - The audit subsystem, kdump and procstat tools were updated to recognize new syscalls. - Capability rights were revised and even though I tried hard to provide backward API and ABI compatibility there are some incompatible changes that are described in detail below: CAP_CREATE old behaviour: - Allow for openat(2)+O_CREAT. - Allow for linkat(2). - Allow for symlinkat(2). CAP_CREATE new behaviour: - Allow for openat(2)+O_CREAT. Added CAP_LINKAT: - Allow for linkat(2). ABI: Reuses CAP_RMDIR bit. - Allow to be target for renameat(2). Added CAP_SYMLINKAT: - Allow for symlinkat(2). Removed CAP_DELETE. Old behaviour: - Allow for unlinkat(2) when removing non-directory object. - Allow to be source for renameat(2). Removed CAP_RMDIR. Old behaviour: - Allow for unlinkat(2) when removing directory. Added CAP_RENAMEAT: - Required for source directory for the renameat(2) syscall. Added CAP_UNLINKAT (effectively it replaces CAP_DELETE and CAP_RMDIR): - Allow for unlinkat(2) on any object. - Required if target of renameat(2) exists and will be removed by this call. 
Removed CAP_MAPEXEC. CAP_MMAP old behaviour: - Allow for mmap(2) with any combination of PROT_NONE, PROT_READ and PROT_WRITE. CAP_MMAP new behaviour: - Allow for mmap(2)+PROT_NONE. Added CAP_MMAP_R: - Allow for mmap(PROT_READ). Added CAP_MMAP_W: - Allow for mmap(PROT_WRITE). Added CAP_MMAP_X: - Allow for mmap(PROT_EXEC). Added CAP_MMAP_RW: - Allow for mmap(PROT_READ | PROT_WRITE). Added CAP_MMAP_RX: - Allow for mmap(PROT_READ | PROT_EXEC). Added CAP_MMAP_WX: - Allow for mmap(PROT_WRITE | PROT_EXEC). Added CAP_MMAP_RWX: - Allow for mmap(PROT_READ | PROT_WRITE | PROT_EXEC). Renamed CAP_MKDIR to CAP_MKDIRAT. Renamed CAP_MKFIFO to CAP_MKFIFOAT. Renamed CAP_MKNODE to CAP_MKNODEAT. CAP_READ old behaviour: - Allow pread(2). - Disallow read(2), readv(2) (if there is no CAP_SEEK). CAP_READ new behaviour: - Allow read(2), readv(2). - Disallow pread(2) (CAP_SEEK was also required). CAP_WRITE old behaviour: - Allow pwrite(2). - Disallow write(2), writev(2) (if there is no CAP_SEEK). CAP_WRITE new behaviour: - Allow write(2), writev(2). - Disallow pwrite(2) (CAP_SEEK was also required). 
Added convinient defines: #define CAP_PREAD (CAP_SEEK | CAP_READ) #define CAP_PWRITE (CAP_SEEK | CAP_WRITE) #define CAP_MMAP_R (CAP_MMAP | CAP_SEEK | CAP_READ) #define CAP_MMAP_W (CAP_MMAP | CAP_SEEK | CAP_WRITE) #define CAP_MMAP_X (CAP_MMAP | CAP_SEEK | 0x0000000000000008ULL) #define CAP_MMAP_RW (CAP_MMAP_R | CAP_MMAP_W) #define CAP_MMAP_RX (CAP_MMAP_R | CAP_MMAP_X) #define CAP_MMAP_WX (CAP_MMAP_W | CAP_MMAP_X) #define CAP_MMAP_RWX (CAP_MMAP_R | CAP_MMAP_W | CAP_MMAP_X) #define CAP_RECV CAP_READ #define CAP_SEND CAP_WRITE #define CAP_SOCK_CLIENT \ (CAP_CONNECT | CAP_GETPEERNAME | CAP_GETSOCKNAME | CAP_GETSOCKOPT | \ CAP_PEELOFF | CAP_RECV | CAP_SEND | CAP_SETSOCKOPT | CAP_SHUTDOWN) #define CAP_SOCK_SERVER \ (CAP_ACCEPT | CAP_BIND | CAP_GETPEERNAME | CAP_GETSOCKNAME | \ CAP_GETSOCKOPT | CAP_LISTEN | CAP_PEELOFF | CAP_RECV | CAP_SEND | \ CAP_SETSOCKOPT | CAP_SHUTDOWN) Added defines for backward API compatibility: #define CAP_MAPEXEC CAP_MMAP_X #define CAP_DELETE CAP_UNLINKAT #define CAP_MKDIR CAP_MKDIRAT #define CAP_RMDIR CAP_UNLINKAT #define CAP_MKFIFO CAP_MKFIFOAT #define CAP_MKNOD CAP_MKNODAT #define CAP_SOCK_ALL (CAP_SOCK_CLIENT | CAP_SOCK_SERVER) Sponsored by: The FreeBSD Foundation Reviewed by: Christoph Mallon <christoph.mallon@gmx.de> Many aspects discussed with: rwatson, benl, jonathan ABI compatibility discussed with: kib
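
The convenience defines above compose rights as bit masks, so "CAP_READ no longer allows pread(2)" amounts to a subset test against CAP_PREAD. A minimal sketch of that idea follows. The `SK_` names, the individual bit values and the helper `sk_rights_allow()` are assumptions made for illustration; they are not the kernel's definitions.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t sk_rights_t;

/* Placeholder bit values, chosen for illustration only. */
#define SK_CAP_READ	0x0000000000000001ULL
#define SK_CAP_WRITE	0x0000000000000002ULL
#define SK_CAP_SEEK	0x0000000000000004ULL
#define SK_CAP_MMAP	0x0000000000000010ULL

/* Composite rights, mirroring the shape of the defines in the message. */
#define SK_CAP_PREAD	(SK_CAP_SEEK | SK_CAP_READ)
#define SK_CAP_PWRITE	(SK_CAP_SEEK | SK_CAP_WRITE)
#define SK_CAP_MMAP_R	(SK_CAP_MMAP | SK_CAP_SEEK | SK_CAP_READ)

/* Hypothetical helper: allowed only if every needed bit is held. */
static bool
sk_rights_allow(sk_rights_t have, sk_rights_t need)
{
	return ((have & need) == need);
}
```

Under these assumptions a descriptor holding only the read bit passes a check for read(2) but fails one for pread(2), which also needs the seek bit.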
|
#
ad9789f6 |
|
23-Dec-2012 |
Konstantin Belousov <kib@FreeBSD.org> |
Do not force a writer to the devfs file to drain the buffer writes. Requested and tested by: Ian Lepore <freebsd@damnhippie.dyndns.org> MFC after: 2 weeks
|
#
28a7f607 |
|
07-Jul-2012 |
Mateusz Guzik <mjg@FreeBSD.org> |
Unbreak handling of descriptors opened with O_EXEC by fexecve(2). While here return EBADF for descriptors opened for writing (previously it was ETXTBSY). Add fgetvp_exec function which performs appropriate checks. PR: kern/169651 In collaboration with: kib Approved by: trasz (mentor) MFC after: 1 week
|
#
c5c1199c |
|
02-Jul-2012 |
Konstantin Belousov <kib@FreeBSD.org> |
Extend the KPI to lock and unlock f_offset member of struct file. It now fully encapsulates all accesses to f_offset, and extends f_offset locking to other consumers that need it, in particular, to lseek() and variants of getdirentries(). Ensure that on 32bit architectures f_offset, which is a 64bit quantity, is always read and written under the mtxpool protection. This fixes an apparently easy to trigger race where parallel lseek()s, or lseek() and read/write, could destroy the file offset. The already broken ABI emulations, including iBCS and SysV, are not converted (yet). Tested by: pho No objections from: jhb MFC after: 3 weeks
|
#
cd4ecf3c |
|
19-Jun-2012 |
John Baldwin <jhb@FreeBSD.org> |
Further refine the implementation of POSIX_FADV_NOREUSE. First, extend the changes in r230782 to better handle the common case of using NOREUSE with sequential reads. A NOREUSE file descriptor will now track the last implicit DONTNEED request it made as a result of a NOREUSE read. If a subsequent NOREUSE read is adjacent to the previous range, it will apply the DONTNEED request to the entire range of both the previous read and the current read. The effect is that each read of a file accessed sequentially will apply the DONTNEED request to the entire range that has been read. This allows NOREUSE to properly handle misaligned reads by flushing each buffer to cache once it has been completely read. Second, apply the same changes made to read(2) by r230782 and this change to writes. This provides much better performance in the sequential write case as it allows writes to still be clustered. It also provides much better performance for misaligned writes. It does mean that NOREUSE will be generally ineffective for non-sequential writes as the current implementation relies on a future NOREUSE write's implicit DONTNEED request to flush the dirty buffer from the current write. MFC after: 2 weeks
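
The range tracking described above can be sketched as follows. This is an illustration under assumptions, not the kernel code: `struct sk_noreuse` and `sk_noreuse_read()` are hypothetical names, and only the forward-sequential case (a read that starts exactly where the previous range ended) is treated as adjacent.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-descriptor state: the last implicit DONTNEED range. */
struct sk_noreuse {
	bool	valid;
	int64_t	start;	/* inclusive */
	int64_t	end;	/* exclusive */
};

/*
 * Record a NOREUSE read of [off, off + len) and report the range the
 * implicit DONTNEED request should cover.
 */
static void
sk_noreuse_read(struct sk_noreuse *nr, int64_t off, int64_t len,
    int64_t *dn_start, int64_t *dn_end)
{
	int64_t start = off;
	int64_t end = off + len;

	/* Adjacent to the previous range: cover both ranges at once. */
	if (nr->valid && nr->end == off)
		start = nr->start;

	nr->valid = true;
	nr->start = start;
	nr->end = end;
	*dn_start = start;
	*dn_end = end;
}
```

With this bookkeeping a run of sequential reads keeps widening the DONTNEED range, while a read elsewhere in the file starts a fresh range.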
|
#
8f80f103 |
|
06-Nov-2011 |
Ed Schouten <ed@FreeBSD.org> |
Remove MALLOC_DECLAREs of nonexisting malloc-pools. After careful grepping, it seems none of these pools can be found in our source tree. They are not in use, nor are they defined.
|
#
936c09ac |
|
03-Nov-2011 |
John Baldwin <jhb@FreeBSD.org> |
Add the posix_fadvise(2) system call. It is somewhat similar to madvise(2) except that it operates on a file descriptor instead of a memory region. It is currently only supported on regular files. Just as with madvise(2), the advice given to posix_fadvise(2) can be divided into two types. The first type provide hints about data access patterns and are used in the file read and write routines to modify the I/O flags passed down to VOP_READ() and VOP_WRITE(). These modes are thus filesystem independent. Note that to ease implementation (and since this API is only advisory anyway), only a single non-normal range is allowed per file descriptor. The second type of hints are used to hint to the OS that data will or will not be used. These hints are implemented via a new VOP_ADVISE(). A default implementation is provided which does nothing for the WILLNEED request and attempts to move any clean pages to the cache page queue for the DONTNEED request. This latter case required two other changes. First, a new V_CLEANONLY flag was added to vinvalbuf(). This requests vinvalbuf() to only flush clean buffers for the vnode from the buffer cache and to not remove any backing pages from the vnode. This is used to ensure clean pages are not wired into the buffer cache before attempting to move them to the cache page queue. The second change adds a new vm_object_page_cache() method. This method is somewhat similar to vm_object_page_remove() except that instead of freeing each page in the specified range, it attempts to move clean pages to the cache queue if possible. To preserve the ABI of struct file, the f_cdevpriv pointer is now reused in a union to point to the currently active advice region if one is present for regular files. Reviewed by: jilles, kib, arch@ Approved by: re (kib) MFC after: 1 month
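
From userland, the first type of advice (an access-pattern hint) might be given as in the sketch below. `posix_fadvise()` and `POSIX_FADV_SEQUENTIAL` come from POSIX; the wrapper name `sk_advise_sequential()` is made up for this example. A length of 0 means "to the end of the file".

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>

/*
 * Hint that the whole file behind fd will be read sequentially.
 * Returns 0 on success or an error number; posix_fadvise() reports
 * errors through its return value rather than errno.
 */
static int
sk_advise_sequential(int fd)
{
	return (posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL));
}
```

Because the commit allows only a single non-normal range per file descriptor, a later call with different advice should be expected to replace this hint rather than stack with it.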
|
#
cfb5f768 |
|
18-Aug-2011 |
Jonathan Anderson <jonathan@FreeBSD.org> |
Add experimental support for process descriptors A "process descriptor" file descriptor is used to manage processes without using the PID namespace. This is required for Capsicum's Capability Mode, where the PID namespace is unavailable. New system calls pdfork(2) and pdkill(2) offer the functional equivalents of fork(2) and kill(2). pdgetpid(2) allows querying the PID of the remote process for debugging purposes. The currently-unimplemented pdwait(2) will, in the future, allow querying rusage/exit status. In the interim, poll(2) may be used to check (and wait for) process termination. When a process is referenced by a process descriptor, it does not issue SIGCHLD to the parent, making it suitable for use in libraries---a common scenario when using library compartmentalisation from within large applications (such as web browsers). Some observers may note a similarity to Mach task ports; process descriptors provide a subset of this behaviour, but in a UNIX style. This feature is enabled by "options PROCDESC", but as with several other Capsicum kernel features, is not enabled by default in GENERIC 9.0. Reviewed by: jhb, kib Approved by: re (kib), mentor (rwatson) Sponsored by: Google Inc
|
#
9c00bb91 |
|
16-Aug-2011 |
Konstantin Belousov <kib@FreeBSD.org> |
Add the fo_chown and fo_chmod methods to struct fileops and use them to implement fchown(2) and fchmod(2) support for several file types that previously lacked it. Add MAC entries for chown/chmod done on posix shared memory and (old) in-kernel posix semaphores. Based on the submission by: glebius Reviewed by: rwatson Approved by: re (bz)
|
#
a9d2f8d8 |
|
10-Aug-2011 |
Robert Watson <rwatson@FreeBSD.org> |
Second-to-last commit implementing Capsicum capabilities in the FreeBSD kernel for FreeBSD 9.0: Add a new capability mask argument to fget(9) and friends, allowing system call code to declare what capabilities are required when an integer file descriptor is converted into an in-kernel struct file *. With options CAPABILITIES compiled into the kernel, this enforces capability protection; without, this change is effectively a no-op. Some cases require special handling, such as mmap(2), which must preserve information about the maximum rights at the time of mapping in the memory map so that they can later be enforced in mprotect(2) -- this is done by narrowing the rights in the existing max_protection field used for similar purposes with file permissions. In namei(9), we assert that the code is not reached from within capability mode, as we're not yet ready to enforce namespace capabilities there. This will follow in a later commit. Update two capability names: CAP_EVENT and CAP_KEVENT become CAP_POST_KEVENT and CAP_POLL_KEVENT to more accurately indicate what they represent. Approved by: re (bz) Submitted by: jonathan Sponsored by: Google Inc
|
#
9acdfe65 |
|
05-Jul-2011 |
Jonathan Anderson <jonathan@FreeBSD.org> |
Rework _fget to accept capability parameters. This new version of _fget() requires new parameters:
- cap_rights_t needrights: the rights that we expect the capability's rights mask to include (e.g. CAP_READ if we are going to read from the file)
- cap_rights_t *haverights: used to return the capability's rights mask (ignored if NULL)
- u_char *maxprotp: the maximum mmap() rights (e.g. VM_PROT_READ) that can be permitted (only used if we are going to mmap the file; ignored if NULL)
- int fget_flags: FGET_GETCAP if we want to return the capability itself, rather than the underlying object which it wraps
Approved by: mentor (rwatson), re (Capsicum blanket) Sponsored by: Google Inc
|
#
07b1b594 |
|
30-Jun-2011 |
Jonathan Anderson <jonathan@FreeBSD.org> |
Define cap_rights_t and DTYPE_CAPABILITY, which are required to implement Capsicum capabilities. Approved by: mentor (rwatson), re (bz)
|
#
e4cd31dd |
|
21-Mar-2011 |
Jeff Roberson <jeff@FreeBSD.org> |
- Merge changes to the base system to support OFED. These include a wider arg2 for sysctl, updates to vlan code, IFT_INFINIBAND, and other miscellaneous small features.
|
#
a7d5f7eb |
|
19-Oct-2010 |
Jamie Gritton <jamie@FreeBSD.org> |
A new jail(8) with a configuration file, to replace the work currently done by /etc/rc.d/jail.
|
#
bc8e0a9a |
|
11-Jun-2010 |
Jung-uk Kim <jkim@FreeBSD.org> |
Apply band-aid around function-like macro fdrop() without turning it into a real (inline) function or applying void casting for all its consumers. In most places, the "return value" is neither checked nor assigned, which causes too many warnings for some smart compilers, e.g., clang. Found by: clang
|
#
01d716fc |
|
28-Dec-2009 |
Ed Schouten <ed@FreeBSD.org> |
Use ANSI declarations instead of K&R.
|
#
c2f74a45 |
|
31-Dec-2008 |
David E. O'Brien <obrien@FreeBSD.org> |
style(9) Verified with: svn diff -x -Bbw file.h showing empty diff
|
#
d7f03759 |
|
19-Oct-2008 |
Ulf Lilleengen <lulf@FreeBSD.org> |
- Import the HEAD csup code which is the basis for the cvsmode work.
|
#
bc093719 |
|
20-Aug-2008 |
Ed Schouten <ed@FreeBSD.org> |
Integrate the new MPSAFE TTY layer to the FreeBSD operating system. The last half year I've been working on a replacement TTY layer for the FreeBSD kernel. The new TTY layer was designed to improve the following: - Improved driver model: The old TTY layer has a driver model that is not abstract enough to make it friendly to use. A good example is the output path, where the device drivers directly access the output buffers. This means that an in-kernel PPP implementation must always convert network buffers into TTY buffers. If a PPP implementation would be built on top of the new TTY layer (still needs a hooks layer, though), it would allow the PPP implementation to directly hand the data to the TTY driver. - Improved hotplugging: With the old TTY layer, it isn't entirely safe to destroy TTY's from the system. This implementation has a two-step destructing design, where the driver first abandons the TTY. After all threads have left the TTY, the TTY layer calls a routine in the driver, which can be used to free resources (unit numbers, etc). The pts(4) driver also implements this feature, which means posix_openpt() will now return PTY's that are created on the fly. - Improved performance: One of the major improvements is the per-TTY mutex, which is expected to improve scalability when compared to the old Giant locking. Another change is the unbuffered copying to userspace, which is both used on TTY device nodes and PTY masters. Upgrading should be quite straightforward. Unlike previous versions, existing kernel configuration files do not need to be changed, except when they reference device drivers that are listed in UPDATING. Obtained from: //depot/projects/mpsafetty/... Approved by: philip (ex-mentor) Discussed: on the lists, at BSDCan, at the DevSummit Sponsored by: Snow B.V., the Netherlands dcons(4) fixed by: kan
|
#
6bc1e9cd |
|
26-Jun-2008 |
John Baldwin <jhb@FreeBSD.org> |
Rework the lifetime management of the kernel implementation of POSIX semaphores. Specifically, semaphores are now represented as a new file descriptor type that is set to close on exec. This removes the need for all of the manual process reference counting (and fork, exec, and exit event handlers) as the normal file descriptor operations handle all of that for us nicely. It is also suggested as one possible implementation in the spec and at least one other OS (OS X) uses this approach.

Some bugs that were fixed as a result include:
- References to a named semaphore whose name is removed still work after the sem_unlink() operation. Prior to this patch, if a semaphore's name was removed, valid handles from sem_open() would get EINVAL errors from sem_getvalue(), sem_post(), etc. This fixes that.
- Unnamed semaphores created with sem_init() were not cleaned up when a process exited or exec'd. They were only cleaned up if the process did an explicit sem_destroy(). This could result in a leak of semaphore objects that could never be cleaned up.
- On the other hand, if another process guessed the id (the kernel pointer to 'struct ksem') of an unnamed semaphore (created via sem_init()) and had write access to the semaphore based on UID/GID checks, then that other process could manipulate the semaphore via sem_destroy(), sem_post(), sem_wait(), etc.
- As part of the permission check (UID/GID), the umask of the process creating the semaphore was not honored. Thus if your umask denied group read/write access but the explicit mode in the sem_init() call allowed it, the semaphore would be readable/writable by other users in the same group, for example. This includes access via the previous bug.
- If the module refused to unload because there were active semaphores, then it might have deregistered one or more of the semaphore system calls before it noticed that there was a problem. I'm not sure if this actually happened as the order that modules are discovered by the kernel linker depends on how the actual .ko file is linked. One can make the order deterministic by using a single module with a mod_event handler that explicitly registers syscalls (and deregisters during unload after any checks). This also fixes a race where even if the sem_module unloaded first it would have destroyed locks that the syscalls might be trying to access if they are still executing when they are unloaded. XXX: By the way, deregistering system calls doesn't do any blocking to drain any threads from the calls.
- Some minor fixes to errno values on error. For example, sem_init() isn't documented to return ENFILE or EMFILE if we run out of semaphores the way that sem_open() can. Instead, it should return ENOSPC in that case.

Other changes:
- Kernel semaphores now use a hash table to manage the namespace of named semaphores in nearly the same fashion as the POSIX shared memory object file descriptors. Kernel semaphores can now also have names longer than 14 chars (up to MAXPATHLEN) and can include subdirectories in their pathname.
- The UID/GID permission checks for access to a named semaphore are now done via vaccess() rather than a home-rolled set of checks.
- Now that kernel semaphores have an associated file object, the various MAC checks for POSIX semaphores accept both a file credential and an active credential. There is also a new posixsem_check_stat() since it is possible to fstat() a semaphore file descriptor.
- A small set of regression tests (using the ksem API directly) is present in src/tools/regression/posixsem.

Reported by: kris (1)
Tested by: kris
Reviewed by: rwatson (lightly)
MFC after: 1 month
|
#
037dab57 |
|
26-May-2008 |
Pawel Jakub Dawidek <pjd@FreeBSD.org> |
Use _WANT_FILE to make struct file visible from userland. This is similar to _WANT_UCRED and _WANT_PRISON and seems to be much nicer than defining _KERNEL. It is also needed for my sys/refcount.h change going in soon.
|
#
258f4727 |
|
25-May-2008 |
Attilio Rao <attilio@FreeBSD.org> |
Replace direct atomic operations on the file refcount with the refcount interface. It also introduces the correct usage of memory barriers, as sometimes fdrop() and fhold() are used with shared locks, which don't use any release barrier.
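
Why the barriers matter can be shown with the usual reference-counting pattern in portable C11 atomics. This is a sketch of the general technique, not FreeBSD's refcount(9) implementation; `struct sk_file`, `sk_fhold()` and `sk_fdrop()` are hypothetical stand-ins.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct file, reduced to its reference count. */
struct sk_file {
	atomic_uint f_count;
};

/* Take a reference; the increment itself needs no ordering. */
static void
sk_fhold(struct sk_file *fp)
{
	atomic_fetch_add_explicit(&fp->f_count, 1, memory_order_relaxed);
}

/* Drop a reference; returns true if the caller dropped the last one. */
static bool
sk_fdrop(struct sk_file *fp)
{
	/* Release: publish this thread's earlier writes before the drop. */
	if (atomic_fetch_sub_explicit(&fp->f_count, 1,
	    memory_order_release) != 1)
		return (false);
	/* Acquire: observe writes made by every earlier dropper. */
	atomic_thread_fence(memory_order_acquire);
	return (true);
}
```

The thread that sees the count reach zero is the only one that may tear the object down, and the release/acquire pairing ensures it does so after all other holders' writes are visible.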
|
#
82f4d640 |
|
21-May-2008 |
Konstantin Belousov <kib@FreeBSD.org> |
Implement the per-open file data for the cdev. The patch does not change the cdevsw KBI. Management of the data is provided by the functions int devfs_set_cdevpriv(void *priv, cdevpriv_dtr_t dtr); int devfs_get_cdevpriv(void **datap); void devfs_clear_cdevpriv(void); All of the functions are supposed to be called from the cdevsw method contexts. - devfs_set_cdevpriv assigns the priv as private data for the file descriptor which is used to initiate the currently performed driver operation. dtr is the function that will be called when either the last reference to the file goes away, the device is destroyed or devfs_clear_cdevpriv is called. - devfs_get_cdevpriv is the obvious accessor. - devfs_clear_cdevpriv allows clearing the private data for the still open file. Implementation keeps the driver-supplied pointers in the struct cdev_privdata, that is referenced both from the struct file and struct cdev, and cannot outlive any of the referee. Man pages will be provided after the KPI stabilizes. Reviewed by: jhb Useful suggestions from: jeff, antoine Debugging help and tested by: pho MFC after: 1 month
|
#
8e38aeff |
|
08-Jan-2008 |
John Baldwin <jhb@FreeBSD.org> |
Add a new file descriptor type for IPC shared memory objects and use it to implement shm_open(2) and shm_unlink(2) in the kernel: - Each shared memory file descriptor is associated with a swap-backed vm object which provides the backing store. Each descriptor starts off with a size of zero, but the size can be altered via ftruncate(2). The shared memory file descriptors also support fstat(2). read(2), write(2), ioctl(2), select(2), poll(2), and kevent(2) are not supported on shared memory file descriptors. - shm_open(2) and shm_unlink(2) are now implemented as system calls that manage shared memory file descriptors. The virtual namespace that maps pathnames to shared memory file descriptors is implemented as a hash table where the hash key is generated via the 32-bit Fowler/Noll/Vo hash of the pathname. - As an extension, the constant 'SHM_ANON' may be specified in place of the path argument to shm_open(2). In this case, an unnamed shared memory file descriptor will be created similar to the IPC_PRIVATE key for shmget(2). Note that the shared memory object can still be shared among processes by sharing the file descriptor via fork(2) or sendmsg(2), but it is unnamed. This effectively serves to implement the getmemfd() idea bandied about the lists several times over the years. - The backing store for shared memory file descriptors are garbage collected when they are not referenced by any open file descriptors or the shm_open(2) virtual namespace. Submitted by: dillon, peter (previous versions) Submitted by: rwatson (I based this on his version) Reviewed by: alc (suggested converting getmemfd() to shm_open())
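
The pathname hashing mentioned above can be sketched like this. The message only says "32-bit Fowler/Noll/Vo hash"; this sketch assumes the FNV-1 variant (multiply, then XOR) with the standard 32-bit offset basis and prime. The `sk_` names and the power-of-two bucket masking are illustrative assumptions, not the kernel's code.

```c
#include <stddef.h>
#include <stdint.h>

#define SK_FNV1_32_INIT	0x811c9dc5U	/* standard 32-bit offset basis */
#define SK_FNV_32_PRIME	0x01000193U	/* standard 32-bit FNV prime */

/* FNV-1: multiply by the prime, then XOR in each byte of the string. */
static uint32_t
sk_fnv1_32_str(const char *s)
{
	uint32_t h = SK_FNV1_32_INIT;

	while (*s != '\0') {
		h *= SK_FNV_32_PRIME;
		h ^= (uint8_t)*s++;
	}
	return (h);
}

/* Map a path to a bucket; nbuckets must be a power of two. */
static size_t
sk_bucket(const char *path, size_t nbuckets)
{
	return (sk_fnv1_32_str(path) & (nbuckets - 1));
}
```

A lookup would then walk only the entries in the chosen bucket and compare full pathnames to resolve collisions.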
|
#
e4650294 |
|
07-Jan-2008 |
John Baldwin <jhb@FreeBSD.org> |
Make ftruncate a 'struct file' operation rather than a vnode operation. This makes it possible to support ftruncate() on non-vnode file types in the future. - 'struct fileops' grows a 'fo_truncate' method to handle an ftruncate() on a given file descriptor. - ftruncate() moves to kern/sys_generic.c and now just fetches a file object and invokes fo_truncate(). - The vnode-specific portions of ftruncate() move to vn_truncate() in vfs_vnops.c which implements fo_truncate() for vnode file types. - Non-vnode file types return EINVAL in their fo_truncate() method. Submitted by: rwatson
|
#
397c19d1 |
|
29-Dec-2007 |
Jeff Roberson <jeff@FreeBSD.org> |
Remove explicit locking of struct file. - Introduce a finit() which is used to initialize the fields of struct file in such a way that the ops vector is only valid after the data, type, and flags are valid. - Protect f_flag and f_count with atomic operations. - Remove the global list of all files and associated accounting. - Rewrite the unp garbage collection such that it no longer requires the global list of all files and instead uses a list of all unp sockets. - Mark sockets in the accept queue so we don't incorrectly gc them. Tested by: kris, pho
|
#
9ae328fc |
|
05-Jan-2007 |
John Baldwin <jhb@FreeBSD.org> |
- Close a race between enumerating UNIX domain socket pcb structures via sysctl and socket teardown by adding a reference count to the UNIX domain pcb object and fixing the sysctl that enumerates unpcbs to grab a reference on each unpcb while it builds the list to copy out to userland. - Close a race between UNIX domain pcb garbage collection (unp_gc()) and file descriptor teardown (fdrop()) by adding a new garbage collection flag FWAIT. unp_gc() sets FWAIT while it walks the message buffers in a UNIX domain socket looking for nested file descriptor references and clears the flag when it is finished. fdrop() checks to see if the flag is set on a file descriptor whose refcount just dropped to 0 and waits for unp_gc() to clear the flag before completely destroying the file descriptor. MFC after: 1 week Reviewed by: rwatson Submitted by: ups Hopefully makes the panics go away: mx1
|
#
6befa6ae |
|
16-May-2006 |
Paul Saab <ps@FreeBSD.org> |
Allow concurrent read(2)/readv(2) access to a file. Lock file offset against multiple read calls. Submitted by: ups Obtained from: Yahoo! MFC after: 2 weeks
|
#
655291f2 |
|
25-Nov-2005 |
David Xu <davidxu@FreeBSD.org> |
Bring in experimental kernel support for POSIX message queue.
|
#
44dc16a9 |
|
09-Feb-2005 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Make some file/filedesc related functions static
|
#
79888acf |
|
02-Feb-2005 |
Robert Watson <rwatson@FreeBSD.org> |
Add a place-holder f_label void * for a future struct label pointer required for the port of SELinux FLASK/TE to FreeBSD using the MAC Framework.
|
#
60727d8b |
|
06-Jan-2005 |
Warner Losh <imp@FreeBSD.org> |
/* -> /*- for license, minor formatting changes
|
#
e4643c73 |
|
01-Dec-2004 |
Poul-Henning Kamp <phk@FreeBSD.org> |
"nfiles" is a bad name for a global variable. Call it "openfiles" instead as this is more correct and matches the sysctl variable.
|
#
29b96dcc |
|
08-Nov-2004 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Save a pointless FILE_LOCK_ASSERT() call in fhold.
|
#
923805d2 |
|
19-Jun-2004 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Add the f_vnode pointer to struct xfile, shortly it will no longer be identical to f_data by definition.
|
#
82c6e879 |
|
06-Apr-2004 |
Warner Losh <imp@FreeBSD.org> |
Remove advertising clause from University of California Regent's license, per letter dated July 22, 1999. Approved by: core
|
#
3b6d9652 |
|
22-Jun-2003 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Add a f_vnode field to struct file. Several of the subtypes have an associated vnode which is used for stuff like the f*() functions. By giving the vnode a separate field, a number of checks for the specific subtype can be replaced simply with a check for f_vnode != NULL, and we can later free f_data up to subtype specific use. At this point in time, f_data still points to the vnode, so any code I might have overlooked will still work.
|
#
c066b892 |
|
20-Jun-2003 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Move FMARK and FDEFER to sys/file.h where they belong. Order the fields in struct file in sections after their scope.
|
#
2db4b023 |
|
18-Jun-2003 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Introduce a new flag on a file descriptor: DFLAG_SEEKABLE and use that rather than assume that only DTYPE_VNODE is seekable.
|
#
e7d6662f |
|
14-Feb-2003 |
Alfred Perlstein <alfred@FreeBSD.org> |
Do not allow kqueues to be passed via unix domain sockets.
|
#
48e3128b |
|
12-Jan-2003 |
Matthew Dillon <dillon@FreeBSD.org> |
Bow to the whining masses and change a union back into void *. Retain removal of unnecessary casts and throw in some minor cleanups to see if anyone complains, just for the hell of it.
|
#
cd72f218 |
|
11-Jan-2003 |
Matthew Dillon <dillon@FreeBSD.org> |
Change struct file f_data to un_data, a union of the correct struct pointer types, and remove a huge number of casts from code using it. Change struct xfile xf_data to xun_data (ABI is still compatible). If we need to add a #define for f_data and xf_data we can, but I don't think it will be necessary. There are no operational changes in this commit.
|
#
a7010ee2 |
|
24-Dec-2002 |
Poul-Henning Kamp <phk@FreeBSD.org> |
White-space changes.
|
#
08c7670a |
|
23-Dec-2002 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Move the declaration of the socket fileops from socketvar.h to file.h. This allows us to use the new typedefs and removes the needs for a number of forward struct declarations in socketvar.h
|
#
f3a68211 |
|
23-Dec-2002 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Detediousficate declaration of fileops array members by introducing typedefs for them.
|
#
b35c7eef |
|
04-Oct-2002 |
Sam Leffler <sam@FreeBSD.org> |
add DTYPE_CRYPTO for use by /dev/crypto support
|
#
117f87fe |
|
16-Sep-2002 |
Mike Barcroft <mike@FreeBSD.org> |
Include <sys/types.h> directly rather than depending on <sys/fcntl.h> to include it. This could be avoided by adding the necessary typedefs here, or by making users of <sys/file.h> include <sys/types.h> first.
|
#
d49fa1ca |
|
16-Aug-2002 |
Robert Watson <rwatson@FreeBSD.org> |
In continuation of early fileop credential changes, modify fo_ioctl() to accept an 'active_cred' argument reflecting the credential of the thread initiating the ioctl operation. - Change fo_ioctl() to accept active_cred; change consumers of the fo_ioctl() interface to generally pass active_cred from td->td_ucred. - In fifofs, initialize filetmp.f_cred to ap->a_cred so that the invocations of soo_ioctl() are provided access to the calling f_cred. Pass ap->a_td->td_ucred as the active_cred, but note that this is required because we don't yet distinguish file_cred and active_cred in invoking VOP's. - Update kqueue_ioctl() for its new argument. - Update pipe_ioctl() for its new argument, pass active_cred rather than td_ucred to MAC for authorization. - Update soo_ioctl() for its new argument. - Update vn_ioctl() for its new argument, use active_cred rather than td->td_ucred to authorize VOP_IOCTL() and the associated VOP_GETATTR(). Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
|
#
ea6027a8 |
|
15-Aug-2002 |
Robert Watson <rwatson@FreeBSD.org> |
Make similar changes to fo_stat() and fo_poll() as made earlier to fo_read() and fo_write(): explicitly use the cred argument to fo_poll() as "active_cred" using the passed file descriptor's f_cred reference to provide access to the file credential. Add an active_cred argument to fo_stat() so that implementers have access to the active credential as well as the file credential. Generally modify callers of fo_stat() to pass in td->td_ucred rather than fp->f_cred, which was redundantly provided via the fp argument. This set of modifications also permits threads to perform these operations on behalf of another thread without modifying their credential.

Trickle this change down into fo_stat/poll() implementations:
- badfo_poll(), badfo_stat(): modify/add arguments.
- kqueue_poll(), kqueue_stat(): modify arguments.
- pipe_poll(), pipe_stat(): modify/add arguments, pass active_cred to MAC checks rather than td->td_ucred.
- soo_poll(), soo_stat(): modify/add arguments, pass fp->f_cred rather than cred to pru_sopoll() to maintain current semantics.
- sopoll(): modify arguments.
- vn_poll(), vn_statfile(): modify/add arguments, pass new arguments to vn_stat(). Pass active_cred to MAC and fp->f_cred to VOP_POLL() to maintain current semantics.
- vn_close(): rename cred to file_cred to reflect reality while I'm here.
- vn_stat(): Add active_cred and file_cred arguments to vn_stat() and consumers so that this distinction is maintained at the VFS as well as 'struct file' layer. Pass active_cred instead of td->td_ucred to MAC and to VOP_GETATTR() to maintain current semantics.
- fifofs: modify the creation of a "filetemp" so that the file credential is properly initialized and can be used in the socket code if desired. Pass ap->a_td->td_ucred as the active credential to soo_poll(). If we teach the vnop interface about the distinction between file and active credentials, we would use the active credential here.

Note that current inconsistent passing of active_cred vs. file_cred to VOP's is maintained. It's not clear why GETATTR would be authorized using active_cred while POLL would be authorized using file_cred at the file system level.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs
|
#
c6a3f1fc |
|
15-Aug-2002 |
Robert Watson <rwatson@FreeBSD.org> |
For some reason, the flags and td arguments in the fo_read prototype were reversed. Correct this with no functional change.
|
#
9ca43589 |
|
15-Aug-2002 |
Robert Watson <rwatson@FreeBSD.org> |
In order to better support flexible and extensible access control, make a series of modifications to the credential arguments relating to file read and write operations to clarify which credential is used for what:

- Change fo_read() and fo_write() to accept "active_cred" instead of "cred", and change the semantics of consumers of fo_read() and fo_write() to pass the active credential of the thread requesting an operation rather than the cached file cred. The cached file cred is still available in fo_read() and fo_write() consumers via fp->f_cred. These changes are largely in sys_generic.c.

For each implementation of fo_read() and fo_write(), update cred usage to reflect this change and maintain current semantics:
- badfo_readwrite() unchanged
- kqueue_read/write() unchanged
- pipe_read/write() now authorize MAC using active_cred rather than td->td_ucred
- soo_read/write() unchanged
- vn_read/write() now authorize MAC using active_cred but VOP_READ/WRITE() with fp->f_cred

Modify vn_rdwr() to accept two credential arguments instead of a single credential: active_cred and file_cred. Use active_cred for MAC authorization, and select a credential for use in VOP_READ/WRITE() based on whether file_cred is NULL or not. If file_cred is provided, authorize the VOP using that cred, otherwise the active credential, matching current semantics.

Modify current vn_rdwr() consumers to pass a file_cred if used in the context of a struct file, and to always pass active_cred. When vn_rdwr() is used without a file_cred, pass NOCRED.

These changes should maintain current semantics for read/write, but avoid a redundant passing of fp->f_cred, as well as making it more clear what the origin of each credential is in file descriptor read/write operations.

Follow-up commits will make similar changes to other file descriptor operations, and modify the MAC framework to pass both credentials to MAC policy modules so they can implement either semantic for revocation.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs
|
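The credential selection described for vn_rdwr() in the entry above can be sketched in a few lines of userspace C. This is only an illustration under stated assumptions: `struct ucred` is reduced to a single field, `NOCRED` is defined locally as a null pointer, and `select_vop_cred()` is a hypothetical helper name that does not appear in the commit.

```c
#include <stddef.h>

/* Simplified stand-in for the kernel credential structure (assumption). */
struct ucred {
	int cr_uid;
};

/* Local definition for this sketch: "no file credential supplied". */
#define NOCRED ((struct ucred *)0)

/*
 * Hypothetical helper: pick the credential used for the VOP.
 * If the caller supplied a file credential, use it; otherwise
 * fall back to the active credential of the requesting thread.
 */
static struct ucred *
select_vop_cred(struct ucred *active_cred, struct ucred *file_cred)
{
	return (file_cred != NOCRED ? file_cred : active_cred);
}
```

In this sketch, MAC authorization would always look at `active_cred`, while only the VOP call uses the result of `select_vop_cred()`.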
#
84baf7a2 |
|
30-Jul-2002 |
Dag-Erling Smørgrav <des@FreeBSD.org> |
Add struct xfile, which will be used instead of struct file for sysctl purposes. Sponsored by: DARPA, NAI Labs
|
#
7f05b035 |
|
28-Jun-2002 |
Alfred Perlstein <alfred@FreeBSD.org> |
More caddr_t removal, make fo_ioctl take a void * instead of a caddr_t.
|
#
b555662c |
|
28-Jun-2002 |
Alfred Perlstein <alfred@FreeBSD.org> |
change f_data field in struct file from caddr_t to void *.
|
#
4ff96497 |
|
17-Jun-2002 |
Alfred Perlstein <alfred@FreeBSD.org> |
remove bogus comment, select/poll do NOT need to fhold as they hold the filedesc lock. style(9) fixes, add blank line at start of functions with no local variables.
|
#
cb331c71 |
|
25-Mar-2002 |
Bruce Evans <bde@FreeBSD.org> |
Removed some namespace pollution (unnecessary nested includes).
|
#
c58eb46e |
|
23-Mar-2002 |
Bruce Evans <bde@FreeBSD.org> |
Fixed some style bugs in the removal of __P(()). The main ones were not removing tabs before "__P((", and not outdenting continuation lines to preserve non-KNF lining up of code with parentheses. Switch to KNF formatting and/or rewrap the whole prototype in some cases.
|
#
789f12fe |
|
19-Mar-2002 |
Alfred Perlstein <alfred@FreeBSD.org> |
Remove __P
|
#
649e7178 |
|
23-Feb-2002 |
Dag-Erling Smørgrav <des@FreeBSD.org> |
Fix namespace pollution introduced in previous commit.
|
#
f591779b |
|
23-Feb-2002 |
Seigo Tanimura <tanimura@FreeBSD.org> |
Lock struct pgrp, session and sigio. New locks are: - pgrpsess_lock which locks the whole pgrps and sessions, - pg_mtx which protects the pgrp members, and - s_mtx which protects the session members. Please refer to sys/proc.h for the coverage of these locks. Changes on the pgrp/session interface: - pgfind() needs the pgrpsess_lock held. - The caller of enterpgrp() is responsible to allocate a new pgrp and session. - Call enterthispgrp() in order to enter an existing pgrp. - pgsignal() requires a pgrp lock held. Reviewed by: jhb, alfred Tested on: cvsup.jp.FreeBSD.org (which is a quad-CPU machine running -current)
|
#
767567d3 |
|
20-Jan-2002 |
Alfred Perlstein <alfred@FreeBSD.org> |
use mutex pools for "struct file" locking. fix indentation of FILE_LOCK/UNLOCK macros while I'm here.
|
#
74ed76de |
|
14-Jan-2002 |
Alfred Perlstein <alfred@FreeBSD.org> |
Remove requirement for queue.h by consumers by moving its inclusion before other headers that require it. Pointed out by: ru, bde
|
#
a4db4953 |
|
13-Jan-2002 |
Alfred Perlstein <alfred@FreeBSD.org> |
Replace ffind_* with fget calls. Make fget MPsafe. Make fgetvp and fgetsock use the fget subsystem to reduce code bloat. Push giant down in fpathconf().
|
#
9e209b12 |
|
13-Jan-2002 |
Alfred Perlstein <alfred@FreeBSD.org> |
Include sys/_lock.h and sys/_mutex.h to reduce namespace pollution. Requested by: jhb
|
#
297f86dc |
|
12-Jan-2002 |
Alfred Perlstein <alfred@FreeBSD.org> |
Remove file locking debug cruft.
|
#
426da3bc |
|
13-Jan-2002 |
Alfred Perlstein <alfred@FreeBSD.org> |
SMP Lock struct file, filedesc and the global file list. Seigo Tanimura (tanimura) posted the initial delta. I've polished it quite a bit reducing the need for locking and adapting it for KSE. Locks: 1 mutex in each filedesc protects all the fields. protects "struct file" initialization, while a struct file is being changed from &badfileops -> &pipeops or something the filedesc should be locked. 1 mutex in each struct file protects the refcount fields. doesn't protect anything else. the flags used for garbage collection have been moved to f_gcflag which was the FILLER short, this doesn't need locking because the garbage collection is a single threaded container. could likely be made to use a pool mutex. 1 sx lock for the global filelist. struct file * fhold(struct file *fp); /* increments reference count on a file */ struct file * fhold_locked(struct file *fp); /* like fhold but expects file to be locked */ struct file * ffind_hold(struct thread *, int fd); /* finds the struct file in thread, adds one reference and returns it unlocked */ struct file * ffind_lock(struct thread *, int fd); /* ffind_hold, but returns file locked */ I still have to smp-safe the fget cruft, I'll get to that asap.
|
#
b1e4abd2 |
|
16-Nov-2001 |
Matthew Dillon <dillon@FreeBSD.org> |
Give struct socket structures a ref counting interface similar to vnodes. This will hopefully serve as a base from which we can expand the MP code. We currently do not attempt to obtain any mutex or SX locks, but the door is open to add them when we nail down exactly how that part of it is going to work.
|
#
b064d43d |
|
13-Nov-2001 |
Matthew Dillon <dillon@FreeBSD.org> |
remove holdfp() Replace uses of holdfp() with fget*() or fgetvp*() calls as appropriate introduce fget(), fget_read(), fget_write() - these functions will take a thread and file descriptor and return a file pointer with its ref count bumped. introduce fgetvp(), fgetvp_read(), fgetvp_write() - these functions will take a thread and file descriptor and return a vref()'d vnode. *_read() requires that the file pointer be FREAD, *_write that it be FWRITE. This continues the cleanup of struct filedesc and struct file access routines which, when we are all through with it, will allow us to then make the API calls MP safe and be able to move Giant down into the fo_* functions.
|
#
2af8d76d |
|
13-Sep-2001 |
David E. O'Brien <obrien@FreeBSD.org> |
Re-apply rev 1.178 -- style(9) the structure definitions. I have to wonder how many other changes were lost in the KSE milestone 2 merge.
|
#
b40ce416 |
|
12-Sep-2001 |
Julian Elischer <julian@FreeBSD.org> |
KSE Milestone 2 Note ALL MODULES MUST BE RECOMPILED make the kernel aware that there are smaller units of scheduling than the process. (but only allow one thread per process at this time). This is functionally equivalent to the previous -current except that there is a thread associated with each process. Sorry john! (your next MFC will be a doosie!) Reviewed by: peter@freebsd.org, dillon@freebsd.org X-MFC after: ha ha ha ha
|
#
ac8f990b |
|
24-May-2001 |
Matthew Dillon <dillon@FreeBSD.org> |
This patch implements O_DIRECT about 80% of the way. It takes a patchset Tor created a while ago, removes the raw I/O piece (that has cache coherency problems), and adds a buffer cache / VM freeing piece. Essentially this patch causes O_DIRECT I/O to not be left in the cache, but does not prevent it from going through the cache, hence the 80%. For the last 20% we need a method by which the I/O can be issued directly to buffer supplied by the user process and bypass the buffer cache entirely, but still maintain cache coherency. I also have the code working under -stable but the changes made to sys/file.h may not be MFCable, so an MFC is not on the table yet. Submitted by: tegge, dillon
|
#
608a3ce6 |
|
15-Feb-2001 |
Jonathan Lemon <jlemon@FreeBSD.org> |
Extend kqueue down to the device layer. Backwards compatible approach suggested by: peter
|
#
e3975643 |
|
25-May-2000 |
Jake Burkholder <jake@FreeBSD.org> |
Back out the previous change to the queue(3) interface. It was not discussed and should probably not happen. Requested by: msmith and others
|
#
740a1973 |
|
23-May-2000 |
Jake Burkholder <jake@FreeBSD.org> |
Change the way that the queue(3) structures are declared; don't assume that the type argument to *_HEAD and *_ENTRY is a struct. Suggested by: phk Reviewed by: phk Approved by: mdodd
|
#
5aadd4d9 |
|
15-May-2000 |
Matthew Dillon <dillon@FreeBSD.org> |
Change f_count and f_msgcount from short to int, solving the rollover problem demonstrated by the PR. MFC to follow. PR: kern/18346
|
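The rollover addressed by the entry above can be illustrated with a small userspace sketch. It assumes a 16-bit `short`, as on the platforms FreeBSD targets; `hold_n_short()` and `hold_n_int()` are hypothetical names that simply take `n` references on a counter of each width and return the resulting count.

```c
/*
 * Take n references on a counter declared as short, the old
 * width of f_count. Once the count passes SHRT_MAX the stored
 * value can no longer represent the true number of references.
 */
static int
hold_n_short(int n)
{
	short count = 0;
	int i;

	for (i = 0; i < n; i++)
		count++;
	return (count);
}

/* Same loop with the counter widened to int, the new width. */
static int
hold_n_int(int n)
{
	int count = 0;
	int i;

	for (i = 0; i < n; i++)
		count++;
	return (count);
}
```

With more references than a `short` can hold (for example 40000), the `int` counter still reports the true count while the `short` counter does not, which is the failure mode the width change avoids.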
#
cb679c38 |
|
16-Apr-2000 |
Jonathan Lemon <jlemon@FreeBSD.org> |
Introduce kqueue() and kevent(), a kernel event notification facility.
|
#
e4649cfa |
|
01-Apr-2000 |
Matthew Dillon <dillon@FreeBSD.org> |
Change the write-behind code to take more care when starting async I/O's. The sequential read heuristic has been extended to cover writes as well. We continue to call cluster_write() normally, thus blocks in the file will still be reallocated for large (but still random) I/O's, but I/O will only be initiated for truly sequential writes. This solves a number of annoying situations, especially with DBM (hash method) writes, and also has the side effect of fixing a number of (stupid) benchmarks. Reviewed-by: mckusick
|
#
664a31e4 |
|
28-Dec-1999 |
Peter Wemm <peter@FreeBSD.org> |
Change #ifdef KERNEL to #ifdef _KERNEL in the public headers. "KERNEL" is an application space macro and the applications are supposed to be free to use it as they please (but cannot). This is consistent with the other BSD's who made this change quite some time ago. More commits to come.
|
#
a2eec8ee |
|
07-Nov-1999 |
Peter Wemm <peter@FreeBSD.org> |
Create a fileops fo_stat() entry point. This will enable collection of a bunch of duplicated code that breaks (read: panic) when a new file type is added and some switch/case entries are missed.
|
#
13ccadd4 |
|
19-Sep-1999 |
Brian Feldman <green@FreeBSD.org> |
This is what was "fdfix2.patch," a fix for fd sharing. It's pretty far-reaching in fd-land, so you'll want to consult the code for changes. The biggest change is that now, you don't use fp->f_ops->fo_foo(fp, bar) but instead fo_foo(fp, bar), which increments and decrements the fp refcount upon entry and exit. Two new calls, fhold() and fdrop(), are provided. Each does what it seems like it should, and if fdrop() brings the refcount to zero, the fd is freed as well. Thanks to peter ("to hell with it, it looks ok to me.") for his review. Thanks to msmith for keeping me from putting locks everywhere :) Reviewed by: peter
|
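The wrapper pattern described in the entry above, where fo_foo(fp, bar) takes a reference on entry and drops it on exit, can be sketched in userspace C as follows. Everything here is a simplified assumption for illustration: the structures are reduced to the fields the sketch needs, `f_freed` is a sketch-only flag standing in for actually freeing the file, and `sample_foo()` is a hypothetical backend operation.

```c
#include <stddef.h>

struct file;

/* Reduced operations table with a single placeholder method. */
struct fileops {
	int (*fo_foo)(struct file *fp, int bar);
};

/* Reduced struct file: a reference count and an ops pointer. */
struct file {
	int f_count;
	const struct fileops *f_ops;
	int f_freed;		/* sketch only: set when the last ref is dropped */
};

/* Take one reference. */
static void
fhold(struct file *fp)
{
	fp->f_count++;
}

/* Drop one reference; "free" the file when the count reaches zero. */
static void
fdrop(struct file *fp)
{
	if (--fp->f_count == 0)
		fp->f_freed = 1;
}

/*
 * Wrapper replacing direct fp->f_ops->fo_foo(fp, bar) calls:
 * the file is held for the duration of the operation.
 */
static int
fo_foo(struct file *fp, int bar)
{
	int error;

	fhold(fp);
	error = fp->f_ops->fo_foo(fp, bar);
	fdrop(fp);
	return (error);
}

/* Hypothetical backend: reports the refcount it observes, plus bar. */
static int
sample_foo(struct file *fp, int bar)
{
	return (fp->f_count + bar);
}
```

Holding the reference inside the wrapper means the backend never runs on a file whose last reference was dropped concurrently by another sharer of the descriptor table, which is the class of problem the commit message refers to.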
#
c3aac50f |
|
27-Aug-1999 |
Peter Wemm <peter@FreeBSD.org> |
$Id$ -> $FreeBSD$
|
#
e32c66c5 |
|
04-Aug-1999 |
Brian Feldman <green@FreeBSD.org> |
Fix fd race conditions (during shared fd table usage.) Badfileops is now used in f_ops in place of NULL, and modifications to the files are more carefully ordered. f_ops should also be set to &badfileops upon "close" of a file. This fixes only the first of the problems mentioned in this PR. PR: 11629 Reviewed by: peter
|
#
8fe387ab |
|
04-Apr-1999 |
Dmitrij Tejblum <dt@FreeBSD.org> |
Add standard padding argument to pread and pwrite syscall. That should make them NetBSD compatible. Add parameter to fo_read and fo_write. (The only flag, FOF_OFFSET, means that the offset is set in the struct uio). Factor out some common code from read/pread/write/pwrite syscalls.
|
#
ecbb00a2 |
|
07-Jun-1998 |
Doug Rabson <dfr@FreeBSD.org> |
This commit fixes various 64bit portability problems required for FreeBSD/alpha. The most significant item is to change the command argument to ioctl functions from int to u_long. This change brings us in line with various other BSD versions. Driver writers may like to use (__FreeBSD_version == 300003) to detect this change. The prototype FreeBSD/alpha machdep will follow in a couple of days time.
|
#
d826c479 |
|
23-Nov-1997 |
Bruce Evans <bde@FreeBSD.org> |
Fixed duplicate definitions of M_FILE (one static).
|
#
3a74593f |
|
13-Sep-1997 |
Peter Wemm <peter@FreeBSD.org> |
Update interfaces for poll()
|
#
3ac4d1ef |
|
22-Mar-1997 |
Bruce Evans <bde@FreeBSD.org> |
Don't #include <sys/fcntl.h> in <sys/file.h> if KERNEL is defined. Fixed everything that depended on getting fcntl.h stuff from the wrong place. Most things don't depend on file.h stuff at all.
|
#
6875d254 |
|
22-Feb-1997 |
Peter Wemm <peter@FreeBSD.org> |
Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not ready for it yet.
|
#
1130b656 |
|
14-Jan-1997 |
Jordan K. Hubbard <jkh@FreeBSD.org> |
Make the long-awaited change from $Id$ to $FreeBSD$ This will make a number of things easier in the future, as well as (finally!) avoiding the Id-smashing problem which has plagued developers for so long. Boy, I'm glad we're not using sup anymore. This update would have been insane otherwise.
|
#
8b612c4b |
|
28-Dec-1996 |
John Dyson <dyson@FreeBSD.org> |
This commit is the embodiment of some VFS read clustering improvements. Firstly, now our read-ahead clustering is on a file descriptor basis and not on a per-vnode basis. This will allow multiple processes reading the same file to take advantage of read-ahead clustering. Secondly, there previously was a problem with large reads still using the ramp-up algorithm. Of course, that was bogus, and now we read the entire "chunk" off of the disk in one operation. The read-ahead clustering algorithm should also use less CPU than the previous one (I hope :-)). NOTE THAT LKMS MUST BE REBUILT!!!
|
#
bb65f5a1 |
|
19-Dec-1996 |
Bruce Evans <bde@FreeBSD.org> |
Fixed lseek() on named pipes. It always succeeded but should always fail. Broke locking on named pipes in the same way as locking on non-vnodes (wrong errno). This will be fixed later. The fix involves negative logic. Named pipes are now distinguished from other types of files with vnodes, and there is additional code to handle vnodes and named pipes in the same way only where that makes sense (not for lseek, locking or TIOCSCTTY).
|
#
b71fec07 |
|
03-Sep-1996 |
Bruce Evans <bde@FreeBSD.org> |
Eliminated nested include of <sys/unistd.h> in <sys/file.h> in the kernel. Include it directly in the few places where it is used. Reduced some #includes of <sys/file.h> to #includes of <sys/fcntl.h> or nothing.
|
#
02e2c406 |
|
11-Mar-1996 |
Peter Wemm <peter@FreeBSD.org> |
Import 4.4BSD-Lite2 onto the vendor branch, note that in the kernel, all files are off the vendor branch, so this should not change anything. A "U" marker generally means that the file was not changed in between the 4.4Lite and Lite-2 releases, and does not need a merge. "C" generally means that there was a change. [new sys/syscallargs.h file, to be "cvs rm"ed]
|
#
4b50ceef |
|
10-Mar-1996 |
Jeffrey Hsu <hsu@FreeBSD.org> |
Merge in Lite2: LIST replacement for f_filef, f_fileb, and filehead. Did not accept change of second argument to ioctl from int to u_long. Reviewed by: davidg & bde
|
#
f9827213 |
|
28-Jan-1996 |
John Dyson <dyson@FreeBSD.org> |
Enable the new fast pipe code. The old pipes can be used with the "OLD_PIPE" config option.
|
#
b5e8ce9f |
|
16-Mar-1995 |
Bruce Evans <bde@FreeBSD.org> |
Add and move declarations to fix all of the warnings from `gcc -Wimplicit' (except in netccitt, netiso and netns) and most of the warnings from `gcc -Wnested-externs'. Fix all the bugs found. There were no serious ones.
|
#
e6373c9e |
|
20-Feb-1995 |
Guido van Rooij <guido@FreeBSD.org> |
Implement maxprocperuid and maxfilesperproc. They are tunable via sysctl(8). The initial value of maxprocperuid is maxproc-1, that of maxfilesperproc is maxfiles (until maxfile disappears). Now it is at least possible to prohibit one user from opening maxfiles. -Guido
|
#
af9da405 |
|
20-Aug-1994 |
Paul Richards <paul@FreeBSD.org> |
Made them all idempotent.
|
#
3c4dd356 |
|
02-Aug-1994 |
David Greenman <dg@FreeBSD.org> |
Added $Id$
|
#
df8bae1d |
|
24-May-1994 |
Rodney W. Grimes <rgrimes@FreeBSD.org> |
BSD 4.4 Lite Kernel Sources
|