#
7893419d |
|
04-Dec-2023 |
Brooks Davis <brooks@FreeBSD.org> |
Remove never implemented sbrk and sstk syscalls Both system calls were stubs returning EOPNOTSUPP and libc did not provide _ or __sys_ prefixed symbols. The actual implementation of sbrk(2) is on top of the undocumented break(2) system call. Technically this is a change in ABI, but no non-contrived program ever called these syscalls. Reviewed by: kib, emaste Sponsored by: DARPA Differential Revision: https://reviews.freebsd.org/D42872
|
#
29363fb4 |
|
23-Nov-2023 |
Warner Losh <imp@FreeBSD.org> |
sys: Remove ancient SCCS tags. Remove ancient SCCS tags from the tree, automated scripting, with two minor fixup to keep things compiling. All the common forms in the tree were removed with a perl script. Sponsored by: Netflix
|
#
685dc743 |
|
16-Aug-2023 |
Warner Losh <imp@FreeBSD.org> |
sys: Remove $FreeBSD$: one-line .c pattern Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/
|
#
f3e11927 |
|
14-Aug-2023 |
Dmitry Chagin <dchagin@FreeBSD.org> |
vm: Allow MAP_32BIT for all architectures Reviewed by: alc, kib, markj Differential revision: https://reviews.freebsd.org/D41435
|
#
0ddd32b6 |
|
14-Aug-2023 |
Dmitry Chagin <dchagin@FreeBSD.org> |
vm: MAP_32BIT_MAX_ADDR defined in sys/mman.h Reviewed by: kib Differential revision: https://reviews.freebsd.org/D41434
|
#
37e5d49e |
|
03-Aug-2023 |
Alan Cox <alc@FreeBSD.org> |
vm: Fix address hints of 0 with MAP_32BIT Also, rename min_addr to default_addr, which better reflects what it represents. The min_addr is not a minimum address in the same way that max_addr is actually a maximum address that can be allocated. For example, a non-zero hint can be less than min_addr and be allocated. Reported by: dchagin Reviewed by: dchagin, kib, markj Fixes: d8e6f4946cec0 "vm: Fix anonymous memory clustering under ASLR" Differential Revision: https://reviews.freebsd.org/D41397
|
#
9b65fa69 |
|
29-Jul-2023 |
Konstantin Belousov <kib@FreeBSD.org> |
linuxolator: implement Linux' PROT_GROWSDOWN From the Linux man page for mprotect(2): PROT_GROWSDOWN Apply the protection mode down to the beginning of a mapping that grows downward (which should be a stack segment or a segment mapped with the MAP_GROWSDOWN flag set). Reported by: dchagin Reviewed by: alc, markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D41099
|
#
5ec2d94a |
|
25-Jul-2023 |
Alan Cox <alc@FreeBSD.org> |
vm_mmap_object: Update the spelling of true/false Since fitit is already a bool, use true/false instead of TRUE/FALSE. MFC after: 2 weeks
|
#
d8e6f494 |
|
22-Jun-2023 |
Alan Cox <alc@FreeBSD.org> |
vm: Fix anonymous memory clustering under ASLR By default, our ASLR implementation is supposed to cluster anonymous memory allocations, unless the application's mmap(..., MAP_ANON, ...) call included a non-zero address hint. Unfortunately, clustering never occurred because kern_mmap() always replaced the given address hint when it was zero. So, the ASLR implementation always believed that a non-zero hint had been provided and randomized the mapping's location in the address space. To fix this problem, I'm pushing down the point at which we convert a hint of zero to the minimum allocatable address from kern_mmap() to vm_map_find_min(). Reviewed by: kib MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D40743
|
#
0cb2610e |
|
16-Jul-2022 |
Mark Johnston <markj@FreeBSD.org> |
vm: Remove handling for OBJT_DEFAULT objects Now that OBJT_DEFAULT objects can't be instantiated, we can simplify checks of the form object->type == OBJT_DEFAULT || (object->flags & OBJ_SWAP) != 0. No functional change intended. Reviewed by: alc, kib Tested by: pho Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D35788
|
#
eee9aab9 |
|
13-Jul-2022 |
Mark Johnston <markj@FreeBSD.org> |
vm_mmap: Remove obsolete code and comments from vm_mmap() In preparation for removing OBJT_DEFAULT, eliminate some stale/unhelpful comments from vm_mmap(), and remove an unused case. In particular, the remaining callers of vm_mmap() in the tree do not specify OBJT_DEFAULT. It's much more common to use vm_map_find() to map an object into user memory, so rather than adjusting vm_mmap() to handle OBJT_SWAP objects, let's further discourage its use and simply remove OBJT_DEFAULT handling. Reviewed by: dougm, alc, kib MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D35778
|
#
e123264e |
|
19-Jun-2022 |
Mark Johnston <markj@FreeBSD.org> |
vm: Fix racy checks for swap objects Commit 4b8365d752ef introduced the ability to dynamically register VM object types, for use by tmpfs, which creates swap-backed objects. As a part of this, checks for such objects changed from object->type == OBJT_DEFAULT || object->type == OBJT_SWAP to object->type == OBJT_DEFAULT || (object->flags & OBJ_SWAP) != 0 In particular, objects of type OBJT_DEFAULT do not have OBJ_SWAP set; the swap pager sets this flag when converting from OBJT_DEFAULT to OBJT_SWAP. A few of these checks are done without the object lock held. It turns out that this can result in false negatives since the swap pager converts objects like so: object->type = OBJT_SWAP; object->flags |= OBJ_SWAP; Fix the problem by adding explicit tests for OBJT_SWAP objects in unlocked checks. PR: 258932 Fixes: 4b8365d752ef ("Add OBJT_SWAP_TMPFS pager") Reported by: bdrewery Reviewed by: kib MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D35470
|
#
b1ad6a90 |
|
28-Mar-2022 |
Brooks Davis <brooks@FreeBSD.org> |
syscallarg_t: Add a type for system call arguments This more clearly differentiates system call arguments from integer registers and return values. On current architectures it has no effect, but on architectures where pointers are not integers (CHERI) and may not even share registers (CHERI-MIPS) it is necessiary to differentiate between system call arguments (syscallarg_t) and integer register values (register_t). Obtained from: CheriBSD Reviewed by: imp, kib Differential Revision: https://reviews.freebsd.org/D33780
|
#
0910a41e |
|
12-Jan-2022 |
Brooks Davis <brooks@FreeBSD.org> |
Revert "syscallarg_t: Add a type for system call arguments" Missed issues in truss on at least armv7 and powerpcspe need to be resolved before recommit. This reverts commit 3889fb8af0b611e3126dc250ebffb01805152104. This reverts commit 1544e0f5d1f1e3b8c10a64cb899a936976ca7ea4.
|
#
1544e0f5 |
|
12-Jan-2022 |
Brooks Davis <brooks@FreeBSD.org> |
syscallarg_t: Add a type for system call arguments This more clearly differentiates system call arguments from integer registers and return values. On current architectures it has no effect, but on architectures where pointers are not integers (CHERI) and may not even share registers (CHERI-MIPS) it is necessiary to differentiate between system call arguments (syscallarg_t) and integer register values (register_t). Obtained from: CheriBSD Reviewed by: imp, kib Differential Revision: https://reviews.freebsd.org/D33780
|
#
01ce7fca |
|
15-Nov-2021 |
Brooks Davis <brooks@FreeBSD.org> |
ommap: fix signed len and pos arguments 4.3 BSD's mmap took an int len and long pos. Reject negative lengths and in freebsd32 sign-extend pos correctly rather than mis-handling negative positions as large positive ones. Reviewed by: kib
|
#
4b8365d7 |
|
30-Apr-2021 |
Konstantin Belousov <kib@FreeBSD.org> |
Add OBJT_SWAP_TMPFS pager This is OBJT_SWAP pager, specialized for tmpfs. Right now, both swap pager and generic vm code have to explicitly handle swap objects which are tmpfs vnode v_object, in the special ways. Replace (almost) all such places with proper methods. Since VM still needs a notion of the 'swap object', regardless of its use, add yet another type-classification flag OBJ_SWAP. Set it in vm_object_allocate() where other type-class flags are set. This change almost completely eliminates the knowledge of tmpfs from VM, and opens a way to make OBJT_SWAP_TMPFS loadable from tmpfs.ko. Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D30070
|
#
7a1591c1 |
|
22-Jan-2021 |
Brooks Davis <brooks@FreeBSD.org> |
Rename kern_mmap_req to kern_mmap Replace all uses of kern_mmap with kern_mmap_req move the old kern_mmap. Reand rename kern_mmap_req to kern_mmap . The helper saved some code churn initially, but having multiple interfaces is sub-optimal. Obtained from: CheriBSD Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D28292
|
#
0659df6f |
|
12-Jan-2021 |
Konstantin Belousov <kib@FreeBSD.org> |
vm_map_protect: allow to set prot and max_prot in one go. This prevents a situation where other thread modifies map entries permissions between setting max_prot, then relocking, then setting prot, confusing the operation outcome. E.g. you can get an error that is not possible if operation is performed atomic. Also enable setting rwx for max_prot even if map does not allow to set effective rwx protection. Reviewed by: brooks, markj (previous version) Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D28117
|
#
d301b358 |
|
09-Sep-2020 |
Konstantin Belousov <kib@FreeBSD.org> |
Support for userspace non-transparent superpages (largepages). Created with shm_open2(SHM_LARGEPAGE) and then configured with FIOSSHMLPGCNF ioctl, largepages posix shared memory objects guarantee that all userspace mappings of it are served by superpage non-managed mappings. Only amd64 for now, both 2M and 1G superpages can be requested, the later requires CPU feature. Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D24652
|
#
e8f77c20 |
|
09-Sep-2020 |
Konstantin Belousov <kib@FreeBSD.org> |
Prepare to handle non-trivial errors from vm_map_delete(). Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D24652
|
#
67a659d2 |
|
08-Sep-2020 |
Konstantin Belousov <kib@FreeBSD.org> |
Add kern_mmap_racct_check(), a helper to verify limits in vm_mmap*(). Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D24652
|
#
847ab36b |
|
02-Sep-2020 |
Mark Johnston <markj@FreeBSD.org> |
Include the psind in data returned by mincore(2). Currently we use a single bit to indicate whether the virtual page is part of a superpage. To support a forthcoming implementation of non-transparent 1GB superpages, it is useful to provide more detailed information about large page sizes. The change converts MINCORE_SUPER into a mask for MINCORE_PSIND(psind) values, indicating a mapping of size psind, where psind is an index into the pagesizes array returned by getpagesizes(3), which in turn comes from the hw.pagesizes sysctl. MINCORE_PSIND(1) is equal to the old value of MINCORE_SUPER. For now, two bits are used to record the page size, permitting values of MAXPAGESIZES up to 4. Reviewed by: alc, kib Sponsored by: Juniper Networks, Inc. Sponsored by: Klara, Inc. Differential Revision: https://reviews.freebsd.org/D26238
|
#
c3aa3bf9 |
|
01-Sep-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
vm: clean up empty lines in .c and .h files
|
#
a92a971b |
|
16-Aug-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: remove the thread argument from vget It was already asserted to be curthread. Semantic patch: @@ expression arg1, arg2, arg3; @@ - vget(arg1, arg2, arg3) + vget(arg1, arg2)
|
#
52c81be1 |
|
20-Jun-2020 |
Edward Tomasz Napierala <trasz@FreeBSD.org> |
Add linux_madvise(2) instead of having Linux apps call the native FreeBSD madvise(2) directly. While some of the flag values match, most don't. PR: kern/230160 Reported by: markj Reviewed by: markj Discussed with: brooks, kib MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D25272
|
#
0f1e6ec5 |
|
18-Jun-2020 |
Mark Johnston <markj@FreeBSD.org> |
Add a helper function for validating VA ranges. Functions which take untrusted user ranges must validate against the bounds of the map, and also check for wraparound. Instead of having the same logic duplicated in a number of places, add a function to check. Reviewed by: dougm, kib Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D25328
|
#
4d13f784 |
|
03-Jun-2020 |
Ed Maste <emaste@FreeBSD.org> |
Correct terminology in vm.imply_prot_max sysctl description As with r361769 (man page), PROT_* are properly called protections, not permissions. MFC after: 1 week MFC with: r361769 Sponsored by: The FreeBSD Foundation
|
#
d718de81 |
|
04-Mar-2020 |
Brooks Davis <brooks@FreeBSD.org> |
Introduce kern_mmap_req(). This presents an extensible interface to the generic mmap(2) implementation via a struct pointer intended to use a designated initializer or compount literal. We take advantage of the mandatory zeroing of fields not listed in the initializer. Remove kern_mmap_fpcheck() and use kern_mmap_req(). The motivation for this change is a desire to keep the core implementation from growing an ever-increasing number of arguments that must be specified in the correct order for the lowest-level implementations. In CheriBSD we have already added two more arguments. Reviewed by: kib Discussed with: kevans Obtained from: CheriBSD Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D23164
|
#
acb8858f |
|
26-Feb-2020 |
Ed Maste <emaste@FreeBSD.org> |
Return ENOTSUP for mmap/mprotect if prot not subset of prot_max From POSIX, [ENOTSUP] The implementation does not support the combination of accesses requested in the prot argument. This fits the case that prot contains permissions which are not a subset of prot_max. Reviewed by: brooks, cem Relnotes: Yes Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D23843
|
#
3379d2f9 |
|
14-Feb-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
vm: use new capsicum helpers
|
#
23ed568c |
|
14-Feb-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
vm: remove no longer needed atomic_load_ptr casts
|
#
643656cf |
|
31-Jan-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: replace VOP_MARKATIME with VOP_MMAPPED The routine is only provided by ufs and is only used on mmap and exec. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D23422
|
#
2180f6c6 |
|
04-Jan-2020 |
Kyle Evans <kevans@FreeBSD.org> |
kern_mmap: restore character deleted in transit Pointy hat to: kevans X-MFC-With: r356359
|
#
18348a23 |
|
04-Jan-2020 |
Kyle Evans <kevans@FreeBSD.org> |
kern_mmap: add a variant that allows caller to inspect fp Linux mmap rejects mmap() on a write-only file with EACCES. linux_mmap_common currently does a fun dance to grab the fp associated with the passed in fd, validates it, then drops the reference and calls into kern_mmap(). Doing so is perhaps both fragile and premature; there's still plenty of chance for the request to get rejected with a more appropriate error, and it's prone to a race where the file we ultimately mmap has changed after it drops its referenced. This change alleviates the need to do this by providing a kern_mmap variant that allows the caller to inspect the fp just before calling into the fileop layer. The callback takes flags, prot, and maxprot as one could imagine scenarios where any of these, in conjunction with the file itself, may influence a caller's decision. The file type check in the linux compat layer has been removed; EINVAL is seemingly not an appropriate response to the file not being a vnode or device. The fileop layer will reject the operation with ENODEV if it's not supported, which more closely matches the common linux description of mmap(2) return values. If we discover that we're allowing an mmap() on a file type that Linux normally wouldn't, we should restrict those explicitly. Reviewed by: kib MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D22977
|
#
5cff1f4d |
|
10-Dec-2019 |
Mark Johnston <markj@FreeBSD.org> |
Introduce vm_page_astate. This is a 32-bit structure embedded in each vm_page, consisting mostly of page queue state. The use of a structure makes it easy to store a snapshot of a page's queue state in a stack variable and use cmpset loops to update that state without requiring the page lock. This change merely adds the structure and updates references to atomic state fields. No functional change intended. Reviewed by: alc, jeff, kib Sponsored by: Netflix, Intel Differential Revision: https://reviews.freebsd.org/D22650
|
#
f2410510 |
|
29-Nov-2019 |
Jeff Roberson <jeff@FreeBSD.org> |
Avoid acquiring the object lock if color is already set. It can not be unset until the object is recycled so this check is stable. Now that we can acquire the ref without a lock it is not necessary to group these operations and we can avoid it entirely in many cases. Reviewed by: kib, markj Differential Revision: https://reviews.freebsd.org/D22565
|
#
7cdcf863 |
|
13-Nov-2019 |
Doug Moore <dougm@FreeBSD.org> |
Define wrapper functions vm_map_entry_{succ,pred} to act as wrappers around entry->{next,prev} when those are used for ordered list traversal, and use those wrapper functions everywhere. Where the next field is used for maintaining a stack of deferred operations, #define defer_next to make that different usage clearer, and then use the 'right' pointer instead of 'next' for that purpose. Approved by: markj Tested by: pho (as part of a larger patch) Differential Revision: https://reviews.freebsd.org/D22347
|
#
01cef4ca |
|
16-Oct-2019 |
Mark Johnston <markj@FreeBSD.org> |
Remove page locking from pmap_mincore(). After r352110 the page lock no longer protects a page's identity, so there is no purpose in locking the page in pmap_mincore(). Instead, if vm.mincore_mapped is set to the non-default value of 0, re-lookup the page after acquiring its object lock, which holds the page's identity stable. The change removes the last callers of vm_page_pa_tryrelock(), so remove it. Reviewed by: kib Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D21823
|
#
d0c9294b |
|
16-Oct-2019 |
Mark Johnston <markj@FreeBSD.org> |
Correct the range boundaries used by kern_mincore(). Reported by: alc Sponsored by: Netflix
|
#
0012f373 |
|
14-Oct-2019 |
Jeff Roberson <jeff@FreeBSD.org> |
(4/6) Protect page valid with the busy lock. Atomics are used for page busy and valid state when the shared busy is held. The details of the locking protocol and valid and dirty synchronization are in the updated vm_page.h comments. Reviewed by: kib, markj Tested by: pho Sponsored by: Netflix, Intel Differential Revision: https://reviews.freebsd.org/D21594
|
#
e8bcf696 |
|
16-Sep-2019 |
Mark Johnston <markj@FreeBSD.org> |
Revert r352406, which contained changes I didn't intend to commit.
|
#
41fd4b94 |
|
16-Sep-2019 |
Mark Johnston <markj@FreeBSD.org> |
Fix a couple of nits in r352110. - Remove a dead variable from the amd64 pmap_extract_and_hold(). - Fix grammar in the vm_page_wire man page. Reported by: alc Reviewed by: alc, kib Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D21639
|
#
fe7bcbaf |
|
03-Sep-2019 |
Kyle Evans <kevans@FreeBSD.org> |
vm pager: writemapping accounting for OBJT_SWAP Currently writemapping accounting is only done for vnode_pager which does some accounting on the underlying vnode. Extend this to allow accounting to be possible for any of the pager types. New pageops are added to update/release writecount that need to be implemented for any pager wishing to do said accounting, and we implement these methods now for both vnode_pager (unchanged) and swap_pager. The primary motivation for this is to allow other systems with OBJT_SWAP objects to check if their objects have any write mappings and reject operations with EBUSY if so. posixshm will be the first to do so in order to reject adding write seals to the shmfd if any writable mappings exist. Reviewed by: kib, markj Differential Revision: https://reviews.freebsd.org/D21456
|
#
5dc7e31a |
|
02-Jul-2019 |
Konstantin Belousov <kib@FreeBSD.org> |
Control implicit PROT_MAX() using procctl(2) and the FreeBSD note feature bit. In particular, allocate the bit to opt-out the image from implicit PROTMAX enablement. Provide procctl(2) verbs to set and query implicit PROTMAX handling. The knobs mimic the same per-image flag and per-process controls for ASLR. Reviewed by: emaste, markj (previous version) Discussed with: brooks Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D20795
|
#
37306951 |
|
02-Jul-2019 |
Konstantin Belousov <kib@FreeBSD.org> |
Use traditional 'p' local to designate td->td_proc in kern_mmap. Reviewed by: emaste, markj Sponsored by: The FreeBSD Foundation MFC after: 3 days Differential revision: https://reviews.freebsd.org/D20795
|
#
74a1b66c |
|
20-Jun-2019 |
Brooks Davis <brooks@FreeBSD.org> |
Extend mmap/mprotect API to specify the max page protections. A new macro PROT_MAX() alters a protection value so it can be OR'd with a regular protection value to specify the maximum permissions. If present, these flags specify the maximum permissions. While these flags are non-portable, they can be used in portable code with simple ifdefs to expand PROT_MAX() to 0. This change allows (e.g.) a region that must be writable during run-time linking or JIT code generation to be made permanently read+execute after writes are complete. This complements W^X protections allowing more precise control by the programmer. This change alters mprotect argument checking and returns an error when unhandled protection flags are set. This differs from POSIX (in that POSIX only specifies an error), but is the documented behavior on Linux and more closely matches historical mmap behavior. In addition to explicit setting of the maximum permissions, an experimental sysctl vm.imply_prot_max causes mmap to assume that the initial permissions requested should be the maximum when the sysctl is set to 1. PROT_NONE mappings are excluded from this for compatibility with rtld and other consumers that use such mappings to reserve address space before mapping contents into part of the reservation. A final version this is expected to provide per-binary and per-process opt-in/out options and this sysctl will go away in its current form. As such it is undocumented. Reviewed by: emaste, kib (prior version), markj Additional suggestions from: alc Obtained from: CheriBSD Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D18880
|
#
f8c8b2e8 |
|
10-Jun-2019 |
Doug Moore <dougm@FreeBSD.org> |
r348879 introduced a wrong-way comparison that broke mmap. This change rights that comparison. Reported by: pho Approved by: markj (mentor) MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D20595
|
#
77555b84 |
|
10-Jun-2019 |
Doug Moore <dougm@FreeBSD.org> |
Change the check for 'size' wrapping around to zero in kern_mmap to account for both the lower and upper bound modifications. Change the error returned to ENOMEM. Rename the parameter size to len and make size a local variable that stores the value of len after it has been modified. This addresses concerns expressed by Bruce Evans after r348843. Reported by: brde@optusnet.com.au Reviewed by: kib, markj (mentors) MFC after: 3 days Relnotes: yes Differential Revision: https://reviews.freebsd.org/D20592
|
#
97220a27 |
|
09-Jun-2019 |
Doug Moore <dougm@FreeBSD.org> |
There are times when a len==0 parameter to mmap is okay. But on a 32-bit machine, a len parameter just a few bytes short of 4G, rounded up to a page boundary and hitting zero then, is not okay. Return failure in that case. Reported by: pho Reviewed by: alc, kib (mentor) Tested by: pho Differential Revision: https://reviews.freebsd.org/D20580
|
#
8cd6a80d |
|
13-May-2019 |
Mark Johnston <markj@FreeBSD.org> |
Restore the pre-r347532 behaviour of ignoring wiring failures in mmap(). The error handling added in r347532 is not right when mapping vnodes and will be fixed separately. Reported by: syzbot+1d2cc393bd6c88a548be@syzkaller.appspotmail.com MFC with: r347532
|
#
54a3a114 |
|
13-May-2019 |
Mark Johnston <markj@FreeBSD.org> |
Provide separate accounting for user-wired pages. Historically we have not distinguished between kernel wirings and user wirings for accounting purposes. User wirings (via mlock(2)) were subject to a global limit on the number of wired pages, so if large swaths of physical memory were wired by the kernel, as happens with the ZFS ARC among other things, the limit could be exceeded, causing user wirings to fail. The change adds a new counter, v_user_wire_count, which counts the number of virtual pages wired by user processes via mlock(2) and mlockall(2). Only user-wired pages are subject to the system-wide limit which helps provide some safety against deadlocks. In particular, while sources of kernel wirings typically support some backpressure mechanism, there is no way to reclaim user-wired pages shorting of killing the wiring process. The limit is exported as vm.max_user_wired, renamed from vm.max_wired, and changed from u_int to u_long. The choice to count virtual user-wired pages rather than physical pages was done for simplicity. There are mechanisms that can cause user-wired mappings to be destroyed while maintaining a wiring of the backing physical page; these make it difficult to accurately track user wirings at the physical page layer. The change also closes some holes which allowed user wirings to succeed even when they would cause the system limit to be exceeded. For instance, mmap() may now fail with ENOMEM in a process that has called mlockall(MCL_FUTURE) if the new mapping would cause the user wiring limit to be exceeded. Note that bhyve -S is subject to the user wiring limit, which defaults to 1/3 of physical RAM. Users that wish to exceed the limit must tune vm.max_user_wired. Reviewed by: kib, ngie (mlock() test changes) Tested by: pho (earlier version) MFC after: 45 days Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D19908
|
#
78022527 |
|
05-May-2019 |
Konstantin Belousov <kib@FreeBSD.org> |
Switch to use shared vnode locks for text files during image activation. kern_execve() locks text vnode exclusive to be able to set and clear VV_TEXT flag. VV_TEXT is mutually exclusive with the v_writecount > 0 condition. The change removes VV_TEXT, replacing it with the condition v_writecount <= -1, and puts v_writecount under the vnode interlock. Each text reference decrements v_writecount. To clear the text reference when the segment is unmapped, it is recorded in the vm_map_entry backed by the text file as MAP_ENTRY_VN_TEXT flag, and v_writecount is incremented on the map entry removal The operations like VOP_ADD_WRITECOUNT() and VOP_SET_TEXT() check that v_writecount does not contradict the desired change. vn_writecheck() is now racy and its use was eliminated everywhere except access. Atomic check for writeability and increment of v_writecount is performed by the VOP. vn_truncate() now increments v_writecount around VOP_SETATTR() call, lack of which is arguably a bug on its own. nullfs bypasses v_writecount to the lower vnode always, so nullfs vnode has its own v_writecount correct, and lower vnode gets all references, since object->handle is always lower vnode. On the text vnode' vm object dealloc, the v_writecount value is reset to zero, and deadfs vop_unset_text short-circuit the operation. Reclamation of lowervp always reclaims all nullfs vnodes referencing lowervp first, so no stray references are left. Reviewed by: markj, trasz Tested by: mjg, pho Sponsored by: The FreeBSD Foundation MFC after: 1 month Differential revision: https://reviews.freebsd.org/D19923
|
#
5dddee2d |
|
08-Feb-2019 |
Konstantin Belousov <kib@FreeBSD.org> |
i386: honor kern.elf32.read_exec for ommap(2) and break(2), as already done on amd64. Sponsored by: The FreeBSD Foundation MFC after: 1 week
|
#
a7f67fac |
|
08-Feb-2019 |
Konstantin Belousov <kib@FreeBSD.org> |
Normalize the declaration of i386_read_exec variable. It is currently re-declared in sys/sysent.h which is a wrong place for MD variable. Which causes redeclaration error with gcc when sys/sysent.h and machine/md_var.h are included both. Remove it from sys/sysent.h and instead include machine/md_var.h when needed, under #ifdef for both i386 and amd64. Reported and tested by: bde Sponsored by: The FreeBSD Foundation MFC after: 1 week
|
#
3fbc2e00 |
|
07-Jan-2019 |
Konstantin Belousov <kib@FreeBSD.org> |
Add a tunable which changes mincore(2) algorithm to only report data from the local mapping. Enable the setting by default. The article behind the change: https://arxiv.org/abs/1901.01161 Reviewed by: markj Discussed with: emaste Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D18764
|
#
cc426dd3 |
|
11-Dec-2018 |
Mateusz Guzik <mjg@FreeBSD.org> |
Remove unused argument to priv_check_cred. Patch mostly generated with cocinnelle: @@ expression E1,E2; @@ - priv_check_cred(E1,E2,0) + priv_check_cred(E1,E2) Sponsored by: The FreeBSD Foundation
|
#
d48719bd |
|
04-Dec-2018 |
Brooks Davis <brooks@FreeBSD.org> |
Normalize COMPAT_43 syscall declarations. Have ogetkerninfo, ogetpagesize, ogethostname, osethostname, and oaccept declare o<foo>_args structs rather than non-compat ones. Due to a failure to use NOARGS in most cases this adds only one new declaration. No changes required in freebsd32 as only ogetpagesize() is implemented and it has a 32-bit specific implementation. Reviewed by: kib Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D15816
|
#
2554f86a |
|
17-Sep-2018 |
Mateusz Guzik <mjg@FreeBSD.org> |
vm: stop taking proc lock in mmap to satisfy racct if it is disabled Limits can be safely obtained with lim_cur from the thread. racct is compiled in but disabled by default. Note that racct enablement is a boot-only tunable. This eliminates second most common place of taking the lock while pkg building. While here don't take the lock in mlockall either. Reviewed by: kib Approved by: re (gjb) Differential Revision: https://reviews.freebsd.org/D17210
|
#
6e1d2cf6 |
|
31-Jul-2018 |
Konstantin Belousov <kib@FreeBSD.org> |
For compat32, emulate the same wraparound check as occurs on the real ILP32 system. Reported by and discussed with: asomers PR: 230162 Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D16525
|
#
3e7cb27c |
|
04-Jun-2018 |
Alan Cox <alc@FreeBSD.org> |
Use a single, consistent approach to returning success versus failure in vm_map_madvise(). Previously, vm_map_madvise() used a traditional Unix- style "return (0);" to indicate success in the common case, but Mach- style return values in the edge cases. Since KERN_SUCCESS equals zero, the only problem with this inconsistency was stylistic. vm_map_madvise() has exactly two callers in the entire source tree, and only one of them cares about the return value. That caller, kern_madvise(), can be simplified if vm_map_madvise() consistently uses Unix-style return values. Since vm_map_madvise() uses the variable modify_map as a Boolean, make it one. Eliminate a redundant error check from kern_madvise(). Add a comment explaining where the check is performed. Explicitly note that exec_release_args_kva() doesn't care about vm_map_madvise()'s return value. Since MADV_FREE is passed as the behavior, the return value will always be zero. Reviewed by: kib, markj MFC after: 7 days
|
#
633d3b1c |
|
01-Jun-2018 |
Konstantin Belousov <kib@FreeBSD.org> |
Only check for MAP_32BIT when available. Reported by: mmacy Sponsored by: The FreeBSD Foundation MFC after: 10 days
|
#
60221a57 |
|
01-Jun-2018 |
Alan Cox <alc@FreeBSD.org> |
Only a small subset of mmap(2)'s flags should be used in combination with the flag MAP_GUARD. Rather than enumerating the flags that are not allowed, enumerate the flags that are allowed. The list of allowed flags is much shorter and less likely to change. (As an aside, one of the previously enumerated flags, MAP_PREFAULT, was not even a legal flag for mmap(2). However, because of an earlier check within kern_mmap(), this misuse of MAP_PREFAULT was harmless.) Reviewed by: kib MFC after: 10 days
|
#
6469bdcd |
|
06-Apr-2018 |
Brooks Davis <brooks@FreeBSD.org> |
Move most of the contents of opt_compat.h to opt_global.h. opt_compat.h is mentioned in nearly 180 files. In-progress network driver compabibility improvements may add over 100 more so this is closer to "just about everywhere" than "only some files" per the guidance in sys/conf/options. Keep COMPAT_LINUX32 in opt_compat.h as it is confined to a subset of sys/compat/linux/*.c. A fake _COMPAT_LINUX option ensure opt_compat.h is created on all architectures. Move COMPAT_LINUXKPI to opt_dontuse.h as it is only used to control the set of compiled files. Reviewed by: kib, cem, jhb, jtl Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D14941
|
#
e958ad4c |
|
12-Feb-2018 |
Jeff Roberson <jeff@FreeBSD.org> |
Make v_wire_count a per-cpu counter(9) counter. This eliminates a significant source of cache line contention from vm_page_alloc(). Use accessors and vm_page_unwire_noq() so that the mechanism can be easily changed in the future. Reviewed by: markj Discussed with: kib, glebius Tested by: pho (earlier version) Sponsored by: Netflix, Dell/EMC Isilon Differential Revision: https://reviews.freebsd.org/D14273
|
#
1c5196c3 |
|
19-Jan-2018 |
Konstantin Belousov <kib@FreeBSD.org> |
Assign map->header values to avoid boundary checks. In several places, entry start and end field are checked, after excluding the possibility that the entry is map->header. By assigning max and min values to the start and end fields of map->header in vm_map_init, the explicit map->header checks become unnecessary. Submitted by: Doug Moore <dougm@rice.edu> Reviewed by: alc, kib, markj (previous version) Tested by: pho (previous version) MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D13735
|
#
51369649 |
|
20-Nov-2017 |
Pedro F. Giffuni <pfg@FreeBSD.org> |
sys: further adoption of SPDX licensing ID tags. Mainly focus on files that use BSD 3-Clause license. The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known opensource licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, superceed or replace the license texts. Special thanks to Wind River for providing access to "The Duke of Highlander" tool: an older (2014) run over FreeBSD tree was useful as a starting point.
|
#
bd0e1beb |
|
07-Nov-2017 |
Mark Johnston <markj@FreeBSD.org> |
Correct the type of foff. No functional change intended. Github PR: 124 Submitted by: Wuyang Chung <wuyang.m.chung@outlook.com> MFC after: 1 week
|
#
6a97a3f7 |
|
27-Jun-2017 |
Konstantin Belousov <kib@FreeBSD.org> |
Treat the addr argument for mmap(2) request without MAP_FIXED flag as a hint. Right now, for non-fixed mmap(2) calls, addr is de-facto interpreted as the absolute minimal address of the range where the mapping is created. The VA allocator only allocates in the range [addr, VM_MAXUSER_ADDRESS]. This is too restrictive, the mmap(2) call might unduly fail if there is no free addresses above addr but a lot of usable space below it. Lift this implementation limitation by allocating VA in two passes. First, try to allocate above addr, as before. If that fails, do the second pass with less restrictive constraints for the start of allocation by specifying minimal allocation address at the max bss end, if this limit is less than addr. One important case where this change makes a difference is the allocation of the stacks for new threads in libthr. Under some configuration conditions, libthr tries to hint kernel to reuse the main thread stack grow area for the new stacks. This cannot work by design now after grow area is converted to stack, and there is no unallocated VA above the main stack. Interpreting requested stack base address as the hint provides compatibility with old libthr and with (mis-)configured current libthr. Reviewed by: alc Tested by: dim (previous version) Sponsored by: The FreeBSD Foundation MFC after: 1 week
|
#
19bd0d9c |
|
24-Jun-2017 |
Konstantin Belousov <kib@FreeBSD.org> |
Implement address space guards. Guard, requested by the MAP_GUARD mmap(2) flag, prevents the reuse of the allocated address space, but does not allow instantiation of the pages in the range. It is useful for more explicit support for usual two-stage reserve then commit allocators, since it prevents accidental instantiation of the mapping, e.g. by mprotect(2). Use guards to reimplement stack grow code. Explicitely track stack grow area with the guard, including the stack guard page. On stack grow, trivial shift of the guard map entry and stack map entry limits makes the stack expansion. Move the code to detect stack grow and call vm_map_growstack(), from vm_fault() into vm_map_lookup(). As result, it is impossible to get random mapping to occur in the stack grow area, or to overlap the stack guard page. Enable stack guard page by default. Reviewed by: alc, markj Man page update reviewed by: alc, bjk, emaste, markj, pho Tested by: pho, Qualys Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D11306 (man pages)
|
#
46dc8e9d |
|
30-Mar-2017 |
Dmitry Chagin <dchagin@FreeBSD.org> |
Add kern_mincore() helper for micore() syscall. Suggested by: kib@ Reviewed by: kib@ MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D10143
|
#
d1780e8d |
|
14-Mar-2017 |
Konstantin Belousov <kib@FreeBSD.org> |
Use atop() instead of OFF_TO_IDX() for convertion of addresses or addresses offsets, as intended. Suggested and reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 2 weeks
|
#
fbbd9655 |
|
28-Feb-2017 |
Warner Losh <imp@FreeBSD.org> |
Renumber copyright clause 4 Renumber cluase 4 to 3, per what everybody else did when BSD granted them permission to remove clause 3. My insistance on keeping the same numbering for legal reasons is too pedantic, so give up on that point. Submitted by: Jan Schaumann <jschauma@stevens.edu> Pull Request: https://github.com/freebsd/freebsd/pull/96
|
#
496ab053 |
|
13-Feb-2017 |
Konstantin Belousov <kib@FreeBSD.org> |
Rework r313352. Rename kern_vm_* functions to kern_*. Move the prototypes to syscallsubr.h. Also change Mach VM types to uintptr_t/size_t as needed, to avoid headers pollution. Requested by: alc, jhb Reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D9535
|
#
04e89ffb |
|
12-Feb-2017 |
Konstantin Belousov <kib@FreeBSD.org> |
Remove MPSAFE and ARGUSED annotations, ANSI-fy syscall handlers. Discussed with: alc Sponsored by: The FreeBSD Foundation MFC after: 1 week
|
#
1c2ad3e9 |
|
11-Feb-2017 |
Konstantin Belousov <kib@FreeBSD.org> |
Change type of the prot parameter for kern_vm_mmap() from vm_prot_t to int. This makes the code to pass whole word of the mmap(2) syscall argument prot to the syscall helper kern_vm_mmap(), which can validate all bits. The change provides temporal fix for sys/vm/mmap_test mmap__bad_arguments, which was broken after r313352. PR: 216976 Reported and tested by: ngie Sponsored by: The FreeBSD Foundation
|
#
69cdfcef |
|
06-Feb-2017 |
Edward Tomasz Napierala <trasz@FreeBSD.org> |
Add kern_vm_mmap2(), kern_vm_mprotect(), kern_vm_msync(), kern_vm_munlock(), kern_vm_munmap(), and kern_vm_madvise(), and use them in various compats instead of their sys_*() counterparts. Reviewed by: ed, dchagin, kib MFC after: 2 weeks Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D9378
|
#
736ff8c3 |
|
24-Jan-2017 |
Mateusz Guzik <mjg@FreeBSD.org> |
hwpmc: partially depessimize munmap handling if the module is not loaded HWPMC_HOOKS is enabled in GENERIC and triggers some work avoidable in the common (module not loaded) case. In particular this avoids permission checks + lock downgrade singlethreaded and in cases were an executable mapping is found the pmc sx lock is no longer bounced. Note this is a band aid. MFC after: 1 week
|
#
7667839a |
|
15-Nov-2016 |
Alan Cox <alc@FreeBSD.org> |
Remove most of the code for implementing PG_CACHED pages. (This change does not remove user-space visible fields from vm_cnt or all of the references to cached pages from comments. Those changes will come later.) Reviewed by: kib, markj Tested by: pho Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D8497
|
#
0df42647 |
|
10-Jul-2016 |
Robert Watson <rwatson@FreeBSD.org> |
When mmap(2) is used with a vnode, capture vnode attributes in the audit trail. This was not required for Common Criteria auditing (which requires only that the intent to read or write be audited at the time of open(2)), but is useful for contemporary live analysis and forensics. MFC after: 3 days Sponsored by: DARPA, AFRL
|
#
51d1f690 |
|
10-Jul-2016 |
Robert Watson <rwatson@FreeBSD.org> |
Audit file-descriptor arguments to I/O system calls such as read(2), write(2), dup(2), and mmap(2). This auditing is not required by the Common Criteria (and hence was not being performed), but is valuable in both contemporary live analysis and forensic use cases. MFC after: 3 days Sponsored by: DARPA, AFRL
|
#
010ba384 |
|
05-Jul-2015 |
Mark Johnston <markj@FreeBSD.org> |
Add a local variable initialization needed in the OBJT_DEFAULT case. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D2992
|
#
cd336bad |
|
02-Jul-2015 |
Mateusz Guzik <mjg@FreeBSD.org> |
vm: don't lock proc around accesses to vm_{t,d}addr and RLIMIT_DATA in sys_mmap vm_{t,d}addr are constant and we can use thread's copy of resource limits
|
#
f6f6d240 |
|
10-Jun-2015 |
Mateusz Guzik <mjg@FreeBSD.org> |
Implement lockless resource limits. Use the same scheme implemented to manage credentials. Code needing to look at process's credentials (as opposed to thred's) is provided with *_proc variants of relevant functions. Places which possibly had to take the proc lock anyway still use the proc pointer to access limits.
|
#
7077c426 |
|
04-Jun-2015 |
John Baldwin <jhb@FreeBSD.org> |
Add a new file operations hook for mmap operations. File type-specific logic is now placed in the mmap hook implementation rather than requiring it to be placed in sys/vm/vm_mmap.c. This hook allows new file types to support mmap() as well as potentially allowing mmap() for existing file types that do not currently support any mapping. The vm_mmap() function is now split up into two functions. A new vm_mmap_object() function handles the "back half" of vm_mmap() and accepts a referenced VM object to map rather than a (handle, handle_type) tuple. vm_mmap() is now reduced to converting a (handle, handle_type) tuple to a a VM object and then calling vm_mmap_object() to handle the actual mapping. The vm_mmap() function remains for use by other parts of the kernel (e.g. device drivers and exec) but now only supports mapping vnodes, character devices, and anonymous memory. The mmap() system call invokes vm_mmap_object() directly with a NULL object for anonymous mappings. For mappings using a file descriptor, the descriptors fo_mmap() hook is invoked instead. The fo_mmap() hook is responsible for performing type-specific checks and adjustments to arguments as well as possibly modifying mapping parameters such as flags or the object offset. The fo_mmap() hook routines then call vm_mmap_object() to handle the actual mapping. The fo_mmap() hook is optional. If it is not set, then fo_mmap() will fail with ENODEV. A fo_mmap() hook is implemented for regular files, character devices, and shared memory objects (created via shm_open()). While here, consistently use the VM_PROT_* constants for the vm_prot_t type for the 'prot' variable passed to vm_mmap() and vm_mmap_object() as well as the vm_mmap_vnode() and vm_mmap_cdev() helper routines. Previously some places were using the mmap()-specific PROT_* constants instead. While this happens to work because PROT_xx == VM_PROT_xx, using VM_PROT_* is more correct. Differential Revision: https://reviews.freebsd.org/D2658 Reviewed by: alc (glanced over), kib MFC after: 1 month Sponsored by: Chelsio
|
#
4b5c9cf6 |
|
29-Apr-2015 |
Edward Tomasz Napierala <trasz@FreeBSD.org> |
Add kern.racct.enable tunable and RACCT_DISABLED config option. The point of this is to be able to add RACCT (with RACCT_DISABLED) to GENERIC, to avoid having to rebuild the kernel to use rctl(8). Differential Revision: https://reviews.freebsd.org/D2369 Reviewed by: kib@ MFC after: 1 month Relnotes: yes Sponsored by: The FreeBSD Foundation
|
#
0538aafc |
|
18-Apr-2015 |
Konstantin Belousov <kib@FreeBSD.org> |
The lseek(2), mmap(2), truncate(2), ftruncate(2), pread(2), and pwrite(2) syscalls are wrapped to provide compatibility with pre-7.x kernels which required padding before the off_t parameter. The fcntl(2) contains compatibility code to handle kernels before the struct flock was changed during the 8.x CURRENT development. The shims were reasonable to allow easier revert to the older kernel at that time. Now, two or three major releases later, shims do not serve any purpose. Such old kernels cannot handle current libc, so revert the compatibility code. Make padded syscalls support conditional under the COMPAT6 config option. For COMPAT32, the syscalls were under COMPAT6 already. Remove WITHOUT_SYSCALL_COMPAT build option, which only purpose was to (partially) disable the removed shims. Reviewed by: jhb, imp (previous versions) Discussed with: peter Sponsored by: The FreeBSD Foundation MFC after: 1 week
|
#
3d653db0 |
|
21-Mar-2015 |
Alan Cox <alc@FreeBSD.org> |
Introduce vm_object_color() and use it in mmap(2) to set the color of named objects to zero before the virtual address is selected. Previously, the color setting was delayed until after the virtual address was selected. In rtld, this delay effectively prevented the mapping of a shared library's code section using superpages. Now, for example, we see the first 1 MB of libc's code on armv6 mapped by a superpage after we've gotten through the initial cold misses that bring the first 1 MB of code into memory. (With the page clustering that we perform on read faults, this happens quickly.) Differential Revision: https://reviews.freebsd.org/D2013 Reviewed by: jhb, kib Tested by: Svatopluk Kraus (armv6) MFC after: 6 weeks
|
#
f81b73f3 |
|
28-Feb-2015 |
Alan Cox <alc@FreeBSD.org> |
Eliminate a variable that became unused when VFS_LOCK_GIANT() was eliminated. MFC after: 3 days
|
#
01ca58b2 |
|
05-Dec-2014 |
John Baldwin <jhb@FreeBSD.org> |
Always ignore the deprecated MAP_RENAME and MAP_NORESERVE flags to mmap(). Some old libraries may be used even with newer binaries (specifically the Nvidia driver libraries). Differential Revision: https://reviews.freebsd.org/D1262 Reviewed by: kib
|
#
5817298f |
|
17-Oct-2014 |
John Baldwin <jhb@FreeBSD.org> |
Retire the unimplemented MAP_RENAME and MAP_NORESERVE flags to mmap(2). Older binaries are still permitted to use these flags. PR: 193961 (exp-run in ports) Differential Revision: https://reviews.freebsd.org/D848 Reviewed by: kib
|
#
10204535 |
|
17-Sep-2014 |
Konstantin Belousov <kib@FreeBSD.org> |
The vm_mmap_cdev() explicitely converts absence of both MAP_SHARED and MAP_PRIVATE flags to MAP_SHARED. Apparently, some code in tree, in particular, libgeom, relied on this behaviour, see r271721. For regular file types, the absence of the flags is interpreted as MAP_PRIVATE, and libc nlist used this (fixed in r271723). Allow the implicit flags for legacy binaries. Bump __FreeBSD_version to get the ABI note on new binaries to check for in mmap code. Remove the test for presence of one of the MAP_ANON, MAP_SHARED or MAP_PRIVATE flags before fget_mmap(). For MAP_ANON, we already verify that passed fd == -1. For fd != -1, test after fget_mmap() (for newer binaries) covers the case. Reported by: bdrewery, pho Reviewed by: jhb Sponsored by: The FreeBSD Foundation
|
#
8bafac54 |
|
16-Sep-2014 |
John Baldwin <jhb@FreeBSD.org> |
Permit MAP_RENAME and MAP_NORESERVE for now. These flags should be removed, but at least Chromium and OpenJDK use MAP_NORESERVE.
|
#
5fd3f8b3 |
|
15-Sep-2014 |
John Baldwin <jhb@FreeBSD.org> |
Add stricter checking of some mmap() arguments: - Fail with EINVAL if an invalid protection mask is passed to mmap(). - Fail with EINVAL if an unknown flag is passed to mmap(). - Fail with EINVAL if both MAP_PRIVATE and MAP_SHARED are passed to mmap(). - Require one of either MAP_PRIVATE or MAP_SHARED for non-anonymous mappings. Reviewed by: alc, kib MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D698
|
#
e7d939bd |
|
06-Jul-2014 |
Marcel Moolenaar <marcel@FreeBSD.org> |
Remove ia64. This includes: o All directories named *ia64* o All files named *ia64* o All ia64-specific code guarded by __ia64__ o All ia64-specific makefile logic o Mention of ia64 in comments and documentation This excludes: o Everything under contrib/ o Everything under crypto/ o sys/xen/interface o sys/sys/elf_common.h Discussed at: BSDcan
|
#
af3b2549 |
|
27-Jun-2014 |
Hans Petter Selasky <hselasky@FreeBSD.org> |
Pull in r267961 and r267973 again. Fix for issues reported will follow.
|
#
37a107a4 |
|
27-Jun-2014 |
Glen Barber <gjb@FreeBSD.org> |
Revert r267961, r267973: These changes prevent sysctl(8) from returning proper output, such as: 1) no output from sysctl(8) 2) erroneously returning ENOMEM with tools like truss(1) or uname(1) truss: can not get etype: Cannot allocate memory
|
#
3da1cf1e |
|
27-Jun-2014 |
Hans Petter Selasky <hselasky@FreeBSD.org> |
Extend the meaning of the CTLFLAG_TUN flag to automatically check if there is an environment variable which shall initialize the SYSCTL during early boot. This works for all SYSCTL types both statically and dynamically created ones, except for the SYSCTL NODE type and SYSCTLs which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to be used in the case a tunable sysctl has a custom initialisation function allowing the sysctl to still be marked as a tunable. The kernel SYSCTL API is mostly the same, with a few exceptions for some special operations like iterating childrens of a static/extern SYSCTL node. This operation should probably be made into a factored out common macro, hence some device drivers use this. The reason for changing the SYSCTL API was the need for a SYSCTL parent OID pointer and not only the SYSCTL parent OID list pointer in order to quickly generate the sysctl path. The motivation behind this patch is to avoid parameter loading cludges inside the OFED driver subsystem. Instead of adding special code to the OFED driver subsystem to post-load tunables into dynamically created sysctls, we generalize this in the kernel. Other changes: - Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask" to "hw.pcic.intr_mask". - Removed redundant TUNABLE statements throughout the kernel. - Some minor code rewrites in connection to removing not needed TUNABLE statements. - Added a missing SYSCTL_DECL(). - Wrapped two very long lines. - Avoid malloc()/free() inside sysctl string handling, in case it is called to initialize a sysctl from a tunable, hence malloc()/free() is not ready when sysctls from the sysctl dataset are registered. - Bumped FreeBSD version to indicate SYSCTL API change. MFC after: 2 weeks Sponsored by: Mellanox Technologies
|
#
11c42bcc |
|
18-Jun-2014 |
Konstantin Belousov <kib@FreeBSD.org> |
Add MAP_EXCL flag for mmap(2). It should be combined with MAP_FIXED, and prevents the request from deleting existing mappings in the region, failing instead. Reviewed by: alc Discussed with: jhb Tested by: markj, pho (previous version, as part of the bigger patch) Sponsored by: The FreeBSD Foundation MFC after: 1 week
|
#
4648ba0a |
|
08-Jun-2014 |
Konstantin Belousov <kib@FreeBSD.org> |
Make mmap(MAP_STACK) search for the available address space, similar to !MAP_STACK mapping requests. For MAP_STACK | MAP_FIXED, clear any mappings which could previously exist in the used range. For this, teach vm_map_find() and vm_map_fixed() to handle MAP_STACK_GROWS_DOWN or _UP cow flags, by calling a new vm_map_stack_locked() helper, which is factored out from vm_map_stack(). The side effect of the change is that MAP_STACK started obeying MAP_ALIGNMENT and MAP_32BIT flags. Reported by: rwatson Reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 2 weeks
|
#
e103f5b1 |
|
07-May-2014 |
Peter Holm <pho@FreeBSD.org> |
msync(2) must return ENOMEM and not EINVAL when the address is outside the allowed range or when one or more pages are not mapped. This according to The Open Group Base Specifications Issue 7. Discussed with: attilio, Bruce Evans Reviewed by: alc, Garrett Cooper Reported by: ATF MFC after: 2 weeks Sponsored by: EMC / Isilon storage division
|
#
44f1c916 |
|
22-Mar-2014 |
Bryan Drewery <bdrewery@FreeBSD.org> |
Rename global cnt to vm_cnt to avoid shadowing. To reduce the diff struct pcu.cnt field was not renamed, so PCPU_OP(cnt.field) is still used. pc_cnt and pcpu are also used in kvm(3) and vmstat(8). The goal was to not affect externally used KPI. Bump __FreeBSD_version_ in case some out-of-tree module/code relies on the the global cnt variable. Exp-run revealed no ports using it directly. No objection from: arch@ Sponsored by: EMC / Isilon Storage Division
|
#
4a144410 |
|
16-Mar-2014 |
Robert Watson <rwatson@FreeBSD.org> |
Update kernel inclusions of capability.h to use capsicum.h instead; some further refinement is required as some device drivers intended to be portable over FreeBSD versions rely on __FreeBSD_version to decide whether to include capability.h. MFC after: 3 weeks
|
#
55648840 |
|
19-Sep-2013 |
John Baldwin <jhb@FreeBSD.org> |
Extend the support for exempting processes from being killed when swap is exhausted. - Add a new protect(1) command that can be used to set or revoke protection from arbitrary processes. Similar to ktrace it can apply a change to all existing descendants of a process as well as future descendants. - Add a new procctl(2) system call that provides a generic interface for control operations on processes (as opposed to the debugger-specific operations provided by ptrace(2)). procctl(2) uses a combination of idtype_t and an id to identify the set of processes on which to operate similar to wait6(). - Add a PROC_SPROTECT control operation to manage the protection status of a set of processes. MADV_PROTECT still works for backwards compatability. - Add a p_flag2 to struct proc (and a corresponding ki_flag2 to kinfo_proc) the first bit of which is used to track if P_PROTECT should be inherited by new child processes. Reviewed by: kib, jilles (earlier version) Approved by: re (delphij) MFC after: 1 month
|
#
6a87d217 |
|
12-Sep-2013 |
John Baldwin <jhb@FreeBSD.org> |
Fix an off-by-one error when populating mincore(2) entries for skipped entries. lastvecindex references the last valid byte, so the new bytes should come after it. Approved by: re (kib) MFC after: 1 week
|
#
edb572a3 |
|
09-Sep-2013 |
John Baldwin <jhb@FreeBSD.org> |
Add a mmap flag (MAP_32BIT) on 64-bit platforms to request that a mapping use an address in the first 2GB of the process's address space. This flag should have the same semantics as the same flag on Linux. To facilitate this, add a new parameter to vm_map_find() that specifies an optional maximum virtual address. While here, fix several callers of vm_map_find() to use a VMFS_* constant for the findspace argument instead of TRUE and FALSE. Reviewed by: alc Approved by: re (kib)
|
#
7008be5b |
|
04-Sep-2013 |
Pawel Jakub Dawidek <pjd@FreeBSD.org> |
Change the cap_rights_t type from uint64_t to a structure that we can extend in the future in a backward compatible (API and ABI) way. The cap_rights_t represents capability rights. We used to use one bit to represent one right, but we are running out of spare bits. Currently the new structure provides place for 114 rights (so 50 more than the previous cap_rights_t), but it is possible to grow the structure to hold at least 285 rights, although we can make it even larger if 285 rights won't be enough. The structure definition looks like this: struct cap_rights { uint64_t cr_rights[CAP_RIGHTS_VERSION + 2]; }; The initial CAP_RIGHTS_VERSION is 0. The top two bits in the first element of the cr_rights[] array contain total number of elements in the array - 2. This means if those two bits are equal to 0, we have 2 array elements. The top two bits in all remaining array elements should be 0. The next five bits in all array elements contain array index. Only one bit is used and bit position in this five-bits range defines array index. This means there can be at most five array elements in the future. To define new right the CAPRIGHT() macro must be used. The macro takes two arguments - an array index and a bit to set, eg. #define CAP_PDKILL CAPRIGHT(1, 0x0000000000000800ULL) We still support aliases that combine few rights, but the rights have to belong to the same array element, eg: #define CAP_LOOKUP CAPRIGHT(0, 0x0000000000000400ULL) #define CAP_FCHMOD CAPRIGHT(0, 0x0000000000002000ULL) #define CAP_FCHMODAT (CAP_FCHMOD | CAP_LOOKUP) There is new API to manage the new cap_rights_t structure: cap_rights_t *cap_rights_init(cap_rights_t *rights, ...); void cap_rights_set(cap_rights_t *rights, ...); void cap_rights_clear(cap_rights_t *rights, ...); bool cap_rights_is_set(const cap_rights_t *rights, ...); bool cap_rights_is_valid(const cap_rights_t *rights); void cap_rights_merge(cap_rights_t *dst, const cap_rights_t *src); void cap_rights_remove(cap_rights_t *dst, const cap_rights_t *src); bool cap_rights_contains(const cap_rights_t *big, const cap_rights_t *little); Capability rights to the cap_rights_init(), cap_rights_set(), cap_rights_clear() and cap_rights_is_set() functions are provided by separating them with commas, eg: cap_rights_t rights; cap_rights_init(&rights, CAP_READ, CAP_WRITE, CAP_FSTAT); There is no need to terminate the list of rights, as those functions are actually macros that take care of the termination, eg: #define cap_rights_set(rights, ...) \ __cap_rights_set((rights), __VA_ARGS__, 0ULL) void __cap_rights_set(cap_rights_t *rights, ...); Thanks to using one bit as an array index we can assert in those functions that there are no two rights belonging to different array elements provided together. For example this is illegal and will be detected, because CAP_LOOKUP belongs to element 0 and CAP_PDKILL to element 1: cap_rights_init(&rights, CAP_LOOKUP | CAP_PDKILL); Providing several rights that belongs to the same array's element this way is correct, but is not advised. It should only be used for aliases definition. This commit also breaks compatibility with some existing Capsicum system calls, but I see no other way to do that. This should be fine as Capsicum is still experimental and this change is not going to 9.x. Sponsored by: The FreeBSD Foundation
|
#
5aa60b6f |
|
16-Aug-2013 |
John Baldwin <jhb@FreeBSD.org> |
Add new mmap(2) flags to permit applications to request specific virtual address alignment of mappings. - MAP_ALIGNED(n) requests a mapping aligned on a boundary of (1 << n). Requests for n >= number of bits in a pointer or less than the size of a page fail with EINVAL. This matches the API provided by NetBSD. - MAP_ALIGNED_SUPER is a special case of MAP_ALIGNED. It can be used to optimize the chances of using large pages. By default it will align the mapping on a large page boundary (the system is free to choose any large page size to align to that seems best for the mapping request). However, if the object being mapped is already using large pages, then it will align the virtual mapping to match the existing large pages in the object instead. - Internally, VMFS_ALIGNED_SPACE is now renamed to VMFS_SUPER_SPACE, and VMFS_ALIGNED_SPACE(n) is repurposed for specifying a specific alignment. MAP_ALIGNED(n) maps to using VMFS_ALIGNED_SPACE(n), while MAP_ALIGNED_SUPER maps to VMFS_SUPER_SPACE. - mmap() of a device object now uses VMFS_OPTIMAL_SPACE rather than explicitly using VMFS_SUPER_SPACE. All device objects are forced to use a specific color on creation, so VMFS_OPTIMAL_SPACE is effectively equivalent. Reviewed by: alc MFC after: 1 month
|
#
fc2b1679 |
|
22-Jul-2013 |
Jeremie Le Hen <jlh@FreeBSD.org> |
Fix previous commit when option RACCT is not used. MFC after: 7 days
|
#
c92b5069 |
|
22-Jul-2013 |
Jeremie Le Hen <jlh@FreeBSD.org> |
Fix a panic in the racct code when munlock(2) is called with incorrect values. The racct code in sys_munlock() assumed that the boundaries provided by the userland were correct as long as vm_map_unwire() returned successfully. However the latter contains its own logic and sometimes manages to do something out of those boundaries, even if they are buggy. This change makes the racct code to use the accounting done by the vm layer, as it is done in other places such as vm_mlock(). Despite fixing the panic, Alan Cox pointed that this code is still race-y though: two simultaneous callers will produce incorrect values. Reviewed by: alc MFC after: 7 days
|
#
ff74a3fa |
|
19-Jul-2013 |
John Baldwin <jhb@FreeBSD.org> |
Be more aggressive in using superpages in all mappings of objects: - Add a new address space allocation method (VMFS_OPTIMAL_SPACE) for vm_map_find() that will try to alter the alignment of a mapping to match any existing superpage mappings of the object being mapped. If no suitable address range is found with the necessary alignment, vm_map_find() will fall back to using the simple first-fit strategy (VMFS_ANY_SPACE). - Change mmap() without MAP_FIXED, shmat(), and the GEM mapping ioctl to use VMFS_OPTIMAL_SPACE instead of VMFS_ANY_SPACE. Reviewed by: alc (earlier version) MFC after: 2 weeks
|
#
995d7069 |
|
08-Jun-2013 |
Gleb Smirnoff <glebius@FreeBSD.org> |
Make sys_mlock() function just a wrapper around vm_mlock() function that does all the job. Reviewed by: kib, jilles Sponsored by: Nginx, Inc.
|
#
53f5f8a0 |
|
02-May-2013 |
Konstantin Belousov <kib@FreeBSD.org> |
Add a hint suggesting why tmpfs does not need a special case there.
|
#
e5f299ff |
|
28-Apr-2013 |
Konstantin Belousov <kib@FreeBSD.org> |
Make vm_object_page_clean() and vm_mmap_vnode() tolerate the vnode' v_object of non OBJT_VNODE type. For vm_object_page_clean(), simply do not assert that object type must be OBJT_VNODE, and add a comment explaining how the check for OBJ_MIGHTBEDIRTY prevents the rest of function from operating on such objects. For vm_mmap_vnode(), if the object type is not OBJT_VNODE, require it to be for swap pager (or default), handle the bypass filesystems, and correctly acquire the object reference in this case. Reviewed by: alc Tested by: pho, bf MFC after: 1 week
|
#
bafa6cfc |
|
28-Mar-2013 |
Konstantin Belousov <kib@FreeBSD.org> |
Release the v_writecount reference on the vnode in case of error, before the vnode is vput() in vm_mmap_vnode(). Error return means that there is no use reference on the vnode from the vm object reference, and failing to restore v_writecount breaks the invariant that v_writecount is less or equal to the usecount. The situation observed when nfs client returns ESTALE for VOP_GETATTR() after the open. In collaboration with: pho MFC after: 1 week
|
#
89f6b863 |
|
08-Mar-2013 |
Attilio Rao <attilio@FreeBSD.org> |
Switch the vm_object mutex to be a rwlock. This will enable in the future further optimizations where the vm_object lock will be held in read mode most of the time the page cache resident pool of pages are accessed for reading purposes. The change is mostly mechanical but few notes are reported: * The KPI changes as follow: - VM_OBJECT_LOCK() -> VM_OBJECT_WLOCK() - VM_OBJECT_TRYLOCK() -> VM_OBJECT_TRYWLOCK() - VM_OBJECT_UNLOCK() -> VM_OBJECT_WUNLOCK() - VM_OBJECT_LOCK_ASSERT(MA_OWNED) -> VM_OBJECT_ASSERT_WLOCKED() (in order to avoid visibility of implementation details) - The read-mode operations are added: VM_OBJECT_RLOCK(), VM_OBJECT_TRYRLOCK(), VM_OBJECT_RUNLOCK(), VM_OBJECT_ASSERT_RLOCKED(), VM_OBJECT_ASSERT_LOCKED() * The vm/vm_pager.h namespace pollution avoidance (forcing requiring sys/mutex.h in consumers directly to cater its inlining functions using VM_OBJECT_LOCK()) imposes that all the vm/vm_pager.h consumers now must include also sys/rwlock.h. * zfs requires a quite convoluted fix to include FreeBSD rwlocks into the compat layer because the name clash between FreeBSD and solaris versions must be avoided. At this purpose zfs redefines the vm_object locking functions directly, isolating the FreeBSD components in specific compat stubs. The KPI results heavilly broken by this commit. Thirdy part ports must be updated accordingly (I can think off-hand of VirtualBox, for example). Sponsored by: EMC / Isilon storage division Reviewed by: jeff Reviewed by: pjd (ZFS specific review) Discussed with: alc Tested by: pho
|
#
2609222a |
|
01-Mar-2013 |
Pawel Jakub Dawidek <pjd@FreeBSD.org> |
Merge Capsicum overhaul: - Capability is no longer separate descriptor type. Now every descriptor has set of its own capability rights. - The cap_new(2) system call is left, but it is no longer documented and should not be used in new code. - The new syscall cap_rights_limit(2) should be used instead of cap_new(2), which limits capability rights of the given descriptor without creating a new one. - The cap_getrights(2) syscall is renamed to cap_rights_get(2). - If CAP_IOCTL capability right is present we can further reduce allowed ioctls list with the new cap_ioctls_limit(2) syscall. List of allowed ioctls can be retrived with cap_ioctls_get(2) syscall. - If CAP_FCNTL capability right is present we can further reduce fcntls that can be used with the new cap_fcntls_limit(2) syscall and retrive them with cap_fcntls_get(2). - To support ioctl and fcntl white-listing the filedesc structure was heavly modified. - The audit subsystem, kdump and procstat tools were updated to recognize new syscalls. - Capability rights were revised and eventhough I tried hard to provide backward API and ABI compatibility there are some incompatible changes that are described in detail below: CAP_CREATE old behaviour: - Allow for openat(2)+O_CREAT. - Allow for linkat(2). - Allow for symlinkat(2). CAP_CREATE new behaviour: - Allow for openat(2)+O_CREAT. Added CAP_LINKAT: - Allow for linkat(2). ABI: Reuses CAP_RMDIR bit. - Allow to be target for renameat(2). Added CAP_SYMLINKAT: - Allow for symlinkat(2). Removed CAP_DELETE. Old behaviour: - Allow for unlinkat(2) when removing non-directory object. - Allow to be source for renameat(2). Removed CAP_RMDIR. Old behaviour: - Allow for unlinkat(2) when removing directory. Added CAP_RENAMEAT: - Required for source directory for the renameat(2) syscall. Added CAP_UNLINKAT (effectively it replaces CAP_DELETE and CAP_RMDIR): - Allow for unlinkat(2) on any object. - Required if target of renameat(2) exists and will be removed by this call. Removed CAP_MAPEXEC. CAP_MMAP old behaviour: - Allow for mmap(2) with any combination of PROT_NONE, PROT_READ and PROT_WRITE. CAP_MMAP new behaviour: - Allow for mmap(2)+PROT_NONE. Added CAP_MMAP_R: - Allow for mmap(PROT_READ). Added CAP_MMAP_W: - Allow for mmap(PROT_WRITE). Added CAP_MMAP_X: - Allow for mmap(PROT_EXEC). Added CAP_MMAP_RW: - Allow for mmap(PROT_READ | PROT_WRITE). Added CAP_MMAP_RX: - Allow for mmap(PROT_READ | PROT_EXEC). Added CAP_MMAP_WX: - Allow for mmap(PROT_WRITE | PROT_EXEC). Added CAP_MMAP_RWX: - Allow for mmap(PROT_READ | PROT_WRITE | PROT_EXEC). Renamed CAP_MKDIR to CAP_MKDIRAT. Renamed CAP_MKFIFO to CAP_MKFIFOAT. Renamed CAP_MKNODE to CAP_MKNODEAT. CAP_READ old behaviour: - Allow pread(2). - Disallow read(2), readv(2) (if there is no CAP_SEEK). CAP_READ new behaviour: - Allow read(2), readv(2). - Disallow pread(2) (CAP_SEEK was also required). CAP_WRITE old behaviour: - Allow pwrite(2). - Disallow write(2), writev(2) (if there is no CAP_SEEK). CAP_WRITE new behaviour: - Allow write(2), writev(2). - Disallow pwrite(2) (CAP_SEEK was also required). Added convinient defines: #define CAP_PREAD (CAP_SEEK | CAP_READ) #define CAP_PWRITE (CAP_SEEK | CAP_WRITE) #define CAP_MMAP_R (CAP_MMAP | CAP_SEEK | CAP_READ) #define CAP_MMAP_W (CAP_MMAP | CAP_SEEK | CAP_WRITE) #define CAP_MMAP_X (CAP_MMAP | CAP_SEEK | 0x0000000000000008ULL) #define CAP_MMAP_RW (CAP_MMAP_R | CAP_MMAP_W) #define CAP_MMAP_RX (CAP_MMAP_R | CAP_MMAP_X) #define CAP_MMAP_WX (CAP_MMAP_W | CAP_MMAP_X) #define CAP_MMAP_RWX (CAP_MMAP_R | CAP_MMAP_W | CAP_MMAP_X) #define CAP_RECV CAP_READ #define CAP_SEND CAP_WRITE #define CAP_SOCK_CLIENT \ (CAP_CONNECT | CAP_GETPEERNAME | CAP_GETSOCKNAME | CAP_GETSOCKOPT | \ CAP_PEELOFF | CAP_RECV | CAP_SEND | CAP_SETSOCKOPT | CAP_SHUTDOWN) #define CAP_SOCK_SERVER \ (CAP_ACCEPT | CAP_BIND | CAP_GETPEERNAME | CAP_GETSOCKNAME | \ CAP_GETSOCKOPT | CAP_LISTEN | CAP_PEELOFF | CAP_RECV | CAP_SEND | \ CAP_SETSOCKOPT | CAP_SHUTDOWN) Added defines for backward API compatibility: #define CAP_MAPEXEC CAP_MMAP_X #define CAP_DELETE CAP_UNLINKAT #define CAP_MKDIR CAP_MKDIRAT #define CAP_RMDIR CAP_UNLINKAT #define CAP_MKFIFO CAP_MKFIFOAT #define CAP_MKNOD CAP_MKNODAT #define CAP_SOCK_ALL (CAP_SOCK_CLIENT | CAP_SOCK_SERVER) Sponsored by: The FreeBSD Foundation Reviewed by: Christoph Mallon <christoph.mallon@gmx.de> Many aspects discussed with: rwatson, benl, jonathan ABI compatibility discussed with: kib
|
#
3ac7d297 |
|
09-Jan-2013 |
Andrey Zonov <zont@FreeBSD.org> |
- Reduce kernel size by removing unnecessary pointer indirections. GENERIC kernel size reduced in 16 bytes and RACCT kernel in 336 bytes. Suggested by: alc Reviewed by: alc Approved by: kib (mentor) MFC after: 1 week
|
#
7e19eda4 |
|
18-Dec-2012 |
Andrey Zonov <zont@FreeBSD.org> |
- Fix locked memory accounting for maps with MAP_WIREFUTURE flag. - Add sysctl vm.old_mlock which may turn such accounting off. Reviewed by: avg, trasz Approved by: kib (mentor) MFC after: 1 week
|
#
5050aa86 |
|
22-Oct-2012 |
Konstantin Belousov <kib@FreeBSD.org> |
Remove the support for using non-mpsafe filesystem modules. In particular, do not lock Giant conditionally when calling into the filesystem module, remove the VFS_LOCK_GIANT() and related macros. Stop handling buffers belonging to non-mpsafe filesystems. The VFS_VERSION is bumped to indicate the interface change which does not result in the interface signatures changes. Conducted and reviewed by: attilio Tested by: pho
|
#
c4e357e8 |
|
05-Sep-2012 |
Andrey Zonov <zont@FreeBSD.org> |
- Simplify VM code by using vmspace_wired_count() for counting wired memory of a process. Reviewed by: avg Approved by: kib (mentor) MFC after: 2 weeks
|
#
ee4116b8 |
|
13-Aug-2012 |
Konstantin Belousov <kib@FreeBSD.org> |
For old mmap syscall, when executing on amd64 or ia64, enforce the PROT_EXEC if prot is non-zero, process is 32bit and kern.elf32.i386_read_exec syscal is enabled. This workaround is needed for old i386 a.out binaries, where dynamic linker did not specified PROT_EXEC for mapping of the text. The kern.elf32.i386_read_exec MIB name looks weird for a.out binaries, but I reused the existing knob which already has the needed semantic. MFC after: 1 week
|
#
7707ccab |
|
14-Aug-2012 |
Konstantin Belousov <kib@FreeBSD.org> |
Adjust the r205536, by allowing a non-zero offset for anonymous mappings for a.out binaries. Apparently, a.out ld.so from FreeBSD 1.1.5.1 can issue such requests. Reported and tested by: Dan Plassche <dplassche@gmail.com> MFC after: 1 week
|
#
1472f4f4 |
|
21-Apr-2012 |
Konstantin Belousov <kib@FreeBSD.org> |
When MAP_STACK mapping is created, the map entry is created only to cover the initial stack size. For MCL_WIREFUTURE maps, the subsequent call to vm_map_wire() to wire the whole stack region fails due to VM_MAP_WIRE_NOHOLES flag. Use the VM_MAP_WIRE_HOLESOK to only wire mapped part of the stack. Reported and tested by: Sushanth Rai <sushanth_rai yahoo com> Reviewed by: alc MFC after: 1 week
|
#
1c8279e4 |
|
08-Apr-2012 |
Alan Cox <alc@FreeBSD.org> |
Fix mincore(2) so that it reports PG_CACHED pages as resident. MFC after: 2 weeks
|
#
126d6082 |
|
17-Mar-2012 |
Konstantin Belousov <kib@FreeBSD.org> |
In vm_object_page_clean(), do not clean OBJ_MIGHTBEDIRTY object flag if the filesystem performed short write and we are skipping the page due to this. Propogate write error from the pager back to the callers of vm_pageout_flush(). Report the failure to write a page from the requested range as the FALSE return value from vm_object_page_clean(), and propagate it back to msync(2) to return EIO to usermode. While there, convert the clearobjflags variable in the vm_object_page_clean() and arguments of the helper functions to boolean. PR: kern/165927 Reviewed by: alc MFC after: 2 weeks
|
#
83cbe16f |
|
02-Mar-2012 |
Alan Cox <alc@FreeBSD.org> |
Eliminate stale incorrect ARGSUSED comments. Submitted by: bde
|
#
f9230ad6 |
|
25-Feb-2012 |
Alan Cox <alc@FreeBSD.org> |
Simplify vm_mmap()'s control flow. Add a comment describing what vm_mmap_to_errno() does. Reviewed by: kib MFC after: 3 weeks X-MFC after: r232071
|
#
9d22083d |
|
24-Feb-2012 |
Konstantin Belousov <kib@FreeBSD.org> |
Place the if() at the right location, to activate the v_writecount accounting for shared writeable mappings for all filesystems, not only for the bypass layers. Submitted by: alc Pointy hat to: kib MFC after: 20 days
|
#
84110e7e |
|
23-Feb-2012 |
Konstantin Belousov <kib@FreeBSD.org> |
Account the writeable shared mappings backed by file in the vnode v_writecount. Keep the amount of the virtual address space used by the mappings in the new vm_object un_pager.vnp.writemappings counter. The vnode v_writecount is incremented when writemappings gets non-zero value, and decremented when writemappings is returned to zero. Writeable shared vnode-backed mappings are accounted for in vm_mmap(), and vm_map_insert() is instructed to set MAP_ENTRY_VN_WRITECNT flag on the created map entry. During deferred map entry deallocation, vm_map_process_deferred() checks for MAP_ENTRY_VN_WRITECOUNT and decrements writemappings for the vm object. Now, the writeable mount cannot be demoted to read-only while writeable shared mappings of the vnodes from the mount point exist. Also, execve(2) fails for such files with ETXTBUSY, as it should be. Noted by: tegge Reviewed by: tegge (long time ago, early version), alc Tested by: pho MFC after: 3 weeks
|
#
a6492969 |
|
15-Feb-2012 |
Alan Cox <alc@FreeBSD.org> |
When vm_mmap() is used to map a vm object into a kernel vm_map, it makes no sense to check the size of the kernel vm_map against the user-level resource limits for the calling process. Reviewed by: kib
|
#
8211bd45 |
|
11-Feb-2012 |
Konstantin Belousov <kib@FreeBSD.org> |
Close a race due to dropping of the map lock between creating map entry for a shared mapping and marking the entry for inheritance. Other thread might execute vmspace_fork() in between (e.g. by fork(2)), resulting in the mapping becoming private. Noted and reviewed by: alc MFC after: 1 week
|
#
8451d0dd |
|
16-Sep-2011 |
Kip Macy <kmacy@FreeBSD.org> |
In order to maximize the re-usability of kernel code in user space this patch modifies makesyscalls.sh to prefix all of the non-compatibility calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel entry points and all places in the code that use them. It also fixes an additional name space collision between the kernel function psignal and the libc function of the same name by renaming the kernel psignal kern_psignal(). By introducing this change now we will ease future MFCs that change syscalls. Reviewed by: rwatson Approved by: re (bz)
|
#
3407fefe |
|
06-Sep-2011 |
Konstantin Belousov <kib@FreeBSD.org> |
Split the vm_page flags PG_WRITEABLE and PG_REFERENCED into atomic flags field. Updates to the atomic flags are performed using the atomic ops on the containing word, do not require any vm lock to be held, and are non-blocking. The vm_page_aflag_set(9) and vm_page_aflag_clear(9) functions are provided to modify afalgs. Document the changes to flags field to only require the page lock. Introduce vm_page_reference(9) function to provide a stable KPI and KBI for filesystems like tmpfs and zfs which need to mark a page as referenced. Reviewed by: alc, attilio Tested by: marius, flo (sparc64); andreast (powerpc, powerpc64) Approved by: re (bz)
|
#
a9d2f8d8 |
|
10-Aug-2011 |
Robert Watson <rwatson@FreeBSD.org> |
Second-to-last commit implementing Capsicum capabilities in the FreeBSD kernel for FreeBSD 9.0: Add a new capability mask argument to fget(9) and friends, allowing system call code to declare what capabilities are required when an integer file descriptor is converted into an in-kernel struct file *. With options CAPABILITIES compiled into the kernel, this enforces capability protection; without, this change is effectively a no-op. Some cases require special handling, such as mmap(2), which must preserve information about the maximum rights at the time of mapping in the memory map so that they can later be enforced in mprotect(2) -- this is done by narrowing the rights in the existing max_protection field used for similar purposes with file permissions. In namei(9), we assert that the code is not reached from within capability mode, as we're not yet ready to enforce namespace capabilities there. This will follow in a later commit. Update two capability names: CAP_EVENT and CAP_KEVENT become CAP_POST_KEVENT and CAP_POLL_KEVENT to more accurately indicate what they represent. Approved by: re (bz) Submitted by: jonathan Sponsored by: Google Inc
|
#
2e32165c |
|
10-Jul-2011 |
Konstantin Belousov <kib@FreeBSD.org> |
Extract the code to translate VM error into errno, into an exported function vm_mmap_to_errno(). It is useful for the drivers that implement mmap(2)-like functionality, to be able to return error codes consistent with mmap(2). Sponsored by: The FreeBSD Foundation No objections from: alc MFC after: 1 week
|
#
3103730c |
|
10-Jul-2011 |
Konstantin Belousov <kib@FreeBSD.org> |
Style. MFC after: 3 days
|
#
afcc55f3 |
|
06-Jul-2011 |
Edward Tomasz Napierala <trasz@FreeBSD.org> |
All the racct_*() calls need to happen with the proc locked. Fixing this won't happen before 9.0. This commit adds "#ifdef RACCT" around all the "PROC_LOCK(p); racct_whatever(p, ...); PROC_UNLOCK(p)" instances, in order to avoid useless locking/unlocking in kernels built without "options RACCT".
|
#
1ba5ad42 |
|
05-Apr-2011 |
Edward Tomasz Napierala <trasz@FreeBSD.org> |
Add accounting for most of the memory-related resources. Sponsored by: The FreeBSD Foundation Reviewed by: kib (earlier version)
|
#
7ec9c8d1 |
|
24-Feb-2011 |
Sergey Kandaurov <pluknet@FreeBSD.org> |
Remove sysctl vm.max_proc_mmap used to protect from KVA space exhaustion. As it was pointed out by Alan Cox, that no longer serves its purpose with the modern UMA allocator compared to the old one used in 4.x days. The removal of sysctl eliminates max_proc_mmap type overflow leading to the broken mmap(2) seen with large amount of physical memory on arches with factually unbound KVA space (such as amd64). It was found that slightly less than 256GB of physmem was enough to trigger the overflow. Reviewed by: alc, kib Approved by: avg (mentor) MFC after: 2 months
|
#
a2f510e8 |
|
04-Dec-2010 |
Edward Tomasz Napierala <trasz@FreeBSD.org> |
Fix comment intentation.
|
#
7022f954 |
|
14-Nov-2010 |
Konstantin Belousov <kib@FreeBSD.org> |
Do not use __FreeBSD_version prefix for the special osrel version. The ports/Mk/bsd.port.mk uses sys/param.h to fetch osrel, and cannot grok several constants with the prefix. Reported and tested by: swell.k gmail com MFC after: 1 week
|
#
94bce453 |
|
14-Nov-2010 |
Konstantin Belousov <kib@FreeBSD.org> |
Use symbolic names instead of hardcoding values for magic p_osrel constants. MFC after: 1 week
|
#
a7d5f7eb |
|
19-Oct-2010 |
Jamie Gritton <jamie@FreeBSD.org> |
A new jail(8) with a configuration file, to replace the work currently done by /etc/rc.d/jail.
|
#
da048309 |
|
19-Sep-2010 |
Alan Cox <alc@FreeBSD.org> |
Allow a POSIX shared memory object that is opened for read but not for write to nonetheless be mapped PROT_WRITE and MAP_PRIVATE, i.e., copy-on-write. (This is a regression in the new implementation of POSIX shared memory objects that is used by HEAD and RELENG_8. This bug does not exist in RELENG_7's user-level, file-based implementation.) PR: 150260 MFC after: 3 weeks
|
#
d473d3a1 |
|
06-Sep-2010 |
Ryan Stone <rstone@FreeBSD.org> |
Fix a typo in r212281. uintptr -> uintptr_t Pointy hat to: rstone Approved by: emaste (mentor) MFC after: 2 weeks
|
#
0d419640 |
|
06-Sep-2010 |
Ryan Stone <rstone@FreeBSD.org> |
In munmap() downgrade the vm_map_lock to a read lock before taking a read lock on the pmc-sx lock. This prevents a deadlock with pmc_log_process_mappings, which has an exclusive lock on pmc-sx and tries to get a read lock on a vm_map. Downgrading the vm_map_lock in munmap allows pmc_log_process_mappings to continue, preventing the deadlock. Without this change I could cause a deadlock on a multicore 8.1-RELEASE system by having one thread constantly mmap'ing and then munmap'ing a PROT_EXEC mapping in a loop while I repeatedly invoked and stopped pmcstat in system-wide sampling mode. Reviewed by: fabient Approved by: emaste (mentor) MFC after: 2 weeks
|
#
74ffb9af |
|
28-Aug-2010 |
Alan Cox <alc@FreeBSD.org> |
Add the MAP_PREFAULT_READ option to mmap(2). Reviewed by: jhb, kib
|
#
3979450b |
|
06-Aug-2010 |
Konstantin Belousov <kib@FreeBSD.org> |
Add new make_dev_p(9) flag MAKEDEV_ETERNAL to inform devfs that created cdev will never be destroyed. Propagate the flag to devfs vnodes as VV_ETERNVALDEV. Use the flags to avoid acquiring devmtx and taking a thread reference on such nodes. In collaboration with: pho MFC after: 1 month
|
#
fd6f4ffb |
|
27-Jul-2010 |
Edward Tomasz Napierala <trasz@FreeBSD.org> |
Fix commented out resource limit check in mlockall(2). It's still racy, but at least less misleading.
|
#
c46b90e9 |
|
26-May-2010 |
Alan Cox <alc@FreeBSD.org> |
Push down page queues lock acquisition in pmap_enter_object() and pmap_is_referenced(). Eliminate the corresponding page queues lock acquisitions from vm_map_pmap_enter() and mincore(), respectively. In mincore(), this allows some additional cases to complete without ever acquiring the page queues lock. Assert that the page is managed in pmap_is_referenced(). On powerpc/aim, push down the page queues lock acquisition from moea*_is_modified() and moea*_is_referenced() into moea*_query_bit(). Again, this will allow some additional cases to complete without ever acquiring the page queues lock. Reorder a few statements in vm_page_dontneed() so that a race can't lead to an old reference persisting. This scenario is described in detail by a comment. Correct a spelling error in vm_page_dontneed(). Assert that the object is locked in vm_page_clear_dirty(), and restrict the page queues lock assertion to just those cases in which the page is currently writeable. Add object locking to vnode_pager_generic_putpages(). This was the one and only place where vm_page_clear_dirty() was being called without the object being locked. Eliminate an unnecessary vm_page_lock() around vnode_pager_setsize()'s call to vm_page_clear_dirty(). Change vnode_pager_generic_putpages() to the modern-style of function definition. Also, change the name of one of the parameters to follow virtual memory system naming conventions. Reviewed by: kib
|
#
567e51e1 |
|
24-May-2010 |
Alan Cox <alc@FreeBSD.org> |
Roughly half of a typical pmap_mincore() implementation is machine- independent code. Move this code into mincore(), and eliminate the page queues lock from pmap_mincore(). Push down the page queues lock into pmap_clear_modify(), pmap_clear_reference(), and pmap_is_modified(). Assert that these functions are never passed an unmanaged page. Eliminate an inaccurate comment from powerpc/powerpc/mmu_if.m: Contrary to what the comment says, pmap_mincore() is not simply an optimization. Without a complete pmap_mincore() implementation, mincore() cannot return either MINCORE_MODIFIED or MINCORE_REFERENCED because only the pmap can provide this information. Eliminate the page queues lock from vfs_setdirty_locked_object(), vm_pageout_clean(), vm_object_page_collect_flush(), and vm_object_page_clean(). Generally speaking, these are all accesses to the page's dirty field, which are synchronized by the containing vm object's lock. Reduce the scope of the page queues lock in vm_object_madvise() and vm_page_dontneed(). Reviewed by: kib (an earlier version)
|
#
2965a453 |
|
29-Apr-2010 |
Kip Macy <kmacy@FreeBSD.org> |
On Alan's advice, rather than do a wholesale conversion on a single architecture from page queue lock to a hashed array of page locks (based on a patch by Jeff Roberson), I've implemented page lock support in the MI code and have only moved vm_page's hold_count out from under page queue mutex to page lock. This changes pmap_extract_and_hold on all pmaps. Supported by: Bitgravity Inc. Discussed with: alc, jeffr, and kib
|
#
7b85f591 |
|
24-Apr-2010 |
Alan Cox <alc@FreeBSD.org> |
Resurrect pmap_is_referenced() and use it in mincore(). Essentially, pmap_ts_referenced() is not always appropriate for checking whether or not pages have been referenced because it clears any reference bits that it encounters. For example, in mincore(), clearing the reference bits has two negative consequences. First, it throws off the activity count calculations performed by the page daemon. Specifically, a page on which mincore() has called pmap_ts_referenced() looks less active to the page daemon than it should. Consequently, the page could be deactivated prematurely by the page daemon. Arguably, this problem could be fixed by having mincore() duplicate the activity count calculation on the page. However, there is a second problem for which that is not a solution. In order to clear a reference on a 4KB page, it may be necessary to demote a 2/4MB page mapping. Thus, a mincore() by one process can have the side effect of demoting a superpage mapping within another process!
|
#
f70ad548 |
|
14-Apr-2010 |
John Baldwin <jhb@FreeBSD.org> |
MFC 205536: Reject attempts to create a MAP_ANON mapping with a non-zero offset.
|
#
5711bf30 |
|
23-Mar-2010 |
John Baldwin <jhb@FreeBSD.org> |
Reject attempts to create a MAP_ANON mapping with a non-zero offset. PR: kern/71258 Submitted by: Alexander Best MFC after: 2 weeks
|
#
fc508327 |
|
02-Oct-2009 |
Bjoern A. Zeeb <bz@FreeBSD.org> |
Back out the functional parts from r197537. After r197711, affecting all user mappings, mmap no longer needs special treatment.
|
#
27bfa958 |
|
27-Sep-2009 |
Simon L. B. Nielsen <simon@FreeBSD.org> |
Do not allow mmap with the MAP_FIXED argument to map at address zero. This is done to make it harder to exploit kernel NULL pointer security vulnerabilities. While this of course does not fix vulnerabilities, it does mitigate their impact. Note that this may break some applications, most likely emulators or similar, which for one reason or another require mapping memory at zero. This restriction can be disabled with the security.bsd.mmap_zero sysctl variable. Discussed with: rwatson, bz Tested by: bz (Wine), simon (VirtualBox) Submitted by: jhb
|
#
2c5f9fbe |
|
23-Sep-2009 |
Konstantin Belousov <kib@FreeBSD.org> |
MFC r197348: For a.out and pre-8 ELF binaries, allow the mmap of zero length. Approved by: re (kensmith)
|
#
497a8238 |
|
19-Sep-2009 |
Konstantin Belousov <kib@FreeBSD.org> |
Old (a.out) rtld attempts to mmap zero-length region, e.g. when bss of the linked object is zero-length. More old code assumes that mmap of zero length returns success. For a.out and pre-8 ELF binaries, allow the mmap of zero length. Reported by: tegge Reviewed by: tegge, alc, jhb MFC after: 3 days
|
#
0fe0ed8b |
|
14-Jul-2009 |
John Baldwin <jhb@FreeBSD.org> |
- Change mmap() to fail requests with EINVAL that pass a length of 0. This behavior is mandated by POSIX. - Do not fail requests that pass a length greater than SSIZE_MAX (such as > 2GB on 32-bit platforms). The 'len' parameter is actually an unsigned 'size_t' so negative values don't really make sense. Submitted by: Alexander Best alexbestms at math.uni-muenster.de Reviewed by: alc Approved by: re (kib) MFC after: 1 week
|
#
3364c323 |
|
23-Jun-2009 |
Konstantin Belousov <kib@FreeBSD.org> |
Implement global and per-uid accounting of the anonymous memory. Add rlimit RLIMIT_SWAP that limits the amount of swap that may be reserved for the uid. The accounting information (charge) is associated with either map entry, or vm object backing the entry, assuming the object is the first one in the shadow chain and entry does not require COW. Charge is moved from entry to object on allocation of the object, e.g. during the mmap, assuming the object is allocated, or on the first page fault on the entry. It moves back to the entry on forks due to COW setup. The per-entry granularity of accounting makes the charge process fair for processes that change uid during lifetime, and decrements charge for proper uid when region is unmapped. The interface of vm_pager_allocate(9) is extended by adding struct ucred *, that is used to charge appropriate uid when allocation if performed by kernel, e.g. md(4). Several syscalls, among them is fork(2), may now return ENOMEM when global or per-uid limits are enforced. In collaboration with: pho Reviewed by: alc Approved by: re (kensmith)
|
#
bcf11e8d |
|
05-Jun-2009 |
Robert Watson <rwatson@FreeBSD.org> |
Move "options MAC" from opt_mac.h to opt_global.h, as it's now in GENERIC and used in a large number of files, but also because an increasing number of incorrect uses of MAC calls were sneaking in due to copy-and-paste of MAC-aware code without the associated opt_mac.h include. Discussed with: pjd
|
#
64345f0b |
|
01-Jun-2009 |
John Baldwin <jhb@FreeBSD.org> |
Add an extension to the character device interface that allows character device drivers to use arbitrary VM objects to satisfy individual mmap() requests. - A new d_mmap_single(cdev, &foff, objsize, &object, prot) callback is added to cdevsw. This function is called for each mmap() request. If it returns ENODEV, then the mmap() request will fall back to using the device's device pager object and d_mmap(). Otherwise, the method can return a VM object to satisfy this entire mmap() request via *object. It can also modify the starting offset into this object via *foff. This allows device drivers to use the file offset as a cookie to identify specific VM objects. - vm_mmap_vnode() has been changed to call vm_mmap_cdev() directly when mapping V_CHR vnodes. This avoids duplicating all the cdev mmap handling code and simplifies some of vm_mmap_vnode(). - D_VERSION has been bumped to D_VERSION_02. Older device drivers using D_VERSION_01 are still supported. MFC after: 1 month
|
#
beb3c3a9 |
|
04-Apr-2009 |
Alan Cox <alc@FreeBSD.org> |
Retire VM_PROT_READ_IS_EXEC. It was intended to be a micro-optimization, but I see no benefit from it today. VM_PROT_READ_IS_EXEC was only intended for use on processors that do not distinguish between read and execute permission. On an mmap(2) or mprotect(2), it automatically added execute permission if the caller specified permissions included read permission. The hope was that this would reduce the number of vm map entries needed to implement an address space because there would be fewer neighboring vm map entries that differed only in the presence or absence of VM_PROT_EXECUTE. (See vm/vm_mmap.c revision 1.56.) Today, I don't see any real applications that benefit from VM_PROT_READ_IS_EXEC. In any case, vm map entries are now organized as a self-adjusting binary search tree instead of an ordered list. So, the need for coalescing vm map entries is not as great as it once was.
|
#
655c3490 |
|
24-Feb-2009 |
Konstantin Belousov <kib@FreeBSD.org> |
Revert the addition of the freelist argument for the vm_map_delete() function, done in r188334. Instead, collect the entries that shall be freed, in the deferred_freelist member of the map. Automatically purge the deferred freelist when map is unlocked. Tested by: pho Reviewed by: alc
|
#
897d81a0 |
|
08-Feb-2009 |
Konstantin Belousov <kib@FreeBSD.org> |
Do not call vm_object_deallocate() from vm_map_delete(), because we hold the map lock there, and might need the vnode lock for OBJT_VNODE objects. Postpone object deallocation until caller of vm_map_delete() drops the map lock. Link the map entries to be freed into the freelist, that is released by the new helper function vm_map_entry_free_freelist(). Reviewed by: tegge, alc Tested by: pho
|
#
fa3de770 |
|
21-Jan-2009 |
John Baldwin <jhb@FreeBSD.org> |
Now that vfs_markatime() no longer requires an exclusive lock due to the VOP_MARKATIME() changes, use a shared vnode lock for mmap(). Submitted by: ups
|
#
556c3162 |
|
22-Oct-2008 |
Robert Watson <rwatson@FreeBSD.org> |
Update mmap() comment: no more block devices, so no more block device cache coherency questions. MFC after: 3 days
|
#
d7f03759 |
|
19-Oct-2008 |
Ulf Lilleengen <lulf@FreeBSD.org> |
- Import the HEAD csup code which is the basis for the cvsmode work.
|
#
36b90789 |
|
20-Sep-2008 |
Konstantin Belousov <kib@FreeBSD.org> |
Allow the d_mmap driver methods to use cdevpriv KPI during verification phase of establishing mapping. Discussed with: rwatson, jhb, rnoland Tested by: rnoland MFC after: 3 days
|
#
0359a12e |
|
28-Aug-2008 |
Attilio Rao <attilio@FreeBSD.org> |
Decontextualize the couplet VOP_GETATTR / VOP_SETATTR as the passed thread was always curthread and totally unuseful. Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>
|
#
6bd9cb1c |
|
03-Aug-2008 |
Tom Rhodes <trhodes@FreeBSD.org> |
Fill in a few sysctl descriptions. Reviewed by: alc, Matt Dillon <dillon@apollo.backplane.com> Approved by: alc
|
#
ba304211 |
|
24-May-2008 |
Alan Cox <alc@FreeBSD.org> |
To date, our implementation of munmap(2) has required that the entirety of the specified range be mapped. Specifically, it has returned EINVAL if the entire range is not mapped. There is not, however, any basis for this in either SuSv2 or our own man page. Moreover, neither Linux nor Solaris impose this requirement. This revision removes this requirement. Submitted by: Tijl Coosemans PR: 118510 MFC after: 6 weeks
|
#
d0a83a83 |
|
17-May-2008 |
Alan Cox <alc@FreeBSD.org> |
In order to map device memory using superpages, mmap(2) must find a superpage-aligned virtual address for the mapping. Revision 1.65 implemented an overly simplistic and generally ineffectual method for finding a superpage-aligned virtual address. Specifically, it rounds the virtual address corresponding to the end of the data segment up to the next superpage-aligned virtual address. If this virtual address is unallocated, then the device will be mapped using superpages. Unfortunately, in modern times, where applications like the X server dynamically load much of their code, this virtual address is already allocated. In such cases, mmap(2) simply uses the first available virtual address, which is not necessarily superpage aligned. This revision changes mmap(2) to use a more robust method, specifically, the VMFS_ALIGNED_SPACE option that is now implemented by vm_map_find().
|
#
b8ca4ef2 |
|
27-Apr-2008 |
Alan Cox <alc@FreeBSD.org> |
vm_map_fixed(), unlike vm_map_find(), does not update "addr", so it can be passed by value.
|
#
91a35e78 |
|
20-Mar-2008 |
Konstantin Belousov <kib@FreeBSD.org> |
Do not dereference cdev->si_cdevsw, use the dev_refthread() to properly obtain the reference. In particular, this fixes the panic reported in the PR. Remove the comments stating that this needs to be done. PR: kern/119422 MFC after: 1 week
|
#
237fdd78 |
|
16-Mar-2008 |
Robert Watson <rwatson@FreeBSD.org> |
In keeping with style(9)'s recommendations on macros, use a ';' after each SYSINIT() macro invocation. This makes a number of lightweight C parsers much happier with the FreeBSD kernel source, including cflow's prcc and lxr. MFC after: 1 month Discussed with: imp, rink
|
#
8e38aeff |
|
08-Jan-2008 |
John Baldwin <jhb@FreeBSD.org> |
Add a new file descriptor type for IPC shared memory objects and use it to implement shm_open(2) and shm_unlink(2) in the kernel: - Each shared memory file descriptor is associated with a swap-backed vm object which provides the backing store. Each descriptor starts off with a size of zero, but the size can be altered via ftruncate(2). The shared memory file descriptors also support fstat(2). read(2), write(2), ioctl(2), select(2), poll(2), and kevent(2) are not supported on shared memory file descriptors. - shm_open(2) and shm_unlink(2) are now implemented as system calls that manage shared memory file descriptors. The virtual namespace that maps pathnames to shared memory file descriptors is implemented as a hash table where the hash key is generated via the 32-bit Fowler/Noll/Vo hash of the pathname. - As an extension, the constant 'SHM_ANON' may be specified in place of the path argument to shm_open(2). In this case, an unnamed shared memory file descriptor will be created similar to the IPC_PRIVATE key for shmget(2). Note that the shared memory object can still be shared among processes by sharing the file descriptor via fork(2) or sendmsg(2), but it is unnamed. This effectively serves to implement the getmemfd() idea bandied about the lists several times over the years. - The backing store for shared memory file descriptors are garbage collected when they are not referenced by any open file descriptors or the shm_open(2) virtual namespace. Submitted by: dillon, peter (previous versions) Submitted by: rwatson (I based this on his version) Reviewed by: alc (suggested converting getmemfd() to shm_open())
|
#
30d239bc |
|
24-Oct-2007 |
Robert Watson <rwatson@FreeBSD.org> |
Merge first in a series of TrustedBSD MAC Framework KPI changes from Mac OS X Leopard--rationalize naming for entry points to the following general forms: mac_<object>_<method/action> mac_<object>_check_<method/action> The previous naming scheme was inconsistent and mostly reversed from the new scheme. Also, make object types more consistent and remove spaces from object types that contain multiple parts ("posix_sem" -> "posixsem") to make mechanical parsing easier. Introduce a new "netinet" object type for certain IPv4/IPv6-related methods. Also simplify, slightly, some entry point names. All MAC policy modules will need to be recompiled, and modules not updates as part of this commit will need to be modified to conform to the new KPI. Sponsored by: SPARTA (original patches against Mac OS X) Obtained from: TrustedBSD Project, Apple Computer
|
#
c899450b |
|
18-Oct-2007 |
Peter Wemm <peter@FreeBSD.org> |
Fix cosmetic bug in stale copy of msync_args. 'len' is size_t, not int.
|
#
d239bd3c |
|
19-Aug-2007 |
Konstantin Belousov <kib@FreeBSD.org> |
Do not drop vm_map lock between doing vm_map_remove() and vm_map_insert(). For this, introduce vm_map_fixed() that does that for MAP_FIXED case. Dropping the lock allowed for parallel thread to occupy the freed space. Reported by: Tijl Coosemans <tijl ulyssis org> Reviewed by: alc Approved by: re (kensmith) MFC after: 2 weeks
|
#
c2815ad5 |
|
04-Jul-2007 |
Peter Wemm <peter@FreeBSD.org> |
Add freebsd6_ wrappers for mmap/lseek/pread/pwrite/truncate/ftruncate Approved by: re (kensmith)
|
#
6bda842d |
|
16-Jun-2007 |
Matt Jacob <mjacob@FreeBSD.org> |
Make sure object is NULL- there is a possible case where you could fall through to it being used w/o being set. Put a break in the default case.
|
#
2feb50bf |
|
31-May-2007 |
Attilio Rao <attilio@FreeBSD.org> |
Revert VMCNT_* operations introduction. Probabilly, a general approach is not the better solution here, so we should solve the sched_lock protection problems separately. Requested by: alc Approved by: jeff (mentor)
|
#
222d0195 |
|
18-May-2007 |
Jeff Roberson <jeff@FreeBSD.org> |
- define and use VMCNT_{GET,SET,ADD,SUB,PTR} macros for manipulating vmcnts. This can be used to abstract away pcpu details but also changes to use atomics for all counters now. This means sched lock is no longer responsible for protecting counts in the switch routines. Contributed by: Attilio Rao <attilio@FreeBSD.org>
|
#
acd3428b |
|
06-Nov-2006 |
Robert Watson <rwatson@FreeBSD.org> |
Sweep kernel replacing suser(9) calls with priv(9) calls, assigning specific privilege names to a broad range of privileges. These may require some future tweaking. Sponsored by: nCircle Network Security, Inc. Obtained from: TrustedBSD Project Discussed on: arch@ Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri, Alex Lyashkov <umka at sevcity dot net>, Skip Ford <skip dot ford at verizon dot net>, Antoine Brodin <antoine dot brodin at laposte dot net>
|
#
aed55708 |
|
22-Oct-2006 |
Robert Watson <rwatson@FreeBSD.org> |
Complete break-out of sys/sys/mac.h into sys/security/mac/mac_framework.h begun with a repo-copy of mac.h to mac_framework.h. sys/mac.h now contains the userspace and user<->kernel API and definitions, with all in-kernel interfaces moved to mac_framework.h, which is now included across most of the kernel instead. This change is the first step in a larger cleanup and sweep of MAC Framework interfaces in the kernel, and will not be MFC'd. Obtained from: TrustedBSD Project Sponsored by: SPARTA
|
#
455dd7d4 |
|
20-Jun-2006 |
Konstantin Belousov <kib@FreeBSD.org> |
Make the mincore(2) return ENOMEM when requested range is not fully mapped. Requested by: Bruno Haible <bruno at clisp org> Reviewed by: alc Approved by: pjd (mentor) MFC after: 1 month
|
#
89eae00b |
|
21-Apr-2006 |
Tom Rhodes <trhodes@FreeBSD.org> |
It seems that POSIX would rather ENODEV returned in place of EINVAL when trying to mmap() an fd that isn't a normal file. Reference: http://www.opengroup.org/onlinepubs/009695399/functions/mmap.html Submitted by: fanf
|
#
49874f6e |
|
25-Mar-2006 |
Joseph Koshy <jkoshy@FreeBSD.org> |
MFP4: Support for profiling dynamically loaded objects. Kernel changes: Inform hwpmc of executable objects brought into the system by kldload() and mmap(), and of their removal by kldunload() and munmap(). A helper function linker_hwpmc_list_objects() has been added to "sys/kern/kern_linker.c" and is used by hwpmc to retrieve the list of currently loaded kernel modules. The unused `MAPPINGCHANGE' event has been deprecated in favour of separate `MAP_IN' and `MAP_OUT' events; this change reduces space wastage in the log. Bump the hwpmc's ABI version to "2.0.00". Teach hwpmc(4) to handle the map change callbacks. Change the default per-cpu sample buffer size to hold 32 samples (up from 16). Increment __FreeBSD_version. libpmc(3) changes: Update libpmc(3) to deal with the new events in the log file; bring the pmclog(3) manual page in sync with the code. pmcstat(8) changes: Introduce new options to pmcstat(8): "-r" (root fs path), "-M" (mapfile name), "-q"/"-v" (verbosity control). Option "-k" now takes a kernel directory as its argument but will also work with the older invocation syntax. Rework string handling in pmcstat(8) to use an opaque type for interned strings. Clean up ELF parsing code and add support for tracking dynamic object mappings reported by a v2.0.00 hwpmc(4). Report statistics at the end of a log conversion run depending on the requested verbosity level. Reviewed by: jhb, dds (kernel parts of an earlier patch) Tested by: gallatin (earlier patch)
|
#
9f5c1d19 |
|
12-Oct-2005 |
Diomidis Spinellis <dds@FreeBSD.org> |
Move execve's access time update functionality into a new vfs_mark_atime() function, and use the new function for performing efficient atime updates in mmap(). Reviewed by: bde MFC after: 2 weeks
|
#
1e309003 |
|
04-Oct-2005 |
Diomidis Spinellis <dds@FreeBSD.org> |
Update the vnode's access time after an mmap operation on it. Before this change a copy operation with cp(1) would not update the file access times. According to the POSIX mmap(2) documentation: the st_atime field of the mapped file may be marked for update at any time between the mmap() call and the corresponding munmap() call. The initial read or write reference to a mapped region shall cause the file's st_atime field to be marked for update if it has not already been marked for update.
|
#
749474f2 |
|
20-Sep-2005 |
Peter Wemm <peter@FreeBSD.org> |
Remove unused (but initialized) variable 'objsize' from vm_mmap()
|
#
c92163dc |
|
14-Apr-2005 |
Christian S.J. Peron <csjp@FreeBSD.org> |
Move MAC check_vnode_mmap entry point out from being exclusive to MAP_SHARED so that the entry point gets executed un-conditionally. This may be useful for security policies which want to perform access control checks around run-time linking. -add the mmap(2) flags argument to the check_vnode_mmap entry point so that we can make access control decisions based on the type of mapped object. -update any dependent API around this parameter addition such as function prototype modifications, entry point parameter additions and the inclusion of sys/mman.h header file. -Change the MLS, BIBA and LOMAC security policies so that subject domination routines are not executed unless the type of mapping is shared. This is done to maintain compatibility between the old vm_mmap_vnode(9) and these policies. Reviewed by: rwatson MFC after: 1 month
|
#
98df9218 |
|
01-Apr-2005 |
John Baldwin <jhb@FreeBSD.org> |
- Change the vm_mmap() function to accept an objtype_t parameter specifying the type of object represented by the handle argument. - Allow vm_mmap() to map device memory via cdev objects in addition to vnodes and anonymous memory. Note that mmaping a cdev directly does not currently perform any MAC checks like mapping a vnode does. - Unbreak the DRM getbufs ioctl by having it call vm_mmap() directly on the cdev the ioctl is acting on rather than trying to find a suitable vnode to map from. Reviewed by: alc, arch@
|
#
8516dd18 |
|
24-Jan-2005 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Don't use VOP_GETVOBJECT, use vp->v_object directly.
|
#
ae51ff11 |
|
24-Jan-2005 |
Jeff Roberson <jeff@FreeBSD.org> |
- Remove GIANT_REQUIRED where giant is no longer required. - Use VFS_LOCK_GIANT() rather than directly acquiring giant in places where giant is only held because vfs requires it. Sponsored By: Isilon Systems, Inc.
|
#
60727d8b |
|
06-Jan-2005 |
Warner Losh <imp@FreeBSD.org> |
/* -> /*- for license, minor formatting changes
|
#
ff4782b5 |
|
25-Oct-2004 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Don't clear flags we just checked were not set.
|
#
891822a8 |
|
24-Sep-2004 |
Poul-Henning Kamp <phk@FreeBSD.org> |
XXX mark two places where we do not hold a threadcount on the dev when frobbing the cdevsw. In both cases we examine only the cdevsw and it is a good question if we weren't better off copying those properties into the cdev in the first place. This question will be revisited.
|
#
c2296e99 |
|
01-Sep-2004 |
Alan Cox <alc@FreeBSD.org> |
Remove dead code.
|
#
23fc1a90 |
|
05-Aug-2004 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Remove a product specific workaround for wrong modes when mmap(2)'ing devices. They have had plenty of time to adjust now.
|
#
21c12545 |
|
01-Aug-2004 |
Alan Cox <alc@FreeBSD.org> |
Eliminate the acquisition and release of Giant around the call to pmap_mincore() in mincore(2). Either pmap locking exists (alpha, amd64, i386, ia64) or pmap_mincore() is unimplemented (arm, powerpc, sparc64).
|
#
1930e303 |
|
11-Jun-2004 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Deorbit COMPAT_SUNOS. We inherited this from the sparc32 port of BSD4.4-Lite1. We have neither a sparc32 port nor a SunOS4.x compatibility desire these days.
|
#
8eec77b0 |
|
11-May-2004 |
Tim J. Robbins <tjr@FreeBSD.org> |
To handle orphaned character device vnodes properly in mmap(), check that v_mount is non-null before dereferencing it. If it's null, behave as if MNT_NOEXEC was not set on the mount that originally containined it.
|
#
05eb3785 |
|
06-Apr-2004 |
Warner Losh <imp@FreeBSD.org> |
Remove advertising clause from University of California Regent's license, per letter dated July 22, 1999. Approved by: core
|
#
ce7a036d |
|
04-Apr-2004 |
Alexander Kabaev <kan@FreeBSD.org> |
Delay permission checks for VCHR vnodes until after vnode is locked in vm_mmap_vnode function, where we can safely check for a special /dev/zero case. Rev. 1.180 has reordered checks and introduced a regression. Submitted by: alc Was broken by: kan
|
#
b483c7f6 |
|
18-Mar-2004 |
Guido van Rooij <guido@FreeBSD.org> |
When mmap-ing a file from a noexec mount, be sure not to grant the right to mmap it PROT_EXEC. This also depends on the architecture, as some architextures (e.g. i386) do not distinguish between read and exec pages Inspired by: http://linux.bkbits.net:8080/linux-2.4/cset@1.1267.1.85 Reviewed by: alc
|
#
bb734798 |
|
15-Mar-2004 |
Don Lewis <truckman@FreeBSD.org> |
Make overflow/wraparound checking more robust and unbreak len=0 in vslock(), mlock(), and munlock(). Reviewed by: bde
|
#
f0ea4612 |
|
14-Mar-2004 |
Don Lewis <truckman@FreeBSD.org> |
Style(9) changes. Pointed out by: bde
|
#
be4c5ad0 |
|
14-Mar-2004 |
Don Lewis <truckman@FreeBSD.org> |
Remove redundant suser() check.
|
#
16929939 |
|
05-Mar-2004 |
Don Lewis <truckman@FreeBSD.org> |
Undo the merger of mlock()/vslock and munlock()/vsunlock() and the introduction of kern_mlock() and kern_munlock() in src/sys/kern/kern_sysctl.c 1.150 src/sys/vm/vm_extern.h 1.69 src/sys/vm/vm_glue.c 1.190 src/sys/vm/vm_mmap.c 1.179 because different resource limits are appropriate for transient and "permanent" page wiring requests. Retain the kern_mlock() and kern_munlock() API in the revived vslock() and vsunlock() functions. Combine the best parts of each of the original sets of implementations with further code cleanup. Make the mclock() and vslock() implementations as similar as possible. Retain the RLIMIT_MEMLOCK check in mlock(). Move the most strigent test, which can return EAGAIN, last so that requests that have no hope of ever being satisfied will not be retried unnecessarily. Disable the test that can return EAGAIN in the vslock() implementation because it will cause the sysctl code to wedge. Tested by: Cy Schubert <Cy.Schubert AT komquats.com>
|
#
30d4dd7e |
|
29-Feb-2004 |
Alexander Kabaev <kan@FreeBSD.org> |
Pich up a do {} while(0) cleanup by phk that was discarded accidentally in previous revision. Submitted by: alc
|
#
c8daea13 |
|
27-Feb-2004 |
Alexander Kabaev <kan@FreeBSD.org> |
Move the code dealing with vnode out of several functions into a single helper function vm_mmap_vnode. Discussed with: jeffr,alc (a while ago)
|
#
47934cef |
|
25-Feb-2004 |
Don Lewis <truckman@FreeBSD.org> |
Split the mlock() kernel code into two parts, mlock(), which unpacks the syscall arguments and does the suser() permission check, and kern_mlock(), which does the resource limit checking and calls vm_map_wire(). Split munlock() in a similar way. Enable the RLIMIT_MEMLOCK checking code in kern_mlock(). Replace calls to vslock() and vsunlock() in the sysctl code with calls to kern_mlock() and kern_munlock() so that the sysctl code will obey the wired memory limits. Nuke the vslock() and vsunlock() implementations, which are no longer used. Add a member to struct sysctl_req to track the amount of memory that is wired to handle the request. Modify sysctl_wire_old_buffer() to return an error if its call to kern_mlock() fails. Only wire the minimum of the length specified in the sysctl request and the length specified in its argument list. It is recommended that sysctl handlers that use sysctl_wire_old_buffer() should specify reasonable estimates for the amount of data they want to return so that only the minimum amount of memory is wired no matter what length has been specified by the request. Modify the callers of sysctl_wire_old_buffer() to look for the error return. Modify sysctl_old_user to obey the wired buffer length and clean up its implementation. Reviewed by: bms
|
#
91d5354a |
|
04-Feb-2004 |
John Baldwin <jhb@FreeBSD.org> |
Locking for the per-process resource limits structure. - struct plimit includes a mutex to protect a reference count. The plimit structure is treated similarly to struct ucred in that is is always copy on write, so having a reference to a structure is sufficient to read from it without needing a further lock. - The proc lock protects the p_limit pointer and must be held while reading limits from a process to keep the limit structure from changing out from under you while reading from it. - Various global limits that are ints are not protected by a lock since int writes are atomic on all the archs we support and thus a lock wouldn't buy us anything. - All accesses to individual resource limits from a process are abstracted behind a simple lim_rlimit(), lim_max(), and lim_cur() API that return either an rlimit, or the current or max individual limit of the specified resource from a process. - dosetrlimit() was renamed to kern_setrlimit() to match existing style of other similar syscall helper functions. - The alpha OSF/1 compat layer no longer calls getrlimit() and setrlimit() (it didn't used the stackgap when it should have) but uses lim_rlimit() and kern_setrlimit() instead. - The svr4 compat no longer uses the stackgap for resource limits calls, but uses lim_rlimit() and kern_setrlimit() instead. - The ibcs2 compat no longer uses the stackgap for resource limits. It also no longer uses the stackgap for accessing sysctl's for the ibcs2_sysconf() syscall but uses kernel_sysctl() instead. As a result, ibcs2_sysconf() no longer needs Giant. - The p_rlimit macro no longer exists. Submitted by: mtm (mostly, I only did a few cleanups and catchups) Tested on: i386 Compiled on: alpha, amd64
|
#
cafe836a |
|
20-Dec-2003 |
Alan Cox <alc@FreeBSD.org> |
- Correct an error in mincore(2) that has existed since its introduction: mincore(2) should check that the page is valid, not just allocated. Otherwise, it can return a false positive for a page that is not yet resident because it is being read from disk.
|
#
5e6dbda0 |
|
07-Dec-2003 |
Alexander Kabaev <kan@FreeBSD.org> |
Remove trailing whitespace.
|
#
c8123cb8 |
|
07-Dec-2003 |
Alan Cox <alc@FreeBSD.org> |
Addendum to revision 1.174: In the case where vm_pager_allocate() is called to create a vnode-backed object, the vnode lock must be held by the caller. Reported by: truckman Discussed with: kan
|
#
20eec4bb |
|
05-Dec-2003 |
Alan Cox <alc@FreeBSD.org> |
Fix a deadlock between vm_fault() and vm_mmap(): The expected lock ordering between vm_map and vnode locks is that vm_map locks are acquired first. In revision 1.150 mmap(2) was changed to pass a locked vnode into vm_mmap(). This creates a lock-order reversal when vm_mmap() calls one of the vm_map routines that acquires a vm_map lock. The solution implemented herein is to release the vnode lock in mmap() before calling vm_mmap() and reacquire this lock if necessary in vm_mmap(). Approved by: re (scottl) Reviewed by: jeff, kan, rwatson
|
#
6f8b4fc0 |
|
14-Nov-2003 |
Alan Cox <alc@FreeBSD.org> |
- Remove long dead code.
|
#
b7b7cd44 |
|
13-Nov-2003 |
Alan Cox <alc@FreeBSD.org> |
Changes to msync(2) - Return EBUSY if the region was wired by mlock(2) and MS_INVALIDATE is specified to msync(2). This is required by the Open Group Base Specifications Issue 6. - vm_map_sync() doesn't return KERN_FAILURE. Thus, msync(2) can't possibly return EIO. - The second major loop in vm_map_sync() handles sub maps. Thus, failing on sub maps in the first major loop isn't necessary.
|
#
d8834602 |
|
09-Nov-2003 |
Alan Cox <alc@FreeBSD.org> |
- The Open Group Base Specifications Issue 6 specifies that an munmap(2) must return EINVAL if size is zero. Submitted by: tegge - In order to avoid a race condition in multithreaded applications, the check and removal operations by munmap(2) must be in the same critical section. To accomodate this, vm_map_check_protection() is modified to require its caller to obtain at least a read lock on the map.
|
#
637315ed |
|
09-Nov-2003 |
Alan Cox <alc@FreeBSD.org> |
- Remove Giant from msync(2). Giant is still acquired by the lower layers if we drop into the pmap or vnode layers. - Migrate the handling of zero-length msync(2)s into vm_map_sync() so that multithread applications can't change the map between implementing the zero-length hack in msync(2) and reacquiring the map lock in vm_map_sync(). Reviewed by: tegge
|
#
950f8459 |
|
08-Nov-2003 |
Alan Cox <alc@FreeBSD.org> |
- Rename vm_map_clean() to vm_map_sync(). This better reflects the fact that msync(2) is its only caller. - Migrate the parts of the old vm_map_clean() that examined the internals of a vm object to a new function vm_object_sync() that is implemented in vm_object.c. At the same, introduce the necessary vm object locking so that vm_map_sync() and vm_object_sync() can be called without Giant. Reviewed by: tegge
|
#
11f7ddc5 |
|
05-Oct-2003 |
Bruce M Simpson <bms@FreeBSD.org> |
Only the super-user should be able to wire pages via the mlock() family of system calls at this time. Remove various #ifdef's to enforce this.
|
#
fd75d710 |
|
27-Sep-2003 |
Marcel Moolenaar <marcel@FreeBSD.org> |
Part 2 of implementing rstacks: add the ability to create rstacks and use the ability on ia64 to map the register stack. The orientation of the stack (i.e. its grow direction) is passed to vm_map_stack() in the overloaded cow argument. Since the grow direction is represented by bits, it is possible and allowed to create bi-directional stacks. This is not an advertised feature, more of a side-effect. Fix a bug in vm_map_growstack() that's specific to rstacks and which we could only find by having the ability to create rstacks: when the mapped stack ends at the faulting address, we have not actually mapped the faulting address. we need to include or cover the faulting address. Note that at this time mmap(2) has not been extended to allow the creation of rstacks by processes. If such a need arises, this can be done. Tested on: alpha, i386, ia64, sparc64
|
#
c460ac3a |
|
24-Sep-2003 |
Peter Wemm <peter@FreeBSD.org> |
Add sysentvec->sv_fixlimits() hook so that we can catch cases on 64 bit systems where the data/stack/etc limits are too big for a 32 bit process. Move the 5 or so identical instances of ELF_RTLD_ADDR() into imgact_elf.c. Supply an ia32_fixlimits function. Export the clip/default values to sysctl under the compat.ia32 heirarchy. Have mmap(0, ...) respect the current p->p_limits[RLIMIT_DATA].rlim_max value rather than the sysctl tweakable variable. This allows mmap to place mappings at sensible locations when limits have been reduced. Have the imgact_elf.c ld-elf.so.1 placement algorithm use the same method as mmap(0, ...) now does. Note that we cannot remove all references to the sysctl tweakable maxdsiz etc variables because /etc/login.conf specifies a datasize of 'unlimited'. And that causes exec etc to fail since it can no longer find space to mmap things.
|
#
7ebcee37 |
|
07-Sep-2003 |
Alan Cox <alc@FreeBSD.org> |
Revise the locking in mincore(2).
|
#
abd498aa |
|
11-Aug-2003 |
Bruce M Simpson <bms@FreeBSD.org> |
Add the mlockall() and munlockall() system calls. - All those diffs to syscalls.master for each architecture *are* necessary. This needed clarification; the stub code generation for mlockall() was disabled, which would prevent applications from linking to this API (suggested by mux) - Giant has been quoshed. It is no longer held by the code, as the required locking has been pushed down within vm_map.c. - Callers must specify VM_MAP_WIRE_HOLESOK or VM_MAP_WIRE_NOHOLES to express their intention explicitly. - Inspected at the vmstat, top and vm pager sysctl stats level. Paging-in activity is occurring correctly, using a test harness. - The RES size for a process may appear to be greater than its SIZE. This is believed to be due to mappings of the same shared library page being wired twice. Further exploration is needed. - Believed to back out of allocations and locks correctly (tested with WITNESS, MUTEX_PROFILING, INVARIANTS and DIAGNOSTIC). PR: kern/43426, standards/54223 Reviewed by: jake, alc Approved by: jake (mentor) MFC after: 2 weeks
|
#
a5d841d4 |
|
03-Jul-2003 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Remove unnecessary cast.
|
#
3b6d9652 |
|
22-Jun-2003 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Add a f_vnode field to struct file. Several of the subtypes have an associated vnode which is used for stuff like the f*() functions. By giving the vnode a speparate field, a number of checks for the specific subtype can be replaced simply with a check for f_vnode != NULL, and we can later free f_data up to subtype specific use. At this point in time, f_data still points to the vnode, so any code I might have overlooked will still work.
|
#
a6af4ff1 |
|
21-Jun-2003 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Use a do {...} while (0); and a couple of breaks to reduce the level of indentation a bit.
|
#
874651b1 |
|
11-Jun-2003 |
David E. O'Brien <obrien@FreeBSD.org> |
Use __FBSDID().
|
#
bc5b057f |
|
09-Jun-2003 |
Alan Cox <alc@FreeBSD.org> |
Hold the vm object's lock when performing vm_page_lookup().
|
#
69297bf8 |
|
17-Apr-2003 |
John Baldwin <jhb@FreeBSD.org> |
suser() does not need the proc lock, just the setting of P_PROTECTED in p_flag needs the lock.
|
#
f4cf2141 |
|
31-Mar-2003 |
Wes Peters <wes@FreeBSD.org> |
Add a facility allowing processes to inform the VM subsystem they are critical and should not be killed when pageout is looking for more memory pages in all the wrong places. Reviewed by: arch@ Sponsored by: St. Bernard Software
|
#
6900a17c |
|
29-Mar-2003 |
Maxime Henrion <mux@FreeBSD.org> |
The object type can't be OBJT_PHYS in vm_mmap(). Reviewed by: peter
|
#
48e3128b |
|
12-Jan-2003 |
Matthew Dillon <dillon@FreeBSD.org> |
Bow to the whining masses and change a union back into void *. Retain removal of unnecessary casts and throw in some minor cleanups to see if anyone complains, just for the hell of it.
|
#
cd72f218 |
|
11-Jan-2003 |
Matthew Dillon <dillon@FreeBSD.org> |
Change struct file f_data to un_data, a union of the correct struct pointer types, and remove a huge number of casts from code using it. Change struct xfile xf_data to xun_data (ABI is still compatible). If we need to add a #define for f_data and xf_data we can, but I don't think it will be necessary. There are no operational changes in this commit.
|
#
e80b7b69 |
|
28-Nov-2002 |
Alan Cox <alc@FreeBSD.org> |
Lock page field accesses in mincore(). Approved by: re (blanket)
|
#
3e732e7d |
|
22-Oct-2002 |
Robert Watson <rwatson@FreeBSD.org> |
Invoke mac_check_vnode_mmap() during mmap operations on vnodes, permitting policies to restrict access to memory mapping based on the credential requesting the mapping, the target vnode, the requested rights, or other policy considerations. Approved by: re Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories
|
#
05ba50f5 |
|
21-Sep-2002 |
Jake Burkholder <jake@FreeBSD.org> |
Use the fields in the sysentvec and in the vm map header in place of the constants VM_MIN_ADDRESS, VM_MAXUSER_ADDRESS, USRSTACK and PS_STRINGS. This is mainly so that they can be variable even for the native abi, based on different machine types. Get stack protections from the sysentvec too. This makes it trivial to map the stack non-executable for certain abis, on machines that support it.
|
#
f6b5b182 |
|
06-Jul-2002 |
Jeff Roberson <jeff@FreeBSD.org> |
- Hold a lock on the vnode acquired from the file table across the call to vm_mmap() as well as the GETATTR etc. - If the handle is a vnode in vm_mmap() assert that it is locked. - Wiggle Giant around a little to account for the extra vnode operation.
|
#
070f64fe |
|
25-Jun-2002 |
Matthew Dillon <dillon@FreeBSD.org> |
Part I of RLIMIT_VMEM implementation. Implement core functionality for a new resource limit that covers a process's entire VM space, including mmap()'d space. (Part II will be additional code to check RLIMIT_VMEM during exec() but it needs more fleshing out). PR: kern/18209 Submitted by: Andrey Alekseyev <uitm@zenon.net>, Dmitry Kim <jason@nichego.net> MFC after: 7 days
|
#
2cd301d1 |
|
22-Jun-2002 |
Alan Cox <alc@FreeBSD.org> |
o Remove the unnecessary acquisition and release of Giant around fdrop() in mmap(2).
|
#
c04c996b |
|
22-Jun-2002 |
Alan Cox <alc@FreeBSD.org> |
o Reduce the scope of Giant in vm_mmap() to just the code that manipulates a vnode. (Thus, MAP_ANON and MAP_STACK never acquire Giant.)
|
#
042bb299 |
|
16-Jun-2002 |
Alan Cox <alc@FreeBSD.org> |
o Remove GIANT_REQUIRED from vm_fault_user_wire(). o Move pmap_pageable() outside of Giant in vm_fault_unwire(). (pmap_pageable() is a no-op on all supported architectures.) o Remove the acquisition and release of Giant from mlock().
|
#
e30616db |
|
14-Jun-2002 |
Alan Cox <alc@FreeBSD.org> |
o Remove the acquisition and release of Giant from munlock(). Reviewed by: tegge
|
#
1d7cf06c |
|
14-Jun-2002 |
Alan Cox <alc@FreeBSD.org> |
o Use vm_map_wire() and vm_map_unwire() in place of vm_map_pageable() and vm_map_user_pageable(). o Remove vm_map_pageable() and vm_map_user_pageable(). o Remove vm_map_clear_recursive() and vm_map_set_recursive(). (They were only used by vm_map_pageable() and vm_map_user_pageable().) Reviewed by: tegge
|
#
fa721254 |
|
06-Jun-2002 |
Alfred Perlstein <alfred@FreeBSD.org> |
fix typo in _SYS_SYSPROTO_H_ case: s/mlockall_args/munlockall_args Submitted by: Mark Santcroos <marks@ripe.net>
|
#
99b9331a |
|
30-May-2002 |
Alfred Perlstein <alfred@FreeBSD.org> |
Check for defined(__i386__) instead of just defined(i386) since the compiler will be updated to only define(__i386__) for ANSI cleanliness.
|
#
4b9fdc2b |
|
25-May-2002 |
Alan Cox <alc@FreeBSD.org> |
o Acquire and release Giant around pmap operations in vm_fault_unwire() and vm_map_delete(). Assert GIANT_REQUIRED in vm_map_delete() only if operating on the kernel_object or the kmem_object. o Remove GIANT_REQUIRED from vm_map_remove(). o Remove the acquisition and release of Giant from munmap().
|
#
e0be79af |
|
18-May-2002 |
Alan Cox <alc@FreeBSD.org> |
o Eliminate the acquisition and release of Giant from minherit(2). (vm_map_inherit() no longer requires Giant to be held.)
|
#
094f6d26 |
|
18-May-2002 |
Alan Cox <alc@FreeBSD.org> |
o Remove GIANT_REQUIRED from vm_map_madvise(). Instead, acquire and release Giant around vm_map_madvise()'s call to pmap_object_init_pt(). o Replace GIANT_REQUIRED in vm_object_madvise() with the acquisition and release of Giant. o Remove the acquisition and release of Giant from madvise().
|
#
43285049 |
|
17-May-2002 |
Alan Cox <alc@FreeBSD.org> |
o Remove the acquisition and release of Giant from mprotect().
|
#
8c5c5d04 |
|
03-May-2002 |
Alan Cox <alc@FreeBSD.org> |
o Remove GIANT_REQUIRED from vm_map_lookup_entry() and vm_map_check_protection(). o Call vm_map_check_protection() without Giant held in munmap().
|
#
44731cab |
|
01-Apr-2002 |
John Baldwin <jhb@FreeBSD.org> |
Change the suser() API to take advantage of td_ucred as well as do a general cleanup of the API. The entire API now consists of two functions similar to the pre-KSE API. The suser() function takes a thread pointer as its only argument. The td_ucred member of this thread must be valid so the only valid thread pointers are curthread and a few kernel threads such as thread0. The suser_cred() function takes a pointer to a struct ucred as its first argument and an integer flag as its second argument. The flag is currently only used for the PRISON_ROOT flag. Discussed on: smp@
|
#
11caded3 |
|
19-Mar-2002 |
Alfred Perlstein <alfred@FreeBSD.org> |
Remove __P.
|
#
a1287949 |
|
10-Mar-2002 |
Eivind Eklund <eivind@FreeBSD.org> |
- Remove a number of extra newlines that do not belong here according to style(9) - Minor space adjustment in cases where we have "( ", " )", if(), return(), while(), for(), etc. - Add /* SYMBOL */ after a few #endifs. Reviewed by: alc
|
#
a854ed98 |
|
27-Feb-2002 |
John Baldwin <jhb@FreeBSD.org> |
Simple p_ucred -> td_ucred changes to start using the per-thread ucred reference.
|
#
1e92845e |
|
15-Feb-2002 |
Bruce Evans <bde@FreeBSD.org> |
Garbage-collect options ACPI_NO_ENABLE_ON_BOOT, AML_DEBUG, BLEED, DEVICE_SYSCTLS, KEY, LOUTB, NFS_MUIDHASHSIZ, NFS_UIDHASHSIZ, PCI_QUIET and SIMPLELOCK_DEBUG.
|
#
a4db4953 |
|
13-Jan-2002 |
Alfred Perlstein <alfred@FreeBSD.org> |
Replace ffind_* with fget calls. Make fget MPsafe. Make fgetvp and fgetsock use the fget subsystem to reduce code bloat. Push giant down in fpathconf().
|
#
426da3bc |
|
13-Jan-2002 |
Alfred Perlstein <alfred@FreeBSD.org> |
SMP Lock struct file, filedesc and the global file list. Seigo Tanimura (tanimura) posted the initial delta. I've polished it quite a bit reducing the need for locking and adapting it for KSE. Locks: 1 mutex in each filedesc protects all the fields. protects "struct file" initialization, while a struct file is being changed from &badfileops -> &pipeops or something the filedesc should be locked. 1 mutex in each struct file protects the refcount fields. doesn't protect anything else. the flags used for garbage collection have been moved to f_gcflag which was the FILLER short, this doesn't need locking because the garbage collection is a single threaded container. could likely be made to use a pool mutex. 1 sx lock for the global filelist. struct file * fhold(struct file *fp); /* increments reference count on a file */ struct file * fhold_locked(struct file *fp); /* like fhold but expects file to locked */ struct file * ffind_hold(struct thread *, int fd); /* finds the struct file in thread, adds one reference and returns it unlocked */ struct file * ffind_lock(struct thread *, int fd); /* ffind_hold, but returns file locked */ I still have to smp-safe the fget cruft, I'll get to that asap.
|
#
cbc89bfb |
|
10-Oct-2001 |
Paul Saab <ps@FreeBSD.org> |
Make MAXTSIZ, DFLDSIZ, MAXDSIZ, DFLSSIZ, MAXSSIZ, SGROWSIZ loader tunable. Reviewed by: peter MFC after: 2 weeks
|
#
8c5d4fe8 |
|
26-Sep-2001 |
Robert Watson <rwatson@FreeBSD.org> |
o Modify access control checks in mmap() to use securelevel_gt() instead of direct variable access. Obtained from: TrustedBSD Project
|
#
b40ce416 |
|
12-Sep-2001 |
Julian Elischer <julian@FreeBSD.org> |
KSE Milestone 2 Note ALL MODULES MUST BE RECOMPILED make the kernel aware that there are smaller units of scheduling than the process. (but only allow one thread per process at this time). This is functionally equivalent to teh previousl -current except that there is a thread associated with each process. Sorry john! (your next MFC will be a doosie!) Reviewed by: peter@freebsd.org, dillon@freebsd.org X-MFC after: ha ha ha ha
|
#
d2c60af8 |
|
30-Aug-2001 |
Matthew Dillon <dillon@FreeBSD.org> |
Cleanup
|
#
676274db |
|
24-Aug-2001 |
Matthew Dillon <dillon@FreeBSD.org> |
Remove support for the badly broken MAP_INHERIT (from -current only).
|
#
54d92145 |
|
04-Jul-2001 |
Matthew Dillon <dillon@FreeBSD.org> |
whitespace / register cleanup
|
#
0cddd8f0 |
|
04-Jul-2001 |
Matthew Dillon <dillon@FreeBSD.org> |
With Alfred's permission, remove vm_mtx in favor of a fine-grained approach (this commit is just the first stage). Also add various GIANT_ macros to formalize the removal of Giant, making it easy to test in a more piecemeal fashion. These macros will allow us to test fine-grained locks to a degree before removing Giant, and also after, and to remove Giant in a piecemeal fashion via sysctl's on those subsystems which the authors believe can operate without Giant.
|
#
190609dd |
|
24-May-2001 |
John Baldwin <jhb@FreeBSD.org> |
Stick VM syscalls back under Giant if the BLEED option is not defined.
|
#
e4ca250d |
|
23-May-2001 |
John Baldwin <jhb@FreeBSD.org> |
- Obtain Giant in mmap() syscall while messing with file descriptors and vnodes. - Fix an old bug that would leak a reference to a fd if the vnode being mmap'd wasn't of type VREG or VCHR. - Lock Giant in vm_mmap() around calls into the VM that can call into pager routines that need Giant or into other VM routines that need Giant. - Replace code that used a goto to jump around the else branch of a test to use an else branch instead.
|
#
12635f9c |
|
22-May-2001 |
John Baldwin <jhb@FreeBSD.org> |
Unlock the VM lock at the end of munlock() instead of locking it again.
|
#
23955314 |
|
18-May-2001 |
Alfred Perlstein <alfred@FreeBSD.org> |
Introduce a global lock for the vm subsystem (vm_mtx). vm_mtx does not recurse and is required for most low level vm operations. faults can not be taken without holding Giant. Memory subsystems can now call the base page allocators safely. Almost all atomic ops were removed as they are covered under the vm mutex. Alpha and ia64 now need to catch up to i386's trap handlers. FFS and NFS have been tested, other filesystems will need minor changes (grabbing the vm lock when twiddling page properties). Reviewed (partially) by: jake, jhb
|
#
fb919e4d |
|
01-May-2001 |
Mark Murray <markm@FreeBSD.org> |
Undo part of the tangle of having sys/lock.h and sys/mutex.h included in other "system" header files. Also help the deprecation of lockmgr.h by making it a sub-include of sys/lock.h and removing sys/lockmgr.h form kernel .c files. Sort sys/*.h includes where possible in affected files. OK'ed by: bde (with reservations)
|
#
279d7226 |
|
18-Nov-2000 |
Matthew Dillon <dillon@FreeBSD.org> |
This patchset fixes a large number of file descriptor race conditions. Pre-rfork code assumed inherent locking of a process's file descriptor array. However, with the advent of rfork() the file descriptor table could be shared between processes. This patch closes over a dozen serious race conditions related to one thread manipulating the table (e.g. closing or dup()ing a descriptor) while another is blocked in an open(), close(), fcntl(), read(), write(), etc... PR: kern/11629 Discussed with: Alexander Viro <viro@math.psu.edu>
|
#
9ff5ce6b |
|
12-Sep-2000 |
Boris Popov <bp@FreeBSD.org> |
Add three new VOPs: VOP_CREATEVOBJECT, VOP_DESTROYVOBJECT and VOP_GETVOBJECT. They will be used by nullfs and other stacked filesystems to support full cache coherency. Reviewed in general by: mckusick, dillon
|
#
3592b715 |
|
26-Jul-2000 |
Kirk McKusick <mckusick@FreeBSD.org> |
Clean up the snapshot code so that it no longer depends on the use of the SF_IMMUTABLE flag to prevent writing. Instead put in explicit checking for the SF_SNAPSHOT flag in the appropriate places. With this change, it is now possible to rename and link to snapshot files. It is also possible to set or clear any of the owner, group, or other read bits on the file, though none of the write or execute bits can be set. There is also an explicit test to prevent the setting or clearing of the SF_SNAPSHOT flag via chflags() or fchflags(). Note also that the modify time cannot be changed as it needs to accurately reflect the time that the snapshot was taken. Submitted by: Robert Watson <rwatson@FreeBSD.org>
|
#
2589f249 |
|
25-Jun-2000 |
Mark Murray <markm@FreeBSD.org> |
Nifty idea from Jeroen van Gelderen; don't call a routine to check if we are using the /dev/zero device, just check a flag (supplied by /dev/zero). Reviewed by: dfr
|
#
24964514 |
|
21-May-2000 |
Peter Wemm <peter@FreeBSD.org> |
Checkpoint of a new physical memory backed object type, that does not have pv_entries. This is intended for very special circumstances, eg: a certain database that has a 1GB shm segment mapped into 300 processes. That would consume 2GB of kvm just to hold the pv_entries alone. This would not be used on systems unless the physical ram was available, as it's not pageable. This is a work-in-progress, but is a useful and functional checkpoint. Matt has got some more fixes for it that will be committed soon. Reviewed by: dillon
|
#
0385347c |
|
20-May-2000 |
Peter Wemm <peter@FreeBSD.org> |
Implement an optimization of the VM<->pmap API. Pass vm_page_t's directly to various pmap_*() functions instead of looking up the physical address and passing that. In many cases, the first thing the pmap code was doing was going to a lot of trouble to get back the original vm_page_t, or it's shadow pv_table entry. Inspired by: John Dyson's 1998 patches. Also: Eliminate pv_table as a seperate thing and build it into a machine dependent part of vm_page_t. This eliminates having a seperate set of structions that shadow each other in a 1:1 fashion that we often went to a lot of trouble to translate from one to the other. (see above) This happens to save 4 bytes of physical memory for each page in the system. (8 bytes on the Alpha). Eliminate the use of the phys_avail[] array to determine if a page is managed (ie: it has pv_entries etc). Store this information in a flag. Things like device_pager set it because they create vm_page_t's on the fly that do not have pv_entries. This makes it easier to "unmanage" a page of physical memory (this will be taken advantage of in subsequent commits). Add a function to add a new page to the freelist. This could be used for reclaiming the previously wasted pages left over from preloaded loader(8) files. Reviewed by: dillon
|
#
aa543039 |
|
22-Apr-2000 |
Garrett Wollman <wollman@FreeBSD.org> |
Implement POSIX.1b shared memory objects. In this implementation, shared memory objects are regular files; the shm_open(3) routine uses fcntl(2) to set a flag on the descriptor which tells mmap(2) to automatically apply MAP_NOSYNC. Not objected to by: bde, dillon, dufault, jasone
|
#
5929bcfa |
|
27-Mar-2000 |
Philippe Charnier <charnier@FreeBSD.org> |
Revert spelling mistake I made in the previous commit Requested by: Alan and Bruce
|
#
956f3135 |
|
26-Mar-2000 |
Philippe Charnier <charnier@FreeBSD.org> |
Spelling
|
#
9730a5da |
|
27-Feb-2000 |
Paul Saab <ps@FreeBSD.org> |
Add MAP_NOCORE to mmap(2), and MADV_NOCORE and MADV_CORE to madvise(2). This This feature allows you to specify if mmap'd data is included in an application's corefile. Change the type of eflags in struct vm_map_entry from u_char to vm_eflags_t (an unsigned int). Reviewed by: dillon,jdp,alfred Approved by: jkh
|
#
1f6889a1 |
|
16-Feb-2000 |
Matthew Dillon <dillon@FreeBSD.org> |
Fix null-pointer dereference crash when the system is intentionally run out of KVM through a mmap()/fork() bomb that allocates hundreds of thousands of vm_map_entry structures. Add panic to make null-pointer dereference crash a little more verbose. Add a new sysctl, vm.max_proc_mmap, which specifies the maximum number of mmap()'d spaces (discrete vm_map_entry's in the process). The value defaults to around 9000 for a 128MB machine. The test is scaled for the number of processes sharing a vmspace (aka linux threads). Setting the value to 0 disables the feature. PR: kern/16573 Approved by: jkh
|
#
00d76afe |
|
03-Jan-2000 |
Guido van Rooij <guido@FreeBSD.org> |
Use MAP_NOSYNC for vnodes without any links in their filesystem. This is necessary for vmware: it does not use an anonymous mmap for the memory of the virtual system. In stead it creates a temp file an unlinks it. For a 50 MB file, this results in a ot of syncing every 30 seconds. Reviewed by: Matthew Dillon <dillon@backplane.com>
|
#
4f79d873 |
|
11-Dec-1999 |
Matthew Dillon <dillon@FreeBSD.org> |
Add MAP_NOSYNC feature to mmap(), and MADV_NOSYNC and MADV_AUTOSYNC to madvise(). This feature prevents the update daemon from gratuitously flushing dirty pages associated with a mapped file-backed region of memory. The system pager will still page the memory as necessary and the VM system will still be fully coherent with the filesystem. Modifications made by other means to the same area of memory, for example by write(), are unaffected. The feature works on a page-granularity basis. MAP_NOSYNC allows one to use mmap() to share memory between processes without incuring any significant filesystem overhead, putting it in the same performance category as SysV Shared memory and anonymous memory. Reviewed by: julian, alc, dg
|
#
923502ff |
|
29-Oct-1999 |
Poul-Henning Kamp <phk@FreeBSD.org> |
useracc() the prequel: Merge the contents (less some trivial bordering the silly comments) of <vm/vm_prot.h> and <vm/vm_inherit.h> into <vm/vm.h>. This puts the #defines for the vm_inherit_t and vm_prot_t types next to their typedefs. This paves the road for the commit to follow shortly: change useracc() to use VM_PROT_{READ|WRITE} rather than B_{READ|WRITE} as argument.
|
#
b4309055 |
|
20-Sep-1999 |
Matthew Dillon <dillon@FreeBSD.org> |
cleanup madvise code, add a few more sanity checks. Reviewed by: Alan Cox <alc@cs.rice.edu>, dg@root.com
|
#
c3aac50f |
|
27-Aug-1999 |
Peter Wemm <peter@FreeBSD.org> |
$Id$ -> $FreeBSD$
|
#
0ef1c826 |
|
08-Aug-1999 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Decommision miscfs/specfs/specdev.h. Most of it goes into <sys/conf.h>, a few lines into <sys/vnode.h>. Add a few fields to struct specinfo, paving the way for the fun part.
|
#
4738fa09 |
|
05-Jun-1999 |
Alan Cox <alc@FreeBSD.org> |
vm_mmap: Insure that device mappings get MAP_PREFAULT(_PARTIAL) set, so that 4M page mappings are used when possible. Reviewed by: Luoqi Chen <luoqi@watermarkgroup.com>
|
#
e972780a |
|
16-May-1999 |
Alan Cox <alc@FreeBSD.org> |
Add the options MAP_PREFAULT and MAP_PREFAULT_PARTIAL to vm_map_find/insert, eliminating the need for the pmap_object_init_pt calls in imgact_* and mmap. Reviewed by: David Greenman <dg@root.com>
|
#
ea41812f |
|
15-May-1999 |
Alan Cox <alc@FreeBSD.org> |
Remove prototypes for functions that don't exist anymore (vm_map.h). Remove a useless argument from vm_map_madvise's interface (vm_map.c, vm_map.h, and vm_mmap.c). Remove a redundant test in vm_uiomove (vm_map.c). Make two changes to vm_object_coalesce: 1. Determine whether the new range of pages actually overlaps the existing object's range of pages before calling vm_object_page_remove. (Prior to this change almost 90% of the calls to vm_object_page_remove were to remove pages that were beyond the end of the object.) 2. Free any swap space allocated to removed pages.
|
#
e5f13bdd |
|
14-May-1999 |
Alan Cox <alc@FreeBSD.org> |
Simplify vm_map_find/insert's interface: remove the MAP_COPY_NEEDED option. It never makes sense to specify MAP_COPY_NEEDED without also specifying MAP_COPY_ON_WRITE, and vice versa. Thus, MAP_COPY_ON_WRITE suffices. Reviewed by: David Greenman <dg@root.com>
|
#
4d38e6b5 |
|
06-May-1999 |
Peter Wemm <peter@FreeBSD.org> |
Add brackets to silence egcs and help clarity.
|
#
d28ab90f |
|
05-May-1999 |
Luoqi Chen <luoqi@FreeBSD.org> |
Don't ignore mmap() address hint below the text section.
|
#
f711d546 |
|
27-Apr-1999 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Suser() simplification: 1: s/suser/suser_xxx/ 2: Add new function: suser(struct proc *), prototyped in <sys/proc.h>. 3: s/suser_xxx(\([a-zA-Z0-9_]*\)->p_ucred, \&\1->p_acflag)/suser(\1)/ The remaining suser_xxx() calls will be scrutinized and dealt with later. There may be some unneeded #include <sys/cred.h>, but they are left as an exercise for Bruce. More changes to the suser() API will come along with the "jail" code.
|
#
db42d908 |
|
19-Apr-1999 |
Peter Wemm <peter@FreeBSD.org> |
unifdef -DVM_STACK - it's been on for a while for x86 and was checked and appeared to be working for the Alpha some time ago.
|
#
dd2622a8 |
|
02-Mar-1999 |
Alan Cox <alc@FreeBSD.org> |
To avoid a conflict for the vm_map's lock with vm_fault, release the read lock around the subyte operations in mincore. After the lock is reacquired, use the map's timestamp to determine if we need to restart the scan.
|
#
eff50fcd |
|
01-Mar-1999 |
Alan Cox <alc@FreeBSD.org> |
mincore doesn't modify the vm_map. Therefore, it doesn't require an exclusive lock. A read lock will suffice.
|
#
b1028ad1 |
|
19-Feb-1999 |
Luoqi Chen <luoqi@FreeBSD.org> |
Hide access to vmspace:vm_pmap with inline function vmspace_pmap(). This is the preparation step for moving pmap storage out of vmspace proper. Reviewed by: Alan Cox <alc@cs.rice.edu> Matthew Dillion <dillon@apollo.backplane.com>
|
#
9fdfe602 |
|
07-Feb-1999 |
Matthew Dillon <dillon@FreeBSD.org> |
Remove MAP_ENTRY_IS_A_MAP 'share' maps. These maps were once used to attempt to optimize forks but were essentially given-up on due to problems and replaced with an explicit dup of the vm_map_entry structure. Prior to the removal, they were entirely unused.
|
#
2907af2a |
|
25-Jan-1999 |
Julian Elischer <julian@FreeBSD.org> |
Mostly remove the VM_STACK OPTION. This changes the definitions of a few items so that structures are the same whether or not the option itself is enabled. This allows people to enable and disable the option without recompilng the world. As the author says: |I ran into a problem pulling out the VM_STACK option. I was aware of this |when I first did the work, but then forgot about it. The VM_STACK stuff |has some code changes in the i386 branch. There need to be corresponding |changes in the alpha branch before it can come out completely. what is done: | |1) Pull the VM_STACK option out of the header files it appears in. This |really shouldn't affect anything that executes with or without the rest |of the VM_STACK patches. The vm_map_entry will then always have one |extra element (avail_ssize). It just won't be used if the VM_STACK |option is not turned on. | |I've also pulled the option out of vm_map.c. This shouldn't harm anything, |since the routines that are enabled as a result are not called unless |the VM_STACK option is enabled elsewhere. | |2) Add what appears to be appropriate code the the alpha branch, still |protected behind the VM_STACK switch. I don't have an alpha machine, |so we would need to get some testers with alpha machines to try it out. | |Once there is some testing, we can consider making the change permanent |for both i386 and alpha. | [..] | |Once the alpha code is adequately tested, we can pull VM_STACK out |everywhere. | Submitted by: "Richard Seaman, Jr." <dick@tar.com>
|
#
1c7c3c6a |
|
21-Jan-1999 |
Matthew Dillon <dillon@FreeBSD.org> |
This is a rather large commit that encompasses the new swapper, changes to the VM system to support the new swapper, VM bug fixes, several VM optimizations, and some additional revamping of the VM code. The specific bug fixes will be documented with additional forced commits. This commit is somewhat rough in regards to code cleanup issues. Reviewed by: "John S. Dyson" <root@dyson.iquest.net>, "David Greenman" <dg@root.com>
|
#
2267af78 |
|
06-Jan-1999 |
Julian Elischer <julian@FreeBSD.org> |
Add (but don't activate) code for a special VM option to make downward growing stacks more general. Add (but don't activate) code to use the new stack facility when running threads, (specifically the linux threads support). This allows people to use both linux compiled linuxthreads, and also the native FreeBSD linux-threads port. The code is conditional on VM_STACK. Not using this will produce the old heavily tested system. Submitted by: Richard Seaman <dick@tar.com>
|
#
fc565456 |
|
09-Dec-1998 |
Dmitrij Tejblum <dt@FreeBSD.org> |
Don't disable mmap with large file offset.
|
#
6cde7a16 |
|
13-Oct-1998 |
David Greenman <dg@FreeBSD.org> |
Fixed two potentially serious classes of bugs: 1) The vnode pager wasn't properly tracking the file size due to "size" being page rounded in some cases and not in others. This sometimes resulted in corrupted files. First noticed by Terry Lambert. Fixed by changing the "size" pager_alloc parameter to be a 64bit byte value (as opposed to a 32bit page index) and changing the pagers and their callers to deal with this properly. 2) Fixed a bogus type cast in round_page() and trunc_page() that caused some 64bit offsets and sizes to be scrambled. Removing the cast required adding casts at a few dozen callers. There may be problems with other bogus casts in close-by macros. A quick check seemed to indicate that those were okay, however.
|
#
e69763a3 |
|
04-Sep-1998 |
Doug Rabson <dfr@FreeBSD.org> |
Cosmetic changes to the PAGE_XXX macros to make them consistent with the other objects in vm.
|
#
069e9bc1 |
|
24-Aug-1998 |
Doug Rabson <dfr@FreeBSD.org> |
Change various syscalls to use size_t arguments instead of u_int. Add some overflow checks to read/write (from bde). Change all modifications to vm_page::flags, vm_page::busy, vm_object::flags and vm_object::paging_in_progress to use operations which are not interruptable. Reviewed by: Bruce Evans <bde@zeta.org.au>
|
#
a23d65bf |
|
14-Jul-1998 |
Bruce Evans <bde@FreeBSD.org> |
Cast pointers to uintptr_t/intptr_t instead of to u_long/long, respectively. Most of the longs should probably have been u_longs, but this changes is just to prevent warnings about casts between pointers and integers of different sizes, not to fix poorly chosen types.
|
#
711458e3 |
|
05-Jul-1998 |
Doug Rabson <dfr@FreeBSD.org> |
Don't truncate the return value of mmap to sizeof(int).
|
#
be160d60 |
|
21-Jun-1998 |
Bruce Evans <bde@FreeBSD.org> |
Removed unused includes.
|
#
ecbb00a2 |
|
07-Jun-1998 |
Doug Rabson <dfr@FreeBSD.org> |
This commit fixes various 64bit portability problems required for FreeBSD/alpha. The most significant item is to change the command argument to ioctl functions from int to u_long. This change brings us inline with various other BSD versions. Driver writers may like to use (__FreeBSD_version == 300003) to detect this change. The prototype FreeBSD/alpha machdep will follow in a couple of days time.
|
#
4183b6b6 |
|
19-May-1998 |
Peter Wemm <peter@FreeBSD.org> |
Make the previous commit compile..
|
#
05feb99f |
|
18-May-1998 |
Guido van Rooij <guido@FreeBSD.org> |
Plug hole reported on Bugtraq: do not allow mmap with WRITE privs for append-only and immutable files. Obtained from: OpenBSD (partly)
|
#
c8bdd56b |
|
12-Mar-1998 |
Guido van Rooij <guido@FreeBSD.org> |
Fix for mmap of char devices bug as described in OpenBSD advisory of 1998/02/20 Reviewed by: John Dyson Submitted by: "Cy Schubert" <cschuber@uumail.gov.bc.ca>
|
#
8f9110f6 |
|
07-Mar-1998 |
John Dyson <dyson@FreeBSD.org> |
This mega-commit is meant to fix numerous interrelated problems. There has been some bitrot and incorrect assumptions in the vfs_bio code. These problems have manifest themselves worse on NFS type filesystems, but can still affect local filesystems under certain circumstances. Most of the problems have involved mmap consistancy, and as a side-effect broke the vfs.ioopt code. This code might have been committed seperately, but almost everything is interrelated. 1) Allow (pmap_object_init_pt) prefaulting of buffer-busy pages that are fully valid. 2) Rather than deactivating erroneously read initial (header) pages in kern_exec, we now free them. 3) Fix the rundown of non-VMIO buffers that are in an inconsistent (missing vp) state. 4) Fix the disassociation of pages from buffers in brelse. The previous code had rotted and was faulty in a couple of important circumstances. 5) Remove a gratuitious buffer wakeup in vfs_vmio_release. 6) Remove a crufty and currently unused cluster mechanism for VBLK files in vfs_bio_awrite. When the code is functional, I'll add back a cleaner version. 7) The page busy count wakeups assocated with the buffer cache usage were incorrectly cleaned up in a previous commit by me. Revert to the original, correct version, but with a cleaner implementation. 8) The cluster read code now tries to keep data associated with buffers more aggressively (without breaking the heuristics) when it is presumed that the read data (buffers) will be soon needed. 9) Change to filesystem lockmgr locks so that they use LK_NOPAUSE. The delay loop waiting is not useful for filesystem locks, due to the length of the time intervals. 10) Correct and clean-up spec_getpages. 11) Implement a fully functional nfs_getpages, nfs_putpages. 12) Fix nfs_write so that modifications are coherent with the NFS data on the server disk (at least as well as NFS seems to allow.) 13) Properly support MS_INVALIDATE on NFS. 14) Properly pass down MS_INVALIDATE to lower levels of the VM code from vm_map_clean. 15) Better support the notion of pages being busy but valid, so that fewer in-transit waits occur. (use p->busy more for pageouts instead of PG_BUSY.) Since the page is fully valid, it is still usable for reads. 16) It is possible (in error) for cached pages to be busy. Make the page allocation code handle that case correctly. (It should probably be a printf or panic, but I want the system to handle coding errors robustly. I'll probably add a printf.) 17) Correct the design and usage of vm_page_sleep. It didn't handle consistancy problems very well, so make the design a little less lofty. After vm_page_sleep, if it ever blocked, it is still important to relookup the page (if the object generation count changed), and verify it's status (always.) 18) In vm_pageout.c, vm_pageout_clean had rotted, so clean that up. 19) Push the page busy for writes and VM_PROT_READ into vm_pageout_flush. 20) Fix vm_pager_put_pages and it's descendents to support an int flag instead of a boolean, so that we can pass down the invalidate bit.
|
#
0b08f5f7 |
|
05-Feb-1998 |
Eivind Eklund <eivind@FreeBSD.org> |
Back out DIAGNOSTIC changes.
|
#
47cfdb16 |
|
04-Feb-1998 |
Eivind Eklund <eivind@FreeBSD.org> |
Turn DIAGNOSTIC into a new-style option.
|
#
651bb817 |
|
30-Dec-1997 |
Alexander Langer <alex@FreeBSD.org> |
caddr_t --> void *
|
#
5591b823d |
|
16-Dec-1997 |
Eivind Eklund <eivind@FreeBSD.org> |
Make COMPAT_43 and COMPAT_SUNOS new-style options.
|
#
cb226aaa |
|
06-Nov-1997 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Move the "retval" (3rd) parameter from all syscall functions and put it in struct proc instead. This fixes a boatload of compiler warning, and removes a lot of cruft from the sources. I have not removed the /*ARGSUSED*/, they will require some looking at. libkvm, ps and other userland struct proc frobbing programs will need recompiled.
|
#
79624e21 |
|
31-Aug-1997 |
Bruce Evans <bde@FreeBSD.org> |
Removed unused #includes.
|
#
54f42e4b |
|
30-Aug-1997 |
Peter Wemm <peter@FreeBSD.org> |
Allow non-page aligned file offset mmap's, providing that the system is allowed to choose the address, or that the MAP_FIXED address has the same remainder when modulo PAGE_SIZE as the file offset. Apparently this is posix1003.1b specified behavior. SVR4 and the other *BSD's allow it too. It costs us nothing to support and means we don't get EINVAL on some mmap code that works perfectly elsewhere. Obtained from: NetBSD
|
#
b9dcd593 |
|
25-Aug-1997 |
Bruce Evans <bde@FreeBSD.org> |
Fixed type mismatches for functions with args of type vm_prot_t and/or vm_inherit_t. These types are smaller than ints, so the prototypes should have used the promoted type (int) to match the old-style function definitions. They use just vm_prot_t and/or vm_inherit_t. This depends on gcc features to work. I fixed the definitions since this is easiest. The correct fix may be to change the small types to u_int, to optimize for time instead of space.
|
#
0a0a85b3 |
|
16-Jul-1997 |
John Dyson <dyson@FreeBSD.org> |
Add support for 4MB pages. This includes the .text, .data, .data parts of the kernel, and also most of the dynamic parts of the kernel. Additionally, 4MB pages will be allocated for display buffers as appropriate (only.) The 4MB support for SMP isn't complete, but doesn't interfere with operation either.
|
#
4a40e3d4 |
|
15-Jun-1997 |
John Dyson <dyson@FreeBSD.org> |
Correct the return code for the mlock system call. Also add the stubs for mlockall and munlockall.
|
#
3ac4d1ef |
|
22-Mar-1997 |
Bruce Evans <bde@FreeBSD.org> |
Don't #include <sys/fcntl.h> in <sys/file.h> if KERNEL is defined. Fixed everything that depended on getting fcntl.h stuff from the wrong place. Most things don't depend on file.h stuff at all.
|
#
6875d254 |
|
22-Feb-1997 |
Peter Wemm <peter@FreeBSD.org> |
Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not ready for it yet.
|
#
996c772f |
|
09-Feb-1997 |
John Dyson <dyson@FreeBSD.org> |
This is the kernel Lite/2 commit. There are some requisite userland changes, so don't expect to be able to run the kernel as-is (very well) without the appropriate Lite/2 userland changes. The system boots and can mount UFS filesystems. Untested: ext2fs, msdosfs, NFS Known problems: Incorrect Berkeley ID strings in some files. Mount_std mounts will not work until the getfsent library routine is changed. Reviewed by: various people Submitted by: Jeffery Hsu <hsu@freebsd.org>
|
#
afa07f7e |
|
15-Jan-1997 |
John Dyson <dyson@FreeBSD.org> |
Change the map entry flags from bitfields to bitmasks. Allows for some code simplification.
|
#
1130b656 |
|
14-Jan-1997 |
Jordan K. Hubbard <jkh@FreeBSD.org> |
Make the long-awaited change from $Id$ to $FreeBSD$ This will make a number of things easier in the future, as well as (finally!) avoiding the Id-smashing problem which has plagued developers for so long. Boy, I'm glad we're not using sup anymore. This update would have been insane otherwise.
|
#
9b5a5d81 |
|
11-Jan-1997 |
John Dyson <dyson@FreeBSD.org> |
Prepare better for multi-platform by eliminating another required pmap routine (pmap_is_referenced.) Upper level recoded to use pmap_ts_referenced.
|
#
d0aea04f |
|
29-Dec-1996 |
John Dyson <dyson@FreeBSD.org> |
Let the VM system know that on certain arch's that VM_PROT_READ also implies VM_PROT_EXEC. We support it that way for now, since the break system call by default gives VM_PROT_ALL. Now we have a better chance of coalesing map entries when mixing mmap/break type operations. This was contributing to excessive numbers of map entries on the modula-3 runtime system. The problem is still not "solved", but the situation makes more sense. Eventually, when we work on architectures where VM_PROT_READ is orthogonal to VM_PROT_EXEC, we will have to visit this issue carefully (esp. regarding security issues.)
|
#
94328e90 |
|
28-Dec-1996 |
John Dyson <dyson@FreeBSD.org> |
The code unnecessarily created an object with no handle up-front, which has the negative effect of disabling some map optimizations. This patch defers the creation of the object until it needs to be at fault time. Submitted by: Alan Cox <alc@cs.rice.edu>
|
#
e9822d92 |
|
22-Dec-1996 |
Joerg Wunsch <joerg@FreeBSD.org> |
Make DFLDSIZ and MAXDSIZ fully-supported options. "Don't forget to do a ``make depend''" :-)
|
#
7aaaa4fd |
|
14-Dec-1996 |
John Dyson <dyson@FreeBSD.org> |
Implement closer-to POSIX mlock semantics. The major difference is that we do allow mlock to span unallocated regions (of course, not mlocking them.) We also allow mlocking of RO regions (which the old code couldn't.) The restriction there is that once a RO region is wired (mlocked), it cannot be debugged (or EVER written to.) Under normal usage, the new mlock code will be a significant improvement over our old stuff.
|
#
851c12ff |
|
29-Oct-1996 |
John Dyson <dyson@FreeBSD.org> |
Change mmap to use OBJT_DEFAULT instead of OBJT_SWAP by default for anonymous objects. The system will automatically change the type to SWAP if needed (for size or pageout reasons.)
|
#
fcae040b |
|
23-Oct-1996 |
John Dyson <dyson@FreeBSD.org> |
Remove a bogus optimization in the mmap code. It is superfluous, and at best is the same speed as the unoptimized code. At worst, it slows down trivial programs.
|
#
1111860c |
|
13-Oct-1996 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Remove a stale comment.
|
#
cd6eea25 |
|
19-Sep-1996 |
David Greenman <dg@FreeBSD.org> |
Fixed bug with reversed trunc/round_page() in madvise...start must be trunced, end must be rounded.
|
#
67bf6868 |
|
29-Jul-1996 |
John Dyson <dyson@FreeBSD.org> |
Backed out the recent changes/enhancements to the VM code. The problem with the 'shell scripts' was found, but there was a 'strange' problem found with a 486 laptop that we could not find. This commit backs the code back to 25-jul, and will be re-entered after the snapshot in smaller (more easily tested) chunks.
|
#
0f281c28 |
|
27-Jul-1996 |
David Greenman <dg@FreeBSD.org> |
Slight performance tweak for previous commit.
|
#
bf6dfc7b |
|
27-Jul-1996 |
John Dyson <dyson@FreeBSD.org> |
Allow sequentially created mmap'ed anonymous regions to coalesce. There is little or no reason to create a swap pager for small mmap's. The vm_map_insert code will automatically create a swap pager if the object becomes too large. This fix, per a request from phk.
|
#
feb32a8f |
|
26-Jul-1996 |
John Dyson <dyson@FreeBSD.org> |
Remove experimental header file. My test-build must have picked it up in an unexpected place. Submitted by: jkh
|
#
4f4d35ed |
|
26-Jul-1996 |
John Dyson <dyson@FreeBSD.org> |
This commit is meant to solve a couple of VM system problems or performance issues. 1) The pmap module has had too many inlines, and so the object file is simply bigger than it needs to be. Some common code is also merged into subroutines. 2) Removal of some *evil* PHYS_TO_VM_PAGE macro calls. Unfortunately, a few have needed to be added also. The removal caused the need for more vm_page_lookups. I added lookup hints to minimize the need for the page table lookup operations. 3) Removal of some bogus performance improvements, that mostly made the code more complex (tracking individual page table page updates unnecessarily). Those improvements actually hurt 386 processors perf (not that people who worry about perf use 386 processors anymore :-)). 4) Changed pv queue manipulations/structures to be TAILQ's. 5) The pv queue code has had some performance problems since day one. Some significant scalability issues are resolved by threading the pv entries from the pmap AND the physical address instead of just the physical address. This makes certain pmap operations run much faster. This does not affect most micro-benchmarks, but should help loaded system performance *significantly*. DG helped and came up with most of the solution for this one. 6) Most if not all pmap bit operations follow the pattern: pmap_test_bit(); pmap_clear_bit(); That made for twice the necessary pv list traversal. The pmap interface now supports only pmap_tc_bit type operations: pmap_[test/clear]_modified, pmap_[test/clear]_referenced. Additionally, the modified routine now takes a vm_page_t arg instead of a phys address. This eliminates a PHYS_TO_VM_PAGE operation. 7) Several rewrites of routines that contain redundant code to use common routines, so that there is a greater likelihood of keeping the cache footprint smaller.
|
#
f35329ac |
|
30-May-1996 |
John Dyson <dyson@FreeBSD.org> |
This commit is dual-purpose, to fix more of the pageout daemon queue corruption problems, and to apply Gary Palmer's code cleanups. David Greenman helped with these problems also. There is still a hang problem using X in small memory machines.
|
#
867a482d |
|
19-May-1996 |
John Dyson <dyson@FreeBSD.org> |
Initial support for mincore and madvise. Both are almost fully supported, except madvise does not page in with MADV_WILLNEED, and MADV_DONTNEED doesn't force dirty pages out.
|
#
b18bfc3d |
|
17-May-1996 |
John Dyson <dyson@FreeBSD.org> |
This set of commits to the VM system does the following, and contain contributions or ideas from Stephen McKay <syssgm@devetir.qld.gov.au>, Alan Cox <alc@cs.rice.edu>, David Greenman <davidg@freebsd.org> and me: More usage of the TAILQ macros. Additional minor fix to queue.h. Performance enhancements to the pageout daemon. Addition of a wait in the case that the pageout daemon has to run immediately. Slightly modify the pageout algorithm. Significant revamp of the pmap/fork code: 1) PTE's and UPAGES's are NO LONGER in the process's map. 2) PTE's and UPAGES's reside in their own objects. 3) TOTAL elimination of recursive page table pagefaults. 4) The page directory now resides in the PTE object. 5) Implemented pmap_copy, thereby speeding up fork time. 6) Changed the pv entries so that the head is a pointer and not an entire entry. 7) Significant cleanup of pmap_protect, and pmap_remove. 8) Removed significant amounts of machine dependent fork code from vm_glue. Pushed much of that code into the machine dependent pmap module. 9) Support more completely the reuse of already zeroed pages (Page table pages and page directories) as being already zeroed. Performance and code cleanups in vm_map: 1) Improved and simplified allocation of map entries. 2) Improved vm_map_copy code. 3) Corrected some minor problems in the simplify code. Implemented splvm (combo of splbio and splimp.) The VM code now seldom uses splhigh. Improved the speed of and simplified kmem_malloc. Minor mod to vm_fault to avoid using pre-zeroed pages in the case of objects with backing objects along with the already existant condition of having a vnode. (If there is a backing object, there will likely be a COW... With a COW, it isn't necessary to start with a pre-zeroed page.) Minor reorg of source to perhaps improve locality of ref.
|
#
aa8de40a |
|
03-May-1996 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Another sweep over the pmap/vm macros, this time with more focus on the usage. I'm not satisfied with the naming, but now at least there is less bogus stuff around.
|
#
8f2ec877 |
|
16-Mar-1996 |
David Greenman <dg@FreeBSD.org> |
Force device mappings to always be shared. It doesn't make sense for them to ever be COW and we need the mappings to be shared for backward compatibilty. Reviewed by: dyson
|
#
5850152d |
|
11-Mar-1996 |
John Dyson <dyson@FreeBSD.org> |
Allow mmap'ed devices to work correctly across forks. The sanest solution appeared to be to allow the child to maintain the same mapping as the parent.
|
#
8169788f |
|
11-Mar-1996 |
Peter Wemm <peter@FreeBSD.org> |
Import 4.4BSD-Lite2 onto the vendor branch, note that in the kernel, all files are off the vendor branch, so this should not change anything. A "U" marker generally means that the file was not changed in between the 4.4Lite and Lite-2 releases, and does not need a merge. "C" generally means that there was a change.
|
#
9154ee6a |
|
02-Mar-1996 |
Peter Wemm <peter@FreeBSD.org> |
Oops.. I nearly forgot the actual core of the length/rounding/etc fixes that Bruce asked for. These still are not quite perfect, and in particular, it can get upset on extreme boundary cases (addr = 0xfff, len = 0xffffffff, which would end up mapping a single page rather than failing), but this is better code that I committed before. (note, the VM system does not (apparently) support single mmap segment sizes above 0x80000000 anyway)
|
#
de5f6a77 |
|
01-Mar-1996 |
John Dyson <dyson@FreeBSD.org> |
1) Eliminate unnecessary bzero of UPAGES. 2) Eliminate unnecessary copying of pages during/after forks. 3) Add user map simplification.
|
#
dabee6fe |
|
23-Feb-1996 |
Peter Wemm <peter@FreeBSD.org> |
kern_descrip.c: add fdshare()/fdcopy() kern_fork.c: add the tiny bit of code for rfork operation. kern/sysv_*: shmfork() takes one less arg, it was never used. sys/shm.h: drop "isvfork" arg from shmfork() prototype sys/param.h: declare rfork args.. (this is where OpenBSD put it..) sys/filedesc.h: protos for fdshare/fdcopy. vm/vm_mmap.c: add minherit code, add rounding to mmap() type args where it makes sense. vm/*: drop unused isvfork arg. Note: this rfork() implementation copies the address space mappings, it does not connect the mappings together. ie: once the two processes have split, the pages may be shared, but the address space is not. If one does a mmap() etc, it does not appear in the other. This makes it not useful for pthreads, but it is useful in it's own right for having light-weight threads in a static shared address space. Obtained from: Original by Ron Minnich, extended by OpenBSD
|
#
bd7e5f99 |
|
18-Jan-1996 |
John Dyson <dyson@FreeBSD.org> |
Eliminated many redundant vm_map_lookup operations for vm_mmap. Speed up for vfs_bio -- addition of a routine bqrelse to greatly diminish overhead for merged cache. Efficiency improvement for vfs_cluster. It used to do alot of redundant calls to cluster_rbuild. Correct the ordering for vrele of .text and release of credentials. Use the selective tlb update for 486/586/P6. Numerous fixes to the size of objects allocated for files. Additionally, fixes in the various pagers. Fixes for proper positioning of vnode_pager_setsize in msdosfs and ext2fs. Fixes in the swap pager for exhausted resources. The pageout code will not as readily thrash. Change the page queue flags (PG_ACTIVE, PG_INACTIVE, PG_FREE, PG_CACHE) into page queue indices (PQ_ACTIVE, PQ_INACTIVE, PQ_FREE, PQ_CACHE), thereby improving efficiency of several routines. Eliminate even more unnecessary vm_page_protect operations. Significantly speed up process forks. Make vm_object_page_clean more efficient, thereby eliminating the pause that happens every 30seconds. Make sequential clustered writes B_ASYNC instead of B_DELWRI even in the case of filesystems mounted async. Fix a panic with busy pages when write clustering is done for non-VMIO buffers.
|
#
f2c6b65b |
|
17-Dec-1995 |
Bruce Evans <bde@FreeBSD.org> |
Fixed 1TB filesize changes. Some pindexes had bogus names and types but worked because vm_pindex_t is indistinuishable from vm_offset_t.
|
#
3048c512 |
|
12-Dec-1995 |
John Dyson <dyson@FreeBSD.org> |
There was a bug that the size for an msync'ed region was not rounded up. The effect of this was that msync with a size would generally sync 1 page less than it should. This problem was brought to my attention by Darrel Herbst <dherbst@gradin.cis.upenn.edu> and Ron Minnich <rminnich@sarnoff.com>.
|
#
a316d390 |
|
10-Dec-1995 |
John Dyson <dyson@FreeBSD.org> |
Changes to support 1Tb filesizes. Pages are now named by an (object,index) pair instead of (object,offset) pair.
|
#
efeaf95a |
|
06-Dec-1995 |
David Greenman <dg@FreeBSD.org> |
Untangled the vm.h include file spaghetti.
|
#
cac597e4 |
|
02-Dec-1995 |
Bruce Evans <bde@FreeBSD.org> |
Completed function declarations and/or added prototypes. Staticized some functions. __purified some functions. Some functions were bogusly declared as returning `const'. This hasn't done anything since gcc-2.5. For later versions of gcc, the equivalent is __attribute__((const)) at the end of function declarations.
|
#
d2d3e875 |
|
11-Nov-1995 |
Bruce Evans <bde@FreeBSD.org> |
Included <sys/sysproto.h> to get central declarations for syscall args structs and prototypes for syscalls. Ifdefed duplicated decentralized declarations of args structs. It's convenient to have this visible but they are hard to maintain. Some are already different from the central declarations. 4.4lite2 puts them in comments in the function headers but I wanted to avoid the large changes for that.
|
#
e17bed12 |
|
22-Oct-1995 |
John Dyson <dyson@FreeBSD.org> |
First phase of removing the PG_COPYONWRITE flag, and an architectural cleanup of mapping files.
|
#
02c04a2f |
|
21-Oct-1995 |
John Dyson <dyson@FreeBSD.org> |
Implement mincore system call.
|
#
24a1cce3 |
|
13-Jul-1995 |
David Greenman <dg@FreeBSD.org> |
NOTE: libkvm, w, ps, 'top', and any other utility which depends on struct proc or any VM system structure will have to be rebuilt!!! Much needed overhaul of the VM system. Included in this first round of changes: 1) Improved pager interfaces: init, alloc, dealloc, getpages, putpages, haspage, and sync operations are supported. The haspage interface now provides information about clusterability. All pager routines now take struct vm_object's instead of "pagers". 2) Improved data structures. In the previous paradigm, there is constant confusion caused by pagers being both a data structure ("allocate a pager") and a collection of routines. The idea of a pager structure has escentially been eliminated. Objects now have types, and this type is used to index the appropriate pager. In most cases, items in the pager structure were duplicated in the object data structure and thus were unnecessary. In the few cases that remained, a un_pager structure union was created in the object to contain these items. 3) Because of the cleanup of #1 & #2, a lot of unnecessary layering can now be removed. For instance, vm_object_enter(), vm_object_lookup(), vm_object_remove(), and the associated object hash list were some of the things that were removed. 4) simple_lock's removed. Discussion with several people reveals that the SMP locking primitives used in the VM system aren't likely the mechanism that we'll be adopting. Even if it were, the locking that was in the code was very inadequate and would have to be mostly re-done anyway. The locking in a uni-processor kernel was a no-op but went a long way toward making the code difficult to read and debug. 5) Places that attempted to kludge-up the fact that we don't have kernel thread support have been fixed to reflect the reality that we are really dealing with processes, not threads. The VM system didn't have complete thread support, so the comments and mis-named routines were just wrong. We now use tsleep and wakeup directly in the lock routines, for instance. 6) Where appropriate, the pagers have been improved, especially in the pager_alloc routines. Most of the pager_allocs have been rewritten and are now faster and easier to maintain. 7) The pagedaemon pageout clustering algorithm has been rewritten and now tries harder to output an even number of pages before and after the requested page. This is sort of the reverse of the ideal pagein algorithm and should provide better overall performance. 8) Unnecessary (incorrect) casts to caddr_t in calls to tsleep & wakeup have been removed. Some other unnecessary casts have also been removed. 9) Some almost useless debugging code removed. 10) Terminology of shadow objects vs. backing objects straightened out. The fact that the vm_object data structure escentially had this backwards really confused things. The use of "shadow" and "backing object" throughout the code is now internally consistent and correct in the Mach terminology. 11) Several minor bug fixes, including one in the vm daemon that caused 0 RSS objects to not get purged as intended. 12) A "default pager" has now been created which cleans up the transition of objects to the "swap" type. The previous checks throughout the code for swp->pg_data != NULL were really ugly. This change also provides the rudiments for future backing of "anonymous" memory by something other than the swap pager (via the vnode pager, for example), and it allows the decision about which of these pagers to use to be made dynamically (although will need some additional decision code to do this, of course). 13) (dyson) MAP_COPY has been deprecated and the corresponding "copy object" code has been removed. MAP_COPY was undocumented and non- standard. It was furthermore broken in several ways which caused its behavior to degrade to MAP_PRIVATE. Binaries that use MAP_COPY will continue to work correctly, but via the slightly different semantics of MAP_PRIVATE. 14) (dyson) Sharing maps have been removed. It's marginal usefulness in a threads design can be worked around in other ways. Both #12 and #13 were done to simplify the code and improve readability and maintain- ability. (As were most all of these changes) TODO: 1) Rewrite most of the vnode pager to use VOP_GETPAGES/PUTPAGES. Doing this will reduce the vnode pager to a mere fraction of its current size. 2) Rewrite vm_fault and the swap/vnode pagers to use the clustering information provided by the new haspage pager interface. This will substantially reduce the overhead by eliminating a large number of VOP_BMAP() calls. The VOP_BMAP() filesystem interface should be improved to provide both a "behind" and "ahead" indication of contiguousness. 3) Implement the extended features of pager_haspage in swap_pager_haspage(). It currently just says 0 pages ahead/behind. 4) Re-implement the swap device (swstrategy) in a more elegant way, perhaps via a much more general mechanism that could also be used for disk striping of regular filesystems. 5) Do something to improve the architecture of vm_object_collapse(). The fact that it makes calls into the swap pager and knows too much about how the swap pager operates really bothers me. It also doesn't allow for collapsing of non-swap pager objects ("unnamed" objects backed by other pagers).
|
#
06cb7259 |
|
09-Jul-1995 |
David Greenman <dg@FreeBSD.org> |
Moved call to VOP_GETATTR() out of vnode_pager_alloc() and into the places that call vnode_pager_alloc() so that a failure return can be dealt with. This fixes a panic seen on NFS clients when a file being opened is deleted on the server before the open completes.
|
#
9b2e5354 |
|
30-May-1995 |
Rodney W. Grimes <rgrimes@FreeBSD.org> |
Remove trailing whitespace.
|
#
5f55e841 |
|
17-May-1995 |
David Greenman <dg@FreeBSD.org> |
Accessing pages beyond the end of a mapped file results in internal inconsistencies in the VM system that eventually lead to a panic. These changes fix the behavior to conform to the behavior in SunOS, which is to deny faults to pages beyond the EOF (returning SIGBUS). Internally, this is implemented by requiring faults to be within the object size boundaries. These changes exposed another bug, namely that passing in an offset to mmap when trying to map an unnamed anonymous region also results in internal inconsistencies. In this case, the offset is forced to zero. Reviewed by: John Dyson and others
|
#
c3cb3e12 |
|
15-Apr-1995 |
David Greenman <dg@FreeBSD.org> |
Moved some zero-initialized variables into .bss. Made code intended to be called only from DDB #ifdef DDB. Removed some completely unused globals.
|
#
6c534ad8 |
|
25-Mar-1995 |
David Greenman <dg@FreeBSD.org> |
Fix logic bug I just introduced with the flags to msync().
|
#
1e62bc63 |
|
25-Mar-1995 |
David Greenman <dg@FreeBSD.org> |
Disallow both MS_ASYNC and MS_INVALIDATE flags being set at the same time in msync().
|
#
e6c6af11 |
|
25-Mar-1995 |
David Greenman <dg@FreeBSD.org> |
Added "flags" argument to msync, and implemented MS_ASYNC and MS_INVALIDATE. The MS_ASYNC flag doesn't current work, and MS_INVALIDATE will only toss out the pages in the address space (not all pages in the shadow chain).
|
#
8f4e17d4 |
|
21-Mar-1995 |
David Greenman <dg@FreeBSD.org> |
Fixed bug in vm_mmap() where the object that is created in some cases was the wrong size. This is the likely cause of panics reported by Lars Fredriksen and Paul Richards related to a -1 blkno when paging via the swap_pager. Submitted by: John Dyson
|
#
bc9ad247 |
|
21-Mar-1995 |
David Greenman <dg@FreeBSD.org> |
Disallow non page-aligned file offsets in vm_mmap(). We don't support this in either the high or low level parts of the VM system. Just return EINVAL in this case, just like SunOS does.
|
#
fbcfcdf7 |
|
20-Mar-1995 |
David Greenman <dg@FreeBSD.org> |
Fixed bug in the size == 0 case of msync() caused by a bogus return value check..
|
#
edf8a815 |
|
19-Mar-1995 |
David Greenman <dg@FreeBSD.org> |
Removed redundant newlines that were in some panic strings.
|
#
b5e8ce9f |
|
16-Mar-1995 |
Bruce Evans <bde@FreeBSD.org> |
Add and move declarations to fix all of the warnings from `gcc -Wimplicit' (except in netccitt, netiso and netns) and most of the warnings from `gcc -Wnested-externs'. Fix all the bugs found. There were no serious ones.
|
#
c4ed5a07 |
|
12-Mar-1995 |
David Greenman <dg@FreeBSD.org> |
Fixed obsolete comment.
|
#
f2da180f |
|
07-Mar-1995 |
David Greenman <dg@FreeBSD.org> |
Fixed object reference count problem that occurred in the MAP_PRIVATE case after we rewrote vm_mmap(). Added some comments to make it easier to follow the reference counts.
|
#
50ce2102 |
|
22-Feb-1995 |
David Greenman <dg@FreeBSD.org> |
Rewrote MAP_PRIVATE case of vm_mmap() - all of the COW portion of this routine was highly convoluted. Submitted by: John Dyson
|
#
7fb0c17e |
|
20-Feb-1995 |
David Greenman <dg@FreeBSD.org> |
Deprecated remaining use of vm_deallocate. Deprecated vm_allocate_with_ pager(). Almost completely rewrote vm_mmap(); when John gets done with the bottom half, it will be a complete rewrite. Deprecated most use of vm_object_setpager(). Removed side effect of setting object persist in vm_object_enter and moved this into the pager(s). A few other cosmetic changes.
|
#
ca40da74 |
|
15-Feb-1995 |
David Greenman <dg@FreeBSD.org> |
Don't bother calling pmap_create() when creating the temporary map. The whole COW section of vm_mmap() should be rewritten; the current implementation is highly convoluted.
|
#
0d94caff |
|
09-Jan-1995 |
David Greenman <dg@FreeBSD.org> |
These changes embody the support of the fully coherent merged VM buffer cache, much higher filesystem I/O performance, and much better paging performance. It represents the culmination of over 6 months of R&D. The majority of the merged VM/cache work is by John Dyson. The following highlights the most significant changes. Additionally, there are (mostly minor) changes to the various filesystem modules (nfs, msdosfs, etc) to support the new VM/buffer scheme. vfs_bio.c: Significant rewrite of most of vfs_bio to support the merged VM buffer cache scheme. The scheme is almost fully compatible with the old filesystem interface. Significant improvement in the number of opportunities for write clustering. vfs_cluster.c, vfs_subr.c Upgrade and performance enhancements in vfs layer code to support merged VM/buffer cache. Fixup of vfs_cluster to eliminate the bogus pagemove stuff. vm_object.c: Yet more improvements in the collapse code. Elimination of some windows that can cause list corruption. vm_pageout.c: Fixed it, it really works better now. Somehow in 2.0, some "enhancements" broke the code. This code has been reworked from the ground-up. vm_fault.c, vm_page.c, pmap.c, vm_object.c Support for small-block filesystems with merged VM/buffer cache scheme. pmap.c vm_map.c Dynamic kernel VM size, now we dont have to pre-allocate excessive numbers of kernel PTs. vm_glue.c Much simpler and more effective swapping code. No more gratuitous swapping. proc.h Fixed the problem that the p_lock flag was not being cleared on a fork. swap_pager.c, vnode_pager.c Removal of old vfs_bio cruft to support the past pseudo-coherency. Now the code doesn't need it anymore. machdep.c Changes to better support the parameter values for the merged VM/buffer cache scheme. machdep.c, kern_exec.c, vm_glue.c Implemented a seperate submap for temporary exec string space and another one to contain process upages. This eliminates all map fragmentation problems that previously existed. ffs_inode.c, ufs_inode.c, ufs_readwrite.c Changes for merged VM/buffer cache. Add "bypass" support for sneaking in on busy buffers. Submitted by: John Dyson and David Greenman
|
#
05f0fdd2 |
|
08-Oct-1994 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Cosmetics: unused vars, ()'s, #include's &c &c to silence gcc. Reviewed by: davidg
|
#
90324b07 |
|
02-Sep-1994 |
David Greenman <dg@FreeBSD.org> |
Whoops, accidently left out some pieces of the munmapfd patch.
|
#
f720dc2c |
|
06-Aug-1994 |
David Greenman <dg@FreeBSD.org> |
Enabled page table preloading of cached objects. Submitted by: John Dyson
|
#
bbc0ec52 |
|
03-Aug-1994 |
David Greenman <dg@FreeBSD.org> |
Integrated VM system improvements/fixes from FreeBSD-1.1.5.
|
#
3c4dd356 |
|
02-Aug-1994 |
David Greenman <dg@FreeBSD.org> |
Added $Id$
|
#
26f9a767 |
|
25-May-1994 |
Rodney W. Grimes <rgrimes@FreeBSD.org> |
The big 4.4BSD Lite to FreeBSD 2.0.0 (Development) patch. Reviewed by: Rodney W. Grimes Submitted by: John Dyson and David Greenman
|
#
df8bae1d |
|
24-May-1994 |
Rodney W. Grimes <rgrimes@FreeBSD.org> |
BSD 4.4 Lite Kernel Sources
|