History log of /freebsd-10.1-release/sys/vm/vm_map.c
Revision Date Author Comments
# 290362 04-Nov-2015 glebius

o Fix regressions related to SA-15:25 upgrade of NTP. [1]
o Fix kqueue write events never fired for files greater than 2GB. [2]
o Fix applications exiting due to segmentation violation on a correct
memory address. [3]

PR: 204046 [1]
PR: 204203 [1]
Errata Notice: FreeBSD-EN-15:19.kqueue [2]
Errata Notice: FreeBSD-EN-15:20.vm [3]
Approved by: so


# 272461 02-Oct-2014 gjb

Copy stable/10@r272459 to releng/10.1 as part of
the 10.1-RELEASE process.

Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation

# 272202 27-Sep-2014 kib

MFC r272036:
Avoid calling vm_map_pmap_enter() for MADV_WILLNEED on a wired
entry; the pages must already be mapped.

Approved by: re (gjb)


# 270920 01-Sep-2014 kib

Fix a leak of wired pages when unwiring a PROT_NONE-mapped wired
region. Rework the handling of unwire to do it in batch, at both the
pmap and object levels.

All commits below are by alc.

MFC r268327:
Introduce pmap_unwire().

MFC r268591:
Implement pmap_unwire() for powerpc.

MFC r268776:
Implement pmap_unwire() for arm.

MFC r268806:
pmap_unwire(9) man page.

MFC r269134:
When unwiring a region of an address space, do not assume that the
underlying physical pages are mapped by the pmap. This fixes a leak
of the wired pages on the unwiring of the region mapped with no access
allowed.

MFC r269339:
In the implementation of the new function pmap_unwire(), the call to
MOEA64_PVO_TO_PTE() must be performed before any changes are made to the
PVO. Otherwise, MOEA64_PVO_TO_PTE() will panic.

MFC r269365:
Correct a long-standing problem in moea{,64}_pvo_enter() that was revealed
by the combination of r268591 and r269134: When we attempt to add the
wired attribute to an existing mapping, moea{,64}_pvo_enter() do nothing.
(They only set the wired attribute on newly created mappings.)

MFC r269433:
Handle wiring failures in vm_map_wire() with the new functions
pmap_unwire() and vm_object_unwire().
Retire vm_fault_{un,}wire(), since they are no longer used.

MFC r269438:
Rewrite a loop in vm_map_wire() so that gcc doesn't think that the variable
"rv" is uninitialized.

MFC r269485:
Retire pmap_change_wiring().

Reviewed by: alc


# 269072 24-Jul-2014 kib

MFC r267213 (by alc):
Add a page size field to struct vm_page.

Approved by: alc


# 267956 27-Jun-2014 kib

MFC r267664:
Assert that the new entry is inserted into the right location in the
map entries list, and that it does not overlap with the previous and
next entries.


# 267901 26-Jun-2014 kib

MFC r267630:
Add MAP_EXCL flag for mmap(2).
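
A minimal userland sketch of the new flag (illustrative, not from the
commit): combined with MAP_FIXED, MAP_EXCL makes the request fail if
the range is already mapped, instead of silently replacing the
existing mapping.

    #include <sys/mman.h>
    #include <stdio.h>
    #include <unistd.h>

    int
    main(void)
    {
        size_t len = (size_t)getpagesize();
        void *p, *q;

        p = mmap(NULL, len, PROT_READ | PROT_WRITE,
            MAP_ANON | MAP_PRIVATE, -1, 0);
        /* Without MAP_EXCL this would quietly replace the mapping at p. */
        q = mmap(p, len, PROT_READ | PROT_WRITE,
            MAP_FIXED | MAP_EXCL | MAP_ANON | MAP_PRIVATE, -1, 0);
        printf("%s\n", q == MAP_FAILED ? "refused, as expected" : "replaced");
        return (0);
    }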


# 267899 26-Jun-2014 kib

MFC r267766:
Use correct names for the flags.


# 267772 23-Jun-2014 kib

MFC r267254:
Make mmap(MAP_STACK) search for the available address space.

MFC r267497 (by alc):
Use local variable instead of sgrowsiz.
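
A hedged userland sketch of what r267254 enables: a NULL hint with
MAP_STACK now lets the kernel search for free address space instead of
requiring the caller to pick an address.

    #include <sys/mman.h>

    void *
    alloc_stack(size_t size)
    {
        /* MAP_STACK implies an anonymous grow-down mapping; the fd must
         * be -1 and the protection must include read and write. */
        void *sp = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_STACK,
            -1, 0);

        return (sp == MAP_FAILED ? NULL : sp);
    }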


# 267059 04-Jun-2014 kib

MFC r266780:
Remove the assert which can be triggered by userspace.


# 266589 23-May-2014 alc

MFC r265886, r265948
With the new-and-improved vm_fault_copy_entry() (r265843), we can always
avoid soft page faults when adding write access to user wired entries in
vm_map_protect(). Previously, we only avoided the soft page fault when
the underlying pages were copy-on-write. In other words, we avoided the
page faults that might sleep on page allocation, but not the trivial
page faults to update the physical map.

On a fork allow read-only wired pages to be copy-on-write shared between
the parent and child processes. Previously, we copied these pages even
though they are read only. However, the reason for copying them is
historical and no longer exists. In recent times, vm_map_protect() has
developed the ability to copy pages when write access is added to wired
copy-on-write pages. So, in this case, copy-on-write sharing of wired
pages is not to be feared. It is not going to lead to copy-on-write
faults on wired memory.


# 266582 23-May-2014 kib

MFC r266464:
In execve(2), postpone freeing the old vmspace until the threads are resumed
and exited.


# 266315 17-May-2014 alc

MFC r265850
About 9% of the pmap_protect() calls being performed by
vm_map_copy_entry() are unnecessary.
Eliminate the unnecessary calls.


# 266302 17-May-2014 kib

MFC r265825:
When printing the map with the ddb 'show procvm' command, do not dump
page queues for the backing objects.


# 266299 17-May-2014 kib

MFC r265824:
Print the entry address in addition to the object.


# 263684 24-Mar-2014 kib

MFC r263471:
Initialize vm_map_entry member wiring_thread on the map entry creation.


# 260081 30-Dec-2013 kib

MFC r259951:
Do not coalesce stack entries. Pass the MAP_STACK_GROWS_DOWN and
MAP_STACK_GROWS_UP flags to vm_map_insert() from vm_map_stack().


# 259299 13-Dec-2013 kib

MFC r258367:
Check for zero-length requests and treat them as always successful
without performing any action on the address space.


# 259297 13-Dec-2013 kib

MFC r258366:
Add assertions to cover all places in the wiring and unwiring code
where MAP_ENTRY_IN_TRANSITION is set or cleared.


# 256281 10-Oct-2013 gjb

Copy head (r256279) to stable/10 as part of the 10.0-RELEASE cycle.

Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation


# 255793 22-Sep-2013 alc

Both the vm_map and vmspace zones are defined as "no free". So, there is no
point in defining a fini function for these zones.

Reviewed by: kib
Approved by: re (glebius)
Sponsored by: EMC / Isilon Storage Division


# 255732 20-Sep-2013 neel

Merge the following changes from projects/bhyve_npt_pmap:
- add fields to 'struct pmap' that are required to manage nested page tables.
- add a parameter to 'vmspace_alloc()' that can be used to override the
default pmap initialization routine 'pmap_pinit()'.

These changes are pushed ahead of the remaining changes in 'bhyve_npt_pmap'
in anticipation of the upcoming KBI freeze for 10.0.

Reviewed by: kib@, alc@
Approved by: re (glebius)


# 255426 09-Sep-2013 jhb

Add a mmap flag (MAP_32BIT) on 64-bit platforms to request that a mapping use
an address in the first 2GB of the process's address space. This flag should
have the same semantics as the same flag on Linux.

To facilitate this, add a new parameter to vm_map_find() that specifies an
optional maximum virtual address. While here, fix several callers of
vm_map_find() to use a VMFS_* constant for the findspace argument instead of
TRUE and FALSE.

Reviewed by: alc
Approved by: re (kib)
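
A minimal usage sketch of the flag (illustrative):

    #include <sys/mman.h>

    void *
    map_low_2g(size_t len)
    {
        /* Ask for an address below 2GB, e.g. for code or data that
         * must fit in a 32-bit pointer. */
        void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
            MAP_ANON | MAP_PRIVATE | MAP_32BIT, -1, 0);

        return (p == MAP_FAILED ? NULL : p);
    }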


# 255028 29-Aug-2013 alc

Significantly reduce the cost, i.e., run time, of calls to madvise(...,
MADV_DONTNEED) and madvise(..., MADV_FREE). Specifically, introduce a new
pmap function, pmap_advise(), that operates on a range of virtual addresses
within the specified pmap, allowing for a more efficient implementation of
MADV_DONTNEED and MADV_FREE. Previously, the implementation of
MADV_DONTNEED and MADV_FREE relied on per-page pmap operations, such as
pmap_clear_reference(). Intuitively, the problem with this implementation
is that the pmap-level locks are acquired and released and the page table
traversed repeatedly, once for each resident page in the range
that was specified to madvise(2). A more subtle flaw with the previous
implementation is that pmap_clear_reference() would clear the reference bit
on all mappings to the specified page, not just the mapping in the range
specified to madvise(2).

Since our malloc(3) makes heavy use of madvise(2), this change can have a
measurable impact. For example, the system time for completing a parallel
"buildworld" on a 6-core amd64 machine was reduced by about 1.5% to 2.0%.

Note: This change only contains pmap_advise() implementations for a subset
of our supported architectures. I will commit implementations for the
remaining architectures after further testing. For now, a stub function is
sufficient because of the advisory nature of pmap_advise().

Discussed with: jeff, jhb, kib
Tested by: pho (i386), marcel (ia64)
Sponsored by: EMC / Isilon Storage Division
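
The userland call whose cost this reduces looks as follows (sketch);
one ranged madvise(2) now costs a single pmap_advise() pass instead of
a per-resident-page pmap operation.

    #include <sys/mman.h>

    /* Tell the VM system the buffer contents are no longer needed; the
     * pages may be discarded without being written back. */
    int
    discard_scratch(void *buf, size_t len)
    {
        return (madvise(buf, len, MADV_FREE));
    }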


# 254667 22-Aug-2013 kib

Revert r254501. Instead, reuse the type stability of the struct pmap
which is the part of struct vmspace, allocated from UMA_ZONE_NOFREE
zone. Initialize the pmap lock in the vmspace zone init function, and
remove pmap lock initialization and destruction from pmap_pinit() and
pmap_release().

Suggested and reviewed by: alc (previous version)
Tested by: pho
Sponsored by: The FreeBSD Foundation


# 254430 16-Aug-2013 jhb

Add new mmap(2) flags to permit applications to request specific virtual
address alignment of mappings.
- MAP_ALIGNED(n) requests a mapping aligned on a boundary of (1 << n).
Requests for n >= number of bits in a pointer or less than the size of
a page fail with EINVAL. This matches the API provided by NetBSD.
- MAP_ALIGNED_SUPER is a special case of MAP_ALIGNED. It can be used
to optimize the chances of using large pages. By default it will align
the mapping on a large page boundary (the system is free to choose any
large page size to align to that seems best for the mapping request).
However, if the object being mapped is already using large pages, then
it will align the virtual mapping to match the existing large pages in
the object instead.
- Internally, VMFS_ALIGNED_SPACE is now renamed to VMFS_SUPER_SPACE, and
VMFS_ALIGNED_SPACE(n) is repurposed for specifying a specific alignment.
MAP_ALIGNED(n) maps to using VMFS_ALIGNED_SPACE(n), while
MAP_ALIGNED_SUPER maps to VMFS_SUPER_SPACE.
- mmap() of a device object now uses VMFS_OPTIMAL_SPACE rather than
explicitly using VMFS_SUPER_SPACE. All device objects are forced to
use a specific color on creation, so VMFS_OPTIMAL_SPACE is effectively
equivalent.

Reviewed by: alc
MFC after: 1 month
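
A minimal sketch of the new alignment request, assuming a 2MB
alignment (1 << 21):

    #include <sys/mman.h>

    void *
    map_2m_aligned(size_t len)
    {
        /* MAP_ALIGNED(21) requests a start address aligned to 1 << 21;
         * MAP_ALIGNED_SUPER would instead let the kernel pick a large
         * page friendly alignment. */
        void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
            MAP_ANON | MAP_PRIVATE | MAP_ALIGNED(21), -1, 0);

        return (p == MAP_FAILED ? NULL : p);
    }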


# 254025 07-Aug-2013 jeff

Replace kernel virtual address space allocation with vmem. This provides
transparent layering and better fragmentation.

- Normalize functions that allocate memory to use kmem_*
- Those that allocate address space are named kva_*
- Those that operate on maps are named kmap_*
- Implement recursive allocation handling for kmem_arena in vmem.

Reviewed by: alc
Tested by: pho
Sponsored by: EMC / Isilon Storage Division


# 253636 25-Jul-2013 kientzle

Clear the entire map structure including locks so that the
locks don't accidentally appear to have been already
initialized.

In particular, this fixes a consistent kernel crash on
armv6 with:
panic: lock "vm map (user)" 0xc09cc050 already initialized
that appeared with r251709.

PR: arm/180820


# 253471 19-Jul-2013 jhb

Be more aggressive in using superpages in all mappings of objects:
- Add a new address space allocation method (VMFS_OPTIMAL_SPACE) for
vm_map_find() that will try to alter the alignment of a mapping to match
any existing superpage mappings of the object being mapped. If no
suitable address range is found with the necessary alignment,
vm_map_find() will fall back to using the simple first-fit strategy
(VMFS_ANY_SPACE).
- Change mmap() without MAP_FIXED, shmat(), and the GEM mapping ioctl to
use VMFS_OPTIMAL_SPACE instead of VMFS_ANY_SPACE.

Reviewed by: alc (earlier version)
MFC after: 2 weeks


# 253190 11-Jul-2013 kib

mlockall() with VM_MAP_WIRE_HOLESOK does not interact properly with
the parallel creation of map entries, e.g. by mmap() or stack growing.
It also breaks when another entry is wired in parallel.

vm_map_wire() iterates over the map entries in the region and assumes
that any entry it finds was marked as in transition earlier, and that
any entry marked as in transition was marked by the current invocation
of vm_map_wire(). This is not true for new entries in the holes.

Add the thread owner of the MAP_ENTRY_IN_TRANSITION flag to struct
vm_map_entry. In vm_map_wire() and vm_map_unwire(), only process the
entries whose transition owner is the current thread.

Reported and tested by: pho
Reviewed by: alc
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
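
A condensed sketch of the ownership test this introduces
(illustrative, not the committed diff):

    /* In the vm_map_wire()/vm_map_unwire() passes over the region: */
    if ((entry->eflags & MAP_ENTRY_IN_TRANSITION) != 0 &&
        entry->wiring_thread != curthread) {
        /* The entry was marked (or created in a hole) by another
         * thread while the map was unlocked; it is not part of this
         * invocation and must be left alone. */
    }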


# 251901 18-Jun-2013 des

Fix a bug that allowed a tracing process (e.g. gdb) to write
to a memory-mapped file in the traced process's address space
even if neither the traced process nor the tracing process had
write access to that file.

Security: CVE-2013-2171
Security: FreeBSD-SA-13:06.mmap
Approved by: so


# 250884 21-May-2013 attilio

o Relax locking assertions for vm_page_find_least()
o Relax locking assertions for pmap_enter_object() and add them also
to architectures that currently don't have any
o Introduce VM_OBJECT_LOCK_DOWNGRADE() which is basically a downgrade
operation on the per-object rwlock
o Use all the mechanisms above to make vm_map_pmap_enter() work
most of the time with only read locks.

Sponsored by: EMC / Isilon storage division
Reviewed by: alc


# 249303 09-Apr-2013 kib

Fix the assertions for the state of the object under the map entry
with the MAP_ENTRY_VN_WRITECNT flag:
- Move the assertion that verifies the state of the v_writecount and
vnp.writecount, under the block where the object is locked.
- Check that the object type is OBJT_VNODE before asserting.

Reported by: avg
Reviewed by: alc
MFC after: 1 week


# 248084 09-Mar-2013 attilio

Switch the vm_object mutex to be a rwlock. This will enable further
optimizations in the future, where the vm_object lock will be held in
read mode most of the time that the page-cache-resident pool of pages
is accessed for reading purposes.

The change is mostly mechanical, but a few notes are reported:
* The KPI changes as follow:
- VM_OBJECT_LOCK() -> VM_OBJECT_WLOCK()
- VM_OBJECT_TRYLOCK() -> VM_OBJECT_TRYWLOCK()
- VM_OBJECT_UNLOCK() -> VM_OBJECT_WUNLOCK()
- VM_OBJECT_LOCK_ASSERT(MA_OWNED) -> VM_OBJECT_ASSERT_WLOCKED()
(in order to avoid visibility of implementation details)
- The read-mode operations are added:
VM_OBJECT_RLOCK(), VM_OBJECT_TRYRLOCK(), VM_OBJECT_RUNLOCK(),
VM_OBJECT_ASSERT_RLOCKED(), VM_OBJECT_ASSERT_LOCKED()
* Avoiding namespace pollution in vm/vm_pager.h (which used to force
consumers to include sys/mutex.h directly to satisfy its inline
functions using VM_OBJECT_LOCK()) means that all vm/vm_pager.h
consumers must now also include sys/rwlock.h.
* zfs requires a rather convoluted fix to pull FreeBSD rwlocks into
the compat layer, because the name clash between the FreeBSD and
Solaris versions must be avoided.
For this purpose zfs redefines the vm_object locking functions
directly, isolating the FreeBSD components in specific compat stubs.

The KPI is heavily broken by this commit. Third-party ports must
be updated accordingly (off-hand, VirtualBox, for example).

Sponsored by: EMC / Isilon storage division
Reviewed by: jeff
Reviewed by: pjd (ZFS specific review)
Discussed with: alc
Tested by: pho


# 247360 26-Feb-2013 attilio

Merge from vmc-playground branch:
Replace the sub-optimal uma_zone_set_obj() primitive with the more
modern uma_zone_reserve_kva(). The new primitive reserves the
necessary KVA space to satisfy the zone allocations beforehand and
allocates pages with ALLOC_NOOBJ. More specifically:
- uma_zone_reserve_kva() does not need an object to serve the backend
allocator.
- uma_zone_reserve_kva() can satisfy M_WAITOK requests, in order to
serve zones which need to do uma_prealloc() too.
- When possible, uma_zone_reserve_kva() directly uses the direct map
via uma_small_alloc() rather than relying on the KVA / offset
combination.

The removal of the object attribute allows 2 further changes:
1) _vm_object_allocate() becomes static within vm_object.c
2) VM_OBJECT_LOCK_INIT() is removed. This function is replaced by
direct calls to mtx_init() as there is no need to export it anymore
and the calls aren't either homogeneous anymore: there are now small
differences between arguments passed to mtx_init().

Sponsored by: EMC / Isilon storage division
Reviewed by: alc (who also offered almost all the comments)
Tested by: pho, jhb, davide


# 245421 14-Jan-2013 zont

- Get rid of unused function vmspace_wired_count().

Reviewed by: alc
Approved by: kib (mentor)
MFC after: 1 week


# 245255 10-Jan-2013 zont

- Reduce kernel size by removing unnecessary pointer indirections.

The GENERIC kernel size is reduced by 16 bytes and the RACCT kernel by 336 bytes.

Suggested by: alc
Reviewed by: alc
Approved by: kib (mentor)
MFC after: 1 week


# 244384 18-Dec-2012 zont

- Fix locked memory accounting for maps with MAP_WIREFUTURE flag.
- Add sysctl vm.old_mlock which may turn such accounting off.

Reviewed by: avg, trasz
Approved by: kib (mentor)
MFC after: 1 week


# 244043 08-Dec-2012 alc

In the past four years, we've added two new vm object types. Each time,
similar changes had to be made in various places throughout the machine-
independent virtual memory layer to support the new vm object type.
However, in most of these places, it's actually not the type of the vm
object that matters to us but instead certain attributes of its pages.
For example, OBJT_DEVICE, OBJT_MGTDEVICE, and OBJT_SG objects contain
fictitious pages. In other words, in most of these places, we were
testing the vm object's type to determine if it contained fictitious (or
unmanaged) pages.

To both simplify the code in these places and make the addition of future
vm object types easier, this change introduces two new vm object flags
that describe attributes of the vm object's pages, specifically, whether
they are fictitious or unmanaged.

Reviewed and tested by: kib


# 243529 25-Nov-2012 alc

Make a few small changes to vm_map_pmap_enter():

Add detail to the comment describing this function. In particular,
describe what MAP_PREFAULT_PARTIAL does.

Eliminate the abrupt change in behavior when the specified address range
grows from MAX_INIT_PT pages to MAX_INIT_PT plus one. Instead of
doing nothing, i.e., preloading no mappings whatsoever, map any resident
pages that fall within the start of the specified address range, i.e.,
[addr, addr + ulmin(size, ptoa(MAX_INIT_PT))).

Long ago, the vm object's list of resident pages was not ordered, so
this function had to choose between probing the global hash table of
all resident pages and iterating over the vm object's unordered list of
resident pages. Now, the list is ordered, so there is no reason for
MAP_PREFAULT_PARTIAL to be concerned with the vm object's count of
resident pages.

MFC after: 14 days


# 242903 11-Nov-2012 attilio

Fix DDB command "show map XXX":
- Check that an argument is always available; otherwise the current
map printed before recursing is garbage.
- Spit out a message if an argument is not provided.
- Remove the unread nlines variable.
- Use an explicit recursive function, disassociated from the
DB_SHOW_COMMAND() body, in order to make the prototype and recursion
of the above mentioned function clear. The code is now much less
obscure.

Submitted by: gianni


# 240069 03-Sep-2012 zont

- After r240026 sgrowsiz should be used in a safer manner.

Approved by: kib (mentor)
MFC after: 1 week


# 237623 27-Jun-2012 alc

Add new pmap layer locks to the predefined lock order. Change the names
of a few existing VM locks to follow a consistent naming scheme.


# 237334 20-Jun-2012 jhb

Move the per-thread deferred user map entries list into a private list
in vm_map_process_deferred() which is then iterated to release map entries.
This avoids having a nested vm map unlock operation called from the loop
body attempt to recurse into vm_map_process_deferred(). This can happen if
the vm_map_remove() triggers the OOM killer.

Reviewed by: alc, kib
MFC after: 1 week
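
A sketch of the described structure, assuming the per-thread list
hangs off td_map_def_user (close to, but not exactly, the committed
code):

    static void
    vm_map_process_deferred(void)
    {
        struct thread *td = curthread;
        vm_map_entry_t entry, next;

        /* Detach the whole list first, so a vm map unlock performed
         * while deallocating entries cannot re-enter this loop. */
        entry = td->td_map_def_user;
        td->td_map_def_user = NULL;
        while (entry != NULL) {
            next = entry->next;
            /* ... drop the backing object and free the entry ... */
            entry = next;
        }
    }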


# 236848 10-Jun-2012 kib

Use the previous stack entry protection and max protection to correctly
propagate the stack execution permissions when stack is grown down.

First, curproc->p_sysent->sv_stackprot specifies maximum allowed stack
protection for current ABI, so the new stack entry was typically marked
executable always. Second, for non-main stack MAP_STACK mapping,
the PROT_ flags should be used which were specified at the mmap(2) call
time, and not sv_stackprot.

MFC after: 1 week


# 235230 10-May-2012 alc

Give vm_fault()'s sequential access optimization a makeover.

There are two aspects to the sequential access optimization: (1) read ahead
of pages that are expected to be accessed in the near future and (2) unmap
and cache behind of pages that are not expected to be accessed again. This
revision changes both aspects.

The read ahead optimization is now more effective. It starts with the same
initial read window as before, but arithmetically grows the window on
sequential page faults. This can yield increased read bandwidth. For
example, on one of my machines, a program using mmap() to read a file that
is several times larger than the machine's physical memory takes about 17%
less time to complete.

The unmap and cache behind optimization is now more selectively applied.
The read ahead window must grow to its maximum size before unmap and cache
behind is performed. This significantly reduces the number of times that
pages are unmapped and cached only to be reactivated a short time later.

The unmap and cache behind optimization now clears each page's referenced
flag. Previously, in the case of dirty pages, if the containing file was
still mapped at the time that the page daemon examined the dirty pages,
they would be reactivated.

From a stylistic standpoint, this revision also cleanly separates the
implementation of the read ahead and unmap/cache behind optimizations.

Glanced at: kib
MFC after: 2 weeks


# 233191 19-Mar-2012 jhb

Fix madvise(MADV_WILLNEED) to properly handle individual mappings larger
than 4GB. Specifically, the inlined version of 'ptoa' of the 'int'
count of pages overflowed on 64-bit platforms. While here, change
vm_object_madvise() to accept two vm_pindex_t parameters (start and end)
rather than a (start, count) tuple to match other VM APIs as suggested
by alc@.
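
The overflow class in miniature (illustrative values): shifting an
'int' page count performs the arithmetic in 32 bits, so it wraps once
the range reaches 4GB on LP64.

    #include <stdio.h>

    int
    main(void)
    {
        int npages = 1 << 20;   /* 2^20 pages of 4KB = 4GB */
        unsigned long bad, good;

        bad = npages << 12;     /* shifted as int: overflows (wraps) */
        good = (unsigned long)npages << 12; /* widen before shifting */
        printf("%lu vs %lu\n", bad, good);
        return (0);
    }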


# 233100 17-Mar-2012 kib

In vm_object_page_clean(), do not clean OBJ_MIGHTBEDIRTY object flag
if the filesystem performed a short write and we are skipping the page
due to this.

Propogate write error from the pager back to the callers of
vm_pageout_flush(). Report the failure to write a page from the
requested range as the FALSE return value from vm_object_page_clean(),
and propagate it back to msync(2) to return EIO to usermode.

While there, convert the clearobjflags variable in the
vm_object_page_clean() and arguments of the helper functions to
boolean.

PR: kern/165927
Reviewed by: alc
MFC after: 2 weeks


# 232160 25-Feb-2012 alc

Simplify vmspace_fork()'s control flow by copying immutable data before
the vm map locks are acquired. Also, eliminate redundant initialization
of the new vm map's timestamp.

Reviewed by: kib
MFC after: 3 weeks


# 232071 23-Feb-2012 kib

Account the writeable shared mappings backed by file in the vnode
v_writecount. Keep the amount of the virtual address space used by
the mappings in the new vm_object un_pager.vnp.writemappings
counter. The vnode v_writecount is incremented when writemappings
becomes non-zero, and decremented when writemappings returns to
zero.

Writeable shared vnode-backed mappings are accounted for in vm_mmap(),
and vm_map_insert() is instructed to set MAP_ENTRY_VN_WRITECNT flag on
the created map entry. During deferred map entry deallocation,
vm_map_process_deferred() checks for MAP_ENTRY_VN_WRITECNT and
decrements writemappings for the vm object.

Now, the writeable mount cannot be demoted to read-only while
writeable shared mappings of the vnodes from the mount point
exist. Also, execve(2) fails for such files with ETXTBUSY, as it
should be.

Noted by: tegge
Reviewed by: tegge (long time ago, early version), alc
Tested by: pho
MFC after: 3 weeks


# 231526 11-Feb-2012 kib

Close a race due to dropping of the map lock between creating map entry
for a shared mapping and marking the entry for inheritance.
Another thread might execute vmspace_fork() in between (e.g. by fork(2)),
resulting in the mapping becoming private.

Noted and reviewed by: alc
MFC after: 1 week


# 227788 21-Nov-2011 attilio

Introduce the same mutex-wise fix in r227758 for sx locks.

The functions that offer file and line specifications are:
- sx_assert_
- sx_downgrade_
- sx_slock_
- sx_slock_sig_
- sx_sunlock_
- sx_try_slock_
- sx_try_xlock_
- sx_try_upgrade_
- sx_unlock_
- sx_xlock_
- sx_xlock_sig_
- sx_xunlock_

Now vm_map locking is fully converted and no longer needs to know
specifics about the locking procedures.
Reviewed by: kib
MFC after: 1 month


# 227758 20-Nov-2011 attilio

Introduce macro stubs in the mutex implementation that will always be
defined and will allow consumers who want to provide options, file,
and line to locking requests not to worry about options redefining
the interfaces.
This is typically useful when there is the need to build another
locking interface on top of the mutex one.

The introduced functions that consumers can use are:
- mtx_lock_flags_
- mtx_unlock_flags_
- mtx_lock_spin_flags_
- mtx_unlock_spin_flags_
- mtx_assert_
- thread_lock_flags_

Spare notes:
- We can likely get rid of all the 'INVARIANTS' specifications in the
ppbus code by using the same macros as done in this patch (but this is
left to the ppbus maintainer)
- All the other locking interfaces may require a similar cleanup; the
most notable case is sx, which will allow a further cleanup of the
vm_map locking facilities
- The patch should be fully compatible with older branches, thus an
MFC is planned (in fact it uses only mechanisms that are already
present).

Comments review by: eadler, Ben Kaduk
Discussed with: kib, jhb
MFC after: 1 month


# 223825 06-Jul-2011 trasz

All the racct_*() calls need to happen with the proc locked. Fixing this
won't happen before 9.0. This commit adds "#ifdef RACCT" around all the
"PROC_LOCK(p); racct_whatever(p, ...); PROC_UNLOCK(p)" instances, in order
to avoid useless locking/unlocking in kernels built without "options RACCT".


# 223677 29-Jun-2011 alc

Add a new option, OBJPR_NOTMAPPED, to vm_object_page_remove(). Passing this
option to vm_object_page_remove() asserts that the specified range of pages
is not mapped, or more precisely that none of these pages have any managed
mappings. Thus, vm_object_page_remove() need not call pmap_remove_all() on
the pages.

This change not only saves time by eliminating pointless calls to
pmap_remove_all(), but it also eliminates an inconsistency in the use of
pmap_remove_all() versus related functions, like pmap_remove_write(). It
eliminates harmless but pointless calls to pmap_remove_all() that were being
performed on PG_UNMANAGED pages.

Update all of the existing assertions on pmap_remove_all() to reflect this
change.

Reviewed by: kib


# 220373 05-Apr-2011 trasz

Add accounting for most of the memory-related resources.

Sponsored by: The FreeBSD Foundation
Reviewed by: kib (earlier version)


# 219819 21-Mar-2011 jeff

- Merge changes to the base system to support OFED. These include
a wider arg2 for sysctl, updates to vlan code, IFT_INFINIBAND,
and other miscellaneous small features.


# 218304 04-Feb-2011 alc

Since the last parameter to vm_object_shadow() is a vm_size_t and not a
vm_pindex_t, it makes no sense for its callers to perform atop(). Let
vm_object_shadow() do that instead.


# 218070 29-Jan-2011 alc

Reenable the call to vm_map_simplify_entry() from vm_map_insert() for non-
MAP_STACK_* entries. (See r71983 and r74235.)

In some cases, performing this call to vm_map_simplify_entry() halves the
number of vm map entries used by the Sun JDK.


# 216335 09-Dec-2010 mlaier

Fix a long standing (from the original 4.4BSD lite sources) race between
vmspace_fork and vm_map_wire that would lead to "vm_fault_copy_wired: page
missing" panics. While faulting in pages for a map entry that is being
wired down, mark the containing map as busy. In vmspace_fork wait until the
map is unbusy, before we try to copy the entries.

Reviewed by: kib
MFC after: 5 days
Sponsored by: Isilon Systems, Inc.


# 216128 02-Dec-2010 trasz

Replace pointer to "struct uidinfo" with pointer to "struct ucred"
in "struct vm_object". This is required to make it possible to account
for per-jail swap usage.

Reviewed by: kib@
Tested by: pho@
Sponsored by: FreeBSD Foundation


# 215307 14-Nov-2010 kib

Implement a (soft) stack guard page for auto-growing stack mappings.
The unmapped page separates the tip of the stack and a possible
adjacent segment, making some uses of stack overflow harder. The
stack growing code refuses to expand the segment to the last page of
the reserved region when the sysctl security.bsd.stack_guard_page is
set to 1. The default value for the sysctl and the accompanying
tunable is 0.

Please note that mmap(MAP_FIXED) can still place a mapping right up
to the stack, making a continuous region.

Reviewed by: alc
MFC after: 1 week
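
Enabling the guard from userland is a one-line sysctl; a C equivalent
(sketch):

    #include <sys/types.h>
    #include <sys/sysctl.h>

    int
    enable_stack_guard_page(void)
    {
        int one = 1;

        /* Equivalent to: sysctl security.bsd.stack_guard_page=1 */
        return (sysctlbyname("security.bsd.stack_guard_page", NULL, NULL,
            &one, sizeof(one)));
    }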


# 214953 07-Nov-2010 alc

In case the stack size reaches its limit and its growth must be restricted,
ensure that grow_amount is a multiple of the page size. Otherwise, the
kernel may crash in swap_reserve_by_uid() on HEAD and FreeBSD 8.x, and
produce a core file with a missing stack on FreeBSD 7.x.

Diagnosed and reported by: jilles
Reviewed by: kib
MFC after: 1 week
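
The invariant being restored, in effect (illustrative, with
hypothetical variable names, not the committed diff):

    /* 'limit' is whatever headroom remains below the stack size limit. */
    if (grow_amount > limit)
        grow_amount = trunc_page(limit);    /* keep it a page multiple */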


# 214144 21-Oct-2010 jhb

- Make 'vm_refcnt' volatile so that compilers won't be tempted to treat
its value as a loop invariant. Currently this is a no-op because
'atomic_cmpset_int()' clobbers all memory on current architectures.
- Use atomic_fetchadd_int() instead of an atomic_cmpset_int() loop to drop
a reference in vmspace_free().

Reviewed by: alc
MFC after: 1 month


# 213408 04-Oct-2010 alc

If vm_map_find() is asked to allocate a superpage-aligned region of virtual
addresses that is greater than a superpage in size but not a multiple of
the superpage size, then vm_map_find() is not always expanding the kernel
pmap to support the last few small pages being allocated. These failures
are not commonplace, so this was first noticed by someone porting FreeBSD
to a new architecture. Previously, we grew the kernel page table in
vm_map_findspace() when we found the first available virtual address.
This works most of the time because we always grow the kernel pmap or page
table by an amount that is a multiple of the superpage size. Now, instead,
we defer the call to pmap_growkernel() until we are committed to a range
of virtual addresses in vm_map_insert(). In general, there is another
reason to prefer calling pmap_growkernel() in vm_map_insert(). It makes
it possible for someone to do the equivalent of an mmap(MAP_FIXED) on the
kernel map.

Reported by: Svatopluk Kraus
Reviewed by: kib@
MFC after: 3 weeks


# 212868 19-Sep-2010 alc

Make refinements to r212824. In particular, don't make
vm_map_unlock_nodefer() part of the synchronization interface for maps.

Add comments to vm_map_unlock_and_wait() and vm_map_wakeup() describing
how they should be used. In particular, describe the deferred deallocations
issue with vm_map_unlock_and_wait().

Redo the implementation of vm_map_unlock_and_wait() so that it passes
along the caller's file and line information, just like the other map
locking primitives.

Reviewed by: kib
X-MFC after: r212824


# 212824 18-Sep-2010 kib

Adopt the deferring of object deallocation for the deleted map entries
on map unlock to the lock downgrade and later read unlock operation.

System map entries cannot be backed by OBJT_VNODE objects, no need to
defer deallocation for them. Map entries from user maps do not require
the owner map for deallocation, and can be accumulated in the
thread-local list for freeing when a user map is unlocked.

Move the collection of entries for deferred reclamation into
vm_map_delete(). Create helper vm_map_process_deferred(), that is
called from locations where processing is feasible. Do not process
deferred entries in vm_map_unlock_and_wait() since map_sleep_mtx is
held.

Reviewed by: alc, rstone (previous versions)
Tested by: pho
MFC after: 2 weeks


# 209685 04-Jul-2010 kib

Introduce a helper function vm_page_find_least(). Use it in several
places which previously inlined the function.

Reviewed by: alc
Tested by: pho
MFC after: 1 week


# 208574 26-May-2010 alc

Push down page queues lock acquisition in pmap_enter_object() and
pmap_is_referenced(). Eliminate the corresponding page queues lock
acquisitions from vm_map_pmap_enter() and mincore(), respectively. In
mincore(), this allows some additional cases to complete without ever
acquiring the page queues lock.

Assert that the page is managed in pmap_is_referenced().

On powerpc/aim, push down the page queues lock acquisition from
moea*_is_modified() and moea*_is_referenced() into moea*_query_bit().
Again, this will allow some additional cases to complete without ever
acquiring the page queues lock.

Reorder a few statements in vm_page_dontneed() so that a race can't lead
to an old reference persisting. This scenario is described in detail by a
comment.

Correct a spelling error in vm_page_dontneed().

Assert that the object is locked in vm_page_clear_dirty(), and restrict the
page queues lock assertion to just those cases in which the page is
currently writeable.

Add object locking to vnode_pager_generic_putpages(). This was the one
and only place where vm_page_clear_dirty() was being called without the
object being locked.

Eliminate an unnecessary vm_page_lock() around vnode_pager_setsize()'s call
to vm_page_clear_dirty().

Change vnode_pager_generic_putpages() to the modern-style of function
definition. Also, change the name of one of the parameters to follow
virtual memory system naming conventions.

Reviewed by: kib


# 207487 01-May-2010 alc

Correct an error of omission in r206819. If VMFS_TLB_ALIGNED_SPACE is
specified to vm_map_find(), then retry the vm_map_findspace() if
vm_map_insert() fails because the aligned space is already partly used.

Reported by: Neel Natu


# 206819 18-Apr-2010 jmallett

o) Add a VM find-space option, VMFS_TLB_ALIGNED_SPACE, which searches the
address space for an address as aligned by the new pmap_align_tlb()
function, which is for constraints imposed by the TLB. [1]
o) Add a kmem_alloc_nofault_space() function, which acts like
kmem_alloc_nofault() but allows the caller to specify which find-space
option to use. [1]
o) Use kmem_alloc_nofault_space() with VMFS_TLB_ALIGNED_SPACE to allocate the
kernel stack address on MIPS. [1]
o) Make pmap_align_tlb() on MIPS align addresses so that they do not start on
an odd boundary within the TLB, so that they are suitable for insertion as
wired entries and do not have to share a TLB entry with another mapping,
assuming they are appropriately-sized.
o) Eliminate md_realstack now that the kstack will be appropriately-aligned on
MIPS.
o) Increase the number of guard pages to 2 so that we retain the proper
alignment of the kstack address.

Reviewed by: [1] alc
X-MFC-after: Making sure alc has not come up with a better interface.


# 206142 03-Apr-2010 alc

Make _vm_map_init() the one place where the vm map's pmap field is
initialized.

Reviewed by: kib


# 206140 03-Apr-2010 alc

Re-enable the call to pmap_release() by vmspace_dofree(). The accounting
problem that is described in the comment has been addressed.

Submitted by: kib
Tested by: pho (a few months ago)
MFC after: 6 weeks


# 203175 29-Jan-2010 kib

The MAP_ENTRY_NEEDS_COPY flag belongs to protoeflags; the cow
variable uses a different namespace.

Reported by: Jonathan Anderson <jonathan.anderson cl cam ac uk>
MFC after: 3 days


# 199819 26-Nov-2009 alc

Replace VM_PROT_OVERRIDE_WRITE by VM_PROT_COPY. VM_PROT_OVERRIDE_WRITE has
represented a write access that is allowed to override write protection.
Until now, VM_PROT_OVERRIDE_WRITE has been used to write breakpoints into
text pages. Text pages are not just write protected but they are also
copy-on-write. VM_PROT_OVERRIDE_WRITE overrides the write protection on the
text page and triggers the replication of the page so that the breakpoint
will be written to a private copy. However, here is where things become
confused. It is the debugger, not the process being debugged that requires
write access to the copied page. Nonetheless, the copied page is being
mapped into the process with write access enabled. In other words, once the
debugger sets a breakpoint within a text page, the program can write to its
private copy of that text page, whereas prior to setting the breakpoint, a
SIGSEGV would have occurred upon a write access. VM_PROT_COPY addresses
this problem. The combination of VM_PROT_READ and VM_PROT_COPY forces the
replication of a copy-on-write page even though the access is only for read.
Moreover, the replicated page is only mapped into the process with read
access, and not write access.

Reviewed by: kib
MFC after: 4 weeks


# 199490 18-Nov-2009 alc

Simplify both the invocation and the implementation of vm_fault() for wiring
pages.

(Note: Claims made in the comments about the handling of breakpoints in
wired pages have been false for roughly a decade. This and another bug
involving breakpoints will be fixed in coming changes.)

Reviewed by: kib


# 198812 02-Nov-2009 alc

Avoid pointless calls to pmap_protect().

Reviewed by: kib


# 198505 27-Oct-2009 kib

When the protection of a wired read-only mapping is changed to
read-write, install a new shadow object behind the map entry and copy
the pages from the underlying objects to it. This makes the
mprotect(2) call actually perform the requested operation instead of
silently doing nothing and returning success, which caused SIGSEGV on
later write access to the mapping.

Reuse vm_fault_copy_entry() to do the copying, modifying it to behave
correctly when src_entry == dst_entry.

Reviewed by: alc
MFC after: 3 weeks


# 197661 01-Oct-2009 kib

Move the annotation for vm_map_startup() immediately before the function.

MFC after: 3 days


# 195840 24-Jul-2009 jhb

Add a new type of VM object: OBJT_SG. An OBJT_SG object is very similar to
a device pager (OBJT_DEVICE) object in that it uses fictitious pages to
provide aliases to other memory addresses. The primary difference is that
it uses an sglist(9) to determine the physical addresses for a given offset
into the object instead of invoking the d_mmap() method in a device driver.

Reviewed by: alc
Approved by: re (kensmith)
MFC after: 2 weeks


# 195635 12-Jul-2009 kib

When VM_MAP_WIRE_HOLESOK is not specified and vm_map_wire(9) encounters
a non-readable and non-executable map entry, the entry is skipped from
wiring and the loop is aborted. But, since MAP_ENTRY_WIRE_SKIPPED was
not set for the map entry, its wired_count is later erroneously
decremented. vm_map_delete(9) for such a map entry gets stuck in
"vmmaps".

Properly set MAP_ENTRY_WIRE_SKIPPED when aborting the loop.

Reported by: John Marshall <john.marshall riverwillow com au>
Approved by: re (kensmith)


# 195329 03-Jul-2009 kib

When forking a vm space that has wired map entries, do not forget to
charge the objects created by vm_fault_copy_entry. The object charge
was set, but the reserve was not incremented.

Reported by: Greg Rivers <gcr+freebsd-current tharned org>
Reviewed by: alc (previous version)
Approved by: re (kensmith)


# 194766 23-Jun-2009 kib

Implement global and per-uid accounting of the anonymous memory. Add
rlimit RLIMIT_SWAP that limits the amount of swap that may be reserved
for the uid.

The accounting information (charge) is associated with either map entry,
or vm object backing the entry, assuming the object is the first one
in the shadow chain and entry does not require COW. Charge is moved
from entry to object on allocation of the object, e.g. during the mmap,
assuming the object is allocated, or on the first page fault on the
entry. It moves back to the entry on forks due to COW setup.

The per-entry granularity of accounting makes the charge process fair
for processes that change uid during their lifetime, and decrements
the charge for the proper uid when a region is unmapped.

The interface of vm_pager_allocate(9) is extended by adding struct
ucred *, which is used to charge the appropriate uid when the
allocation is performed by the kernel, e.g. md(4).

Several syscalls, among them is fork(2), may now return ENOMEM when
global or per-uid limits are enforced.

In collaboration with: pho
Reviewed by: alc
Approved by: re (kensmith)
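
A userland sketch of the new limit (the 512MB figure is illustrative):

    #include <sys/resource.h>

    int
    cap_swap_reservation(void)
    {
        struct rlimit rl;

        rl.rlim_cur = rl.rlim_max = 512UL * 1024 * 1024;
        /* Swap reserved on behalf of this uid is now capped at 512MB. */
        return (setrlimit(RLIMIT_SWAP, &rl));
    }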


# 193842 09-Jun-2009 alc

Eliminate an unnecessary restriction on the vm object type from
vm_map_pmap_enter(). The immediate effect of this change is that automatic
prefaulting by mmap() for small mappings is performed on POSIX shared memory
objects just the same as it is on ordinary files.


# 193643 07-Jun-2009 alc

Eliminate unnecessary obfuscation when testing a page's valid bits.


# 191256 18-Apr-2009 alc

Allow valid pages to be mapped for read access when they have a non-zero
busy count. Only mappings that allow write access should be prevented by
a non-zero busy count.

(The prohibition on mapping pages for read access when they have a non-
zero busy count originated in revision 1.202 of i386/i386/pmap.c when
this code was a part of the pmap.)

Reviewed by: tegge


# 190886 10-Apr-2009 kib

When vm_map_wire(9) is allowed to skip holes in the wired region, skip
the mappings without any of read and execution rights, in particular,
the PROT_NONE entries. This makes mlockall(2) work for the process
address space that has such mappings.

Since protection mode of the entry may change between setting
MAP_ENTRY_IN_TRANSITION and final pass over the region that records
the wire status of the entries, allocate new map entry flag
MAP_ENTRY_WIRE_SKIPPED to mark the skipped PROT_NONE entries.

Reported and tested by: Hans Ottevanger <fbsdhackers beasties demon nl>
Reviewed by: alc
MFC after: 3 weeks


# 189015 24-Feb-2009 kib

Revert the addition of the freelist argument for the vm_map_delete()
function, done in r188334. Instead, collect the entries that shall be
freed, in the deferred_freelist member of the map. Automatically purge
the deferred freelist when map is unlocked.

Tested by: pho
Reviewed by: alc


# 189014 24-Feb-2009 kib

Add the assertion macros for the map locks. Use them in several map
manipulation functions.

Tested by: pho
Reviewed by: alc


# 189012 24-Feb-2009 kib

Update the comment after the r188334.

Reviewed by: alc


# 188335 08-Feb-2009 kib

Improve comments, correct English.

Submitted by: alc


# 188334 08-Feb-2009 kib

Do not call vm_object_deallocate() from vm_map_delete(), because we
hold the map lock there, and might need the vnode lock for OBJT_VNODE
objects. Postpone object deallocation until caller of vm_map_delete()
drops the map lock. Link the map entries to be freed into the freelist,
which is released by the new helper function vm_map_entry_free_freelist().

Reviewed by: tegge, alc
Tested by: pho


# 188333 08-Feb-2009 kib

In vm_map_sync(), do not call vm_object_sync() while holding map lock.
Reference object, drop the map lock, and then call vm_object_sync().
The object sync might require vnode lock for OBJT_VNODE type objects.

Reviewed by: tegge
Tested by: pho


# 188325 08-Feb-2009 kib

Add the comments to vm_map_simplify_entry() and vmspace_fork(),
describing why several calls to vm_object_deallocate() with a locked map
do not result in the acquisition of the vnode lock after map lock.

Suggested and reviewed by: tegge


# 188323 08-Feb-2009 kib

Lock the new map in vmspace_fork(). The newly allocated map should not
be accessible outside vmspace_fork() yet, but locking it would satisfy
the protocol of the vm_map_entry_link() and other functions called
from vmspace_fork().

Use a trylock that supposedly cannot fail, to silence the WITNESS
warning about the nested acquisition of an sx lock with the same name.

Suggested and reviewed by: tegge


# 188320 08-Feb-2009 kib

Do not leak the MAP_ENTRY_IN_TRANSITION flag when copying map entry
on fork. Otherwise, copied entry cannot be removed in the child map.

Reviewed by: tegge
MFC after: 2 weeks


# 186665 31-Dec-2008 alc

Resurrect shared map locks allowing greater concurrency during some map
operations, such as page faults.

An earlier version of this change was ...

Reviewed by: kib
Tested by: pho
MFC after: 6 weeks


# 186633 31-Dec-2008 alc

Update or eliminate some stale comments.


# 186618 30-Dec-2008 alc

Avoid an unnecessary memory dereference in vm_map_entry_splay().


# 186616 30-Dec-2008 alc

Style change to vm_map_lookup(): Eliminate a macro of dubious value.


# 186609 30-Dec-2008 alc

Move the implementation of the vm map's fast path on address lookup from
vm_map_lookup{,_locked}() to vm_map_lookup_entry(). Having the fast path
in vm_map_lookup{,_locked}() limits its benefits to page faults. Moving
it to vm_map_lookup_entry() extends its benefits to other operations on
the vm map.


# 179921 21-Jun-2008 alc

KERNBASE is not necessarily an address within the kernel map, e.g.,
PowerPC/AIM. Consequently, it should not be used to determine the maximum
number of kernel map entries. Instead, use VM_MIN_KERNEL_ADDRESS, which marks
the start of the kernel map on all architectures.

Tested by: marcel@ (PowerPC/AIM)


# 178928 10-May-2008 alc

Generalize vm_map_find(9)'s parameter "find_space". Specifically, add
support for VMFS_ALIGNED_SPACE, which requests the allocation of an
address range best suited to superpages. The old options TRUE and FALSE
are mapped to VMFS_ANY_SPACE and VMFS_NO_SPACE, so that there is no
immediate need to update all of vm_map_find(9)'s callers.

While I'm here, correct a misstatement about vm_map_find(9)'s return
values in the man page.


# 178630 28-Apr-2008 alc

vm_map_fixed(), unlike vm_map_find(), does not update "addr", so it can be
passed by value.


# 177922 04-Apr-2008 alc

Update a comment to vm_map_pmap_enter().


# 177091 12-Mar-2008 jeff

Remove kernel support for M:N threading.

While the KSE project was quite successful in bringing threading to
FreeBSD, the M:N approach taken by the kse library was never developed
to its full potential. Backwards compatibility will be provided via
libmap.conf for dynamically linked binaries and static binaries will
be broken.


# 175079 04-Jan-2008 kib

In vm_map_stack(), check for wraparound of the specified stack region.

Reported and tested by: Peter Holm
Reviewed by: alc
MFC after: 3 days


# 173429 07-Nov-2007 pjd

Change the unused 'user_wait' argument to a 'timo' argument, which
will be used to specify a timeout for msleep(9).

Discussed with: alc
Reviewed by: alc


# 173361 05-Nov-2007 kib

Fix for the panic("vm_thread_new: kstack allocation failed") and
silent NULL pointer dereference in the i386 and sparc64 pmap_pinit()
when the kmem_alloc_nofault() failed to allocate address space. Both
functions now return an error instead of panicking or dereferencing NULL.

As a consequence, vmspace_exec() and vmspace_unshare() return the errno
int. A struct vmspace arg was added to vm_forkproc() to avoid dealing
with failed allocation when most of the fork1() job is already done.

The kernel stack for the thread is now set up in the thread_alloc(),
that itself may return NULL. Also, allocation of the first process
thread is performed in the fork1() to properly deal with stack
allocation failure. proc_linkup() is separated into proc_linkup()
called from fork1(), and proc_linkup0(), that is used to set up the
kernel process (was known as swapper).

In collaboration with: Peter Holm
Reviewed by: jhb


# 172863 22-Oct-2007 alc

Correct an error in vm_map_sync(), nee vm_map_clean(), that has existed
since revision 1.1. Specifically, neither traversal of the vm map checks
whether the end of the vm map has been reached. Consequently, the first
traversal can wrap around and bogusly return an error.

This error has gone unnoticed for so long because no one had ever before
tried msync(2)ing a region above the stack.

Reported by: peter
MFC after: 1 week


# 172317 25-Sep-2007 alc

Change the management of cached pages (PQ_CACHE) in two fundamental
ways:

(1) Cached pages are no longer kept in the object's resident page
splay tree and memq. Instead, they are kept in a separate per-object
splay tree of cached pages. However, access to this new per-object
splay tree is synchronized by the _free_ page queues lock, not to be
confused with the heavily contended page queues lock. Consequently, a
cached page can be reclaimed by vm_page_alloc(9) without acquiring the
object's lock or the page queues lock.

This solves a problem independently reported by tegge@ and Isilon.
Specifically, they observed the page daemon consuming a great deal of
CPU time because of pages bouncing back and forth between the cache
queue (PQ_CACHE) and the inactive queue (PQ_INACTIVE). The source of
this problem turned out to be a deadlock avoidance strategy employed
when selecting a cached page to reclaim in vm_page_select_cache().
However, the root cause was really that reclaiming a cached page
required the acquisition of an object lock while the page queues lock
was already held. Thus, this change addresses the problem at its
root, by eliminating the need to acquire the object's lock.

Moreover, keeping cached pages in the object's primary splay tree and
memq was, in effect, optimizing for the uncommon case. Cached pages
are reclaimed far, far more often than they are reactivated. Instead,
this change makes reclamation cheaper, especially in terms of
synchronization overhead, and reactivation more expensive, because
reactivated pages will have to be reentered into the object's primary
splay tree and memq.

(2) Cached pages are now stored alongside free pages in the physical
memory allocator's buddy queues, increasing the likelihood that large
allocations of contiguous physical memory (i.e., superpages) will
succeed.

Finally, as a result of this change long-standing restrictions on when
and where a cached page can be reclaimed and returned by
vm_page_alloc(9) are eliminated. Specifically, calls to
vm_page_alloc(9) specifying VM_ALLOC_INTERRUPT can now reclaim and
return a formerly cached page. Consequently, a call to malloc(9)
specifying M_NOWAIT is less likely to fail.

Discussed with: many over the course of the summer, including jeff@,
Justin Husted @ Isilon, peter@, tegge@
Tested by: an earlier version by kris@
Approved by: re (kensmith)


# 171902 20-Aug-2007 kib

Do not drop the vm_map lock between doing vm_map_remove() and
vm_map_insert(). For this, introduce vm_map_fixed() that does so for
the MAP_FIXED case.

Dropping the lock allowed a parallel thread to occupy the freed space.

Reported by: Tijl Coosemans <tijl ulyssis org>
Reviewed by: alc
Approved by: re (kensmith)
MFC after: 2 weeks


# 170170 31-May-2007 attilio

Revert VMCNT_* operations introduction.
Probably a general approach is not the best solution here, so we
should solve the sched_lock protection problems separately.

Requested by: alc
Approved by: jeff (mentor)


# 170149 31-May-2007 attilio

Add functions sx_xlock_sig() and sx_slock_sig().
These functions are intended to perform the same actions as sx_xlock()
and sx_slock(), but with the difference that they sleep interruptibly,
so that the sleep can be interrupted by external events.
In order to support these new features, some code restructuring is
needed, but the external API won't be affected at all.

Note: use a "void" cast for "int"-returning functions in order to
keep tools like Coverity from whining.

Requested by: rwatson
Tested by: rwatson
Reviewed by: jhb
Approved by: jeff (mentor)


# 169849 22-May-2007 alc

Eliminate the reactivation of cached pages in vm_fault_prefault() and
vm_map_pmap_enter() unless the caller is madvise(MADV_WILLNEED). With
the exception of calls to vm_map_pmap_enter() from
madvise(MADV_WILLNEED), vm_fault_prefault() and vm_map_pmap_enter()
are both used to create speculative mappings. Thus, always
reactivating cached pages is a mistake. In principle, cached pages
should only be reactivated by an actual access. Otherwise, the
following misbehavior can occur. On a hard fault for a text page the
clustering algorithm fetches not only the required page but also
several of the adjacent pages. Now, suppose that one or more of the
adjacent pages are never accessed. Ultimately, these unused pages
become cached pages through the efforts of the page daemon. However,
the next activation of the executable reactivates and maps these
unused pages. Consequently, they are never replaced. In effect, they
become pinned in memory.


# 169667 18-May-2007 jeff

- define and use VMCNT_{GET,SET,ADD,SUB,PTR} macros for manipulating
vmcnts. This can be used to abstract away pcpu details but also changes
to use atomics for all counters now. This means sched lock is no longer
responsible for protecting counts in the switch routines.

Contributed by: Attilio Rao <attilio@FreeBSD.org>


# 169048 26-Apr-2007 alc

Remove some code from vmspace_fork() that became redundant after
revision 1.334 modified _vm_map_init() to initialize the new vm map's
flags to zero.


# 167880 25-Mar-2007 alc

Two small changes to vm_map_pmap_enter():

1) Eliminate an unnecessary check for fictitious pages. Specifically,
only device-backed objects contain fictitious pages and the object is
not device-backed.

2) Change the types of "psize" and "tmpidx" to vm_pindex_t in order to
prevent possible wrap around with extremely large maps and objects,
respectively.

Observed by: tegge (last summer)


# 166964 25-Feb-2007 alc

Change the way that unmanaged pages are created. Specifically,
immediately flag any page that is allocated to a OBJT_PHYS object as
unmanaged in vm_page_alloc() rather than waiting for a later call to
vm_page_unmanage(). This allows for the elimination of some uses of
the page queues lock.

Change the type of the kernel and kmem objects from OBJT_DEFAULT to
OBJT_PHYS. This allows us to take advantage of the above change to
simplify the allocation of unmanaged pages in kmem_alloc() and
kmem_malloc().

Remove vm_page_unmanage(). It is no longer used.


# 163594 21-Oct-2006 alc

Eliminate unnecessary PG_BUSY tests. They originally served a purpose
that is now handled by vm object locking.


# 160561 21-Jul-2006 alc

Retire debug.mpsafevm. None of the architectures supported in CVS require
it any longer.


# 159681 17-Jun-2006 alc

Use ptoa(psize) instead of size to compute the end of the mapping in
vm_map_pmap_enter().


# 159620 14-Jun-2006 alc

Correct an error in the previous revision that could lead to a panic:
Found mapped cache page. Specifically, if cnt.v_free_count dips below
cnt.v_free_reserved after p_start has been set to a non-NULL value,
then vm_map_pmap_enter() would break out of the loop and incorrectly
call pmap_enter_object() for the remaining address range. To correct
this error, this revision truncates the address range so that
pmap_enter_object() will not map any cache pages.

In collaboration with: tegge@
Reported by: kris@


# 159303 05-Jun-2006 alc

Introduce the function pmap_enter_object(). It maps a sequence of resident
pages from the same object. Use it in vm_map_pmap_enter() to reduce the
locking overhead of premapping objects.

Reviewed by: tegge@


# 159054 29-May-2006 tegge

Close race between vmspace_exitfree() and exit1() and races between
vmspace_exitfree() and vmspace_free() which could result in the same
vmspace being freed twice.

Factor out part of exit1() into new function vmspace_exit(). Attach
to vmspace0 to allow old vmspace to be freed earlier.

Add new function, vmspace_acquire_ref(), for obtaining a vmspace
reference for a vmspace belonging to another process. Avoid changing
vmspace refcount from 0 to 1 since that could also lead to the same
vmspace being freed twice.

Change vmtotal() and swapout_procs() to use vmspace_acquire_ref().

Reviewed by: alc


# 156420 08-Mar-2006 imp

Remove leading __ from __(inline|const|signed|volatile). They are
obsolete. This should reduce diffs to NetBSD as well.


# 154889 27-Jan-2006 alc

Use the new macros abstracting the page coloring/queues implementation.
(There are no functional changes.)


# 153095 04-Dec-2005 alc

Simplify vmspace_dofree().


# 153068 03-Dec-2005 alc

Eliminate unneeded preallocation at initialization.

Reviewed by: tegge


# 152630 20-Nov-2005 alc

Eliminate pmap_init2(). It's no longer used.


# 149768 03-Sep-2005 alc

Pass a value of type vm_prot_t to pmap_enter_quick() so that it can determine
whether the mapping should permit execute access.


# 148193 20-Jul-2005 alc

Eliminate an incorrect (and unnecessary) cast.


# 145788 02-May-2005 alc

Remove GIANT_REQUIRED from vmspace_exec().

Prodded by: jeff


# 140439 18-Jan-2005 alc

Add checks to vm_map_findspace() to test for address wrap. The conditions
where this could occur are very rare, but possible.

Submitted by: Mark W. Krentel
MFC after: 2 weeks
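
The class of condition being tested, in miniature (illustrative):

    /* If start + length wraps past the top of the address space, the
     * sum compares below start; such a request can never be satisfied. */
    if (start + length < start)
        return (1);     /* no space */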


# 139825 07-Jan-2005 imp

/* -> /*- for license, minor formatting changes


# 139241 23-Dec-2004 alc

Modify pmap_enter_quick() so that it expects the page queues to be locked
on entry and it assumes the responsibility for releasing the page queues
lock if it must sleep.

Remove a bogus comment from pmap_enter_quick().

Using the first change, modify vm_map_pmap_enter() so that the page queues
lock is acquired and released once, rather than each time that a page
is mapped.


# 138897 15-Dec-2004 alc

In the common case, pmap_enter_quick() completes without sleeping.
In such cases, the busying of the page and the unlocking of the
containing object by vm_map_pmap_enter() and vm_fault_prefault() is
unnecessary overhead. To eliminate this overhead, this change
modifies pmap_enter_quick() so that it expects the object to be locked
on entry and it assumes the responsibility for busying the page and
unlocking the object if it must sleep. Note: alpha, amd64, i386 and
ia64 are the only implementations optimized by this change; arm,
powerpc, and sparc64 still conservatively busy the page and unlock the
object within every pmap_enter_quick() call.

Additionally, this change is the first case where we synchronize
access to the page's PG_BUSY flag and busy field using the containing
object's lock rather than the global page queues lock. (Modifications
to the page's PG_BUSY flag and busy field have asserted both locks for
several weeks, enabling an incremental transition.)


# 134675 03-Sep-2004 alc

Push Giant deep into vm_forkproc(), acquiring it only if the process has
mapped System V shared memory segments (see shmfork_myhook()) or requires
the allocation of an ldt (see vm_fault_wire()).


# 133807 16-Aug-2004 alc

- Introduce and use a new tunable "debug.mpsafevm". At present, setting
"debug.mpsafevm" results in (almost) Giant-free execution of zero-fill
page faults. (Giant is held only briefly, just long enough to determine
if there is a vnode backing the faulting address.)

Also, condition the acquisition and release of Giant around calls to
pmap_remove() on "debug.mpsafevm".

The effect on performance is significant. On my dual Opteron, I see a
3.6% reduction in "buildworld" time.

- Use atomic operations to update several counters in vm_fault().


# 133796 16-Aug-2004 green

Rather than bringing back all of the changes to make VM map deletion
wait for system wires to disappear, do so (much more trivially) by
instead only checking for system wires of user maps and not kernel maps.

Alternative by: tor
Reviewed by: alc


# 133726 14-Aug-2004 alc

Remove spl calls.


# 133636 13-Aug-2004 alc

Replace the linear search in vm_map_findspace() with an O(log n)
algorithm built into the map entry splay tree. This replaces the
first_free hint in struct vm_map with two fields in vm_map_entry:
adj_free, the amount of free space following a map entry, and
max_free, the maximum amount of free space in the entry's subtree.
These fields make it possible to find a first-fit free region of a
given size in one pass down the tree, so O(log n) amortized using
splay trees.

This significantly reduces the overhead in vm_map_findspace() for
applications that mmap() many hundreds or thousands of regions, and
has a negligible slowdown (0.1%) on buildworld. See, for example, the
discussion of a micro-benchmark titled "Some mmap observations
compared to Linux 2.6/OpenBSD" on -hackers in late October 2003.

OpenBSD adopted this approach in March 2002, and NetBSD added it in
November 2003, both with Red-Black trees.

Submitted by: Mark W. Krentel
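
A simplified sketch of the one-pass, first-fit descent over the new
fields (the splay bookkeeping is omitted; names follow the text
above):

	/* Find the first gap of at least "length" bytes. */
	while (entry != NULL) {
		if (entry->left != NULL &&
		    entry->left->max_free >= length)
			entry = entry->left;	/* an earlier fit exists */
		else if (entry->adj_free >= length)
			return (entry->end);	/* use the gap after entry */
		else
			entry = entry->right;	/* any fit lies later */
	}

The caller first checks the root's max_free to rule out a futile
search.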


# 133598 12-Aug-2004 tegge

The vm map lock is needed in vm_fault() after the page has been found,
to avoid later changes before pmap_enter() and vm_fault_prefault()
have completed.

Simplify deadlock avoidance by not blocking on vm map relookup.

In collaboration with: alc


# 133587 12-Aug-2004 green

Re-delete the comment from r1.352.


# 133435 10-Aug-2004 green

Back out all behavioral changes.


# 133401 09-Aug-2004 green

Revamp VM map wiring.

* Allow no-fault wiring/unwiring to succeed for consistency;
however, the wired count remains at zero, so it's a special case.

* Fix issues inside vm_map_wire() and vm_map_unwire() where the
exact state of user wiring (one or zero) and system wiring
(zero or more) could be confused; for example, system unwiring
could succeed in removing a user wire, instead of being an
error.

* Require all mappings to be unwired before they are deleted.
When VM space is still wired upon deletion, it will be waited
upon for the following unwire. This makes vslock(9) work
rather than allowing kernel-locked memory to be deleted
out from underneath its consumer as it would before.


# 133395 09-Aug-2004 alc

Remove a stale comment from vm_map_lookup() that pertains to share maps.
(The last vestiges of the share map code were removed in revisions 1.153
and 1.159.)


# 133143 04-Aug-2004 alc

- Push down the acquisition and release of Giant into pmap_enter_quick()
on those architectures without pmap locking.
- Eliminate the acquisition and release of Giant in vm_map_pmap_enter().


# 132987 01-Aug-2004 green

* Add a "how" argument to uma_zone constructors and initialization functions
so that they know whether the allocation is supposed to be able to sleep
or not.
* Allow uma_zone constructors and initialization functions to return either
success or error. Almost all of the ones in the tree currently return
success unconditionally, but mbuf is a notable exception: the packet
zone constructor wants to be able to fail if it cannot suballocate an
mbuf cluster, and the mbuf allocators want to be able to fail in general
in a MAC kernel if the MAC mbuf initializer fails. This fixes the
panics people are seeing when they run out of memory for mbuf clusters.
* Allow debug.nosleepwithlocks on WITNESS to be disabled, without changing
the default.

Both bmilekic and jeff have reviewed the changes made to make failable
zone allocations work.


# 132899 30-Jul-2004 alc

- Push down the acquisition and release of Giant into pmap_protect() on
those architectures without pmap locking.
- Eliminate the acquisition and release of Giant from vm_map_protect().

(Translation: mprotect(2) runs to completion without touching Giant on
alpha, amd64, i386 and ia64.)


# 132880 30-Jul-2004 mux

Get rid of another lockmgr(9) consumer by using sx locks for the user
maps. We always acquire the sx lock exclusively here, but we can't
use a mutex because we want to be able to sleep while holding the
lock. This is completely equivalent to what we were doing with the
lockmgr(9) locks before.

Approved by: alc


# 132684 27-Jul-2004 alc

- Use atomic ops for updating the vmspace's refcnt and exitingcnt.
- Push down Giant into shmexit(). (Giant is acquired only if the vmspace
contains shm segments.)
- Eliminate the acquisition of Giant from proc_rwmem().
- Reduce the scope of Giant in exit1(), uncovering the destruction of the
address space.


# 132627 25-Jul-2004 alc

Make the code and comments for vm_object_coalesce() consistent.


# 132593 24-Jul-2004 alc

Simplify vmspace initialization. The bcopy() of fields from the old
vmspace to the new vmspace in vmspace_exec() is mostly wasted effort. With
one exception, vm_swrss, the copied fields are immediately overwritten.
Instead, initialize these fields to zero in vmspace_alloc(), eliminating a
bcopy() from vmspace_exec() and a bzero() from vmspace_fork().


# 132483 21-Jul-2004 peter

Semi-gratuitous change. Move two refcount operations to their own lines
rather than be buried inside an if (expression). And now that the if
expression is the same in both exit paths, use the same ordering.


# 132475 20-Jul-2004 peter

Move the initialization and teardown of pmaps to the vmspace zone's
init and fini handlers. Our vm system removes all userland mappings at
exit prior to calling pmap_release. It just so happens that we might
as well reuse the pmap for the next process since the userland slate
has already been wiped clean.

However, there is a functional benefit to this as well. For platforms
that share userland and kernel context in the same pmap, it means that
the kernel portion of a pmap remains valid after the vmspace has been
freed (process exit) and while it is in uma's cache. This is significant
for i386 SMP systems with kernel context borrowing because it avoids
a LOT of IPIs from the pmap_lazyfix() cleanup in the usual case.

Tested on: amd64, i386, sparc64, alpha
Glanced at by: alc


# 132220 15-Jul-2004 alc

Push down the acquisition and release of the page queues lock into
pmap_protect() and pmap_remove(). In general, they require the lock in
order to modify a page's pv list or flags. In some cases, however,
pmap_protect() can avoid acquiring the lock.


# 131252 28-Jun-2004 gallatin

Use MIN() macro rather than ulmin() inline, and fix stray tab
that snuck in with my last commit.

Submitted by: green


# 131251 28-Jun-2004 gallatin

Fix alpha - the use of min() on longs was losing the high bits and
returning wrong answers, leading to strange values in vm2->vm_{s,t,d}size.


# 131073 24-Jun-2004 green

Correct the tracking of various bits of the process's vmspace and vm_map
when not propagated on fork (due to minherit(2)). Consistency checks
otherwise fail when the vm_map is freed and it appears to have not been
emptied completely, causing an INVARIANTS panic in vm_map_zdtor().

PR: kern/68017
Submitted by: Mark W. Krentel <krentel@dreamscape.com>
Reviewed by: alc


# 129728 25-May-2004 des

Back out previous commit; it went to the wrong file.


# 129725 25-May-2004 des

MFS: rev 1.187.2.27 through 1.187.2.29, fix MS_INVALIDATE semantics but
provide a sysctl knob for reverting to old ones.


# 129701 25-May-2004 alc

Correct two error cases in vm_map_unwire():

1. Contrary to the Single Unix Specification our implementation of
munlock(2) when performed on an unwired virtual address range has
returned an error. Correct this. Note, however, that the behavior
of "system" unwiring is unchanged, only "user" unwiring is changed.
If "system" unwiring is performed on an unwired virtual address
range, an error is still returned.

2. Performing an errant "system" unwiring on a virtual address range
that was "user" (i.e., mlock(2)) but not "system" wired would
incorrectly undo the "user" wiring instead of returning an error.
Correct this.

Discussed with: green@
Reviewed by: tegge@


# 129571 22-May-2004 alc

To date, unwiring a fictitious page has produced a panic. The reason
being that PHYS_TO_VM_PAGE() returns the wrong vm_page for fictitious
pages but unwiring uses PHYS_TO_VM_PAGE(). The resulting panic
reported an unexpected wired count. Rather than attempting to fix
PHYS_TO_VM_PAGE(), this fix takes advantage of the properties of
fictitious pages. Specifically, fictitious pages will never be
completely unwired. Therefore, we can keep a fictitious page's wired
count forever set to one and thereby avoid the use of
PHYS_TO_VM_PAGE() when we know that we're working with a fictitious
page, just not which one.

In collaboration with: green@, tegge@
PR: kern/29915
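
In sketch form, the unwiring path simply declines to touch a
fictitious page's wire count (flag and field names as in vm_page.h):

	if ((m->flags & PG_FICTITIOUS) != 0) {
		KASSERT(m->wire_count == 1,
		    ("fictitious page %p has wire_count != 1", m));
		return;		/* the wire count stays at one forever */
	}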


# 129018 06-May-2004 green

Properly remove MAP_FUTUREWIRE when a vm_map_entry gets torn down.
Previously, mlockall(2) usage would leak MAP_FUTUREWIRE of the process's
vmspace::vm_map and subsequent processes would wire all of their memory.
Coupled with a wired-page leak in vm_fault_unwire(), this would run the
system out of free pages and cause programs to randomly SIGBUS when
faulting in new pages.

(Note that this is not the fix for the latter part; pages are still
leaked when a wired area is unmapped in some cases.)

Reviewed by: alc
PR: kern/62930


# 128596 24-Apr-2004 alc

In cases where a file was resident in memory mmap(..., PROT_NONE, ...)
would actually map the file with read access enabled. According to
http://www.opengroup.org/onlinepubs/007904975/functions/mmap.html this is
an error. Similarly, an madvise(..., MADV_WILLNEED) would enable read
access on a virtual address range that was PROT_NONE.

The solution implemented herein is (1) to pass a vm_prot_t to
vm_map_pmap_enter() describing the allowed access and (2) to make
vm_map_pmap_enter() responsible for understanding the limitations of
pmap_enter_quick().

Submitted by: "Mark W. Krentel" <krentel@dreamscape.com>
PR: kern/64573
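
Schematically, (1) and (2) combine into an early return (a sketch;
the exact predicate in the committed code may differ):

	void
	vm_map_pmap_enter(vm_map_t map, vm_offset_t addr, vm_prot_t prot,
	    vm_object_t object, vm_pindex_t pindex, vm_size_t size,
	    int flags)
	{
		/*
		 * pmap_enter_quick() creates read-only mappings, so a
		 * PROT_NONE range must never be premapped.
		 */
		if ((prot & (VM_PROT_READ | VM_PROT_EXECUTE)) == 0 ||
		    object == NULL)
			return;
		/* ... premap the object's resident pages ... */
	}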


# 127961 06-Apr-2004 imp

Remove advertising clause from University of California Regent's license,
per letter dated July 22, 1999.

Approved by: core


# 127327 23-Mar-2004 tjr

Do not copy vm_exitingcnt to the new vmspace in vmspace_exec(). Copying
it led to impossibly high values in the new vmspace, causing it to never
drop to 0 and be freed.


# 126728 07-Mar-2004 alc

Retire pmap_pinit2(). Alpha was the last platform that used it. However,
ever since alpha/alpha/pmap.c revision 1.81 introduced the list allpmaps,
there has been no reason for having this function on Alpha. Briefly,
when pmap_growkernel() relied upon the list of all processes to find and
update the various pmaps to reflect a growth in the kernel's valid
address space, pmap_pinit2() served to avoid a race between pmap
initialization and pmap_growkernel(). Specifically, pmap_pinit2() was
responsible for initializing the kernel portions of the pmap and
pmap_pinit2() was called after the process structure contained a pointer
to the new pmap for use by pmap_growkernel(). Thus, an update to the
kernel's address space might be applied to the new pmap unnecessarily,
but an update would never be lost.


# 125748 12-Feb-2004 alc

Further reduce the use of Giant in vm_map_delete(): Perform pmap_remove()
on system maps, besides the kmem_map, without Giant.

In collaboration with: tegge


# 125470 05-Feb-2004 alc

- Locking for the per-process resource limits structure has eliminated
the need for Giant in vm_map_growstack().
- Use the proc * that is passed to vm_map_growstack() rather than
curthread->td_proc.


# 125454 04-Feb-2004 jhb

Locking for the per-process resource limits structure.
- struct plimit includes a mutex to protect a reference count. The plimit
structure is treated similarly to struct ucred in that it is always copy
on write, so having a reference to a structure is sufficient to read from
it without needing a further lock.
- The proc lock protects the p_limit pointer and must be held while reading
limits from a process to keep the limit structure from changing out from
under you while reading from it.
- Various global limits that are ints are not protected by a lock since
int writes are atomic on all the archs we support and thus a lock
wouldn't buy us anything.
- All accesses to individual resource limits from a process are abstracted
behind a simple lim_rlimit(), lim_max(), and lim_cur() API that return
either an rlimit, or the current or max individual limit of the specified
resource from a process.
- dosetrlimit() was renamed to kern_setrlimit() to match existing style of
other similar syscall helper functions.
- The alpha OSF/1 compat layer no longer calls getrlimit() and setrlimit()
(it didn't use the stackgap when it should have) but uses lim_rlimit()
and kern_setrlimit() instead.
- The svr4 compat no longer uses the stackgap for resource limits calls,
but uses lim_rlimit() and kern_setrlimit() instead.
- The ibcs2 compat no longer uses the stackgap for resource limits. It
also no longer uses the stackgap for accessing sysctl's for the
ibcs2_sysconf() syscall but uses kernel_sysctl() instead. As a result,
ibcs2_sysconf() no longer needs Giant.
- The p_rlimit macro no longer exists.

Submitted by: mtm (mostly, I only did a few cleanups and catchups)
Tested on: i386
Compiled on: alpha, amd64
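
Typical use of the new API (illustrative; lim_cur() at this point
took the proc pointer, and the proc lock covers the read):

	rlim_t datalim;

	PROC_LOCK(p);
	datalim = lim_cur(p, RLIMIT_DATA);	/* current (soft) limit */
	PROC_UNLOCK(p);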


# 125362 02-Feb-2004 jhb

Drop the reference count on the old vmspace after fully switching the
current thread to the new vmspace.

Suggested by: dillon


# 124008 30-Dec-2003 alc

- Modify vm_object_split() to expect a locked vm object on entry and
return with a locked vm object on exit. Remove GIANT_REQUIRED.
- Eliminate some unnecessary local variables from vm_object_split().


# 123878 26-Dec-2003 alc

Minor correction to revision 1.258: Use the proc pointer that is passed to
vm_map_growstack() in the RLIMIT_VMEM check rather than curthread.


# 122902 19-Nov-2003 alc

- Avoid a lock-order reversal between Giant and a system map mutex that
occurs when kmem_malloc() fails to allocate a sufficient number of vm
pages. Specifically, we avoid the lock-order reversal by not grabbing
Giant around pmap_remove() if the map is the kmem_map.

Approved by: re (jhb)
Reported by: Eugene <eugene3@web.de>


# 122646 14-Nov-2003 alc

Changes to msync(2)
- Return EBUSY if the region was wired by mlock(2) and MS_INVALIDATE
is specified to msync(2). This is required by the Open Group Base
Specifications Issue 6.
- vm_map_sync() doesn't return KERN_FAILURE. Thus, msync(2) can't
possibly return EIO.
- The second major loop in vm_map_sync() handles sub maps. Thus,
failing on sub maps in the first major loop isn't necessary.
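
The first change reduces to a per-entry test in vm_map_sync(), along
these lines (a sketch; the flag name is the existing map-entry flag):

	/* MS_INVALIDATE must not discard pages wired by mlock(2). */
	if (invalidate &&
	    (entry->eflags & MAP_ENTRY_USER_WIRED) != 0)
		return (KERN_INVALID_ARGUMENT);	/* msync() -> EBUSY */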


# 122384 09-Nov-2003 alc

- The Open Group Base Specifications Issue 6 specifies that an munmap(2)
must return EINVAL if size is zero. Submitted by: tegge
- In order to avoid a race condition in multithreaded applications, the
check and removal operations by munmap(2) must be in the same critical
section. To accommodate this, vm_map_check_protection() is modified to
require its caller to obtain at least a read lock on the map.


# 122367 09-Nov-2003 alc

- Remove Giant from msync(2). Giant is still acquired by the lower layers
if we drop into the pmap or vnode layers.
- Migrate the handling of zero-length msync(2)s into vm_map_sync() so that
multithread applications can't change the map between implementing the
zero-length hack in msync(2) and reacquiring the map lock in
vm_map_sync().

Reviewed by: tegge


# 122349 09-Nov-2003 alc

- Rename vm_map_clean() to vm_map_sync(). This better reflects the fact
that msync(2) is its only caller.
- Migrate the parts of the old vm_map_clean() that examined the internals
of a vm object to a new function vm_object_sync() that is implemented in
vm_object.c. At the same time, introduce the necessary vm object locking so
that vm_map_sync() and vm_object_sync() can be called without Giant.

Reviewed by: tegge


# 122095 05-Nov-2003 alc

- Move the implementation of OBJ_ONEMAPPING from vm_map_delete() to
vm_map_entry_delete() so that all of the vm object manipulation is
performed in one place.


# 122034 04-Nov-2003 marcel

Update avail_ssize for rstacks after growing them.


# 121962 03-Nov-2003 des

Whitespace cleanup.


# 121919 02-Nov-2003 alc

- Increase the scope of the source object lock in vm_map_copy_entry().


# 121907 02-Nov-2003 alc

- Introduce and use vm_object_reference_locked(). Unlike
vm_object_reference(), this function must not be used to reanimate dead
vm objects. This restriction simplifies locking.

Reviewed by: tegge


# 121786 31-Oct-2003 marcel

Fix two bugs introduced with the rstack functionality and specific to
the rstack functionality:
1. Fix a KASSERT that tests for the address to be above the upward
growable stack. Typically for rstack, the faulting address can be
identical to the record end of the upward growable entry, and
very likely is on ia64. The KASSERT tested for greater than, not
greater than or equal, so whenever the register stack had to be grown
the assertion fired.
2. When we grow the upward growable stack entry and adjust the
underlying object, don't forget to adjust the size of the VM map.
Not doing so would trigger an assert in vm_map_zdtor().

Pointy hat: marcel (for not testing with INVARIANTS).


# 121221 18-Oct-2003 alc

Corrections to revision 1.305
- Specifying VM_MAP_WIRE_HOLESOK should not assume that the start
address is the beginning of the map. Instead, move to the first
entry after the start address.
- The implementation of VM_MAP_WIRE_HOLESOK was incomplete. This
caused the failure of mlockall(2) in some circumstances.


# 120831 05-Oct-2003 bms

Move pmap_resident_count() from the MD pmap.h to the MI pmap.h.
Add a definition of pmap_wired_count().
Add a definition of vmspace_wired_count().

Reviewed by: truckman
Discussed with: peter


# 120531 27-Sep-2003 marcel

Part 2 of implementing rstacks: add the ability to create rstacks and
use the ability on ia64 to map the register stack. The orientation of
the stack (i.e. its grow direction) is passed to vm_map_stack() in the
overloaded cow argument. Since the grow direction is represented by
bits, it is possible and allowed to create bi-directional stacks.
This is not an advertised feature, more of a side-effect.

Fix a bug in vm_map_growstack() that's specific to rstacks and which
we could only find by having the ability to create rstacks: when
the mapped stack ends at the faulting address, we have not actually
mapped the faulting address. We need to include or cover the faulting
address.

Note that at this time mmap(2) has not been extended to allow the
creation of rstacks by processes. If such a need arises, this can
be done.

Tested on: alpha, i386, ia64, sparc64


# 120389 23-Sep-2003 silby

Adjust the kmapentzone limit so that it takes into account the size of
maxproc and maxfiles, as procs, pipes, and other structures cause allocations
from kmapentzone.

Submitted by: tegge


# 120371 23-Sep-2003 alc

Change the handling of the kernel and kmem objects in vm_map_delete(): In
order to use "unmanaged" pages in the kmem object, vm_map_delete() must
unconditionally perform pmap_remove(). Otherwise, sparc64 has problems.

Tested by: jake


# 119595 30-Aug-2003 marcel

Introduce MAP_ENTRY_GROWS_DOWN and MAP_ENTRY_GROWS_UP to allow for
growable (stack) entries that not only grow down, but also grow up.
Have vm_map_growstack() take these flags into account when growing
an entry.

This is the first step in adding support for upward growable stacks.
It is a required feature on ia64 to support the register stack (or
rstack as I like to call it -- it also means reverse stack). We do
not currently create rstacks, so the upward growing is not exercised
and the change should be a functional no-op.

Reviewed by: alc


# 118878 13-Aug-2003 alc

Remove GIANT_REQUIRED from vmspace_alloc().


# 118771 11-Aug-2003 bms

Add the mlockall() and munlockall() system calls.
- All those diffs to syscalls.master for each architecture *are*
necessary. This needed clarification; the stub code generation for
mlockall() was disabled, which would prevent applications from
linking to this API (suggested by mux)
- Giant has been quashed. It is no longer held by the code, as
the required locking has been pushed down within vm_map.c.
- Callers must specify VM_MAP_WIRE_HOLESOK or VM_MAP_WIRE_NOHOLES
to express their intention explicitly.
- Inspected at the vmstat, top and vm pager sysctl stats level.
Paging-in activity is occurring correctly, using a test harness.
- The RES size for a process may appear to be greater than its SIZE.
This is believed to be due to mappings of the same shared library
page being wired twice. Further exploration is needed.
- Believed to back out of allocations and locks correctly
(tested with WITNESS, MUTEX_PROFILING, INVARIANTS and DIAGNOSTIC).

PR: kern/43426, standards/54223
Reviewed by: jake, alc
Approved by: jake (mentor)
MFC after: 2 weeks


# 117724 18-Jul-2003 phk

Move the implementation of the vmspace_swap_count() (used only in
the "toss the largest process" emergency handling) from vm_map.c to
swap_pager.c.

The quantity calculated depends strongly on the internals of the
swap_pager and by moving it, we no longer need to expose the
internal metrics of the swap_pager to the world.


# 117206 03-Jul-2003 alc

Background: pmap_object_init_pt() premaps the pages of a object in
order to avoid the overhead of later page faults. In general, it
implements two cases: one for vnode-backed objects and one for
device-backed objects. Only the device-backed case is really
machine-dependent, belonging in the pmap.

This commit moves the vnode-backed case into the (relatively) new
function vm_map_pmap_enter(). On amd64 and i386, this commit only
amounts to code rearrangement. On alpha and ia64, the new machine
independent (MI) implementation of the vnode case is smaller and more
efficient than their pmap-based implementations. (The MI
implementation takes advantage of the fact that objects in -CURRENT
are ordered collections of pages.) On sparc64, pmap_object_init_pt()
hadn't (yet) been implemented.


# 117093 01-Jul-2003 alc

Check the address provided to vm_map_stack() against the vm map's maximum,
returning an error if the address is too high.


# 117047 29-Jun-2003 alc

Introduce vm_map_pmap_enter(). Presently, this is a stub calling the MD
pmap_object_init_pt().


# 116923 27-Jun-2003 alc

Simple read-modify-write operations on a vm object's flags, ref_count, and
shadow_count can now rely on its mutex for synchronization. Remove one use
of Giant from vm_map_insert().


# 116799 25-Jun-2003 alc

Remove a GIANT_REQUIRED on the kernel object that we no longer need.


# 116226 11-Jun-2003 obrien

Use __FBSDID().


# 115931 07-Jun-2003 alc

Pass the vm object to vm_object_collapse() with its lock held.


# 114317 30-Apr-2003 alc

Increase the scope of the vm_object lock in vm_map_delete().


# 114263 29-Apr-2003 alc

Add vm_object locking to vmspace_swap_count().


# 114053 26-Apr-2003 alc

- Extend the scope of two existing vm_object locks to cover
swap_pager_freespace().


# 113955 24-Apr-2003 alc

- Acquire the vm_object's lock when performing vm_object_page_clean().
- Add a parameter to vm_pageout_flush() that tells vm_pageout_flush()
whether its caller has locked the vm_object. (This is a temporary
measure to bootstrap vm_object locking.)


# 113768 20-Apr-2003 alc

- Update the vm_object locking in vm_map_insert().


# 113740 20-Apr-2003 alc

Update vm_object locking in vm_map_delete().


# 113701 18-Apr-2003 alc

o Update locking around vm_object_page_remove() in vm_map_clean()
to use the new macros.
o Remove unnecessary increment and decrement of the vm_object's
reference count in vm_map_clean().


# 113449 13-Apr-2003 alc

Lock some manipulations of the vm object's flags.


# 112367 18-Mar-2003 phk

Including <sys/stdint.h> is (almost?) universally only to be able to use
%j in printfs, so put a nested include in <sys/systm.h> where the printf
prototype lives and save everybody else the trouble.


# 112167 12-Mar-2003 das

- When the VM daemon is out of swap space and looking for a
process to kill, don't block on a map lock while holding the
process lock. Instead, skip processes whose map locks are held
and find something else to kill.
- Add vm_map_trylock_read() to support the above.

Reviewed by: alc, mike (mentor)


# 111937 06-Mar-2003 alc

Remove ENABLE_VFS_IOOPT. It is a long unfinished work-in-progress.

Discussed on: arch@


# 111119 19-Feb-2003 imp

Back out M_* changes, per decision of the TRB.

Approved by: trb


# 110958 15-Feb-2003 alc

Remove the acquisition and release of Giant around pmap_growkernel().
It's unnecessary for two reasons: (1) Giant is at present already held in
such cases and (2) our various implementations of pmap_growkernel() look to
be MP safe. (For example, for sparc64 the proof of (2) is trivial.)


# 109820 25-Jan-2003 alc

Add MTX_DUPOK to the initialization of system map locks.


# 109623 21-Jan-2003 alfred

Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0.
Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.


# 109572 20-Jan-2003 dillon

Close the remaining user address mapping races for physical
I/O, CAM, and AIO. Still TODO: streamline useracc() checks.

Reviewed by: alc, tegge
MFC after: 7 days


# 109205 13-Jan-2003 dillon

It is possible for an active aio to prevent shared memory from being
dereferenced when a process exits due to the vmspace ref-count being
bumped. Change shmexit() and shmexit_myhook() to take a vmspace instead
of a process and call it in vmspace_dofree(). This way if it is missed
in exit1()'s early-resource-free it will still be caught when the zombie is
reaped.

Also fix a potential race in shmexit_myhook() by NULLing out
vmspace->vm_shm prior to calling shm_delete_mapping() and free().

MFC after: 7 days


# 108594 03-Jan-2003 alc

Lock the vm object when performing vm_object_clear_flag().


# 108515 31-Dec-2002 alc

Implement a variant locking scheme for vm maps: Access to system maps
is now synchronized by a mutex, whereas access to user maps is still
synchronized by a lockmgr()-based lock. Why? No single type of lock,
including sx locks, meets the requirements of both types of vm map.
Sometimes we sleep while holding the lock on a user map. Thus, a
mutex isn't appropriate. On the other hand, both lockmgr()-based
and sx locks release Giant when a thread/process blocks during
contention for a lock. This could lead to a race condition in a legacy
driver (that relies on Giant for synchronization) if it attempts to
kmem_malloc() and fails to immediately obtain the lock. Fortunately,
we never sleep while holding a system map lock.
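
Schematically, each lock operation dispatches on the map type (a
sketch; the committed code also threads file and line through for
debugging):

	void
	_vm_map_lock(vm_map_t map)
	{
		if (map->system_map)
			mtx_lock(&map->system_mtx);	/* no sleeping here */
		else
			(void)lockmgr(&map->lock, LK_EXCLUSIVE, NULL,
			    curthread);
	}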


# 108418 29-Dec-2002 alc

- Increment the vm_map's timestamp if _vm_map_trylock() succeeds.
- Introduce map_sleep_mtx and use it to replace Giant in
vm_map_unlock_and_wait() and vm_map_wakeup(). (Original
version by: tegge.)


# 108413 29-Dec-2002 alc

- Remove vm_object_init2(). It is unused.
- Add a mtx_destroy() to vm_object_collapse(). (This allows a bzero()
to migrate from _vm_object_allocate() to vm_object_zinit(), where it
will be performed less often.)


# 107912 15-Dec-2002 dillon

Fix a refcount race with the vmspace structure. In order to prevent
resource starvation we clean-up as much of the vmspace structure as we
can when the last process using it exits. The rest of the structure
is cleaned up when it is reaped. But since exit1() decrements the ref
count it is possible for a double-free to occur if someone else, such as
the process swapout code, references and then dereferences the structure.
Additionally, the final cleanup of the structure should not occur until
the last process referencing it is reaped.

This commit solves the problem by introducing a secondary reference count,
called 'vm_exitingcnt'. The normal reference count is decremented on exit
and vm_exitingcnt is incremented. vm_exitingcnt is decremented when the
process is reaped. When both vm_exitingcnt and vm_refcnt are 0, the
structure is freed for real.

MFC after: 3 weeks
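
The lifetime rule, in sketch form (this predates the later conversion
of these counters to atomic ops, so the updates ran under the existing
locking):

	/*
	 * exit1(): drop the normal ref, take an exiting ref; the
	 * structure itself survives even if vm_refcnt hits zero.
	 */
	vm->vm_exitingcnt++;
	vm->vm_refcnt--;

	/*
	 * Reaping the zombie: free only when both counts are zero.
	 */
	if (--vm->vm_exitingcnt == 0 && vm->vm_refcnt == 0)
		vmspace_dofree(vm);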


# 107892 15-Dec-2002 alc

Perform vm_object_lock() and vm_object_unlock() around
vm_object_page_remove().


# 107464 01-Dec-2002 alc

Hold the page queues lock when calling pmap_protect(); it updates fields
of the vm_page structure. Make the style of the pmap_protect() calls
consistent.

Approved by: re (blanket)


# 107250 25-Nov-2002 alc

Acquire and release the page queues lock around calls to pmap_protect()
because it updates flags within the vm page.

Approved by: re (blanket)


# 106708 09-Nov-2002 alc

Fix an error case in vm_map_wire(): unwiring of an entry during cleanup
after a user wire error fails when the entry is already system wired.

Reported by: tegge


# 106600 07-Nov-2002 mux

Correctly print vm_offset_t types.


# 105229 16-Oct-2002 phk

Properly put macro args in ().

Spotted by: FlexeLint.


# 103794 22-Sep-2002 mdodd

Modify vm_map_clean() (and thus the msync(2) system call) to support
invalidation of cached pages for objects of type OBJT_DEVICE.

Submitted by: Christian Zander <zander@minion.de>
Approved by: alc


# 103767 21-Sep-2002 jake

Use the fields in the sysentvec and in the vm map header in place of the
constants VM_MIN_ADDRESS, VM_MAXUSER_ADDRESS, USRSTACK and PS_STRINGS.
This is mainly so that they can be variable even for the native abi, based
on different machine types. Get stack protections from the sysentvec too.
This makes it trivial to map the stack non-executable for certain abis, on
machines that support it.


# 102370 24-Aug-2002 alc

o Use vm_object_lock() in place of Giant when manipulating a vm object
in vm_map_insert().


# 100630 24-Jul-2002 alc

o Merge vm_fault_wire() and vm_fault_user_wire() by adding a new parameter,
user_wire.


# 100384 20-Jul-2002 peter

Infrastructure tweaks to allow having both an Elf32 and an Elf64 executable
handler in the kernel at the same time. Also, allow for the
exec_new_vmspace() code to build a different sized vmspace depending on
the executable environment. This is a big help for execing i386 binaries
on ia64. The ELF exec code grows the ability to map partial pages when
there is a page size difference, eg: emulating 4K pages on 8K or 16K
hardware pages.

Flesh out the i386 emulation support for ia64. At this point, the only
binary that I know of that fails is cvsup, because the cvsup runtime
tries to execute code in pages not marked executable.

Obtained from: dfr (mostly, many tweaks from me).


# 100309 18-Jul-2002 peter

(VM_MAX_KERNEL_ADDRESS - KERNBASE) / PAGE_SIZE may not fit in an integer.
Use lmin(long, long), not min(u_int, u_int). This is a problem here on
ia64 which has *way* more than 2^32 pages of KVA. 281474976710655 pages
to be precise.
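
Illustratively (the local names are hypothetical; the constants are
from the text above):

	/*
	 * min() takes u_int arguments; this page count needs more than
	 * 32 bits on ia64, so min() would silently drop the high bits.
	 */
	long npages = (VM_MAX_KERNEL_ADDRESS - KERNBASE) / PAGE_SIZE;
	maxentries = lmin(maxentries, npages);	/* long-safe minimum */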


# 99893 12-Jul-2002 alc

o Assert GIANT_REQUIRED on system maps in _vm_map_lock(),
_vm_map_lock_read(), and _vm_map_trylock(). Submitted by: tegge
o Remove GIANT_REQUIRED from kmem_alloc_wait() and kmem_free_wakeup().
(This clears the way for exec_map accesses to move outside of Giant.
The exec_map is not a system map.)
o Remove some premature MPSAFE comments.

Reviewed by: tegge


# 99754 11-Jul-2002 alc

o Add a "needs wakeup" flag to the vm_map for use by kmem_alloc_wait()
and kmem_free_wakeup(). Previously, kmem_free_wakeup() always
called wakeup(). In general, no one was sleeping.
o Export vm_map_unlock_and_wait() and vm_map_wakeup() from vm_map.c
for use in vm_kern.c.
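
Sketch of the handshake between the two sides (function names as
exported by this commit; argument details abbreviated):

	/* kmem_alloc_wait(): advertise interest before sleeping. */
	map->needs_wakeup = TRUE;
	vm_map_unlock_and_wait(map, 0);

	/* kmem_free_wakeup(): wake up only if someone is waiting. */
	if (map->needs_wakeup) {
		map->needs_wakeup = FALSE;
		vm_map_wakeup(map);
	}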


# 99374 03-Jul-2002 alc

o Make the reservation of KVA space for kernel map entries a function
of the KVA space's size in addition to the amount of physical memory
and reduce it by a factor of two.

Under the old formula, our reservation amounted to one kernel map entry
per virtual page in the KVA space on a 4GB i386.


# 98892 26-Jun-2002 iedowse

Avoid using the 64-bit vm_pindex_t in a few places where 64-bit
types are not required, as the overhead is unnecessary:

o In the i386 pmap_protect(), `sindex' and `eindex' represent page
indices within the 32-bit virtual address space.
o In swp_pager_meta_build() and swp_pager_meta_ctl(), use a temporary
variable to store the low few bits of a vm_pindex_t that gets used
as an array index.
o vm_uiomove() uses `osize' and `idx' for page offsets within a
map entry.
o In vm_object_split(), `idx' is a page offset within a map entry.


# 98848 26-Jun-2002 dillon

Enforce RLIMIT_VMEM on growable mappings (aka the primary stack or any
MAP_STACK mapping).

Suggested by: alc


# 98624 22-Jun-2002 alc

o In vm_map_insert(), replace GIANT_REQUIRED by the acquisition and
release of Giant around the direct manipulation of the vm_object and
the optional call to pmap_object_init_pt().
o In vm_map_findspace(), remove GIANT_REQUIRED. Instead, acquire and
release Giant around the occasional call to pmap_growkernel().
o In vm_map_find(), remove GIANT_REQUIRED.


# 98541 21-Jun-2002 alc

o Remove GIANT_REQUIRED from vm_map_stack().


# 98414 19-Jun-2002 alc

o Replace GIANT_REQUIRED in vm_object_coalesce() by the acquisition and
release of Giant.
o Reduce the scope of GIANT_REQUIRED in vm_map_insert().

These changes will enable us to remove the acquisition and release
of Giant from obreak().


# 98397 18-Jun-2002 alc

o Remove LK_CANRECURSE from the vm_map lock.


# 98361 17-Jun-2002 jeff

- Introduce the new M_NOVM option which tells uma to only check the currently
allocated slabs and bucket caches for free items. It will not go ask the vm
for pages. This differs from M_NOWAIT in that it not only doesn't block, it
doesn't even ask.

- Add a new zcreate option ZONE_VM, that sets the BUCKETCACHE zflag. This
tells uma that it should only allocate buckets out of the bucket cache, and
not from the VM. It does this by using the M_NOVM option to zalloc when
getting a new bucket. This is so that the VM doesn't recursively enter
itself while trying to allocate buckets for vm_map_entry zones. If there
are already allocated buckets when we get here we'll still use them but
otherwise we'll skip it.

- Use the ZONE_VM flag on vm map entries and pv entries on x86.


# 98343 17-Jun-2002 alc

o Acquire and release Giant in vm_map_wakeup() to prevent
a lost wakeup().

Reviewed by: tegge


# 98226 14-Jun-2002 alc

o Use vm_map_wire() and vm_map_unwire() in place of vm_map_pageable() and
vm_map_user_pageable().
o Remove vm_map_pageable() and vm_map_user_pageable().
o Remove vm_map_clear_recursive() and vm_map_set_recursive(). (They were
only used by vm_map_pageable() and vm_map_user_pageable().)

Reviewed by: tegge


# 98142 12-Jun-2002 alc

o Acquire and release Giant in vm_map_unlock_and_wait().

Submitted by: tegge


# 98119 11-Jun-2002 alc

o Properly handle a failure by vm_fault_wire() or vm_fault_user_wire()
in vm_map_wire().
o Make two white-space changes in vm_map_wire().

Reviewed by: tegge


# 98109 11-Jun-2002 alc

o Teach vm_map_delete() to respect the "in-transition" flag
on a vm_map_entry by sleeping until the flag is cleared.

Submitted by: tegge
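
In outline (flag names per the in-transition machinery introduced
below; a sketch only):

	if ((entry->eflags & MAP_ENTRY_IN_TRANSITION) != 0) {
		/* Another thread is wiring or unwiring this entry. */
		entry->eflags |= MAP_ENTRY_NEEDS_WAKEUP;
		vm_map_unlock_and_wait(map, 0);
		vm_map_lock(map);
		/* The map may have changed: look the entry up again. */
		goto again;
	}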


# 98083 10-Jun-2002 alc

o In vm_map_entry_create(), call uma_zalloc() with M_NOWAIT on system maps.
Submitted by: tegge
o Eliminate the "!mapentzone" check from vm_map_entry_create() and
vm_map_entry_dispose(). Reviewed by: tegge
o Fix white-space usage in vm_map_entry_create().


# 98071 09-Jun-2002 alc

o Add vm_map_wire() for wiring contiguous regions of either kernel
or user vm_maps. This implementation has two key benefits when compared
to vm_map_{user_,}pageable(): (1) it avoids a race condition through
the use of "in-transition" vm_map entries and (2) it eliminates lock
recursion on the vm_map.

Note: there is still an error case that requires clean up.

Reviewed by: tegge


# 98052 08-Jun-2002 alc

o Simplify vm_map_unwire() by merging the second and third passes
over the caller-specified region.


# 98036 08-Jun-2002 alc

o Remove an unnecessary call to vm_map_wakeup() from vm_map_unwire().
o Add a stub for vm_map_wire().

Note: the description of the previous commit had an error. The in-
transition flag actually blocks the deallocation of a vm_map_entry by
vm_map_delete() and vm_map_simplify_entry().


# 98022 07-Jun-2002 alc

o Add vm_map_unwire() for unwiring contiguous regions of either kernel
or user vm_maps. In accordance with the standards for munlock(2),
and in contrast to vm_map_user_pageable(), this implementation does not
allow holes in the specified region. This implementation uses the
"in transition" flag described below.
o Introduce a new flag, "in transition," to the vm_map_entry.
Eventually, vm_map_delete() and vm_map_simplify_entry() will respect
this flag by deallocating in-transition vm_map_entrys, allowing
the vm_map lock to be safely released in vm_map_unwire() and (the
forthcoming) vm_map_wire().
o Modify vm_map_simplify_entry() to respect the in-transition flag.

In collaboration with: tegge


# 97753 02-Jun-2002 alc

o Migrate vm_map_split() from vm_map.c to vm_object.c, renaming it
to vm_object_split(). Its interface should still be changed
to resemble vm_object_shadow().


# 97747 02-Jun-2002 alc

o Style fixes to vm_map_split(), including the elimination of one variable
declaration that shadows another.

Note: This function should really be vm_object_split(), not vm_map_split().

Reviewed by: md5


# 97727 01-Jun-2002 alc

o Remove GIANT_REQUIRED from vm_map_zfini(), vm_map_zinit(),
vm_map_create(), and vm_map_submap().
o Make further use of a local variable in vm_map_entry_splay()
that caches a reference to one of a vm_map_entry's children.
(This reduces code size somewhat.)
o Revert a part of revision 1.66, deinlining vmspace_pmap().
(This function is MPSAFE.)


# 97710 01-Jun-2002 alc

o Revert a part of revision 1.66, contrary to what that commit message says,
deinlining vm_map_entry_behavior() and vm_map_entry_set_behavior()
actually increases the kernel's size.
o Make vm_map_entry_set_behavior() static and add a comment describing
its purpose.
o Remove an unnecessary initialization statement from vm_map_entry_splay().


# 97648 31-May-2002 alc

Further work on pushing Giant out of the vm_map layer and down
into the vm_object layer:
o Acquire and release Giant in vm_object_shadow() and
vm_object_page_remove().
o Remove the GIANT_REQUIRED assertion preceding vm_map_delete()'s call
to vm_object_page_remove().
o Remove the acquisition and release of Giant around vm_map_lookup()'s
call to vm_object_shadow().


# 97294 26-May-2002 alc

o Acquire and release Giant around pmap operations in vm_fault_unwire()
and vm_map_delete(). Assert GIANT_REQUIRED in vm_map_delete()
only if operating on the kernel_object or the kmem_object.
o Remove GIANT_REQUIRED from vm_map_remove().
o Remove the acquisition and release of Giant from munmap().


# 97198 23-May-2002 alc

o Replace the vm_map's hint by the root of a splay tree. By design,
the last accessed datum is moved to the root of the splay tree.
Therefore, on lookups in which the hint resulted in O(1) access,
the splay tree still achieves O(1) access. In contrast, on lookups
in which the hint failed miserably, the splay tree achieves amortized
logarithmic complexity, resulting in dramatic improvements on vm_maps
with a large number of entries. For example, the execution time
for replaying an access log from www.cs.rice.edu against the thttpd
web server was reduced by 23.5% due to the large number of files
simultaneously mmap()ed by this server. (The machine in question has
enough memory to cache most of this workload.)

Nothing comes for free: At present, I see a 0.2% slowdown on "buildworld"
due to the overhead of maintaining the splay tree. I believe that
some or all of this can be eliminated through optimizations
to the code.

Developed in collaboration with: Juan E Navarro <jnavarro@cs.rice.edu>
Reviewed by: jeff


# 96839 18-May-2002 alc

o Remove GIANT_REQUIRED from vm_map_madvise(). Instead, acquire and
release Giant around vm_map_madvise()'s call to pmap_object_init_pt().
o Replace GIANT_REQUIRED in vm_object_madvise() with the acquisition
and release of Giant.
o Remove the acquisition and release of Giant from madvise().


# 96469 12-May-2002 alc

o Remove GIANT_REQUIRED and an excessive number of blank lines
from vm_map_inherit(). (minherit() need not acquire Giant
anymore.)


# 96441 12-May-2002 alc

o Acquire and release Giant in vm_object_reference() and
vm_object_deallocate(), replacing the assertion GIANT_REQUIRED.
o Remove GIANT_REQUIRED from vm_map_protect() and vm_map_simplify_entry().
o Acquire and release Giant around vm_map_protect()'s call to pmap_protect().

Altogether, these changes eliminate the need for mprotect() to acquire
and release Giant.


# 96087 05-May-2002 alc

o Move vm_freeze_copyopts() from vm_map.{c,h} to vm_object.{c,h}. It's plainly
an operation on a vm_object and belongs in the latter place.


# 96080 05-May-2002 alc

o Condition the compilation of uiomoveco() and vm_uiomove()
on ENABLE_VFS_IOOPT.
o Add a comment to the effect that this code is experimental
support for zero-copy I/O.


# 96056 05-May-2002 alc

o Remove GIANT_REQUIRED from vm_map_lookup() and vm_map_lookup_done().
o Acquire and release Giant around vm_map_lookup()'s call
to vm_object_shadow().


# 96007 04-May-2002 alc

o Remove GIANT_REQUIRED from vm_map_lookup_entry() and
vm_map_check_protection().
o Call vm_map_check_protection() without Giant held in munmap().


# 95942 02-May-2002 alc

o Change the implementation of vm_map locking to use exclusive locks
exclusively. The interface still, however, distinguishes
between a shared lock and an exclusive lock.


# 95901 02-May-2002 alc

o Remove dead and lockmgr()-specific debugging code.


# 95758 29-Apr-2002 jeff

Add a new zone flag UMA_ZONE_MTXCLASS. This puts the zone in its own
mutex class. Currently this is only used for kmapentzone because kmapents
are potentially allocated when freeing memory. This is not dangerous
though because no other allocations will be done while holding the
kmapentzone lock.


# 95686 28-Apr-2002 alc

Pass the caller's file name and line number to the vm_map locking functions.


# 95610 28-Apr-2002 alc

o Introduce and use vm_map_trylock() to replace several direct uses
of lockmgr().
o Add missing synchronization to vmspace_swap_count(): Obtain a read lock
on the vm_map before traversing it.


# 95589 27-Apr-2002 alc

o Begin documenting the (existing) locking protocol on the vm_map
in the same style as sys/proc.h.
o Undo the de-inlining of several trivial, MPSAFE methods on the vm_map.
(Contrary to the commit message for vm_map.h revision 1.66 and vm_map.c
revision 1.206, de-inlining these methods increased the kernel's size.)


# 94921 17-Apr-2002 peter

Do not free the vmspace until p->p_vmspace is set to null. Otherwise
statclock can access it in the tail end of statclock_process() at an
unfortunate time. This bit me several times on an SMP alpha (UP2000)
and the problem went away with this change. I'm not sure why it doesn't
break x86 as well. Maybe it's because the clocks are much faster
on alpha (HZ=1024 by default).


# 94777 15-Apr-2002 peter

Pass vm_page_t instead of physical addresses to pmap_zero_page[_area]()
and pmap_copy_page(). This gets rid of a couple more physical addresses
in upper layers, with the eventual aim of supporting PAE and dealing with
the physical addressing mostly within pmap. (We will need either 64 bit
physical addresses or page indexes, possibly both depending on the
circumstances. Leaving this to pmap itself gives more flexibility.)

Reviewed by: jake
Tested on: i386, ia64 and (I believe) sparc64. (my alpha was hosed)


# 92748 20-Mar-2002 jeff

Remove references to vm_zone.h and switch over to the new uma API.


# 92692 19-Mar-2002 jeff

Quiet a warning introduced by UMA. This only occurs on machines where
vm_size_t != unsigned long.

Reviewed by: phk


# 92654 19-Mar-2002 jeff

This is the first part of the new kernel memory allocator. This replaces
malloc(9) and vm_zone with a slab like allocator.

Reviewed by: arch@


# 92588 18-Mar-2002 green

Back out the modification of vm_map locks from lockmgr to sx locks. The
best path forward now is likely to change the lockmgr locks to simple
sleep mutexes, then see if any extra contention it generates is greater
than the removed overhead of managing local locking state information,
cost of extra calls into lockmgr, etc.

Additionally, making the vm_map lock a mutex and respecting it properly
will put us much closer to not needing Giant magic in vm.


# 92466 17-Mar-2002 alc

Acquire a read lock on the map inside of vm_map_check_protection() rather
than expecting the caller to do so. This (1) eliminates duplicated code in
kernacc() and useracc() and (2) fixes missing synchronization in munmap().


# 92246 13-Mar-2002 green

Rename SI_SUB_MUTEX to SI_SUB_MTX_POOL to make the name at all accurate.
While doing this, move it earlier in the sysinit boot process so that the
VM system can use it.

After that, the system is now able to use sx locks instead of lockmgr
locks in the VM system. To accomplish this, some of the more
questionable uses of the locks (such as testing whether they are
owned or not, as well as allowing shared+exclusive recursion) are
removed, and simpler logic throughout is used so locks should also be
easier to understand.

This has been tested on my laptop for months, and has not shown any
problems on SMP systems, either, so appears quite safe. One more
user of lockmgr down, many more to go :)


# 92029 10-Mar-2002 eivind

- Remove a number of extra newlines that do not belong here according to
style(9)
- Minor space adjustment in cases where we have "( ", " )", if(), return(),
while(), for(), etc.
- Add /* SYMBOL */ after a few #endifs.

Reviewed by: alc


# 91777 07-Mar-2002 dillon

Fix a bug in the vm_map_clean() procedure. msync()ing an area of memory
that has just been mapped MAP_ANON|MAP_NOSYNC and has not yet been accessed
will panic the machine.

MFC after: 1 day


# 90263 05-Feb-2002 alfred

Fix a race with free'ing vmspaces at process exit when vmspaces are
shared.

Also introduce vm_endcopy instead of using pointer tricks when
initializing new vmspaces.

The race occurred because of how the reference was utilized:
test vmspace reference,
possibly block,
decrement reference

When sharing a vmspace between multiple processes it was possible
for two processes exiting at the same time to test the reference
count, possibly block and neither one free because they wouldn't
see the other's update.

Submitted by: green


# 85762 31-Oct-2001 dillon

Don't let pmap_object_init_pt() exhaust all available free pages
(allocating pv entries w/ zalloci) when called in a loop due to
an madvise(). It is possible to completely exhaust the free page list and
cause a system panic when an expected allocation fails.


# 84932 14-Oct-2001 tegge

Fix locking violations during page wiring:

- vm map entries are not valid after the map has been unlocked.

- An exclusive lock on the map is needed before calling
vm_map_simplify_entry().

Fix cleanup after page wiring failure to unwire all pages that had been
successfully wired before the failure was detected.

Reviewed by: dillon


# 84812 11-Oct-2001 jhb

Add missing includes of sys/ktr.h.


# 84783 10-Oct-2001 ps

Make MAXTSIZ, DFLDSIZ, MAXDSIZ, DFLSSIZ, MAXSSIZ, SGROWSIZ loader
tunable.

Reviewed by: peter
MFC after: 2 weeks


# 83366 12-Sep-2001 julian

KSE Milestone 2
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to the previous -current except
that there is a thread associated with each process.

Sorry john! (your next MFC will be a doosie!)

Reviewed by: peter@freebsd.org, dillon@freebsd.org

X-MFC after: ha ha ha ha


# 79248 04-Jul-2001 dillon

Change inlines back into mainline code in preparation for mutexing. Also,
most of these inlines had been bloated in -current far beyond their
original intent. Normalize prototypes and function declarations to be ANSI
only (half already were). And do some general cleanup.

(kernel size also reduced by 50-100K, but that isn't the prime intent)


# 79224 04-Jul-2001 dillon

With Alfred's permission, remove vm_mtx in favor of a fine-grained approach
(this commit is just the first stage). Also add various GIANT_ macros to
formalize the removal of Giant, making it easy to test in a more piecemeal
fashion. These macros will allow us to test fine-grained locks to a degree
before removing Giant, and also after, and to remove Giant in a piecemeal
fashion via sysctl's on those subsystems which the authors believe can
operate without Giant.


# 78592 22-Jun-2001 bmilekic

Introduce numerous SMP friendly changes to the mbuf allocator. Namely,
introduce a modified allocation mechanism for mbufs and mbuf clusters; one
which can scale under SMP and which offers the possibility of resource
reclamation to be implemented in the future. Notable advantages:

o Reduce contention for SMP by offering per-CPU pools and locks.
o Better use of data cache due to per-CPU pools.
o Much less code cache pollution due to excessively large allocation macros.
o Framework for `grouping' objects from same page together so as to be able
to possibly free wired-down pages back to the system if they are no longer
needed by the network stacks.

Additional things changed with this addition:

- Moved some mbuf specific declarations and initializations from
sys/conf/param.c into mbuf-specific code where they belong.
- m_getclr() has been renamed to m_get_clrd() because the old name is really
confusing. m_getclr() HAS been preserved though and is defined to the new
name. No tree sweep has been done "to change the interface," as the old
name will continue to be supported and is not deprecated. The change was
merely done because m_getclr() sounds too much like "m_get a cluster."
- TEMPORARILY disabled mbtypes statistics displaying in netstat(1) and
systat(1) (see TODO below).
- Fixed systat(1) to display number of "free mbufs" based on new per-CPU
stat structures.
- Fixed netstat(1) to display new per-CPU stats based on sysctl-exported
per-CPU stat structures. All infos are fetched via sysctl.

TODO (in order of priority):

- Re-enable mbtypes statistics in both netstat(1) and systat(1) after
introducing an SMP friendly way to collect the mbtypes stats under the
already introduced per-CPU locks (i.e. hopefully don't use atomic() - it
seems too costly for a mere stat update, especially when other locks are
already present).
- Optionally have systat(1) display not only "total free mbufs" but also
"total free mbufs per CPU pool."
- Fix minor length-fetching issues in netstat(1) related to recently
re-enabled option to read mbuf stats from a core file.
- Move reference counters at least for mbuf clusters into an unused portion
of the cluster itself, to save space and need to allocate a counter.
- Look into introducing resource freeing possibly from a kproc.

Reviewed by (in parts): jlemon, jake, silby, terry
Tested by: jlemon (Intel & Alpha), mjacob (Intel & Alpha)
Preliminary performance measurements: jlemon (and me, obviously)
URL: http://people.freebsd.org/~bmilekic/mb_alloc/


# 78099 11-Jun-2001 dillon

Cleanup the tabbing


# 77948 09-Jun-2001 dillon

Two fixes to the out-of-swap process termination code. First, start killing
processes a little earlier to avoid a deadlock. Second, when calculating
the 'largest process' do not just count RSS. Instead count the RSS + SWAP
used by the process. Without this the code tended to kill small
inconsequential processes like, oh, sshd, rather than one of the many
'eatmem 200MB' I run on a whim :-). This fix has been extensively tested on
-stable and somewhat tested on -current and will be MFCd in a few days.

Shamed into fixing this by: ps


# 77090 23-May-2001 jhb

- Add lots of vm_mtx assertions.
- Add a few KTR tracepoints to track the addition and removal of
vm_map_entry's and the creation and free'ing of vmspace's.
- Adjust a few portions of code so that we update the process' vmspace
pointer to its new vmspace before freeing the old vmspace.


# 76827 18-May-2001 alfred

Introduce a global lock for the vm subsystem (vm_mtx).

vm_mtx does not recurse and is required for most low level
vm operations.

faults can not be taken without holding Giant.

Memory subsystems can now call the base page allocators safely.

Almost all atomic ops were removed as they are covered under the
vm mutex.

Alpha and ia64 now need to catch up to i386's trap handlers.

FFS and NFS have been tested, other filesystems will need minor
changes (grabbing the vm lock when twiddling page properties).

Reviewed (partially) by: jake, jhb


# 76166 01-May-2001 markm

Undo part of the tangle of having sys/lock.h and sys/mutex.h included in
other "system" header files.

Also help the deprecation of lockmgr.h by making it a sub-include of
sys/lock.h and removing sys/lockmgr.h from kernel .c files.

Sort sys/*.h includes where possible in affected files.

OK'ed by: bde (with reservations)


# 75452 12-Apr-2001 alfred

remove truncated part from comment


# 74237 14-Mar-2001 dillon

Fix a lock reversal problem in the VM subsystem related to threaded
programs. There is a case during a fork() which can cause a deadlock.

From Tor -
The workaround consists of setting a flag in the vm map that
indicates that a fork is in progress and using that mark in the page
fault handling to force a revalidation failure. That change will only
affect (pessimize) page fault handling during fork for threaded
(linuxthreads style) applications and applications using aio_*().

Submitted by: tegge


# 74235 14-Mar-2001 dillon

Temporarily remove the vm_map_simplify() call from vm_map_insert(). The
call is correct, but it interferes with the massive hack called
vm_map_growstack(). The call will be returned after our stack handling
code is fixed.

Reported by: tegge


# 74042 09-Mar-2001 iedowse

When creating a shadow vm_object in vmspace_fork(), only one
reference count was transferred to the new object, but both the
new and the old map entries had pointers to the new object.
Correct this by transferring the second reference.

This fixes a panic that can occur when mmap(2) is used with the
MAP_INHERIT flag.

PR: i386/25603
Reviewed by: dillon, alc


# 71983 04-Feb-2001 dillon

This commit represents work mainly submitted by Tor and slightly modified
by myself. It solves a serious vm_map corruption problem that can occur
with the buffer cache when block sizes > 64K are used. This code has been
heavily tested in -stable but only tested somewhat on -current. An MFC
will occur in a few days. My additions include the vm_map_simplify_entry()
and a minor buffer cache boundary case fix.

Make the buffer cache use a system map for buffer cache KVM rather than a
normal map.

Ensure that VM objects are not allocated for system maps. There were cases
where a buffer map could wind up with a backing VM object -- normally
harmless, but this could also result in the buffer cache blocking in places
where it assumes no blocking will occur, possibly resulting in corrupted
maps.

Fix a minor boundary case when the buffer cache size limit is reached that
could result in non-optimal code.

Add vm_map_simplify_entry() calls to prevent 'creeping proliferation'
of vm_map_entry's in the buffer cache's vm_map. Previously only a simple
linear optimization was made. (The buffer vm_map typically has only a
handful of vm_map_entry's. This stabilizes it at that level permanently).

PR: 20609
Submitted by: (Tor Egge) tegge


# 69972 13-Dec-2000 tanimura

- If swap metadata does not fit into the KVM, reduce the number of
struct swblock entries by dividing the number of the entries by 2
until the swap metadata fits.

- Reject swapon(2) upon failure of swap_zone allocation.

This is just a temporary fix. Better solutions include:
(suggested by: dillon)

o reserving swap in SWAP_META_PAGES chunks, and
o swapping the swblock structures themselves.

Reviewed by: alfred, dillon
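
A self-contained sketch of the fallback strategy, assuming a
hypothetical try_alloc() callback in place of the real zone setup:

    #include <stddef.h>

    /* Try an entry count, halving it until the allocation succeeds. */
    static void *
    alloc_halving(size_t nentries, void *(*try_alloc)(size_t))
    {
            void *z = NULL;

            while (nentries > 0 && (z = try_alloc(nentries)) == NULL)
                    nentries /= 2;  /* halve until the metadata fits */
            return (z);             /* NULL => reject swapon(2) */
    }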


# 68261 02-Nov-2000 tegge

Clear the MAP_ENTRY_USER_WIRED flag from cloned vm_map entries.
PR: 2840


# 66615 03-Oct-2000 jasone

Convert lockmgr locks from using simple locks to using mutexes.

Add lockdestroy() and appropriate invocations, which corresponds to
lockinit() and must be called to clean up after a lockmgr lock is no
longer needed.


# 60557 14-May-2000 dillon

Fixed bug in madvise() / MADV_WILLNEED. When the request is offset
from the base of the first map_entry the call to pmap_object_init_pt()
uses the wrong start VA. MFC to follow.

PR: i386/18095
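
A userland sketch of the affected pattern (hypothetical file name and
sizes, error checking omitted); the request begins four pages past the
base of the mapping:

    #include <sys/mman.h>
    #include <fcntl.h>

    int
    main(void)
    {
            int fd = open("/tmp/data", O_RDONLY);   /* hypothetical */
            char *p = mmap(NULL, 8 * 4096, PROT_READ, MAP_PRIVATE, fd, 0);

            /* The bug made pmap_object_init_pt() prefault from the
             * wrong start VA for exactly this offset case. */
            madvise(p + 4 * 4096, 4 * 4096, MADV_WILLNEED);
            return (0);
    }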


# 58705 27-Mar-2000 charnier

Revert spelling mistake I made in the previous commit
Requested by: Alan and Bruce


# 58634 26-Mar-2000 charnier

Spelling


# 57550 28-Feb-2000 ps

Add MAP_NOCORE to mmap(2), and MADV_NOCORE and MADV_CORE to madvise(2).
This feature allows you to specify if mmap'd data is included in
an application's corefile.

Change the type of eflags in struct vm_map_entry from u_char to
vm_eflags_t (an unsigned int).

Reviewed by: dillon,jdp,alfred
Approved by: jkh
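
A minimal usage sketch (the helper name is hypothetical):

    #include <sys/types.h>
    #include <sys/mman.h>

    /* Anonymous memory that stays out of core dumps, e.g. for
     * key material. */
    static void *
    alloc_nocore(size_t len)
    {
            void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                MAP_ANON | MAP_NOCORE, -1, 0);

            /* MADV_NOCORE/MADV_CORE toggle the same property on an
             * already-mapped region; shown here for illustration. */
            if (p != MAP_FAILED)
                    (void)madvise(p, len, MADV_NOCORE);
            return (p);
    }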


# 57263 16-Feb-2000 dillon

Fix null-pointer dereference crash when the system is intentionally
run out of KVM through a mmap()/fork() bomb that allocates hundreds
of thousands of vm_map_entry structures.

Add panic to make null-pointer dereference crash a little more verbose.

Add a new sysctl, vm.max_proc_mmap, which specifies the maximum number
of mmap()'d spaces (discrete vm_map_entry's in the process). The value
defaults to around 9000 for a 128MB machine. The test is scaled for the
number of processes sharing a vmspace (aka linux threads). Setting
the value to 0 disables the feature.

PR: kern/16573
Approved by: jkh
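
The knob can be tuned with sysctl(8), e.g. `sysctl -w
vm.max_proc_mmap=0' to disable the check, or read programmatically;
a minimal sketch:

    #include <sys/types.h>
    #include <sys/sysctl.h>
    #include <stdio.h>

    int
    main(void)
    {
            int max;
            size_t len = sizeof(max);

            if (sysctlbyname("vm.max_proc_mmap", &max, &len, NULL, 0) == 0)
                    printf("vm.max_proc_mmap = %d\n", max);
            return (0);
    }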


# 56378 21-Jan-2000 dillon

Fix a deadlock between msync(..., MS_INVALIDATE) and vm_fault. The
invalidation code cannot wait for paging to complete while holding a
vnode lock, so we don't wait. Instead we simply allow the lower level
code to block on any busy pages it encounters. I think Yahoo
may be the only entity in the entire world that actually uses this
msync feature :-).

Bug reported by: Paul Saab <paul@mu.org>
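
The call in question, as a userland fragment (p and len refer to an
existing file-backed mapping):

    #include <sys/mman.h>

    /* Discard the cached pages backing the mapping; with this fix the
     * invalidation no longer deadlocks against in-flight paging. */
    (void)msync(p, len, MS_INVALIDATE);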


# 54467 12-Dec-1999 dillon

Add MAP_NOSYNC feature to mmap(), and MADV_NOSYNC and MADV_AUTOSYNC to
madvise().

This feature prevents the update daemon from gratuitously flushing
dirty pages associated with a mapped file-backed region of memory. The
system pager will still page the memory as necessary and the VM system
will still be fully coherent with the filesystem. Modifications made
by other means to the same area of memory, for example by write(), are
unaffected. The feature works on a page-granularity basis.

MAP_NOSYNC allows one to use mmap() to share memory between processes
without incurring any significant filesystem overhead, putting it in
the same performance category as SysV Shared memory and anonymous memory.

Reviewed by: julian, alc, dg
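
A minimal sketch of the intended use (the helper name is hypothetical):

    #include <sys/types.h>
    #include <sys/mman.h>

    /* Map a file for shared-memory style use without the update
     * daemon periodically flushing dirtied pages. */
    static void *
    map_shared_nosync(int fd, size_t len)
    {
            return (mmap(NULL, len, PROT_READ | PROT_WRITE,
                MAP_SHARED | MAP_NOSYNC, fd, 0));
    }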


# 53701 25-Nov-1999 alc

Remove nonsensical vm_map_{clear,set}_recursive() calls
from vm_map_pageable(). At the point they are called, vm_map_pageable()
holds a read (or shared) lock on the map. The purpose
of vm_map_{clear,set}_recursive() is to disable/enable repeated
write (or exclusive) lock requests by the same process.


# 53627 23-Nov-1999 alc

Correct the following error: vm_map_pageable() on a COW'ed (post-fork)
vm_map always failed because vm_map_lookup() looked at
"vm_map_entry->wired_count" instead of "(vm_map_entry->eflags &
MAP_ENTRY_USER_WIRED)". The effect was that many page
wiring operations by sysctl were (silently) failing.
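
The distinction, as an illustrative fragment (not the committed diff):

    /* Old test: the raw wire count, which does not distinguish
     * user wirings (mlock) from other wirings. */
    user_wired = (entry->wired_count != 0);

    /* Corrected test: the explicit user-wire flag. */
    user_wired = ((entry->eflags & MAP_ENTRY_USER_WIRED) != 0);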


# 52973 07-Nov-1999 alc

Remove unused #include's.

Submitted by: phk


# 52960 07-Nov-1999 alc

The functions declared by this header file no longer exist.

Submitted by: phk (in part)


# 52635 29-Oct-1999 phk

useracc() the prequel:

Merge the contents (less some trivial comments bordering on the silly)
of <vm/vm_prot.h> and <vm/vm_inherit.h> into <vm/vm.h>. This puts
the #defines for the vm_inherit_t and vm_prot_t types next to their
typedefs.

This paves the road for the commit to follow shortly: change
useracc() to use VM_PROT_{READ|WRITE} rather than B_{READ|WRITE}
as argument.


# 51493 21-Sep-1999 dillon

cleanup madvise code, add a few more sanity checks.

Reviewed by: Alan Cox <alc@cs.rice.edu>, dg@root.com


# 50477 27-Aug-1999 peter

$Id$ -> $FreeBSD$


# 49697 13-Aug-1999 alc

vm_map_madvise:
A complete rewrite by dillon and myself to separate
the implementation of behaviors that affect the vm_map_entry
from those that affect the vm_object.

A result of this change is that madvise(..., MADV_FREE)
is much cheaper.


# 49592 10-Aug-1999 alc

vm_map_madvise:
Now that behaviors are stored in the vm_map_entry rather than
the vm_object, it's no longer necessary to instantiate a vm_object
just to hold the behavior.

Reviewed by: dillon


# 49338 01-Aug-1999 alc

Move the memory access behavior information provided by madvise
from the vm_object to the vm_map.

Submitted by: dillon


# 48963 21-Jul-1999 alc

Fix the following problem:

When creating new processes (or performing exec), the new page
directory is initialized too early. The kernel might grow before
p_vmspace is initialized for the new process. Since pmap_growkernel
doesn't yet know about the new page directory, it isn't updated, and
subsequent use causes a failure.

The fix is (1) to clear p_vmspace early, to stop pmap_growkernel
from stomping on memory, and (2) to defer part of the initialization
of new page directories until p_vmspace is initialized.

PR: kern/12378
Submitted by: tegge
Reviewed by: dfr


# 48757 11-Jul-1999 alc

Cleanup OBJ_ONEMAPPING management.

vm_map.c:
Don't set OBJ_ONEMAPPING on arbitrary vm objects. Only default
and swap type vm objects should have it set. vm_object_deallocate
already handles these cases.

vm_object.c:
If OBJ_ONEMAPPING isn't already clear in vm_object_shadow,
we are in trouble. Instead of clearing it, make it
an assertion that it is already clear.


# 48409 01-Jul-1999 peter

Fix some int/long printf problems for the Alpha


# 47986 17-Jun-1999 alc

vm_map_growstack uses vmspace::vm_ssize as though it contained
the stack size in bytes when in fact it is the stack size in pages.


# 47968 17-Jun-1999 alc

vm_map_insert sometimes extends an existing vm_map entry, rather than
creating a new entry. vm_map_stack and vm_map_growstack can panic when
a new entry isn't created. Fixed vm_map_stack and vm_map_growstack.

Also, when extending the stack, always set the protection to VM_PROT_ALL.


# 47966 16-Jun-1999 alc

Move vm_map_stack and vm_map_growstack after the definition
of the vm_map_clip_end macro. (The next commit will modify
vm_map_stack and vm_map_growstack to use vm_map_clip_end.)


# 47965 16-Jun-1999 alc

Remove some unused declarations and duplicate initialization.


# 47888 12-Jun-1999 alc

vm_map_protect:
The wrong vm_map_entry is used to determine if writes must not be
allowed due to COW.


# 47568 28-May-1999 alc

Avoid the creation of unnecessary shadow objects.


# 47290 18-May-1999 alc

vm_map_insert:
General cleanup. Eliminate coalescing checks that are duplicated
by vm_object_coalesce.


# 47258 16-May-1999 alc

Add the options MAP_PREFAULT and MAP_PREFAULT_PARTIAL to vm_map_find/insert,
eliminating the need for the pmap_object_init_pt calls in imgact_* and
mmap.

Reviewed by: David Greenman <dg@root.com>


# 47243 16-May-1999 alc

Remove prototypes for functions that don't exist anymore (vm_map.h).

Remove a useless argument from vm_map_madvise's interface (vm_map.c,
vm_map.h, and vm_mmap.c).

Remove a redundant test in vm_uiomove (vm_map.c).

Make two changes to vm_object_coalesce:

1. Determine whether the new range of pages actually overlaps
the existing object's range of pages before calling vm_object_page_remove.
(Prior to this change almost 90% of the calls to vm_object_page_remove
were to remove pages that were beyond the end of the object.)

2. Free any swap space allocated to removed pages.


# 47207 14-May-1999 alc

Simplify vm_map_find/insert's interface: remove the MAP_COPY_NEEDED option.

It never makes sense to specify MAP_COPY_NEEDED without also specifying
MAP_COPY_ON_WRITE, and vice versa. Thus, MAP_COPY_ON_WRITE suffices.

Reviewed by: David Greenman <dg@root.com>


# 45293 04-Apr-1999 alc

Two changes to vm_map_delete:

1. Don't bother checking object->ref_count == 1 in order to set
OBJ_ONEMAPPING. It's a waste of time. If object->ref_count == 1,
vm_map_entry_delete will "run-down" the object and its pages.

2. If object->ref_count == 1, ignore OBJ_ONEMAPPING. Wait for
vm_map_entry_delete to "run-down" the object and its pages.
Otherwise, we're calling two different procedures to delete
the object's pages.

Note: "vmstat -s" will once again show a non-zero value
for "pages freed by exiting processes".


# 45069 27-Mar-1999 alc

Mainly, eliminate the comments about share maps. (We don't have share maps
any more.) Also, eliminate an incorrect comment that says that we don't
coalesce vm_map_entry's. (We do.)


# 44928 21-Mar-1999 alc

Two changes:

Remove more (redundant) map timestamp increments from properly
synchronized routines. (Changed: vm_map_entry_link, vm_map_entry_unlink,
and vm_map_pageable.)

Micro-optimize vm_map_entry_link and vm_map_entry_unlink, eliminating
unnecessary dereferences. At the same time, converted them from macros
to inline functions.
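
A sketch of the resulting shape (field names assumed from the classic
doubly-linked entry list):

    static __inline void
    vm_map_entry_link(vm_map_t map, vm_map_entry_t after_where,
        vm_map_entry_t entry)
    {
            map->nentries++;
            entry->prev = after_where;
            entry->next = after_where->next;
            entry->next->prev = entry;
            after_where->next = entry;
    }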


# 44773 15-Mar-1999 alc

Two changes:

In general, vm_map_simplify_entry should be performed INSIDE
the loop that traverses the map, not outside. (Changed:
vm_map_inherit, vm_map_pageable.)

vm_fault_unwire doesn't acquire the map lock (or block holding
it). Thus, vm_map_set/clear_recursive shouldn't be called.
(Changed: vm_map_user_pageable, vm_map_pageable.)


# 44597 09-Mar-1999 alc

Remove (redundant) map timestamp increments from some properly
synchronized routines.


# 44569 08-Mar-1999 alc

Remove an unused variable from vmspace_fork.


# 44565 07-Mar-1999 alc

Change vm_map_growstack to acquire and hold a read lock (instead of a write
lock) until it actually needs to modify the vm_map.

Note: it is legal to modify vm_map::hint without holding a write lock.

Submitted by: "Richard Seaman, Jr." <dick@tar.com> with minor changes
by myself.


# 44396 02-Mar-1999 alc

Remove the last of the share map code: struct vm_map::is_main_map.

Reviewed by: Matthew Dillon <dillon@apollo.backplane.com>


# 44245 24-Feb-1999 dillon

Remove unnecessary page protects on map_split and collapse operations.
Fix a bug where an object's OBJ_WRITEABLE/OBJ_MIGHTBEDIRTY flags do
not get set under certain circumstances (page rename case).

Reviewed by: Alan Cox <alc@cs.rice.edu>, John Dyson


# 44146 19-Feb-1999 luoqi

Hide access to vmspace::vm_pmap with inline function vmspace_pmap(). This
is the preparation step for moving pmap storage out of vmspace proper.

Reviewed by: Alan Cox <alc@cs.rice.edu>
Matthew Dillon <dillon@apollo.backplane.com>


# 44135 19-Feb-1999 dillon

Submitted by: Alan Cox <alc@cs.rice.edu>

Remove remaining share map garbage from vm_map_lookup() and clean out
old #if 0 stuff.


# 43923 12-Feb-1999 dillon

Fix non-fatal bug in vm_map_insert() which improperly cleared
OBJ_ONEMAPPING in the case where an object is extended and an
additional vm_map_entry must be allocated.

In vm_object_madvise(), remove call to vm_page_cache() in MADV_FREE
case in order to avoid a page fault on page reuse. However, we still
mark the page as clean and destroy any swap backing store.

Submitted by: Alan Cox <alc@cs.rice.edu>


# 43748 07-Feb-1999 dillon

Remove MAP_ENTRY_IS_A_MAP 'share' maps. These maps were once used to
attempt to optimize forks but were essentially given up on due to
problems and replaced with an explicit dup of the vm_map_entry structure.
Prior to the removal, they were entirely unused.


# 43547 02-Feb-1999 dillon

Submitted by: Alan Cox

The vm_map_insert()/vm_object_coalesce() optimization has been extended
to include OBJT_SWAP objects as well as OBJT_DEFAULT objects. This is
possible because it costs nothing to extend an OBJT_SWAP object with
the new swapper. We can't do this with the old swapper. The old swapper
used a linear array that would have had to have been reallocated, costing
time as well as a potential low-memory deadlock.


# 43493 01-Feb-1999 dillon

This patch eliminates a pointless test that appeared twice
in vm_map_simplify_entry. Basically, once you've verified that
the objects in the adjacent vm_map_entry's are the same, either
NULL or the same vm_object, there's no point in checking that the
objects have the same behavior.

Obtained from: Alan Cox <alc@cs.rice.edu>


# 43476 31-Jan-1999 julian

Submitted by: Alan Cox <alc@cs.rice.edu>
Checked by: "Richard Seaman, Jr." <dick@tar.com>
Fix the following problem:
As the code stands now, growing any stack, and not just the process's
main stack, modifies vm->vm_ssize. This is inconsistent with the code
earlier in the same procedure.


# 43311 27-Jan-1999 dillon

Fix warnings in preparation for adding -Wall -Wcast-qual to the
kernel compile


# 43209 26-Jan-1999 julian

Mostly remove the VM_STACK OPTION.
This changes the definitions of a few items so that structures are the
same whether or not the option itself is enabled. This allows
people to enable and disable the option without recompiling the world.

As the author says:

|I ran into a problem pulling out the VM_STACK option. I was aware of this
|when I first did the work, but then forgot about it. The VM_STACK stuff
|has some code changes in the i386 branch. There need to be corresponding
|changes in the alpha branch before it can come out completely.

what is done:
|
|1) Pull the VM_STACK option out of the header files it appears in. This
|really shouldn't affect anything that executes with or without the rest
|of the VM_STACK patches. The vm_map_entry will then always have one
|extra element (avail_ssize). It just won't be used if the VM_STACK
|option is not turned on.
|
|I've also pulled the option out of vm_map.c. This shouldn't harm anything,
|since the routines that are enabled as a result are not called unless
|the VM_STACK option is enabled elsewhere.
|
|2) Add what appears to be appropriate code to the alpha branch, still
|protected behind the VM_STACK switch. I don't have an alpha machine,
|so we would need to get some testers with alpha machines to try it out.
|
|Once there is some testing, we can consider making the change permanent
|for both i386 and alpha.
|
[..]
|
|Once the alpha code is adequately tested, we can pull VM_STACK out
|everywhere.
|

Submitted by: "Richard Seaman, Jr." <dick@tar.com>


# 43138 24-Jan-1999 dillon

Change all manual settings of vm_page_t->dirty = VM_PAGE_BITS_ALL
to use the vm_page_dirty() inline.

The inline can thus do sanity checks (or not) over all cases.
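
A hypothetical shape for the inline: one central place where the
assignment happens, so checks can be compiled in or out:

    static __inline void
    vm_page_dirty(vm_page_t m)
    {
    #ifdef DIAGNOSTIC
            if (m->queue == PQ_CACHE)       /* assumed sanity check */
                    panic("vm_page_dirty: page in cache");
    #endif
            m->dirty = VM_PAGE_BITS_ALL;
    }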


# 42970 21-Jan-1999 dillon

General cleanup related to the new pager. We no longer have to worry
about conversions of objects to OBJT_SWAP, it is done automatically
now.

Replaced manually inserted code with inline calls for busy waiting on
pages, which also incidentally fixes a potential PG_BUSY race due to
the code not running at splvm().

vm_objects no longer have a paging_offset field ( see vm/vm_object.c )


# 42957 21-Jan-1999 dillon

This is a rather large commit that encompasses the new swapper,
changes to the VM system to support the new swapper, VM bug
fixes, several VM optimizations, and some additional revamping of the
VM code. The specific bug fixes will be documented with additional
forced commits. This commit is somewhat rough in regards to code
cleanup issues.

Reviewed by: "John S. Dyson" <root@dyson.iquest.net>, "David Greenman" <dg@root.com>


# 42360 06-Jan-1999 julian

Add (but don't activate) code for a special VM option to make
downward growing stacks more general.
Add (but don't activate) code to use the new stack facility
when running threads, (specifically the linux threads support).
This allows people to use both linux compiled linuxthreads, and also the
native FreeBSD linux-threads port.

The code is conditional on VM_STACK. Not using this will
produce the old heavily tested system.

Submitted by: Richard Seaman <dick@tar.com>


# 40648 25-Oct-1998 phk

Nitpicking and dusting performed on a train. Removes trivial warnings
about unused variables, labels and other lint.


# 40286 13-Oct-1998 dg

Fixed two potentially serious classes of bugs:

1) The vnode pager wasn't properly tracking the file size due to
"size" being page rounded in some cases and not in others.
This sometimes resulted in corrupted files. First noticed by
Terry Lambert.
Fixed by changing the "size" pager_alloc parameter to be a 64bit
byte value (as opposed to a 32bit page index) and changing the
pagers and their callers to deal with this properly.
2) Fixed a bogus type cast in round_page() and trunc_page() that
caused some 64bit offsets and sizes to be scrambled. Removing
the cast required adding casts at a few dozen callers.
There may be problems with other bogus casts in close-by
macros. A quick check seemed to indicate that those were okay,
however.


# 39873 01-Oct-1998 jdp

Fix a bug in which a page index was used where a byte offset was
expected. This bug caused builds of Modula-3 to fail in mysterious
ways on SMP kernels. More precisely, such builds failed on systems
with kern.fast_vfork equal to 0, the default and only supported
value for SMP kernels.

PR: kern/7468
Submitted by: tegge (Tor Egge)


# 38799 04-Sep-1998 dfr

Cosmetic changes to the PAGE_XXX macros to make them consistent with
the other objects in vm.


# 38517 24-Aug-1998 dfr

Change various syscalls to use size_t arguments instead of u_int.

Add some overflow checks to read/write (from bde).

Change all modifications to vm_page::flags, vm_page::busy, vm_object::flags
and vm_object::paging_in_progress to use operations which are not
interruptible.

Reviewed by: Bruce Evans <bde@zeta.org.au>


# 38135 06-Aug-1998 dfr

Protect all modifications to paging_in_progress with splvm(). The i386
managed to avoid corruption of this variable by luck (the compiler used a
memory read-modify-write instruction which wasn't interruptible) but other
architectures cannot.

With this change, I am now able to 'make buildworld' on the alpha (sfx: the
crowd goes wild...)
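
The pattern, as a kernel fragment:

    int s;

    /* Guard the read-modify-write explicitly instead of relying on
     * the i386 compiler emitting one uninterruptible instruction. */
    s = splvm();
    object->paging_in_progress++;
    splx(s);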


# 37640 14-Jul-1998 bde

Print pointers using %p instead of attempting to print them by
casting them to long, etc. Fixed some nearby printf bogons (sign
errors not warned about by gcc, and style bugs, but not truncation
of vm_ooffset_t's).

Use slightly less bogus casts for passing pointers to ddb command
functions.


# 37562 11-Jul-1998 bde

Fixed printf format errors.


# 37555 11-Jul-1998 bde

Fixed printf format errors.


# 37094 21-Jun-1998 bde

Removed unused includes.


# 36735 07-Jun-1998 dfr

This commit fixes various 64bit portability problems required for
FreeBSD/alpha. The most significant item is to change the command
argument to ioctl functions from int to u_long. This change brings us
inline with various other BSD versions. Driver writers may like to
use (__FreeBSD_version == 300003) to detect this change.

The prototype FreeBSD/alpha machdep will follow in a couple of days
time.


# 36275 21-May-1998 dyson

Make flushing dirty pages work correctly on filesystems that
unexpectedly do not complete writes even with sync I/O requests.
This should help the behavior of mmaped files when using
softupdates (and perhaps in other circumstances also.)


# 36112 16-May-1998 dyson

An important fix for proper inheritance of backing objects for
object splits. Another excellent detective job by Tor.
Submitted by: Tor Egge <Tor.Egge@idi.ntnu.no>


# 35694 04-May-1998 dyson

Fix the shm panic. I mistakenly used the shadow_count to keep the object
from being split, and instead added an OBJ_NOSPLIT.


# 35669 04-May-1998 dyson

Work around some VM bugs, the worst being an overly aggressive
swap space free calculation. More complete fixes will be forthcoming,
in a week.


# 35615 02-May-1998 dyson

Another minor cleanup of the split code. Make sure that pages are
busied during the entire time, so that the waits for pages being
unbusy don't make the objects inconsistent.


# 35571 01-May-1998 dyson

Fix minor bug with new over used swap fix.


# 35499 29-Apr-1998 dyson

Add a needed prototype, and fix a panic problem with the new
memory code.


# 35497 29-Apr-1998 dyson

Tighten up management of memory and swap space during map allocation,
deallocation cycles. This should provide a measurable improvement
on swap and memory allocation on loaded systems. It is unlikely to be a
complete solution. Also, provide more map info with procfs.
Chuck Cranor spurred on this improvement.


# 35485 28-Apr-1998 dyson

Fix a pseudo-swap leak problem. This mitigates "leaks" due to
freeing partial objects: previously, unless an entire object was
freed, none of it was freed. Simple fix to the map code.
Reviewed by: dg


# 34206 07-Mar-1998 dyson

This mega-commit is meant to fix numerous interrelated problems. There
has been some bitrot and there were incorrect assumptions in the vfs_bio
code. These problems have manifested themselves worst on NFS type
filesystems, but can still affect local filesystems under certain
circumstances. Most of the problems have involved mmap consistency, and
as a side-effect broke the vfs.ioopt code. This code might have been
committed separately, but almost everything is interrelated.

1) Allow (pmap_object_init_pt) prefaulting of buffer-busy pages that
are fully valid.
2) Rather than deactivating erroneously read initial (header) pages in
kern_exec, we now free them.
3) Fix the rundown of non-VMIO buffers that are in an inconsistent
(missing vp) state.
4) Fix the disassociation of pages from buffers in brelse. The previous
code had rotted and was faulty in a couple of important circumstances.
5) Remove a gratuitous buffer wakeup in vfs_vmio_release.
6) Remove a crufty and currently unused cluster mechanism for VBLK
files in vfs_bio_awrite. When the code is functional, I'll add back
a cleaner version.
7) The page busy count wakeups associated with the buffer cache usage were
incorrectly cleaned up in a previous commit by me. Revert to the
original, correct version, but with a cleaner implementation.
8) The cluster read code now tries to keep data associated with buffers
more aggressively (without breaking the heuristics) when it is presumed
that the read data (buffers) will be soon needed.
9) Change to filesystem lockmgr locks so that they use LK_NOPAUSE. The
delay loop waiting is not useful for filesystem locks, due to the
length of the time intervals.
10) Correct and clean-up spec_getpages.
11) Implement a fully functional nfs_getpages, nfs_putpages.
12) Fix nfs_write so that modifications are coherent with the NFS data on
the server disk (at least as well as NFS seems to allow.)
13) Properly support MS_INVALIDATE on NFS.
14) Properly pass down MS_INVALIDATE to lower levels of the VM code from
vm_map_clean.
15) Better support the notion of pages being busy but valid, so that
fewer in-transit waits occur. (use p->busy more for pageouts instead
of PG_BUSY.) Since the page is fully valid, it is still usable for
reads.
16) It is possible (in error) for cached pages to be busy. Make the
page allocation code handle that case correctly. (It should probably
be a printf or panic, but I want the system to handle coding errors
robustly. I'll probably add a printf.)
17) Correct the design and usage of vm_page_sleep. It didn't handle
consistency problems very well, so make the design a little less
lofty. After vm_page_sleep, if it ever blocked, it is still important
to relookup the page (if the object generation count changed), and
verify its status (always.)
18) In vm_pageout.c, vm_pageout_clean had rotted, so clean that up.
19) Push the page busy for writes and VM_PROT_READ into vm_pageout_flush.
20) Fix vm_pager_put_pages and it's descendents to support an int flag
instead of a boolean, so that we can pass down the invalidate bit.


# 33817 25-Feb-1998 dyson

Fix page prezeroing for SMP, and fix some potential paging-in-progress
hangs. The paging-in-progress diagnosis was a result of Tor Egge's
excellent detective work.
Submitted by: Partially from Tor Egge.


# 33758 23-Feb-1998 dyson

Significantly improve the efficiency of the swap pager, which appears to
have declined due to code-rot over time. The swap pager rundown code
has been cleaned up, and unneeded wakeups removed. Lots of splbio's
are changed to splvm's. Also, set the dynamic tunables for the
pageout daemon to be more sane for larger systems (thereby decreasing
the daemon overhead.)


# 33676 20-Feb-1998 bde

Removed unused #includes.


# 33181 09-Feb-1998 eivind

Staticize.


# 33173 08-Feb-1998 dyson

Fix an argument to vn_lock. It appears that a lot of the vn_lock usage
is a bit undisciplined and should be checked carefully.


# 33134 06-Feb-1998 eivind

Back out DIAGNOSTIC changes.


# 33109 05-Feb-1998 dyson

1) Start using a cleaner and more consistant page allocator instead
of the various ad-hoc schemes.
2) When bringing in UPAGES, the pmap code needs to do another vm_page_lookup.
3) When appropriate, set the PG_A or PG_M bits a-priori to both avoid some
processor errata, and to minimize redundant processor updating of page
tables.
4) Modify pmap_protect so that it can only remove permissions (as it
originally supported.) The additional capability is not needed.
5) Streamline read-only to read-write page mappings.
6) For pmap_copy_page, don't enable write mapping for source page.
7) Correct and clean-up pmap_incore.
8) Cluster initial kern_exec paging.
9) Removal of some minor lint from kern_malloc.
10) Correct some ioopt code.
11) Remove some dead code from the MI swapout routine.
12) Correct vm_object_deallocate (to remove backing_object ref.)
13) Fix dead object handling, that had problems under heavy memory load.
14) Add minor vm_page_lookup improvements.
15) Some pages are not in objects, and make sure that the vm_page.c can
properly support such pages.
16) Add some more page deficit handling.
17) Some minor code readability improvements.


# 33108 04-Feb-1998 eivind

Turn DIAGNOSTIC into a new-style option.


# 32937 31-Jan-1998 dyson

Change the busy page mgmt, so that when pages are freed, they
MUST be PG_BUSY. It is bogus to free a page that isn't busy,
because it is in a state of being "unavailable" when being
freed. The additional advantage is that the page_remove code
has a better cross-check that the page should be busy and
unavailable for other use. There were some minor problems
with the collapse code, and this plugs those subtle "holes."

Also, the vfs_bio code wasn't checking correctly for PG_BUSY
pages. I am going to develop a more consistent scheme for
grabbing pages, busy or otherwise. For now, we are stuck
with the current morass.


# 32702 22-Jan-1998 dyson

VM level code cleanups.

1) Start using TSM.
Struct procs continue to point to the upages structure after being freed.
Struct vmspace continues to point to pte object and kva space for kstack.
u_map is now superfluous.
2) vm_map's don't need to be reference counted. They always exist either
in the kernel or in a vmspace. The vmspaces are managed by reference
counts.
3) Remove the "wired" vm_map nonsense.
4) No need to keep a cache of kernel stack kva's.
5) Get rid of strange looking ++var, and change to var++.
6) Change more data structures to use our "zone" allocator. Added
struct proc, struct vmspace and struct vnode. This saves a significant
amount of kva space and physical memory. Additionally, this enables
TSM for the zone managed memory.
7) Keep ioopt disabled for now.
8) Remove the now bogus "single use" map concept.
9) Use generation counts or id's for data structures residing in TSM, where
it allows us to avoid unneeded restart overhead during traversals, where
blocking might occur.
10) Account better for memory deficits, so the pageout daemon will be able
to make enough memory available (experimental.)
11) Fix some vnode locking problems. (From Tor, I think.)
12) Add a check in ufs_lookup, to avoid lots of unneeded calls to bcmp.
(experimental.)
13) Significantly shrink, cleanup, and make slightly faster the vm_fault.c
code. Use generation counts, get rid of unneeded collapse operations,
and clean up the cluster code.
14) Make vm_zone more suitable for TSM.

This commit is partially as a result of discussions and contributions from
other people, including DG, Tor Egge, PHK, and probably others that I
have forgotten to attribute (so let me know, if I forgot.)

This is not the infamous, final cleanup of the vnode stuff, but a necessary
step. Vnode mgmt should be correct, but things might still change, and
there is still some missing stuff (like ioopt, and physical backing of
non-merged cache files, debugging of layering concepts.)


# 32670 21-Jan-1998 dyson

Allow gdb to work again.


# 32585 17-Jan-1998 dyson

Tie up some loose ends in vnode/object management. Remove an unneeded
config option in pmap. Fix a problem with faulting in pages. Clean-up
some loose ends in swap pager memory management.

The system should be much more stable, but all the subtle bugs aren't fixed yet.


# 32454 11-Jan-1998 dyson

Fix some vnode management problems, and better mgmt of vnode free list.
Fix the UIO optimization code.
Fix an assumption in vm_map_insert regarding allocation of swap pagers.
Fix an spl problem in the collapse handling in vm_object_deallocate.
When pages are freed from vnode objects, and the criteria for putting
the associated vnode onto the free list is reached, either put the
vnode onto the list, or put it onto an interrupt safe version of the
list, for further transfer onto the actual free list.
Some minor syntax changes changing pre-decs, pre-incs to post versions.
Remove a bogus timeout (that I added for debugging) from vn_lock.

PHK will likely still have problems with the vnode list management, and
so do I, but it is better than it was.


# 32286 06-Jan-1998 dyson

Make our v_usecount vnode reference count work identically to the
original BSD code. The association between the vnode and the vm_object
no longer includes reference counts. The major difference is that
vm_object's are no longer freed gratuitously from the vnode, and so
once an object is created for the vnode, it will last as long as the
vnode does.

When a vnode object reference count is incremented, then the underlying
vnode reference count is incremented also. The two "objects" are now
more intimately related, and so the interactions are now much less
complex.

Vnodes are now normally placed onto the free queue with an object still
attached. The rundown of the object happens at vnode rundown time, and
happens with exactly the same filesystem semantics of the original VFS
code. There is absolutely no need for vnode_pager_uncache and other
travesties like that anymore.

A side-effect of these changes is that SMP locking should be much simpler,
the I/O copyin/copyout optimizations work, NFS should be more ponderable,
and further work on layered filesystems should be less frustrating, because
of the totally coherent management of the vnode objects and vnodes.

Please be careful with your system while running this code, but I would
greatly appreciate feedback as soon as reasonably possible.


# 32072 28-Dec-1997 dyson

Fix the decl of vfs_ioopt, allow LFS to compile again, fix a minor problem
with the object cache removal.


# 32071 28-Dec-1997 dyson

Lots of improvements, including restructring the caching and management
of vnodes and objects. There are some metadata performance improvements
that come along with this. There are also a few prototypes added when
the need is noticed. Changes include:

1) Cleaning up vref, vget.
2) Removal of the object cache.
3) Nuke vnode_pager_uncache and friends, because they aren't needed anymore.
4) Correct some missing LK_RETRY's in vn_lock.
5) Correct the page range in the code for msync.

Be gentle, and please give me feedback asap.


# 31991 25-Dec-1997 dyson

The ioopt code is still buggy, but wasn't fully disabled.


# 31857 19-Dec-1997 dyson

Change bogus usage of btoc to atop. The incorrect usage of btoc was
pointed out by bde.
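
The distinction, using i386-era definitions for illustration:

    /* atop(): byte offset -> page index, truncating.
     * btoc(): byte count  -> click (page) count, rounding up.
     * Using btoc() on an offset silently rounds up a partial page. */
    #define atop(x)   ((x) >> PAGE_SHIFT)
    #define btoc(x)   (((x) + PAGE_MASK) >> PAGE_SHIFT)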


# 31853 19-Dec-1997 dyson

Some performance improvements, and code cleanups (including changing our
expensive OFF_TO_IDX to btoc whenever possible.)


# 31392 24-Nov-1997 bde

Don't #define max() to get a version that works with vm_ooffset's.
Just use qmax().

This should be fixed more generally using overloaded functions.


# 31175 14-Nov-1997 tegge

Simplify map entries during user page wire and user page unwire operations in
vm_map_user_pageable().

Check return value of vm_map_lock_upgrade() during a user page wire operation.


# 31016 07-Nov-1997 phk

Remove a bunch of variables which were unused both in GENERIC and LINT.

Found by: -Wunused


# 30813 28-Oct-1997 bde

Removed unused #includes.


# 30700 24-Oct-1997 dyson

Decrease the initial allocation for the zone allocations.


# 30354 12-Oct-1997 phk

Last major round (unless Bruce thinks of something :-) of malloc changes.

Distribute all but the most fundamental malloc types. This time I also
remembered the trick to making things static: Put "static" in front of
them.

A couple of finer points by: bde


# 30309 11-Oct-1997 phk

Distribute and staticize a lot of the malloc M_* types.

Substantial input from: bde


# 29653 21-Sep-1997 dyson

Change the M_NAMEI allocations to use the zone allocator. This change
plus the previous changes to use the zone allocator decrease the usage
of malloc by half. The Zone allocator will be upgradeable to be able
to use per CPU-pools, and has more intelligent usage of SPLs. Additionally,
it has reasonable stats gathering capabilities, while making most calls
inline.


# 29316 12-Sep-1997 jlemon

Do not consider VM_PROT_OVERRIDE_WRITE to be part of the protection
entry when handling a fault. This is set by procfs whenever it wants
to write to a page, as a means of overriding `r-x COW' entries, but
causes failures in the `rwx' case.

Submitted by: bde


# 28992 01-Sep-1997 bde

Removed unused #includes.


# 28751 25-Aug-1997 bde

Fixed type mismatches for functions with args of type vm_prot_t and/or
vm_inherit_t. These types are smaller than ints, so the prototypes
should have used the promoted type (int) to match the old-style function
definitions. They use just vm_prot_t and/or vm_inherit_t. This depends
on gcc features to work. I fixed the definitions since this is easiest.
The correct fix may be to change the small types to u_int, to optimize
for time instead of space.


# 28349 18-Aug-1997 fsmp

Added includes of smp.h for SMP.
This eliminates a bazillion warnings about implicit s_lock & friends.


# 28345 18-Aug-1997 dyson

Fix kern_lock so that it will work. Additionally, clean-up some of the
VM systems usage of the kernel lock (lockmgr) code. This is a first
pass implementation, and is expected to evolve as needed. The API
for the lock manager code has not changed, but the underlying implementation
has changed significantly. This change should not materially affect
our current SMP or UP code without non-standard parameters being used.


# 27930 06-Aug-1997 dyson

Add exposure of some vm_zone allocation stats by sysctl. Also, change
the initialization parameters of some zones in VM map. This contains
only optimizations and not bugfixes.


# 27924 05-Aug-1997 dyson

Fixed the commit botch that was causing crashes soon after system
startup. Due to the error, the initialization of the zone for
pv_entries was missing. The system should be usable again.


# 27923 05-Aug-1997 dyson

Another attempt at cleaning up the new memory allocator.


# 27922 05-Aug-1997 dyson

Fix some bugs, document vm_zone better. Add copyright to vm_zone.h. Use
the new zone code in pmap.c so that we can get rid of the ugly ad-hoc
allocations in pmap.c.


# 27905 04-Aug-1997 dyson

Modify pmap to use our new memory allocator. Also, change the vm_map_entry
allocations to be interrupt safe.


# 27899 04-Aug-1997 dyson

Get rid of the ad-hoc memory allocator for vm_map_entries, in lieu of
a simple, clean zone type allocator. This new allocator will also be
used for machine dependent pmap PV entries.


# 27715 27-Jul-1997 dyson

Fix a very subtle problem that causes unnecessary numbers of objects backing
a single logical object.
Submitted by: Alan Cox <alc@cs.rice.edu>


# 26851 23-Jun-1997 tegge

Don't try upgrading an existing exclusive lock in vm_map_user_pageable.
This should close PR kern/3180.
Also remove a bogus unconditional call to vm_map_unlock_read in
vm_map_lookup.


# 26667 15-Jun-1997 dyson

Fix a reference problem with maps. Only appears to manifest itself when
sharing address spaces.


# 24848 12-Apr-1997 dyson

Fully implement vfork. Vfork is now much much faster than even our
fork. (On my machine, fork is about 240usecs, vfork is 78usecs.)

Implement rfork(!RFPROC !RFMEM), which allows a thread to divorce its memory
from the other threads of a group.

Implement rfork(!RFPROC RFCFDG), which closes all file descriptors, eliminating
possible existing shares with other threads/processes.

Implement rfork(!RFPROC RFFDG), which divorces the file descriptors for a
thread from the rest of the group.

Fix the case where a thread does an exec. It is almost nonsense for a thread
to modify the other threads' address space by an exec, so we
now automatically divorce the address space before modifying it.
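
Userland sketches of the non-forking forms (no RFPROC, so the calls
modify the current process):

    #include <unistd.h>

    int
    main(void)
    {
            /* Close every file descriptor in one shot. */
            (void)rfork(RFCFDG);

            /* Or: divorce the descriptor table from the rest of the
             * group, keeping private copies. */
            (void)rfork(RFFDG);
            return (0);
    }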


# 24691 07-Apr-1997 peter

The biggie: Get rid of the UPAGES from the top of the per-process address
space. (!)

Have each process use the kernel stack and pcb in the kvm space. Since
the stacks are at a different address, we cannot copy the stack at fork()
and allow the child to return up through the function call tree to return
to user mode - create a new execution context and have the new process
begin executing from cpu_switch() and go to user mode directly.
In theory this should speed up fork a bit.

Context switch the tss_esp0 pointer in the common tss. This is a lot
simpler than switching the gdt[GPROC0_SEL].sd.sd_base pointer
to each process's tss since the esp0 pointer is a 32 bit pointer, and the
sd_base setting is split into three different bit sections at non-aligned
boundaries and requires a lot of twiddling to reset.

The 8K of memory at the top of the process space is now empty, and unmapped
(and unmappable, it's higher than VM_MAXUSER_ADDRESS).

Simplify the pmap code to manage process contexts; we no longer have to
double map the UPAGES, which simplifies and should measurably speed up fork().

The following parts came from John Dyson:

Set PG_G on the UPAGES that are now in kernel context, and invalidate
them when swapping them out.

Move the upages object (upobj) from the vmspace to the proc structure.

Now that the UPAGES (pcb and kernel stack) are out of user space, make
rfork(..RFMEM..) do what was intended by sharing the vmspace
entirely via reference counting rather than simply inheriting the mappings.


# 24668 06-Apr-1997 dyson

Make vm_map_protect be more complete about map simplification. This
is useful when a process changes its page range protections very
much.
Submitted by: Alan Cox <alc@cs.rice.edu>


# 24666 06-Apr-1997 dyson

Fix the gdb executable modify problem. Thanks to the detective work
by Alan Cox <alc@cs.rice.edu>, and his description of the problem.

The bug was primarily in procfs_mem, but the mistake likely happened
due to the lack of vm system support for the operation. I added
better support for selective marking of page dirty flags so that
vm_map_pageable(wiring) will not cause this problem again.

The code in procfs_mem is now less bogus (but maybe still a little
so.)


# 22975 22-Feb-1997 peter

Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not
ready for it yet.


# 22521 10-Feb-1997 dyson

This is the kernel Lite/2 commit. There are some requisite userland
changes, so don't expect to be able to run the kernel as-is (very well)
without the appropriate Lite/2 userland changes.

The system boots and can mount UFS filesystems.

Untested: ext2fs, msdosfs, NFS
Known problems: Incorrect Berkeley ID strings in some files.
Mount_std mounts will not work until the getfsent
library routine is changed.

Reviewed by: various people
Submitted by: Jeffery Hsu <hsu@freebsd.org>


# 22156 31-Jan-1997 dyson

Another fix to inheriting shared segments. Do the copy on write
thing if needed.
Submitted by: Alan Cox <alc@cs.rice.edu>


# 21940 21-Jan-1997 dyson

Fix two problems where a NULL object is dereferenced. One problem
was in the VM_INHERIT_SHARE case of vmspace_fork, and also in vm_map_madvise.
Submitted by: Alan Cox <alc@cs.rice.edu>


# 21754 16-Jan-1997 dyson

Change the map entry flags from bitfields to bitmasks. Allows
for some code simplification.


# 21673 14-Jan-1997 jkh

Make the long-awaited change from $Id$ to $FreeBSD$

This will make a number of things easier in the future, as well as (finally!)
avoiding the Id-smashing problem which has plagued developers for so long.

Boy, I'm glad we're not using sup anymore. This update would have been
insane otherwise.


# 21258 03-Jan-1997 dyson

Undo the collapse breakage (swap space usage problem.)


# 21157 01-Jan-1997 dyson

Guess what? We left a lot of the old collapse code that is not needed
anymore with the "full" collapse fix that we added about 1yr ago!!! The
code has been removed by optioning it out for now, so we can put it back
in ASAP if any problems are found.


# 21134 31-Dec-1996 dyson

A very significant improvement in the management of process maps
and objects. Previously, "fancy" memory management techniques
such as that used by the M3 RTS would have the tendency of chopping
up a process's allocated memory into lots of little objects. Alan
has come up with some improvements to mitigate the situation to
the point where even the M3 RTS only has one object for bss and
its managed memory (when running CVSUP.) (There are still cases where the
situation isn't improved when the system pages -- but this is much much
better for the vast majority of cases.) The system will now be able
to much more effectively merge map entries.

Submitted by: Alan Cox <alc@cs.rice.edu>


# 20993 28-Dec-1996 dyson

Eliminate the redundancy due to the similarity between the routines
vm_map_simplify and vm_map_simplify_entry. Make vm_map_simplify_entry
handle wired maps so that we can get rid of vm_map_simplify. Modify
the callers of vm_map_simplify to properly use vm_map_simplify_entry.
Submitted by: Alan Cox <alc@cs.rice.edu>


# 20449 14-Dec-1996 dyson

Implement closer-to POSIX mlock semantics. The major difference is
that we do allow mlock to span unallocated regions (of course, not
mlocking them.) We also allow mlocking of RO regions (which the old
code couldn't.) The restriction there is that once a RO region is
wired (mlocked), it cannot be debugged (or EVER written to.)

Under normal usage, the new mlock code will be a significant improvement
over our old stuff.
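
A minimal userland sketch; under the new semantics the call succeeds
even if the range spans unallocated holes (which are simply not wired):

    #include <sys/types.h>
    #include <sys/mman.h>

    /* Wire whatever is mapped within [base, base + len). */
    static int
    wire_range(void *base, size_t len)
    {
            return (mlock(base, len));
    }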


# 20189 07-Dec-1996 dyson

Expunge inlines...


# 20187 07-Dec-1996 dyson

Fix a map entry leak problem found by DG. Also, de-inline a function
vm_map_entry_dispose, because it won't help being inlined.


# 20182 06-Dec-1996 dyson

Make vm_map_insert much more intelligent in the MAP_NOFAULT case so
that map entries are coalesced when appropriate. Also, conditionalize
some code that is currently not used in vm_map_insert. This mod
has been added to eliminate unnecessary map entries in buffer map.

Additionally, there were some cases where map coalescing could be done
when it shouldn't. That problem has been resolved.


# 20054 30-Nov-1996 dyson

Implement a new totally dynamic (up to MAXPHYS) buffer kva allocation
scheme. Additionally, add the capability for checking for unexpected
kernel page faults. The maximum amount of kva space for buffers hasn't
been decreased from where it is, but it will now be possible to do so.

This scheme manages the kva space similar to the buffers themselves. If
there isn't enough kva space because of usage or fragmentation, buffers
will be reclaimed until a buffer allocation is successful. This scheme
should be very resistant to fragmentation problems until/if the LFS code
is fixed and uses the bogus buffer locking scheme -- but a 'fixed' LFS
is not likely to use such a scheme.

Now there should be NO problem allocating buffers up to MAXPHYS.


# 18298 14-Sep-1996 bde

Attached vm ddb commands `show map', `show vmochk', `show object',
`show vmopag', `show page' and `show pageq'. Moved all vm ddb stuff
to the ends of the vm source files.

Changed printf() to db_printf(), `indent' to db_indent, and iprintf()
to db_iprintf() in ddb commands. Moved db_indent and db_iprintf()
from vm to ddb.

vm_page.c:
Don't use __pure. Staticized.

db_output.c:
Reduced page width from 80 to 79 to inhibit double spacing for long
lines (there are still some problems if words are printed across
column 79).


# 18178 08-Sep-1996 dyson

Fixed the use of the wrong variable in vm_map_madvise.


# 18163 08-Sep-1996 dyson

Improve the scalability of certain pmap operations.


# 17334 30-Jul-1996 dyson

Backed out the recent changes/enhancements to the VM code. The
problem with the 'shell scripts' was found, but there was a 'strange'
problem with a 486 laptop that we could not track down. This commit
backs the code back to 25-jul, and will be re-entered after the snapshot
in smaller (more easily tested) chunks.


# 17294 27-Jul-1996 dyson

This commit is meant to solve a couple of VM system problems or
performance issues.

1) The pmap module has had too many inlines, and so the
object file is simply bigger than it needs to be.
Some common code is also merged into subroutines.
2) Removal of some *evil* PHYS_TO_VM_PAGE macro calls.
Unfortunately, a few have needed to be added also.
The removal caused the need for more vm_page_lookups.
I added lookup hints to minimize the need for the
page table lookup operations.
3) Removal of some bogus performance improvements, that
mostly made the code more complex (tracking individual
page table page updates unnecessarily). Those improvements
actually hurt 386 processors' performance (not that people who
worry about performance use 386 processors anymore :-)).
4) Changed pv queue manipulations/structures to be TAILQ's.
5) The pv queue code has had some performance problems since
day one. Some significant scalability issues are resolved
by threading the pv entries from the pmap AND the physical
address instead of just the physical address. This makes
certain pmap operations run much faster. This does
not affect most micro-benchmarks, but should help loaded system
performance *significantly*. DG helped and came up with most
of the solution for this one.
6) Most if not all pmap bit operations follow the pattern:
pmap_test_bit();
pmap_clear_bit();
That made for twice the necessary pv list traversal. The
pmap interface now supports only pmap_tc_bit type operations:
pmap_[test/clear]_modified, pmap_[test/clear]_referenced.
Additionally, the modified routine now takes a vm_page_t arg
instead of a phys address. This eliminates a PHYS_TO_VM_PAGE
operation.
7) Several rewrites of routines that contain redundant code to
use common routines, so that there is a greater likelihood of
keeping the cache footprint smaller.


# 16993 07-Jul-1996 dg

In all special cases for spl or page_alloc where kmem_map is checked for,
mb_map (a submap of kmem_map) must also be checked.
Thanks to wcarchive (err...sort of) for demonstrating this bug.


# 16409 16-Jun-1996 dyson

Various bugfixes/cleanups from me and others:
1) Remove potential race conditions on waking up in vm_page_free_wakeup
by making sure that it is at splvm().
2) Fix another bug in vm_map_simplify_entry.
3) Be more complete about converting from default to swap pager
when an object grows to be large enough that there can be
a problem with data structure allocation under low memory
conditions.
4) Make some madvise code more efficient.
5) Added some comments.


# 16318 12-Jun-1996 dyson

Fix some serious errors in vm_map_simplify_entries.


# 16026 30-May-1996 dyson

This commit is dual-purpose, to fix more of the pageout daemon
queue corruption problems, and to apply Gary Palmer's code cleanups.
David Greenman helped with these problems also. There is still
a hang problem using X in small memory machines.


# 15978 29-May-1996 dyson

Make sure that pageout deadlocks cannot occur. There is a problem
that the datastructures needed to support the swap pager can take
enough space to fully deplete system memory, and cause a deadlock.
This change keeps large objects from being filled with dirty pages
without the appropriate swap pager datastructures. Right now,
default objects greater than 1/4 the size of available system memory
are converted to swap objects, thereby eliminating the risk of deadlock.


# 15873 22-May-1996 dyson

Initial support for MADV_FREE, support for pages that we don't care
about the contents anymore. This gives us a lot of the advantage of
freeing individual pages through munmap, but with almost none of the
overhead.
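
A sketch of the intended use, with a hypothetical allocator hook:

    #include <sys/types.h>
    #include <sys/mman.h>

    /* Free-list hook: the pages stay mapped, but the VM may discard
     * their contents rather than paging them out. */
    static void
    chunk_release(void *p, size_t len)
    {
            (void)madvise(p, len, MADV_FREE);
    }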


# 15819 19-May-1996 dyson

Initial support for mincore and madvise. Both are almost fully
supported, except madvise does not page in with MADV_WILLNEED, and
MADV_DONTNEED doesn't force dirty pages out.


# 15809 18-May-1996 dyson

This set of commits to the VM system does the following, and contain
contributions or ideas from Stephen McKay <syssgm@devetir.qld.gov.au>,
Alan Cox <alc@cs.rice.edu>, David Greenman <davidg@freebsd.org> and me:

More usage of the TAILQ macros. Additional minor fix to queue.h.
Performance enhancements to the pageout daemon.
Addition of a wait in the case that the pageout daemon
has to run immediately.
Slightly modify the pageout algorithm.
Significant revamp of the pmap/fork code:
1) PTE's and UPAGES's are NO LONGER in the process's map.
2) PTE's and UPAGES's reside in their own objects.
3) TOTAL elimination of recursive page table pagefaults.
4) The page directory now resides in the PTE object.
5) Implemented pmap_copy, thereby speeding up fork time.
6) Changed the pv entries so that the head is a pointer
and not an entire entry.
7) Significant cleanup of pmap_protect, and pmap_remove.
8) Removed significant amounts of machine dependent
fork code from vm_glue. Pushed much of that code into
the machine dependent pmap module.
9) Support more completely the reuse of already zeroed
pages (Page table pages and page directories) as being
already zeroed.
Performance and code cleanups in vm_map:
1) Improved and simplified allocation of map entries.
2) Improved vm_map_copy code.
3) Corrected some minor problems in the simplify code.
Implemented splvm (combo of splbio and splimp.) The VM code now
seldom uses splhigh.
Improved the speed of and simplified kmem_malloc.
Minor mod to vm_fault to avoid using pre-zeroed pages in the case
of objects with backing objects along with the already
existant condition of having a vnode. (If there is a backing
object, there will likely be a COW... With a COW, it isn't
necessary to start with a pre-zeroed page.)
Minor reorg of source to perhaps improve locality of ref.


# 15583 03-May-1996 phk

Another sweep over the pmap/vm macros, this time with more focus on
the usage. I'm not satisfied with the naming, but now at least there is
less bogus stuff around.


# 15459 29-Apr-1996 dyson

Move the map entry allocations from the kmem_map to the kernel_map. As
a side effect, correct the associated object offset.


# 15018 03-Apr-1996 dyson

Fixed a problem that the UPAGES of a process were being run down
in a suboptimal manner. I had also noticed some panics that appeared
to be at least superficially caused by this problem. Also, included
are some minor mods to support more general handling of page table page
faulting. More details in a future commit.


# 14865 28-Mar-1996 dyson

VM performance improvements, and reorder some operations in VM fault
in anticipation of a fix in pmap that will allow the mlock system call to work
without panicing the system.


# 14864 28-Mar-1996 dyson

More map_simplify fixes from Alan Cox. This very significantly improves the
performance when the map has been chopped up. The map simplify operations
really work now.
Reviewed by: dyson
Submitted by: Alan Cox <alc@cs.rice.edu>


# 14610 12-Mar-1996 dyson

This commit is as a result of a comment by Alan Cox (alc@cs.rice.edu)
regarding the "real" problem with maps that we have been having
over the last few weeks. He noted that the first_free pointer was
left dangling in certain circumstances -- and he was right!!! This
should fix the map problems that we were having, and also give us the
advantage of being able to simplify maps more aggressively.


# 14589 12-Mar-1996 dyson

Fix the map corruption problem that appears as a u_map allocation
error.


# 14428 09-Mar-1996 dyson

Fix two problems:
The pmap_remove in vm_map_clean incorrectly unmapped the entire
map entry.
The new vm_map_simplify_entry code had an error (the offset
of the combined map entry was not set correctly.)
Submitted by: Alan Cox <alc@cs.rice.edu>


# 14366 04-Mar-1996 dyson

Fix a problem that pages in a mapped region were not always
properly invalidated. Now we traverse the object shadow chain
properly.


# 14360 03-Mar-1996 peter

Remove the #ifdef notyet from the prototype of vm_map_simplify. John
re-enabled the function but missed the prototype, causing a warning.


# 14316 02-Mar-1996 dyson

1) Eliminate unnecessary bzero of UPAGES.
2) Eliminate unnecessary copying of pages during/after forks.
3) Add user map simplification.


# 14036 11-Feb-1996 dyson

Fixed a really bogus problem with msync ripping pages away from
objects before they were written. Also, don't allow processes
without write access to remove pages from vm_objects.


# 13490 19-Jan-1996 dyson

Eliminated many redundant vm_map_lookup operations for vm_mmap.
Speed up for vfs_bio -- addition of a routine bqrelse to greatly diminish
overhead for merged cache.
Efficiency improvement for vfs_cluster. It used to do a lot of redundant
calls to cluster_rbuild.
Correct the ordering for vrele of .text and release of credentials.
Use the selective tlb update for 486/586/P6.
Numerous fixes to the size of objects allocated for files. Additionally,
fixes in the various pagers.
Fixes for proper positioning of vnode_pager_setsize in msdosfs and ext2fs.
Fixes in the swap pager for exhausted resources. The pageout code
will not as readily thrash.
Change the page queue flags (PG_ACTIVE, PG_INACTIVE, PG_FREE, PG_CACHE) into
page queue indices (PQ_ACTIVE, PQ_INACTIVE, PQ_FREE, PQ_CACHE),
thereby improving efficiency of several routines.
Eliminate even more unnecessary vm_page_protect operations.
Significantly speed up process forks.
Make vm_object_page_clean more efficient, thereby eliminating the pause
that happens every 30 seconds.
Make sequential clustered writes B_ASYNC instead of B_DELWRI even in the
case of filesystems mounted async.
Fix a panic with busy pages when write clustering is done for non-VMIO
buffers.


# 13228 04-Jan-1996 wollman

Convert DDB to new-style option.


# 12820 14-Dec-1995 phk

Another mega commit to staticize things.


# 12767 11-Dec-1995 dyson

Changes to support 1Tb filesizes. Pages are now named by an
(object,index) pair instead of (object,offset) pair.


# 12662 07-Dec-1995 dg

Untangled the vm.h include file spaghetti.


# 12423 20-Nov-1995 phk

Remove unused vars & funcs, make things static, protoize a little bit.


# 12226 12-Nov-1995 dg

Moved vm_map_lock call to inside the splhigh protection in vm_map_find().
This closes a probably rare but nonetheless real window that would result
in a process hanging or the system panicking.

Reviewed by: dyson, davidg
Submitted by: kato@eclogite.eps.nagoya-u.ac.jp (KATO Takenori)
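
An ordering sketch with stubbed primitives -- the real splhigh()/splx()
and map locking are kernel-internal; only the ordering is the point:

    static int  splhigh(void) { return 0; }  /* stub: block interrupts */
    static void splx(int s)   { (void)s; }   /* stub: restore old level */
    static void map_lock(void)   { }         /* stub: exclusive map lock */
    static void map_unlock(void) { }

    static void
    map_find_sketch(void)
    {
        int s;

        s = splhigh();      /* 1: interrupts are blocked first...       */
        map_lock();         /* 2: ...and only then is the lock taken,   */
                            /*    so no interrupt can arrive while the  */
                            /*    map is locked but still interruptible */
        /* ... find and insert the new address range ... */
        map_unlock();
        splx(s);
    }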


# 11709 23-Oct-1995 dyson

Get rid of machine-dependent NBPG and replace with PAGE_SIZE.


# 10344 26-Aug-1995 bde

Change vm_map_print() to have the correct number and type of args for
a ddb command.


# 9507 13-Jul-1995 dg

NOTE: libkvm, w, ps, 'top', and any other utility which depends on struct
proc or any VM system structure will have to be rebuilt!!!

Much needed overhaul of the VM system. Included in this first round of
changes:

1) Improved pager interfaces: init, alloc, dealloc, getpages, putpages,
haspage, and sync operations are supported. The haspage interface now
provides information about clusterability. All pager routines now take
struct vm_object's instead of "pagers".

2) Improved data structures. In the previous paradigm, there was constant
confusion caused by pagers being both a data structure ("allocate a
pager") and a collection of routines. The idea of a pager structure has
essentially been eliminated. Objects now have types, and this type is
used to index the appropriate pager (see the sketch after this list). In
most cases, items in the pager structure were duplicated in the object
data structure and thus were unnecessary. In the few cases that remained,
an un_pager structure union was created in the object to contain these
items.

3) Because of the cleanup of #1 & #2, a lot of unnecessary layering can now
be removed. For instance, vm_object_enter(), vm_object_lookup(),
vm_object_remove(), and the associated object hash list were some of the
things that were removed.

4) simple_lock's removed. Discussion with several people reveals that the
SMP locking primitives used in the VM system aren't likely to be the
mechanism that we'll adopt. Even if they were, the locking that was in the code
was very inadequate and would have to be mostly re-done anyway. The
locking in a uni-processor kernel was a no-op but went a long way toward
making the code difficult to read and debug.

5) Places that attempted to kludge-up the fact that we don't have kernel
thread support have been fixed to reflect the reality that we are really
dealing with processes, not threads. The VM system didn't have complete
thread support, so the comments and mis-named routines were just wrong.
We now use tsleep and wakeup directly in the lock routines, for instance.

6) Where appropriate, the pagers have been improved, especially in the
pager_alloc routines. Most of the pager_allocs have been rewritten and
are now faster and easier to maintain.

7) The pagedaemon pageout clustering algorithm has been rewritten and
now tries harder to output an even number of pages before and after
the requested page. This is sort of the reverse of the ideal pagein
algorithm and should provide better overall performance.

8) Unnecessary (incorrect) casts to caddr_t in calls to tsleep & wakeup
have been removed. Some other unnecessary casts have also been removed.

9) Some almost useless debugging code removed.

10) Terminology of shadow objects vs. backing objects straightened out.
The fact that the vm_object data structure essentially had this
backwards really confused things. The use of "shadow" and "backing
object" throughout the code is now internally consistent and correct
in the Mach terminology.

11) Several minor bug fixes, including one in the vm daemon that caused
0 RSS objects to not get purged as intended.

12) A "default pager" has now been created which cleans up the transition
of objects to the "swap" type. The previous checks throughout the code
for swp->pg_data != NULL were really ugly. This change also provides
the rudiments for future backing of "anonymous" memory by something
other than the swap pager (via the vnode pager, for example), and it
allows the decision about which of these pagers to use to be made
dynamically (although it will need some additional decision code to do
this, of course).

13) (dyson) MAP_COPY has been deprecated and the corresponding "copy
object" code has been removed. MAP_COPY was undocumented and non-
standard. It was furthermore broken in several ways which caused its
behavior to degrade to MAP_PRIVATE. Binaries that use MAP_COPY will
continue to work correctly, but via the slightly different semantics
of MAP_PRIVATE.

14) (dyson) Sharing maps have been removed. Its marginal usefulness in a
threads design can be worked around in other ways. Both #12 and #13
were done to simplify the code and improve readability and
maintainability. (As were almost all of these changes.)
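
The sketch referenced in item 2: the object's type indexes a table of
pager operations. All names and signatures here are simplified
stand-ins for illustration, not the historical structures:

    enum obj_type { OBJT_DEFAULT, OBJT_SWAP, OBJT_VNODE, OBJT_NTYPES };

    struct pagerops {
        int (*getpages)(void *obj, int pindex);
        int (*putpages)(void *obj, int pindex);
        int (*haspage)(void *obj, int pindex);  /* can report clusterability */
    };

    static int
    swap_getpages(void *obj, int pindex)
    {
        (void)obj; (void)pindex;
        return 0;
    }

    static const struct pagerops swappagerops = {
        .getpages = swap_getpages,
        /* .putpages, .haspage elided for brevity */
    };

    /* One slot per object type; unset slots elided for brevity. */
    static const struct pagerops *pagertab[OBJT_NTYPES] = {
        [OBJT_SWAP] = &swappagerops,
    };

    struct object {
        enum obj_type type;
        /* Pager-specific fields that survived the cleanup would live
         * in a union here (the "un_pager" union of item 2). */
    };

    static int
    obj_getpages(struct object *obj, int pindex)
    {
        /* The type, not a pager pointer, selects the routines. */
        return pagertab[obj->type]->getpages(obj, pindex);
    }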

TODO:

1) Rewrite most of the vnode pager to use VOP_GETPAGES/PUTPAGES. Doing
this will reduce the vnode pager to a mere fraction of its current size.

2) Rewrite vm_fault and the swap/vnode pagers to use the clustering
information provided by the new haspage pager interface. This will
substantially reduce the overhead by eliminating a large number of
VOP_BMAP() calls. The VOP_BMAP() filesystem interface should be
improved to provide both a "behind" and "ahead" indication of
contiguousness.

3) Implement the extended features of pager_haspage in swap_pager_haspage().
It currently just says 0 pages ahead/behind.

4) Re-implement the swap device (swstrategy) in a more elegant way, perhaps
via a much more general mechanism that could also be used for disk
striping of regular filesystems.

5) Do something to improve the architecture of vm_object_collapse(). The
fact that it makes calls into the swap pager and knows too much about
how the swap pager operates really bothers me. It also doesn't allow
for collapsing of non-swap pager objects ("unnamed" objects backed by
other pagers).


# 8876 30-May-1995 rgrimes

Remove trailing whitespace.


# 7883 16-Apr-1995 dg

Moved some zero-initialized variables into .bss. Made code intended to be
called only from DDB #ifdef DDB. Removed some completely unused globals.


# 7365 25-Mar-1995 dg

Pass syncio flag to vm_object_clean(). It remains unimplemented, however.


# 7246 22-Mar-1995 dg

Removed unused fifth argument to vm_object_page_clean(). Fixed bug with
VTEXT not always getting cleared when it is supposed to. Added check to
make sure that vm_object_remove() isn't called with a NULL pager or for
a pager for an OBJ_INTERNAL object (neither of which will be on the hash
list). Clear OBJ_CANPERSIST if we decide to terminate it because of no
resident pages.


# 7204 20-Mar-1995 dg

Added a new boolean argument to vm_object_page_clean that causes it to
only toss out clean pages if TRUE.


# 7090 16-Mar-1995 bde

Add and move declarations to fix all of the warnings from `gcc -Wimplicit'
(except in netccitt, netiso and netns) and most of the warnings from
`gcc -Wnested-externs'. Fix all the bugs found. There were no serious
ones.


# 6816 01-Mar-1995 dg

Various changes from John and myself that do the following:

New functions created: vm_object_pip_wakeup and pagedaemon_wakeup, which
are used to reduce the actual number of wakeups.
New function vm_page_protect, which is used in conjunction with some new
page flags to reduce the number of calls to pmap_page_protect (see the
sketch after this entry).
Minor changes to reduce unnecessary spl nesting.
Rewrote vm_page_alloc() to improve readability.
Various other mostly cosmetic changes.
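
The sketch referenced above: per-page flags cache whether the page could
have any (or any writable) mappings, so redundant calls into the pmap
layer are skipped. The flag names echo the commit; the logic is a
simplified assumption, not the historical routine:

    #define PG_MAPPED    0x1    /* page may have some mapping */
    #define PG_WRITEABLE 0x2    /* page may have a writable mapping */

    struct page {
        int flags;
    };

    /* Stub for the expensive per-mapping traversal. */
    static void
    pmap_page_protect(struct page *m, int prot)
    {
        (void)m; (void)prot;
    }

    static void
    page_protect(struct page *m, int prot)     /* prot 0 = revoke all */
    {
        if (prot == 0) {
            if (m->flags & PG_MAPPED) {        /* only if it could matter */
                pmap_page_protect(m, 0);
                m->flags &= ~(PG_MAPPED | PG_WRITEABLE);
            }
        } else if (m->flags & PG_WRITEABLE) {  /* downgrade to read-only */
            pmap_page_protect(m, prot);
            m->flags &= ~PG_WRITEABLE;
        }
    }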


# 6584 20-Feb-1995 dg

Set pages allocated for map entries as valid.


# 6351 14-Feb-1995 dg

Fixed problem with msync causing a panic.

Submitted by: John Dyson


# 6129 02-Feb-1995 dg

swap_pager.c:
Fixed a long-standing bug in freeing swap space during object collapses.
Fixed 'out of space' messages from printing out too often.
Modified to use new kmem_malloc() calling convention.
Implemented an additional stat in the swap pager struct to count the
amount of space allocated to that pager. This may be removed at some
point in the future.
Minimized unnecessary wakeups.

vm_fault.c:
Don't try to collect fault stats on 'swapped' processes - there aren't
any upages to store the stats in.
Changed read-ahead policy (again!).

vm_glue.c:
Be sure to gain a reference to the process's map before swapping.
Be sure to lose it when done.

kern_malloc.c:
Added the ability to specify whether allocations are at interrupt time or
are 'safe'; this affects what types of pages can be allocated.

vm_map.c:
Fixed a variety of map lock problems; there's still a lurking bug that
will eventually bite.

vm_object.c:
Explicitly initialize the object fields rather than bzeroing the struct.
Eliminated the 'rcollapse' code and folded its functionality into the
"real" collapse routine.
Moved an object_unlock() so that the backing_object is protected in
the qcollapse routine.
Make sure nobody fools with the backing_object when we're destroying it.
Added some diagnostic code which can be called from the debugger that
looks through all the internal objects and makes certain that they
all belong to someone.

vm_page.c:
Fixed a rather serious logic bug that would result in random system
crashes. Changed pagedaemon wakeup policy (again!).

vm_pageout.c:
Removed unnecessary page rotations on the inactive queue.
Changed the number of pages to explicitly free to just the free_reserved
level.

Submitted by: John Dyson


# 5841 24-Jan-1995 dg

Added ability to detect sequential faults and DTRT. (swap_pager.c)
Added hook for pmap_prefault() and use symbolic constant for new third
argument to vm_page_alloc() (vm_fault.c, various)
Changed the way that upages and page tables are held. (vm_glue.c)
Fixed architectural flaw in allocating pages at interrupt time that was
introduced with the merged cache changes. (vm_page.c, various)
Adjusted some algorithms to achieve better paging performance and to
accommodate the fix for the architectural flaw mentioned above. (vm_pageout.c)
Fixed pbuf handling problem, changed policy on handling read-behind page.
(vnode_pager.c)

Submitted by: John Dyson


# 5464 10-Jan-1995 dg

Fixed some formatting weirdness that I overlooked in the previous commit.


# 5455 09-Jan-1995 dg

These changes embody the support of the fully coherent merged VM buffer cache,
much higher filesystem I/O performance, and much better paging performance. It
represents the culmination of over 6 months of R&D.

The majority of the merged VM/cache work is by John Dyson.

The following highlights the most significant changes. Additionally, there are
(mostly minor) changes to the various filesystem modules (nfs, msdosfs, etc) to
support the new VM/buffer scheme.

vfs_bio.c:
Significant rewrite of most of vfs_bio to support the merged VM buffer cache
scheme. The scheme is almost fully compatible with the old filesystem
interface. Significant improvement in the number of opportunities for write
clustering.

vfs_cluster.c, vfs_subr.c
Upgrade and performance enhancements in vfs layer code to support merged
VM/buffer cache. Fixup of vfs_cluster to eliminate the bogus pagemove stuff.

vm_object.c:
Yet more improvements in the collapse code. Elimination of some windows that
can cause list corruption.

vm_pageout.c:
Fixed it, it really works better now. Somehow in 2.0, some "enhancements"
broke the code. This code has been reworked from the ground-up.

vm_fault.c, vm_page.c, pmap.c, vm_object.c
Support for small-block filesystems with merged VM/buffer cache scheme.

pmap.c, vm_map.c
Dynamic kernel VM size; now we don't have to pre-allocate excessive numbers
of kernel PTs.

vm_glue.c
Much simpler and more effective swapping code. No more gratuitous swapping.

proc.h
Fixed the problem that the p_lock flag was not being cleared on a fork.

swap_pager.c, vnode_pager.c
Removal of old vfs_bio cruft to support the past pseudo-coherency. Now the
code doesn't need it anymore.

machdep.c
Changes to better support the parameter values for the merged VM/buffer cache
scheme.

machdep.c, kern_exec.c, vm_glue.c
Implemented a separate submap for temporary exec string space and another one
to contain process upages. This eliminates all map fragmentation problems
that previously existed.

ffs_inode.c, ufs_inode.c, ufs_readwrite.c
Changes for merged VM/buffer cache. Add "bypass" support for sneaking in on
busy buffers.

Submitted by: John Dyson and David Greenman


# 5151 18-Dec-1994 dg

Fixed multiple bogons with the map entry handling.


# 5146 18-Dec-1994 dg

Fixed bug where statically allocated map entries might be freed to the
malloc pool...causing a panic.

Submitted by: John Dyson


# 5114 15-Dec-1994 dg

Protect kmem_map modifications with splhigh() to work around a problem with
the map being locked at interrupt time.


# 3449 08-Oct-1994 phk

Cosmetics: unused vars, ()'s, #include's &c &c to silence gcc.
Reviewed by: davidg


# 2112 18-Aug-1994 wollman

Fix up some sloppy coding practices:

- Delete redundant declarations.
- Add -Wredundant-declarations to Makefile.i386 so they don't come back.
- Delete sloppy COMMON-style declarations of uninitialized data in
header files.
- Add a few prototypes.
- Clean up warnings resulting from the above.

NB: ioconf.c will still generate a redundant-declaration warning, which
is unavoidable unless somebody volunteers to make `config' smarter.


# 1835 04-Aug-1994 dg

Added some code that was accidentally left out early in the 1.x -> 2.0 VM
system conversion.
Submitted by: John Dyson


# 1817 02-Aug-1994 dg

Added $Id$


# 1549 25-May-1994 rgrimes

The big 4.4BSD Lite to FreeBSD 2.0.0 (Development) patch.

Reviewed by: Rodney W. Grimes
Submitted by: John Dyson and David Greenman


# 1542 24-May-1994 rgrimes

This commit was generated by cvs2svn to compensate for changes in r1541,
which included commits to RCS files with non-trunk default branches.


# 1541 24-May-1994 rgrimes

BSD 4.4 Lite Kernel Sources