History log of /freebsd-current/sys/kern/kern_malloc.c
Revision Date Author Comments
# deab5717 27-May-2024 Mitchell Horne <mhorne@FreeBSD.org>

Adjust comments referencing vm_mem_init()

I cannot find a time where the function was not named this.

Reviewed by: kib, markj
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D45383


# 29363fb4 23-Nov-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove ancient SCCS tags.

Remove ancient SCCS tags from the tree, automated scripting, with two
minor fixup to keep things compiling. All the common forms in the tree
were removed with a perl script.

Sponsored by: Netflix


# a03c2393 09-Nov-2023 Alexander Motin <mav@FreeBSD.org>

uma: Improve memory modified after free panic messages

- Pass zone pointer to trash_ctor() and report zone name in the panic
message. It may be difficult to figyre out zone just by the item size.
- Do not pass user arguments to internal trash calls, pass thezone.
- Report malloc type name in the same unified panic message.
- Report corruption offset from the beginning of the items instead of
the full pointer. It makes panic message shorter and more readable.


# 685dc743 16-Aug-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove $FreeBSD$: one-line .c pattern

Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/


# f49fd63a 22-Sep-2022 John Baldwin <jhb@FreeBSD.org>

kmem_malloc/free: Use void * instead of vm_offset_t for kernel pointers.

Reviewed by: kib, markj
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D36549


# c84c5e00 18-Jul-2022 Mitchell Horne <mhorne@FreeBSD.org>

ddb: annotate some commands with DB_CMD_MEMSAFE

This is not completely exhaustive, but covers a large majority of
commands in the tree.

Reviewed by: markj
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D35583


# dbd51c41 12-Apr-2022 John Baldwin <jhb@FreeBSD.org>

realloc(9): Move slab and zone under #ifndef DEBUG_REDZONE.


# 880b670c 06-Oct-2021 Mark Johnston <markj@FreeBSD.org>

malloc: Unmark KASAN redzones if the full allocation size was requested

Consumers that want the full allocation size will typically access the
full buffer, so mark the entire allocation as valid to avoid useless
KASAN reports.

Sponsored by: The FreeBSD Foundation


# 71d31f1c 24-Sep-2021 Konstantin Belousov <kib@FreeBSD.org>

malloc_aligned(9): allow zero size and alignment

For alignment we do not need to do anything to make it operational.
For size, upgrade zero sized request to one byte so that we do not
request insane amount of memory for placeholder.

Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D32127


# 10094910 10-Aug-2021 Mark Johnston <markj@FreeBSD.org>

uma: Add KMSAN hooks

For now, just hook the allocation path: upon allocation, items are
marked as initialized (absent M_ZERO). Some zones are exempted from
this when it would otherwise raise false positives.

Use kmsan_orig() to update the origin map for UMA and malloc(9)
allocations. This allows KMSAN to print the return address when an
uninitialized UMA item is implicated in a report. For example:
panic: MSan: Uninitialized UMA memory from m_getm2+0x7fe

Sponsored by: The FreeBSD Foundation


# 89786088 10-Aug-2021 Mark Johnston <markj@FreeBSD.org>

amd64: Populate the KMSAN shadow maps and integrate with the VM

- During boot, allocate PDP pages for the shadow maps. The region above
KERNBASE is currently not shadowed.
- Create a dummy shadow for the vm page array. For now, this array is
not protected by the shadow map to help reduce kernel memory usage.
- Grow shadows when growing the kernel map.
- Increase the default kernel stack size when KMSAN is enabled. As with
KASAN, sanitizer instrumentation appears to create stack frames large
enough that the default value is not sufficient.
- Disable UMA's use of the direct map when KMSAN is configured. KMSAN
cannot validate the direct map.
- Disable unmapped I/O when KMSAN configured.
- Lower the limit on paging buffers when KMSAN is configured. Each
buffer has a static MAXPHYS-sized allocation of KVA, which in turn
eats 2*MAXPHYS of space in the shadow map.

Reviewed by: alc, kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31295


# 04cc0c39 02-Aug-2021 Kyle Evans <kevans@FreeBSD.org>

malloc(9): provide missing malloc_aligned implementation

Pointy hat: kevans
Fixes: 6162cf885c00 ("malloc(9): Document/complete aligned variants")


# 45e23571 13-Jul-2021 Mark Johnston <markj@FreeBSD.org>

malloc: Pass the allocation size to malloc_large() by value

Its callers do not make use the modified size that malloc_large() was
returning, so there's no need to pass a pointer. No functional change
intended.

MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation


# 9a7c2de3 05-May-2021 Mark Johnston <markj@FreeBSD.org>

realloc: Fix KASAN(9) shadow map updates

When copying from the old buffer to the new buffer, we don't know the
requested size of the old allocation, but only the size of the
allocation provided by UMA. This value is "alloc". Because the copy
may access bytes in the old allocation's red zone, we must mark the full
allocation valid in the shadow map. Do so using the correct size.

Reported by: kp
Tested by: kp
Sponsored by: The FreeBSD Foundation


# 06a53ecf 13-Apr-2021 Mark Johnston <markj@FreeBSD.org>

malloc: Add state transitions for KASAN

- Reuse some REDZONE bits to keep track of the requested and allocated
sizes, and use that to provide red zones.
- As in UMA, disable memory trashing to avoid unnecessary CPU overhead.

MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D29461


# 6faf45b3 13-Apr-2021 Mark Johnston <markj@FreeBSD.org>

amd64: Implement a KASAN shadow map

The idea behind KASAN is to use a region of memory to track the validity
of buffers in the kernel map. This region is the shadow map. The
compiler inserts calls to the KASAN runtime for every emitted load
and store, and the runtime uses the shadow map to decide whether the
access is valid. Various kernel allocators call kasan_mark() to update
the shadow map.

Since the shadow map tracks only accesses to the kernel map, accesses to
other kernel maps are not validated by KASAN. UMA_MD_SMALL_ALLOC is
disabled when KASAN is configured to reduce usage of the direct map.
Currently we have no mechanism to completely eliminate uses of the
direct map, so KASAN's coverage is not comprehensive.

The shadow map uses one byte per eight bytes in the kernel map. In
pmap_bootstrap() we create an initial set of page tables for the kernel
and preloaded data.

When pmap_growkernel() is called, we call kasan_shadow_map() to extend
the shadow map. kasan_shadow_map() uses pmap_kasan_enter() to allocate
memory for the shadow region and map it.

Reviewed by: kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D29417


# 1ae20f7c 07-Mar-2021 Kyle Evans <kevans@FreeBSD.org>

kern: malloc: fix panic on M_WAITOK during THREAD_NO_SLEEPING()

Simple condition flip; we wanted to panic here after epoch_trace_list().

Reviewed by: glebius, markj
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D29125


# c743a6bd 06-Mar-2021 Hans Petter Selasky <hselasky@FreeBSD.org>

Implement mallocarray_domainset(9) variant of mallocarray(9).

Reviewed by: kib @
MFC after: 1 week
Sponsored by: Mellanox Technologies // NVIDIA Networking


# 1ac7c344 18-Jan-2021 Konstantin Belousov <kib@FreeBSD.org>

malloc_aligned: roundup allocation size up to next power of two

to make it use the right aligned zone.

Reported by: melifaro
Reviewed by: alc, markj (previous version)
Discussed with: jrtc27
Tested by: pho (previous version)
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D28219


# 0781c79d 18-Jan-2021 Konstantin Belousov <kib@FreeBSD.org>

Restrict supported alignment for malloc_domainset_aligned(9) to PAGE_SIZE.

UMA page_alloc() does not take an alignment, so UMA can only handle
alignment less then page size.

Noted by: alc
Reviewed by: alc, markj (previous version)
Discussed with: jrtc27
Tested by: pho (previous version)
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D28219


# 3b15beb3 13-Jan-2021 Konstantin Belousov <kib@FreeBSD.org>

Implement malloc_domainset_aligned(9).

Change the power-of-two malloc zones to require alignment equal to the
size [*]. Current uma allocator already provides such alignment, so in
fact this change does not change anything except providing future-proof
setup.

Suggested by: markj [*]
Reviewed by: andrew, jah, markj
Tested by: pho
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D28147


# 89deca0a 16-Nov-2020 Mateusz Guzik <mjg@FreeBSD.org>

malloc: make malloc_large closer to standalone

This moves entire large alloc handling out of all consumers, apart from
deciding to go there.

This is a step towards creating a fast path.

Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D27198


# 9b9bb9ff 13-Nov-2020 Mateusz Guzik <mjg@FreeBSD.org>

malloc: retire MALLOC_PROFILE

The global array has prohibitive performance impact on multicore systems.

The same data (and more) can be obtained with dtrace.

Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D27199


# 9aa6d792 12-Nov-2020 Mateusz Guzik <mjg@FreeBSD.org>

malloc: retire malloc_last_fail

The routine does not serve any practical purpose.

Memory can be allocated in many other ways and most consumers pass the
M_WAITOK flag, making malloc not fail in the first place.

Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D27143


# f0c90a09 09-Nov-2020 Mateusz Guzik <mjg@FreeBSD.org>

malloc: provide 384 byte zone

Total page count after buildworld on ZFS for 384 (if present) and 512 zones:
before: 29713
after: 25946

per-zone page use:
vm.uma.malloc_384.keg.domain.1.pages: 11621
vm.uma.malloc_384.keg.domain.0.pages: 11597
vm.uma.malloc_512.keg.domain.1.pages: 1280
vm.uma.malloc_512.keg.domain.0.pages: 1448

Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D27145


# 8e6526e9 09-Nov-2020 Mateusz Guzik <mjg@FreeBSD.org>

malloc: retire mt_stats_zone in favor of pcpu_zone_64

Reviewed by: markj, imp
Differential Revision: https://reviews.freebsd.org/D27142


# e25d8b67 06-Nov-2020 Mateusz Guzik <mjg@FreeBSD.org>

malloc: tweak the version check in r367432 to include type name

While here fix a whitespace problem.


# bdcc2226 06-Nov-2020 Mateusz Guzik <mjg@FreeBSD.org>

malloc: move malloc_type_internal into malloc_type

According to code comments the original motivation was to allow for
malloc_type_internal changes without ABI breakage. This can be trivially
accomplished by providing spare fields and versioning the struct, as
implemented in the patch below.

The upshots are one less memory indirection on each alloc and disappearance
of mt_zone.

Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D27104


# 16b971ed 05-Nov-2020 Mateusz Guzik <mjg@FreeBSD.org>

malloc: add a helper returning size allocated for given request

Sample usage: kernel modules can decide whether to stick to malloc or
create their own zone.

Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D27097


# e1b6a7f8 02-Nov-2020 Mateusz Guzik <mjg@FreeBSD.org>

malloc: prefix zones with malloc-

Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D27038


# 828afdda 02-Nov-2020 Mateusz Guzik <mjg@FreeBSD.org>

malloc: export kernel zones instead of relying on them being power-of-2

Reviewed by: markj (previous version)
Differential Revision: https://reviews.freebsd.org/D27026


# 82c174a3 30-Oct-2020 Mateusz Guzik <mjg@FreeBSD.org>

malloc: delegate M_EXEC handling to dedicacted routines

It is almost never needed and adds an avoidable branch.

While here do minior clean ups in preparation for larger changes.

Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D27019


# 6fed89b1 01-Sep-2020 Mateusz Guzik <mjg@FreeBSD.org>

kern: clean up empty lines in .c and .h files


# 5d4bf057 29-Aug-2020 Vladimir Kondratyev <wulf@FreeBSD.org>

LinuxKPI: Implement ksize() function.

In Linux, ksize() gets the actual amount of memory allocated for a given
object. This commit adds malloc_usable_size() to FreeBSD KPI which does
the same. It also maps LinuxKPI ksize() to newly created function.

ksize() function is used by drm-kmod.

Reviewed by: hselasky, kib
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D26215


# b21b022a 18-Aug-2020 Mark Johnston <markj@FreeBSD.org>

Revert r364310.

Some of the resulting fallout in CAM does not appear straightforward to
fix, so simply revert the commit for now in the absence of a better
solution.

Discussed with: mjg
Reported by: dhw


# 1921bb7b 17-Aug-2020 Gleb Smirnoff <glebius@FreeBSD.org>

With INVARIANTS panic immediately if M_WAITOK is requested in a
non-sleepable context. Previously only _sleep() would panic.
This will catch misuse of M_WAITOK at development stage rather
than at stress load stage.

Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D26027


# 96ad26ee 04-Aug-2020 Mark Johnston <markj@FreeBSD.org>

Remove free_domain() and uma_zfree_domain().

These functions were introduced before UMA started ensuring that freed
memory gets placed in domain-local caches. They no longer serve any
purpose since UMA now provides their functionality by default. Remove
them to simplyify the kernel memory allocator interfaces a bit.

Reviewed by: cem, kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D25937


# 7029da5c 26-Feb-2020 Pawel Biernacki <kaktus@FreeBSD.org>

Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many)

r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.

This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.

Mark all obvious cases as MPSAFE. All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT

Approved by: kib (mentor, blanket)
Commented by: kib, gallatin, melifaro
Differential Revision: https://reviews.freebsd.org/D23718


# eaa17d42 22-Feb-2020 Ryan Libby <rlibby@FreeBSD.org>

sys/vm: quiet -Wwrite-strings

Discussed with: kib
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D23796


# 45035bec 15-Feb-2020 Matt Macy <mmacy@FreeBSD.org>

Add zfree to zero allocation before free

Key and cookie management typically wants to
avoid information leaks by explicitly zeroing
before free. This routine simplifies that by
permitting consumers to do so without carrying
the size around.

Reviewed by: jeff@, jhb@
MFC after: 1 week
Sponsored by: Rubicon Communications, LLC (Netgate)
Differential Revision: https://reviews.freebsd.org/D22790


# 58aa35d4 03-Feb-2020 Warner Losh <imp@FreeBSD.org>

Remove sparc64 kernel support

Remove all sparc64 specific files
Remove all sparc64 ifdefs
Removee indireeect sparc64 ifdefs


# dc727127 09-Jan-2020 Mark Johnston <markj@FreeBSD.org>

Change malloc_domain() to return the allocation size to the caller.

Otherwise the malloc type accounting in malloc_domainset(9) is wrong
after r355203.

Reviewed by: rlibby
Reported by: kaktus
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D23095


# 6d6a03d7 28-Nov-2019 Jeff Roberson <jeff@FreeBSD.org>

Handle large mallocs by going directly to kmem. Taking a detour through
UMA does not provide any additional value.

Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D22563


# b476ae7f 28-Nov-2019 Jeff Roberson <jeff@FreeBSD.org>

Fix DEBUG_REDZONE build after r355169


# 584061b4 28-Nov-2019 Jeff Roberson <jeff@FreeBSD.org>

Garbage collect the mostly unused us_keg field. Use appropriately named
union members in vm_page.h to store the zone and slab. Remove some nearby
dead code.

Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D22564


# 5757b59f 29-Oct-2019 Gleb Smirnoff <glebius@FreeBSD.org>

Merge td_epochnest with td_no_sleeping.

Epoch itself doesn't rely on the counter and it is provided
merely for sleeping subsystems to check it.

- In functions that sleep use THREAD_CAN_SLEEP() to assert
correctness. With EPOCH_TRACE compiled print epoch info.
- _sleep() was a wrong place to put the assertion for epoch,
right place is sleepq_add(), as there ways to call the
latter bypassing _sleep().
- Do not increase td_no_sleeping in non-preemptible epochs.
The critical section would trigger all possible safeguards,
no sleeping counter is extraneous.

Reviewed by: kib


# 4b25d1f2 15-Oct-2019 Gleb Smirnoff <glebius@FreeBSD.org>

Missing from r353596.


# bac06038 15-Oct-2019 Gleb Smirnoff <glebius@FreeBSD.org>

When assertion for a thread not being in an epoch fails also print all
entered epochs. Works with EPOCH_TRACE only.

Reviewed by: hselasky
Differential Revision: https://reviews.freebsd.org/D22017


# 46d70077 10-Oct-2019 Conrad Meyer <cem@FreeBSD.org>

ddb: Add CSV option, sorting to 'show (malloc|uma)'

Add /i option for machine-parseable CSV output. This allows ready copy/
pasting into more sophisticated tooling outside of DDB.

Add total zone size ("Memory Use") as a new column for UMA.

For both, sort the displayed list on size (print the largest zones/types
first). This is handy for quickly diagnosing "where has my memory gone?" at
a high level.

Submitted by: Emily Pettigrew <Emily.Pettigrew AT isilon.com> (earlier version)
Sponsored by: Dell EMC Isilon


# 28b740da 14-Jan-2019 Konstantin Belousov <kib@FreeBSD.org>

Handle overflow in calculating max kmem size.

vm_kmem_size is u_long, and it might be not capable of holding page
count times PAGE_SIZE, even when scaled down by VM_KMEM_SIZE_SCALE. As
bde reported, 12G PAE config ends up with zero for kmem size.

Explicitly check for overflow and clamp kmem size at vm_kmem_size_max.
If we end up at zero size because VM_KMEM_SIZE_MAX is not defined,
panic with clear explanation rather then failing in a way which is
hard to relate.

Reported by: bde, pho
Tested by: pho
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D18767


# 26e9d9b0 18-Dec-2018 Mark Johnston <markj@FreeBSD.org>

Fix DDB's "show malloc" after r338899.

MFC after: 3 days
Sponsored by: The FreeBSD Foundation


# 9978bd99 30-Oct-2018 Mark Johnston <markj@FreeBSD.org>

Add malloc_domainset(9) and _domainset variants to other allocator KPIs.

Remove malloc_domain(9) and most other _domain KPIs added in r327900.
The new functions allow the caller to specify a general NUMA domain
selection policy, rather than specifically requesting an allocation from
a specific domain. The latter policy tends to interact poorly with
M_WAITOK, resulting in situations where a caller is blocked indefinitely
because the specified domain is depleted. Most existing consumers of
the _domain KPIs are converted to instead use a DOMAINSET_PREF() policy,
in which we fall back to other domains to satisfy the allocation
request.

This change also defines a set of DOMAINSET_FIXED() policies, which
only permit allocations from the specified domain.

Discussed with: gallatin, jeff
Reported and tested by: pho (previous version)
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D17418


# 9afff6b1 23-Sep-2018 Mateusz Guzik <mjg@FreeBSD.org>

Eliminate false sharing in malloc due to statistic collection

Currently stats are collected in a MAXCPU-sized array which is not
aligned and suffers enormous false-sharing. Fix the problem by
utilizing per-cpu allocation.

The counter(9) API is not used here as it is too incomplete and does
not provide a win over per-cpu zone sized for malloc stats struct. In
particular stats are being reported for each cpu separately by just
copying what is supposed to be an array element for given cpu.

This eliminates significant false-sharing during malloc-heavy tests
e.g. on Skylake. See the review for details.

Reviewed by: markj
Approved by: re (kib)
Differential Revision: https://reviews.freebsd.org/D17289


# 49bfa624 25-Aug-2018 Alan Cox <alc@FreeBSD.org>

Eliminate the arena parameter to kmem_free(). Implicitly this corrects an
error in the function hypercall_memfree(), where the wrong arena was being
passed to kmem_free().

Introduce a per-page flag, VPO_KMEM_EXEC, to mark physical pages that are
mapped in kmem with execute permissions. Use this flag to determine which
arena the kmem virtual addresses are returned to.

Eliminate UMA_SLAB_KRWX. The introduction of VPO_KMEM_EXEC makes it
redundant.

Update the nearby comment for UMA_SLAB_KERNEL.

Reviewed by: kib, markj
Discussed with: jeff
Approved by: re (marius)
Differential Revision: https://reviews.freebsd.org/D16845


# 44d0efb2 20-Aug-2018 Alan Cox <alc@FreeBSD.org>

Eliminate kmem_alloc_contig()'s unused arena parameter.

Reviewed by: hselasky, kib, markj
Discussed with: jeff
Differential Revision: https://reviews.freebsd.org/D16799


# 0766f278 13-Jun-2018 Jonathan T. Looney <jtl@FreeBSD.org>

Make UMA and malloc(9) return non-executable memory in most cases.

Most kernel memory that is allocated after boot does not need to be
executable. There are a few exceptions. For example, kernel modules
do need executable memory, but they don't use UMA or malloc(9). The
BPF JIT compiler also needs executable memory and did use malloc(9)
until r317072.

(Note that a side effect of r316767 was that the "small allocation"
path in UMA on amd64 already returned non-executable memory. This
meant that some calls to malloc(9) or the UMA zone(9) allocator could
return executable memory, while others could return non-executable
memory. This change makes the behavior consistent.)

This change makes malloc(9) return non-executable memory unless the new
M_EXEC flag is specified. After this change, the UMA zone(9) allocator
will always return non-executable memory, and a KASSERT will catch
attempts to use the M_EXEC flag to allocate executable memory using
uma_zalloc() or its variants.

Allocations that do need executable memory have various choices. They
may use the M_EXEC flag to malloc(9), or they may use a different VM
interfact to obtain executable pages.

Now that malloc(9) again allows executable allocations, this change also
reverts most of r317072.

PR: 228927
Reviewed by: alc, kib, markj, jhb (previous version)
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D15691


# 34c538c3 02-Jun-2018 Mateusz Guzik <mjg@FreeBSD.org>

malloc: try to use builtins for zeroing at the callsite

Plenty of allocation sites pass M_ZERO and sizes which are small and known
at compilation time. Handling them internally in malloc loses this information
and results in avoidable calls to memset.

Instead, let the compiler take the advantage of it whenever possible.

Discussed with: jeff


# 5072a5f4 18-May-2018 Matt Macy <mmacy@FreeBSD.org>

malloc: avoid possibly returning stack garbage if MALLOC_DEBUG is defined


# 06bf2a6a 10-May-2018 Matt Macy <mmacy@FreeBSD.org>

Add simple preempt safe epoch API

Read locking is over used in the kernel to guarantee liveness. This API makes
it easy to provide livenes guarantees without atomics.

Includes epoch_test kernel module to stress test the API.

Documentation will follow initial use case.

Test case and improvements to preemption handling in response to discussion
with mjg@

Reviewed by: imp@, shurd@
Approved by: sbruno@


# 7cd79421 23-Apr-2018 Mateusz Guzik <mjg@FreeBSD.org>

dtrace: depessimize dtmalloc when dtrace is active

Each malloc/free was testing dtrace_malloc_enabled and forcing
extra reads from the malloc type struct to see if perhaps a
dtmalloc probe was on.

Treat it like lockstat and sdt: have a global bolean.


# c9e05ccd 23-Apr-2018 Mateusz Guzik <mjg@FreeBSD.org>

malloc: stop reading the subzone if MALLOC_DEBUG_MAXZONES == 1 (the default)

malloc was showing at the top of profile during while running microbenchmarks.

#define DTMALLOC_PROBE_MAX 2
struct malloc_type_internal {
uint32_t mti_probes[DTMALLOC_PROBE_MAX];
u_char mti_zone;
struct malloc_type_stats mti_stats[MAXCPU];
};

Reading mti_zone it wastes a cacheline to hold mti_probes + mti_zone
(which we know is 0) + part of malloc stats of the first cpu which on top
induces false-sharing.

In particular will-it-scale lock1_processes -t 128 -s 10:
before: average:45879692
after: average:51655596

Note the counters can be padded but the right fix is to move them to
counter(9), leaving the struct read-only after creation (modulo dtrace
probes).


# f7d35785 08-Feb-2018 Gleb Smirnoff <glebius@FreeBSD.org>

Fix boot_pages exhaustion on machines with many domains and cores, where
size of UMA zone allocation is greater than page size. In this case zone
of zones can not use UMA_MD_SMALL_ALLOC, and we need to postpone switch
off of this zone from startup_alloc() until full launch of VM.

o Always supply number of VM zones to uma_startup_count(). On machines
with UMA_MD_SMALL_ALLOC ignore it completely, unless zsize goes over
a page. In the latter case account VM zones for number of allocations
from the zone of zones.
o Rewrite startup_alloc() so that it will immediately switch off from
itself any zone that is already capable of running real alloc.
In worst case scenario we may leak a single page here. See comment
in uma_startup_count().
o Hardcode call to uma_startup2() into vm_mem_init(). Otherwise some
extra SYSINITs, e.g. vm_page_init() may sneak in before.
o While here, remove uma_boot_pages_mtx. With recent changes to boot
pages calculation, we are guaranteed to use all of the boot_pages
in the early single threaded stage.

Reported & tested by: mav


# f4bef67c 05-Feb-2018 Gleb Smirnoff <glebius@FreeBSD.org>

Followup on r302393 by cperciva, improving calculation of boot pages required
for UMA startup.

o Introduce another stage of UMA startup, which is entered after
vm_page_startup() finishes. After this stage we don't yet enable buckets,
but we can ask VM for pages. Rename stages to meaningful names while here.
New list of stages: BOOT_COLD, BOOT_STRAPPED, BOOT_PAGEALLOC, BOOT_BUCKETS,
BOOT_RUNNING.
Enabling page alloc earlier allows us to dramatically reduce number of
boot pages required. What is more important number of zones becomes
consistent across different machines, as no MD allocations are done before
the BOOT_PAGEALLOC stage. Now only UMA internal zones actually need to use
startup_alloc(), however that may change, so vm_page_startup() provides
its need for early zones as argument.
o Introduce uma_startup_count() function, to avoid code duplication. The
functions calculates sizes of zones zone and kegs zone, and calculates how
many pages UMA will need to bootstrap.
It counts not only of zone structures, but also of kegs, slabs and hashes.
o Hide uma_startup_foo() declarations from public file.
o Provide several DIAGNOSTIC printfs on boot_pages usage.
o Bugfix: when calculating zone of zones size use (mp_maxid + 1) instead of
mp_ncpus. Use resulting number not only in the size argument to zone_ctor()
but also as args.size.

Reviewed by: imp, gallatin (earlier version)
Differential Revision: https://reviews.freebsd.org/D14054


# 5a70796a 24-Jan-2018 Li-Wen Hsu <lwhsu@FreeBSD.org>

Fix build for architectures where size_t is not unsigned long

Reviewed by: cem
Differential Revision: https://reviews.freebsd.org/D14045


# bd555da9 24-Jan-2018 Conrad Meyer <cem@FreeBSD.org>

malloc(9): Change nominal size to size_t to match standard C

No functional change -- size_t matches unsigned long on all platforms.

Reported by: bde
Discussed with: jhb
Sponsored by: Dell EMC Isilon


# ab3185d1 12-Jan-2018 Jeff Roberson <jeff@FreeBSD.org>

Implement NUMA support in uma(9) and malloc(9). Allocations from specific
domains can be done by the _domain() API variants. UMA also supports a
first-touch policy via the NUMA zone flag.

The slab layer is now segregated by VM domains and is precise. It handles
iteration for round-robin directly. The per-cpu cache layer remains
a mix of domains according to where memory is allocated and freed. Well
behaved clients can achieve perfect locality with no performance penalty.

The direct domain allocation functions have to visit the slab layer and
so require per-zone locks which come at some expense.

Reviewed by: Attilio (a slightly older version)
Tested by: pho
Sponsored by: Netflix, Dell/EMC Isilon


# c02fc960 10-Jan-2018 Conrad Meyer <cem@FreeBSD.org>

mallocarray(9): panic if the requested allocation would overflow

Additionally, move the overflow check logic out to WOULD_OVERFLOW() for
consumers to have a common means of testing for overflowing allocations.
WOULD_OVERFLOW() should be a secondary check -- on 64-bit platforms, just
because an allocation won't overflow size_t does not mean it is a sane size
to request. Callers should be imposing reasonable allocation limits far,
far, below overflow.

Discussed with: emaste, jhb, kp
Sponsored by: Dell EMC Isilon


# fd91e076 07-Jan-2018 Kristof Provost <kp@FreeBSD.org>

Introduce mallocarray() in the kernel

Similar to calloc() the mallocarray() function checks for integer
overflows before allocating memory.
It does not zero memory, unless the M_ZERO flag is set.

Reviewed by: pfg, vangyzen (previous version), imp (previous version)
Obtained from: OpenBSD
Differential Revision: https://reviews.freebsd.org/D13766


# 2e47807c 28-Nov-2017 Jeff Roberson <jeff@FreeBSD.org>

Eliminate kmem_arena and kmem_object in preparation for further NUMA commits.

The arena argument to kmem_*() is now only used in an assert. A follow-up
commit will remove the argument altogether before we freeze the API for the
next release.

This replaces the hard limit on kmem size with a soft limit imposed by UMA. When
the soft limit is exceeded we periodically wakeup the UMA reclaim thread to
attempt to shrink KVA. On 32bit architectures this should behave much more
gracefully as we exhaust KVA. On 64bit the limits are likely never hit.

Reviewed by: markj, kib (some objections)
Discussed with: alc
Tested by: pho
Sponsored by: Netflix / Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D13187


# 51369649 20-Nov-2017 Pedro F. Giffuni <pfg@FreeBSD.org>

sys: further adoption of SPDX licensing ID tags.

Mainly focus on files that use BSD 3-Clause license.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.


# 69a28758 15-Sep-2016 Ed Maste <emaste@FreeBSD.org>

Renumber license clauses in sys/kern to avoid skipping #3


# 5e0a6f31 19-May-2016 Mark Johnston <markj@FreeBSD.org>

Move IPv6 malloc tag definitions into the IPv6 code.


# b28cc462 09-Feb-2016 Gleb Smirnoff <glebius@FreeBSD.org>

Include sys/_task.h into uma_int.h, so that taskqueue.h isn't a
requirement for uma_int.h.

Suggested by: jhb


# e60b2fcb 03-Feb-2016 Gleb Smirnoff <glebius@FreeBSD.org>

Redo r292484. Embed task(9) into zone, so that uz_maxaction is called
in a context that can sleep, allowing consumers of the KPI to run their
drain routines without any extra measures.

Discussed with: jtl


# d9e2e68d 11-Dec-2015 Mark Johnston <markj@FreeBSD.org>

Don't make assertions about td_critnest when the scheduler is stopped.

A panicking thread always executes with a critical section held, so any
attempt to allocate or free memory while dumping will otherwise cause a
second panic. This can occur, for example, if xpt_polled_action() completes
non-dump I/O that was pending at the time of the panic. The fact that this
can occur is itself a bug, but asserting in this case does little but
reduce the reliability of kernel dumps.

Suggested by: kib
Reported by: pho


# 1067a2ba 19-Nov-2015 Jonathan T. Looney <jtl@FreeBSD.org>

Consistently enforce the restriction against calling malloc/free when in a
critical section.

uma_zalloc_arg()/uma_zalloc_free() may acquire a sleepable lock on the
zone. The malloc() family of functions may call uma_zalloc_arg() or
uma_zalloc_free().

The malloc(9) man page currently claims that free() will never sleep.
It also implies that the malloc() family of functions will not sleep
when called with M_NOWAIT. However, it is more correct to say that
these functions will not sleep indefinitely. Indeed, they may acquire
a sleepable lock. However, a developer may overlook this restriction
because the WITNESS check that catches attempts to call the malloc()
family of functions within a critical section is inconsistenly
applied.

This change clarifies the language of the malloc(9) man page to clarify
the restriction against calling the malloc() family of functions
while in a critical section or holding a spin lock. It also adds
KASSERTs at appropriate points to make the enforcement of this
restriction more consistent.

PR: 204633
Differential Revision: https://reviews.freebsd.org/D4197
Reviewed by: markj
Approved by: gnn (mentor)
Sponsored by: Juniper Networks


# 44ec2b63 09-May-2015 Konstantin Belousov <kib@FreeBSD.org>

The vmem callback to reclaim kmem arena address space on low or
fragmented conditions currently just wakes up the pagedaemon. The
kmem arena is significantly smaller then the total available physical
memory, which means that there are loads where kmem arena space could
be exhausted, while there is a lot of pages available still. The
woken up pagedaemon sees vm_pages_needed != 0, verifies the condition
vm_paging_needed() which is false, clears the pass and returns back to
sleep, not calling neither uma_reclaim() nor lowmem handler.

To handle low kmem arena conditions, create additional pagedaemon
thread which calls uma_reclaim() directly. The thread sleeps on the
dedicated channel and kmem_reclaim() wakes the thread in addition to
the pagedaemon.

Reported and tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks


# 1eafc078 14-Mar-2015 Ian Lepore <ian@FreeBSD.org>

Set the SBUF_INCLUDENUL flag in sbuf_new_for_sysctl() so that sysctl
strings returned to userland include the nulterm byte.

Some uses of sbuf_new_for_sysctl() write binary data rather than strings;
clear the SBUF_INCLUDENUL flag after calling sbuf_new_for_sysctl() in
those cases. (Note that the sbuf code still automatically adds a nulterm
byte in sbuf_finish(), but since it's not included in the length it won't
get copied to userland along with the binary data.)

Remove explicit adding of a nulterm byte in a couple places now that it
gets done automatically by the sbuf drain code.

PR: 195668


# 7c51714e 21-Sep-2014 Sean Bruno <sbruno@FreeBSD.org>

svn revisions r269964 and r269963 seemed to have impaired small memory
footprint systems(32M/64M) and didn't leave enough free memory to load modules
when it was setting up page tables that for sizes that are never used on
these smallish boards.

Set kmem_zmax to PAGE_SIZE on these smaller systems (< 128M) to keep this
from happening. Verified on mips32 h/w.

PR: 193465
Submitted by: delphij
Reviewed by: adrian


# 7001d850 13-Aug-2014 Xin LI <delphij@FreeBSD.org>

Add a new loader tunable, vm.kmem_zmax which allows a system administrator
to limit the maximum allocation size that malloc(9) would consider using
the UMA cache allocator as backend.

Suggested by: alfred
MFC after: 2 weeks


# bda06553 13-Aug-2014 Xin LI <delphij@FreeBSD.org>

Re-instate UMA cached backend for 4K - 64K allocations. New consumers
like geli(4) uses malloc(9) to allocate temporary buffers that gets
free'ed shortly, causing frequent TLB shootdown as observed in hwpmc
supported flame graph.

Discussed with: jeff, alfred
MFC after: 1 week


# 4813ad54 28-Jun-2014 Hans Petter Selasky <hselasky@FreeBSD.org>

Compile fixes:

Remove duplicate "debug_ktr.mask" sysctl definition.
Remove now unused variable from "kern_ktr.c".
This fixes build of "ktr" which was broken by r267961.

Let the default value for "vm_kmem_size_scale" be zero. It is setup
after that the sysctl has been initialized from "getenv()" in the
"kmeminit()" function to equal the "VM_KMEM_SIZE_MAX" value, if
zero. On Sparc64 the "VM_KMEM_SIZE_MAX" macro is not a constant. This
fixes build of Sparc64 which was broken by r267961.

Add a special macro to dynamically create SYSCTL root nodes, because
root nodes have a special parent. This fixes build of existing OFED
module and CANBUS module for pc98 which was broken by r267961.

Add missing "sysctl.h" includes to get the needed sysctl header file
declarations. This is needed after r267961.

MFC after: 2 weeks


# af3b2549 27-Jun-2014 Hans Petter Selasky <hselasky@FreeBSD.org>

Pull in r267961 and r267973 again. Fix for issues reported will follow.


# 37a107a4 27-Jun-2014 Glen Barber <gjb@FreeBSD.org>

Revert r267961, r267973:

These changes prevent sysctl(8) from returning proper output,
such as:

1) no output from sysctl(8)
2) erroneously returning ENOMEM with tools like truss(1)
or uname(1)
truss: can not get etype: Cannot allocate memory


# 3da1cf1e 27-Jun-2014 Hans Petter Selasky <hselasky@FreeBSD.org>

Extend the meaning of the CTLFLAG_TUN flag to automatically check if
there is an environment variable which shall initialize the SYSCTL
during early boot. This works for all SYSCTL types both statically and
dynamically created ones, except for the SYSCTL NODE type and SYSCTLs
which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to
be used in the case a tunable sysctl has a custom initialisation
function allowing the sysctl to still be marked as a tunable. The
kernel SYSCTL API is mostly the same, with a few exceptions for some
special operations like iterating childrens of a static/extern SYSCTL
node. This operation should probably be made into a factored out
common macro, hence some device drivers use this. The reason for
changing the SYSCTL API was the need for a SYSCTL parent OID pointer
and not only the SYSCTL parent OID list pointer in order to quickly
generate the sysctl path. The motivation behind this patch is to avoid
parameter loading cludges inside the OFED driver subsystem. Instead of
adding special code to the OFED driver subsystem to post-load tunables
into dynamically created sysctls, we generalize this in the kernel.

Other changes:
- Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask"
to "hw.pcic.intr_mask".
- Removed redundant TUNABLE statements throughout the kernel.
- Some minor code rewrites in connection to removing not needed
TUNABLE statements.
- Added a missing SYSCTL_DECL().
- Wrapped two very long lines.
- Avoid malloc()/free() inside sysctl string handling, in case it is
called to initialize a sysctl from a tunable, hence malloc()/free() is
not ready when sysctls from the sysctl dataset are registered.
- Bumped FreeBSD version to indicate SYSCTL API change.

MFC after: 2 weeks
Sponsored by: Mellanox Technologies


# 44f1c916 22-Mar-2014 Bryan Drewery <bdrewery@FreeBSD.org>

Rename global cnt to vm_cnt to avoid shadowing.

To reduce the diff struct pcu.cnt field was not renamed, so
PCPU_OP(cnt.field) is still used. pc_cnt and pcpu are also used in
kvm(3) and vmstat(8). The goal was to not affect externally used KPI.

Bump __FreeBSD_version_ in case some out-of-tree module/code relies on the
the global cnt variable.

Exp-run revealed no ports using it directly.

No objection from: arch@
Sponsored by: EMC / Isilon Storage Division


# f9d498ad 23-Feb-2014 Dimitry Andric <dim@FreeBSD.org>

On sparc64, VM_KMEM_SIZE_SCALE is not a constant expression, so it
cannot be tested in a CTASSERT().


# 54366c0b 25-Nov-2013 Attilio Rao <attilio@FreeBSD.org>

- For kernel compiled only with KDTRACE_HOOKS and not any lock debugging
option, unbreak the lock tracing release semantic by embedding
calls to LOCKSTAT_PROFILE_RELEASE_LOCK() direclty in the inlined
version of the releasing functions for mutex, rwlock and sxlock.
Failing to do so skips the lockstat_probe_func invokation for
unlocking.
- As part of the LOCKSTAT support is inlined in mutex operation, for
kernel compiled without lock debugging options, potentially every
consumer must be compiled including opt_kdtrace.h.
Fix this by moving KDTRACE_HOOKS into opt_global.h and remove the
dependency by opt_kdtrace.h for all files, as now only KDTRACE_FRAMES
is linked there and it is only used as a compile-time stub [0].

[0] immediately shows some new bug as DTRACE-derived support for debug
in sfxge is broken and it was never really tested. As it was not
including correctly opt_kdtrace.h before it was never enabled so it
was kept broken for a while. Fix this by using a protection stub,
leaving sfxge driver authors the responsibility for fixing it
appropriately [1].

Sponsored by: EMC / Isilon storage division
Discussed with: rstone
[0] Reported by: rstone
[1] Discussed with: philip


# c70af487 08-Nov-2013 Alan Cox <alc@FreeBSD.org>

As of r257209, all architectures have defined VM_KMEM_SIZE_SCALE. In other
words, every architecture is now auto-sizing the kmem arena. This revision
changes kmeminit() so that the definition of VM_KMEM_SIZE_SCALE becomes
mandatory and the definition of VM_KMEM_SIZE becomes optional.

Replace or eliminate all existing definitions of VM_KMEM_SIZE. With
auto-sizing enabled, VM_KMEM_SIZE effectively became an alternate spelling
for VM_KMEM_SIZE_MIN on most architectures. Use VM_KMEM_SIZE_MIN for
clarity.

Change kmeminit() so that the effect of defining VM_KMEM_SIZE is similar to
that of setting the tunable vm.kmem_size. Whereas the macros
VM_KMEM_SIZE_{MAX,MIN,SCALE} have had the same effect as the tunables
vm.kmem_size_{max,min,scale}, the effects of VM_KMEM_SIZE and vm.kmem_size
have been distinct. In particular, whereas VM_KMEM_SIZE was overridden by
VM_KMEM_SIZE_{MAX,MIN,SCALE} and vm.kmem_size_{max,min,scale}, vm.kmem_size
was not. Remedy this inconsistency. Now, VM_KMEM_SIZE can be used to set
the size of the kmem arena at compile-time without that value being
overridden by auto-sizing.

Update the nearby comments to reflect the kmem submap being replaced by the
kmem arena. Stop duplicating the auto-sizing formula in every machine-
dependent vmparam.h and place it in kmeminit() where auto-sizing takes
place.

Reviewed by: kib (an earlier version)
Sponsored by: EMC / Isilon Storage Division


# 61083fcc 05-Oct-2013 Alan Cox <alc@FreeBSD.org>

Tidy up kmeminit(): Since r245575, 'nmbclusters' is calculated after
kmeminit() runs, so it contributes nothing to 'vm_kmem_size'; update a
comment to reflect that r254025 replaced the kmem submap with the kmem
arena.

Reviewed by: kib
Approved by: re (gjb)
Sponsored by: EMC / Isilon Storage Division


# 99de9af2 13-Aug-2013 Jeff Roberson <jeff@FreeBSD.org>

- Disable quantum caches on the kmem_arena. This can make fragmentation
worse on small KVA systems. I had intended to only enable it for
debugging.

Sponsored by: EMC / Isilon Storage Division


# e137643e 09-Aug-2013 Olivier Houchard <cognet@FreeBSD.org>

Instead of just trying to do it for arm, make sure vm_kmem_size is properly
aligned in kmeminit(), where it'll work for any arch.

Suggested by: alc


# 5df87b21 07-Aug-2013 Jeff Roberson <jeff@FreeBSD.org>

Replace kernel virtual address space allocation with vmem. This provides
transparent layering and better fragmentation.

- Normalize functions that allocate memory to use kmem_*
- Those that allocate address space are named kva_*
- Those that operate on maps are named kmap_*
- Implement recursive allocation handling for kmem_arena in vmem.

Reviewed by: alc
Tested by: pho
Sponsored by: EMC / Isilon Storage Division


# 1f3ad93b 31-Jul-2013 Konstantin Belousov <kib@FreeBSD.org>

Remove unused malloc type.

Requested by: alc
MFC after: 1 week


# 94bfd5b1 04-Feb-2013 Marius Strobl <marius@FreeBSD.org>

Try to improve r242655 take III: move these SYSCTLs describing the kernel
map, which is defined and initialized in vm/vm_kern.c, to the latter.

Submitted by: alc


# e8cbe54b 03-Feb-2013 Marius Strobl <marius@FreeBSD.org>

Further improve r242655 and supply VM_{MIN,MAX}_KERNEL_ADDRESS as constant
values to SYSCTL_ULONG(9) where possible.

Submitted by: bde


# c882264c 08-Nov-2012 Marius Strobl <marius@FreeBSD.org>

Make r242655 build on sparc64. While at it, make vm_{max,min}_kernel_address
vm_offset_t as they should be.


# fc6874bc 05-Nov-2012 Alfred Perlstein <alfred@FreeBSD.org>

export VM_MIN_KERNEL_ADDRESS and VM_MAX_KERNEL_ADDRESS via sysctl.

On several platforms the are determined by too many nested #defines to be
easily discernible. This will aid in development of auto-tuning.


# f806cdcf 15-Jul-2012 Matthew D Fleming <mdf@FreeBSD.org>

Fix a bug with memguard(9) on 32-bit architectures without a
VM_KMEM_MAX_SIZE.

The code was not taking into account the size of the kernel_map, which
the kmem_map is allocated from, so it could produce a sub-map size too
large to fit. The simplest solution is to ignore VM_KMEM_MAX entirely
and base the memguard map's size off the kernel_map's size, since this
is always relevant and always smaller.

Found by: Justin Hibbits


# 687c94aa 02-Jul-2012 John Baldwin <jhb@FreeBSD.org>

Honor db_pager_quit in 'show uma' and 'show malloc'.

MFC after: 1 month


# 831ce4cb 01-Mar-2012 John Baldwin <jhb@FreeBSD.org>

- Change contigmalloc() to use the vm_paddr_t type instead of an unsigned
long for specifying a boundary constraint.
- Change bus_dma tags to use bus_addr_t instead of bus_size_t for boundary
constraints.

These allow boundary constraints to be fully expressed for cases where
sizeof(bus_addr_t) != sizeof(bus_size_t). Specifically, it allows a
driver to properly specify a 4GB boundary in a PAE kernel.

Note that this cannot be safely MFC'd without a lot of compat shims due
to KBI changes, so I do not intend to merge it.

Reviewed by: scottl


# ea3f07d3 07-Dec-2011 Alan Cox <alc@FreeBSD.org>

Eliminate stale numbers from a comment.


# c749c003 07-Dec-2011 Alan Cox <alc@FreeBSD.org>

Eliminate the possibility of 32-bit arithmetic overflow in the calculation
of vm_kmem_size that may occur if the system administrator has specified a
vm.vm_kmem_size tunable value that exceeds the hard cap.

PR: 162741
Submitted by: Adam McDougall
Reviewed by: bde@
MFC after: 3 weeks


# 6472ac3d 07-Nov-2011 Ed Schouten <ed@FreeBSD.org>

Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs.

The SYSCTL_NODE macro defines a list that stores all child-elements of
that node. If there's no SYSCTL_DECL macro anywhere else, there's no
reason why it shouldn't be static.


# f346986b 26-Oct-2011 Alan Cox <alc@FreeBSD.org>

contigmalloc(9) and contigfree(9) are now implemented in terms of other
more general VM system interfaces. So, their implementation can now
reside in kern_malloc.c alongside the other functions that are declared
in malloc.h.


# 8d689e04 12-Oct-2011 Gleb Smirnoff <glebius@FreeBSD.org>

Make memguard(9) capable to guard uma(9) allocations.


# 1549ed03 08-Oct-2011 Alan Cox <alc@FreeBSD.org>

Fix the handling of an empty kmem map by sysctl_kmem_map_free(). In
the unlikely event that sysctl_kmem_map_free() was performed on an
empty kmem map, it would incorrectly report the free space as zero.

Discussed with: avg
MFC after: 1 week


# e9a3f785 23-Mar-2011 Alan Cox <alc@FreeBSD.org>

Modestly increase the maximum allowed size of the kmem map on i386.
Also, express this new maximum as a fraction of the kernel's address
space size rather than a constant so that increasing KVA_PAGES will
automatically increase this maximum. As a side-effect of this change,
kern.maxvnodes will automatically increase by a proportional amount.

While I'm here ensure that this change doesn't result in an unintended
increase in maxpipekva on i386. Calculate maxpipekva based upon the
size of the kernel address space and the amount of physical memory
instead of the size of the kmem map. The memory backing pipes is not
allocated from the kmem map. It is allocated from its own submap of
the kernel map. In short, it has no real connection to the kmem map.
(In fact, the commit messages for the maxpipekva auto-sizing talk
about using the kernel map size, cf. r117325 and r117391, even though
the implementation actually used the kmem map size.) Although the
calculation is now done differently, the resulting value for
maxpipekva should remain almost the same on i386. However, on amd64,
the value will be reduced by 2/3. This is intentional. The recent
change to VM_KMEM_SIZE_SCALE on amd64 for the benefit of ZFS also had
the unnecessary side-effect of increasing maxpipekva. This change is
effectively restoring maxpipekva on amd64 to its prior value.

Eliminate init_param3() since it is no longer used.


# 00f0e671 26-Jan-2011 Matthew D Fleming <mdf@FreeBSD.org>

Explicitly wire the user buffer rather than doing it implicitly in
sbuf_new_for_sysctl(9). This allows using an sbuf with a SYSCTL_OUT
drain for extremely large amounts of data where the caller knows that
appropriate references are held, and sleeping is not an issue.

Inspired by: rwatson


# a7d5f7eb 19-Oct-2010 Jamie Gritton <jamie@FreeBSD.org>

A new jail(8) with a configuration file, to replace the work currently done
by /etc/rc.d/jail.


# 95bb9d38 09-Oct-2010 Andriy Gapon <avg@FreeBSD.org>

add kmem_map_free sysctl: query largest contiguous free range in kmem_map

Suggested by: alc
Reviewed by: alc
MFC after: 1 week


# 7814c80a 07-Oct-2010 Andriy Gapon <avg@FreeBSD.org>

vm.kmem_map_size: a sysctl to query current kmem_map->size

Based on a patch from Sandvine Incorporated via emaste.

Reviewed by: emaste
MFC after: 1 week


# d801e824 30-Sep-2010 Andriy Gapon <avg@FreeBSD.org>

kmem_size* sysctls: hint that these are also tunables

MFC after: 1 week


# 4e657159 16-Sep-2010 Matthew D Fleming <mdf@FreeBSD.org>

Re-add r212370 now that the LOR in powerpc64 has been resolved:

Add a drain function for struct sysctl_req, and use it for a variety
of handlers, some of which had to do awkward things to get a large
enough SBUF_FIXEDLEN buffer.

Note that some sysctl handlers were explicitly outputting a trailing
NUL byte. This behaviour was preserved, though it should not be
necessary.

Reviewed by: phk (original patch)


# 404a593e 13-Sep-2010 Matthew D Fleming <mdf@FreeBSD.org>

Revert r212370, as it causes a LOR on powerpc. powerpc does a few
unexpected things in copyout(9) and so wiring the user buffer is not
sufficient to perform a copyout(9) while holding a random mutex.

Requested by: nwhitehorn


# dd67e210 09-Sep-2010 Matthew D Fleming <mdf@FreeBSD.org>

Add a drain function for struct sysctl_req, and use it for a variety of
handlers, some of which had to do awkward things to get a large enough
FIXEDLEN buffer.

Note that some sysctl handlers were explicitly outputting a trailing NUL
byte. This behaviour was preserved, though it should not be necessary.

Reviewed by: phk


# 6d3ed393 31-Aug-2010 Matthew D Fleming <mdf@FreeBSD.org>

The realloc case for memguard(9) will copy too many bytes when
reallocating to a smaller-sized allocation. Fix this issue.

Noticed by: alc
Reviewed by: alc
Approved by: zml (mentor)
MFC after: 3 weeks


# e3813573 11-Aug-2010 Matthew D Fleming <mdf@FreeBSD.org>

Rework memguard(9) to reserve significantly more KVA to detect
use-after-free over a longer time. Also release the backing pages of
a guarded allocation at free(9) time to reduce the overhead of using
memguard(9). Allow setting and varying the malloc type at run-time.
Add knobs to allow:

- randomly guarding memory
- adding un-backed KVA guard pages to detect underflow and overflow
- a lower limit on the size of allocations that are guarded

Reviewed by: alc
Reviewed by: brueffer, Ulrich Spörlein <uqs spoerlein net> (man page)
Silence from: -arch
Approved by: zml (mentor)
MFC after: 1 month


# d7854da1 28-Jul-2010 Matthew D Fleming <mdf@FreeBSD.org>

Add MALLOC_DEBUG_MAXZONES debug malloc(9) option to use multiple uma
zones for each malloc bucket size. The purpose is to isolate
different malloc types into hash classes, so that any buffer overruns
or use-after-free will usually only affect memory from malloc types in
that hash class. This is purely a debugging tool; by varying the hash
function and tracking which hash class was corrupted, the intersection
of the hash classes from each instance will point to a single malloc
type that is being misused. At this point inspection or memguard(9)
can be used to catch the offending code.

Add MALLOC_DEBUG_MAXZONES=8 to -current GENERIC configuration files.
The suggestion to have this on by default came from Kostik Belousov on
-arch.

This code is based on work by Ron Steinke at Isilon Systems.

Reviewed by: -arch (mostly silence)
Reviewed by: zml
Approved by: zml (mentor)


# 60ae52f7 21-Jun-2010 Ed Schouten <ed@FreeBSD.org>

Use ISO C99 integer types in sys/kern where possible.

There are only about 100 occurences of the BSD-specific u_int*_t
datatypes in sys/kern. The ISO C99 integer types are used here more
often.


# f121baaa 05-Jun-2009 Brian Somers <brian@FreeBSD.org>

If we're passed garbage in malloc_init(), panic() rather than expecting
a KASSERT to handle it. People are likely to turn off INVARIANTS RSN
and loading an old module can cause garbage-in here.

I saw the issue with an older nvidia driver (x11/nvidia-driver) loading
into a new kernel - a crash wasn't seen 'till sysctl_kern_malloc_stats().
I was lucky that mtp->ks_shortdesc was NULL and not something horrible.

While I'm here, KASSERT that malloc_uninit() isn't passed something that's
not in kmemstatistics.

MFC after: 3 weeks


# e678f09a 09-May-2009 Warner Losh <imp@FreeBSD.org>

Retire kern.vm.kmem.size. It was marked as obsolete prior to 5.2, so
it can go.


# bb1c7df8 18-Apr-2009 Robert Watson <rwatson@FreeBSD.org>

struct malloc_type has had a 'magic' field statically initialized to
M_MAGIC by MALLOC_DEFINE() for a long time; add assertions that
malloc_type's passed to malloc(), free(), etc have that magic set.

MFC after: 2 weeks


# c90c9021 26-Feb-2009 Ed Schouten <ed@FreeBSD.org>

Remove even more unneeded variable assignments.

kern_time.c:
- Unused variable `p'.

kern_thr.c:
- Variable `error' is always caught immediately, so no reason to
initialize it. There is no way that error != 0 at the end of
create_thread().

kern_sig.c:
- Unused variable `code'.

kern_synch.c:
- `rval' is always assigned in all different cases.

kern_rwlock.c:
- `v' is always overwritten with RW_UNLOCKED further on.

kern_malloc.c:
- `size' is always initialized with the proper value before being used.

kern_exit.c:
- `error' is always caught and returned immediately. abort2() never
returns a non-zero value.

kern_exec.c:
- `len' is always assigned inside the if-statement right below it.

tty_info.c:
- `td' is always overwritten by FOREACH_THREAD_IN_PROC().

Found by: LLVM's scan-build


# e20a199f 25-Jan-2009 Jeff Roberson <jeff@FreeBSD.org>

- Make the keg abstraction more complete. Permit a zone to have multiple
backend kegs so it may source compatible memory from multiple backends.
This is useful for cases such as NUMA or different layouts for the same
memory type.
- Provide a new api for adding new backend kegs to secondary zones.
- Provide a new flag for adjusting the layout of zones to stagger
allocations better across cache lines.

Sponsored by: Nokia


# d7f03759 19-Oct-2008 Ulf Lilleengen <lulf@FreeBSD.org>

- Import the HEAD csup code which is the basis for the cvsmode work.


# b89eaf4e 05-Jul-2008 Alan Cox <alc@FreeBSD.org>

Enable the creation of a kmem map larger than 4GB.
Submitted by: Tz-Huan Huang

Make several variables related to kmem map auto-sizing static.
Found by: CScout


# 6819e13e 04-Jul-2008 Alan Cox <alc@FreeBSD.org>

Correct an error in the comments for init_param3().

Discussed with: silby


# 91dd776c 22-May-2008 John Birrell <jb@FreeBSD.org>

Add support for the DTrace malloc provider which can enable probes
on a per-malloc type basis.


# 3202ed75 10-May-2008 Alan Cox <alc@FreeBSD.org>

Introduce a new parameter "superpage_align" to kmem_suballoc() that is
used to request superpage alignment for the submap.

Request superpage alignment for the kmem_map.

Pass VMFS_ANY_SPACE instead of TRUE to vm_map_find(). (They are currently
equivalent but VMFS_ANY_SPACE is the new preferred spelling.)

Remove a stale comment from kmem_malloc().


# 237fdd78 16-Mar-2008 Robert Watson <rwatson@FreeBSD.org>

In keeping with style(9)'s recommendations on macros, use a ';'
after each SYSINIT() macro invocation. This makes a number of
lightweight C parsers much happier with the FreeBSD kernel
source, including cflow's prcc and lxr.

MFC after: 1 month
Discussed with: imp, rink


# dc2e1e3f 27-Jun-2007 Robert Watson <rwatson@FreeBSD.org>

Use vm_offset_t for kmembase and kmemlimit rather than char *, avoiding
unnecessary casts, and making it possible to compile kern_malloc.c with
strict aliasing.

Submitted by: rdivacky
Approved by: re (kensmith)


# 3805385e 13-Jun-2007 Robert Watson <rwatson@FreeBSD.org>

Spell statistics more correctly in comments.


# 2feb50bf 31-May-2007 Attilio Rao <attilio@FreeBSD.org>

Revert VMCNT_* operations introduction.
Probabilly, a general approach is not the better solution here, so we should
solve the sched_lock protection problems separately.

Requested by: alc
Approved by: jeff (mentor)


# e4e80aa7 27-May-2007 Robert Watson <rwatson@FreeBSD.org>

Remove #if 0'd check for 0-size allocations, which if enabled, called
kdb_enter().


# 222d0195 18-May-2007 Jeff Roberson <jeff@FreeBSD.org>

- define and use VMCNT_{GET,SET,ADD,SUB,PTR} macros for manipulating
vmcnts. This can be used to abstract away pcpu details but also changes
to use atomics for all counters now. This means sched lock is no longer
responsible for protecting counts in the switch routines.

Contributed by: Attilio Rao <attilio@FreeBSD.org>


# 0e5179e4 20-Apr-2007 Stephane E. Potvin <sepotvin@FreeBSD.org>

Add support for specifying a minimal size for vm.kmem_size in the loader via
vm.kmem_size_min. Useful when using ZFS to make sure that vm.kmem size will
be at least 256mb (for example) without forcing a particular value via vm.kmem_size.

Approved by: njl (mentor)
Reviewed by: alc


# 24076d13 26-Oct-2006 Robert Watson <rwatson@FreeBSD.org>

Increase usefulness of "show malloc" by moving from displaying the basic
counters of allocs/frees/use for each malloc type to calculating InUse,
MemUse, and Requests as displayed by the userspace vmstat -m. This is
more useful when debugging malloc(9)-related memory leaks, where the
count of allocs/frees may not usefully reflect that current memory
allocation (i.e., when highly variable size allocations occur with the
same malloc type, such as with contigmalloc).

MFC after: 3 days
Limitations observed by: scottl


# 4b19d603 23-Jul-2006 Robert Watson <rwatson@FreeBSD.org>

Remove old kern.malloc sysctl, which generated a text representation of
the kernel malloc(9) state for vmstat -m. libmemstat is now used to
generate a machine-readable version which is converged by vmstat -m
into a human-readable version.

Not for MFC.


# 0ce3f16d 23-Jul-2006 Robert Watson <rwatson@FreeBSD.org>

Expand comments for malloc(9) to better describe the design and
statistics / memory types model.


# 45d48bda 03-Mar-2006 Paul Saab <ps@FreeBSD.org>

Fix bug in malloc_uninit():
Releasing items from the mt_zone can not be done by a simple
uma_zfree() call since mt_zone is allocated with the UMA_ZONE_MALLOC
flag. Use uma_zfree_arg instead and supply the slab.

This bug caused panics in low memory situations on unloading kernel
modules containing MALLOC_DEFINE(..) statements.

Submitted by: ups


# 847a2a17 31-Jan-2006 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Add buffer corruption protection (RedZone) for kernel's malloc(9).
It detects both: buffer underflows and buffer overflows bugs at runtime
(on free(9) and realloc(9)) and prints backtraces from where memory was
allocated and from where it was freed.

Tested by: kris


# d362c40d 30-Dec-2005 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Improve memguard a bit:
- Provide tunable vm.memguard.desc, so one can specify memory type without
changing the code and recompiling the kernel.
- Allow to use memguard for kernel modules by providing sysctl
vm.memguard.desc, which can be changed to short description of memory
type before module is loaded.
- Move as much memguard code as possible to memguard.c.
- Add sysctl node vm.memguard. and move memguard-specific sysctl there.
- Add malloc_desc2type() function for finding memory type based on its
short description (ks_shortdesc field).
- Memory type can be changed (via vm.memguard.desc sysctl) only if it
doesn't exist (will be loaded later) or when no memory is allocated yet.
If there is allocated memory for the given memory type, return EBUSY.
- Implement two ways of memory types comparsion and make safer/slower the
default.


# 619f2841 27-Dec-2005 Pawel Jakub Dawidek <pjd@FreeBSD.org>

In realloc(9), determine size of the original block based on
UMA_SLAB_MALLOC flag.
In some circumstances (I observed it when I was doing a lot of reallocs)
UMA_SLAB_MALLOC can be set even if us_keg != NULL.

If this is the case we have wonderful, silent data corruption, because less
data is copied to the newly allocated region than should be.

I'm not sure when this bug was introduced, it could be there undetected
for years now, as we don't have a lot of realloc(9) consumers and it was
hard to reproduce it...
...but what I know for sure, is that I don't want to know who introduce
the bug:) It took me two/three days to track it down (of course most of
the time I was looking for the bug in my own code).


# 2a143d5b 03-Nov-2005 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Detect memory leaks when memory type is being destroyed.
This is very helpful for detecting kernel modules memory leaks on unload.

Discussed and reviewed by: rwatson


# 64a266f9 20-Oct-2005 Robert Watson <rwatson@FreeBSD.org>

Change format string for u_int64_t to %ju from %llu, in order to use the
correct format string on 64-bit systems.

Pointed out by: pjd


# 909ed16c 20-Oct-2005 Robert Watson <rwatson@FreeBSD.org>

Add a "show malloc" command to DDB, which prints out the current stats for
available kernel malloc types. Quite useful for post-mortem debugging of
memory leaks without a dump device configured on a panicked box.

MFC after: 2 weeks


# 23198357 02-Aug-2005 Ruslan Ermilov <ru@FreeBSD.org>

Long overdue, keep up with mbuf.h,v 1.148.


# 73864adb 27-Jul-2005 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Fix the way how "InUse" column in 'vmstat -m' output works:
- increase number of allocations count only on successfull malloc(9),
so it doesn't confuse people;
- because we need to check if 'size > 0', hide 'mtsp->mts_memalloced += size;'
under the check as well, as for size=0 it is of course a no-op;
- avoid critical_enter()/critical_exit() in case of failure in
malloc_type_allocated() as there will be nothing to do.

OK'ed by: rwatson
MFC after: 2 days


# 4f8721d2 14-Jul-2005 Robert Watson <rwatson@FreeBSD.org>

Correct build on 64-bit: cast u_int64_t to (unsigned long long) before
printfing as (unsigned long long). 32-bit build on i386 didn't notice
this. Whoops.

Reported by: arved
Tested by: sledge


# cd814b26 14-Jul-2005 Robert Watson <rwatson@FreeBSD.org>

Introduce a new sysctl, kern.malloc_stats, which exports kernel malloc
statistics via a binary structure stream:

- Add structure 'malloc_type_stream_header', which defines a stream
version, definition of MAXCPUS used in the stream, and a number of
malloc_type records in the stream.

- Add structure 'malloc_type_header', which defines the name of the
malloc type being reported on.

- When the sysctl is queried, return a stream header, followed by a
series of type descriptions, each consisting of a type header
followed by a series of MAXCPUS malloc_type_stats structures holding
per-CPU allocation information. Typical values of MAXCPUS will be 1
(UP compiled kernel) and 16 (SMP compiled kernel).

This query mechanism allows user space monitoring tools to extract
memory allocation statistics in a machine-readable form, and to do so
at a per-CPU granularity, allowing monitoring of allocation patterns
across CPUs in order to better understand the distribution of work and
memory flow over multiple CPUs.

While here:

- Bump statistics width to uint64_t, and hard code using fixed-width
type in order to be more sure about structure layout in the stream.
We allocate and free a lot of memory.

- Add kmemcount, a counter of the number of registered malloc types,
in order to avoid excessive manual counting of types. Export via a
new sysctl to allow user-space code to better size buffers.

- De-XXX comment on no longer maintaining the high watermark in old
sysctl monitoring code.

A follow-up commit of libmemstat(3), a library to monitor kernel memory
allocation, will occur in the next few days. Likewise, similar changes
to UMA.


# c0cac8dc 16-Jun-2005 Ken Smith <kensmith@FreeBSD.org>

Remove a variable that became unused as a result of changes made
in v1.139. This was only exposed if MALLOC_PROFILE was defined.

Submitted by: Gary Jennejohn
Pointy hat: rwatson
Approved by: re (scottl)


# 8c61b219 10-Jun-2005 Joseph Koshy <jkoshy@FreeBSD.org>

Fix typo.

Reviewed by: rwatson, sam


# 63a7e0a3 29-May-2005 Robert Watson <rwatson@FreeBSD.org>

Kernel malloc layers malloc_type allocation over one of two underlying
allocators: a set of power-of-two UMA zones for small allocations, and the
VM page allocator for large allocations. In order to maintain unified
statistics for specific malloc types, kernel malloc maintains a separate
per-type statistics pool, which can be monitored using vmstat -m. Prior
to this commit, each pool of per-type statistics was protected using a
per-type mutex associated with the malloc type.

This change modifies kernel malloc to maintain per-CPU statistics pools
for each malloc type, and protects writing those statistics using critical
sections. It also moves to unsynchronized reads of per-CPU statistics
when generating coalesced statistics. To do this, several changes are
implemented:

- In the previous world order, the statistics memory was allocated by
the owner of the malloc type structure, allocated statically using
MALLOC_DEFINE(). This embedded the definition of the malloc_type
structure into all kernel modules. Move to a model in which a pointer
within struct malloc_type points at a UMA-allocated
malloc_type_internal data structure owned and maintained by
kern_malloc.c, and not part of the exported ABI/API to the rest of
the kernel. For the purposes of easing a possible MFC, re-use an
existing pointer in 'struct malloc_type', and maintain the current
malloc_type structure size, as well as layout with respect to the
fields reused outside of the malloc subsystem (such as ks_shortdesc).
There are several unused fields as a result of no longer requiring
the mutex in malloc_type.

- Struct malloc_type_internal contains an array of malloc_type_stats,
of size MAXCPU. The structure defined above avoids hard-coding a
kernel compile-time value of MAXCPU into kernel modules that interact
with malloc.

- When accessing per-cpu statistics for a malloc type, surround read -
modify - update requests with critical_enter()/critical_exit() in
order to avoid races during write. The per-CPU fields are written
only from the CPU that owns them.

- Per-CPU stats now maintained "allocated" and "freed" counters for
number of allocations/frees and bytes allocated/freed, since there is
no longer a coherent global notion of the totals. When coalescing
malloc stats, accept a slight race between reading stats across CPUs,
and avoid showing the user a negative allocation count for the type
in the event of a race. The global high watermark is no longer
maintained for a malloc type, as there is no global notion of the
number of allocations.

- While tearing up the sysctl() path, also switch to using sbufs. The
current "export as text" sysctl format is retained with the same
syntax. We may want to change this in the future to export more
per-CPU information, such as how allocations and frees are balanced
across CPUs.

This change results in a substantial speedup of kernel malloc and free
paths on SMP, as critical sections (where usable) out-perform mutexes
due to avoiding atomic/bus-locked operations. There is also a minor
improvement on UP due to the slightly lower cost of critical sections
there. The cost of the change to this approach is the loss of a
continuous notion of total allocations that can be exploited to track
per-type high watermarks, as well as increased complexity when
monitoring statistics.

Due to carefully avoiding changing the ABI, as well as hardening the ABI
against future changes, it is not necessary to recompile kernel modules
for this change. However, MFC'ing this change to RELENG_5 will require
also MFC'ing optimizations for soft critical sections, which may modify
exposed kernel ABIs. The internal malloc API is changed, and
modifications to vmstat in order to restore "vmstat -m" on core dumps will
follow shortly.

Several improvements from: bde
Statistics approach discussed with: ups
Tested by: scottl, others


# 87efd4d5 12-Apr-2005 Robert Watson <rwatson@FreeBSD.org>

Consistently style function declarations in kern_malloc.c.

MFC after: 3 days


# e4eb384b 21-Jan-2005 Bosko Milekic <bmilekic@FreeBSD.org>

Bring in MemGuard, a very simple and small replacement allocator
designed to help detect tamper-after-free scenarios, a problem more
and more common and likely with multithreaded kernels where race
conditions are more prevalent.

Currently MemGuard can only take over malloc()/realloc()/free() for
particular (a) malloc type(s) and the code brought in with this
change manually instruments it to take over M_SUBPROC allocations
as an example. If you are planning to use it, for now you must:

1) Put "options DEBUG_MEMGUARD" in your kernel config.
2) Edit src/sys/kern/kern_malloc.c manually, look for
"XXX CHANGEME" and replace the M_SUBPROC comparison with
the appropriate malloc type (this might require additional
but small/simple code modification if, say, the malloc type
is declared out of scope).
3) Build and install your kernel. Tune vm.memguard_divisor
boot-time tunable which is used to scale how much of kmem_map
you want to allott for MemGuard's use. The default is 10,
so kmem_size/10.

ToDo:
1) Bring in a memguard(9) man page.
2) Better instrumentation (e.g., boot-time) of MemGuard taking
over malloc types.
3) Teach UMA about MemGuard to allow MemGuard to override zone
allocations too.
4) Improve MemGuard if necessary.

This work is partly based on some old patches from Ian Dowse.


# 9454b2d8 06-Jan-2005 Warner Losh <imp@FreeBSD.org>

/* -> /*- for copyright notices, minor format tweaks as necessary


# 479439b4 29-Sep-2004 Dag-Erling Smørgrav <des@FreeBSD.org>

Turn VM_KMEM_SIZE_MAX and VM_KMEM_SIZE_SCALE into tunables.

MFC after: 3 days


# 4362fada 19-Jul-2004 Brian Feldman <green@FreeBSD.org>

Reimplement contigmalloc(9) with an algorithm which stands a greatly-
improved chance of working despite pressure from running programs.
Instead of trying to throw a bunch of pages out to swap and hope for
the best, only a range that can potentially fulfill contigmalloc(9)'s
request will have its contents paged out (potentially, not forcibly)
at a time.

The new contigmalloc operation still operates in three passes, but it
could potentially be tuned to more or less. The first pass only looks
at pages in the cache and free pages, so they would be thrown out
without having to block. If this is not enough, the subsequent passes
page out any unwired memory. To combat memory pressure refragmenting
the section of memory being laundered, each page is removed from the
systems' free memory queue once it has been freed so that blocking
later doesn't cause the memory laundered so far to get reallocated.

The page-out operations are now blocking, as it would make little sense
to try to push out a page, then get its status immediately afterward
to remove it from the available free pages queue, if it's unlikely to
have been freed. Another change is that if KVA allocation fails, the
allocated memory segment will be freed and not leaked.

There is a sysctl/tunable, defaulting to on, which causes the old
contigmalloc() algorithm to be used. Nonetheless, I have been using
vm.old_contigmalloc=0 for over a month. It is safe to switch at
run-time to see the difference it makes.

A new interface has been used which does not require mapping the
allocated pages into KVA: vm_page.h functions vm_page_alloc_contig()
and vm_page_release_contig(). These are what vm.old_contigmalloc=0
uses internally, so the sysctl/tunable does not affect their operation.

When using the contigmalloc(9) and contigfree(9) interfaces, memory
is now tracked with malloc(9) stats. Several functions have been
exported from kern_malloc.c to allow other subsystems to use these
statistics, as well. This invalidates the BUGS section of the
contigmalloc(9) manpage.


# 2d50560a 10-Jul-2004 Marcel Moolenaar <marcel@FreeBSD.org>

Update for the KDB framework:
o Make debugging code conditional upon KDB instead of DDB.
o Call kdb_enter() instead of Debugger().
o Call kdb_backtrace() instead of db_print_backtrace() or backtrace().

kern_mutex.c:
o Replace checks for db_active with checks for kdb_active and make
them unconditional.

kern_shutdown.c:
o s/DDB_UNATTENDED/KDB_UNATTENDED/g
o s/DDB_TRACE/KDB_TRACE/g
o Save the TID of the thread doing the kernel dump so the debugger
knows which thread to select as the current when debugging the
kernel core file.
o Clear kdb_active instead of db_active and do so unconditionally.
o Remove backtrace() implementation.

kern_synch.c:
o Call kdb_reenter() instead of db_error().


# 099a0e58 31-May-2004 Bosko Milekic <bmilekic@FreeBSD.org>

Bring in mbuma to replace mballoc.

mbuma is an Mbuf & Cluster allocator built on top of a number of
extensions to the UMA framework, all included herein.

Extensions to UMA worth noting:
- Better layering between slab <-> zone caches; introduce
Keg structure which splits off slab cache away from the
zone structure and allows multiple zones to be stacked
on top of a single Keg (single type of slab cache);
perhaps we should look into defining a subset API on
top of the Keg for special use by malloc(9),
for example.
- UMA_ZONE_REFCNT zones can now be added, and reference
counters automagically allocated for them within the end
of the associated slab structures. uma_find_refcnt()
does a kextract to fetch the slab struct reference from
the underlying page, and lookup the corresponding refcnt.

mbuma things worth noting:
- integrates mbuf & cluster allocations with extended UMA
and provides caches for commonly-allocated items; defines
several zones (two primary, one secondary) and two kegs.
- change up certain code paths that always used to do:
m_get() + m_clget() to instead just use m_getcl() and
try to take advantage of the newly defined secondary
Packet zone.
- netstat(1) and systat(1) quickly hacked up to do basic
stat reporting but additional stats work needs to be
done once some other details within UMA have been taken
care of and it becomes clearer to how stats will work
within the modified framework.

From the user perspective, one implication is that the
NMBCLUSTERS compile-time option is no longer used. The
maximum number of clusters is still capped off according
to maxusers, but it can be made unlimited by setting
the kern.ipc.nmbclusters boot-time tunable to zero.
Work should be done to write an appropriate sysctl
handler allowing dynamic tuning of kern.ipc.nmbclusters
at runtime.

Additional things worth noting/known issues (READ):
- One report of 'ips' (ServeRAID) driver acting really
slow in conjunction with mbuma. Need more data.
Latest report is that ips is equally sucking with
and without mbuma.
- Giant leak in NFS code sometimes occurs, can't
reproduce but currently analyzing; brueffer is
able to reproduce but THIS IS NOT an mbuma-specific
problem and currently occurs even WITHOUT mbuma.
- Issues in network locking: there is at least one
code path in the rip code where one or more locks
are acquired and we end up in m_prepend() with
M_WAITOK, which causes WITNESS to whine from within
UMA. Current temporary solution: force all UMA
allocations to be M_NOWAIT from within UMA for now
to avoid deadlocks unless WITNESS is defined and we
can determine with certainty that we're not holding
any locks when we're M_WAITOK.
- I've seen at least one weird socketbuffer empty-but-
mbuf-still-attached panic. I don't believe this
to be related to mbuma but please keep your eyes
open, turn on debugging, and capture crash dumps.

This change removes more code than it adds.

A paper is available detailing the change and considering
various performance issues, it was presented at BSDCan2004:
http://www.unixdaemons.com/~bmilekic/netbuf_bmilekic.pdf
Please read the paper for Future Work and implementation
details, as well as credits.

Testing and Debugging:
rwatson,
brueffer,
Ketrien I. Saihr-Kesenchedra,
...
Reviewed by: Lots of people (for different parts)


# 7f8a436f 05-Apr-2004 Warner Losh <imp@FreeBSD.org>

Remove advertising clause from University of California Regent's license,
per letter dated July 22, 1999.

Approved by: core


# 84344f9f 27-Jan-2004 Dag-Erling Smørgrav <des@FreeBSD.org>

Rename the kern.vm.kmem.size tunable to the more logical vm.kmem_size. To
assure backward compatibility (conditional on !BURN_BRIDGES), look it up
by its old name first, and log a warning (but accept the setting) if it
was found. If both the old and new name are defined, the new name takes
precedence.

Also export vm.kmem_size as a read-only sysctl variable; I find it hard to
tune a parameter when I don't know its default value, especially when that
default value is computed at boot time.


# 9fb535de 18-Sep-2003 Jeff Roberson <jeff@FreeBSD.org>

- Only use UMA to cache malloc requests up to PAGE_SIZE. Values larger than
this are requested very infrequently and waste memory when we cache
spares.


# 68f2d20b 22-Jul-2003 Poul-Henning Kamp <phk@FreeBSD.org>

Revert stuff which accidentally ended up in the previous commit.


# 55d1d703 22-Jul-2003 Poul-Henning Kamp <phk@FreeBSD.org>

Don't attempt to inline large functions mb_alloc() and mb_free(),
it more than doubles the text size of this file.

GCC has wisely ignored us on this previously


# 347194c1 10-Jul-2003 Mike Silbersack <silby@FreeBSD.org>

Add init_param3() to subr_param. This function is called
immediately after the kernel map has been sized, and is
the optimal place for the autosizing of memory allocations
which occur within the kernel map to occur.

Suggested by: bde


# 1795d0cd 10-Jun-2003 Paul Saab <ps@FreeBSD.org>

Don't overflow when calculating vm_kmem_size. This fixes kmem_map
too small panics on PAE machines which have odd > 4GB sizes (4.5 gig
would render a 20MB of KVA for kmem_map instead of 200MB).

Submitted by: John Cagle <john.cagle@hp.com>, jeff
Reviewed by: jeff, peter, scottl, lots of USENIX folks


# 677b542e 10-Jun-2003 David E. O'Brien <obrien@FreeBSD.org>

Use __FBSDID().


# 1282e9ac 11-May-2003 Poul-Henning Kamp <phk@FreeBSD.org>

Don't pass NULL pointer to memset if we are compiled with DIAGNOSTIC

Approved by: re/rwatson


# 8cb72d61 05-May-2003 Poul-Henning Kamp <phk@FreeBSD.org>

Add two KASSERTS which trigger if free(9) would drag the "memuse" statistic
for a malloc bucket under zero. This typically happens if you malloc(9)
from one bucket and free to another.


# 3f6ee876 25-Apr-2003 Poul-Henning Kamp <phk@FreeBSD.org>

Update the "last malloc failure timestamp" also for simulated
malloc errors.


# f2538508 26-Mar-2003 Robert Watson <rwatson@FreeBSD.org>

Permit debug.malloc.failure_rate to be specified using a tunable so
that the feature can be enabled during the boot process. Note the
continued limitation that FreeBSD fails so rapidly with this setting
enabled that it's hard to narrow down particular failures for
correction; we really need per-malloc type failure rates.


# eae870cd 26-Mar-2003 Robert Watson <rwatson@FreeBSD.org>

Add a new kernel option, MALLOC_MAKE_FAILURES, which compiles
in a debugging feature causing M_NOWAIT allocations to fail at
a specified rate. This can be useful for detecting poor
handling of M_NOWAIT: the most frequent problems I've bumped
into are unconditional deference of the pointer even though
it's NULL, and hangs as a result of a lost event where memory
for the event couldn't be allocated. Two sysctls are added:

debug.malloc.failure_rate

How often to generate a failure: if set to 0 (default), this
feature is disabled. Otherwise, the frequency of failures --
I've been using 10 (one in ten mallocs fails), but other
popular settings might be much lower or much higher.

debug.malloc.failure_count

Number of times a coerced malloc failure has occurred as a
result of this feature. Useful for tracking what might have
happened and whether failures are being generated.

Useful possible additions: tying failure rate to malloc type,
printfs indicating the thread that experienced the coerced
failure.

Reviewed by: jeffr, jhb


# 194a0abf 10-Mar-2003 Poul-Henning Kamp <phk@FreeBSD.org>

PHCC[1]:
I had commented the #ifdef INVARIANTS checks out to make sure I ran this
code in all kernels and forgot to comment the #ifdefs back in before I
committed.

Spotted by: bmilekic

[1] PHCC = Pointy Hat Correction Commit


# d3c11994 10-Mar-2003 Poul-Henning Kamp <phk@FreeBSD.org>

Make malloc and mbuf allocation mode flags nonoverlapping.

Under INVARIANTS whine if we get incompatible flags.

Submitted by: imp


# 025b4be1 19-Feb-2003 Bosko Milekic <bmilekic@FreeBSD.org>

o Allow "buckets" in mb_alloc to be differently sized (according to
compile-time constants). That is, a "bucket" now is not necessarily
a page-worth of mbufs or clusters, but it is MBUF_BUCK_SZ, CLUS_BUCK_SZ
worth of mbufs, clusters.
o Rename {mbuf,clust}_limit to {mbuf,clust}_hiwm and introduce
{mbuf,clust}_lowm, which currently has no effect but will be used
to set the low watermarks.
o Fix netstat so that it can deal with the differently-sized buckets
and teach it about the low watermarks too.
o Make sure the per-cpu stats for an absent CPU has mb_active set to 0,
explicitly.
o Get rid of the allocate refcounts from mbuf map mess. Instead,
just malloc() the refcounts in one shot from mbuf_init()
o Clean up / update comments in subr_mbuf.c


# a163d034 18-Feb-2003 Warner Losh <imp@FreeBSD.org>

Back out M_* changes, per decision of the TRB.

Approved by: trb


# 4db4f5c8 01-Feb-2003 Poul-Henning Kamp <phk@FreeBSD.org>

Under #ifdef DIAGNOSTIC, fill malloc(9) allocations which do not have
M_ZERO specified with 0x70. (malloc_flags=J for the kernel :-)


# 44956c98 21-Jan-2003 Alfred Perlstein <alfred@FreeBSD.org>

Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0.
Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.


# 1fb14a47 01-Nov-2002 Poul-Henning Kamp <phk@FreeBSD.org>

Introduce malloc_last_fail() which returns the number of seconds since
malloc(9) failed last time. This is intended to help code adjust
memory usage to the current circumstances.

A typical use could be:
if (malloc_last_fail() < 60)
reduce_cache_by_one();


# 99571dc3 18-Sep-2002 Jeff Roberson <jeff@FreeBSD.org>

- Split UMA_ZFLAG_OFFPAGE into UMA_ZFLAG_OFFPAGE and UMA_ZFLAG_HASH.
- Remove all instances of the mallochash.
- Stash the slab pointer in the vm page's object pointer when allocating from
the kmem_obj.
- Use the overloaded object pointer to find slabs for malloced memory.


# 280759e7 31-May-2002 Robert Drehmel <robert@FreeBSD.org>

- Replace the bandaid introduced in revision 1.110 with
a better solution.
- Add braces for a ``for'' statement containing a single
multi-line statement.


# 45eefe71 20-May-2002 Jake Burkholder <jake@FreeBSD.org>

Add a bandaid so that sysctl kern.malloc works on sparc64.


# 42e49865 20-May-2002 John Baldwin <jhb@FreeBSD.org>

Fix the td_intr_nesting_level check to work ok if a flag like M_ZERO is
passed in with M_WAITOK to malloc().


# 8f70816c 02-May-2002 Jeff Roberson <jeff@FreeBSD.org>

Hide a pointer to the malloc_type bucket at the end of the freed memory. If
this memory is modified after it has been freed we can now report it's
previous owner.


# 5a34a9f0 02-May-2002 Jeff Roberson <jeff@FreeBSD.org>

malloc/free(9) no longer require Giant. Use the malloc_mtx to protect the
mallochash. Mallochash is going to go away as soon as I introduce the
kfree/kmalloc api and partially overhaul the malloc wrapper. This can't happen
until all users of the malloc api that expect memory to be aligned on the size
of the allocation are fixed.


# 639c9550 01-May-2002 Jeff Roberson <jeff@FreeBSD.org>

Remove the temporary alignment check in free().

Implement the following checks on freed memory in the bucket path:
- Slab membership
- Alignment
- Duplicate free

This previously was only done if we skipped the buckets. This code will slow
down INVARIANTS a bit, but it is smp safe. The checks were moved out of the
normal path and into hooks supplied in uma_dbg.


# 289f207c 30-Apr-2002 Jeff Roberson <jeff@FreeBSD.org>

Convert longs to u_longs in stats. This will hold off wrap arounds for a
while longer.


# 8efc4eff 30-Apr-2002 Jeff Roberson <jeff@FreeBSD.org>

Add a new UMA debugging facility. This will overwrite freed memory with
0xdeadc0de and then check for it just before memory is handed off as part
of a new request. This will catch any post free/pre alloc modification of
memory, as well as introduce errors for anything that tries to dereference
it as a pointer.

This code takes the form of special init, fini, ctor and dtor routines that
are specificly used by malloc. It is in a seperate file because additional
debugging aids will want to live here as well.


# 2cc35ff9 29-Apr-2002 Jeff Roberson <jeff@FreeBSD.org>

Move the implementation of M_ZERO into UMA so that it can be passed to
uma_zalloc and friends. Remove this functionality from the malloc wrapper.

Document this change in uma.h and adjust variable names in uma_core.


# 43a7c4e9 29-Apr-2002 Robert Watson <rwatson@FreeBSD.org>

Re-add the 16384 bucket also.

Submitted by: green


# bd796eb2 29-Apr-2002 Robert Watson <rwatson@FreeBSD.org>

Revert a portion of kern_malloc.c:1.99, which (in addition to adding
malloc profiling) also modified the set of pre-defined buckets for the
memory allocator. For reasons unknown to me, this resulted in extensive
memory corruption in the kernel, in particular on SMP boxes, so I'm
committing this work-around until Jeff gets a chance to debug it
properly. David Wolfskill pointed me at this commit as the one that
might be a problem; I've been running this code on two dual-processor
burn-in boxes for about 12 hours now, and the rate of panics due to
memory corruption has dropped to zero (from one every five minutes).

Hopefully not treading on the toes of: jeff


# 708da94e 23-Apr-2002 Poul-Henning Kamp <phk@FreeBSD.org>

Add a basic sanity check on pointers passed to free(9).

Should be improved by: jeff


# 5e914b96 14-Apr-2002 Jeff Roberson <jeff@FreeBSD.org>

Finish adding support code for sysctl kern.mprof. This dumps some malloc
information related to bucket size effeciency. Three things are printed on
each row:

Size is the size the user actually asked for rounded to 16 bytes.
Requests is the number of times this size was asked for.
Real Size is the size we actually handed out.

At the end the total memory used and total waste is displayed. Currently my
system displays about 33% wasted memory.

The intent of this code is to gather statistics for tuning the malloc bucket
sizes. It is not intended to be run with INVARIANTS and it is not entirely
mp safe. It can be enabled via 'options MALLOC_PROFILE' which was commited
earlier.


# 6f267175 14-Apr-2002 Jeff Roberson <jeff@FreeBSD.org>

Remove malloc_type's ks_limit.

Updated the kmemzones logic such that the ks_size bitmap can be used as an
index into it to report the size of the zone used.

Create the kern.malloc sysctl which replaces the kvm mechanism to report
similar data. This will provide an easy place for statistics aggregation if
malloc_type statistics become per cpu data.

Add some code ifdef'd under MALLOC_PROFILING to facilitate a tool for sizing
the malloc buckets.


# 6008862b 04-Apr-2002 John Baldwin <jhb@FreeBSD.org>

Change callers of mtx_init() to pass in an appropriate lock type name. In
most cases NULL is passed, but in some cases such as network driver locks
(which use the MTX_NETWORK_LOCK macro) and UMA zone locks, a name is used.

Tested on: i386, alpha, sparc64


# 4d77a549 19-Mar-2002 Alfred Perlstein <alfred@FreeBSD.org>

Remove __P.


# 8355f576 19-Mar-2002 Jeff Roberson <jeff@FreeBSD.org>

This is the first part of the new kernel memory allocator. This replaces
malloc(9) and vm_zone with a slab like allocator.

Reviewed by: arch@


# 44a8ff31 12-Mar-2002 Archie Cobbs <archie@FreeBSD.org>

Add realloc() and reallocf(), and make free(NULL, ...) acceptable.

Reviewed by: alfred


# b40ce416 12-Sep-2001 Julian Elischer <julian@FreeBSD.org>

KSE Milestone 2
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to teh previousl -current except
that there is a thread associated with each process.

Sorry john! (your next MFC will be a doosie!)

Reviewed by: peter@freebsd.org, dillon@freebsd.org

X-MFC after: ha ha ha ha


# c4a44810 10-Aug-2001 John Baldwin <jhb@FreeBSD.org>

- Remove asleep(), await(), and M_ASLEEP.
- Callers of asleep() and await() have been converted to calling tsleep().
The only caller outside of M_ASLEEP was the ata driver, which called both
asleep() and await() with spl-raised, so there was no need for the
asleep() and await() pair. M_ASLEEP was unused.

Reviewed by: jasone, peter


# ba3e8826 02-Aug-2001 Bosko Milekic <bmilekic@FreeBSD.org>

Rename mb_init() mbuf subsystem initialization routine to mbuf_init(), in
order to avoid namespace collision with subr_mchain.c's mb_init(). This
wasn't "fatal" as the mbuf initialization routine mb_init() was local to
subr_mbuf.c which in turn didn't pull in subr_mchain.c's mb_init()
declaration, but it should deffinately be changed now before it creates
headache.


# f74250ca 02-Aug-2001 Jake Burkholder <jake@FreeBSD.org>

Remove some code that appears to have endian problems with INVARIANTS.
This is #if BIG_ENDIAN, but is only necessary if malloc types are shorts,
not struct malloc_type * like they are now.


# 08442f8a 22-Jun-2001 Bosko Milekic <bmilekic@FreeBSD.org>

Introduce numerous SMP friendly changes to the mbuf allocator. Namely,
introduce a modified allocation mechanism for mbufs and mbuf clusters; one
which can scale under SMP and which offers the possibility of resource
reclamation to be implemented in the future. Notable advantages:

o Reduce contention for SMP by offering per-CPU pools and locks.
o Better use of data cache due to per-CPU pools.
o Much less code cache pollution due to excessively large allocation macros.
o Framework for `grouping' objects from same page together so as to be able
to possibly free wired-down pages back to the system if they are no longer
needed by the network stacks.

Additional things changed with this addition:

- Moved some mbuf specific declarations and initializations from
sys/conf/param.c into mbuf-specific code where they belong.
- m_getclr() has been renamed to m_get_clrd() because the old name is really
confusing. m_getclr() HAS been preserved though and is defined to the new
name. No tree sweep has been done "to change the interface," as the old
name will continue to be supported and is not depracated. The change was
merely done because m_getclr() sounds too much like "m_get a cluster."
- TEMPORARILY disabled mbtypes statistics displaying in netstat(1) and
systat(1) (see TODO below).
- Fixed systat(1) to display number of "free mbufs" based on new per-CPU
stat structures.
- Fixed netstat(1) to display new per-CPU stats based on sysctl-exported
per-CPU stat structures. All infos are fetched via sysctl.

TODO (in order of priority):

- Re-enable mbtypes statistics in both netstat(1) and systat(1) after
introducing an SMP friendly way to collect the mbtypes stats under the
already introduced per-CPU locks (i.e. hopefully don't use atomic() - it
seems too costly for a mere stat update, especially when other locks are
already present).
- Optionally have systat(1) display not only "total free mbufs" but also
"total free mbufs per CPU pool."
- Fix minor length-fetching issues in netstat(1) related to recently
re-enabled option to read mbuf stats from a core file.
- Move reference counters at least for mbuf clusters into an unused portion
of the cluster itself, to save space and need to allocate a counter.
- Look into introducing resource freeing possibly from a kproc.

Reviewed by (in parts): jlemon, jake, silby, terry
Tested by: jlemon (Intel & Alpha), mjacob (Intel & Alpha)
Preliminary performance measurements: jlemon (and me, obviously)
URL: http://people.freebsd.org/~bmilekic/mb_alloc/


# 09786698 07-Jun-2001 Peter Wemm <peter@FreeBSD.org>

"Fix" the previous initial attempt at fixing TUNABLE_INT(). This time
around, use a common function for looking up and extracting the tunables
from the kernel environment. This saves duplicating the same function
over and over again. This way typically has an overhead of 8 bytes + the
path string, versus about 26 bytes + the path string.


# 4422746f 06-Jun-2001 Peter Wemm <peter@FreeBSD.org>

Back out part of my previous commit. This was a last minute change
and I botched testing. This is a perfect example of how NOT to do
this sort of thing. :-(


# 81930014 06-Jun-2001 Peter Wemm <peter@FreeBSD.org>

Make the TUNABLE_*() macros look and behave more consistantly like the
SYSCTL_*() macros. TUNABLE_INT_DECL() was an odd name because it didn't
actually declare the int, which is what the name suggests it would do.


# fb919e4d 01-May-2001 Mark Murray <markm@FreeBSD.org>

Undo part of the tangle of having sys/lock.h and sys/mutex.h included in
other "system" header files.

Also help the deprecation of lockmgr.h by making it a sub-include of
sys/lock.h and removing sys/lockmgr.h form kernel .c files.

Sort sys/*.h includes where possible in affected files.

OK'ed by: bde (with reservations)


# d04d50d1 18-Apr-2001 Bosko Milekic <bmilekic@FreeBSD.org>

Fix inconsistency in setup of kernel_map: we need to make sure that
we also reserve _adequate_ space for the mb_map submap; i.e. we need
space for nmbclusters, nmbufs, _and_ nmbcnt. Furthermore, we need to
rounddown, and not roundup, so that we are consistent.

Pointed out by: bde


# 9ed346ba 08-Feb-2001 Bosko Milekic <bmilekic@FreeBSD.org>

Change and clean the mutex lock interface.

mtx_enter(lock, type) becomes:

mtx_lock(lock) for sleep locks (MTX_DEF-initialized locks)
mtx_lock_spin(lock) for spin locks (MTX_SPIN-initialized)

similarily, for releasing a lock, we now have:

mtx_unlock(lock) for MTX_DEF and mtx_unlock_spin(lock) for MTX_SPIN.
We change the caller interface for the two different types of locks
because the semantics are entirely different for each case, and this
makes it explicitly clear and, at the same time, it rids us of the
extra `type' argument.

The enter->lock and exit->unlock change has been made with the idea
that we're "locking data" and not "entering locked code" in mind.

Further, remove all additional "flags" previously passed to the
lock acquire/release routines with the exception of two:

MTX_QUIET and MTX_NOSWITCH

The functionality of these flags is preserved and they can be passed
to the lock/unlock routines by calling the corresponding wrappers:

mtx_{lock, unlock}_flags(lock, flag(s)) and
mtx_{lock, unlock}_spin_flags(lock, flag(s)) for MTX_DEF and MTX_SPIN
locks, respectively.

Re-inline some lock acq/rel code; in the sleep lock case, we only
inline the _obtain_lock()s in order to ensure that the inlined code
fits into a cache line. In the spin lock case, we inline recursion and
actually only perform a function call if we need to spin. This change
has been made with the idea that we generally tend to avoid spin locks
and that also the spin locks that we do have and are heavily used
(i.e. sched_lock) do recurse, and therefore in an effort to reduce
function call overhead for some architectures (such as alpha), we
inline recursion for this case.

Create a new malloc type for the witness code and retire from using
the M_DEV type. The new type is called M_WITNESS and is only declared
if WITNESS is enabled.

Begin cleaning up some machdep/mutex.h code - specifically updated the
"optimized" inlined code in alpha/mutex.h and wrote MTX_LOCK_SPIN
and MTX_UNLOCK_SPIN asm macros for the i386/mutex.h as we presently
need those.

Finally, caught up to the interface changes in all sys code.

Contributors: jake, jhb, jasone (in no particular order)


# 1707240d 30-Jan-2001 Boris Popov <bp@FreeBSD.org>

Let M_PANIC go back to the private tree as its intention isn't understood well
for now.


# 9211b0b6 28-Jan-2001 Boris Popov <bp@FreeBSD.org>

Add M_PANIC flag to the list of available flags passed to malloc().
With this flag set malloc() will panic if memory allocation failed.
This usable only in critical places where failed allocation is fatal.

Reviewed by: peter


# 0fee3d35 26-Jan-2001 Peter Wemm <peter@FreeBSD.org>

p->p_intr_nesting_level is MI now and initialized to 0 in kern_fork.c,
so it should be save to KASSERT() on it even on an arch that may not
use it.


# 0d6d6aa3 23-Jan-2001 John Baldwin <jhb@FreeBSD.org>

Don't grab Giant when calling kmem_alloc/kmem_free as this is just
encouraging other people to follow the same practice. If this is going
to be done, then it should be done inside of those two functions instead.


# a448b62a 21-Jan-2001 Jake Burkholder <jake@FreeBSD.org>

Make intr_nesting_level per-process, rather than per-cpu. Setup
interrupt threads to run with it always >= 1, so that malloc can
detect M_WAITOK from "interrupt" context. This is also necessary
in order to context switch from sched_ithd() directly.

Reviewed By: peter


# d1c1b841 21-Jan-2001 Jason Evans <jasone@FreeBSD.org>

Remove MUTEX_DECLARE() and MTX_COLD. Instead, postpone full mutex
initialization until after malloc() is safe to call, then iterate through
all mutexes and complete their initialization.

This change is necessary in order to avoid some circular bootstrapping
dependencies.


# ef73ae4b 09-Jan-2001 Jake Burkholder <jake@FreeBSD.org>

Use PCPU_GET, PCPU_PTR and PCPU_SET to access all per-cpu variables
other then curproc.


# 1921a06d 20-Oct-2000 Poul-Henning Kamp <phk@FreeBSD.org>

Introduce the M_ZERO flag to malloc(9)

Instead of:

foo = malloc(sizeof(foo), M_WAIT);
bzero(foo, sizeof(foo));

You can now (and please do) use:

foo = malloc(sizeof(foo), M_WAIT | M_ZERO);

In the future this will enable us to do idle-time pre-zeroing of
malloc-space.


# eec258d2 20-Oct-2000 John Baldwin <jhb@FreeBSD.org>

- machine/mutex.h -> sys/mutex.h
- Use MUTEX_DECLARE() and MTX_COLD for the malloc_mtx mutex


# 9a02e8c6 22-Sep-2000 Jason Evans <jasone@FreeBSD.org>

Don't #include <sys/proc.h>, since machine/mutex.h does it now.


# 606f8eb2 14-Sep-2000 John Baldwin <jhb@FreeBSD.org>

Remove the mtx_t, witness_t, and witness_blessed_t types. Instead, just
use struct mtx, struct witness, and struct witness_blessed.

Requested by: bde


# 69ef67f9 10-Sep-2000 Jason Evans <jasone@FreeBSD.org>

Add malloc_mtx to protect malloc and friends, so that they're thread-safe.

Reviewed by: peter


# 28d4c2dd 09-Sep-2000 Jason Evans <jasone@FreeBSD.org>

Back out the addition of malloc_mtx. It was incompletely conceived, and
will be done correctly in the future.


# 9360d3eb 09-Sep-2000 Jason Evans <jasone@FreeBSD.org>

Add a mutex to the malloc interfaces so that it can safely be called
without owning the Giant lock.


# 5badeaba 29-Jun-2000 Boris Popov <bp@FreeBSD.org>

Move #ifdef to the right place.


# 99063cf8 28-Jun-2000 Boris Popov <bp@FreeBSD.org>

If kernel compiled with INVARIANTS:

On unload, remove references from freelist to memory type defined by module.
Print a warning if module defines and allocate its own memory type, but
didn't free it all on unload.

Reviewed by: peter


# 8e8cac55 14-Jun-2000 Bruce Evans <bde@FreeBSD.org>

sys/malloc.h:
Order the SYSINIT() for MALLOC_DEFINE() correctly so that malloc()
doesn't have to waste time initializing itself. The
(SI_SUB_KMEM, SI_ORDER_ANY) order was shared with syscons' SYSINIT()
for scmeminit(), and scmeminit() calls malloc(), so malloc()
initialization was not always complete on the first call to malloc().

kern/kern_malloc.c:
- Removed self-initialization in malloc().
- Removed half-baked sanity check in free(). Trust MALLOC_DEFINE().


# dc760634 14-Mar-2000 Jun Kuriyama <kuriyama@FreeBSD.org>

Print "previous type" correctly when INVARIANTS is defined.

Reviewed by: current@FreeBSD.org


# 1f6889a1 16-Feb-2000 Matthew Dillon <dillon@FreeBSD.org>

Fix null-pointer dereference crash when the system is intentionally
run out of KVM through a mmap()/fork() bomb that allocates hundreds
of thousands of vm_map_entry structures.

Add panic to make null-pointer dereference crash a little more verbose.

Add a new sysctl, vm.max_proc_mmap, which specifies the maximum number
of mmap()'d spaces (discrete vm_map_entry's in the process). The value
defaults to around 9000 for a 128MB machine. The test is scaled for the
number of processes sharing a vmspace (aka linux threads). Setting
the value to 0 disables the feature.

PR: kern/16573
Approved by: jkh


# 27b8623f 27-Jan-2000 David Greenman <dg@FreeBSD.org>

Fixed sign and overflow bugs that caused the allocation size of the kernel
malloc region (kmem_map) to be wrong and semi-random on systems with more
than 1GB of RAM. This is not a complete fix, but is sufficient for
machines with 4GB or less of memory. A complete fix will require some
changes to the getenv stuff so that 64bit values can be passed around.

NOT FIXED: machines with more than 4GB of RAM (e.g. some large Alphas)
since we're still using ints to hold some of the values.

Reviewed by: bde


# 82cd038d 21-Nov-1999 Yoshinobu Inoue <shin@FreeBSD.org>

KAME netinet6 basic part(no IPsec,no V6 Multicast Forwarding, no UDP/TCP
for IPv6 yet)

With this patch, you can assigne IPv6 addr automatically, and can reply to
IPv6 ping.

Reviewed by: freebsd-arch, cvs-committers
Obtained from: KAME project


# 3b6fb885 02-Oct-1999 Poul-Henning Kamp <phk@FreeBSD.org>

Before we start to mess with the VFS name-cache clean things up a little bit:
Isolate the namecache in its own file, and give it a dedicated malloc type.


# 984982d6 19-Sep-1999 Poul-Henning Kamp <phk@FreeBSD.org>

KASSERT that we cannot use M_WAITOK in interrupt context.

Reviewed by: bde


# 9ef246c6 11-Sep-1999 Bruce Evans <bde@FreeBSD.org>

Get rid of MALLOC_INSTANTIATE and MALLOC_MAKE_TYPE(). Just handle the 3
malloc types declared in <sys/malloc.h> like other global malloc types.


# c3aac50f 27-Aug-1999 Peter Wemm <peter@FreeBSD.org>

$Id$ -> $FreeBSD$


# 134c934c 05-Jul-1999 Mike Smith <msmith@FreeBSD.org>

Move the initialisation/tuning of nmbclusters from param.c/machdep.c
into uipc_mbuf.c. This reduces three sets of identical tunable code to
one set, and puts the initialisation with the mbuf code proper.

Make NMBUFs tunable as well.

Move the nmbclusters sysctl here as well.

Move the initialisation of maxsockets from param.c to uipc_socket2.c,
next to its corresponding sysctl.

Use the new tunable macros for the kern.vm.kmem.size tunable (this should have
been in a separate commit, whoops).


# ce45b512 12-May-1999 Bruce Evans <bde@FreeBSD.org>

Fixed corruption of the kmemstatistcs list. The first malloc()
with malloc type at the tail of the list changed the list from
linear to circular. This seemed to cause surprisingly few problems,
but it now causes weird output from `vmstat -m', probably because
a more important malloc type is now at the tail of the list.

Fix it by abusing ks_limit instead of ks_next as a flag for being
on the list. Don't forget to clear the flag when a malloc type is
uninit'ed. Uninit'ing is still fundamentally broken -- it loses
history.


# dfd5dee1 06-May-1999 Peter Wemm <peter@FreeBSD.org>

Add sufficient braces to keep egcs happy about potentially ambiguous
if/else nesting.


# d254af07 27-Jan-1999 Matthew Dillon <dillon@FreeBSD.org>

Fix warnings in preparation for adding -Wall -Wcast-qual to the
kernel compile


# 8de6e8e1 21-Jan-1999 Mike Smith <msmith@FreeBSD.org>

Allow VM_KMEM_SIZE to be tuned from the kernel environment. This tuning
value *completely* overrides any value precalculated by the kernel.


# 1c7c3c6a 21-Jan-1999 Matthew Dillon <dillon@FreeBSD.org>

This is a rather large commit that encompasses the new swapper,
changes to the VM system to support the new swapper, VM bug
fixes, several VM optimizations, and some additional revamping of the
VM code. The specific bug fixes will be documented with additional
forced commits. This commit is somewhat rough in regards to code
cleanup issues.

Reviewed by: "John S. Dyson" <root@dyson.iquest.net>, "David Greenman" <dg@root.com>


# 219cbf59 09-Jan-1999 Eivind Eklund <eivind@FreeBSD.org>

KNFize, by bde.


# 5526d2d9 08-Jan-1999 Eivind Eklund <eivind@FreeBSD.org>

Split DIAGNOSTIC -> DIAGNOSTIC, INVARIANTS, and INVARIANT_SUPPORT as
discussed on -hackers.

Introduce 'KASSERT(assertion, ("panic message", args))' for simple
check + panic.

Reviewed by: msmith


# db669378 10-Nov-1998 Peter Wemm <peter@FreeBSD.org>

Have MALLOC_DECLARE() initialize malloc types explicitly, and have them
removed at module unload (if in a module of course).
However; this introduces a new dependency on <sys/kernel.h> for things
that use MALLOC_DECLARE(). Bruce told me it is better to add sys/kernel.h
to the handful of files that need it rather than add an extra include to
sys/malloc.h for kernel compiles. Updates to follow in subsequent commits.


# f5ef029e 25-Oct-1998 Poul-Henning Kamp <phk@FreeBSD.org>

Nitpicking and dusting performed on a train. Removes trivial warnings
about unused variables, labels and other lint.


# 86a14a7a 15-Aug-1998 Bruce Evans <bde@FreeBSD.org>

Use [u]intptr_t instead of [u_]long for casts between pointers and
integers. Don't forget to cast to (void *) as well.


# d974cf4d 29-Jul-1998 Bruce Evans <bde@FreeBSD.org>

Fixed printf format errors.


# b1897c19 08-Mar-1998 Julian Elischer <julian@FreeBSD.org>

Reviewed by: dyson@freebsd.org (john Dyson), dg@root.com (david greenman)
Submitted by: Kirk McKusick (mcKusick@mckusick.com)
Obtained from: WHistle development tree


# 8a58a9f6 23-Feb-1998 John Dyson <dyson@FreeBSD.org>

Try to dynamically size the VM_KMEM_SIZE (but is still able to be overridden
in a way identically as before.) I had problems with the system properly
handling the number of vnodes when there is alot of system memory, and the
default VM_KMEM_SIZE. Two new options "VM_KMEM_SIZE_SCALE" and
"VM_KMEM_SIZE_MAX" have been added to support better auto-sizing for systems
with greater than 128MB.


# 303b270b 08-Feb-1998 Eivind Eklund <eivind@FreeBSD.org>

Staticize.


# 0b08f5f7 05-Feb-1998 Eivind Eklund <eivind@FreeBSD.org>

Back out DIAGNOSTIC changes.


# 95461b45 04-Feb-1998 John Dyson <dyson@FreeBSD.org>

1) Start using a cleaner and more consistant page allocator instead
of the various ad-hoc schemes.
2) When bringing in UPAGES, the pmap code needs to do another vm_page_lookup.
3) When appropriate, set the PG_A or PG_M bits a-priori to both avoid some
processor errata, and to minimize redundant processor updating of page
tables.
4) Modify pmap_protect so that it can only remove permissions (as it
originally supported.) The additional capability is not needed.
5) Streamline read-only to read-write page mappings.
6) For pmap_copy_page, don't enable write mapping for source page.
7) Correct and clean-up pmap_incore.
8) Cluster initial kern_exec pagin.
9) Removal of some minor lint from kern_malloc.
10) Correct some ioopt code.
11) Remove some dead code from the MI swapout routine.
12) Correct vm_object_deallocate (to remove backing_object ref.)
13) Fix dead object handling, that had problems under heavy memory load.
14) Add minor vm_page_lookup improvements.
15) Some pages are not in objects, and make sure that the vm_page.c can
properly support such pages.
16) Add some more page deficit handling.
17) Some minor code readability improvements.


# 47cfdb16 04-Feb-1998 Eivind Eklund <eivind@FreeBSD.org>

Turn DIAGNOSTIC into a new-style option.


# 2d8acc0f 22-Jan-1998 John Dyson <dyson@FreeBSD.org>

VM level code cleanups.

1) Start using TSM.
Struct procs continue to point to upages structure, after being freed.
Struct vmspace continues to point to pte object and kva space for kstack.
u_map is now superfluous.
2) vm_map's don't need to be reference counted. They always exist either
in the kernel or in a vmspace. The vmspaces are managed by reference
counts.
3) Remove the "wired" vm_map nonsense.
4) No need to keep a cache of kernel stack kva's.
5) Get rid of strange looking ++var, and change to var++.
6) Change more data structures to use our "zone" allocator. Added
struct proc, struct vmspace and struct vnode. This saves a significant
amount of kva space and physical memory. Additionally, this enables
TSM for the zone managed memory.
7) Keep ioopt disabled for now.
8) Remove the now bogus "single use" map concept.
9) Use generation counts or id's for data structures residing in TSM, where
it allows us to avoid unneeded restart overhead during traversals, where
blocking might occur.
10) Account better for memory deficits, so the pageout daemon will be able
to make enough memory available (experimental.)
11) Fix some vnode locking problems. (From Tor, I think.)
12) Add a check in ufs_lookup, to avoid lots of unneeded calls to bcmp.
(experimental.)
13) Significantly shrink, cleanup, and make slightly faster the vm_fault.c
code. Use generation counts, get rid of unneded collpase operations,
and clean up the cluster code.
14) Make vm_zone more suitable for TSM.

This commit is partially as a result of discussions and contributions from
other people, including DG, Tor Egge, PHK, and probably others that I
have forgotten to attribute (so let me know, if I forgot.)

This is not the infamous, final cleanup of the vnode stuff, but a necessary
step. Vnode mgmt should be correct, but things might still change, and
there is still some missing stuff (like ioopt, and physical backing of
non-merged cache files, debugging of layering concepts.)


# d4060a87 04-Dec-1997 John Dyson <dyson@FreeBSD.org>

Some fixes from John Hood:
1) Fix the initialization of malloc structure that changed
due to perf opt.
2) Remove unneeded include.
3) An initialization assert added to malloc.
Submitted by: John Hood <cgull@smoke.marlboro.vt.us>


# d1bbc7ec 28-Oct-1997 Poul-Henning Kamp <phk@FreeBSD.org>

Remove the long description from the in-kernel datastructure.
Put a magic field in there instead, to help catch uninitialized
malloc types.


# a1c995b6 12-Oct-1997 Poul-Henning Kamp <phk@FreeBSD.org>

Last major round (Unless Bruce thinks of somthing :-) of malloc changes.

Distribute all but the most fundamental malloc types. This time I also
remembered the trick to making things static: Put "static" in front of
them.

A couple of finer points by: bde


# 22c64348 11-Oct-1997 Poul-Henning Kamp <phk@FreeBSD.org>

Freeing with unknown type is a panic kind of thing.


# a42428bb 11-Oct-1997 Poul-Henning Kamp <phk@FreeBSD.org>

Remove a debug printf entirely.


# bd975223 11-Oct-1997 Peter Wemm <peter@FreeBSD.org>

Disable an extremely annoying printf.


# 60a513e9 10-Oct-1997 Poul-Henning Kamp <phk@FreeBSD.org>

Rename "struct kmemstats" to "struct malloc_type" it makes more sense now.

Fix type argument to hashinit() and phashinit()


# 254c6cb3 10-Oct-1997 Poul-Henning Kamp <phk@FreeBSD.org>

Make malloc more extensible. The malloc type is now a pointer to
the struct kmemstats that describes the type.

This allows subsystems to declare their malloc types locally
and <sys/malloc.h> doesn't need tweaked everytime somebody
gets an idea. You can even have a type local to a lkm...

I don't know if we really need the longdesc, comments welcome.

TODO: There is a single nit in ext2fs, that will be fixed later,
and I intend to remove all unused malloc types and distribute
the rest closer to their use.


# 043a2f3b 16-Sep-1997 Bruce Evans <bde@FreeBSD.org>

Fixed staticization. buckets[] was staticized but was still declared
extern in <sys/malloc.h> and it should not have been staticized for
the !(KMEMSTATS || DIAGNOSTIC) case.

Fixed the !(KMEMSTATS || DIAGNOSTIC) case. The MALLOC() and FREE()
macros are evil, but code generally doesn't allow for this and some code
involving else clauses did not compile.

Finished staticization.


# e4ba6a82 02-Sep-1997 Bruce Evans <bde@FreeBSD.org>

Removed unused #includes.


# 3075778b 04-Aug-1997 John Dyson <dyson@FreeBSD.org>

Get rid of the ad-hoc memory allocator for vm_map_entries, in lieu of
a simple, clean zone type allocator. This new allocator will also be
used for machine dependent pmap PV entries.


# 358311fe 24-Jun-1997 David Greenman <dg@FreeBSD.org>

Killed bogus kernacc() call in malloc() DIAGNOSTIC code. kernacc() by
it's nature, locks the kernal_map, and this is deadly if kernal_map had
been locked previous to a (net) interrupt.


# 6875d254 22-Feb-1997 Peter Wemm <peter@FreeBSD.org>

Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not
ready for it yet.


# 1130b656 14-Jan-1997 Jordan K. Hubbard <jkh@FreeBSD.org>

Make the long-awaited change from $Id$ to $FreeBSD$

This will make a number of things easier in the future, as well as (finally!)
avoiding the Id-smashing problem which has plagued developers for so long.

Boy, I'm glad we're not using sup anymore. This update would have been
insane otherwise.


# ca67a4e4 04-Aug-1996 Poul-Henning Kamp <phk@FreeBSD.org>

The check for multiple freed items were bogus. fixed.


# 14bf02f8 18-May-1996 John Dyson <dyson@FreeBSD.org>

Minor performance improvement to kern_malloc.c that increases the
probability of reuse of recently freed memory. This improves cache
hit stats on cached memory, and improves at least fork speed consistancy.


# cb7545a9 10-May-1996 Garrett Wollman <wollman@FreeBSD.org>

Allocate mbufs from a separate submap so that NMBCLUSTERS works as
expected.


# e911eafc 02-May-1996 Poul-Henning Kamp <phk@FreeBSD.org>

removed:
CLBYTES PD_SHIFT PGSHIFT NBPG PGOFSET CLSIZELOG2 CLSIZE pdei()
ptei() kvtopte() ptetov() ispt() ptetoav() &c &c
new:
NPDEPG

Major macro cleanup.


# f8845af0 02-May-1996 Poul-Henning Kamp <phk@FreeBSD.org>

First pass at cleaning up macros relating to pages, clusters and all that.


# edbfedac 11-Mar-1996 Peter Wemm <peter@FreeBSD.org>

Import 4.4BSD-Lite2 onto the vendor branch, note that in the kernel, all
files are off the vendor branch, so this should not change anything.

A "U" marker generally means that the file was not changed in between
the 4.4Lite and Lite-2 releases, and does not need a merge. "C" generally
means that there was a change.
[note new unused (in this form) syscalls.conf, to be 'cvs rm'ed]


# 07bbd7f1 29-Jan-1996 David Greenman <dg@FreeBSD.org>

Implement what I mentioned in rev 1.18: limit per-bucket allocations to
60% of physical memory or 60% of malloc area size, whichever is smaller.


# 54e7152c 29-Jan-1996 David Greenman <dg@FreeBSD.org>

Fixed two bugs in the calculation of the malloc area (kmem_map) size:

1) The calculation didn't account for NMBCLUSTERS, so if a large number of
clusters was specified, it would leave little or no space for kernel
malloc.
2) It was bogusly restricted to v_page_count. This doesn't take into
account the sparseness of the malloc area and would have caused
problems on machines with small amounts of memory. It should probably
instead be changed to set the malloc limit to be constrained by
the amount of memory, but I didn't do this.


# 87b6de2b 14-Dec-1995 Poul-Henning Kamp <phk@FreeBSD.org>

A Major staticize sweep. Generates a couple of warnings that I'll deal
with later.
A number of unused vars removed.
A number of unused procs removed or #ifdefed.


# efeaf95a 06-Dec-1995 David Greenman <dg@FreeBSD.org>

Untangled the vm.h include file spaghetti.


# d841aaa7 02-Dec-1995 Bruce Evans <bde@FreeBSD.org>

Finished (?) cleaning up sysinit stuff.


# 4590fd3a 09-Sep-1995 David Greenman <dg@FreeBSD.org>

Fixed init functions argument type - caddr_t -> void *. Fixed a couple of
compiler warnings.


# 2b14f991 28-Aug-1995 Julian Elischer <julian@FreeBSD.org>

Reviewed by: julian with quick glances by bruce and others
Submitted by: terry (terry lambert)
This is a composite of 3 patch sets submitted by terry.
they are:
New low-level init code that supports loadbal modules better
some cleanups in the namei code to help terry in 16-bit character support
some changes to the mount-root code to make it a little more
modular..

NOTE: mounting root off cdrom or NFS MIGHT be broken as I haven't been able
to test those cases..

certainly mounting root of disk still works just fine..
mfs should work but is untested. (tomorrows task)

The low level init stuff includes a total rewrite of init_main.c
to make it possible for new modules to have an init phase by simply
adding an entry to a TEXT_SET (or is it DATA_SET) list. thus a new module can
be added to the kernel without editing any other files other than the
'files' file.


# 9b2e5354 30-May-1995 Rodney W. Grimes <rgrimes@FreeBSD.org>

Remove trailing whitespace.


# 5124d598 16-Apr-1995 David Greenman <dg@FreeBSD.org>

Make vegetarian and animal rights people happy and use 0xdeadc0de instead
of 0xdeadbeef as the fill pattern. Decreased MAX_COPY to 64 (256 was a bit
overzealous in most cases).


# edf8a815 19-Mar-1995 David Greenman <dg@FreeBSD.org>

Removed redundant newlines that were in some panic strings.


# 17dda4c9 11-Mar-1995 David Greenman <dg@FreeBSD.org>

Added some additional DIAGNOSTIC code that makes sure that freed
memory addresses and types are with the valid range. Increased
MAX_COPY to 256 (used to verify no freed memory use with DIAGNOSTIC).


# 9f518539 02-Feb-1995 David Greenman <dg@FreeBSD.org>

Calling semantics for kmem_malloc() have been changed...and the third
argument is now more than just a single flag. (kern_malloc.c)
Used new M_KERNEL value for socket allocations that previous were
"M_NOWAIT". Note that this will change when we clean up the M_ namespace
mess.

Submitted by: John Dyson


# 0d94caff 09-Jan-1995 David Greenman <dg@FreeBSD.org>

These changes embody the support of the fully coherent merged VM buffer cache,
much higher filesystem I/O performance, and much better paging performance. It
represents the culmination of over 6 months of R&D.

The majority of the merged VM/cache work is by John Dyson.

The following highlights the most significant changes. Additionally, there are
(mostly minor) changes to the various filesystem modules (nfs, msdosfs, etc) to
support the new VM/buffer scheme.

vfs_bio.c:
Significant rewrite of most of vfs_bio to support the merged VM buffer cache
scheme. The scheme is almost fully compatible with the old filesystem
interface. Significant improvement in the number of opportunities for write
clustering.

vfs_cluster.c, vfs_subr.c
Upgrade and performance enhancements in vfs layer code to support merged
VM/buffer cache. Fixup of vfs_cluster to eliminate the bogus pagemove stuff.

vm_object.c:
Yet more improvements in the collapse code. Elimination of some windows that
can cause list corruption.

vm_pageout.c:
Fixed it, it really works better now. Somehow in 2.0, some "enhancements"
broke the code. This code has been reworked from the ground-up.

vm_fault.c, vm_page.c, pmap.c, vm_object.c
Support for small-block filesystems with merged VM/buffer cache scheme.

pmap.c vm_map.c
Dynamic kernel VM size, now we dont have to pre-allocate excessive numbers of
kernel PTs.

vm_glue.c
Much simpler and more effective swapping code. No more gratuitous swapping.

proc.h
Fixed the problem that the p_lock flag was not being cleared on a fork.

swap_pager.c, vnode_pager.c
Removal of old vfs_bio cruft to support the past pseudo-coherency. Now the
code doesn't need it anymore.

machdep.c
Changes to better support the parameter values for the merged VM/buffer cache
scheme.

machdep.c, kern_exec.c, vm_glue.c
Implemented a seperate submap for temporary exec string space and another one
to contain process upages. This eliminates all map fragmentation problems
that previously existed.

ffs_inode.c, ufs_inode.c, ufs_readwrite.c
Changes for merged VM/buffer cache. Add "bypass" support for sneaking in on
busy buffers.

Submitted by: John Dyson and David Greenman


# 763424fc 16-Dec-1994 David Greenman <dg@FreeBSD.org>

Changed splimp to splhigh to close a potential hole that could lead
to corrupted malloc data structures caused by frees occurring at other
than splimp.


# 35c10d22 09-Oct-1994 David Greenman <dg@FreeBSD.org>

Got rid of map.h. It's a leftover from the rmap code, and we use rlists.
Changed swapmap into swaplist.


# 797f2d22 02-Oct-1994 Poul-Henning Kamp <phk@FreeBSD.org>

All of this is cosmetic. prototypes, #includes, printfs and so on. Makes
GCC a lot more silent.


# 3c4dd356 02-Aug-1994 David Greenman <dg@FreeBSD.org>

Added $Id$


# 26f9a767 25-May-1994 Rodney W. Grimes <rgrimes@FreeBSD.org>

The big 4.4BSD Lite to FreeBSD 2.0.0 (Development) patch.

Reviewed by: Rodney W. Grimes
Submitted by: John Dyson and David Greenman


# df8bae1d 24-May-1994 Rodney W. Grimes <rgrimes@FreeBSD.org>

BSD 4.4 Lite Kernel Sources