Cross Reference: /freebsd-current/sys/kern/kern

History log of /freebsd-current/sys/kern/kern_malloc.c
Revision	Date	Author	Comments
# deab5717	27-May-2024	Mitchell Horne <mhorne@FreeBSD.org>	Adjust comments referencing vm_mem_init() I cannot find a time where the function was not named this. Reviewed by: kib, markj MFC after: 3 days Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D45383
# 29363fb4	23-Nov-2023	Warner Losh <imp@FreeBSD.org>	sys: Remove ancient SCCS tags. Remove ancient SCCS tags from the tree, automated scripting, with two minor fixup to keep things compiling. All the common forms in the tree were removed with a perl script. Sponsored by: Netflix
# a03c2393	09-Nov-2023	Alexander Motin <mav@FreeBSD.org>	uma: Improve memory modified after free panic messages - Pass zone pointer to trash_ctor() and report zone name in the panic message. It may be difficult to figyre out zone just by the item size. - Do not pass user arguments to internal trash calls, pass thezone. - Report malloc type name in the same unified panic message. - Report corruption offset from the beginning of the items instead of the full pointer. It makes panic message shorter and more readable.
# 685dc743	16-Aug-2023	Warner Losh <imp@FreeBSD.org>	sys: Remove $FreeBSD$: one-line .c pattern Remove /^[\s]__FBSDID$"\$FreeBSD\$"$;?\s*\n/
# f49fd63a	22-Sep-2022	John Baldwin <jhb@FreeBSD.org>	kmem_malloc/free: Use void * instead of vm_offset_t for kernel pointers. Reviewed by: kib, markj Sponsored by: DARPA Differential Revision: https://reviews.freebsd.org/D36549
# c84c5e00	18-Jul-2022	Mitchell Horne <mhorne@FreeBSD.org>	ddb: annotate some commands with DB_CMD_MEMSAFE This is not completely exhaustive, but covers a large majority of commands in the tree. Reviewed by: markj Sponsored by: Juniper Networks, Inc. Sponsored by: Klara, Inc. Differential Revision: https://reviews.freebsd.org/D35583
# dbd51c41	12-Apr-2022	John Baldwin <jhb@FreeBSD.org>	realloc(9): Move slab and zone under #ifndef DEBUG_REDZONE.
# 880b670c	06-Oct-2021	Mark Johnston <markj@FreeBSD.org>	malloc: Unmark KASAN redzones if the full allocation size was requested Consumers that want the full allocation size will typically access the full buffer, so mark the entire allocation as valid to avoid useless KASAN reports. Sponsored by: The FreeBSD Foundation
# 71d31f1c	24-Sep-2021	Konstantin Belousov <kib@FreeBSD.org>	malloc_aligned(9): allow zero size and alignment For alignment we do not need to do anything to make it operational. For size, upgrade zero sized request to one byte so that we do not request insane amount of memory for placeholder. Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D32127
# 10094910	10-Aug-2021	Mark Johnston <markj@FreeBSD.org>	uma: Add KMSAN hooks For now, just hook the allocation path: upon allocation, items are marked as initialized (absent M_ZERO). Some zones are exempted from this when it would otherwise raise false positives. Use kmsan_orig() to update the origin map for UMA and malloc(9) allocations. This allows KMSAN to print the return address when an uninitialized UMA item is implicated in a report. For example: panic: MSan: Uninitialized UMA memory from m_getm2+0x7fe Sponsored by: The FreeBSD Foundation
# 89786088	10-Aug-2021	Mark Johnston <markj@FreeBSD.org>	amd64: Populate the KMSAN shadow maps and integrate with the VM - During boot, allocate PDP pages for the shadow maps. The region above KERNBASE is currently not shadowed. - Create a dummy shadow for the vm page array. For now, this array is not protected by the shadow map to help reduce kernel memory usage. - Grow shadows when growing the kernel map. - Increase the default kernel stack size when KMSAN is enabled. As with KASAN, sanitizer instrumentation appears to create stack frames large enough that the default value is not sufficient. - Disable UMA's use of the direct map when KMSAN is configured. KMSAN cannot validate the direct map. - Disable unmapped I/O when KMSAN configured. - Lower the limit on paging buffers when KMSAN is configured. Each buffer has a static MAXPHYS-sized allocation of KVA, which in turn eats 2*MAXPHYS of space in the shadow map. Reviewed by: alc, kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31295
# 04cc0c39	02-Aug-2021	Kyle Evans <kevans@FreeBSD.org>	malloc(9): provide missing malloc_aligned implementation Pointy hat: kevans Fixes: 6162cf885c00 ("malloc(9): Document/complete aligned variants")
# 45e23571	13-Jul-2021	Mark Johnston <markj@FreeBSD.org>	malloc: Pass the allocation size to malloc_large() by value Its callers do not make use the modified size that malloc_large() was returning, so there's no need to pass a pointer. No functional change intended. MFC after: 2 weeks Sponsored by: The FreeBSD Foundation
# 9a7c2de3	05-May-2021	Mark Johnston <markj@FreeBSD.org>	realloc: Fix KASAN(9) shadow map updates When copying from the old buffer to the new buffer, we don't know the requested size of the old allocation, but only the size of the allocation provided by UMA. This value is "alloc". Because the copy may access bytes in the old allocation's red zone, we must mark the full allocation valid in the shadow map. Do so using the correct size. Reported by: kp Tested by: kp Sponsored by: The FreeBSD Foundation
# 06a53ecf	13-Apr-2021	Mark Johnston <markj@FreeBSD.org>	malloc: Add state transitions for KASAN - Reuse some REDZONE bits to keep track of the requested and allocated sizes, and use that to provide red zones. - As in UMA, disable memory trashing to avoid unnecessary CPU overhead. MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D29461
# 6faf45b3	13-Apr-2021	Mark Johnston <markj@FreeBSD.org>	amd64: Implement a KASAN shadow map The idea behind KASAN is to use a region of memory to track the validity of buffers in the kernel map. This region is the shadow map. The compiler inserts calls to the KASAN runtime for every emitted load and store, and the runtime uses the shadow map to decide whether the access is valid. Various kernel allocators call kasan_mark() to update the shadow map. Since the shadow map tracks only accesses to the kernel map, accesses to other kernel maps are not validated by KASAN. UMA_MD_SMALL_ALLOC is disabled when KASAN is configured to reduce usage of the direct map. Currently we have no mechanism to completely eliminate uses of the direct map, so KASAN's coverage is not comprehensive. The shadow map uses one byte per eight bytes in the kernel map. In pmap_bootstrap() we create an initial set of page tables for the kernel and preloaded data. When pmap_growkernel() is called, we call kasan_shadow_map() to extend the shadow map. kasan_shadow_map() uses pmap_kasan_enter() to allocate memory for the shadow region and map it. Reviewed by: kib MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D29417
# 1ae20f7c	07-Mar-2021	Kyle Evans <kevans@FreeBSD.org>	kern: malloc: fix panic on M_WAITOK during THREAD_NO_SLEEPING() Simple condition flip; we wanted to panic here after epoch_trace_list(). Reviewed by: glebius, markj MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D29125
# c743a6bd	06-Mar-2021	Hans Petter Selasky <hselasky@FreeBSD.org>	Implement mallocarray_domainset(9) variant of mallocarray(9). Reviewed by: kib @ MFC after: 1 week Sponsored by: Mellanox Technologies // NVIDIA Networking
# 1ac7c344	18-Jan-2021	Konstantin Belousov <kib@FreeBSD.org>	malloc_aligned: roundup allocation size up to next power of two to make it use the right aligned zone. Reported by: melifaro Reviewed by: alc, markj (previous version) Discussed with: jrtc27 Tested by: pho (previous version) MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D28219
# 0781c79d	18-Jan-2021	Konstantin Belousov <kib@FreeBSD.org>	Restrict supported alignment for malloc_domainset_aligned(9) to PAGE_SIZE. UMA page_alloc() does not take an alignment, so UMA can only handle alignment less then page size. Noted by: alc Reviewed by: alc, markj (previous version) Discussed with: jrtc27 Tested by: pho (previous version) MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D28219
# 3b15beb3	13-Jan-2021	Konstantin Belousov <kib@FreeBSD.org>	Implement malloc_domainset_aligned(9). Change the power-of-two malloc zones to require alignment equal to the size []. Current uma allocator already provides such alignment, so in fact this change does not change anything except providing future-proof setup. Suggested by: markj [] Reviewed by: andrew, jah, markj Tested by: pho MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D28147
# 89deca0a	16-Nov-2020	Mateusz Guzik <mjg@FreeBSD.org>	malloc: make malloc_large closer to standalone This moves entire large alloc handling out of all consumers, apart from deciding to go there. This is a step towards creating a fast path. Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D27198
# 9b9bb9ff	13-Nov-2020	Mateusz Guzik <mjg@FreeBSD.org>	malloc: retire MALLOC_PROFILE The global array has prohibitive performance impact on multicore systems. The same data (and more) can be obtained with dtrace. Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D27199
# 9aa6d792	12-Nov-2020	Mateusz Guzik <mjg@FreeBSD.org>	malloc: retire malloc_last_fail The routine does not serve any practical purpose. Memory can be allocated in many other ways and most consumers pass the M_WAITOK flag, making malloc not fail in the first place. Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D27143
# f0c90a09	09-Nov-2020	Mateusz Guzik <mjg@FreeBSD.org>	malloc: provide 384 byte zone Total page count after buildworld on ZFS for 384 (if present) and 512 zones: before: 29713 after: 25946 per-zone page use: vm.uma.malloc_384.keg.domain.1.pages: 11621 vm.uma.malloc_384.keg.domain.0.pages: 11597 vm.uma.malloc_512.keg.domain.1.pages: 1280 vm.uma.malloc_512.keg.domain.0.pages: 1448 Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D27145
# 8e6526e9	09-Nov-2020	Mateusz Guzik <mjg@FreeBSD.org>	malloc: retire mt_stats_zone in favor of pcpu_zone_64 Reviewed by: markj, imp Differential Revision: https://reviews.freebsd.org/D27142
# e25d8b67	06-Nov-2020	Mateusz Guzik <mjg@FreeBSD.org>	malloc: tweak the version check in r367432 to include type name While here fix a whitespace problem.
# bdcc2226	06-Nov-2020	Mateusz Guzik <mjg@FreeBSD.org>	malloc: move malloc_type_internal into malloc_type According to code comments the original motivation was to allow for malloc_type_internal changes without ABI breakage. This can be trivially accomplished by providing spare fields and versioning the struct, as implemented in the patch below. The upshots are one less memory indirection on each alloc and disappearance of mt_zone. Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D27104
# 16b971ed	05-Nov-2020	Mateusz Guzik <mjg@FreeBSD.org>	malloc: add a helper returning size allocated for given request Sample usage: kernel modules can decide whether to stick to malloc or create their own zone. Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D27097
# e1b6a7f8	02-Nov-2020	Mateusz Guzik <mjg@FreeBSD.org>	malloc: prefix zones with malloc- Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D27038
# 828afdda	02-Nov-2020	Mateusz Guzik <mjg@FreeBSD.org>	malloc: export kernel zones instead of relying on them being power-of-2 Reviewed by: markj (previous version) Differential Revision: https://reviews.freebsd.org/D27026
# 82c174a3	30-Oct-2020	Mateusz Guzik <mjg@FreeBSD.org>	malloc: delegate M_EXEC handling to dedicacted routines It is almost never needed and adds an avoidable branch. While here do minior clean ups in preparation for larger changes. Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D27019
# 6fed89b1	01-Sep-2020	Mateusz Guzik <mjg@FreeBSD.org>	kern: clean up empty lines in .c and .h files
# 5d4bf057	29-Aug-2020	Vladimir Kondratyev <wulf@FreeBSD.org>	LinuxKPI: Implement ksize() function. In Linux, ksize() gets the actual amount of memory allocated for a given object. This commit adds malloc_usable_size() to FreeBSD KPI which does the same. It also maps LinuxKPI ksize() to newly created function. ksize() function is used by drm-kmod. Reviewed by: hselasky, kib MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D26215
# b21b022a	18-Aug-2020	Mark Johnston <markj@FreeBSD.org>	Revert r364310. Some of the resulting fallout in CAM does not appear straightforward to fix, so simply revert the commit for now in the absence of a better solution. Discussed with: mjg Reported by: dhw
# 1921bb7b	17-Aug-2020	Gleb Smirnoff <glebius@FreeBSD.org>	With INVARIANTS panic immediately if M_WAITOK is requested in a non-sleepable context. Previously only _sleep() would panic. This will catch misuse of M_WAITOK at development stage rather than at stress load stage. Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D26027
# 96ad26ee	04-Aug-2020	Mark Johnston <markj@FreeBSD.org>	Remove free_domain() and uma_zfree_domain(). These functions were introduced before UMA started ensuring that freed memory gets placed in domain-local caches. They no longer serve any purpose since UMA now provides their functionality by default. Remove them to simplyify the kernel memory allocator interfaces a bit. Reviewed by: cem, kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D25937
# 7029da5c	26-Feb-2020	Pawel Biernacki <kaktus@FreeBSD.org>	Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many) r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are still not MPSAFE (or already are but aren’t properly marked). Use it in preparation for a general review of all nodes. This is non-functional change that adds annotations to SYSCTL_NODE and SYSCTL_PROC nodes using one of the soon-to-be-required flags. Mark all obvious cases as MPSAFE. All entries that haven't been marked as MPSAFE before are by default marked as NEEDGIANT Approved by: kib (mentor, blanket) Commented by: kib, gallatin, melifaro Differential Revision: https://reviews.freebsd.org/D23718
# eaa17d42	22-Feb-2020	Ryan Libby <rlibby@FreeBSD.org>	sys/vm: quiet -Wwrite-strings Discussed with: kib Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D23796
# 45035bec	15-Feb-2020	Matt Macy <mmacy@FreeBSD.org>	Add zfree to zero allocation before free Key and cookie management typically wants to avoid information leaks by explicitly zeroing before free. This routine simplifies that by permitting consumers to do so without carrying the size around. Reviewed by: jeff@, jhb@ MFC after: 1 week Sponsored by: Rubicon Communications, LLC (Netgate) Differential Revision: https://reviews.freebsd.org/D22790
# 58aa35d4	03-Feb-2020	Warner Losh <imp@FreeBSD.org>	Remove sparc64 kernel support Remove all sparc64 specific files Remove all sparc64 ifdefs Removee indireeect sparc64 ifdefs
# dc727127	09-Jan-2020	Mark Johnston <markj@FreeBSD.org>	Change malloc_domain() to return the allocation size to the caller. Otherwise the malloc type accounting in malloc_domainset(9) is wrong after r355203. Reviewed by: rlibby Reported by: kaktus Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D23095
# 6d6a03d7	28-Nov-2019	Jeff Roberson <jeff@FreeBSD.org>	Handle large mallocs by going directly to kmem. Taking a detour through UMA does not provide any additional value. Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D22563
# b476ae7f	28-Nov-2019	Jeff Roberson <jeff@FreeBSD.org>	Fix DEBUG_REDZONE build after r355169
# 584061b4	28-Nov-2019	Jeff Roberson <jeff@FreeBSD.org>	Garbage collect the mostly unused us_keg field. Use appropriately named union members in vm_page.h to store the zone and slab. Remove some nearby dead code. Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D22564
# 5757b59f	29-Oct-2019	Gleb Smirnoff <glebius@FreeBSD.org>	Merge td_epochnest with td_no_sleeping. Epoch itself doesn't rely on the counter and it is provided merely for sleeping subsystems to check it. - In functions that sleep use THREAD_CAN_SLEEP() to assert correctness. With EPOCH_TRACE compiled print epoch info. - _sleep() was a wrong place to put the assertion for epoch, right place is sleepq_add(), as there ways to call the latter bypassing _sleep(). - Do not increase td_no_sleeping in non-preemptible epochs. The critical section would trigger all possible safeguards, no sleeping counter is extraneous. Reviewed by: kib
# 4b25d1f2	15-Oct-2019	Gleb Smirnoff <glebius@FreeBSD.org>	Missing from r353596.
# bac06038	15-Oct-2019	Gleb Smirnoff <glebius@FreeBSD.org>	When assertion for a thread not being in an epoch fails also print all entered epochs. Works with EPOCH_TRACE only. Reviewed by: hselasky Differential Revision: https://reviews.freebsd.org/D22017
# 46d70077	10-Oct-2019	Conrad Meyer <cem@FreeBSD.org>	ddb: Add CSV option, sorting to 'show (malloc\|uma)' Add /i option for machine-parseable CSV output. This allows ready copy/ pasting into more sophisticated tooling outside of DDB. Add total zone size ("Memory Use") as a new column for UMA. For both, sort the displayed list on size (print the largest zones/types first). This is handy for quickly diagnosing "where has my memory gone?" at a high level. Submitted by: Emily Pettigrew <Emily.Pettigrew AT isilon.com> (earlier version) Sponsored by: Dell EMC Isilon
# 28b740da	14-Jan-2019	Konstantin Belousov <kib@FreeBSD.org>	Handle overflow in calculating max kmem size. vm_kmem_size is u_long, and it might be not capable of holding page count times PAGE_SIZE, even when scaled down by VM_KMEM_SIZE_SCALE. As bde reported, 12G PAE config ends up with zero for kmem size. Explicitly check for overflow and clamp kmem size at vm_kmem_size_max. If we end up at zero size because VM_KMEM_SIZE_MAX is not defined, panic with clear explanation rather then failing in a way which is hard to relate. Reported by: bde, pho Tested by: pho Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D18767
# 26e9d9b0	18-Dec-2018	Mark Johnston <markj@FreeBSD.org>	Fix DDB's "show malloc" after r338899. MFC after: 3 days Sponsored by: The FreeBSD Foundation
# 9978bd99	30-Oct-2018	Mark Johnston <markj@FreeBSD.org>	Add malloc_domainset(9) and _domainset variants to other allocator KPIs. Remove malloc_domain(9) and most other _domain KPIs added in r327900. The new functions allow the caller to specify a general NUMA domain selection policy, rather than specifically requesting an allocation from a specific domain. The latter policy tends to interact poorly with M_WAITOK, resulting in situations where a caller is blocked indefinitely because the specified domain is depleted. Most existing consumers of the _domain KPIs are converted to instead use a DOMAINSET_PREF() policy, in which we fall back to other domains to satisfy the allocation request. This change also defines a set of DOMAINSET_FIXED() policies, which only permit allocations from the specified domain. Discussed with: gallatin, jeff Reported and tested by: pho (previous version) MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D17418
# 9afff6b1	23-Sep-2018	Mateusz Guzik <mjg@FreeBSD.org>	Eliminate false sharing in malloc due to statistic collection Currently stats are collected in a MAXCPU-sized array which is not aligned and suffers enormous false-sharing. Fix the problem by utilizing per-cpu allocation. The counter(9) API is not used here as it is too incomplete and does not provide a win over per-cpu zone sized for malloc stats struct. In particular stats are being reported for each cpu separately by just copying what is supposed to be an array element for given cpu. This eliminates significant false-sharing during malloc-heavy tests e.g. on Skylake. See the review for details. Reviewed by: markj Approved by: re (kib) Differential Revision: https://reviews.freebsd.org/D17289
# 49bfa624	25-Aug-2018	Alan Cox <alc@FreeBSD.org>	Eliminate the arena parameter to kmem_free(). Implicitly this corrects an error in the function hypercall_memfree(), where the wrong arena was being passed to kmem_free(). Introduce a per-page flag, VPO_KMEM_EXEC, to mark physical pages that are mapped in kmem with execute permissions. Use this flag to determine which arena the kmem virtual addresses are returned to. Eliminate UMA_SLAB_KRWX. The introduction of VPO_KMEM_EXEC makes it redundant. Update the nearby comment for UMA_SLAB_KERNEL. Reviewed by: kib, markj Discussed with: jeff Approved by: re (marius) Differential Revision: https://reviews.freebsd.org/D16845
# 44d0efb2	20-Aug-2018	Alan Cox <alc@FreeBSD.org>	Eliminate kmem_alloc_contig()'s unused arena parameter. Reviewed by: hselasky, kib, markj Discussed with: jeff Differential Revision: https://reviews.freebsd.org/D16799
# 0766f278	13-Jun-2018	Jonathan T. Looney <jtl@FreeBSD.org>	Make UMA and malloc(9) return non-executable memory in most cases. Most kernel memory that is allocated after boot does not need to be executable. There are a few exceptions. For example, kernel modules do need executable memory, but they don't use UMA or malloc(9). The BPF JIT compiler also needs executable memory and did use malloc(9) until r317072. (Note that a side effect of r316767 was that the "small allocation" path in UMA on amd64 already returned non-executable memory. This meant that some calls to malloc(9) or the UMA zone(9) allocator could return executable memory, while others could return non-executable memory. This change makes the behavior consistent.) This change makes malloc(9) return non-executable memory unless the new M_EXEC flag is specified. After this change, the UMA zone(9) allocator will always return non-executable memory, and a KASSERT will catch attempts to use the M_EXEC flag to allocate executable memory using uma_zalloc() or its variants. Allocations that do need executable memory have various choices. They may use the M_EXEC flag to malloc(9), or they may use a different VM interfact to obtain executable pages. Now that malloc(9) again allows executable allocations, this change also reverts most of r317072. PR: 228927 Reviewed by: alc, kib, markj, jhb (previous version) Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D15691
# 34c538c3	02-Jun-2018	Mateusz Guzik <mjg@FreeBSD.org>	malloc: try to use builtins for zeroing at the callsite Plenty of allocation sites pass M_ZERO and sizes which are small and known at compilation time. Handling them internally in malloc loses this information and results in avoidable calls to memset. Instead, let the compiler take the advantage of it whenever possible. Discussed with: jeff
# 5072a5f4	18-May-2018	Matt Macy <mmacy@FreeBSD.org>	malloc: avoid possibly returning stack garbage if MALLOC_DEBUG is defined
# 06bf2a6a	10-May-2018	Matt Macy <mmacy@FreeBSD.org>	Add simple preempt safe epoch API Read locking is over used in the kernel to guarantee liveness. This API makes it easy to provide livenes guarantees without atomics. Includes epoch_test kernel module to stress test the API. Documentation will follow initial use case. Test case and improvements to preemption handling in response to discussion with mjg@ Reviewed by: imp@, shurd@ Approved by: sbruno@
# 7cd79421	23-Apr-2018	Mateusz Guzik <mjg@FreeBSD.org>	dtrace: depessimize dtmalloc when dtrace is active Each malloc/free was testing dtrace_malloc_enabled and forcing extra reads from the malloc type struct to see if perhaps a dtmalloc probe was on. Treat it like lockstat and sdt: have a global bolean.
# c9e05ccd	23-Apr-2018	Mateusz Guzik <mjg@FreeBSD.org>	malloc: stop reading the subzone if MALLOC_DEBUG_MAXZONES == 1 (the default) malloc was showing at the top of profile during while running microbenchmarks. #define DTMALLOC_PROBE_MAX 2 struct malloc_type_internal { uint32_t mti_probes[DTMALLOC_PROBE_MAX]; u_char mti_zone; struct malloc_type_stats mti_stats[MAXCPU]; }; Reading mti_zone it wastes a cacheline to hold mti_probes + mti_zone (which we know is 0) + part of malloc stats of the first cpu which on top induces false-sharing. In particular will-it-scale lock1_processes -t 128 -s 10: before: average:45879692 after: average:51655596 Note the counters can be padded but the right fix is to move them to counter(9), leaving the struct read-only after creation (modulo dtrace probes).
# f7d35785	08-Feb-2018	Gleb Smirnoff <glebius@FreeBSD.org>	Fix boot_pages exhaustion on machines with many domains and cores, where size of UMA zone allocation is greater than page size. In this case zone of zones can not use UMA_MD_SMALL_ALLOC, and we need to postpone switch off of this zone from startup_alloc() until full launch of VM. o Always supply number of VM zones to uma_startup_count(). On machines with UMA_MD_SMALL_ALLOC ignore it completely, unless zsize goes over a page. In the latter case account VM zones for number of allocations from the zone of zones. o Rewrite startup_alloc() so that it will immediately switch off from itself any zone that is already capable of running real alloc. In worst case scenario we may leak a single page here. See comment in uma_startup_count(). o Hardcode call to uma_startup2() into vm_mem_init(). Otherwise some extra SYSINITs, e.g. vm_page_init() may sneak in before. o While here, remove uma_boot_pages_mtx. With recent changes to boot pages calculation, we are guaranteed to use all of the boot_pages in the early single threaded stage. Reported & tested by: mav
# f4bef67c	05-Feb-2018	Gleb Smirnoff <glebius@FreeBSD.org>	Followup on r302393 by cperciva, improving calculation of boot pages required for UMA startup. o Introduce another stage of UMA startup, which is entered after vm_page_startup() finishes. After this stage we don't yet enable buckets, but we can ask VM for pages. Rename stages to meaningful names while here. New list of stages: BOOT_COLD, BOOT_STRAPPED, BOOT_PAGEALLOC, BOOT_BUCKETS, BOOT_RUNNING. Enabling page alloc earlier allows us to dramatically reduce number of boot pages required. What is more important number of zones becomes consistent across different machines, as no MD allocations are done before the BOOT_PAGEALLOC stage. Now only UMA internal zones actually need to use startup_alloc(), however that may change, so vm_page_startup() provides its need for early zones as argument. o Introduce uma_startup_count() function, to avoid code duplication. The functions calculates sizes of zones zone and kegs zone, and calculates how many pages UMA will need to bootstrap. It counts not only of zone structures, but also of kegs, slabs and hashes. o Hide uma_startup_foo() declarations from public file. o Provide several DIAGNOSTIC printfs on boot_pages usage. o Bugfix: when calculating zone of zones size use (mp_maxid + 1) instead of mp_ncpus. Use resulting number not only in the size argument to zone_ctor() but also as args.size. Reviewed by: imp, gallatin (earlier version) Differential Revision: https://reviews.freebsd.org/D14054
# 5a70796a	24-Jan-2018	Li-Wen Hsu <lwhsu@FreeBSD.org>	Fix build for architectures where size_t is not unsigned long Reviewed by: cem Differential Revision: https://reviews.freebsd.org/D14045
# bd555da9	24-Jan-2018	Conrad Meyer <cem@FreeBSD.org>	malloc(9): Change nominal size to size_t to match standard C No functional change -- size_t matches unsigned long on all platforms. Reported by: bde Discussed with: jhb Sponsored by: Dell EMC Isilon
# ab3185d1	12-Jan-2018	Jeff Roberson <jeff@FreeBSD.org>	Implement NUMA support in uma(9) and malloc(9). Allocations from specific domains can be done by the _domain() API variants. UMA also supports a first-touch policy via the NUMA zone flag. The slab layer is now segregated by VM domains and is precise. It handles iteration for round-robin directly. The per-cpu cache layer remains a mix of domains according to where memory is allocated and freed. Well behaved clients can achieve perfect locality with no performance penalty. The direct domain allocation functions have to visit the slab layer and so require per-zone locks which come at some expense. Reviewed by: Attilio (a slightly older version) Tested by: pho Sponsored by: Netflix, Dell/EMC Isilon
# c02fc960	10-Jan-2018	Conrad Meyer <cem@FreeBSD.org>	mallocarray(9): panic if the requested allocation would overflow Additionally, move the overflow check logic out to WOULD_OVERFLOW() for consumers to have a common means of testing for overflowing allocations. WOULD_OVERFLOW() should be a secondary check -- on 64-bit platforms, just because an allocation won't overflow size_t does not mean it is a sane size to request. Callers should be imposing reasonable allocation limits far, far, below overflow. Discussed with: emaste, jhb, kp Sponsored by: Dell EMC Isilon
# fd91e076	07-Jan-2018	Kristof Provost <kp@FreeBSD.org>	Introduce mallocarray() in the kernel Similar to calloc() the mallocarray() function checks for integer overflows before allocating memory. It does not zero memory, unless the M_ZERO flag is set. Reviewed by: pfg, vangyzen (previous version), imp (previous version) Obtained from: OpenBSD Differential Revision: https://reviews.freebsd.org/D13766
# 2e47807c	28-Nov-2017	Jeff Roberson <jeff@FreeBSD.org>	Eliminate kmem_arena and kmem_object in preparation for further NUMA commits. The arena argument to kmem_*() is now only used in an assert. A follow-up commit will remove the argument altogether before we freeze the API for the next release. This replaces the hard limit on kmem size with a soft limit imposed by UMA. When the soft limit is exceeded we periodically wakeup the UMA reclaim thread to attempt to shrink KVA. On 32bit architectures this should behave much more gracefully as we exhaust KVA. On 64bit the limits are likely never hit. Reviewed by: markj, kib (some objections) Discussed with: alc Tested by: pho Sponsored by: Netflix / Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D13187
# 51369649	20-Nov-2017	Pedro F. Giffuni <pfg@FreeBSD.org>	sys: further adoption of SPDX licensing ID tags. Mainly focus on files that use BSD 3-Clause license. The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known opensource licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, superceed or replace the license texts. Special thanks to Wind River for providing access to "The Duke of Highlander" tool: an older (2014) run over FreeBSD tree was useful as a starting point.
# 69a28758	15-Sep-2016	Ed Maste <emaste@FreeBSD.org>	Renumber license clauses in sys/kern to avoid skipping #3
# 5e0a6f31	19-May-2016	Mark Johnston <markj@FreeBSD.org>	Move IPv6 malloc tag definitions into the IPv6 code.
# b28cc462	09-Feb-2016	Gleb Smirnoff <glebius@FreeBSD.org>	Include sys/_task.h into uma_int.h, so that taskqueue.h isn't a requirement for uma_int.h. Suggested by: jhb
# e60b2fcb	03-Feb-2016	Gleb Smirnoff <glebius@FreeBSD.org>	Redo r292484. Embed task(9) into zone, so that uz_maxaction is called in a context that can sleep, allowing consumers of the KPI to run their drain routines without any extra measures. Discussed with: jtl
# d9e2e68d	11-Dec-2015	Mark Johnston <markj@FreeBSD.org>	Don't make assertions about td_critnest when the scheduler is stopped. A panicking thread always executes with a critical section held, so any attempt to allocate or free memory while dumping will otherwise cause a second panic. This can occur, for example, if xpt_polled_action() completes non-dump I/O that was pending at the time of the panic. The fact that this can occur is itself a bug, but asserting in this case does little but reduce the reliability of kernel dumps. Suggested by: kib Reported by: pho
# 1067a2ba	19-Nov-2015	Jonathan T. Looney <jtl@FreeBSD.org>	Consistently enforce the restriction against calling malloc/free when in a critical section. uma_zalloc_arg()/uma_zalloc_free() may acquire a sleepable lock on the zone. The malloc() family of functions may call uma_zalloc_arg() or uma_zalloc_free(). The malloc(9) man page currently claims that free() will never sleep. It also implies that the malloc() family of functions will not sleep when called with M_NOWAIT. However, it is more correct to say that these functions will not sleep indefinitely. Indeed, they may acquire a sleepable lock. However, a developer may overlook this restriction because the WITNESS check that catches attempts to call the malloc() family of functions within a critical section is inconsistenly applied. This change clarifies the language of the malloc(9) man page to clarify the restriction against calling the malloc() family of functions while in a critical section or holding a spin lock. It also adds KASSERTs at appropriate points to make the enforcement of this restriction more consistent. PR: 204633 Differential Revision: https://reviews.freebsd.org/D4197 Reviewed by: markj Approved by: gnn (mentor) Sponsored by: Juniper Networks
# 44ec2b63	09-May-2015	Konstantin Belousov <kib@FreeBSD.org>	The vmem callback to reclaim kmem arena address space on low or fragmented conditions currently just wakes up the pagedaemon. The kmem arena is significantly smaller then the total available physical memory, which means that there are loads where kmem arena space could be exhausted, while there is a lot of pages available still. The woken up pagedaemon sees vm_pages_needed != 0, verifies the condition vm_paging_needed() which is false, clears the pass and returns back to sleep, not calling neither uma_reclaim() nor lowmem handler. To handle low kmem arena conditions, create additional pagedaemon thread which calls uma_reclaim() directly. The thread sleeps on the dedicated channel and kmem_reclaim() wakes the thread in addition to the pagedaemon. Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks
# 1eafc078	14-Mar-2015	Ian Lepore <ian@FreeBSD.org>	Set the SBUF_INCLUDENUL flag in sbuf_new_for_sysctl() so that sysctl strings returned to userland include the nulterm byte. Some uses of sbuf_new_for_sysctl() write binary data rather than strings; clear the SBUF_INCLUDENUL flag after calling sbuf_new_for_sysctl() in those cases. (Note that the sbuf code still automatically adds a nulterm byte in sbuf_finish(), but since it's not included in the length it won't get copied to userland along with the binary data.) Remove explicit adding of a nulterm byte in a couple places now that it gets done automatically by the sbuf drain code. PR: 195668
# 7c51714e	21-Sep-2014	Sean Bruno <sbruno@FreeBSD.org>	svn revisions r269964 and r269963 seemed to have impaired small memory footprint systems(32M/64M) and didn't leave enough free memory to load modules when it was setting up page tables that for sizes that are never used on these smallish boards. Set kmem_zmax to PAGE_SIZE on these smaller systems (< 128M) to keep this from happening. Verified on mips32 h/w. PR: 193465 Submitted by: delphij Reviewed by: adrian
# 7001d850	13-Aug-2014	Xin LI <delphij@FreeBSD.org>	Add a new loader tunable, vm.kmem_zmax which allows a system administrator to limit the maximum allocation size that malloc(9) would consider using the UMA cache allocator as backend. Suggested by: alfred MFC after: 2 weeks
# bda06553	13-Aug-2014	Xin LI <delphij@FreeBSD.org>	Re-instate UMA cached backend for 4K - 64K allocations. New consumers like geli(4) uses malloc(9) to allocate temporary buffers that gets free'ed shortly, causing frequent TLB shootdown as observed in hwpmc supported flame graph. Discussed with: jeff, alfred MFC after: 1 week
# 4813ad54	28-Jun-2014	Hans Petter Selasky <hselasky@FreeBSD.org>	Compile fixes: Remove duplicate "debug_ktr.mask" sysctl definition. Remove now unused variable from "kern_ktr.c". This fixes build of "ktr" which was broken by r267961. Let the default value for "vm_kmem_size_scale" be zero. It is setup after that the sysctl has been initialized from "getenv()" in the "kmeminit()" function to equal the "VM_KMEM_SIZE_MAX" value, if zero. On Sparc64 the "VM_KMEM_SIZE_MAX" macro is not a constant. This fixes build of Sparc64 which was broken by r267961. Add a special macro to dynamically create SYSCTL root nodes, because root nodes have a special parent. This fixes build of existing OFED module and CANBUS module for pc98 which was broken by r267961. Add missing "sysctl.h" includes to get the needed sysctl header file declarations. This is needed after r267961. MFC after: 2 weeks
# af3b2549	27-Jun-2014	Hans Petter Selasky <hselasky@FreeBSD.org>	Pull in r267961 and r267973 again. Fix for issues reported will follow.
# 37a107a4	27-Jun-2014	Glen Barber <gjb@FreeBSD.org>	Revert r267961, r267973: These changes prevent sysctl(8) from returning proper output, such as: 1) no output from sysctl(8) 2) erroneously returning ENOMEM with tools like truss(1) or uname(1) truss: can not get etype: Cannot allocate memory
# 3da1cf1e	27-Jun-2014	Hans Petter Selasky <hselasky@FreeBSD.org>	Extend the meaning of the CTLFLAG_TUN flag to automatically check if there is an environment variable which shall initialize the SYSCTL during early boot. This works for all SYSCTL types both statically and dynamically created ones, except for the SYSCTL NODE type and SYSCTLs which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to be used in the case a tunable sysctl has a custom initialisation function allowing the sysctl to still be marked as a tunable. The kernel SYSCTL API is mostly the same, with a few exceptions for some special operations like iterating childrens of a static/extern SYSCTL node. This operation should probably be made into a factored out common macro, hence some device drivers use this. The reason for changing the SYSCTL API was the need for a SYSCTL parent OID pointer and not only the SYSCTL parent OID list pointer in order to quickly generate the sysctl path. The motivation behind this patch is to avoid parameter loading cludges inside the OFED driver subsystem. Instead of adding special code to the OFED driver subsystem to post-load tunables into dynamically created sysctls, we generalize this in the kernel. Other changes: - Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask" to "hw.pcic.intr_mask". - Removed redundant TUNABLE statements throughout the kernel. - Some minor code rewrites in connection to removing not needed TUNABLE statements. - Added a missing SYSCTL_DECL(). - Wrapped two very long lines. - Avoid malloc()/free() inside sysctl string handling, in case it is called to initialize a sysctl from a tunable, hence malloc()/free() is not ready when sysctls from the sysctl dataset are registered. - Bumped FreeBSD version to indicate SYSCTL API change. MFC after: 2 weeks Sponsored by: Mellanox Technologies
# 44f1c916	22-Mar-2014	Bryan Drewery <bdrewery@FreeBSD.org>	Rename global cnt to vm_cnt to avoid shadowing. To reduce the diff struct pcu.cnt field was not renamed, so PCPU_OP(cnt.field) is still used. pc_cnt and pcpu are also used in kvm(3) and vmstat(8). The goal was to not affect externally used KPI. Bump __FreeBSD_version_ in case some out-of-tree module/code relies on the the global cnt variable. Exp-run revealed no ports using it directly. No objection from: arch@ Sponsored by: EMC / Isilon Storage Division
# f9d498ad	23-Feb-2014	Dimitry Andric <dim@FreeBSD.org>	On sparc64, VM_KMEM_SIZE_SCALE is not a constant expression, so it cannot be tested in a CTASSERT().
# 54366c0b	25-Nov-2013	Attilio Rao <attilio@FreeBSD.org>	- For kernel compiled only with KDTRACE_HOOKS and not any lock debugging option, unbreak the lock tracing release semantic by embedding calls to LOCKSTAT_PROFILE_RELEASE_LOCK() direclty in the inlined version of the releasing functions for mutex, rwlock and sxlock. Failing to do so skips the lockstat_probe_func invokation for unlocking. - As part of the LOCKSTAT support is inlined in mutex operation, for kernel compiled without lock debugging options, potentially every consumer must be compiled including opt_kdtrace.h. Fix this by moving KDTRACE_HOOKS into opt_global.h and remove the dependency by opt_kdtrace.h for all files, as now only KDTRACE_FRAMES is linked there and it is only used as a compile-time stub [0]. [0] immediately shows some new bug as DTRACE-derived support for debug in sfxge is broken and it was never really tested. As it was not including correctly opt_kdtrace.h before it was never enabled so it was kept broken for a while. Fix this by using a protection stub, leaving sfxge driver authors the responsibility for fixing it appropriately [1]. Sponsored by: EMC / Isilon storage division Discussed with: rstone [0] Reported by: rstone [1] Discussed with: philip
# c70af487	08-Nov-2013	Alan Cox <alc@FreeBSD.org>	As of r257209, all architectures have defined VM_KMEM_SIZE_SCALE. In other words, every architecture is now auto-sizing the kmem arena. This revision changes kmeminit() so that the definition of VM_KMEM_SIZE_SCALE becomes mandatory and the definition of VM_KMEM_SIZE becomes optional. Replace or eliminate all existing definitions of VM_KMEM_SIZE. With auto-sizing enabled, VM_KMEM_SIZE effectively became an alternate spelling for VM_KMEM_SIZE_MIN on most architectures. Use VM_KMEM_SIZE_MIN for clarity. Change kmeminit() so that the effect of defining VM_KMEM_SIZE is similar to that of setting the tunable vm.kmem_size. Whereas the macros VM_KMEM_SIZE_{MAX,MIN,SCALE} have had the same effect as the tunables vm.kmem_size_{max,min,scale}, the effects of VM_KMEM_SIZE and vm.kmem_size have been distinct. In particular, whereas VM_KMEM_SIZE was overridden by VM_KMEM_SIZE_{MAX,MIN,SCALE} and vm.kmem_size_{max,min,scale}, vm.kmem_size was not. Remedy this inconsistency. Now, VM_KMEM_SIZE can be used to set the size of the kmem arena at compile-time without that value being overridden by auto-sizing. Update the nearby comments to reflect the kmem submap being replaced by the kmem arena. Stop duplicating the auto-sizing formula in every machine- dependent vmparam.h and place it in kmeminit() where auto-sizing takes place. Reviewed by: kib (an earlier version) Sponsored by: EMC / Isilon Storage Division
# 61083fcc	05-Oct-2013	Alan Cox <alc@FreeBSD.org>	Tidy up kmeminit(): Since r245575, 'nmbclusters' is calculated after kmeminit() runs, so it contributes nothing to 'vm_kmem_size'; update a comment to reflect that r254025 replaced the kmem submap with the kmem arena. Reviewed by: kib Approved by: re (gjb) Sponsored by: EMC / Isilon Storage Division
# 99de9af2	13-Aug-2013	Jeff Roberson <jeff@FreeBSD.org>	- Disable quantum caches on the kmem_arena. This can make fragmentation worse on small KVA systems. I had intended to only enable it for debugging. Sponsored by: EMC / Isilon Storage Division
# e137643e	09-Aug-2013	Olivier Houchard <cognet@FreeBSD.org>	Instead of just trying to do it for arm, make sure vm_kmem_size is properly aligned in kmeminit(), where it'll work for any arch. Suggested by: alc
# 5df87b21	07-Aug-2013	Jeff Roberson <jeff@FreeBSD.org>	Replace kernel virtual address space allocation with vmem. This provides transparent layering and better fragmentation. - Normalize functions that allocate memory to use kmem_* - Those that allocate address space are named kva_* - Those that operate on maps are named kmap_* - Implement recursive allocation handling for kmem_arena in vmem. Reviewed by: alc Tested by: pho Sponsored by: EMC / Isilon Storage Division
# 1f3ad93b	31-Jul-2013	Konstantin Belousov <kib@FreeBSD.org>	Remove unused malloc type. Requested by: alc MFC after: 1 week
# 94bfd5b1	04-Feb-2013	Marius Strobl <marius@FreeBSD.org>	Try to improve r242655 take III: move these SYSCTLs describing the kernel map, which is defined and initialized in vm/vm_kern.c, to the latter. Submitted by: alc
# e8cbe54b	03-Feb-2013	Marius Strobl <marius@FreeBSD.org>	Further improve r242655 and supply VM_{MIN,MAX}_KERNEL_ADDRESS as constant values to SYSCTL_ULONG(9) where possible. Submitted by: bde
# c882264c	08-Nov-2012	Marius Strobl <marius@FreeBSD.org>	Make r242655 build on sparc64. While at it, make vm_{max,min}_kernel_address vm_offset_t as they should be.
# fc6874bc	05-Nov-2012	Alfred Perlstein <alfred@FreeBSD.org>	export VM_MIN_KERNEL_ADDRESS and VM_MAX_KERNEL_ADDRESS via sysctl. On several platforms the are determined by too many nested #defines to be easily discernible. This will aid in development of auto-tuning.
# f806cdcf	15-Jul-2012	Matthew D Fleming <mdf@FreeBSD.org>	Fix a bug with memguard(9) on 32-bit architectures without a VM_KMEM_MAX_SIZE. The code was not taking into account the size of the kernel_map, which the kmem_map is allocated from, so it could produce a sub-map size too large to fit. The simplest solution is to ignore VM_KMEM_MAX entirely and base the memguard map's size off the kernel_map's size, since this is always relevant and always smaller. Found by: Justin Hibbits
# 687c94aa	02-Jul-2012	John Baldwin <jhb@FreeBSD.org>	Honor db_pager_quit in 'show uma' and 'show malloc'. MFC after: 1 month
# 831ce4cb	01-Mar-2012	John Baldwin <jhb@FreeBSD.org>	- Change contigmalloc() to use the vm_paddr_t type instead of an unsigned long for specifying a boundary constraint. - Change bus_dma tags to use bus_addr_t instead of bus_size_t for boundary constraints. These allow boundary constraints to be fully expressed for cases where sizeof(bus_addr_t) != sizeof(bus_size_t). Specifically, it allows a driver to properly specify a 4GB boundary in a PAE kernel. Note that this cannot be safely MFC'd without a lot of compat shims due to KBI changes, so I do not intend to merge it. Reviewed by: scottl
# ea3f07d3	07-Dec-2011	Alan Cox <alc@FreeBSD.org>	Eliminate stale numbers from a comment.
# c749c003	07-Dec-2011	Alan Cox <alc@FreeBSD.org>	Eliminate the possibility of 32-bit arithmetic overflow in the calculation of vm_kmem_size that may occur if the system administrator has specified a vm.vm_kmem_size tunable value that exceeds the hard cap. PR: 162741 Submitted by: Adam McDougall Reviewed by: bde@ MFC after: 3 weeks
# 6472ac3d	07-Nov-2011	Ed Schouten <ed@FreeBSD.org>	Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs. The SYSCTL_NODE macro defines a list that stores all child-elements of that node. If there's no SYSCTL_DECL macro anywhere else, there's no reason why it shouldn't be static.
# f346986b	26-Oct-2011	Alan Cox <alc@FreeBSD.org>	contigmalloc(9) and contigfree(9) are now implemented in terms of other more general VM system interfaces. So, their implementation can now reside in kern_malloc.c alongside the other functions that are declared in malloc.h.
# 8d689e04	12-Oct-2011	Gleb Smirnoff <glebius@FreeBSD.org>	Make memguard(9) capable to guard uma(9) allocations.
# 1549ed03	08-Oct-2011	Alan Cox <alc@FreeBSD.org>	Fix the handling of an empty kmem map by sysctl_kmem_map_free(). In the unlikely event that sysctl_kmem_map_free() was performed on an empty kmem map, it would incorrectly report the free space as zero. Discussed with: avg MFC after: 1 week
# e9a3f785	23-Mar-2011	Alan Cox <alc@FreeBSD.org>	Modestly increase the maximum allowed size of the kmem map on i386. Also, express this new maximum as a fraction of the kernel's address space size rather than a constant so that increasing KVA_PAGES will automatically increase this maximum. As a side-effect of this change, kern.maxvnodes will automatically increase by a proportional amount. While I'm here ensure that this change doesn't result in an unintended increase in maxpipekva on i386. Calculate maxpipekva based upon the size of the kernel address space and the amount of physical memory instead of the size of the kmem map. The memory backing pipes is not allocated from the kmem map. It is allocated from its own submap of the kernel map. In short, it has no real connection to the kmem map. (In fact, the commit messages for the maxpipekva auto-sizing talk about using the kernel map size, cf. r117325 and r117391, even though the implementation actually used the kmem map size.) Although the calculation is now done differently, the resulting value for maxpipekva should remain almost the same on i386. However, on amd64, the value will be reduced by 2/3. This is intentional. The recent change to VM_KMEM_SIZE_SCALE on amd64 for the benefit of ZFS also had the unnecessary side-effect of increasing maxpipekva. This change is effectively restoring maxpipekva on amd64 to its prior value. Eliminate init_param3() since it is no longer used.
# 00f0e671	26-Jan-2011	Matthew D Fleming <mdf@FreeBSD.org>	Explicitly wire the user buffer rather than doing it implicitly in sbuf_new_for_sysctl(9). This allows using an sbuf with a SYSCTL_OUT drain for extremely large amounts of data where the caller knows that appropriate references are held, and sleeping is not an issue. Inspired by: rwatson
# a7d5f7eb	19-Oct-2010	Jamie Gritton <jamie@FreeBSD.org>	A new jail(8) with a configuration file, to replace the work currently done by /etc/rc.d/jail.
# 95bb9d38	09-Oct-2010	Andriy Gapon <avg@FreeBSD.org>	add kmem_map_free sysctl: query largest contiguous free range in kmem_map Suggested by: alc Reviewed by: alc MFC after: 1 week
# 7814c80a	07-Oct-2010	Andriy Gapon <avg@FreeBSD.org>	vm.kmem_map_size: a sysctl to query current kmem_map->size Based on a patch from Sandvine Incorporated via emaste. Reviewed by: emaste MFC after: 1 week
# d801e824	30-Sep-2010	Andriy Gapon <avg@FreeBSD.org>	kmem_size* sysctls: hint that these are also tunables MFC after: 1 week
# 4e657159	16-Sep-2010	Matthew D Fleming <mdf@FreeBSD.org>	Re-add r212370 now that the LOR in powerpc64 has been resolved: Add a drain function for struct sysctl_req, and use it for a variety of handlers, some of which had to do awkward things to get a large enough SBUF_FIXEDLEN buffer. Note that some sysctl handlers were explicitly outputting a trailing NUL byte. This behaviour was preserved, though it should not be necessary. Reviewed by: phk (original patch)
# 404a593e	13-Sep-2010	Matthew D Fleming <mdf@FreeBSD.org>	Revert r212370, as it causes a LOR on powerpc. powerpc does a few unexpected things in copyout(9) and so wiring the user buffer is not sufficient to perform a copyout(9) while holding a random mutex. Requested by: nwhitehorn
# dd67e210	09-Sep-2010	Matthew D Fleming <mdf@FreeBSD.org>	Add a drain function for struct sysctl_req, and use it for a variety of handlers, some of which had to do awkward things to get a large enough FIXEDLEN buffer. Note that some sysctl handlers were explicitly outputting a trailing NUL byte. This behaviour was preserved, though it should not be necessary. Reviewed by: phk
# 6d3ed393	31-Aug-2010	Matthew D Fleming <mdf@FreeBSD.org>	The realloc case for memguard(9) will copy too many bytes when reallocating to a smaller-sized allocation. Fix this issue. Noticed by: alc Reviewed by: alc Approved by: zml (mentor) MFC after: 3 weeks
# e3813573	11-Aug-2010	Matthew D Fleming <mdf@FreeBSD.org>	Rework memguard(9) to reserve significantly more KVA to detect use-after-free over a longer time. Also release the backing pages of a guarded allocation at free(9) time to reduce the overhead of using memguard(9). Allow setting and varying the malloc type at run-time. Add knobs to allow: - randomly guarding memory - adding un-backed KVA guard pages to detect underflow and overflow - a lower limit on the size of allocations that are guarded Reviewed by: alc Reviewed by: brueffer, Ulrich Spörlein <uqs spoerlein net> (man page) Silence from: -arch Approved by: zml (mentor) MFC after: 1 month
# d7854da1	28-Jul-2010	Matthew D Fleming <mdf@FreeBSD.org>	Add MALLOC_DEBUG_MAXZONES debug malloc(9) option to use multiple uma zones for each malloc bucket size. The purpose is to isolate different malloc types into hash classes, so that any buffer overruns or use-after-free will usually only affect memory from malloc types in that hash class. This is purely a debugging tool; by varying the hash function and tracking which hash class was corrupted, the intersection of the hash classes from each instance will point to a single malloc type that is being misused. At this point inspection or memguard(9) can be used to catch the offending code. Add MALLOC_DEBUG_MAXZONES=8 to -current GENERIC configuration files. The suggestion to have this on by default came from Kostik Belousov on -arch. This code is based on work by Ron Steinke at Isilon Systems. Reviewed by: -arch (mostly silence) Reviewed by: zml Approved by: zml (mentor)
# 60ae52f7	21-Jun-2010	Ed Schouten <ed@FreeBSD.org>	Use ISO C99 integer types in sys/kern where possible. There are only about 100 occurences of the BSD-specific u_int*_t datatypes in sys/kern. The ISO C99 integer types are used here more often.
# f121baaa	05-Jun-2009	Brian Somers <brian@FreeBSD.org>	If we're passed garbage in malloc_init(), panic() rather than expecting a KASSERT to handle it. People are likely to turn off INVARIANTS RSN and loading an old module can cause garbage-in here. I saw the issue with an older nvidia driver (x11/nvidia-driver) loading into a new kernel - a crash wasn't seen 'till sysctl_kern_malloc_stats(). I was lucky that mtp->ks_shortdesc was NULL and not something horrible. While I'm here, KASSERT that malloc_uninit() isn't passed something that's not in kmemstatistics. MFC after: 3 weeks
# e678f09a	09-May-2009	Warner Losh <imp@FreeBSD.org>	Retire kern.vm.kmem.size. It was marked as obsolete prior to 5.2, so it can go.
# bb1c7df8	18-Apr-2009	Robert Watson <rwatson@FreeBSD.org>	struct malloc_type has had a 'magic' field statically initialized to M_MAGIC by MALLOC_DEFINE() for a long time; add assertions that malloc_type's passed to malloc(), free(), etc have that magic set. MFC after: 2 weeks
# c90c9021	26-Feb-2009	Ed Schouten <ed@FreeBSD.org>	Remove even more unneeded variable assignments. kern_time.c: - Unused variable `p'. kern_thr.c: - Variable `error' is always caught immediately, so no reason to initialize it. There is no way that error != 0 at the end of create_thread(). kern_sig.c: - Unused variable `code'. kern_synch.c: - `rval' is always assigned in all different cases. kern_rwlock.c: - `v' is always overwritten with RW_UNLOCKED further on. kern_malloc.c: - `size' is always initialized with the proper value before being used. kern_exit.c: - `error' is always caught and returned immediately. abort2() never returns a non-zero value. kern_exec.c: - `len' is always assigned inside the if-statement right below it. tty_info.c: - `td' is always overwritten by FOREACH_THREAD_IN_PROC(). Found by: LLVM's scan-build
# e20a199f	25-Jan-2009	Jeff Roberson <jeff@FreeBSD.org>	- Make the keg abstraction more complete. Permit a zone to have multiple backend kegs so it may source compatible memory from multiple backends. This is useful for cases such as NUMA or different layouts for the same memory type. - Provide a new api for adding new backend kegs to secondary zones. - Provide a new flag for adjusting the layout of zones to stagger allocations better across cache lines. Sponsored by: Nokia
# d7f03759	19-Oct-2008	Ulf Lilleengen <lulf@FreeBSD.org>	- Import the HEAD csup code which is the basis for the cvsmode work.
# b89eaf4e	05-Jul-2008	Alan Cox <alc@FreeBSD.org>	Enable the creation of a kmem map larger than 4GB. Submitted by: Tz-Huan Huang Make several variables related to kmem map auto-sizing static. Found by: CScout
# 6819e13e	04-Jul-2008	Alan Cox <alc@FreeBSD.org>	Correct an error in the comments for init_param3(). Discussed with: silby
# 91dd776c	22-May-2008	John Birrell <jb@FreeBSD.org>	Add support for the DTrace malloc provider which can enable probes on a per-malloc type basis.
# 3202ed75	10-May-2008	Alan Cox <alc@FreeBSD.org>	Introduce a new parameter "superpage_align" to kmem_suballoc() that is used to request superpage alignment for the submap. Request superpage alignment for the kmem_map. Pass VMFS_ANY_SPACE instead of TRUE to vm_map_find(). (They are currently equivalent but VMFS_ANY_SPACE is the new preferred spelling.) Remove a stale comment from kmem_malloc().
# 237fdd78	16-Mar-2008	Robert Watson <rwatson@FreeBSD.org>	In keeping with style(9)'s recommendations on macros, use a ';' after each SYSINIT() macro invocation. This makes a number of lightweight C parsers much happier with the FreeBSD kernel source, including cflow's prcc and lxr. MFC after: 1 month Discussed with: imp, rink
# dc2e1e3f	27-Jun-2007	Robert Watson <rwatson@FreeBSD.org>	Use vm_offset_t for kmembase and kmemlimit rather than char *, avoiding unnecessary casts, and making it possible to compile kern_malloc.c with strict aliasing. Submitted by: rdivacky Approved by: re (kensmith)
# 3805385e	13-Jun-2007	Robert Watson <rwatson@FreeBSD.org>	Spell statistics more correctly in comments.
# 2feb50bf	31-May-2007	Attilio Rao <attilio@FreeBSD.org>	Revert VMCNT_* operations introduction. Probabilly, a general approach is not the better solution here, so we should solve the sched_lock protection problems separately. Requested by: alc Approved by: jeff (mentor)
# e4e80aa7	27-May-2007	Robert Watson <rwatson@FreeBSD.org>	Remove #if 0'd check for 0-size allocations, which if enabled, called kdb_enter().
# 222d0195	18-May-2007	Jeff Roberson <jeff@FreeBSD.org>	- define and use VMCNT_{GET,SET,ADD,SUB,PTR} macros for manipulating vmcnts. This can be used to abstract away pcpu details but also changes to use atomics for all counters now. This means sched lock is no longer responsible for protecting counts in the switch routines. Contributed by: Attilio Rao <attilio@FreeBSD.org>
# 0e5179e4	20-Apr-2007	Stephane E. Potvin <sepotvin@FreeBSD.org>	Add support for specifying a minimal size for vm.kmem_size in the loader via vm.kmem_size_min. Useful when using ZFS to make sure that vm.kmem size will be at least 256mb (for example) without forcing a particular value via vm.kmem_size. Approved by: njl (mentor) Reviewed by: alc
# 24076d13	26-Oct-2006	Robert Watson <rwatson@FreeBSD.org>	Increase usefulness of "show malloc" by moving from displaying the basic counters of allocs/frees/use for each malloc type to calculating InUse, MemUse, and Requests as displayed by the userspace vmstat -m. This is more useful when debugging malloc(9)-related memory leaks, where the count of allocs/frees may not usefully reflect that current memory allocation (i.e., when highly variable size allocations occur with the same malloc type, such as with contigmalloc). MFC after: 3 days Limitations observed by: scottl
# 4b19d603	23-Jul-2006	Robert Watson <rwatson@FreeBSD.org>	Remove old kern.malloc sysctl, which generated a text representation of the kernel malloc(9) state for vmstat -m. libmemstat is now used to generate a machine-readable version which is converged by vmstat -m into a human-readable version. Not for MFC.
# 0ce3f16d	23-Jul-2006	Robert Watson <rwatson@FreeBSD.org>	Expand comments for malloc(9) to better describe the design and statistics / memory types model.
# 45d48bda	03-Mar-2006	Paul Saab <ps@FreeBSD.org>	Fix bug in malloc_uninit(): Releasing items from the mt_zone can not be done by a simple uma_zfree() call since mt_zone is allocated with the UMA_ZONE_MALLOC flag. Use uma_zfree_arg instead and supply the slab. This bug caused panics in low memory situations on unloading kernel modules containing MALLOC_DEFINE(..) statements. Submitted by: ups
# 847a2a17	31-Jan-2006	Pawel Jakub Dawidek <pjd@FreeBSD.org>	Add buffer corruption protection (RedZone) for kernel's malloc(9). It detects both: buffer underflows and buffer overflows bugs at runtime (on free(9) and realloc(9)) and prints backtraces from where memory was allocated and from where it was freed. Tested by: kris
# d362c40d	30-Dec-2005	Pawel Jakub Dawidek <pjd@FreeBSD.org>	Improve memguard a bit: - Provide tunable vm.memguard.desc, so one can specify memory type without changing the code and recompiling the kernel. - Allow to use memguard for kernel modules by providing sysctl vm.memguard.desc, which can be changed to short description of memory type before module is loaded. - Move as much memguard code as possible to memguard.c. - Add sysctl node vm.memguard. and move memguard-specific sysctl there. - Add malloc_desc2type() function for finding memory type based on its short description (ks_shortdesc field). - Memory type can be changed (via vm.memguard.desc sysctl) only if it doesn't exist (will be loaded later) or when no memory is allocated yet. If there is allocated memory for the given memory type, return EBUSY. - Implement two ways of memory types comparsion and make safer/slower the default.
# 619f2841	27-Dec-2005	Pawel Jakub Dawidek <pjd@FreeBSD.org>	In realloc(9), determine size of the original block based on UMA_SLAB_MALLOC flag. In some circumstances (I observed it when I was doing a lot of reallocs) UMA_SLAB_MALLOC can be set even if us_keg != NULL. If this is the case we have wonderful, silent data corruption, because less data is copied to the newly allocated region than should be. I'm not sure when this bug was introduced, it could be there undetected for years now, as we don't have a lot of realloc(9) consumers and it was hard to reproduce it... ...but what I know for sure, is that I don't want to know who introduce the bug:) It took me two/three days to track it down (of course most of the time I was looking for the bug in my own code).
# 2a143d5b	03-Nov-2005	Pawel Jakub Dawidek <pjd@FreeBSD.org>	Detect memory leaks when memory type is being destroyed. This is very helpful for detecting kernel modules memory leaks on unload. Discussed and reviewed by: rwatson
# 64a266f9	20-Oct-2005	Robert Watson <rwatson@FreeBSD.org>	Change format string for u_int64_t to %ju from %llu, in order to use the correct format string on 64-bit systems. Pointed out by: pjd
# 909ed16c	20-Oct-2005	Robert Watson <rwatson@FreeBSD.org>	Add a "show malloc" command to DDB, which prints out the current stats for available kernel malloc types. Quite useful for post-mortem debugging of memory leaks without a dump device configured on a panicked box. MFC after: 2 weeks
# 23198357	02-Aug-2005	Ruslan Ermilov <ru@FreeBSD.org>	Long overdue, keep up with mbuf.h,v 1.148.
# 73864adb	27-Jul-2005	Pawel Jakub Dawidek <pjd@FreeBSD.org>	Fix the way how "InUse" column in 'vmstat -m' output works: - increase number of allocations count only on successfull malloc(9), so it doesn't confuse people; - because we need to check if 'size > 0', hide 'mtsp->mts_memalloced += size;' under the check as well, as for size=0 it is of course a no-op; - avoid critical_enter()/critical_exit() in case of failure in malloc_type_allocated() as there will be nothing to do. OK'ed by: rwatson MFC after: 2 days
# 4f8721d2	14-Jul-2005	Robert Watson <rwatson@FreeBSD.org>	Correct build on 64-bit: cast u_int64_t to (unsigned long long) before printfing as (unsigned long long). 32-bit build on i386 didn't notice this. Whoops. Reported by: arved Tested by: sledge
# cd814b26	14-Jul-2005	Robert Watson <rwatson@FreeBSD.org>	Introduce a new sysctl, kern.malloc_stats, which exports kernel malloc statistics via a binary structure stream: - Add structure 'malloc_type_stream_header', which defines a stream version, definition of MAXCPUS used in the stream, and a number of malloc_type records in the stream. - Add structure 'malloc_type_header', which defines the name of the malloc type being reported on. - When the sysctl is queried, return a stream header, followed by a series of type descriptions, each consisting of a type header followed by a series of MAXCPUS malloc_type_stats structures holding per-CPU allocation information. Typical values of MAXCPUS will be 1 (UP compiled kernel) and 16 (SMP compiled kernel). This query mechanism allows user space monitoring tools to extract memory allocation statistics in a machine-readable form, and to do so at a per-CPU granularity, allowing monitoring of allocation patterns across CPUs in order to better understand the distribution of work and memory flow over multiple CPUs. While here: - Bump statistics width to uint64_t, and hard code using fixed-width type in order to be more sure about structure layout in the stream. We allocate and free a lot of memory. - Add kmemcount, a counter of the number of registered malloc types, in order to avoid excessive manual counting of types. Export via a new sysctl to allow user-space code to better size buffers. - De-XXX comment on no longer maintaining the high watermark in old sysctl monitoring code. A follow-up commit of libmemstat(3), a library to monitor kernel memory allocation, will occur in the next few days. Likewise, similar changes to UMA.
# c0cac8dc	16-Jun-2005	Ken Smith <kensmith@FreeBSD.org>	Remove a variable that became unused as a result of changes made in v1.139. This was only exposed if MALLOC_PROFILE was defined. Submitted by: Gary Jennejohn Pointy hat: rwatson Approved by: re (scottl)
# 8c61b219	10-Jun-2005	Joseph Koshy <jkoshy@FreeBSD.org>	Fix typo. Reviewed by: rwatson, sam
# 63a7e0a3	29-May-2005	Robert Watson <rwatson@FreeBSD.org>	Kernel malloc layers malloc_type allocation over one of two underlying allocators: a set of power-of-two UMA zones for small allocations, and the VM page allocator for large allocations. In order to maintain unified statistics for specific malloc types, kernel malloc maintains a separate per-type statistics pool, which can be monitored using vmstat -m. Prior to this commit, each pool of per-type statistics was protected using a per-type mutex associated with the malloc type. This change modifies kernel malloc to maintain per-CPU statistics pools for each malloc type, and protects writing those statistics using critical sections. It also moves to unsynchronized reads of per-CPU statistics when generating coalesced statistics. To do this, several changes are implemented: - In the previous world order, the statistics memory was allocated by the owner of the malloc type structure, allocated statically using MALLOC_DEFINE(). This embedded the definition of the malloc_type structure into all kernel modules. Move to a model in which a pointer within struct malloc_type points at a UMA-allocated malloc_type_internal data structure owned and maintained by kern_malloc.c, and not part of the exported ABI/API to the rest of the kernel. For the purposes of easing a possible MFC, re-use an existing pointer in 'struct malloc_type', and maintain the current malloc_type structure size, as well as layout with respect to the fields reused outside of the malloc subsystem (such as ks_shortdesc). There are several unused fields as a result of no longer requiring the mutex in malloc_type. - Struct malloc_type_internal contains an array of malloc_type_stats, of size MAXCPU. The structure defined above avoids hard-coding a kernel compile-time value of MAXCPU into kernel modules that interact with malloc. - When accessing per-cpu statistics for a malloc type, surround read - modify - update requests with critical_enter()/critical_exit() in order to avoid races during write. The per-CPU fields are written only from the CPU that owns them. - Per-CPU stats now maintained "allocated" and "freed" counters for number of allocations/frees and bytes allocated/freed, since there is no longer a coherent global notion of the totals. When coalescing malloc stats, accept a slight race between reading stats across CPUs, and avoid showing the user a negative allocation count for the type in the event of a race. The global high watermark is no longer maintained for a malloc type, as there is no global notion of the number of allocations. - While tearing up the sysctl() path, also switch to using sbufs. The current "export as text" sysctl format is retained with the same syntax. We may want to change this in the future to export more per-CPU information, such as how allocations and frees are balanced across CPUs. This change results in a substantial speedup of kernel malloc and free paths on SMP, as critical sections (where usable) out-perform mutexes due to avoiding atomic/bus-locked operations. There is also a minor improvement on UP due to the slightly lower cost of critical sections there. The cost of the change to this approach is the loss of a continuous notion of total allocations that can be exploited to track per-type high watermarks, as well as increased complexity when monitoring statistics. Due to carefully avoiding changing the ABI, as well as hardening the ABI against future changes, it is not necessary to recompile kernel modules for this change. However, MFC'ing this change to RELENG_5 will require also MFC'ing optimizations for soft critical sections, which may modify exposed kernel ABIs. The internal malloc API is changed, and modifications to vmstat in order to restore "vmstat -m" on core dumps will follow shortly. Several improvements from: bde Statistics approach discussed with: ups Tested by: scottl, others
# 87efd4d5	12-Apr-2005	Robert Watson <rwatson@FreeBSD.org>	Consistently style function declarations in kern_malloc.c. MFC after: 3 days
# e4eb384b	21-Jan-2005	Bosko Milekic <bmilekic@FreeBSD.org>	Bring in MemGuard, a very simple and small replacement allocator designed to help detect tamper-after-free scenarios, a problem more and more common and likely with multithreaded kernels where race conditions are more prevalent. Currently MemGuard can only take over malloc()/realloc()/free() for particular (a) malloc type(s) and the code brought in with this change manually instruments it to take over M_SUBPROC allocations as an example. If you are planning to use it, for now you must: 1) Put "options DEBUG_MEMGUARD" in your kernel config. 2) Edit src/sys/kern/kern_malloc.c manually, look for "XXX CHANGEME" and replace the M_SUBPROC comparison with the appropriate malloc type (this might require additional but small/simple code modification if, say, the malloc type is declared out of scope). 3) Build and install your kernel. Tune vm.memguard_divisor boot-time tunable which is used to scale how much of kmem_map you want to allott for MemGuard's use. The default is 10, so kmem_size/10. ToDo: 1) Bring in a memguard(9) man page. 2) Better instrumentation (e.g., boot-time) of MemGuard taking over malloc types. 3) Teach UMA about MemGuard to allow MemGuard to override zone allocations too. 4) Improve MemGuard if necessary. This work is partly based on some old patches from Ian Dowse.
# 9454b2d8	06-Jan-2005	Warner Losh <imp@FreeBSD.org>	/* -> /*- for copyright notices, minor format tweaks as necessary
# 479439b4	29-Sep-2004	Dag-Erling Smørgrav <des@FreeBSD.org>	Turn VM_KMEM_SIZE_MAX and VM_KMEM_SIZE_SCALE into tunables. MFC after: 3 days
# 4362fada	19-Jul-2004	Brian Feldman <green@FreeBSD.org>	Reimplement contigmalloc(9) with an algorithm which stands a greatly- improved chance of working despite pressure from running programs. Instead of trying to throw a bunch of pages out to swap and hope for the best, only a range that can potentially fulfill contigmalloc(9)'s request will have its contents paged out (potentially, not forcibly) at a time. The new contigmalloc operation still operates in three passes, but it could potentially be tuned to more or less. The first pass only looks at pages in the cache and free pages, so they would be thrown out without having to block. If this is not enough, the subsequent passes page out any unwired memory. To combat memory pressure refragmenting the section of memory being laundered, each page is removed from the systems' free memory queue once it has been freed so that blocking later doesn't cause the memory laundered so far to get reallocated. The page-out operations are now blocking, as it would make little sense to try to push out a page, then get its status immediately afterward to remove it from the available free pages queue, if it's unlikely to have been freed. Another change is that if KVA allocation fails, the allocated memory segment will be freed and not leaked. There is a sysctl/tunable, defaulting to on, which causes the old contigmalloc() algorithm to be used. Nonetheless, I have been using vm.old_contigmalloc=0 for over a month. It is safe to switch at run-time to see the difference it makes. A new interface has been used which does not require mapping the allocated pages into KVA: vm_page.h functions vm_page_alloc_contig() and vm_page_release_contig(). These are what vm.old_contigmalloc=0 uses internally, so the sysctl/tunable does not affect their operation. When using the contigmalloc(9) and contigfree(9) interfaces, memory is now tracked with malloc(9) stats. Several functions have been exported from kern_malloc.c to allow other subsystems to use these statistics, as well. This invalidates the BUGS section of the contigmalloc(9) manpage.
# 2d50560a	10-Jul-2004	Marcel Moolenaar <marcel@FreeBSD.org>	Update for the KDB framework: o Make debugging code conditional upon KDB instead of DDB. o Call kdb_enter() instead of Debugger(). o Call kdb_backtrace() instead of db_print_backtrace() or backtrace(). kern_mutex.c: o Replace checks for db_active with checks for kdb_active and make them unconditional. kern_shutdown.c: o s/DDB_UNATTENDED/KDB_UNATTENDED/g o s/DDB_TRACE/KDB_TRACE/g o Save the TID of the thread doing the kernel dump so the debugger knows which thread to select as the current when debugging the kernel core file. o Clear kdb_active instead of db_active and do so unconditionally. o Remove backtrace() implementation. kern_synch.c: o Call kdb_reenter() instead of db_error().
# 099a0e58	31-May-2004	Bosko Milekic <bmilekic@FreeBSD.org>	Bring in mbuma to replace mballoc. mbuma is an Mbuf & Cluster allocator built on top of a number of extensions to the UMA framework, all included herein. Extensions to UMA worth noting: - Better layering between slab <-> zone caches; introduce Keg structure which splits off slab cache away from the zone structure and allows multiple zones to be stacked on top of a single Keg (single type of slab cache); perhaps we should look into defining a subset API on top of the Keg for special use by malloc(9), for example. - UMA_ZONE_REFCNT zones can now be added, and reference counters automagically allocated for them within the end of the associated slab structures. uma_find_refcnt() does a kextract to fetch the slab struct reference from the underlying page, and lookup the corresponding refcnt. mbuma things worth noting: - integrates mbuf & cluster allocations with extended UMA and provides caches for commonly-allocated items; defines several zones (two primary, one secondary) and two kegs. - change up certain code paths that always used to do: m_get() + m_clget() to instead just use m_getcl() and try to take advantage of the newly defined secondary Packet zone. - netstat(1) and systat(1) quickly hacked up to do basic stat reporting but additional stats work needs to be done once some other details within UMA have been taken care of and it becomes clearer to how stats will work within the modified framework. From the user perspective, one implication is that the NMBCLUSTERS compile-time option is no longer used. The maximum number of clusters is still capped off according to maxusers, but it can be made unlimited by setting the kern.ipc.nmbclusters boot-time tunable to zero. Work should be done to write an appropriate sysctl handler allowing dynamic tuning of kern.ipc.nmbclusters at runtime. Additional things worth noting/known issues (READ): - One report of 'ips' (ServeRAID) driver acting really slow in conjunction with mbuma. Need more data. Latest report is that ips is equally sucking with and without mbuma. - Giant leak in NFS code sometimes occurs, can't reproduce but currently analyzing; brueffer is able to reproduce but THIS IS NOT an mbuma-specific problem and currently occurs even WITHOUT mbuma. - Issues in network locking: there is at least one code path in the rip code where one or more locks are acquired and we end up in m_prepend() with M_WAITOK, which causes WITNESS to whine from within UMA. Current temporary solution: force all UMA allocations to be M_NOWAIT from within UMA for now to avoid deadlocks unless WITNESS is defined and we can determine with certainty that we're not holding any locks when we're M_WAITOK. - I've seen at least one weird socketbuffer empty-but- mbuf-still-attached panic. I don't believe this to be related to mbuma but please keep your eyes open, turn on debugging, and capture crash dumps. This change removes more code than it adds. A paper is available detailing the change and considering various performance issues, it was presented at BSDCan2004: http://www.unixdaemons.com/~bmilekic/netbuf_bmilekic.pdf Please read the paper for Future Work and implementation details, as well as credits. Testing and Debugging: rwatson, brueffer, Ketrien I. Saihr-Kesenchedra, ... Reviewed by: Lots of people (for different parts)
# 7f8a436f	05-Apr-2004	Warner Losh <imp@FreeBSD.org>	Remove advertising clause from University of California Regent's license, per letter dated July 22, 1999. Approved by: core
# 84344f9f	27-Jan-2004	Dag-Erling Smørgrav <des@FreeBSD.org>	Rename the kern.vm.kmem.size tunable to the more logical vm.kmem_size. To assure backward compatibility (conditional on !BURN_BRIDGES), look it up by its old name first, and log a warning (but accept the setting) if it was found. If both the old and new name are defined, the new name takes precedence. Also export vm.kmem_size as a read-only sysctl variable; I find it hard to tune a parameter when I don't know its default value, especially when that default value is computed at boot time.
# 9fb535de	18-Sep-2003	Jeff Roberson <jeff@FreeBSD.org>	- Only use UMA to cache malloc requests up to PAGE_SIZE. Values larger than this are requested very infrequently and waste memory when we cache spares.
# 68f2d20b	22-Jul-2003	Poul-Henning Kamp <phk@FreeBSD.org>	Revert stuff which accidentally ended up in the previous commit.
# 55d1d703	22-Jul-2003	Poul-Henning Kamp <phk@FreeBSD.org>	Don't attempt to inline large functions mb_alloc() and mb_free(), it more than doubles the text size of this file. GCC has wisely ignored us on this previously
# 347194c1	10-Jul-2003	Mike Silbersack <silby@FreeBSD.org>	Add init_param3() to subr_param. This function is called immediately after the kernel map has been sized, and is the optimal place for the autosizing of memory allocations which occur within the kernel map to occur. Suggested by: bde
# 1795d0cd	10-Jun-2003	Paul Saab <ps@FreeBSD.org>	Don't overflow when calculating vm_kmem_size. This fixes kmem_map too small panics on PAE machines which have odd > 4GB sizes (4.5 gig would render a 20MB of KVA for kmem_map instead of 200MB). Submitted by: John Cagle <john.cagle@hp.com>, jeff Reviewed by: jeff, peter, scottl, lots of USENIX folks
# 677b542e	10-Jun-2003	David E. O'Brien <obrien@FreeBSD.org>	Use __FBSDID().
# 1282e9ac	11-May-2003	Poul-Henning Kamp <phk@FreeBSD.org>	Don't pass NULL pointer to memset if we are compiled with DIAGNOSTIC Approved by: re/rwatson
# 8cb72d61	05-May-2003	Poul-Henning Kamp <phk@FreeBSD.org>	Add two KASSERTS which trigger if free(9) would drag the "memuse" statistic for a malloc bucket under zero. This typically happens if you malloc(9) from one bucket and free to another.
# 3f6ee876	25-Apr-2003	Poul-Henning Kamp <phk@FreeBSD.org>	Update the "last malloc failure timestamp" also for simulated malloc errors.
# f2538508	26-Mar-2003	Robert Watson <rwatson@FreeBSD.org>	Permit debug.malloc.failure_rate to be specified using a tunable so that the feature can be enabled during the boot process. Note the continued limitation that FreeBSD fails so rapidly with this setting enabled that it's hard to narrow down particular failures for correction; we really need per-malloc type failure rates.
# eae870cd	26-Mar-2003	Robert Watson <rwatson@FreeBSD.org>	Add a new kernel option, MALLOC_MAKE_FAILURES, which compiles in a debugging feature causing M_NOWAIT allocations to fail at a specified rate. This can be useful for detecting poor handling of M_NOWAIT: the most frequent problems I've bumped into are unconditional deference of the pointer even though it's NULL, and hangs as a result of a lost event where memory for the event couldn't be allocated. Two sysctls are added: debug.malloc.failure_rate How often to generate a failure: if set to 0 (default), this feature is disabled. Otherwise, the frequency of failures -- I've been using 10 (one in ten mallocs fails), but other popular settings might be much lower or much higher. debug.malloc.failure_count Number of times a coerced malloc failure has occurred as a result of this feature. Useful for tracking what might have happened and whether failures are being generated. Useful possible additions: tying failure rate to malloc type, printfs indicating the thread that experienced the coerced failure. Reviewed by: jeffr, jhb
# 194a0abf	10-Mar-2003	Poul-Henning Kamp <phk@FreeBSD.org>	PHCC[1]: I had commented the #ifdef INVARIANTS checks out to make sure I ran this code in all kernels and forgot to comment the #ifdefs back in before I committed. Spotted by: bmilekic [1] PHCC = Pointy Hat Correction Commit
# d3c11994	10-Mar-2003	Poul-Henning Kamp <phk@FreeBSD.org>	Make malloc and mbuf allocation mode flags nonoverlapping. Under INVARIANTS whine if we get incompatible flags. Submitted by: imp
# 025b4be1	19-Feb-2003	Bosko Milekic <bmilekic@FreeBSD.org>	o Allow "buckets" in mb_alloc to be differently sized (according to compile-time constants). That is, a "bucket" now is not necessarily a page-worth of mbufs or clusters, but it is MBUF_BUCK_SZ, CLUS_BUCK_SZ worth of mbufs, clusters. o Rename {mbuf,clust}_limit to {mbuf,clust}_hiwm and introduce {mbuf,clust}_lowm, which currently has no effect but will be used to set the low watermarks. o Fix netstat so that it can deal with the differently-sized buckets and teach it about the low watermarks too. o Make sure the per-cpu stats for an absent CPU has mb_active set to 0, explicitly. o Get rid of the allocate refcounts from mbuf map mess. Instead, just malloc() the refcounts in one shot from mbuf_init() o Clean up / update comments in subr_mbuf.c
# a163d034	18-Feb-2003	Warner Losh <imp@FreeBSD.org>	Back out M_* changes, per decision of the TRB. Approved by: trb
# 4db4f5c8	01-Feb-2003	Poul-Henning Kamp <phk@FreeBSD.org>	Under #ifdef DIAGNOSTIC, fill malloc(9) allocations which do not have M_ZERO specified with 0x70. (malloc_flags=J for the kernel :-)
# 44956c98	21-Jan-2003	Alfred Perlstein <alfred@FreeBSD.org>	Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0. Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.
# 1fb14a47	01-Nov-2002	Poul-Henning Kamp <phk@FreeBSD.org>	Introduce malloc_last_fail() which returns the number of seconds since malloc(9) failed last time. This is intended to help code adjust memory usage to the current circumstances. A typical use could be: if (malloc_last_fail() < 60) reduce_cache_by_one();
# 99571dc3	18-Sep-2002	Jeff Roberson <jeff@FreeBSD.org>	- Split UMA_ZFLAG_OFFPAGE into UMA_ZFLAG_OFFPAGE and UMA_ZFLAG_HASH. - Remove all instances of the mallochash. - Stash the slab pointer in the vm page's object pointer when allocating from the kmem_obj. - Use the overloaded object pointer to find slabs for malloced memory.
# 280759e7	31-May-2002	Robert Drehmel <robert@FreeBSD.org>	- Replace the bandaid introduced in revision 1.110 with a better solution. - Add braces for a ``for'' statement containing a single multi-line statement.
# 45eefe71	20-May-2002	Jake Burkholder <jake@FreeBSD.org>	Add a bandaid so that sysctl kern.malloc works on sparc64.
# 42e49865	20-May-2002	John Baldwin <jhb@FreeBSD.org>	Fix the td_intr_nesting_level check to work ok if a flag like M_ZERO is passed in with M_WAITOK to malloc().
# 8f70816c	02-May-2002	Jeff Roberson <jeff@FreeBSD.org>	Hide a pointer to the malloc_type bucket at the end of the freed memory. If this memory is modified after it has been freed we can now report it's previous owner.
# 5a34a9f0	02-May-2002	Jeff Roberson <jeff@FreeBSD.org>	malloc/free(9) no longer require Giant. Use the malloc_mtx to protect the mallochash. Mallochash is going to go away as soon as I introduce the kfree/kmalloc api and partially overhaul the malloc wrapper. This can't happen until all users of the malloc api that expect memory to be aligned on the size of the allocation are fixed.
# 639c9550	01-May-2002	Jeff Roberson <jeff@FreeBSD.org>	Remove the temporary alignment check in free(). Implement the following checks on freed memory in the bucket path: - Slab membership - Alignment - Duplicate free This previously was only done if we skipped the buckets. This code will slow down INVARIANTS a bit, but it is smp safe. The checks were moved out of the normal path and into hooks supplied in uma_dbg.
# 289f207c	30-Apr-2002	Jeff Roberson <jeff@FreeBSD.org>	Convert longs to u_longs in stats. This will hold off wrap arounds for a while longer.
# 8efc4eff	30-Apr-2002	Jeff Roberson <jeff@FreeBSD.org>	Add a new UMA debugging facility. This will overwrite freed memory with 0xdeadc0de and then check for it just before memory is handed off as part of a new request. This will catch any post free/pre alloc modification of memory, as well as introduce errors for anything that tries to dereference it as a pointer. This code takes the form of special init, fini, ctor and dtor routines that are specificly used by malloc. It is in a seperate file because additional debugging aids will want to live here as well.
# 2cc35ff9	29-Apr-2002	Jeff Roberson <jeff@FreeBSD.org>	Move the implementation of M_ZERO into UMA so that it can be passed to uma_zalloc and friends. Remove this functionality from the malloc wrapper. Document this change in uma.h and adjust variable names in uma_core.
# 43a7c4e9	29-Apr-2002	Robert Watson <rwatson@FreeBSD.org>	Re-add the 16384 bucket also. Submitted by: green
# bd796eb2	29-Apr-2002	Robert Watson <rwatson@FreeBSD.org>	Revert a portion of kern_malloc.c:1.99, which (in addition to adding malloc profiling) also modified the set of pre-defined buckets for the memory allocator. For reasons unknown to me, this resulted in extensive memory corruption in the kernel, in particular on SMP boxes, so I'm committing this work-around until Jeff gets a chance to debug it properly. David Wolfskill pointed me at this commit as the one that might be a problem; I've been running this code on two dual-processor burn-in boxes for about 12 hours now, and the rate of panics due to memory corruption has dropped to zero (from one every five minutes). Hopefully not treading on the toes of: jeff
# 708da94e	23-Apr-2002	Poul-Henning Kamp <phk@FreeBSD.org>	Add a basic sanity check on pointers passed to free(9). Should be improved by: jeff
# 5e914b96	14-Apr-2002	Jeff Roberson <jeff@FreeBSD.org>	Finish adding support code for sysctl kern.mprof. This dumps some malloc information related to bucket size effeciency. Three things are printed on each row: Size is the size the user actually asked for rounded to 16 bytes. Requests is the number of times this size was asked for. Real Size is the size we actually handed out. At the end the total memory used and total waste is displayed. Currently my system displays about 33% wasted memory. The intent of this code is to gather statistics for tuning the malloc bucket sizes. It is not intended to be run with INVARIANTS and it is not entirely mp safe. It can be enabled via 'options MALLOC_PROFILE' which was commited earlier.
# 6f267175	14-Apr-2002	Jeff Roberson <jeff@FreeBSD.org>	Remove malloc_type's ks_limit. Updated the kmemzones logic such that the ks_size bitmap can be used as an index into it to report the size of the zone used. Create the kern.malloc sysctl which replaces the kvm mechanism to report similar data. This will provide an easy place for statistics aggregation if malloc_type statistics become per cpu data. Add some code ifdef'd under MALLOC_PROFILING to facilitate a tool for sizing the malloc buckets.
# 6008862b	04-Apr-2002	John Baldwin <jhb@FreeBSD.org>	Change callers of mtx_init() to pass in an appropriate lock type name. In most cases NULL is passed, but in some cases such as network driver locks (which use the MTX_NETWORK_LOCK macro) and UMA zone locks, a name is used. Tested on: i386, alpha, sparc64
# 4d77a549	19-Mar-2002	Alfred Perlstein <alfred@FreeBSD.org>	Remove __P.
# 8355f576	19-Mar-2002	Jeff Roberson <jeff@FreeBSD.org>	This is the first part of the new kernel memory allocator. This replaces malloc(9) and vm_zone with a slab like allocator. Reviewed by: arch@
# 44a8ff31	12-Mar-2002	Archie Cobbs <archie@FreeBSD.org>	Add realloc() and reallocf(), and make free(NULL, ...) acceptable. Reviewed by: alfred
# b40ce416	12-Sep-2001	Julian Elischer <julian@FreeBSD.org>	KSE Milestone 2 Note ALL MODULES MUST BE RECOMPILED make the kernel aware that there are smaller units of scheduling than the process. (but only allow one thread per process at this time). This is functionally equivalent to teh previousl -current except that there is a thread associated with each process. Sorry john! (your next MFC will be a doosie!) Reviewed by: peter@freebsd.org, dillon@freebsd.org X-MFC after: ha ha ha ha
# c4a44810	10-Aug-2001	John Baldwin <jhb@FreeBSD.org>	- Remove asleep(), await(), and M_ASLEEP. - Callers of asleep() and await() have been converted to calling tsleep(). The only caller outside of M_ASLEEP was the ata driver, which called both asleep() and await() with spl-raised, so there was no need for the asleep() and await() pair. M_ASLEEP was unused. Reviewed by: jasone, peter
# ba3e8826	02-Aug-2001	Bosko Milekic <bmilekic@FreeBSD.org>	Rename mb_init() mbuf subsystem initialization routine to mbuf_init(), in order to avoid namespace collision with subr_mchain.c's mb_init(). This wasn't "fatal" as the mbuf initialization routine mb_init() was local to subr_mbuf.c which in turn didn't pull in subr_mchain.c's mb_init() declaration, but it should deffinately be changed now before it creates headache.
# f74250ca	02-Aug-2001	Jake Burkholder <jake@FreeBSD.org>	Remove some code that appears to have endian problems with INVARIANTS. This is #if BIG_ENDIAN, but is only necessary if malloc types are shorts, not struct malloc_type * like they are now.
# 08442f8a	22-Jun-2001	Bosko Milekic <bmilekic@FreeBSD.org>	Introduce numerous SMP friendly changes to the mbuf allocator. Namely, introduce a modified allocation mechanism for mbufs and mbuf clusters; one which can scale under SMP and which offers the possibility of resource reclamation to be implemented in the future. Notable advantages: o Reduce contention for SMP by offering per-CPU pools and locks. o Better use of data cache due to per-CPU pools. o Much less code cache pollution due to excessively large allocation macros. o Framework for `grouping' objects from same page together so as to be able to possibly free wired-down pages back to the system if they are no longer needed by the network stacks. Additional things changed with this addition: - Moved some mbuf specific declarations and initializations from sys/conf/param.c into mbuf-specific code where they belong. - m_getclr() has been renamed to m_get_clrd() because the old name is really confusing. m_getclr() HAS been preserved though and is defined to the new name. No tree sweep has been done "to change the interface," as the old name will continue to be supported and is not depracated. The change was merely done because m_getclr() sounds too much like "m_get a cluster." - TEMPORARILY disabled mbtypes statistics displaying in netstat(1) and systat(1) (see TODO below). - Fixed systat(1) to display number of "free mbufs" based on new per-CPU stat structures. - Fixed netstat(1) to display new per-CPU stats based on sysctl-exported per-CPU stat structures. All infos are fetched via sysctl. TODO (in order of priority): - Re-enable mbtypes statistics in both netstat(1) and systat(1) after introducing an SMP friendly way to collect the mbtypes stats under the already introduced per-CPU locks (i.e. hopefully don't use atomic() - it seems too costly for a mere stat update, especially when other locks are already present). - Optionally have systat(1) display not only "total free mbufs" but also "total free mbufs per CPU pool." - Fix minor length-fetching issues in netstat(1) related to recently re-enabled option to read mbuf stats from a core file. - Move reference counters at least for mbuf clusters into an unused portion of the cluster itself, to save space and need to allocate a counter. - Look into introducing resource freeing possibly from a kproc. Reviewed by (in parts): jlemon, jake, silby, terry Tested by: jlemon (Intel & Alpha), mjacob (Intel & Alpha) Preliminary performance measurements: jlemon (and me, obviously) URL: http://people.freebsd.org/~bmilekic/mb_alloc/
# 09786698	07-Jun-2001	Peter Wemm <peter@FreeBSD.org>	"Fix" the previous initial attempt at fixing TUNABLE_INT(). This time around, use a common function for looking up and extracting the tunables from the kernel environment. This saves duplicating the same function over and over again. This way typically has an overhead of 8 bytes + the path string, versus about 26 bytes + the path string.
# 4422746f	06-Jun-2001	Peter Wemm <peter@FreeBSD.org>	Back out part of my previous commit. This was a last minute change and I botched testing. This is a perfect example of how NOT to do this sort of thing. :-(
# 81930014	06-Jun-2001	Peter Wemm <peter@FreeBSD.org>	Make the TUNABLE_() macros look and behave more consistantly like the SYSCTL_() macros. TUNABLE_INT_DECL() was an odd name because it didn't actually declare the int, which is what the name suggests it would do.
# fb919e4d	01-May-2001	Mark Murray <markm@FreeBSD.org>	Undo part of the tangle of having sys/lock.h and sys/mutex.h included in other "system" header files. Also help the deprecation of lockmgr.h by making it a sub-include of sys/lock.h and removing sys/lockmgr.h form kernel .c files. Sort sys/*.h includes where possible in affected files. OK'ed by: bde (with reservations)
# d04d50d1	18-Apr-2001	Bosko Milekic <bmilekic@FreeBSD.org>	Fix inconsistency in setup of kernel_map: we need to make sure that we also reserve _adequate_ space for the mb_map submap; i.e. we need space for nmbclusters, nmbufs, _and_ nmbcnt. Furthermore, we need to rounddown, and not roundup, so that we are consistent. Pointed out by: bde
# 9ed346ba	08-Feb-2001	Bosko Milekic <bmilekic@FreeBSD.org>	Change and clean the mutex lock interface. mtx_enter(lock, type) becomes: mtx_lock(lock) for sleep locks (MTX_DEF-initialized locks) mtx_lock_spin(lock) for spin locks (MTX_SPIN-initialized) similarily, for releasing a lock, we now have: mtx_unlock(lock) for MTX_DEF and mtx_unlock_spin(lock) for MTX_SPIN. We change the caller interface for the two different types of locks because the semantics are entirely different for each case, and this makes it explicitly clear and, at the same time, it rids us of the extra `type' argument. The enter->lock and exit->unlock change has been made with the idea that we're "locking data" and not "entering locked code" in mind. Further, remove all additional "flags" previously passed to the lock acquire/release routines with the exception of two: MTX_QUIET and MTX_NOSWITCH The functionality of these flags is preserved and they can be passed to the lock/unlock routines by calling the corresponding wrappers: mtx_{lock, unlock}_flags(lock, flag(s)) and mtx_{lock, unlock}_spin_flags(lock, flag(s)) for MTX_DEF and MTX_SPIN locks, respectively. Re-inline some lock acq/rel code; in the sleep lock case, we only inline the _obtain_lock()s in order to ensure that the inlined code fits into a cache line. In the spin lock case, we inline recursion and actually only perform a function call if we need to spin. This change has been made with the idea that we generally tend to avoid spin locks and that also the spin locks that we do have and are heavily used (i.e. sched_lock) do recurse, and therefore in an effort to reduce function call overhead for some architectures (such as alpha), we inline recursion for this case. Create a new malloc type for the witness code and retire from using the M_DEV type. The new type is called M_WITNESS and is only declared if WITNESS is enabled. Begin cleaning up some machdep/mutex.h code - specifically updated the "optimized" inlined code in alpha/mutex.h and wrote MTX_LOCK_SPIN and MTX_UNLOCK_SPIN asm macros for the i386/mutex.h as we presently need those. Finally, caught up to the interface changes in all sys code. Contributors: jake, jhb, jasone (in no particular order)
# 1707240d	30-Jan-2001	Boris Popov <bp@FreeBSD.org>	Let M_PANIC go back to the private tree as its intention isn't understood well for now.
# 9211b0b6	28-Jan-2001	Boris Popov <bp@FreeBSD.org>	Add M_PANIC flag to the list of available flags passed to malloc(). With this flag set malloc() will panic if memory allocation failed. This usable only in critical places where failed allocation is fatal. Reviewed by: peter
# 0fee3d35	26-Jan-2001	Peter Wemm <peter@FreeBSD.org>	p->p_intr_nesting_level is MI now and initialized to 0 in kern_fork.c, so it should be save to KASSERT() on it even on an arch that may not use it.
# 0d6d6aa3	23-Jan-2001	John Baldwin <jhb@FreeBSD.org>	Don't grab Giant when calling kmem_alloc/kmem_free as this is just encouraging other people to follow the same practice. If this is going to be done, then it should be done inside of those two functions instead.
# a448b62a	21-Jan-2001	Jake Burkholder <jake@FreeBSD.org>	Make intr_nesting_level per-process, rather than per-cpu. Setup interrupt threads to run with it always >= 1, so that malloc can detect M_WAITOK from "interrupt" context. This is also necessary in order to context switch from sched_ithd() directly. Reviewed By: peter
# d1c1b841	21-Jan-2001	Jason Evans <jasone@FreeBSD.org>	Remove MUTEX_DECLARE() and MTX_COLD. Instead, postpone full mutex initialization until after malloc() is safe to call, then iterate through all mutexes and complete their initialization. This change is necessary in order to avoid some circular bootstrapping dependencies.
# ef73ae4b	09-Jan-2001	Jake Burkholder <jake@FreeBSD.org>	Use PCPU_GET, PCPU_PTR and PCPU_SET to access all per-cpu variables other then curproc.
# 1921a06d	20-Oct-2000	Poul-Henning Kamp <phk@FreeBSD.org>	Introduce the M_ZERO flag to malloc(9) Instead of: foo = malloc(sizeof(foo), M_WAIT); bzero(foo, sizeof(foo)); You can now (and please do) use: foo = malloc(sizeof(foo), M_WAIT \| M_ZERO); In the future this will enable us to do idle-time pre-zeroing of malloc-space.
# eec258d2	20-Oct-2000	John Baldwin <jhb@FreeBSD.org>	- machine/mutex.h -> sys/mutex.h - Use MUTEX_DECLARE() and MTX_COLD for the malloc_mtx mutex
# 9a02e8c6	22-Sep-2000	Jason Evans <jasone@FreeBSD.org>	Don't #include <sys/proc.h>, since machine/mutex.h does it now.
# 606f8eb2	14-Sep-2000	John Baldwin <jhb@FreeBSD.org>	Remove the mtx_t, witness_t, and witness_blessed_t types. Instead, just use struct mtx, struct witness, and struct witness_blessed. Requested by: bde
# 69ef67f9	10-Sep-2000	Jason Evans <jasone@FreeBSD.org>	Add malloc_mtx to protect malloc and friends, so that they're thread-safe. Reviewed by: peter
# 28d4c2dd	09-Sep-2000	Jason Evans <jasone@FreeBSD.org>	Back out the addition of malloc_mtx. It was incompletely conceived, and will be done correctly in the future.
# 9360d3eb	09-Sep-2000	Jason Evans <jasone@FreeBSD.org>	Add a mutex to the malloc interfaces so that it can safely be called without owning the Giant lock.
# 5badeaba	29-Jun-2000	Boris Popov <bp@FreeBSD.org>	Move #ifdef to the right place.
# 99063cf8	28-Jun-2000	Boris Popov <bp@FreeBSD.org>	If kernel compiled with INVARIANTS: On unload, remove references from freelist to memory type defined by module. Print a warning if module defines and allocate its own memory type, but didn't free it all on unload. Reviewed by: peter
# 8e8cac55	14-Jun-2000	Bruce Evans <bde@FreeBSD.org>	sys/malloc.h: Order the SYSINIT() for MALLOC_DEFINE() correctly so that malloc() doesn't have to waste time initializing itself. The (SI_SUB_KMEM, SI_ORDER_ANY) order was shared with syscons' SYSINIT() for scmeminit(), and scmeminit() calls malloc(), so malloc() initialization was not always complete on the first call to malloc(). kern/kern_malloc.c: - Removed self-initialization in malloc(). - Removed half-baked sanity check in free(). Trust MALLOC_DEFINE().
# dc760634	14-Mar-2000	Jun Kuriyama <kuriyama@FreeBSD.org>	Print "previous type" correctly when INVARIANTS is defined. Reviewed by: current@FreeBSD.org
# 1f6889a1	16-Feb-2000	Matthew Dillon <dillon@FreeBSD.org>	Fix null-pointer dereference crash when the system is intentionally run out of KVM through a mmap()/fork() bomb that allocates hundreds of thousands of vm_map_entry structures. Add panic to make null-pointer dereference crash a little more verbose. Add a new sysctl, vm.max_proc_mmap, which specifies the maximum number of mmap()'d spaces (discrete vm_map_entry's in the process). The value defaults to around 9000 for a 128MB machine. The test is scaled for the number of processes sharing a vmspace (aka linux threads). Setting the value to 0 disables the feature. PR: kern/16573 Approved by: jkh
# 27b8623f	27-Jan-2000	David Greenman <dg@FreeBSD.org>	Fixed sign and overflow bugs that caused the allocation size of the kernel malloc region (kmem_map) to be wrong and semi-random on systems with more than 1GB of RAM. This is not a complete fix, but is sufficient for machines with 4GB or less of memory. A complete fix will require some changes to the getenv stuff so that 64bit values can be passed around. NOT FIXED: machines with more than 4GB of RAM (e.g. some large Alphas) since we're still using ints to hold some of the values. Reviewed by: bde
# 82cd038d	21-Nov-1999	Yoshinobu Inoue <shin@FreeBSD.org>	KAME netinet6 basic part(no IPsec,no V6 Multicast Forwarding, no UDP/TCP for IPv6 yet) With this patch, you can assigne IPv6 addr automatically, and can reply to IPv6 ping. Reviewed by: freebsd-arch, cvs-committers Obtained from: KAME project
# 3b6fb885	02-Oct-1999	Poul-Henning Kamp <phk@FreeBSD.org>	Before we start to mess with the VFS name-cache clean things up a little bit: Isolate the namecache in its own file, and give it a dedicated malloc type.
# 984982d6	19-Sep-1999	Poul-Henning Kamp <phk@FreeBSD.org>	KASSERT that we cannot use M_WAITOK in interrupt context. Reviewed by: bde
# 9ef246c6	11-Sep-1999	Bruce Evans <bde@FreeBSD.org>	Get rid of MALLOC_INSTANTIATE and MALLOC_MAKE_TYPE(). Just handle the 3 malloc types declared in <sys/malloc.h> like other global malloc types.
# c3aac50f	27-Aug-1999	Peter Wemm <peter@FreeBSD.org>	$Id$ -> $FreeBSD$
# 134c934c	05-Jul-1999	Mike Smith <msmith@FreeBSD.org>	Move the initialisation/tuning of nmbclusters from param.c/machdep.c into uipc_mbuf.c. This reduces three sets of identical tunable code to one set, and puts the initialisation with the mbuf code proper. Make NMBUFs tunable as well. Move the nmbclusters sysctl here as well. Move the initialisation of maxsockets from param.c to uipc_socket2.c, next to its corresponding sysctl. Use the new tunable macros for the kern.vm.kmem.size tunable (this should have been in a separate commit, whoops).
# ce45b512	12-May-1999	Bruce Evans <bde@FreeBSD.org>	Fixed corruption of the kmemstatistcs list. The first malloc() with malloc type at the tail of the list changed the list from linear to circular. This seemed to cause surprisingly few problems, but it now causes weird output from `vmstat -m', probably because a more important malloc type is now at the tail of the list. Fix it by abusing ks_limit instead of ks_next as a flag for being on the list. Don't forget to clear the flag when a malloc type is uninit'ed. Uninit'ing is still fundamentally broken -- it loses history.
# dfd5dee1	06-May-1999	Peter Wemm <peter@FreeBSD.org>	Add sufficient braces to keep egcs happy about potentially ambiguous if/else nesting.
# d254af07	27-Jan-1999	Matthew Dillon <dillon@FreeBSD.org>	Fix warnings in preparation for adding -Wall -Wcast-qual to the kernel compile
# 8de6e8e1	21-Jan-1999	Mike Smith <msmith@FreeBSD.org>	Allow VM_KMEM_SIZE to be tuned from the kernel environment. This tuning value completely overrides any value precalculated by the kernel.
# 1c7c3c6a	21-Jan-1999	Matthew Dillon <dillon@FreeBSD.org>	This is a rather large commit that encompasses the new swapper, changes to the VM system to support the new swapper, VM bug fixes, several VM optimizations, and some additional revamping of the VM code. The specific bug fixes will be documented with additional forced commits. This commit is somewhat rough in regards to code cleanup issues. Reviewed by: "John S. Dyson" <root@dyson.iquest.net>, "David Greenman" <dg@root.com>
# 219cbf59	09-Jan-1999	Eivind Eklund <eivind@FreeBSD.org>	KNFize, by bde.
# 5526d2d9	08-Jan-1999	Eivind Eklund <eivind@FreeBSD.org>	Split DIAGNOSTIC -> DIAGNOSTIC, INVARIANTS, and INVARIANT_SUPPORT as discussed on -hackers. Introduce 'KASSERT(assertion, ("panic message", args))' for simple check + panic. Reviewed by: msmith
# db669378	10-Nov-1998	Peter Wemm <peter@FreeBSD.org>	Have MALLOC_DECLARE() initialize malloc types explicitly, and have them removed at module unload (if in a module of course). However; this introduces a new dependency on <sys/kernel.h> for things that use MALLOC_DECLARE(). Bruce told me it is better to add sys/kernel.h to the handful of files that need it rather than add an extra include to sys/malloc.h for kernel compiles. Updates to follow in subsequent commits.
# f5ef029e	25-Oct-1998	Poul-Henning Kamp <phk@FreeBSD.org>	Nitpicking and dusting performed on a train. Removes trivial warnings about unused variables, labels and other lint.
# 86a14a7a	15-Aug-1998	Bruce Evans <bde@FreeBSD.org>	Use [u]intptr_t instead of [u_]long for casts between pointers and integers. Don't forget to cast to (void *) as well.
# d974cf4d	29-Jul-1998	Bruce Evans <bde@FreeBSD.org>	Fixed printf format errors.
# b1897c19	08-Mar-1998	Julian Elischer <julian@FreeBSD.org>	Reviewed by: dyson@freebsd.org (john Dyson), dg@root.com (david greenman) Submitted by: Kirk McKusick (mcKusick@mckusick.com) Obtained from: WHistle development tree
# 8a58a9f6	23-Feb-1998	John Dyson <dyson@FreeBSD.org>	Try to dynamically size the VM_KMEM_SIZE (but is still able to be overridden in a way identically as before.) I had problems with the system properly handling the number of vnodes when there is alot of system memory, and the default VM_KMEM_SIZE. Two new options "VM_KMEM_SIZE_SCALE" and "VM_KMEM_SIZE_MAX" have been added to support better auto-sizing for systems with greater than 128MB.
# 303b270b	08-Feb-1998	Eivind Eklund <eivind@FreeBSD.org>	Staticize.
# 0b08f5f7	05-Feb-1998	Eivind Eklund <eivind@FreeBSD.org>	Back out DIAGNOSTIC changes.
# 95461b45	04-Feb-1998	John Dyson <dyson@FreeBSD.org>	1) Start using a cleaner and more consistant page allocator instead of the various ad-hoc schemes. 2) When bringing in UPAGES, the pmap code needs to do another vm_page_lookup. 3) When appropriate, set the PG_A or PG_M bits a-priori to both avoid some processor errata, and to minimize redundant processor updating of page tables. 4) Modify pmap_protect so that it can only remove permissions (as it originally supported.) The additional capability is not needed. 5) Streamline read-only to read-write page mappings. 6) For pmap_copy_page, don't enable write mapping for source page. 7) Correct and clean-up pmap_incore. 8) Cluster initial kern_exec pagin. 9) Removal of some minor lint from kern_malloc. 10) Correct some ioopt code. 11) Remove some dead code from the MI swapout routine. 12) Correct vm_object_deallocate (to remove backing_object ref.) 13) Fix dead object handling, that had problems under heavy memory load. 14) Add minor vm_page_lookup improvements. 15) Some pages are not in objects, and make sure that the vm_page.c can properly support such pages. 16) Add some more page deficit handling. 17) Some minor code readability improvements.
# 47cfdb16	04-Feb-1998	Eivind Eklund <eivind@FreeBSD.org>	Turn DIAGNOSTIC into a new-style option.
# 2d8acc0f	22-Jan-1998	John Dyson <dyson@FreeBSD.org>	VM level code cleanups. 1) Start using TSM. Struct procs continue to point to upages structure, after being freed. Struct vmspace continues to point to pte object and kva space for kstack. u_map is now superfluous. 2) vm_map's don't need to be reference counted. They always exist either in the kernel or in a vmspace. The vmspaces are managed by reference counts. 3) Remove the "wired" vm_map nonsense. 4) No need to keep a cache of kernel stack kva's. 5) Get rid of strange looking ++var, and change to var++. 6) Change more data structures to use our "zone" allocator. Added struct proc, struct vmspace and struct vnode. This saves a significant amount of kva space and physical memory. Additionally, this enables TSM for the zone managed memory. 7) Keep ioopt disabled for now. 8) Remove the now bogus "single use" map concept. 9) Use generation counts or id's for data structures residing in TSM, where it allows us to avoid unneeded restart overhead during traversals, where blocking might occur. 10) Account better for memory deficits, so the pageout daemon will be able to make enough memory available (experimental.) 11) Fix some vnode locking problems. (From Tor, I think.) 12) Add a check in ufs_lookup, to avoid lots of unneeded calls to bcmp. (experimental.) 13) Significantly shrink, cleanup, and make slightly faster the vm_fault.c code. Use generation counts, get rid of unneded collpase operations, and clean up the cluster code. 14) Make vm_zone more suitable for TSM. This commit is partially as a result of discussions and contributions from other people, including DG, Tor Egge, PHK, and probably others that I have forgotten to attribute (so let me know, if I forgot.) This is not the infamous, final cleanup of the vnode stuff, but a necessary step. Vnode mgmt should be correct, but things might still change, and there is still some missing stuff (like ioopt, and physical backing of non-merged cache files, debugging of layering concepts.)
# d4060a87	04-Dec-1997	John Dyson <dyson@FreeBSD.org>	Some fixes from John Hood: 1) Fix the initialization of malloc structure that changed due to perf opt. 2) Remove unneeded include. 3) An initialization assert added to malloc. Submitted by: John Hood <cgull@smoke.marlboro.vt.us>
# d1bbc7ec	28-Oct-1997	Poul-Henning Kamp <phk@FreeBSD.org>	Remove the long description from the in-kernel datastructure. Put a magic field in there instead, to help catch uninitialized malloc types.
# a1c995b6	12-Oct-1997	Poul-Henning Kamp <phk@FreeBSD.org>	Last major round (Unless Bruce thinks of somthing :-) of malloc changes. Distribute all but the most fundamental malloc types. This time I also remembered the trick to making things static: Put "static" in front of them. A couple of finer points by: bde
# 22c64348	11-Oct-1997	Poul-Henning Kamp <phk@FreeBSD.org>	Freeing with unknown type is a panic kind of thing.
# a42428bb	11-Oct-1997	Poul-Henning Kamp <phk@FreeBSD.org>	Remove a debug printf entirely.
# bd975223	11-Oct-1997	Peter Wemm <peter@FreeBSD.org>	Disable an extremely annoying printf.
# 60a513e9	10-Oct-1997	Poul-Henning Kamp <phk@FreeBSD.org>	Rename "struct kmemstats" to "struct malloc_type" it makes more sense now. Fix type argument to hashinit() and phashinit()
# 254c6cb3	10-Oct-1997	Poul-Henning Kamp <phk@FreeBSD.org>	Make malloc more extensible. The malloc type is now a pointer to the struct kmemstats that describes the type. This allows subsystems to declare their malloc types locally and <sys/malloc.h> doesn't need tweaked everytime somebody gets an idea. You can even have a type local to a lkm... I don't know if we really need the longdesc, comments welcome. TODO: There is a single nit in ext2fs, that will be fixed later, and I intend to remove all unused malloc types and distribute the rest closer to their use.
# 043a2f3b	16-Sep-1997	Bruce Evans <bde@FreeBSD.org>	Fixed staticization. buckets[] was staticized but was still declared extern in <sys/malloc.h> and it should not have been staticized for the !(KMEMSTATS \|\| DIAGNOSTIC) case. Fixed the !(KMEMSTATS \|\| DIAGNOSTIC) case. The MALLOC() and FREE() macros are evil, but code generally doesn't allow for this and some code involving else clauses did not compile. Finished staticization.
# e4ba6a82	02-Sep-1997	Bruce Evans <bde@FreeBSD.org>	Removed unused #includes.
# 3075778b	04-Aug-1997	John Dyson <dyson@FreeBSD.org>	Get rid of the ad-hoc memory allocator for vm_map_entries, in lieu of a simple, clean zone type allocator. This new allocator will also be used for machine dependent pmap PV entries.
# 358311fe	24-Jun-1997	David Greenman <dg@FreeBSD.org>	Killed bogus kernacc() call in malloc() DIAGNOSTIC code. kernacc() by it's nature, locks the kernal_map, and this is deadly if kernal_map had been locked previous to a (net) interrupt.
# 6875d254	22-Feb-1997	Peter Wemm <peter@FreeBSD.org>	Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not ready for it yet.
# 1130b656	14-Jan-1997	Jordan K. Hubbard <jkh@FreeBSD.org>	Make the long-awaited change from $Id$ to $FreeBSD$ This will make a number of things easier in the future, as well as (finally!) avoiding the Id-smashing problem which has plagued developers for so long. Boy, I'm glad we're not using sup anymore. This update would have been insane otherwise.
# ca67a4e4	04-Aug-1996	Poul-Henning Kamp <phk@FreeBSD.org>	The check for multiple freed items were bogus. fixed.
# 14bf02f8	18-May-1996	John Dyson <dyson@FreeBSD.org>	Minor performance improvement to kern_malloc.c that increases the probability of reuse of recently freed memory. This improves cache hit stats on cached memory, and improves at least fork speed consistancy.
# cb7545a9	10-May-1996	Garrett Wollman <wollman@FreeBSD.org>	Allocate mbufs from a separate submap so that NMBCLUSTERS works as expected.
# e911eafc	02-May-1996	Poul-Henning Kamp <phk@FreeBSD.org>	removed: CLBYTES PD_SHIFT PGSHIFT NBPG PGOFSET CLSIZELOG2 CLSIZE pdei() ptei() kvtopte() ptetov() ispt() ptetoav() &c &c new: NPDEPG Major macro cleanup.
# f8845af0	02-May-1996	Poul-Henning Kamp <phk@FreeBSD.org>	First pass at cleaning up macros relating to pages, clusters and all that.
# edbfedac	11-Mar-1996	Peter Wemm <peter@FreeBSD.org>	Import 4.4BSD-Lite2 onto the vendor branch, note that in the kernel, all files are off the vendor branch, so this should not change anything. A "U" marker generally means that the file was not changed in between the 4.4Lite and Lite-2 releases, and does not need a merge. "C" generally means that there was a change. [note new unused (in this form) syscalls.conf, to be 'cvs rm'ed]
# 07bbd7f1	29-Jan-1996	David Greenman <dg@FreeBSD.org>	Implement what I mentioned in rev 1.18: limit per-bucket allocations to 60% of physical memory or 60% of malloc area size, whichever is smaller.
# 54e7152c	29-Jan-1996	David Greenman <dg@FreeBSD.org>	Fixed two bugs in the calculation of the malloc area (kmem_map) size: 1) The calculation didn't account for NMBCLUSTERS, so if a large number of clusters was specified, it would leave little or no space for kernel malloc. 2) It was bogusly restricted to v_page_count. This doesn't take into account the sparseness of the malloc area and would have caused problems on machines with small amounts of memory. It should probably instead be changed to set the malloc limit to be constrained by the amount of memory, but I didn't do this.
# 87b6de2b	14-Dec-1995	Poul-Henning Kamp <phk@FreeBSD.org>	A Major staticize sweep. Generates a couple of warnings that I'll deal with later. A number of unused vars removed. A number of unused procs removed or #ifdefed.
# efeaf95a	06-Dec-1995	David Greenman <dg@FreeBSD.org>	Untangled the vm.h include file spaghetti.
# d841aaa7	02-Dec-1995	Bruce Evans <bde@FreeBSD.org>	Finished (?) cleaning up sysinit stuff.
# 4590fd3a	09-Sep-1995	David Greenman <dg@FreeBSD.org>	Fixed init functions argument type - caddr_t -> void *. Fixed a couple of compiler warnings.
# 2b14f991	28-Aug-1995	Julian Elischer <julian@FreeBSD.org>	Reviewed by: julian with quick glances by bruce and others Submitted by: terry (terry lambert) This is a composite of 3 patch sets submitted by terry. they are: New low-level init code that supports loadbal modules better some cleanups in the namei code to help terry in 16-bit character support some changes to the mount-root code to make it a little more modular.. NOTE: mounting root off cdrom or NFS MIGHT be broken as I haven't been able to test those cases.. certainly mounting root of disk still works just fine.. mfs should work but is untested. (tomorrows task) The low level init stuff includes a total rewrite of init_main.c to make it possible for new modules to have an init phase by simply adding an entry to a TEXT_SET (or is it DATA_SET) list. thus a new module can be added to the kernel without editing any other files other than the 'files' file.
# 9b2e5354	30-May-1995	Rodney W. Grimes <rgrimes@FreeBSD.org>	Remove trailing whitespace.
# 5124d598	16-Apr-1995	David Greenman <dg@FreeBSD.org>	Make vegetarian and animal rights people happy and use 0xdeadc0de instead of 0xdeadbeef as the fill pattern. Decreased MAX_COPY to 64 (256 was a bit overzealous in most cases).
# edf8a815	19-Mar-1995	David Greenman <dg@FreeBSD.org>	Removed redundant newlines that were in some panic strings.
# 17dda4c9	11-Mar-1995	David Greenman <dg@FreeBSD.org>	Added some additional DIAGNOSTIC code that makes sure that freed memory addresses and types are with the valid range. Increased MAX_COPY to 256 (used to verify no freed memory use with DIAGNOSTIC).
# 9f518539	02-Feb-1995	David Greenman <dg@FreeBSD.org>	Calling semantics for kmem_malloc() have been changed...and the third argument is now more than just a single flag. (kern_malloc.c) Used new M_KERNEL value for socket allocations that previous were "M_NOWAIT". Note that this will change when we clean up the M_ namespace mess. Submitted by: John Dyson
# 0d94caff	09-Jan-1995	David Greenman <dg@FreeBSD.org>	These changes embody the support of the fully coherent merged VM buffer cache, much higher filesystem I/O performance, and much better paging performance. It represents the culmination of over 6 months of R&D. The majority of the merged VM/cache work is by John Dyson. The following highlights the most significant changes. Additionally, there are (mostly minor) changes to the various filesystem modules (nfs, msdosfs, etc) to support the new VM/buffer scheme. vfs_bio.c: Significant rewrite of most of vfs_bio to support the merged VM buffer cache scheme. The scheme is almost fully compatible with the old filesystem interface. Significant improvement in the number of opportunities for write clustering. vfs_cluster.c, vfs_subr.c Upgrade and performance enhancements in vfs layer code to support merged VM/buffer cache. Fixup of vfs_cluster to eliminate the bogus pagemove stuff. vm_object.c: Yet more improvements in the collapse code. Elimination of some windows that can cause list corruption. vm_pageout.c: Fixed it, it really works better now. Somehow in 2.0, some "enhancements" broke the code. This code has been reworked from the ground-up. vm_fault.c, vm_page.c, pmap.c, vm_object.c Support for small-block filesystems with merged VM/buffer cache scheme. pmap.c vm_map.c Dynamic kernel VM size, now we dont have to pre-allocate excessive numbers of kernel PTs. vm_glue.c Much simpler and more effective swapping code. No more gratuitous swapping. proc.h Fixed the problem that the p_lock flag was not being cleared on a fork. swap_pager.c, vnode_pager.c Removal of old vfs_bio cruft to support the past pseudo-coherency. Now the code doesn't need it anymore. machdep.c Changes to better support the parameter values for the merged VM/buffer cache scheme. machdep.c, kern_exec.c, vm_glue.c Implemented a seperate submap for temporary exec string space and another one to contain process upages. This eliminates all map fragmentation problems that previously existed. ffs_inode.c, ufs_inode.c, ufs_readwrite.c Changes for merged VM/buffer cache. Add "bypass" support for sneaking in on busy buffers. Submitted by: John Dyson and David Greenman
# 763424fc	16-Dec-1994	David Greenman <dg@FreeBSD.org>	Changed splimp to splhigh to close a potential hole that could lead to corrupted malloc data structures caused by frees occurring at other than splimp.
# 35c10d22	09-Oct-1994	David Greenman <dg@FreeBSD.org>	Got rid of map.h. It's a leftover from the rmap code, and we use rlists. Changed swapmap into swaplist.
# 797f2d22	02-Oct-1994	Poul-Henning Kamp <phk@FreeBSD.org>	All of this is cosmetic. prototypes, #includes, printfs and so on. Makes GCC a lot more silent.
# 3c4dd356	02-Aug-1994	David Greenman <dg@FreeBSD.org>	Added $Id$
# 26f9a767	25-May-1994	Rodney W. Grimes <rgrimes@FreeBSD.org>	The big 4.4BSD Lite to FreeBSD 2.0.0 (Development) patch. Reviewed by: Rodney W. Grimes Submitted by: John Dyson and David Greenman
# df8bae1d	24-May-1994	Rodney W. Grimes <rgrimes@FreeBSD.org>	BSD 4.4 Lite Kernel Sources