History log of /freebsd-10.1-release/sys/vm/uma.h
Revision Date Author Comments
# 272461 02-Oct-2014 gjb

Copy stable/10@r272459 to releng/10.1 as part of
the 10.1-RELEASE process.

Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation

# 262739 04-Mar-2014 glebius

Merge r261722, r261723, r261724, r261725 from head:
several minor improvements for UMA_ZPCPU_ZONE zones.


# 256281 10-Oct-2013 gjb

Copy head (r256279) to stable/10 as part of the 10.0-RELEASE cycle.

Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation


# 252226 25-Jun-2013 jeff

- Resolve bucket recursion issues by passing a cookie with zone flags
through bucket_alloc() to uma_zalloc_arg() and uma_zfree_arg().
- Make some smaller buckets for large zones to further reduce memory
waste.
- Implement uma_zone_reserve(). This holds aside a number of items only
for callers who specify M_USE_RESERVE. Buckets will never be filled
from reserve allocations.

Sponsored by: EMC / Isilon Storage Division


# 252040 20-Jun-2013 jeff

- Add a per-zone lock for zones without kegs.
- Be more explicit about zone vs keg locking. This functionally changes
almost nothing.
- Add a size parameter to uma_zcache_create() so we can size the buckets.
- Pass the zone to bucket_alloc() so it can modify allocation flags
as appropriate.
- Fix a bug in zone_alloc_bucket() where I missed an address-of operator
in a failure case. (Found by pho)

Sponsored by: EMC / Isilon Storage Division


# 251826 17-Jun-2013 jeff

- Add a new UMA API: uma_zcache_create(). This makes a zone without any
backing memory that is only a container for per-cpu caches of arbitrary
pointer items. These zones have no kegs.
- Convert the regular keg based allocator to use the new import/release
functions.
- Move some stats to be atomics since they would require excessive zone
locking/unlocking with the new import/release paradigm. Make
zone_free_item simpler now that callers can manage more stats.
- Check for these cache-only zones in the public APIs and debugging
code by checking zone_first_keg() against NULL.

Sponsored by: EMC / Isilon Storage Division


# 249313 09-Apr-2013 glebius

Convert UMA code to C99 uintXX_t types.


# 249264 08-Apr-2013 glebius

Merge from projects/counters: UMA_ZONE_PCPU zones.

These zones have slab size == sizeof(struct pcpu), but request from VM
enough pages to fit (uk_slabsize * mp_ncpus). An item allocated from such
zone would have a separate twin for each CPU in the system, and these twins
are at a distance of sizeof(struct pcpu) from each other. This magic value
of distance would allow us to make some optimizations later.

To address a CPU's private item, simple arithmetic should be used:

item = (type *)((char *)base + sizeof(struct pcpu) * curcpu)

This arithmetic is available as the zpcpu_get() macro in pcpu.h.
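
The address arithmetic above can be tried in user space. The following is a minimal sketch, not the kernel macro: struct pcpu_sketch and zpcpu_get_sketch() are hypothetical stand-ins, the 256-byte size is an arbitrary assumption, and the CPU number is passed explicitly instead of being read from curcpu.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct pcpu; the real layout differs. */
struct pcpu_sketch {
	char pad[256];
};

/*
 * Return the copy of an item that belongs to CPU `cpu`, given the
 * address of CPU 0's copy.  Consecutive copies are exactly
 * sizeof(struct pcpu_sketch) bytes apart, as in the commit message.
 */
static void *
zpcpu_get_sketch(void *base, unsigned int cpu)
{
	assert(base != NULL);
	return ((char *)base + sizeof(struct pcpu_sketch) * cpu);
}
```

With a buffer large enough for four CPUs, zpcpu_get_sketch(buf, 3) points 3 * sizeof(struct pcpu_sketch) bytes past buf.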

To introduce non-page-size slabs, a new field, uk_slabsize, has been added
to uma_keg. This shifted some frequently used fields of uma_keg to the
fourth cache line on amd64. To mitigate this pessimization, the uma_keg
fields were rearranged slightly, and the least frequently used fields,
uk_name and uk_link, were moved down to the fourth cache line. All other
frequently dereferenced fields fit into the first three cache lines.

Sponsored by: Nginx, Inc.


# 247360 26-Feb-2013 attilio

Merge from vmc-playground branch:
Replace the sub-optimal uma_zone_set_obj() primitive with the more modern
uma_zone_reserve_kva(). The new primitive reserves beforehand the KVA
space needed to cater for the zone allocations and allocates pages
with ALLOC_NOOBJ. More specifically:
- uma_zone_reserve_kva() does not need an object to cater for the backend
allocator.
- uma_zone_reserve_kva() can cater for M_WAITOK requests, in order to
serve zones which need to do uma_prealloc() too.
- When possible, uma_zone_reserve_kva() uses the direct mapping
via uma_small_alloc() rather than relying on the KVA / offset
combination.

The removal of the object attribute allows 2 further changes:
1) _vm_object_allocate() becomes static within vm_object.c
2) VM_OBJECT_LOCK_INIT() is removed. This function is replaced by
direct calls to mtx_init(), as there is no need to export it anymore
and the calls are no longer homogeneous either: there are now small
differences between the arguments passed to mtx_init().

Sponsored by: EMC / Isilon storage division
Reviewed by: alc (which also offered almost all the comments)
Tested by: pho, jhb, davide


# 244024 08-Dec-2012 pjd

White-space cleanups.


# 243998 07-Dec-2012 pjd

Implemented uma_zone_set_warning(9) function that sets a warning, which
will be printed once the given zone becomes full and cannot allocate an
item. The warning will not be printed more often than every five minutes.

All UMA warnings can be globally turned off by setting sysctl/tunable
vm.zone_warnings to 0.
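
The rate-limiting behaviour described above can be sketched in user space as follows. This is an illustration only, not the code from uma_core.c: struct zone_warn_sketch, should_warn_sketch() and the warnings_enabled variable (standing in for vm.zone_warnings) are hypothetical names, and the current time is passed in explicitly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

#define WARN_INTERVAL_SKETCH	300	/* five minutes, in seconds */

/* Hypothetical stand-in for the vm.zone_warnings sysctl/tunable. */
static bool warnings_enabled = true;

struct zone_warn_sketch {
	const char *message;	/* NULL when no warning has been set */
	time_t last;		/* time of the last print; 0 if never */
};

/* Return true if the zone-full warning should be printed at time `now`. */
static bool
should_warn_sketch(struct zone_warn_sketch *zw, time_t now)
{
	assert(zw != NULL);
	if (!warnings_enabled || zw->message == NULL)
		return (false);
	if (zw->last != 0 && now - zw->last < WARN_INTERVAL_SKETCH)
		return (false);
	zw->last = now;		/* remember when we last warned */
	return (true);
}
```

A caller would check should_warn_sketch() at the point where an allocation fails because the zone is full, and print zw->message only when it returns true.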

Discussed on: arch
Obtained from: WHEEL Systems
MFC after: 2 weeks


# 242152 26-Oct-2012 mdf

Const-ify the zone name argument to uma_zcreate(9).

MFC after: 3 days


# 230623 27-Jan-2012 kmacy

Exclude kmem_alloc'ed ARC data buffers from kernel minidumps on amd64.
Excluding other allocations, including UMA, now entails the addition of
a single flag to kmem_alloc or uma zone create.

Reviewed by: alc, avg
MFC after: 2 weeks


# 226313 12-Oct-2011 glebius

Make memguard(9) capable of guarding uma(9) allocations.


# 213911 16-Oct-2010 lstewart

Change uma_zone_set_max to return the effective value of "nitems" after
rounding. The same value can also be obtained with uma_zone_get_max, but this
change avoids a caller having to make two back-to-back calls.

Sponsored by: FreeBSD Foundation
Reviewed by: gnn, jhb


# 213910 16-Oct-2010 lstewart

- Simplify implementation of uma_zone_get_max.
- Add uma_zone_get_cur which returns the current approximate occupancy of
a zone. This is useful for providing stats via sysctl amongst other things.

Sponsored by: FreeBSD Foundation
Reviewed by: gnn, jhb
MFC after: 2 weeks


# 211396 16-Aug-2010 andre

Add uma_zone_get_max() to obtain the effective limit after a call
to uma_zone_set_max().

The UMA zone limit is not exactly set to the value supplied but
rounded up to completely fill the backing store increment (a page
normally). This can lead to surprising situations where the number
of elements allocated from UMA is higher than the supplied limit
value. The new get function reads back the effective value so that
the supplied limit value can be adjusted to the real limit.
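
The rounding described above can be illustrated with a small user-space sketch. It assumes the limit is simply rounded up to a whole number of backing-store increments; effective_limit_sketch() and its items_per_slab parameter are hypothetical, and the real computation in uma_core.c may differ in detail.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Round a requested item limit up so that it fills a whole number of
 * backing-store increments.  `items_per_slab` is the number of items
 * that fit in one increment (normally one page).
 */
static size_t
effective_limit_sketch(size_t nitems, size_t items_per_slab)
{
	size_t slabs;

	assert(items_per_slab > 0);
	slabs = (nitems + items_per_slab - 1) / items_per_slab;	/* ceiling */
	return (slabs * items_per_slab);
}
```

For example, with 16 items per increment a requested limit of 100 becomes an effective limit of 112, which is why a caller can observe more allocated elements than the value it supplied.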

Reviewed by: jeffr
MFC after: 1 week


# 209215 15-Jun-2010 sbruno

Add a new column to the output of vmstat -z to indicate the number
of times the system was forced to sleep when requesting a new allocation.

Expand the debugger hook, db_show_uma, to display these results as well.

This has proven to be very useful in out of memory situations when
it is not known why systems have become sluggish or fail in odd ways.

Reviewed by: rwatson alc
Approved by: scottl (mentor) peter
Obtained from: Yahoo Inc.


# 187681 25-Jan-2009 jeff

- Make the keg abstraction more complete. Permit a zone to have multiple
backend kegs so it may source compatible memory from multiple backends.
This is useful for cases such as NUMA or different layouts for the same
memory type.
- Provide a new api for adding new backend kegs to secondary zones.
- Provide a new flag for adjusting the layout of zones to stagger
allocations better across cache lines.

Sponsored by: Nokia


# 184546 01-Nov-2008 keramida

Various comment nits, and typos.


# 177921 04-Apr-2008 alc

Reintroduce UMA_SLAB_KMAP; however, change its spelling to
UMA_SLAB_KERNEL for consistency with its sibling UMA_SLAB_KMEM.
(UMA_SLAB_KMAP met its original demise in revision 1.30 of
vm/uma_core.c.) UMA_SLAB_KERNEL is now required by the jumbo frame
allocators. Without it, UMA cannot correctly return pages from the
jumbo frame zones to the VM system because it resets the pages' object
field to NULL instead of the kernel object. In more detail, the jumbo
frame zones are created with the option UMA_ZONE_REFCNT. This causes
UMA to overwrite the pages' object field with the address of the slab.
However, when UMA wants to release these pages, it doesn't know how to
restore the object field, so it sets it to NULL. This change teaches
UMA how to reset the object field to the kernel object.

Crashes reported by: kris
Fix tested by: kris
Fix discussed with: jeff
MFC after: 6 weeks


# 166654 11-Feb-2007 rwatson

Add uma_set_align() interface, which will be called at most once during
boot by MD code to indicate the detected alignment preference. Rather than
cache alignment being encoded in UMA consumers by defining a global
alignment value of (16 - 1) in UMA_ALIGN_CACHE, UMA_ALIGN_CACHE is now
a special value (-1) that causes UMA to look at registered alignment. If
no preferred alignment has been selected by MD code, a default alignment
of (16 - 1) will be used.
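
The sentinel scheme described above can be sketched as follows. All names here (ALIGN_CACHE_SKETCH, set_align_sketch(), resolve_align_sketch()) are hypothetical stand-ins chosen for illustration; only the values -1 and (16 - 1) come from the commit message.

```c
#include <assert.h>

#define ALIGN_CACHE_SKETCH	(-1)		/* "use the registered alignment" */
#define ALIGN_DEFAULT_SKETCH	(16 - 1)	/* default mask if MD code sets nothing */

/* Alignment mask registered by machine-dependent code, or the default. */
static int registered_align = ALIGN_DEFAULT_SKETCH;

/* Intended to be called at most once, early in boot, by MD code. */
static void
set_align_sketch(int align)
{
	assert(align != ALIGN_CACHE_SKETCH);
	registered_align = align;
}

/* Turn the alignment a zone asked for into a concrete mask. */
static int
resolve_align_sketch(int requested)
{
	return (requested == ALIGN_CACHE_SKETCH ? registered_align : requested);
}
```

A zone created with the sentinel picks up whatever alignment was registered before it was created, which is why the commit message requires registration to happen before UMA is initialized.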

Currently, no hardware platforms specify alignment; architecture
maintainers will need to modify MD startup code to specify an alignment
if desired. This must occur before initialization of UMA so that all UMA
zones pick up the requested alignment.

Reviewed by: jeff, alc
Submitted by: attilio


# 166213 24-Jan-2007 mohans

Fix for problems that occur when all mbuf clusters migrate to the mbuf packet
zone. Cluster allocations fail when this happens. Also processes that may have
blocked on cluster allocations will never be woken up. Thanks to rwatson for
an overview of the issue and pointers to the mbuma paper and his tool to dump
out UMA zones.

Reviewed by: andre@


# 165809 05-Jan-2007 jhb

- Add a new function uma_zone_exhausted() to see if a zone is full.
- Add a printf in swp_pager_meta_build() to warn if the swapzone becomes
exhausted so that there's at least a warning before a box that runs out
of swapzone space before running out of swap space deadlocks.

MFC after: 1 week
Reviewed by: alc


# 151104 08-Oct-2005 des

As alc pointed out to me, vm_page.c 1.305 was incomplete: uma_startup()
still uses the constant UMA_BOOT_PAGES. Change it to accept boot_pages
as an additional argument.

MFC after: 2 weeks


# 148371 24-Jul-2005 rwatson

Rename UMA_MAX_NAME to UTH_MAX_NAME, since it's a maximum in the
monitoring API, which might or might not be the same as the internal
maximum (currently none).

Export flag information on UMA zones -- in particular, whether or
not this is a secondary zone, and so the keg free count should be
considered in that light.

MFC after: 1 day


# 148078 16-Jul-2005 rwatson

Improve canonicalization of copyrights. Order copyrights by order of
assertion (jeff, bmilekic, rwatson).

Suggested ages ago by: bde
MFC after: 1 week


# 148072 16-Jul-2005 silby

Increase the flags field for kegs from a 16 to a 32 bit value;
we have exhausted all 16 flags.


# 148070 15-Jul-2005 rwatson

Track UMA(9) allocation failures by zone, and export via sysctl.

Requested by: victor cruceru <victor dot cruceru at gmail dot com>
MFC after: 1 week


# 147996 14-Jul-2005 rwatson

Introduce a new sysctl, vm.zone_stats, which exports UMA(9) allocator
statistics via a binary structure stream:

- Add structure 'uma_stream_header', which defines a stream version,
definition of MAXCPUs used in the stream, and the number of zone
records in the stream.

- Add structure 'uma_type_header', which defines the name, alignment,
size, resource allocation limits, current pages allocated, preferred
bucket size, and central zone + keg statistics.

- Add structure 'uma_percpu_stat', which, for each per-CPU cache,
includes the number of allocations and frees, as well as the number
of free items in the cache.

- When the sysctl is queried, return a stream header, followed by a
series of type descriptions, each consisting of a type header
followed by a series of MAXCPUs uma_percpu_stat structures holding
per-CPU allocation information. Typical values of MAXCPU will be
1 (UP compiled kernel) and 16 (SMP compiled kernel).
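
Given the layout described above, the size of the stream can be estimated before the sysctl is queried. The sketch below takes the structure sizes as parameters rather than guessing the real structure definitions; zone_stats_stream_size_sketch() is a hypothetical helper, not part of the kernel or libmemstat(3).

```c
#include <assert.h>
#include <stddef.h>

/*
 * Size in bytes of a stream made of one stream header followed by
 * `zones` records, each a type header plus `maxcpus` per-CPU records.
 */
static size_t
zone_stats_stream_size_sketch(size_t stream_hdr_size, size_t type_hdr_size,
    size_t percpu_size, size_t zones, size_t maxcpus)
{
	size_t per_zone;

	per_zone = type_hdr_size + maxcpus * percpu_size;
	return (stream_hdr_size + zones * per_zone);
}
```

In practice the structure sizes would come from sizeof() on the exported structures and the zone count from vm.uma_count; since zones can be created between the two queries, a caller would want some slack in the buffer.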

This query mechanism allows user space monitoring tools to extract
memory allocation statistics in a machine-readable form, and to do so
at a per-CPU granularity, allowing monitoring of allocation patterns
across CPUs in order to better understand the distribution of work and
memory flow over multiple CPUs.

While here, also export the number of UMA zones as a sysctl
vm.uma_count, in order to assist in sizing user space buffers to
receive the stream.

A follow-up commit of libmemstat(3), a library to monitor kernel memory
allocation, will occur in the next few days. This change directly
supports converting netstat(1)'s "-mb" mode to using UMA-sourced stats
rather than separately maintained mbuf allocator statistics.

MFC after: 1 week


# 141991 16-Feb-2005 bmilekic

Well, it seems that I prematurely removed the "All rights reserved"
statement from some files, so re-add it for the moment, until the
related legalese is sorted out. This change affects:

sys/kern/kern_mbuf.c
sys/vm/memguard.c
sys/vm/memguard.h
sys/vm/uma.h
sys/vm/uma_core.c
sys/vm/uma_dbg.c
sys/vm/uma_dbg.h
sys/vm/uma_int.h


# 139825 07-Jan-2005 imp

/* -> /*- for license, minor formatting changes


# 139318 25-Dec-2004 bmilekic

Add my copyright and update Jeff's copyright on UMA source files,
as per his request.

Discussed with: Jeffrey Roberson


# 132987 01-Aug-2004 green

* Add a "how" argument to uma_zone constructors and initialization functions
so that they know whether the allocation is supposed to be able to sleep
or not.
* Allow uma_zone constructors and initialization functions to return either
success or error. Almost all of the ones in the tree currently return
success unconditionally, but mbuf is a notable exception: the packet
zone constructor wants to be able to fail if it cannot suballocate an
mbuf cluster, and the mbuf allocators want to be able to fail in general
in a MAC kernel if the MAC mbuf initializer fails. This fixes the
panics people are seeing when they run out of memory for mbuf clusters.
* Allow debug.nosleepwithlocks on WITNESS to be disabled, without changing
the default.

Both bmilekic and jeff have reviewed the changes made to make failable
zone allocations work.


# 129913 31-May-2004 bmilekic

Fix a comment above uma_zsecond_create(), describing its arguments.
It doesn't take 'align' and 'flags' but 'master' instead, which is
a reference to the Master Zone, containing the backing Keg.

Pointed out by: Tim Robbins (tjr)


# 129906 31-May-2004 bmilekic

Bring in mbuma to replace mballoc.

mbuma is an Mbuf & Cluster allocator built on top of a number of
extensions to the UMA framework, all included herein.

Extensions to UMA worth noting:
- Better layering between slab <-> zone caches; introduce
Keg structure which splits off slab cache away from the
zone structure and allows multiple zones to be stacked
on top of a single Keg (single type of slab cache);
perhaps we should look into defining a subset API on
top of the Keg for special use by malloc(9),
for example.
- UMA_ZONE_REFCNT zones can now be added, and reference
counters automagically allocated for them within the end
of the associated slab structures. uma_find_refcnt()
does a kextract to fetch the slab struct reference from
the underlying page, and lookup the corresponding refcnt.

mbuma things worth noting:
- integrates mbuf & cluster allocations with extended UMA
and provides caches for commonly-allocated items; defines
several zones (two primary, one secondary) and two kegs.
- change up certain code paths that always used to do:
m_get() + m_clget() to instead just use m_getcl() and
try to take advantage of the newly defined secondary
Packet zone.
- netstat(1) and systat(1) quickly hacked up to do basic
stat reporting but additional stats work needs to be
done once some other details within UMA have been taken
care of and it becomes clearer how stats will work
within the modified framework.

From the user perspective, one implication is that the
NMBCLUSTERS compile-time option is no longer used. The
maximum number of clusters is still capped off according
to maxusers, but it can be made unlimited by setting
the kern.ipc.nmbclusters boot-time tunable to zero.
Work should be done to write an appropriate sysctl
handler allowing dynamic tuning of kern.ipc.nmbclusters
at runtime.

Additional things worth noting/known issues (READ):
- One report of 'ips' (ServeRAID) driver acting really
slow in conjunction with mbuma. Need more data.
Latest report is that ips is equally sucking with
and without mbuma.
- Giant leak in NFS code sometimes occurs, can't
reproduce but currently analyzing; brueffer is
able to reproduce but THIS IS NOT an mbuma-specific
problem and currently occurs even WITHOUT mbuma.
- Issues in network locking: there is at least one
code path in the rip code where one or more locks
are acquired and we end up in m_prepend() with
M_WAITOK, which causes WITNESS to whine from within
UMA. Current temporary solution: force all UMA
allocations to be M_NOWAIT from within UMA for now
to avoid deadlocks unless WITNESS is defined and we
can determine with certainty that we're not holding
any locks when we're M_WAITOK.
- I've seen at least one weird socketbuffer empty-but-
mbuf-still-attached panic. I don't believe this
to be related to mbuma but please keep your eyes
open, turn on debugging, and capture crash dumps.

This change removes more code than it adds.

A paper detailing the change and considering various performance
issues is available; it was presented at BSDCan 2004:
http://www.unixdaemons.com/~bmilekic/netbuf_bmilekic.pdf
Please read the paper for Future Work and implementation
details, as well as credits.

Testing and Debugging:
rwatson,
brueffer,
Ketrien I. Saihr-Kesenchedra,
...
Reviewed by: Lots of people (for different parts)


# 120223 19-Sep-2003 jeff

- Fix the silly flag situation in UMA. Remove redundant ZFLAG/ZONE flags
by accepting the user supplied flags directly. Previously this was not
done so that flags for the same field would not be defined in two
different files. Add comments in each header instructing future
developers on how not to shoot their feet.
- Fix a test for !OFFPAGE which should have been a test for HASH. This would
have caused a panic if we had ever destructed a malloc zone. This also
opens up the possibility that other zones could use the vsetobj() method
rather than a hash.


# 111119 19-Feb-2003 imp

Back out M_* changes, per decision of the TRB.

Approved by: trb


# 109623 21-Jan-2003 alfred

Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0.
Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.


# 105689 22-Oct-2002 sheldonh

Fix typo in comments (misspelled "necessary").


# 103623 19-Sep-2002 jeff

- Use my freebsd email alias in the copyright.
- Remove redundant instances of my email alias in the file summary.


# 103531 18-Sep-2002 jeff

- Split UMA_ZFLAG_OFFPAGE into UMA_ZFLAG_OFFPAGE and UMA_ZFLAG_HASH.
- Remove all instances of the mallochash.
- Stash the slab pointer in the vm page's object pointer when allocating from
the kmem_obj.
- Use the overloaded object pointer to find slabs for malloced memory.


# 100326 18-Jul-2002 markm

Void functions cannot return values.


# 98451 19-Jun-2002 jeff

- Remove bogus use of kmem_alloc that was inherited from the old zone
allocator.
- Properly set M_ZERO when talking to the back end page allocators for
non malloc zones. This forces us to zero fill pages when they are first
brought into a cache.
- Properly handle M_ZERO in uma_zalloc_internal. This fixes a problem where
per cpu buckets weren't always getting zeroed.


# 98361 17-Jun-2002 jeff

- Introduce the new M_NOVM option which tells uma to only check the currently
allocated slabs and bucket caches for free items. It will not go ask the vm
for pages. This differs from M_NOWAIT in that it not only doesn't block, it
doesn't even ask.

- Add a new zcreate option ZONE_VM, that sets the BUCKETCACHE zflag. This
tells uma that it should only allocate buckets out of the bucket cache, and
not from the VM. It does this by using the M_NOVM option to zalloc when
getting a new bucket. This is so that the VM doesn't recursively enter
itself while trying to allocate buckets for vm_map_entry zones. If there
are already allocated buckets when we get here we'll still use them but
otherwise we'll skip it.

- Use the ZONE_VM flag on vm map entries and pv entries on x86.


# 95925 02-May-2002 arr

- Changed the size element of uma_zctor_args to be size_t instead of int.
- Changed uma_zcreate to accept the size argument as a size_t instead of
int.

Approved by: jeff


# 95766 30-Apr-2002 jeff

Move the implementation of M_ZERO into UMA so that it can be passed to
uma_zalloc and friends. Remove this functionality from the malloc wrapper.

Document this change in uma.h and adjust variable names in uma_core.


# 95758 29-Apr-2002 jeff

Add a new zone flag UMA_ZONE_MTXCLASS. This puts the zone in its own
mutex class. Currently this is only used for kmapentzone because kmapents
are potentially allocated when freeing memory. This is not dangerous
though because no other allocations will be done while holding the
kmapentzone lock.


# 94161 08-Apr-2002 jeff

Implement uma_zdestroy(). Its prototype changed slightly. I decided that I
didn't like the wait argument and that if you were removing a zone it had
better be empty.

Also, I broke out part of hash_expand and made a separate hash_free() for use
in uma_zdestroy.


# 94157 07-Apr-2002 jeff

Spelling correction; s/seperate/separate/g

Submitted by: eric


# 92758 20-Mar-2002 jeff

Add uma_zone_set_max() to add enforced limits to non vm obj backed zones.


# 92654 19-Mar-2002 jeff

This is the first part of the new kernel memory allocator. This replaces
malloc(9) and vm_zone with a slab like allocator.

Reviewed by: arch@