History log of /freebsd-current/sys/fs/tmpfs/tmpfs_subr.c
Revision Date Author Comments
# 6bd3f23a 26-May-2024 Ryan Libby <rlibby@FreeBSD.org>

tmpfs_node_init: use MTX_NEW on lock from uninitialized memory

Reported by: netchild
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D45364


# 46811949 11-May-2024 Konstantin Belousov <kib@FreeBSD.org>

tmpfs_destroy_vobject(): clear v_object under the object lock

Which allows tmpfs_pager_writecount_recalc() to reliably detect
reclaimed vnode and make its accesses to object->un_pager.swp.private
(== vp) safe against reclaim. Note that vnode instantiation already
assigns v_object under the object lock.

Reviewed by: markj
Reported and tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D45119


# 6ada4e8a 08-May-2024 Konstantin Belousov <kib@FreeBSD.org>

swap-like pagers: assert that writemapping decrease does not pass zero

Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D45119


# 58d7ac11 06-May-2024 Konstantin Belousov <kib@FreeBSD.org>

tmpfs: recalculate OBJ_TMPFS_VREF on reinstantiating node' vnode

Reported and tested by: pho
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D45119


# 6bb132ba 15-Apr-2024 Brooks Davis <brooks@FreeBSD.org>

Reduce reliance on sys/sysproto.h pollution

Add sys/errno.h, sys/malloc.h, sys/queue.h, and vm/uma.h as needed.

sys/sysproto.h currently includes sys/acl.h which currently includes
sys/param.h, sys/queue.h, and vm/uma.h which in turn bring in
sys/errno.h sys/malloc.h.

Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D44465


# 63659234 19-Dec-2023 Mike Karels <karels@FreeBSD.org>

tmpfs: increase memory reserve to a percent of available memory + swap

The tmpfs memory reserve defaulted to 4 MB, and other than that,
all of available memory + swap could be allocated to tmpfs files.
This was dangerous, as the page daemon attempts to keep some memory
free, using up swap, and then resulting in processes being killed.
Increase the reserve to a fraction of available memory + swap at
file system startup time. The limit is expressed as a percentage
of available memory + swap that can be used, and defaults to 95%.
The percentage can be changed via the vfs.tmpfs.memory_percent sysctl,
recomputing the reserve with the new percentage but the initial
available memory + swap. Note that the reserve can also be set
directly with an existing sysctl, ignoring the percentage. The
previous behavior can be specified by setting vfs.tmpfs.memory_percent
to 100.

Add sysctl for vfs.tmpfs.memory_percent and the pre-existing
vfs.tmpfs.memory_reserved to tmpfs(5).

PR: 275436
MFC after: 1 month
Reviewed by: rgrimes
Differential Revision: https://reviews.freebsd.org/D43011


# ed19c098 19-Dec-2023 Mike Karels <karels@FreeBSD.org>

tmpfs: enforce size limit on writes when file system size is default

tmpfs enforced the file system size limit on writes for file systems
with a specified size, but not when the size was the default. Add
enforcement when the size is default: do not allocate additional
pages if the available memory + swap falls to the reserve level.
Note, enforcement is also done when attempting to create a file,
both with and without an explicit file system size.

PR: 275436
MFC after: 1 month
Reviewed by: cy
Differential Revision: https://reviews.freebsd.org/D43010


# fdafd315 24-Nov-2023 Warner Losh <imp@FreeBSD.org>

sys: Automated cleanup of cdefs and other formatting

Apply the following automated changes to try to eliminate
no-longer-needed sys/cdefs.h includes as well as now-empty
blank lines in a row.

Remove /^#if.*\n#endif.*\n#include\s+<sys/cdefs.h>.*\n/
Remove /\n+#include\s+<sys/cdefs.h>.*\n+#if.*\n#endif.*\n+/
Remove /\n+#if.*\n#endif.*\n+/
Remove /^#if.*\n#endif.*\n/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/types.h>/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/param.h>/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/capsicum.h>/

Sponsored by: Netflix


# 685dc743 16-Aug-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove $FreeBSD$: one-line .c pattern

Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/


# 0f613ab8 05-Aug-2023 Konstantin Belousov <kib@FreeBSD.org>

tmpfs: add a knob to enable pgcache read for mount

Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D41334


# ba8cc6d7 12-Mar-2023 Mateusz Guzik <mjg@FreeBSD.org>

vfs: use __enum_uint8 for vtype and vstate

This whacks hackery around only reading v_type once.

Bump __FreeBSD_version to 1400093


# b61a5730 10-May-2023 Warner Losh <imp@FreeBSD.org>

spdx: The BSD-2-Clause-NetBSD identifier is obsolete, drop -NetBSD

The SPDX folks have obsoleted the BSD-2-Clause-NetBSD identifier. Catch
up to that fact and revert to their recommended match of BSD-2-Clause.

Discussed with: pfg
MFC After: 3 days
Sponsored by: Netflix


# b918ee2c 13-Feb-2023 Konstantin Belousov <kib@FreeBSD.org>

tmpfs: remove IFF macro

Requested by: mjg
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D38576


# 9ff2fbdf 13-Feb-2023 Konstantin Belousov <kib@FreeBSD.org>

tmpfs: remove bogus MPASS(VOP_ISLOCKED(vp)) asserts

VOP_ISLOCKED() does not return bool, its only reliable use it to check
that the vnode is exclusively locked by the calling thread. Almost all
asserts of this form repeated auto-generated assertions from
vnode_if.src for VOPs, in the incorrect way.

In two places where the assertions would be meaningful, convert them to
ASSERT_VOP_LOCKED() statements.

Reviewed by: markj, mjg
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D38576


# 56242a4c 05-Dec-2022 Fedor Uporov <fsu@FreeBSD.org>

Add extended attributes

The extattrs follows semantic of ufs, mean it cannot
be set to char/block devices and fifos. The attributes
are allocated using regular malloc with M_WAITOK
allocation with the own malloc tag M_TMPFSEA. The memory
consumed by extended attributes is limited to avoid OOM
triggereing by tmpfs_mount variable tm_ea_memory_max,
which is set initialy to 16 MB. The extended attributes
entries are stored as linked list in the tmpfs node.
The mount point lock is required only under setextattr
and deleteextattr to update extended attributes
memory-inuse counter, all other operations are doing
under vnode lock.

Reviewed by: kib
MFC after: 2 week
Differential revision: https://reviews.freebsd.org/D38052


# 829f0bcb 19-Dec-2022 Mateusz Guzik <mjg@FreeBSD.org>

vfs: add the concept of vnode state transitions

To quote from a comment above vput_final:
<quote>
* XXX Some filesystems pass in an exclusively locked vnode and strongly depend
* on the lock being held all the way until VOP_INACTIVE. This in particular
* happens with UFS which adds half-constructed vnodes to the hash, where they
* can be found by other code.
</quote>

As is there is no mechanism which allows filesystems to denote that a
vnode is fully initialized, consequently problems like the above are
only found the hard way(tm).

Add rudimentary support for state transitions, which in particular allow
to assert the vnode is not legally unlocked until its fate is decided
(either construction finishes or vgone is called to abort it).

The new field lands in a 1-byte hole, thus it does not grow the struct.

Bump __FreeBSD_version to 1400077

Reviewed by: kib (previous version)
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D37759


# 860399eb 23-Dec-2022 Konstantin Belousov <kib@FreeBSD.org>

tmpfs: update changed/modified timestamps for truncates that do not change size

While there, move all error checks into the common place at the start,
and eliminate the 'out' label.

PR: 268528
Analyzed and tested by: Mark Millard <marklmi@yahoo.com>
Reviewed by: mckusick
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D37866


# 37aea264 20-Oct-2022 Konstantin Belousov <kib@FreeBSD.org>

tmpfs: for used pages, account really allocated pages, instead of file sizes

This makes tmpfs size accounting correct for the sparce files. Also
correct report st_blocks/va_bytes. Previously the reported value did not
accounted for the swapped out pages.

PR: 223015
Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D37097


# d9dc64f1 20-Oct-2022 Konstantin Belousov <kib@FreeBSD.org>

tmpfs: make vm_object point to the tmpfs node instead of vnode

The vnode could be reclaimed and allocated again during the lifecycle of
the node, but the node cannot. Also, referencing the node would allow
to reach it and tmpfs mount data from the object, regardless of the
state of the possibly absent vnode.

Still use swp_tmpfs for back-pointer, instead of using handle. Use of
named swap objects would incur taking the sw_alloc_sx on node allocation
and deallocation.

swp_tmpfs is renamed to swp_priv to remove the last bit of tmpfs in vm/.

Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D37097


# 7f055843 17-Oct-2022 Konstantin Belousov <kib@FreeBSD.org>

tmpfs: change return type of tmpfs_pages_check_avail() to bool

Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D37024


# b5b16659 18-Sep-2022 Konstantin Belousov <kib@FreeBSD.org>

tmpfs: disallow truncation to set file size past RLIMIT_FSIZE

PR: 164793
Reviewed by: asomers, jah, markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D36625


# 0f01fb01 18-Sep-2022 Konstantin Belousov <kib@FreeBSD.org>

tmpfs_subr.c: some style

Use 'td' as the local thread name.
Wrap long lines.
Remove unneeded blank lines.

Reviewed by: asomers, jah, markj
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D36625


# 5b5b7e2c 17-Sep-2022 Mateusz Guzik <mjg@FreeBSD.org>

vfs: always retain path buffer after lookup

This removes some of the complexity needed to maintain HASBUF and
allows for removing injecting SAVENAME by filesystems.

Reviewed by: kib (previous version)
Differential Revision: https://reviews.freebsd.org/D36542


# d81e1b44 04-Sep-2022 Gordon Bergling <gbe@FreeBSD.org>

tmpfs(5): Remove a double word in a source code comment

- s/the the/the/

MFC after: 3 days


# 66c5fbca 27-Jan-2022 Konstantin Belousov <kib@FreeBSD.org>

insmntque1(): remove useless arguments

Also remove once-used functions to clean up after failed insmntque1(),
which were destructor callbacks in previous life.

Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D34071


# 2a7e4cf8 27-Jan-2022 Mateusz Guzik <mjg@FreeBSD.org>

Revert b58ca5df0bb7 ("vfs: remove the now unused insmntque1")

I was somehow convinced that insmntque calls insmntque1 with a NULL
destructor. Unfortunately this worked well enough to not immediately
blow up in simple testing.

Keep not using the destructor in previously patched filesystems though
as it avoids unnecessary casts.

Noted by: kib
Reported by: pho


# 5ccdfdab 26-Jan-2022 Mateusz Guzik <mjg@FreeBSD.org>

tmpfs: stop using insmntque1

It adds nothing of value over insmntque.


# b214fcce 13-Dec-2021 Alan Somers <asomers@FreeBSD.org>

Change VOP_READDIR's cookies argument to a **uint64_t

The cookies argument is only used by the NFS server. NFSv2 defines the
cookie as 32 bits on the wire, but NFSv3 increased it to 64 bits. Our
VOP_READDIR, however, has always defined it as u_long, which is 32 bits
on some architectures. Change it to 64 bits on all architectures. This
doesn't matter for any in-tree file systems, but it matters for some
FUSE file systems that use 64-bit directory cookies.

PR: 260375
Reviewed by: rmacklem
Differential Revision: https://reviews.freebsd.org/D33404


# 4dcdf398 17-May-2021 Mateusz Guzik <mjg@FreeBSD.org>

vfs: replace the MNTK_TEXT_REFS flag with VIRF_TEXT_REF

This allows to stop maintaing the VI_TEXT_REF flag and consequently
opens up fully lockless v_writecount adjustment.

Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D33127


# 80bca63c 19-Oct-2021 Konstantin Belousov <kib@FreeBSD.org>

tmpfs: remove write-only variables

Reviewed by: imp, mjg
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D32577


# 8d7cd10b 25-Aug-2021 Ka Ho Ng <khng@FreeBSD.org>

tmpfs: Implement VOP_DEALLOCATE

Implementing VOP_DEALLOCATE to allow hole-punching in the same manner as
POSIX shared memory's fspacectl(SPACECTL_DEALLOC) support.

Sponsored by: The FreeBSD Foundation
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D31684


# 399be910 25-Aug-2021 Ka Ho Ng <khng@FreeBSD.org>

tmpfs: Move partial page invalidation to a separate helper

The partial page invalidation code is factored out to be a separate
helper from tmpfs_reg_resize().

Sponsored by: The FreeBSD Foundation
Reviewed by: kib
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D31683


# c12118f6 24-Aug-2021 Ka Ho Ng <khng@FreeBSD.org>

tmpfs: Fix styles

A lot of return statements were in the wrong style before this commit.

Sponsored by: The FreeBSD Foundation
MFC after: 3 days


# f4aa6452 27-May-2021 Mateusz Guzik <mjg@FreeBSD.org>

tmpfs: save on relocking the allnode lock in tmpfs_free_node_locked


# 439d942b 28-May-2021 Mateusz Guzik <mjg@FreeBSD.org>

tmpfs: drop a redundant NULL check in tmpfs_alloc_vp


# 7fbeaf33 29-May-2021 Mateusz Guzik <mjg@FreeBSD.org>

tmpfs: drop useless parent locking from tmpfs_dir_getdotdotdent

The id field is immutable until the node gets freed.


# eec2e4ef 07-May-2021 Mateusz Guzik <mjg@FreeBSD.org>

tmpfs: reimplement the mtime scan to use the lazy list

Tested by: pho
Reviewed by: kib, markj
Differential Revision: https://reviews.freebsd.org/D30065


# 28bc23ab 07-May-2021 Konstantin Belousov <kib@FreeBSD.org>

tmpfs: dynamically register tmpfs pager

Remove OBJT_SWAP_TMPFS. Move tmpfs-specific swap pager bits into
tmpfs_subr.c.

There is no longer any code to directly support tmpfs in sys/vm, most
tmpfs knowledge is shared by non-anon swap object type implementation.
The tmpfs-specific methods are provided by registered tmpfs pager, which
inherits from the swap pager.

Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D30168


# 4b8365d7 30-Apr-2021 Konstantin Belousov <kib@FreeBSD.org>

Add OBJT_SWAP_TMPFS pager

This is OBJT_SWAP pager, specialized for tmpfs. Right now, both swap pager
and generic vm code have to explicitly handle swap objects which are tmpfs
vnode v_object, in the special ways. Replace (almost) all such places with
proper methods.

Since VM still needs a notion of the 'swap object', regardless of its
use, add yet another type-classification flag OBJ_SWAP. Set it in
vm_object_allocate() where other type-class flags are set.

This change almost completely eliminates the knowledge of tmpfs from VM,
and opens a way to make OBJT_SWAP_TMPFS loadable from tmpfs.ko.

Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D30070


# c09f7992 25-Jan-2021 Mateusz Guzik <mjg@FreeBSD.org>

tmpfs: drop acq fence now that vn_load_v_data_smr has consume semantics


# cc96f92a 25-Jan-2021 Mateusz Guzik <mjg@FreeBSD.org>

atomic: make atomic_store_ptr type-aware


# 618029af 23-Jan-2021 Mateusz Guzik <mjg@FreeBSD.org>

tmpfs: add support for lockless symlink lookup

Reviewed by: kib (previous version)
Tested by: pho (previous version)
Differential Revision: https://reviews.freebsd.org/D27488


# 90f580b9 03-Jan-2021 Mark Johnston <markj@FreeBSD.org>

Ensure that dirent's d_off field is initialized

We have the d_off field in struct dirent for providing the seek offset
of the next directory entry. Several filesystems were not initializing
the field, which ends up being copied out to userland.

Reported by: Syed Faraz Abrar <faraz@elttam.com>
Reviewed by: kib
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D27792


# 3e506a67 27-Dec-2020 Mateusz Guzik <mjg@FreeBSD.org>

vfs: add v_irflag accessors

Reviewed by: kib (previous version)
Differential Revision: https://reviews.freebsd.org/D27793


# 7c58c37e 30-Oct-2020 Mateusz Guzik <mjg@FreeBSD.org>

tmpfs: change tmpfs dirent zone into a malloc type

It is 64 bytes.


# f9cc8410 18-Sep-2020 Eric van Gyzen <vangyzen@FreeBSD.org>

vm_ooffset_t is now unsigned

vm_ooffset_t is now unsigned. Remove some tests for negative values,
or make other adjustments accordingly.

Reported by: Coverity
Reviewed by: kib markj
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D26214


# 016b7c7e 16-Sep-2020 Konstantin Belousov <kib@FreeBSD.org>

tmpfs: restore atime updates for reads from page cache.

Split TMPFS_NODE_ACCCESSED bit into dedicated byte that can be updated
atomically without locks or (locked) atomics.

tn_update_getattr() change also contains unrelated bug fix.

Reported by: lwhsu
PR: 249362
Reviewed by: markj (previous version)
Discussed with: mjg
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D26451


# 23f90714 16-Sep-2020 Konstantin Belousov <kib@FreeBSD.org>

Style.

Sponsored by: The FreeBSD Foundation
MFC after: 3 days


# 081e36e7 15-Sep-2020 Konstantin Belousov <kib@FreeBSD.org>

Add tmpfs page cache read support.

Or it could be explained as lockless (for vnode lock) reads. Reads
are performed from the node tn_obj object. Tmpfs regular vnode object
lifecycle is significantly different from the normal OBJT_VNODE: it is
alive as far as ref_count > 0.

Ensure liveness of the tmpfs VREG node and consequently v_object
inside VOP_READ_PGCACHE by referencing tmpfs node in tmpfs_open().
Provide custom tmpfs fo_close() method on file, to ensure that close
is paired with open.

Add tmpfs VOP_READ_PGCACHE that takes advantage of all tmpfs quirks.
It is quite cheap in code size sense to support page-ins for read for
tmpfs even if we do not own tmpfs vnode lock. Also, we can handle
holes in tmpfs node without additional efforts, and do not have
limitation of the transfer size.

Reviewed by: markj
Discussed with and benchmarked by: mjg (previous version)
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D26346


# 4601f5f5 15-Sep-2020 Konstantin Belousov <kib@FreeBSD.org>

Microoptimize tmpfs node ref/unref by using atomics.

Avoid tmpfs mount and node locks when ref count is greater than zero,
which is the case until node is being destroyed by unlink or unmount.

Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D26346


# 586ee69f 01-Sep-2020 Mateusz Guzik <mjg@FreeBSD.org>

fs: clean up empty lines in .c and .h files


# 1abe3656 16-Aug-2020 Mateusz Guzik <mjg@FreeBSD.org>

tmpfs: use vget_prep/vget_finish instead of vget + vnode


# a92a971b 16-Aug-2020 Mateusz Guzik <mjg@FreeBSD.org>

vfs: remove the thread argument from vget

It was already asserted to be curthread.

Semantic patch:

@@

expression arg1, arg2, arg3;

@@

- vget(arg1, arg2, arg3)
+ vget(arg1, arg2)


# 172ffe70 25-Jul-2020 Mateusz Guzik <mjg@FreeBSD.org>

tmpfs: add support for lockless lookup

Reviewed by: kib
Tested by: pho (in a patchset)
Differential Revision: https://reviews.freebsd.org/D25580


# 84242cf6 25-Jun-2020 Mark Johnston <markj@FreeBSD.org>

Call swap_pager_freespace() from vm_object_page_remove().

All vm_object_page_remove() callers, except
linux_invalidate_mapping_pages() in the LinuxKPI, free swap space when
removing a range of pages from an object. The LinuxKPI case appears to
be an unintentional omission that could result in leaked swap blocks, so
unconditionally free swap space in vm_object_page_remove() to protect
against similar bugs in the future.

Reviewed by: alc, kib
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D25329


# 7029da5c 26-Feb-2020 Pawel Biernacki <kaktus@FreeBSD.org>

Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many)

r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.

This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.

Mark all obvious cases as MPSAFE. All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT

Approved by: kib (mentor, blanket)
Commented by: kib, gallatin, melifaro
Differential Revision: https://reviews.freebsd.org/D23718


# 2abdae33 03-Feb-2020 Mateusz Guzik <mjg@FreeBSD.org>

tmpfs: inline tmpfs_update

It was generated to be just a jumping off point to tmpfs_itimes.

While here provide a dedicated variant for getattr since we normally don't
expect to need to the update from that caller.


# d6e13f3b 19-Jan-2020 Jeff Roberson <jeff@FreeBSD.org>

Don't hold the object lock while calling getpages.

The vnode pager does not want the object lock held. Moving this out allows
further object lock scope reduction in callers. While here add some missing
paging in progress calls and an assert. The object handle is now protected
explicitly with pip.

Reviewed by: kib, markj
Differential Revision: https://reviews.freebsd.org/D23033


# 2a829749 14-Jan-2020 Mateusz Guzik <mjg@FreeBSD.org>

tmpfs: add missing CLTFLAG_MPSAFE annotation


# 9f5632e6 28-Dec-2019 Mark Johnston <markj@FreeBSD.org>

Remove page locking for queue operations.

With the previous reviews, the page lock is no longer required in order
to perform queue operations on a page. It is also no longer needed in
the page queue scans. This change effectively eliminates remaining uses
of the page lock and also the false sharing caused by multiple pages
sharing a page lock.

Reviewed by: jeff
Tested by: pho
Sponsored by: Netflix, Intel
Differential Revision: https://reviews.freebsd.org/D22885


# a8081778 14-Dec-2019 Jeff Roberson <jeff@FreeBSD.org>

Add a deferred free mechanism for freeing swap space that does not require
an exclusive object lock.

Previously swap space was freed on a best effort basis when a page that
had valid swap was dirtied, thus invalidating the swap copy. This may be
done inconsistently and requires the object lock which is not always
convenient.

Instead, track when swap space is present. The first dirty is responsible
for deleting space or setting PGA_SWAP_FREE which will trigger background
scans to free the swap space.

Simplify the locking in vm_fault_dirty() now that we can reliably identify
the first dirty.

Discussed with: alc, kib, markj
Differential Revision: https://reviews.freebsd.org/D22654


# c8b29d12 11-Dec-2019 Mateusz Guzik <mjg@FreeBSD.org>

vfs: locking primitives which elide ->v_vnlock and shared locking disablement

Both of these features are not needed by many consumers and result in avoidable
reads which in turn puts them on profiles due to cache-line ping ponging.

On top of that the current lockgmr entry point is slower than necessary
single-threaded. As an attempted clean up preparing for other changes,
provide new routines which don't support any of the aforementioned features.

With these patches in place vop_stdlock and vop_stdunlock disappear from
flamegraphs during -j 104 buildkernel.

Reviewed by: jeff (previous version)
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D22665


# abd80ddb 08-Dec-2019 Mateusz Guzik <mjg@FreeBSD.org>

vfs: introduce v_irflag and make v_type smaller

The current vnode layout is not smp-friendly by having frequently read data
avoidably sharing cachelines with very frequently modified fields. In
particular v_iflag inspected for VI_DOOMED can be found in the same line with
v_usecount. Instead make it available in the same cacheline as the v_op, v_data
and v_type which all get read all the time.

v_type is avoidably 4 bytes while the necessary data will easily fit in 1.
Shrinking it frees up 3 bytes, 2 of which get used here to introduce a new
flag field with a new value: VIRF_DOOMED.

Reviewed by: kib, jeff
Differential Revision: https://reviews.freebsd.org/D22715


# a51c8071 04-Dec-2019 Konstantin Belousov <kib@FreeBSD.org>

Stop using per-mount tmpfs zones.

Requested and reviewed by: jeff
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D22643


# 63967687 19-Nov-2019 Jeff Roberson <jeff@FreeBSD.org>

Simplify anonymous memory handling with an OBJ_ANON flag. This eliminates
reudundant complicated checks and additional locking required only for
anonymous memory. Introduce vm_object_allocate_anon() to create these
objects. DEFAULT and SWAP objects now have the correct settings for
non-anonymous consumers and so individual consumers need not modify the
default flags to create super-pages and avoid ONEMAPPING/NOSPLIT.

Reviewed by: alc, dougm, kib, markj
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D22119


# 67d0e293 29-Oct-2019 Jeff Roberson <jeff@FreeBSD.org>

Replace OBJ_MIGHTBEDIRTY with a system using atomics. Remove the TMPFS_DIRTY
flag and use the same system.

This enables further fault locking improvements by allowing more faults to
proceed with a shared lock.

Reviewed by: kib
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D22116


# 0012f373 14-Oct-2019 Jeff Roberson <jeff@FreeBSD.org>

(4/6) Protect page valid with the busy lock.

Atomics are used for page busy and valid state when the shared busy is
held. The details of the locking protocol and valid and dirty
synchronization are in the updated vm_page.h comments.

Reviewed by: kib, markj
Tested by: pho
Sponsored by: Netflix, Intel
Differential Revision: https://reviews.freebsd.org/D21594


# 63e97555 14-Oct-2019 Jeff Roberson <jeff@FreeBSD.org>

(1/6) Replace busy checks with acquires where it is trival to do so.

This is the first in a series of patches that promotes the page busy field
to a first class lock that no longer requires the object lock for
consistency.

Reviewed by: kib, markj
Tested by: pho
Sponsored by: Netflix, Intel
Differential Revision: https://reviews.freebsd.org/D21548


# c5dac63c 03-Oct-2019 Konstantin Belousov <kib@FreeBSD.org>

tmpfs_readdir(): unlock the locked node.

During readdir() we guarantee that the tn_dir.tn_parent does not go
away, but it might be replaced by a parallel rename. Read tn_parent
only once, then use the cached value.

Reported and tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# 4cace859 16-Sep-2019 Mateusz Guzik <mjg@FreeBSD.org>

vfs: convert struct mount counters to per-cpu

There are 3 counters modified all the time in this structure - one for
keeping the structure alive, one for preventing unmount and one for
tracking active writers. Exact values of these counters are very rarely
needed, which makes them a prime candidate for conversion to a per-cpu
scheme, resulting in much better performance.

Sample benchmark performing fstatfs (modifying 2 out of 3 counters) on
a 104-way 2 socket Skylake system:
before: 852393 ops/s
after: 76682077 ops/s

Reviewed by: kib, jeff
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21637


# fee2a2fa 09-Sep-2019 Mark Johnston <markj@FreeBSD.org>

Change synchonization rules for vm_page reference counting.

There are several mechanisms by which a vm_page reference is held,
preventing the page from being freed back to the page allocator. In
particular, holding the page's object lock is sufficient to prevent the
page from being freed; holding the busy lock or a wiring is sufficent as
well. These references are protected by the page lock, which must
therefore be acquired for many per-page operations. This results in
false sharing since the page locks are external to the vm_page
structures themselves and each lock protects multiple structures.

Transition to using an atomically updated per-page reference counter.
The object's reference is counted using a flag bit in the counter. A
second flag bit is used to atomically block new references via
pmap_extract_and_hold() while removing managed mappings of a page.
Thus, the reference count of a page is guaranteed not to increase if the
page is unbusied, unmapped, and the object's write lock is held. As
a consequence of this, the page lock no longer protects a page's
identity; operations which move pages between objects are now
synchronized solely by the objects' locks.

The vm_page_wire() and vm_page_unwire() KPIs are changed. The former
requires that either the object lock or the busy lock is held. The
latter no longer has a return value and may free the page if it releases
the last reference to that page. vm_page_unwire_noq() behaves the same
as before; the caller is responsible for checking its return value and
freeing or enqueuing the page as appropriate. vm_page_wire_mapped() is
introduced for use in pmap_extract_and_hold(). It fails if the page is
concurrently being unmapped, typically triggering a fallback to the
fault handler. vm_page_wire() no longer requires the page lock and
vm_page_unwire() now internally acquires the page lock when releasing
the last wiring of a page (since the page lock still protects a page's
queue state). In particular, synchronization details are no longer
leaked into the caller.

The change excises the page lock from several frequently executed code
paths. In particular, vm_object_terminate() no longer bounces between
page locks as it releases an object's pages, and direct I/O and
sendfile(SF_NOCACHE) completions no longer require the page lock. In
these latter cases we now get linear scalability in the common scenario
where different threads are operating on different files.

__FreeBSD_version is bumped. The DRM ports have been updated to
accomodate the KPI changes.

Reviewed by: jeff (earlier version)
Tested by: gallatin (earlier version), pho
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D20486


# 3c93d227 05-Jun-2019 Konstantin Belousov <kib@FreeBSD.org>

Manually clear text references on reclaim for nullfs and tmpfs.

Both filesystems do no use vnode_pager_dealloc() which would handle
this case otherwise. Nullfs because vnode vm_object handle never
points to nullfs vnode. Tmpfs because its vm_object is never vnode
object at all.

Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# e1cdc30f 02-Apr-2019 Konstantin Belousov <kib@FreeBSD.org>

tmpfs: ignore tmpfs_set_status() if mount point is read-only.

In particular, this fixes atimes still changing for ro tmpfs.
tmpfs_set_status() gains tmpfs_mount * argument.

Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D19737


# ae265753 02-Apr-2019 Konstantin Belousov <kib@FreeBSD.org>

Block creation of the new nodes for read-only tmpfs mounts.

Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D19737


# cc426dd3 11-Dec-2018 Mateusz Guzik <mjg@FreeBSD.org>

Remove unused argument to priv_check_cred.

Patch mostly generated with cocinnelle:

@@
expression E1,E2;
@@

- priv_check_cred(E1,E2,0)
+ priv_check_cred(E1,E2)

Sponsored by: The FreeBSD Foundation


# 6d2e2df7 23-Nov-2018 Mark Johnston <markj@FreeBSD.org>

Ensure that directory entry padding bytes are zeroed.

Directory entries must be padded to maintain alignment; in many
filesystems the padding was not initialized, resulting in stack
memory being copied out to userspace. With the ino64 work there
are also some explicit pad fields in struct dirent. Add a subroutine
to clear these bytes and use it in the in-tree filesystems. The
NFS client is omitted for now as it was fixed separately in r340787.

Reported by: Thomas Barabosch, Fraunhofer FKIE
Reviewed by: kib
MFC after: 3 days
Sponsored by: The FreeBSD Foundation


# 30e0cf49 20-Nov-2018 Mateusz Guzik <mjg@FreeBSD.org>

tmpfs: use unr64 for inode numbers

Sponsored by: The FreeBSD Foundation


# 1493c2ee 02-Nov-2018 Brooks Davis <brooks@FreeBSD.org>

Make vop_symlink take a const target path.

This will enable callers to take const paths as part of syscall
decleration improvements.

Where doing so is easy and non-distruptive carry the const through
implementations. In UFS the value is passed to an interface that must
take non-const values. In ZFS, const poisoning would touch code shared
with upstream and it's not worth adding diffs.

Bump __FreeBSD_version for external API consumers.

Reviewed by: kib (prior version)
Obtained from: CheriBSD
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D17805


# 19fa89e9 25-Aug-2018 Mark Murray <markm@FreeBSD.org>

Remove the Yarrow PRNG algorithm option in accordance with due notice
given in random(4).

This includes updating of the relevant man pages, and no-longer-used
harvesting parameters.

Ensure that the pseudo-unit-test still does something useful, now also
with the "other" algorithm instead of Yarrow.

PR: 230870
Reviewed by: cem
Approved by: so(delphij,gtetlow)
Approved by: re(marius)
Differential Revision: https://reviews.freebsd.org/D16898


# e2068d0b 06-Feb-2018 Jeff Roberson <jeff@FreeBSD.org>

Use per-domain locks for vm page queue free. Move paging control from
global to per-domain state. Protect reservations with the free lock
from the domain that they belong to. Refactor to make vm domains more
of a first class object.

Reviewed by: markj, kib, gallatin
Tested by: pho
Sponsored by: Netflix, Dell/EMC Isilon
Differential Revision: https://reviews.freebsd.org/D14000


# 35b1a3ab 19-Dec-2017 John Baldwin <jhb@FreeBSD.org>

Update tmpfs link count handling for ino64.

Add a new TMPFS_LINK_MAX to use in place of LINK_MAX for link overflow
checks and pathconf() reporting. Rather than storing a full 64-bit
link count, just use a plain int and use INT_MAX as TMPFS_LINK_MAX.

Discussed with: bde
Reviewed by: kib (part of a larger patch)
Sponsored by: Chelsio Communications


# 135beaf6 05-Dec-2017 Gleb Smirnoff <glebius@FreeBSD.org>

Reduce pollution via tmpfs.h.


# d63027b6 27-Nov-2017 Pedro F. Giffuni <pfg@FreeBSD.org>

sys/fs: further adoption of SPDX licensing ID tags.

Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.


# 8d6fbbb8 07-Nov-2017 Jeff Roberson <jeff@FreeBSD.org>

Replace manyinstances of VM_WAIT with blocking page allocation flags
similar to the kernel memory allocator.

This simplifies NUMA allocation because the domain will be known at wait
time and races between failure and sleeping are eliminated. This also
reduces boilerplate code and simplifies callers.

A wait primitive is supplied for uma zones for similar reasons. This
eliminates some non-specific VM_WAIT calls in favor of more explicit
sleeps that may be satisfied without new pages.

Reviewed by: alc, kib, markj
Tested by: pho
Sponsored by: Netflix, Dell/EMC Isilon


# 9aaf913e 11-Oct-2017 Matt Joras <mjoras@FreeBSD.org>

When unmounting a tmpfs, do not call free_unr.

tmpfs uses unr(9) to allocate inodes. Previously when unmounting it
would individually free the units when it freed each vnode. This is
unnecessary as we can use the newly-added unrhdr_clear function to clear
out the unr in onde go. This measurably reduces the time to unmount a
tmpfs with many files.

Reviewed by: cem, lidl
Approved by: rstone (mentor)
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D12591


# e3e10c39 30-Sep-2017 Mateusz Guzik <mjg@FreeBSD.org>

tmpfs: skip zero-sized page count updates

Such updates consisted of vast majority of modificiations, especially
in tmpfs_reg_resize.

For the case where page count did no change and the size grew we only
need to update tn_size. Use this fact to avoid vm object lock/relock.

MFC after: 1 week


# 00ac6a98 19-Jan-2017 Konstantin Belousov <kib@FreeBSD.org>

Add mount option for tmpfs(5) to not use namecache.

The option "nonc" disables using of namecache for the created mount,
by default namecache is used. The rationale for the option is that
namecache duplicates the information which is already kept in memory
by tmpfs. Since it believed that namecache scales better than tmpfs,
or will scale better, do not enable the option by default. On the
other hand, smaller machines may benefit from lesser namecache
pressure.

Discussed with: mjg
Tested by: pho (as part of larger patch)
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks


# b4ba3b64 19-Jan-2017 Konstantin Belousov <kib@FreeBSD.org>

VNON nodes cannot exist.

Tested by: pho (as part of larger patch)
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks


# 64c25043 19-Jan-2017 Konstantin Belousov <kib@FreeBSD.org>

Refcount tmpfs nodes and mount structures.

On dotdot lookup and fhtovp operations, it is possible for the file
represented by tmpfs node to be removed after the thread calculated
the pointer. In this case, tmpfs_alloc_vp() accesses freed memory.

Introduce the reference count on the nodes. The allnodes list from
tmpfs mount owns 1 reference, and threads performing unlocked
operations on the node, add one transient reference. Similarly, since
struct tmpfs_mount maintains the list where nodes are enlisted,
refcount it by one reference from struct mount and one reference from
each node on the list. Both nodes and tmpfs_mounts are removed when
refcount goes to zero.

Note that this means that nodes and tmpfs_mounts might survive some
time after the node is deleted or tmpfs_unmount() finished. The
tmpfs_alloc_vp() in these cases returns error either due to node
removal (tn_nlinks == 0) or because of insmntque1(9) error.

Tested by: pho (as part of larger patch)
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks


# 1c07d69b 19-Jan-2017 Konstantin Belousov <kib@FreeBSD.org>

Make tmpfs directory cursor available outside tmpfs_subr.c.

Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# e7e6c820 19-Jan-2017 Konstantin Belousov <kib@FreeBSD.org>

Rework some tmpfs lock assertions.

Remove TMPFS_ASSERT_ELOCKED(). Its claims are already stated by other
asserts nearby and by VFS guarantees.
Change TMPFS_ASSERT_LOCKED() and one inlined place to use
ASSERT_VOP_(E)LOCKED() instead of hand-rolled imprecise asserts.

Tested by: pho (as part of the larger patch)
Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# bba7ed20 19-Jan-2017 Konstantin Belousov <kib@FreeBSD.org>

Style fixes and comment updates.

Edit comments which explain no longer relevant details, and add
locking annotations to the struct tmpfs_node members.

Tested by: pho (as part of the larger patch)
Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# ed2159c9 13-Jan-2017 Mateusz Guzik <mjg@FreeBSD.org>

tmpfs: manage tm_pages_used with atomics

Reviewed by: kib (previous version)


# 3b622fc8 06-Jan-2017 Mateusz Guzik <mjg@FreeBSD.org>

tmpfs: perform a lockless check in tmpfs_itimes

Most of the time the status is 0 as the function is repeatedly
called from tmpfs_getattr.


# 5dc11286 06-Jan-2017 Konstantin Belousov <kib@FreeBSD.org>

Lock tmpfs node tn_status updates done under the shared vnode lock.

If tmpfs vnode is only shared locked, tn_status field still needs
updates to note the access time modification. Use the same locking
scheme as for UFS, protect tn_status with the node interlock + shared
vnode lock.

Fix nearby style.

Noted and reviewed by: mjg
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# 305b4229 06-Jan-2017 Konstantin Belousov <kib@FreeBSD.org>

Use vnode lock assertion expression, and upgrade it to assert the
required exclusive state of the vnode lock in tmpfs chflags, chmod,
chown, chsize, chtimes operations.

Fix nearby style.

Reviewed by: mjg
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# 2d612d2d 11-Dec-2016 Alan Cox <alc@FreeBSD.org>

When tmpfs and POSIX shm pagein a page for the sole purpose of performing
truncation, immediately queue the page for asynchronous laundering rather
than making the page pass through inactive queue first.

Reviewed by: kib, markj


# bba39b9a 22-Nov-2016 Alan Cox <alc@FreeBSD.org>

Remove PG_CACHED-related fields from struct vmmeter, because they are no
longer used. More precisely, they are always zero because the code that
decremented and incremented them no longer exists.

Bump __FreeBSD_version to mark this change.

Reviewed by: kib, markj
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D8583


# 7667839a 15-Nov-2016 Alan Cox <alc@FreeBSD.org>

Remove most of the code for implementing PG_CACHED pages. (This change does
not remove user-space visible fields from vm_cnt or all of the references to
cached pages from comments. Those changes will come later.)

Reviewed by: kib, markj
Tested by: pho
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D8497


# 15ad3e51 10-Aug-2016 Konstantin Belousov <kib@FreeBSD.org>

Convert another tmpfs assert into runtime check.

The offset of the directory file, passed to getdirentries(2) syscall,
is user-controllable. The value of the offset must not be asserted,
instead the invalid value should be checked and rejected if invalid.

Reported and tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# 74b8d63d 10-Apr-2016 Pedro F. Giffuni <pfg@FreeBSD.org>

Cleanup unnecessary semicolons from the kernel.

Found with devel/coccinelle.


# b0cd2017 16-Dec-2015 Gleb Smirnoff <glebius@FreeBSD.org>

A change to KPI of vm_pager_get_pages() and underlying VOP_GETPAGES().

o With new KPI consumers can request contiguous ranges of pages, and
unlike before, all pages will be kept busied on return, like it was
done before with the 'reqpage' only. Now the reqpage goes away. With
new interface it is easier to implement code protected from race
conditions.

Such arrayed requests for now should be preceeded by a call to
vm_pager_haspage() to make sure that request is possible. This
could be improved later, making vm_pager_haspage() obsolete.

Strenghtening the promises on the business of the array of pages
allows us to remove such hacks as swp_pager_free_nrpage() and
vm_pager_free_nonreq().

o New KPI accepts two integer pointers that may optionally point at
values for read ahead and read behind, that a pager may do, if it
can. These pages are completely owned by pager, and not controlled
by the caller.

This shifts the UFS-specific readahead logic from vm_fault.c, which
should be file system agnostic, into vnode_pager.c. It also removes
one VOP_BMAP() request per hard fault.

Discussed with: kib, alc, jeff, scottl
Sponsored by: Nginx, Inc.
Sponsored by: Netflix


# 382353e2 26-Jul-2015 Christian Brueffer <brueffer@FreeBSD.org>

In tmpfs_chtimes(), remove checks on the nanosecond level when
determining whether a node changed.

Other filesystems, e.g., UFS, only check on seconds, when determining
whether something changed.

This also corrects the birthtime case, where we checked tv_nsec
twice, instead of tv_sec and tv_nsec (PR).

PR: 201284
Submitted by: David Binderman
Patch suggested by: kib
Reviewed by: kib
MFC after: 2 weeks
Committed from: Essen FreeBSD Hackathon


# d1b06863 30-Jun-2015 Mark Murray <markm@FreeBSD.org>

Huge cleanup of random(4) code.

* GENERAL
- Update copyright.
- Make kernel options for RANDOM_YARROW and RANDOM_DUMMY. Set
neither to ON, which means we want Fortuna
- If there is no 'device random' in the kernel, there will be NO
random(4) device in the kernel, and the KERN_ARND sysctl will
return nothing. With RANDOM_DUMMY there will be a random(4) that
always blocks.
- Repair kern.arandom (KERN_ARND sysctl). The old version went
through arc4random(9) and was a bit weird.
- Adjust arc4random stirring a bit - the existing code looks a little
suspect.
- Fix the nasty pre- and post-read overloading by providing explictit
functions to do these tasks.
- Redo read_random(9) so as to duplicate random(4)'s read internals.
This makes it a first-class citizen rather than a hack.
- Move stuff out of locked regions when it does not need to be
there.
- Trim RANDOM_DEBUG printfs. Some are excess to requirement, some
behind boot verbose.
- Use SYSINIT to sequence the startup.
- Fix init/deinit sysctl stuff.
- Make relevant sysctls also tunables.
- Add different harvesting "styles" to allow for different requirements
(direct, queue, fast).
- Add harvesting of FFS atime events. This needs to be checked for
weighing down the FS code.
- Add harvesting of slab allocator events. This needs to be checked for
weighing down the allocator code.
- Fix the random(9) manpage.
- Loadable modules are not present for now. These will be re-engineered
when the dust settles.
- Use macros for locks.
- Fix comments.

* src/share/man/...
- Update the man pages.

* src/etc/...
- The startup/shutdown work is done in D2924.

* src/UPDATING
- Add UPDATING announcement.

* src/sys/dev/random/build.sh
- Add copyright.
- Add libz for unit tests.

* src/sys/dev/random/dummy.c
- Remove; no longer needed. Functionality incorporated into randomdev.*.

* live_entropy_sources.c live_entropy_sources.h
- Remove; content moved.
- move content to randomdev.[ch] and optimise.

* src/sys/dev/random/random_adaptors.c src/sys/dev/random/random_adaptors.h
- Remove; plugability is no longer used. Compile-time algorithm
selection is the way to go.

* src/sys/dev/random/random_harvestq.c src/sys/dev/random/random_harvestq.h
- Add early (re)boot-time randomness caching.

* src/sys/dev/random/randomdev_soft.c src/sys/dev/random/randomdev_soft.h
- Remove; no longer needed.

* src/sys/dev/random/uint128.h
- Provide a fake uint128_t; if a real one ever arrived, we can use
that instead. All that is needed here is N=0, N++, N==0, and some
localised trickery is used to manufacture a 128-bit 0ULLL.

* src/sys/dev/random/unit_test.c src/sys/dev/random/unit_test.h
- Improve unit tests; previously the testing human needed clairvoyance;
now the test will do a basic check of compressibility. Clairvoyant
talent is still a good idea.
- This is still a long way off a proper unit test.

* src/sys/dev/random/fortuna.c src/sys/dev/random/fortuna.h
- Improve messy union to just uint128_t.
- Remove unneeded 'static struct fortuna_start_cache'.
- Tighten up up arithmetic.
- Provide a method to allow eternal junk to be introduced; harden
it against blatant by compress/hashing.
- Assert that locks are held correctly.
- Fix the nasty pre- and post-read overloading by providing explictit
functions to do these tasks.
- Turn into self-sufficient module (no longer requires randomdev_soft.[ch])

* src/sys/dev/random/yarrow.c src/sys/dev/random/yarrow.h
- Improve messy union to just uint128_t.
- Remove unneeded 'staic struct start_cache'.
- Tighten up up arithmetic.
- Provide a method to allow eternal junk to be introduced; harden
it against blatant by compress/hashing.
- Assert that locks are held correctly.
- Fix the nasty pre- and post-read overloading by providing explictit
functions to do these tasks.
- Turn into self-sufficient module (no longer requires randomdev_soft.[ch])
- Fix some magic numbers elsewhere used as FAST and SLOW.

Differential Revision: https://reviews.freebsd.org/D2025
Reviewed by: vsevolod,delphij,rwatson,trasz,jmg
Approved by: so (delphij)


# 85512850 19-Jun-2015 Konstantin Belousov <kib@FreeBSD.org>

Restore the td_cookie value for the tmpfs directory entry which was a
dup entry, upon detach from the parent directory. If the node is
renamed, the entry is re-attached at the different directory, and
invalud cookie value triggers assert (or corrupts directory rb tree,
it seems).

Reported by: clusteradm (gjb, antoine)
Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# 093c7f39 12-Jun-2015 Gleb Smirnoff <glebius@FreeBSD.org>

Make KPI of vm_pager_get_pages() more strict: if a pager changes a page
in the requested array, then it is responsible for disposition of previous
page and is responsible for updating the entry in the requested array.
Now consumers of KPI do not need to re-lookup the pages after call to
vm_pager_get_pages().

Reviewed by: kib
Sponsored by: Netflix
Sponsored by: Nginx, Inc.


# bf5fce2b 02-Feb-2015 Konstantin Belousov <kib@FreeBSD.org>

Remove duplicated assignment.

CID: 1267988
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks


# e0a60ae1 31-Jan-2015 Konstantin Belousov <kib@FreeBSD.org>

Update directory times immediately after an entry is created or
removed. Postponing it until tmpfs_getattr() is called causes
discordant values reported for file times vs. directory times.

Reported and tested by: madpilot
Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# 311d39f2 30-Jan-2015 Konstantin Belousov <kib@FreeBSD.org>

POSIX states that write(2) "shall mark for update the last data
modification and last file status change timestamps of the file".
Currently, tmpfs only modifies ctime when file was extended. Since
r277828 followed tmpfs_write(), mmaped writes also do not modify
ctime.

Fix this, by updating both ctime and mtime for writes to tmpfs files.

Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks


# f40cb1c6 28-Jan-2015 Konstantin Belousov <kib@FreeBSD.org>

Update mtime for tmpfs files modified through memory mapping. Similar
to UFS, perform updates during syncer scans, which in particular means
that tmpfs now performs scan on sync. Also, this means that a mtime
update may be delayed up to 30 seconds after the write.

The vm_object' OBJ_TMPFS_DIRTY flag for tmpfs swap object is similar
to the OBJ_MIGHTBEDIRTY flag for the vnode object, it indicates that
object could have been dirtied. Adapt fast page fault handler and
vm_object_set_writeable_dirty() to handle OBJ_TMPFS_NODE same as
OBJT_VNODE.

Reported by: Ronald Klop <ronald-lists@klop.ws>
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks


# 4cda7f7e 14-Jul-2014 Konstantin Belousov <kib@FreeBSD.org>

Rework the tmpfs unmount.

- Suspend filesystem for unmount. This prevents new tmpfs nodes from
instantiating, and also ensures that only unmount thread can destroy
nodes.

- Do not start tmpfs node deletion until all vnodes are reclaimed,
which guarantees that no thread can access tmpfs data. For this,
call vflush() in the loop, until the mnt_nvnodelistsize is non-zero.
Note that after mnt_nvnodelistsize becomes 0, insmntque() blocks
insertion of a vnode germ into the mount list of vnodes.

- Fail node allocation when the filesystem is being unmounted. This
is race-free due to the vflush() call in loop. This is mostly
cosmetic, avoiding some more work which might be done until
suspension in unmount is started.

Note that there is currently no way to prevent new vnode instantiation
from readers during the unmount. Due to this, forced unmount might
live-lock if vflush() loop cannot get to the zero vnode count due to
races with readers. The unmount would proceed after the load is
lifted.

Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks


# b5b33261 14-Jul-2014 Konstantin Belousov <kib@FreeBSD.org>

Change forgotten in r268615. Set the OBJ_TMPFS_NODE flag for
vm_object of VREG tmpfs node.

Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks


# fd63693d 14-Jul-2014 Konstantin Belousov <kib@FreeBSD.org>

Style. Add comment about lock mode.

Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks


# 7a41bc2f 14-Jul-2014 Konstantin Belousov <kib@FreeBSD.org>

In tmpfs_alloc_file(), code after the 'out' label does only 'return
error;'. Replace goto's with the return.

Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks


# d2ca06cd 14-Jul-2014 Konstantin Belousov <kib@FreeBSD.org>

Add convenience macro to assert tmpfs node lock.

Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks


# 55781cb9 14-Jul-2014 Konstantin Belousov <kib@FreeBSD.org>

Add some assertions for the code handling vm_object for tmpfs vnode.
In particular, vnode must be exclusively locked when the tmpfs vnode
and object are divorced. When the vnode is opened, the object must be
still alive, since only live vnode can be opened, and the tmpfs node
owns a reference on the object.

Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks


# fca015d3 14-Jul-2014 Konstantin Belousov <kib@FreeBSD.org>

Remove code separator lines which do not conform to style(9).

Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks


# 7b81a399 17-Jun-2014 Konstantin Belousov <kib@FreeBSD.org>

In msdosfs_setattr(), add a check for result of the utimes(2)
permissions test, forgotten in r164033.

Refactor the permission checks for utimes(2) into vnode helper
function vn_utimes_perm(9), and simplify its code comparing with the
UFS origin, by writing the call to VOP_ACCESSX only once. Use the
helper for UFS(5), tmpfs(5), devfs(5) and msdosfs(5).

Reported by: bde
Reviewed by: bde, trasz
Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# 60c5c866 04-Jun-2014 Konstantin Belousov <kib@FreeBSD.org>

Allow shared locking for the tmpfs vnodes.

Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# 44f1c916 22-Mar-2014 Bryan Drewery <bdrewery@FreeBSD.org>

Rename global cnt to vm_cnt to avoid shadowing.

To reduce the diff struct pcu.cnt field was not renamed, so
PCPU_OP(cnt.field) is still used. pc_cnt and pcpu are also used in
kvm(3) and vmstat(8). The goal was to not affect externally used KPI.

Bump __FreeBSD_version_ in case some out-of-tree module/code relies on the
the global cnt variable.

Exp-run revealed no ports using it directly.

No objection from: arch@
Sponsored by: EMC / Isilon Storage Division


# 504bde01 14-Mar-2014 Bryan Drewery <bdrewery@FreeBSD.org>

Add missing FALLTHROUGH comment in tmpfs_dir_getdents for looking up '.' and
'..'.

Reviewed by: Russell Cattelan
Sponsored by: EMC / Isilon Storage Division
MFC after: 2 weeks


# ac09d109 14-Mar-2014 Bryan Drewery <bdrewery@FreeBSD.org>

Rename cnt to maxcookies and change its use as the condition for when to
lookup cookies to be less obscure.

No functional change.

Since r245115, cnt has not really been needed in tmpfs_dir_getdents(). Keep
it for the MPASS() for now though.

Sponsored by: EMC / Isilon Storage Division
MFC after: 2 weeks


# 62dca316 13-Mar-2014 Bryan Drewery <bdrewery@FreeBSD.org>

Cleanup redundant logic and add some comments to help explain how
it works in lieu of potentially less clear code.

Sponsored by: EMC / Isilon Storage Division
Discussed with: Russell Cattelan


# 3b5f179d 28-Aug-2013 Kenneth D. Merry <ken@FreeBSD.org>

Support storing 7 additional file flags in tmpfs:

UF_SYSTEM, UF_SPARSE, UF_OFFLINE, UF_REPARSE, UF_ARCHIVE, UF_READONLY,
and UF_HIDDEN.

Sort the file flags tmpfs supports alphabetically. tmpfs now
supports the same flags as UFS, with the exception of SF_SNAPSHOT.

Reported by: bdrewery, antoine
Sponsored by: Spectra Logic


# c7aebda8 09-Aug-2013 Attilio Rao <attilio@FreeBSD.org>

The soft and hard busy mechanism rely on the vm object lock to work.
Unify the 2 concept into a real, minimal, sxlock where the shared
acquisition represent the soft busy and the exclusive acquisition
represent the hard busy.
The old VPO_WANTED mechanism becames the hard-path for this new lock
and it becomes per-page rather than per-object.
The vm_object lock becames an interlock for this functionality:
it can be held in both read or write mode.
However, if the vm_object lock is held in read mode while acquiring
or releasing the busy state, the thread owner cannot make any
assumption on the busy state unless it is also busying it.

Also:
- Add a new flag to directly shared busy pages while vm_page_alloc
and vm_page_grab are being executed. This will be very helpful
once these functions happen under a read object lock.
- Move the swapping sleep into its own per-object flag

The KPI is heavilly changed this is why the version is bumped.
It is very likely that some VM ports users will need to change
their own code.

Sponsored by: EMC / Isilon storage division
Discussed with: alc
Reviewed by: jeff, kib
Tested by: gavin, bapt (older version)
Tested by: pho, scottl


# 8239a7a8 05-Aug-2013 Konstantin Belousov <kib@FreeBSD.org>

The tmpfs_alloc_vp() is used to instantiate vnode for the tmpfs node,
in particular, from the tmpfs_lookup VOP method. If LK_NOWAIT is not
specified in the lkflags, the lookup is supposed to return an alive
vnode whenever the underlying node is valid.

Currently, the tmpfs_alloc_vp() returns ENOENT if the vnode attached
to node exists and is being reclaimed. This causes spurious ENOENT
errors from lookup on tmpfs and corresponding random 'No such file'
failures from syscalls working with tmpfs files.

Fix this by waiting for the doomed vnode to be detached from the tmpfs
node if sleepable allocation is requested.

Note that filesystems which use vfs_hash.c, correctly handle the case
due to vfs_hash_get() looping when vget() returns ENOENT for sleepable
requests.

Reported and tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks


# 67b4ed4b 30-May-2013 Konstantin Belousov <kib@FreeBSD.org>

Assert that OBJ_TMPFS flag on the vm object for the tmpfs node is
cleared when the tmpfs node is going away.

Tested by: bdrewery, pho


# 158cc900 02-May-2013 Konstantin Belousov <kib@FreeBSD.org>

For the new regular tmpfs vnode, v_object is initialized before
insmntque() is called. The standard insmntque destructor resets the
vop vector to deadfs one, and calls vgone() on the vnode. As result,
v_object is kept unchanged, which triggers an assertion in the reclaim
code, on instmntque() failure. Also, in this case, OBJ_TMPFS flag on
the backed vm object is not cleared.

Provide the tmpfs insmntque() destructor which properly clears
OBJ_TMPFS flag and resets v_object.

Reported and tested by: pho
Sponsored by: The FreeBSD Foundation


# 6f2af3fc 28-Apr-2013 Konstantin Belousov <kib@FreeBSD.org>

Rework the handling of the tmpfs node backing swap object and tmpfs
vnode v_object to avoid double-buffering. Use the same object both as
the backing store for tmpfs node and as the v_object.

Besides reducing memory use up to 2x times for situation of mapping
files from tmpfs, it also makes tmpfs read and write operations copy
twice bytes less.

VM subsystem was already slightly adapted to tolerate OBJT_SWAP object
as v_object. Now the vm_object_deallocate() is modified to not
reinstantiate OBJ_ONEMAPPING flag and help the VFS to correctly handle
VV_TEXT flag on the last dereference of the tmpfs backing object.

Reviewed by: alc
Tested by: pho, bf
MFC after: 1 month


# b4b2596b 21-Mar-2013 Pawel Jakub Dawidek <pjd@FreeBSD.org>

- Make 'flags' argument to chflags(2), fchflags(2) and lchflags(2) of type
u_long. Before this change it was of type int for syscalls, but prototypes
in sys/stat.h and documentation for chflags(2) and fchflags(2) (but not
for lchflags(2)) stated that it was u_long. Now some related functions
use u_long type for flags (strtofflags(3), fflagstostr(3)).
- Make path argument of type 'const char *' for consistency.

Discussed on: arch
Sponsored by: The FreeBSD Foundation


# 89f6b863 08-Mar-2013 Attilio Rao <attilio@FreeBSD.org>

Switch the vm_object mutex to be a rwlock. This will enable in the
future further optimizations where the vm_object lock will be held
in read mode most of the time the page cache resident pool of pages
are accessed for reading purposes.

The change is mostly mechanical but few notes are reported:
* The KPI changes as follow:
- VM_OBJECT_LOCK() -> VM_OBJECT_WLOCK()
- VM_OBJECT_TRYLOCK() -> VM_OBJECT_TRYWLOCK()
- VM_OBJECT_UNLOCK() -> VM_OBJECT_WUNLOCK()
- VM_OBJECT_LOCK_ASSERT(MA_OWNED) -> VM_OBJECT_ASSERT_WLOCKED()
(in order to avoid visibility of implementation details)
- The read-mode operations are added:
VM_OBJECT_RLOCK(), VM_OBJECT_TRYRLOCK(), VM_OBJECT_RUNLOCK(),
VM_OBJECT_ASSERT_RLOCKED(), VM_OBJECT_ASSERT_LOCKED()
* The vm/vm_pager.h namespace pollution avoidance (forcing requiring
sys/mutex.h in consumers directly to cater its inlining functions
using VM_OBJECT_LOCK()) imposes that all the vm/vm_pager.h
consumers now must include also sys/rwlock.h.
* zfs requires a quite convoluted fix to include FreeBSD rwlocks into
the compat layer because the name clash between FreeBSD and solaris
versions must be avoided.
At this purpose zfs redefines the vm_object locking functions
directly, isolating the FreeBSD components in specific compat stubs.

The KPI results heavilly broken by this commit. Thirdy part ports must
be updated accordingly (I can think off-hand of VirtualBox, for example).

Sponsored by: EMC / Isilon storage division
Reviewed by: jeff
Reviewed by: pjd (ZFS specific review)
Discussed with: alc
Tested by: pho


# 4fd5efe7 06-Jan-2013 Gleb Kurtsou <gleb@FreeBSD.org>

tmpfs: Replace directory entry linked list with RB-Tree.

Use file name hash as a tree key, handle duplicate keys. Both VOP_LOOKUP
and VOP_READDIR operations utilize same tree for search. Directory
entry offset (cookie) is either file name hash or incremental id in case
of hash collisions (duplicate-cookies). Keep sorted per directory list
of duplicate-cookie entries to facilitate cookie number allocation.

Don't fail if previous VOP_READDIR() offset is no longer valid, start
with next dirent instead. Other file system handle it similarly.

Workaround race prone tn_readdir_last[pn] fields update.

Add tmpfs_dir_destroy() to free all dirents.

Set NFS cookies in tmpfs_dir_getdents(). Return EJUSTRETURN from
tmpfs_dir_getdents() instead of hard coded -1.

Mark directory traversal routines static as they are no longer
used outside of tmpfs_subr.c


# 1c771f92 05-Aug-2012 Konstantin Belousov <kib@FreeBSD.org>

After the PHYS_TO_VM_PAGE() function was de-inlined, the main reason
to pull vm_param.h was removed. Other big dependency of vm_page.h on
vm_param.h are PA_LOCK* definitions, which are only needed for
in-kernel code, because modules use KBI-safe functions to lock the
pages.

Stop including vm_param.h into vm_page.h. Include vm_param.h
explicitely for the kernel code which needs it.

Suggested and reviewed by: alc
MFC after: 2 weeks


# fd1062ce 18-Apr-2012 Jaakko Heinonen <jh@FreeBSD.org>

Return EOPNOTSUPP rather than EPERM for the SF_SNAPSHOT flag because
tmpfs doesn't support snapshots.

Suggested by: bde


# 587fdb53 16-Apr-2012 Jaakko Heinonen <jh@FreeBSD.org>

Sync tmpfs_chflags() with the recent changes to UFS:

- Add a check for unsupported file flags.
- Return EPERM when an user without PRIV_VFS_SYSFLAGS privilege attempts
to toggle SF_SETTABLE flags.


# f8439900 15-Apr-2012 Gleb Kurtsou <gleb@FreeBSD.org>

Provide better description for vfs.tmpfs.memory_reserved sysctl.

Suggested by: Anton Yuzhaninov <citrin@citrin.ru>


# da7aa277 07-Apr-2012 Gleb Kurtsou <gleb@FreeBSD.org>

Add reserved memory limit sysctl to tmpfs.

Cleanup availble and used memory functions.
Check if free pages available before allocating new node.

Discussed with: delphij


# db94ad12 14-Mar-2012 Gleb Kurtsou <gleb@FreeBSD.org>

Prevent tmpfs_rename() deadlock in a way similar to UFS

Unlock vnodes and try to lock them one by one. Relookup fvp and tvp.

Approved by: mdf (mentor)


# ca846258 14-Mar-2012 Gleb Kurtsou <gleb@FreeBSD.org>

Don't enforce LK_RETRY to get existing vnode in tmpfs_alloc_vp()

Doomed vnode is hardly of any use here, besides all callers handle error
case. vfs_hash_get() does the same.

Don't mess with vnode holdcount, vget() takes care of it already.

Approved by: mdf (mentor)


# 0b05cac3 15-Jan-2012 Alan Cox <alc@FreeBSD.org>

When tmpfs_write() resets an extended file to its original size after an
error, we want tmpfs_reg_resize() to ignore I/O errors and unconditionally
update the file's size.

Reviewed by: kib
MFC after: 3 weeks


# 93431cb7 14-Jan-2012 Alan Cox <alc@FreeBSD.org>

Neither tmpfs_nocacheread() nor tmpfs_mappedwrite() needs to call
vm_object_pip_{add,subtract}() on the swap object because the swap
object can't be destroyed while the vnode is exclusively locked.
Moreover, even if the swap object could have been destroyed during
tmpfs_nocacheread() and tmpfs_mappedwrite() this code is broken
because vm_object_pip_subtract() does not wake up the sleeping thread
that is trying to destroy the swap object.

Free invalid pages after an I/O error. There is no virtue in keeping
them around in the swap object creating more work for the page daemon.
(I believe that any non-busy page in the swap object will now always
be valid.)

vm_pager_get_pages() does not return a standard errno, so its return
value should not be returned by tmpfs without translation to an errno
value.

There is no reason for the wakeup on vpg in tmpfs_mappedwrite() to
occur with the swap object locked.

Eliminate printf()s from tmpfs_nocacheread() and tmpfs_mappedwrite().
(The swap pager already spam your console if data corruption is
imminent.)

Reviewed by: kib
MFC after: 3 weeks


# 2971897d 08-Jan-2012 Alan Cox <alc@FreeBSD.org>

Correct an error of omission in the implementation of the truncation
operation on POSIX shared memory objects and tmpfs. Previously, neither of
these modules correctly handled the case in which the new size of the object
or file was not a multiple of the page size. Specifically, they did not
handle partial page truncation of data stored on swap. As a result, stale
data might later be returned to an application.

Interestingly, a data inconsistency was less likely to occur under tmpfs
than POSIX shared memory objects. The reason being that a different mistake
by the tmpfs truncation operation helped avoid a data inconsistency. If the
data was still resident in memory in a PG_CACHED page, then the tmpfs
truncation operation would reactivate that page, zero the truncated portion,
and leave the page pinned in memory. More precisely, the benevolent error
was that the truncation operation didn't add the reactivated page to any of
the paging queues, effectively pinning the page. This page would remain
pinned until the file was destroyed or the page was read or written. With
this change, the page is now added to the inactive queue.

Discussed with: jhb
Reviewed by: kib (an earlier version)
MFC after: 3 weeks


# 6bbee8e2 29-Jun-2011 Alan Cox <alc@FreeBSD.org>

Add a new option, OBJPR_NOTMAPPED, to vm_object_page_remove(). Passing this
option to vm_object_page_remove() asserts that the specified range of pages
is not mapped, or more precisely that none of these pages have any managed
mappings. Thus, vm_object_page_remove() need not call pmap_remove_all() on
the pages.

This change not only saves time by eliminating pointless calls to
pmap_remove_all(), but it also eliminates an inconsistency in the use of
pmap_remove_all() versus related functions, like pmap_remove_write(). It
eliminates harmless but pointless calls to pmap_remove_all() that were being
performed on PG_UNMANAGED pages.

Update all of the existing assertions on pmap_remove_all() to reflect this
change.

Reviewed by: kib


# 4673c751 14-Feb-2011 Alan Cox <alc@FreeBSD.org>

Further simplify tmpfs_reg_resize(). Also, update its comments, including
style fixes.


# b10d1d5d 13-Feb-2011 Alan Cox <alc@FreeBSD.org>

Eliminate tn_reg.tn_aobj_pages. Instead, correctly maintain the vm
object's size field. Previously, that field was always zero, even
when the object tn_reg.tn_aobj contained numerous pages.

Apply style fixes to tmpfs_reg_resize().

In collaboration with: kib


# 9fb9c623 20-Jan-2011 Konstantin Belousov <kib@FreeBSD.org>

In tmpfs_readdir(), normalize handling of the directory entries that
either overflow the supplied buffer, or cause uiomove fail.
Do not advance cached de when directory entry was not copied out.
Do not return EOF when no entries could be copied due to first entry
too large for supplied buffer, signal EINVAL instead.

Reported by: Beat G?tzi <beat chruetertee ch>
MFC after: 1 week


# a7d5f7eb 19-Oct-2010 Jamie Gritton <jamie@FreeBSD.org>

A new jail(8) with a configuration file, to replace the work currently done
by /etc/rc.d/jail.


# 99d57a6b 21-Aug-2010 Ed Schouten <ed@FreeBSD.org>

Add support for whiteouts on tmpfs.

Right now unionfs only allows filesystems to be mounted on top of
another if it supports whiteouts. Even though I have sent a patch to
daichi@ to let unionfs work without it, we'd better also add support for
whiteouts to tmpfs.

This patch implements .vop_whiteout and makes necessary changes to
lookup() and readdir() to take them into account. We must also make sure
that when adding or removing a file, we honour the componentname's
DOWHITEOUT and ISWHITEOUT, to prevent duplicate filenames.

MFC after: 1 month


# 189ee6be 20-Jan-2010 Jaakko Heinonen <jh@FreeBSD.org>

- Change the type of nodes_max to u_int and use "%u" format string to
convert its value. [1]
- Set default tm_nodes_max to min(pages + 3, UINT32_MAX). It's more
reasonable than the old four nodes per page (with page size 4096) because
non-empty regular files always use at least one page. This fixes possible
overflow in the calculation. [2]
- Don't allow more than tm_nodes_max nodes allocated in tmpfs_alloc_node().

PR: kern/138367
Suggested by: bde [1], Gleb Kurtsou [2]
Approved by: trasz (mentor)


# 4afcae9b 26-Oct-2009 Alan Cox <alc@FreeBSD.org>

There is no need to "busy" a page when the object is locked for the duration
of the operation.


# 82cf92d4 11-Oct-2009 Xin LI <delphij@FreeBSD.org>

Add locking around access to parent node, and bail out when the parent
node is already freed rather than panicking the system.

PR: kern/122038
Submitted by: gk
Tested by: pho
MFC after: 1 week


# 3364c323 23-Jun-2009 Konstantin Belousov <kib@FreeBSD.org>

Implement global and per-uid accounting of the anonymous memory. Add
rlimit RLIMIT_SWAP that limits the amount of swap that may be reserved
for the uid.

The accounting information (charge) is associated with either map entry,
or vm object backing the entry, assuming the object is the first one
in the shadow chain and entry does not require COW. Charge is moved
from entry to object on allocation of the object, e.g. during the mmap,
assuming the object is allocated, or on the first page fault on the
entry. It moves back to the entry on forks due to COW setup.

The per-entry granularity of accounting makes the charge process fair
for processes that change uid during lifetime, and decrements charge
for proper uid when region is unmapped.

The interface of vm_pager_allocate(9) is extended by adding struct ucred *,
that is used to charge appropriate uid when allocation if performed by
kernel, e.g. md(4).

Several syscalls, among them is fork(2), may now return ENOMEM when
global or per-uid limits are enforced.

In collaboration with: pho
Reviewed by: alc
Approved by: re (kensmith)


# dfd233ed 11-May-2009 Attilio Rao <attilio@FreeBSD.org>

Remove the thread argument from the FSD (File-System Dependent) parts of
the VFS. Now all the VFS_* functions and relating parts don't want the
context as long as it always refers to curthread.

In some points, in particular when dealing with VOPs and functions living
in the same namespace (eg. vflush) which still need to be converted,
pass curthread explicitly in order to retain the old behaviour.
Such loose ends will be fixed ASAP.

While here fix a bug: now, UFS_EXTATTR can be compiled alone without the
UFS_EXTATTR_AUTOSTART option.

VFS KPI is heavilly changed by this commit so thirdy parts modules needs
to be recompiled. Bump __FreeBSD_version in order to signal such
situation.


# e3c7e753 08-Feb-2009 Konstantin Belousov <kib@FreeBSD.org>

Lookup up the directory entry for the tmpfs node that are deleted by
both node pointer and name component. This does the right thing for
hardlinks to the same node in the same directory.

Submitted by: Yoshihiro Ota <ota j email ne jp>
PR: kern/131356
MFC after: 2 weeks


# 7956d34b 31-Jan-2009 Bjoern A. Zeeb <bz@FreeBSD.org>

Remove unused local variables.

Submitted by: Christoph Mallon christoph.mallon@gmx.de
Reviewed by: kib
MFC after: 2 weeks


# d7f03759 19-Oct-2008 Ulf Lilleengen <lulf@FreeBSD.org>

- Import the HEAD csup code which is the basis for the cvsmode work.


# ae72afe0 23-Sep-2008 David E. O'Brien <obrien@FreeBSD.org>

The kernel implemented 'memcmp' is an alias for 'bcmp'. However, memcmp
and bcmp are not the same thing. 'man bcmp' states that the return is
"non-zero" if the two byte strings are not identical. Where as,
'man memcmp' states that the return is the "difference between the
first two differing bytes (treated as unsigned char values" if the
two byte strings are not identical.

So provide a proper memcmp(9), but it is a C implementation not a tuned
assembly implementation. Therefore bcmp(9) should be preferred over memcmp(9).


# e08d5567 03-Sep-2008 Xin LI <delphij@FreeBSD.org>

Reflect license change of NetBSD code.

Obtained from: NetBSD
MFC after: 3 days


# a0b454dc 15-Jun-2008 Konstantin Belousov <kib@FreeBSD.org>

Do not redo the vnode tear-down work already done by insmntque() when
vnode cannot be put on the vnode list for mount.

Reported and tested by: marck
Guilty party: me
MFC after: 3 days


# 81c794f9 25-Feb-2008 Attilio Rao <attilio@FreeBSD.org>

Axe the 'thread' argument from VOP_ISLOCKED() and lockstatus() as it is
always curthread.

As KPI gets broken by this patch, manpages and __FreeBSD_version will be
updated by further commits.

Tested by: Andrea Barberio <insomniac at slackware dot it>


# cb05b60a 09-Jan-2008 Attilio Rao <attilio@FreeBSD.org>

vn_lock() is currently only used with the 'curthread' passed as argument.
Remove this argument and pass curthread directly to underlying
VOP_LOCK1() VFS method. This modify makes the code cleaner and in
particular remove an annoying dependence helping next lockmgr() cleanup.
KPI results, obviously, changed.

Manpage and FreeBSD_version will be updated through further commits.

As a side note, would be valuable to say that next commits will address
a similar cleanup about VFS methods, in particular vop_lock1 and
vop_unlock.

Tested by: Diego Sardina <siarodx at gmail dot com>,
Andrea Di Pasquale <whyx dot it at gmail dot com>


# 1fa8f5f0 06-Dec-2007 Xin LI <delphij@FreeBSD.org>

Turn MPASS(0) into panic with more obvious reason why the assertion
is failed.


# 7871e52b 17-Nov-2007 Xin LI <delphij@FreeBSD.org>

MFp4: Several fixes to tmpfs which makes it to survive from pho@'s
strees2 suite, to quote his letter, this change:

1. It removes the tn_lookup_dirent stuff. I think this cannot be fixed,
because nothing protects vnode/tmpfs node between lookup is done, and
actual operation is performed, in the case the vnode lock is dropped.
At least, this is the case with the from vnode for rename.

For now, we do the linear lookup in the parent node. This has its own
drawbacks. Not mentioning speed (that could be fixed by using hash), the
real problem is the situation where several hardlinks exist in the dvp.
But, I think this is fixable.

2. The patch restores the VV_ROOT flag on the root vnode after it became
reclaimed and allocated again. This fixes MPASS assertion at the start
of the tmpfs_lookup() reported by many.

Submitted by: kib


# ad3638ee 10-Aug-2007 Xin LI <delphij@FreeBSD.org>

MFp4:
- LK_RETRY prohibits vget() and vn_lock() to return error.
Remove associated code. [1]
- Properly use vhold() and vdrop() instead of their unlocked
versions, we are guaranteed to have the vnode's interlock
unheld. [1]
- Fix a pseudo-infinite loop caused by 64/32-bit arithmetic
with the same way used in modern NetBSD versions. [2]
- Reorganize tmpfs_readdir to reduce duplicated code.

Submitted by: kib [1]
Obtained from: NetBSD [2]
Approved by: re (tmpfs blanket)


# 0ae6383d 09-Aug-2007 Xin LI <delphij@FreeBSD.org>

MFp4:

- Respect cnflag and don't lock vnode always as LK_EXCLUSIVE [1]
- Properly lock around tn_vnode to avoid NULL deference
- Be more careful handling vnodes (*)

(*) This is a WIP
[1] by pjd via howardsu

Thanks kib@ for his valuable VFS related comments.

Tested with: fsx, fstest, tmpfs regression test set
Found by: pho's stress2 suite
Approved by: re (tmpfs blanket)


# fb755714 03-Aug-2007 Xin LI <delphij@FreeBSD.org>

MFp4 - Refine locking to eliminate some potential race/panics:

- Copy before testing a pointer. This closes a race window.
- Use msleep with the node interlock instead of tsleep.
- Do proper locking around access to tn_vpstate.
- Assert vnode VOP lock for dir_{atta,de}tach to capture
inconsistent locking.

Suggested by: kib
Submitted by: delphij
Reviewed by: Howard Su
Approved by: re (tmpfs blanket)


# 8d9a89a3 11-Jul-2007 Xin LI <delphij@FreeBSD.org>

MFp4: Make use of the kernel unit number allocation facility
for tmpfs nodes.

Submitted by: Mingyan Guo <guomingyan gmail com>
Approved by: re (tmpfs blanket)


# 1df86a32 08-Jul-2007 Xin LI <delphij@FreeBSD.org>

MFp4:
- Plug memory leak.
- Respect underlying vnode's properties rather than assuming that
the user want root:wheel + 0755. Useful for using tmpfs(5) for
/tmp.
- Use roundup2 and howmany macros instead of rolling our own version.
- Try to fix fsx -W -R foo case.
- Instead of blindly zeroing a page, determine whether we need a pagein
order to prevent data corruption.
- Fix several bugs reported by Coverity.

Submitted by: Mingyan Guo <guomingyan gmail com>, Howard Su, delphij
Coverity ID: CID 2550, 2551, 2552, 2557
Approved by: re (tmpfs blanket)


# 9b258fca 28-Jun-2007 Xin LI <delphij@FreeBSD.org>

MFp4:

- Remove unnecessary NULL checks after M_WAITOK allocations.
- Use VOP_ACCESS instead of hand-rolled suser_cred()
calls. [1]
- Use malloc(9) KPI to allocate memory for string. The
optimization taken from NetBSD is not valid for FreeBSD
because our malloc(9) already act that way. [2]

Requested by: rwatson [1]
Submitted by: Howard Su [2]
Approved by: re (tmpfs blanket)


# a321f489 27-Jun-2007 Xin LI <delphij@FreeBSD.org>

Space/style cleanups after last set of commits.

Approved by: re (tmpfs blanket)


# 8d5892ee 27-Jun-2007 Xin LI <delphij@FreeBSD.org>

Use vfs_timestamp instead of nanotime when obtaining
a timestamp for use with timekeeping.

Approved by: re (tmpfs blanket)


# 7adb1776 25-Jun-2007 Xin LI <delphij@FreeBSD.org>

MFp4: Several clean-ups and improvements over tmpfs:

- Remove tmpfs_zone_xxx KPI, the uma(9) wrapper, since
they does not bring any value now.
- Use |= instead of = when applying VV_ROOT flag.
- Remove tm_avariable_nodes list. Use uma to hold the
released nodes.
- init/destory interlock mutex of node when init/fini
instead of ctor/dtor.
- Change memory computing using u_int to fix negative
value in 2G mem machine.
- Remove unnecessary bzero's
- Rely uma logic to make file id allocation harder to
guess.
- Fix some unsigned/signed related things. Make sure
we respect -o size=xxxx
- Use wire instead of hold a page.
- Pass allocate_zero to obtain zeroed pages upon first
use.

Submitted by: Howard Su
Approved by: re (tmpfs blanket, kensmith)


# b746bf08 18-Jun-2007 Xin LI <delphij@FreeBSD.org>

Use vfs_timestamp() instead of nanotime() - make it up to
the user to make decisions about how detail they wanted
timestamps to have.


# 21cf0e39 17-Jun-2007 Xin LI <delphij@FreeBSD.org>

MFp4: fix two locking problems:

- Hold TMPFS_LOCK while updating tm_pages_used.
- Hold vm page while doing uiomove.

This will hopefully fix all known panics.

Submitted by: Howard Su


# d1fa59e9 15-Jun-2007 Xin LI <delphij@FreeBSD.org>

MFp4: Add tmpfs, an efficient memory file system.

Please note that, this is currently considered as an
experimental feature so there could be some rough
edges. Consult http://wiki.freebsd.org/TMPFS for
more information.

For now, connect tmpfs to build on i386 and amd64
architectures only. Please let us know if you have
success with other platforms.

This work was developed by Julio M. Merino Vidal
for NetBSD as a SoC project; Rohit Jalan ported it
from NetBSD to FreeBSD. Howard Su and Glen Leeder
are worked on it to continue this effort.

Obtained from: NetBSD via p4
Submitted by: Howard Su (with some minor changes)
Approved by: re (kensmith)