History log of /freebsd-10.0-release/sys/fs/tmpfs/tmpfs_subr.c
Revision Date Author Comments
# 259065 07-Dec-2013 gjb

- Copy stable/10 (r259064) to releng/10.0 as part of the
10.0-RELEASE cycle.
- Update __FreeBSD_version [1]
- Set branch name to -RC1

[1] 10.0-CURRENT __FreeBSD_version value ended at '55', so
start releng/10.0 at '100' so the branch is started with
a value ending in zero.

Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation

# 256281 10-Oct-2013 gjb

Copy head (r256279) to stable/10 as part of the 10.0-RELEASE cycle.

Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation


# 255008 28-Aug-2013 ken

Support storing 7 additional file flags in tmpfs:

UF_SYSTEM, UF_SPARSE, UF_OFFLINE, UF_REPARSE, UF_ARCHIVE, UF_READONLY,
and UF_HIDDEN.

Sort the file flags tmpfs supports alphabetically. tmpfs now
supports the same flags as UFS, with the exception of SF_SNAPSHOT.

Reported by: bdrewery, antoine
Sponsored by: Spectra Logic


# 254138 09-Aug-2013 attilio

The soft and hard busy mechanisms rely on the vm object lock to work.
Unify the two concepts into a real, minimal sxlock where the shared
acquisition represents the soft busy and the exclusive acquisition
represents the hard busy.
The old VPO_WANTED mechanism becomes the hard path for this new lock
and it becomes per-page rather than per-object.
The vm_object lock becomes an interlock for this functionality:
it can be held in either read or write mode.
However, if the vm_object lock is held in read mode while acquiring
or releasing the busy state, the thread owner cannot make any
assumption on the busy state unless it is also busying it.
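
A minimal user-space sketch of the idea, assuming C11 atomics: one 32-bit busy word per page, where the low bits count shared (soft-busy) holders and one bit marks the exclusive (hard-busy) holder. The names toy_page and BUSY_EXCLUSIVE and the try-only interface are hypothetical; the kernel's own encoding, sleeping path and object-lock interlock are not modeled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical busy-word layout (not the kernel's actual encoding). */
#define BUSY_EXCLUSIVE 0x80000000u /* set while the page is hard-busied */
#define BUSY_SHARERS   0x7fffffffu /* number of soft-busy holders */

typedef struct {
    _Atomic uint32_t busy;
} toy_page;

/* Shared acquire (soft busy): fails if the page is hard-busied. */
static bool
toy_page_trysbusy(toy_page *m)
{
    uint32_t old = atomic_load(&m->busy);

    while ((old & BUSY_EXCLUSIVE) == 0) {
        /* On failure the CAS reloads 'old', so the loop re-checks it. */
        if (atomic_compare_exchange_weak(&m->busy, &old, old + 1))
            return true;
    }
    return false;
}

/* Shared release: drop one soft-busy reference. */
static void
toy_page_sunbusy(toy_page *m)
{
    uint32_t prev = atomic_fetch_sub(&m->busy, 1);

    assert((prev & BUSY_EXCLUSIVE) == 0 && (prev & BUSY_SHARERS) != 0);
    (void)prev;
}

/* Exclusive acquire (hard busy): succeeds only if nobody holds the page. */
static bool
toy_page_tryxbusy(toy_page *m)
{
    uint32_t expected = 0;

    return atomic_compare_exchange_strong(&m->busy, &expected,
        BUSY_EXCLUSIVE);
}

/* Exclusive release. */
static void
toy_page_xunbusy(toy_page *m)
{
    uint32_t prev = atomic_exchange(&m->busy, 0);

    assert(prev == BUSY_EXCLUSIVE);
    (void)prev;
}
```

The sketch only offers try operations; a blocking acquire (the VPO_WANTED-style hard path) would additionally need a wait queue.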

Also:
- Add a new flag to directly share-busy pages while vm_page_alloc
and vm_page_grab are being executed. This will be very helpful
once these functions run under a read object lock.
- Move the swapping sleep into its own per-object flag

The KPI is heavily changed, which is why the version is bumped.
It is very likely that some VM ports users will need to change
their own code.

Sponsored by: EMC / Isilon storage division
Discussed with: alc
Reviewed by: jeff, kib
Tested by: gavin, bapt (older version)
Tested by: pho, scottl


# 253967 05-Aug-2013 kib

The tmpfs_alloc_vp() is used to instantiate a vnode for the tmpfs node,
in particular, from the tmpfs_lookup VOP method. If LK_NOWAIT is not
specified in the lkflags, the lookup is supposed to return a live
vnode whenever the underlying node is valid.

Currently, the tmpfs_alloc_vp() returns ENOENT if the vnode attached
to node exists and is being reclaimed. This causes spurious ENOENT
errors from lookup on tmpfs and corresponding random 'No such file'
failures from syscalls working with tmpfs files.

Fix this by waiting for the doomed vnode to be detached from the tmpfs
node if sleepable allocation is requested.

Note that filesystems which use vfs_hash.c correctly handle this case,
because vfs_hash_get() loops when vget() returns ENOENT for sleepable
requests.

Reported and tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks


# 251149 30-May-2013 kib

Assert that OBJ_TMPFS flag on the vm object for the tmpfs node is
cleared when the tmpfs node is going away.

Tested by: bdrewery, pho


# 250189 02-May-2013 kib

For the new regular tmpfs vnode, v_object is initialized before
insmntque() is called. The standard insmntque destructor resets the
vop vector to the deadfs one and calls vgone() on the vnode. As a result,
v_object is kept unchanged, which triggers an assertion in the reclaim
code on insmntque() failure. Also, in this case, the OBJ_TMPFS flag on
the backing vm object is not cleared.

Provide the tmpfs insmntque() destructor which properly clears
OBJ_TMPFS flag and resets v_object.

Reported and tested by: pho
Sponsored by: The FreeBSD Foundation


# 250030 28-Apr-2013 kib

Rework the handling of the tmpfs node backing swap object and tmpfs
vnode v_object to avoid double-buffering. Use the same object both as
the backing store for tmpfs node and as the v_object.

Besides reducing memory use by up to 2x when files from tmpfs are
mapped, it also halves the number of bytes copied by tmpfs read and
write operations.

The VM subsystem was already slightly adapted to tolerate an OBJT_SWAP
object as v_object. Now vm_object_deallocate() is modified to not
reinstate the OBJ_ONEMAPPING flag and to help the VFS correctly handle
the VV_TEXT flag on the last dereference of the tmpfs backing object.

Reviewed by: alc
Tested by: pho, bf
MFC after: 1 month


# 248597 21-Mar-2013 pjd

- Make 'flags' argument to chflags(2), fchflags(2) and lchflags(2) of type
u_long. Before this change it was of type int for syscalls, but prototypes
in sys/stat.h and documentation for chflags(2) and fchflags(2) (but not
for lchflags(2)) stated that it was u_long. Now some related functions
use u_long type for flags (strtofflags(3), fflagstostr(3)).
- Make path argument of type 'const char *' for consistency.

Discussed on: arch
Sponsored by: The FreeBSD Foundation


# 248084 09-Mar-2013 attilio

Switch the vm_object mutex to be a rwlock. This will enable further
optimizations in the future, where the vm_object lock will be held
in read mode most of the time that the resident pool of pages in the
page cache is accessed for reading purposes.

The change is mostly mechanical, but a few notes follow:
* The KPI changes as follows:
- VM_OBJECT_LOCK() -> VM_OBJECT_WLOCK()
- VM_OBJECT_TRYLOCK() -> VM_OBJECT_TRYWLOCK()
- VM_OBJECT_UNLOCK() -> VM_OBJECT_WUNLOCK()
- VM_OBJECT_LOCK_ASSERT(MA_OWNED) -> VM_OBJECT_ASSERT_WLOCKED()
(in order to avoid visibility of implementation details)
- The read-mode operations are added:
VM_OBJECT_RLOCK(), VM_OBJECT_TRYRLOCK(), VM_OBJECT_RUNLOCK(),
VM_OBJECT_ASSERT_RLOCKED(), VM_OBJECT_ASSERT_LOCKED()
* To avoid namespace pollution, vm/vm_pager.h does not include its
lock headers itself; consumers already had to include sys/mutex.h
directly to cater for its inline functions using VM_OBJECT_LOCK().
As a result, all vm/vm_pager.h consumers must now also include
sys/rwlock.h.
* zfs requires a rather convoluted fix to include FreeBSD rwlocks in
the compat layer, because the name clash between the FreeBSD and
Solaris versions must be avoided.
For this purpose, zfs redefines the vm_object locking functions
directly, isolating the FreeBSD components in specific compat stubs.

The KPI is heavily broken by this commit. Third-party ports must
be updated accordingly (I can think offhand of VirtualBox, for example).

Sponsored by: EMC / Isilon storage division
Reviewed by: jeff
Reviewed by: pjd (ZFS specific review)
Discussed with: alc
Tested by: pho


# 245115 06-Jan-2013 gleb

tmpfs: Replace directory entry linked list with RB-Tree.

Use the file name hash as the tree key, and handle duplicate keys. Both
VOP_LOOKUP and VOP_READDIR operations use the same tree for search. The
directory entry offset (cookie) is either the file name hash or, in case
of hash collisions (duplicate cookies), an incremental id. Keep a sorted
per-directory list of duplicate-cookie entries to facilitate cookie
number allocation.
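
A rough sketch of the cookie scheme, assuming a 32-bit cookie space split into a hash range and a duplicate range. FNV-1a stands in for whatever hash tmpfs actually uses, and the names name_cookie, dup_cookie_alloc, the mask values and the fixed-size list are all hypothetical. Keeping the duplicate list sorted lets allocation find the first unused id in a single pass.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cookie layout: low range for hashes, high range for dups. */
#define COOKIE_HASH_MASK 0x7fffffffu
#define COOKIE_DUP_MIN   0x80000000u
#define MAX_DUPS         64

/* FNV-1a, an assumed stand-in for the real name hash. */
static uint32_t
name_cookie(const char *name, size_t len)
{
    uint32_t h = 2166136261u;

    for (size_t i = 0; i < len; i++) {
        h ^= (unsigned char)name[i];
        h *= 16777619u;
    }
    return h & COOKIE_HASH_MASK;
}

/* Sorted list of duplicate cookies in use in one directory. */
struct dup_list {
    uint32_t cookie[MAX_DUPS];
    size_t n;
};

/* Allocate the smallest unused dup cookie; returns 0 if the list is full. */
static uint32_t
dup_cookie_alloc(struct dup_list *dl)
{
    uint32_t want = COOKIE_DUP_MIN;
    size_t i;

    if (dl->n == MAX_DUPS)
        return 0;
    /* Walk the sorted list until the first gap. */
    for (i = 0; i < dl->n && dl->cookie[i] == want; i++)
        want++;
    /* Shift the tail up and insert 'want' at index i to stay sorted. */
    for (size_t j = dl->n; j > i; j--)
        dl->cookie[j] = dl->cookie[j - 1];
    dl->cookie[i] = want;
    dl->n++;
    return want;
}

/* Release a dup cookie so a later allocation can reuse it. */
static void
dup_cookie_free(struct dup_list *dl, uint32_t c)
{
    for (size_t i = 0; i < dl->n; i++) {
        if (dl->cookie[i] == c) {
            for (; i + 1 < dl->n; i++)
                dl->cookie[i] = dl->cookie[i + 1];
            dl->n--;
            return;
        }
    }
}
```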

Don't fail if the previous VOP_READDIR() offset is no longer valid; start
with the next dirent instead. Other file systems handle it similarly.
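
Resuming at the next entry can be sketched as a lower-bound search over entries sorted by cookie, assuming the saved readdir offset holds the cookie of the next entry to return (an assumption, not taken from the tmpfs source). A sorted array stands in for the RB-tree, and readdir_resume_index() is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return the index of the first entry whose cookie is >= offset.
 * If the entry with that exact cookie was removed, this yields the
 * next entry instead of an error; n means nothing is left (EOF).
 */
static size_t
readdir_resume_index(const uint32_t *cookies, size_t n, uint32_t offset)
{
    size_t lo = 0, hi = n;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (cookies[mid] < offset)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}
```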

Work around the race-prone update of the tn_readdir_last[pn] fields.

Add tmpfs_dir_destroy() to free all dirents.

Set NFS cookies in tmpfs_dir_getdents(). Return EJUSTRETURN from
tmpfs_dir_getdents() instead of hard coded -1.

Mark directory traversal routines static, as they are no longer
used outside of tmpfs_subr.c.


# 239065 05-Aug-2012 kib

After the PHYS_TO_VM_PAGE() function was de-inlined, the main reason
to pull in vm_param.h was removed. The other big dependency of vm_page.h
on vm_param.h is the PA_LOCK* definitions, which are only needed for
in-kernel code, because modules use KBI-safe functions to lock the
pages.

Stop including vm_param.h into vm_page.h. Include vm_param.h
explicitly for the kernel code which needs it.

Suggested and reviewed by: alc
MFC after: 2 weeks


# 234422 18-Apr-2012 jh

Return EOPNOTSUPP rather than EPERM for the SF_SNAPSHOT flag because
tmpfs doesn't support snapshots.

Suggested by: bde


# 234347 16-Apr-2012 jh

Sync tmpfs_chflags() with the recent changes to UFS:

- Add a check for unsupported file flags.
- Return EPERM when a user without PRIV_VFS_SYSFLAGS privilege attempts
to toggle SF_SETTABLE flags.
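
The order of the two checks can be sketched as follows. The X_* flag values are hypothetical stand-ins (the real UF_* and SF_* bits come from FreeBSD's <sys/stat.h>), and check_chflags() is not the tmpfs_chflags() signature. Checking for unsupported flags first means SF_SNAPSHOT yields EOPNOTSUPP rather than EPERM, which matches the r234422 change above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical flag bits; real values live in FreeBSD's <sys/stat.h>. */
#define X_UF_NODUMP    0x00000001ul
#define X_UF_IMMUTABLE 0x00000002ul
#define X_SF_IMMUTABLE 0x00020000ul
#define X_SF_SNAPSHOT  0x00200000ul
#define X_SF_SETTABLE  0xffff0000ul /* "system" flags need privilege */

/* Flags this toy filesystem accepts (SF_SNAPSHOT deliberately absent). */
#define X_SUPPORTED (X_UF_NODUMP | X_UF_IMMUTABLE | X_SF_IMMUTABLE)

static int
check_chflags(unsigned long oldflags, unsigned long newflags,
    bool has_sysflags_priv)
{
    /* 1. Reject flags the filesystem cannot store at all. */
    if ((newflags & ~X_SUPPORTED) != 0)
        return EOPNOTSUPP;
    /* 2. Toggling any SF_SETTABLE bit requires the privilege. */
    if (!has_sysflags_priv && ((oldflags ^ newflags) & X_SF_SETTABLE) != 0)
        return EPERM;
    return 0;
}
```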


# 234325 15-Apr-2012 gleb

Provide better description for vfs.tmpfs.memory_reserved sysctl.

Suggested by: Anton Yuzhaninov <citrin@citrin.ru>


# 233998 07-Apr-2012 gleb

Add reserved memory limit sysctl to tmpfs.

Clean up the available and used memory functions.
Check whether free pages are available before allocating a new node.

Discussed with: delphij


# 232960 14-Mar-2012 gleb

Prevent tmpfs_rename() deadlock in a way similar to UFS

Unlock vnodes and try to lock them one by one. Relookup fvp and tvp.

Approved by: mdf (mentor)


# 232959 14-Mar-2012 gleb

Don't enforce LK_RETRY to get existing vnode in tmpfs_alloc_vp()

A doomed vnode is hardly of any use here; besides, all callers handle the
error case. vfs_hash_get() does the same.

Don't mess with vnode holdcount, vget() takes care of it already.

Approved by: mdf (mentor)


# 230180 15-Jan-2012 alc

When tmpfs_write() resets an extended file to its original size after an
error, we want tmpfs_reg_resize() to ignore I/O errors and unconditionally
update the file's size.

Reviewed by: kib
MFC after: 3 weeks


# 230120 14-Jan-2012 alc

Neither tmpfs_nocacheread() nor tmpfs_mappedwrite() needs to call
vm_object_pip_{add,subtract}() on the swap object because the swap
object can't be destroyed while the vnode is exclusively locked.
Moreover, even if the swap object could have been destroyed during
tmpfs_nocacheread() and tmpfs_mappedwrite() this code is broken
because vm_object_pip_subtract() does not wake up the sleeping thread
that is trying to destroy the swap object.

Free invalid pages after an I/O error. There is no virtue in keeping
them around in the swap object creating more work for the page daemon.
(I believe that any non-busy page in the swap object will now always
be valid.)

vm_pager_get_pages() does not return a standard errno, so its return
value should not be returned by tmpfs without translation to an errno
value.
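
A sketch of the translation, assuming stand-in status codes named after the VM_PAGER_* values (the enum below is hypothetical). Mapping every non-OK status to EIO is an assumption for illustration, not necessarily what tmpfs does.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins for the pager status codes. */
enum pager_status {
    PAGER_OK, PAGER_BAD, PAGER_FAIL, PAGER_PEND, PAGER_ERROR, PAGER_AGAIN
};

/* Convert a pager status into an errno value suitable for a VOP caller. */
static int
pager_status_to_errno(enum pager_status rv)
{
    switch (rv) {
    case PAGER_OK:
        return 0;
    default:
        return EIO; /* never leak a raw pager status to userland */
    }
}
```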

There is no reason for the wakeup on vpg in tmpfs_mappedwrite() to
occur with the swap object locked.

Eliminate printf()s from tmpfs_nocacheread() and tmpfs_mappedwrite().
(The swap pager already spams your console if data corruption is
imminent.)

Reviewed by: kib
MFC after: 3 weeks


# 229821 08-Jan-2012 alc

Correct an error of omission in the implementation of the truncation
operation on POSIX shared memory objects and tmpfs. Previously, neither of
these modules correctly handled the case in which the new size of the object
or file was not a multiple of the page size. Specifically, they did not
handle partial page truncation of data stored on swap. As a result, stale
data might later be returned to an application.
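
Partial-page truncation amounts to zeroing the bytes of the last surviving page beyond the new size. This sketch models the file as a flat in-memory array of page_size-byte pages; zero_partial_page() is hypothetical, and the real code would also have to bring the page in from swap and mark it dirty, which is omitted here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Zero the tail of the page containing offset 'newsize', if newsize is
 * not page aligned. The caller guarantees that page exists in 'pages'.
 */
static void
zero_partial_page(unsigned char *pages, size_t page_size, size_t newsize)
{
    size_t base = newsize % page_size; /* bytes kept in the last page */

    if (base != 0) {
        unsigned char *last = pages + (newsize - base); /* page start */

        memset(last + base, 0, page_size - base);
    }
}
```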

Interestingly, a data inconsistency was less likely to occur under tmpfs
than with POSIX shared memory objects, because a different mistake
by the tmpfs truncation operation helped avoid a data inconsistency. If the
data was still resident in memory in a PG_CACHED page, then the tmpfs
truncation operation would reactivate that page, zero the truncated portion,
and leave the page pinned in memory. More precisely, the benevolent error
was that the truncation operation didn't add the reactivated page to any of
the paging queues, effectively pinning the page. This page would remain
pinned until the file was destroyed or the page was read or written. With
this change, the page is now added to the inactive queue.

Discussed with: jhb
Reviewed by: kib (an earlier version)
MFC after: 3 weeks


# 223677 29-Jun-2011 alc

Add a new option, OBJPR_NOTMAPPED, to vm_object_page_remove(). Passing this
option to vm_object_page_remove() asserts that the specified range of pages
is not mapped, or more precisely that none of these pages have any managed
mappings. Thus, vm_object_page_remove() need not call pmap_remove_all() on
the pages.

This change not only saves time by eliminating pointless calls to
pmap_remove_all(), but it also eliminates an inconsistency in the use of
pmap_remove_all() versus related functions, like pmap_remove_write(). It
eliminates harmless but pointless calls to pmap_remove_all() that were being
performed on PG_UNMANAGED pages.

Update all of the existing assertions on pmap_remove_all() to reflect this
change.

Reviewed by: kib


# 218681 14-Feb-2011 alc

Further simplify tmpfs_reg_resize(). Also, update its comments, including
style fixes.


# 218640 13-Feb-2011 alc

Eliminate tn_reg.tn_aobj_pages. Instead, correctly maintain the vm
object's size field. Previously, that field was always zero, even
when the object tn_reg.tn_aobj contained numerous pages.

Apply style fixes to tmpfs_reg_resize().

In collaboration with: kib


# 217633 20-Jan-2011 kib

In tmpfs_readdir(), normalize handling of the directory entries that
either overflow the supplied buffer or cause uiomove to fail.
Do not advance the cached de when a directory entry was not copied out.
Do not return EOF when no entries could be copied because the first entry
is too large for the supplied buffer; signal EINVAL instead.

Reported by: Beat Gätzi <beat chruetertee ch>
MFC after: 1 week


# 211598 22-Aug-2010 ed

Add support for whiteouts on tmpfs.

Right now unionfs only allows a filesystem to be mounted on top of
another if it supports whiteouts. Even though I have sent a patch to
daichi@ to let unionfs work without it, we'd better also add support for
whiteouts to tmpfs.

This patch implements .vop_whiteout and makes necessary changes to
lookup() and readdir() to take them into account. We must also make sure
that when adding or removing a file, we honour the componentname's
DOWHITEOUT and ISWHITEOUT, to prevent duplicate filenames.

MFC after: 1 month


# 202708 20-Jan-2010 jh

- Change the type of nodes_max to u_int and use "%u" format string to
convert its value. [1]
- Set default tm_nodes_max to min(pages + 3, UINT32_MAX). It's more
reasonable than the old four nodes per page (with page size 4096) because
non-empty regular files always use at least one page. This fixes possible
overflow in the calculation. [2]
- Don't allow more than tm_nodes_max nodes allocated in tmpfs_alloc_node().
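
A sketch of the default limit and the allocation check, computed in 64-bit arithmetic so that pages + 3 cannot overflow before clamping. The function names are hypothetical, and unsigned int is assumed to be 32 bits wide as on FreeBSD's supported platforms.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Default node limit: min(pages + 3, UINT32_MAX). */
static unsigned int
default_nodes_max(uint64_t pages)
{
    uint64_t n = pages + 3;

    if (n < pages || n > UINT32_MAX) /* wrapped, or too big for u_int */
        return UINT32_MAX;
    return (unsigned int)n;
}

/* Refuse a new node once the limit has been reached. */
static bool
may_alloc_node(unsigned int nodes_inuse, unsigned int nodes_max)
{
    return nodes_inuse < nodes_max;
}
```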

PR: kern/138367
Suggested by: bde [1], Gleb Kurtsou [2]
Approved by: trasz (mentor)


# 198494 26-Oct-2009 alc

There is no need to "busy" a page when the object is locked for the duration
of the operation.


# 197953 11-Oct-2009 delphij

Add locking around access to parent node, and bail out when the parent
node is already freed rather than panicking the system.

PR: kern/122038
Submitted by: gk
Tested by: pho
MFC after: 1 week


# 194766 23-Jun-2009 kib

Implement global and per-uid accounting of the anonymous memory. Add
rlimit RLIMIT_SWAP that limits the amount of swap that may be reserved
for the uid.

The accounting information (charge) is associated with either map entry,
or vm object backing the entry, assuming the object is the first one
in the shadow chain and entry does not require COW. Charge is moved
from entry to object on allocation of the object, e.g. during the mmap,
assuming the object is allocated, or on the first page fault on the
entry. It moves back to the entry on forks due to COW setup.

The per-entry granularity of accounting makes the charge process fair
for processes that change uid during their lifetime, and ensures the
charge is decremented for the proper uid when the region is unmapped.

The interface of vm_pager_allocate(9) is extended by adding struct ucred *,
which is used to charge the appropriate uid when the allocation is
performed by the kernel, e.g. md(4).

Several syscalls, among them fork(2), may now return ENOMEM when
global or per-uid limits are enforced.
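
The reservation rule can be modeled as two counters checked together: a request fails (and the caller would return ENOMEM) if either the global limit or the uid's RLIMIT_SWAP-style limit would be exceeded. The structures and function names below are hypothetical, and locking and overflow handling are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct swap_acct {
    uint64_t global_reserved;
    uint64_t global_limit;
};

struct uid_acct {
    uint64_t reserved;
    uint64_t rlimit_swap; /* per-uid limit, RLIMIT_SWAP analogue */
};

/* Charge 'bytes' to both counters, or to neither if a limit is hit. */
static bool
swap_reserve_sketch(struct swap_acct *g, struct uid_acct *u, uint64_t bytes)
{
    if (g->global_reserved + bytes > g->global_limit ||
        u->reserved + bytes > u->rlimit_swap)
        return false;
    g->global_reserved += bytes;
    u->reserved += bytes;
    return true;
}

/* Return a charge to the uid that was originally charged. */
static void
swap_release_sketch(struct swap_acct *g, struct uid_acct *u, uint64_t bytes)
{
    g->global_reserved -= bytes;
    u->reserved -= bytes;
}
```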

In collaboration with: pho
Reviewed by: alc
Approved by: re (kensmith)


# 191990 11-May-2009 attilio

Remove the thread argument from the FSD (File-System Dependent) parts of
the VFS. Now none of the VFS_* functions and related parts take the
thread context, since it always refers to curthread.

At some points, in particular when dealing with VOPs and functions living
in the same namespace (e.g. vflush) which still need to be converted,
pass curthread explicitly in order to retain the old behaviour.
Such loose ends will be fixed ASAP.

While here, fix a bug: UFS_EXTATTR can now be compiled alone without the
UFS_EXTATTR_AUTOSTART option.

The VFS KPI is heavily changed by this commit, so third-party modules need
to be recompiled. Bump __FreeBSD_version in order to signal this
situation.


# 188318 08-Feb-2009 kib

Look up the directory entry for the tmpfs node being deleted by
both node pointer and name component. This does the right thing for
hardlinks to the same node in the same directory.
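
A sketch of matching on both keys, using hypothetical toy_node and toy_dirent types and a linear scan. With two hardlinks to the same node in one directory, matching on the node pointer alone would return whichever entry comes first; also comparing the name selects the link actually being removed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct toy_node {
    int id;
};

struct toy_dirent {
    const char *name;
    struct toy_node *node;
};

/* Return the entry matching both the node pointer and the name. */
static struct toy_dirent *
dir_lookup_sketch(struct toy_dirent *ents, size_t n,
    const struct toy_node *node, const char *name)
{
    for (size_t i = 0; i < n; i++) {
        if (ents[i].node == node && strcmp(ents[i].name, name) == 0)
            return &ents[i];
    }
    return NULL;
}
```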

Submitted by: Yoshihiro Ota <ota j email ne jp>
PR: kern/131356
MFC after: 2 weeks


# 187959 31-Jan-2009 bz

Remove unused local variables.

Submitted by: Christoph Mallon christoph.mallon@gmx.de
Reviewed by: kib
MFC after: 2 weeks


# 183299 23-Sep-2008 obrien

The kernel-implemented 'memcmp' is an alias for 'bcmp'. However, memcmp
and bcmp are not the same thing. 'man bcmp' states that the return is
"non-zero" if the two byte strings are not identical, whereas
'man memcmp' states that the return is the "difference between the
first two differing bytes (treated as unsigned char values)" if the
two byte strings are not identical.

So provide a proper memcmp(9), but it is a C implementation, not a tuned
assembly implementation. Therefore bcmp(9) should be preferred over memcmp(9).
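
A portable C sketch of the memcmp contract quoted above, comparing bytes as unsigned char so that, for example, 0xff compares greater than 0x01. memcmp_sketch() is an illustration, not the committed kernel source.

```c
#include <assert.h>
#include <stddef.h>

/* Return the difference of the first differing bytes, as unsigned char. */
static int
memcmp_sketch(const void *b1, const void *b2, size_t len)
{
    const unsigned char *p1 = b1, *p2 = b2;

    for (size_t i = 0; i < len; i++) {
        if (p1[i] != p2[i])
            return (int)p1[i] - (int)p2[i];
    }
    return 0;
}
```

A bcmp-style caller only tests the result against zero, while a memcmp caller may rely on its sign for ordering, which is why the two cannot safely share one implementation that returns an arbitrary non-zero value.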


# 182739 03-Sep-2008 delphij

Reflect license change of NetBSD code.

Obtained from: NetBSD
MFC after: 3 days


# 179808 15-Jun-2008 kib

Do not redo the vnode tear-down work already done by insmntque() when
vnode cannot be put on the vnode list for mount.

Reported and tested by: marck
Guilty party: me
MFC after: 3 days


# 176559 25-Feb-2008 attilio

Axe the 'thread' argument from VOP_ISLOCKED() and lockstatus() as it is
always curthread.

As KPI gets broken by this patch, manpages and __FreeBSD_version will be
updated by further commits.

Tested by: Andrea Barberio <insomniac at slackware dot it>


# 175202 09-Jan-2008 attilio

vn_lock() is currently only used with 'curthread' passed as the argument.
Remove this argument and pass curthread directly to the underlying
VOP_LOCK1() VFS method. This change makes the code cleaner and, in
particular, removes an annoying dependency, helping the next lockmgr()
cleanup. The KPI is, obviously, changed.

The manpage and FreeBSD_version will be updated through further commits.

As a side note, it is worth saying that upcoming commits will address
a similar cleanup of VFS methods, in particular vop_lock1 and
vop_unlock.

Tested by: Diego Sardina <siarodx at gmail dot com>,
Andrea Di Pasquale <whyx dot it at gmail dot com>


# 174384 06-Dec-2007 delphij

Turn MPASS(0) into a panic with a more obvious reason why the assertion
failed.


# 173725 18-Nov-2007 delphij

MFp4: Several fixes to tmpfs that make it survive pho@'s
stress2 suite. To quote his letter, this change:

1. It removes the tn_lookup_dirent stuff. I think this cannot be fixed,
because nothing protects the vnode/tmpfs node between the time the lookup
is done and the time the actual operation is performed, in the case where
the vnode lock is dropped. At least, this is the case with the from vnode
for rename.

For now, we do the linear lookup in the parent node. This has its own
drawbacks. Not to mention speed (that could be fixed by using a hash), the
real problem is the situation where several hardlinks exist in the dvp.
But, I think this is fixable.

2. The patch restores the VV_ROOT flag on the root vnode after it became
reclaimed and allocated again. This fixes MPASS assertion at the start
of the tmpfs_lookup() reported by many.

Submitted by: kib


# 171802 10-Aug-2007 delphij

MFp4:
- LK_RETRY prevents vget() and vn_lock() from returning an error.
Remove associated code. [1]
- Properly use vhold() and vdrop() instead of their unlocked
versions, since we are guaranteed not to hold the vnode's
interlock. [1]
- Fix a pseudo-infinite loop caused by 64/32-bit arithmetic
in the same way as modern NetBSD versions. [2]
- Reorganize tmpfs_readdir to reduce duplicated code.

Submitted by: kib [1]
Obtained from: NetBSD [2]
Approved by: re (tmpfs blanket)


# 171799 10-Aug-2007 delphij

MFp4:

- Respect cnflag and don't lock vnode always as LK_EXCLUSIVE [1]
- Properly lock around tn_vnode to avoid a NULL dereference
- Be more careful handling vnodes (*)

(*) This is a WIP
[1] by pjd via howardsu

Thanks kib@ for his valuable VFS related comments.

Tested with: fsx, fstest, tmpfs regression test set
Found by: pho's stress2 suite
Approved by: re (tmpfs blanket)


# 171704 03-Aug-2007 delphij

MFp4 - Refine locking to eliminate some potential race/panics:

- Copy before testing a pointer. This closes a race window.
- Use msleep with the node interlock instead of tsleep.
- Do proper locking around access to tn_vpstate.
- Assert vnode VOP lock for dir_{atta,de}tach to capture
inconsistent locking.

Suggested by: kib
Submitted by: delphij
Reviewed by: Howard Su
Approved by: re (tmpfs blanket)


# 171362 11-Jul-2007 delphij

MFp4: Make use of the kernel unit number allocation facility
for tmpfs nodes.

Submitted by: Mingyan Guo <guomingyan gmail com>
Approved by: re (tmpfs blanket)


# 171308 08-Jul-2007 delphij

MFp4:
- Plug memory leak.
- Respect the underlying vnode's properties rather than assuming that
the user wants root:wheel + 0755. Useful for using tmpfs(5) for
/tmp.
- Use roundup2 and howmany macros instead of rolling our own version.
- Try to fix fsx -W -R foo case.
- Instead of blindly zeroing a page, determine whether we need a pagein,
in order to prevent data corruption.
- Fix several bugs reported by Coverity.
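
For reference, a sketch using the same definitions as the howmany() and roundup2() macros in FreeBSD's <sys/param.h>; roundup2() requires a power-of-two alignment of the same width as x. The SKETCH_PAGE_SIZE constant and pages_for_size() helper are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Same definitions as FreeBSD <sys/param.h>. */
#ifndef howmany
#define howmany(x, y)  (((x) + ((y) - 1)) / (y))
#endif
#ifndef roundup2
#define roundup2(x, y) (((x) + ((y) - 1)) & (~((y) - 1))) /* y: power of 2 */
#endif

/* size_t so that ~(y - 1) keeps the high bits when masking a size_t. */
#define SKETCH_PAGE_SIZE ((size_t)4096)

/* Number of pages needed to back a file of 'size' bytes. */
static size_t
pages_for_size(size_t size)
{
    return howmany(size, SKETCH_PAGE_SIZE);
}
```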

Submitted by: Mingyan Guo <guomingyan gmail com>, Howard Su, delphij
Coverity ID: CID 2550, 2551, 2552, 2557
Approved by: re (tmpfs blanket)


# 171087 29-Jun-2007 delphij

MFp4:

- Remove unnecessary NULL checks after M_WAITOK allocations.
- Use VOP_ACCESS instead of hand-rolled suser_cred()
calls. [1]
- Use the malloc(9) KPI to allocate memory for strings. The
optimization taken from NetBSD is not valid for FreeBSD
because our malloc(9) already acts that way. [2]

Requested by: rwatson [1]
Submitted by: Howard Su [2]
Approved by: re (tmpfs blanket)


# 171070 28-Jun-2007 delphij

Space/style cleanups after last set of commits.

Approved by: re (tmpfs blanket)


# 171068 28-Jun-2007 delphij

Use vfs_timestamp instead of nanotime when obtaining
a timestamp for use with timekeeping.

Approved by: re (tmpfs blanket)


# 171029 25-Jun-2007 delphij

MFp4: Several clean-ups and improvements over tmpfs:

- Remove the tmpfs_zone_xxx KPI, the uma(9) wrapper, since
it does not bring any value now.
- Use |= instead of = when applying the VV_ROOT flag.
- Remove the tm_avariable_nodes list. Use uma to hold the
released nodes.
- Init/destroy the node's interlock mutex in init/fini
instead of ctor/dtor.
- Compute memory using u_int to fix a negative
value on 2G memory machines.
- Remove unnecessary bzero's.
- Rely on uma logic to make file id allocation harder to
guess.
- Fix some unsigned/signed related things. Make sure
we respect -o size=xxxx.
- Wire a page instead of holding it.
- Pass allocate_zero to obtain zeroed pages upon first
use.

Submitted by: Howard Su
Approved by: re (tmpfs blanket, kensmith)


# 170922 18-Jun-2007 delphij

Use vfs_timestamp() instead of nanotime(), leaving it up to
the user to decide how detailed timestamps should be.


# 170903 17-Jun-2007 delphij

MFp4: fix two locking problems:

- Hold TMPFS_LOCK while updating tm_pages_used.
- Hold vm page while doing uiomove.

This will hopefully fix all known panics.

Submitted by: Howard Su


# 170808 15-Jun-2007 delphij

MFp4: Add tmpfs, an efficient memory file system.

Please note that this is currently considered an
experimental feature, so there could be some rough
edges. Consult http://wiki.freebsd.org/TMPFS for
more information.

For now, connect tmpfs to build on i386 and amd64
architectures only. Please let us know if you have
success with other platforms.

This work was developed by Julio M. Merino Vidal
for NetBSD as a SoC project; Rohit Jalan ported it
from NetBSD to FreeBSD. Howard Su and Glen Leeder
have worked on it to continue this effort.

Obtained from: NetBSD via p4
Submitted by: Howard Su (with some minor changes)
Approved by: re (kensmith)