History log of /freebsd-10-stable/sys/gnu/fs/
Revision Date Author Comments
(<<< Hide modified files)
(Show modified files >>>)
288985 07-Oct-2015 avatar

MFC r287698: Fixing a memory leak on module unloading.

287287 29-Aug-2015 avatar

MFC r286888: Using consistent coding style to deal with error inside the loop.

256281 10-Oct-2013 gjb

Copy head (r256279) to stable/10 as part of the 10.0-RELEASE cycle.

Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation


247631 02-Mar-2013 attilio

Garbage collect XFS bits which are now already completely disconnected
from the tree since few months.

This is not targeted for MFC.


243311 19-Nov-2012 attilio

r16312 is not any longer real since many years (likely since when VFS
received granular locking) but the comment present in UFS has been
copied all over other filesystems code incorrectly for several times.

Removes comments that makes no sense now.

Reviewed by: kib
MFC after: 3 days


242833 09-Nov-2012 attilio

Complete MPSAFE VFS interface and remove MNTK_MPSAFE flag.
Porters should refer to __FreeBSD_version 1000021 for this change as
it may have happened at the same timeframe.


241374 09-Oct-2012 attilio

Add an unified macro to deny ability from the compiler to reorder
instruction loads/stores at its will.
The macro __compiler_membar() is currently supported for both gcc and
clang, but kernel compilation will fail otherwise.

Reviewed by: bde, kib
Discussed with: dim, theraven
MFC after: 2 weeks


240379 12-Sep-2012 kevlo

Add VFCF_READONLY flag that indicates ntfs and xfs file systems are
only supported as read-only.


240011 02-Sep-2012 dim

Partially revert r239959, after actually fixing most of the clang
warnings in sys/gnu/fs/xfs. The only warnings that still need to be
suppressed are those about array bound overruns of flexible array
members in xfs_dir2_{block,sf}.c, which are too expensive (in terms of
cascading code changes) to fix.

MFC after: 1 week
X-MFC-With: r239959


238980 01-Aug-2012 avatar

Just like the other file systems found in /sys/fs, g_vfs_open()
should be paried with g_vfs_close(). Though g_vfs_close() is a wrapper
around g_wither_geom_close(), r206130 added the following test in
g_vfs_open():

if (bo->bo_private != vp)
return (EBUSY);

Which will cause a 'Device busy' error inside reiserfs_mountfs() if
the same file system is re-mounted again after umount or mounting failure:

(case 1, /dev/ad4s3 is not a valid REISERFS partition)
# mount -t reiserfs -o ro /dev/ad4s3 /mnt
mount: /dev/ad4s3: Invalid argument
# mount -t msdosfs -o ro /dev/ad4s3 /mnt
mount: /dev/ad4s3: Device busy

(case 2, /dev/ad4s3 is a valid REISERFS partition)
# mount -t reiserfs -o ro /dev/ad4s3 /mnt
# umount /mnt
# mount -t reiserfs -o ro /dev/ad4s3 /mnt
mount: /dev/ad4s3: Device busy

On the other hand, g_vfs_close() 'fixed' the above cases by doing an
extra step to keep 'sc->sc_bo->bo_private' and 'cp->private' pointers
synchronised.

Reviewed by: kib
MFC after: 1 month


235822 23-May-2012 delphij

Fix build:

- Use %ll instead of %q for explicit long long casts;
- Use %j instead of %q in XFS and cast to intmax_t.

Tested with: make universe


234607 23-Apr-2012 trasz

Remove unused thread argument to vrecycle().

Reviewed by: kib


233575 27-Mar-2012 dumbbell

Make ReiserFS MPSAFE

Most functions seemed to be already fine w.r.t. what's done in msdosfs.

MFC after: 1 month


232821 11-Mar-2012 kib

Remove fifo.h. The only used function declaration from the header is
migrated to sys/vnode.h.

Submitted by: gianni


230249 17-Jan-2012 mckusick

Make sure all intermediate variables holding mount flags (mnt_flag)
and that all internal kernel calls passing mount flags are declared
as uint64_t so that flags in the top 32-bits are not lost.

MFC after: 2 weeks


230132 15-Jan-2012 uqs

Convert files to UTF-8


229272 02-Jan-2012 ed

Use strchr() and strrchr().

It seems strchr() and strrchr() are used more often than index() and
rindex(). Therefore, simply migrate all kernel code to use it.

For the XFS code, remove an empty line to make the code identical to
the code in the Linux kernel.


227309 07-Nov-2011 ed

Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs.

The SYSCTL_NODE macro defines a list that stores all child-elements of
that node. If there's no SYSCTL_DECL macro anywhere else, there's no
reason why it shouldn't be static.


227293 07-Nov-2011 ed

Mark MALLOC_DEFINEs static that have no corresponding MALLOC_DECLAREs.

This means that their use is restricted to a single C file.


224778 11-Aug-2011 rwatson

Second-to-last commit implementing Capsicum capabilities in the FreeBSD
kernel for FreeBSD 9.0:

Add a new capability mask argument to fget(9) and friends, allowing system
call code to declare what capabilities are required when an integer file
descriptor is converted into an in-kernel struct file *. With options
CAPABILITIES compiled into the kernel, this enforces capability
protection; without, this change is effectively a no-op.

Some cases require special handling, such as mmap(2), which must preserve
information about the maximum rights at the time of mapping in the memory
map so that they can later be enforced in mprotect(2) -- this is done by
narrowing the rights in the existing max_protection field used for similar
purposes with file permissions.

In namei(9), we assert that the code is not reached from within capability
mode, as we're not yet ready to enforce namespace capabilities there.
This will follow in a later commit.

Update two capability names: CAP_EVENT and CAP_KEVENT become
CAP_POST_KEVENT and CAP_POLL_KEVENT to more accurately indicate what they
represent.

Approved by: re (bz)
Submitted by: jonathan
Sponsored by: Google Inc


222172 22-May-2011 uqs

Fix typo in unused function name

Submitted by: arundel


222167 22-May-2011 rmacklem

Add a lock flags argument to the VFS_FHTOVP() file system
method, so that callers can indicate the minimum vnode
locking requirement. This will allow some file systems to choose
to return a LK_SHARED locked vnode when LK_SHARED is specified
for the flags argument. This patch only adds the flag. It
does not change any file system to use it and all callers
specify LK_EXCLUSIVE, so file system semantics are not changed.

Reviewed by: kib


218909 21-Feb-2011 brucec

Fix typos - remove duplicate "the".

PR: bin/154928
Submitted by: Eitan Adler <lists at eitanadler.com>
MFC after: 3 days


215548 19-Nov-2010 kib

Remove prtactive variable and related printf()s in the vop_inactive
and vop_reclaim() methods. They seems to be unused, and the reported
situation is normal for the forced unmount.

MFC after: 1 week
X-MFC-note: keep prtactive symbol in vfs_subr.c


213664 10-Oct-2010 kib

The r184588 changed the layout of struct export_args, causing an ABI
breakage for old mount(2) syscall, since most struct <filesystem>_args
embed export_args. The mount(2) is supposed to provide ABI
compatibility for pre-nmount mount(8) binaries, so restore ABI to
pre-r184588.

Requested and reviewed by: bde
MFC after: 2 weeks


211531 20-Aug-2010 jhb

Add dedicated routines to toggle lockmgr flags such as LK_NOSHARE and
LK_CANRECURSE after a lock is created. Use them to implement macros that
otherwise manipulated the flags directly. Assert that the associated
lockmgr lock is exclusively locked by the current thread when manipulating
these flags to ensure the flag updates are safe. This last change required
some minor shuffling in a few filesystems to exclusively lock a brand new
vnode slightly earlier.

Reviewed by: kib
MFC after: 3 days


207662 05-May-2010 trasz

Move checking against RLIMIT_FSIZE into one place, vn_rlimit_fsize().

Reviewed by: kib


202283 14-Jan-2010 lulf

Bring in the ext2fs work done by Aditya Sarawgi during and after Google Summer
of Code 2009:

- BSDL block and inode allocation policies for ext2fs. This involves the use
FFS1 style block and inode allocation for ext2fs. Preallocation was removed
since it was GPL'd.
- Make ext2fs MPSAFE by introducing locks to per-mount datastructures.
- Fixes for kern/122047 PR.
- Various small bugfixes.
- Move out of gnu/ directory.

Sponsored by: Google Inc.
Submitted by: Aditya Sarawgi <sarawgi.aditya AT SPAMFREE gmail DOT com>


201758 07-Jan-2010 mbr

Remove extraneous semicolons, no functional changes.

Submitted by: Marc Balmer <marc@msys.ch>
MFC after: 1 week


200071 03-Dec-2009 trasz

Remove unused code.


198940 05-Nov-2009 jh

File flags handling fixes for ext2fs:

- Disallow setting of flags not supported by ext2fs.
- Map EXT2_APPEND_FL to SF_APPEND.
- Map EXT2_IMMUTABLE_FL to SF_IMMUTABLE.
- Map EXT2_NODUMP_FL to UF_NODUMP.

Note that ext2fs doesn't support user settable append and immutable flags.
EXT2_NODUMP_FL is an user settable flag also on Linux.

PR: kern/122047
Reported by: Ighighi
Submitted by: Aditya Sarawgi (original version)
Reviewed by: bde
Approved by: trasz (mentor)


194974 25-Jun-2009 rdivacky

Fix the build by using proper format.

Pointy hat: me
Approved by: kib


194944 25-Jun-2009 rdivacky

Switch cmd argument of ioctl to u_long as elsewhere in the kernel.
Propagate this change down the callchain.

Approved by: kan (maintainer)
Approved by: ed (mentor)


194296 16-Jun-2009 kib

Do not use casts (int *)0 and (struct thread *)0 for the arguments of
vn_rdwr, use NULL.

Reviewed by: jhb
MFC after: 1 week


193924 10-Jun-2009 kib

Fix r193923 by noting that type of a_fp is struct file *, not int.
It was assumed that r193923 was trivial change that cannot be done
wrong.

MFC after: 2 weeks


193923 10-Jun-2009 kib

s/a_fdidx/a_fp/ for VOP_OPEN comments that inline struct vop_open_args
definition.

Discussed with: bde
MFC after: 2 weeks


193628 07-Jun-2009 stas

- Outindent long printf lines instead of splitting them in the
middle of senetences. This also makes the code more consistent
with the corresponding FFS code.
- Use 2-space sentences breaks consistently.

Suggested by: bde


193390 03-Jun-2009 stas

- Remove unused sparc64-bitops.h file. Our ext2fs code doesn't use
sparc64-specific bitops implemetations and relies on generic ones.
Furthermore, bitops implementations present in sparc64-bitops.h
are written in C similarly to generic bitops.


193382 03-Jun-2009 stas

- Style(9) improvements.
- Convert all K&R definitions to ANSI equialents.
- Retire bsd_malloc and bsd_free macros and
use malloc/free directly.
- Drop some unused debugging calls.

This commit brings no functional changes.


193377 03-Jun-2009 stas

- Sync our copies of ext2fs Linux headers to current Linux versions.
Minimize differencies between our ext2fs headers and relevant Linux
versions by using EXT2_SB macro to access the superblock fields. Most
of the differencies in access to these fields are now hidden inside
this macro.
- Rename the s_db_per_group field of ext2fs_sb_info to s_gdb_count
to reflect the similar change in Linux headers. New name also seem
to be more appropriate for this field.
- Use proper types for s_first_inode and s_inode_size in-core superblock
fields. Now they reflec types used in the on-disk superblock version.
- Add support for older filesystem revisions that doesn't have proper
s_first_ino and s_inode_size fields in the on-disk superblock. In these
cases predefined values for these fields are used.
- Add simple sanity checks for s_first_inode and s_inode_size correctness.

Reviewed by: bde (previous version)
MFC after: 2 weeks


192314 18-May-2009 kan

Remove empty files and do nto try to build them.
Apparently, they are problematic for CTF users.

PR: 119298
Submitted by: Julian H. Stacey


192114 14-May-2009 attilio

FreeBSD right now support 32 CPUs on all the architectures at least.
With the arrival of 128+ cores it is necessary to handle more than that.
One of the first thing to change is the support for cpumask_t that needs
to handle more than 32 bits masking (which happens now). Some places,
however, still assume that cpumask_t is a 32 bits mask.
Fix that situation by using always correctly cpumask_t when needed.

While here, remove the part under STOP_NMI for the Xen support as it
is broken in any case.

Additively make ipi_nmi_pending as static.

Reviewed by: jhb, kmacy
Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>


191990 11-May-2009 attilio

Remove the thread argument from the FSD (File-System Dependent) parts of
the VFS. Now all the VFS_* functions and relating parts don't want the
context as long as it always refers to curthread.

In some points, in particular when dealing with VOPs and functions living
in the same namespace (eg. vflush) which still need to be converted,
pass curthread explicitly in order to retain the old behaviour.
Such loose ends will be fixed ASAP.

While here fix a bug: now, UFS_EXTATTR can be compiled alone without the
UFS_EXTATTR_AUTOSTART option.

VFS KPI is heavilly changed by this commit so thirdy parts modules needs
to be recompiled. Bump __FreeBSD_version in order to signal such
situation.


189878 16-Mar-2009 kib

Fix two issues with bufdaemon, often causing the processes to hang in
the "nbufkv" sleep.

First, ffs background cg group block write requests a new buffer for
the shadow copy. When ffs_bufwrite() is called from the bufdaemon due
to buffers shortage, requesting the buffer deadlock bufdaemon.
Introduce a new flag for getnewbuf(), GB_NOWAIT_BD, to request getblk
to not block while allocating the buffer, and return failure
instead. Add a flag argument to the geteblk to allow to pass the flags
to getblk(). Do not repeat the getnewbuf() call from geteblk if buffer
allocation failed and either GB_NOWAIT_BD is specified, or geteblk()
is called from bufdaemon (or its helper, see below). In
ffs_bufwrite(), fall back to synchronous cg block write if shadow
block allocation failed.

Since r107847, buffer write assumes that vnode owning the buffer is
locked. The second problem is that buffer cache may accumulate many
buffers belonging to limited number of vnodes. With such workload,
quite often threads that own the mentioned vnodes locks are trying to
read another block from the vnodes, and, due to buffer cache
exhaustion, are asking bufdaemon for help. Bufdaemon is unable to make
any substantial progress because the vnodes are locked.

Allow the threads owning vnode locks to help the bufdaemon by doing
the flush pass over the buffer cache before getnewbuf() is going to
uninterruptible sleep. Move the flushing code from buf_daemon() to new
helper function buf_do_flush(), that is called from getnewbuf(). The
number of buffers flushed by single call to buf_do_flush() from
getnewbuf() is limited by new sysctl vfs.flushbufqtarget. Prevent
recursive calls to buf_do_flush() by marking the bufdaemon and threads
that temporarily help bufdaemon by TDP_BUFNEED flag.

In collaboration with: pho
Reviewed by: tegge (previous version)
Tested by: glebius, yandex ...
MFC after: 3 weeks


189525 08-Mar-2009 das

Don't declare bin_search() as an inline function, since there's no
inline definition of it.


189170 28-Feb-2009 ed

Add memmove() to the kernel, making the kernel compile with Clang.

When copying big structures, LLVM generates calls to memmove(), because
it may not be able to figure out whether structures overlap. This caused
linker errors to occur. memmove() is now implemented using bcopy().
Ideally it would be the other way around, but that can be solved in the
future. On ARM we don't do add anything, because it already has
memmove().

Discussed on: arch@
Reviewed by: rdivacky


187397 18-Jan-2009 stas

- Eliminate warnings in debug print macros by explicitly converting all
field to unsigned long.


187396 18-Jan-2009 stas

- Whitespace fixes.
- s_bmask field doesn't exist.
- Use correct flags in debug printf.


187395 18-Jan-2009 stas

- Obtain inode sizes and location of the first inode based on the contents
of superblock rather than using hardcoded values. This fixes ext2fs on
filesystems with inode sized other than 128.

Submitted by: Alex Lyashkov <Alexey.Lyashkov@Sun.COM> (based on)
MFC after: 2 weeks


186740 04-Jan-2009 kib

Do not incorrectly add the low 5 bits of the offset to the resulting
position of the found zero bit.

Submitted by: Jaakko Heinonen <jh saunalahti fi>
MFC after: 2 weeks


186194 16-Dec-2008 trasz

According to phk@, VOP_STRATEGY should never, _ever_, return
anything other than 0. Make it so. This fixes
"panic: VOP_STRATEGY failed bp=0xc320dd90 vp=0xc3b9f648",
encountered when writing to an orphaned filesystem. Reason
for the panic was the following assert:
KASSERT(i == 0, ("VOP_STRATEGY failed bp=%p vp=%p", bp, bp->b_vp));
at vfs_bio:bufstrategy().

Reviewed by: scottl, phk
Approved by: rwatson (mentor)
Sponsored by: FreeBSD Foundation


184965 14-Nov-2008 trasz

Adapt to accmode_t changes.

Approved by: rwatson (mentor), kan


184554 02-Nov-2008 attilio

Improve VFS locking:
- Implement real draining for vfs consumers by not relying on the
mnt_lock and using instead a refcount in order to keep track of lock
requesters.
- Due to the change above, remove the mnt_lock lockmgr because it is now
useless.
- Due to the change above, vfs_busy() is no more linked to a lockmgr.
Change so its KPI by removing the interlock argument and defining 2 new
flags for it: MBF_NOWAIT which basically replaces the LK_NOWAIT of the
old version (which was unlinked from the lockmgr alredy) and
MBF_MNTLSTLOCK which provides the ability to drop the mountlist_mtx
once the mnt interlock is held (ability still desired by most consumers).
- The stub used into vfs_mount_destroy(), that allows to override the
mnt_ref if running for more than 3 seconds, make it totally useless.
Remove it as it was thought to work into older versions.
If a problem of "refcount held never going away" should appear, we will
need to fix properly instead than trust on such hackish solution.
- Fix a bug where returning (with an error) from dounmount() was still
leaving the MNTK_MWAIT flag on even if it the waiters were actually
woken up. Just a place in vfs_mount_destroy() is left because it is
going to recycle the structure in any case, so it doesn't matter.
- Remove the markercnt refcount as it is useless.

This patch modifies VFS ABI and breaks KPI for vfs_busy() so manpages and
__FreeBSD_version will be modified accordingly.

Discussed with: kib
Tested by: pho


184413 28-Oct-2008 trasz

Introduce accmode_t. This is required for NFSv4 ACLs - it will be neccessary
to add more V* constants, and the variables changed by this patch were often
being assigned to mode_t variables, which is 16 bit.

Approved by: rwatson (mentor)


184410 28-Oct-2008 kib

Garbage-collect ext2_kqfilter vop that is now a copy of vop_stdkqfilter().


184205 23-Oct-2008 des

Retire the MALLOC and FREE macros. They are an abomination unto style(9).

MFC after: 3 months


183754 10-Oct-2008 attilio

Remove the struct thread unuseful argument from bufobj interface.
In particular following functions KPI results modified:
- bufobj_invalbuf()
- bufsync()

and BO_SYNC() "virtual method" of the buffer objects set.
Main consumers of bufobj functions are affected by this change too and,
in particular, functions which changed their KPI are:
- vinvalbuf()
- g_vfs_close()

Due to the KPI breakage, __FreeBSD_version will be bumped in a later
commit.

As a side note, please consider just temporary the 'curthread' argument
passing to VOP_SYNC() (in bufsync()) as it will be axed out ASAP

Reviewed by: kib
Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>


183215 20-Sep-2008 kib

fdescfs, devfs, mqueuefs, nfs, portalfs, pseudofs, tmpfs and xfs
initialize the vattr structure in VOP_GETATTR() with VATTR_NULL(),
vattr_null() or by zeroing it. Remove these to allow preinitialization
of fields work in vn_stat(). This is needed to get birthtime initialized
correctly.

Submitted by: Jaakko Heinonen <jh saunalahti fi>
Discussed on: freebsd-fs
MFC after: 1 month


183212 20-Sep-2008 kib

Initialize va_flags and va_filerev properly in VOP_GETATTR(). Don't
initialize va_vaflags and va_spare because they are not part of the
VOP_GETATTR() API. Also don't initialize birthtime to ctime or zero.

Submitted by: Jaakko Heinonen <jh saunalahti fi>
Reviewed by: bde
Discussed on: freebsd-fs
MFC after: 1 month


183071 16-Sep-2008 kib

Garbage-collect vn_write_suspend_wait().

Suggested and reviewed by: tegge
Tested by: pho
MFC after: 1 month


183054 15-Sep-2008 sam

Make ddb command registration dynamic so modules can extend
the command set (only so long as the module is present):
o add db_command_register and db_command_unregister to add and remove
commands, respectively
o replace linker sets with SYSINIT's (and SYSUINIT's) that register
commands
o expose 3 list heads: db_cmd_table, db_show_table, and db_show_all_table
for registering top-level commands, show operands, and show all operands,
respectively

While here also:
o sort command lists
o add DB_ALIAS, DB_SHOW_ALIAS, and DB_SHOW_ALL_ALIAS to add aliases
for existing commands
o add "show all trace" as an alias for "show alltrace"
o add "show all locks" as an alias for "show alllocks"

Submitted by: Guillaume Ballet <gballet@gmail.com> (original version)
Reviewed by: jhb
MFC after: 1 month


182905 10-Sep-2008 trasz

Remove VSVTX, VSGID and VSUID. This should be a no-op,
as VSVTX == S_ISVTX, VSGID == S_ISGID and VSUID == S_ISUID.

Approved by: rwatson (mentor)


182542 31-Aug-2008 attilio

Decontextualize vfs_busy(), vfs_unbusy() and vfs_mount_alloc() functions.

Manpages are updated accordingly.

Tested by: Diego Sardina <siarodx at gmail dot com>


182371 28-Aug-2008 attilio

Decontextualize the couplet VOP_GETATTR / VOP_SETATTR as the passed thread
was always curthread and totally unuseful.

Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>


180682 21-Jul-2008 attilio

- Disallow XFS mounting in write mode. The write support never worked really
and there is no need to maintain it.
- Fix vn_get() in order to let it call vget(9) with a valid locking
request. vget(9) returns the vnode locked in order to prevent recycling,
but in this case internal XFS locks alredy prevent it from happening, so
it is safe to drop the vnode lock before to return by vn_get().
- Add a VNASSERT() in vget(9) in order to catch malformed locking requests.

Discussed with: kan, kib
Tested by: Lothar Braun <lothar at lobraun dot de>


180208 03-Jul-2008 peter

Set magic fbsd:nokeywords property that allows files to bypass
keyword expansion. (file-specific replacement for CVSROOT/exclude)


178243 16-Apr-2008 kib

Move the head of byte-level advisory lock list from the
filesystem-specific vnode data to the struct vnode. Provide the
default implementation for the vop_advlock and vop_advlockasync.
Purge the locks on the vnode reclaim by using the lf_purgelocks().
The default implementation is augmented for the nfs and smbfs.
In the nfs_advlock, push the Giant inside the nfs_dolock.

Before the change, the vop_advlock and vop_advlockasync have taken the
unlocked vnode and dereferenced the fs-private inode data, racing with
with the vnode reclamation due to forced unmount. Now, the vop_getattr
under the shared vnode lock is used to obtain the inode size, and
later, in the lf_advlockasync, after locking the vnode interlock, the
VI_DOOMED flag is checked to prevent an operation on the doomed vnode.

The implementation of the lf_purgelocks() is submitted by dfr.

Reported by: kris
Tested by: kris, pho
Discussed with: jeff, dfr
MFC after: 2 weeks


177645 26-Mar-2008 jhb

Fix a nit with the 'nofoo' options where 'foo' is mapped to 'nonofoo'
(such as 'atime' vs 'noatime'). The filesystems will always see either
'nofoo' or 'nonofoo', never plain 'foo'. As such, their list of valid
mount options should include 'nofoo' instead of 'foo'. With this fix,
you can do 'mount -u -o atime' on a FFS filesystem that isn't marked as
noatime without getting an error. You can also update a noatime FFS
filesystem mounted via mount(2) (e.g. 6.x /sbin/mount binary) to 'atime'
using nmount(2) (e.g. 7.x /sbin/mount binary).

MFC after: 1 week
Reviewed by: crodig


176519 24-Feb-2008 attilio

Introduce some functions in the vnode locks namespace and in the ffs
namespace in order to handle lockmgr fields in a controlled way instead
than spreading all around bogus stubs:
- VN_LOCK_AREC() allows lock recursion for a specified vnode
- VN_LOCK_ASHARE() allows lock sharing for a specified vnode

In FFS land:
- BUF_AREC() allows lock recursion for a specified buffer lock
- BUF_NOREC() disallows recursion for a specified buffer lock

Side note: union_subr.c::unionfs_node_update() is the only other function
directly handling lockmgr fields. As this is not simple to fix, it has
been left behind as "sole" exception.


176249 13-Feb-2008 attilio

- Add real assertions to lockmgr locking primitives.
A couple of notes for this:
* WITNESS support, when enabled, is only used for shared locks in order
to avoid problems with the "disowned" locks
* KA_HELD and KA_UNHELD only exists in the lockmgr namespace in order
to assert for a generic thread (not curthread) owning or not the
lock. Really, this kind of check is bogus but it seems very
widespread in the consumers code. So, for the moment, we cater this
untrusted behaviour, until the consumers are not fixed and the
options could be removed (hopefully during 8.0-CURRENT lifecycle)
* Implementing KA_HELD and KA_UNHELD (not surported natively by
WITNESS) made necessary the introduction of LA_MASKASSERT which
specifies the range for default lock assertion flags
* About other aspects, lockmgr_assert() follows exactly what other
locking primitives offer about this operation.

- Build real assertions for buffer cache locks on the top of
lockmgr_assert(). They can be used with the BUF_ASSERT_*(bp)
paradigm.

- Add checks at lock destruction time and use a cookie for verifying
lock integrity at any operation.

- Redefine BUF_LOCKFREE() in order to not use a direct assert but
let it rely on the aforementioned destruction time check.

KPI results evidently broken, so __FreeBSD_version bumping and
manpage update result necessary and will be committed soon.

Side note: lockmgr_assert() will be used soon in order to implement
real assertions in the vnode namespace replacing the legacy and still
bogus "VOP_ISLOCKED()" way.

Tested by: kris (earlier version)
Reviewed by: jhb


175635 24-Jan-2008 attilio

Cleanup lockmgr interface and exported KPI:
- Remove the "thread" argument from the lockmgr() function as it is
always curthread now
- Axe lockcount() function as it is no longer used
- Axe LOCKMGR_ASSERT() as it is bogus really and no currently used.
Hopefully this will be soonly replaced by something suitable for it.
- Remove the prototype for dumplockinfo() as the function is no longer
present

Addictionally:
- Introduce a KASSERT() in lockstatus() in order to let it accept only
curthread or NULL as they should only be passed
- Do a little bit of style(9) cleanup on lockmgr.h

KPI results heavilly broken by this change, so manpages and
FreeBSD_version will be modified accordingly by further commits.

Tested by: matteo


175486 19-Jan-2008 attilio

- Introduce the function lockmgr_recursed() which returns true if the
lockmgr lkp, when held in exclusive mode, is recursed
- Introduce the function BUF_RECURSED() which does the same for bufobj
locks based on the top of lockmgr_recursed()
- Introduce the function BUF_ISLOCKED() which works like the counterpart
VOP_ISLOCKED(9), showing the state of lockmgr linked with the bufobj

BUF_RECURSED() and BUF_ISLOCKED() entirely replace the usage of bogus
BUF_REFCNT() in a more explicative and SMP-compliant way.
This allows us to axe out BUF_REFCNT() and leaving the function
lockcount() totally unused in our stock kernel. Further commits will
axe lockcount() as well as part of lockmgr() cleanup.

KPI results, obviously, broken so further commits will update manpages
and freebsd version.

Tested by: kris (on UFS and NFS)


175294 13-Jan-2008 attilio

VOP_LOCK1() (and so VOP_LOCK()) and VOP_UNLOCK() are only used in
conjuction with 'thread' argument passing which is always curthread.
Remove the unuseful extra-argument and pass explicitly curthread to lower
layer functions, when necessary.

KPI results broken by this change, which should affect several ports, so
version bumping and manpage update will be further committed.

Tested by: kris, pho, Diego Sardina <siarodx at gmail dot com>


175202 10-Jan-2008 attilio

vn_lock() is currently only used with the 'curthread' passed as argument.
Remove this argument and pass curthread directly to underlying
VOP_LOCK1() VFS method. This modify makes the code cleaner and in
particular remove an annoying dependence helping next lockmgr() cleanup.
KPI results, obviously, changed.

Manpage and FreeBSD_version will be updated through further commits.

As a side note, would be valuable to say that next commits will address
a similar cleanup about VFS methods, in particular vop_lock1 and
vop_unlock.

Tested by: Diego Sardina <siarodx at gmail dot com>,
Andrea Di Pasquale <whyx dot it at gmail dot com>


173066 27-Oct-2007 rodrigc

Remove duplicate "union" from ext2_opts.

Noticed by: bde


172697 16-Oct-2007 alfred

Get rid of qaddr_t.

Requested by: bde


171905 20-Aug-2007 cognet

Some times ago, vfs_getopts() was changed, so that it would set error to
ENOENT if the option wasn't provided, instead of setting it to 0.
xfs however didn't catch up on this, so it assumed something went bad if
vfs_getopts() sets the error to non-zero, and just returns the error.
Unbreak xfs mount by just ignoring the error if vfs_getopts() sets the
error to ENOENT, as we should have sane defaults.

Reviewed by: kan
Approved by: re (rwatson)
Tested by: rpaulo


171852 15-Aug-2007 jhb

On 6.x this works:

% mount | grep home
/dev/ad4s1e on /home (ufs, local, noatime, soft-updates)
% mount -u -o atime /home
% mount | grep home
/dev/ad4s1e on /home (ufs, local, soft-updates)

Restore this behavior for on 7.x for the following mount options:
noatime, noclusterr, noclusterw, noexec, nosuid, nosymfollow

In addition, on 7.x, the following are equivalent:
mount -u -o atime /home
mount -u -o nonoatime /home

Ideally, when we introduce new mount options, we should avoid
options starting with "no". :)

Requested by: jhb
Reported by: Karol Kwiat <karol.kwiat gmail com>, Scott Hetzel <swhetzel gmail com>
Approved by: re (bmah)
Proxy commit for: rodrigc


171450 14-Jul-2007 rodrigc

The last entry in the ext2_opts array must be NULL,
otherwise the kernel with crash in vfs_filteropt() if an invalid
mount option is passed to ext2fs.

Approved by: re (kensmith)


170587 12-Jun-2007 rwatson

Eliminate now-unused SUSER_ALLOWJAIL arguments to priv_check_cred(); in
some cases, move to priv_check() if it was an operation on a thread and
no other flags were present.

Eliminate caller-side jail exception checking (also now-unused); jail
privilege exception code now goes solely in kern_jail.c.

We can't yet eliminate suser() due to some cases in the KAME code where
a privilege check is performed and then used in many different deferred
paths. Do, however, move those prototypes to priv.h.

Reviewed by: csjp
Obtained from: TrustedBSD Project


170491 10-Jun-2007 mjacob

Remove 'inline' qualifiers from functions which are not, in fact, inlines.


170183 01-Jun-2007 kib

Change the VOP_OPEN(), vn_open() vnode operation and d_fdopen() cdev operation
argument from being file descriptor index into the pointer to struct file:
part 2. Convert calls missed in the first big commit.

Noted by: rwatson
Pointy hat to: kib


170174 01-Jun-2007 jeff

- Move rusage from being per-process in struct pstats to per-thread in
td_ru. This removes the requirement for per-process synchronization in
statclock() and mi_switch(). This was previously supported by
sched_lock which is going away. All modifications to rusage are now
done in the context of the owning thread. reads proceed without locks.
- Aggregate exiting threads rusage in thread_exit() such that the exiting
thread's rusage is not lost.
- Provide a new routine, rufetch() to fetch an aggregate of all rusage
structures from all threads in a process. This routine must be used
in any place requiring a rusage from a process prior to it's exit. The
exited process's rusage is still available via p_ru.
- Aggregate tick statistics only on demand via rufetch() or when a thread
exits. Tick statistics are kept in the thread and protected by sched_lock
until it exits.

Initial patch by: attilio
Reviewed by: attilio, bde (some objections), arch (mostly silent)


170124 30-May-2007 kan

Bow to incomplete GCC 4. constant propagation optimizations and
initialize some of the local variables GCC claims are being used
uninitialized.


168206 01-Apr-2007 rodrigc

Change #include <machine/pcpu.h> to #include <sys/pcpu.h>
to get definition of curthread, required by <sys/sx.h>.


168191 31-Mar-2007 jhb

Optimize sx locks to use simple atomic operations for the common cases of
obtaining and releasing shared and exclusive locks. The algorithms for
manipulating the lock cookie are very similar to that rwlocks. This patch
also adds support for exclusive locks using the same algorithm as mutexes.

A new sx_init_flags() function has been added so that optional flags can be
specified to alter a given locks behavior. The flags include SX_DUPOK,
SX_NOWITNESS, SX_NOPROFILE, and SX_QUITE which are all identical in nature
to the similar flags for mutexes.

Adaptive spinning on select locks may be enabled by enabling the
ADAPTIVE_SX kernel option. Only locks initialized with the SX_ADAPTIVESPIN
flag via sx_init_flags() will adaptively spin.

The common cases for sx_slock(), sx_sunlock(), sx_xlock(), and sx_xunlock()
are now performed inline in non-debug kernels. As a result, <sys/sx.h> now
requires <sys/lock.h> to be included prior to <sys/sx.h>.

The new kernel option SX_NOINLINE can be used to disable the aforementioned
inlining in non-debug kernels.

The size of struct sx has changed, so the kernel ABI is probably greatly
disturbed.

MFC after: 1 month
Submitted by: attilio
Tested by: kris, pjd


167580 15-Mar-2007 rodrigc

Add "force" to ext2_ops, to match what was in the old mount_ext2fs binary.

Reported by: Ivan Voras <ivoras fer hr>


167497 13-Mar-2007 tegge

Make insmntque() externally visibile and allow it to fail (e.g. during
late stages of unmount). On failure, the vnode is recycled.

Add insmntque1(), to allow for file system specific cleanup when
recycling vnode on failure.

Change getnewvnode() to no longer call insmntque(). Previously,
embryonic vnodes were put onto the list of vnode belonging to a file
system, which is unsafe for a file system marked MPSAFE.

Change vfs_hash_insert() to no longer lock the vnode. The caller now
has that responsibility.

Change most file systems to lock the vnode and call insmntque() or
insmntque1() after a new vnode has been sufficiently setup. Handle
failed insmntque*() calls by propagating errors to callers, possibly
after some file system specific cleanup.

Approved by: re (kensmith)
Reviewed by: kib
In collaboration with: kib


167152 01-Mar-2007 pjd

Rename PRIV_VFS_CLEARSUGID to PRIV_VFS_RETAINSUGID, which seems to better
describe the privilege.

OK'ed by: rwatson


167151 01-Mar-2007 pjd

Avoid checking for privileges if there is no need to.

Discussed with: rwatson


166774 15-Feb-2007 pjd

Move vnode-to-file-handle translation from vfs_vptofh to vop_vptofh method.
This way we may support multiple structures in v_data vnode field within
one file system without using black magic.

Vnode-to-file-handle should be VOP in the first place, but was made VFS
operation to keep interface as compatible as possible with SUN's VFS.
BTW. Now Solaris also implements vnode-to-file-handle as VOP operation.

VFS_VPTOFH() was left for API backward compatibility, but is marked for
removal before 8.0-RELEASE.

Approved by: mckusick
Discussed with: many (on IRC)
Tested with: ufs, msdosfs, cd9660, nullfs and zfs


166193 23-Jan-2007 kib

Cylinder group bitmaps and blocks containing inode for a snapshot
file are after snaplock, while other ffs device buffers are before
snaplock in global lock order. By itself, this could cause deadlock
when bdwrite() tries to flush dirty buffers on snapshotted ffs. If,
during the flush, COW activity for snapshot needs to allocate block
and ffs_alloccg() selects the cylinder group that is being written
by bdwrite(), then kernel would panic due to recursive buffer lock
acquision.

Avoid dealing with buffers in bdwrite() that are from other side of
snaplock divisor in the lock order then the buffer being written. Add
new BOP, bop_bdwrite(), to do dirty buffer flushing for same vnode in
the bdwrite(). Default implementation, bufbdflush(), refactors the code
from bdwrite(). For ffs device buffers, specialized implementation is
used.

Reviewed by: tegge, jeff, Russell Cattelan (cattelan xfs org, xfs changes)
Tested by: Peter Holm
X-MFC after: 3 weeks (if ever: it changes ABI)


164385 18-Nov-2006 rodrigc

Previously, the mount_ext2fs binary listed the acceptable mount
options for ext2fs. Now that we use nmount() directly from the mount
binary to access ext2fs filesystems, add the list of acceptable mount
options to ext2_ops, so that vfs_filteropts() will accept
options like "noatime" for ext2fs.

PR: 105483
Noticed by: Dr. Markus Waldeck <waldeck gmx de>
MFC after: 1 month


164033 06-Nov-2006 rwatson

Sweep kernel replacing suser(9) calls with priv(9) calls, assigning
specific privilege names to a broad range of privileges. These may
require some future tweaking.

Sponsored by: nCircle Network Security, Inc.
Obtained from: TrustedBSD Project
Discussed on: arch@
Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri,
Alex Lyashkov <umka at sevcity dot net>,
Skip Ford <skip dot ford at verizon dot net>,
Antoine Brodin <antoine dot brodin at laposte dot net>


162649 26-Sep-2006 tegge

Add mnt_noasync counter to better handle interleaved calls to nmount(),
sync() and sync_fsync() without losing MNT_ASYNC. Add MNTK_ASYNC flag
which is set only when MNT_ASYNC is set and mnt_noasync is zero, and
check that flag instead of MNT_ASYNC before initiating async io.


162647 26-Sep-2006 tegge

Use mount interlock to protect all changes to mnt_flag and mnt_kern_flag.
This eliminates a race where MNT_UPDATE flag could be lost when nmount()
raced against sync(), sync_fsync() or quotactl().


159498 11-Jun-2006 rodrigc

Implement vnode operations for setting and removing extended attributes.


159497 11-Jun-2006 rodrigc

Restore routines for getting and listing extended attributes which
were lost in the last merge.


159496 11-Jun-2006 rodrigc

Restore changes to spinlock macros before merge.


159495 11-Jun-2006 rodrigc

Remove debugging printf


159493 10-Jun-2006 rodrigc

Temporarily disable log recovery until we fix panics.


159492 10-Jun-2006 rodrigc

Logical OR the following flags into the va_mode field:
S_IFDIR when making a directory
S_IFLNK when making a symbolic link
S_IFIFO when making a pipe

xfs_ialloc() checks this field for these flags when figuring
out whether to make a directory, make a symbolic link or make a pipe.


159489 10-Jun-2006 rodrigc

Call g_vfs_close() if:
(1) _xfs_mount() fails
(2) at the end of _xfs_unmount()


159488 10-Jun-2006 rodrigc

Do not call vput() after we call VOP_UNLOCK().


159456 09-Jun-2006 rodrigc

Change %llx to %jx in printf() to eliminate warnings on 64-bit platforms.


159455 09-Jun-2006 rodrigc

Bring back changes in version 1.3 lost in previous commit.


159452 09-Jun-2006 rodrigc

More changes due to latest XFS import.

Work done by: Russell Cattelan <cattelan at xfs dot org>


159451 09-Jun-2006 rodrigc

Sync XFS for FreeBSD tree with newer changes from SGI XFS for Linux tree.
Improve support for writing to XFS partitions.

Work done by: Russell Cattelan <cattelan at xfs dot org>


159153 01-Jun-2006 rodrigc

Include "xfs_macros.h" to fix tinderbox build breakage.


159147 01-Jun-2006 imp

Cope with -Wundef. This means including xfs_macros.h early in a few more
files and changing #if XXXKAN -> #ifdef XXXKAN.

# this is just compile tested, since I don't have xfs partitions.


158953 26-May-2006 rodrigc

Add support for "export" option, to allow NFS exporting
of XFS filesystems.


158951 26-May-2006 rodrigc

Remove calls to vfs_export() for exporting a filesystem for NFS mounting
from individual filesystems. Call it instead in vfs_mount.c,
after we call VFS_MOUNT() for a specific filesystem.

Approved by: dumbbell


158924 26-May-2006 rodrigc

Remove calls to vfs_export() for exporting a filesystem for NFS mounting
from individual filesystems. Call it instead in vfs_mount.c,
after we call VFS_MOUNT() for a specific filesystem.


158317 05-May-2006 keramida

Check for VFS_STATFS() failure in _xfs_mount() and abort the mount
on errors.

Found by: Coverity Prevent
Approved by: rodrigc, Russell Cattelan
MFC after: 4 weeks


157739 13-Apr-2006 cracauer

Repair ext2fs writes.

Strong candidate for backport to 6.x.

When allocating new blocks, the search for block group beginnings
would fail with a segfault. There was a side-effect read access with
an off-by-one errors. The results were not used in the error case so
the code worked in the past. But now the FreeBSD kernel has tighter
mappings and the word accessed is not mapped (for me).

The Linux kernel has rewritten most of the allocation strategy by now.
Also, the Linux kernel cleaned up the integration of these files and
it look feasable to wrap the original Linux files in wrapper that
provides their favorite arguments instead of dragging around our own
code.


156433 08-Mar-2006 jhb

Update a DB_SET to DB_FUNC I missed yesterday.


154152 09-Jan-2006 tegge

Add marker vnodes to ensure that all vnodes associated with the mount point are
iterated over when using MNT_VNODE_FOREACH.

Reviewed by: truckman


154060 05-Jan-2006 dumbbell

Don't hold a reference to the disk vnode for each inode.


153858 29-Dec-2005 cracauer

This is the style-fix for my previous commit. Sorry for the delay, I
forgot about it.


153400 14-Dec-2005 des

Eradicate caddr_t from the VFS API.


153394 13-Dec-2005 rodrigc

Hide DDB-specific functions inside check for #ifdef DDB.

Noticed by: des


153369 13-Dec-2005 rodrigc

Inherit system-wide BLKDEV_IOSIZE definition.

Submitted by: kan


153330 12-Dec-2005 rodrigc

#define __user to nothing


153323 12-Dec-2005 rodrigc

Initial import of read-only support for SGI's XFS filesystem.

Contributed by: XFS for FreeBSD project


153110 05-Dec-2005 ru

Fix -Wundef warnings found when compiling i386 LINT, GENERIC and
custom kernels.


153084 04-Dec-2005 ru

Fix -Wundef from compiling the amd64 LINT.


153081 04-Dec-2005 ru

Oops, the bug is still here, but reimplement the cpp(1) conditional properly.


153080 04-Dec-2005 ru

There no longer seems to be this bug in gcc(1). Remove the
badly implemented workaround that caused a workaround to be
applied to all architectures, not only amd64.


151897 31-Oct-2005 rwatson

Normalize a significant number of kernel malloc type names:

- Prefer '_' to ' ', as it results in more easily parsed results in
memory monitoring tools such as vmstat.

- Remove punctuation that is incompatible with using memory type names
as file names, such as '/' characters.

- Disambiguate some collisions by adding subsystem prefixes to some
memory types.

- Generally prefer lower case to upper case.

- If the same type is defined in multiple architecture directories,
attempt to use the same name in additional cases.

Not all instances were caught in this change, so more work is required to
finish this conversion. Similar changes are required for UMA zone names.


151811 28-Oct-2005 cracauer

Fix this:
kern/87959 cracauer ext2fs: no cp(1) possible, mmap returns EINVAL

ext2fs was missing vnode_create_vobject.

(Reisefs probably has the same problem but I want to get this in quick
for 6-release)


151532 21-Oct-2005 dumbbell

Apply the same fix to a potential race in the ISDOTDOT code
in reiserfs_lookup() that was used to fix an actual race in
ufs_lookup.c:1.78. This is not currently a hazard, but the
bug would be activated by marking reiserfs as MPSAFE.

Reviewed by: mux (mentor)
MFC after: 2 weeks


151391 16-Oct-2005 truckman

Apply the same fix to a potential race in the ISDOTDOT code in
ext2_lookup() that was used to fix an actual race in ufs_lookup.c:1.78.
This is not currently a hazard, but the bug would be activated by
marking ext2fs as MPSAFE.

Requested by: bde
MFC after: 2 weeks


150663 28-Sep-2005 rwatson

Back out alpha/alpha/trap.c:1.124, osf1_ioctl.c:1.14, osf1_misc.c:1.57,
osf1_signal.c:1.41, amd64/amd64/trap.c:1.291, linux_socket.c:1.60,
svr4_fcntl.c:1.36, svr4_ioctl.c:1.23, svr4_ipc.c:1.18, svr4_misc.c:1.81,
svr4_signal.c:1.34, svr4_stat.c:1.21, svr4_stream.c:1.55,
svr4_termios.c:1.13, svr4_ttold.c:1.15, svr4_util.h:1.10,
ext2_alloc.c:1.43, i386/i386/trap.c:1.279, vm86.c:1.58,
unaligned.c:1.12, imgact_elf.c:1.164, ffs_alloc.c:1.133:

Now that Giant is acquired in uprintf() and tprintf(), the caller no
longer leads to acquire Giant unless it also holds another mutex that
would generate a lock order reversal when calling into these functions.
Specifically not backed out is the acquisition of Giant in nfs_socket.c
and rpcclnt.c, where local mutexes are held and would otherwise violate
the lock order with Giant.

This aligns this code more with the eventual locking of ttys.

Suggested by: bde


150335 19-Sep-2005 rwatson

Add GIANT_REQUIRED and WITNESS sleep warnings to uprintf() and tprintf(),
as they both interact with the tty code (!MPSAFE) and may sleep if the
tty buffer is full (per comment).

Modify all consumers of uprintf() and tprintf() to hold Giant around
calls into these functions. In most cases, this means adding an
acquisition of Giant immediately around the function. In some cases
(nfs_timer()), it means acquiring Giant higher up in the callout.

With these changes, UFS no longer panics on SMP when either blocks are
exhausted or inodes are exhausted under load due to races in the tty
code when running without Giant.

NB: Some reduction in calls to uprintf() in the svr4 code is probably
desirable.

NB: In the case of nfs_timer(), calling uprintf() while holding a mutex,
or even in a callout at all, is a bad idea, and will generate warnings
and potential upset. This needs to be fixed, but was a problem before
this change.

NB: uprintf()/tprintf() sleeping is generally a bad ideas, as is having
non-MPSAFE tty code.

MFC after: 1 week


149960 10-Sep-2005 rodrigc

In ext2_mountfs(), check that the superblock size, SBSIZE,
is aligned with the sectorsize value returned by GEOM, before
doing a bread() of the superblock.
This eliminates a panic when trying the following on an empty CD-ROM drive:
mount_ext2fs /dev/acd0 /mnt

Reviewed by: phk


149875 08-Sep-2005 truckman

Add a new struct buf flag bit, B_PERSISTENT, and use it to tag
struct bufs that are persistently held by ext2fs. Ignore any buffers
with this flag in the code in boot() that counts "busy" and dirty
buffers and attempts to sync the dirty buffers, which is done before
attempting to unmount all the file systems during shutdown.

This fixes the problem caused by any ext2fs file systems that are
mounted at system shutdown time, which caused boot() to give up on
a non-zero number of buffers and skip the call to vfs_unmountall().
This left all the mounted file systems in a dirty state and caused
them to all require cleanup by fsck on reboot.

Move the two separate copies of the "busy" buffer test in boot()
to a separate function.

Nuke the useless spl() stuff in the ext2fs ULCK_BUF() macro.

Bring the PRINT_BUF_FLAGS definition in sys/buf.h up to date with
this and previous flag changes.

PR: kern/56675, kern/85163
Tested by: "Matthias Andree" matthias.andree at gmx.de
Reviewed by: bde
MFC after: 3 days


149771 03-Sep-2005 ssouhlal

Unbreak hpfs/ntfs/udf/ext2fs/reiserfs mounting.

Another pointyhat to: ssouhlal


149720 02-Sep-2005 ssouhlal

*_mountfs() (if the filesystem mounts from a device) needs devvp to be
locked, so lock it.

Glanced at by: phk
MFC after: 3 days


147868 09-Jul-2005 cracauer

Repair this:

ext2fs fails to set the device in the stat(2) system call.

Subsequently, that makes fts(3) fail, which goes as far as make ls(1)
fail (which uses fts) on ext2fs.

Approved by: re (Robert Watson <rwatson@FreeBSD.org>)


147512 21-Jun-2005 dumbbell

Replace the use if ext2fs' bitops by bitstring.h macros. This fixes
portability issues. Also note that for amd64, a hack is used to work
around gcc optimization (thanks to cognet@).

Reviewed by: mux (mentor)
Approved by: re (dougb)


147476 18-Jun-2005 dumbbell

Moving reiserfs from sys/gnu to sys/gnu/fs. This was discussed on arch@.

Reviewed by: mux (mentor)
Approved by: re (scottl)


147408 16-Jun-2005 imp

Add standard GPL boilerplate to these files. They are the only ones
contaminated with the GPL code. While this information was present in
the COPYRIGHT.INFO file, it is FreeBSD's standard practice to, where
possible, include explicit license information in files.

Approved by: release engineer (scottl)


147393 15-Jun-2005 rodrigc

Move ext2fs from src/gnu to src/gnu/fs.
Discussed on arch@.

Reviewed by: kan
Approved by: re (blanket), kan


147198 09-Jun-2005 ssouhlal

Allow EVFILT_VNODE events to work on every filesystem type, not just
UFS by:
- Making the pre and post hooks for the VOP functions work even when
DEBUG_VFS_LOCKS is not defined.
- Moving the KNOTE activations into the corresponding VOP hooks.
- Creating a MNTK_NOKNOTE flag for the mnt_kern_flag field of struct
mount that permits filesystems to disable the new behavior.
- Creating a default VOP_KQFILTER function: vfs_kqfilter()

My benchmarks have not revealed any performance degradation.

Reviewed by: jeff, bde
Approved by: rwatson, jmg (kqueue changes), grehan (mentor)


145006 13-Apr-2005 jeff

- Change all filesystems and vfs_cache to relock the dvp once the child is
locked in the ISDOTDOT case. Se vfs_lookup.c r1.79 for details.

Sponsored by: Isilon Systems, Inc.


144299 29-Mar-2005 jeff

- Remove wantparent, it is no longer necessary. An assert in vfs_lookup.c
prevents any callers from doing a modifying op without
LOCKPARENT or WANTPARENT.


144211 28-Mar-2005 jeff

- ext2fs_lookup() is no longer responsible for unlocking the dvp, this is
handled in vfs_lookup.c. This code was missing PDIRUNLOCK use prior
to the removal of PDIRUNLOCK in rev 1.73 of vfs_lookup.c.

Sponsored by: Isilon Systems, Inc.


144059 24-Mar-2005 jeff

- Update vfs_root implementations to match the new prototype. None of
these filesystems will support shared locks until they are explicitly
modified to do so. Careful review must be done to ensure that this
is safe for each individual filesystem.

Sponsored by: Isilon Systems, Inc.


143692 16-Mar-2005 phk

Add two arguments to the vfs_hash() KPI so that filesystems which do
not have unique hashes (NFS) can also use it.


143686 16-Mar-2005 phk

Remove inode fields previously used for private inode hash tables.


143677 16-Mar-2005 phk

Don't hold a reference to the disk vnode for each inode.

Don't store the disk cdev in all inodes, it's only used for debugging
printfs.


143663 15-Mar-2005 phk

Improve the vfs_hash() API: vput() the unneeded vnode centrally to
avoid replicating the vput in all the filesystems.


143619 15-Mar-2005 phk

Simplify the vfs_hash calling convention.


143578 14-Mar-2005 phk

Use vfs_hash() instead of home-rolled


143509 13-Mar-2005 jeff

- Catch up with ufs_inode 1.59, ffs_vfsops.c 1.280, and ufs_vnops.c 1.267.
Various changes to support new vgone() locking protocol.

Sponsored by: Isilon Systems, Inc.


142692 27-Feb-2005 phk

Remove debug printout of major/minor numbers, print name instead.


142353 24-Feb-2005 sam

move ptr deref's to after null ptr checks

Noticed by: Coverity Prevent analysis tool


141633 10-Feb-2005 phk

Make a SYSCTL_NODE static


140939 28-Jan-2005 phk

Make filesystems get rid of their own vnodes vnode_pager object in
VOP_RECLAIM().


140936 28-Jan-2005 phk

Remove unused argument to vrecycle()


140822 25-Jan-2005 phk

Introduce and use g_vfs_close().


140768 24-Jan-2005 phk

Create a vp->v_object in VFS_FHTOVP() if we want to be exportable
with NFS.

We are moving responsibility for creating the vnode_pager object into
the filesystems which own the vnode, and this is one of the places
we have to cover.

We call vnode_create_vobject() directly because we own the vnode.

If we can get the size easily, pass it as an argument to save the
call to VOP_GETATTR() in vnode_create_vobject()


140736 24-Jan-2005 phk

Remove unused cred argument to ext2_reload()


140220 14-Jan-2005 phk

Eliminate unused and unnecessary "cred" argument from vinvalbuf()


140051 11-Jan-2005 phk

Wrap the bufobj operations in macros: BO_STRATEGY() and BO_WRITE()


140048 11-Jan-2005 phk

Remove the unused credential argument from VOP_FSYNC() and VFS_SYNC().

I'm not sure why a credential was added to these in the first place, it is
not used anywhere and it doesn't make much sense:

The credentials for syncing a file (ability to write to the
file) should be checked at the system call level.

Credentials for syncing one or more filesystems ("none")
should be checked at the system call level as well.

If the filesystem implementation needs a particular credential
to carry out the syncing it would logically have to the
cached mount credential, or a credential cached along with
any delayed write data.

Discussed with: rwatson


139778 06-Jan-2005 imp

/* -> /*- for copyright notices, minor format tweaks as necessary


139777 06-Jan-2005 imp

Add dol FreeBSD dol and /*+ize license


138868 14-Dec-2004 phk

Implement simpler panics for VOP_{read,write} on fifos.


138693 11-Dec-2004 marcel

Revert previous commit. The null-pointer function call (a dereference
on ia64) was not the result of a change in the vector operations. It
was caused by the NFS locking code using a FIFO and those bypassing
the vnode. This indirectly caused the panic. The NFS locking code has
been changed.

Requested by: phk


138493 06-Dec-2004 phk

Convert to nmount. Add omount compat code.


138412 05-Dec-2004 phk

VFS_STATFS(mp, ...) is mostly called with &mp->mnt_stat, but a few cases
doesn't. Most of the implementations have grown weeds for this so they
copy some fields from mnt_stat if the passed argument isn't that.

Fix this the cleaner way: Always call the implementation on mnt_stat
and copy that in toto to the VFS_STATFS argument if different.


138411 05-Dec-2004 marcel

Fix null-pointer indirect function calls introduced in the previous
commit. In the new world order, the transitive closure on the vector
operations is not precomputed. As such, it's unsafe to actually use
any of the function pointers in an indirect function call. They can
be null, and we need to use the default vector in that case.
This is mostly a quick fix for the four function pointers that are
ed explicitly. A more generic or scalable solution is likely to see
the light of day.

No pathos on: current@


138368 04-Dec-2004 phk

Remove #if 0'ed rootfs mounting code.


138290 01-Dec-2004 phk

Back when VOP_* was introduced, we did not have new-style struct
initializations but we did have lofty goals and big ideals.

Adjust to more contemporary circumstances and gain type checking.

Replace the entire vop_t frobbing thing with properly typed
structures. The only casualty is that we can not add a new
VOP_ method with a loadable module. History has not given
us reason to belive this would ever be feasible in the the
first place.

Eliminate in toto VOCALL(), vop_t, VNODEOP_SET() etc.

Give coda correct prototypes and function definitions for
all vop_()s.

Generate a bit more data from the vnode_if.src file: a
struct vop_vector and protype typedefs for all vop methods.

Add a new vop_bypass() and make vop_default be a pointer
to another struct vop_vector.

Remove a lot of vfs_init since vop_vector is ready to use
from the compiler.

Cast various vop_mumble() to void * with uppercase name,
for instance VOP_PANIC, VOP_NULL etc.

Implement VCALL() by making vdesc_offset the offsetof() the
relevant function pointer in vop_vector. This is disgusting
but since the code is generated by a script comparatively
safe. The alternative for nullfs etc. would be much worse.

Fix up all vnode method vectors to remove casts so they
become typesafe. (The bulk of this is generated by scripts)


138270 01-Dec-2004 phk

Mechanically change prototypes for vnode operations to use the new typedefs.


137726 15-Nov-2004 phk

Make VOP_BMAP return a struct bufobj for the underlying storage device
instead of a vnode for it.

The vnode_pager does not and should not have any interest in what
the filesystem uses for backend.

(vfs_cluster doesn't use the backing store argument.)


137321 06-Nov-2004 phk

Get even closer to not crashing ext2fs


137320 06-Nov-2004 phk

Get closer to unbreaking ext2fs


137308 06-Nov-2004 phk

Properly implement a default version of VOP_GETWRITEMOUNT.

Remove improper access to vop_stdgetwritemount() which should and
will instead rely on the VOP default path.


137039 29-Oct-2004 phk

Move EXT2FS to GEOM backing instead of DEVFS.

For details, please see src/sys/ufs/ffs/ffs_vfsops.c 1.250.


137008 28-Oct-2004 phk

Reduce the locking activity by epsilon by checking VNON condition before
releasing the mountlock.


136991 27-Oct-2004 phk

Eliminate unnecessary KASSERTs.

Don't use bp->b_vp in VOP_STRATEGY: the vnode is passed in as an argument.


136943 25-Oct-2004 phk

Loose the v_dirty* and v_clean* alias macros.

Check the count field where we just want to know the full/empty state,
rather than using TAILQ_EMPTY() or TAILQ_FIRST().


136927 24-Oct-2004 phk

Move the buffer method vector (buf->b_op) to the bufobj.

Extend it with a strategy method.

Add bufstrategy() which do the usual VOP_SPECSTRATEGY/VOP_STRATEGY
song and dance.

Rename ibwrite to bufwrite().

Move the two NFS buf_ops to more sensible places, add bufstrategy
to them.

Add inlines for bwrite() and bstrategy() which calls through
buf->b_bufobj->b_ops->b_{write,strategy}().

Replace almost all VOP_STRATEGY()/VOP_SPECSTRATEGY() calls with bstrategy().


136767 22-Oct-2004 phk

Add b_bufobj to struct buf which eventually will eliminate the need for b_vp.

Initialize b_bufobj for all buffers.

Make incore() and gbincore() take a bufobj instead of a vnode.

Make inmem() local to vfs_bio.c

Change a lot of VI_[UN]LOCK(bp->b_vp) to BO_[UN]LOCK(bp->b_bufobj)
also VI_MTX() to BO_MTX(),

Make buf_vlist_add() take a bufobj instead of a vnode.

Eliminate other uses of bp->b_vp where bp->b_bufobj will do.

Various minor polishing: remove "register", turn panic into KASSERT,
use new function declarations, TAILQ_FOREACH_SAFE() etc.


135864 27-Sep-2004 phk

Desupport device nodes on EXT2 filesystems.


135858 27-Sep-2004 phk

Give cluster_write() an explicit vnode argument.

In the future a struct buf will not automatically point out a vnode for us.


134899 07-Sep-2004 phk

Create simple function init_va_filerev() for initializing a va_filerev
field.

Replace three instances of longhaired initialization va_filerev fields.

Added XXX comment wondering why we don't use random bits instead of
uptime of the system for this purpose.


133741 15-Aug-2004 jmg

Add locking to the kqueue subsystem. This also makes the kqueue subsystem
a more complete subsystem, and removes the knowlege of how things are
implemented from the drivers. Include locking around filter ops, so a
module like aio will know when not to be unloaded if there are outstanding
knotes using it's filter ops.

Currently, it uses the MTX_DUPOK even though it is not always safe to
aquire duplicate locks. Witness currently doesn't support the ability
to discover if a dup lock is ok (in some cases).

Reviewed by: green, rwatson (both earlier versions)


132902 30-Jul-2004 phk

Put a version element in the VFS filesystem configuration structure
and refuse initializing filesystems with a wrong version. This will
aid maintenance activites on the 5-stable branch.

s/vfs_mount/vfs_omount/

s/vfs_nmount/vfs_mount/

Name our filesystems mount function consistently.

Eliminate the namiedata argument to both vfs_mount and vfs_omount.
It was originally there to save stack space. A few places abused
it to get hold of some credentials to pass around. Effectively
it is unused.

Reorganize the root filesystem selection code.


132805 28-Jul-2004 phk

Remove global variable rootdevs and rootvp, they are unused as such.

Add local rootvp variables as needed.

Remove checks for miniroot's in the swappartition. We never did that
and most of the filesystems could never be used for that, but it had
still been copy&pasted all over the place.


132653 26-Jul-2004 cperciva

Rename suser_cred()'s PRISON_ROOT flag to SUSER_ALLOWJAIL. This is
somewhat clearer, but more importantly allows for a consistent naming
scheme for suser_cred flags.

The old name is still defined, but will be removed in a few days (unless I
hear any complaints...)

Discussed with: rwatson, scottl
Requested by: jhb


132023 12-Jul-2004 alfred

Make VFS_ROOT() and vflush() take a thread argument.
This is to allow filesystems to decide based on the passed thread
which vnode to return.
Several filesystems used curthread, they now use the passed thread.


131925 10-Jul-2004 marcel

Update for the KDB framework:
o Make debugging code conditional upon KDB instead of DDB.


131551 04-Jul-2004 phk

When we traverse the vnodes on a mountpoint we need to look out for
our cached 'next vnode' being removed from this mountpoint. If we
find that it was recycled, we restart our traversal from the start
of the list.

Code to do that is in all local disk filesystems (and a few other
places) and looks roughly like this:

MNT_ILOCK(mp);
loop:
for (vp = TAILQ_FIRST(&mp...);
(vp = nvp) != NULL;
nvp = TAILQ_NEXT(vp,...)) {
if (vp->v_mount != mp)
goto loop;
MNT_IUNLOCK(mp);
...
MNT_ILOCK(mp);
}
MNT_IUNLOCK(mp);

The code which takes vnodes off a mountpoint looks like this:

MNT_ILOCK(vp->v_mount);
...
TAILQ_REMOVE(&vp->v_mount->mnt_nvnodelist, vp, v_nmntvnodes);
...
MNT_IUNLOCK(vp->v_mount);
...
vp->v_mount = something;

(Take a moment and try to spot the locking error before you read on.)

On a SMP system, one CPU could have removed nvp from our mountlist
but not yet gotten to assign a new value to vp->v_mount while another
CPU simultaneously get to the top of the traversal loop where it
finds that (vp->v_mount != mp) is not true despite the fact that
the vnode has indeed been removed from our mountpoint.

Fix:

Introduce the macro MNT_VNODE_FOREACH() to traverse the list of
vnodes on a mountpoint while taking into account that vnodes may
be removed from the list as we go. This saves approx 65 lines of
duplicated code.

Split the insmntque() which potentially moves a vnode from one mount
point to another into delmntque() and insmntque() which does just
what the names say.

Fix delmntque() to set vp->v_mount to NULL while holding the
mountpoint lock.


130763 20-Jun-2004 bde

Fixed misformatting of code and breaking of a comment in previous commit.


130762 20-Jun-2004 bde

Fixed misformatting in previous commit.


130585 16-Jun-2004 phk

Do the dreaded s/dev_t/struct cdev */
Bump __FreeBSD_version accordingly.


128019 07-Apr-2004 imp

Remove advertising clause from University of California Regent's
license, per letter dated July 22, 1999 and email from Peter Wemm,
Alan Cox and Robert Watson.

Approved by: core, peter, alc, rwatson


126853 11-Mar-2004 phk

Properly vector all bwrite() and BUF_WRITE() calls through the same path
and s/BUF_WRITE()/bwrite()/ since it now does the same as bwrite().


126852 11-Mar-2004 phk

Remove unused mnt_reservedvnlist field.


126851 11-Mar-2004 phk

Remove unused second arg to vfinddev().
Don't call addaliasu() on VBLK nodes.


125991 19-Feb-2004 tjr

Enforce the file size limit in VOP_WRITE() as well as VOP_TRUNCATE();
pointed out by bde.


125962 18-Feb-2004 tjr

Add partial support for large (>4GB) files on ext2 filesystems. This
support is partial in that it will refuse to create large files on
filesystems that haven't been upgraded to EXT2_DYN_REV or that don't
have the EXT2_FEATURE_RO_COMPAT_LARGE_FILE flag set in the superblock.

MFC after: 2 weeks


125843 15-Feb-2004 bde

Fixed misspellings of "ext2_*" as "ufs_*" and " "ext2fs_*", and of
"independent" as "dependent" Fixed some other relatively minor wording
and formatting errors.


125842 15-Feb-2004 bde

Removed support for the unsupported option READONLY. It just forced
dishonoring of requests for read-write mounts.


125786 13-Feb-2004 bde

MFffs (ffs_vfsops.c 1.76 (part of the big soft updates commit): lock
the vnode around calls to vinvalbuf()). Apparently no one has tested
ext2fs with DEBUG_VOP_LOCKS. Vnode locking for vinvalbuf() might not
be required in non-soft-updates cases, but it is now asserted.

MFffs (uncommitted related and nearby cleanups: don't unlock the vnode
after vinvalbuf() only to have to relock it almost immediately; don't
refer to devices classified by vn_isdisk() as block devices).


125781 13-Feb-2004 bde

Fixed longstanding brokenness of inode updates. The waitfor flag was
dishonored in rev.1.1 by commenting out the code that honored it. This
gave the worst disadvantages of async mounts in an uncontrollable way.

Honoring the flag costs about 50% in real time in worst cases on a new
but not very fast ATA drive with write caching (probably more on drives
without write caching). The old misbehavior can be recovered using
async mounts after implementing them in mount_ext2fs(8) (just put the
MNT_ASYNC flag in mount_ext2fs's table of supported options like it
is in mount's table).


125739 12-Feb-2004 bde

MFffs (ffs_vfsops.c 1.227: clean up open mode bandaid). This reduces
gratuitous differences with ffs a little.


125454 04-Feb-2004 jhb

Locking for the per-process resource limits structure.
- struct plimit includes a mutex to protect a reference count. The plimit
structure is treated similarly to struct ucred in that is is always copy
on write, so having a reference to a structure is sufficient to read from
it without needing a further lock.
- The proc lock protects the p_limit pointer and must be held while reading
limits from a process to keep the limit structure from changing out from
under you while reading from it.
- Various global limits that are ints are not protected by a lock since
int writes are atomic on all the archs we support and thus a lock
wouldn't buy us anything.
- All accesses to individual resource limits from a process are abstracted
behind a simple lim_rlimit(), lim_max(), and lim_cur() API that return
either an rlimit, or the current or max individual limit of the specified
resource from a process.
- dosetrlimit() was renamed to kern_setrlimit() to match existing style of
other similar syscall helper functions.
- The alpha OSF/1 compat layer no longer calls getrlimit() and setrlimit()
(it didn't used the stackgap when it should have) but uses lim_rlimit()
and kern_setrlimit() instead.
- The svr4 compat no longer uses the stackgap for resource limits calls,
but uses lim_rlimit() and kern_setrlimit() instead.
- The ibcs2 compat no longer uses the stackgap for resource limits. It
also no longer uses the stackgap for accessing sysctl's for the
ibcs2_sysconf() syscall but uses kernel_sysctl() instead. As a result,
ibcs2_sysconf() no longer needs Giant.
- The p_rlimit macro no longer exists.

Submitted by: mtm (mostly, I only did a few cleanups and catchups)
Tested on: i386
Compiled on: alpha, amd64


124908 24-Jan-2004 tjr

Copy workaround from FFS: open device for write access even if
the user requests a read-only mount. This is necessary because we
don't do the VOP_OPEN again if they upgrade a read-only mount to
read-write.

Noticed by: bde


124728 19-Jan-2004 kan

Spell magic '16' number as IO_SEQSHIFT.


122114 05-Nov-2003 bde

Fixed a reference to a nonexistent variable in previous commit. Renaming
of ffs_reload()'s mountp parameter to mp in rev.1.28 of ffs_vnops.c
had not been merged here.

ext2fs_reload() is still missing locking from not merging other changes
to ffs_reload(), but none of these is related to recent locking changes.


122091 05-Nov-2003 kan

Remove mntvnode_mtx and replace it with per-mountpoint mutex.
Introduce two new macros MNT_ILOCK(mp)/MNT_IUNLOCK(mp) to
operate on this mutex transparently.

Eventually new mutex will be protecting more fields in
struct mount, not only vnode list.

Discussed with: jeff


121925 03-Nov-2003 kan

Use VOP_UNLOCK/vrele instead of vput. td was erecived as a parameter
and one cannot be sure it is equal to curthread.


121874 02-Nov-2003 kan

Take care not to call vput if thread used in corresponding vget
wasn't curthread, i.e. when we receive a thread pointer to use
as a function argument. Use VOP_UNLOCK/vrele in these cases.

The only case there td != curthread known at the moment is
boot() calling sync with thread0 pointer.

This fixes the panic on shutdown people have reported.


121847 01-Nov-2003 kan

Temporarily undo parts of the stuct mount locking commit by jeff.
It is unsafe to hold a mutex across vput/vrele calls.

This will be redone when a better locking strategy is agreed upon.

Discussed with: jeff


121647 29-Oct-2003 marcel

Fix the alpha tinderbox. The alpha specific bitops used by the bitmap
code has the typical branch prediction detour, which creates cross-
section branches. A LINT kernel is apparently large enough nowadays
that the .text and .text2 sections cannot always be layed-out so that
branches between them reach.
The fix is to stop using the alpha-specific bitops and instead use
the portable implementation used by all platforms other than alpha
and i386.


121205 18-Oct-2003 phk

DuH!

bp->b_iooffset (the spot on the disk), not bp->b_offset (the offset in
the file)


121197 18-Oct-2003 phk

Initialize bp->b_offset before calling VOP_[SPEC]STRATEGY()


120783 05-Oct-2003 jeff

- File systems that wish to inspect the vnode contents or their private
v_data field before calling vget/vn_lock must check VI_XLOCK manually to
be sure that v_data is still valid. Implement this check in two places
here.


120776 05-Oct-2003 jeff

- Don't cache_purge() in ext2_reclaim. vclean() does it for us so
this is redundant.


120751 04-Oct-2003 jeff

- Don't use vrecycle() call vgonel() directly after grabing the vnode
interlock. We do this so that we still hold the interlock when we lock
the vnode later. This prevents races with the mnt vnode list.


119513 28-Aug-2003 jeff

- Clean-up comments that refer to the use of B_LOCKED.


119512 28-Aug-2003 jeff

- In LCK_BUF() simply change the owner of the buf to the kernel.
- In ULCK_BUF we no longer need to acquire the lock, just write the buf out.
- The combination of these changes eliminates one more use of B_LOCKED which
is in the way of making the buffer cache SMP safe. In the long term
ext2fs should probably not try to optimize the use of their metadata bufs
with a private cache. This will starve the rest of the system for buffers
in the extreme case.

Discussed with: bde (A long time ago..)
Tested on: md disk/x86


119435 25-Aug-2003 marcel

Change of plans: Add ext2_bitops.h with generic and portable
implementations. Use those on platforms that don't have MD
headers. Remove the ia64 MD header. We're going to use the C
implementation there.

Suggested by: bde


119348 23-Aug-2003 marcel

Add compilation support for extfs on ia64, primarily to support LINT.
The functions in ia64-bitops.h merely call panic() for now. They need
to be implemented some day, just not today.


118047 26-Jul-2003 phk

Add a "int fd" argument to VOP_OPEN() which in the future will
contain the filedescriptor number on opens from userland.

The index is used rather than a "struct file *" since it conveys a bit
more information, which may be useful to in particular fdescfs and /dev/fd/*

For now pass -1 all over the place.


116412 15-Jun-2003 phk

Add the same KASSERT to all VOP_STRATEGY and VOP_SPECSTRATEGY implementations
to check that the buffer points to the correct vnode.


116271 12-Jun-2003 phk

Initialize struct vfsops C99-sparsely.

Submitted by: hmp
Reviewed by: phk


112180 13-Mar-2003 jeff

- Lock the buf before clearing flags.


111856 04-Mar-2003 jeff

- Add a new 'flags' parameter to getblk().
- Define one flag GB_LOCK_NOWAIT that tells getblk() to pass the LK_NOWAIT
flag to the initial BUF_LOCK(). This will eventually be used in cases
were we want to use a buffer only if it is not currently in use.
- Convert all consumers of the getblk() api to use this extra parameter.

Reviwed by: arch
Not objected to by: mckusick


111841 03-Mar-2003 njl

Finish cleanup of vprint() which was begun with changing v_tag to a string.
Remove extraneous uses of vop_null, instead defering to the default op.
Rename vnode type "vfs" to the more descriptive "syncer".
Fix formatting for various filesystems that use vop_print.


111742 02-Mar-2003 des

Clean up whitespace, s/register //, refrain from strong urge to ANSIfy.


111741 02-Mar-2003 des

uiomove-related caddr_t -> void * (just the low-hanging fruit)


111463 25-Feb-2003 jeff

- Add an interlock argument to BUF_LOCK and BUF_TIMELOCK.
- Remove the buftimelock mutex and acquire the buf's interlock to protect
these fields instead.
- Hold the vnode interlock while locking bufs on the clean/dirty queues.
This reduces some cases from one BUF_LOCK with a LK_NOWAIT and another
BUF_LOCK with a LK_TIMEFAIL to a single lock.

Reviewed by: arch, mckusick


111368 23-Feb-2003 obrien

Import Linux's linux/include/asm-sparc64/bitopts.h.
This is taken from the 2.4.3 Linux sources as shipped on Red Hat 7.1 Alpha.


111119 19-Feb-2003 imp

Back out M_* changes, per decision of the TRB.

Approved by: trb


110587 09-Feb-2003 jeff

- Use the new vop_stdfsync instead of recreating our own.


110005 28-Jan-2003 phk

Use VOP_SPECSTRATEGY() instead of VOP_STRATEGY().


109623 21-Jan-2003 alfred

Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0.
Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.


108648 04-Jan-2003 phk

Since Jeffr made the std* functions the default in rev 1.63 of
kern/vfs_defaults.c it is wrong for the individual filesystems to use
the std* functions as that prevents override of the default.

Found by: src/tools/tools/vop_table


108589 03-Jan-2003 phk

Convert calls to BUF_STRATEGY to VOP_STRATEGY calls. This is a no-op since
all BUF_STRATEGY did in the first place was call VOP_STRATEGY.


108533 01-Jan-2003 schweikh

Correct typos, mostly s/ a / an / where appropriate. Some whitespace cleanup,
especially in troff files.


108470 30-Dec-2002 schweikh

Fix typos, mostly s/ an / a / where appropriate and a few s/an/and/
Add FreeBSD Id tag where missing.


105420 18-Oct-2002 bde

MFufs 1.33:
In the 'found' case for ext2_lookup() the underlying bp's data was
being accessed after the bp had been releaed. A simple move of the
brelse() solves the problem.

The PR reports that this caused panics running the GDB testsuite unless
NO_GEOM is configured.

PR: 44060
Reported by: Mark Kettenis <kettenis@chello.nl>
MFC after: 3 days


105223 16-Oct-2002 phk

Be consistent about functions being static.
Fix misindentation.

Spotted by: DARPA & NAI Labs.


105077 14-Oct-2002 mckusick

Regularize the vop_stdlock'ing protocol across all the filesystems
that use it. Specifically, vop_stdlock uses the lock pointed to by
vp->v_vnlock. By default, getnewvnode sets up vp->v_vnlock to
reference vp->v_lock. Filesystems that wish to use the default
do not need to allocate a lock at the front of their node structure
(as some still did) or do a lockinit. They can simply start using
vn_lock/VOP_UNLOCK. Filesystems that wish to manage their own locks,
but still use the vop_stdlock functions (such as nullfs) can simply
replace vp->v_vnlock with a pointer to the lock that they wish to
have used for the vnode. Such filesystems are responsible for
setting the vp->v_vnlock back to the default in their vop_reclaim
routine (e.g., vp->v_vnlock = &vp->v_lock).

In theory, this set of changes cleans up the existing filesystem
lock interface and should have no function change to the existing
locking scheme.

Sponsored by: DARPA & NAI Labs.


103938 25-Sep-2002 jeff

- Lock access to the buf lists.
- Use vrefcnt() where appropriate.


103636 19-Sep-2002 truckman

VOP_FSYNC() requires that it's vnode argument be locked, which nfs_link()
wasn't doing. Rather than just lock and unlock the vnode around the call
to VOP_FSYNC(), implement rwatson's suggestion to lock the file vnode
in kern_link() before calling VOP_LINK(), since the other filesystems
also locked the file vnode right away in their link methods. Remove the
locking and and unlocking from the leaf filesystem link methods.

Reviewed by: rwatson, bde (except for the unionfs_link() changes)


103559 18-Sep-2002 njl

Remove any VOP_PRINT that redundantly prints the tag.
Move lockmgr_printinfo() into vprint() for everyone's benefit.

Suggested by: bde


103314 14-Sep-2002 njl

Remove all use of vnode->v_tag, replacing with appropriate substitutes.
v_tag is now const char * and should only be used for debugging.

Additionally:
1. All users of VT_NTS now check vfsconf->vf_type VFCF_NETWORK
2. The user of VT_PROCFS now checks for the new flag VV_PROCDEP, which
is propagated by pseudofs to all child vnodes if the fs sets PFS_PROCDEP.

Suggested by: phk
Reviewed by: bde, rwatson (earlier version)


103180 10-Sep-2002 bde

vfs_syscalls.c:
Changed rename(2) to follow the letter of the POSIX spec. POSIX
requires rename() to have no effect if its args "resolve to the same
existing file". I think "file" can only reasonably be read as referring
to the inode, although the rationale and "resolve" seem to say that
sameness is at the level of (resolved) directory entries.

ext2fs_vnops.c, ufs_vnops.c:
Replaced code that gave the historical BSD behaviour of removing one
link name by checks that this code is now unreachable. This fixes
some races. All vnodes needed to be unlocked for the removal, and
locking at another level using something like IN_RENAME was not even
attempted, so it was possible for rename(x, y) to return with both x
and y removed even without any unlink(2) syscalls (one process can
remove x using rename(x, y) and another process can remove y using
rename(y, x)).

Prodded by: alfred
MFC after: 8 weeks
PR: 42617


101941 15-Aug-2002 rwatson

In order to better support flexible and extensible access control,
make a series of modifications to the credential arguments relating
to file read and write operations to cliarfy which credential is
used for what:

- Change fo_read() and fo_write() to accept "active_cred" instead of
"cred", and change the semantics of consumers of fo_read() and
fo_write() to pass the active credential of the thread requesting
an operation rather than the cached file cred. The cached file
cred is still available in fo_read() and fo_write() consumers
via fp->f_cred. These changes largely in sys_generic.c.

For each implementation of fo_read() and fo_write(), update cred
usage to reflect this change and maintain current semantics:

- badfo_readwrite() unchanged
- kqueue_read/write() unchanged
pipe_read/write() now authorize MAC using active_cred rather
than td->td_ucred
- soo_read/write() unchanged
- vn_read/write() now authorize MAC using active_cred but
VOP_READ/WRITE() with fp->f_cred

Modify vn_rdwr() to accept two credential arguments instead of a
single credential: active_cred and file_cred. Use active_cred
for MAC authorization, and select a credential for use in
VOP_READ/WRITE() based on whether file_cred is NULL or not. If
file_cred is provided, authorize the VOP using that cred,
otherwise the active credential, matching current semantics.

Modify current vn_rdwr() consumers to pass a file_cred if used
in the context of a struct file, and to always pass active_cred.
When vn_rdwr() is used without a file_cred, pass NOCRED.

These changes should maintain current semantics for read/write,
but avoid a redundant passing of fp->f_cred, as well as making
it more clear what the origin of each credential is in file
descriptor read/write operations.

Follow-up commits will make similar changes to other file descriptor
operations, and modify the MAC framework to pass both credentials
to MAC policy modules so they can implement either semantic for
revocation.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs


101744 12-Aug-2002 rwatson

Pass IO_NOMACCHECK to vn_rdwr() in the following checks to prevent
enforcement of MAC policy on the read or write operations:

- In ext2fs, don't enforce MAC on loop-back reads and writes supporting
directory read operations in lookup(), directory modifications in
rename(), directory write operations in mkdir(), symlink write
operations in symlink().

- In the NFS client locking code, perform vn_rdwr() on the NFS locking
socket without enforcing MAC, since the write is done on behalf of
the kernel NFS implementation rather than the user process.

- In UFS, don't enforce MAC on loop-back reads and writes supporting
directory read operations in lookup(), and symlink write operations
in symlink().

Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs


101308 04-Aug-2002 jeff

- Replace v_flag with v_iflag and v_vflag
- v_vflag is protected by the vnode lock and is used when synchronization
with VOP calls is needed.
- v_iflag is protected by interlock and is used for dealing with vnode
management issues. These flags include X/O LOCK, FREE, DOOMED, etc.
- All accesses to v_iflag and v_vflag have either been locked or marked with
mp_fixme's.
- Many ASSERT_VOP_LOCKED calls have been added where the locking was not
clear.
- Many functions in vfs_subr.c were restructured to provide for stronger
locking.

Idea stolen from: BSD/OS


97255 24-May-2002 mux

Convert ext2fs to nmount(2).


96881 18-May-2002 iedowse

Add an ext2_uninit() routine that undoes the actions performed by
ext2_init(). This permits the ext2fs module to be unloaded without
causing panics and leaking memory.


96880 18-May-2002 iedowse

Fix two off-by-one errors when sanity-checking inode numbers. In
ext2fs, inode numbers start at 1, so the maximum valid inode number
is (s_inodes_per_group * s_groups_count), not one less. This is
just a minimal change to avoid unnecessary panics and errors; some
other related bugs that Bruce Evans mentioned to me are not addressed.

Reviewed by: bde (ages ago)


96877 18-May-2002 iedowse

Use explicitly-sized types where necessary to make ext2fs work again
after the change to a 64-bit daddr_t.


96753 16-May-2002 iedowse

Give ext2fs its own static "dirchk" variable instead of using ufs's
variable. Make this accessible as the sysctl vfs.e2fs.dirchk.


96752 16-May-2002 iedowse

Remove register keyword.


96749 16-May-2002 iedowse

Complete the separation of ext2fs from ufs by copying the remaining
shared code and converting all ufs references. Originally it may
have made sense to share common features between the two filesystems,
but recently it has only caused problems, the UFS2 work being the
final straw.

All UFS_* indirect calls are now direct calls to ext2_* functions,
and ext2fs-specific mount and inode structures have been introduced.


96596 14-May-2002 iedowse

Following a repo-copy from src/sys/ufs/ufs, rename functions and
structures etc. to ext2fs-specific names, and remove ufs-specific
code that is no longer required. As a first stage, the code will
still convert back and forth between the on-disk format and struct
inode, so the struct dinode fields have been added to struct inode
for now.

Note that these files are not yet connected to the build.


96572 14-May-2002 phk

Make daddr_t and u_daddr_t 64bits wide.
Retire daddr64_t and use daddr_t instead.

Sponsored by: DARPA & NAI Labs.


96521 13-May-2002 bde

Fixed syntax errors (garbage after #endif; just editing errors in this
case). These errors and related style bugs swere cloned from ufs
shortly after they were committed to ufs. They were mostly fixed in
ufs long ago.


96506 13-May-2002 phk

Remove register keyword.

Sponsored by: DARPA & NAI Labs.
Submitted by: mckusick


96473 12-May-2002 phk

ARGH! SBLOCK is not unused. Try to get this right.

BBSIZE belongs in <sys/disklabel.h> (but shouldn't be a constant).

Define SBLOCK again, using the right math.

Sponsored by: DARPA & NAI Labs.


96472 12-May-2002 phk

Remove #define for BBOFF, it is assumed == 0 so many places that we might
as well forget about it. In fact the only thing which used it was the
SBOFF macro.

Sponsored by: DARPA & NAI Labs.


96471 12-May-2002 phk

Remove unused BBLOCK and SBLOCK #defines.

Sponsored by: DARPA & NAI Labs.


93593 01-Apr-2002 jhb

Change the suser() API to take advantage of td_ucred as well as do a
general cleanup of the API. The entire API now consists of two functions
similar to the pre-KSE API. The suser() function takes a thread pointer
as its only argument. The td_ucred member of this thread must be valid
so the only valid thread pointers are curthread and a few kernel threads
such as thread0. The suser_cred() function takes a pointer to a struct
ucred as its first argument and an integer flag as its second argument.
The flag is currently only used for the PRISON_ROOT flag.

Discussed on: smp@


93430 30-Mar-2002 bde

In ffs_mountffs(), set mnt_iosize_max to si_iosize_max unconditionally
provided the latter is nonzero. At this point, the former is a fairly
arbitrary default value (DFTPHYS), so changing it to any reasonable
value specified by the device driver is safe. Using the maximum of
these limits broke ffs clustered i/o for devices whose si_iosize_max
is < DFLTPHYS. Using the minimum would break device drivers' ability
to increase the active limit from DFTLPHYS up to MAXPHYS.

Copied the code for this and the associated (unnecessary?) fixup of
mp_iosize_max to all other filesystems that use clustering (ext2fs and
msdosfs). It was completely missing.

PR: 36309
MFC-after: 1 week


93016 23-Mar-2002 bde

Moved $FreeBSD$ to the correct place.


93015 23-Mar-2002 bde

Repaired CSRG id. This file was not in Lite1; it was just cloned from a
file with a in Lite1 before being cvs-added to FreeBSD.


93014 23-Mar-2002 bde

Fixed some style bugs in the removal of __P(()). Continuation lines
were not outdented to preserve non-KNF lining up of code with parentheses.
Switch to KNF formatting.


92728 19-Mar-2002 alfred

Remove __P.


92462 17-Mar-2002 mckusick

Add a flags parameter to VFS_VGET to pass through the desired
locking flags when acquiring a vnode. The immediate purpose is
to allow polling lock requests (LK_NOWAIT) needed by soft updates
to avoid deadlock when enlisting other processes to help with
the background cleanup. For the future it will allow the use of
shared locks for read access to vnodes. This change touches a
lot of files as it affects most filesystems within the system.
It has been well tested on FFS, loopback, and CD-ROM filesystems.
only lightly on the others, so if you find a problem there, please
let me (mckusick@mckusick.com) know.


92363 15-Mar-2002 mckusick

Introduce the new 64-bit size disk block, daddr64_t. Change
the bio and buffer structures to have daddr64_t bio_pblkno,
b_blkno, and b_lblkno fields which allows access to disks
larger than a Terabyte in size. This change also requires
that the VOP_BMAP vnode operation accept and return daddr64_t
blocks. This delta should not affect system operation in
any way. It merely sets up the necessary interfaces to allow
the development of disk drivers that work with these larger
disk block addresses. It also allows for the development of
UFS2 which will use 64-bit block addresses.


92082 11-Mar-2002 phk

Remove use of the bogus ioctl DIOCGPART.

It was used to initialize an unused variable, because ext2fs was
copy&pasted from UFS rather than copy,paste&cleaned from UFS.

Suggested by: bde


91406 27-Feb-2002 jhb

Simple p_ucred -> td_ucred changes to start using the per-thread ucred
reference.


87599 10-Dec-2001 obrien

Update to C99, s/__FUNCTION__/__func__/,
also don't use ANSI string concatenation.


86037 04-Nov-2001 dillon

Add mnt_reservedvnlist so we can MFC to 4.x, in order to make all mount
structure changes now rather then piecemeal later on. mnt_nvnodelist
currently holds all the vnodes under the mount point. This will eventually
be split into a 'dirty' and 'clean' list. This way we only break kld's once
rather then twice. nvnodelist will eventually turn into the dirty list
and should remain compatible with the klds.


85339 23-Oct-2001 dillon

Change the vnode list under the mount point from a LIST to a TAILQ
in preparation for an implementation of limiting code for kern.maxvnodes.

MFC after: 3 days


83899 24-Sep-2001 iedowse

The addition of i_dirhash to struct inode pushed RELENG_4's
sizeof(struct inode) into a new malloc bucket on the i386. This
didn't happen in -current due to the removal of i_lock, but it does
no harm to apply the workaround to -current first.

Reduce the size of the i_spare[] array in struct inode from 4 to
3 entries, and change ext2fs to use i_din.di_spare[1] so that it
does not need i_spare[3].

Reviewed by: bde
MFC after: 3 days


83366 12-Sep-2001 julian

KSE Milestone 2
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to teh previousl -current except
that there is a thread associated with each process.

Sorry john! (your next MFC will be a doosie!)

Reviewed by: peter@freebsd.org, dillon@freebsd.org

X-MFC after: ha ha ha ha


79561 10-Jul-2001 iedowse

Bring in dirhash, a simple hash-based lookup optimisation for large
directories. When enabled via "options UFS_DIRHASH", in-core hash
arrays are maintained for large directories. These allow all
directory operations to take place quickly instead of requiring
long linear searches. For now anyway, dirhash is not enabled by
default.

The in-core hash arrays have a memory requirement that is approximately
half the size of the size of the on-disk directory file. A number
of new sysctl variables allow control over which directories get
hashed and over the maximum amount of memory that dirhash will use:

vfs.ufs.dirhash_minsize
The minimum on-disk directory size for which hashing should be
used. The default is 2560 (2.5k).

vfs.ufs.dirhash_maxmem
The system-wide maximum total memory to be used by dirhash data
structures. The default is 2097152 (2MB).

The current amount of memory being used by dirhash is visible
through the read-only sysctl variable vfs.ufs.dirhash_maxmem.
Finally, some extra sanity checks that are enabled by default, but
which may have an impact on performance, can be disabled by setting
vfs.ufs.dirhash_docheck to 0.

Discussed on: -fs, -hackers


78940 28-Jun-2001 jhb

Fix more mntvnode and vnode interlock order reversals.


78907 28-Jun-2001 jhb

Fix a mntvnode and vnode interlock reversal.


77437 29-May-2001 phk

Remove last vestiges of MFS.


76688 16-May-2001 iedowse

Change the second argument of vflush() to an integer that specifies
the number of references on the filesystem root vnode to be both
expected and released. Many filesystems hold an extra reference on
the filesystem root vnode, which must be accounted for when
determining if the filesystem is busy and then released if it isn't
busy. The old `skipvp' approach required individual filesystem
xxx_unmount functions to re-implement much of vflush()'s logic to
deal with the root vnode.

All 9 filesystems that hold an extra reference on the root vnode
got the logic wrong in the case of forced unmounts, so `umount -f'
would always fail if there were any extra root vnode references.
Fix this issue centrally in vflush(), now that we can.

This commit also fixes a vnode reference leak in devfs, which could
result in idle devfs filesystems that refuse to unmount.

Reviewed by: phk, bp


76357 08-May-2001 mckusick

When running with soft updates, track the number of blocks and files
that are committed to being freed and reflect these blocks in the
counts returned by statfs (and thus also by the `df' command). This
change allows programs such as those that do news expiration to
know when to stop if they are trying to create a certain percentage
of free space. Note that this change does not solve the much harder
problem of making this to-be-freed space available to applications
that want it (thus on a nearly full filesystem, you may still
encounter out-of-space conditions even though the free space will
show up eventually). Hopefully this harder problem will be the
subject of a future enhancement.


76172 01-May-2001 phk

Remove blatantly pointless call to VOP_BMAP().


76167 01-May-2001 phk

Implement vop_std{get|put}pages() and add them to the default vop[].

Un-copy&paste all the VOP_{GET|PUT}PAGES() functions which do nothing but
the default.


76166 01-May-2001 markm

Undo part of the tangle of having sys/lock.h and sys/mutex.h included in
other "system" header files.

Also help the deprecation of lockmgr.h by making it a sub-include of
sys/lock.h and removing sys/lockmgr.h form kernel .c files.

Sort sys/*.h includes where possible in affected files.

OK'ed by: bde (with reservations)


76132 29-Apr-2001 phk

VOP_BALLOC was never really a VOP in the first place, so convert it
to UFS_BALLOC like the other "between UFS and FFS function interfaces".


76130 29-Apr-2001 phk

Make a panic less misleading.


76128 29-Apr-2001 phk

Remove two unused arguments from ufs_bmaparray().


76117 29-Apr-2001 grog

Revert consequences of changes to mount.h, part 2.

Requested by: bde


75951 25-Apr-2001 bde

MFffs ffs_balloc.c 1.5.

Long ago, bread() set b_blkno to the disk block number as a side effect
of doing physical i/o (or it just retained the setting from when the
i/o was done). The setting is lost when buffers go away and then are
reconsituted from VM. bread() originally compensated by doing a
VOP_BMAP() to recover b_blkno, but this was no good since it sometimes
caused extra i/o or even deadlock for bread()ing metadata to do the
bmap. This was fixed in vfs_bio.c 1.33 (1995/03/03) and ffs_balloc.c
1.5, etc., by removing the VOP_BMAP() from bread() and breadn(), and
changing all (?) places that used b_blkno to set it if necessary.

ext2fs was not imported until later in 1995 and was still depending on
the old behaviour of bread() in at least ext2_balloc(). This caused
filesystem and file corruption by clobbering direct block numbers in
inodes.


75934 25-Apr-2001 phk

Move the netexport structure from the fs-specific mountstructure
to struct mount.

This makes the "struct netexport *" paramter to the vfs_export
and vfs_checkexport interface unneeded.

Consequently that all non-stacking filesystems can use
vfs_stdcheckexp().

At the same time, make it a pointer to a struct netexport
in struct mount, so that we can remove the bogus AF_MAX
and #include <net/radix.h> from <sys/mount.h>


75858 23-Apr-2001 grog

Correct #includes to work with fixed sys/mount.h.


73942 07-Mar-2001 mckusick

Fixes to track snapshot copy-on-write checking in the specinfo
structure rather than assuming that the device vnode would reside
in the FFS filesystem (which is obviously a broken assumption with
the device filesystem).


73929 07-Mar-2001 jhb

Grab the process lock while calling psignal and before calling psignal.


73286 01-Mar-2001 adrian

Reviewed by: jlemon

An initial tidyup of the mount() syscall and VFS mount code.

This code replaces the earlier work done by jlemon in an attempt to
make linux_mount() work.

* the guts of the mount work has been moved into vfs_mount().

* move `type', `path' and `flags' from being userland variables into being
kernel variables in vfs_mount(). `data' remains a pointer into
userspace.

* Attempt to verify the `type' and `path' strings passed to vfs_mount()
aren't too long.

* rework mount() and linux_mount() to take the userland parameters
(besides data, as mentioned) and pass kernel variables to vfs_mount().
(linux_mount() already did this, I've just tidied it up a little more.)

* remove the copyin*() stuff for `path'. `data' still requires copyin*()
since its a pointer into userland.

* set `mount->mnt_statf_mntonname' in vfs_mount() rather than in each
filesystem. This variable is generally initialised with `path', and
each filesystem can override it if they want to.

* NOTE: f_mntonname is intiailised with "/" in the case of a root mount.


72640 18-Feb-2001 asmodai

Preceed/preceeding are not english words. Use precede or preceding.


72200 09-Feb-2001 bmilekic

Change and clean the mutex lock interface.

mtx_enter(lock, type) becomes:

mtx_lock(lock) for sleep locks (MTX_DEF-initialized locks)
mtx_lock_spin(lock) for spin locks (MTX_SPIN-initialized)

similarily, for releasing a lock, we now have:

mtx_unlock(lock) for MTX_DEF and mtx_unlock_spin(lock) for MTX_SPIN.
We change the caller interface for the two different types of locks
because the semantics are entirely different for each case, and this
makes it explicitly clear and, at the same time, it rids us of the
extra `type' argument.

The enter->lock and exit->unlock change has been made with the idea
that we're "locking data" and not "entering locked code" in mind.

Further, remove all additional "flags" previously passed to the
lock acquire/release routines with the exception of two:

MTX_QUIET and MTX_NOSWITCH

The functionality of these flags is preserved and they can be passed
to the lock/unlock routines by calling the corresponding wrappers:

mtx_{lock, unlock}_flags(lock, flag(s)) and
mtx_{lock, unlock}_spin_flags(lock, flag(s)) for MTX_DEF and MTX_SPIN
locks, respectively.

Re-inline some lock acq/rel code; in the sleep lock case, we only
inline the _obtain_lock()s in order to ensure that the inlined code
fits into a cache line. In the spin lock case, we inline recursion and
actually only perform a function call if we need to spin. This change
has been made with the idea that we generally tend to avoid spin locks
and that also the spin locks that we do have and are heavily used
(i.e. sched_lock) do recurse, and therefore in an effort to reduce
function call overhead for some architectures (such as alpha), we
inline recursion for this case.

Create a new malloc type for the witness code and retire from using
the M_DEV type. The new type is called M_WITNESS and is only declared
if WITNESS is enabled.

Begin cleaning up some machdep/mutex.h code - specifically updated the
"optimized" inlined code in alpha/mutex.h and wrote MTX_LOCK_SPIN
and MTX_UNLOCK_SPIN asm macros for the i386/mutex.h as we presently
need those.

Finally, caught up to the interface changes in all sys code.

Contributors: jake, jhb, jasone (in no particular order)


71999 04-Feb-2001 phk

Mechanical change to use <sys/queue.h> macro API instead of
fondling implementation details.

Created with: sed(1)
Reviewed by: md5(1)


71699 27-Jan-2001 jhb

Back out proc locking to protect p_ucred for obtaining additional
references along with the actual obtaining of additional references.


71576 24-Jan-2001 jasone

Convert all simplelocks to mutexes and remove the simplelock implementations.


71483 23-Jan-2001 jhb

Proc locking, mostly protecting p_ucred while obtaining additional
references.


70131 17-Dec-2000 dillon

Avoid a data-consistency race between write() and mmap()
by ensuring that newly allocated blocks are zerod. The
race can occur even in the case where the write covers
the entire block.

Reported by: Sven Berkvens <sven@berkvens.net>, Marc Olzheim <zlo@zlo.nu>


69808 09-Dec-2000 mjacob

Put the bits in place for Alpha support for ext2. Not tested.


69807 09-Dec-2000 mjacob

Correct to a common %ld the 5 argument to a printf.


69806 09-Dec-2000 mjacob

Use a pointer to a size_t for the 4th argument to copyinstr-
not a pointer to a u_int.


69517 02-Dec-2000 bde

Backed out previous commit. Don't depend on namespace pollution in
<sys/buf.h>.


69399 30-Nov-2000 alfred

remove unneded sys/ucred.h includes


68568 10-Nov-2000 bde

Quick fix for not writing group descriptor group, inode bitmaps or
block bitmaps before unmount() completes. They were written using
bdwrite(), so they were normally written less than 32 seconds after
unmount(), but this is too late if the media is removed or the system
is rebooted soon after unmount(). sync()ing before unmount() didn't
help, because ext2fs uses buggy private caching for these blocks --
it doesn't even bdwrite() them until they are uncached or the filesystem
is unmounted. sync()ing after unmount() didn't help, because sync()
only applies to (vnodes for) mounted filesystems.

PR: 22726


68307 04-Nov-2000 bde

Fixed breakage of mknod() in rev.1.48 of ext2_vnops.c and rev.1.126 of
ufs_vnops.c:

1) i_ino was confused with i_number, so the inode number passed to
VFS_VGET() was usually wrong (usually 0U).
2) ip was dereferenced after vgone() freed it, so the inode number
passed to VFS_VGET() was sometimes not even wrong.

Bug (1) was usually fatal in ext2_mknod(), since ext2fs doesn't have
space for inode 0 on the disk; ino_to_fsba() subtracts 1 from the
inode number, so inode number 0U gives a way out of bounds array
index. Bug(1) was usually harmless in ufs_mknod(); ino_to_fsba()
doesn't subtract 1, and VFS_VGET() reads suitable garbage (all 0's?)
from the disk for the invalid inode number 0U; ufs_mknod() returns
a wrong vnode, but most callers just vput() it; the correct vnode is
eventually obtained by an implicit VFS_VGET() just like it used to be.

Bug (2) usually doesn't happen.


68291 03-Nov-2000 bde

Support filesystems with the not-so-new "sparse_superblocks" feature.
When this feature is enabled, mke2fs doesn't necessarily allocate a
super block and its associated descriptor blocks for every group.
The (non-)allocations are reflected in the block bitmap. Since the
filesystem code doesn't write to these blocks except for the first
superblock, all it has to do to support them is to not count them in
ext2_statfs() and not attempt to check them at mount time in
ext2_check_blocks_bitmap() (the check has never been enabled in
FreeBSD anyway).


67885 29-Oct-2000 phk

Weaken a bogus dependency on <sys/proc.h> in <sys/buf.h> by #ifdef'ing
the offending inline function (BUF_KERNPROC) on it being #included
already.

I'm not sure BUF_KERNPROC() is even the right thing to do or in the
right place or implemented the right way (inline vs normal function).

Remove consequently unneeded #includes of <sys/proc.h>


67708 27-Oct-2000 phk

Convert all users of fldoff() to offsetof(). fldoff() is bad
because it only takes a struct tag which makes it impossible to
use unions, typedefs etc.

Define __offsetof() in <machine/ansi.h>

Define offsetof() in terms of __offsetof() in <stddef.h> and <sys/types.h>

Remove myriad of local offsetof() definitions.

Remove includes of <stddef.h> in kernel code.

NB: Kernelcode should *never* include from /usr/include !

Make <sys/queue.h> include <machine/ansi.h> to avoid polluting the API.

Deprecate <struct.h> with a warning. The warning turns into an error on
01-12-2000 and the file gets removed entirely on 01-01-2001.

Paritials reviews by: various.
Significant brucifications by: bde


66886 09-Oct-2000 eivind

Blow away the v_specmountpoint define, replacing it with what it was
defined as (rdev->si_mountpoint)


66615 04-Oct-2000 jasone

Convert lockmgr locks from using simple locks to using mutexes.

Add lockdestroy() and appropriate invocations, which corresponds to
lockinit() and must be called to clean up after a lockmgr lock is no
longer needed.


66378 26-Sep-2000 bp

ext2fs depends on ufs code, so update it to properly handle v_lock field.

Noticed by: bde


66355 25-Sep-2000 bp

Add a lock structure to vnode structure. Previously it was either allocated
separately (nfs, cd9660 etc) or keept as a first element of structure
referenced by v_data pointer(ffs). Such organization leads to known problems
with stacked filesystems.

From this point vop_no*lock*() functions maintain only interlock lock.
vop_std*lock*() functions maintain built-in v_lock structure using lockmgr().
vop_sharedlock() is compatible with vop_stdunlock(), but maintains a shared
lock on vnode.

If filesystem wishes to export lockmgr compatible lock, it can put an address
of this lock to v_vnlock field. This indicates that the upper filesystem
can take advantage of it and use single lock structure for entire (or part)
of stack of vnodes. This field shouldn't be examined or modified by VFS code
except for initialization purposes.

Reviewed in general by: mckusick


65780 12-Sep-2000 bde

Fixed some serious bugs in ext2_readdir():

The cookie buffer was usually overrun by a large amount whenever
cookies were used. Cookies are used by nfs and the Linuxulator, so
this bug usually caused panics whenever an ext2fs filesystem was nfs
mounted or a Linux utility that calls readdir() was run on an ext2fs
filesystem.

The directory buffer was sometimes overrun by a small amount. This
sometimes caused panics and wrong results even for FreeBSD utilities,
but it was usually harmless because FreeBSD utilities use a large
enough buffer size (4K). Linux utilities usually triggered the bug
since they use a too-small buffer size (512 bytes), at least with the
old RedHat utilities that I tested with.

PR: 19407 (this fix is incomplete or for a slightly different bug)


63788 24-Jul-2000 mckusick

This patch corrects the first round of panics and hangs reported
with the new snapshot code.

Update addaliasu to correctly implement the semantics of the old
checkalias function. When a device vnode first comes into existence,
check to see if an anonymous vnode for the same device was created
at boot time by bdevvp(). If so, adopt the bdevvp vnode rather than
creating a new vnode for the device. This corrects a problem which
caused the kernel to panic when taking a snapshot of the root
filesystem.

Change the calling convention of vn_write_suspend_wait() to be the
same as vn_start_write().

Split out softdep_flushworklist() from softdep_flushfiles() so that
it can be used to clear the work queue when suspending filesystem
operations.

Access to buffers becomes recursive so that snapshots can recursively
traverse their indirect blocks using ffs_copyonwrite() when checking
for the need for copy on write when flushing one of their own indirect
blocks. This eliminates a deadlock between the syncer daemon and a
process taking a snapshot.

Ensure that softdep_process_worklist() can never block because of a
snapshot being taken. This eliminates a problem with buffer starvation.

Cleanup change in ffs_sync() which did not synchronously wait when
MNT_WAIT was specified. The result was an unclean filesystem panic
when doing forcible unmount with heavy filesystem I/O in progress.

Return a zero'ed block when reading a block that was not in use at
the time that a snapshot was taken. Normally, these blocks should
never be read. However, the readahead code will occationally read
them which can cause unexpected behavior.

Clean up the debugging code that ensures that no blocks be written
on a filesystem while it is suspended. Snapshots must explicitly
label the blocks that they are writing during the suspension so that
they do not cause a `write on suspended filesystem' panic.

Reorganize ffs_copyonwrite() to eliminate a deadlock and also to
prevent a race condition that would permit the same block to be
copied twice. This change eliminates an unexpected soft updates
inconsistency in fsck caused by the double allocation.

Use bqrelse rather than brelse for buffers that will be needed
soon again by the snapshot code. This improves snapshot performance.


62976 11-Jul-2000 mckusick

Add snapshots to the fast filesystem. Most of the changes support
the gating of system calls that cause modifications to the underlying
filesystem. The gating can be enabled by any filesystem that needs
to consistently suspend operations by adding the vop_stdgetwritemount
to their set of vnops. Once gating is enabled, the function
vfs_write_suspend stops all new write operations to a filesystem,
allows any filesystem modifying system calls already in progress
to complete, then sync's the filesystem to disk and returns. The
function vfs_write_resume allows the suspended write operations to
begin again. Gating is not added by default for all filesystems as
for SMP systems it adds two extra locks to such critical kernel
paths as the write system call. Thus, gating should only be added
as needed.

Details on the use and current status of snapshots in FFS can be
found in /sys/ufs/ffs/README.snapshot so for brevity and timelyness
is not included here. Unless and until you create a snapshot file,
these changes should have no effect on your system (famous last words).


61686 14-Jun-2000 alex

Fix typo (accessable --> accessible).

PR: 18588
Submitted by: Anatoly Vorobey <mellon@pobox.com>
Reviewed by: asmodai


60938 26-May-2000 jake

Back out the previous change to the queue(3) interface.
It was not discussed and should probably not happen.

Requested by: msmith and others


60833 23-May-2000 jake

Change the way that the queue(3) structures are declared; don't assume that
the type argument to *_HEAD and *_ENTRY is a struct.

Suggested by: phk
Reviewed by: phk
Approved by: mdodd


60041 05-May-2000 phk

Separate the struct bio related stuff out of <sys/buf.h> into
<sys/bio.h>.

<sys/bio.h> is now a prerequisite for <sys/buf.h> but it shall
not be made a nested include according to bdes teachings on the
subject of nested includes.

Diskdrivers and similar stuff below specfs::strategy() should no
longer need to include <sys/buf.> unless they need caching of data.

Still a few bogus uses of struct buf to track down.

Repocopy by: peter


59794 30-Apr-2000 phk

Remove unneeded #include <vm/vm_zone.h>

Generated by: src/tools/tools/kerninclude


59762 29-Apr-2000 phk

s/biowait/bufwait/g

Prodded by: several.


59391 19-Apr-2000 phk

Remove ~25 unneeded #include <sys/conf.h>
Remove ~60 unneeded #include <sys/malloc.h>


59259 15-Apr-2000 rwatson

ext2fs relies on UFS support code, and as a result also requires
extattr.h to be included. This fixes the broken ext2fs build as of
the import of extattr code.

Also added $FreeBSD: $ to a couple of files that didn't have them,
without which I couldn't commit this fix.

Reported by: "George W. Dinolt" <gdinolt@pacbell.net>


59241 15-Apr-2000 rwatson

Introduce extended attribute support for FFS, allowing arbitrary
(name, value) pairs to be associated with inodes. This support is
used for ACLs, MAC labels, and Capabilities in the TrustedBSD
security extensions, which are currently under development.

In this implementation, attributes are backed to data vnodes in the
style of the quota support in FFS. Support for FFS extended
attributes may be enabled using the FFS_EXTATTR kernel option
(disabled by default). Userland utilities and man pages will be
committed in the next batch. VFS interfaces and man pages have
been in the repo since 4.0-RELEASE and are unchanged.

o ufs/ufs/extattr.h: UFS-specific extattr defines
o ufs/ufs/ufs_extattr.c: bulk of support routines
o ufs/{ufs,ffs,mfs}/*.[ch]: hooks and extattr.h includes
o contrib/softupdates/ffs_softdep.c: extattr.h includes
o conf/options, conf/files, i386/conf/LINT: added FFS_EXTATTR

o coda/coda_vfsops.c: XXX required extattr.h due to ufsmount.h
(This should not be the case, and will be fixed in a future commit)

Currently attributes are not supported in MFS. This will be fixed.

Reviewed by: adrian, bp, freebsd-fs, other unthanked souls
Obtained from: TrustedBSD Project


58934 02-Apr-2000 phk

Move B_ERROR flag to b_ioflags and call it BIO_ERROR.

(Much of this done by script)

Move B_ORDERED flag to b_ioflags and call it BIO_ORDERED.

Move b_pblkno and b_iodone_chain to struct bio while we transition, they
will be obsoleted once bio structs chain/stack.

Add bio_queue field for struct bio aware disksort.

Address a lot of stylistic issues brought up by bde.


58909 02-Apr-2000 dillon

Change the write-behind code to take more care when starting
async I/O's. The sequential read heuristic has been extended to
cover writes as well. We continue to call cluster_write() normally,
thus blocks in the file will still be reallocated for large (but still
random) I/O's, but I/O will only be initiated for truely sequential
writes.

This solves a number of annoying situations, especially with DBM (hash
method) writes, and also has the side effect of fixing a number of
(stupid) benchmarks.

Reviewed-by: mckusick


58349 20-Mar-2000 phk

Rename the existing BUF_STRATEGY() to DEV_STRATEGY()

substitute BUF_WRITE(foo) for VOP_BWRITE(foo->b_vp, foo)

substitute BUF_STRATEGY(foo) for VOP_STRATEGY(foo->b_vp, foo)

This patch is machine generated except for the ccd.c and buf.h parts.


58345 20-Mar-2000 phk

Remove B_READ, B_WRITE and B_FREEBUF and replace them with a new
field in struct buf: b_iocmd. The b_iocmd is enforced to have
exactly one bit set.

B_WRITE was bogusly defined as zero giving rise to obvious coding
mistakes.

Also eliminate the redundant struct buf flag B_CALL, it can just
as efficiently be done by comparing b_iodone to NULL.

Should you get a panic or drop into the debugger, complaining about
"b_iocmd", don't continue. It is likely to write on your disk
where it should have been reading.

This change is a step in the direction towards a stackable BIO capability.

A lot of this patch were machine generated (Thanks to style(9) compliance!)

Vinum users: Greg has not had time to test this yet, be careful.


58088 15-Mar-2000 mckusick

Bug fixes for currently harmless bugs that could rise to bite
the unwary if the code were called in slightly different ways.

1) In ufs_bmaparray() the code for calculating 'runb' will stop one block
short of the first entry in an indirect block. i.e. if an indirect block
contains N block numbers b[0]..b[N-1] then the code will never check if
b[0] and b[1] are sequential. For reference, compare with the equivalent
code that deals with direct blocks.

2) In ufs_lookup() there is an off-by-one error in the test that checks
if dp->i_diroff is outside the range of the the current directory size.
This is completely harmless, since the following while-loop condition
'dp->i_offset < endsearch' is never met, so the code immediately
does a second pass starting at dp->i_offset = 0.

3) Again in ufs_lookup(), the condition in a sanity check is wrong
for directories that are longer than one block. This bug means that
the sanity check is only effective for small directories.

Submitted by: Ian Dowse <iedowse@maths.tcd.ie>


57839 09-Mar-2000 bde

Don't forget to check for unsupported features when updating. It was
possible to defeat the check for rw incompatibilty by mounting ro and
updating to rw.

Approved by: jkh


57710 03-Mar-2000 bde

MFS (ext2_lookup.c 1.17.2.2, ext2_vnops.c 1.42.2.2: fix "filetype" support).

Approved by: jkh


55756 10-Jan-2000 phk

Give vn_isdisk() a second argument where it can return a suitable errno.

Suggested by: bde


55477 05-Jan-2000 bde

Support filesystems with the not-so-new "filetype" feature. This
feature gives the d_type field for struct dirent. We used to panic
in ext2_readdir() for filesystems with this feature.


55313 02-Jan-2000 bde

Don't allow mounting (or mounting R/W) of filesystems with unsupported
features (except for file types in directory entries, which will be
supported soon).

Centralized the magic number and compatibility checking.

Dropped support for ancient (pre-0.2b) filesystems, as in the Linux
version. Our "support" consisted of printing more details in the error
message before failing at mount time.


55304 01-Jan-2000 bde

Merged changes in ext2_fs.h between Linux 1.2.2 and Linux 2.3.35. The
main changes are:
- many things are more dynamic; e.g., the inode size is a new parameter
in the superblock instead of a constant.
- extensions are controlled by new flags in the superblock.
- directory entries may have a file type field.
These changes are not used yet, except for a spelling change which affects
ext2_cnv.c


55303 01-Jan-2000 bde

Merged cosmetic changes from the initial import on the vendor branch
(mainly things that were lost or misformatted in a different way by
moving them to ext2_fs_i.h and back, and ifdefs for user mode that
were excessively edited).


55299 01-Jan-2000 bde

Use an ifdef in ext2_fs.h instead of a bogus separate file (ext2_fs_i.h)
to avoid the namespace problems caused by <ufs/ufs/inode.h> #defining
i_mode, etc.

ext2_fs_i.h had nothing to do with the Linux version. It was a small
part of the Linux version of ext2_fs.h (the part that declares extra
in-core fields for an inode). We don't need it because we use the
ufs in-core inode for the extra fields.


55293 01-Jan-2000 bde

Updated/corrected the list of GPL'ed files.


55206 29-Dec-1999 peter

Change #ifdef KERNEL to #ifdef _KERNEL in the public headers. "KERNEL"
is an application space macro and the applications are supposed to be free
to use it as they please (but cannot). This is consistant with the other
BSD's who made this change quite some time ago. More commits to come.


54803 19-Dec-1999 rwatson

Second pass commit to introduce new ACL and Extended Attribute system
calls, vnops, vfsops, both in /kern, and to individual file systems that
require a vfsop_ array entry.

Reviewed by: eivind


54655 15-Dec-1999 eivind

Introduce NDFREE (and remove VOP_ABORTOP)


53452 20-Nov-1999 phk

struct mountlist and struct mount.mnt_list have no business being
a CIRCLEQ. Change them to TAILQ_HEAD and TAILQ_ENTRY respectively.

This removes ugly mp != (void*)&mountlist comparisons.

Requested by: phk
Submitted by: Jake Burkholder jake@checker.org
PR: 14967


53201 15-Nov-1999 obrien

Fix __asm__ clobber list abuse.

Submitted by: bde


53131 13-Nov-1999 eivind

Remove WILLRELE from VOP_SYMLINK

Note: Previous commit to these files (except coda_vnops and devfs_vnops)
that claimed to remove WILLRELE from VOP_RENAME actually removed it from
VOP_MKNOD.


53101 12-Nov-1999 eivind

Remove WILLRELE from VOP_RENAME


53059 09-Nov-1999 phk

Next step in the device cleanup process.

Correctly lock vnodes when calling VOP_OPEN() from filesystem mount code.

Unify spec_open() for bdev and cdev cases.

Remove the disabled bdev specific read/write code.


52838 03-Nov-1999 bde

Quick fix for breakage of ext2fs link counts as reported by stat(2) by
the soft updates changes: only report the link count to be i_effnlink
in ufs_getattr() for file systems that maintain i_effnlink.

Tested by: Mike Dracopoulos <mdraco@math.uoa.gr>


52782 01-Nov-1999 msmith

Newline-terminate the complaint message about not being able to find
the root vnode pointer.


51808 30-Sep-1999 phk

Remove the D_NOCLUSTER[RW] options which were added because vn had
problems. Now that Matt has fixed vn, this can go. The vn driver
should have used d_maxio (now si_iosize_max) anyway.


51797 29-Sep-1999 phk

Remove v_maxio from struct vnode.

Replace it with mnt_iosize_max in struct mount.

Nits from: bde


51486 20-Sep-1999 dillon

More removals of vnode->v_lastr, replaced by preexisting seqcount
heuristic to detect sequential operation.

VM-related forced clustering code removed from ufs in preparation for a
commit to vm/vm_fault.c that does it more generally.

Reviewed by: David Greenman <dg@root.com>, Alan Cox <alc@cs.rice.edu>


51483 20-Sep-1999 phk

Fix a harmless bug I introduced, simplify a bit more while here.


51479 20-Sep-1999 phk

Step one of replacing devsw->d_maxio with si_bsize_max.

Rename dev->si_bsize_max to si_iosize_max and set it in spec_open
if the device didn't.

Set vp->v_maxio from dev->si_bsize_max in spec_open rather than
in ufs_bmap.c


51138 11-Sep-1999 alfred

Seperate the export check in VFS_FHTOVP, exports are now checked via
VFS_CHECKEXP.

Add fh(open|stat|stafs) syscalls to allow userland to query filesystems
based on (network) filehandle.

Obtained from: NetBSD


50477 28-Aug-1999 peter

$Id$ -> $FreeBSD$


50347 25-Aug-1999 phk

Introduce vn_isdisk(struct vnode *vp) function, and use it to test for diskness.


50260 23-Aug-1999 bde

Oops, the previous commit was missing a new include.


50256 23-Aug-1999 bde

Initialise fsids with (user) device numbers again. Bitrot when dev_t's
were changed to pointers was obscured by casting dev_t's to longs.
fsids haven't even been comprised of longs since the Lite2 merge.


50253 23-Aug-1999 bde

Use devtoname() to print dev_t's instead of casting them to long or u_long
for misprinting in %lx format.


49679 13-Aug-1999 phk

The bdevsw() and cdevsw() are now identical, so kill the former.


49535 08-Aug-1999 phk

Decommision miscfs/specfs/specdev.h. Most of it goes into <sys/conf.h>,
a few lines into <sys/vnode.h>.

Add a few fields to struct specinfo, paving the way for the fun part.


49074 25-Jul-1999 bde

Don't set IN_ACCESS for requests to read 0 bytes or for unsuccessful reads.

Translated from: similar fixes in ufs_readwrite.c rev.1.61. Things
are simpler (but annoyingly different) here because there are no
vm optimisations.


48801 13-Jul-1999 mckusick

Create the macro DOINGASYNC to check whether the MNT_ASYNC flag has
been set for a mount point. Insert missing checks to ensure that all
write operations are done asynchronously when the MNT_ASYNC option
has been requested.

Submitted by: Craig A Soules <soules+@andrew.cmu.edu>
Reviewed by: Kirk McKusick <mckusick@mckusick.com>


48225 26-Jun-1999 mckusick

Convert buffer locking from using the B_BUSY and B_WANTED flags to using
lockmgr locks. This commit should be functionally equivalent to the old
semantics. That is, all buffer locking is done with LK_EXCLUSIVE
requests. Changes to take advantage of LK_SHARED and LK_RECURSIVE will
be done in future commits.


47964 16-Jun-1999 mckusick

Add a vnode argument to VOP_BWRITE to get rid of the last vnode
operator special case. Delete special case code from vnode_if.sh,
vnode_if.src, umap_vnops.c, and null_vnops.c.


47640 31-May-1999 phk

Simplify cdevsw registration.

The cdevsw_add() function now finds the major number(s) in the
struct cdevsw passed to it. cdevsw_add_generic() is no longer
needed, cdevsw_add() does the same thing.

cdevsw_add() will print an message if the d_maj field looks bogus.

Remove nblkdev and nchrdev variables. Most places they were used
bogusly. Instead check a dev_t for validity by seeing if devsw()
or bdevsw() returns NULL.

Move bdevsw() and devsw() functions to kern/kern_conf.c

Bump __FreeBSD_version to 400006

This commit removes:
72 bogus makedev() calls
26 bogus SYSINIT functions

if_xe.c bogusly accessed cdevsw[], author/maintainer please fix.

I4b and vinum not changed. Patches emailed to authors. LINT
probably broken until they catch up.


47099 13-May-1999 bde

Fixed printing of a dev_t in a panic message. Fixed the function name
in this message.


46676 08-May-1999 phk

I got tired of seeing all the cdevsw[major(foo)] all over the place.

Made a new (inline) function devsw(dev_t dev) and substituted it.

Changed to the BDEV variant to this format as well: bdevsw(dev_t dev)

DEVFS will eventually benefit from this change too.


46635 07-May-1999 phk

Continue where Julian left off in July 1998:

Virtualize bdevsw[] from cdevsw. bdevsw() is now an (inline)
function.

Join CDEV_MODULE and BDEV_MODULE to DEV_MODULE (please pay attention
to the order of the cmaj/bmaj arguments!)

Join CDEV_DRIVER_MODULE and BDEV_DRIVER_MODULE to DEV_DRIVER_MODULE
(ditto!)

(Next step will be to convert all bdev dev_t's to cdev dev_t's
before they get to do any damage^H^H^H^H^H^Hwork in the kernel.)


46349 02-May-1999 alc

The VFS/BIO subsystem contained a number of hacks in order to optimize
piecemeal, middle-of-file writes for NFS. These hacks have caused no
end of trouble, especially when combined with mmap(). I've removed
them. Instead, NFS will issue a read-before-write to fully
instantiate the struct buf containing the write. NFS does, however,
optimize piecemeal appends to files. For most common file operations,
you will not notice the difference. The sole remaining fragment in
the VFS/BIO system is b_dirtyoff/end, which NFS uses to avoid cache
coherency issues with read-merge-write style operations. NFS also
optimizes the write-covers-entire-buffer case by avoiding the
read-before-write. There is quite a bit of room for further
optimization in these areas.

The VM system marks pages fully-valid (AKA vm_page_t->valid =
VM_PAGE_BITS_ALL) in several places, most noteably in vm_fault. This
is not correct operation. The vm_pager_get_pages() code is now
responsible for marking VM pages all-valid. A number of VM helper
routines have been added to aid in zeroing-out the invalid portions of
a VM page prior to the page being marked all-valid. This operation is
necessary to properly support mmap(). The zeroing occurs most often
when dealing with file-EOF situations. Several bugs have been fixed
in the NFS subsystem, including bits handling file and directory EOF
situations and buf->b_flags consistancy issues relating to clearing
B_ERROR & B_INVAL, and handling B_DONE.

getblk() and allocbuf() have been rewritten. B_CACHE operation is now
formally defined in comments and more straightforward in
implementation. B_CACHE for VMIO buffers is based on the validity of
the backing store. B_CACHE for non-VMIO buffers is based simply on
whether the buffer is B_INVAL or not (B_CACHE set if B_INVAL clear,
and vise-versa). biodone() is now responsible for setting B_CACHE
when a successful read completes. B_CACHE is also set when a bdwrite()
is initiated and when a bwrite() is initiated. VFS VOP_BWRITE
routines (there are only two - nfs_bwrite() and bwrite()) are now
expected to set B_CACHE. This means that bowrite() and bawrite() also
set B_CACHE indirectly.

There are a number of places in the code which were previously using
buf->b_bufsize (which is DEV_BSIZE aligned) when they should have
been using buf->b_bcount. These have been fixed. getblk() now clears
B_DONE on return because the rest of the system is so bad about
dealing with B_DONE.

Major fixes to NFS/TCP have been made. A server-side bug could cause
requests to be lost by the server due to nfs_realign() overwriting
other rpc's in the same TCP mbuf chain. The server's kernel must be
recompiled to get the benefit of the fixes.

Submitted by: Matthew Dillon <dillon@apollo.backplane.com>


46155 28-Apr-1999 phk

This Implements the mumbled about "Jail" feature.

This is a seriously beefed up chroot kind of thing. The process
is jailed along the same lines as a chroot does it, but with
additional tough restrictions imposed on what the superuser can do.

For all I know, it is safe to hand over the root bit inside a
prison to the customer living in that prison, this is what
it was developed for in fact: "real virtual servers".

Each prison has an ip number associated with it, which all IP
communications will be coerced to use and each prison has its own
hostname.

Needless to say, you need more RAM this way, but the advantage is
that each customer can run their own particular version of apache
and not stomp on the toes of their neighbors.

It generally does what one would expect, but setting up a jail
still takes a little knowledge.

A few notes:

I have no scripts for setting up a jail, don't ask me for them.

The IP number should be an alias on one of the interfaces.

mount a /proc in each jail, it will make ps more useable.

/proc/<pid>/status tells the hostname of the prison for
jailed processes.

Quotas are only sensible if you have a mountpoint per prison.

There are no privisions for stopping resource-hogging.

Some "#ifdef INET" and similar may be missing (send patches!)

If somebody wants to take it from here and develop it into
more of a "virtual machine" they should be most welcome!

Tools, comments, patches & documentation most welcome.

Have fun...

Sponsored by: http://www.rndassociates.com/
Run for almost a year by: http://www.servetheweb.com/


46112 27-Apr-1999 phk

Suser() simplification:

1:
s/suser/suser_xxx/

2:
Add new function: suser(struct proc *), prototyped in <sys/proc.h>.

3:
s/suser_xxx(\([a-zA-Z0-9_]*\)->p_ucred, \&\1->p_acflag)/suser(\1)/

The remaining suser_xxx() calls will be scrutinized and dealt with
later.

There may be some unneeded #include <sys/cred.h>, but they are left
as an exercise for Bruce.

More changes to the suser() API will come along with the "jail" code.


44512 06-Mar-1999 bde

Don't depend on <ufs/ufs/quota.h> or another (old) prerequisite including
<sys/queue.h>. This fixes my recent breakage of biosboot by unpolluting
<ufs/ufs/quota.h> in the !KERNEL case.


44395 02-Mar-1999 imp

Merge patch to ufs_vnops.c's ufs_rename to the copy of ufs_rename that
lives in ext2_vnops.c for ext2fs. Also remove cast from comparision.
Bruce pointed out that it was bogus since we'd force a signed
comparision when we really wanted an unsigned comparison.


44272 25-Feb-1999 bde

Added a used #include (don't depend on "vnode_if.h" including <sys/buf.h>).


43395 29-Jan-1999 bde

Fixed parenthesization botch in previous commit. Async update of inodes
was broken.


43311 28-Jan-1999 dillon

Fix warnings in preparation for adding -Wall -Wcast-qual to the
kernel compile


43309 27-Jan-1999 dillon

Fix warnings in preparation for adding -Wall -Wcast-qual to the
kernel compile.

This commit includes significant work to proper handle const arguments
for the DDB symbol routines.


43301 27-Jan-1999 dillon

Fix warnings in preparation for adding -Wall -Wcast-qual to the
kernel compile


42539 11-Jan-1999 eivind

Avoid warning for unused variable.


42374 07-Jan-1999 bde

Don't pass unused unused timestamp args to UFS_UPDATE() or waste
time initializing them. This almost finishes centralizing (in-core)
timestamp updates in ufs_itimes().


42354 06-Jan-1999 bde

UFS_UPDATE() takes a boolean `waitfor' arg, so don't pass it the value
MNT_WAIT when we mean boolean `true' or check for that value not being
passed. There was no problem in practice because MNT_WAIT had the
magic value of 1.


41591 07-Dec-1998 archie

The "easy" fixes for compiling the kernel -Wunused: remove unreferenced static
and local variables, goto labels, and functions declared but not defined.


41174 15-Nov-1998 bde

Fixed a misspelling of boolean true as MNT_WAIT.


40790 31-Oct-1998 peter

Use TAILQ macros for clean/dirty block list processing. Set b_xflags
rather than abusing the list next pointer with a magic number.


40721 29-Oct-1998 peter

error return assignment was less than ideal. Fix the part that caused
warnings to be the same as the ffs code. Previously, any error from
the UFS_UPDATE() call was lost (I think).


40718 29-Oct-1998 peter

Use vtruncbuf() to clean out cached blocks on a file shorten rather than
the more expensive vinvalbuf(), based on the FFS version of the same
routine. I don't have any ext2fs filesystems to test this on.


40672 27-Oct-1998 bde

Oops, the redundant tests for major numbers weren't redundant here.
They checked for the magic major number for the "device" behind mfs
mount points. Use a more obvious check for this device.

Debugged by: Andrew Gallatin <gallatin@cs.duke.edu>


40660 26-Oct-1998 bde

Removed redundant bitrotted checks for major numbers instead of updating
them.


40651 25-Oct-1998 bde

Don't follow null bdevsw pointers. The `major(dev) < nblkdev' test rotted
when bdevsw[] became sparse. We still depend on magic to avoid having to
check that (v_rdev) device numbers in vnodes are not NODEV.


40304 13-Oct-1998 bde

Fixed bloatage of `struct inode'. We used 5 "spare" fields for ext2fs,
but when i_effnlink was added to support soft updates, there was only
room for 4 spares. The number of spares was not reduced, so the inode
size became 260 (on i386's), or 512 after rounding up by malloc().
Use one spare field in `struct dinode' instead of the 5th spare field
in the inode and reduced to 4 spares in the inode so that the size is
256 again.

Changed the types of the spares in the inode from int to u_int32_t
so that the inode size has more chance of being <= 256 under other
arches, and downdated ext2fs to match (it was broken to use ints
before rev.1.1).


39924 03-Oct-1998 bde

Quick fix for not being able to sync all the buffers in boot() if
an ext2fs file system is mounted. The soft update changes added
a check for B_DELWRI buffers. This exposed the complete brokenness
of the previous quick fix for failing syncs (PR 3571, committed on
1997/08/04). Use a new buffer flag B_DIRTY and don't abuse B_DELWRI.
B_DIRTY buffers are still written too late, as broken in the previous
fix. This is fairly harmless, because B_DIRTY is only used for
bitmap buffers and fsck.ext2 can fix up the bitmaps perfectly.

Fixed a race in ULCK_BUF() (bremfree() was outside of the splbio()
section).


39753 29-Sep-1998 bde

Fixed initialization of new inodes. ext2fs doesn't clear inodes when
they are deleted, so inodes must be cleared when they are reused, but
we didn't clear the indirect blocks. This caused serious filesystem
corruption.


39678 26-Sep-1998 bde

Updated ext2_reload() and ext2_sync(). Locking was broken, and MNT_LAZY
syncs weren't optimized properly (they probably still aren't, but are bug
for bug compatible with ffs). These fixes are mostly academic, since
ext2fs is too broken to mount read-write (it apparently doesn't clear
indirect blocks).

Obtained from: mostly from Lite2


39671 26-Sep-1998 bde

Fixed missing newlines in messages in ext2_check_descriptors().

Fixed vnode and memory leaks after an unlikely (?) error in
ext2_mountfs().

Fixed an unconditional memory leak in ext2_unmount().


39670 26-Sep-1998 bde

Fixed clean flag handling:

Fixes for bugs not shared with ffs:
- don't mount unclean filesystems rw unless forced to.
- accept EXT2_ERROR_FS (treat it like !EXT2_VALID_FS). We still don't set
this or honour the maximal mount count.
- don't attempt to print the name of the mount point when mounting an
unclean file system, since the name of the previous mount point is
unknown and the name of the current mount point is still "".

Fixes for bugs shared with ffs until recently:
- don't set the clean flag on unmount of an initially-unclean filesystem
that was (forcibly) mounted rw.
- set the clean flag on rw -> ro update of a mounted initially-clean
filesystem.
- fixed some style bugs (mostly long lines).

The fixes are slightly simpler than for ffs, because the relevant on-disk
state is not a simple boolean variable, and the superblock has a core-only
extension.

Obtained from: parts from ffs_vfsops.c, parts from NetBSD


39028 09-Sep-1998 bde

Fixed the usual missing permissions checks in mount(). As for cd9660,
the damage was limited by the default of 0 for vfs.usermount.

Obtained from: Lite2 via the -current ffs_vfsops.c


38998 09-Sep-1998 bde

Don't forget to initialize the inode lock. This bug caused
surprisingly few problems. Most fields were initialized to the
correct values by bzero(), but lk_prio was 0 instead of PINOD (=8),
the lk_wmsg was NULL instead of "ext2in", and lk_lockholder was 0
instead of -1.

Obtained from: Lite2 via the -current ffs_vfsops.c


38997 09-Sep-1998 bde

Support compiling with `gcc -pedantic' (don't use hard newlines in
(asm) string constants).


38909 07-Sep-1998 bde

Removed statically configured mount type numbers (MOUNT_*) and all
references to them.

The change a couple of days ago to ignore these numbers in statically
configured vfsconf structs was slightly premature because the cd9660,
cfs, devfs, ext2fs, nfs vfs's still used MOUNT_* instead of the number
in their vfsconf struct.


38418 18-Aug-1998 bde

Quick fix for breakage of read clustering on non-IDE drives. Read
clustering is obsolescent technology so hardly anyone noticed. On
a DORS 32160 SCSI drive with 4 tags, read clustering makes very
little difference even for huge sequential reads. However, on a
ZIP SCSI drive with 0 tags, the minimum overhead per block is about
40 msec, so very large clusters must be used to get anywhere near
the maximum transfer rate. Using clusters consisting of 1 8K block
reduces the transfer rate to about 250K/sec. Under msdosfs, missing
read clustering is normal and a cluster size of 1 512 byte block
reduces the transfer rate to about 25K/sec.

Broken in: rev.1.18


38292 12-Aug-1998 msmith

"The releaseing of the reference and lock is not temporary and belongs
where it is. The reference and lock(s) are acquired just above the
code in VREF() and relookup()."

Submitted by: Michael Hancock <michaelh@cet.co.jp>


37976 30-Jul-1998 bde

Fixed printf format errors.


37966 30-Jul-1998 julian

add anti-panic workaround from chris radek (cradek@in221.inetnebr.com)
Not sure why this is needed but but does stop crashes.


37555 11-Jul-1998 bde

Fixed printf format errors.


37490 08-Jul-1998 julian

Catch a few corner cases where FreeBSD differs enough from BSD 4.4 to
confuse Soft updates..
Should solve several "dangling deps" panics.


37384 04-Jul-1998 julian

VOP_STRATEGY grows an (struct vnode *) argument
as the value in b_vp is often not really what you want.
(and needs to be frobbed). more cleanups will follow this.
Reviewed by: Bruce Evans <bde@freebsd.org>


37363 03-Jul-1998 bde

Sync timestamp changes for inodes of special files to disk as late
as possible (when the inode is reclaimed). Temporarily only do
this if option UFS_LAZYMOD configured and softupdates aren't enabled.
UFS_LAZYMOD is intentionally left out of /sys/conf/options.

This is mainly to avoid almost useless disk i/o on battery powered
machines. It's silly to write to disk (on the next sync or when the
inode becomes inactive) just because someone hit a key or something
wrote to the screen or /dev/null.

PR: 5577
Previous version reviewed by: phk


37362 03-Jul-1998 bde

Centralized in-core inode update. Update the in-core inode directly
in ufs_setattr() so that there is no need to pass timestamps to
UFS_UPDATE() (everything else just needs the current time). Ignore
the passed-in timestamps in UFS_UPDATE() and always call ufs_itimes()
(was: itimes()) to do the update. The timestamps are still passed
so that all the callers don't need to be changed yet.


37103 21-Jun-1998 bde

Fixed (?) races in mark_buffer_dirty(). We abuse the buffer cache
by hacking on locked buffers without getblk()ing them, and we didn't
even use splbio() to prevent biodone() changing the buffer underneath
use when a write completes. I think there was no problem in practice
on i386's because the operations on b_flags and numdirtybufs happen to
be atomic. We still depend on biodone()'s operations on b_flags not
interfering with ours. I think there is only interference for B_ERROR,
and this is harmless because errors for async writes are ignored anyway.

Don't use mark_buffer_dirty() except for superblock-related metadata.
It was used in just one case where ordinary BSD buffering is more
natural.


37102 21-Jun-1998 bde

Removed unused function ll_w_block(). It has always had races due
to not using splbio(), and has rotted a little. The races were
probably harmless in practice because this function was only used
for superblock updates, and separate superblock updates are probably
prevented from running into each other by doing part of the update
synchronously.


37101 21-Jun-1998 bde

Removed unused includes.


37094 21-Jun-1998 bde

Removed unused includes.


37088 21-Jun-1998 bde

Added a missing options include.


36102 16-May-1998 bde

Don't use "ffs" in an ext2fs sleep message string.

Don't forget to clear the inode hash lock before returning from ext2_vget()
after getnewvnode() fails. Obtained from: rev.1.24 of ffs_vfsops.c (the
original patch for the getnewvnode() race). Forgotten in: rev.1.4 here.

Removed a duplicate comment. Duplicated in: rev.1.4 here.

Fixed the MALLOC() vs getnewvnode() race in ext2_vget(). Obtained from:
rev.1.39 of ffs_vfsops.c.


36101 16-May-1998 bde

Abbreviate "ext2fs_fsync" as "e2fsyn" instead of as "extfsn" in a sleep
message string.


35823 07-May-1998 msmith

In the words of the submitter:

---------
Make callers of namei() responsible for releasing references or locks
instead of having the underlying filesystems do it. This eliminates
redundancy in all terminal filesystems and makes it possible for stacked
transport layers such as umapfs or nullfs to operate correctly.

Quality testing was done with testvn, and lat_fs from the lmbench suite.

Some NFS client testing courtesy of Patrik Kudo.

vop_mknod and vop_symlink still release the returned vpp. vop_rename
still releases 4 vnode arguments before it returns. These remaining cases
will be corrected in the next set of patches.
---------

Submitted by: Michael Hancock <michaelh@cet.co.jp>


35769 06-May-1998 msmith

As described by the submitter:

Reverse the VFS_VRELE patch. Reference counting of vnodes does not need
to be done per-fs. I noticed this while fixing vfs layering violations.
Doing reference counting in generic code is also the preference cited by
John Heidemann in recent discussions with him.

The implementation of alternative vnode management per-fs is still a valid
requirement for some filesystems but will be revisited sometime later,
most likely using a different framework.

Submitted by: Michael Hancock <michaelh@cet.co.jp>


35256 17-Apr-1998 des

Seventy-odd "its" / "it's" typos in comments fixed as per kern/6108.


35210 15-Apr-1998 bde

Support compiling with `gcc -ansi'.


34961 30-Mar-1998 phk

Eradicate the variable "time" from the kernel, using various measures.
"time" wasn't a atomic variable, so splfoo() protection were needed
around any access to it, unless you just wanted the seconds part.

Most uses of time.tv_sec now uses the new variable time_second instead.

gettime() changed to getmicrotime(0.

Remove a couple of unneeded splfoo() protections, the new getmicrotime()
is atomic, (until Bruce sets a breakpoint in it).

A couple of places needed random data, so use read_random() instead
of mucking about with time which isn't random.

Add a new nfs_curusec() function.

Mark a couple of bogosities involving the now disappeard time variable.

Update ffs_update() to avoid the weird "== &time" checks, by fixing the
one remaining call that passwd &time as args.

Change profiling in ncr.c to use ticks instead of time. Resolution is
the same.

Add new function "tvtohz()" to avoid the bogus "splfoo(), add time, call
hzto() which subtracts time" sequences.

Reviewed by: bde


34924 28-Mar-1998 bde

Moved some #includes from <sys/param.h> nearer to where they are actually
used.


34901 26-Mar-1998 phk

Add two new functions, get{micro|nano}time.

They are atomic, but return in essence what is in the "time" variable.
gettime() is now a macro front for getmicrotime().

Various patches to use the two new functions instead of the various
hacks used in their absence.

Some puntuation and grammer patches from Bruce.

A couple of XXX comments.


34430 09-Mar-1998 eivind

Make this compile after soft updates integration.

LINTing forgotten by: julian


34266 08-Mar-1998 julian

Reviewed by: dyson@freebsd.org (john Dyson), dg@root.com (david greenman)
Submitted by: Kirk McKusick (mcKusick@mckusick.com)
Obtained from: WHistle development tree


33964 01-Mar-1998 msmith

The intent is to get rid of WILLRELE in vnode_if.src by making
a complement to all ops that return a vpp, VFS_VRELE. This is
initially only for file systems that implement the following ops
that do a WILLRELE:

vop_create, vop_whiteout, vop_mknod, vop_remove, vop_link,
vop_rename, vop_mkdir, vop_rmdir, vop_symlink

This is initial DNA that doesn't do anything yet. VFS_VRELE is
implemented but not called.

A default vfs_vrele was created for fs implementations that use the
standard vnode management routines.

VFS_VRELE implementations were made for the following file systems:

Standard (vfs_vrele)
ffs mfs nfs msdosfs devfs ext2fs

Custom
union umapfs

Just EOPNOTSUPP
fdesc procfs kernfs portal cd9660

These implementations may change as VOP changes are implemented.

In the next phase, in the vop implementations calls to vrele and the vrele
part of vput will be moved to the top layer vfs_vnops and made visible
to all layers. vput will be replaced by unlock in these cases. Unlocking
will still be done in the per fs layer but the refcount decrement will be
triggered at the top because it doesn't hurt to hold a vnode reference a
little longer. This will have minimal impact on the structure of the
existing code.

This will only be done for vnode arguments that are released by the various
fs vop implementations.

Wider use of VFS_VRELE will likely require restructuring of the code.

Reviewed by: phk, dyson, terry et. al.
Submitted by: Michael Hancock <michaelh@cet.co.jp>


33935 01-Mar-1998 msmith

Style nits and staticism with the previous commit.
Submitted by: bde


33933 01-Mar-1998 msmith

Add local stup putpages/getpages routines.
Submitted by: Terry Lambert <terry@freebsd.org>


33291 13-Feb-1998 bde

Fixed configuration and linkage of ext2_checkoverlap().


33134 06-Feb-1998 eivind

Back out DIAGNOSTIC changes.


33108 04-Feb-1998 eivind

Turn DIAGNOSTIC into a new-style option.


33064 04-Feb-1998 eivind

Make LINT at least compile. This faithfully duplicate the changes
done to ufs/ufs/ufs_vnops.c for the same problem, but I don't know if
that will actually make SUIDDIR work for ext2fs.


32889 30-Jan-1998 phk

Retire LFS.

If you want to play with it, you can find the final version of the
code in the repository the tag LFS_RETIREMENT.

If somebody makes LFS work again, adding it back is certainly
desireable, but as it is now nobody seems to care much about it,
and it has suffered considerable bitrot since its somewhat haphazard
integration.

R.I.P


32724 24-Jan-1998 dyson

Add better support for larger I/O clusters, including larger physical
I/O. The support is not mature yet, and some of the underlying implementation
needs help. However, support does exist for IDE devices now.


32011 27-Dec-1997 bde

Unspammed nested include of <vm/vm_zone.h>.


31749 15-Dec-1997 eivind

Convert SUIDDIR fully to a new-style option.

Forgotten by: julian


31561 05-Dec-1997 bde

Don't include <sys/lock.h> in headers when only `struct simplelock' is
required. Fixed everything that depended on the pollution.


31557 05-Dec-1997 jkh

Needs to include <sys/lock.h> if we're using struct lock.


31517 03-Dec-1997 bde

Fixed corruption of the per-group used directories count. It wasn't
decremented when directories were removed because rev.1.12 broke the
fixup of the i_mode of the inode being removed.


31495 02-Dec-1997 phk

Fix the copyright and attribution on this file. I forgot this
when the file was cloned.


31485 02-Dec-1997 bde

Use the same algorithm as ffs for generation numbers.


31483 02-Dec-1997 bde

Removed __FreeBSD__ ifdefs.


31398 24-Nov-1997 bde

Fixed missing #include of "opt_quota.h".

Sorted the functions into the same order as in ufs_vnops.c so that this
can be compared with the latter without getting 2627 lines of diffs.
Now we get only 1920 lines of diffs.


31394 24-Nov-1997 bde

Fixed overflow in ufs_getblns(). For ufs on systems with 32-bit ints,
triple indirect blocks only worked for block sizes of 4K, since
MNINDIR(ump)**3 overflows for larger block sizes (e.g.,
(8192/4)**3 = 2**33 > INT_MAX). This fix is not the obvious one of
changing some types to 64 bits. It rearranges the code to avoid some
unnecessary 64-bit calculations.

Reviewed by: Kirk McKusick <mckusick@McKusick.COM>


31315 20-Nov-1997 bde

Use consistent description strings for M_EXT2NODE. This also fixes a
spelling error in the unused string.


31268 18-Nov-1997 phk

Give ext2fs it's own VOP_REMOVE, VOP_LINK, VOP_RENAME, VOP_MKDIR, VOP_RMDIR,
VOP_CREATE, VOP_MKNOD, VOP_SYMLINK and ext2_makeinode().


31132 12-Nov-1997 julian

Reviewed by: various.

Ever since I first say the way the mount flags were used I've hated the
fact that modes, and events, internal and exported, and short-term
and long term flags are all thrown together. Finally it's annoyed me enough..
This patch to the entire FreeBSD tree adds a second mount flag word
to the mount struct. it is not exported to userspace. I have moved
some of the non exported flags over to this word. this means that we now
have 8 free bits in the mount flags. There are another two that might
well move over, but which I'm not sure about.
The only user visible change would have been in pstat -v, except
that davidg has disabled it anyhow.
I'd still like to move the state flags and the 'command' flags
apart from each other.. e.g. MNT_FORCE really doesn't have the
same semantics as MNT_RDONLY, but that's left for another day.


30780 27-Oct-1997 bde

Removed unused #includes. The need for most of them went away with
recent changes (docluster* and vfs improvements).


30745 26-Oct-1997 phk

I guess nobody uses ext2fs in current ?
vop_lookup is back now, don't know whan I lost it.


30513 17-Oct-1997 phk

Make a set of VOP standard lock, unlock & islocked VOP operators, which
depend on the lock being located at vp->v_data. Saves 3x3 identical
vop procs, more as the other filesystems becomes lock aware.


30492 16-Oct-1997 phk

Another VFS cleanup "kilo commit"

1. Remove VOP_UPDATE, it is (also) an UFS/{FFS,LFS,EXT2FS,MFS}
intereface function, and now lives in the ufsmount structure.

2. Remove VOP_SEEK, it was unused.

3. Add mode default vops:

VOP_ADVLOCK vop_einval
VOP_CLOSE vop_null
VOP_FSYNC vop_null
VOP_IOCTL vop_enotty
VOP_MMAP vop_einval
VOP_OPEN vop_null
VOP_PATHCONF vop_einval
VOP_READLINK vop_einval
VOP_REALLOCBLKS vop_eopnotsupp

And remove identical functionality from filesystems

4. Add vop_stdpathconf, which returns the canonical stuff. Use
it in the filesystems. (XXX: It's probably wrong that specfs
and fifofs sets this vop, shouldn't it come from the "host"
filesystem, for instance ufs or cd9660 ?)

5. Try to make system wide VOP functions have vop_* names.

6. Initialize the um_* vectors in LFS.

(Recompile your LKMS!!!)


30474 16-Oct-1997 phk

VFS mega cleanup commit (x/N)

1. Add new file "sys/kern/vfs_default.c" where default actions for
VOPs go. Implement proper defaults for ABORTOP, BWRITE, LEASE,
POLL, REVOKE and STRATEGY. Various stuff spread over the entire
tree belongs here.

2. Change VOP_BLKATOFF to a normal function in cd9660.

3. Kill VOP_BLKATOFF, VOP_TRUNCATE, VOP_VFREE, VOP_VALLOC. These
are private interface functions between UFS and the underlying
storage manager layer (FFS/LFS/MFS/EXT2FS). The functions now
live in struct ufsmount instead.

4. Remove a kludge of VOP_ functions in all filesystems, that did
nothing but obscure the simplicity and break the expandability.
If a filesystem doesn't implement VOP_FOO, it shouldn't have an
entry for it in its vnops table. The system will try to DTRT
if it is not implemented. There are still some cruft left, but
the bulk of it is done.

5. Fix another VCALL in vfs_cache.c (thanks Bruce!)


30469 16-Oct-1997 julian

Two more places where root filesystems were mounted, put them at the head of
the mount list in case there is already DEVFS present.


30439 15-Oct-1997 phk

vnops megacommit

1. Use the default function to access all the specfs operations.
2. Use the default function to access all the fifofs operations.
3. Use the default function to access all the ufs operations.
4. Fix VCALL usage in vfs_cache.c
5. Use VOCALL to access specfs functions in devfs_vnops.c
6. Staticize most of the spec and fifofs vnops functions.
7. Make UFS panic if it lacks bits of the underlying storage handling.


30434 15-Oct-1997 phk

Hmm, realign the vnops into two columns.


30431 15-Oct-1997 phk

Stylistic overhaul of vnops tables.
1. Remove comment stating the blatantly obvious.
2. Align in two columns.
3. Sort all but the default element alphabetically.
4. Remove XXX comments pointing out entries not needed.


30418 14-Oct-1997 phk

I think my previous change may have opened a race conditio.
This patch does the same thing, with no change in semantics.


30402 14-Oct-1997 phk

ufs_ihashrem() should not be called from the UFS layer, but from the
lower layer (LFS/FFS/?) like the rest of the ihash functions.
Otherwise it is impossible to make a lower layer that doesn't use the
ihash facility.


30354 12-Oct-1997 phk

Last major round (Unless Bruce thinks of somthing :-) of malloc changes.

Distribute all but the most fundamental malloc types. This time I also
remembered the trick to making things static: Put "static" in front of
them.

A couple of finer points by: bde


30309 11-Oct-1997 phk

Distribute and statizice a lot of the malloc M_* types.

Substantial input from: bde


30285 10-Oct-1997 phk

Make ufs_reclaim free the underlying inode.


30280 10-Oct-1997 phk

Mega commit to cleanup the "remaining nits" after my malloc change.

Introduce a M_EXT2NODE for ext2fs vnodes.
Use generic ufs_reclaim instead of hijacking ffs_reclaim.


30204 07-Oct-1997 bde

`numdirtybuffers' was not maintained properly. This caused excessive
flushing of buffers in an attempt to reduce numdirtybuffers, and
perhaps other problems.


29906 28-Sep-1997 kato

Oops, include <sys/conf.h>.

Reminded-by: Simon Shapiro <Shimon@i-Connect.Net>


29888 27-Sep-1997 kato

Clustered read and write are switched at mount-option level.

1. Clustered I/O is switched by the MNT_NOCLUSTERR and MNT_NOCLUSTERW
bits of the mnt_flag. The sysctl variables, vfs.foo.doclusterread
and vfs.foo.doclusterwrite are deleted. Only mount option can
control clustered I/O from userland.
2. When foofs_mount mounts block device, foofs_mount checks D_CLUSTERR
and D_CLUSTERW bits of the d_flags member in the block device switch
table. If D_NOCLUSTERR / D_NOCLUSTERW are set, MNT_NOCLUSTERR /
MNT_NOCLUSTERW bits will be set. In this case, MNT_NOCLUSTERR and
MNT_NOCLUSTERW cannot be cleared from userland.
3. Vnode driver disables both clustered read and write.
4. Union filesystem disables clutered write.

Reviewed by: bde


29725 22-Sep-1997 joerg

Make MFS a supported option, finally.


29362 14-Sep-1997 peter

Convert select -> poll.
Delete 'always succeed' select/poll handlers, replaced with generic call.
Flag missing vnode op table entries.


29284 10-Sep-1997 phk

Remove some stuff from lookup which is now handled centrally.


29208 07-Sep-1997 bde

Removed yet more vestiges of config-time swap configuration and/or
cleaned up nearby cruft.


29041 02-Sep-1997 bde

Removed unused #includes.


28787 26-Aug-1997 phk

Uncut&paste cache_lookup().

This unifies several times in theory indentical 50 lines of code.

The filesystems have a new method: vop_cachedlookup, which is the
meat of the lookup, and use vfs_cache_lookup() for their vop_lookup
method. vfs_cache_lookup() will check the namecache and pass on
to the vop_cachedlookup method in case of a miss.

It's still the task of the individual filesystems to populate the
namecache with cache_enter().

Filesystems that do not use the namecache will just provide the
vop_lookup method as usual.


28656 24-Aug-1997 kato

Code cleanup. Removed !FreeBSD code arround sysctl stuff. Renamed
doclusterread/doclusterwrite into ext2_doclusterread and
ext2_doclusterwrite, which are unique names. Moved #include of
<sys/sysctl.h> to the top of the file.

Pointed out by: Bruce Evans <bde@zeta.org.au>


28610 23-Aug-1997 kato

Added sysctl args vfs.ext2fs.doclusterread and
vfs.ext2fs.doclusterwrite which control cluster read/write operation
on ext2fs filesystem.


28270 16-Aug-1997 wollman

Fix all areas of the system (or at least all those in LINT) to avoid storing
socket addresses in mbufs. (Socket buffers are the one exception.) A number
of kernel APIs needed to get fixed in order to make this happen. Also,
fix three protocol families which kept PCBs in mbufs to not malloc them
instead. Delete some old compatibility cruft while we're at it, and add
some new routines in the in_cksum family.


27881 04-Aug-1997 dyson

Fix a problem with ext2fs so that filesystems mounted at reboot don't
keep ahold of buffers, and therefore leave filesystems dirty. I haven't
been able to test, but the code compiles. Those who run -current, please
test and report back!!! (Sorry :-)).

PR: kern/3571
Submitted by: Dirk Keunecke <dk@panda.rhein-main.de>


27375 13-Jul-1997 bde

Fixed comment about i_spare.


26664 15-Jun-1997 dyson

Fix a problem with the VN device. Specifically, the VN device can
cause a problem of spiraling death due to buffer resource limitations.
The vfs_bio code in general had little ability to handle buffer resource
management, and now it does. Also, there are a lot more knobs for tuning the
vfs_bio code now. The knobs came free because of the need that there
always be some immediately available buffers (non-delayed or locked) for
use. Note that the buffer cache code is much less likely to get bogged
down with lots of delayed writes, even more so than before.


26641 14-Jun-1997 bde

Removed unused #includes.


26001 22-May-1997 phk

Shrink struct inode by 20 bytes, so that malloc wastes less space.

Pointed out by: bde


24649 05-Apr-1997 dfr

Support NFS cookies in VOP_READDIR, allowing ext2fs filesystems to be
exported via NFS.

2.2 candidate.


24492 01-Apr-1997 bde

Fixed gratuitous ANSIisms.

Removed trailing newline from panic messages.


24491 01-Apr-1997 bde

Use __i386__ instead of i386 in ifdefs.

Don't compile unused (debugging?) functions.


24477 01-Apr-1997 bde

Removed nested include of <ufs/ufs/dir.h>. Use the pre-Lite2 hack of
defining doff_t both here and in <ufs/ufs/dir.h> so that this file
is independent of <ufs/ufs/dir.h>. It still has old prerequisites
<sys/param.h> and <ufs/ufs/quota.h>, and a new Lite2 prerequisite of
<sys/lock.h>, sigh.

This might fix lsof, which was broken by namespace pollution giving
conflicting definitions of DIRBLKSIZ.


24203 24-Mar-1997 bde

Don't include <sys/ioctl.h> in the kernel. Stage 1: don't include
it when it is not used. In most cases, the reasons for including it
went away when the special ioctl headers became self-sufficient.


24131 23-Mar-1997 bde

Don't #include <sys/fcntl.h> in <sys/file.h> if KERNEL is defined.
Fixed everything that depended on getting fcntl.h stuff from the wrong
place. Most things don't depend on file.h stuff at all.


24101 22-Mar-1997 bde

Fixed some invalid (non-atomic) accesses to `time', mostly ones of the
form `tv = time'. Use a new function gettime(). The current version
just forces atomicicity without fixing precision or efficiency bugs.
Simplified some related valid accesses by using the central function.


23562 09-Mar-1997 mpp

Update a number of routines to reflect the actual name
of the routine that caused the panic.


23347 03-Mar-1997 bde

Removed unused flag IN_RECURSE and unused struct member i_lockcount.


23346 03-Mar-1997 bde

Removed useless setting of IN_RECURSE. The (anti) locking for this needs
to be done in a different way, if at all.


23125 26-Feb-1997 dyson

Correct the port of ext2fs to Lite/2. I incorrectly used ufs_reclaim
instead of ffs_reclaim.


22975 22-Feb-1997 peter

Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not
ready for it yet.


22598 12-Feb-1997 bde

Fixed type mismatches. i_spare[N] in ufs/inode.h changed from long to
int. Change ext2fs to match. We probably already assume that ints have
>= 32 bits.


22579 12-Feb-1997 mpp

Add function prototypes for most of the new Lite2 functions.
Also made a few of the miscfs routines static to be
consistent. Some modules simply required some additional
#includes to remove -Wall warnings.


22521 10-Feb-1997 dyson

This is the kernel Lite/2 commit. There are some requisite userland
changes, so don't expect to be able to run the kernel as-is (very well)
without the appropriate Lite/2 userland changes.

The system boots and can mount UFS filesystems.

Untested: ext2fs, msdosfs, NFS
Known problems: Incorrect Berkeley ID strings in some files.
Mount_std mounts will not work until the getfsent
library routine is changed.

Reviewed by: various people
Submitted by: Jeffery Hsu <hsu@freebsd.org>


21673 14-Jan-1997 jkh

Make the long-awaited change from $Id$ to $FreeBSD$

This will make a number of things easier in the future, as well as (finally!)
avoiding the Id-smashing problem which has plagued developers for so long.

Boy, I'm glad we're not using sup anymore. This update would have been
insane otherwise.


21001 29-Dec-1996 dyson

This commit is the embodiment of some VFS read clustering improvements.
Firstly, now our read-ahead clustering is on a file descriptor basis and not
on a per-vnode basis. This will allow multiple processes reading the
same file to take advantage of read-ahead clustering. Secondly, there
previously was a problem with large reads still using the ramp-up
algorithm. Of course, that was bogus, and now we read the entire
"chunk" off of the disk in one operation. The read-ahead clustering
algorithm should use less CPU than the previous also (I hope :-)).


19562 09-Nov-1996 bde

Fixed lookup of ".." in checkpath. It always failed, so renames of
directories to a different parent directory always failed. This bug
was caused by 4.4Lite2 changing the directory format and ext2fs not
keeping up.

Should be in 2.2.


19541 08-Nov-1996 bde

Fixed spacefree calculation in ext2_direnter(). This bug sometimes caused
panics.

This should be in 2.2, of course.

Submitted by: davidg
Obtained from: bouyer@antioche.ibp.fr (Manuel BOUYER) (fix for NetBSD)


19540 08-Nov-1996 bde

Removed gratuitous differences between ext2_readwrite.c and ufs_readwrite.c.
This fixes several bugs and one missing feature:
- cluster_read() was needlessly used for reading files of size exactly 1
block.
- EFAULT errors for read didn't terminate the loop. This was probably
harmless.
- IO_VMIO handling was missing near line 275. I don't know what this does.
- B_CLUSTEROK was only set if (doclusterwrite) nead line 293. This was
harmless, if only because another bug prevents doclusterwrite from being
0.
- MNT_NOATIME wasn't implemented.

This should be in 2.2, of course.

Reviewed by: davidg


18412 20-Sep-1996 nate

Whoops, I should've used the LINT config file. More ts -> tv changes
for timespec structure.


18397 19-Sep-1996 nate

In sys/time.h, struct timespec is defined as:

/*
* Structure defined by POSIX.4 to be like a timeval.
*/
struct timespec {
time_t ts_sec; /* seconds */
long ts_nsec; /* and nanoseconds */
};

The correct names of the fields are tv_sec and tv_nsec.

Reminded by: James Drobina <jdrobina@infinet.com>


18207 10-Sep-1996 bde

Updated #includes to 4.4Lite style.


16322 12-Jun-1996 gpalmer

Clean up -Wunused warnings.

Reviewed by: bde


15493 01-May-1996 bde

Removed bogus _BEGIN_DECLS/_END_DECLS.

Removed unused struct tag declarations in cloned code.

Added or cleaned up idempotency ifdefs.


14249 25-Feb-1996 bde

Removed vestigial support for the obsolete FIFO option. In ext2fs
it caused null pointer panics for all fifo operations unless FIFO
was defined.


13765 30-Jan-1996 mpp

Fix a bunch of spelling errors in the comment fields of
a bunch of system include files.


13490 19-Jan-1996 dyson

Eliminated many redundant vm_map_lookup operations for vm_mmap.
Speed up for vfs_bio -- addition of a routine bqrelse to greatly diminish
overhead for merged cache.
Efficiency improvement for vfs_cluster. It used to do alot of redundant
calls to cluster_rbuild.
Correct the ordering for vrele of .text and release of credentials.
Use the selective tlb update for 486/586/P6.
Numerous fixes to the size of objects allocated for files. Additionally,
fixes in the various pagers.
Fixes for proper positioning of vnode_pager_setsize in msdosfs and ext2fs.
Fixes in the swap pager for exhausted resources. The pageout code
will not as readily thrash.
Change the page queue flags (PG_ACTIVE, PG_INACTIVE, PG_FREE, PG_CACHE) into
page queue indices (PQ_ACTIVE, PQ_INACTIVE, PQ_FREE, PQ_CACHE),
thereby improving efficiency of several routines.
Eliminate even more unnecessary vm_page_protect operations.
Significantly speed up process forks.
Make vm_object_page_clean more efficient, thereby eliminating the pause
that happens every 30seconds.
Make sequential clustered writes B_ASYNC instead of B_DELWRI even in the
case of filesystems mounted async.
Fix a panic with busy pages when write clustering is done for non-VMIO
buffers.


13260 05-Jan-1996 wollman

Convert QUOTA to new-style option.


12911 17-Dec-1995 phk

Staticize.


12746 10-Dec-1995 bde

Restored variables that are used iff QUOTA is defined.

ext2fs still uses #if in many cases where the rest of the kernel uses
#ifdef (for QUOTA...).


12726 10-Dec-1995 bde

Restored used includes of <vm/vm_extern.h>.


12406 19-Nov-1995 dyson

Correct some serious porting errors. The worst one was that the
vnode was being placed upon the mount point twice!!!


12288 14-Nov-1995 phk

Get rid of the last debug sysctl variables of the old style.


12159 09-Nov-1995 bde

ext2_inode_cnv.c:
Included <sys/vnode.h> and its prerequisite <sys/proc.h>, and cleaned
up includes. The vop_t changes made the non-inclusion of vnode.h
fatal instead of just sloppy.

i386_bitops.h:
Changed `extern inline' to `static inline'. `extern inline' is a
Linuxism that stops things from compiling without -O. Fixed
idempotency identifier.

Misc:
Added prototypes. Staticized some functions so that prototypes are
unnecessary. Added casts. Cleaned up includes.


12158 09-Nov-1995 bde

Introduced a type `vop_t' for vnode operation functions and used
it 1138 times (:-() in casts and a few more times in declarations.
This change is null for the i386.

The type has to be `typedef int vop_t(void *)' and not `typedef
int vop_t()' because `gcc -Wstrict-prototypes' warns about the
latter. Since vnode op functions are called with args of different
(struct pointer) types, neither of these function types is any use
for type checking of the arg, so it would be preferable not to use
the complete function type, especially since using the complete
type requires adding 1138 casts to avoid compiler warnings and
another 40+ casts to reverse the function pointer conversions before
calling the functions.


12147 08-Nov-1995 dyson

Cleaned up some lint and some obvious prototyping errors.


12121 06-Nov-1995 dyson

Omitted a '#if FIFO' in ext2_vnops.c
Submitted by: Justin Gibbs


12117 05-Nov-1995 dyson

Changes to existing files for ext2fs support. The UFS mods need rework
in the future as they are a bit crufty -- but at least the stuff is in the
tree now.


12115 05-Nov-1995 dyson

Main code for the ext2fs filesystem. Please refer to the COPYRIGHT.INFO
file for GPL restrictions. This code was ported to the BSD platform
by Godmar Back <gback@facility.cs.utah.edu> and specifically to FreeBSD
by John Dyson. This code is still green and should be used with caution.
Additional changes to UFS necessary to make this code work will be commited
seperately.
Submitted by: Godmar Back <gback@facility.cs.utah.edu>
Obtained from: Lites/Mach4


12114 05-Nov-1995 dyson

Fix ufs_bmap so that triple indirect blocks might work.
Submitted by: Godmar Back <gback@facility.cs.utah.edu>


10551 04-Sep-1995 dyson

Added VOP_GETPAGES/VOP_PUTPAGES and also the "backwards" block count
for VOP_BMAP. Updated affected filesystems...


8876 30-May-1995 rgrimes

Remove trailing whitespace.


8041 24-Apr-1995 dyson

Changes to get rid of ufslk2 hangs when doing read/write to/from
mmap regions that are in the same file as the read/write.


7430 28-Mar-1995 bde

Add and move declarations to fix all of the warnings from `gcc -Wimplicit'
(except in netccitt, netiso and netns) that I didn't notice when I fixed
"all" such warnings before.


6875 04-Mar-1995 dg

Removed obsolete vtrace() remnants.


5455 09-Jan-1995 dg

These changes embody the support of the fully coherent merged VM buffer cache,
much higher filesystem I/O performance, and much better paging performance. It
represents the culmination of over 6 months of R&D.

The majority of the merged VM/cache work is by John Dyson.

The following highlights the most significant changes. Additionally, there are
(mostly minor) changes to the various filesystem modules (nfs, msdosfs, etc) to
support the new VM/buffer scheme.

vfs_bio.c:
Significant rewrite of most of vfs_bio to support the merged VM buffer cache
scheme. The scheme is almost fully compatible with the old filesystem
interface. Significant improvement in the number of opportunities for write
clustering.

vfs_cluster.c, vfs_subr.c
Upgrade and performance enhancements in vfs layer code to support merged
VM/buffer cache. Fixup of vfs_cluster to eliminate the bogus pagemove stuff.

vm_object.c:
Yet more improvements in the collapse code. Elimination of some windows that
can cause list corruption.

vm_pageout.c:
Fixed it, it really works better now. Somehow in 2.0, some "enhancements"
broke the code. This code has been reworked from the ground-up.

vm_fault.c, vm_page.c, pmap.c, vm_object.c
Support for small-block filesystems with merged VM/buffer cache scheme.

pmap.c vm_map.c
Dynamic kernel VM size, now we dont have to pre-allocate excessive numbers of
kernel PTs.

vm_glue.c
Much simpler and more effective swapping code. No more gratuitous swapping.

proc.h
Fixed the problem that the p_lock flag was not being cleared on a fork.

swap_pager.c, vnode_pager.c
Removal of old vfs_bio cruft to support the past pseudo-coherency. Now the
code doesn't need it anymore.

machdep.c
Changes to better support the parameter values for the merged VM/buffer cache
scheme.

machdep.c, kern_exec.c, vm_glue.c
Implemented a seperate submap for temporary exec string space and another one
to contain process upages. This eliminates all map fragmentation problems
that previously existed.

ffs_inode.c, ufs_inode.c, ufs_readwrite.c
Changes for merged VM/buffer cache. Add "bypass" support for sneaking in on
busy buffers.

Submitted by: John Dyson and David Greenman


5247 27-Dec-1994 bde

Use the same current time throughout ITIMES(). I want all current
timestamps for an atomic operation such as rename() on a local file
system to be identical.

Uniformize yet another idempotency ifdef. The comment nesting was
bogus.


3427 08-Oct-1994 phk

POSSIBLE BOGUS CODE found, (related to dos-partitions) in ufs_disksubr.c,
look for CC_WALL.
Cosmetics, a couple of unused vars.


2177 21-Aug-1994 paul

Made idempotent
Reviewed by:
Submitted by:


1817 02-Aug-1994 dg

Added $Id$


1549 25-May-1994 rgrimes

The big 4.4BSD Lite to FreeBSD 2.0.0 (Development) patch.

Reviewed by: Rodney W. Grimes
Submitted by: John Dyson and David Greenman


1541 24-May-1994 rgrimes

BSD 4.4 Lite Kernel Sources