History log of /freebsd-10-stable/sys/fs/msdosfs/
Revision Date Author Comments
(<<< Hide modified files)
(Show modified files >>>)
338180 22-Aug-2018 pfg

MFC r337456:
msdosfs: fixes for Undefined Behavior.

These were found by the Undefined Behaviour GsoC project at NetBSD:

Do not change signedness bit with left shift.
While there avoid signed integer overflow.
Address both issues with using unsigned type.

msdosfs_fat.c:512:42, left shift of 1 by 31 places cannot be represented
in type 'int'
msdosfs_fat.c:521:44, left shift of 1 by 31 places cannot be represented
in type 'int'
msdosfs_fat.c:744:14, left shift of 1 by 31 places cannot be represented
in type 'int'
msdosfs_fat.c:744:24, signed integer overflow: -2147483648 - 1 cannot be
represented in type 'int [20]'
msdosfs_fat.c:840:13, left shift of 1 by 31 places cannot be represented
in type 'int'
msdosfs_fat.c:840:36, signed integer overflow: -2147483648 - 1 cannot be
represented in type 'int [20]'

Detected with micro-UBSan in the user mode.

Hinted from: NetBSD (CVS 1.33)

333611 14-May-2018 pfg

MFC r333239:
msdosfs: long names of files are created incorrectly.

This fixes a regression that happened in r120492 (2003) where libkiconv
was introduced and we went from checking unlen to checking for '\0'.

PR: 111843
Patch by: Damjan Jovanovic

308552 11-Nov-2016 kib

MFC r308025:
Enable vn_io_fault() deadlock avoidance for msdosfs.

308551 11-Nov-2016 kib

MFC r308024:
Ensure that cluster allocations never allocate clusters outside the
volume limits.

308550 11-Nov-2016 kib

MFC r308023:
If the fatchain() call in chainalloc() returned an error, revert
marking the cluster run as in-use.

308549 11-Nov-2016 kib

MFC r308022:
Use symbolic name for the value of fully free word in pm_inusemap.

308548 11-Nov-2016 kib

MFC r308021:
Use symbolic name for the free cluster number.

308547 11-Nov-2016 kib

MFC r308020:
Fix comment formatting.

308546 11-Nov-2016 kib

MFC r308019:
Remove useless NULL check.

307742 21-Oct-2016 asomers

MFC r306276, but don't remove findwin95

Mount msdosfs with longnames support by default.

The old behavior depended on the FAT version and on what files were in the
root directory. "mount_msdosfs -o shortnames" is still supported.

298799 29-Apr-2016 kp

MFC r298664

msdosfs: Prevent buffer overflow when expanding win95 names

In win2unixfn() we expand Windows 95 style long names. In some cases that
requires moving the data in the nbp->nb_buf buffer backwards to make room. That
code failed to check for overflows, leading to a stack overflow in win2unixfn().

We now check for this event, and mark the entire conversion as failed in that
case. This means we present the 8 character, dos style, name instead.

PR: 204643
Differential Revision: https://reviews.freebsd.org/D6015

282270 30-Apr-2015 rmacklem

MFC: r281562
File systems that do not use the buffer cache (such as ZFS) must
use VOP_FSYNC() to perform the NFS server's Commit operation.
This patch adds a mnt_kern_flag called MNTK_USES_BCACHE which
is set by file systems that use the buffer cache. If this flag
is not set, the NFS server always does a VOP_FSYNC().
This should be ok for old file system modules that do not set
MNTK_USES_BCACHE, since calling VOP_FSYNC() is correct, although
it might not be optimal for file systems that use the buffer cache.

281457 12-Apr-2015 kib

MFC r281121:
Do not call msdosfs_sync() on the read-only msdosfs mounts.

PR: 199152

281456 12-Apr-2015 kib

MFC r281120:
Assert that an msdosfs mount is not read-only when FAT modifications
are requested.

281095 05-Apr-2015 kib

MFC r280342:
msdosfs: mark unused compat-mount fields.

278059 02-Feb-2015 dim

MFC r277898:

Fix a bunch of -Wcast-qual warnings in msdosfs_conv.c, by using
__DECONST. No functional change.

276648 04-Jan-2015 kib

MFC r276007:
Handle MAKEENTRY cnp flag in the VOP_CREATE().

276500 01-Jan-2015 kib

MFC r275897:
Set NOCACHE flag for CREATE namei() calls, do not specially handle
MAKEENTRY in VOP_LOOKUP().

276408 30-Dec-2014 kib

MFC r275638:
Do not call VFS_SYNC() before VFS_UNMOUNT() for forced unmount.

273255 18-Oct-2014 kib

MFC r272952:
Do not set IN_ACCESS flag for read-only mounts.

269165 28-Jul-2014 kib

MFC r268606:
Generalize vn_get_ino() to allow filesystems to use custom vnode
producer. Convert inline copies of vn_get_ino() in msdosfs and cd9660
into the uses of vn_get_ino_gen().

267816 24-Jun-2014 kib

MFC r267564:
In msdosfs_setattr(), add a check for result of the utimes(2) permissions test.
Refactor the permission checks for utimes(2).

265807 10-May-2014 kib

MFC r265275:
Overwrite the de_Name for the directories on rename to correct the dot
name.

263670 23-Mar-2014 pfg

MFC: r263441:

msdosfs: minor format fix - spaces vs tab

256281 10-Oct-2013 gjb

Copy head (r256279) to stable/10 as part of the 10.0-RELEASE cycle.

Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation


254627 21-Aug-2013 ken

Expand the use of stat(2) flags to allow storing some Windows/DOS
and CIFS file attributes as BSD stat(2) flags.

This work is intended to be compatible with ZFS, the Solaris CIFS
server's interaction with ZFS, somewhat compatible with MacOS X,
and of course compatible with Windows.

The Windows attributes that are implemented were chosen based on
the attributes that ZFS already supports.

The summary of the flags is as follows:

UF_SYSTEM: Command line name: "system" or "usystem"
ZFS name: XAT_SYSTEM, ZFS_SYSTEM
Windows: FILE_ATTRIBUTE_SYSTEM

This flag means that the file is used by the
operating system. FreeBSD does not enforce any
special handling when this flag is set.

UF_SPARSE: Command line name: "sparse" or "usparse"
ZFS name: XAT_SPARSE, ZFS_SPARSE
Windows: FILE_ATTRIBUTE_SPARSE_FILE

This flag means that the file is sparse. Although
ZFS may modify this in some situations, there is
not generally any special handling for this flag.

UF_OFFLINE: Command line name: "offline" or "uoffline"
ZFS name: XAT_OFFLINE, ZFS_OFFLINE
Windows: FILE_ATTRIBUTE_OFFLINE

This flag means that the file has been moved to
offline storage. FreeBSD does not have any special
handling for this flag.

UF_REPARSE: Command line name: "reparse" or "ureparse"
ZFS name: XAT_REPARSE, ZFS_REPARSE
Windows: FILE_ATTRIBUTE_REPARSE_POINT

This flag means that the file is a Windows reparse
point. ZFS has special handling code for reparse
points, but we don't currently have the other
supporting infrastructure for them.

UF_HIDDEN: Command line name: "hidden" or "uhidden"
ZFS name: XAT_HIDDEN, ZFS_HIDDEN
Windows: FILE_ATTRIBUTE_HIDDEN

This flag means that the file may be excluded from
a directory listing if the application honors it.
FreeBSD has no special handling for this flag.

The name and bit definition for UF_HIDDEN are
identical to the definition in MacOS X.

UF_READONLY: Command line name: "urdonly", "rdonly", "readonly"
ZFS name: XAT_READONLY, ZFS_READONLY
Windows: FILE_ATTRIBUTE_READONLY

This flag means that the file may not written or
appended, but its attributes may be changed.

ZFS currently enforces this flag, but Illumos
developers have discussed disabling enforcement.

The behavior of this flag is different than MacOS X.
MacOS X uses UF_IMMUTABLE to represent the DOS
readonly permission, but that flag has a stronger
meaning than the semantics of DOS readonly permissions.

UF_ARCHIVE: Command line name: "uarch", "uarchive"
ZFS_NAME: XAT_ARCHIVE, ZFS_ARCHIVE
Windows name: FILE_ATTRIBUTE_ARCHIVE

The UF_ARCHIVED flag means that the file has changed and
needs to be archived. The meaning is same as
the Windows FILE_ATTRIBUTE_ARCHIVE attribute, and
the ZFS XAT_ARCHIVE and ZFS_ARCHIVE attribute.

msdosfs and ZFS have special handling for this flag.
i.e. they will set it when the file changes.

sys/param.h: Bump __FreeBSD_version to 1000047 for the
addition of new stat(2) flags.

chflags.1: Document the new command line flag names
(e.g. "system", "hidden") available to the
user.

ls.1: Reference chflags(1) for a list of file flags
and their meanings.

strtofflags.c: Implement the mapping between the new
command line flag names and new stat(2)
flags.

chflags.2: Document all of the new stat(2) flags, and
explain the intended behavior in a little
more detail. Explain how they map to
Windows file attributes.

Different filesystems behave differently
with respect to flags, so warn the
application developer to take care when
using them.

zfs_vnops.c: Add support for getting and setting the
UF_ARCHIVE, UF_READONLY, UF_SYSTEM, UF_HIDDEN,
UF_REPARSE, UF_OFFLINE, and UF_SPARSE flags.

All of these flags are implemented using
attributes that ZFS already supports, so
the on-disk format has not changed.

ZFS currently doesn't allow setting the
UF_REPARSE flag, and we don't really have
the other infrastructure to support reparse
points.

msdosfs_denode.c,
msdosfs_vnops.c: Add support for getting and setting
UF_HIDDEN, UF_SYSTEM and UF_READONLY
in MSDOSFS.

It supported SF_ARCHIVED, but this has been
changed to be UF_ARCHIVE, which has the same
semantics as the DOS archive attribute instead
of inverse semantics like SF_ARCHIVED.

After discussion with Bruce Evans, change
several things in the msdosfs behavior:

Use UF_READONLY to indicate whether a file
is writeable instead of file permissions, but
don't actually enforce it.

Refuse to change attributes on the root
directory, because it is special in FAT
filesystems, but allow most other attribute
changes on directories.

Don't set the archive attribute on a directory
when its modification time is updated.
Windows and DOS don't set the archive attribute
in that scenario, so we are now bug-for-bug
compatible.

smbfs_node.c,
smbfs_vnops.c: Add support for UF_HIDDEN, UF_SYSTEM,
UF_READONLY and UF_ARCHIVE in SMBFS.

This is similar to changes that Apple has
made in their version of SMBFS (as of
smb-583.8, posted on opensource.apple.com),
but not quite the same.

We map SMB_FA_READONLY to UF_READONLY,
because UF_READONLY is intended to match
the semantics of the DOS readonly flag.
The MacOS X code maps both UF_IMMUTABLE
and SF_IMMUTABLE to SMB_FA_READONLY, but
the immutable flags have stronger meaning
than the DOS readonly bit.

stat.h: Add definitions for UF_SYSTEM, UF_SPARSE,
UF_OFFLINE, UF_REPARSE, UF_ARCHIVE, UF_READONLY
and UF_HIDDEN.

The definition of UF_HIDDEN is the same as
the MacOS X definition.

Add commented-out definitions of
UF_COMPRESSED and UF_TRACKED. They are
defined in MacOS X (as of 10.8.2), but we
do not implement them (yet).

ufs_vnops.c: Add support for getting and setting
UF_ARCHIVE, UF_HIDDEN, UF_OFFLINE, UF_READONLY,
UF_REPARSE, UF_SPARSE, and UF_SYSTEM in UFS.
Alphabetize the flags that are supported.

These new flags are only stored, UFS does
not take any action if the flag is set.

Sponsored by: Spectra Logic
Reviewed by: bde (earlier version)


250193 02-May-2013 kib

The fsync(2) call should sync the vnode in such way that even after
system crash which happen after successfull fsync() return, the data
is accessible. For msdosfs, this means that FAT entries for the file
must be written.

Since we do not track the FAT blocks containing entries for the
current file, just do a sloppy sync of the devvp vnode for the mount,
which buffers, among other things, contain FAT blocks.

Simultaneously, for deupdat():
- optimize by clearing the modified flags before short-circuiting a
return, if the mount is read-only;
- only ignore the rest of the function for denode with DE_MODIFIED
flag clear when the waitfor argument is false. The directory buffer
for the entry might be of delayed write;
- microoptimize by comparing the updated directory entry with the
current block content;
- try to cluster the write, fall back to bawrite() if low on
resources.

Based on the submission by: bde
MFC after: 2 weeks


249588 17-Apr-2013 gabor

- Correct spelling in comments

Submitted by: Christoph Mallon <christoph.mallon@gmx.de> (via private mail)


249583 17-Apr-2013 gabor

- Correct mispellings of the word necessary

Submitted by: Christoph Mallon <christoph.mallon@gmx.de> (via private mail)


248282 14-Mar-2013 kib

Add currently unused flag argument to the cluster_read(),
cluster_write() and cluster_wbuild() functions. The flags to be
allowed are a subset of the GB_* flags for getblk().

Sponsored by: The FreeBSD Foundation
Tested by: pho


246921 17-Feb-2013 kib

Do not update the fsinfo block on each update of any fat block, this
is excessive. Postpone the flush of the fsinfo to VFS_SYNC(),
remembering the need for update with the flag MSDOSFS_FSIMOD, stored
in pm_flags.

FAT32 specification describes both FSI_Free_Count and FSI_Nxt_Free as
the advisory hints, not requiring them to be correct.

Based on the patch from bde, modified by me.

Reviewed by: bde
MFC after: 2 weeks


246219 01-Feb-2013 kib

The MSDOSFSMNT_WAITONFAT flag is bogus and broken. It does less than
track the MNT_SYNCHRONOUS flag. It is set to the latter at mount time
but not updated by MNT_UPDATE.

Use MNT_SYNCHRONOUS to decide to write the FAT updates syncrhonously.

Submitted by: bde
MFC after: 1 week


246218 01-Feb-2013 kib

Backup FATs were sometimes marked dirty by copying their first block
from the primary FAT, and then they were not marked clean on unmount.
Force marking them clean when appropriate.

Submitted by: bde
MFC after: 1 week


246217 01-Feb-2013 kib

The directory entry for dotdot was corrupted in the FAT32 case when moving
a directory to a subdir of the root directory from somewhere else.

For all directory moves that change the parent directory, the dotdot
entry must be fixed up. For msdosfs, the root directory is magic for
non-FAT32. It is less magic for FAT32, but needs the same magic for
the dotdot fixup. It didn't have it.

Both chkdsk and fsck_msdosfs fix the corrupt directory entries with no
problems.

The fix is to use the same magic for dotdot in msdosfs_rename() as in
msdosfs_mkdir().

For msdosfs_mkdir(), document the magic. When writing the dotdot entry
in mkdir, use explicitly set pcl variable instead on relying on the
start cluster of the root directory typically has a value < 65536.

Submitted by: bde
MFC after: 1 week


246216 01-Feb-2013 kib

The mountmsdosfs() function had an insane sanity test, remove it.

Trying FAT32 on a small partition failed to mount because
pmp->pm_Sectors was nonzero. Normally, FAT32 file systems are so
large that the 16-bit pm_Sectors can't hold the size. This is
indicated by setting it to 0 and using only pm_HugeSectors. But at
least old versions of newfs_msdos use the 16-bit field if possible,
and msdosfs supports this except for breaking its own support in the
sanity check. This is quite different from the handling of pm_FATsecs
-- now the 16-bit value is always ignored for FAT32 except for
checking that it is 0, and newfs_msdos doesn't use the 16-bit value
for FAT32.

Submitted by: bde
MFC after: 1 week


246215 01-Feb-2013 kib

Fix a backwards comment in markvoldirty().

Submitted by: bde
MFC after: 1 week


243311 19-Nov-2012 attilio

r16312 is not any longer real since many years (likely since when VFS
received granular locking) but the comment present in UFS has been
copied all over other filesystems code incorrectly for several times.

Removes comments that makes no sense now.

Reviewed by: kib
MFC after: 3 days


242833 09-Nov-2012 attilio

Complete MPSAFE VFS interface and remove MNTK_MPSAFE flag.
Porters should refer to __FreeBSD_version 1000021 for this change as
it may have happened at the same timeframe.


238697 22-Jul-2012 kevlo

Use NULL instead of 0 for pointers


234607 23-Apr-2012 trasz

Remove unused thread argument to vrecycle().

Reviewed by: kib


234605 23-Apr-2012 trasz

Remove unused thread argument from vtruncbuf().

Reviewed by: kib


234482 20-Apr-2012 mckusick

This change creates a new list of active vnodes associated with
a mount point. Active vnodes are those with a non-zero use or hold
count, e.g., those vnodes that are not on the free list. Note that
this list is in addition to the list of all the vnodes associated
with a mount point.

To avoid adding another set of linkage pointers to the vnode
structure, the active list uses the existing linkage pointers
used by the free list (previously named v_freelist, now renamed
v_actfreelist).

This update adds the MNT_VNODE_FOREACH_ACTIVE interface that loops
over just the active vnodes associated with a mount point (typically
less than 1% of the vnodes associated with the mount point).

Reviewed by: kib
Tested by: Peter Holm
MFC after: 2 weeks


234386 17-Apr-2012 mckusick

Replace the MNT_VNODE_FOREACH interface with MNT_VNODE_FOREACH_ALL.
The primary changes are that the user of the interface no longer
needs to manage the mount-mutex locking and that the vnode that
is returned has its mutex locked (thus avoiding the need to check
to see if its is DOOMED or other possible end of life senarios).

To minimize compatibility issues for third-party developers, the
old MNT_VNODE_FOREACH interface will remain available so that this
change can be MFC'ed to 9. Following the MFC to 9, MNT_VNODE_FOREACH
will be removed in head.

The reason for this update is to prepare for the addition of the
MNT_VNODE_FOREACH_ACTIVE interface that will loop over just the
active vnodes associated with a mount point (typically less than
1% of the vnodes associated with the mount point).

Reviewed by: kib
Tested by: Peter Holm
MFC after: 2 weeks


234025 08-Apr-2012 mckusick

Add I/O accounting to msdos filesystem.

Suggested and reviewed by: kib


232483 04-Mar-2012 kevlo

Clean up style(9) nits


231998 22-Feb-2012 kib

Use DOINGASYNC() to test for async allowance, to honor VFS syncing requests.

Noted by: bde
MFC after: 1 week


231949 21-Feb-2012 kib

Fix found places where uio_resid is truncated to int.

Add the sysctl debug.iosize_max_clamp, enabled by default. Setting the
sysctl to zero allows to perform the SSIZE_MAX-sized i/o requests from
the usermode.

Discussed with: bde, das (previous versions)
MFC after: 1 month


230249 17-Jan-2012 mckusick

Make sure all intermediate variables holding mount flags (mnt_flag)
and that all internal kernel calls passing mount flags are declared
as uint64_t so that flags in the top 32-bits are not lost.

MFC after: 2 weeks


228796 22-Dec-2011 kevlo

Discarding local array based on return values


227817 22-Nov-2011 kib

Put all the messages from msdosfs under the MSDOSFS_DEBUG ifdef.
They are confusing to user, and not informative for general consumption.

MFC after: 1 week


227650 18-Nov-2011 kevlo

Add unicode support to msdosfs and smbfs; original pathes from imura,
bug fixes by Kuan-Chung Chiu <buganini at gmail dot com>.

Tested by me in production for several days at work.


224290 24-Jul-2011 mckusick

This update changes the mnt_flag field in the mount structure from
32 bits to 64 bits and eliminates the unused mnt_xflag field. The
existing mnt_flag field is completely out of bits, so this update
gives us room to expand. Note that the f_flags field in the statfs
structure is already 64 bits, so the expanded mnt_flag field can
be exported without having to make any changes in the statfs structure.

Approved by: re (bz)


222167 22-May-2011 rmacklem

Add a lock flags argument to the VFS_FHTOVP() file system
method, so that callers can indicate the minimum vnode
locking requirement. This will allow some file systems to choose
to return a LK_SHARED locked vnode when LK_SHARED is specified
for the flags argument. This patch only adds the flag. It
does not change any file system to use it and all callers
specify LK_EXCLUSIVE, so file system semantics are not changed.

Reviewed by: kib


220014 25-Mar-2011 kib

Report EBUSY instead of EROFS for attempt of deleting or renaming the
root directory of msdosfs mount. The VFS code would handle deletion
case itself too, assuming VV_ROOT flag is not lost. The msdosfs_rename()
should also note attempt to rename root via doscheckpath() or different
mount point check leading to EXDEV. Nonetheless, keep the checks for now.

The change is inspired by NetBSD change referenced in PR, but return
EBUSY like kern_unlinkat() does.

PR: kern/152079
MFC after: 1 week


218909 21-Feb-2011 brucec

Fix typos - remove duplicate "the".

PR: bin/154928
Submitted by: Eitan Adler <lists at eitanadler.com>
MFC after: 3 days


215548 19-Nov-2010 kib

Remove prtactive variable and related printf()s in the vop_inactive
and vop_reclaim() methods. They seems to be unused, and the reported
situation is normal for the forced unmount.

MFC after: 1 week
X-MFC-note: keep prtactive symbol in vfs_subr.c


214001 18-Oct-2010 kevlo

Fix a possible race where the directory dirent is moved to the location
that was used by ".." entry.
This change seems fixed panic during attempt to access msdosfs data
over nfs.

Reviewed by: kib
MFC after: 1 week


213771 13-Oct-2010 rpaulo

Ignore the return value of DE_INTERNALIZE().


213664 10-Oct-2010 kib

The r184588 changed the layout of struct export_args, causing an ABI
breakage for old mount(2) syscall, since most struct <filesystem>_args
embed export_args. The mount(2) is supposed to provide ABI
compatibility for pre-nmount mount(8) binaries, so restore ABI to
pre-r184588.

Requested and reviewed by: bde
MFC after: 2 weeks


213543 08-Oct-2010 kib

Add a comment describing the reason for calling cache_purge(fvp).

Requested by: danfe
MFC after: 6 days


213508 07-Oct-2010 kib

The msdosfs lookup is case insensitive. Several aliases may be inserted for
a single directory entry. As a consequnce, name cache purge done by lookup
for fvp when DELETE op for namei is specified, might be not enough to
expunge all namecache entries that were installed for this direntry.

Explicitely call cache_purge(fvp) when msdosfs_rename() succeeded.

PR: kern/93634
MFC after: 1 week


207719 06-May-2010 trasz

Style fixes and removal of unneeded variable.

Submitted by: bde@


207662 05-May-2010 trasz

Move checking against RLIMIT_FSIZE into one place, vn_rlimit_fsize().

Reviewed by: kib


206098 02-Apr-2010 avg

mountmsdosfs: reject too high value of bytes per cluster

Bytes per cluster are calcuated as bytes per sector times sectors per
cluster. Too high value can overflow an internal variable with type
that can hold only values in valid range. Trying to use a wider type
results in an attempt to read more than MAXBSIZE at once, a panic.
Unfortunately, it is FreeBSD newfs_msdos that produces filesystems
with invalid parameters for certain types of media.

Reported by: Fabian Keil <freebsd-listen@fabiankeil.de>,
Paul B. Mahol <onemda@gmail.com>
Discussed with: bde, kib
MFC after: 1 week
X-ToDo: fix newfs_msdos


204675 03-Mar-2010 kib

When returning error from msdosfs_lookup(), make sure that *vpp is NULL.
lookup() KASSERTs this condition.

Reported and tested by: pho
MFC after: 3 weeks


204589 02-Mar-2010 kib

Do not leak vnode lock when msdosfs mount is updated and specified
device is different from the device used to the original mount.

Note that update_mp does not need devvp locked, and pmp->pm_devvp cannot
be freed meantime.

Reported and tested by: pho
MFC after: 3 weeks


204576 02-Mar-2010 kib

Only destroy pm_fatlock on error if it was initialized.

MFC after: 3 weeks


204475 28-Feb-2010 kib

Mark msdosfs as mpsafe.

Tested by: pho
MFC after: 3 weeks


204474 28-Feb-2010 kib

Fix the race between dotdot lookup and forced unmount, by using
msdosfs-specific variant of vn_vget_ino(), msdosfs_deget_dotdot().

As was done for UFS, relookup the dotdot denode after the call to
msdosfs_deget_dotdot(), because vnode lock is dropped and directory
might be moved.

Tested by: pho
MFC after: 3 weeks


204473 28-Feb-2010 kib

Use pm_fatlock to protect per-filesystem rb tree used to allocate fileno
on the large FAT volumes. Previously, a single global mutex was used.

Tested by: pho
MFC after: 3 weeks


204472 28-Feb-2010 kib

Add assertions for FAT bitmap state.

Tested by: pho
MFC after: 3 weeks


204471 28-Feb-2010 kib

Use pm_fatlock to protect fat bitmap.

Tested by: pho
MFC after: 3 weeks


204470 28-Feb-2010 kib

Add per-mountpoint lockmgr lock for msdosfs. It is intended to be used
as fat bitmap lock and to replace global mutex protecting fileno rbtree.

Tested by: pho
MFC after: 3 weeks


204469 28-Feb-2010 kib

In msdosfs deget(), properly handle the case when the vnode is found in hash.

Tested by: pho
MFC after: 3 weeks


204468 28-Feb-2010 kib

In msdosfs_inactive(), reclaim the vnodes both for SLOT_DELETED and
SLOT_EMPTY deName[0] values. Besides conforming to FAT specification, it
also clears the issue where vfs_hash_insert found the vnode in hash, and
newly allocated vnode is vput()ed. There, deName[0] == 0, and vnode is
not reclaimed, indefinitely kept on mountlist.

Tested by: pho
MFC after: 3 weeks


204467 28-Feb-2010 kib

Remove seemingly unneeded unlock/relock of the dvp in msdosfs_rmdir,
causing LOR.

Reported and tested by: pho
MFC after: 3 weeks


204466 28-Feb-2010 kib

Assert that the msdosfs vnode is (e)locked in several places.
The plan is to use vnode lock to protect denode and fat cache,
and having separate lock for block use map.

Change the check and return on impossible condition into KASSERT().

Tested by: pho
MFC after: 3 weeks


204465 28-Feb-2010 kib

Remove unused global statistic about fat cache usage.

Tested by: pho
MFC after: 3 weeks


204111 20-Feb-2010 uqs

Fix common misspelling of hierarchy

Pointed out by: bf1783 at gmail
Approved by: np (cxgb), kientzle (tar, etc.), philip (mentor)


203866 14-Feb-2010 kib

Invalid filesystem might cause the bp to be never read.

Noted by: Pedro F. Giffuni <giffunip tutopia com>
Obtanined from: NetBSD
MFC after: 1 week


203828 13-Feb-2010 kib

Fix function name in the comment in the second location too.

Submitted by: ed
MFC after: 1 week


203827 13-Feb-2010 kib

- Add idempotency guards so the structures can be used in other utilities.
- Update bpb structs with reserved fields.
- In direntry struct join deName with deExtension. Although a
fix was attempted in the past, these fields were being overflowed,
Now this is consistent with the spec, and we can now share the
WinChksum code with NetBSD.

Submitted by: Pedro F. Giffuni <giffunip tutopia com>
Mostly obtained from: NetBSD
Reviewed by: bde
MFC after: 2 weeks


203826 13-Feb-2010 kib

Use M_ZERO instead of calling bzero().
Fix function name in the comment.

MFC after: 1 week


203822 13-Feb-2010 kib

Remove unused macros.

MFC after: 1 week


196970 08-Sep-2009 phk

Revert previous commit and add myself to the list of people who should
know better than to commit with a cat in the area.


196969 08-Sep-2009 phk

Add necessary include.


193924 10-Jun-2009 kib

Fix r193923 by noting that type of a_fp is struct file *, not int.
It was assumed that r193923 was trivial change that cannot be done
wrong.

MFC after: 2 weeks


193923 10-Jun-2009 kib

s/a_fdidx/a_fp/ for VOP_OPEN comments that inline struct vop_open_args
definition.

Discussed with: bde
MFC after: 2 weeks


191990 11-May-2009 attilio

Remove the thread argument from the FSD (File-System Dependent) parts of
the VFS. Now all the VFS_* functions and relating parts don't want the
context as long as it always refers to curthread.

In some points, in particular when dealing with VOPs and functions living
in the same namespace (eg. vflush) which still need to be converted,
pass curthread explicitly in order to retain the old behaviour.
Such loose ends will be fixed ASAP.

While here fix a bug: now, UFS_EXTATTR can be compiled alone without the
UFS_EXTATTR_AUTOSTART option.

VFS KPI is heavilly changed by this commit so thirdy parts modules needs
to be recompiled. Bump __FreeBSD_version in order to signal such
situation.


189120 27-Feb-2009 jhb

- Hold a reference on the cdev a filesystem is mounted from in the mount.
- Remove the cdev pointers from the denode and instead use the mountpoint's
reference to call dev2udev() in getattr().

Reviewed by: kib, julian


188956 23-Feb-2009 trasz

Right now, when trying to unmount a device that's already gone,
msdosfs_unmount() and ffs_unmount() exit early after getting ENXIO.
However, dounmount() treats ENXIO as a success and proceeds with
unmounting. In effect, the filesystem gets unmounted without closing
GEOM provider etc.

Reviewed by: kib
Approved by: rwatson (mentor)
Tested by: dho
Sponsored by: FreeBSD Foundation


187199 13-Jan-2009 trasz

Turn a "panic: non-decreasing id" into an error printf. This seems
to be caused by a metadata corruption that occurs quite often after
unplugging a pendrive during write activity.

Reviewed by: scottl
Approved by: rwatson (mentor)
Sponsored by: FreeBSD Foundation


187058 11-Jan-2009 trasz

Fix msdosfs_print(), which in turn fixes "show lockedvnods" for msdosfs
vnodes.

Reviewed by: kib
Approved by: rwatson (mentor)
Sponsored by: FreeBSD Foundation


186194 16-Dec-2008 trasz

According to phk@, VOP_STRATEGY should never, _ever_, return
anything other than 0. Make it so. This fixes
"panic: VOP_STRATEGY failed bp=0xc320dd90 vp=0xc3b9f648",
encountered when writing to an orphaned filesystem. Reason
for the panic was the following assert:
KASSERT(i == 0, ("VOP_STRATEGY failed bp=%p vp=%p", bp, bp->b_vp));
at vfs_bio:bufstrategy().

Reviewed by: scottl, phk
Approved by: rwatson (mentor)
Sponsored by: FreeBSD Foundation


184413 28-Oct-2008 trasz

Introduce accmode_t. This is required for NFSv4 ACLs - it will be neccessary
to add more V* constants, and the variables changed by this patch were often
being assigned to mode_t variables, which is 16 bit.

Approved by: rwatson (mentor)


184205 23-Oct-2008 des

Retire the MALLOC and FREE macros. They are an abomination unto style(9).

MFC after: 3 months


183754 10-Oct-2008 attilio

Remove the struct thread unuseful argument from bufobj interface.
In particular following functions KPI results modified:
- bufobj_invalbuf()
- bufsync()

and BO_SYNC() "virtual method" of the buffer objects set.
Main consumers of bufobj functions are affected by this change too and,
in particular, functions which changed their KPI are:
- vinvalbuf()
- g_vfs_close()

Due to the KPI breakage, __FreeBSD_version will be bumped in a later
commit.

As a side note, please consider just temporary the 'curthread' argument
passing to VOP_SYNC() (in bufsync()) as it will be axed out ASAP

Reviewed by: kib
Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>


183214 20-Sep-2008 kib

Initialize va_rdev to NODEV instead of 0 or VNOVAL in VOP_GETATTR().
NODEV is more appropriate when va_rdev doesn't have a meaningful value.

Submitted by: Jaakko Heinonen <jh saunalahti fi>
Suggested by: bde
Discussed on: freebsd-fs
MFC after: 1 month


182600 01-Sep-2008 kib

In rev. 1.17 (r33548) of msdosfs_fat.c, relative cluster numbers were
replaced by file relative sector numbers as the buffer block number when
zero-padding a file during extension. Revert the change, it causes wrong
blocks filled with zeroes on seeking beyond end of file.

PR: kern/47628
Submitted by: tegge
MFC after: 3 days


182371 28-Aug-2008 attilio

Decontextualize the couplet VOP_GETATTR / VOP_SETATTR as the passed thread
was always curthread and totally unuseful.

Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>


180252 04-Jul-2008 kib

The uniqdosname() function takes char[12] as it third argument.

Found by: -fstack-protector
Reported by: dougb
Tested by: dougb, Rainer Hurling <rhurlin gwdg de>
MFC after: 3 days


178243 16-Apr-2008 kib

Move the head of byte-level advisory lock list from the
filesystem-specific vnode data to the struct vnode. Provide the
default implementation for the vop_advlock and vop_advlockasync.
Purge the locks on the vnode reclaim by using the lf_purgelocks().
The default implementation is augmented for the nfs and smbfs.
In the nfs_advlock, push the Giant inside the nfs_dolock.

Before the change, the vop_advlock and vop_advlockasync have taken the
unlocked vnode and dereferenced the fs-private inode data, racing with
with the vnode reclamation due to forced unmount. Now, the vop_getattr
under the shared vnode lock is used to obtain the inode size, and
later, in the lf_advlockasync, after locking the vnode interlock, the
VI_DOOMED flag is checked to prevent an operation on the doomed vnode.

The implementation of the lf_purgelocks() is submitted by dfr.

Reported by: kris
Tested by: kris, pho
Discussed with: jeff, dfr
MFC after: 2 weeks


177785 31-Mar-2008 kib

Add the support for the AT_FDCWD and fd-relative name lookups to the
namei(9).

Based on the submission by rdivacky,
sponsored by Google Summer of Code 2007
Reviewed by: rwatson, rdivacky
Tested by: pho


177633 26-Mar-2008 dfr

Add the new kernel-mode NFS Lock Manager. To use it instead of the
user-mode lock manager, build a kernel with the NFSLOCKD option and
add '-k' to 'rpc_lockd_flags' in rc.conf.

Highlights include:

* Thread-safe kernel RPC client - many threads can use the same RPC
client handle safely with replies being de-multiplexed at the socket
upcall (typically driven directly by the NIC interrupt) and handed
off to whichever thread matches the reply. For UDP sockets, many RPC
clients can share the same socket. This allows the use of a single
privileged UDP port number to talk to an arbitrary number of remote
hosts.

* Single-threaded kernel RPC server. Adding support for multi-threaded
server would be relatively straightforward and would follow
approximately the Solaris KPI. A single thread should be sufficient
for the NLM since it should rarely block in normal operation.

* Kernel mode NLM server supporting cancel requests and granted
callbacks. I've tested the NLM server reasonably extensively - it
passes both my own tests and the NFS Connectathon locking tests
running on Solaris, Mac OS X and Ubuntu Linux.

* Userland NLM client supported. While the NLM server doesn't have
support for the local NFS client's locking needs, it does have to
field async replies and granted callbacks from remote NLMs that the
local client has contacted. We relay these replies to the userland
rpc.lockd over a local domain RPC socket.

* Robust deadlock detection for the local lock manager. In particular
it will detect deadlocks caused by a lock request that covers more
than one blocking request. As required by the NLM protocol, all
deadlock detection happens synchronously - a user is guaranteed that
if a lock request isn't rejected immediately, the lock will
eventually be granted. The old system allowed for a 'deferred
deadlock' condition where a blocked lock request could wake up and
find that some other deadlock-causing lock owner had beaten them to
the lock.

* Since both local and remote locks are managed by the same kernel
locking code, local and remote processes can safely use file locks
for mutual exclusion. Local processes have no fairness advantage
compared to remote processes when contending to lock a region that
has just been unlocked - the local lock manager enforces a strict
first-come first-served model for both local and remote lockers.

Sponsored by: Isilon Systems
PR: 95247 107555 115524 116679
MFC after: 2 weeks


177493 22-Mar-2008 jeff

- Complete part of the unfinished bufobj work by consistently using
BO_LOCK/UNLOCK/MTX when manipulating the bufobj.
- Create a new lock in the bufobj to lock bufobj fields independently.
This leaves the vnode interlock as an 'identity' lock while the bufobj
is an io lock. The bufobj lock is ordered before the vnode interlock
and also before the mnt ilock.
- Exploit this new lock order to simplify softdep_check_suspend().
- A few sync related functions are marked with a new XXX to note that
we may not properly interlock against a non-zero bv_cnt when
attempting to sync all vnodes on a mountlist. I do not believe this
race is important. If I'm wrong this will make these locations easier
to find.

Reviewed by: kib (earlier diff)
Tested by: kris, pho (earlier diff)


176431 21-Feb-2008 marcel

Don't check the bpbSecPerTrack and bpbHeads fields of the BPB.
They are typically 0 on new ia64 systems. Since we don't use
either field, there's no harm in not checking.


175635 24-Jan-2008 attilio

Cleanup lockmgr interface and exported KPI:
- Remove the "thread" argument from the lockmgr() function as it is
always curthread now
- Axe lockcount() function as it is no longer used
- Axe LOCKMGR_ASSERT() as it is bogus really and no currently used.
Hopefully this will be soonly replaced by something suitable for it.
- Remove the prototype for dumplockinfo() as the function is no longer
present

Addictionally:
- Introduce a KASSERT() in lockstatus() in order to let it accept only
curthread or NULL as they should only be passed
- Do a little bit of style(9) cleanup on lockmgr.h

KPI results heavilly broken by this change, so manpages and
FreeBSD_version will be modified accordingly by further commits.

Tested by: matteo


175294 13-Jan-2008 attilio

VOP_LOCK1() (and so VOP_LOCK()) and VOP_UNLOCK() are only used in
conjuction with 'thread' argument passing which is always curthread.
Remove the unuseful extra-argument and pass explicitly curthread to lower
layer functions, when necessary.

KPI results broken by this change, which should affect several ports, so
version bumping and manpage update will be further committed.

Tested by: kris, pho, Diego Sardina <siarodx at gmail dot com>


175202 10-Jan-2008 attilio

vn_lock() is currently only used with the 'curthread' passed as argument.
Remove this argument and pass curthread directly to underlying
VOP_LOCK1() VFS method. This modify makes the code cleaner and in
particular remove an annoying dependence helping next lockmgr() cleanup.
KPI results, obviously, changed.

Manpage and FreeBSD_version will be updated through further commits.

As a side note, would be valuable to say that next commits will address
a similar cleanup about VFS methods, in particular vop_lock1 and
vop_unlock.

Tested by: Diego Sardina <siarodx at gmail dot com>,
Andrea Di Pasquale <whyx dot it at gmail dot com>


173728 18-Nov-2007 maxim

o English lesson from bde@: "iff" is not a typo, it means "if and only if".
Backout previous.


173690 17-Nov-2007 maxim

o Fix a typo in the comment.


172954 25-Oct-2007 trhodes

Remove some debugging code that, while useful, doesn't belong in the committed
version. While here, expand a macro only used once.

Discussed with/oked by: bde


172883 22-Oct-2007 delphij

Fixes to msdosfs dirtyflag related stuff:

- markvoldirty() needs to write to underlying GEOM provider. We
have to do that *before* g_access() which sets the GEOM provider
to read-only.
- Remove dirty flag before free'ing iconv related resources. The
dirty flag removal could fail, and it is hard to revert the
iconv-free after the fail.
- Mark volume as dirty if we have failed to mark it clean for safe.
- Other style fixes to the touched functions.


172798 19-Oct-2007 bde

Implement the async (really, delayed-write) mount option for msdosfs.

This is much simpler than for ffs since there are many fewer places
where we need to choose between a delayed write and a sync write --
just 5 in msdosfs and more than 30 in ffs.

This is more complete and correct than in ffs. Several places in ffs
are are still missing the choice. ffs_update() has a layering violation
that breaks callers which want to force a sync update (mainly fsync(2)
and O_SYNC write(2)).

However, fsync(2) and O_SYNC write(2) are still more broken than in
ffs, since they are broken for default (non-sync non-async) mounts
too. Both fail to sync the FAT in all cases, and both fail to sync
the directory entry in some cases after losing a race. Async everything
is probably safer than the half-baked sync of metadata given by default
mounts.


172758 18-Oct-2007 bde

Add noclusterr and noclusterw options to the options list. I forgot these
when I implemented clustering.


172757 18-Oct-2007 bde

Fix some style bugs in the mount options list. Mainly, sort the list,
leaving space for adding missing options. Negative options are sorted
after removing their "no" prefix, and generic options are sorted before
msdosfs-specific ones.


172741 18-Oct-2007 bde

In msdosfs_settattr(), don't do synchronous updates of the denode
(except indirectly for the size pseudo-attribute). If anything deserves
a sync update, then it is ids and immutable flags, since these are
related to security, but ffs never synced these and msdosfs doesn't
support them. (ufs_setattr() only does an update in one case where
it is least needed (for timestamps); it did pessimal sync updates for
timestamps until 1998/03/08 but was changed for unlogged reasons related
to soft updates.)

Now msdosfs calls deupdat() with waitfor == 0, which normally gives a
delayed update to disk but always gives a sync update of timestamps
in core, while for ffs everything is delayed until the syncer daemon
or other activity causes an update (except for timestamps).

This gives a large optimization mainly for things like cp -p, where
attribute adjustment could easily triple the number of physical I/O's
if it is done synchronously (but cp -p to msdosfs is not as bad as
that, since msdosfs doesn't support many attributes so null adjustments
are more common, and msdosfs doesn't support ctimes so even if cp
doesn't weed out null adjustments they don't become non-null after
clobbering the ctime).


172697 16-Oct-2007 alfred

Get rid of qaddr_t.

Requested by: bde


172303 23-Sep-2007 bde

Remove some of the pessimizations involving writing the fsi sector.
All active fields in fsi are advisory/optional, so we shouldn't do
extra work to make them valid at all times, but instead we write to
the fsi too often (we still do), and we searched for a free cluster
for fsinxtfree too often.

This commit just removes the whole search and its results, so that we
write out our in-core copy of fsinxtfree instead of writing a "fixed"
copy and clobbering our in-core copy. This saves fixing 3 bugs:
- off-by-1 error for the end of the search, resulting in fsinxtfree
not actually being adjusted iff only the last cluster is free.
- missing adjustment when no clusters are free.
- off-by-many error for the start of the search. Starting the search
at 0 instead of at (the in-core copy of) fsinxtfree did more than
defeat the reasons for existence of fsinxtfree. fsinxtfree exists
mainly to avoid having to start at 0 for just the first search per
mount, but has the side effect of reducing bias towards allocating
near cluster 0. The bias would normally only be generated by the
first search per mount (if fsinxtfree is not supported), but since
we also adjusted the in-core copy of fsinxtfree here, we were doing
extra work to maximize the bias.

Approved by: re (kensmith)


172027 31-Aug-2007 bde

Fix races in msdosfs_lookup() and msdosfs_readdir(). These functions
can easily block in bread(), and then there was nothing to prevent the
static buffer (nambuf_{ptr,len,last_id}) being clobbered by another
thread.

The effects of the bug seem to have been limited to failed lookups and
mangled names in readdir(), since Giant locking provides enough
serialization to prevent concurrent calls to the functions that access
the buffer. They were very obvious for multiple concurrent tree walks,
especially with a small cluster size.

The bug was introduced in msdosfs_conv.c 1.34 and associated changes,
and is in all releases starting with 5.2.

The fix is to allocate the buffer as a local variable and pass around
pointers to it like "_r" functions in libc do. Stack use from this
is large but not too large. This also fixes a memory leak on module
unload.

Reviewed by: kib
Approved by: re (kensmith)


171852 15-Aug-2007 jhb

On 6.x this works:

% mount | grep home
/dev/ad4s1e on /home (ufs, local, noatime, soft-updates)
% mount -u -o atime /home
% mount | grep home
/dev/ad4s1e on /home (ufs, local, soft-updates)

Restore this behavior for on 7.x for the following mount options:
noatime, noclusterr, noclusterw, noexec, nosuid, nosymfollow

In addition, on 7.x, the following are equivalent:
mount -u -o atime /home
mount -u -o nonoatime /home

Ideally, when we introduce new mount options, we should avoid
options starting with "no". :)

Requested by: jhb
Reported by: Karol Kwiat <karol.kwiat gmail com>, Scott Hetzel <swhetzel gmail com>
Approved by: re (bmah)
Proxy commit for: rodrigc


171774 07-Aug-2007 bde

In msdosfs_read() and msdosfs_write(), don't check explicitly for
(uio_offset < 0) since this can't happen. If this happens, then the
general code handles the problem safely (better than before for reading,
returning 0 (EOF) instead of the bogus errno EINVAL, and the same as
before for writing, returning EFBIG).

In msdosfs_read(), don't check for (uio_resid < 0). msdosfs_write()
already didn't check.

In msdosfs_read(), document in a comment our assumptions that the caller
passed a valid uio_offset and uio_resid. ffs checks using KASSERT(),
and that is enough sanity checking. In the same comment, partly document
there is no need to check for the EOVERFLOW case, unlike in ffs where this
case can happen at least in theory.

In msdosfs_write(), add a comment about why the checking of
(uio_resid == 0) is explicit, unlike in ffs.

In msdosfs_write(), check for impossibly large final offsets before
checking if the file size rlimit would be exceeded, so that we don't
have an overflow bug in the rlimit check and are consistent with ffs.
We now return EFBIG instead of EFBIG plus a SIGXFSZ signal if the final
offset would be impossibly large but not so large as to cause overflow.
Overflow normally gave the benign behaviour of no signal.

Approved by: re (kensmith) (blanket)


171771 07-Aug-2007 bde

Fix and update the comments about the effect of the read-only flag on writing.
They are still too verbose.

Remove nearby unreachable code for handling symlinks.

Approved by: re (kensmith) (blanket)


171759 07-Aug-2007 bde

Fix some style bugs (don't assume that off_t == int64_t; fix some comments;
remove some parentheses; fix some whitespace errors; fix only one case of
a boolean comparison of a non-boolean).

Improve an error message by quoting ".", and by not printing large positive
values as negative ones.

Approved by: re (kensmith) (blanket)


171758 07-Aug-2007 bde

Fix some style bugs (don't assume that off_t == int64_t; fix some comments;
remove some parentheses; fix only a couple of whtespace errors).

Approved by: re (kensmith) (blanket)


171757 07-Aug-2007 bde

Fix some style bugs (mainly some whitespace errors).

Approved by: re (kensmith) (blanket)


171756 07-Aug-2007 bde

Fix some style bugs (some whitespace errors only).

Approved by: re (kensmith) (blanket)


171755 07-Aug-2007 bde

Sort includes.

Remove rotted banal comment attached to includes.

Approved by: re (kensmith) (blanket)


171754 07-Aug-2007 bde

Sort includes.

Remove banal comments attached to includes.

Approved by: re (kensmith) (blanket)


171752 07-Aug-2007 bde

Sort includes.

Remove banal comments before includes. Remove rotted banal comments attached
to includes.

Approved by: re (kensmith) (blanket)


171751 07-Aug-2007 bde

Remove unused include(s).

Remove banal comments before includes.

Approved by: re (kensmith) (blanket)


171750 07-Aug-2007 bde

Remove unused include(s).

Approved by: re (kensmith) (blanket)


171749 07-Aug-2007 bde

Include <sys/mutex.h> and its prerequisite <sys/lock.h> instead of
depending on namespace pollution in <sys/buf.h> and/or <sys/vnode.h>

Approved by: re (kensmith) (blanket)


171748 07-Aug-2007 bde

Include <sys/mutex.h>'s prerequisite <sys/lock.h> instead of depending on
namespace pollution in <sys/vnode.h>.

Sort the include of <sys/mutex.h> instead of unsorting it after
<sys/vnode.h> and depending on the pollution there.

Approved by: re (kensmith) (blanket)


171747 07-Aug-2007 bde

Remove unused include(s).

Approved by: re (kensmith) (blanket)


171731 05-Aug-2007 bde

Silently fix up the estimated next free cluster number from the fsinfo
sector, instead of failing the whole mount if it is garbage. Fields
in the fsinfo sector are only advisory, so there are better sanity
checks than this, and we already silently fix up the only other advisory
field in the fsinfo (the free cluster count).

This wasn't handled quite right in rev.1.92, 1.117, or in NetBSD. 1.92
also failed the whole mount for the non-garbage magic value 0xffffffff
1.117 fixed this well enough in practice since garbage values shouldn't
occur in practice, but left the error handling larger and more convoluted
than necessary. Now we handle the magic value as a special case of
fixing up all out of bounds values.

Also fix up the estimated next free cluster number when there is no
fsinfo sector. We were using 0, but CLUST_FIRST is safer.

Approved by: re (kensmith)


171711 03-Aug-2007 bde

Oops, fix the fix for the i/o size of the fsinfo block. Its log
message explained why the size is 1 sector, but the code used a
size of 1 cluster.

I/o sizes larger than necessary may cause serious coherency problems
in the buffer cache. Here I think there were only minor efficiency
problems, since a too-large fsinfo buffer could only get far enough
to overlap buffers for the same vnode (the device vnode), so mappings
are coherent at the page level although not at the buffer level, and
the former is probably enough due to our limited use of the fsinfo
buffer.

Approved by: re (kensmith)


171551 23-Jul-2007 bde

Make using msdosfs as the root file system sort of work:

o Initialize ownerships and permissions. They were garbage (0) for
root mounts since vfs_mountroot_try() doesn't ask for them to be set
and msdosfs's old incomplete code to set them was removed. The
garbage happened to give the correct ownerships root:wheel, but it
gave permissions 000 so init could not be execed. Use the macros
for root: wheel and 0755. (The removed code gave 0:0 and 0777. 0755
is more normal and secure, thought wrong for /tmp.)

o Check the readonly flag for initial (non-MNT_UPDATE) mounts in the
correct place, as in ffs. For root mounts, it is only passed in
mp->mnt_flags, since vfs_mountroot_try() only passes it as a flag
and nothing translates the flag to the "ro" option string. msdosfs
only looked for it in the string, so it gave a rw mount for root
mounts without even clearing the flag in mp->mnt_flags, so the final
state was inconsistent. Checking the flag only in mp->mnt_flags
works for initial userland mounts too. The MNT_UPDATE case is
messier.

The main point that should work but doesn't is fsck of msdosfs root
while it is mounted ro. This needs mainly MNT_RELOAD support to work.
It should be possible to run fsck -p and succeed provided the fs is
consistent, not just for msdosfs, but this fails because fsck -p always
tries to open the device rw. The hack that allows open for writing
in ffs is not implemented in msdosfs, since without MNT_RELOAD support
writing could only be harmful. So fsck must be turned off to use
msdosfs as root. This is quite dangerous, since msdosfs is still missing
actually using its fs-dirty flag internally, so it is happy to mount
dirty fileystems rw.

Unrelated changes:
- Fix missing error handling for MNT_UPDATE from rw to ro.
- Catch up with renaming msdos to msdosfs in a string.

Approved by: re (kensmith)


171523 20-Jul-2007 bde

Implement vfs clustering for msdosfs.

This gives a very large speedup for small block sizes (in my tests,
about 5 times for write and 3 times for read with a block size of 512,
if clustering is possible) and a moderate speedup for the moderatatly
large block sizes that should be used on non-small media (4K is the
best size in most cases, and the speedup for that is about 1.3 times
for write and 1.2 times for read). mmap() should benefit from clustering
like read()/write(), but the current implementation of vm only supports
clustering (at least for getpages) if the fs block size is >= PAGE SIZE.

msdosfs is now only slightly slower than ffs with soft updates for
writing and slightly faster for reading when both use their best block
sizes. Writing is slower for msdosfs because of more sync writes.
Reading is faster for msdosfs because indirect blocks interfere with
clustering in ffs.

The changes in msdosfs_read() and msdosfs_write() are simpler merges
of corresponding code in ffs (after fixing some style bugs in ffs).
msdosfs_bmap() needs fs-specific code. This implementation loops
calling a lower level bmap function to do the hard parts. This is a
bit inefficient, but is efficient enough since msdsfs_bmap() is only
called when there is physical i/o to do.

Approved by: re (hrs)


171522 20-Jul-2007 bde

Clean up before implementing vfs clustering for msdosfs:

In msdosfs_read(), mainly reorder the main loop to the same order as in
ffs_read().

In msdosfs_write() and extendfile(), use vfs_bio_clrbuf() instead of
clrbuf(). I think this just just a bogus optimization, but ffs always
does it and msdosfs already did it in one place, and it is what I've
tested.

In msdosfs_write(), merge good bits from a comment in ffs_write(), and
fix 1 style bug.

In the main comment for msdosfs_pcbmap(), improve wording and catch
up with 13 years of changes in the function. This comment belongs in
VOP_BMAP.9 but that doesn't exist.

In msdosfs_bmap(), return EFBIG if the requested cluster number is out
of bounds instead of blindly truncating it, and fix many style bugs.

Approved by: re (hrs)


171408 12-Jul-2007 bde

Round up the FAT block size to a multiple of the sector size so that i/o
to the FAT is possible.

Make the FAT block size less arbitrary before it is rounded up:
- for FAT12, default to 3*512 instead of to 3 sectors. The magic 3 is
the default number of 512-byte FAT sectors on a floppy drive. That
many sectors is too many if the sector size is larger.
- for !FAT12, default to PAGE_SIZE instead of to 4096. Remove
MSDOSFS_DFLTBSIZE since it only obfuscated this 4096.

For reading the BPB, use a block size of 8192 instead of 2048 so that
sector sizes up to 8192 can work. We should try several sizes, or just
try the maximum supported size (MAXBSIZE = 64K). I use 8192 because
that is enough for DVD-RW's (even 2048 is enough) and 8192 has been
tested a lot in use by ffs.

This completes fixing msdosfs for some large sector sizes (up to 8K
for read and 64K for write). Microsoft documents support for sector
sizes up to 4K in mdosfs. ffs is currently limited to 8K for both
read and write.

Approved by: re (kensmith)
Approved by: nyan (several years ago)


171406 12-Jul-2007 bde

Fix some bugs involving the fsinfo block (many remain unfixed). This is
part of fixing msdosfs for large sector sizes. One of the fixed bugs
was fatal for large sector sizes.

1. The fsinfo block has size 512, but it was misunderstood and declared
as having size 1024, with nothing in the second 512 bytes except a
signature at the end. The second 512 bytes actually normally (if
the file system was created by Windows) consist of a second boot
sector which is normally (in WinXP) empty except for a signature --
the normal layout is one boot sector, one fsinfo sector, another
boot sector, then these 3 sectors duplicated. However, other
layouts are valid. newfs_msdos produces a valid layout with one
boot sector, one fsinfo sector, then these 2 sectors duplicated.
The signature check for the extra part of the fsinfo was thus
normally checking the signature in either the second boot sector
or the first boot sector in the copy, and thus accidentally
succeeding. The extra signature check would just fail for weirder
layouts with 512-byte sectors, and for normal layouts with any other
sector size.

Remove the extra bytes and the extra signature check.

2. Old versions did i/o to the fsinfo block using size 1024, with the
second half only used for the extra signature check on read. This
was harmless for sector size 512, and worked accidentally for sector
size 1024. The i/o just failed for larger sector sizes.

The version being fixed did i/o to the fsinfo block using size
fsi_size(pmp) = (1024 << ((pmp)->pm_BlkPerSec >> 2)). This
expression makes no sense. It happens to work for sector small
sector sizes, but for sector size 32K it gives the preposterous
value of 64M and thus causes panics. A sector size of 32768 is
necessary for at least some DVD-RW's (where the minimum write size
is 32768 although the minimum read size is 2048).

Now that the size of the fsinfo block is 512, it always fits in
one sector so there is no need for a macro to express it. Just
use the sector size where the old code uses 1024.

Approved by: re (kensmith)
Approved by: nyan (several years ago for a different version of (2))


171343 10-Jul-2007 bde

Don't use almost perfectly pessimal cluster allocation. Allocation
of the the first cluster in a file (and, if the allocation cannot be
continued contiguously, for subsequent clusters in a file) was randomized
in an attempt to leave space for contiguous allocation of subsequent
clusters in each file when there are multiple writers. This reduced
internal fragmentation by a few percent, but it increased external
fragmentation by up to a few thousand percent.

Use simple sequential allocation instead. Actually maintain the fsinfo
sequence index for this. The read and write of this index from/to
disk still have many non-critical bugs, but we now write an index that
has something to do with our allocations instead of being modified
garbage. If there is no fsinfo on the disk, then we maintain the index
internally and don't go near the bugs for writing it.

Allocating the first free cluster gives a layout that is almost as good
(better in some cases), but takes too much CPU if the FAT is large and
the first free cluster is not near the beginning.

The effect of this change for untar and tar of a slightly reduced copy
of /usr/src on a new file system was:

Before (msdosfs 4K-clusters):
untar: 459.57 real untar from cached file (actually a pipe)
tar: 342.50 real tar from uncached tree to /dev/zero
Before (ffs2 soft updates 4K-blocks 4K-frags)
untar: 39.18 real
tar: 29.94 real
Before (ffs2 soft updates 16K-blocks 2K-frags)
untar: 31.35 real
tar: 18.30 real

After (msdosfs 4K-clusters):
untar 54.83 real
tar 16.18 real

All of these times can be improved further.

With multiple concurrent writers or readers (especially readers), the
improvement is smaller, but I couldn't find any case where it is
negative. 342 seconds for tarring up about 342 MB on a ~47MB/S partition
is just hard to unimprove on. (This operation would take about 7.3
seconds with reasonably localized allocation and perfect read-ahead.)
However, for active file systems, 342 seconds is closer to normal than
the 16+ seconds above or the 11 seconds with other changes (best I've
measured -- won easily by msdosfs!). E.g., my active /usr/src on ffs1
is quite old and fragmented, so reading to prepare for the above
benchmark takes about 6 times longer than reading back the fresh copies
of it.

Approved by: re (kensmith)


170587 12-Jun-2007 rwatson

Eliminate now-unused SUSER_ALLOWJAIL arguments to priv_check_cred(); in
some cases, move to priv_check() if it was an operation on a thread and
no other flags were present.

Eliminate caller-side jail exception checking (also now-unused); jail
privilege exception code now goes solely in kern_jail.c.

We can't yet eliminate suser() due to some cases in the KAME code where
a privilege check is performed and then used in many different deferred
paths. Do, however, move those prototypes to priv.h.

Reviewed by: csjp
Obtained from: TrustedBSD Project


170188 01-Jun-2007 trhodes

Revert previous, part of NFS that I didn't know about.


170184 01-Jun-2007 trhodes

Garbage collect msdosfs_fhtovp; it appears unused and I have been using
MSDOSFS without this function and problems for the last month.


167497 13-Mar-2007 tegge

Make insmntque() externally visibile and allow it to fail (e.g. during
late stages of unmount). On failure, the vnode is recycled.

Add insmntque1(), to allow for file system specific cleanup when
recycling vnode on failure.

Change getnewvnode() to no longer call insmntque(). Previously,
embryonic vnodes were put onto the list of vnode belonging to a file
system, which is unsafe for a file system marked MPSAFE.

Change vfs_hash_insert() to no longer lock the vnode. The caller now
has that responsibility.

Change most file systems to lock the vnode and call insmntque() or
insmntque1() after a new vnode has been sufficiently setup. Handle
failed insmntque*() calls by propagating errors to callers, possibly
after some file system specific cleanup.

Approved by: re (kensmith)
Reviewed by: kib
In collaboration with: kib


166774 15-Feb-2007 pjd

Move vnode-to-file-handle translation from vfs_vptofh to vop_vptofh method.
This way we may support multiple structures in v_data vnode field within
one file system without using black magic.

Vnode-to-file-handle should be VOP in the first place, but was made VFS
operation to keep interface as compatible as possible with SUN's VFS.
BTW. Now Solaris also implements vnode-to-file-handle as VOP operation.

VFS_VPTOFH() was left for API backward compatibility, but is marked for
removal before 8.0-RELEASE.

Approved by: mckusick
Discussed with: many (on IRC)
Tested with: ufs, msdosfs, cd9660, nullfs and zfs


166559 08-Feb-2007 rodrigc

Add noatime to the list of mount options that msdosfs accepts.

PR: 108896
Submitted by: Eugene Grosbein <eugen grosbein pp ru>


166558 08-Feb-2007 rodrigc

Style fixes: use ANSI C function declarations.


166524 06-Feb-2007 rodrigc

Eliminate some dead code which was introduced in 1.23, yet was always
commented out.


166343 30-Jan-2007 avatar

Fixing compilation bustage by removing references to opt_msdosfs.h.

This auto-generated header file no longer exists since the removal of
MSDOSFS_LARGE in sys/conf/options:1.574.


166341 30-Jan-2007 trhodes

Fix spacing from my previous commit to this file:

Noticed by: fjoe


166340 30-Jan-2007 rodrigc

Add a "-o large" mount option for msdosfs. Convert compile-time checks for
#ifdef MSDOSFS_LARGE to run-time checks to see if "-o large" was specified.

Test case provided by Oliver Fromme:
truncate -s 200G test.img
mdconfig -a -t vnode -f test.img -u 9
newfs_msdos -s 419430400 -n 1 /dev/md9 zip250
mount -t msdosfs /dev/md9 /mnt # should fail
mount -t msdosfs -o large /dev/md9 /mnt # should succeed

PR: 105964
Requested by: Oliver Fromme <olli lurza secnetix de>
Tested by: trhodes
MFC after: 2 weeks


166062 16-Jan-2007 trhodes

Add a 3rd entry in the cache, which keeps the end position
from just before extending a file. This has the desired effect
of keeping the write speed constant. And yes, that helps a lot
copying large files always at full speed now, and I have seen
improvements using benchmarks/bonnie.

Stolen from: NetBSD
Reviewed by: bde


165836 06-Jan-2007 rodrigc

When performing a mount update to change a mount from read-only to read-write,
do not call markvoldirty() until the mount has been flagged as read-write.
Due to the nature of the msdosfs code, this bug only seemed to appear for
FAT-16 and FAT-32.

This fixes the testcase:
#!/bin/sh
dd if=/dev/zero bs=1m count=1 oseek=119 of=image.msdos
mdconfig -a -t vnode -f image.msdos
newfs_msdos -F 16 /dev/md0 fd120m
mount_msdosfs -o ro /dev/md0 /mnt
mount | grep md0
mount -u -o rw /dev/md0; echo $?
mount | grep md0
umount /mnt
mdconfig -d -u 0

PR: 105412
Tested by: Eugene Grosbein <eugen grosbein pp ru>


165792 05-Jan-2007 rodrigc

Eliminate obsolete comment, now that getushort() is implemented in
terms of functions in <sys/endian.h>.


165431 21-Dec-2006 marcel

Unbreak 64-bit little-endian systems that do require alignment.
The fix involves using le16dec(), le32dec(), le16enc() and
le32enc(). This eliminates invalid casts and duplicated logic.


165342 19-Dec-2006 rodrigc

For big-endian version of getulong() macro, cast result to u_int32_t.
This macro was written expecting a 32-bit unsigned long, and
doesn't work properly on 64-bit systems. This bug caused vn_stat()
to return incorrect values for files larger than 2gb on msdosfs filesystems
on 64-bit systems.

PR: 106703
Submitted by: Axel Gonzalez <loox e-shell net>
MFC after: 3 days


165341 19-Dec-2006 rodrigc

Fix get_ulong() macro on AMD64 (or any little-endian 64-bit platform).
This bug caused vn_stat() to fail on files larger than 2gb on msdosfs
filesystems on AMD64.

PR: 106703
Tested by: Axel Gonzalez <loox e-shell net>
MFC after: 3 days


165022 09-Dec-2006 rodrigc

Minor cleanup. If we are doing a mount update, and we pass in
an "export" flag indicating that we are trying to NFS export the
filesystem, and the MSDOSFS_LARGEFS flag is set on the filesystem,
then deny the mount update and export request. Otherwise,
let the full mount update proceed normally.
MSDOSFS_LARGES and NFS don't mix because of the way inodes are calculated
for MSDOSFS_LARGEFS.

MFC after: 3 days


164855 03-Dec-2006 maxim

o Do not leave uninitialized birthtime: in MSDOSFSMNT_LONGNAME
set birthtime to FAT CTime (creation time) and in the other cases
set birthtime to -1.

o Set ctime to mtime instead of FAT CTime which has completely
different meaning.

PR: kern/106018
Submitted by: Oliver Fromme
MFC after: 1 month


164627 26-Nov-2006 maxim

o From the submitter: dos2unixchr will convert to lower case if
LCASE_BASE or LCASE_EXT or both are set. But dos2unixfn uses
dos2unixchr separately for the basename and the extension. So if
either LCASE_BASE or LCASE_EXT is set, dos2unixfn will convert both
the basename and extension to lowercase because it is blindly
passing in the state of both flags to dos2unixchr. The bit masks I
used ensure that only the state of LCASE_BASE gets passed to
dos2unixchr when the basename is converted, and only the state of
LCASE_EXT is passed in when the extension is converted.

PR: kern/86655
Submitted by: Micah Lieske
MFC after: 3 weeks


164033 06-Nov-2006 rwatson

Sweep kernel replacing suser(9) calls with priv(9) calls, assigning
specific privilege names to a broad range of privileges. These may
require some future tweaking.

Sponsored by: nCircle Network Security, Inc.
Obtained from: TrustedBSD Project
Discussed on: arch@
Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri,
Alex Lyashkov <umka at sevcity dot net>,
Skip Ford <skip dot ford at verizon dot net>,
Antoine Brodin <antoine dot brodin at laposte dot net>


163647 24-Oct-2006 phk

Replace slightly crummy fattime<->timespec conversion functions.


162970 02-Oct-2006 phk

Use utc_offset() where applicable, and hide the internals of it
as static variables.


162954 02-Oct-2006 phk

First part of a little cleanup in the calendar/timezone/RTC handling.

Move relevant variables to <sys/clock.h> and fix #includes as necessary.

Use libkern's much more time- & spamce-efficient BCD routines.


162647 26-Sep-2006 tegge

Use mount interlock to protect all changes to mnt_flag and mnt_kern_flag.
This eliminates a race where MNT_UPDATE flag could be lost when nmount()
raced against sync(), sync_fsync() or quotactl().


161425 17-Aug-2006 imp

while (0); -> while (0) in multi-line macros


160939 03-Aug-2006 delphij

When the volume is being downgraded from a read-write mode, mark
it as clean.

PR: kern/85366
Submitted by: Dan Lukes <dan at obluda dot cz>
MFC After: 2 weeks


159128 01-Jun-2006 rodrigc

mount_msdosfs.c:
- remove call to getmntopts(), and just pass -o options to
nmount(). This removes some confusion as to what options
msdosfs can parse, by pushing the responsibility of option parsing
to the VFS and FS specific code in the kernel.

msdosfs_vfsops.c:
- add "force" and "sync" to msdosfs_opts. They used to be specified
in mount_msdosfs.c, so move them here. It's not clear whethere these
options should be placed into global_opts in vfs_mount.c or not.

Motivated by: marcus


158924 26-May-2006 rodrigc

Remove calls to vfs_export() for exporting a filesystem for NFS mounting
from individual filesystems. Call it instead in vfs_mount.c,
after we call VFS_MOUNT() for a specific filesystem.


155160 01-Feb-2006 jeff

- Reorder calls to vrele() after calls to vput() when the vrele is a
directory. vrele() may lock the passed vnode, which in these cases would
give an invalid lock order of child -> parent. These situations are
deadlock prone although do not typically deadlock because the vrele
is typically not releasing the last reference to the vnode. Users of
vrele must consider it as a call to vn_lock() and order it appropriately.

MFC After: 1 week
Sponsored by: Isilon Systems, Inc.
Tested by: kkenn


154730 23-Jan-2006 trhodes

Update incorrect comments here, there should not be a call to panic()
over fs corruption.

Discussed with: alfred, phk


154692 22-Jan-2006 fjoe

Do not assume that `char direntry::deExtension[3]' starts right after
`char direntry::deName[8]' and access deExtension[] explicitly.

Found by: Coverity Prevent(tm)
CID: 350, 351, 352


154487 17-Jan-2006 alfred

I ran into an nfs client panic a couple of times in a row over the
last few days. I tracked it down to the fact that nfs_reclaim()
is setting vp->v_data to NULL _before_ calling vnode_destroy_object().
After silence from the mailing list I checked further and discovered
that ufs_reclaim() is unique among FreeBSD filesystems for calling
vnode_destroy_object() early, long before tossing v_data or much
of anything else, for that matter. The rest, including NFS, appear
to be identical, as if they were just clones of one original routine.

The enclosed patch fixes all file systems in essentially the same
way, by moving the call to vnode_destroy_object() to early in the
routine (before the call to vfs_hash_remove(), if any). I have
only tested NFS, but I've now run for over eighteen hours with the
patch where I wouldn't get past four or five without it.

Submitted by: Frank Mayhar
Requested by: Mohan Srinivasan
MFC After: 1 week


152610 19-Nov-2005 rodrigc

Properly parse the nowin95 mount option.

Tested by: Rainer Hurling <rhurlin at gwdg dot de>


152595 18-Nov-2005 rodrigc

Add "shortnames" and "longnames" mount options which are
synonyms for "shortname" and "longname" mount options. The old
(before nmount()) mount_msdosfs program accepted "shortnames" and "longnames",
but the kernel nmount() checked for "shortname" and "longname".
So, make the kernel accept "shortnames", "longnames", "shortname", "longname"
for forwards and backwarsd compatibility.

Discovered by: Rainer Hurling <rhurlin at gwdg dot de>


151897 31-Oct-2005 rwatson

Normalize a significant number of kernel malloc type names:

- Prefer '_' to ' ', as it results in more easily parsed results in
memory monitoring tools such as vmstat.

- Remove punctuation that is incompatible with using memory type names
as file names, such as '/' characters.

- Disambiguate some collisions by adding subsystem prefixes to some
memory types.

- Generally prefer lower case to upper case.

- If the same type is defined in multiple architecture directories,
attempt to use the same name in additional cases.

Not all instances were caught in this change, so more work is required to
finish this conversion. Similar changes are required for UMA zone names.


150711 29-Sep-2005 peadar

Remove checks for BOOTSIG[23] from FAT32 bootblocks.

There seems to be very little documentary evidence outside this
implementation to suggest a these checks are neccessary, and more
than one camera-formatted flash disk fails the check, but mounts
successfully on most other systems.

Reviewed By: bde@


149850 07-Sep-2005 obrien

Ensure the full value is written into inode variables.

PR: 85503
Submitted by: Dmitry Pryanishnikov <dmitry@atlantis.dp.ua>


149720 02-Sep-2005 ssouhlal

*_mountfs() (if the filesystem mounts from a device) needs devvp to be
locked, so lock it.

Glanced at by: phk
MFC after: 3 days


148089 17-Jul-2005 imura

[1] unix2doschr()
If a character cannot be converted to DOS code page,
unix2doschr() returned `0'. As a result, unix2dosfn()
was forced to return `0', so we saw a file which was
composed of these characters as `Invalid argument'.
To correct this, if a character can be converted to
Unicode, unix2doschr() now returns `1' which is a magic
number to make unix2dosfn() know that the character
must be converted to `_'.

[2] unix2dosfn()
The above-mentioned solution only works if a file
has both of Unicode name and DOS code page name.
Unicode name would not be recorded if file name
can be settled within 11 bytes (DOS short name)
and if no conversion from Unix charset to DOS code
page has occurred. Thus, FreeBSD can create a file
which has only short name, but there is no guarantee
that the short name contains allways valid characters
because we leave it to people by using mount_msdosfs(8)
to select which conversion is used between DOS code
page and unix charset.
To avoid this, Unicode file name should be recorded
unless a character is an ascii character. This is
the way Windows XP do.

PR: 77074 [1]
MFC after: 1 week


145174 16-Apr-2005 das

Disable negative name caching for msdosfs to work around a bug.
Since the name cache is case-sensitive and msdosfs isn't,
creating a file 'foo' won't invalidate a negative entry for 'FOO'.
There are similar problems related to 8.3 filenames.

A better solution is to override VOP_LOOKUP with a method that
canonicalizes the name, then calls vfs_cache_lookup(). Unfortunately,
it's not quite that simple because vfs_cache_lookup() will call
msdosfs_lookup() on a cache miss, and msdosfs_lookup() needs a way to
get at the original component name.


145131 16-Apr-2005 njl

Fix mbnambuf support for multi-byte characters. If a substring is larger
than WIN_CHARS bytes, we shift the suffix (previous substrings) upwards
by the amount this substring exceeds its WIN_CHARS slot. Profiling shows
this change is indistinguishable from the previous code at 95% confidence.
This bug would result in attempts to access or create files or directories
with multi-byte characters returning an error but no data loss.

Reported and tested by: avatar
MFC after: 3 days


145006 13-Apr-2005 jeff

- Change all filesystems and vfs_cache to relock the dvp once the child is
locked in the ISDOTDOT case. Se vfs_lookup.c r1.79 for details.

Sponsored by: Isilon Systems, Inc.


144740 07-Apr-2005 phk

Give msdosfs a unique inode number which is really the byteoffset of
the directory entry.

This solves the corruption problem I belive.

Regression test script by: silby


144298 29-Mar-2005 jeff

- Remove wantparent, it is no longer necessary. An assert in vfs_lookup.c
prevents any callers from doing a DELETE or RENAME without locking
the parent.


144208 28-Mar-2005 jeff

- We no longer have to bother with PDIRUNLOCK, lookup() handles it for us.

Sponsored by: Isilon Systems, Inc.


144103 25-Mar-2005 jeff

- Pass LK_EXCLUSIVE as the lock type to vget in vfs_hash_insert().


144058 24-Mar-2005 jeff

- Update vfs_root implementations to match the new prototype. None of
these filesystems will support shared locks until they are explicitly
modified to do so. Careful review must be done to ensure that this
is safe for each individual filesystem.

Sponsored by: Isilon Systems, Inc.


143692 16-Mar-2005 phk

Add two arguments to the vfs_hash() KPI so that filesystems which do
not have unique hashes (NFS) can also use it.


143667 15-Mar-2005 phk

Eliminate cdev pointer in inodes, they're not used or needed.

The cdev could have been pulled out of the mountpoint cheaper back
when it was used anyway.


143663 15-Mar-2005 phk

Improve the vfs_hash() API: vput() the unneeded vnode centrally to
avoid replicating the vput in all the filesystems.


143619 15-Mar-2005 phk

Simplify the vfs_hash calling convention.


143570 14-Mar-2005 phk

Use vfs_hash instead of home-rolling.


143513 13-Mar-2005 jeff

- The VI_DOOMED flag now signals the end of a vnode's relationship with
the filesystem. Check that rather than VI_XLOCK.
- VOP_INACTIVE should no longer drop the vnode lock.
- The vnode lock is required around calls to vrecycle() and vgone().

Sponsored by: Isilon Systems, Inc.


143446 12-Mar-2005 obrien

Used unsigned version.

Submitted by: jmallett


143444 12-Mar-2005 obrien

Fix kernel build on 64-bit machines.


143436 11-Mar-2005 njl

Correct a last-minute thinko. Instead of copying the nul with the string,
nul-terminate the dp->d_name directly and only copy the string.


143435 11-Mar-2005 njl

The mbnambuf routines combine multiple substrings into a single
long filename. Each substring is indexed by the windows ID, a
sequential one-based value. The previous code was extremely slow,
doing a malloc/strcpy/free for each substring.

This code optimizes these routines with this in mind, using the ID
to index into a single array and concatenating each WIN_CHARS chunk
at once. (The last chunk is variable-length.)

This code has been tested as working on an FS with difficult filename
sizes (255, 13, 26, etc.) It gives a 77.1% decrease in profiled
time (total across all functions) and a 73.7% decrease in wall time.
Test was "ls -laR > /dev/null".

Per-function time savings:
mbnambuf_init: -90.7%
mbnambuf_write: -18.7%
mbnambuf_flush: -67.1%

MFC after: 1 month


142692 27-Feb-2005 phk

Remove debug printout of major/minor numbers, print name instead.


142235 22-Feb-2005 phk

Use vn_printf() instead of home-rolling.


141497 08-Feb-2005 njl

Unroll the loop for calculating the 8.3 filename checksum. In testing
on my P3, microbenchmarks show the unrolled version is 78x faster. In
actual use (recursive ls), this gives an average of 9% improvement in
system time and 2% improvement in wall time.


140965 29-Jan-2005 peadar

Unbreak a few filesystems for which vnode_create_vobject() wasn't being
called in "open", causing mmap() to fail.

Where possible, pass size of file to vnode_create_vobject() rather
than having it find it out the hard way via VOP_LOOKUP

Reviewed by: phk


140939 28-Jan-2005 phk

Make filesystems get rid of their own vnodes vnode_pager object in
VOP_RECLAIM().


140936 28-Jan-2005 phk

Remove unused argument to vrecycle()


140822 25-Jan-2005 phk

Introduce and use g_vfs_close().


140768 24-Jan-2005 phk

Create a vp->v_object in VFS_FHTOVP() if we want to be exportable
with NFS.

We are moving responsibility for creating the vnode_pager object into
the filesystems which own the vnode, and this is one of the places
we have to cover.

We call vnode_create_vobject() directly because we own the vnode.

If we can get the size easily, pass it as an argument to save the
call to VOP_GETATTR() in vnode_create_vobject()


140196 13-Jan-2005 phk

Whitespace in vop_vector{} initializations.


140051 11-Jan-2005 phk

Wrap the bufobj operations in macros: BO_STRATEGY() and BO_WRITE()


140048 11-Jan-2005 phk

Remove the unused credential argument from VOP_FSYNC() and VFS_SYNC().

I'm not sure why a credential was added to these in the first place, it is
not used anywhere and it doesn't make much sense:

The credentials for syncing a file (ability to write to the
file) should be checked at the system call level.

Credentials for syncing one or more filesystems ("none")
should be checked at the system call level as well.

If the filesystem implementation needs a particular credential
to carry out the syncing it would logically have to the
cached mount credential, or a credential cached along with
any delayed write data.

Discussed with: rwatson


139776 06-Jan-2005 imp

/* -> /*- for copyright notices, minor format tweaks as necessary


138689 11-Dec-2004 phk

Handle MNT_UPDATE export requests first and return so we do not
interpret the rest of the msdosfs_args structure.

Detected by: marcel


138471 06-Dec-2004 phk

Convert msdosfs to nmount.

Add a vfs_cmount() function which converts omount argument stucture
to nmount arguments.

Convert vfs_omount() to vfs_mount() and parse nmount arguments.

This is 100% compatible with existing userland.

Later on, but before userland gets converted to nmount we may want
to revisit the names of the mountoptions, for instance it may make
sense to use consistent options for charset conversion etc.


138412 05-Dec-2004 phk

VFS_STATFS(mp, ...) is mostly called with &mp->mnt_stat, but a few cases
doesn't. Most of the implementations have grown weeds for this so they
copy some fields from mnt_stat if the passed argument isn't that.

Fix this the cleaner way: Always call the implementation on mnt_stat
and copy that in toto to the VFS_STATFS argument if different.


138309 02-Dec-2004 phk

Remove the de_devvp and stop VREF'ing it for every vnode we create.


138290 01-Dec-2004 phk

Back when VOP_* was introduced, we did not have new-style struct
initializations but we did have lofty goals and big ideals.

Adjust to more contemporary circumstances and gain type checking.

Replace the entire vop_t frobbing thing with properly typed
structures. The only casualty is that we can not add a new
VOP_ method with a loadable module. History has not given
us reason to belive this would ever be feasible in the the
first place.

Eliminate in toto VOCALL(), vop_t, VNODEOP_SET() etc.

Give coda correct prototypes and function definitions for
all vop_()s.

Generate a bit more data from the vnode_if.src file: a
struct vop_vector and protype typedefs for all vop methods.

Add a new vop_bypass() and make vop_default be a pointer
to another struct vop_vector.

Remove a lot of vfs_init since vop_vector is ready to use
from the compiler.

Cast various vop_mumble() to void * with uppercase name,
for instance VOP_PANIC, VOP_NULL etc.

Implement VCALL() by making vdesc_offset the offsetof() the
relevant function pointer in vop_vector. This is disgusting
but since the code is generated by a script comparatively
safe. The alternative for nullfs etc. would be much worse.

Fix up all vnode method vectors to remove casts so they
become typesafe. (The bulk of this is generated by scripts)


138270 01-Dec-2004 phk

Mechanically change prototypes for vnode operations to use the new typedefs.


137726 15-Nov-2004 phk

Make VOP_BMAP return a struct bufobj for the underlying storage device
instead of a vnode for it.

The vnode_pager does not and should not have any interest in what
the filesystem uses for backend.

(vfs_cluster doesn't use the backing store argument.)


137478 09-Nov-2004 phk

Refuse attemps to mount root filesystem


137036 29-Oct-2004 phk

Move MSDOSFS to GEOM backing instead of DEVFS.

For details, please see src/sys/ufs/ffs/ffs_vfsops.c 1.250.


137008 28-Oct-2004 phk

Reduce the locking activity by epsilon by checking VNON condition before
releasing the mountlock.


136991 27-Oct-2004 phk

Eliminate unnecessary KASSERTs.

Don't use bp->b_vp in VOP_STRATEGY: the vnode is passed in as an argument.


136943 25-Oct-2004 phk

Loose the v_dirty* and v_clean* alias macros.

Check the count field where we just want to know the full/empty state,
rather than using TAILQ_EMPTY() or TAILQ_FIRST().


134945 08-Sep-2004 tjr

Reduce the size of struct defid's defid_dirclust, defid_dirofs and
(disabled) defid_gen members from u_long to u_int32_t so that alignment
requirements don't cause the structure to become larger than struct fid
on LP64 platforms. This fixes NFS exports of msdos filesystems on at
least amd64.

PR: 71173


134942 08-Sep-2004 tjr

Merge from NetBSD:
Fix a problem in previous: we can't blindly assume that we have
wincnt entries available at the offset the file has been found. If the dos
directory entry is not preceded by appropriate number of long name
entries (happens e.g. when the filesystem is corrupted, or when
the filename complies to DOS rules and doesn't use any long name entry),
we would overwrite random directory entries.

There are still some problems, the whole thing has to be revisited and solved
right.

Submitted by: Xin LI


134941 08-Sep-2004 tjr

Merge from NetBSD:
Fix a panic that occurred when trying to traverse a corrupt msdosfs
filesystem. With this particular corruption, the code in pcbmap()
would compute an offset into an array that was way out of bounds,
so check the bounds before trying to access and return an error if
the offset would be out of bounds.

Submitted by: Xin LI


134899 07-Sep-2004 phk

Create simple function init_va_filerev() for initializing a va_filerev
field.

Replace three instances of longhaired initialization va_filerev fields.

Added XXX comment wondering why we don't use random bits instead of
uptime of the system for this purpose.


134374 27-Aug-2004 tjr

Remove bogus vrele() call added in previous.


134345 26-Aug-2004 tjr

Improve the robustness of MSDOSFSMNT_KICONV handling:
- Use copyinstr() to read cs_win, cs_dos, cs_local strings from the
mount argument structure instead of reading through user-space pointers(!).
- When mounting a filesystem, or updating an existing mount, only try to
update the iconv handles from the information in the mount argument
structure if the structure itself has the MSDOSFSMNT_KICONV flag set.
- Attempt to handle failure of update_mp() in the MNT_UPDATE case.


133287 07-Aug-2004 phk

Push all changes to disk before downgrading a mount from rw to ro.


132902 30-Jul-2004 phk

Put a version element in the VFS filesystem configuration structure
and refuse initializing filesystems with a wrong version. This will
aid maintenance activites on the 5-stable branch.

s/vfs_mount/vfs_omount/

s/vfs_nmount/vfs_mount/

Name our filesystems mount function consistently.

Eliminate the namiedata argument to both vfs_mount and vfs_omount.
It was originally there to save stack space. A few places abused
it to get hold of some credentials to pass around. Effectively
it is unused.

Reorganize the root filesystem selection code.


132805 28-Jul-2004 phk

Remove global variable rootdevs and rootvp, they are unused as such.

Add local rootvp variables as needed.

Remove checks for miniroot's in the swappartition. We never did that
and most of the filesystems could never be used for that, but it had
still been copy&pasted all over the place.


132653 26-Jul-2004 cperciva

Rename suser_cred()'s PRISON_ROOT flag to SUSER_ALLOWJAIL. This is
somewhat clearer, but more importantly allows for a consistent naming
scheme for suser_cred flags.

The old name is still defined, but will be removed in a few days (unless I
hear any complaints...)

Discussed with: rwatson, scottl
Requested by: jhb


132023 12-Jul-2004 alfred

Make VFS_ROOT() and vflush() take a thread argument.
This is to allow filesystems to decide based on the passed thread
which vnode to return.
Several filesystems used curthread, they now use the passed thread.


131551 04-Jul-2004 phk

When we traverse the vnodes on a mountpoint we need to look out for
our cached 'next vnode' being removed from this mountpoint. If we
find that it was recycled, we restart our traversal from the start
of the list.

Code to do that is in all local disk filesystems (and a few other
places) and looks roughly like this:

MNT_ILOCK(mp);
loop:
for (vp = TAILQ_FIRST(&mp...);
(vp = nvp) != NULL;
nvp = TAILQ_NEXT(vp,...)) {
if (vp->v_mount != mp)
goto loop;
MNT_IUNLOCK(mp);
...
MNT_ILOCK(mp);
}
MNT_IUNLOCK(mp);

The code which takes vnodes off a mountpoint looks like this:

MNT_ILOCK(vp->v_mount);
...
TAILQ_REMOVE(&vp->v_mount->mnt_nvnodelist, vp, v_nmntvnodes);
...
MNT_IUNLOCK(vp->v_mount);
...
vp->v_mount = something;

(Take a moment and try to spot the locking error before you read on.)

On a SMP system, one CPU could have removed nvp from our mountlist
but not yet gotten to assign a new value to vp->v_mount while another
CPU simultaneously get to the top of the traversal loop where it
finds that (vp->v_mount != mp) is not true despite the fact that
the vnode has indeed been removed from our mountpoint.

Fix:

Introduce the macro MNT_VNODE_FOREACH() to traverse the list of
vnodes on a mountpoint while taking into account that vnodes may
be removed from the list as we go. This saves approx 65 lines of
duplicated code.

Split the insmntque() which potentially moves a vnode from one mount
point to another into delmntque() and insmntque() which does just
what the names say.

Fix delmntque() to set vp->v_mount to NULL while holding the
mountpoint lock.


131523 03-Jul-2004 tjr

By popular request, add a workaround that allows large (>128GB or so)
FAT32 filesystems to be mounted, subject to some fairly serious limitations.

This works by extending the internal pseudo-inode-numbers generated from
the file's starting cluster number to 64-bits, then creating a table
mapping these into arbitrary 32-bit inode numbers, which can fit in
struct dirent's d_fileno and struct vattr's va_fileid fields. The mappings
do not persist across unmounts or reboots, so it's not possible to export
these filesystems through NFS. The mapping table may grow to be rather
large, and may grow large enough to exhaust kernel memory on filesystems
with millions of files.

Don't enable this option unless you understand the consequences.


130585 16-Jun-2004 phk

Do the dreaded s/dev_t/struct cdev */
Bump __FreeBSD_version accordingly.


126998 14-Mar-2004 rwatson

Don't reject FAT file systems with a number of "Heads" greater than
255; USB keychains exist that use 256 as the number of heads. This
check has also been removed in Darwin (along with most of the other
head/sector sanity checks).


126086 21-Feb-2004 bde

Fixed a serious off by 1 error. The cluster-in-use bitmap was overrun
by 1 u_int if the number of clusters was 1 more than a multiple of
(8 * sizeof(u_int)). The bitmap is malloced and large (often huge), so
fatal overrun probably only occurred if the number of clusters was 1
more than 1 multiple of PAGE_SIZE/8.


125992 19-Feb-2004 tjr

Use size_t or ssize_t wherever appropriate instead of casting from int *
to size_t *, which is incorrect because they may have different widths.
This caused some subtle forms of corruption, the mostly frequently
reported one being that the last character of a filename was sometimes
duplicated on amd64.


125942 17-Feb-2004 trhodes

Do not place dirmask in unnamed padding. Move it to the bottom of this
list where it should have been added originally.

Prodded by: bde


125934 17-Feb-2004 tjr

If the "next free cluster" field of the FSInfo block is 0xFFFFFFFF,
it means that the correct value is unknown. Since this value is just
a hint to improve performance, initially assume that the first non-reserved
cluster is free, then correct this assumption if necessary before writing
the FSInfo block back to disk.

PR: 62826
MFC after: 2 weeks


125796 14-Feb-2004 bde

Fixed some style bugs:
- don't unlock the vnode after vinvalbuf() only to have to relock it
almost immediately.
- don't refer to devices classified by vn_isdisk() as block devices.


125739 12-Feb-2004 bde

MFffs (ffs_vfsops.c 1.227: clean up open mode bandaid). This reduces
gratuitous differences with ffs a little.


125454 04-Feb-2004 jhb

Locking for the per-process resource limits structure.
- struct plimit includes a mutex to protect a reference count. The plimit
structure is treated similarly to struct ucred in that is is always copy
on write, so having a reference to a structure is sufficient to read from
it without needing a further lock.
- The proc lock protects the p_limit pointer and must be held while reading
limits from a process to keep the limit structure from changing out from
under you while reading from it.
- Various global limits that are ints are not protected by a lock since
int writes are atomic on all the archs we support and thus a lock
wouldn't buy us anything.
- All accesses to individual resource limits from a process are abstracted
behind a simple lim_rlimit(), lim_max(), and lim_cur() API that return
either an rlimit, or the current or max individual limit of the specified
resource from a process.
- dosetrlimit() was renamed to kern_setrlimit() to match existing style of
other similar syscall helper functions.
- The alpha OSF/1 compat layer no longer calls getrlimit() and setrlimit()
(it didn't used the stackgap when it should have) but uses lim_rlimit()
and kern_setrlimit() instead.
- The svr4 compat no longer uses the stackgap for resource limits calls,
but uses lim_rlimit() and kern_setrlimit() instead.
- The ibcs2 compat no longer uses the stackgap for resource limits. It
also no longer uses the stackgap for accessing sysctl's for the
ibcs2_sysconf() syscall but uses kernel_sysctl() instead. As a result,
ibcs2_sysconf() no longer needs Giant.
- The p_rlimit macro no longer exists.

Submitted by: mtm (mostly, I only did a few cleanups and catchups)
Tested on: i386
Compiled on: alpha, amd64


123967 29-Dec-2003 bde

Fixed some (most) style bugs in rev.1.33. Mainly 4-char indentation
(msdosfs uses normal 8-char indentation almost everywhere else),
too-long lines, and minor English usage errors. The verbose formal
comment before the new function is still abnormal.


123964 29-Dec-2003 bde

Fixed some minor style bugs in rev.1.144. All related to msdosfs_advlock()
(mainly unsorting). There were no changes related to the dirty flag
here. The reference NetBSD implementation put msdosfs_advlock() in a
different place. This commit only moves its declarations and changes
some of the function body to be like the NetBSD version.


123963 29-Dec-2003 bde

Fixed style bugs in rev.1.112. The bugs started with obscure magic
numbers in comments (Apple PR numbers?) and didn't improve.


123873 26-Dec-2003 trhodes

Make msdosfs support the dirty flag in FAT16 and FAT32.
Enable lockf support.

PR: 55861
Submitted by: Jun Su <junsu@m-net.arbornet.org> (original version)
Reviewed by: make universe


123293 08-Dec-2003 fjoe

Make msdosfs long filenames matching case insensitive again.

PR: 59765
Submitted by: Ryuichiro Imura <imura@ryu16.org>


122091 05-Nov-2003 kan

Remove mntvnode_mtx and replace it with per-mountpoint mutex.
Introduce two new macros MNT_ILOCK(mp)/MNT_IUNLOCK(mp) to
operate on this mutex transparently.

Eventually new mutex will be protecting more fields in
struct mount, not only vnode list.

Discussed with: jeff


121874 02-Nov-2003 kan

Take care not to call vput if thread used in corresponding vget
wasn't curthread, i.e. when we receive a thread pointer to use
as a function argument. Use VOP_UNLOCK/vrele in these cases.

The only case there td != curthread known at the moment is
boot() calling sync with thread0 pointer.

This fixes the panic on shutdown people have reported.


121847 01-Nov-2003 kan

Temporarily undo parts of the stuct mount locking commit by jeff.
It is unsafe to hold a mutex across vput/vrele calls.

This will be redone when a better locking strategy is agreed upon.

Discussed with: jeff


121205 18-Oct-2003 phk

DuH!

bp->b_iooffset (the spot on the disk), not bp->b_offset (the offset in
the file)


121196 18-Oct-2003 phk

Initialize b_offset before calling VOP_STRATEGY/VOP_SPECSTRATEGY.

Remove various comments of KASSERTS and comments about B_PHYS which
does not apply anymore.


120785 05-Oct-2003 jeff

- Check the XLOCK prior to inspecting v_data.


120775 05-Oct-2003 jeff

- Don't cache_purge() in *_reclaim routines. vclean() does it for us so
this is redundant.


120733 04-Oct-2003 jeff

- Acquire the vnode interlock prior to droping the mntvnode_mtx. This does
not eliminate races where the vnode could be reclaimed and end up with
a NULL v_data pointer but Giant is protecting us from that at the moment.


120498 27-Sep-2003 bde

Fixed some style bugs in previous commit. Mainly, forward-declare
struct msdosfsmount so that this file has the same prerequisites as
it used to. The new prerequistite was a meta-style bug. It required
many style bugs (unsorted includes ...) elsewhere.

Formatted prototypes in KNF. Resisted urge to sort all the prototypes,
to minimise differences with NetBSD. (NetBSD has reformatted the
prototypes but has not sorted them and still uses __P(()).)


120492 26-Sep-2003 fjoe

- Support for multibyte charsets in LIBICONV.
- CD9660_ICONV, NTFS_ICONV and MSDOSFS_ICONV kernel options
(with corresponding modules).
- kiconv(3) for loadable charset conversion tables support.

Submitted by: Ryuichiro Imura <imura@ryu16.org>


118837 12-Aug-2003 trhodes

Add a '-M mask' option so that users can have different
masks for files and directories. This should make some
of the Midnight Commander users happy.

Remove an extra ')' in the manual page.

PR: 35699
Submitted by: Eugene Grosbein <eugen@grosbein.pp.ru> (original version)
Tested by: simon


118047 26-Jul-2003 phk

Add a "int fd" argument to VOP_OPEN() which in the future will
contain the filedescriptor number on opens from userland.

The index is used rather than a "struct file *" since it conveys a bit
more information, which may be useful to in particular fdescfs and /dev/fd/*

For now pass -1 all over the place.


117200 03-Jul-2003 trhodes

If bread() returns a zero-length buffer, as can happen after a
failed write, return an error instead of looping forever.

PR: 37035
Submitted by: das


117018 29-Jun-2003 tjr

XXX Copy workaround from UFS: open device for write access even if
the user requests a read-only mount. This is necessary because we
don't do the VOP_OPEN again if they upgrade a read-only mount to
read-write.

Fixes lockup when creating files on msdosfs mounts that have been
mounted read-only then upgraded to read-write. The exact cause of
the lockup is not known, but it is likely to be the kernel getting
stuck in an infinite loop trying to write dirty buffers to a device
without write permission.

Reported/tested by andreas, discussed with phk.


116917 27-Jun-2003 trhodes

Fix a bug where a truncate operation involving truncate() or ftruncate() on
an MSDOSFS file system either failed, silently corrupted the file, or
sometimes corrupted the neighboring file.

PR: 53695
Submitted by: Ariff Abdullah <skywizard@MyBSD.org.my> (original version)
MFC: 3 days


116796 24-Jun-2003 jmg

change dev_t to struct cdev * to match ufs. This fixes fstat for cd9660
and msdosfs.

Reviewed by: bde


116412 15-Jun-2003 phk

Add the same KASSERT to all VOP_STRATEGY and VOP_SPECSTRATEGY implementations
to check that the buffer points to the correct vnode.


116271 12-Jun-2003 phk

Initialize struct vfsops C99-sparsely.

Submitted by: hmp
Reviewed by: phk


115549 31-May-2003 phk

Remove unused variable(s).

Found by: FlexeLint


113979 24-Apr-2003 jhb

Fail to mount a device if the bytes per sector in the BPB is less than
DEV_BSIZE or if the number of FAT sectors is zero.


113310 10-Apr-2003 imp

It appears that msdosfs_init() is called multiple times. This happens
on my system where I preload msdosfs and have it in my kernel.
There's likely another bug that's causing msdosfs_init() to be called
multiple times, but this makes that harmless.


111856 04-Mar-2003 jeff

- Add a new 'flags' parameter to getblk().
- Define one flag GB_LOCK_NOWAIT that tells getblk() to pass the LK_NOWAIT
flag to the initial BUF_LOCK(). This will eventually be used in cases
were we want to use a buffer only if it is not currently in use.
- Convert all consumers of the getblk() api to use this extra parameter.

Reviwed by: arch
Not objected to by: mckusick


111841 03-Mar-2003 njl

Finish cleanup of vprint() which was begun with changing v_tag to a string.
Remove extraneous uses of vop_null, instead defering to the default op.
Rename vnode type "vfs" to the more descriptive "syncer".
Fix formatting for various filesystems that use vop_print.


111742 02-Mar-2003 des

Clean up whitespace, s/register //, refrain from strong urge to ANSIfy.


111741 02-Mar-2003 des

uiomove-related caddr_t -> void * (just the low-hanging fruit)


111119 19-Feb-2003 imp

Back out M_* changes, per decision of the TRB.

Approved by: trb


110584 09-Feb-2003 jeff

- Cleanup unlocked accesses to buf flags by introducing a new b_vflag member
that is protected by the vnode lock.
- Move B_SCANNED into b_vflags and call it BV_SCANNED.
- Create a vop_stdfsync() modeled after spec's sync.
- Replace spec_fsync, msdos_fsync, and hpfs_fsync with the stdfsync and some
fs specific processing. This gives all of these filesystems proper
behavior wrt MNT_WAIT/NOWAIT and the use of the B_SCANNED flag.
- Annotate the locking in buf.h


110299 03-Feb-2003 phk

Split the global timezone structure into two integer fields to
prevent the compiler from optimizing assignments into byte-copy
operations which might make access to the individual fields non-atomic.

Use the individual fields throughout, and don't bother locking them with
Giant: it is no longer needed.

Inspired by: tjr


109623 21-Jan-2003 alfred

Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0.
Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.


108686 04-Jan-2003 phk

Temporarily introduce a new VOP_SPECSTRATEGY operation while I try
to sort out disk-io from file-io in the vm/buffer/filesystem space.

The intent is to sort VOP_STRATEGY calls into those which operate
on "real" vnodes and those which operate on VCHR vnodes. For
the latter kind, the call will be changed to VOP_SPECSTRATEGY,
possibly conditionally for those places where dual-use happens.

Add a default VOP_SPECSTRATEGY method which will call the normal
VOP_STRATEGY. First time it is called it will print debugging
information. This will only happen if a normal vnode is passed
to VOP_SPECSTRATEGY by mistake.

Add a real VOP_SPECSTRATEGY in specfs, which does what VOP_STRATEGY
does on a VCHR vnode today.

Add a new VOP_STRATEGY method in specfs to catch instances where
the conversion to VOP_SPECSTRATEGY has not yet happened. Handle
the request just like we always did, but first time called print
debugging information.

Apart up to two instances of console messages per boot, this amounts
to a glorified no-op commit.

If you get any of the messages on your console I would very much
like a copy of them mailed to phk@freebsd.org


108648 04-Jan-2003 phk

Since Jeffr made the std* functions the default in rev 1.63 of
kern/vfs_defaults.c it is wrong for the individual filesystems to use
the std* functions as that prevents override of the default.

Found by: src/tools/tools/vop_table


108357 28-Dec-2002 dillon

Abstract-out the constants for the sequential heuristic.

No operational changes.

MFC after: 1 day


106696 09-Nov-2002 alfred

Fix instances of macros with improperly parenthasized arguments.

Verified by: md5


106110 29-Oct-2002 semenu

Fix winChkName() to match when the last slot contains nothing but the
terminating zero (it was treated as length missmatch). The mtools create
such slots if the name len is the product of 13 (max number of unicode
chars fitting in directory slot).

MFC after: 1 week


105655 21-Oct-2002 jhb

Grrr, s/PBP/BPB/ here as well.

Noticed by: peter


105645 21-Oct-2002 jhb

Spell the BPB member of the 7.10 bootsector as bsBPB rather than bsPBP to
be like all the other bootsectors. Apple has done the same it seems.


105077 14-Oct-2002 mckusick

Regularize the vop_stdlock'ing protocol across all the filesystems
that use it. Specifically, vop_stdlock uses the lock pointed to by
vp->v_vnlock. By default, getnewvnode sets up vp->v_vnlock to
reference vp->v_lock. Filesystems that wish to use the default
do not need to allocate a lock at the front of their node structure
(as some still did) or do a lockinit. They can simply start using
vn_lock/VOP_UNLOCK. Filesystems that wish to manage their own locks,
but still use the vop_stdlock functions (such as nullfs) can simply
replace vp->v_vnlock with a pointer to the lock that they wish to
have used for the vnode. Such filesystems are responsible for
setting the vp->v_vnlock back to the default in their vop_reclaim
routine (e.g., vp->v_vnlock = &vp->v_lock).

In theory, this set of changes cleans up the existing filesystem
lock interface and should have no function change to the existing
locking scheme.

Sponsored by: DARPA & NAI Labs.


103936 25-Sep-2002 jeff

- Use vrefcnt() where it is safe to do so instead of doing direct and
unlocked accesses to v_usecount.
- Lock access to the buf lists in the various sync routines. interlock
locking could be avoided almost entirely in leaf filesystems if the
fsync function had a generic helper.


103559 18-Sep-2002 njl

Remove any VOP_PRINT that redundantly prints the tag.
Move lockmgr_printinfo() into vprint() for everyone's benefit.

Suggested by: bde


103314 14-Sep-2002 njl

Remove all use of vnode->v_tag, replacing with appropriate substitutes.
v_tag is now const char * and should only be used for debugging.

Additionally:
1. All users of VT_NTS now check vfsconf->vf_type VFCF_NETWORK
2. The user of VT_PROCFS now checks for the new flag VV_PROCDEP, which
is propagated by pseudofs to all child vnodes if the fs sets PFS_PROCDEP.

Suggested by: phk
Reviewed by: bde, rwatson (earlier version)


102392 25-Aug-2002 bde

Fixed printf format errors and style bugs in rev.1.92. This is the version
that should have been committed in rev.1.93.


102391 25-Aug-2002 bde

Oops, the previous commit wasn't the version that I meant to commit (it
does some extra things which are probably harmless). Back it out.


102385 25-Aug-2002 bde

Fixed printf format errors and style bugs in previous commit.


102295 22-Aug-2002 trhodes

Fix a bug where large msdos partitions were not handled correctly, and fix
a few fsck_msdosfs related 'issues'

PR: 28536, 30168
Submitted by: Jiangyi Liu <jyliu@163.net> && NetBSD
Approved by: rwatson (mentor)


101967 16-Aug-2002 trhodes

When a cluster entry for ``.'' is set to 0, msdosfs fails to handle it
correctly.

PR: 24393
Submitted by: semenu
Approved by: rwatson (mentor)
MFC after: 1 week


101777 13-Aug-2002 phk

Introduce typedefs for the member functions of struct vfsops and employ
these in the main filesystems. This does not change the resulting code
but makes the source a little bit more grepable.

Sponsored by: DARPA and NAI Labs.


101404 05-Aug-2002 pb

Fix typo in vnode flags causing deadlock in msdosfs_fsync().

Reviewed by: jeff


101308 04-Aug-2002 jeff

- Replace v_flag with v_iflag and v_vflag
- v_vflag is protected by the vnode lock and is used when synchronization
with VOP calls is needed.
- v_iflag is protected by interlock and is used for dealing with vnode
management issues. These flags include X/O LOCK, FREE, DOOMED, etc.
- All accesses to v_iflag and v_vflag have either been locked or marked with
mp_fixme's.
- Many ASSERT_VOP_LOCKED calls have been added where the locking was not
clear.
- Many functions in vfs_subr.c were restructured to provide for stronger
locking.

Idea stolen from: BSD/OS


96755 16-May-2002 trhodes

More s/file system/filesystem/g


96572 14-May-2002 phk

Make daddr_t and u_daddr_t 64bits wide.
Retire daddr64_t and use daddr_t instead.

Sponsored by: DARPA & NAI Labs.


93883 05-Apr-2002 bde

Fixed a very old bug in setting timestamps using utimes(2) on msdosfs
files. We didn't clear the update marks when we set the times, so
some of the settings were sometimes clobbered with the current time a
little later. This caused cp -p even by root to almost always fail
to preserve any times despite not reporting any errors in attempting
to preserve them.

Don't forget to set the archive attribute when we set the read-only
attribute. We should only set the archive attribute if we actually
change something, but we mostly don't bother avoiding setting it
elsewhere, so don't bother here yet.

MFC after: 1 week


93818 04-Apr-2002 jhb

Change callers of mtx_init() to pass in an appropriate lock type name. In
most cases NULL is passed, but in some cases such as network driver locks
(which use the MTX_NETWORK_LOCK macro) and UMA zone locks, a name is used.

Tested on: i386, alpha, sparc64


93593 01-Apr-2002 jhb

Change the suser() API to take advantage of td_ucred as well as do a
general cleanup of the API. The entire API now consists of two functions
similar to the pre-KSE API. The suser() function takes a thread pointer
as its only argument. The td_ucred member of this thread must be valid
so the only valid thread pointers are curthread and a few kernel threads
such as thread0. The suser_cred() function takes a pointer to a struct
ucred as its first argument and an integer flag as its second argument.
The flag is currently only used for the PRISON_ROOT flag.

Discussed on: smp@


93012 23-Mar-2002 bde

Fixed some style bugs in the removal of __P(()). Continuation lines
were not outdented to preserve non-KNF lining up of code with parentheses.
Switch to KNF formatting.


92727 19-Mar-2002 alfred

Remove __P.


92363 15-Mar-2002 mckusick

Introduce the new 64-bit size disk block, daddr64_t. Change
the bio and buffer structures to have daddr64_t bio_pblkno,
b_blkno, and b_lblkno fields which allows access to disks
larger than a Terabyte in size. This change also requires
that the VOP_BMAP vnode operation accept and return daddr64_t
blocks. This delta should not affect system operation in
any way. It merely sets up the necessary interfaces to allow
the development of disk drivers that work with these larger
disk block addresses. It also allows for the development of
UFS2 which will use 64-bit block addresses.


91406 27-Feb-2002 jhb

Simple p_ucred -> td_ucred changes to start using the per-thread ucred
reference.


88318 20-Dec-2001 dillon

Fix a BUF_TIMELOCK race against BUF_LOCK and fix a deadlock in vget()
against VM_WAIT in the pageout code. Both fixes involve adjusting
the lockmgr's timeout capability so locks obtained with timeouts do not
interfere with locks obtained without a timeout.

Hopefully MFC: before the 4.5 release


87068 28-Nov-2001 jhb

Fix indentation after removing GEMDOS support. Whitespace changes only.


87067 28-Nov-2001 jhb

Use suser_td() instead of explicitly checking cr_uid against 0.

PR: kern/21809
Submitted by: <mbendiks@eunet.no>
Reviewed by: rwatson


87061 28-Nov-2001 jhb

Axe more unused GEMDOS code that was #ifdef atari.

PR: kern/21809
Submitted by: <mbendiks@eunet.no>


87007 27-Nov-2001 jhb

Remove GEMDOS support from msdosfs. I don't think anyone is going to
port FreeBSD to Atari machines any time soon.


86037 04-Nov-2001 dillon

Add mnt_reservedvnlist so we can MFC to 4.x, in order to make all mount
structure changes now rather then piecemeal later on. mnt_nvnodelist
currently holds all the vnodes under the mount point. This will eventually
be split into a 'dirty' and 'clean' list. This way we only break kld's once
rather then twice. nvnodelist will eventually turn into the dirty list
and should remain compatible with the klds.


85339 23-Oct-2001 dillon

Change the vnode list under the mount point from a LIST to a TAILQ
in preparation for an implementation of limiting code for kern.maxvnodes.

MFC after: 3 days


83366 12-Sep-2001 julian

KSE Milestone 2
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to teh previousl -current except
that there is a thread associated with each process.

Sorry john! (your next MFC will be a doosie!)

Reviewed by: peter@freebsd.org, dillon@freebsd.org

X-MFC after: ha ha ha ha


79996 19-Jul-2001 assar

remove support for creating files and directories from msdosfs_mknod


78907 28-Jun-2001 jhb

Fix a mntvnode and vnode interlock reversal.


77577 01-Jun-2001 ru

- VFS_SET(msdos) -> VFS_SET(msdosfs)
- msdos.ko -> msdosfs.ko
- mount_msdos(8) -> mount_msdosfs(8)
- "msdos" -> "msdosfs" compatibility glue in mount(8)


77162 25-May-2001 ru

- sys/msdosfs moved to sys/fs/msdosfs
- msdos.ko renamed to msdosfs.ko
- /usr/include/msdosfs moved to /usr/include/fs/msdosfs


76688 16-May-2001 iedowse

Change the second argument of vflush() to an integer that specifies
the number of references on the filesystem root vnode to be both
expected and released. Many filesystems hold an extra reference on
the filesystem root vnode, which must be accounted for when
determining if the filesystem is busy and then released if it isn't
busy. The old `skipvp' approach required individual filesystem
xxx_unmount functions to re-implement much of vflush()'s logic to
deal with the root vnode.

All 9 filesystems that hold an extra reference on the root vnode
got the logic wrong in the case of forced unmounts, so `umount -f'
would always fail if there were any extra root vnode references.
Fix this issue centrally in vflush(), now that we can.

This commit also fixes a vnode reference leak in devfs, which could
result in idle devfs filesystems that refuse to unmount.

Reviewed by: phk, bp


76167 01-May-2001 phk

Implement vop_std{get|put}pages() and add them to the default vop[].

Un-copy&paste all the VOP_{GET|PUT}PAGES() functions which do nothing but
the default.


76117 29-Apr-2001 grog

Revert consequences of changes to mount.h, part 2.

Requested by: bde


75934 25-Apr-2001 phk

Move the netexport structure from the fs-specific mountstructure
to struct mount.

This makes the "struct netexport *" paramter to the vfs_export
and vfs_checkexport interface unneeded.

Consequently that all non-stacking filesystems can use
vfs_stdcheckexp().

At the same time, make it a pointer to a struct netexport
in struct mount, so that we can remove the bogus AF_MAX
and #include <net/radix.h> from <sys/mount.h>


75858 23-Apr-2001 grog

Correct #includes to work with fixed sys/mount.h.


73929 07-Mar-2001 jhb

Grab the process lock while calling psignal and before calling psignal.


73286 01-Mar-2001 adrian

Reviewed by: jlemon

An initial tidyup of the mount() syscall and VFS mount code.

This code replaces the earlier work done by jlemon in an attempt to
make linux_mount() work.

* the guts of the mount work has been moved into vfs_mount().

* move `type', `path' and `flags' from being userland variables into being
kernel variables in vfs_mount(). `data' remains a pointer into
userspace.

* Attempt to verify the `type' and `path' strings passed to vfs_mount()
aren't too long.

* rework mount() and linux_mount() to take the userland parameters
(besides data, as mentioned) and pass kernel variables to vfs_mount().
(linux_mount() already did this, I've just tidied it up a little more.)

* remove the copyin*() stuff for `path'. `data' still requires copyin*()
since its a pointer into userland.

* set `mount->mnt_statf_mntonname' in vfs_mount() rather than in each
filesystem. This variable is generally initialised with `path', and
each filesystem can override it if they want to.

* NOTE: f_mntonname is intiailised with "/" in the case of a root mount.


72200 09-Feb-2001 bmilekic

Change and clean the mutex lock interface.

mtx_enter(lock, type) becomes:

mtx_lock(lock) for sleep locks (MTX_DEF-initialized locks)
mtx_lock_spin(lock) for spin locks (MTX_SPIN-initialized)

similarily, for releasing a lock, we now have:

mtx_unlock(lock) for MTX_DEF and mtx_unlock_spin(lock) for MTX_SPIN.
We change the caller interface for the two different types of locks
because the semantics are entirely different for each case, and this
makes it explicitly clear and, at the same time, it rids us of the
extra `type' argument.

The enter->lock and exit->unlock change has been made with the idea
that we're "locking data" and not "entering locked code" in mind.

Further, remove all additional "flags" previously passed to the
lock acquire/release routines with the exception of two:

MTX_QUIET and MTX_NOSWITCH

The functionality of these flags is preserved and they can be passed
to the lock/unlock routines by calling the corresponding wrappers:

mtx_{lock, unlock}_flags(lock, flag(s)) and
mtx_{lock, unlock}_spin_flags(lock, flag(s)) for MTX_DEF and MTX_SPIN
locks, respectively.

Re-inline some lock acq/rel code; in the sleep lock case, we only
inline the _obtain_lock()s in order to ensure that the inlined code
fits into a cache line. In the spin lock case, we inline recursion and
actually only perform a function call if we need to spin. This change
has been made with the idea that we generally tend to avoid spin locks
and that also the spin locks that we do have and are heavily used
(i.e. sched_lock) do recurse, and therefore in an effort to reduce
function call overhead for some architectures (such as alpha), we
inline recursion for this case.

Create a new malloc type for the witness code and retire from using
the M_DEV type. The new type is called M_WITNESS and is only declared
if WITNESS is enabled.

Begin cleaning up some machdep/mutex.h code - specifically updated the
"optimized" inlined code in alpha/mutex.h and wrote MTX_LOCK_SPIN
and MTX_UNLOCK_SPIN asm macros for the i386/mutex.h as we presently
need those.

Finally, caught up to the interface changes in all sys code.

Contributors: jake, jhb, jasone (in no particular order)


72091 06-Feb-2001 asmodai

Fix typo: seperate -> separate.

Seperate does not exist in the english language.


71999 04-Feb-2001 phk

Mechanical change to use <sys/queue.h> macro API instead of
fondling implementation details.

Created with: sed(1)
Reviewed by: md5(1)


71576 24-Jan-2001 jasone

Convert all simplelocks to mutexes and remove the simplelock implementations.


70536 31-Dec-2000 phk

Use macro API to <sys/queue.h>


69781 08-Dec-2000 dwmalone

Convert more malloc+bzero to malloc+M_ZERO.

Submitted by: josh@zipperup.org
Submitted by: Robert Drehmel <robd@gmx.net>


67885 29-Oct-2000 phk

Weaken a bogus dependency on <sys/proc.h> in <sys/buf.h> by #ifdef'ing
the offending inline function (BUF_KERNPROC) on it being #included
already.

I'm not sure BUF_KERNPROC() is even the right thing to do or in the
right place or implemented the right way (inline vs normal function).

Remove consequently unneeded #includes of <sys/proc.h>


67438 22-Oct-2000 bp

Update stale comment.

PR: kern/21805


67437 22-Oct-2000 bp

Remove de_lock field from denode structure and make msdosfs PDIRUNLOCK aware.


66886 09-Oct-2000 eivind

Blow away the v_specmountpoint define, replacing it with what it was
defined as (rdev->si_mountpoint)


66615 04-Oct-2000 jasone

Convert lockmgr locks from using simple locks to using mutexes.

Add lockdestroy() and appropriate invocations, which corresponds to
lockinit() and must be called to clean up after a lockmgr lock is no
longer needed.


65200 29-Aug-2000 rwatson

o Restructure vaccess() so as to check for DAC permission to modify the
object before falling back on privilege. Make vaccess() accept an
additional optional argument, privused, to determine whether
privilege was required for vaccess() to return 0. Add commented
out capability checks for reference. Rename some variables to make
it more clear which modes/uids/etc are associated with the object,
and which with the access mode.
o Update file system use of vaccess() to pass NULL as the optional
privused argument. Once additional patches are applied, suser()
will no longer set ASU, so privused will permit passing of
privilege information up the stack to the caller.

Reviewed by: bde, green, phk, -security, others
Obtained from: TrustedBSD Project


65075 25-Aug-2000 bde

Quick fix for msdsofs_write() on alphas and other machines with either
longs larger than 32 bits or strict alignment requirements.

pm_fatmask had type u_long, but it must have a type that has precisely
32 bits and this type must be no smaller than int, so that ~pmp->pm_fatmask
has no bits above the 31st set. Otherwise, comparisons between (cn
| ~pmp->pm_fatmask) and magic 32-bit "cluster" numbers always fail.
The correct fix is to use the C99 type uint_least32_t and mask with
0xffffffff. The quick fix is to use u_int32_t and assume that ints
have

msdosfs metadata is riddled with unaligned fields, and on alphas,
unaligned_fixup() apparently has problems fixing up the unaligned
accesses caused by this. The quick fix is to not comment out the
NetBSD code that sort of handles this, and define UNALIGNED_ACCESS on
i386's so that the code doesn't change on i386's. The correct fix
would define UNALIGNED_ACCESS in a central machine-dependent header
and maybe add some extra cases to unaligned_fixup(). UNALIGNED_ACCESS
is also tested in isofs.

Submitted by: parts by Mark Abene <phiber@radicalmedia.com>
PR: 19086


64865 20-Aug-2000 phk

Centralize the canonical vop_access user/group/other check in vaccess().

Discussed with: bde


63141 14-Jul-2000 dwmalone

Certain error contitions cause msdosfs_rename() to decrement the
vnode reference count on 'fdvp' more times than it should.

PR: 17347
Submitted by: Ian Dowse <iedowse@maths.tcd.ie>
Approved by: bde


62227 29-Jun-2000 bp

Fix memory leakage on module unload.

Spotted by: fixed INVARIANTS code


62048 25-Jun-2000 bp

Remove obsolete comment.

Submitted by: Marius Bendiksen <mbendiks@eunet.no>


60041 05-May-2000 phk

Separate the struct bio related stuff out of <sys/buf.h> into
<sys/bio.h>.

<sys/bio.h> is now a prerequisite for <sys/buf.h> but it shall
not be made a nested include according to bdes teachings on the
subject of nested includes.

Diskdrivers and similar stuff below specfs::strategy() should no
longer need to include <sys/buf.> unless they need caching of data.

Still a few bogus uses of struct buf to track down.

Repocopy by: peter


59794 30-Apr-2000 phk

Remove unneeded #include <vm/vm_zone.h>

Generated by: src/tools/tools/kerninclude


59249 15-Apr-2000 phk

Complete the bio/buf divorce for all code below devfs::strategy

Exceptions:
Vinum untouched. This means that it cannot be compiled.
Greg Lehey is on the case.

CCD not converted yet, casts to struct buf (still safe)

atapi-cd casts to struct buf to examine B_PHYS


58934 02-Apr-2000 phk

Move B_ERROR flag to b_ioflags and call it BIO_ERROR.

(Much of this done by script)

Move B_ORDERED flag to b_ioflags and call it BIO_ORDERED.

Move b_pblkno and b_iodone_chain to struct bio while we transition, they
will be obsoleted once bio structs chain/stack.

Add bio_queue field for struct bio aware disksort.

Address a lot of stylistic issues brought up by bde.


58706 27-Mar-2000 dillon

Commit the buffer cache cleanup patch to 4.x and 5.x. This patch fixes a
fragmentation problem due to geteblk() reserving too much space for the
buffer and imposes a larger granularity (16K) on KVA reservations for
the buffer cache to avoid fragmentation issues. The buffer cache size
calculations have been redone to simplify them (fewer defines, better
comments, less chance of running out of KVA).

The geteblk() fix solves a performance problem that DG was able reproduce.

This patch does not completely fix the KVA fragmentation problems, but
it goes a long way

Mostly Reviewed by: bde and others
Approved by: jkh


56674 27-Jan-2000 nyan

Supported non-512 bytes/sector format.

PR: misc/12992
Submitted by: chi@bd.mbn.or.jp (Chiharu Shibata) and
Dmitrij Tejblum <tejblum@arc.hq.cti.ru>
Reviewed by: Dmitrij Tejblum <tejblum@arc.hq.cti.ru>


55756 10-Jan-2000 phk

Give vn_isdisk() a second argument where it can return a suitable errno.

Suggested by: bde


55594 08-Jan-2000 bp

Treat negative uio_offset value as eof (idea by: bde).
Prevent overflows by casting uio_offset to uoff_t.
Return correct error number if directory entry is broken.

Reviewed by: bde


55308 02-Jan-2000 bp

Fix the mess with signed/unsigned longs and ints (inspired by bde).
Fix potential bug with directory reading.
Explicitly limit file size to 4GB (msdos can't handle larger files).
Slightly reorganize msdosfs_read() to reduce number of 'if's.


55206 29-Dec-1999 peter

Change #ifdef KERNEL to #ifdef _KERNEL in the public headers. "KERNEL"
is an application space macro and the applications are supposed to be free
to use it as they please (but cannot). This is consistant with the other
BSD's who made this change quite some time ago. More commits to come.


55190 28-Dec-1999 bp

Avoid to write garbage if uiomove fails.


55189 28-Dec-1999 bp

Fix an overflow in the msdosfs_read() function which exposed on the files
with size > 2GB.

PR: 15639
Submitted by: Tim Kientzle <kientzle@acm.org>
Reviewed by: phk


55188 28-Dec-1999 bp

It is possible that number of sectors specified in the BPB
will exceed FAT capacity. This will lead to kernel panic while other
systems just limit number of clusters.

PR: 4381, 15136
Reviewed by: phk


54803 19-Dec-1999 rwatson

Second pass commit to introduce new ACL and Extended Attribute system
calls, vnops, vfsops, both in /kern, and to individual file systems that
require a vfsop_ array entry.

Reviewed by: eivind


54655 15-Dec-1999 eivind

Introduce NDFREE (and remove VOP_ABORTOP)


53452 20-Nov-1999 phk

struct mountlist and struct mount.mnt_list have no business being
a CIRCLEQ. Change them to TAILQ_HEAD and TAILQ_ENTRY respectively.

This removes ugly mp != (void*)&mountlist comparisons.

Requested by: phk
Submitted by: Jake Burkholder jake@checker.org
PR: 14967


53059 09-Nov-1999 phk

Next step in the device cleanup process.

Correctly lock vnodes when calling VOP_OPEN() from filesystem mount code.

Unify spec_open() for bdev and cdev cases.

Remove the disabled bdev specific read/write code.


51486 20-Sep-1999 dillon

More removals of vnode->v_lastr, replaced by preexisting seqcount
heuristic to detect sequential operation.

VM-related forced clustering code removed from ufs in preparation for a
commit to vm/vm_fault.c that does it more generally.

Reviewed by: David Greenman <dg@root.com>, Alan Cox <alc@cs.rice.edu>


51138 11-Sep-1999 alfred

Seperate the export check in VFS_FHTOVP, exports are now checked via
VFS_CHECKEXP.

Add fh(open|stat|stafs) syscalls to allow userland to query filesystems
based on (network) filehandle.

Obtained from: NetBSD


51068 07-Sep-1999 alfred

All unimplemented VFS ops now have entries in kern/vfs_default.c that return
reasonable defaults.

This avoids confusing and ugly casting to eopnotsupp or making dummy functions.
Bogus casting of filesystem sysctls to eopnotsupp() have been removed.

This should make *_vfsops.c more readable and reduce bloat.

Reviewed by: msmith, eivind
Approved by: phk
Tested by: Jeroen Ruigrok/Asmodai <asmodai@wxs.nl>


50477 28-Aug-1999 peter

$Id$ -> $FreeBSD$


50347 25-Aug-1999 phk

Introduce vn_isdisk(struct vnode *vp) function, and use it to test for diskness.


50256 23-Aug-1999 bde

Initialise fsids with (user) device numbers again. Bitrot when dev_t's
were changed to pointers was obscured by casting dev_t's to longs.
fsids haven't even been comprised of longs since the Lite2 merge.


49679 13-Aug-1999 phk

The bdevsw() and cdevsw() are now identical, so kill the former.


49535 08-Aug-1999 phk

Decommision miscfs/specfs/specdev.h. Most of it goes into <sys/conf.h>,
a few lines into <sys/vnode.h>.

Add a few fields to struct specinfo, paving the way for the fun part.


49075 25-Jul-1999 bde

Don't set DE_ACCESS for unsuccessful reads.
Translated from: a similar fix in ufs_readwrite.c rev.1.61.

Don't forget to set DE_ACCESS for short reads.

Check for invalid (negative) offsets before checking for reads of
0 bytes, as in ufs, although checking for invalid offsets at all
is probably a bug.


48425 01-Jul-1999 peter

move <sys/systm.h> before <sys/buf.h>


48225 26-Jun-1999 mckusick

Convert buffer locking from using the B_BUSY and B_WANTED flags to using
lockmgr locks. This commit should be functionally equivalent to the old
semantics. That is, all buffer locking is done with LK_EXCLUSIVE
requests. Changes to take advantage of LK_SHARED and LK_RECURSIVE will
be done in future commits.


47640 31-May-1999 phk

Simplify cdevsw registration.

The cdevsw_add() function now finds the major number(s) in the
struct cdevsw passed to it. cdevsw_add_generic() is no longer
needed, cdevsw_add() does the same thing.

cdevsw_add() will print an message if the d_maj field looks bogus.

Remove nblkdev and nchrdev variables. Most places they were used
bogusly. Instead check a dev_t for validity by seeing if devsw()
or bdevsw() returns NULL.

Move bdevsw() and devsw() functions to kern/kern_conf.c

Bump __FreeBSD_version to 400006

This commit removes:
72 bogus makedev() calls
26 bogus SYSINIT functions

if_xe.c bogusly accessed cdevsw[], author/maintainer please fix.

I4b and vinum not changed. Patches emailed to authors. LINT
probably broken until they catch up.


47028 11-May-1999 phk

Divorce "dev_t" from the "major|minor" bitmap, which is now called
udev_t in the kernel but still called dev_t in userland.

Provide functions to manipulate both types:
major() umajor()
minor() uminor()
makedev() umakedev()
dev2udev() udev2dev()

For now they're functions, they will become in-line functions
after one of the next two steps in this process.

Return major/minor/makedev to macro-hood for userland.

Register a name in cdevsw[] for the "filedescriptor" driver.

In the kernel the udev_t appears in places where we have the
major/minor number combination, (ie: a potential device: we
may not have the driver nor the device), like in inodes, vattr,
cdevsw registration and so on, whereas the dev_t appears where
we carry around a reference to a actual device.

In the future the cdevsw and the aliased-from vnode will be hung
directly from the dev_t, along with up to two softc pointers for
the device driver and a few houskeeping bits. This will essentially
replace the current "alias" check code (same buck, bigger bang).

A little stunt has been provided to try to catch places where the
wrong type is being used (dev_t vs udev_t), if you see something
not working, #undef DEVT_FASCIST in kern/kern_conf.c and see if
it makes a difference. If it does, please try to track it down
(many hands make light work) or at least try to reproduce it
as simply as possible, and describe how to do that.

Without DEVT_FASCIST I belive this patch is a no-op.

Stylistic/posixoid comments about the userland view of the <sys/*.h>
files welcome now, from userland they now contain the end result.

Next planned step: make all dev_t's refer to the same devsw[] which
means convert BLK's to CHR's at the perimeter of the vnodes and
other places where they enter the game (bootdev, mknod, sysctl).


46676 08-May-1999 phk

I got tired of seeing all the cdevsw[major(foo)] all over the place.

Made a new (inline) function devsw(dev_t dev) and substituted it.

Changed to the BDEV variant to this format as well: bdevsw(dev_t dev)

DEVFS will eventually benefit from this change too.


46635 07-May-1999 phk

Continue where Julian left off in July 1998:

Virtualize bdevsw[] from cdevsw. bdevsw() is now an (inline)
function.

Join CDEV_MODULE and BDEV_MODULE to DEV_MODULE (please pay attention
to the order of the cmaj/bmaj arguments!)

Join CDEV_DRIVER_MODULE and BDEV_DRIVER_MODULE to DEV_DRIVER_MODULE
(ditto!)

(Next step will be to convert all bdev dev_t's to cdev dev_t's
before they get to do any damage^H^H^H^H^H^Hwork in the kernel.)


46568 06-May-1999 peter

Add sufficient braces to keep egcs happy about potentially ambiguous
if/else nesting.


46155 28-Apr-1999 phk

This Implements the mumbled about "Jail" feature.

This is a seriously beefed up chroot kind of thing. The process
is jailed along the same lines as a chroot does it, but with
additional tough restrictions imposed on what the superuser can do.

For all I know, it is safe to hand over the root bit inside a
prison to the customer living in that prison, this is what
it was developed for in fact: "real virtual servers".

Each prison has an ip number associated with it, which all IP
communications will be coerced to use and each prison has its own
hostname.

Needless to say, you need more RAM this way, but the advantage is
that each customer can run their own particular version of apache
and not stomp on the toes of their neighbors.

It generally does what one would expect, but setting up a jail
still takes a little knowledge.

A few notes:

I have no scripts for setting up a jail, don't ask me for them.

The IP number should be an alias on one of the interfaces.

mount a /proc in each jail, it will make ps more useable.

/proc/<pid>/status tells the hostname of the prison for
jailed processes.

Quotas are only sensible if you have a mountpoint per prison.

There are no privisions for stopping resource-hogging.

Some "#ifdef INET" and similar may be missing (send patches!)

If somebody wants to take it from here and develop it into
more of a "virtual machine" they should be most welcome!

Tools, comments, patches & documentation most welcome.

Have fun...

Sponsored by: http://www.rndassociates.com/
Run for almost a year by: http://www.servetheweb.com/


46112 27-Apr-1999 phk

Suser() simplification:

1:
s/suser/suser_xxx/

2:
Add new function: suser(struct proc *), prototyped in <sys/proc.h>.

3:
s/suser_xxx(\([a-zA-Z0-9_]*\)->p_ucred, \&\1->p_acflag)/suser(\1)/

The remaining suser_xxx() calls will be scrutinized and dealt with
later.

There may be some unneeded #include <sys/cred.h>, but they are left
as an exercise for Bruce.

More changes to the suser() API will come along with the "jail" code.


45098 28-Mar-1999 dt

Back out half of 1.32: don't print a message on every failed mount attempt.
It is too chatty and hardly useful. 2 mesages in somewhat usual cases are
left for now.


43305 27-Jan-1999 dillon

Fix warnings in preparation for adding -Wall -Wcast-qual to the
kernel compile


42252 02-Jan-1999 dt

Now empty DOS filesystems default to long file names. Non-empty filesystems
without traces of Win95 default to short file names, as before.


42249 02-Jan-1999 dt

Ensure that deHighClust in direntry always initialized.

Noticed by: Carl Mascott <cmascott@world.std.com>

Don't write access time of a file more than once per day. (Its precision is
1 day anyway). Don't try to write access and creation time in nonwin95 case.

Suggested by: bde (long time ago).


42248 02-Jan-1999 bde

Ifdefed conditionally used simplock variables.


41591 07-Dec-1998 archie

The "easy" fixes for compiling the kernel -Wunused: remove unreferenced static
and local variables, goto labels, and functions declared but not defined.


41416 29-Nov-1998 dt

Honor MNT_NOATIME.

PR: 8383
Submitted by: Carl Mascott <cmascott@world.std.com>


41275 21-Nov-1998 dt

Support NT VFAT lower case flags.

PR: 8383
(Mostly) Submitted by: Carl Mascott <cmascott@world.std.com>


41059 10-Nov-1998 peter

add #include <sys/kernel.h> where it's needed by MALLOC_DEFINE()


40790 31-Oct-1998 peter

Use TAILQ macros for clean/dirty block list processing. Set b_xflags
rather than abusing the list next pointer with a magic number.


40717 29-Oct-1998 peter

Use vtruncbuf() rather than vinvalbuf() when shortening files.


40651 25-Oct-1998 bde

Don't follow null bdevsw pointers. The `major(dev) < nblkdev' test rotted
when bdevsw[] became sparse. We still depend on magic to avoid having to
check that (v_rdev) device numbers in vnodes are not NODEV.


39129 13-Sep-1998 dt

Remove unused variable.

Pointed out by: bde


39128 13-Sep-1998 dt

Fix a bug related to renaming in root directory. This bug reported by
Cejka Rudolf <cejkar@dcse.fee.vutbr.cz> on freebsd-current in Messaage-Id
<199807141023.MAA09803@kazi.dcse.fee.vutbr.cz>.

Reviewed by: bde


38909 07-Sep-1998 bde

Removed statically configured mount type numbers (MOUNT_*) and all
references to them.

The change a couple of days ago to ignore these numbers in statically
configured vfsconf structs was slightly premature because the cd9660,
cfs, devfs, ext2fs, nfs vfs's still used MOUNT_* instead of the number
in their vfsconf struct.


38408 17-Aug-1998 bde

Removed unused includes.


37555 11-Jul-1998 bde

Fixed printf format errors.


37384 04-Jul-1998 julian

VOP_STRATEGY grows an (struct vnode *) argument
as the value in b_vp is often not really what you want.
(and needs to be frobbed). more cleanups will follow this.
Reviewed by: Bruce Evans <bde@freebsd.org>


36858 10-Jun-1998 dt

Back out previous change. This behavior is at least completely
"susv2"-compliant.


36851 10-Jun-1998 dt

Also return EOPNOTSUPP rather than EINVAL for not supported owner and group
changes.


36839 10-Jun-1998 peter

Return EOPNOTSUPP rather than EINVAL for flags that are not supported.


36811 09-Jun-1998 dt

Fix typo in a comment.


36154 18-May-1998 dt

Fix priority bug in previous commit.

Submitted by: bde


36133 17-May-1998 dt

Fix support for pre-Win95 filesystems: Make it possible to lookup just
created short file name. Don't insert "generation numbers".


36130 17-May-1998 dt

Remove bogus LK_RETRY.

Submitted by: bde


36123 17-May-1998 bde

Don't forget to clean up after an error reading the directory entry
in deget().


36122 17-May-1998 bde

Removed vestiges of pre-Lite2 locking.


36119 17-May-1998 phk

s/nanoruntime/nanouptime/g
s/microruntime/microuptime/g

Reviewed by: bde


35871 09-May-1998 dt

Fix off by ane error in previous commit.

This caused following commands:
mkdir z
cd z
touch A B
mv B A
corrupt the '..' entry in 'z'.

Reported by: bde


35823 07-May-1998 msmith

In the words of the submitter:

---------
Make callers of namei() responsible for releasing references or locks
instead of having the underlying filesystems do it. This eliminates
redundancy in all terminal filesystems and makes it possible for stacked
transport layers such as umapfs or nullfs to operate correctly.

Quality testing was done with testvn, and lat_fs from the lmbench suite.

Some NFS client testing courtesy of Patrik Kudo.

vop_mknod and vop_symlink still release the returned vpp. vop_rename
still releases 4 vnode arguments before it returns. These remaining cases
will be corrected in the next set of patches.
---------

Submitted by: Michael Hancock <michaelh@cet.co.jp>


35769 06-May-1998 msmith

As described by the submitter:

Reverse the VFS_VRELE patch. Reference counting of vnodes does not need
to be done per-fs. I noticed this while fixing vfs layering violations.
Doing reference counting in generic code is also the preference cited by
John Heidemann in recent discussions with him.

The implementation of alternative vnode management per-fs is still a valid
requirement for some filesystems but will be revisited sometime later,
most likely using a different framework.

Submitted by: Michael Hancock <michaelh@cet.co.jp>


35511 29-Apr-1998 dt

Use DFLTBSIZE instead of MAXBSIZE for pm_fatblksize.

In msdosfs_sync: spelling fix, formatting changes; fix MNT_LAZY (sync
modified denodes, don't sync device)

Mostly submitted by (and with hints from): bde

Increase limit for maximum disk size: as far as I can see previous limit was
gratuitously too low.


35210 15-Apr-1998 bde

Support compiling with `gcc -ansi'.


35202 15-Apr-1998 dt

Add a missing LK_RETRY.
Noticed by: Bruce (almost 2 monts ago)

Remove a debugging printf.


35063 06-Apr-1998 phk

Use random() rather then than homegrown stuff.


35046 05-Apr-1998 ache

Print explanation diagnostics when mount is impossible
Submitted by: Dmitrij Tejblum <dima@tejblum.dnttm.rssi.ru>


35029 04-Apr-1998 phk

Time changes mark 2:

* Figure out UTC relative to boottime. Four new functions provide
time relative to boottime.

* move "runtime" into struct proc. This helps fix the calcru()
problem in SMP.

* kill mono_time.

* add timespec{add|sub|cmp} macros to time.h. (XXX: These may change!)

* nanosleep, select & poll takes long sleeps one day at a time

Reviewed by: bde
Tested by: ache and others


34920 28-Mar-1998 ache

Fix dead hang writing to FAT
Submitted by: Dmitrij Tejblum <dima@tejblum.dnttm.rssi.ru>


34901 26-Mar-1998 phk

Add two new functions, get{micro|nano}time.

They are atomic, but return in essence what is in the "time" variable.
gettime() is now a macro front for getmicrotime().

Various patches to use the two new functions instead of the various
hacks used in their absence.

Some puntuation and grammer patches from Bruce.

A couple of XXX comments.


34698 20-Mar-1998 kato

Deleted 1024bytes/sector floppy code for PC-98 arch. The
1024bytes/sector code has not worked for long time and it should be
re-implemented.


34266 08-Mar-1998 julian

Reviewed by: dyson@freebsd.org (john Dyson), dg@root.com (david greenman)
Submitted by: Kirk McKusick (mcKusick@mckusick.com)
Obtained from: WHistle development tree


34096 06-Mar-1998 msmith

Trivial filesystem getpages/putpages implementations, set the second.
These should be considered the first steps in a work-in-progress.
Submitted by: Terry Lambert <terry@freebsd.org>


34002 03-Mar-1998 msmith

Patch to the last commit; attempt to unspam stuff from NetBSD.
Submitted by: Dmitrij Tejblum <dima@tejblum.dnttm.rssi.ru>


33964 01-Mar-1998 msmith

The intent is to get rid of WILLRELE in vnode_if.src by making
a complement to all ops that return a vpp, VFS_VRELE. This is
initially only for file systems that implement the following ops
that do a WILLRELE:

vop_create, vop_whiteout, vop_mknod, vop_remove, vop_link,
vop_rename, vop_mkdir, vop_rmdir, vop_symlink

This is initial DNA that doesn't do anything yet. VFS_VRELE is
implemented but not called.

A default vfs_vrele was created for fs implementations that use the
standard vnode management routines.

VFS_VRELE implementations were made for the following file systems:

Standard (vfs_vrele)
ffs mfs nfs msdosfs devfs ext2fs

Custom
union umapfs

Just EOPNOTSUPP
fdesc procfs kernfs portal cd9660

These implementations may change as VOP changes are implemented.

In the next phase, in the vop implementations calls to vrele and the vrele
part of vput will be moved to the top layer vfs_vnops and made visible
to all layers. vput will be replaced by unlock in these cases. Unlocking
will still be done in the per fs layer but the refcount decrement will be
triggered at the top because it doesn't hurt to hold a vnode reference a
little longer. This will have minimal impact on the structure of the
existing code.

This will only be done for vnode arguments that are released by the various
fs vop implementations.

Wider use of VFS_VRELE will likely require restructuring of the code.

Reviewed by: phk, dyson, terry et. al.
Submitted by: Michael Hancock <michaelh@cet.co.jp>


33959 01-Mar-1998 msmith

Fix mmap() on msdosfs. In the words of the submitter:

|In the process of evaluating the getpages/putpages issues I discovered
|that mmap on MSDOSFS does not work. This is because I blindly merged
|NetBSD changes in msdosfs_bmap and msdosfs_strategy. Apparently, their
|blocksize is always DEV_BSIZE (even in files), while in FreeBSD
|blocksize in files is v_mount->mnt_stat.f_iosize (i.e. clustersize in
|MSDOSFS case). The patch is below.

Submitted by: Dmitrij Tejblum <dima@tejblum.dnttm.rssi.ru>


33872 27-Feb-1998 msmith

Fix a problem with the conversion of Unix filenames into the VFAT
namespace.
Submitted by: Dmitrij Tejblum <dima@tejblum.dnttm.rssi.ru>


33848 26-Feb-1998 msmith

Fixes for some bugs in the VFAT/FAT32 support:

- 'mv longnamedfile1 longnamedfile2' would cause longnamedfile2 to lose its
long name.
- Long names have trailing spaces/dots stripped for lookup as well as
assignment.
- A lockup when the mdsosfs was accessed from within the Linux emulator is fixed.
- A bug whereby long filenames were recognised by Microsoft operating systems but
not FreeBSD is fixed.

Submitted by: Dmitrij Tejblum <dima@tejblum.dnttm.rssi.ru>


33791 24-Feb-1998 ache

Back out "always view in lowercase" part
Return to previous variant "comparing in lowercase" in winChkName


33768 23-Feb-1998 ache

Implement loadable DOS<->local conversion tables for DOS names
Always create DOS name in uppercase
Always view DOS name in lowercase


33765 23-Feb-1998 kato

Fix signatures of NEC's DOS formats.

Submitted by: Takahashi Yoshihiro <nyan@wyvern.cc.kogakuin.ac.jp>


33762 23-Feb-1998 ache

Oops, add missing bcopy of upper->lower table


33760 23-Feb-1998 ache

Implement loadable upper->lower local conversion table


33751 22-Feb-1998 ache

Reduce new arguments number added in my changes


33750 22-Feb-1998 ache

Add Unicode support to winChkName, now lookup works!


33747 22-Feb-1998 ache

Implement loadable local<->unicode file names conversion
Note: it produce correct names only for Win95, DOS names are still
incorrect and need similar work
mount_msdos support coming soon


33745 22-Feb-1998 ache

Replace all unknown Unicode characters with '?' in win->unix mapping


33744 22-Feb-1998 ache

Add initial support to map 0x4XX Unicode Cyrillic range names:
only win->unix part is implemented at this time with 256-byte
table defaulted to KOI8-R (will be loadable in future).
Since back mapping not supported yet, you'll get "No such file or directory"
on each Cyrillic name with 'ls -l', only 'echo *' work at this moment.
Teach current code to understand Unicode a bit.


33676 20-Feb-1998 bde

Removed unused #includes.


33548 18-Feb-1998 jkh

Update MSDOSFS code using NetBSD's msdosfs as a guide to support
FAT32 partitions. Unfortunately, we looked around here at
Walnut Creek CDROM for any newer FAT32-supporting versions
of Win95 and we were unsuccessful; only the older stuff here.
So this is untested beyond simply making sure it compiles and
someone with access to an actual FAT32 fs will have
to let us know how well it actually works.
Submitted by: Dmitrij Tejblum <dima@tejblum.dnttm.rssi.ru>
Obtained from: NetBSD


33181 09-Feb-1998 eivind

Staticize.


33134 06-Feb-1998 eivind

Back out DIAGNOSTIC changes.


33108 04-Feb-1998 eivind

Turn DIAGNOSTIC into a new-style option.


32011 27-Dec-1997 bde

Unspammed nested include of <vm/vm_zone.h>.


31132 12-Nov-1997 julian

Reviewed by: various.

Ever since I first say the way the mount flags were used I've hated the
fact that modes, and events, internal and exported, and short-term
and long term flags are all thrown together. Finally it's annoyed me enough..
This patch to the entire FreeBSD tree adds a second mount flag word
to the mount struct. it is not exported to userspace. I have moved
some of the non exported flags over to this word. this means that we now
have 8 free bits in the mount flags. There are another two that might
well move over, but which I'm not sure about.
The only user visible change would have been in pstat -v, except
that davidg has disabled it anyhow.
I'd still like to move the state flags and the 'command' flags
apart from each other.. e.g. MNT_FORCE really doesn't have the
same semantics as MNT_RDONLY, but that's left for another day.


30780 27-Oct-1997 bde

Removed unused #includes. The need for most of them went away with
recent changes (docluster* and vfs improvements).


30743 26-Oct-1997 phk

VFS interior redecoration.

Rename vn_default_error to vop_defaultop all over the place.
Move vn_bwrite from vfs_bio.c to vfs_default.c and call it vop_stdbwrite.
Use vop_null instead of nullop.
Move vop_nopoll from vfs_subr.c to vfs_default.c
Move vop_sharedlock from vfs_subr.c to vfs_default.c
Move vop_nolock from vfs_subr.c to vfs_default.c
Move vop_nounlock from vfs_subr.c to vfs_default.c
Move vop_noislocked from vfs_subr.c to vfs_default.c
Use vop_ebadf instead of *_ebadf.
Add vop_defaultop for getpages on master vnode in MFS.


30513 17-Oct-1997 phk

Make a set of VOP standard lock, unlock & islocked VOP operators, which
depend on the lock being located at vp->v_data. Saves 3x3 identical
vop procs, more as the other filesystems becomes lock aware.


30492 16-Oct-1997 phk

Another VFS cleanup "kilo commit"

1. Remove VOP_UPDATE, it is (also) an UFS/{FFS,LFS,EXT2FS,MFS}
intereface function, and now lives in the ufsmount structure.

2. Remove VOP_SEEK, it was unused.

3. Add mode default vops:

VOP_ADVLOCK vop_einval
VOP_CLOSE vop_null
VOP_FSYNC vop_null
VOP_IOCTL vop_enotty
VOP_MMAP vop_einval
VOP_OPEN vop_null
VOP_PATHCONF vop_einval
VOP_READLINK vop_einval
VOP_REALLOCBLKS vop_eopnotsupp

And remove identical functionality from filesystems

4. Add vop_stdpathconf, which returns the canonical stuff. Use
it in the filesystems. (XXX: It's probably wrong that specfs
and fifofs sets this vop, shouldn't it come from the "host"
filesystem, for instance ufs or cd9660 ?)

5. Try to make system wide VOP functions have vop_* names.

6. Initialize the um_* vectors in LFS.

(Recompile your LKMS!!!)


30474 16-Oct-1997 phk

VFS mega cleanup commit (x/N)

1. Add new file "sys/kern/vfs_default.c" where default actions for
VOPs go. Implement proper defaults for ABORTOP, BWRITE, LEASE,
POLL, REVOKE and STRATEGY. Various stuff spread over the entire
tree belongs here.

2. Change VOP_BLKATOFF to a normal function in cd9660.

3. Kill VOP_BLKATOFF, VOP_TRUNCATE, VOP_VFREE, VOP_VALLOC. These
are private interface functions between UFS and the underlying
storage manager layer (FFS/LFS/MFS/EXT2FS). The functions now
live in struct ufsmount instead.

4. Remove a kludge of VOP_ functions in all filesystems, that did
nothing but obscure the simplicity and break the expandability.
If a filesystem doesn't implement VOP_FOO, it shouldn't have an
entry for it in its vnops table. The system will try to DTRT
if it is not implemented. There are still some cruft left, but
the bulk of it is done.

5. Fix another VCALL in vfs_cache.c (thanks Bruce!)


30434 15-Oct-1997 phk

Hmm, realign the vnops into two columns.


30431 15-Oct-1997 phk

Stylistic overhaul of vnops tables.
1. Remove comment stating the blatantly obvious.
2. Align in two columns.
3. Sort all but the default element alphabetically.
4. Remove XXX comments pointing out entries not needed.


30354 12-Oct-1997 phk

Last major round (Unless Bruce thinks of somthing :-) of malloc changes.

Distribute all but the most fundamental malloc types. This time I also
remembered the trick to making things static: Put "static" in front of
them.

A couple of finer points by: bde


30309 11-Oct-1997 phk

Distribute and statizice a lot of the malloc M_* types.

Substantial input from: bde


29653 21-Sep-1997 dyson

Change the M_NAMEI allocations to use the zone allocator. This change
plus the previous changes to use the zone allocator decrease the useage
of malloc by half. The Zone allocator will be upgradeable to be able
to use per CPU-pools, and has more intelligent usage of SPLs. Additionally,
it has reasonable stats gathering capabilities, while making most calls
inline.


29362 14-Sep-1997 peter

Convert select -> poll.
Delete 'always succeed' select/poll handlers, replaced with generic call.
Flag missing vnode op table entries.


29286 10-Sep-1997 phk

Fix a type in a comment and remove some checks now done centrally.


29041 02-Sep-1997 bde

Removed unused #includes.


28787 26-Aug-1997 phk

Uncut&paste cache_lookup().

This unifies several times in theory indentical 50 lines of code.

The filesystems have a new method: vop_cachedlookup, which is the
meat of the lookup, and use vfs_cache_lookup() for their vop_lookup
method. vfs_cache_lookup() will check the namecache and pass on
to the vop_cachedlookup method in case of a miss.

It's still the task of the individual filesystems to populate the
namecache with cache_enter().

Filesystems that do not use the namecache will just provide the
vop_lookup method as usual.


28774 26-Aug-1997 dyson

Back out some incorrect changes that was worse than the original bug.


28558 22-Aug-1997 dyson

This is a trial improvement for the vnode reference count while on the vnode
free list problem. Also, the vnode age flag is no longer used by the
vnode pager. (It is actually incorrect to use then.) Constructive
feedback welcome -- just be kind.


28270 16-Aug-1997 wollman

Fix all areas of the system (or at least all those in LINT) to avoid storing
socket addresses in mbufs. (Socket buffers are the one exception.) A number
of kernel APIs needed to get fixed in order to make this happen. Also,
fix three protocol families which kept PCBs in mbufs to not malloc them
instead. Delete some old compatibility cruft while we're at it, and add
some new routines in the in_cksum family.


25877 17-May-1997 phk

Remove redundant check for vp == dvp (done in VFS before calling).


24787 10-Apr-1997 bde

Get the declaration of `struct dirent' from <sys/dirent.h>, not from
<sys/dir.h>.

Removed unused #include.

Fixed type and order of struct members in pseudo-declaration of `struct
vop_readdir_args'.


24131 23-Mar-1997 bde

Don't #include <sys/fcntl.h> in <sys/file.h> if KERNEL is defined.
Fixed everything that depended on getting fcntl.h stuff from the wrong
place. Most things don't depend on file.h stuff at all.


23997 18-Mar-1997 peter

Restore the lost MNT_LOCAL flag twiddle. Lite2 has a different mechanism
of setting it (compiled into vfs_conf.c), but we have a dynamic system
in place. This could probably be better done via a runtime configure
flag in the VFS_SET() VFS declaration, perhaps VFCF_LOCAL, and have the
VFS code propagate this down into MNT_LOCAL at mount time. The other FS's
would need to be updated, havinf UFS and MSDOSFS filesystems without
MNT_LOCAL breaks a few things.. the man page rebuild scans for local
filesystems and currently fails, I suspect that other tools like find
and tar with their "local filesystem only" modes might be affected.


23351 03-Mar-1997 bde

Don't export kernel interfaces to applications. msdosfs_mount probably
didn't compile before this change.

Added idempotency ifdef.


23134 26-Feb-1997 bde

Updated msdosfs to use Lite2 vfs configuration and Lite2 locking. It
should now work as (un)well as before the Lite2 merge.


22975 22-Feb-1997 peter

Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not
ready for it yet.


22601 12-Feb-1997 mpp

Make this compile without warnings after the Lite2 merge:

- *fs_init routines now take a "struct vfsconf * vfsp" pointer
as an argument.
- Use the correct type for cookies.
- Update function prototypes.

Submitted by: bde


22521 10-Feb-1997 dyson

This is the kernel Lite/2 commit. There are some requisite userland
changes, so don't expect to be able to run the kernel as-is (very well)
without the appropriate Lite/2 userland changes.

The system boots and can mount UFS filesystems.

Untested: ext2fs, msdosfs, NFS
Known problems: Incorrect Berkeley ID strings in some files.
Mount_std mounts will not work until the getfsent
library routine is changed.

Reviewed by: various people
Submitted by: Jeffery Hsu <hsu@freebsd.org>


21673 14-Jan-1997 jkh

Make the long-awaited change from $Id$ to $FreeBSD$

This will make a number of things easier in the future, as well as (finally!)
avoiding the Id-smashing problem which has plagued developers for so long.

Boy, I'm glad we're not using sup anymore. This update would have been
insane otherwise.


20910 25-Dec-1996 bde

Don't synchronously update the directory entry at the end of every
successful write. Only do it for the IO_SYNC case (like ufs). On
one of my systems, this speeds up `iozone 24 512' from 32K/sec
(1/128 as fast as ufs) to 2.8MB/sec (7/10 as fast as ufs).

Obtained from: partly from NetBSD


20138 04-Dec-1996 bde

Fixed an off by 1 error in unix2dostime(). The first day of each month
was converted to the last day of the previous month. This bug was
introduced in the optimizations in rev.1.4.


18640 02-Oct-1996 dyson

MSDOS FS used to allocate a buffer before extending the VM object. In
certain error conditions, it is possible for pages to be left allocated
in the object beyond it's end. It is generally bad practice to allocate
pages beyond the end of an object.


18397 19-Sep-1996 nate

In sys/time.h, struct timespec is defined as:

/*
* Structure defined by POSIX.4 to be like a timeval.
*/
struct timespec {
time_t ts_sec; /* seconds */
long ts_nsec; /* and nanoseconds */
};

The correct names of the fields are tv_sec and tv_nsec.

Reminded by: James Drobina <jdrobina@infinet.com>


18020 03-Sep-1996 bde

Eliminated nested include of <sys/unistd.h> in <sys/file.h> in the kernel.
Include it directly in the few places where it is used.

Reduced some #includes of <sys/file.h> to #includes of <sys/fcntl.h> or
nothing.


17314 28-Jul-1996 ache

bzero reserved field into directory entry, junk here cause
scandisk error under Win95


16363 14-Jun-1996 asami

The Great PC98 Merge.

All new code is "#ifdef PC98"ed so this should make no difference to
PC/AT (and its clones) users.

Ok'd by: core
Submitted by: FreeBSD(98) development team


16312 12-Jun-1996 dg

Moved the fsnode MALLOC to before the call to getnewvnode() so that the
process won't possibly block before filling in the fsnode pointer (v_data)
which might be dereferenced during a sync since the vnode is put on the
mnt_vnodelist by getnewvnode.

Pointed out by Matt Day <mday@artisoft.com>


15055 05-Apr-1996 ache

Fix adjkerntz expression priority.
Make filetimes the same as DOS times for UTC cmos clock.


15053 05-Apr-1996 ache

Don't adjust file times for UTC clock to have the same timestamps
for DOS/FreeBSD.


15033 03-Apr-1996 gpalmer

add a `Warning:' to the message saying that the root directory is not a
multiple of the clustersize in length to try and reduce the number
of questions we get on the subject.


13765 30-Jan-1996 mpp

Fix a bunch of spelling errors in the comment fields of
a bunch of system include files.


13490 19-Jan-1996 dyson

Eliminated many redundant vm_map_lookup operations for vm_mmap.
Speed up for vfs_bio -- addition of a routine bqrelse to greatly diminish
overhead for merged cache.
Efficiency improvement for vfs_cluster. It used to do alot of redundant
calls to cluster_rbuild.
Correct the ordering for vrele of .text and release of credentials.
Use the selective tlb update for 486/586/P6.
Numerous fixes to the size of objects allocated for files. Additionally,
fixes in the various pagers.
Fixes for proper positioning of vnode_pager_setsize in msdosfs and ext2fs.
Fixes in the swap pager for exhausted resources. The pageout code
will not as readily thrash.
Change the page queue flags (PG_ACTIVE, PG_INACTIVE, PG_FREE, PG_CACHE) into
page queue indices (PQ_ACTIVE, PQ_INACTIVE, PQ_FREE, PQ_CACHE),
thereby improving efficiency of several routines.
Eliminate even more unnecessary vm_page_protect operations.
Significantly speed up process forks.
Make vm_object_page_clean more efficient, thereby eliminating the pause
that happens every 30seconds.
Make sequential clustered writes B_ASYNC instead of B_DELWRI even in the
case of filesystems mounted async.
Fix a panic with busy pages when write clustering is done for non-VMIO
buffers.


13260 05-Jan-1996 wollman

Convert QUOTA to new-style option.


12662 07-Dec-1995 dg

Untangled the vm.h include file spaghetti.


12596 03-Dec-1995 bde

Added prototypes.


12338 16-Nov-1995 bde

Moved declarations for static functions to the correct place (not in a
header) and cleaned them up.


12265 13-Nov-1995 bde

Fixed getdirentries() on nfs mounted msdosfs's. No cookies were returned
for certain common combinations of directory sizes, cluster sizes, and i/o
sizes (e.g., 4K, 4K, and 4K). The fix in rev. 1.21 was incomplete.

Reviewed by: dfr
Obtained from: party from NetBSD


12158 09-Nov-1995 bde

Introduced a type `vop_t' for vnode operation functions and used
it 1138 times (:-() in casts and a few more times in declarations.
This change is null for the i386.

The type has to be `typedef int vop_t(void *)' and not `typedef
int vop_t()' because `gcc -Wstrict-prototypes' warns about the
latter. Since vnode op functions are called with args of different
(struct pointer) types, neither of these function types is any use
for type checking of the arg, so it would be preferable not to use
the complete function type, especially since using the complete
type requires adding 1138 casts to avoid compiler warnings and
another 40+ casts to reverse the function pointer conversions before
calling the functions.


12145 07-Nov-1995 phk

missed one static thingie.


12144 07-Nov-1995 phk

staticize private parts.


11977 31-Oct-1995 pst

Pad out MSDOS boot block to 512 bytes (bugfix only)
Submitted by: Andreas Haakh, ah@alman.RoBIN.de


11921 29-Oct-1995 phk

Second batch of cleanup changes.
This time mostly making a lot of things static and some unused
variables here and there.


11644 22-Oct-1995 dg

Moved the filesystem read-only check out of the syscalls and into the
filesystem layer, as was done in lite-2. Merged in some other cosmetic
changes while I was at it. Rewrote most of msdosfs_access() to be more
like ufs_access() and to include the FS read-only check.

Obtained from: partially from 4.4BSD-lite2


11297 07-Oct-1995 bde

Return EINVAL instead of panicing for rename("dir1", "dir2/..").

Fixes part of PR 760.

This bug seems to be very old.


10551 04-Sep-1995 dyson

Added VOP_GETPAGES/VOP_PUTPAGES and also the "backwards" block count
for VOP_BMAP. Updated affected filesystems...


10272 25-Aug-1995 bde

Fix bogus arg (&p instead of p) in the call to VOP_ACCESS() from
msdosfs_setattr(). The bug was benign because the arg isn't used.


9878 03-Aug-1995 dfr

Make sure that a non-null cookie vector is returned even if there were no
valid entries in the block. Doing otherwise confuses the nfs server.


9862 02-Aug-1995 dfr

Add support for the va_filerev attribute required by NFSv3.


9842 01-Aug-1995 dg

Removed my special-case hack for VOP_LINK and fixed the problem with the
wrong vp's ops vector being used by changing the VOP_LINK's argument order.
The special-case hack doesn't go far enough and breaks the generic
bypass routine used in some non-leaf filesystems. Pointed out by Kirk
McKusick.


9354 28-Jun-1995 dg

Fixed VOP_LINK argument order botch.


9202 11-Jun-1995 rgrimes

Merge RELENG_2_0_5 into HEAD


8876 30-May-1995 rgrimes

Remove trailing whitespace.


8386 09-May-1995 bde

Submitted by: Mike Pritchard <pritc003@maroon.tc.umn.edu>

msdosfs_lookup() did no validation to see if the caller was validated
to delete/rename/create files. msdosfs_setattr() did no validation
to see if the caller was allowed to change the file permissions (turn
on/off the write bit) or update the file modification time (utimes).

The routines were fixed to validate the calls just like ufs does.


7760 11-Apr-1995 ache

Fix link sys call
Submitted by: pritc003@maroon.tc.umn.edu


7755 11-Apr-1995 bde

Submitted by: Mike Pritchard <pritc003@maroon.tc.umn.edu>

Fix PR 303: msdosfs: moving a file into another directory causes panic.

" ... the code that does the rename already has the denode
locked when msdosfs_hashins() gets called, resulting in the panic
when the routine attempts to lock the denode again.
...
The attached patch changes the msdosfs_hashins() routine to not lock the
denode. The caller is now resposible for obtaining the lock instead
of having msdosfs_hashins() do it for them."


7754 11-Apr-1995 bde

Submitted by: Wolfgang Solfrank <ws@tools.de>

Fix off-by-1-sector error in the range checking for the end of the root
directory. It was possible for the root directory to overwrite the FAT.


7465 29-Mar-1995 ache

Fix timestamps when using Wall CMOS clock,
optimize dos2unixtime()
Submitted by: pritc003@maroon.tc.umn.edu


7170 19-Mar-1995 dg

Removed redundant newlines that were in some panic strings.


7161 19-Mar-1995 dg

Removed bogus, commented out, call to vnode_pager_uncache().


7090 16-Mar-1995 bde

Add and move declarations to fix all of the warnings from `gcc -Wimplicit'
(except in netccitt, netiso and netns) and most of the warnings from
`gcc -Wnested-externs'. Fix all the bugs found. There were no serious
ones.


6303 10-Feb-1995 bde

Use the correct block number for updating the backup copy of the FAT when
deleting a file. Deleting a large file used to scramble the backup copy.


6001 29-Jan-1995 ats

Kill the comment in a comment to shut up the compiler.


5455 09-Jan-1995 dg

These changes embody the support of the fully coherent merged VM buffer cache,
much higher filesystem I/O performance, and much better paging performance. It
represents the culmination of over 6 months of R&D.

The majority of the merged VM/cache work is by John Dyson.

The following highlights the most significant changes. Additionally, there are
(mostly minor) changes to the various filesystem modules (nfs, msdosfs, etc) to
support the new VM/buffer scheme.

vfs_bio.c:
Significant rewrite of most of vfs_bio to support the merged VM buffer cache
scheme. The scheme is almost fully compatible with the old filesystem
interface. Significant improvement in the number of opportunities for write
clustering.

vfs_cluster.c, vfs_subr.c
Upgrade and performance enhancements in vfs layer code to support merged
VM/buffer cache. Fixup of vfs_cluster to eliminate the bogus pagemove stuff.

vm_object.c:
Yet more improvements in the collapse code. Elimination of some windows that
can cause list corruption.

vm_pageout.c:
Fixed it, it really works better now. Somehow in 2.0, some "enhancements"
broke the code. This code has been reworked from the ground-up.

vm_fault.c, vm_page.c, pmap.c, vm_object.c
Support for small-block filesystems with merged VM/buffer cache scheme.

pmap.c vm_map.c
Dynamic kernel VM size, now we dont have to pre-allocate excessive numbers of
kernel PTs.

vm_glue.c
Much simpler and more effective swapping code. No more gratuitous swapping.

proc.h
Fixed the problem that the p_lock flag was not being cleared on a fork.

swap_pager.c, vnode_pager.c
Removal of old vfs_bio cruft to support the past pseudo-coherency. Now the
code doesn't need it anymore.

machdep.c
Changes to better support the parameter values for the merged VM/buffer cache
scheme.

machdep.c, kern_exec.c, vm_glue.c
Implemented a seperate submap for temporary exec string space and another one
to contain process upages. This eliminates all map fragmentation problems
that previously existed.

ffs_inode.c, ufs_inode.c, ufs_readwrite.c
Changes for merged VM/buffer cache. Add "bypass" support for sneaking in on
busy buffers.

Submitted by: John Dyson and David Greenman


5241 27-Dec-1994 bde

Fix panic for `cp -p' by root to an msdos file system. Improve handling
of attributes so that `cp -p' to an msdos file system can succeed under
favourable circumstances (no uid or gid changes and no nonzero flags
except SF_ARCHIVED).

msdosfs_vnops.c:
The in-core inode flags were confused with the on-disk inode flags, so
chflags() clobbered the lock flag and caused a panic.

denode.h, msdosfs_denode.c, msdosfs_vnops.c:
Support the msdosfs archive attibute (ATTR_ARCHIVE) by mapping it to
the complement of the SF_ARCHIVED flag and setting the ATTR_ARCHIVE
bit when a file's modification time is set (but not when a file's
permissions are set; this is the standard wrong DOS behaviour).

denode.h, msdosfs_denode.c:
Remove the DE_UPDAT() macro. It was only used once, and the corresponding
macro in ufs has already been removed.

denode.h:
Don't change the timestamp for directories in DE_TIMES() (be consistent
with deupdat()).

msdosfs_vnops.c:
Handle chown() better: return EPERM instead of EINVAL if there are
insufficient permissions; otherwise, allow null changes.


5083 12-Dec-1994 bde

Fix numerous timestamp bugs.

DE_UPDATE was confused with DE_MODIFIED in some places (they do have
confusing names). Handle them exactly the same as IN_UPDATE and
IN_MODIFIED. This fixes chmod() and chown() clobbering the mtime
and other bugs.

DE_MODIFIED was set but not used.

Parenthesize macro args.

DE_TIMES() now takes a timeval arg instead of a timespec arg. It was
stupid to use a macro for speed and do unused conversions to prepare
for the macro.

Restore the left shifting of the DOS seconds count by 1. It got
lost among the shifts for the bitfields, so DOS seconds counts
appeared to range from 0 to 29 seconds (step 1) instead of 0 to 58
seconds (step 2).

Actually use the passed-in mtime in deupdat() as documented so that
utimes() works.

Change `extern __inline's to `static inline's so that msdosfs_fat.o
can be linked when it is compiled without -O.

Remove faking of directory mtimes to always be the current time. It's
more surprising for directory mtimes to change when you read the
directories than for them not to change when you write the directories.
This should be controlled by a mount-time option if at all.


4868 29-Nov-1994 ache

Restore mv check, cause panic without it
Submitted by: Ade Barkah


4057 01-Nov-1994 jkh

Fix from John Hay to avoid kernel panics when ap->a_eofflag is NULL.
I'm not sure if this is just masking another problem (like, should
ap->a_eofflag EVER be NULL?), but if it prevents a panic for now then
it may save an ALPHA customer.
Submitted by: jhay


3935 27-Oct-1994 pst

Set the EOF flag properly.
Obtained from: netbsd-bugs mailing list


3805 23-Oct-1994 martin

Fixed panic when unmounting floppy msdos filesystems. Problem was
we weren't flushing dirty buffers. Fix stolen from ffs_fsync()


3498 10-Oct-1994 phk

Cosmetics. Silence gcc -Wall


3396 06-Oct-1994 dg

Use tsleep() rather than sleep so that 'ps' is more informative about
the wait.


3311 02-Oct-1994 phk

GCC cleanup.
Reviewed by:
Submitted by:
Obtained from:


3167 28-Sep-1994 dfr

Make NFS ask the filesystems for directory cookies instead of making them
itself.


3152 27-Sep-1994 phk

Added declarations, fixed bugs due to missing decls. At least one of them
could panic a system. (I know, it paniced mine!).


2946 21-Sep-1994 wollman

Implemented loadable VFS modules, and made most existing filesystems
loadable. (NFS is a notable exception.)


2899 19-Sep-1994 dfr

Changed some NetBSD backwards compatibility code which was confusing mountd.


2893 19-Sep-1994 dfr

Added msdosfs.

Obtained from: NetBSD