History log of /freebsd-10.1-release/sys/fs/devfs/
Revision Date Author Comments
272461 03-Oct-2014 gjb

Copy stable/10@r272459 to releng/10.1 as part of
the 10.1-RELEASE process.

Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation


267816 24-Jun-2014 kib

MFC r267564:
In msdosfs_setattr(), add a check for result of the utimes(2) permissions test.
Refactor the permission checks for utimes(2).


257122 25-Oct-2013 kib

MFC r256502:
Similar to debug.iosize_max_clamp sysctl, introduce
devfs_iosize_max_clamp sysctl, which allows/disables SSIZE_MAX-sized
i/o requests on the devfs files.

Approved by: re (glebius)


257121 25-Oct-2013 kib

MFC r256501:
Remove two instances of ARGSUSED comment, and wrap lines nearby the
code that is to be changed.

Approved by: re (glebius)


256281 10-Oct-2013 gjb

Copy head (r256279) to stable/10 as part of the 10.0-RELEASE cycle.

Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation


254602 21-Aug-2013 kib

Make the seek a method of the struct fileops.

Tested by: pho
Sponsored by: The FreeBSD Foundation


254415 16-Aug-2013 kib

Restore the previous sendfile(2) behaviour on the block devices.
Provide valid .fo_sendfile method for several missed struct fileops.

Reviewed by: glebius
Sponsored by: The FreeBSD Foundation


253677 26-Jul-2013 avg

make path matching in devfs rules consistent and sane (and safer)

Before this change path matching had the following features:
- for device nodes the patterns were matched against full path
- in the above case '/' in a path could be matched by a wildcard
- for directories and links only the last component was matched

So, for example, a pattern like 're*' could match the following entries:
- re0 device
- responder/u0 device
- zvol/recpool directory

Although it was possible to work around this behavior (once it was spotted
and understood), it was very confusing and contrary to documentation.

Now we always match a full path for all types of devfs entries (devices,
directories, links) and a '/' has to be matched explicitly.
This behavior follows the shell globbing rules.

This change was originally developed by Jaakko Heinonen.
Many thanks!

PR: kern/122838
Submitted by: jh
MFC after: 4 weeks
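
The matching rule described above can be illustrated in userland with
fnmatch(3). This is a minimal sketch, not the kernel code: the helper name
rule_path_matches() is hypothetical, and it assumes that FNM_PATHNAME
captures the "a '/' has to be matched explicitly" behaviour.

```c
#include <fnmatch.h>
#include <stdbool.h>

/*
 * Hypothetical helper (not taken from devfs): returns true when 'pattern'
 * matches the full 'path' under shell globbing rules. With FNM_PATHNAME a
 * wildcard never matches '/', so a slash must appear in the pattern itself.
 */
bool
rule_path_matches(const char *pattern, const char *path)
{
	return (fnmatch(pattern, path, FNM_PATHNAME) == 0);
}
```

Under this rule 're*' matches 're0' but neither 'responder/u0' nor
'zvol/recpool'; those need patterns such as 'responder/*' or 'zvol/re*'.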


249583 17-Apr-2013 gabor

- Correct misspellings of the word necessary

Submitted by: Christoph Mallon <christoph.mallon@gmx.de> (via private mail)


246472 07-Feb-2013 kib

Stop translating the ERESTART error from the open(2) into EINTR.
Posix requires that open(2) is restartable for SA_RESTART.

For non-posix objects, in particular, devfs nodes, still disable
automatic restart of the opens. The open call to a driver could have
significant side effects for the hardware.

Noted and reviewed by: jilles
Discussed with: bde
MFC after: 2 weeks


244643 23-Dec-2012 kib

Do not force a writer to the devfs file to drain the buffer writes.

Requested and tested by: Ian Lepore <freebsd@damnhippie.dyndns.org>
MFC after: 2 weeks


243039 14-Nov-2012 kib

Remove M_USE_RESERVE from the devfs cdp allocator, which is one of two
uses of M_USE_RESERVE in the kernel. This allocation is not special.

Reviewed by: alc
Tested by: pho
MFC after: 2 weeks


242833 09-Nov-2012 attilio

Complete MPSAFE VFS interface and remove MNTK_MPSAFE flag.
Porters should refer to __FreeBSD_version 1000021 for this change as
it may have happened in the same timeframe.


240539 15-Sep-2012 ed

Prefer __containerof() over member2struct().

The first does proper checking of the argument types, while the latter
does not.
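
A userland sketch of the idea, under stated assumptions: the macro
SKETCH_CONTAINEROF and the two structures are hypothetical stand-ins, not
the kernel definitions, and the macro relies on the GNU statement-expression
and __typeof__ extensions (GCC and Clang).

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical macro: the temporary pointer makes the compiler check that
 * 'ptr' really points at an object of the member's type before the
 * address of the enclosing structure is computed.
 */
#define	SKETCH_CONTAINEROF(ptr, type, member) __extension__ ({		\
	const __typeof__(((type *)0)->member) *sketch_mp_ = (ptr);	\
	(type *)((uintptr_t)sketch_mp_ - offsetof(type, member));	\
})

/* Stand-ins for an embedded object and its wrapping structure. */
struct sketch_cdev {
	int	si_unit;
};

struct sketch_cdev_priv {
	int			cdp_flags;
	struct sketch_cdev	cdp_c;	/* embedded member */
};

/* Recover the wrapper from a pointer to the embedded member. */
struct sketch_cdev_priv *
sketch_cdev2priv(struct sketch_cdev *cdp)
{
	return (SKETCH_CONTAINEROF(cdp, struct sketch_cdev_priv, cdp_c));
}
```

Passing a pointer of the wrong type to the macro produces a compiler
diagnostic, whereas a plain offsetof() subtraction would accept it silently.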


239303 15-Aug-2012 hselasky

Streamline use of cdevpriv and correct some corner cases.

1) It is not useful to call "devfs_clear_cdevpriv()" from
"d_close" callbacks, because, for example, read, write, ioctl and
so on might be sleeping at the time "d_close" is called,
and the freed private data can then still be accessed.
Examples: dtrace, linux_compat, ksyms (all fixed by this patch)

2) In sys/dev/drm* there are some cases in which memory will
be freed twice, if open fails, first by code in the open
routine, secondly by the cdevpriv destructor. Move registration
of the cdevpriv to the end of the drm open routines.

3) devfs_clear_cdevpriv() is not called if the "d_open" callback
registered cdevpriv data and the "d_open" callback function
returned an error. Fix this.

Discussed with: phk
MFC after: 2 weeks


238029 02-Jul-2012 kib

Extend the KPI to lock and unlock f_offset member of struct file. It
now fully encapsulates all accesses to f_offset, and extends f_offset
locking to other consumers that need it, in particular, to lseek() and
variants of getdirentries().

Ensure that on 32bit architectures f_offset, which is a 64bit quantity,
is always read and written under the mtxpool protection. This fixes an
apparently easy-to-trigger race in which parallel lseek()s, or lseek() and
read/write, could corrupt the file offset.

The already broken ABI emulations, including iBCS and SysV, are not
converted (yet).

Tested by: pho
No objections from: jhb
MFC after: 3 weeks


235922 24-May-2012 mav

Revert devfs part of r235911. I was unaware of the old but unfinished
discussion between kib@ and gibbs@ about it.


235911 24-May-2012 mav

MFprojects/zfsd:
Revamp the CAM enclosure services driver.
This updated driver uses an in-kernel daemon to track state changes and
publishes physical path location information for disk elements into the
CAM device database.

Sponsored by: Spectra Logic Corporation
Sponsored by: iXsystems, Inc.
Submitted by: gibbs, will, mav


232307 29-Feb-2012 mm

Add "export" to devfs_opts[] and return EOPNOTSUPP if called with it.
Fixes mountd warnings.

Reported by: kib
MFC after: 1 week


232059 23-Feb-2012 mm

To improve control over the use of mount(8) inside a jail(8), introduce
a new jail parameter node with the following parameters:

allow.mount.devfs:
allow mounting the devfs filesystem inside a jail

allow.mount.nullfs:
allow mounting the nullfs filesystem inside a jail

Both parameters are disabled by default (equals the behavior before
devfs and nullfs in jails). Administrators have to explicitly allow
mounting devfs and nullfs for each jail. The value "-1" of the
devfs_ruleset parameter is removed in favor of the new allow setting.

Reviewed by: jamie
Suggested by: pjd
MFC after: 2 weeks


231949 21-Feb-2012 kib

Fix found places where uio_resid is truncated to int.

Add the sysctl debug.iosize_max_clamp, enabled by default. Setting the
sysctl to zero allows SSIZE_MAX-sized i/o requests from usermode.

Discussed with: bde, das (previous versions)
MFC after: 1 month


231379 10-Feb-2012 ed

Merge si_name and __si_namebuf.

The si_name pointer always points to the __si_namebuf member inside the
same object. Remove it and rename __si_namebuf to si_name.


231267 09-Feb-2012 mm

Add support for mounting devfs inside jails.

A new jail(8) option "devfs_ruleset" defines the ruleset enforcement for
mounting devfs inside jails. A value of -1 disables mounting devfs in
jails, a value of zero means no restrictions. Nested jails can only
have mounting devfs disabled or inherit parent's enforcement as jails are
not allowed to view or manipulate devfs(8) rules.

Utilizes new functions introduced in r231265.

Reviewed by: jamie
MFC after: 1 month


231265 09-Feb-2012 mm

Introduce the "ruleset=number" option for devfs(5) mounts.
Add support for updating the devfs mount (currently only changing the
ruleset number is supported).
Check mnt_optnew with vfs_filteropt(9).

This new option sets the specified ruleset number as the active ruleset
of the new devfs mount and applies all its rules at mount time. If the
specified ruleset doesn't exist, a new empty ruleset is created.

MFC after: 1 month


228361 09-Dec-2011 jhb

Explicitly use curthread while manipulating td_fpop during last close
of a devfs file descriptor in devfs_close_f(). The passed in td argument
may be NULL if the close was invoked by garbage collection of open
file descriptors in pending control messages in the socket buffer of a
UNIX domain socket after it was closed.

PR: kern/151758
Submitted by: Andrey Shidakov andrey shidakov ru
Submitted by: Ruben van Staveren ruben verweg com
Reviewed by: kib
MFC after: 2 weeks


227697 19-Nov-2011 kib

Existing VOP_VPTOCNP() interface has a fatal flaw that is critical for
nullfs. The problem is that resulting vnode is only required to be
held on return from the successful call to vop, instead of being
referenced.

Nullfs VOP_INACTIVE() method reclaims the vnode, which in combination
with the VOP_VPTOCNP() interface means that the directory vnode
returned from VOP_VPTOCNP() is reclaimed in advance, causing
vn_fullpath() to error with EBADF or the like.

Change the interface for VOP_VPTOCNP(), now the dvp must be
referenced. Convert all in-tree implementations of VOP_VPTOCNP(),
which is trivial, because vhold(9) and vref(9) are similar in the
locking prerequisites. Out-of-tree fs implementation of VOP_VPTOCNP(),
if any, should have no trouble with the fix.

Tested by: pho
Reviewed by: mckusick
MFC after: 3 weeks (subject of re approval)


227489 13-Nov-2011 eadler

- fix duplicate "a a" in some comments

Submitted by: eadler
Approved by: simon
MFC after: 3 days


227069 04-Nov-2011 jhb

Move the cleanup of f_cdevpriv when the reference count of a devfs
file descriptor drops to zero out of _fdrop() and into devfs_close_f()
as it is only relevant for devfs file descriptors.

Reviewed by: kib
MFC after: 1 week


227062 03-Nov-2011 kib

Fix kernel panic when d_fdopen csw method is called for NULL fp.
This may happen when kernel consumer calls VOP_OPEN().

Reported by: Tavis Ormandy <taviso cmpxchg8b com> through delphij
MFC after: 3 days


226041 05-Oct-2011 kib

Export devfs inode number allocator for the kernel consumers.

Reviewed by: jhb
MFC after: 2 weeks


224914 16-Aug-2011 kib

Add the fo_chown and fo_chmod methods to struct fileops and use them
to implement fchown(2) and fchmod(2) support for several file types
that previously lacked it. Add MAC entries for chown/chmod done on
posix shared memory and (old) in-kernel posix semaphores.

Based on the submission by: glebius
Reviewed by: rwatson
Approved by: re (bz)


224743 09-Aug-2011 kib

Do not update mountpoint generation counter to the value which was not
yet acted upon by devfs_populate().

Submitted by: Kohji Okuno <okuno.kohji jp panasonic com>
Approved by: re (bz)
MFC after: 1 week


223988 13-Jul-2011 kib

While fixing the looping of a thread while devfs vnode is reclaimed,
r179247 introduced a possibility of devfs_allocv() returning spurious
ENOENT. If the vnode is selected by vnlru daemon for reclamation, then
devfs_allocv() can get ENOENT from vget() due to devfs_close() dropping
vnode lock around the call to cdevsw d_close method.

Use LK_RETRY in the vget() call, and do some part of the devfs_reclaim()
work in devfs_allocv(), clearing vp->v_data and de->de_vnode. Retry the
allocation of the vnode, now with de->de_vnode == NULL.

The check vp->v_data == NULL at the start of devfs_close() cannot be
affected by the change, since vnode lock must be held while VI_DOOMED
is set, and only dropped after the check.

Reported and tested by: Kohji Okuno <okuno.kohji jp panasonic com>
Reviewed by: attilio
MFC after: 3 weeks


216462 15-Dec-2010 jh

Don't allow user created symbolic links to cover other entries marked
with DE_USER. If a devfs rule hid such an entry, it was possible to create
an infinite number of symbolic links with the same name.

Reviewed by: kib


216461 15-Dec-2010 jh

- Assert that dm_lock is exclusively held in devfs_rules_apply() and
in devfs_vmkdir() while adding the entry to de_list of the parent.
- Apply devfs rules to newly created directories and symbolic links.

PR: kern/125034
Submitted by: Mateusz Guzik (original version)


216391 12-Dec-2010 jh

Handle the special ruleset 0 in devfs_ruleset_use(). An attempt to set the
current ruleset to 0 with command "devfs ruleset 0" triggered a KASSERT
in devfs_ruleset_create().

PR: kern/125030
Submitted by: Mateusz Guzik


213725 12-Oct-2010 jh

Format prototypes to follow style(9) more closely.

Discussed with: kib, phk


213221 27-Sep-2010 jh

Add a new function devfs_dev_exists() to be able to find out if a
specific devfs path already exists.

The function will be used from kern_conf.c to detect duplicate device
registrations. Callers must hold the devmtx mutex.

Reviewed by: kib


213215 27-Sep-2010 jh

Add reference counting for devfs paths containing user created symbolic
links. The reference counting is needed to be able to determine if a
specific devfs path exists. For true device file paths we can traverse
the cdevp_list but a separate directory list is needed for user created
symbolic links.

Add a new directory entry flag DE_USER to mark entries which should
unreference their parent directory on deletion.

A new function to traverse cdevp_list and the directory list will be
introduced in a separate commit.

Idea from: kib
Reviewed by: kib


212966 21-Sep-2010 jh

Modify devfs_fqpn() for future use in devfs path reference counting
code:

- Accept devfs_mount and devfs_dirent as the arguments instead of a
vnode. This generalizes the function so that it can be used from
contexts where vnode references are not available.
- Accept NULL cnp argument. No '/' will be appended, if a NULL cnp is
provided.
- Make the function global and add its prototype to devfs.h.

Reviewed by: kib


212826 18-Sep-2010 jh

- For consistency, remove "." and ".." entries from de_dlist before
calling devfs_delete() (and thus possibly dropping dm_lock) in
devfs_rmdir_empty().
- Assert that we don't return doomed entries from devfs_find(). [1]

Suggested by: kib [1]
Reviewed by: kib


212660 15-Sep-2010 jh

Remove empty devfs directories automatically.

devfs_delete() now recursively removes empty parent directories unless
the DEVFS_DEL_NORECURSE flag is specified. devfs_delete() can't be
called anymore with a parent directory vnode lock held because the
possible parent directory deletion needs to lock the vnode. Thus we
unlock the parent directory vnode in devfs_remove() before calling
devfs_delete().

Call devfs_populate_vp() from devfs_symlink() and devfs_vptocnp() as now
directories can get removed.

Add a check for DE_DOOMED flag to devfs_populate_vp() because
devfs_delete() drops dm_lock before the VI_DOOMED vnode flag gets set.
This ensures that devfs_populate_vp() returns an error for directories
which are in progress of deletion.

Reviewed by: kib
Discussed on: freebsd-current (mostly silence)


211847 26-Aug-2010 jh

Set de_dir for user created symbolic links. This will be needed to be
able to resolve their parent directories.


211816 25-Aug-2010 jh

Call devfs_populate_vp() from devfs_getattr(). It was possible that
fstat(2) returned stale information through an open file descriptor.


211628 22-Aug-2010 jh

Introduce and use devfs_populate_vp() to unlock a vnode before calling
devfs_populate(). This is a prerequisite for the automatic removal of
empty directories which will be committed in the future.

Reviewed by: kib (previous version)


211531 20-Aug-2010 jhb

Add dedicated routines to toggle lockmgr flags such as LK_NOSHARE and
LK_CANRECURSE after a lock is created. Use them to implement macros that
otherwise manipulated the flags directly. Assert that the associated
lockmgr lock is exclusively locked by the current thread when manipulating
these flags to ensure the flag updates are safe. This last change required
some minor shuffling in a few filesystems to exclusively lock a brand new
vnode slightly earlier.

Reviewed by: kib
MFC after: 3 days


211513 19-Aug-2010 jh

Call dev_rel() in error paths.

Reported by: kib
Reviewed by: kib
MFC after: 2 weeks


211226 12-Aug-2010 jh

Allow user created symbolic links to cover device files and directories
if the device file appears during or after the link creation.

User created symbolic links are now inserted at the head of the
directory entry list after the "." and ".." entries. A new directory
entry flag DE_COVERED indicates that an entry is covered by a symbolic
link.

PR: kern/114057
Reviewed by: kib
Idea from: kib
Discussed on: freebsd-current (mostly silence)


210925 06-Aug-2010 kib

Enable shared lookups and extended shared ops for devfs.

In collaboration with: pho
MFC after: 1 month


210923 06-Aug-2010 kib

Add new make_dev_p(9) flag MAKEDEV_ETERNAL to inform devfs that created
cdev will never be destroyed. Propagate the flag to devfs vnodes as
VV_ETERNALDEV. Use the flags to avoid acquiring devmtx and taking a
thread reference on such nodes.

In collaboration with: pho
MFC after: 1 month


210921 06-Aug-2010 kib

Enable shared locks for the devfs vnodes. Honor the locking mode
requested by lookup(). This should be a nop at the moment.

In collaboration with: pho
MFC after: 1 month


210918 06-Aug-2010 kib

Initialize VV_ISTTY vnode flag on the devfs vnode creation instead of
doing it on each open.

In collaboration with: pho
MFC after: 1 month


208951 09-Jun-2010 jh

Add a new function devfs_parent_dirent() for resolving devfs parent
directory entry. Use the new function in devfs_fqpn(), devfs_lookupx()
and devfs_vptocnp() instead of manually resolving the parent entry.

Reviewed by: kib


208717 01-Jun-2010 jh

Don't try to call cdevsw d_close() method when devfs_close() is called
because of insmntque1() failure.

Found with: stress2
Suggested and reviewed by: kib


207729 06-May-2010 kib

Add MAKEDEV_NOWAIT flag to make_dev_credf(9), to create a device node
in a no-sleep context. If resource allocation cannot be done without
sleep, make_dev_credf() fails and returns NULL.

Reviewed by: jh
MFC after: 2 weeks


206698 16-Apr-2010 jh

Revert r206560. The change doesn't work correctly in all cases with
multiple devfs mounts.


206560 13-Apr-2010 jh

- Ignore and report duplicate and empty device names in devfs_populate_loop()
instead of causing erratic behavior. Currently make_dev(9) can't fail, so
there is no way to report an error to make_dev(9) callers.
- Disallow using "." and ".." in device path names. It didn't work previously
but now it is reported rather than panicking.
- Treat multiple sequential slashes as single in device path names.

Discussed with: pjd


203292 31-Jan-2010 ed

Properly use dev_refl()/dev_rel() in kern.devname.

While there, perform some clean-up fixes. Update some stale comments on
struct cdev * instead of dev_t and devfs_random(). Also add some missing
whitespace.

MFC after: 1 week


200732 19-Dec-2009 ed

Let access overriding to TTYs depend on the cdev_priv, not the vnode.

Basically this commit changes two things, which improves access to TTYs
in exceptional conditions. Basically the problem was that when you ran
jexec(8) to attach to a jail, you couldn't use /dev/tty (well, also the
node of the actual TTY, e.g. /dev/pts/X). This is very inconvenient if
you want to attach to screens quickly, use ssh(1), etc.

The fixes:

- Cache the cdev_priv of the controlling TTY in struct session. Change
devfs_access() to compare against the cdev_priv instead of the vnode.
This allows you to bypass UNIX permissions, even across different
mounts of devfs.

- Extend devfs_prison_check() to unconditionally expose the device node
of the controlling TTY, even if normal prison nesting rules normally
don't allow this. This actually allows you to interact with this
device node.

To be honest, I'm not really happy with this solution. We now have to
store three pointers to a controlling TTY (s_ttyp, s_ttyvp, s_ttydp).
In an ideal world, we should just get rid of the latter two and only use
s_ttyp, but this makes certain pieces of code very impractical (e.g.
devfs, kern_exit.c).

Reported by: Many people


194532 20-Jun-2009 ed

Improve nested jail awareness of devfs by handling credentials.

Now that we start to use credentials on character devices more often
(because of MPSAFE TTY), move the prison-checks that are in place in the
TTY code into devfs.

Instead of strictly comparing the prisons, use the more common
prison_check() function to compare credentials. This means that
pseudo-terminals are only visible in devfs by processes within the same
jail and parent jails.

Even though regular users in parent jails can now interact with
pseudo-terminals from child jails, this seems to be the right approach.
These processes are also capable of interacting with the jailed
processes anyway, through signals for example.

Reviewed by: kib, rwatson (older version)


193919 10-Jun-2009 kib

VOP_IOCTL takes unlocked vnode as an argument. Due to this, v_data may
be NULL or dereferenced memory may be freed at an arbitrary moment.

Lock the vnode in cd9660, devfs and pseudofs implementation of VOP_IOCTL
to prevent reclaim; check whether the vnode was already reclaimed after
the lock is granted.

Reported by: georg at dts su
Reviewed by: des (pseudofs)
MFC after: 2 weeks


193511 05-Jun-2009 rwatson

Move "options MAC" from opt_mac.h to opt_global.h, as it's now in GENERIC
and used in a large number of files, but also because an increasing number
of incorrect uses of MAC calls were sneaking in due to copy-and-paste of
MAC-aware code without the associated opt_mac.h include.

Discussed with: pjd


193433 04-Jun-2009 rwatson

Re-add opt_mac.h include, which is required in order for MNT_MULTILABEL
to be set properly on devfs. Otherwise, it isn't possible to set labels
on /dev nodes.

Reported by: Sergio Rodriguez <sergiorr at yahoo.com>
MFC after: 3 days


192151 15-May-2009 kib

Devfs replaces file ops vector with devfs-specific one in devfs_open(),
before the struct file is fully initialized in vn_open(), in particular,
fp->f_vnode is NULL. Other thread calling file operation before f_vnode
is set results in NULL pointer dereference in devvn_refthread().

Initialize f_vnode before calling d_fdopen() cdevsw method, that might
set file ops too.

Reported and tested by: Chris Timmons <cwt networks cwu edu>
(RELENG_7 version)
MFC after: 3 days


191990 11-May-2009 attilio

Remove the thread argument from the FSD (File-System Dependent) parts of
the VFS. Now all the VFS_* functions and related parts no longer take the
context, since it always refers to curthread.

In some points, in particular when dealing with VOPs and functions living
in the same namespace (eg. vflush) which still need to be converted,
pass curthread explicitly in order to retain the old behaviour.
Such loose ends will be fixed ASAP.

While here fix a bug: now, UFS_EXTATTR can be compiled alone without the
UFS_EXTATTR_AUTOSTART option.

VFS KPI is heavily changed by this commit so third-party modules need
to be recompiled. Bump __FreeBSD_version in order to signal such
situation.


190888 10-Apr-2009 rwatson

Remove VOP_LEASE and supporting functions. This hasn't been used since
the removal of NQNFS, but was left in in case it was required for NFSv4.
Since our new NFSv4 client and server can't use it for their
requirements, GC the old mechanism, as well as other unused lease-
related code and interfaces.

Due to its impact on kernel programming and binary interfaces, this
change should not be MFC'd.

Proposed by: jeff
Reviewed by: jeff
Discussed with: rmacklem, zach loafman @ isilon


189693 11-Mar-2009 kib

Enable advisory file locking for devfs vnodes.

Reported by: Timothy Redaelli <timothy redaelli eu>
MFC after: 1 week


189450 06-Mar-2009 kib

Extract the no_poll() and vop_nopoll() code into the common routine
poll_no_poll().
Return a poll_no_poll() result from devfs_poll_f() when
filedescriptor does not reference the live cdev, instead of ENXIO.

Noted and tested by: hps
MFC after: 1 week


187959 31-Jan-2009 bz

Remove unused local variables.

Submitted by: Christoph Mallon christoph.mallon@gmx.de
Reviewed by: kib
MFC after: 2 weeks


187864 28-Jan-2009 ed

Mark most often used sysctl's as MPSAFE.

After running a `make buildkernel', I noticed most of the Giant locks in
sysctl are only caused by a very small amount of sysctl's:

- sysctl.name2oid. This one is locked by SYSCTL_LOCK, just like
sysctl.oidfmt.

- kern.ident, kern.osrelease, kern.version, etc. These are just constant
strings.

- kern.arandom, used by the stack protector. It is already protected by
arc4_mtx.

I also saw the following sysctl's show up. Not as often as the ones
above, but still quite often:

- security.jail.jailed. Also mark security.jail.list as MPSAFE. They
don't need locking or already use allprison_lock.

- kern.devname, used by devname(3), ttyname(3), etc.

This seems to reduce Giant locking inside sysctl by ~75% in my primitive
test setup.


186911 08-Jan-2009 trasz

Don't panic with "vinvalbuf: dirty bufs" when the mounted device that was
being written to goes away.

Reviewed by: kib, scottl
Approved by: rwatson (mentor)
Sponsored by: FreeBSD Foundation


185980 12-Dec-2008 kib

Do not leak devfs_de_interlock on error.

Another pointy hat for my collection.


185959 12-Dec-2008 marcus

Implement VOP_VPTOCNP for devfs. Directory and character device vnodes are
properly translated to their component names.

Reviewed by: arch
Approved by: kib


184413 28-Oct-2008 trasz

Introduce accmode_t. This is required for NFSv4 ACLs - it will be necessary
to add more V* constants, and the variables changed by this patch were often
being assigned to mode_t variables, which is 16 bit.

Approved by: rwatson (mentor)
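
Why a 16-bit type is a problem can be shown with a small userland sketch.
All names here are hypothetical: sketch_mode_t stands in for a 16-bit
mode_t, sketch_accmode_t for a wider access-mode type, and SKETCH_VNEW for
an additional V* constant that does not fit in 16 bits.

```c
#include <stdint.h>

typedef uint16_t sketch_mode_t;		/* 16 bits wide */
typedef int	 sketch_accmode_t;	/* wider stand-in type */

#define	SKETCH_VNEW	0x00010000	/* hypothetical flag above bit 15 */

/* Returns 1 if the flag is still set after a round trip through the type. */
int
flag_survives_mode(void)
{
	sketch_mode_t m = (sketch_mode_t)SKETCH_VNEW;	/* truncated to 0 */

	return ((m & SKETCH_VNEW) != 0);
}

int
flag_survives_accmode(void)
{
	sketch_accmode_t m = SKETCH_VNEW;		/* value preserved */

	return ((m & SKETCH_VNEW) != 0);
}
```

The first function returns 0 because the flag is lost on assignment; the
second returns 1.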


183383 26-Sep-2008 kib

Save previous content of the td_fpop before storing the current
filedescriptor into it. Make sure that td_fpop is NULL when calling
d_mmap from dev_pager_getpages().

Change guards against td_fpop field being non-NULL with private state
for another device, and against sudden clearing the td_fpop. This
could occur when either a driver method calls another driver through
the filedescriptor operation, or a page fault happens while a driver is
writing to a memory backed by another driver.

Noted by: rwatson
Tested by: rnoland
MFC after: 3 days


183230 21-Sep-2008 ed

Already initialize the vfs timestamps inside the cdev upon allocation.

In the MPSAFE TTY branch I noticed the vfs timestamps inside devfs were
allocated with 0, where the getattr() routine bumps the timestamps to
boottime if the value is below 3600. The reason why it has been designed
like this, is because timestamps during boot are likely to be invalid.

This means that device nodes that are created on demand (posix_openpt())
have timestamps with a value of boottime, which is not what we want.
Solve this by calling vfs_timestamp() inside devfs_alloc().

Discussed with: kib
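
The getattr() adjustment described above can be modelled as follows. This
is a sketch only: the function name is hypothetical, and the comparison
follows the wording of this message ("below 3600"), not the actual devfs
source.

```c
#include <time.h>

/*
 * Hypothetical model: a timestamp that looks like it was taken before the
 * clock was valid (less than 3600 seconds) is reported as boottime;
 * anything else is reported unchanged.
 */
time_t
sketch_fix_timestamp(time_t stamp, time_t boottime)
{
	return (stamp < 3600 ? boottime : stamp);
}
```

A node allocated with a zero timestamp is therefore always reported with
boottime, which is why on-demand nodes need a real timestamp at allocation.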


183215 20-Sep-2008 kib

fdescfs, devfs, mqueuefs, nfs, portalfs, pseudofs, tmpfs and xfs
initialize the vattr structure in VOP_GETATTR() with VATTR_NULL(),
vattr_null() or by zeroing it. Remove these to allow preinitialization
of fields work in vn_stat(). This is needed to get birthtime initialized
correctly.

Submitted by: Jaakko Heinonen <jh saunalahti fi>
Discussed on: freebsd-fs
MFC after: 1 month


182371 28-Aug-2008 attilio

Decontextualize the couplet VOP_GETATTR / VOP_SETATTR as the passed thread
was always curthread and totally useless.

Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>


181905 20-Aug-2008 ed

Integrate the new MPSAFE TTY layer to the FreeBSD operating system.

The last half year I've been working on a replacement TTY layer for the
FreeBSD kernel. The new TTY layer was designed to improve the following:

- Improved driver model:

The old TTY layer has a driver model that is not abstract enough to
make it friendly to use. A good example is the output path, where the
device drivers directly access the output buffers. This means that an
in-kernel PPP implementation must always convert network buffers into
TTY buffers.

If a PPP implementation would be built on top of the new TTY layer
(still needs a hooks layer, though), it would allow the PPP
implementation to directly hand the data to the TTY driver.

- Improved hotplugging:

With the old TTY layer, it isn't entirely safe to destroy TTY's from
the system. This implementation has a two-step destructing design,
where the driver first abandons the TTY. After all threads have left
the TTY, the TTY layer calls a routine in the driver, which can be
used to free resources (unit numbers, etc).

The pts(4) driver also implements this feature, which means
posix_openpt() will now return PTY's that are created on the fly.

- Improved performance:

One of the major improvements is the per-TTY mutex, which is expected
to improve scalability when compared to the old Giant locking.
Another change is the unbuffered copying to userspace, which is both
used on TTY device nodes and PTY masters.

Upgrading should be quite straightforward. Unlike previous versions,
existing kernel configuration files do not need to be changed, except
when they reference device drivers that are listed in UPDATING.

Obtained from: //depot/projects/mpsafetty/...
Approved by: philip (ex-mentor)
Discussed: on the lists, at BSDCan, at the DevSummit
Sponsored by: Snow B.V., the Netherlands
dcons(4) fixed by: kan


181635 12-Aug-2008 kib

Remove unnecessary locking around pointer fetch.

Requested by: jhb


179926 22-Jun-2008 gonzo

Get pointer to devfs_ruleset struct after garbage collection has been
performed. Otherwise if ruleset is used by given mountpoint and is empty
it's freed by devfs_ruleset_reap and pointer becomes bogus.

Submitted by: Mateusz Guzik <mjguzik@gmail.com>
PR: kern/124853


179828 16-Jun-2008 kib

Struct cdev is always the member of the struct cdev_priv. When devfs
needed to promote cdev to cdev_priv, the si_priv pointer was followed.

Use member2struct() to calculate address of the wrapping cdev_priv.
Rename si_priv to __si_reserved.

Tested by: pho
Reviewed by: ed
MFC after: 2 weeks


179554 05-Jun-2008 kib

When devfs_allocv() committed to create new vnode, since de_vnode is NULL,
the dm_lock is held while the newly allocated vnode is locked. Since no
other threads may try to lock the new vnode yet, the LOR there cannot
result in the deadlock.

Shut down the witness warning to note this fact.

Tested by: pho
Prodded by: attilio


179475 01-Jun-2008 ed

Revert the changes I made to devfs_setattr() in r179457.

As discussed with Robert Watson and John Baldwin, it would be better if
PTY's are created with proper permissions, turning grantpt() into a
no-op.

Bypassing security frameworks like MAC by passing NOCRED to
VOP_SETATTR() will only make things more complex.

Approved by: philip (mentor)


179457 31-May-2008 ed

Merge back devfs changes from the mpsafetty branch.

In the mpsafetty branch, PTY's are allocated through the posix_openpt()
system call. The controller side of a PTY now uses its own file
descriptor type (just like sockets, vnodes, pipes, etc).

To remain compatible with existing FreeBSD and Linux C libraries, we can
still create PTY's by opening /dev/ptmx or /dev/ptyXX. These nodes
implement d_fdopen(). Devfs has been slightly changed here, to allow
finit() to be called from d_fdopen().

The routine grantpt() has also been moved into the kernel. This routine
is a little odd, because it needs to bypass standard UNIX permissions.
It needs to change the owner/group/mode of the slave device node, which
may often not be possible. The old implementation solved this by
spawning a setuid utility.

When VOP_SETATTR() is called with NOCRED, devfs_setattr() dereferences
ap->a_cred, causing a kernel panic. Change the de_{uid,gid,mode} code to
allow changes when ap->a_cred is set to NOCRED.

Approved by: philip (mentor)


179247 23-May-2008 kib

When vget() fails (because the vnode has been reclaimed), there is no
sense to loop trying to vget() the vnode again.

PR: 122977
Submitted by: Arthur Hartwig <arthur.hartwig nokia com>
Tested by: pho
Reviewed by: jhb
MFC after: 1 week


179175 21-May-2008 kib

Implement the per-open file data for the cdev.

The patch does not change the cdevsw KBI. Management of the data is
provided by the functions
int devfs_set_cdevpriv(void *priv, cdevpriv_dtr_t dtr);
int devfs_get_cdevpriv(void **datap);
void devfs_clear_cdevpriv(void);
All of the functions are supposed to be called from the cdevsw method
contexts.

- devfs_set_cdevpriv assigns the priv as private data for the file
descriptor which is used to initiate currently performed driver
operation. dtr is the function that will be called when either the
last reference to the file goes away, the device is destroyed or
devfs_clear_cdevpriv is called.
- devfs_get_cdevpriv is the obvious accessor.
- devfs_clear_cdevpriv clears the private data for a file that is still
open.

The implementation keeps the driver-supplied pointers in the struct
cdev_privdata, which is referenced both from the struct file and struct
cdev, and cannot outlive either of the referrers.

Man pages will be provided after the KPI stabilizes.

Reviewed by: jhb
Useful suggestions from: jeff, antoine
Debugging help and tested by: pho
MFC after: 1 month
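
The set/get/clear contract described above can be sketched in userland.
This is a model, not kernel code: struct open_file and the model_*
functions are hypothetical stand-ins for the per-open state and for
devfs_set_cdevpriv()/devfs_get_cdevpriv()/devfs_clear_cdevpriv(), the
error values are assumptions, and only the explicit-clear path is
modelled (not last close or device destruction).

```c
#include <errno.h>
#include <stddef.h>

/* Destructor type, named after the cdevpriv_dtr_t in the commit. */
typedef void (*priv_dtr_t)(void *);

/* Hypothetical stand-in for the per-open state attached to a file. */
struct open_file {
    void       *priv;  /* driver-supplied pointer, NULL when unset */
    priv_dtr_t  dtr;   /* run once when priv is released */
};

/* Attach private data to this open; refuse to overwrite (assumed EBUSY). */
int model_set_priv(struct open_file *fp, void *priv, priv_dtr_t dtr)
{
    if (fp->priv != NULL)
        return EBUSY;
    fp->priv = priv;
    fp->dtr = dtr;
    return 0;
}

/* Accessor; assumed to report ENOENT when nothing is attached. */
int model_get_priv(struct open_file *fp, void **datap)
{
    if (fp->priv == NULL)
        return ENOENT;
    *datap = fp->priv;
    return 0;
}

/* Release the data while the file is still open, running the destructor. */
void model_clear_priv(struct open_file *fp)
{
    void *p = fp->priv;
    priv_dtr_t d = fp->dtr;

    if (p == NULL)
        return;             /* nothing attached, nothing to destroy */
    fp->priv = NULL;
    fp->dtr = NULL;
    if (d != NULL)
        d(p);
}

/* Example destructor: priv points at an int that counts releases. */
void count_release(void *p)
{
    ++*(int *)p;
}
```

In this model a second model_clear_priv() on the same open is a no-op,
so the destructor runs at most once per attached pointer.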


178834 07-May-2008 jhb

Don't explicitly drop Giant around d_open/d_fdopen/d_close for MPSAFE
drivers. Since devfs is already marked MPSAFE it shouldn't be held
anyway.

MFC after: 2 weeks
Discussed with: phk


177458 20-Mar-2008 kib

Do not dereference cdev->si_cdevsw, use the dev_refthread() to properly
obtain the reference. In particular, this fixes the panic reported in
the PR. Remove the comments stating that this needs to be done.

PR: kern/119422
MFC after: 1 week


176559 25-Feb-2008 attilio

Axe the 'thread' argument from VOP_ISLOCKED() and lockstatus() as it is
always curthread.

As KPI gets broken by this patch, manpages and __FreeBSD_version will be
updated by further commits.

Tested by: Andrea Barberio <insomniac at slackware dot it>


175294 13-Jan-2008 attilio

VOP_LOCK1() (and so VOP_LOCK()) and VOP_UNLOCK() are only used in
conjunction with a 'thread' argument that is always curthread.
Remove the unneeded extra argument and pass curthread explicitly to lower
layer functions, when necessary.

The KPI is broken by this change, which should affect several ports, so
the version bump and manpage update will be committed separately.

Tested by: kris, pho, Diego Sardina <siarodx at gmail dot com>


175202 10-Jan-2008 attilio

vn_lock() is currently only used with the 'curthread' passed as argument.
Remove this argument and pass curthread directly to underlying
VOP_LOCK1() VFS method. This modification makes the code cleaner and in
particular removes an annoying dependency, helping the next lockmgr() cleanup.
The KPI, obviously, changes as a result.

Manpage and FreeBSD_version will be updated through further commits.

As a side note, it is worth saying that upcoming commits will address
a similar cleanup of the VFS methods, in particular vop_lock1 and
vop_unlock.

Tested by: Diego Sardina <siarodx at gmail dot com>,
Andrea Di Pasquale <whyx dot it at gmail dot com>


175151 08-Jan-2008 jhb

Lock the vnode interlock while reading v_usecount to update si_usecount
in a cdev in devfs_reclaim().

MFC after: 3 days
Reviewed by: jeff (a while ago)


175140 07-Jan-2008 jhb

Make ftruncate a 'struct file' operation rather than a vnode operation.
This makes it possible to support ftruncate() on non-vnode file types in
the future.
- 'struct fileops' grows a 'fo_truncate' method to handle an ftruncate() on
a given file descriptor.
- ftruncate() moves to kern/sys_generic.c and now just fetches a file
object and invokes fo_truncate().
- The vnode-specific portions of ftruncate() move to vn_truncate() in
vfs_vnops.c which implements fo_truncate() for vnode file types.
- Non-vnode file types return EINVAL in their fo_truncate() method.

Submitted by: rwatson


174988 30-Dec-2007 jeff

Remove explicit locking of struct file.
- Introduce a finit() which is used to initialize the fields of struct file
in such a way that the ops vector is only valid after the data, type,
and flags are valid.
- Protect f_flag and f_count with atomic operations.
- Remove the global list of all files and associated accounting.
- Rewrite the unp garbage collection such that it no longer requires
the global list of all files and instead uses a list of all unp sockets.
- Mark sockets in the accept queue so we don't incorrectly gc them.

Tested by: kris, pho


172930 24-Oct-2007 rwatson

Merge first in a series of TrustedBSD MAC Framework KPI changes
from Mac OS X Leopard--rationalize naming for entry points to
the following general forms:

mac_<object>_<method/action>
mac_<object>_check_<method/action>

The previous naming scheme was inconsistent and mostly
reversed from the new scheme. Also, make object types more
consistent and remove spaces from object types that contain
multiple parts ("posix_sem" -> "posixsem") to make mechanical
parsing easier. Introduce a new "netinet" object type for
certain IPv4/IPv6-related methods. Also simplify, slightly,
some entry point names.

All MAC policy modules will need to be recompiled, and modules
not updated as part of this commit will need to be modified to
conform to the new KPI.

Sponsored by: SPARTA (original patches against Mac OS X)
Obtained from: TrustedBSD Project, Apple Computer


171599 26-Jul-2007 pjd

When we do open, we should lock the vnode exclusively. This fixes a few races:
- fifo race, where two threads assign v_fifoinfo,
- v_writecount modifications,
- v_object modifications,
- and probably more...

Discussed with: kib, ups
Approved by: re (rwatson)


171181 03-Jul-2007 kib

Since rev. 1.199 of sys/kern/kern_conf.c, the thread that calls
destroy_dev() from d_close() cdev method would self-deadlock.
devfs_close() bumps the device thread reference counter, and destroy_dev()
sleeps, waiting for si_threadcount to reach zero for cdev without
d_purge method.

destroy_dev_sched() could be used instead from d_close(), to
schedule execution of destroy_dev() in another context. The
destroy_dev_sched_drain() function can be used to drain the scheduled
calls to destroy_dev_sched(). Similarly, drain_dev_clone_events() drains
the clone events to make sure no lingering devices are left after
the dev_clone event handler is deregistered.

make_dev_credf(MAKEDEV_REF) function should be used from dev_clone
event handlers instead of make_dev()/make_dev_cred() to ensure that the created
device has its reference counter bumped before the cdev mutex is dropped inside
make_dev().

Reviewed by: tegge (early versions), njl (programming interface)
Debugging help and testing by: Peter Holm
Approved by: re (kensmith)


170587 12-Jun-2007 rwatson

Eliminate now-unused SUSER_ALLOWJAIL arguments to priv_check_cred(); in
some cases, move to priv_check() if it was an operation on a thread and
no other flags were present.

Eliminate caller-side jail exception checking (also now-unused); jail
privilege exception code now goes solely in kern_jail.c.

We can't yet eliminate suser() due to some cases in the KAME code where
a privilege check is performed and then used in many different deferred
paths. Do, however, move those prototypes to priv.h.

Reviewed by: csjp
Obtained from: TrustedBSD Project


170152 31-May-2007 kib

Revert UF_OPENING workaround for CURRENT.
Change the VOP_OPEN(), vn_open() vnode operation and d_fdopen() cdev operation
argument from being file descriptor index into the pointer to struct file.

Proposed and reviewed by: jhb
Reviewed by: daichi (unionfs)
Approved by: re (kensmith)


168977 23-Apr-2007 rwatson

Rename mac*devfsdirent*() to mac*devfs*() to synchronize with SEDarwin,
where similar data structures exist to support devfs and the MAC
Framework, but are named differently.

Obtained from: TrustedBSD Project
Sponsored by: SPARTA, Inc.


168884 20-Apr-2007 trhodes

In some cases, like whenever devfs file times are zero, the fix(aa) will not
be applied to dev entries. This leaves us with file times like "Jan 1 1970."
Work around this problem by replacing the tv_sec == 0 check with a
<= 3600 check. It's doubtful anyone will be booting within an hour of the
Epoch, let alone care about a few seconds worth of nonzero timestamps. It's
a hackish work around, but it does work and I have not experienced any
negatives in my testing.

Discussed with: bde
"Ok with me: phk
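
The threshold check described above can be sketched as a small
function. This is an illustration only: fix_timestamp() and
UNSET_LIMIT_SEC are hypothetical names, and the replacement value is
passed in as a parameter because the commit message does not say what
the fix substitutes.

```c
#include <time.h>

/* Threshold from the commit: within an hour of the Epoch counts as unset. */
#define UNSET_LIMIT_SEC 3600

/*
 * Return the timestamp to report: the stored one if it looks real,
 * otherwise the caller-supplied fallback.
 */
struct timespec fix_timestamp(struct timespec stored, struct timespec fallback)
{
    if (stored.tv_sec <= UNSET_LIMIT_SEC)
        return fallback;
    return stored;
}
```

With the earlier tv_sec == 0 test, a stored time of a few seconds past
the Epoch would have been reported as is; with the <= 3600 test it is
replaced.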


168355 04-Apr-2007 rwatson

Replace custom file descriptor array sleep lock constructed using a mutex
and flags with an sxlock. This leads to a significant and measurable
performance improvement as a result of access to shared locking for
frequent lookup operations, reduced general overhead, and reduced overhead
in the event of contention. All of these are important for threaded
applications where simultaneous access to a shared file descriptor array
occurs frequently. Kris has reported 2x-4x transaction rate improvements
on 8-core MySQL benchmarks; smaller improvements can be expected for many
workloads as a result of reduced overhead.

- Generally eliminate the distinction between "fast" and regular
acquisition of the filedesc lock; the plan is that they will now all
be fast. Change all locking instances to either shared or exclusive
locks.

- Correct a bug (pointed out by kib) in fdfree() where previously msleep()
was called without the mutex held; sx_sleep() is now always called with
the sxlock held exclusively.

- Universally hold the struct file lock over changes to struct file,
rather than the filedesc lock or no lock. Always update the f_ops
field last. A further memory barrier is required here in the future
(discussed with jhb).

- Improve locking and reference management in linux_at(), which fails to
properly acquire vnode references before using vnode pointers. Annotate
improper use of vn_fullpath(), which will be replaced at a future date.

In fcntl(), we conservatively acquire an exclusive lock, even though in
some cases a shared lock may be sufficient, which should be revisited.
The dropping of the filedesc lock in fdgrowtable() is no longer required
as the sxlock can be held over the sleep operation; we should consider
removing that (pointed out by attilio).

Tested by: kris
Discussed with: jhb, kris, attilio, jeff


167916 26-Mar-2007 kris

Annotate that this Giant acquisition is dependent on tty locking.


167497 13-Mar-2007 tegge

Make insmntque() externally visible and allow it to fail (e.g. during
late stages of unmount). On failure, the vnode is recycled.

Add insmntque1(), to allow for file system specific cleanup when
recycling vnode on failure.

Change getnewvnode() to no longer call insmntque(). Previously,
embryonic vnodes were put onto the list of vnodes belonging to a file
system, which is unsafe for a file system marked MPSAFE.

Change vfs_hash_insert() to no longer lock the vnode. The caller now
has that responsibility.

Change most file systems to lock the vnode and call insmntque() or
insmntque1() after a new vnode has been sufficiently setup. Handle
failed insmntque*() calls by propagating errors to callers, possibly
after some file system specific cleanup.

Approved by: re (kensmith)
Reviewed by: kib
In collaboration with: kib


164033 06-Nov-2006 rwatson

Sweep kernel replacing suser(9) calls with priv(9) calls, assigning
specific privilege names to a broad range of privileges. These may
require some future tweaking.

Sponsored by: nCircle Network Security, Inc.
Obtained from: TrustedBSD Project
Discussed on: arch@
Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri,
Alex Lyashkov <umka at sevcity dot net>,
Skip Ford <skip dot ford at verizon dot net>,
Antoine Brodin <antoine dot brodin at laposte dot net>


163606 22-Oct-2006 rwatson

Complete break-out of sys/sys/mac.h into sys/security/mac/mac_framework.h
begun with a repo-copy of mac.h to mac_framework.h. sys/mac.h now
contains the userspace and user<->kernel API and definitions, with all
in-kernel interfaces moved to mac_framework.h, which is now included
across most of the kernel instead.

This change is the first step in a larger cleanup and sweep of MAC
Framework interfaces in the kernel, and will not be MFC'd.

Obtained from: TrustedBSD Project
Sponsored by: SPARTA


163530 20-Oct-2006 kib

Update the access and modification times for dev while still holding
thread reference on it.

Reviewed by: tegge
Approved by: pjd (mentor)


163529 20-Oct-2006 kib

Fix the race between devfs_fp_check and devfs_reclaim. Dereference the
vnode's v_rdev and increment the dev threadcount, as well as clear it
(in devfs_reclaim), under the dev_lock().

Reviewed by: tegge
Approved by: pjd (mentor)


163481 18-Oct-2006 kib

Properly lock the vnode around vgone() calls.

Unlock the vnode in devfs_close() while calling into the driver d_close()
routine.

devfs_revoke() changes by: ups
Reviewed and bugfixes by: tegge
Tested by: mbr, Peter Holm
Approved by: pjd (mentor)
MFC after: 1 week


162647 26-Sep-2006 tegge

Use mount interlock to protect all changes to mnt_flag and mnt_kern_flag.
This eliminates a race where MNT_UPDATE flag could be lost when nmount()
raced against sync(), sync_fsync() or quotactl().


162443 19-Sep-2006 kib

Fix the bug in rev. 1.134. In devfs_allocv_drop_refs(), when not_found == 2
and drop_dm_lock is true, no unlocking shall be attempted. The lock is
already dropped and memory is freed.

Found with: Coverity Prevent(tm)
CID: 1536
Approved by: pjd (mentor)


162398 18-Sep-2006 kib

Resolve the devfs deadlock caused by LOR between devfs_mount->dm_lock and
vnode lock in devfs_allocv. Do this by temporarily dropping dm_lock around
vnode locking.

For safe operation, add hold counters for both devfs_mount and devfs_dirent,
and a DE_DOOMED flag for devfs_dirent. These facilities make it possible to
continue after dropping the dm_lock, by making sure that referenced memory
does not disappear.

Reviewed by: tegge
Tested by: kris
Approved by: kan (mentor)
PR: kern/102335


160425 17-Jul-2006 phk

Remove the NDEVFSINO and NDEVFSOVERFLOW options which no longer exist in
DEVFS.

Remove the opt_devfs.h file now that it is empty.


160310 12-Jul-2006 ups

Add vnode interlocking to devfs.
This prevents race conditions that can cause pagefaults or devfs
to use arbitrary vnodes.

MFC after: 1 week


160133 06-Jul-2006 rwatson

Remove now unneeded opt_mac.h and mac.h includes.

MFC after: 3 days


160132 06-Jul-2006 rwatson

Use #include "", not #include <> for opt_foo.h.

MFC after: 3 days


157685 12-Apr-2006 pjd

Remove unused prototypes.


157342 31-Mar-2006 jeff

- Add a bogus vhold/vdrop around vgone() in devfs_revoke. Without this
the vnode is never recycled. It is bogus because the reference really
should be associated with the devfs dirent.


155903 22-Feb-2006 jeff

- We must hold a reference to a vnode before calling vgone() otherwise
it may not be removed from the freelist.

MFC After: 1 week
Found by: kris


155034 30-Jan-2006 jeff

- Remove a stale comment. This function was rewritten to be SMP safe some
time ago.

Sponsored by: Isilon Systems, Inc.


153986 03-Jan-2006 rwatson

When returning EIO from DEVFSIO_RADD ioctl, drop the exclusive rule
lock. Otherwise the system comes to a rather sudden and grinding
halt.

MFC after: 1 week


152254 09-Nov-2005 dwhite

This is a workaround for a complicated issue involving VFS cookies and devfs.
The PR and patch have the details. The ultimate fix requires architectural
changes and clarifications to the VFS API, but this will prevent the system
from panicking when someone does "ls /dev" while running in a shell under the
linuxulator.

This issue affects HEAD and RELENG_6 only.

PR: 88249
Submitted by: "Devon H. O'Dell" <dodell@ixsystems.com>
MFC after: 3 days


151453 18-Oct-2005 phk

Use correct criteria for determining which directory entries we can
purge right away and which we merely can hide.

Beaten into my skull by: kris


150501 24-Sep-2005 phk

Make rule zero really magical, that way we don't have to do anything
when we mount and get zero cost if no rules are used in a mountpoint.

Add code to deref rules on unmount.

Switch from SLIST to TAILQ.

Drop SYSINIT, use SX_SYSINIT and static initializer of TAILQ instead.

Drop goto, a break will do.

Reduce double pointers to single pointers.

Combine reaping and destroying rulesets.

Avoid memory leaks in some error cases.


150342 19-Sep-2005 phk

Revamp DEVFS internals pretty severely [1].

Give DEVFS a proper inode called struct cdev_priv. It is important
to keep in mind that this "inode" is shared between all DEVFS
mountpoints, therefore it is protected by the global device mutex.

Link the cdev_priv's into a list, protected by the global device
mutex. Keep track of each cdev_priv's state with a flag bit and
of references from mountpoints with a dedicated usecount.

Reap the benefits of much improved kernel memory allocator and the
generally better defined device driver APIs to get rid of the tables
of pointers + serial numbers, their overflow tables, the atomics
to muck about in them and all the trouble that resulted in.

This makes RAM the only limit on how many devices we can have.

The cdev_priv is actually a super struct containing the normal cdev
as the "public" part, and therefore allocation and freeing has moved
to devfs_devs.c from kern_conf.c.

The overall responsibility is (to be) split such that kern/kern_conf.c
is the stuff that deals with drivers and struct cdev and fs/devfs
handles filesystems and struct cdev_priv and their private liaison
exposed only in devfs_int.h.

Move the inode number from cdev to cdev_priv and allocate inode
numbers properly with unr. Local dirents in the mountpoints
(directories, symlinks) allocate inodes from the same pool to
guarantee against overlaps.

Various other fields are going to migrate from cdev to cdev_priv
in the future in order to hide them. A few fields may migrate
from devfs_dirent to cdev_priv as well.

Protect the DEVFS mountpoint with an sx lock instead of lockmgr,
this lock also protects the directory tree of the mountpoint.

Give each mountpoint a unique integer index, allocated with unr.
Use it as an index into an array of devfs_dirent pointers in each cdev_priv.
Initially the array points to a single element also inside cdev_priv,
but as more devfs instances are mounted, the array is extended with
malloc(9) as necessary when the filesystem populates its directory
tree.

Retire the cdev alias lists, the cdev_priv now knows about all the
relevant devfs_dirents (and their vnodes) and devfs_revoke() will
pick them up from there. We still spelunk into other mountpoints
and fondle their data without 100% good locking. It may make better
sense to vector the revoke event into the tty code and there do a
destroy_dev/make_dev on the tty's devices, but that's for further
study.

Lots of shuffling of stuff and churn of bits for no good reason[2].

XXX: There is still nothing preventing the dev_clone EVENTHANDLER
from being invoked at the same time in two devfs mountpoints. It
is not obvious what the best course of action is here.

XXX: comment out an if statement that lost its body, until I can
find out what should go there so it doesn't do damage in the meantime.

XXX: Leave in a few extra malloc types and KASSERTS to help track
down any remaining issues.

Much testing provided by: Kris
Much confusion caused by (races in): md(4)

[1] You are not supposed to understand anything past this point.

[2] This line should simplify life for the peanut gallery.


150200 15-Sep-2005 phk

Don't attempt to recurse lockmgr, it doesn't like it.


150151 15-Sep-2005 phk

Various minor polishing.


150150 15-Sep-2005 phk

Protect the devfs rule internal global lists with a sx lock, the per
mount locks are not enough. Finer granularity (x)locking could be
implemented, but I prefer to keep it simple for now.


150149 15-Sep-2005 phk

Absolve devfs_rule.c from locking responsibility and call it with
all necessary locking held.


150147 15-Sep-2005 phk

Close a race which could result in unwarranted "ruleset %d already
running" panics.

Previously, recursion through the "include" feature was prevented by
marking each ruleset as "running" when applied. This doesn't work for
the case where two DEVFS instances try to apply the same ruleset at
the same time.

Instead introduce the sysctl vfs.devfs.rule_depth (default == 1) which
limits how many levels of "include" we will traverse.

Be aware that traversal of "include" is recursive and kernel stack
size is limited.

MFC: after 3 days
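
The depth-limited traversal described above can be sketched as
follows. This is a userland illustration under stated assumptions:
struct ruleset, ruleset_apply() and MAX_RULES are hypothetical, and the
depth argument plays the role of the vfs.devfs.rule_depth limit. The
point is that the recursion guard lives in the call (the depth
argument), not in a per-ruleset "running" mark, so two callers applying
the same ruleset at once do not trip over each other's state.

```c
#include <stddef.h>

#define MAX_RULES 8

/* Hypothetical ruleset: each rule may include another ruleset. */
struct ruleset {
    int                   nrules;
    const struct ruleset *include[MAX_RULES]; /* NULL for an ordinary rule */
};

/*
 * Apply a ruleset, following "include" rules at most `depth` levels deep.
 * *visits counts how many rulesets were entered.  Returns the number of
 * includes that were skipped because the depth limit was reached.
 */
int ruleset_apply(const struct ruleset *rs, int depth, int *visits)
{
    int skipped = 0;

    (*visits)++;
    for (int i = 0; i < rs->nrules; i++) {
        const struct ruleset *inc = rs->include[i];

        if (inc == NULL)
            continue;           /* ordinary rule: nothing to traverse */
        if (depth <= 0) {
            skipped++;          /* limit reached: do not recurse */
            continue;
        }
        skipped += ruleset_apply(inc, depth - 1, visits);
    }
    return skipped;
}
```

Even a ruleset that includes itself terminates: each nested call gets a
smaller depth, and the include is skipped once the depth reaches zero.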


150019 12-Sep-2005 phk

Clean up prototypes.


149573 29-Aug-2005 phk

Add a missing dev_relthread() call.

Remove unused variable.

Spotted by: Hans Petter Selasky <hselasky@c2i.net>


149177 17-Aug-2005 phk

Handle device drivers with D_NEEDGIANT in a way which does not
penalize the 'good' drivers: Allocate a shadow cdevsw and populate
it with wrapper functions which grab Giant
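
The shadow-table idea can be sketched in userland. Everything here is
a hypothetical model: struct dev, struct dev_ops, dev_attach() and the
giant_held flag (a single-threaded stand-in for Giant) are assumptions
for illustration, and one shared shadow table is used for simplicity
where the commit allocates a shadow cdevsw.

```c
#include <stddef.h>

struct dev;

/* Hypothetical method table with a single entry point. */
struct dev_ops {
    int (*d_read)(struct dev *);
};

struct dev {
    const struct dev_ops *ops;     /* table the framework calls through */
    const struct dev_ops *legacy;  /* driver's own table, set only when shadowed */
};

/* Single-threaded stand-in for Giant: 1 while "held". */
static int giant_held;

/* Wrapper: take the lock, call the driver's real method, drop the lock. */
int giant_read(struct dev *d)
{
    int r;

    giant_held = 1;
    r = d->legacy->d_read(d);
    giant_held = 0;
    return r;
}

/* The shadow table, populated with wrappers only. */
const struct dev_ops giant_ops = { giant_read };

/* Choose the table once, at attach time, so the call path never branches. */
void dev_attach(struct dev *d, const struct dev_ops *ops, int needs_giant)
{
    d->legacy = needs_giant ? ops : NULL;
    d->ops = needs_giant ? &giant_ops : ops;
}

/* Example driver method: reports whether the lock was held on entry. */
int probe_read(struct dev *d)
{
    (void)d;
    return giant_held;
}

const struct dev_ops probe_ops = { probe_read };
```

A driver attached without the flag is called directly through its own
table, so it pays nothing for the existence of the wrappers.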


149146 16-Aug-2005 phk

Collect the devfs related sysctls in one place


149144 16-Aug-2005 phk

Create a new internal .h file to communicate very private stuff
from kern_conf.c to devfs.

For now just two prototypes, more to come.


149107 15-Aug-2005 phk

Eliminate effectively unused dm_basedir field from devfs_mount.


148868 08-Aug-2005 rwatson

Merge the dev_clone and dev_clone_cred event handlers into a single
event handler, dev_clone, which accepts a credential argument.
Implementors of the event can ignore it if they're not interested,
and most do. This avoids having multiple event handler types and
fall-back/precedence logic in devfs.

This changes the kernel API for /dev cloning, and may affect third
party packages containing cloning kernel modules.

Requested by: phk
MFC after: 3 days


148547 29-Jul-2005 kris

devfs is not yet fully MPSAFE - for example, multiple concurrent devfs(8)
processes can cause a panic when operating on rulesets.

Approved by: phk


148182 20-Jul-2005 simon

Correct devfs ruleset bypass.

Submitted by: csjp
Reviewed by: phk
Security: FreeBSD-SA-05:17.devfs
Approved by: cperciva


147982 14-Jul-2005 rwatson

When devfs cloning takes place, provide access to the credential of the
process that caused the clone event to take place for the device driver
creating the device. This allows cloned device drivers to adapt the
device node based on security aspects of the process, such as the uid,
gid, and MAC label.

- Add a cred reference to struct cdev, so that when a device node is
instantiated as a vnode, the cloning credential can be exposed to
MAC.

- Add make_dev_cred(), a version of make_dev() that additionally
accepts the credential to stick in the struct cdev. Implement it and
make_dev() in terms of a back-end make_dev_credv().

- Add a new event handler, dev_clone_cred, which can be registered to
receive the credential instead of dev_clone, if desired.

- Modify the MAC entry point mac_create_devfs_device() to accept an
optional credential pointer (may be NULL), so that MAC policies can
inspect and act on the label or other elements of the credential
when initializing the skeleton device protections.

- Modify tty_pty.c to register dev_clone_cred and invoke make_dev_cred(),
so that the pty clone credential is exposed to the MAC Framework.

While currently primarily focussed on MAC policies, this change is also
a prerequisite for changes to allow ptys to be instantiated with the UID
of the process looking up the pty. This requires further changes to the
pty driver -- in particular, to immediately recycle pty nodes on last
close so that the credential-related state can be recreated on next
lookup.

Submitted by: Andrew Reisse <andrew.reisse@sparta.com>
Obtained from: TrustedBSD Project
Sponsored by: SPAWAR, SPARTA
MFC after: 1 week
MFC note: Merge to 6.x, but not 5.x for ABI reasons


146823 31-May-2005 rodrigc

Do not declare a struct as extern, and then implement
it as static in the same file. This is not legal C,
and GCC 4.0 will issue an error.

Reviewed by: phk
Approved by: das (mentor)


145730 01-May-2005 jeff

- In devfs_open() and devfs_close() grab Giant if the driver sets NEEDGIANT.
We still have to DROP_GIANT and PICKUP_GIANT when NEEDGIANT is not set
because vfs is still sometimes entered with Giant held.


145698 30-Apr-2005 jeff

- Mark devfs as MNTK_MPSAFE as I believe it does not require Giant.

Sponsored by: Isilon Systems, Inc.
Agreed in principle by: phk


145006 13-Apr-2005 jeff

- Change all filesystems and vfs_cache to relock the dvp once the child is
locked in the ISDOTDOT case. See vfs_lookup.c r1.79 for details.

Sponsored by: Isilon Systems, Inc.


144389 31-Mar-2005 phk

Explicitly hold a reference to the cdev we have just cloned. This
closes the race where the cdev was reclaimed before it ever made it
back to devfs lookup.


144385 31-Mar-2005 phk

cdev (still) needs per instance uid/gid/mode

Add unlocked version of dev_ref()

Clean up various stuff in sys/conf.h


144384 31-Mar-2005 phk

Rename dev_ref() to dev_refl()


144366 31-Mar-2005 jeff

- LK_NOPAUSE is a nop now.

Sponsored by: Isilon Systems, Inc.


144208 28-Mar-2005 jeff

- We no longer have to bother with PDIRUNLOCK, lookup() handles it for us.

Sponsored by: Isilon Systems, Inc.


144058 24-Mar-2005 jeff

- Update vfs_root implementations to match the new prototype. None of
these filesystems will support shared locks until they are explicitly
modified to do so. Careful review must be done to ensure that this
is safe for each individual filesystem.

Sponsored by: Isilon Systems, Inc.


143746 17-Mar-2005 phk

Prepare for the final onslaught on devices:

Move uid/gid/mode from cdev to cdevsw.

Add kind field to use for devd(8) later.

Bump both D_VERSION and __FreeBSD_version


143510 13-Mar-2005 jeff

- The VI_DOOMED flag now signals the end of a vnode's relationship with
the filesystem. Check that rather than VI_XLOCK.

Sponsored by: Isilon Systems, Inc.


143383 10-Mar-2005 phk

One more bit of the major/minor patch to make ttyname happy as well.


143381 10-Mar-2005 phk

Try to fix the mess I made of devname, with the minimal subset of the
larger minor/major patch which was posted for testing.


143303 08-Mar-2005 phk

Remove kernelside support for devfs rules filtering on major numbers.


142250 22-Feb-2005 phk

We may not have an actual cdev at this point.


142242 22-Feb-2005 phk

Reap more benefits from DEVFS:

List devfs_dirents rather than vnodes off their shared struct cdev, this
saves a pointer field in the vnode at the expense of a field in the
devfs_dirent. There are often 100 times more vnodes so this is a bargain.
In addition it makes it harder for people to try to do stupid things like
"finding the vnode from cdev".

Since DEVFS handles all VCHR nodes now, we can do the vnode related
cleanup in devfs_reclaim() instead of in dev_rel() and vgonel().
Similarly, we can do the struct cdev related cleanup in dev_rel()
instead of devfs_reclaim().

rename idestroy_dev() to destroy_devl() for consistency.

Add LIST_ENTRY de_alias to struct devfs_dirent.
Remove v_specnext from struct vnode.
Change si_hlist to si_alist in struct cdev.
String new devfs vnodes' devfs_dirent on si_alist when
we create them and take them off in devfs_reclaim().

Fix devfs_revoke() accordingly. Also don't clear fields
devfs_reclaim() will clear when called from vgone();

Let devfs_reclaim() call dev_rel() instead of vgonel().

Move the usecount tracking from dev_rel() to devfs_reclaim(),
and let dev_rel() take a struct cdev argument instead of vnode.

Destroy SI_CHEAPCLONE devices in dev_rel() (instead of
devfs_reclaim()) when they are no longer used. (This
should maybe happen in devfs_close() instead.)


142232 22-Feb-2005 phk

Make dev_ref() require the dev_lock() to be held and use it from
devfs instead of directly frobbing the si_refcount.


142011 17-Feb-2005 phk

Introduce vx_wait{l}() and use it instead of home-rolled versions.


141633 10-Feb-2005 phk

Make a SYSCTL_NODE static


141617 10-Feb-2005 phk

Statize devfs_ops_f


140939 28-Jan-2005 phk

Make filesystems get rid of their own vnodes vnode_pager object in
VOP_RECLAIM().


140196 13-Jan-2005 phk

Whitespace in vop_vector{} initializations.


140067 11-Jan-2005 phk

Silently ignore forced argument to unmount.


139776 06-Jan-2005 imp

/* -> /*- for copyright notices, minor format tweaks as necessary


139664 04-Jan-2005 phk

Unsupport forceful unmounts of DEVFS.

After discussing things I have decided to take the easy and
consistent 90% solution instead of aiming for the very involved 99%
solution.

If we allow forceful unmounts of DEVFS we need to decide how to handle
the devices which are in use through this filesystem at the time.

We cannot just readopt the open devices in the main /dev instance since
that would open us to security issues.

For the majority of the devices, this is relatively straightforward
as we can just pretend they got revoke(2)'ed.

Some devices get tricky: /dev/console and /dev/tty for instance
do a sort of recursive open of the real console device. Other devices
may be mmap'ed (kill the processes ?).

And then there are disk devices which are mounted.

The correct thing here would be to recursively unmount the filesystems
mounted from devices from our DEVFS instance (forcefully) and if
this succeeds, complete the forceful unmount of DEVFS. But if
one of the forceful unmounts fails we cannot complete the forceful
unmount of DEVFS, but we are likely to already have severed a lot
of stuff in the process of trying.

Even attempting this would be a lot of code for a very far out
corner-case which most people would never see or get in touch with.

It's just not worth it.


139189 22-Dec-2004 phk

Be consistent about flag values passed to device drivers read/write
methods:

Read can see O_NONBLOCK and O_DIRECT.

Write can see O_NONBLOCK, O_DIRECT and O_FSYNC.

In addition O_DIRECT is shadowed as IO_DIRECT for now for backwards
compatibility.


139188 22-Dec-2004 phk

Shuffle numeric values of the IO_* flags to match the O_* flags from
fcntl.h.

This is in preparation for making the flags passed to device drivers be
consistently from fcntl.h for all entrypoints.

Today open, close and ioctl use fcntl.h flags, while read and write
use vnode.h flags.


139085 20-Dec-2004 phk

We can only ever get to vgonechrl() from a devfs vnode, so we do not
need to reassign the vp->v_op to devfs_specops, we know that is the
value already.

Make devfs_specops private to devfs.


139083 20-Dec-2004 phk

Add a couple of KASSERTS to try to diagnose a problem reported.


138841 14-Dec-2004 phk

Be a bit more assertive about vnode bypass.


138791 13-Dec-2004 phk

Another FNONBLOCK -> O_NONBLOCK.

Don't unconditionally set IO_UNIT to device drivers in write: nobody
checks it, and since it was always set it did not carry information anyway.


138790 13-Dec-2004 phk

Use O_NONBLOCK instead of FNONBLOCK alias.


138788 13-Dec-2004 phk

Explicit panic in vop_read/vop_write for devices


138509 07-Dec-2004 phk

The remaining part of nmount/omount/rootfs mount changes. I cannot sensibly
split the conversion of the remaining three filesystems out from the root
mounting changes, so in one go:

cd9660:
Convert to nmount.
Add omount compat shims.
Remove dedicated rootfs mounting code.
Use vfs_mountedfrom()
Rely on vfs_mount.c calling VFS_STATFS()

nfs(client):
Convert to nmount (the simple way, mount_nfs(8) is still necessary).
Add omount compat shims.
Drop COMPAT_PRELITE2 mount arg compatibility.

ffs:
Convert to nmount.
Add omount compat shims.
Remove dedicated rootfs mounting code.
Use vfs_mountedfrom()
Rely on vfs_mount.c calling VFS_STATFS()

Remove vfs_omount() method, all filesystems are now converted.

Remove MNTK_WANTRDWR, handling RO/RW conversions is a filesystem
task, and they all do it now.

Change rootmounting to use DEVFS trampoline:

vfs_mount.c:
Mount devfs on /. Devfs needs no 'from' so this is clean.
symlink /dev to /. This makes it possible to lookup /dev/foo.
Mount "real" root filesystem on /.
Surgically move the devfs mountpoint from under the real root
filesystem onto /dev in the real root filesystem.

Remove now unnecessary getdiskbyname().

kern_init.c:
Don't do devfs mounting and rootvnode assignment here, it was
already handled by vfs_mount.c.

Remove now unused bdevvp(), addaliasu() and addalias(). Put the
few necessary lines in devfs where they belong. This eliminates the
second-last source of bogo vnodes, leaving only the lemming-syncer.

Remove rootdev variable, it doesn't give meaning in a global context and
was not trustworthy anyway. Correct information is provided by
statfs(/).


138481 06-Dec-2004 phk

Use vfs_mountedfrom() and rely on vfs_mount.c to call VFS_STATFS()


138412 05-Dec-2004 phk

VFS_STATFS(mp, ...) is mostly called with &mp->mnt_stat, but a few cases
don't. Most of the implementations have grown weeds for this so they
copy some fields from mnt_stat if the passed argument isn't that.

Fix this the cleaner way: Always call the implementation on mnt_stat
and copy that in toto to the VFS_STATFS argument if different.
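
A minimal userland sketch of the "call on mnt_stat, then copy" pattern
described above. All demo_* names are hypothetical stand-ins, not the
kernel's actual structures or the real VFS_STATFS() wrapper.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, trimmed-down stand-ins for struct statfs / struct mount. */
struct demo_statfs {
	long f_blocks;
	long f_bfree;
	char f_mntonname[32];
};

struct demo_mount {
	struct demo_statfs mnt_stat;
	int (*fs_statfs)(struct demo_mount *, struct demo_statfs *);
};

/* The filesystem method only ever sees the mount's own mnt_stat. */
static int
demo_fs_statfs(struct demo_mount *mp, struct demo_statfs *sbp)
{
	assert(sbp == &mp->mnt_stat);
	sbp->f_blocks = 100;
	sbp->f_bfree = 40;
	return (0);
}

/* Generic wrapper: fill mnt_stat, then copy it whole if the caller
 * passed a different buffer. */
static int
demo_vfs_statfs(struct demo_mount *mp, struct demo_statfs *sbp)
{
	int error;

	error = mp->fs_statfs(mp, &mp->mnt_stat);
	if (error != 0)
		return (error);
	if (sbp != &mp->mnt_stat)
		*sbp = mp->mnt_stat;
	return (0);
}
```

With this shape the per-filesystem code no longer needs to special-case
a foreign buffer; fields such as f_mntonname travel with the copy.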


138290 01-Dec-2004 phk

Back when VOP_* was introduced, we did not have new-style struct
initializations but we did have lofty goals and big ideals.

Adjust to more contemporary circumstances and gain type checking.

Replace the entire vop_t frobbing thing with properly typed
structures. The only casualty is that we can not add a new
VOP_ method with a loadable module. History has not given
us reason to believe this would ever be feasible in the
first place.

Eliminate in toto VOCALL(), vop_t, VNODEOP_SET() etc.

Give coda correct prototypes and function definitions for
all vop_()s.

Generate a bit more data from the vnode_if.src file: a
struct vop_vector and prototype typedefs for all vop methods.

Add a new vop_bypass() and make vop_default be a pointer
to another struct vop_vector.

Remove a lot of vfs_init since vop_vector is ready to use
from the compiler.

Cast various vop_mumble() to void * with uppercase name,
for instance VOP_PANIC, VOP_NULL etc.

Implement VCALL() by making vdesc_offset the offsetof() the
relevant function pointer in vop_vector. This is disgusting
but since the code is generated by a script comparatively
safe. The alternative for nullfs etc. would be much worse.

Fix up all vnode method vectors to remove casts so they
become typesafe. (The bulk of this is generated by scripts)
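
The offsetof()-based dispatch with a vop_default fallback can be
illustrated in userland. This is only a sketch under stated assumptions:
every demo_* name is hypothetical, and the real vop_vector, VCALL() and
generated code differ in detail.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical method type; the kernel uses one typedef per VOP. */
typedef int demo_op_t(int arg);

struct demo_vop_vector {
	const struct demo_vop_vector *vop_default;	/* fallback vector */
	demo_op_t *vop_open;
	demo_op_t *vop_close;
};

static int demo_std_open(int arg)  { return (arg + 1); }
static int demo_std_close(int arg) { return (arg + 2); }
static int demo_fs_open(int arg)   { return (arg * 10); }

static const struct demo_vop_vector demo_default_ops = {
	.vop_open = demo_std_open,
	.vop_close = demo_std_close,
};

/* Only open is overridden; close falls through to the default vector. */
static const struct demo_vop_vector demo_fs_ops = {
	.vop_default = &demo_default_ops,
	.vop_open = demo_fs_open,
};

/*
 * Find a method by its byte offset inside the vector, walking the
 * vop_default chain until a non-NULL entry turns up.
 */
static int
demo_vcall(const struct demo_vop_vector *vop, size_t offset, int arg)
{
	demo_op_t *fn;

	for (; vop != NULL; vop = vop->vop_default) {
		memcpy(&fn, (const char *)vop + offset, sizeof(fn));
		if (fn != NULL)
			return (fn(arg));
	}
	return (-1);	/* no implementation anywhere in the chain */
}
```

A caller would pass offsetof(struct demo_vop_vector, vop_close) as the
offset; a layered filesystem can forward any method this way without
knowing its name.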


138270 01-Dec-2004 phk

Mechanically change prototypes for vnode operations to use the new typedefs.


138106 26-Nov-2004 phk

Ignore MNT_NODEV, it is implicit in choice of filesystem these days.


137800 17-Nov-2004 phk

Make vnode bypass for devices mandatory.


137755 15-Nov-2004 phk

Make vnode bypass the default for devices.

Can be disabled in case of problems with
vfs.devfs.fops=0
in loader.conf


137679 13-Nov-2004 phk

Integrate most of vop_revoke() into devfs_revoke() where it belongs.


137678 13-Nov-2004 phk

Add the devfs_fp_check() function which helps us get from a struct file
to a cdev and a devsw, doing all the relevant checks along the way.

Add the check to see if fp->f_vnode->v_rdev differs from our cached
fp->f_data copy of our cdev. If it does the device was revoked and
we return ENXIO.
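
A rough sketch of the revocation check described above, assuming
hypothetical demo_* structures in place of struct file, struct vnode and
struct cdev. The NULL test is an added assumption of this sketch; the
real devfs_fp_check() also resolves the devsw and does further checks.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel objects involved. */
struct demo_cdev  { int unit; };
struct demo_vnode { struct demo_cdev *v_rdev; };
struct demo_file  { struct demo_vnode *f_vnode; void *f_data; };

/*
 * Compare the vnode's current device with the copy cached in the file
 * at open time.  A mismatch means the device was revoked.
 */
static int
demo_fp_check(const struct demo_file *fp, struct demo_cdev **devp)
{
	struct demo_cdev *dev;

	dev = fp->f_vnode->v_rdev;
	if (dev == NULL || dev != fp->f_data)
		return (ENXIO);
	*devp = dev;
	return (0);
}
```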


137478 09-Nov-2004 phk

Refuse attempts to mount root filesystem


137382 08-Nov-2004 phk

Add optional device vnode bypass to DEVFS.

The tunable vfs.devfs.fops controls this feature and defaults to off.

When enabled (vfs.devfs.fops=1 in loader), device vnodes opened
through a filedescriptor get a special fops vector which instead
of the detour through the vnode layer goes directly to DEVFS.

Amongst other things this allows us to run Giant free read/write to
device drivers which have been weaned off D_NEEDGIANT.

Currently this means /dev/null, /dev/zero, disks, (and maybe the
random stuff ?)

On a 700MHz K7 machine this doubles the speed of
dd if=/dev/zero of=/dev/null bs=1 count=1000000

This roughly translates to shaving 2usec off each read/write syscall.

The poll/kqfilter paths need more work before they are giant free,
this work is ongoing in p4::phk_bufwork

Please test this and report any problems, LORs etc.


137308 06-Nov-2004 phk

Properly implement a default version of VOP_GETWRITEMOUNT.

Remove improper access to vop_stdgetwritemount() which should and
will instead rely on the VOP default path.


137195 04-Nov-2004 phk

Add back securelevel check for disks.

XXX: This should live in geom_dev.c but we don't have access to the
cred there.
XXX: XXX: This may not matter anymore since filesystems use geom_vfs.


137047 29-Oct-2004 phk

Don't give disks special treatment, they don't come this way anymore.


137043 29-Oct-2004 phk

Remove VOP_SPECSTRATEGY() from the system.


137029 29-Oct-2004 phk

Give dev_strategy() an explicit cdev argument in preparation for removing
buf->b_dev.

Put a bio between the buf passed to dev_strategy() and the device driver
strategy routine in order to not clobber fields in the buf.

Assert copyright on vfs_bio.c and update copyright message to canonical
text. There is no legal difference between John Dyson's two-clause
abbreviated BSD license and the canonical text.


137006 28-Oct-2004 phk

What can I say: don't allow people to mount DEVFS with option "nodev".


136966 26-Oct-2004 phk

Put the I/O block size in bufobj->bo_bsize.

We keep si_bsize_phys around for now as that is the simplest way to pull
the number out of disk device drivers in devfs_open(). The correct solution
would be to do an ioctl(DIOCGSECTORSIZE), but the point is probably moot
when filesystems sit on GEOM, so don't bother for now.


136770 22-Oct-2004 phk

Alas, poor SPECFS! -- I knew him, Horatio; A filesystem of infinite
jest, of most excellent fancy: he hath taught me lessons a thousand
times; and now, how abhorred in my imagination it is! my gorge rises
at it. Here were those hacks that I have curs'd I know not how
oft. Where be your kludges now? your workarounds? your layering
violations, that were wont to set the table on a roar?

Move the skeleton of specfs into devfs where it now belongs and
bury the rest.


135727 24-Sep-2004 phk

XXX mark two places where we do not hold a threadcount on the dev when
frobbing the cdevsw.

In both cases we examine only the cdevsw and it is a good question if we
weren't better off copying those properties into the cdev in the first
place. This question will be revisited.


132902 30-Jul-2004 phk

Put a version element in the VFS filesystem configuration structure
and refuse initializing filesystems with a wrong version. This will
aid maintenance activities on the 5-stable branch.

s/vfs_mount/vfs_omount/

s/vfs_nmount/vfs_mount/

Name our filesystems mount function consistently.

Eliminate the nameidata argument to both vfs_mount and vfs_omount.
It was originally there to save stack space. A few places abused
it to get hold of some credentials to pass around. Effectively
it is unused.

Reorganize the root filesystem selection code.


132653 26-Jul-2004 cperciva

Rename suser_cred()'s PRISON_ROOT flag to SUSER_ALLOWJAIL. This is
somewhat clearer, but more importantly allows for a consistent naming
scheme for suser_cred flags.

The old name is still defined, but will be removed in a few days (unless I
hear any complaints...)

Discussed with: rwatson, scottl
Requested by: jhb


132547 22-Jul-2004 rwatson

In devfs_allocv(), rather than assigning 'td = curthread', assert that
the caller passes in a td that is curthread, and consistently pass 'td'
into vget(). Remove some bogus logic that passed in td or curthread
conditional on td being non-NULL, which seems redundant in the face of
the earlier assignment of td to curthread if td is NULL.

In devfs_symlink(), cache the passed thread in 'td' so we don't have
to keep retrieving it from the 'ap' structure, and assert that td is
curthread (since we dereference it to get thread-local td_ucred). Use
'td' in preference to curthread for later lockmgr calls, since they are
equal.


132023 12-Jul-2004 alfred

Make VFS_ROOT() and vflush() take a thread argument.
This is to allow filesystems to decide based on the passed thread
which vnode to return.
Several filesystems used curthread, they now use the passed thread.


130678 18-Jun-2004 phk

Reduce a fair bit of the atomics because we are now called with a
lock from kern_conf.c and cdev's act a lot more like real objects
these days.


130640 17-Jun-2004 phk

Second half of the dev_t cleanup.

The big lines are:
NODEV -> NULL
NOUDEV -> NODEV
udev_t -> dev_t
udev2dev() -> findcdev()

Various minor adjustments including handling of userland access to kernel
space struct cdev etc.


130585 16-Jun-2004 phk

Do the dreaded s/dev_t/struct cdev */
Bump __FreeBSD_version accordingly.


126019 19-Feb-2004 phk

Report the correct length for symlink entries.


125855 15-Feb-2004 phk

White-space align a struct definition.
Move a SYSINIT to the file where it belongs.


124804 21-Jan-2004 cperciva

Fix style(9) of my previous commit.

Noticed by: nate
Approved by: nate, rwatson (mentor)


124798 21-Jan-2004 cperciva

Allow devfs path rules to work on directories. Without this fix,
devfs rule add path fd unhide
is a no-op, while it should unhide the fd subdirectory.

Approved by: phk, rwatson (mentor)
PR: kern/60897


124081 02-Jan-2004 phk

Improve on POLA by populating DEVFS before doing devfs(8) rule ioctls.

PR: 60687
Spotted by: Colin Percival <cperciva@daemonology.net>


122524 12-Nov-2003 rwatson

Modify the MAC Framework so that instead of embedding a (struct label)
in various kernel objects to represent security data, we embed a
(struct label *) pointer, which now references labels allocated using
a UMA zone (mac_label.c). This allows the size and shape of struct
label to be varied without changing the size and shape of these kernel
objects, which become part of the frozen ABI with 5-STABLE. This opens
the door for boot-time selection of the number of label slots, and hence
changes to the bound on the number of simultaneous labeled policies
at boot-time instead of compile-time. This also makes it easier to
embed label references in new objects as required for locking/caching
with fine-grained network stack locking, such as inpcb structures.

This change also moves us further in the direction of hiding the
structure of kernel objects from MAC policy modules, not to mention
dramatically reducing the number of '&' symbols appearing in both the
MAC Framework and MAC policy modules, and improving readability.

While this results in minimal performance change with MAC enabled, it
will observably shrink the size of a number of critical kernel data
structures for the !MAC case, and should have a small (but measurable)
performance benefit (i.e., struct vnode, struct socket) due to memory
conservation and reduced cost of zeroing memory.

NOTE: Users of MAC must recompile their kernel and all MAC modules as a
result of this change. Because this is an API change, third party
MAC modules will also need to be updated to make less use of the '&'
symbol.

Suggestions from: bmilekic
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories


121281 20-Oct-2003 phk

Remember to check the DE_WHITEOUT flag in the case where a cloned
device is hidden by a devfs(8) rule.

Spotted by: Adam Nowacki <ptnowak@bsk.vectranet.pl>


121270 20-Oct-2003 phk

When a driver successfully created a device on demand, we can directly
pick up the DEVFS inode number from the dev_t and find our directory
entry from that, we don't need to scan the directory to find it.

This also solves an issue with on-demand devices in subdirectories.

Submitted by: cognet


116271 12-Jun-2003 phk

Initialize struct vfsops C99-sparsely.

Submitted by: hmp
Reviewed by: phk


115511 31-May-2003 phk

Remove unused variable.

Found by: FlexeLint


112119 11-Mar-2003 kan

Rename vfs_stdsync function to vfs_stdnosync which matches more
closely what function is really doing. Update all existing consumers
to use the new name.

Introduce a new vfs_stdsync function, which iterates over mount
point's vnodes and call FSYNC on each one of them in turn.

Make nwfs and smbfs use this new function instead of rolling their
own identical sync implementations.

Reviewed by: jeff


111841 03-Mar-2003 njl

Finish cleanup of vprint() which was begun with changing v_tag to a string.
Remove extraneous uses of vop_null, instead deferring to the default op.
Rename vnode type "vfs" to the more descriptive "syncer".
Fix formatting for various filesystems that use vop_print.


111742 02-Mar-2003 des

Clean up whitespace, s/register //, refrain from strong urge to ANSIfy.


111741 02-Mar-2003 des

uiomove-related caddr_t -> void * (just the low-hanging fruit)


111730 02-Mar-2003 phk

NODEVFS cleanup:

Replace devfs_{create,destroy} hooks with direct function calls.


111119 19-Feb-2003 imp

Back out M_* changes, per decision of the TRB.

Approved by: trb


110063 29-Jan-2003 phk

NODEVFS cleanup: remove #ifdefs.


109623 21-Jan-2003 alfred

Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0.
Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.


109526 19-Jan-2003 phk

Originally when DEVFS was added, a global variable "devfs_present"
was used to control code which was conditional on DEVFS' presence
since this avoided the need for large-scale source pollution with
#include "opt_geom.h"

Now that we approach making DEVFS standard, replace these tests
with an #ifdef to facilitate mechanical removal once DEVFS becomes
non-optional.

No functional change by this commit.


109202 13-Jan-2003 phk

Even if the permissions deny it, a process should be allowed to
access its controlling terminal.

In essence, history dictates that any process is allowed to open
/dev/tty for RW, irrespective of credential, because by definition
it is its own controlling terminal.

Before DEVFS we relied on a hacky half-device thing (kern/tty_tty.c)
which did the magic deep down at device level, which at best was
disgusting from an architectural point of view.

My first shot at this was to use the cloning mechanism to simply
give people the right tty when they ask for /dev/tty, that's why
you get this, slightly counter intuitive result:

syv# ls -l /dev/tty `tty`
crw--w---- 1 u1 tty 5, 0 Jan 13 22:14 /dev/tty
crw--w---- 1 u1 tty 5, 0 Jan 13 22:14 /dev/ttyp0

Trouble is, when user u1 su(1)'s to user u2, he cannot open
/dev/ttyp0 anymore because he doesn't have permission to do so.

The above fix allows him to do that.

The interesting side effect is that one was previously only able
to access the controlling tty by indirection:
date > /dev/tty
but not by name:
date > `tty`

This is now possible, and that feels a lot more like DTRT.

PR: 46635
MFC candidate: could be.


109090 11-Jan-2003 dd

Add symlink support to devfs_rule_matchpath(). This allows the user
to unhide symlinks as well as hide them.


108648 04-Jan-2003 phk

Since Jeffr made the std* functions the default in rev 1.63 of
kern/vfs_defaults.c it is wrong for the individual filesystems to use
the std* functions as that prevents override of the default.

Found by: src/tools/tools/vop_table


108341 28-Dec-2002 rwatson

Trim left-over and unused vop_refreshlabel() bits from devfs.

Reported by: bde


107698 09-Dec-2002 rwatson

Remove dm_root entry from struct devfs_mount. It's never set, and is
unused. Replace it with a dm_mount back-pointer to the struct mount
that the devfs_mount is associated with. Export that pointer to MAC
Framework entry points, where all current policies don't use the
pointer. This permits the SEBSD port of SELinux's FLASK/TE to compile
out-of-the-box on 5.0-CURRENT with full file system labeling support.

Approved by: re (murray)
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories


105988 26-Oct-2002 rwatson

Slightly change the semantics of vnode labels for MAC: rather than
"refreshing" the label on the vnode before use, just get the label
right from inception. For single-label file systems, set the label
in the generic VFS getnewvnode() code; for multi-label file systems,
leave the labeling up to the file system. With UFS1/2, this means
reading the extended attribute during vfs_vget() as the inode is
pulled off disk, rather than hitting the extended attributes
frequently during operations later, improving performance. This
also corrects semantics for shared vnode locks, which were not
previously present in the system. This changes the cache
coherency properties WRT out-of-band access to label data, but in
an acceptable form. With UFS1, there is a small race condition
during automatic extended attribute start -- this is not present
with UFS2, and occurs because EAs aren't available at vnode
inception. We'll introduce a work around for this shortly.

Approved by: re
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories


105585 20-Oct-2002 rwatson

Missed a case of _POSIX_MAC_PRESENT -> _PC_MAC_PRESENT rename.

Pointed out by: phk


105212 16-Oct-2002 phk

Fix comments and one resulting code confusion about the type of the
"command" argument to VOP_IOCTL.

Spotted by: FlexeLint.


105210 16-Oct-2002 phk

A better solution to avoiding variable sized structs in DEVFS.


105209 16-Oct-2002 phk

#include "opt_devfs.h" to protect against variable sized structures.

Spotted by: FlexeLint


104908 11-Oct-2002 mike

Change iov_base's type from `char *' to the standard `void *'. All
uses of iov_base which assume its type is `char *' (in order to do
pointer arithmetic) have been updated to cast iov_base to `char *'.
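
For illustration, the kind of cast this change requires wherever an
iovec is advanced by a byte count. The helper name demo_iov_advance() is
hypothetical and not taken from the tree.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>

/*
 * Advance an iovec by n bytes, as after a partial transfer.
 * iov_base is void *, so it must be cast to char * before the
 * pointer arithmetic.
 */
static void
demo_iov_advance(struct iovec *iov, size_t n)
{
	assert(n <= iov->iov_len);
	iov->iov_base = (char *)iov->iov_base + n;
	iov->iov_len -= n;
}
```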


104653 08-Oct-2002 dd

Treat the pathptrn field as a real pattern with the aid of fnmatch().
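
A small userland sketch of pattern matching with fnmatch(3). The helper
demo_rule_matchpath() and its slash_explicit switch are hypothetical;
which flags devfs itself passes is an assumption here. With no flags a
wildcard may match '/'; FNM_PATHNAME is the standard way to require an
explicit '/', which is the behaviour r253677 later describes.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stdbool.h>

/*
 * Return true if path matches the shell-style pattern.  When
 * slash_explicit is set, '/' in path must be matched by a literal
 * '/' in the pattern (FNM_PATHNAME).
 */
static bool
demo_rule_matchpath(const char *pattern, const char *path,
    bool slash_explicit)
{
	int flags;

	flags = slash_explicit ? FNM_PATHNAME : 0;
	return (fnmatch(pattern, path, flags) == 0);
}
```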


104533 05-Oct-2002 rwatson

Integrate a devfs/MAC fix from the MAC tree: avoid a race condition during
devfs VOP symlink creation by introducing a new entry point to determine
the label of the devfs_dirent prior to allocation of a vnode for the
symlink.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories


104278 01-Oct-2002 phk

Move the vop-vector declaration into devfs_vnops.c where it belongs.


104113 28-Sep-2002 phk

s/struct dev_t */dev_t */


104099 28-Sep-2002 phk

Fix mis-indent.


103559 18-Sep-2002 njl

Remove any VOP_PRINT that redundantly prints the tag.
Move lockmgr_printinfo() into vprint() for everyone's benefit.

Suggested by: bde


103314 14-Sep-2002 njl

Remove all use of vnode->v_tag, replacing with appropriate substitutes.
v_tag is now const char * and should only be used for debugging.

Additionally:
1. All users of VT_NFS now check vfsconf->vf_type VFCF_NETWORK
2. The user of VT_PROCFS now checks for the new flag VV_PROCDEP, which
is propagated by pseudofs to all child vnodes if the fs sets PFS_PROCDEP.

Suggested by: phk
Reviewed by: bde, rwatson (earlier version)


101777 13-Aug-2002 phk

Introduce typedefs for the member functions of struct vfsops and employ
these in the main filesystems. This does not change the resulting code
but makes the source a little bit more grepable.

Sponsored by: DARPA and NAI Labs.


101308 04-Aug-2002 jeff

- Replace v_flag with v_iflag and v_vflag
- v_vflag is protected by the vnode lock and is used when synchronization
with VOP calls is needed.
- v_iflag is protected by interlock and is used for dealing with vnode
management issues. These flags include X/O LOCK, FREE, DOOMED, etc.
- All accesses to v_iflag and v_vflag have either been locked or marked with
mp_fixme's.
- Many ASSERT_VOP_LOCKED calls have been added where the locking was not
clear.
- Many functions in vfs_subr.c were restructured to provide for stronger
locking.

Idea stolen from: BSD/OS


101195 02-Aug-2002 rwatson

Introduce support for Mandatory Access Control and extensible
kernel access control.

Teach devfs how to respond to pathconf() _POSIX_MAC_PRESENT queries,
allowing it to indicate to user processes that individual vnode labels
are available.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs


101191 01-Aug-2002 rwatson

Hook up devfs_pathconf() for specfs devfs nodes, not just regular
devfs nodes.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs


101069 31-Jul-2002 rwatson

Introduce support for Mandatory Access Control and extensible
kernel access control.

Instrument devfs to support per-dirent MAC labels. In particular,
invoke MAC framework when devfs directory entries are instantiated
due to make_dev() and related calls, and invoke the MAC framework
when vnodes are instantiated from these directory entries. Implement
vop_setlabel() for devfs, which pushes the label update into the
devfs directory entry for semi-persistent store. This permits the MAC
framework to assign labels to devices and directories as they are
instantiated, and export access control information via devfs vnodes.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs


100994 30-Jul-2002 rwatson

Introduce support for Mandatory Access Control and extensible
kernel access control.

Label devfs directory entries, permitting labels to be maintained
on device nodes in devfs instances persistently despite vnode
recycling.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs


100804 28-Jul-2002 dd

Correct misindentation of DRA_UID.


100793 28-Jul-2002 dd

Unimplement panic(8) by making sure that we don't recurse into a
ruleset. If we do, that means there's a ruleset loop (10 includes 20
includes 30 includes 10), which will quickly cause a double fault due
to stack overflow (since "include" is implemented by recursion).
(Previously, we only checked that X didn't include X.)
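
One way to catch such include loops is to mark a ruleset while it is
being applied and refuse to re-enter it. The sketch below assumes a
hypothetical demo_ruleset with at most one include each; it is not the
devfs implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical ruleset: at most one "include" for brevity. */
struct demo_ruleset {
	int include;	/* index of the included ruleset, or -1 */
	bool running;	/* set while this ruleset is being applied */
	int applied;	/* how many times the ruleset has been run */
};

/*
 * Apply ruleset idx and, recursively, whatever it includes.
 * Returns -1 if a ruleset is re-entered, i.e. the includes loop.
 */
static int
demo_ruleset_apply(struct demo_ruleset *sets, int idx)
{
	struct demo_ruleset *rs = &sets[idx];
	int error = 0;

	if (rs->running)
		return (-1);	/* already on the call stack: loop */
	rs->running = true;
	rs->applied++;
	if (rs->include >= 0)
		error = demo_ruleset_apply(sets, rs->include);
	rs->running = false;
	return (error);
}
```

Because the flag is cleared on the way out, a ruleset may still be
included from several places, just not from within itself.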


100206 17-Jul-2002 dd

Introduce the DEVFS "rule" subsystem. DEVFS rules permit the
administrator to define certain properties of new devfs nodes before
they become visible to the userland. Both static (e.g., /dev/speaker)
and dynamic (e.g., /dev/bpf*, some removable devices) nodes are
supported. Each DEVFS mount may have a different ruleset assigned to
it, permitting different policies to be implemented for things like
jails.

Approved by: phk


97702 01-Jun-2002 semenu

Make devfs honour the PDIRUNLOCK flag.

Reviewed by: jeff
MFC after: 1 week


96356 10-May-2002 mux

Fix several bugs in devfs_lookupx(). When we check the nameiop to
make sure it's a correct operation for devfs, do it only in the
ISLASTCN case. If we don't, we are assuming that the final file will
be in devfs, which is not true if another partition is mounted on top
of devfs or with special filenames (like /dev/net/../../foo).

Reviewed by: phk


95954 02-May-2002 mux

Convert devfs to nmount.

Reviewed by: phk


95750 29-Apr-2002 rwatson

Use vnode locking with devfs; permit VFS locking assertions to make
sense for devfs vnodes, and reduce/remove potential races in the devfs
code.

Submitted by: iadowse
Approved by: phk


95212 21-Apr-2002 bde

Don't attempt to declare M_DEVFS when MALLOC_DECLARE is not defined.
This fixes warnings that should be errors in fstat.

Reminded by: alpha tinderbox

Fixed some style bugs (ones near BOF and EOF; there are many more).


93886 05-Apr-2002 bde

Fixed assorted bugs in setting of timestamps in devfs_setattr().

Setting of timestamps on devices had no effect visible to userland
because timestamps for devices were set in places that are never used.
This broke:
- update of file change time after a change of an attribute
- setting of file access and modification times.

The VA_UTIMES_NULL case did not work. Revs 1.31-1.32 were supposed to
fix this by copying correct bits from ufs, but had little or no effect
because the old checks were not removed.


93593 01-Apr-2002 jhb

Change the suser() API to take advantage of td_ucred as well as do a
general cleanup of the API. The entire API now consists of two functions
similar to the pre-KSE API. The suser() function takes a thread pointer
as its only argument. The td_ucred member of this thread must be valid
so the only valid thread pointers are curthread and a few kernel threads
such as thread0. The suser_cred() function takes a pointer to a struct
ucred as its first argument and an integer flag as its second argument.
The flag is currently only used for the PRISON_ROOT flag.

Discussed on: smp@


92727 19-Mar-2002 alfred

Remove __P.


92270 14-Mar-2002 maxim

Be consistent with UFS in a way how devfs_setattr() checks credentials
for chmod(2), chown(2) and utimes(2) with respect to jail(2).

Reviewed by: rwatson, ru
Not objected by: phk
Approved by: ru


89118 09-Jan-2002 msmith

Add a new sysinit SI_SUB_DEVFS. Devfs hooks into the kernel at SI_ORDER_FIRST,
and devices can be created anytime after that.

Print a warning if an attempt is made to create a device too early.


89107 09-Jan-2002 msmith

Use a sysinit to initialise the devfs hooks in kern_conf.c rather than common
variables.

Reviewed by: phk (in principle)


86892 25-Nov-2001 dd

Address two minor issues: implement the _PC_NAME_MAX and _PC_PATH_MAX
pathconf() variables for directories, and set st_size and st_blocks
(of struct stat) for directories as appropriate. Note that st_size is
always set to DEV_BSIZE, since the size of the directories is not
currently kept.

Reviewed by: phk, bde


86040 04-Nov-2001 phk

Fix "echo > /dev/null" for non-root users which broke in previous commit.


85980 03-Nov-2001 phk

Use vfs_timestamp() instead of getnanotime().

Add magic stuff copied from ufs_setattr().

Instructed by: bde


85979 03-Nov-2001 phk

Use vfs_timestamp() instead of getnanotime() directly.
Fix some modes on directories and symlinks.

Instructed by: bde


84873 13-Oct-2001 bde

Backed out vestiges of the quick fixes for the transient breakage of
<sys/mount.h> in rev.1.106 of the latter (don't include <sys/socket.h>
just to work around bugs in <sys/mount.h>).


84156 30-Sep-2001 phk

The behaviour of whiteout'ing symlinks was too confusing; instead
remove them when asked to.


83366 12-Sep-2001 julian

KSE Milestone 2
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to the previous -current except
that there is a thread associated with each process.

Sorry john! (your next MFC will be a doosie!)

Reviewed by: peter@freebsd.org, dillon@freebsd.org

X-MFC after: ha ha ha ha


81620 14-Aug-2001 phk

linux ls fails on DEVFS /dev because linux_getdents fails because
linux_getdents uses VOP_READDIR( ..., &ncookies, &cookies ) instead of
VOP_READDIR( ..., NULL, NULL ) because it seems to need the offsets for
linux_dirent and sizeof(dirent) != sizeof(linux_dirent)...

PR: 29467
Submitted by: Michael Reifenberger <root@nihil.plaut.de>
Reviewed by: phk


77589 01-Jun-2001 brian

Support /dev/tun cloning. Ansify if_tun.c while I'm there.

Only tun0 -> tun32767 may now be opened as struct ifnet's if_unit
is a short.

It's now possible to open /dev/tun and get a handle back for an available
tun device (use devname to find out what you got).

The implementation uses rman by popular demand (and against my judgement)
to track opened devices and uses the new dev_depends() to ensure that
all make_dev()d devices go away before the module is unloaded.

Reviewed by: phk


77243 26-May-2001 phk

Don't copy the trailing zero in readlink, it confuses namei().

PR: 27656


77215 26-May-2001 phk

Create a general facility for making dev_t's depend on another
dev_t. The dev_depends(dev_t, dev_t) function is for tying them
to each other.

When destroy_dev() is called on a dev_t, all dev_t's depending
on it will also be destroyed (depth first order).

Rewrite the make_dev_alias() to use this dependency facility.

kern/subr_disk.c:
Make the disk mini-layer use dependencies to make sure all
relevant dev_t's are removed when the disk disappears.

Make the disk mini-layer precreate some magic sub devices
which the disk/slice/label code expects to be there.

kern/subr_disklabel.c:
Remove some now unneeded variables.

kern/subr_diskmbr.c:
Remove some ancient, commented out code.

kern/subr_diskslice.c:
Minor cleanup. Use name from dev_t instead of dsname()
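
The dependency idea can be sketched in userland as follows. All demo_*
names are hypothetical, and the choice to destroy dependents before the
device they depend on is this sketch's reading of "depth first order",
not a statement about the kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_MAXCHILD 4

/* Hypothetical device record with a short list of dependents. */
struct demo_dev {
	const char *name;
	bool destroyed;
	int order;	/* position in the destruction sequence */
	struct demo_dev *children[DEMO_MAXCHILD];
	size_t nchildren;
};

static int demo_destroy_seq;

/* Record that child depends on parent. */
static int
demo_dev_depends(struct demo_dev *parent, struct demo_dev *child)
{
	if (parent->nchildren == DEMO_MAXCHILD)
		return (-1);
	parent->children[parent->nchildren++] = child;
	return (0);
}

/* Destroy every dependent first (depth first), then the device. */
static void
demo_destroy_dev(struct demo_dev *dev)
{
	size_t i;

	if (dev->destroyed)
		return;
	for (i = 0; i < dev->nchildren; i++)
		demo_destroy_dev(dev->children[i]);
	dev->nchildren = 0;
	dev->destroyed = true;
	dev->order = ++demo_destroy_seq;
}
```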


77050 23-May-2001 phk

Change the way deletes are managed in DEVFS.

This fixes a number of warnings relating to removed cloned devices.

It also makes it possible to recreate deleted devices with
mknod(2). The major/minor arguments are ignored.


76688 16-May-2001 iedowse

Change the second argument of vflush() to an integer that specifies
the number of references on the filesystem root vnode to be both
expected and released. Many filesystems hold an extra reference on
the filesystem root vnode, which must be accounted for when
determining if the filesystem is busy and then released if it isn't
busy. The old `skipvp' approach required individual filesystem
xxx_unmount functions to re-implement much of vflush()'s logic to
deal with the root vnode.

All 9 filesystems that hold an extra reference on the root vnode
got the logic wrong in the case of forced unmounts, so `umount -f'
would always fail if there were any extra root vnode references.
Fix this issue centrally in vflush(), now that we can.

This commit also fixes a vnode reference leak in devfs, which could
result in idle devfs filesystems that refuse to unmount.

Reviewed by: phk, bp


76571 14-May-2001 phk

After a successful poll of the cloning functions, match on the
returned dev_t rather than the original name.

This allows cloning from one name to another which is useful for
/dev/tty and later for the pty's.


76554 13-May-2001 phk

Convert DEVFS from an "opt-in" to an "opt-out" option.

If for some reason DEVFS is undesired, the "NODEVFS" option is
needed now.

Pending any significant issues, DEVFS will be made mandatory in
-current on July 1st so that we can start reaping the full
benefits of having it.


76320 06-May-2001 phk

Remove unneeded devfs_badop()

Noticed by: rwatson


76166 01-May-2001 markm

Undo part of the tangle of having sys/lock.h and sys/mutex.h included in
other "system" header files.

Also help the deprecation of lockmgr.h by making it a sub-include of
sys/lock.h and removing sys/lockmgr.h from kernel .c files.

Sort sys/*.h includes where possible in affected files.

OK'ed by: bde (with reservations)


76131 29-Apr-2001 phk

Add a vop_stdbmap(), and make it part of the default vop vector.

Make 7 filesystems which don't really know about VOP_BMAP rely
on the default vector, rather than more or less complete local
vop_nopbmap() implementations.


75874 23-Apr-2001 mjacob

add this ridiculous include foo so it will compile again


73286 01-Mar-2001 adrian

Reviewed by: jlemon

An initial tidyup of the mount() syscall and VFS mount code.

This code replaces the earlier work done by jlemon in an attempt to
make linux_mount() work.

* the guts of the mount work has been moved into vfs_mount().

* move `type', `path' and `flags' from being userland variables into being
kernel variables in vfs_mount(). `data' remains a pointer into
userspace.

* Attempt to verify the `type' and `path' strings passed to vfs_mount()
aren't too long.

* rework mount() and linux_mount() to take the userland parameters
(besides data, as mentioned) and pass kernel variables to vfs_mount().
(linux_mount() already did this, I've just tidied it up a little more.)

* remove the copyin*() stuff for `path'. `data' still requires copyin*()
since it's a pointer into userland.

* set `mount->mnt_statf_mntonname' in vfs_mount() rather than in each
filesystem. This variable is generally initialised with `path', and
each filesystem can override it if they want to.

* NOTE: f_mntonname is initialised with "/" in the case of a root mount.


72637 18-Feb-2001 phk

Remove a debug printf.


71945 02-Feb-2001 phk

At the point in time where most devices are created, we don't know what
time it is because boottime is not yet initialized. Finagle the relevant
fields when we get the chance.


71936 02-Feb-2001 phk

Only superuser can create symlinks.
Give symlinks mode 755 by default to avoid triggering alert eyes.
(the mode isn't used on symlinks)


71822 30-Jan-2001 phk

Fix two minor nits.

Existences revealed, but no details offered by: bp


69781 08-Dec-2000 dwmalone

Convert more malloc+bzero to malloc+M_ZERO.

Submitted by: josh@zipperup.org
Submitted by: Robert Drehmel <robd@gmx.net>
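
The conversion pattern named in this entry can be illustrated with a small userland sketch. It is not kernel code: `sketch_malloc()` and `SK_ZERO` are hypothetical stand-ins for the kernel allocator and its M_ZERO flag, and `struct node` is invented for the example.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical flag standing in for the kernel's M_ZERO. */
#define SK_ZERO 0x0001

/* Hypothetical userland stand-in for the kernel allocator:
 * zero the buffer only when the caller asks for it. */
static void *
sketch_malloc(size_t size, int flags)
{
	void *p = malloc(size);

	if (p != NULL && (flags & SK_ZERO) != 0)
		memset(p, 0, size);
	return (p);
}

/* Invented structure, used only to have something to allocate. */
struct node {
	int	id;
	char	name[16];
};

/* Old pattern: allocate, then clear in a separate step. */
static struct node *
node_alloc_old(void)
{
	struct node *n = sketch_malloc(sizeof(*n), 0);

	if (n != NULL)
		memset(n, 0, sizeof(*n));	/* bzero() in the original code */
	return (n);
}

/* New pattern: request zeroed memory in a single call. */
static struct node *
node_alloc_new(void)
{
	return (sketch_malloc(sizeof(struct node), SK_ZERO));
}
```

Both helpers return the same zero-filled object; the second form simply removes the separate clearing step from each call site.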


69767 08-Dec-2000 phk

staticize.


67893 29-Oct-2000 phk

Move suser() and suser_xxx() prototypes and a related #define from
<sys/proc.h> to <sys/systm.h>.

Correctly document the #includes needed in the manpage.

Add one now needed #include of <sys/systm.h>.
Remove the consequent 48 unused #includes of <sys/proc.h>.


67882 29-Oct-2000 phk

Remove unneeded #include <sys/proc.h> lines.


66877 09-Oct-2000 phk

Don't hold an extra reference to vnodes. Devfs vnodes are sufficiently
cheap to setup that it doesn't really matter that we recycle device
vnodes at kleenex speed.

Implement first cut try at killing cloned devices when they are
not needed anymore. For now only the bpf driver is involved in
this experiment. Cloned devices can set the SI_CHEAPCLONE flag
which allows us to destroy_dev() it when the vcount() drops to zero
and the vnode is reclaimed. For now it's a requirement that the
driver doesn't keep persistent state from close to (re)open.

Some whitespace changes.


66615 04-Oct-2000 jasone

Convert lockmgr locks from using simple locks to using mutexes.

Add lockdestroy() and appropriate invocations, which corresponds to
lockinit() and must be called to clean up after a lockmgr lock is no
longer needed.


66028 18-Sep-2000 phk

Ignore attempts to set flags to zero. This quenches a syslog warning
from login(1).


65920 16-Sep-2000 phk

Add canonical checks to devfs_setattr().


65788 12-Sep-2000 jhb

Use size_t instead of u_int for 4th argument to copyinstr().


65515 06-Sep-2000 phk

Add refcounts to the "global" DEVFS inode slots, this allows us
to recycle inodes after a destroy_dev() but not until all mounts
have picked up the change.

Add support for an overflow table for DEVFS inodes. The static
table defaults to 1024 inodes, if that fills, an overflow table
of 32k inodes is allocated. Both numbers can be changed at
compile time, the size of the overflow table also with the
sysctl vfs.devfs.noverflow.

Use atomic instructions to barrier between make_dev()/destroy_dev()
and the mounts.

Add lockmgr() locking of directories for operations accessing or
modifying the directory TAILQs.

Various nitpicking here and there.
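
A minimal userland sketch of the static-table-plus-overflow idea described in this entry follows. It takes "32k" to mean 32768 entries, allocates the overflow table lazily on first use, and omits the refcounts, atomic barriers and lockmgr() locking entirely. The names `static_tbl`, `overflow_tbl` and `slot_for()` are hypothetical and are not taken from the devfs sources.

```c
#include <stddef.h>
#include <stdlib.h>

#define NSTATIC		1024	/* static table size named in the log entry */
#define NOVERFLOW	32768	/* assumed meaning of "32k" */

static void *static_tbl[NSTATIC];
static void **overflow_tbl;	/* NULL until the static table is exceeded */

/*
 * Return a pointer to the slot for inode number `ino'.
 * Low numbers live in the static table; higher numbers go to the
 * overflow table, which is allocated the first time it is needed.
 * Returns NULL if `ino' is out of range or the allocation fails.
 */
static void **
slot_for(unsigned int ino)
{
	if (ino < NSTATIC)
		return (&static_tbl[ino]);
	if (ino >= NSTATIC + NOVERFLOW)
		return (NULL);
	if (overflow_tbl == NULL) {
		overflow_tbl = calloc(NOVERFLOW, sizeof(*overflow_tbl));
		if (overflow_tbl == NULL)
			return (NULL);
	}
	return (&overflow_tbl[ino - NSTATIC]);
}
```

In this sketch a system that never creates more than NSTATIC nodes never pays for the larger table.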


65447 04-Sep-2000 phk

Off by one error.

Submitted by: des


65374 02-Sep-2000 phk

Avoid the modules madness I inadvertently introduced by making the
cloning infrastructure standard in kern_conf. Modules are now
the same with or without devfs support.

If you need to detect if devfs is present, in modules or elsewhere,
check the integer variable "devfs_present".

This happily removes an ugly hack from kern/vfs_conf.c.

This forces a rename of the eventhandler and the standard clone
helper function.

Include <sys/eventhandler.h> in <sys/conf.h>: it's a helper #include
like <sys/queue.h>

Remove all #includes of opt_devfs.h they no longer matter.


65200 29-Aug-2000 rwatson

o Restructure vaccess() so as to check for DAC permission to modify the
object before falling back on privilege. Make vaccess() accept an
additional optional argument, privused, to determine whether
privilege was required for vaccess() to return 0. Add commented
out capability checks for reference. Rename some variables to make
it more clear which modes/uids/etc are associated with the object,
and which with the access mode.
o Update file system use of vaccess() to pass NULL as the optional
privused argument. Once additional patches are applied, suser()
will no longer set ASU, so privused will permit passing of
privilege information up the stack to the caller.

Reviewed by: bde, green, phk, -security, others
Obtained from: TrustedBSD Project
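
The ordering described in this entry (discretionary permission bits first, privilege only as a fallback, with an optional out-parameter reporting whether privilege was needed) can be sketched in userland C. This is a simplified illustration and not the vaccess() implementation: `struct cred_sketch`, `access_sketch()` and the `ACC_*` constants are hypothetical, only a single group is checked, and failure is reported as -1 rather than an errno value.

```c
#include <stddef.h>

/* Hypothetical access-mode bits, laid out like one rwx triplet. */
#define ACC_READ	04
#define ACC_WRITE	02
#define ACC_EXEC	01

/* Hypothetical, much-reduced credential. */
struct cred_sketch {
	unsigned int	uid;
	unsigned int	gid;
	int		is_super;	/* nonzero if the caller holds privilege */
};

/*
 * Return 0 if access is granted, -1 otherwise.
 * If `privused' is not NULL, it is set to 1 only when the request
 * succeeded because of privilege rather than the permission bits.
 */
static int
access_sketch(unsigned int file_mode, unsigned int file_uid,
    unsigned int file_gid, int acc_mode, const struct cred_sketch *cr,
    int *privused)
{
	unsigned int dac, want = (unsigned int)acc_mode;

	if (privused != NULL)
		*privused = 0;

	/* Pick the owner, group or other triplet for this caller. */
	if (cr->uid == file_uid)
		dac = (file_mode >> 6) & 07;
	else if (cr->gid == file_gid)
		dac = (file_mode >> 3) & 07;
	else
		dac = file_mode & 07;

	/* Discretionary check first. */
	if ((dac & want) == want)
		return (0);

	/* Fall back on privilege and record that it was required. */
	if (cr->is_super) {
		if (privused != NULL)
			*privused = 1;
		return (0);
	}
	return (-1);
}
```

Passing NULL for `privused` matches the way the entry says file systems call the function for now.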


65132 27-Aug-2000 phk

Reorder vop's alphabetically.
Smarter use of devfs_allocv() (from bp@)
Introduce devfs_find()
".." fixes to devfs_lookup (from bp@)


65118 26-Aug-2000 phk

Minor cleanups to devfs_readdir().
Add devfs_read() for directories. (inspired by bp@)


65051 24-Aug-2000 phk

Fix panic when removing open device (found by bp@)
Implement subdirs.
Build the full "devicename" for cloning functions.
Fix panic when deleted device goes away.
Collapse devfs_dir and devfs_dirent structures.
Add proper cloning to the /dev/fd* "device-"driver.
Fix a bug in make_dev_alias() handling which made aliases appear
multiple times.
Use devfs_clone to implement getdiskbyname()
Make specfs maintain the stat(2) timestamps per dev_t


64895 21-Aug-2000 phk

Fix devfs_access() bug on directories.

Remove unused #includes.

Bug spotted by: markm


64880 20-Aug-2000 phk

Remove all traces of Julian's DEVFS (incl. from kern/subr_diskslice.c)

Remove old DEVFS support fields from dev_t.

Make uid, gid & mode members of dev_t and set them in make_dev().

Use correct uid, gid & mode in make_dev in disk minilayer.

Add support for registering alias names for a dev_t using the
new function make_dev_alias(). These will show up as symlinks
in DEVFS.

Use makedev() rather than make_dev() for MFS's magic devices to prevent
DEVFS from noticing this abuse.

Add a field for DEVFS inode number in dev_t.

Add new DEVFS in fs/devfs.

Add devfs cloning to:
disk minilayer (ie: ad(4), sd(4), cd(4) etc etc)
md(4), tun(4), bpf(4), fd(4)

If DEVFS, add the -d flag to /sbin/init's args to make it mount devfs.

Add commented out DEVFS to GENERIC