History log of /freebsd-current/sys/kern/subr_disk.c
Revision Date Author Comments
# 685dc743 16-Aug-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove $FreeBSD$: one-line .c pattern

Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/


# 3c0177b8 25-Oct-2020 Alexander Motin <mav@FreeBSD.org>

Enable bioq 'car limit' added at r335066 at 128 bios.

Without the 'car limit' enabled (before this), running sequential ZFS scrub
on HDD without command queuing support, I've measured latency on concurrent
random reads reaching 4 seconds (surprised that not more). Enabling this
reduced the latency to 65 milliseconds, while the scrub was still doing ~180MB/s.

For disks with command queuing this does not make much difference (if any),
since most of the time all the requests are queued down to the disk or HBA, leaving
nothing in the queue to sort. And even if something does not fit, staying on
the queue, it is likely not for long. To not limit sorting in such bursty
scenarios I've added batched counter zeroing when the queue is getting empty.

The internal scheduler of the SAS HDD I was testing seems to be even more
loyal to random I/O, reducing the scrub speed to ~120MB/s. So in case
somebody is worried this limit is too strict -- it actually looks relaxed.

MFC after: 2 weeks
Sponsored by: iXsystems, Inc.


# 7e48d711 11-Mar-2019 Warner Losh <imp@FreeBSD.org>

Fix botched merge with r335066

When merging from Netflix's tree, resetting the carsize was dropped
accidentally. This fix fixes that revision by properly resetting how
many are in the car.

Noticed by: mav@


# 6afd9210 30-Jan-2019 Alexander Motin <mav@FreeBSD.org>

Only sort requests of types that have concept of offset.

Other types, such as BIO_FLUSH or BIO_ZONE, or especially new/unknown ones,
may imply some degree of ordering even if strict ordering is not requested
explicitly.

MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
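
A minimal sketch of the idea in C, not the in-tree code: a request is
sorted only if its command carries an offset, and everything else,
including commands the sketch does not know about, is treated as
order-sensitive. The sk_ names, the command codes, and the choice of
read, write, and delete as the offset-bearing commands are assumptions
made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical command codes; the kernel's BIO_* values differ. */
enum sk_cmd { SK_READ = 1, SK_WRITE, SK_DELETE, SK_FLUSH, SK_ZONE };

struct sk_bio {
	int  cmd;	/* one of enum sk_cmd, or an unknown value */
	long offset;	/* meaningful only for offset-bearing commands */
};

/* Return true if the request may be reordered by offset. */
static bool
sk_may_sort(const struct sk_bio *bp)
{
	assert(bp != NULL);
	switch (bp->cmd) {
	case SK_READ:
	case SK_WRITE:
	case SK_DELETE:
		return (true);
	default:
		/* Flush, zone, and unknown commands keep their place. */
		return (false);
	}
}
```

Listing the sortable commands and defaulting to "do not sort" means a
newly introduced command is conservative by default, which matches the
concern about new/unknown types in the message above.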


# a971acbc 13-Jun-2018 Warner Losh <imp@FreeBSD.org>

Implement a 'car limit' for bioq.

Allow one to implement a 'car limit' for
bioq_disksort. debug.bioq_batchsize sets the size of car limit. Every
time we queue that many requests, we start over so that we limit the
latency for requests when the software queue depths are large. A value
of '0', the default, means to revert to the old behavior.

Sponsored by: Netflix
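
The batching described above can be modelled roughly as follows. This
is a simplified sketch under stated assumptions, not the kernel code:
the queue is a fixed-size array, requests are sorted by plain ascending
offset, nothing is ever dequeued, and all car_ names are hypothetical.
Only the batchsize field corresponds to the debug.bioq_batchsize knob.

```c
#include <assert.h>
#include <stddef.h>

#define CAR_QCAP 64	/* hypothetical fixed capacity for the sketch */

struct car_queue {
	long   offsets[CAR_QCAP];	/* queued offsets, in dispatch order */
	size_t len;			/* number of queued requests */
	size_t car_start;		/* index where the current car begins */
	size_t batched;			/* sorted inserts into the current car */
	size_t batchsize;		/* car limit; 0 disables it */
};

/* Insert by ascending offset, but only within the current car. */
static void
car_disksort(struct car_queue *q, long offset)
{
	size_t i;

	assert(q->len < CAR_QCAP);
	/* The car is full: close it and start a new one at the tail. */
	if (q->batchsize > 0 && q->batched >= q->batchsize) {
		q->car_start = q->len;
		q->batched = 0;
	}
	/* Shift larger offsets up, never reaching before car_start. */
	i = q->len;
	while (i > q->car_start && q->offsets[i - 1] > offset) {
		q->offsets[i] = q->offsets[i - 1];
		i--;
	}
	q->offsets[i] = offset;
	q->len++;
	q->batched++;
}
```

With batchsize set to 2, inserting offsets 50, 10, 5, 1 leaves the queue
as 10, 50, 1, 5: the later requests cannot jump ahead of the closed
car, so a request already queued waits for at most one car's worth of
newcomers. With batchsize 0 the same inserts give 1, 5, 10, 50.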


# 64de3fdd 30-Nov-2017 Pedro F. Giffuni <pfg@FreeBSD.org>

SPDX: use the Beerware identifier.


# a7d5f7eb 19-Oct-2010 Jamie Gritton <jamie@FreeBSD.org>

A new jail(8) with a configuration file, to replace the work currently done
by /etc/rc.d/jail.


# f03f7a0c 02-Sep-2010 Justin T. Gibbs <gibbs@FreeBSD.org>

Correct bioq_disksort so that bioq_insert_tail() offers barrier semantics.
Add the BIO_ORDERED flag for struct bio and update bio clients to use it.

The barrier semantics of bioq_insert_tail() were broken in two ways:

o In bioq_disksort(), an added bio could be inserted at the head of
the queue, even when a barrier was present, if the sort key for
the new entry was less than that of the last queued barrier bio.

o The last_offset used to generate the sort key for newly queued bios
did not stay at the position of the barrier until either the
barrier was de-queued, or a new barrier (which updates last_offset)
was queued. When a barrier is in effect, we know that the disk
will pass through the barrier position just before the
"blocked bios" are released, so using the barrier's offset for
last_offset is the optimal choice.

sys/geom/sched/subr_disk.c:
sys/kern/subr_disk.c:
o Update last_offset in bioq_insert_tail().

o Only update last_offset in bioq_remove() if the removed bio is
at the head of the queue (typically due to a call via
bioq_takefirst()) and no barrier is active.

o In bioq_disksort(), if we have a barrier (insert_point is non-NULL),
set prev to the barrier and cur to its next element. Now that
last_offset is kept at the barrier position, this change isn't
strictly necessary, but since we have to take a decision branch
anyway, it does avoid one, no-op, loop iteration in the while
loop that immediately follows.

o In bioq_disksort(), bypass the normal sort for bios with the
BIO_ORDERED attribute and instead insert them into the queue
with bioq_insert_tail(). bioq_insert_tail() not only gives
the desired command order during insertion, but also provides
barrier semantics so that commands disksorted in the future
cannot pass the just enqueued transaction.

sys/sys/bio.h:
Add BIO_ORDERED as bit 4 of the bio_flags field in struct bio.

sys/cam/ata/ata_da.c:
sys/cam/scsi/scsi_da.c
Use an ordered command for SCSI/ATA-NCQ commands issued in
response to bios with the BIO_ORDERED flag set.

sys/cam/scsi/scsi_da.c
Use an ordered tag when issuing a synchronize cache command.

Wrap some lines to 80 columns.

sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
sys/geom/geom_io.c
Mark bios with the BIO_FLUSH command as BIO_ORDERED.

Sponsored by: Spectra Logic Corporation
MFC after: 1 month
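
The barrier behaviour can be illustrated with a small array-based model.
This is a sketch under stated assumptions rather than the committed
code: the bar_ names are hypothetical, sorting is by plain ascending
offset, the last_offset bookkeeping is left out, and the ordered field
stands in for the BIO_ORDERED flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BAR_QCAP 32	/* hypothetical fixed capacity for the sketch */

struct bar_bio {
	long offset;
	bool ordered;	/* stands in for BIO_ORDERED */
};

struct bar_queue {
	struct bar_bio q[BAR_QCAP];
	size_t len;
	size_t barrier_end;	/* entries before this index cannot be passed */
};

/* Append and make the new entry a barrier for later sorted inserts. */
static void
bar_insert_tail(struct bar_queue *h, struct bar_bio b)
{
	assert(h->len < BAR_QCAP);
	h->q[h->len++] = b;
	h->barrier_end = h->len;
}

/* Sorted insert that never places an entry ahead of the barrier. */
static void
bar_disksort(struct bar_queue *h, struct bar_bio b)
{
	size_t i;

	if (b.ordered) {
		/* Ordered requests bypass the sort entirely. */
		bar_insert_tail(h, b);
		return;
	}
	assert(h->len < BAR_QCAP);
	for (i = h->len; i > h->barrier_end &&
	    h->q[i - 1].offset > b.offset; i--)
		h->q[i] = h->q[i - 1];
	h->q[i] = b;
	h->len++;
}
```

Queuing offsets 30 and 10, then an ordered request at 20, then 5 and 1
yields 10, 30, 20, 1, 5: the two late requests are sorted among
themselves but stay behind the ordered one even though their offsets
are smaller.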


# 1a996ed1 18-Jul-2010 Edward Tomasz Napierala <trasz@FreeBSD.org>

Revert r210225 - turns out I was wrong; the "/*-" is not a license-only
thing; it's also used to indicate that the comment should not be automatically
rewrapped.

Explained by: cperciva@


# 805cc58a 18-Jul-2010 Edward Tomasz Napierala <trasz@FreeBSD.org>

The "/*-" comment marker is supposed to denote copyrights. Remove non-copyright
occurrences from sys/sys/ and sys/kern/.


# d4619572 13-Feb-2009 Luigi Rizzo <luigi@FreeBSD.org>

Clarify and reimplement the bioq API so that bioq_disksort() has
the correct behaviour (sorting by distance from the current head position
in the scan direction) and bioq_insert_head() and bioq_insert_tail()
have a well defined (and useful) behaviour, especially when intermixed
with calls to bioq_disksort().

In particular:
- fix a bug in the existing bioq_disksort() that did not use the
current head position correctly;
- redefine semantics of bioq_insert_head() and bioq_insert_tail().
bioq_insert_tail() can now be used as a barrier
between previous and subsequent calls to bioq_disksort().

The code is heavily documented in the source code so please refer
to that for the details.

Much of this code comes from Fabio Checconi. Also thanks to Kirk
for feedback on the (re)definition of bioq_insert_tail().

NOTE: in the current tree there is only a handful of files which
intermix calls to bioq_disksort() with bioq_insert_head() and
bioq_insert_tail(). The ordering of the queue in these situations
was not specified (nor easy to figure out) before, so I doubt any
of that code could be affected by the specification of the API.

Also note that the current implementation is significantly simpler
than the previous one (also used in ata_sort_queue()).
It would be useful to reimplement ata_sort_queue() using
the same code used in bioq_disksort().

MFC after: 1 week
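
One way to obtain "distance from the current head position in the scan
direction" is an unsigned difference, sketched below. The function name
and parameter types are assumptions for illustration; the message above
defers to the source comments for the actual details.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Sort key for a request at bio_offset when the head was last seen at
 * last_offset. Offsets at or beyond the head give small keys; offsets
 * behind the head wrap around to very large keys, so they sort after
 * everything still ahead in the current sweep.
 */
static uint64_t
head_sort_key(int64_t bio_offset, int64_t last_offset)
{
	assert(bio_offset >= 0 && last_offset >= 0);
	/* Cast first so the subtraction wraps instead of overflowing. */
	return ((uint64_t)bio_offset - (uint64_t)last_offset);
}
```

With the head at offset 100, a request at 150 gets key 50 while a
request at 90 gets a key close to the maximum 64-bit value, so an
ascending sort on the key serves 150 first and picks up 90 on the next
pass.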


# 13b4c4c3 03-Feb-2009 Warner Losh <imp@FreeBSD.org>

Make bioq_disksort have an ANSI-C definition rather than a K&R definition.


# d7f03759 19-Oct-2008 Ulf Lilleengen <lulf@FreeBSD.org>

- Import the HEAD csup code which is the basis for the cvsmode work.


# c3618c65 31-Oct-2006 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Add a new I/O request - BIO_FLUSH, which basically tells providers below to
flush their caches. For now will mostly be used by disks to flush their
write cache.

Sponsored by: home.pl


# 56e26c3e 28-May-2006 Xin LI <delphij@FreeBSD.org>

Unexpand TAILQ_FIRST(foo) == NULL to TAILQ_EMPTY(foo).


# bc03ea7f 13-Jan-2006 Robert Watson <rwatson@FreeBSD.org>

When calling bioq_first() to see if a queue is empty in bioq_disksort(),
don't save the return value as we won't use it.

Noticed by: Coverity Prevent analysis tool
MFC after: 3 days


# bdcd9f26 15-Jun-2005 Jeff Roberson <jeff@FreeBSD.org>

- Fix insertions of bios which represent data earlier than anything else
in the queue. The insertion sort assumed this had already been taken
care of.

Spotted by: Antoine Brodin
Approved by: re (scottl)


# f19f6869 12-Jun-2005 Jeff Roberson <jeff@FreeBSD.org>

- Dramatically simplify bioqdisksort(). We no longer do ordered bios so
most of the code to deal with them has been dead for some time. Simplify
the code by doing an insert sort hinted by the current head position.

Met with apathy by: arch@


# 9454b2d8 06-Jan-2005 Warner Losh <imp@FreeBSD.org>

/* -> /*- for copyright notices, minor format tweaks as necessary


# bf484316 12-Dec-2004 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Add bioq_insert_head() function.

OK'd by: phk


# d298f919 19-Aug-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Add bioq_takefirst().

If the bioq is empty, NULL is returned. Otherwise the front element
is removed and returned.

This can simplify locking in many drivers from:

	lock();
	bp = bioq_first(bq);
	if (bp == NULL) {
		unlock();
		return;
	}
	bioq_remove(bq, bp);
	unlock();
to:
	lock();
	bp = bioq_takefirst(bq);
	unlock();
	if (bp == NULL)
		return;
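
The take-first operation itself is small. The sketch below models it on
a hand-rolled singly linked FIFO rather than the kernel's bio queue; the
tf_ names and types are hypothetical, and locking is left to the caller
as in the message above.

```c
#include <assert.h>
#include <stddef.h>

struct tf_item {
	struct tf_item *next;
	int id;
};

struct tf_queue {
	struct tf_item *first;
	struct tf_item *last;
};

/* Append an item to the end of the queue. */
static void
tf_insert_tail(struct tf_queue *q, struct tf_item *it)
{
	assert(q != NULL && it != NULL);
	it->next = NULL;
	if (q->last != NULL)
		q->last->next = it;
	else
		q->first = it;
	q->last = it;
}

/* Remove and return the front item, or NULL if the queue is empty. */
static struct tf_item *
tf_takefirst(struct tf_queue *q)
{
	struct tf_item *it;

	assert(q != NULL);
	it = q->first;
	if (it != NULL) {
		q->first = it->next;
		if (q->first == NULL)
			q->last = NULL;
		it->next = NULL;
	}
	return (it);
}
```

Because the emptiness check and the removal happen in one call, the
caller needs only a single lock/unlock pair around it and can test the
result for NULL after dropping the lock.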


# 1ad9172f 18-Oct-2003 Poul-Henning Kamp <phk@FreeBSD.org>

Report bio_pblkno instead of bio_blkno.


# 4cb4df48 18-Oct-2003 Poul-Henning Kamp <phk@FreeBSD.org>

Make bioq_disksort() sort on the bio_offset field instead of bio_pblkno.


# b8404473 14-Oct-2003 Poul-Henning Kamp <phk@FreeBSD.org>

Made use of 'error' argument, which was unused (by mistake) before.

Submitted by: Pawel Jakub Dawidek <nick@garage.freebsd.pl>


# 677b542e 10-Jun-2003 David E. O'Brien <obrien@FreeBSD.org>

Use __FBSDID().


# a3007012 16-Apr-2003 Poul-Henning Kamp <phk@FreeBSD.org>

Don't include <sys/disklabel.h>


# b0fc6220 03-Apr-2003 Poul-Henning Kamp <phk@FreeBSD.org>

Remove BIO_SETATTR from non-GEOM part of kernel as well.


# 81750927 01-Apr-2003 Poul-Henning Kamp <phk@FreeBSD.org>

#include <geom/geom_disk.h>


# af6ca7f4 31-Mar-2003 Poul-Henning Kamp <phk@FreeBSD.org>

Introduce bioq_flush() function.


# d2a0822e 30-Mar-2003 Poul-Henning Kamp <phk@FreeBSD.org>

retire the "busy" field in bioqueues, it's served its purpose.


# d086f85a 30-Mar-2003 Poul-Henning Kamp <phk@FreeBSD.org>

Preparation commit before I start on the bioqueue lockdown:

Collect all the bits of bioqueue handling in subr_disk.c, vfs_bio.c is big
enough as it is and disksort already lives in subr_disk.c.


# b4b138c2 18-Mar-2003 Poul-Henning Kamp <phk@FreeBSD.org>

Including <sys/stdint.h> is (almost?) universally only to be able to use
%j in printfs, so put a nested include in <sys/systm.h> where the printf
prototype lives and save everybody else the trouble.


# a9463ba8 03-Mar-2003 Poul-Henning Kamp <phk@FreeBSD.org>

Don't pick up a name from the dev_t if it is not there.


# 8e670757 29-Jan-2003 Poul-Henning Kamp <phk@FreeBSD.org>

NO_GEOM cleanup: remove #ifdef


# 44956c98 21-Jan-2003 Alfred Perlstein <alfred@FreeBSD.org>

Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0.
Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.


# 0b4583e8 20-Jan-2003 Poul-Henning Kamp <phk@FreeBSD.org>

Only include <sys/diskslice.h> ifdef NO_GEOM


# e03486d1 21-Oct-2002 Kirk McKusick <mckusick@FreeBSD.org>

This checkin reimplements the io-request priority hack in a way
that works in the new threaded kernel. It was commented out of
the disksort routine earlier this year for the reasons given in
kern/subr_disklabel.c (which is where this code used to reside
before it moved to kern/subr_disk.c):

----------------------------
revision 1.65
date: 2002/04/22 06:53:20; author: phk; state: Exp; lines: +5 -0
Comment out Kirk's io-request priority hack until we can do this in a
civilized way which doesn't cause grief.

The problem is that it is not generally safe to cast a "struct bio
*" to a "struct buf *". Things like ccd, vinum, ata-raid and GEOM
construct bios which are not entrails of a struct buf.

Also, curthread may or may not have anything to do with the I/O request
at hand.

The correct solution can either be to tag struct bio's with a
priority derived from the requesting thread's nice and have disksort
act on this field, though this wouldn't address the "silly-seek syndrome"
where two equal processes bang the diskheads from one edge to the
other of the disk repeatedly.

Alternatively, and probably better: a sleep should be introduced
either at the time the I/O is requested or at the time it is completed
where we can be sure to sleep in the right thread.

The sleep also needs to be in constant timeunits, 1/hz can be practically
any sub-second size, at high HZ the current code practically doesn't
do anything.
----------------------------

As suggested in this comment, it is no longer located in the disk sort
routine, but rather now resides in spec_strategy where the disk operations
are being queued by the thread that is associated with the process that
is really requesting the I/O. At that point, the disk queues are not
visible, so the I/O for positively niced processes is always slowed
down whether or not there is other activity on the disk.

On the issue of scaling HZ, I believe that the current scheme is
better than using a fixed quantum of time. As machines and I/O
subsystems get faster, the resolution on the clock also rises.
So, ten years from now we will be slowing things down for shorter
periods of time, but the proportional effect on the system will
be about the same as it is today. So, I view this as a feature
rather than a drawback. Hence this patch sticks with using HZ.

Sponsored by: DARPA & NAI Labs.
Reviewed by: Poul-Henning Kamp <phk@critter.freebsd.dk>


# e3bf3aea 21-Oct-2002 Olivier Houchard <cognet@FreeBSD.org>

One #include <sys/sysctl.h> should be enough.

Approved by: mux (mentor)


# 2e307eb8 17-Oct-2002 Maxim Sobolev <sobomax@FreeBSD.org>

Separate fields reported by disk_err() with spaces, so that output doesn't
look cryptic.

MFC after: 1 week


# 64b023f4 14-Oct-2002 Poul-Henning Kamp <phk@FreeBSD.org>

Populate more fields of the disklabel for PC98.

Submitted by: Kawanobe Koh <kawanobe@st.rim.or.jp>


# 3bd65612 05-Oct-2002 Poul-Henning Kamp <phk@FreeBSD.org>

NB: This commit does *NOT* make GEOM the default in FreeBSD
NB: But it will enable it in all kernels not having options "NO_GEOM"

Put the GEOM related options into the intended order.

Add "options NO_GEOM" to all kernel configs apart from NOTES.

In some order of controlled fashion, the NO_GEOM options will be
removed, architecture by architecture in the coming days.

There are currently three known issues which may force people to
need the NO_GEOM option:

boot0cfg/fdisk:
Tries to update the MBR while it is being used to control
slices. GEOM does not allow this as a direct operation.

SCSI floppy drives:
Apparently the scsi-da driver returns "EBUSY" if no media
is inserted. This is wrong, it should return ENXIO.

PC98:
It is unclear if GEOM correctly recognizes all variants of
PC98 disklabels. (Help Wanted! I have neither docs nor HW)

These issues are all being worked on.

Sponsored by: DARPA & NAI Labs.


# 52ae0b7f 05-Oct-2002 Brian Somers <brian@FreeBSD.org>

If dsgetlabel() returns a label with a size of zero in diskdumpconf(),
treat it as an invalid partition.

This fixes a bug where ``dumpon <device>'' will configure the dump
device at a random offset on the disk if <device> isn't a valid
partition.

Reviewed by: phk


# 7812d86f 20-Sep-2002 Poul-Henning Kamp <phk@FreeBSD.org>

(This commit touches about 15 disk device drivers in a very consistent
and predictable way, and I apologize if I have gotten it wrong anywhere,
getting prior review on a patch like this is not feasible, considering
the number of people involved and hardware availability etc.)

If struct disklabel is the messenger: kill the messenger.

Inside struct disk we had a struct disklabel which disk drivers used to
communicate certain metrics to the disklayer above (GEOM or the disk
mini-layer). This commit changes this communication to use four
explicit fields instead.

Amongst the benefits is that the fields do not get overwritten by
wrong or bogus on-disk disklabels.

Once that is clear, <sys/disk.h>, which is included in the drivers,
no longer needs to pull <sys/disklabel.h> and <sys/diskslice.h> in;
the few places that need them have gotten explicit #includes for
them.

The disklabel inside struct disk is now only for internal use in
the disk mini-layer, so instead of embedding it, we malloc it as
we need it.

This concludes (modulo any mistakes) the series of disklabel related
commits.

I believe it all amounts to a NOP for all the rest of you :-)

Sponsored by: DARPA & NAI Labs.


# 2382fb0a 20-Sep-2002 Poul-Henning Kamp <phk@FreeBSD.org>

Make FreeBSD "struct disklabel" agnostic, step 312 of 723:

Rename bioqdisksort() to bioq_disksort().
Keep a #define around to avoid changing all diskdrivers right now.

Move it from subr_disklabel.c to subr_disk.c.
Move prototype from <sys/disklabel.h> to <sys/bio.h>

Sponsored by: DARPA and NAI Labs.


# f90c382c 19-Sep-2002 Poul-Henning Kamp <phk@FreeBSD.org>

Make FreeBSD "struct disklabel" agnostic, step 311 of 723:

Rename diskerr() to disk_err() for naming consistency.

Drop the by now entirely useless struct disklabel argument.

Add a flag argument for new-line termination.

Fix a couple of printf-format-casts to %j instead of %l.

Correctly print the name of all bio commands.

Move the function from subr_disklabel.c to subr_disk.c,
and from <sys/disklabel.h> to <sys/disk.h>.

Use the new disk_err() throughout, #include <sys/disk.h> as needed.

Bump __FreeBSD_version for the sake of the aac disk drivers #ifdefs.

Remove unused disklabel members of softc for aac, amr and mlx, which seem
to originally have been intended for diskerr() use, but which only rotted
and got Copy&Pasted at least two times too many.

Sponsored by: DARPA & NAI Labs.


# 55f7c614 21-Aug-2002 Archie Cobbs <archie@FreeBSD.org>

Don't use "NULL" when "0" is really meant.


# 1bdb20a6 09-Apr-2002 Poul-Henning Kamp <phk@FreeBSD.org>

Implement DIOCGFRONTSTUFF ioctl which reports how many bytes from the start
of the device magic stuff might occupy.

Sponsored by: DARPA & NAI Labs.


# 7f086a08 09-Apr-2002 Poul-Henning Kamp <phk@FreeBSD.org>

Rename DIOCGKERNELDUMP to DIOCSKERNELDUMP as it strictly speaking
is a "set" not a "get" operation.

Sponsored by: DARPA & NAI Labs.


# 81661c94 31-Mar-2002 Poul-Henning Kamp <phk@FreeBSD.org>

Here follows the new kernel dumping infrastructure.

Caveats:

The new savecore program is not complete in the sense that it emulates
enough of the old savecore's features to do the job, but implements none
of the options yet.

I would appreciate if a userland hacker could help me out getting savecore
to do what we want it to do from a user's point of view, compression,
email-notification, space reservation etc etc. (send me email if
you are interested).

Currently, savecore will scan all devices marked as "swap" or "dump" in
/etc/fstab _or_ any devices specified on the command-line.

All architectures but i386 lack an implementation of dumpsys(), but
looking at the i386 version it should be trivial for anybody familiar
with the platform(s) to provide this function.

Documentation is quite sparse at this time, more to come.

Details:

ATA and SCSI drivers should work as the dump formatting code has been
removed. The IDA, TWE and AAC have not yet been converted.

Dumpon now opens the device and uses ioctl(DIOCGKERNELDUMP) to set
the device as dumpdev. To implement the "off" argument, /dev/null
is used as the device.

Savecore will fail if handed any options since they are not (yet)
implemented. All devices marked "dump" or "swap" in /etc/fstab
will be scanned and dumps found will be saved to diskfiles
named from the MD5 hash of the header record. The header record
is dumped in readable format in the .info file. The kernel
is not saved. Only complete dumps will be saved.

All maintainer rights for this code are disclaimed: feel free to
improve and extend.

Sponsored by: DARPA, NAI Labs


# 417fb7f6 11-Mar-2002 Poul-Henning Kamp <phk@FreeBSD.org>

Make the disk_clone() routine more robust against abuse.
Sneak in a trivial bit of the GEOM stuff while we're here anyway.


# 6f60771b 05-Mar-2002 Robert Drehmel <robert@FreeBSD.org>

Fix a warning.


# 3165f068 04-Nov-2001 Poul-Henning Kamp <phk@FreeBSD.org>

Don't call cdevsw_add().


# 20a3b67c 04-Nov-2001 Poul-Henning Kamp <phk@FreeBSD.org>

Rename the top 7 bits of disk minors to spare bits, rather than type bits.


# b456f7e6 03-Nov-2001 Poul-Henning Kamp <phk@FreeBSD.org>

Don't choke on old sd%d.ctl devices.

Tripped over by: Jos Backus <josb@cncdsl.com>


# a2d7281c 02-Nov-2001 Poul-Henning Kamp <phk@FreeBSD.org>

Turn the symlinks around, instead of ad0s1 -> ad0s1c, make it ad0s1c -> ad0s1.

Requested by: peter


# 4e130067 28-Oct-2001 Poul-Henning Kamp <phk@FreeBSD.org>

Fix a problem in the disk related hack where device nodes for a physically
non-existent disk in a legacy /dev on a DEVFS system would panic the system
if stat(2)'ed.

Do not whine about anonymous device nodes not having a si_devsw, they're
not supposed to.


# 4e4a7663 27-Oct-2001 Poul-Henning Kamp <phk@FreeBSD.org>

Nudge the axe a bit closer to cdevsw[]:

Make it a panic to repeat make_dev() or destroy_dev(), this check
should maybe be neutered when -current goes -stable.

Whine if devsw() is called on anon dev_t's in a devfs system.

Make a hack to avoid our lazy-eval disk code triggering the above whine.

Fix the multiple make_dev() in disk code by making ${disk}${unit}s${slice}
an alias/symlink to ${disk}${unit}s${slice}c


# 5015bb7f 22-Oct-2001 Poul-Henning Kamp <phk@FreeBSD.org>

disk_clone() was a bit too eager to please: "md0s1ec" is not a valid
device.

Noticed by: Chad David <davidc@acns.ab.ca>


# b40ce416 12-Sep-2001 Julian Elischer <julian@FreeBSD.org>

KSE Milestone 2
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to the previous -current except
that there is a thread associated with each process.

Sorry john! (your next MFC will be a doosie!)

Reviewed by: peter@freebsd.org, dillon@freebsd.org

X-MFC after: ha ha ha ha


# a572c95c 15-Aug-2001 Bruce Evans <bde@FreeBSD.org>

Don't dump on the label sector or below. This avoids clobbering the
label if the dump device overlaps the label (which is a slight
misconfiguration). Dump routines don't use dscheck(), so the normal
write protection of the label doesn't help.

Reduced some nearby overflow bugs. In disk_dumpcheck(), there was
(fatal but fail-safe) overflow on i386's with 4GB of memory, at least
if Maxmem was the top page (can this happen?). The fix assumes that
the sector size divides PAGE_SIZE (dump routines already assume this).
In setdumpdev(), the corresponding overflow occurred with only about
2GB of memory on all machines with 32-bit ints. This allowed setdumpdev()
to succeed when it shouldn't have, but then disk_dumpcheck() failed
safe later. Except in old versions of FreeBSD like RELENG_3 where
there is no disk_dumpcheck().

PR: 28164 (label clobbering part)
MFC after: 1 week
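
The overflow pattern described above can be reproduced with 32-bit
arithmetic. This is an illustration of the general technique, not the
actual diff: the function names are hypothetical, and the 4096-byte page
and 512-byte sector sizes used in the note that follows are assumed
values.

```c
#include <assert.h>
#include <stdint.h>

/* Sectors needed for a dump, multiplying first: the product can wrap. */
static uint32_t
dump_sectors_wrapping(uint32_t pages, uint32_t page_size, uint32_t secsize)
{
	assert(secsize != 0);
	return ((pages * page_size) / secsize);
}

/*
 * Same quantity, dividing first. This relies on the sector size
 * dividing the page size, the assumption the message above mentions.
 */
static uint32_t
dump_sectors_divfirst(uint32_t pages, uint32_t page_size, uint32_t secsize)
{
	assert(secsize != 0 && page_size % secsize == 0);
	return (pages * (page_size / secsize));
}
```

For 4GB of memory (1048576 pages of 4096 bytes) and 512-byte sectors,
the multiply-first form wraps to 0 sectors, while the divide-first form
gives 8388608. For small memory sizes the two agree.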


# 22628ccf 29-May-2001 Poul-Henning Kamp <phk@FreeBSD.org>

Remove the hack-around for the slice/label code, it didn't
cover the hole.


# 507fbee0 28-May-2001 Poul-Henning Kamp <phk@FreeBSD.org>

The disklabel/slice code is more twisted than I thought. Revert to
calling the cdevsw_add() unconditionally.


# 3344c5a1 26-May-2001 Poul-Henning Kamp <phk@FreeBSD.org>

Create a general facility for making dev_t's depend on another
dev_t. The dev_depends(dev_t, dev_t) function is for tying them
to each other.

When destroy_dev() is called on a dev_t, all dev_t's depending
on it will also be destroyed (depth first order).

Rewrite the make_dev_alias() to use this dependency facility.

kern/subr_disk.c:
Make the disk mini-layer use dependencies to make sure all
relevant dev_t's are removed when the disk disappears.

Make the disk mini-layer precreate some magic sub devices
which the disk/slice/label code expects to be there.

kern/subr_disklabel.c:
Remove some now unneeded variables.

kern/subr_diskmbr.c:
Remove some ancient, commented out code.

kern/subr_diskslice.c:
Minor cleanup. Use name from dev_t instead of dsname()


# 8576c652 24-May-2001 Poul-Henning Kamp <phk@FreeBSD.org>

Don't take the detour around devsw() to find out if the proto-cdevsw
is already initialized.


# e0e0b661 08-May-2001 Poul-Henning Kamp <phk@FreeBSD.org>

Always initialize bio_resid from bio_bcount in the disk mini-layer so
that the drivers don't have to do it umpteen times.


# 079f2df3 06-May-2001 Poul-Henning Kamp <phk@FreeBSD.org>

Make the disk mini-layer check for and handle zero-length transfers
instead of the underlying drivers.


# a468031c 06-May-2001 Poul-Henning Kamp <phk@FreeBSD.org>

Actually biofinish(struct bio *, struct devstat *, int error) is more general
than the bioerror().

Most of this patch is generated by scripts.


# b417a1a8 13-Mar-2001 Søren Schmidt <sos@FreeBSD.org>

Don't call device close and ioctl functions if device has disappeared.

Reviewed by: phk


# f84ee0ff 15-Dec-2000 Poul-Henning Kamp <phk@FreeBSD.org>

Don't clone impossible unit numbers for disks.


# 959b7375 08-Dec-2000 Poul-Henning Kamp <phk@FreeBSD.org>

Staticize some malloc M_ instances.


# db901281 02-Sep-2000 Poul-Henning Kamp <phk@FreeBSD.org>

Avoid the modules madness I inadvertently introduced by making the
cloning infrastructure standard in kern_conf. Modules are now
the same with or without devfs support.

If you need to detect if devfs is present, in modules or elsewhere,
check the integer variable "devfs_present".

This happily removes an ugly hack from kern/vfs_conf.c.

This forces a rename of the eventhandler and the standard clone
helper function.

Include <sys/eventhandler.h> in <sys/conf.h>: it's a helper #include
like <sys/queue.h>

Remove all #includes of opt_devfs.h they no longer matter.


# 3f54a085 20-Aug-2000 Poul-Henning Kamp <phk@FreeBSD.org>

Remove all traces of Julian's DEVFS (incl from kern/subr_diskslice.c)

Remove old DEVFS support fields from dev_t.

Make uid, gid & mode members of dev_t and set them in make_dev().

Use correct uid, gid & mode in make_dev in disk minilayer.

Add support for registering alias names for a dev_t using the
new function make_dev_alias(). These will show up as symlinks
in DEVFS.

Use makedev() rather than make_dev() for MFS's magic devices to prevent
DEVFS from noticing this abuse.

Add a field for DEVFS inode number in dev_t.

Add new DEVFS in fs/devfs.

Add devfs cloning to:
disk minilayer (ie: ad(4), sd(4), cd(4) etc etc)
md(4), tun(4), bpf(4), fd(4)

If DEVFS add -d flag to /sbin/init's args to make it mount devfs.

Add commented out DEVFS to GENERIC


# 5d10777c 05-Jul-2000 Warner Losh <imp@FreeBSD.org>

End two weeks of on and off debugging. Fix the crash on the Nth
insertion of a CF card, for random values of N > 1. With these fixes,
I've been able to do 100 insert/remove of the cards w/o a crash with
lots of system activity going on that in the past would help trigger
the crash.

The problem:

FreeBSD creates dev_t's on the fly as they are needed and never
destroys them. These dev_t's point to a struct disk that is used for
housekeeping on the disk. When a device goes away, the struct disk
pointer becomes a dangling pointer. Sometimes when the device comes
back, the pointer will point to the new struct disk (in which case the
insertion will work). Other times it won't (especially if any length
of time has passed, since it is dependent on memory returned from
malloc).

The Fix:

There is one of these dev_t's that is always correct. The
device for the WHOLE_DISK_SLICE is always right. It gets set at
create_disk() time. So, the fix is to spend a little CPU time and
lookup the WHOLE_DISK_SLICE dev_t and use the si_disk from that in
preference to the one that's in the device asking to do the I/O. In
addition, we change the test of si_disk == NULL meaning that the dev
needed to inherit properties from the pdev to dev->si_disk !=
pdev->si_disk. This test is a little stronger than the previous test,
but can sometimes be fooled into not inheriting. However, the results
of this fooling are that the old values will be used, which will
generally always be the same as before. si_drv[12] are the only
values that are copied that might pose a problem. They tend to change
as the si_disk field would change, so it is a hole, but it is a small
hole.

One could correctly argue that one should replace much of this code
with something much much better. I would be on the pro side of that
argument.

Reviewed by: phk (who also ported the original patch to current)
Sponsored by: Timing Solutions


# 77978ab8 04-Jul-2000 Poul-Henning Kamp <phk@FreeBSD.org>

Previous commit changing SYSCTL_HANDLER_ARGS violated KNF.

Pointed out by: bde


# 82d9ae4e 03-Jul-2000 Poul-Henning Kamp <phk@FreeBSD.org>

Style police catches up with rev 1.26 of src/sys/sys/sysctl.h:

Sanitize SYSCTL_HANDLER_ARGS so that simplistic tools can grok our
sources:

-sysctl_vm_zone SYSCTL_HANDLER_ARGS
+sysctl_vm_zone (SYSCTL_HANDLER_ARGS)


# 445572c1 22-Jun-2000 Neil Blakey-Milner <nbm@FreeBSD.org>

Add 'kern.disks', a sysctl which returns the list of disks from
disk_enumerate(), space delimited. This allows non-root users to get a
list of disks and will simplify libdisk's Disk_Names().

Reviewed by: phk


# 4bd02a56 15-Jun-2000 Poul-Henning Kamp <phk@FreeBSD.org>

Add disk_enumerate() for finding names of disks. Vinum and libh will
need this RSN.

Remove a pointless warning in the root device locating code.

Remove the "wd" compatibility name from the "ad" driver.

WARNING: If you have not updated to use /dev/ad* in your /etc/fstab
and modern bootblocks, it would be a very good idea to do so BEFORE
you upgrade your kernel.


# 9626b608 05-May-2000 Poul-Henning Kamp <phk@FreeBSD.org>

Separate the struct bio related stuff out of <sys/buf.h> into
<sys/bio.h>.

<sys/bio.h> is now a prerequisite for <sys/buf.h> but it shall
not be made a nested include according to bde's teachings on the
subject of nested includes.

Diskdrivers and similar stuff below specfs::strategy() should no
longer need to include <sys/buf.h> unless they need caching of data.

Still a few bogus uses of struct buf to track down.

Repocopy by: peter


# 67f3c95c 25-Apr-2000 Poul-Henning Kamp <phk@FreeBSD.org>

Clone the {b|bio}_offset field, and make sure it is always initialized
in struct bio. Eventually, bio_offset will probably obsolete the
bio_blkno and bio_pblkno fields.

Remove the special hack in atapi-cd.c to determine if bio_offset was valid.


# 8177437d 14-Apr-2000 Poul-Henning Kamp <phk@FreeBSD.org>

Complete the bio/buf divorce for all code below devfs::strategy

Exceptions:
Vinum untouched. This means that it cannot be compiled.
Greg Lehey is on the case.

CCD not converted yet, casts to struct buf (still safe)

atapi-cd casts to struct buf to examine B_PHYS


# c244d2de 02-Apr-2000 Poul-Henning Kamp <phk@FreeBSD.org>

Move B_ERROR flag to b_ioflags and call it BIO_ERROR.

(Much of this done by script)

Move B_ORDERED flag to b_ioflags and call it BIO_ORDERED.

Move b_pblkno and b_iodone_chain to struct bio while we transition, they
will be obsoleted once bio structs chain/stack.

Add bio_queue field for struct bio aware disksort.

Address a lot of stylistic issues brought up by bde.


# a4fcac54 08-Mar-2000 Bruce Evans <bde@FreeBSD.org>

Fixed a null pointer panic for dumpon(8) on a nonexistent device whose
driver uses the new disk layer.

Reviewed by: phk
Approved by: jkh


# 47351d27 18-Feb-2000 Søren Schmidt <sos@FreeBSD.org>

Update the ata driver to take more advantage of newbus, this
was needed to make attach/detach of devices work, which is
needed for the PCCARD support.
(PCCARD support is still not working though, more to come on that)

Support the CMD646 chip which is used on many alphas, sadly only
in WDMA2 mode, as the silicon is broken beyond belief for UDMA modes.

Lots of cosmetic fixes here and there.

Sorry for the size of this megapatchfromhell but it was not
possible otherwise...

newbus patches based on work from: dfr (Doug Rabson)


# 1edde29e 28-Jan-2000 Poul-Henning Kamp <phk@FreeBSD.org>

rename disk_delete() to disk_destroy().


# d685023e 09-Jan-2000 Poul-Henning Kamp <phk@FreeBSD.org>

Also handle zero return from dscheck().

PR: 15956


# 1b4ce5ce 18-Dec-1999 Poul-Henning Kamp <phk@FreeBSD.org>

Don't ignore return value from tsleep().

Spotted by: charnier


# 8d67e113 19-Nov-1999 Jordan K. Hubbard <jkh@FreeBSD.org>

Conditionalise unwanted chattiness.


# 8db34b3a 06-Nov-1999 Poul-Henning Kamp <phk@FreeBSD.org>

Put a lock on the disk structure while we open to avoid races.

PR: 14486


# d1f088da 11-Oct-1999 Peter Wemm <peter@FreeBSD.org>

Trim unused options (or #ifdef for undoc options).

Submitted by: phk


# 6d7e938c 04-Oct-1999 Poul-Henning Kamp <phk@FreeBSD.org>

be more consistent about passing the whole/raw dev_t to the driver


# dc722a14 02-Oct-1999 Søren Schmidt <sos@FreeBSD.org>

In some drivers we use two devices to be able to boot.
So if si_iosize_max is already set, don't mess with it.

Also just log the problem with maxphys not being set once.

designed by: phk
tested by: sos


# 45604de3 02-Oct-1999 Poul-Henning Kamp <phk@FreeBSD.org>

Fix a problem relating to si_iosize_max which broke scsi devices.


# 66c12520 30-Sep-1999 Poul-Henning Kamp <phk@FreeBSD.org>

Make all slices/partitions correctly inherit si_* fields.

Lightly tested by: msmith


# 263ab971 30-Sep-1999 Poul-Henning Kamp <phk@FreeBSD.org>

Fix disk_close once more, and better this time.

Spotted by: bde


# 46a706dc 29-Sep-1999 Mike Smith <msmith@FreeBSD.org>

Test the slices for openness before we close them; doing it the other way
around meant that the higher level close routine never gets called.
(phk is on the road; this is a quick fix to get things working and may need
more polish)


# abd1f573 13-Sep-1999 Poul-Henning Kamp <phk@FreeBSD.org>

Register the right cdevsw on the master device.

Detected by: sos


# 2016e4e9 12-Sep-1999 Poul-Henning Kamp <phk@FreeBSD.org>

Bite the bullet and allocate the devsw entry at compile time.


# 3febdd8f 12-Sep-1999 Poul-Henning Kamp <phk@FreeBSD.org>

Use a different tactic when creating the devsw so that disk_create()
doesn't need to malloc.


# 85a219d2 09-Sep-1999 Julian Elischer <julian@FreeBSD.org>

Changes to centralise the default blocksize behaviour.
More likely to follow.

Submitted by: phk@freebsd.org


# 8684f73a 31-Aug-1999 Poul-Henning Kamp <phk@FreeBSD.org>

Improve the micro "disk" layer after gaining more experience with it.


# da9e4f55 29-Aug-1999 Poul-Henning Kamp <phk@FreeBSD.org>

Add micro "disk" layer which should enable us to pull all the slice/label
stuff out of the device drivers.