History log of /freebsd-current/sys/geom/raid3/g_raid3.c
Revision Date Author Comments
# 955f213f 19-Apr-2024 Mark Johnston <markj@FreeBSD.org>

graid3: Fix teardown in g_raid3_try_destroy()

Commit 33cb9b3c3a22 replaced a g_raid3_destroy_device() call with a
g_raid3_free_device() call, which was incorrect and could lead to a
panic if a RAID3 GEOM failed to start (e.g., due to missing disks).

Reported by: graid3 tests
Fixes: 33cb9b3c3a22 ("graid3: Fix teardown races")
MFC after: 3 days
Sponsored by: Klara, Inc.


# 4eb861d3 23-Nov-2023 Mitchell Horne <mhorne@FreeBSD.org>

shutdown: audit shutdown_post_sync event callbacks

Ensure they are all panic/debugger safe.

Most handlers for this event are for disk drivers/geom modules. There
are a mix of checks being used here (or not), so let's standardize on
checking the presence of the RB_NOSYNC flag.

This flag is set whenever:
1. The kernel has panicked and kern.sync_on_panic=0*
2. We reboot from within the kernel debugger (the "reset" command)
3. Userspace requested it, e.g. by 'reboot -n'

Name the functions consistently.

*This sysctl is tuned to zero by default, but its existence means that
these handlers can be executed after a panic, at the user's discretion.
IMO this use-case is implicitly understood to be risky, and we'd be
better off eliminating it altogether.

Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D42337
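
A minimal userland sketch of the check this commit standardizes on, not the code from g_raid3.c. The names sketch_softc and sketch_shutdown_post_sync are hypothetical, and SKETCH_RB_NOSYNC is a local stand-in for the RB_NOSYNC flag from sys/reboot.h, with an illustrative value:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Local stand-in for RB_NOSYNC; the value is illustrative only. */
#define SKETCH_RB_NOSYNC	0x004

/* Hypothetical per-device state; not the real g_raid3 softc. */
struct sketch_softc {
	bool	dirty;		/* device still marked dirty */
	int	writes;		/* metadata writes issued by the handler */
};

/*
 * Hypothetical shutdown_post_sync callback. When the no-sync flag is
 * set (panic, debugger reset, or 'reboot -n'), it returns without
 * touching the device, so it stays panic/debugger safe.
 */
void
sketch_shutdown_post_sync(void *arg, int howto)
{
	struct sketch_softc *sc = arg;

	assert(sc != NULL);
	if ((howto & SKETCH_RB_NOSYNC) != 0)
		return;
	if (sc->dirty) {
		sc->dirty = false;	/* mark clean */
		sc->writes++;		/* pretend to write metadata */
	}
}
```

The early return is the whole point of the sketch: every handler performs the same flag test before doing any I/O.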


# f3dc1727 23-Nov-2023 Mitchell Horne <mhorne@FreeBSD.org>

geom: sort includes for some files

This is not exhaustive, just done ahead of some upcoming changes to
these files.

Don't include sys/cdefs.h explicitly. No functional change intended.

Reviewed by: imp, jhb
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D42335


# 33cb9b3c 02-Nov-2023 Mark Johnston <markj@FreeBSD.org>

graid3: Fix teardown races

Port commit dc399583ba09 from g_mirror, which has an effectively
identical startup sequence.

This fixes a race that was occasionally causing panics during GEOM test
suite runs on riscv.

MFC after: 1 month


# 685dc743 16-Aug-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove $FreeBSD$: one-line .c pattern

Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/


# 4d846d26 10-May-2023 Warner Losh <imp@FreeBSD.org>

spdx: The BSD-2-Clause-FreeBSD identifier is obsolete, drop -FreeBSD

The SPDX folks have obsoleted the BSD-2-Clause-FreeBSD identifier. Catch
up to that fact and revert to their recommended match of BSD-2-Clause.

Discussed with: pfg
MFC After: 3 days
Sponsored by: Netflix


# fd02d0bc 26-Mar-2023 Mark Johnston <markj@FreeBSD.org>

graid3: Pre-allocate the timeout event structure

As in commit 2f1cfb7f63ca ("gmirror: Pre-allocate the timeout event
structure"), graid3 must avoid M_WAITOK allocations in callout handlers.

Reported by: graid3 regression tests
MFC after: 2 weeks


# 10ae42cc 29-Jan-2022 Alexander Motin <mav@FreeBSD.org>

GEOM: Set G_CF_DIRECT_SEND/RECEIVE for taste consumers.

All I/O requests through the taste consumers are synchronous, done
with g_read_data() and without any locks held. It makes no sense
to delegate the I/O to g_down/g_up threads.

This removes many context switches during disk retaste.

MFC after: 2 weeks


# 2cc5a480 09-Dec-2021 Mateusz Guzik <mjg@FreeBSD.org>

geom_raid3: plug set-but-not-used vars

Sponsored by: Rubicon Communications, LLC ("Netgate")


# 39552dff 13-Jul-2021 Mark Johnston <markj@FreeBSD.org>

graid3: Zero the metadata block before writing

Ensure that string buffers and pad bytes are zero-filled before writing
graid3 metadata.

Reported by: KMSAN
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation


# cd853791 27-Nov-2020 Konstantin Belousov <kib@FreeBSD.org>

Make MAXPHYS tunable. Bump MAXPHYS to 1M.

Replace MAXPHYS by runtime variable maxphys. It is initialized from
MAXPHYS by default, but can be also adjusted with the tunable kern.maxphys.

Make b_pages[] array in struct buf flexible. Size b_pages[] for buffer
cache buffers exactly to atop(maxbcachebuf) (currently it is sized to
atop(MAXPHYS)), and b_pages[] for pbufs is sized to atop(maxphys) + 1.
The +1 for pbufs allows several pbuf consumers, among them vmapbuf(),
to use unaligned buffers still sized to maxphys, esp. when such
buffers come from userspace (*). Overall, we save significant amount
of otherwise wasted memory in b_pages[] for buffer cache buffers,
while bumping MAXPHYS to desired high value.

Eliminate all direct uses of the MAXPHYS constant in kernel and driver
sources, except a place which initializes maxphys. Some random (and
arguably weird) uses of MAXPHYS, e.g. in linuxolator, are converted
straight. Some drivers, which use MAXPHYS to size embedded structures,
get a private MAXPHYS-like constant; their conversion is out of scope
for this work.

Changes to cam/, dev/ahci, dev/ata, dev/mpr, dev/mpt, dev/mvs,
dev/siis, were either submitted by, or based on changes by mav.

Suggested by: mav (*)
Reviewed by: imp, mav, imp, mckusick, scottl (intermediate versions)
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D27225


# d22ff249 18-Oct-2020 Edward Tomasz Napierala <trasz@FreeBSD.org>

Make g_attach() return ENXIO for orphaned providers; update various
classes to add missing error checking.

Reviewed by: imp
MFC after: 2 weeks
Sponsored by: NetApp, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D26658


# d40bc607 01-Sep-2020 Mateusz Guzik <mjg@FreeBSD.org>

geom: clean up empty lines in .c and .h files


# 7029da5c 26-Feb-2020 Pawel Biernacki <kaktus@FreeBSD.org>

Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many)

r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.

This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.

Mark all obvious cases as MPSAFE. All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT

Approved by: kib (mentor, blanket)
Commented by: kib, gallatin, melifaro
Differential Revision: https://reviews.freebsd.org/D23718


# 8b522bda 16-Jan-2020 Warner Losh <imp@FreeBSD.org>

Pass BIO_SPEEDUP through all the geom layers

While some geom layers pass unknown commands down, not all do. For the ones that
don't, pass BIO_SPEEDUP down to the providers that constitute the geom, as
applicable. No changes to vinum or virstor because I was unsure how to add this
support, and I'm also unsure how to test these. gvinum doesn't implement
BIO_FLUSH either, so it may just be poorly maintained. gvirstor is for testing
and not supporting BIO_SPEEDUP is fine.

Reviewed by: chs
Differential Revision: https://reviews.freebsd.org/D23183


# ac03832e 07-Aug-2019 Conrad Meyer <cem@FreeBSD.org>

GEOM: Reduce unnecessary log interleaving with sbufs

Similar to what was done for device_printfs in r347229.

Convert g_print_bio() to a thin shim around g_format_bio(), which acts on an
sbuf; documented in g_bio.9.

Reviewed by: markj
Discussed with: rlibby
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D21165


# 49ee0fce 19-Jun-2019 Alexander Motin <mav@FreeBSD.org>

Use sbuf_cat() in GEOM confxml generation.

When it comes to megabytes of text, difference between sbuf_printf() and
sbuf_cat() becomes substantial.

MFC after: 2 weeks
Sponsored by: iXsystems, Inc.


# 74d6c131 10-Apr-2018 Kyle Evans <kevans@FreeBSD.org>

Annotate geom modules with MODULE_VERSION

GEOM ELI may double ask the password during boot. Once at loader time, and
once at init time.

This happens due to a module loading bug. By default GEOM ELI caches the
password in the kernel, but without the MODULE_VERSION annotation, the
kernel loads over the kernel module, even if the GEOM ELI was compiled into
the kernel. In this case, the newly loaded module
purges/invalidates/overwrites the GEOM ELI's password cache, which causes
the double asking.

MFC Note: There's a pc98 component to the original submission that is
omitted here due to pc98 removal in head. This part will need to be revived
upon MFC.

Reviewed by: imp
Submitted by: op
Obtained from: opBSD
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D14992


# 3728855a 27-Nov-2017 Pedro F. Giffuni <pfg@FreeBSD.org>

sys/geom: adoption of SPDX licensing ID tags.

Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
open source licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
supersede or replace the license texts.


# 8b64f3ca 23-Sep-2016 Alexander Motin <mav@FreeBSD.org>

Use g_wither_provider() where applicable.

It is just a helper function combining G_PF_WITHER setting with
g_orphan_provider().


# 4e2732b5 20-May-2016 Konstantin Belousov <kib@FreeBSD.org>

Removal of Giant-dropping wrappers for GEOM classes.

Sponsored by: The FreeBSD Foundation


# e8d57122 29-Apr-2016 Pedro F. Giffuni <pfg@FreeBSD.org>

sys/geom: spelling fixes in comments.

No functional change.


# 9a8fa125 13-Apr-2016 Warner Losh <imp@FreeBSD.org>

Bump bio_cmd and bio_*flags from 8 bits to 16.

Differential Revision: https://reviews.freebsd.org/D5784


# c55f5707 17-Feb-2016 Warner Losh <imp@FreeBSD.org>

Create an API to reset a struct bio (g_reset_bio). This is mandatory
for all struct bio you get back from g_{new,alloc}_bio. Temporary
bios that you create on the stack or elsewhere should use this before
first use of the bio, and between uses of the bio. At the moment, it
is nothing more than a wrapper around bzero, but that may change in
the future. The wrapper also removes one place where we encode the
size of struct bio in the KBI.
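
The reset rule described above can be sketched in userland C. This is an illustration only: struct sketch_bio and sketch_reset_bio() are hypothetical stand-ins, not the kernel's struct bio or g_reset_bio(), and the fields are invented for the example:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct bio; fields chosen for illustration. */
struct sketch_bio {
	int	cmd;
	long	offset;
	long	length;
	int	error;
};

/*
 * Stand-in for a reset wrapper: callers never spell out the structure
 * size themselves, so the layout can change without touching them.
 */
void
sketch_reset_bio(struct sketch_bio *bp)
{
	assert(bp != NULL);
	memset(bp, 0, sizeof(*bp));
}
```

A temporary structure on the stack would be passed to the wrapper before its first use and again between uses, so no state from a previous request leaks into the next one.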


# fd90e2ed 22-May-2015 Jung-uk Kim <jkim@FreeBSD.org>

CALLOUT_MPSAFE has lost its meaning since r141428, i.e., for more than ten
years for head. However, it is continuously misused as the mpsafe argument
for callout_init(9). Deprecate the flag and clean up callout_init() calls
to make them more consistent.

Differential Revision: https://reviews.freebsd.org/D2613
Reviewed by: jhb
MFC after: 2 weeks


# af3b2549 27-Jun-2014 Hans Petter Selasky <hselasky@FreeBSD.org>

Pull in r267961 and r267973 again. Fix for issues reported will follow.


# 37a107a4 27-Jun-2014 Glen Barber <gjb@FreeBSD.org>

Revert r267961, r267973:

These changes prevent sysctl(8) from returning proper output,
such as:

1) no output from sysctl(8)
2) erroneously returning ENOMEM with tools like truss(1)
or uname(1)
truss: can not get etype: Cannot allocate memory


# 3da1cf1e 27-Jun-2014 Hans Petter Selasky <hselasky@FreeBSD.org>

Extend the meaning of the CTLFLAG_TUN flag to automatically check if
there is an environment variable which shall initialize the SYSCTL
during early boot. This works for all SYSCTL types both statically and
dynamically created ones, except for the SYSCTL NODE type and SYSCTLs
which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to
be used in the case a tunable sysctl has a custom initialisation
function allowing the sysctl to still be marked as a tunable. The
kernel SYSCTL API is mostly the same, with a few exceptions for some
special operations like iterating children of a static/extern SYSCTL
node. This operation should probably be made into a factored out
common macro, hence some device drivers use this. The reason for
changing the SYSCTL API was the need for a SYSCTL parent OID pointer
and not only the SYSCTL parent OID list pointer in order to quickly
generate the sysctl path. The motivation behind this patch is to avoid
parameter loading kludges inside the OFED driver subsystem. Instead of
adding special code to the OFED driver subsystem to post-load tunables
into dynamically created sysctls, we generalize this in the kernel.

Other changes:
- Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask"
to "hw.pcic.intr_mask".
- Removed redundant TUNABLE statements throughout the kernel.
- Some minor code rewrites in connection to removing not needed
TUNABLE statements.
- Added a missing SYSCTL_DECL().
- Wrapped two very long lines.
- Avoid malloc()/free() inside sysctl string handling, in case it is
called to initialize a sysctl from a tunable, hence malloc()/free() is
not ready when sysctls from the sysctl dataset are registered.
- Bumped FreeBSD version to indicate SYSCTL API change.

MFC after: 2 weeks
Sponsored by: Mellanox Technologies


# f62c1a47 14-Jan-2013 Alexander Motin <mav@FreeBSD.org>

As in r242314 for GRAID, make GRAID3 more aggressive in marking volumes
as clean on shutdown and move that action from shutdown_pre_sync stage to
shutdown_post_sync to avoid extra flapping.

ZFS tends to not close devices on shutdown, which prevents GEOM RAID
from shutting down gracefully. To handle that, mark the volume as clean as
soon as shutdown time comes and there are no active writes.

MFC after: 2 weeks


# 4a7f7b10 11-Sep-2012 Gleb Smirnoff <glebius@FreeBSD.org>

When synchronizing, include in the config dump the amount of
bytes synchronized.
The rationale behind this is the following: for large disks the
percent synchronisation counter ticks too seldom, and monitoring
software (as well as human operator) can't tell whether
synchronisation goes on or one of disks got stuck. On an idle
server one can look into gstat and see whether synchronisation goes
on or not, but on a busy server that won't work. Also, the new value
monitored can be differentiated to obtain the synchronisation speed
quite precisely.

Submitted by: Konstantin Kukushkin <dark ramtel.ru>
Reviewed by: pjd


# 6472ac3d 07-Nov-2011 Ed Schouten <ed@FreeBSD.org>

Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs.

The SYSCTL_NODE macro defines a list that stores all child-elements of
that node. If there's no SYSCTL_DECL macro anywhere else, there's no
reason why it shouldn't be static.


# 5d807a0e 10-Jul-2011 Andrey V. Elsukov <ae@FreeBSD.org>

Include sys/sbuf.h directly.

Reviewed by: pjd


# 90f2be24 26-Apr-2011 Alexander Motin <mav@FreeBSD.org>

Implement relaxed comparison for hardcoded provider names to make it
ignore adX/adaY difference in both directions to simplify migration to
the CAM-based ATA or back.


# cb08c2cc 25-Feb-2011 Alexander Leidinger <netchild@FreeBSD.org>

Add some FEATURE macros for various GEOM classes.

No FreeBSD version bump, the userland application to query the features will
be committed last and can serve as an indication of the availability if
needed.

Sponsored by: Google Summer of Code 2010
Submitted by: kibab
Reviewed by: silence on geom@ during 2 weeks
X-MFC after: to be determined in last commit with code from this project


# 95959703 12-Jan-2011 Andrey V. Elsukov <ae@FreeBSD.org>

Sector size can not be greater than MAXPHYS. Since GRAID3 calculates
sector size from user-specified block size, report to user about
big blocksize.

PR: kern/147851
MFC after: 1 week


# a7d5f7eb 19-Oct-2010 Jamie Gritton <jamie@FreeBSD.org>

A new jail(8) with a configuration file, to replace the work currently done
by /etc/rc.d/jail.


# 981b1110 05-Feb-2010 Alexander Motin <mav@FreeBSD.org>

MFC r201566, r201567:
Move wakeup() out of mutex to reduce contention.


# 9185a702 05-Feb-2010 Alexander Motin <mav@FreeBSD.org>

MFC r201545:
Slightly optimize XOR calculation.


# 3bda9adc 05-Jan-2010 Alexander Motin <mav@FreeBSD.org>

MFC r200940:
As soon as geom_raid3 reports its own stripe as sector size, report largest
underlying provider's stripe, multiplied by number of data disks in array,
due to transformation done, as array stripe.


# 8de58113 05-Jan-2010 Alexander Motin <mav@FreeBSD.org>

Move wakeup() out of mutex to reduce contention.


# 06b215fd 04-Jan-2010 Alexander Motin <mav@FreeBSD.org>

Slightly optimize XOR calculation.


# 1a7528a7 29-Dec-2009 Alexander Motin <mav@FreeBSD.org>

MFC r200821:
Make graid3 fall back to malloc() when component request size is bigger
than the maximal prepared UMA zone size. This fixes a crash with MAXPHYS > 128K.


# 113d8e50 24-Dec-2009 Alexander Motin <mav@FreeBSD.org>

As soon as geom_raid3 reports its own stripe as sector size, report largest
underlying provider's stripe, multiplied by number of data disks in array,
due to transformation done, as array stripe.


# d4060fa6 21-Dec-2009 Alexander Motin <mav@FreeBSD.org>

Make graid3 fall back to malloc() when component request size is bigger
than the maximal prepared UMA zone size. This fixes a crash with MAXPHYS > 128K.
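
A rough sketch of the fallback decision, under stated assumptions: the three size classes (4k, 16k, 64k) are inferred from the kern.geom.raid3.n{64,16,4}k tunable names that appear elsewhere in this log, and sketch_pick_allocator() is a hypothetical helper, not a function from g_raid3.c:

```c
#include <assert.h>
#include <stddef.h>

/* Assumed size classes, plus a marker for the malloc() fallback. */
enum sketch_zone {
	SKETCH_ZONE_4K,
	SKETCH_ZONE_16K,
	SKETCH_ZONE_64K,
	SKETCH_USE_MALLOC
};

/*
 * Pick the smallest prepared zone that fits the component request.
 * Requests larger than the biggest zone fall back to malloc()
 * instead of indexing past the prepared zones.
 */
enum sketch_zone
sketch_pick_allocator(size_t size)
{
	assert(size > 0);
	if (size <= 4096)
		return (SKETCH_ZONE_4K);
	if (size <= 16384)
		return (SKETCH_ZONE_16K);
	if (size <= 65536)
		return (SKETCH_ZONE_64K);
	return (SKETCH_USE_MALLOC);
}
```

Under these assumptions, a three-disk array splits a 128K request into 64K per data component, which still fits the largest class; anything bigger takes the fallback path.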


# 853a10a5 09-Apr-2009 Andrew Thompson <thompsa@FreeBSD.org>

Revert r190676,190677

The geom and CAM changes for root_hold are the wrong solution for USB design
quirks.

Requested by: scottl


# 626fc9fe 03-Apr-2009 Andrew Thompson <thompsa@FreeBSD.org>

Add a how argument to root_mount_hold() so it can be passed NOWAIT and be called
in situations where sleeping isn't allowed.


# d7f03759 19-Oct-2008 Ulf Lilleengen <lulf@FreeBSD.org>

- Import the HEAD csup code which is the basis for the cvsmode work.


# 3745c395 20-Oct-2007 Julian Elischer <julian@FreeBSD.org>

Rename the kthread_xxx (e.g. kthread_create()) calls
to kproc_xxx as they actually make whole processes.
This makes way for us to add REAL kthread_create() and friends
that actually make threads. It turns out that most of these
calls actually end up being moved back to the thread version
when it's added, but we need to make this cosmetic change first.

I'd LOVE to do this rename in 7.0 so that we can eventually MFC the
new kthread_xxx() calls.


# 982d11f8 04-Jun-2007 Jeff Roberson <jeff@FreeBSD.org>

Commit 14/14 of sched_lock decomposition.
- Use thread_lock() rather than sched_lock for per-thread scheduling
synchronization.
- Use the per-process spinlock rather than the sched_lock for per-process
scheduling synchronization.

Tested by: kris, current@
Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)


# 501250ba 01-Nov-2006 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Now, that we have gjournal in the tree add possibility to configure
gmirror and graid3 in a way that it is not resynchronized after a
power failure or system crash.
It is safe when gjournal is running on top of gmirror/graid3.


# f187490a 01-Nov-2006 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Change spaces to tabs where needed.


# 42461fba 31-Oct-2006 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Implement BIO_FLUSH handling by simply passing it down to the components.

Sponsored by: home.pl


# 11b2174f 10-Oct-2006 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Guard against invalid metadata.

MFC after: 1 week


# b7beab8d 30-Sep-2006 Pawel Jakub Dawidek <pjd@FreeBSD.org>

One more white space fix.


# 1517bdc8 30-Sep-2006 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Remove trailing spaces.


# 8e007c52 13-Sep-2006 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Fix synchronization in gmirror and graid3 which I broke. Synchronization
request can still have bio_to set to sc_provider (this is READ part of a
synchronization request) and in this case g_{mirror,raid3}_sync() wasn't
called as it should be.

MFC after: 1 week


# 0cca572e 09-Sep-2006 John-Mark Gurney <jmg@FreeBSD.org>

move created/detected/activated under debug level 1 to quiet the common case..

add count of active and total components to the launched line so you can
see at a glance if your mirror/raid3 is complete...

now:
GEOM_MIRROR: Device mirror/sam launched (2/2).

Reviewed by: pjd


# de6f1c7c 09-Aug-2006 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Not only a request from us can be passed to g_{mirror,raid3}_worker()
function, but also a request to us, in which case checking bio_cflags
is wrong, because the class above us is controlling it, not us.

MFC after: 1 week


# 776fc0e9 04-Aug-2006 Yaroslav Tykhiy <ytykhiy@gmail.com>

Commit the results of the typo hunt by Darren Pilgrim.
This change affects documentation and comments only,
no real code involved.

PR: misc/101245
Submitted by: Darren Pilgrim <darren pilgrim bitfreak org>
Tested by: md5(1)
MFC after: 1 week


# 3c57a41d 01-Aug-2006 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Don't use f-word in comments. We are gentlemen.

Pointed out by: Maciej Sobczak


# 3525bb6b 10-Jul-2006 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Use proper defines instead of magic values.

MFC after: 1 week


# ed940a82 08-Jul-2006 Pawel Jakub Dawidek <pjd@FreeBSD.org>

When kern.geom.raid3.use_malloc tunable is set to 1, malloc(9) instead of
uma(9) will be used for memory allocation.
In case of problems or tracking bugs, there are more useful tools for malloc(9)
debugging than for uma(9) debugging, like memguard(9) and redzone(9).

MFC after: 1 week


# 1f7fec3c 03-Jul-2006 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Allow to close access even if device is already destroyed.

Reported by: Ulrich Spoerlein <uspoerlein@gmail.com>
PR: kern/98093
MFC after: 1 week


# ee40c7aa 04-May-2006 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Use G_RAID3_FOREACH_SAFE_BIO() macro instead of G_RAID3_FOREACH_BIO() in
two places where g_io_request() is called. g_io_request() can free bio
structure so we can't reference it after and G_RAID3_FOREACH_BIO() macro
was doing this.

Found by: Coverity Prevent analysis tool (with my new models)
MFC after: 1 day
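
The hazard fixed here can be illustrated with a small userland sketch. All names below (sketch_bio, sketch_io_request, sketch_submit_all, sketch_push) are hypothetical; the loop only mirrors the idea behind a "safe" foreach macro and is not the G_RAID3_FOREACH_SAFE_BIO() definition:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical request element in a singly linked list. */
struct sketch_bio {
	struct sketch_bio	*next;
	int			 id;
};

/* Stand-in for a submit routine that may free the request. */
void
sketch_io_request(struct sketch_bio *bp, int *submitted)
{
	(*submitted)++;
	free(bp);
}

/*
 * "Safe" iteration: the successor is read before the submit call,
 * so the loop never dereferences a request that was already freed.
 */
int
sketch_submit_all(struct sketch_bio *head)
{
	struct sketch_bio *bp, *next;
	int submitted = 0;

	for (bp = head; bp != NULL; bp = next) {
		next = bp->next;
		sketch_io_request(bp, &submitted);
	}
	return (submitted);
}

/* Helper for building a list in the example. */
struct sketch_bio *
sketch_push(struct sketch_bio *head, int id)
{
	struct sketch_bio *bp = malloc(sizeof(*bp));

	assert(bp != NULL);
	bp->next = head;
	bp->id = id;
	return (bp);
}
```

A plain loop that advances with bp = bp->next after the call would read freed memory, which is the pattern the commit removes.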


# ffd106f5 30-Apr-2006 Pawel Jakub Dawidek <pjd@FreeBSD.org>

We shouldn't lock the topology here - we will panic on assertion inside
g_raid3_bump_syncid().

Reported by: Bradley W. Dutton <brad-fbsd-stable@duttonbros.com>
MFC after: 1 day


# 84edb86d 27-Apr-2006 Pawel Jakub Dawidek <pjd@FreeBSD.org>

- Don't hold the device sx lock when going to sleep.
- Prevent possible live-lock in case of memory problems by freeing
already completed requests first.

Reported and tested by: markus, Bradley W. Dutton <brad-fbsd-stable@duttonbros.com>
MFC after: 1 day


# a2fe5c66 27-Apr-2006 Pawel Jakub Dawidek <pjd@FreeBSD.org>

- Remove dead code.
- Comment possible event miss, which isn't critical, but probably can be
fixed by replacing the event lock usage with the queue lock.

MFC after: 2 weeks


# 18486a5e 28-Apr-2006 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Be sure to not destroy device twice. This is not possible in theory, but
with this change there is even no theoretical race.

MFC after: 2 weeks


# c082905b 18-Apr-2006 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Fix storing offset of already synchronized data. Offset in entire array was
stored in metadata instead of an offset in single disk.
After reboot/crash synchronization process started from a wrong offset
skipping (not synchronizing) part of the component which can lead to data
corruption (when synchronization process was interrupted on initial
synchronization) or other strange situations like 'graid3 status' showing
value more than 100%.

Reported, reviewed and tested by: ru
Reported by: Dmitry Morozovsky <marck@rinet.ru>
MFC after: 1 day
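
A worked sketch of the two offsets involved, assuming that a RAID3 array of ndisks components spreads data evenly over ndisks - 1 data components, so a per-component offset is the array offset divided by ndisks - 1. Both helpers are hypothetical and only illustrate the arithmetic:

```c
#include <assert.h>
#include <stdint.h>

/* Convert an offset in the whole array to an offset in one component. */
int64_t
sketch_component_offset(int64_t array_offset, unsigned int ndisks)
{
	assert(ndisks >= 3 && array_offset >= 0);
	return (array_offset / (int64_t)(ndisks - 1));
}

/* Progress as a percentage of a single component's size. */
int
sketch_sync_percent(int64_t offset_done, int64_t component_size)
{
	assert(component_size > 0);
	return ((int)(offset_done * 100 / component_size));
}
```

With three components of 150 units each, an array offset of 200 corresponds to 100 units per component. Storing 200 directly would report 133% and would restart synchronization past data that was never copied.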


# 712fe9bd 10-Apr-2006 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Introduce and use delayed-destruction functionality from a pre-sync hook,
which means that devices will be destroyed on last close.

This fixes destruction order problems when, e.g., a RAID3 array is built on
top of RAID1 arrays.

Requested, reviewed and tested by: ru
MFC after: 2 weeks


# 0d14fae5 28-Mar-2006 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Preserve previous behaviour of kern.geom.raid3.n{64,16,4}k tunables where 0
means unlimited.

Reported by: ru
MFC after: 3 days


# d7fad9f6 25-Mar-2006 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Increase debug level for "Thread exiting." message. It's not that important
and is 0 by accident.

MFC after: 3 days


# e6757059 19-Mar-2006 Pawel Jakub Dawidek <pjd@FreeBSD.org>

kern.geom.raid3.sync_requests=2 seems to be a better default - it still
keeps disks very busy, but makes system much more responsive.

While here, kill extra space.


# ef25813d 13-Mar-2006 Ruslan Ermilov <ru@FreeBSD.org>

Fix build on 64-bit platforms.


# 3650be51 12-Mar-2006 Pawel Jakub Dawidek <pjd@FreeBSD.org>

- Reimplement I/O data allocation to prevent deadlocks.

Submitted by: green

- Speed up synchronization process by using configurable number of I/O
requests in parallel.
+ Add kern.geom.raid3.sync_requests tunable which defines how many parallel
I/O requests should be used.
+ Retire kern.geom.raid3.reqs_per_sync and kern.geom.raid3.syncs_per_sec
sysctls.
- Fix race between regular and synchronization requests.
- Reimplement raid3's data synchronization - do not use the topology lock
for this purpose, as it may cause deadlocks.
- Stop synchronization from pre-sync hook.
- Fix some other minor issues.

Tested by: Mike Tancsa <mike@sentex.net>
MFC after: 3 days


# 290c6161 22-Feb-2006 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Do not use bio structure after g_io_deliver(), it may no longer be valid.

Found and fixed by: Vsevolod Lobko <seva@ip.net.ua>
MFC after: 3 days


# bf31327c 12-Feb-2006 Pawel Jakub Dawidek <pjd@FreeBSD.org>

On component state change to ACTIVE don't forget to update metadata.

MFC after: 3 days


# 01f1f41c 12-Feb-2006 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Use time_uptime instead of time_second, as the latter may go backwards.

Suggested by: ru
MFC after: 3 days


# 67cae8aa 11-Feb-2006 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Allow to set kern.geom.raid3.disconnect_on_failure from loader.conf.

MFC after: 3 days


# 3aae74ec 11-Feb-2006 Pawel Jakub Dawidek <pjd@FreeBSD.org>

- Add kern.geom.raid3.disconnect_on_failure sysctl/tunable (default to 1
to preserve current behaviour). When set to 0, components are not
disconnected - graid3 will try to still use them (only first error will
be logged). This is helpful when we have two broken components, but in
different places, so actually all data is available.
Such buggy component will be visible in 'graid3 list' output with flag
BROKEN.
- Never disconnect the last valid component. If we detect errors there we
will just pass them up. It wasn't reasonable to deny access to the
whole provider because of one broken sector.

Prodded by: ru
MFC after: 3 days


# 17fec17e 11-Feb-2006 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Correct typo. 'fbp' is NULL here so this will result in a panic.

MFC after: 3 days


# 0962f942 11-Feb-2006 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Mark array as CLEAN when there are no write requests in
kern.geom.raid3.idletime seconds. Write, not any requests.
Mark array as clean immediately on last write close.

Prodded by: ru
MFC after: 3 days


# 38ea96ac 31-Jan-2006 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Remove trailing spaces.


# 87e9d284 30-Jan-2006 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Fix typo which caused the 64kB elements limit to not be set properly and
the 16kB elements limit to not be set at all.

Submitted by: Vsevolod Lobko <seva@ip.net.ua>
MFC after: 3 days


# e9b936c7 18-Jan-2006 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Remove dead code.

Found by: Coverity Prevent(tm)
Coverity ID: CID105
MFC after: 3 days


# 8a4a44b5 30-Nov-2005 Maxim Sobolev <sobomax@FreeBSD.org>

Check for g_read_data(9) errors properly:

o The only indication of error condition is NULL value returned by
the function;

o value pointed to by error argument is undefined in the case when
operation completes successfully.

Discussed with: phk
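
The two rules above can be sketched in userland C. sketch_read_data() is a hypothetical stand-in that only imitates the described contract (NULL on failure, error value untouched on success); it is not g_read_data(9) itself, and sketch_read_metadata() is an invented caller:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Stand-in reader: returns a buffer on success and NULL on failure.
 * *error is written only on failure, mirroring the contract above.
 */
void *
sketch_read_data(int fail, size_t length, int *error)
{
	void *buf;

	assert(length > 0 && error != NULL);
	if (fail) {
		*error = EIO;
		return (NULL);
	}
	buf = calloc(1, length);
	if (buf == NULL) {
		*error = ENOMEM;
		return (NULL);
	}
	return (buf);	/* *error intentionally left untouched */
}

/* Caller that tests the returned pointer, never the error value alone. */
int
sketch_read_metadata(int fail, unsigned char *out, size_t length)
{
	unsigned char *buf;
	int error = -1;	/* stands in for an undefined value */

	buf = sketch_read_data(fail, length, &error);
	if (buf == NULL)
		return (error);
	memcpy(out, buf, length);
	free(buf);
	return (0);
}
```

A caller that tested error != 0 instead of the pointer would treat the leftover -1 as a failure even though the read succeeded.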


# 5bb84bc8 31-Oct-2005 Robert Watson <rwatson@FreeBSD.org>

Normalize a significant number of kernel malloc type names:

- Prefer '_' to ' ', as it results in more easily parsed results in
memory monitoring tools such as vmstat.

- Remove punctuation that is incompatible with using memory type names
as file names, such as '/' characters.

- Disambiguate some collisions by adding subsystem prefixes to some
memory types.

- Generally prefer lower case to upper case.

- If the same type is defined in multiple architecture directories,
attempt to use the same name in additional cases.

Not all instances were caught in this change, so more work is required to
finish this conversion. Similar changes are required for UMA zone names.


# a65a0da2 28-Oct-2005 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Fix possible live-lock under heavy load where we can't allocate more
memory for request.
I was sure graid3 should handle such situations well, but green@ reported
it is not and we want to fix it before 6.0.

Submitted by: green


# 4ed854e8 27-Jul-2005 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Use root_mount KPI for RAID3 to delay root file system mount.
Actually, one cannot setup root file system on RAID3 device, but when
other file systems exist in /etc/fstab which are placed on RAID3 device,
boot process will be interrupted when these devices are missing.

MFC after: 3 days
X-MFC-note: MFC only to RELENG_6, as RELENG_5 doesn't have root_mount KPI.


# 34cb1517 26-Mar-2005 Pawel Jakub Dawidek <pjd@FreeBSD.org>

If an error occurs, clean up before returning from g_raid3_connect_disk().


# e6890985 27-Feb-2005 Pawel Jakub Dawidek <pjd@FreeBSD.org>

- Add md_provsize field to metadata, which will help with
shared-last-sector problem.
After this change, even if there is more than one provider with the same
last sector, the proper one will be chosen based on its size.
It still doesn't fix the 'c' partition problem (when da0s1 can be confused
with da0s1c) and situation when 'a' partition starts at offset 0
(then da0s1a can be confused with da0s1 and da0s1c). One can use '-h'
option there, when creating device or avoid sharing last sector.
Actually, when providers share the same last sector and their size is equal,
they provide exactly the same data, so the name (da0s1, da0s1a, da0s1c)
isn't important at all.
- Provide backward compatibility.
- Update copyright's year.

MFC after: 1 week


# 0218292c 16-Feb-2005 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Update copyright in files changed this year.


# 43756685 09-Jan-2005 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Increase default synchronization speed.

MFC after: 3 days


# ea973705 03-Jan-2005 Pawel Jakub Dawidek <pjd@FreeBSD.org>

- Fix 'rebuild' command - it can no longer rely on retaste event
(we ignore it).
- Remove code used for handling spoil events, as spoiling is not possible
anymore, because we keep consumers open for writing all the time.

MFC after: 4 days


# cdca9c06 02-Jan-2005 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Remove unused #include.


# 63710c4d 30-Dec-2004 John Baldwin <jhb@FreeBSD.org>

Stop explicitly touching td_base_pri outside of the scheduler and simply
set a thread's priority via sched_prio() when that is the desired action.
The schedulers will start managing td_base_pri internally shortly.


# 7f456a7d 28-Dec-2004 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Remove debug code.


# a245a548 25-Dec-2004 Pawel Jakub Dawidek <pjd@FreeBSD.org>

- Add genid field to the metadata which will allow to improve reliability a bit.
After this change, when component is disconnected because of an I/O error,
it will not be connected and synchronized automatically, it will be logged
as broken and skipped. Autosynchronization can occur, when component is
disconnected (on orphan event) and connected again - there was no I/O
error, so there is no need to refuse to connect the component, but when there were
writes while it wasn't connected, it will be synchronized.
This fixes cases when a component is disconnected because of an I/O error and can be
connected again and again.
- Bump version number.
- Implement backward compatibility mechanism. After this change when metadata in
old version is detected, it is automatically upgraded to the new (current)
version.


# 4485f000 21-Dec-2004 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Now that forced device destruction is done on shutdown, hide the warning
that the device cannot be destroyed immediately under debug=1.

Suggested by: simon


# d97d5ee9 21-Dec-2004 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Improve reliability and clean up code a bit.
For more details check src/sys/geom/mirror/g_mirror.c rev.1.47,1.48,1.49,1.50.


# 89dd8e53 13-Dec-2004 Pawel Jakub Dawidek <pjd@FreeBSD.org>

bioq_insert_head() function is already in subr_disk.c.


# afd05d74 04-Dec-2004 Pawel Jakub Dawidek <pjd@FreeBSD.org>

When initializing device, set d_softc and d_no fields for all components,
because we know it then and we need it when inserting a component which
wasn't destroyed while device was running.

Reported by: Michael Handler <handler@grendel.net>
MFC after: 1 week


# 085f43af 09-Nov-2004 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Before trying to update metadata (and so opening the consumer for writing),
be sure that the events queue is empty. Otherwise we can hit a race
where, for example, da0s1 is tasted by some other class, which means that
da0 is open with the exclusive bit set, which means that we can't open da0
for writing if it is our component.

Reported by: Attila Nagy <bra@fsn.hu> (and somebody else sometime ago,
but I cannot find who it was)


# b36b4bfb 09-Nov-2004 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Don't rely on the DIRTY flag to be sure that the consumer is open, because
the DIRTY flag can be removed in the idle process. Use the consumer's acw
field instead to avoid opening the consumer twice.


# 9c6a3f03 09-Nov-2004 Pawel Jakub Dawidek <pjd@FreeBSD.org>

For BIO_READ check if provider is open for reading and for BIO_WRITE,
check if provider is open for writing.
This fixes panic when device is open only for writing and we send write
request.


# fdc3c6ce 08-Nov-2004 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Drop Giant lock before grabbing the topology lock.


# 463674f7 08-Nov-2004 Pawel Jakub Dawidek <pjd@FreeBSD.org>

If the device is marked as being destroyed, deny all access requests.


# 9bb09163 05-Nov-2004 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Don't forget to make sure that there are no unfinished requests before
marking components as clean.

Pointed out by: scottl


# 4d006a98 05-Nov-2004 Pawel Jakub Dawidek <pjd@FreeBSD.org>

- Mark all raid3 components as clean after kern.geom.raid3.idletime seconds.
- Make kern.geom.raid3.timeout variable tunable.


# 9da3072c 05-Nov-2004 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Mark raid3 devices as clean on shutdown (after all file systems are
unmounted).

Suggested by: scottl


# 79e61493 04-Nov-2004 Pawel Jakub Dawidek <pjd@FreeBSD.org>

- Use ->index consumer's field to track number of in-flight requests.
- Remove unused #include.


# 604fce4f 28-Sep-2004 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Just use MAXPHYS as maximum I/O request size, instead of using my own
#define for this purpose.
No functional change.


# e5e7825c 27-Sep-2004 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Decrease kern.geom.raid3.timeout to 4, so it is smaller than
vfs.root.mountdelay by default.


# d2fb9c62 27-Sep-2004 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Avoid race while synchronizing components. It is very hard to bump into,
but it is possible:
1. Read data from good component for synchronization.
2. Write data to the same area.
3. Write synchronization data, which are now stale.

Found by: tegge (for gmirror)
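
The three steps above only go wrong when the regular write in step 2 touches the same byte range as the in-flight synchronization request. The following is a minimal sketch of detecting that overlap; it is an illustration under stated assumptions, not the code from this commit. The `io_range` structure and `ranges_collide()` helper are hypothetical names, and ranges are treated as half-open intervals.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of one request: [offset, offset + length). */
struct io_range {
	int64_t offset;
	int64_t length;
};

/*
 * Return true when a regular write and a synchronization request touch
 * at least one common byte.  Only in that case can the synchronization
 * write in step 3 overwrite newer data with a stale copy.
 */
static bool
ranges_collide(struct io_range a, struct io_range b)
{
	return (a.offset < b.offset + b.length &&
	    b.offset < a.offset + a.length);
}
```

A caller could use such a check to hold back one of the two colliding requests until the other completes; which one is delayed is a design choice this sketch does not settle.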


# 201dfcf1 20-Sep-2004 Pawel Jakub Dawidek <pjd@FreeBSD.org>

This is not needed anymore, it is forced in GEOM now.
Actually, it can even cause some problems, because GEOM requires sectorsize
to be more than 0 on first access, not on provider creation, so we can skip
valid providers by doing this check here.

Reported by: Divacky Roman <xdivac02@stud.fit.vutbr.cz>
Sven Willenberger <sven@dmv.com>


# 6d7b8aec 30-Aug-2004 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Allow configuring the debug level from /boot/loader.conf.


# 45d5e85a 29-Aug-2004 Pawel Jakub Dawidek <pjd@FreeBSD.org>

GCC, ehh.


# c0d68b6e 27-Aug-2004 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Use sc->sc_mediasize instead of sc->sc_provider->mediasize which contains
exactly the same value, but is shorter.


# 29c78ab3 25-Aug-2004 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Skip providers with an undefined sector size.

Reported by: kuriyama


# 4cf67afe 25-Aug-2004 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Log verification errors at level 1.


# dba915cf 22-Aug-2004 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Implementation of 'verify reading' algorithm, which uses parity data for
verification of regular data when device is in complete state.
On verification error, EIO error is returned for the bio and sysctl
kern.geom.raid3.stat.parity_mismatch is increased.

Suggested by: phk
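
The idea behind 'verify reading' can be sketched in userland C as follows. This is an illustration only, assuming XOR parity across all components; `verify_stripe()` and the `parity_mismatch` variable are hypothetical stand-ins for the kernel code and for the kern.geom.raid3.stat.parity_mismatch counter.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for kern.geom.raid3.stat.parity_mismatch. */
static unsigned parity_mismatch;

/*
 * XOR the same byte position of every component (data and parity).
 * On a consistent stripe the result is zero for every byte, because
 * the parity component is the XOR of all data components.
 * Returns 0 when the stripe verifies, EIO on the first mismatch.
 */
static int
verify_stripe(const uint8_t *const chunks[], size_t ncomp, size_t len)
{
	for (size_t b = 0; b < len; b++) {
		uint8_t x = 0;

		for (size_t i = 0; i < ncomp; i++)
			x ^= chunks[i][b];
		if (x != 0) {
			parity_mismatch++;
			return (EIO);
		}
	}
	return (0);
}
```

The check needs every component to be readable, which is consistent with the message restricting verification to devices in the complete state.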


# f5a2f7fe 21-Aug-2004 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Implement a new reading algorithm, which uses the parity component for
reading as well, even if the device is in complete state.
I observe a 40% speed-up with this option for random read operations,
but a slowdown for sequential reads.
Basically, without this option, reading from a RAID3 device built from 5
components (c0-c4) looks like this:

Request no.   Used components
1             c0+c1+c2+c3
2             c0+c1+c2+c3
3             c0+c1+c2+c3

With the new feature:

Request no.   Used components
1             c0+c1+c2+c3
2             (c1^c2^c3^c4)+c1+c2+c3
3             c0+(c0^c2^c3^c4)+c2+c3
4             c0+c1+(c0^c1^c3^c4)+c3
5             c0+c1+c2+(c0^c1^c2^c4)
6             c0+c1+c2+c3
[...]
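
The rotation in the second table can be sketched in userland C as follows. This is an illustration under stated assumptions, not the kernel implementation: c4 is taken to hold the XOR of c0-c3, and `NCOMP`, `CHUNK`, `compute_parity()`, `skipped_component()` and `read_request()` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NCOMP 5			/* c0..c3 carry data, c4 carries parity */
#define NDATA (NCOMP - 1)
#define CHUNK 4			/* bytes per component, illustrative only */

/* Fill the parity component: c4 = c0 ^ c1 ^ c2 ^ c3. */
static void
compute_parity(uint8_t comp[NCOMP][CHUNK])
{
	memset(comp[NDATA], 0, CHUNK);
	for (int i = 0; i < NDATA; i++)
		for (size_t b = 0; b < CHUNK; b++)
			comp[NDATA][b] ^= comp[i][b];
}

/*
 * Data component to skip for 1-based request number reqno, following
 * the table: request 1 skips nothing (-1), request 2 skips c0, ...,
 * request 5 skips c3, request 6 starts the cycle again.
 */
static int
skipped_component(unsigned reqno)
{
	unsigned slot = (reqno - 1) % NCOMP;

	return (slot == 0 ? -1 : (int)slot - 1);
}

/* Read all data chunks, rebuilding the skipped one from the others. */
static void
read_request(uint8_t comp[NCOMP][CHUNK], unsigned reqno,
    uint8_t out[NDATA][CHUNK])
{
	int skip = skipped_component(reqno);

	for (int i = 0; i < NDATA; i++) {
		if (i != skip) {
			memcpy(out[i], comp[i], CHUNK);
			continue;
		}
		/* XOR of every other component, parity included. */
		memset(out[i], 0, CHUNK);
		for (int j = 0; j < NCOMP; j++) {
			if (j == skip)
				continue;
			for (size_t b = 0; b < CHUNK; b++)
				out[i][b] ^= comp[j][b];
		}
	}
}
```

Each request still touches four components, but a different one is left idle each time, so the read load is spread over all five disks.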


# d86bc96c 18-Aug-2004 Pawel Jakub Dawidek <pjd@FreeBSD.org>

We really don't want to receive spoil events for synchronization consumers.


# 28b31df7 18-Aug-2004 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Dump device status on 'list' command.


# fa6a7837 16-Aug-2004 David E. O'Brien <obrien@FreeBSD.org>

Minor style.9 cleanup.


# 809a9dc6 16-Aug-2004 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Decrease debug level to 0.


# 5e6db16c 16-Aug-2004 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Fix warning.


# 2d1661a5 16-Aug-2004 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Introduce GEOM RAID3 class, i.e. kernel module, which implements RAID3
transformation and graid3(8) userland utility, which can be used for
configuration. No manual page yet, sorry.

Hardware provided by: Daniel Seuffert