#
338905 |
|
24-Sep-2018 |
markj |
MFC r338724: Fix an nvpair leak in vdev_geom_read_config().
PR: 230704
|
#
330524 |
|
05-Mar-2018 |
asomers |
MFC r324940:
Fix the error message when creating a zpool on a too-small device
Don't check for SPA_MINDEVSIZE in vdev_geom_attach when opening by path. It's redundant with the check in vdev_open, and failing to attach here results in the wrong error message being printed. However, still check for it in some other situations:
* When opening by guids, so we don't get bogged down reading from slow devices like floppy drives. * In vdev_geom_read_pool_label for the same reason, because we iterate over all providers. * If the caller requests that we verify the guid, because then we'll have to read from the device before vdev_open verifies the size.
PR: 222227 Reported by: Marie Helene Kvello-Aune <marieheleneka@gmail.com> Reviewed by: avg, mav Sponsored by: Spectra Logic Corp Differential Revision: https://reviews.freebsd.org/D12531
|
#
330522 |
|
05-Mar-2018 |
asomers |
MFC r326401:
Fix assertion when ZFS fails to open certain devices
"panic: vdev_geom_close_locked: cp->private is NULL" This panic will result if ZFS fails to open a device due to either of the following reasons:
1) The device's sector size is greater than 8KB. 2) ZFS wants to open the device RW, but it can't be opened for writing.
The solution is to change the initialization order to ensure that the assertion will be satisfied.
PR: 221066 Reported by: David NewHamlet <wheelcomplex@gmail.com> Reviewed by: avg Sponsored by: Spectra Logic Corp Differential Revision: https://reviews.freebsd.org/D13278
|
#
326325 |
|
28-Nov-2017 |
asomers |
MFC r322546:
Fix some ZFS debugging messages
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c Be more careful about the use of provider names vs vdev names in ZFS_LOG statements.
Sponsored by: Spectra Logic Corp
|
#
325913 |
|
16-Nov-2017 |
avg |
MFC r325228: vdev_geom_close: close errored consumer even if vdev_reopening is set
|
#
319268 |
|
30-May-2017 |
asomers |
MFC r318189:
vdev_geom may associate multiple vdevs per g_consumer
vdev_geom.c currently uses the g_consumer's private field to point to a vdev_t. That way, a GEOM event can cause a change to a ZFS vdev. For example, when you remove a disk, the vdev's status will change to REMOVED. However, vdev_geom will sometimes attach multiple vdevs to the same GEOM consumer. If this happens, then geom events will only be propagated to one of the vdevs.
Fix this by storing a linked list of vdevs in g_consumer's private field.
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
* g_consumer.private now stores a linked list of vdev pointers associated with the consumer instead of just a single vdev pointer.
* Change vdev_geom_set_physpath's signature to more closely match vdev_geom_set_rotation_rate
* Don't bother calling g_access in vdev_geom_set_physpath. It's guaranteed that we've already accessed the consumer by the time we get here.
* Don't call vdev_geom_set_physpath in vdev_geom_attach. Instead, call it in vdev_geom_open, after we know that the open has succeeded.
PR: 218634 Reviewed by: gibbs Sponsored by: Spectra Logic Corp Differential Revision: https://reviews.freebsd.org/D10391
|
#
319262 |
|
30-May-2017 |
asomers |
MFC r316760:
Fix vdev_geom_attach_by_guids for partitioned disks
When opening a vdev whose path is unknown, vdev_geom must find a geom provider with a label whose guids match the desired vdev. However, due to partitioning, it is possible that two non-synonomous providers will share some labels. For example, if the first partition starts at the beginning of the drive, then ada0 and ada0p1 will share the first label. More troubling, if the last partition runs to the end of the drive, then ada0p3 and ada0 will share the last label. If vdev_geom opens ada0 when it should've opened ada0p3, then the pool won't be readable. If it opens ada0 when it should've opened ada0p1, then it will corrupt some other partition when it writes the 3rd and 4th labels.
The easiest way to reproduce this problem is to install a mirrored root pool with the default partition layout, then swap the positions of the two boot drives and reboot. Whether the bug manifests depends on the order in which geom lists its providers, which is arbitrary.
Fix this situation by modifying the search algorithm to prefer geom providers that have all four labels intact. If no such provider exists, then open whichever provider has the most.
Reviewed by: mav Sponsored by: Spectra Logic Corp Differential Revision: https://reviews.freebsd.org/D10365
|
#
308592 |
|
12-Nov-2016 |
mav |
MFC r308055: Add vdev_reopening support to vdev_geom.
It allows to avoid extra GEOM providers flapping without significant need. Since GEOM got resize support, we don't need to reopen provider to get new size. If provider was orphaned and no longer valid, ZFS should already know that, and in such case reopen should be done in full as expected.
|
#
308590 |
|
12-Nov-2016 |
mav |
MFC r308051: Matching GUIDs, handle possible race on vdev detach.
In case of vdev detach, causing top level mirror vdev destruction, leaf vdev changes its GUID to one of the destroyed mirror, that creates race condition when GUID in vdev label may not match one in the pool config.
This change replicates logic nuance of vdev_validate() by adding special exception, matching the vdev GUID against the top level vdev GUID. Since this exception is not completely reliable (may give false positives if we fail to erase label on detached vdev), use it only as last resort.
Quick way to reproduce this scenario now is detach vdev from a pool with enabled autoextend. During vdev detach autoextend logic tries to reopen remaining vdev, that always fails now since in-memory configuration is already updated, while on-disk labels are not yet.
|
#
308588 |
|
12-Nov-2016 |
mav |
MFC r308049: Improve few debugging log messages.
|
#
308061 |
|
28-Oct-2016 |
mav |
MFC r300881, r302058 (by asomers): Avoid issuing spa config updates for physical path when not necessary
ZFS's configuration needs to be updated whenever the physical path for a device changes, but not when a new device is introduced. This is because new devices necessarily cause config updates, but only if they are actually accepted into the pool.
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c Split vdev_geom_set_physpath out of vdev_geom_attrchanged. When setting the vdev's physical path, only request a config update if the physical path has changed. Don't request it when opening a device for the first time, because the config sync will happen anyway upstack.
sys/geom/geom_dev.c Split g_dev_set_physpath and g_dev_set_media out of g_dev_attrchanged
|
#
308060 |
|
28-Oct-2016 |
mav |
MFC r300059 (by asomers): Speed up vdev_geom_open_by_guids
Speedup is hard to measure because the only time vdev_geom_open_by_guids gets called on many drives at the same time is during boot. But with vdev_geom_open hacked to always call vdev_geom_open_by_guids, operations like "zpool create" speed up by 65%.
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
* Read all of a vdev's labels in parallel instead of sequentially. * In vdev_geom_read_config, don't read the entire label, including the uberblock. That's a waste of RAM. Just read the vdev config nvlist. Reduces the IO and RAM involved with tasting from 1MB to 448KB.
|
#
308059 |
|
28-Oct-2016 |
mav |
MFC r298814 (by asomers): Fix a use-after-free when "zpool import" fails
clear vd->vdev_tsd in vdev_geom_close_locked instead of vdev_geom_detach. In the latter function, it would fail to happen in certain circumstances where cp->private was unset. Ideally, the latter should never happen, but it can happen when vdev open fails, or where spares are involved.
|
#
308058 |
|
28-Oct-2016 |
mav |
MFC r298786 (by asomers): Refactor vdev_geom_attach and friends to reduce code duplication
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c Move checks for provider's sectorsize and mediasize into a single location in vdev_geom_attach. Remove the zfs::vdev::taste class; it's ok to use the regular vdev class for tasting. Consolidate guid checks into a single location in vdev_attach_ok. Consolidate some error handling code from vdev_geom_attach into vdev_geom_detach, closing a resource leak of geom consumers in the process.
|
#
308057 |
|
28-Oct-2016 |
mav |
MFC r294329 (by asomers): Disallow zvol-backed ZFS pools
Using zvols as backing devices for ZFS pools is fraught with panics and deadlocks. For example, attempting to online a missing device in the presence of a zvol can cause a panic when vdev_geom tastes the zvol. Better to completely disable vdev_geom from ever opening a zvol. The solution relies on setting a thread-local variable during vdev_geom_open, and returning EOPNOTSUPP during zvol_open if that thread-local variable is set.
Remove the check for MUTEX_HELD(&zfsdev_state_lock) in zvol_open. Its intent was to prevent a recursive mutex acquisition panic. However, the new check for the thread-local variable also fixes that problem.
Also, fix a panic in vdev_geom_taste_orphan. For an unknown reason, this function was set to panic. But it can occur that a device disappears during tasting, and it causes no problems to ignore this departure.
|
#
307296 |
|
14-Oct-2016 |
mav |
MFC r305456 (by avg): fix zfs pool creation accidentally broken by r305331
The upstream change introduced a new load state, SPA_LOAD_CREATE, and vdev_geom code needs to be aware of it.
|
#
299958 |
|
16-May-2016 |
asomers |
MFC r298072
Don't corrupt ZFS label's physpath attribute when booting while a disk is missing
Prior to this change, vdev_geom_open_by_path would call vdev_geom_attach prior to verifying the device's GUIDs. vdev_geom_attach calls vdev_geom_attrchange to set the physpath in the vdev object. The result is that if the disk could not be found, then the labels for other disks in the same TLD would overwrite the missing disk's physpath with the physpath of whichever disk currently has the same devname as the missing one used to have.
|
#
299536 |
|
12-May-2016 |
asomers |
MFC r297986, r298017 to vdev_geom.c
r297986 | asomers | 2016-04-14 13:20:31 -0600 (Thu, 14 Apr 2016) | 6 lines
Update a debugging message in vdev_geom_open_by_guids for consistency with similar messages elsewhere in the file.
r298017 | asomers | 2016-04-14 17:14:41 -0600 (Thu, 14 Apr 2016) | 8 lines
Add more debugging statements in vdev_geom.c
Log a debugging message whenever geom functions fail in vdev_geom_attach. Printing these messages is controlled by vfs.zfs.debug
|
#
299376 |
|
10-May-2016 |
asomers |
MFC 297868
Fix rare double free in vdev_geom_attrchanged
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c Don't drop the g_topology_lock before freeing old_physpath. That opens up a race where one thread can call vdev_geom_attrchanged, set old_physpath, drop the g_topology_lock, then block trying to acquire the SCL_STATE lock. Then another thread can come into vdev_geom_attrchanged, set old_physpath to the same value, and proceed to free it. When the first thread resumes, it will free the same location.
It turns out that the SCL_STATE lock isn't needed. It was originally added by gibbs to protect vd->vdev_physpath while updating the same. However, the update process subsequently was switched to an atomic operation (a pointer swap). Now, there is no need for the SCL_STATE lock, and hence no need to drop the g_topology_lock.
|
#
297957 |
|
14-Apr-2016 |
mav |
MFC r297672: Alike to r293708 relax pool check in vdev_geom_open_by_path().
This made impossible spare disk open by known path, which kind of worked only because the same fix was applied to vdev_geom_attach_by_guids() in r293708.
|
#
297123 |
|
20-Mar-2016 |
mav |
MFC r296615: Make ZFS ignore stripe sizes above SPA_MAXASHIFT (8KB).
If device has stripe size bigger then maximal sector size supported by ZFS, there is nothing can be done to avoid read-modify-write cycles. Taking that stripe size into account will only reduce space efficiency and pointlessly bother user with warnings that can not be fixed.
|
#
297122 |
|
20-Mar-2016 |
mav |
MFC r296613: Make ZFS more picky to GEOM stripe sizes and offsets.
Use of misaligned or non-power-of-2 stripes is not really useful for ZFS, since increased ashift won't help to avoid read-modify-write cycles, and only reduce pool space efficiency and compression rates.
|
#
297108 |
|
20-Mar-2016 |
mav |
MFC r296510, r296563, r296567: MFV r296505: 6531 Provide mechanism to artificially limit disk performance
Reviewed by: Paul Dagnelie <pcd@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Approved by: Dan McDonald <danmcd@omniti.com> Author: Prakash Surya <prakash.surya@delphix.com>
illumos/illumos-gate@97e81309571898df9fdd94aab1216dfcf23e060b
|
#
297090 |
|
20-Mar-2016 |
mav |
MFC r293677 (by asomers): Record physical path information in ZFS Vdevs
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c: If available, record the physical path of a vdev in ZFS meta-data. Do this both when opening the vdev, and when receiving an attribute change notification from GEOM.
Make vdev_geom_close() synchronous instead of deferring its work to a GEOM event handler. There is no benefit to deferring the work and this prevents a future open call from referencing a consumer that is scheduled for destruction. The close followed by an immediate open will occur during a vdev reprobe triggered by any type of I/O error.
Consolidate vdev_geom_close() and vdev_geom_detach() into vdev_geom_close() and vdev_geom_close_locked(). This also moves the cross linking operations between vdev and GEOM consumer into a single place (linking in vdev_geom_attach() and unlinking in vdev_geom_close_locked()).
Differential Revision: https://reviews.freebsd.org/D4524
|
#
297078 |
|
20-Mar-2016 |
mav |
MFC r274304 (by delphij): MFV r274272 and diff reduction with upstream.
Illumos issue: 5244 zio pipeline callers should explicitly invoke next stage
|
#
294843 |
|
26-Jan-2016 |
asomers |
MFC r292066, r292069, r293708, r294027, and r294358, mostly to vdev_geom.c
r292066 | asomers | 2015-12-10 14:46:21 -0700 (Thu, 10 Dec 2015) | 25 lines
During vdev_geom_open, require that the vdev guids match the device's label except during split, add, or create operations. This fixes a bug where the wrong disk could be returned, and higher layers of ZFS would immediately eject it again.
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c: o When opening by GUID, require both the pool and vdev GUIDs to match. While it is highly unlikely for two vdevs to have the same vdev GUIDs, the ZFS storage pool allocator only guarantees they are unique within a pool.
o Modify the open behavior to: - If we are opening a vdev that hasn't previously been opened, open by path without checking GUIDs. - Otherwise, open by path and verify GUIDs. - If that fails, search all geom providers for a device with matching GUIDs. - If that fails, return ENOENT.
r292069 | asomers | 2015-12-10 17:04:13 -0700 (Thu, 10 Dec 2015) | 6 lines
Change an important error message from ZFS_LOG to printf
r293708 | asomers | 2016-01-11 15:15:46 -0700 (Mon, 11 Jan 2016) | 16 lines
Fix importing l2arc device by guid
After r292066, vdev_geom verifies both the vdev and pool guids of device labels during open. However, spare and l2arc devices don't have pool guids, so opening them by guid will fail (opening by path, when the pathname is known, still succeeds). This change allows a vdev to be opened by guid if the label contains no pool_guid, which is the case for inactive spares and l2arc devices.
r294027 | asomers | 2016-01-14 11:19:05 -0700 (Thu, 14 Jan 2016) | 14 lines
Fix race condition involving ZFS remove events
When a ZFS drive disappears, ZFS sends a resource.fs.zfs.removed event to userland. A userland program like zfsd(8) can use that event, for example to activate a hotspare. The current code contains a race condition: vdev_geom will sent the sysevent _before_ spa.c would update the vdev's status, causing userland processes to see pool state that does not reflect the device removal. This change moves the sysevent to spa.c, closing the race.
r294358 | asomers | 2016-01-19 16:16:24 -0700 (Tue, 19 Jan 2016) | 10 lines
Quell harmless CID about unchecked return value in nvlist_get_guids.
The return value doesn't need to be checked, because nvlist_get_guid's callers check the returned values of the guids.
|
#
274800 |
|
21-Nov-2014 |
smh |
MFC r274619: Disable TRIM on file backed ZFS vdevs and fix TRIM on init
Sponsored by: Multiplay
|
#
271238 |
|
07-Sep-2014 |
smh |
MFC r256956: Improve ZFS N-way mirror read performance by using load and locality information.
MFC r260713: Fix ZFS mirror code for handling multiple DVA's
Also make the addition of the d_rotation_rate binary compatible. This allows storage drivers compiled for 10.0 to work by preserving the ABI for disks.
Approved by: re (gjb) Sponsored by: Multiplay
|
#
270312 |
|
21-Aug-2014 |
smh |
MFC r265152 - Reintroduce priority for the TRIM ZIOs instead of using the "NOW" priority MFC r265321 - Fix double fault panic when returning EOPNOTSUPP MFC r269407 - Don't return ZIO_PIPELINE_CONTINUE from vdev_op_io_start methods
Sponsored by: Multiplay
|
#
266720 |
|
26-May-2014 |
smh |
MFC r264885
Eliminate duplicate checks in vdev_geom_io_intr error handling
Sponsored by: Multiplay
|
#
260517 |
|
10-Jan-2014 |
asomers |
MFC 259240 sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c When a da or ada device dissappears, outstanding IOs fail with ENXIO, not EIO. The check for EIO was probably copied from Illumos, where that is indeed the correct errno.
Without this change, pulling a busy drive from a zpool would usually turn it into UNAVAIL, even though pulling an idle drive would turn it into REMOVED. With this change, it is REMOVED every time.
Also, vdev_geom_io_intr shouldn't do zfs_post_remove, because that results in devd getting two resource.fs.zfs.removed events. The comment said that the event had to be sent directly instead of through the async removal thread because "the DE engine is using this information to discard prevoius I/O errors". However, the fact that vdev_geom_io_intr was never actually sending the events until now, and that vdev_geom_orphan never sent them at all, and that vdev_geom_orphan usually gets called about 2 seconds after the actual removal, means that FreeBSD's userland can cope with a late event just fine.
|
#
260385 |
|
06-Jan-2014 |
scottl |
MFC Alexander Motin's GEOM direct dispatch work:
r256603: Introduce new function devstat_end_transaction_bio_bt(), adding new argument to specify present time. Use this function to move binuptime() out of lock, substantially reducing lock congestion when slow timecounter is used.
r256606: Move g_io_deliver() out of the lock, as required for direct dispatch. Move g_destroy_bio() out too to reduce lock scope even more.
r256607: Fix passing uninitialized bio_resid argument to g_trace().
r256610: Add unmapped I/O support to GEOM RAID.
r256830: Restore BIO_UNMAPPED and BIO_TRANSIENT_MAPPING in biodonne() when unmapping temporary mapped buffer. That fixes double unmap if biodone() called twice for the same BIO (but with different done methods).
r256880: Merge GEOM direct dispatch changes from the projects/camlock branch.
When safety requirements are met, it allows to avoid passing I/O requests to GEOM g_up/g_down thread, executing them directly in the caller context. That allows to avoid CPU bottlenecks in g_up/g_down threads, plus avoid several context switches per I/O.
r259247: Fix bug introduced at r256607. We have to recalculate bp_resid here since sizes of original and completed requests may differ due to end of media.
Testing of the stable/10 merge was done by Netflix, but all of the credit goes to Alexander and iX Systems.
Submitted by: mav Sponsored by: iX Systems
|
#
260339 |
|
05-Jan-2014 |
mav |
MFC r259168: Don't even try to read vdev labels from devices smaller then SPA_MINDEVSIZE (64MB). Even if we would find one somehow, ZFS kernel code rejects such devices. It is funny to look on attempts to read 4 256K vdev labels from 1.44MB floppy, though it is not very practical and quite slow.
|
#
274800 |
|
21-Nov-2014 |
smh |
MFC r274619: Disable TRIM on file backed ZFS vdevs and fix TRIM on init
Sponsored by: Multiplay
|
#
271238 |
|
07-Sep-2014 |
smh |
MFC r256956: Improve ZFS N-way mirror read performance by using load and locality information.
MFC r260713: Fix ZFS mirror code for handling multiple DVA's
Also make the addition of the d_rotation_rate binary compatible. This allows storage drivers compiled for 10.0 to work by preserving the ABI for disks.
Approved by: re (gjb) Sponsored by: Multiplay
|
#
270312 |
|
21-Aug-2014 |
smh |
MFC r265152 - Reintroduce priority for the TRIM ZIOs instead of using the "NOW" priority MFC r265321 - Fix double fault panic when returning EOPNOTSUPP MFC r269407 - Don't return ZIO_PIPELINE_CONTINUE from vdev_op_io_start methods
Sponsored by: Multiplay
|
#
266720 |
|
26-May-2014 |
smh |
MFC r264885
Eliminate duplicate checks in vdev_geom_io_intr error handling
Sponsored by: Multiplay
|
#
260517 |
|
10-Jan-2014 |
asomers |
MFC 259240 sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c When a da or ada device dissappears, outstanding IOs fail with ENXIO, not EIO. The check for EIO was probably copied from Illumos, where that is indeed the correct errno.
Without this change, pulling a busy drive from a zpool would usually turn it into UNAVAIL, even though pulling an idle drive would turn it into REMOVED. With this change, it is REMOVED every time.
Also, vdev_geom_io_intr shouldn't do zfs_post_remove, because that results in devd getting two resource.fs.zfs.removed events. The comment said that the event had to be sent directly instead of through the async removal thread because "the DE engine is using this information to discard prevoius I/O errors". However, the fact that vdev_geom_io_intr was never actually sending the events until now, and that vdev_geom_orphan never sent them at all, and that vdev_geom_orphan usually gets called about 2 seconds after the actual removal, means that FreeBSD's userland can cope with a late event just fine.
|
#
260385 |
|
06-Jan-2014 |
scottl |
MFC Alexander Motin's GEOM direct dispatch work:
r256603: Introduce new function devstat_end_transaction_bio_bt(), adding new argument to specify present time. Use this function to move binuptime() out of lock, substantially reducing lock congestion when slow timecounter is used.
r256606: Move g_io_deliver() out of the lock, as required for direct dispatch. Move g_destroy_bio() out too to reduce lock scope even more.
r256607: Fix passing uninitialized bio_resid argument to g_trace().
r256610: Add unmapped I/O support to GEOM RAID.
r256830: Restore BIO_UNMAPPED and BIO_TRANSIENT_MAPPING in biodonne() when unmapping temporary mapped buffer. That fixes double unmap if biodone() called twice for the same BIO (but with different done methods).
r256880: Merge GEOM direct dispatch changes from the projects/camlock branch.
When safety requirements are met, it allows to avoid passing I/O requests to GEOM g_up/g_down thread, executing them directly in the caller context. That allows to avoid CPU bottlenecks in g_up/g_down threads, plus avoid several context switches per I/O.
r259247: Fix bug introduced at r256607. We have to recalculate bp_resid here since sizes of original and completed requests may differ due to end of media.
Testing of the stable/10 merge was done by Netflix, but all of the credit goes to Alexander and iX Systems.
Submitted by: mav Sponsored by: iX Systems
|
#
260339 |
|
05-Jan-2014 |
mav |
MFC r259168: Don't even try to read vdev labels from devices smaller then SPA_MINDEVSIZE (64MB). Even if we would find one somehow, ZFS kernel code rejects such devices. It is funny to look on attempts to read 4 256K vdev labels from 1.44MB floppy, though it is not very practical and quite slow.
|