History log of /freebsd-10-stable/cddl/contrib/opensolaris/cmd/ztest/ztest.c
Revision Date Author Comments
# 326335 28-Nov-2017 asomers

MFC r324220:

MFC r316858 7280 Allow changing global libzpool variables in zdb

7280 Allow changing global libzpool variables in zdb and ztest through command line

illumos/illumos-gate@0e60744c982adecd0a1f146f5637475d07ab1069
https://github.com/illumos/illumos-gate/commit/0e60744c982adecd0a1f146f5637475d07ab1069

https://www.illumos.org/issues/7280
zdb is very handy for diagnosing problems with a pool in a safe and
quick way. When a pool is in a bad shape, we often want to disable some
fail-safes, or adjust some tunables in order to open them. In the
kernel, this is done by changing public variables in mdb. The goal of
this feature is to add the same capability to zdb and ztest, so that
they can change libzpool tuneables from the command line.

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Pavel Zakharov <pavel.zakharov@delphix.com>


# 325915 16-Nov-2017 avg

MFC r325035: MFV r325013,r325034: 640 number_to_scaled_string is duplicated in several commands

FreeBSD note: of all libcmdutils functionality ZFS (and other illumos
contrib code) currently uses only nicenum() function (which is similar
to humanize_number but has some formatting differences). For this
reason I decided to not port the whole library. As a result, nicenum.c
from libcmdutils is compiled into libzfs and libzpool. This is a bit
ugly, but works. If one day we are forced to create libillumos, then
the file should be moved to that library.


# 320496 30-Jun-2017 avg

MFC r308782:
After some ZIL changes 6 years ago zil_slog_limit got partially broken
due to zl_itx_list_sz not updated when async itx'es upgraded to sync.
Actually because of other changes about that time zl_itx_list_sz is not
really required to implement the functionality, so this patch removes
some unneeded broken code and variables.

Original idea of zil_slog_limit was to reduce chance of SLOG abuse by
single heavy logger, that increased latency for other (more latency critical)
loggers, by pushing heavy log out into the main pool instead of SLOG. Beside
huge latency increase for heavy writers, this implementation caused double
write of all data, since the log records were explicitly prepared for SLOG.
Since we now have I/O scheduler, I've found it can be much more efficient
to reduce priority of heavy logger SLOG writes from ZIO_PRIORITY_SYNC_WRITE
to ZIO_PRIORITY_ASYNC_WRITE, while still leave them on SLOG.

Existing ZIL implementation had problem with space efficiency when it
has to write large chunks of data into log blocks of limited size. In some
cases efficiency stopped to almost as low as 50%. In case of ZIL stored on
spinning rust, that also reduced log write speed in half, since head had to
uselessly fly over allocated but not written areas. This change improves
the situation by offloading problematic operations from z*_log_write() to
zil_lwb_commit(), which knows real situation of log blocks allocation and
can split large requests into pieces much more efficiently. Also as side
effect it removes one of two data copy operations done by ZIL code WR_COPIED
case.

While there, untangle and unify code of z*_log_write() functions.
Also zfs_log_write() alike to zvol_log_write() can now handle writes crossing
block boundary, that may also improve efficiency if ZPL is made to do that.


# 307276 14-Oct-2016 mav

MFC r305328: MFV r303081: 7163 ztest failures due to excess error injection

illumos/illumos-gate@f34284d835bc555f987c1310df46c034c3101155
https://github.com/illumos/illumos-gate/commit/f34284d835bc555f987c1310df46c034c
3101155

https://www.illumos.org/issues/7163
Running zloop from zfs-precommit hit this assertion:
*panicstr/s
0xfffffd7fd7419370: assertion failed for thread 0xfffffd7fe29ed240,
thread-id 577: parent != NULL, file ../../../uts/common/fs/zfs/dbuf.c, line
1827
$c
libc.so.1`_lwp_kill+0xa()
libc.so.1`_assfail+0x182(fffffd7ffb1c29fa, fffffd7ffb1cc028, 723)
libc.so.1`assfail+0x19(fffffd7ffb1c29fa, fffffd7ffb1cc028, 723)
libzpool.so.1`dbuf_dirty+0xc69(10e3bc10, 3601700)
libzpool.so.1`dbuf_dirty+0x61e(10c73640, 3601700)
libzpool.so.1`dbuf_dirty+0x61e(10e28280, 3601700)
libzpool.so.1`dmu_buf_will_fill+0x64(10e28280, 3601700)
libzpool.so.1`dmu_write+0x1b6(2c7e640, d, 400000002e000000, 200, 3717b40,
3601700)
ztest_replay_write+0x568(4950d0, 3717a80, 0)
ztest_write+0x125(4950d0, d, 400000002e000000, 200, 413f000)
ztest_io+0x1bb(4950d0, d, 400000002e000000)
ztest_dmu_write_parallel+0xaa(4950d0, 6)
ztest_execute+0x83(1, 420c98, 6)
ztest_thread+0xf4(6)
libc.so.1`_thrp_setup+0x8a(fffffd7fe29ed240)
libc.so.1`_lwp_start()
This is another manifestation of ECKSUM in ztest:
The lowest level ancestor that’s in memory is the L8 (topmost). The L7
ancestor is blkid 0x10:
::dbufs -O 0x2c7e640 -o d -l 7 |::dbuf
addr object lvl blkid holds os
600be50 d 7 4 1 ztest/ds_6
719d880 d 7 0 4 ztest/ds_6

Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Matthew Ahrens <mahrens@delphix.com>


# 307274 14-Oct-2016 mav

MFC r305327: MFV r303080: 6451 ztest fails due to checksum errors

illumos/illumos-gate@f9eb9fdf196b6ed476e4ffc69cecd8b0da3cb7e7
https://github.com/illumos/illumos-gate/commit/f9eb9fdf196b6ed476e4ffc69cecd8b0d
a3cb7e7

https://www.illumos.org/issues/6451
Sometimes ztest fails because zdb detects checksum errors. e.g.:
Traversing all blocks to verify checksums and verify nothing leaked ...
zdb_blkptr_cb: Got error 50 reading <71, 47, 0, 8000160> DVA0=<0:1cc2000:
180000> [L0 other uint64[]] sha256 uncompressed LE contiguou
s unique single size=100000L/100000P birth=271L/271P fill=1
cksum=c5a3e27d1ed0f894:843bca3a5473c4bf:f76a19b6830a2e4:91292591613a12bf --
skipping
zdb_blkptr_cb: Got error 50 reading <71, 47, 0, 800000180> DVA0=<0:ce16800:
180000> [L0 other uint64[]] sha256 uncompressed LE contigu
ous unique single size=100000L/100000P birth=840L/840P fill=1
cksum=5d018f3d061e17f3:6d1584784587bf63:2805a74a0ce37369:ba68a214806c7e75
-- skipping
zdb_blkptr_cb: Got error 50 reading <71, 47, 0, 1000000360> DVA0=<0:10d37400:
180000> [L0 other uint64[]] sha256 uncompressed LE conti
guous unique single size=100000L/100000P birth=904L/904P fill=1
cksum=fa1e11d4138bd14b:86c9488c444473e3:f31e43c72e72e46b:e3446472d1174d
ba -- skipping
zdb_blkptr_cb: Got error 50 reading <71, 47, 0, 400000002c0> DVA0=<0:127ef400:
180000> [L0 other uint64[]] sha256 uncompressed LE cont
iguous dedup single size=100000L/100000P birth=549L/549P fill=1
cksum=30e14955ebf13522:66dc2ff8067e6810:4607e750abb9d3b3:6582b8af909fcb
58 -- skipping
zdb_blkptr_cb: Got error 50 reading <657, 5, 0, 1c0> DVA0=<0:1a180400:180000>
[L0 other uint64[]] fletcher4 uncompressed LE contiguou
s unique single size=100000L/100000P birth=1091L/1091P fill=1 cksum=a6cf1e50:
29b3bd01c57e5:36779b914035db9a:db61cdcf6bec56f0 -- skippin
g
The problem is that ztest_fault_inject() can inject multiple faults into the
same block. It is designed such that it can inject errors on all leafs of a
RAID-Z or mirror, but for a given range of offsets, it will only inject errors

Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Jorgen Lundman <lundman@lundman.net>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Matthew Ahrens <mahrens@delphix.com>


# 307272 14-Oct-2016 mav

MFC r305326: MFV r303079:
7147 ztest: ztest_ddt_repair fails with ztest_pattern_match assertion

illumos/illumos-gate@aab80726335c76a7cae32c7300890248d73a51e3
https://github.com/illumos/illumos-gate/commit/aab80726335c76a7cae32c7300890248d
73a51e3

https://www.illumos.org/issues/7147
Here's the dbuf we're currently reading:
966f200::dbuf
addr object lvl blkid holds os
966f200 4 0 0 1 ztest/ds_3
966f200::print dmu_buf_t db_data
db_data = 0x9ae0400
0x9ae0400/10J
0x9ae0400: c1c7ced932020d c1c7ced932020d c1c7ced932020d c1c7ced932020d
c1c7ced932020d c1c7ced932020d c1c7ced932020d c1c7ced932020d
c1c7ced932020d c1c7ced932020d
The pattern we're expecting is actually this: a34ae10b5f2db2. If we attempt to
read the block on disk we find that it has matches what ztest_ddt_repair()
would have written:
~c1c7ced932020d=J
ff3e383126cdfdf2
966f200::print dmu_buf_impl_t db_blkptr | ::blkptr
DVA0=<0:71d3c00:800>
[L0 UINT64_OTHER] SHA256 OFF LE contiguous dedup single
size=400L/400P birth=55L/55P fill=1
cksum=18486450d3ce8c6d:75a72f4bbf117b0f:2d3a226314eb5650:2eb0fd68648b1af0
1. zdb -U /rpool/tmp/zpool.cache -R ztest 0:71d3c00:800 | head
Found vdev type: mirror
0:71d3c00:800
0 1 2 3 4 5 6 7 8 9 a b c d e f 0123456789abcdef
000000: ff3e383126cdfdf2 ff3e383126cdfdf2 ...&18>....&18>.
000010: ff3e383126cdfdf2 ff3e383126cdfdf2 ...&18>....&18>.
000020: ff3e383126cdfdf2 ff3e383126cdfdf2 ...&18>....&18>.
000030: ff3e383126cdfdf2 ff3e383126cdfdf2 ...&18>....&18>.
000040: ff3e383126cdfdf2 ff3e383126cdfdf2 ...&18>....&18>.
000050: ff3e383126cdfdf2 ff3e383126cdfdf2 ...&18>....&18>.

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: George Wilson <george.wilson@delphix.com>


# 307266 14-Oct-2016 mav

MFC r305323: MFV r302991: 6950 ARC should cache compressed data

illumos/illumos-gate@dcbf3bd6a1f1360fc1afcee9e22c6dcff7844bf2
https://github.com/illumos/illumos-gate/commit/dcbf3bd6a1f1360fc1afcee9e22c6dcff
7844bf2

https://www.illumos.org/issues/6950
When reading compressed data from disk, the ARC should keep the compressed
block cached and only decompress it when consumers access the block. The
uncompressed data should be short-lived allowing the ARC to cache a much large
r
amount of data. The DMU would also maintain a smaller cache of uncompressed
blocks to minimize the impact of decompressing frequently accessed blocks.

Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Don Brady <don.brady@intel.com>
Reviewed by: Richard Elling <Richard.Elling@RichardElling.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: George Wilson <george.wilson@delphix.com>


# 307122 11-Oct-2016 mav

MFC r305209: MFV r302660: 6314 buffer overflow in dsl_dataset_name

illumos/illumos-gate@9adfa60d484ce2435f5af77cc99dcd4e692b6660
https://github.com/illumos/illumos-gate/commit/9adfa60d484ce2435f5af77cc99dcd4e6
92b6660

https://www.illumos.org/issues/6314
Callers of dsl_dataset_name pass a buffer of size ZFS_MAXNAMELEN, but
dsl_dataset_name copies the datasets' name PLUS the snapshot name to it,
resulting in a max of 2 * ZFS_MAXNAMELEN + '@'.

Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Igor Kozhukhov <ikozhukhov@gmail.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Matthew Ahrens <mahrens@delphix.com>


# 297112 20-Mar-2016 mav

MFC r296519: MFV r296518: 5027 zfs large block support (add copyright)

Author: Matthew Ahrens <matt@mahrens.org>

illumos/illumos-gate@c3d26abc9ee97b4f60233556aadeb57e0bd30bb9


# 288571 03-Oct-2015 mav

MFC r286705: 5960 zfs recv should prefetch indirect blocks
5925 zfs receive -o origin=

Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Author: Paul Dagnelie <pcd@delphix.com>

While running 'zfs recv' we noticed that every 128th 8K block required a
read. We were seeing that restore_write() was calling dmu_tx_hold_write()
and the indirect block was not cached. We should prefetch upcoming indirect
blocks to avoid having to go to disk and blocking the restore_write().

Allow an incremental send stream to be received as a clone, even if the
stream does not mark it as a clone.


# 287184 26-Aug-2015 delphij

MFC r286737:

Plug a memory leak.


# 285001 01-Jul-2015 avg

MFC r284304: MFV r284030: 5818 zfs {ref}compressratio is incorrect with 4k sector size

Note: no MFC to stable/9 because r268075 (vendor r267565) has not been MFC-ed.


# 276081 22-Dec-2014 delphij

MFC r274337,r274673,274681,r275515:

ZFS large block support. The default recordsize remains at 128KB.

A new tunable/sysctl variable, vfs.zfs.max_recordsize is added to
allow adjusting the permitted maximum record size, or
zfs_max_recordsize, with a default of 1MB. ZFS will not allow
setting recordsize greater than zfs_max_recordsize as a safety
belt, because larger recordsize means greater read and write
latency and more memory usage.

Please note that booting from datasets that have recordsize greater
than 128KB is not supported (but it's Okay to enable the feature on
the pool).

Limited safety belt is provided for mounted root filesystem but use
caution when using a larger value.

Illumos issue:
5027 zfs large block support


# 270126 18-Aug-2014 delphij

MFC r269430: MFV r269426:

Double test device size for ztest(1).

Illumos issue:
5039 ztest should default to larger device sizes
Author: Matthew Ahrens <mahrens@delphix.com>


# 269416 02-Aug-2014 delphij

MFC r268855: MFV r268848:

Instead of asserting all zio's be properly aligned, only assert
on the logical ones.

Cap uberblocks at 8k, otherwise with ashift=17, there would be
only one uberblock.

This fixes a problem that zdb would trip assert on pools with
ashift >= 0xe (8k).

While there, also change the code so it only attempt to condense
space map unless the uncondensed size consumes greater than
zfs_metaslab_condense_block_threshold blocks.

Illumos issue:
4958 zdb trips assert on pools with ashift >= 0xe


# 268656 15-Jul-2014 delphij

MFC r268086: MFV r267570:

4756 metaslab_group_preload() could deadlock


# 268649 15-Jul-2014 delphij

MFC r268075: MFV r267565:

4757 ZFS embedded-data block pointers ("zero block compression")
4913 zfs release should not be subject to space checks


# 262093 17-Feb-2014 avg

MFC r258717: MFV r258371,r258372: 4101 metaslab_debug should allow for
fine-grained control


# 260763 16-Jan-2014 avg

MFC r258632,258704: MFV r255255: 4045 zfs write throttle & i/o scheduler
performance work

Sponsored by: HybridCluster [merge]


# 288571 03-Oct-2015 mav

MFC r286705: 5960 zfs recv should prefetch indirect blocks
5925 zfs receive -o origin=

Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Author: Paul Dagnelie <pcd@delphix.com>

While running 'zfs recv' we noticed that every 128th 8K block required a
read. We were seeing that restore_write() was calling dmu_tx_hold_write()
and the indirect block was not cached. We should prefetch upcoming indirect
blocks to avoid having to go to disk and blocking the restore_write().

Allow an incremental send stream to be received as a clone, even if the
stream does not mark it as a clone.


# 287184 26-Aug-2015 delphij

MFC r286737:

Plug a memory leak.


# 285001 01-Jul-2015 avg

MFC r284304: MFV r284030: 5818 zfs {ref}compressratio is incorrect with 4k sector size

Note: no MFC to stable/9 because r268075 (vendor r267565) has not been MFC-ed.


# 276081 22-Dec-2014 delphij

MFC r274337,r274673,274681,r275515:

ZFS large block support. The default recordsize remains at 128KB.

A new tunable/sysctl variable, vfs.zfs.max_recordsize is added to
allow adjusting the permitted maximum record size, or
zfs_max_recordsize, with a default of 1MB. ZFS will not allow
setting recordsize greater than zfs_max_recordsize as a safety
belt, because larger recordsize means greater read and write
latency and more memory usage.

Please note that booting from datasets that have recordsize greater
than 128KB is not supported (but it's Okay to enable the feature on
the pool).

Limited safety belt is provided for mounted root filesystem but use
caution when using a larger value.

Illumos issue:
5027 zfs large block support


# 270126 18-Aug-2014 delphij

MFC r269430: MFV r269426:

Double test device size for ztest(1).

Illumos issue:
5039 ztest should default to larger device sizes
Author: Matthew Ahrens <mahrens@delphix.com>


# 269416 02-Aug-2014 delphij

MFC r268855: MFV r268848:

Instead of asserting all zio's be properly aligned, only assert
on the logical ones.

Cap uberblocks at 8k, otherwise with ashift=17, there would be
only one uberblock.

This fixes a problem that zdb would trip assert on pools with
ashift >= 0xe (8k).

While there, also change the code so it only attempt to condense
space map unless the uncondensed size consumes greater than
zfs_metaslab_condense_block_threshold blocks.

Illumos issue:
4958 zdb trips assert on pools with ashift >= 0xe


# 268656 15-Jul-2014 delphij

MFC r268086: MFV r267570:

4756 metaslab_group_preload() could deadlock


# 268649 15-Jul-2014 delphij

MFC r268075: MFV r267565:

4757 ZFS embedded-data block pointers ("zero block compression")
4913 zfs release should not be subject to space checks


# 262093 17-Feb-2014 avg

MFC r258717: MFV r258371,r258372: 4101 metaslab_debug should allow for
fine-grained control


# 260763 16-Jan-2014 avg

MFC r258632,258704: MFV r255255: 4045 zfs write throttle & i/o scheduler
performance work

Sponsored by: HybridCluster [merge]