#
325932 |
|
17-Nov-2017 |
avg |
MFC r325610: MFV r325609: 7531 Assign correct flags to prefetched buffers
|
#
308083 |
|
29-Oct-2016 |
mav |
MFC r306424: MFV r306422: 7254 ztest failed assertion in ztest_dataset_dirobj_verify: dirobjs + 1 == usedo bjs
dsl_dataset_space is looking at the ds_bp's fill count while dmu_objset_write_ready() is concurrently modifying it. This fix adds an rrwlock to protect the ds_bp.
Closes #180
Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Steve Gonczi <steve.gonczi@delphix.com> Author: Paul Dagnelie <pcd@delphix.com>
|
#
307292 |
|
14-Oct-2016 |
mav |
MFC r305340: MFC r305337: 7004 dmu_tx_hold_zap() does dnode_hold() 7x on same object
Using a benchmark which has 32 threads creating 2 million files in the same directory, on a machine with 16 CPU cores, I observed poor performance. I noticed that dmu_tx_hold_zap() was using about 30% of all CPU, and doing dnode_hold() 7 times on the same object (the ZAP object that is being held).
dmu_tx_hold_zap() keeps a hold on the dnode_t the entire time it is running, in dmu_tx_hold_t:txh_dnode, so it would be nice to use the dnode_t that we already have in hand, rather than repeatedly calling dnode_hold(). To do this, we need to pass the dnode_t down through all the intermediate calls that dmu_tx_hold_zap() makes, making these routines take the dnode_t* rather than an objset_t* and a uint64_t object number. In particular, the following routines will need to have analogous *_by_dnode() variants created:
dmu_buf_hold_noread() dmu_buf_hold() zap_lookup() zap_lookup_norm() zap_count_write() zap_lockdir() zap_count_write()
This can improve performance on the benchmark described above by 100%, from 30,000 file creations per second to 60,000. (This improvement is on top of that provided by working around the object allocation issue. Peak performance of ~90,000 creations per second was observed with 8 CPUs; adding CPUs past that decreased performance due to lock contention.) The CPU used by dmu_tx_hold_zap() was reduced by 88%, from 340 CPU-seconds to 40 CPU-seconds.
Sponsored by: Intel Corp.
Closes #109
Reviewed by: Steve Gonczi <steve.gonczi@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com> Reviewed by: Ned Bass <bass6@llnl.gov> Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Author: Matthew Ahrens <mahrens@delphix.com>
openzfs/openzfs@d3e523d489a169ab36f9ec1b2a111a60a5563a9f
|
#
307287 |
|
14-Oct-2016 |
mav |
MFC r305338: MFV r305335: 7003 zap_lockdir() should tag hold
zap_lockdir() / zap_unlockdir() should take a "void *tag" argument which tags the hold on the zap. This will help diagnose programming errors which misuse the hold on the ZAP.
Sponsored by: Intel Corp.
Closes #108
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com> Reviewed by: Steve Gonczi <steve.gonczi@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Author: Matthew Ahrens <mahrens@delphix.com>
openzfs/openzfs@0780b3eab5a2c13e04328b39ecd2a6d0d3c4f7cb
|
#
307270 |
|
14-Oct-2016 |
mav |
MFC r305325: MFV r303078: 7086 ztest attempts dva_get_dsize_sync on an embedded blockpointer
illumos/illumos-gate@926549256b71acd595f69b236779ff6b78fa08ef https://github.com/illumos/illumos-gate/commit/926549256b71acd595f69b236779ff6b7 8fa08ef
https://www.illumos.org/issues/7086 In dbuf_dirty(), we need to grab the dn_struct_rwlock before looking at the db_blkptr, to prevent it from being changed by syncing context. Otherwise we may see that ztest got a segfault from this stack: libzpool.so.1`dva_get_dsize_sync+0x98(872f000, b32b240, fed7811b, 0, b4cda20, 0) libzpool.so.1`bp_get_dsize+0x60(872f000, b32b240, 0, 97cb780, 9d4c1a8, 0) libzpool.so.1`dbuf_dirty+0x9b3(ce0a100, 97cb780, 9, fecd2530) libzpool.so.1`dmu_buf_will_dirty+0xc3(ce0a100, 97cb780, ea293d6c, 1) libzpool.so.1`zap_lockdir+0x1a0(8aaa3c0, 1, 0, 97cb780, 1, 1) libzpool.so.1`zap_remove_norm+0x30(8aaa3c0, 1, 0, 8728b10, 0, 97cb780) libzpool.so.1`zap_remove+0x29(8aaa3c0, 1, 0, 8728b10, 97cb780, a) ztest_replay_remove+0x225(ea294588, 8728ae8, 0, 38010000, 0, 0) ztest_remove+0x9f(ea294588, ea293f50, 4, 3) ztest_object_init+0x78(ea294588, ea293f50, 4e0, 1) ztest_dmu_object_alloc_free+0x71(ea294588, 13) ztest_dmu_objset_create_destroy+0x224(80cef08, 13, 0, 805d36c, 9017ad44, 0) ztest_execute+0x89(a, 807c720, 13, 0) ztest_thread+0xea(13, 0, 0, 0) libc.so.1`_thrp_setup+0x88(f0983240) libc.so.1`_lwp_start(f0983240, 0, 0, 0, 0, 0) Looking into it a bit, we see that this is an embedded blockpointer, so BP_GET_NDVAS should have returned 0: b32b240::blkptr EMBEDDED [L0 ZAP_OTHER] et=0 LZ4 size=200L/4aP birth=80L Instead, it looks like another thread is modifying this blockpointer: b32b240::ugrep | ::whatis f47a0e0c is in [ stack tid=0x19f ] ebd6ec40 is in [ stack tid=0x226 ] ea293bd0 is in [ stack tid=0x244 ] ea293be4 is in [ stack tid=0x244 ]
Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Matthew Ahrens <mahrens@delphix.com>
|
#
307266 |
|
14-Oct-2016 |
mav |
MFC r305323: MFV r302991: 6950 ARC should cache compressed data
illumos/illumos-gate@dcbf3bd6a1f1360fc1afcee9e22c6dcff7844bf2 https://github.com/illumos/illumos-gate/commit/dcbf3bd6a1f1360fc1afcee9e22c6dcff 7844bf2
https://www.illumos.org/issues/6950 When reading compressed data from disk, the ARC should keep the compressed block cached and only decompress it when consumers access the block. The uncompressed data should be short-lived allowing the ARC to cache a much large r amount of data. The DMU would also maintain a smaller cache of uncompressed blocks to minimize the impact of decompressing frequently accessed blocks.
Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Reviewed by: Matt Ahrens <mahrens@delphix.com> Reviewed by: Paul Dagnelie <pcd@delphix.com> Reviewed by: Don Brady <don.brady@intel.com> Reviewed by: Richard Elling <Richard.Elling@RichardElling.com> Approved by: Richard Lowe <richlowe@richlowe.net> Author: George Wilson <george.wilson@delphix.com>
|
#
304139 |
|
15-Aug-2016 |
avg |
MFC r302838: 6513 partially filled holes lose birth time
|
#
304136 |
|
15-Aug-2016 |
avg |
MFC r302837: 6844 dnode_next_offset can detect fictional holes
|
#
299433 |
|
11-May-2016 |
mav |
MFC r297832: MFV r297831: 6322 ZFS indirect block predictive prefetch
Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Paul Dagnelie <pcd@delphix.com> Author: Alexander Motin <mav@FreeBSD.org>
Improve speculative prefetch of indirect blocks.
Scalability of many operations on wide ZFS pool can be limited by requirement to prefetch indirect blocks first. Recently added asynchronous indirect block read partially helped, but did not solve the problem completely. This patch extends existing prefetcher functionality to explicitly work with indirect blocks.
Before this change prefetcher issued reads for up to 8MB of data in advance. With this change it also issues indirect block reads for up to 64MB of data in advance, so that when it will be time to actually read those data, it can be done immediately. Alike effect can be achieved by just increasing maximal data prefetch distance, but at higher memory cost.
Also this change introduces indirect block prefetch for rewrite operations, that was never done before. Previously ARC miss for Indirect blocks regularly blocked rewrites, converting perfectly aligned asynchronous operations into synchronous read-write pairs, significantly reducing maximal rewrite speed.
While being there this issue was also fixed: - prefetch was done always, even if caching for the dataset was completely disabled.
Testing on FreeBSD with zvol on top of 6x striped 2x mirrored pool of 12 assorted HDDs shown me such performance numbers: ------- BEFORE -------- Write 491363677 bytes/sec Read 312430631 bytes/sec Rewrite 97680464 bytes/sec -------- AFTER -------- Write 493524146 bytes/sec Read 438598079 bytes/sec Rewrite 277506044 bytes/sec
Closes #65 Closes #80
openzfs/openzfs@792fd28ac04f78cc5e43ead2d72a96f244ea84e8
|
#
297112 |
|
20-Mar-2016 |
mav |
MFC r296519: MFV r296518: 5027 zfs large block support (add copyright)
Author: Matthew Ahrens <matt@mahrens.org>
illumos/illumos-gate@c3d26abc9ee97b4f60233556aadeb57e0bd30bb9
|
#
290754 |
|
13-Nov-2015 |
mav |
MFC r289309: 6267 dn_bonus evicted too early
Reviewed by: Richard Yao <ryao@gentoo.org> Reviewed by: Xin LI <delphij@freebsd.org> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Richard Lowe <richlowe@richlowe.net> Author: Justin T. Gibbs <gibbs@FreeBSD.org>
illumos/illumos-gate@d2058105c61ec61df3a2dd3f839fed8c3fe7bfd6
|
#
290750 |
|
13-Nov-2015 |
mav |
MFC r289297: 6288 dmu_buf_will_dirty could be faster
Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Paul Dagnelie <pcd@delphix.com> Reviewed by: Justin Gibbs <gibbs@scsiguy.com> Reviewed by: Richard Elling <Richard.Elling@RichardElling.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Matthew Ahrens <mahrens@delphix.com>
illumos/illumos-gate@0f2e7d03b8f588387cb8dd8dd500cbe5ff4484e0
|
#
288594 |
|
03-Oct-2015 |
mav |
MFC r287702: 5987 zfs prefetch code needs work
Rewrite the ZFS prefetch code to detect only forward, sequential streams.
The following kstats have been added:
kstat.zfs.misc.arcstats.sync_wait_for_async
How many sync reads have waited for async read to complete. (less is better)
kstat.zfs.misc.arcstats.demand_hit_predictive_prefetch
How many demand read didn't have to wait for I/O because of predictive prefetch. (more is better)
zfetch kstats have been similified to hits, misses, and max_streams, with max_streams representing times when we were not able to create new stream because we already have the maximum number of sequences for a file.
The sysctl variable/loader tunable vfs.zfs.zfetch.block_cap have been replaced by vfs.zfs.zfetch.max_distance, which controls maximum bytes to prefetch per stream.
illumos/illumos-gate@cf6106c8a0d6598b045811f9650d66e07eb332af
Illumos ZFS issues:
5987 zfs prefetch code needs work https://www.illumos.org/issues/5987
|
#
288576 |
|
03-Oct-2015 |
mav |
Fix r288549 build on stable.
For some reason this (odd) code builds on head, but not on stable.
|
#
288572 |
|
03-Oct-2015 |
mav |
MFC r286708: 5959 clean up per-dataset feature count code
Reviewed by: Toomas Soome <tsoome@me.com> Reviewed by: George Wilson <george@delphix.com> Reviewed by: Alex Reece <alex@delphix.com> Approved by: Richard Lowe <richlowe@richlowe.net> Author: Matthew Ahrens <mahrens@delphix.com>
illumos/illumos-gate@ca0cc3918a1789fa839194af2a9245f801a06b1a
A ZFS feature flags (large blocks) tracks its refcounts as the number of datasets that have ever used the feature. Several features of this type are planned to be added (new checksum functions). This code should be made common infrastructure rather than duplicating the code for each feature.
|
#
288571 |
|
03-Oct-2015 |
mav |
MFC r286705: 5960 zfs recv should prefetch indirect blocks 5925 zfs receive -o origin=
Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Author: Paul Dagnelie <pcd@delphix.com>
While running 'zfs recv' we noticed that every 128th 8K block required a read. We were seeing that restore_write() was calling dmu_tx_hold_write() and the indirect block was not cached. We should prefetch upcoming indirect blocks to avoid having to go to disk and blocking the restore_write().
Allow an incremental send stream to be received as a clone, even if the stream does not mark it as a clone.
|
#
288549 |
|
03-Oct-2015 |
mav |
MFC r286575: 5056 ZFS deadlock on db_mtx and dn_holds
Reviewed by: Will Andrews <willa@spectralogic.com> Reviewed by: Matt Ahrens <mahrens@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Approved by: Dan McDonald <danmcd@omniti.com> Author: Justin Gibbs <justing@spectralogic.com>
illumos/illumos-gate@bc9014e6a81272073b9854d9f65dd59e18d18c35
|
#
288541 |
|
03-Oct-2015 |
mav |
MFC r286545: 5630 stale bonus buffer in recycled dnode_t leads to data corruption
Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: George Wilson <george@delphix.com> Reviewed by: Will Andrews <will@freebsd.org> Approved by: Robert Mustacchi <rm@joyent.com> Author: Justin T. Gibbs <justing@spectralogic.com>
|
#
288538 |
|
03-Oct-2015 |
mav |
MFC r286541: 5531 NULL pointer dereference in dsl_prop_get_ds()
Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Dan McDonald <danmcd@omniti.com> Reviewed by: George Wilson <george@delphix.com> Reviewed by: Bayard Bell <buffer.g.overflow@gmail.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Justin T. Gibbs <justing@spectralogic.com>
illumos/illumos-gate@e57a022b8f718889ffa92adbde47a8f08abcdb25
|
#
285202 |
|
06-Jul-2015 |
avg |
MFC r284593: MFV r284412: 5911 ZFS "hangs" while deleting file
illumos/illumos-gate@46e1baa6cf6d5432f5fd231bb588df8f9570c858 https://www.illumos.org/issues/5911 Sometimes ZFS appears to hang while deleting a file. It is actually making slow progress at the file deletion, but other operations (administrative and writes via the data path) "hang" until the file removal completes, which can take a long time if the file has many blocks. The deletion (or most of it) happens in a single txg, and the sync thread spends most of its time reading indirect blocks...
Reviewed by: Bayard Bell <buffer.g.overflow@gmail.com> Reviewed by: Alek Pinchuk <alek@nexenta.com> Reviewed by: Simon Klinkert <simon.klinkert@gmail.com> Reviewed by: Dan McDonald <danmcd@omniti.com> Approved by: Richard Lowe <richlowe@richlowe.net> Author: Matthew Ahrens <mahrens@delphix.com>
PR: 199775 Approved by: re(kib)
|
#
277586 |
|
23-Jan-2015 |
delphij |
MFC r275811: MFV r275783:
Convert ARC flags to use enum. Previously, public flags are defined in arc.h and private flags are defined in arc.c which can lead to confusion and programming errors.
Consistently use 'hdr' (when referencing arc_buf_hdr_t) instead of 'buf' or 'ab' because arc_buf_t are often named 'buf' as well.
Illumos issue: 5369 arc flags should be an enum 5370 consistent arc_buf_hdr_t naming scheme
|
#
277585 |
|
23-Jan-2015 |
delphij |
MFC r275782: MFV r275551:
Remove "dbuf phys" db->db_data pointer aliases.
Use function accessors that cast db->db_data to the appropriate "phys" type, removing the need for clients of the dmu buf user API to keep properly typed pointer aliases to db->db_data in order to conveniently access their data.
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zap_leaf.c: In zap_leaf() and zap_leaf_byteswap, now that the pointer alias field l_phys has been removed, use the db_data field in an on stack dmu_buf_t to point to the leaf's phys data.
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dbuf.c: Remove the db_user_data_ptr_ptr field from dbuf and all logic to maintain it.
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dbuf.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dnode.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dbuf.h: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dmu.h: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_dataset.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_dir.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sa.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zap.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zap_micro.c: Modify the DMU buf user API to remove the ability to specify a db_data aliasing pointer (db_user_data_ptr_ptr).
cddl/contrib/opensolaris/cmd/zdb/zdb.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_diff.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_objset.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_send.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_traverse.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_tx.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_bookmark.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_dataset.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_deadlist.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_deleg.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_destroy.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_dir.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_pool.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_prop.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_scan.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_synctask.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_userhold.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sa.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa_history.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zap.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zap_leaf.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zap_micro.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dsl_dataset.h: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dsl_dir.h: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zap_impl.h: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zap_leaf.h: Create and use the new "phys data" accessor functions dsl_dir_phys(), dsl_dataset_phys(), zap_m_phys(), zap_f_phys(), and zap_leaf_phys().
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dsl_dataset.h: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dsl_dir.h: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zap_impl.h: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zap_leaf.h: Remove now unused "phys pointer" aliases to db->db_data from clients of the DMU buf user API.
Illumos issue: 5314 Remove "dbuf phys" db->db_data pointer aliases in ZFS
|
#
276081 |
|
22-Dec-2014 |
delphij |
MFC r274337,r274673,274681,r275515:
ZFS large block support. The default recordsize remains at 128KB.
A new tunable/sysctl variable, vfs.zfs.max_recordsize is added to allow adjusting the permitted maximum record size, or zfs_max_recordsize, with a default of 1MB. ZFS will not allow setting recordsize greater than zfs_max_recordsize as a safety belt, because larger recordsize means greater read and write latency and more memory usage.
Please note that booting from datasets that have recordsize greater than 128KB is not supported (but it's Okay to enable the feature on the pool).
Limited safety belt is provided for mounted root filesystem but use caution when using a larger value.
Illumos issue: 5027 zfs large block support
|
#
273346 |
|
20-Oct-2014 |
delphij |
MFC r272511: MFV r272499:
Illumos issue: 5174 add sdt probe for blocked read in dbuf_read()
|
#
271002 |
|
03-Sep-2014 |
delphij |
MFC r270248: MFV r270196:
Illumos issue: 5047 don't use atomic_*_nv if you discard the return value
|
#
271001 |
|
03-Sep-2014 |
delphij |
MFC r270247: MFV r270195:
Illumos issue: 5045 use atomic_{inc,dec}_* instead of atomic_add_*
|
#
270809 |
|
29-Aug-2014 |
delphij |
MFC r270383: MFV r270198:
Instead of using timestamp in the AVL, use the memory address when comparing.
Illumos issue: 5095 panic when adding a duplicate dbuf to dn_dbufs
|
#
269845 |
|
11-Aug-2014 |
delphij |
MFC r269229,269404,269466: MFV r269223:
Change dn->dn_dbufs from linked list to AVL tree.
Illumos issues: 4873 zvol unmap calls can take a very long time for larger datasets
|
#
269417 |
|
02-Aug-2014 |
delphij |
MFC r268858: MFV r268850:
Change the interaction between the DMU and ARC so that when the DMU is shutting down an objset, we do not evict the data from the ARC. Instead we simply coordinate the destruction of the DMU's data with the ARC.
The only case where we actually need to explicitly evict from the ARC is when dbuf_rele_and_unlock() determines that the administrator has requested that it not be kept in memory, via the primarycache/secondarycache properties. In this case, we evict the data from the ARC by its blkptr_t, the same way as when a block is freed we explicitly evict it from the ARC.
Illumos issue: 4631 zvol_get_stats triggering too many reads
|
#
269218 |
|
29-Jul-2014 |
delphij |
MFC r268713: MFV r268702:
Add missing *_destroy() calls in various places with ZFS.
Illumos issue: 4975 missing mutex_destroy() calls in zfs
|
#
268657 |
|
15-Jul-2014 |
delphij |
MFC r268123: MFV r268119:
4914 zfs on-disk bookmark structure should be named *_phys_t
|
#
268649 |
|
15-Jul-2014 |
delphij |
MFC r268075: MFV r267565:
4757 ZFS embedded-data block pointers ("zero block compression") 4913 zfs release should not be subject to space checks
|
#
265740 |
|
09-May-2014 |
delphij |
MFC r264669: MFV r264666:
4374 dn_free_ranges should use range_tree_t
illumos/illumos-gate@bf16b11e8deb633dd6c4296d46e92399d1582df4
|
#
263397 |
|
19-Mar-2014 |
delphij |
MFC r260150: MFV r259170:
4370 avoid transmitting holes during zfs send
4371 DMU code clean up
illumos/illumos-gate@43466aae47bfcd2ad9bf501faec8e75c08095e4f
NOTE: Make sure the boot code is updated if a zpool upgrade is done on boot zpool.
|
#
260763 |
|
16-Jan-2014 |
avg |
MFC r258632,258704: MFV r255255: 4045 zfs write throttle & i/o scheduler performance work
Sponsored by: HybridCluster [merge]
|
#
260617 |
|
13-Jan-2014 |
delphij |
MFC r259811:
MFV r258373:
4168 ztest assertion failure in dbuf_undirty
4169 verbatim import causes zdb to segfa 4170 zhack leaves pool in ACTIVE state
illumos/illumos-gate@7fdd916c474ea52896c671bbe7b56ba34a1ca132
|
#
288594 |
|
03-Oct-2015 |
mav |
MFC r287702: 5987 zfs prefetch code needs work
Rewrite the ZFS prefetch code to detect only forward, sequential streams.
The following kstats have been added:
kstat.zfs.misc.arcstats.sync_wait_for_async
How many sync reads have waited for async read to complete. (less is better)
kstat.zfs.misc.arcstats.demand_hit_predictive_prefetch
How many demand read didn't have to wait for I/O because of predictive prefetch. (more is better)
zfetch kstats have been similified to hits, misses, and max_streams, with max_streams representing times when we were not able to create new stream because we already have the maximum number of sequences for a file.
The sysctl variable/loader tunable vfs.zfs.zfetch.block_cap have been replaced by vfs.zfs.zfetch.max_distance, which controls maximum bytes to prefetch per stream.
illumos/illumos-gate@cf6106c8a0d6598b045811f9650d66e07eb332af
Illumos ZFS issues:
5987 zfs prefetch code needs work https://www.illumos.org/issues/5987
|
#
288576 |
|
03-Oct-2015 |
mav |
Fix r288549 build on stable.
For some reason this (odd) code builds on head, but not on stable.
|
#
288572 |
|
03-Oct-2015 |
mav |
MFC r286708: 5959 clean up per-dataset feature count code
Reviewed by: Toomas Soome <tsoome@me.com> Reviewed by: George Wilson <george@delphix.com> Reviewed by: Alex Reece <alex@delphix.com> Approved by: Richard Lowe <richlowe@richlowe.net> Author: Matthew Ahrens <mahrens@delphix.com>
illumos/illumos-gate@ca0cc3918a1789fa839194af2a9245f801a06b1a
A ZFS feature flags (large blocks) tracks its refcounts as the number of datasets that have ever used the feature. Several features of this type are planned to be added (new checksum functions). This code should be made common infrastructure rather than duplicating the code for each feature.
|
#
288571 |
|
03-Oct-2015 |
mav |
MFC r286705: 5960 zfs recv should prefetch indirect blocks 5925 zfs receive -o origin=
Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Author: Paul Dagnelie <pcd@delphix.com>
While running 'zfs recv' we noticed that every 128th 8K block required a read. We were seeing that restore_write() was calling dmu_tx_hold_write() and the indirect block was not cached. We should prefetch upcoming indirect blocks to avoid having to go to disk and blocking the restore_write().
Allow an incremental send stream to be received as a clone, even if the stream does not mark it as a clone.
|
#
288549 |
|
03-Oct-2015 |
mav |
MFC r286575: 5056 ZFS deadlock on db_mtx and dn_holds
Reviewed by: Will Andrews <willa@spectralogic.com> Reviewed by: Matt Ahrens <mahrens@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Approved by: Dan McDonald <danmcd@omniti.com> Author: Justin Gibbs <justing@spectralogic.com>
illumos/illumos-gate@bc9014e6a81272073b9854d9f65dd59e18d18c35
|
#
288541 |
|
03-Oct-2015 |
mav |
MFC r286545: 5630 stale bonus buffer in recycled dnode_t leads to data corruption
Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: George Wilson <george@delphix.com> Reviewed by: Will Andrews <will@freebsd.org> Approved by: Robert Mustacchi <rm@joyent.com> Author: Justin T. Gibbs <justing@spectralogic.com>
|
#
288538 |
|
03-Oct-2015 |
mav |
MFC r286541: 5531 NULL pointer dereference in dsl_prop_get_ds()
Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Dan McDonald <danmcd@omniti.com> Reviewed by: George Wilson <george@delphix.com> Reviewed by: Bayard Bell <buffer.g.overflow@gmail.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Justin T. Gibbs <justing@spectralogic.com>
illumos/illumos-gate@e57a022b8f718889ffa92adbde47a8f08abcdb25
|
#
285202 |
|
06-Jul-2015 |
avg |
MFC r284593: MFV r284412: 5911 ZFS "hangs" while deleting file
illumos/illumos-gate@46e1baa6cf6d5432f5fd231bb588df8f9570c858 https://www.illumos.org/issues/5911 Sometimes ZFS appears to hang while deleting a file. It is actually making slow progress at the file deletion, but other operations (administrative and writes via the data path) "hang" until the file removal completes, which can take a long time if the file has many blocks. The deletion (or most of it) happens in a single txg, and the sync thread spends most of its time reading indirect blocks...
Reviewed by: Bayard Bell <buffer.g.overflow@gmail.com> Reviewed by: Alek Pinchuk <alek@nexenta.com> Reviewed by: Simon Klinkert <simon.klinkert@gmail.com> Reviewed by: Dan McDonald <danmcd@omniti.com> Approved by: Richard Lowe <richlowe@richlowe.net> Author: Matthew Ahrens <mahrens@delphix.com>
PR: 199775 Approved by: re(kib)
|
#
277586 |
|
23-Jan-2015 |
delphij |
MFC r275811: MFV r275783:
Convert ARC flags to use enum. Previously, public flags are defined in arc.h and private flags are defined in arc.c which can lead to confusion and programming errors.
Consistently use 'hdr' (when referencing arc_buf_hdr_t) instead of 'buf' or 'ab' because arc_buf_t are often named 'buf' as well.
Illumos issue: 5369 arc flags should be an enum 5370 consistent arc_buf_hdr_t naming scheme
|
#
277585 |
|
23-Jan-2015 |
delphij |
MFC r275782: MFV r275551:
Remove "dbuf phys" db->db_data pointer aliases.
Use function accessors that cast db->db_data to the appropriate "phys" type, removing the need for clients of the dmu buf user API to keep properly typed pointer aliases to db->db_data in order to conveniently access their data.
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zap_leaf.c: In zap_leaf() and zap_leaf_byteswap, now that the pointer alias field l_phys has been removed, use the db_data field in an on stack dmu_buf_t to point to the leaf's phys data.
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dbuf.c: Remove the db_user_data_ptr_ptr field from dbuf and all logic to maintain it.
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dbuf.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dnode.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dbuf.h: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dmu.h: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_dataset.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_dir.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sa.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zap.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zap_micro.c: Modify the DMU buf user API to remove the ability to specify a db_data aliasing pointer (db_user_data_ptr_ptr).
cddl/contrib/opensolaris/cmd/zdb/zdb.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_diff.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_objset.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_send.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_traverse.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_tx.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_bookmark.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_dataset.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_deadlist.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_deleg.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_destroy.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_dir.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_pool.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_prop.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_scan.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_synctask.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_userhold.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sa.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa_history.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zap.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zap_leaf.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zap_micro.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dsl_dataset.h: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dsl_dir.h: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zap_impl.h: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zap_leaf.h: Create and use the new "phys data" accessor functions dsl_dir_phys(), dsl_dataset_phys(), zap_m_phys(), zap_f_phys(), and zap_leaf_phys().
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dsl_dataset.h: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dsl_dir.h: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zap_impl.h: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zap_leaf.h: Remove now unused "phys pointer" aliases to db->db_data from clients of the DMU buf user API.
Illumos issue: 5314 Remove "dbuf phys" db->db_data pointer aliases in ZFS
|
#
276081 |
|
22-Dec-2014 |
delphij |
MFC r274337,r274673,274681,r275515:
ZFS large block support. The default recordsize remains at 128KB.
A new tunable/sysctl variable, vfs.zfs.max_recordsize is added to allow adjusting the permitted maximum record size, or zfs_max_recordsize, with a default of 1MB. ZFS will not allow setting recordsize greater than zfs_max_recordsize as a safety belt, because larger recordsize means greater read and write latency and more memory usage.
Please note that booting from datasets that have recordsize greater than 128KB is not supported (but it's Okay to enable the feature on the pool).
Limited safety belt is provided for mounted root filesystem but use caution when using a larger value.
Illumos issue: 5027 zfs large block support
|
#
273346 |
|
20-Oct-2014 |
delphij |
MFC r272511: MFV r272499:
Illumos issue: 5174 add sdt probe for blocked read in dbuf_read()
|
#
271002 |
|
03-Sep-2014 |
delphij |
MFC r270248: MFV r270196:
Illumos issue: 5047 don't use atomic_*_nv if you discard the return value
|
#
271001 |
|
03-Sep-2014 |
delphij |
MFC r270247: MFV r270195:
Illumos issue: 5045 use atomic_{inc,dec}_* instead of atomic_add_*
|
#
270809 |
|
29-Aug-2014 |
delphij |
MFC r270383: MFV r270198:
Instead of using timestamp in the AVL, use the memory address when comparing.
Illumos issue: 5095 panic when adding a duplicate dbuf to dn_dbufs
|
#
269845 |
|
11-Aug-2014 |
delphij |
MFC r269229,269404,269466: MFV r269223:
Change dn->dn_dbufs from linked list to AVL tree.
Illumos issues: 4873 zvol unmap calls can take a very long time for larger datasets
|
#
269417 |
|
02-Aug-2014 |
delphij |
MFC r268858: MFV r268850:
Change the interaction between the DMU and ARC so that when the DMU is shutting down an objset, we do not evict the data from the ARC. Instead we simply coordinate the destruction of the DMU's data with the ARC.
The only case where we actually need to explicitly evict from the ARC is when dbuf_rele_and_unlock() determines that the administrator has requested that it not be kept in memory, via the primarycache/secondarycache properties. In this case, we evict the data from the ARC by its blkptr_t, the same way as when a block is freed we explicitly evict it from the ARC.
Illumos issue: 4631 zvol_get_stats triggering too many reads
|
#
269218 |
|
29-Jul-2014 |
delphij |
MFC r268713: MFV r268702:
Add missing *_destroy() calls in various places with ZFS.
Illumos issue: 4975 missing mutex_destroy() calls in zfs
|
#
268657 |
|
15-Jul-2014 |
delphij |
MFC r268123: MFV r268119:
4914 zfs on-disk bookmark structure should be named *_phys_t
|
#
268649 |
|
15-Jul-2014 |
delphij |
MFC r268075: MFV r267565:
4757 ZFS embedded-data block pointers ("zero block compression") 4913 zfs release should not be subject to space checks
|
#
265740 |
|
09-May-2014 |
delphij |
MFC r264669: MFV r264666:
4374 dn_free_ranges should use range_tree_t
illumos/illumos-gate@bf16b11e8deb633dd6c4296d46e92399d1582df4
|
#
263397 |
|
19-Mar-2014 |
delphij |
MFC r260150: MFV r259170:
4370 avoid transmitting holes during zfs send
4371 DMU code clean up
illumos/illumos-gate@43466aae47bfcd2ad9bf501faec8e75c08095e4f
NOTE: Make sure the boot code is updated if a zpool upgrade is done on boot zpool.
|
#
260763 |
|
16-Jan-2014 |
avg |
MFC r258632,258704: MFV r255255: 4045 zfs write throttle & i/o scheduler performance work
Sponsored by: HybridCluster [merge]
|
#
260617 |
|
13-Jan-2014 |
delphij |
MFC r259811:
MFV r258373:
4168 ztest assertion failure in dbuf_undirty
4169 verbatim import causes zdb to segfa 4170 zhack leaves pool in ACTIVE state
illumos/illumos-gate@7fdd916c474ea52896c671bbe7b56ba34a1ca132
|