338905 |
24-Sep-2018 |
markj |
MFC r338724: Fix an nvpair leak in vdev_geom_read_config().
PR: 230704 |
335549 |
22-Jun-2018 |
avg |
Revert r335546 as temporary pool name feature has not been merged |
335546 |
22-Jun-2018 |
avg |
MFC r333630: Fix 'zpool create -t <tempname>'
Creating a pool with a temporary name fails when we also specify custom dataset properties: this is because we mistakenly call zfs_set_prop_nvlist() on the "real" pool name which, as expected, cannot be found because the SPA is present in the namespace with the temporary name. |
333249 |
04-May-2018 |
emaste |
MFC r333234: zfs_ioctl: avoid out-of-bound read
admbugs: 796 Submitted by: Domagoj Stolfa <ds815@cam.ac.uk> Reported by: Ilja Van Sprundel <ivansprundel@ioactive.com> |
333196 |
03-May-2018 |
avg |
MFC r332426: allow ZFS pool to have temporary name for duration of current import
The change adds a -t <name> option to zpool create and a -t option to zpool import in its form with an old name and a new name. This allows importing (or creating) a pool under a name that differs from its real, permanent name without affecting that name. This is useful when working with VM images or images of other physical systems if they happen to have a ZFS pool with the same name as the host system.
Sponsored by: Panzura (porting) |
331612 |
27-Mar-2018 |
avg |
MFC r330974: MFV r330973: 9164 assert: newds == os->os_dsl_dataset
PR: 225877 |
330989 |
15-Mar-2018 |
avg |
MFC r330057: add ZFS_ENTER protection to .zfs/snapshot vnode operations that need it |
330987 |
15-Mar-2018 |
avg |
MFC r322245,r329717: MFV r322242: 8373 TXG_WAIT in ZIL commit path
MFC r322245: MFV r322242: 8373 TXG_WAIT in ZIL commit path
MFC r329717: MFV r329715: 8997 ztest assertion failure in zil_lwb_write_issue |
330736 |
10-Mar-2018 |
asomers |
MFC r329265, r329384
r329265: Implement .vop_pathconf and .vop_getacl for the .zfs ctldir
zfsctl_common_pathconf will report all the same variables that regular ZFS volumes report. zfsctl_common_getacl will report an ACL equivalent to 555, except that you can't read xattrs or edit attributes.
Fixes a bug where "ls .zfs" will occasionally print something like: ls: .zfs/.: Operation not supported
PR: 225793 Reviewed by: avg Sponsored by: Spectra Logic Corp Differential Revision: https://reviews.freebsd.org/D14365
r329384: Handle generic pathconf attributes in the .zfs ctldir
MFC instructions: change the value of _PC_LINK_MAX to INT_MAX
Reported by: jhb X-MFC-With: 329265 Sponsored by: Spectra Logic Corp |
330589 |
07-Mar-2018 |
avg |
MFC r329714: MFV r329713: 8731 ASSERT3U(nui64s, <=, UINT16_MAX) fails for large blocks |
330524 |
05-Mar-2018 |
asomers |
MFC r324940:
Fix the error message when creating a zpool on a too-small device
Don't check for SPA_MINDEVSIZE in vdev_geom_attach when opening by path. It's redundant with the check in vdev_open, and failing to attach here results in the wrong error message being printed. However, still check for it in some other situations:
* When opening by guids, so we don't get bogged down reading from slow devices like floppy drives.
* In vdev_geom_read_pool_label for the same reason, because we iterate over all providers.
* If the caller requests that we verify the guid, because then we'll have to read from the device before vdev_open verifies the size.
PR: 222227 Reported by: Marie Helene Kvello-Aune <marieheleneka@gmail.com> Reviewed by: avg, mav Sponsored by: Spectra Logic Corp Differential Revision: https://reviews.freebsd.org/D12531 |
330522 |
05-Mar-2018 |
asomers |
MFC r326401:
Fix assertion when ZFS fails to open certain devices
"panic: vdev_geom_close_locked: cp->private is NULL" This panic will result if ZFS fails to open a device due to either of the following reasons:
1) The device's sector size is greater than 8KB.
2) ZFS wants to open the device RW, but it can't be opened for writing.
The solution is to change the initialization order to ensure that the assertion will be satisfied.
PR: 221066 Reported by: David NewHamlet <wheelcomplex@gmail.com> Reviewed by: avg Sponsored by: Spectra Logic Corp Differential Revision: https://reviews.freebsd.org/D13278 |
330238 |
01-Mar-2018 |
avg |
MFC r329314: MFV r329313: 8857 zio_remove_child() panic due to already destroyed parent zio
PR: 223803 |
330234 |
01-Mar-2018 |
avg |
MFC r329711: MFV r329710: 8966 use after end of the lifetime of a local variable
PR: 225162 |
330071 |
27-Feb-2018 |
avg |
MFC r329556,r329820 remove an assert in zfsctl_snapdir_lookup to match r323578 |
330065 |
27-Feb-2018 |
avg |
MFC r329016: remove a duplicate assignment |
330063 |
27-Feb-2018 |
avg |
MFC r328881: zfs: move a utility function, ioflags, closer to its consumers |
330059 |
27-Feb-2018 |
avg |
MFC r328217: zfs: no need to check that size of zfs_cmd_t is not greater than IOCPARM_MAX |
328048 |
16-Jan-2018 |
avg |
MFC r327725: zfs_mount: restore a bit of ifdef-out illumos code |
327470 |
01-Jan-2018 |
dim |
MFC r327167:
Remove obsolete register keyword from opensolaris's sysmacros.h. When compiling zfsd with recent clang, it leads to a warning about the register storage class being incompatible with C++17. |
326428 |
01-Dec-2017 |
avg |
MFC r326070: zfs_write: fix problem with writes appearing to succeed when over quota
The problem happens when the writes have offsets and sizes aligned with a filesystem's recordsize (maximum block size). In this scenario dmu_tx_assign() would fail because of being over the quota, but the uio would already be modified in the code path where we copy data from the uio into a borrowed ARC buffer. That gives the appearance of a partial write, so zfs_write() would return success and the uio would be modified consistently with writing a single block.
That bug can result in data loss because the writes over the quota would appear to succeed while the actual data is being discarded.
This commit fixes the bug by ensuring that the uio is not changed until after all error checks are done. To achieve that the code now uses uiocopy() + uioskip() as in the original illumos design. We can do that now that uiocopy() has been updated in r326067 to use vn_io_fault_uiomove(). |
326334 |
28-Nov-2017 |
asomers |
MFC r323813:
MFV r323789: 8473 scrub does not detect errors on active spares
illumos/illumos-gate@554675eee75dd2d7398d960aa5c81083ceb8505a https://github.com/illumos/illumos-gate/commit/554675eee75dd2d7398d960aa5c81083ceb8505a
https://www.illumos.org/issues/8473
Scrubbing is supposed to detect and repair all errors in the pool. However, it wrongly ignores active spare devices. The problem can easily be reproduced in OpenZFS at git rev 0ef125d with these commands:
truncate -s 64m /tmp/a /tmp/b /tmp/c
sudo zpool create testpool mirror /tmp/a /tmp/b spare /tmp/c
sudo zpool replace testpool /tmp/a /tmp/c
/bin/dd if=/dev/zero bs=1024k count=63 oseek=1 conv=notrunc of=/tmp/c
sync
sudo zpool scrub testpool
zpool status testpool # Will show 0 errors, which is wrong
sudo zpool offline testpool /tmp/a
sudo zpool scrub testpool
zpool status testpool # Will show errors on /tmp/c,
                      # which should've already been fixed
FreeBSD head is partially affected: the first scrub will detect some errors, but the second scrub will detect more.
Reviewed by: Andy Stormont <astormont@racktopsystems.com> Reviewed by: Matt Ahrens <mahrens@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Approved by: Richard Lowe <richlowe@richlowe.net>
Sponsored by: Spectra Logic Corp |
326325 |
28-Nov-2017 |
asomers |
MFC r322546:
Fix some ZFS debugging messages
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c Be more careful about the use of provider names vs vdev names in ZFS_LOG statements.
Sponsored by: Spectra Logic Corp |
325932 |
17-Nov-2017 |
avg |
MFC r325610: MFV r325609: 7531 Assign correct flags to prefetched buffers |
325913 |
16-Nov-2017 |
avg |
MFC r325228: vdev_geom_close: close errored consumer even if vdev_reopening is set |
325910 |
16-Nov-2017 |
avg |
MFC r325606: MFV r325605: 8713 Buffer overflow in dsl_dataset_name() |
325540 |
08-Nov-2017 |
avg |
MFC r324757: remove spa_sync_on assert from spa_async_thread_vd |
324254 |
04-Oct-2017 |
avg |
MFC r323483: zfsctl_snapdir_lookup should be able to handle an uncovered vnode |
324251 |
04-Oct-2017 |
avg |
MFC r323481: zfsvfs_hold: assert that the busied filesystem can not be unmounted |
324204 |
02-Oct-2017 |
avg |
MFC r323918: MFV r323917: 8648 Fix range locking in ZIL commit codepath
This fixes a problem introduced in r320496, MFC of r308782. |
324159 |
01-Oct-2017 |
avg |
MFC r323522: slightly simplify zfs_vptocnp |
324009 |
26-Sep-2017 |
avg |
MFC r323480: zfs_get_vfs: reference a requested filesystem instead of vfs_busy-ing it
Sponsored by: Panzura |
324005 |
26-Sep-2017 |
avg |
MFC r323479,r323491: zfs: tighten debug versions of ZTOV and VTOZ
Sponsored by: Panzura |
323762 |
19-Sep-2017 |
avg |
MFC r323482: zfs_ctldir: remove obsolete / bogus ARGSUSED lint directives
None of the tagged functions had unused parameters. |
323759 |
19-Sep-2017 |
avg |
MFC r322241: MFV r322240: 8491 uberblock on-disk padding to reserve space for smoothly merging zpool checkpoint & MMP in ZFS
illumos/illumos-gate@79c2b812ee2010ebf20fdd92dc5f06b59000a94c https://github.com/illumos/illumos-gate/commit/79c2b812ee2010ebf20fdd92dc5f06b59000a94c
https://www.illumos.org/issues/8491
The zpool checkpoint feature in DxOS added a new field in the uberblock. The Multi-Modifier Protection Pull Request from ZoL adds two new fields in the uberblock (Reference: https://github.com/zfsonlinux/zfs/pull/6279). As these two changes come from two different sources and once upstreamed and deployed will introduce an incompatibility with each other we want to upstream a change that will reserve the padding for both of them so integration goes smoothly and everyone gets both features.
Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed by: Olaf Faaland <faaland1@llnl.gov> Approved by: Gordon Ross <gwr@nexenta.com> Author: Serapheim Dimitropoulos <serapheim@delphix.com> |
323756 |
19-Sep-2017 |
avg |
MFC r322228: MFV r322227: 8377 Panic in bookmark deletion
illumos/illumos-gate@42418f9e73f0d007aa87675ecc206c26fc8e073e https://github.com/illumos/illumos-gate/commit/42418f9e73f0d007aa87675ecc206c26fc8e073e
https://www.illumos.org/issues/8377
The problem is that when dsl_bookmark_destroy_check() is executed from open context (the pre-check), it fills in dbda_success based on the existence of the bookmark. But the bookmark (or containing filesystem as in this case) can be destroyed before we get to syncing context. When we re-run dsl_bookmark_destroy_check() in syncing context, it will not add the deleted bookmark to dbda_success, intending for dsl_bookmark_destroy_sync() to not process it. But because the bookmark is still in dbda_success from the open-context call, we do try to destroy it. The fix is that dsl_bookmark_destroy_check() should not modify dbda_success when called from open context.
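The fix described above can be sketched with a small Python model. This is only an illustration under simplifying assumptions, not the dsl_bookmark code: the function names, the set of existing bookmarks, and the plain list standing in for dbda_success are all hypothetical.

```python
def bookmark_destroy_check(requested, existing, success, syncing):
    """Model of the check: validate the request, but only record
    destroyable bookmarks when running in syncing context."""
    for name in requested:
        if name not in existing:
            continue  # already gone; nothing to destroy
        if syncing:
            success.append(name)  # the fix: open context leaves the list alone


def bookmark_destroy_sync(existing, success):
    """Model of the sync step: destroy everything the check recorded."""
    for name in success:
        existing.remove(name)  # raises KeyError if the bookmark is already gone


existing = {"pool/fs#a", "pool/fs#b"}
requested = ["pool/fs#a", "pool/fs#b"]
success = []

bookmark_destroy_check(requested, existing, success, syncing=False)  # pre-check
existing.discard("pool/fs#b")  # destroyed between open and syncing context
bookmark_destroy_check(requested, existing, success, syncing=True)
bookmark_destroy_sync(existing, success)
```

In this model, appending to the list from the open-context call as well would leave "pool/fs#b" recorded, and the sync step would then fail on the missing bookmark.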
Reviewed by: Paul Dagnelie <pcd@delphix.com> Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Matthew Ahrens <mahrens@delphix.com> |
323747 |
19-Sep-2017 |
avg |
MFC r321471: spa_import_rootpool should be able to handle an imported root pool
That is required to support reboot -r with a new root filesystem being on an already imported pool.
PR: 210721 |
323745 |
19-Sep-2017 |
avg |
MFC r320352: zfs: port vdev_file part of illumos change 3306
3306 zdb should be able to issue reads in parallel
illumos/illumos-gate/31d7e8fa33fae995f558673adb22641b5aa8b6e1
https://www.illumos.org/issues/3306
The upstream change was made before we started to import upstream commits individually. It was imported into the illumos vendor area as r242733. That commit was MFV-ed in r260138, but as the commit message says vdev_file.c was left intact.
This commit actually implements the parallel I/O for vdev_file using a taskqueue with multiple threads. This implementation does not depend on the illumos or FreeBSD bio interface at all, but uses zio_t to pass around all the relevant data. So, the code looks a bit different from the upstream.
This commit also incorporates ZoL commit zfsonlinux/zfs/bc25c9325b0e5ced897b9820dad239539d561ec9 that fixed https://github.com/zfsonlinux/zfs/issues/2270. We need to use a dedicated taskqueue for exactly the same reason as ZoL, as we do not implement TASKQ_DYNAMIC. |
323331 |
08-Sep-2017 |
emaste |
MFC r323002: zfs: do not advertise unsupported hash algorithms
illumos 4185 ("add new cryptographic checksums to ZFS: SHA-512, Skein, Edon-R") was intentionally merged only partially in r289422, without adding support for skein, sha512 and edonr on FreeBSD.
Support for skein and sha512 was added later on (in head), but none of these are supported in stable/10. Prior to this commit zfs(8) correctly rejected these algorithms, but with an error message that claimed support:
fk@r500 ~ $zfs set checksum=edonr tank
cannot set property for 'tank': 'checksum' must be one of 'on | off | fletcher2 | fletcher4 | sha256 | sha512 | skein | edonr'
(This commit removes sha512 and skein in addition to edonr from the merge of head's r323002.)
PR: 204055 Submitted by: Fabian Keil Approved by: re (kib) Obtained from: ElectroBSD |
321592 |
26-Jul-2017 |
emaste |
MFC r321218: zfs: Fix a typo in the delay_min_dirty_percent sysctl description
The description is FreeBSD-specific and was added in r266497 to fix PR189865.
PR: 220825 Submitted by: Fabian Keil Obtained from: ElectroBSD |
320496 |
30-Jun-2017 |
avg |
MFC r308782: After some ZIL changes 6 years ago zil_slog_limit got partially broken due to zl_itx_list_sz not updated when async itx'es upgraded to sync. Actually because of other changes about that time zl_itx_list_sz is not really required to implement the functionality, so this patch removes some unneeded broken code and variables.
Original idea of zil_slog_limit was to reduce chance of SLOG abuse by single heavy logger, that increased latency for other (more latency critical) loggers, by pushing heavy log out into the main pool instead of SLOG. Beside huge latency increase for heavy writers, this implementation caused double write of all data, since the log records were explicitly prepared for SLOG. Since we now have I/O scheduler, I've found it can be much more efficient to reduce priority of heavy logger SLOG writes from ZIO_PRIORITY_SYNC_WRITE to ZIO_PRIORITY_ASYNC_WRITE, while still leave them on SLOG.
The existing ZIL implementation had a problem with space efficiency when it had to write large chunks of data into log blocks of limited size. In some cases efficiency dropped to almost as low as 50%. In the case of a ZIL stored on spinning rust, that also cut log write speed in half, since the head had to uselessly fly over allocated but unwritten areas. This change improves the situation by offloading problematic operations from z*_log_write() to zil_lwb_commit(), which knows the real state of log block allocation and can split large requests into pieces much more efficiently. As a side effect, it also removes one of the two data copy operations done by the ZIL code in the WR_COPIED case.
While there, untangle and unify code of z*_log_write() functions. Also zfs_log_write() alike to zvol_log_write() can now handle writes crossing block boundary, that may also improve efficiency if ZPL is made to do that. |
319625 |
06-Jun-2017 |
gjb |
MFC r318943 (avg):
MFV r318942: 8166 zpool scrub thinks it repaired offline device
https://www.illumos.org/issues/8166
If we do a scrub while a leaf device is offline (via "zpool offline"), we will inadvertently clear the DTL (dirty time log) of the offline device, even though it is still damaged. When the device comes back online, we will incompletely resilver it, thinking that the scrub repaired blocks written before the scrub was started. The incomplete resilver can lead to data loss if there is a subsequent failure of a different leaf device. The fix is to never clear the DTL of offline devices. Note that if a device is onlined while a scrub is in progress, the scrub will be restarted. The problem can be worked around by running "zpool scrub" after "zpool online". See also https://github.com/zfsonlinux/zfs/issues/5806
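The rule "never clear the DTL of offline devices" can be illustrated with a toy Python model. The Leaf class, its fields, and scrub_done() are hypothetical names chosen for this sketch; the real DTL is a range tree maintained by the vdev code, not a set of transaction group numbers.

```python
from dataclasses import dataclass, field


@dataclass
class Leaf:
    name: str
    offline: bool = False
    dtl: set = field(default_factory=set)  # txgs whose data may be missing


def scrub_done(leaves):
    """After a successful scrub, clear the DTL of each leaf that took part."""
    for leaf in leaves:
        if leaf.offline:
            continue  # the fix: an offline leaf was not repaired, keep its DTL
        leaf.dtl.clear()


def needs_resilver(leaf):
    """A leaf with a non-empty DTL still has data to catch up on."""
    return bool(leaf.dtl)
```

With this rule, a leaf that was offline during the scrub still reports needs_resilver() as true once it comes back online, so the following resilver covers everything it missed.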
PR: 219537 Sponsored by: The FreeBSD Foundation |
319423 |
01-Jun-2017 |
avg |
MFC r318832: MFV r316923: 8026 retire zfs_throttle_delay and zfs_throttle_resolution |
319421 |
01-Jun-2017 |
avg |
MFC r318830: MFV r316921: 8027 tighten up dsl_pool_dirty_delta |
319416 |
01-Jun-2017 |
avg |
MFC r319096: zfs_lookup: fix bogus arguments to lookup of "snapshot" directory |
319268 |
30-May-2017 |
asomers |
MFC r318189:
vdev_geom may associate multiple vdevs per g_consumer
vdev_geom.c currently uses the g_consumer's private field to point to a vdev_t. That way, a GEOM event can cause a change to a ZFS vdev. For example, when you remove a disk, the vdev's status will change to REMOVED. However, vdev_geom will sometimes attach multiple vdevs to the same GEOM consumer. If this happens, then geom events will only be propagated to one of the vdevs.
Fix this by storing a linked list of vdevs in g_consumer's private field.
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
* g_consumer.private now stores a linked list of vdev pointers associated with the consumer instead of just a single vdev pointer.
* Change vdev_geom_set_physpath's signature to more closely match vdev_geom_set_rotation_rate
* Don't bother calling g_access in vdev_geom_set_physpath. It's guaranteed that we've already accessed the consumer by the time we get here.
* Don't call vdev_geom_set_physpath in vdev_geom_attach. Instead, call it in vdev_geom_open, after we know that the open has succeeded.
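A minimal Python sketch of the idea behind this change, assuming a much simplified picture of GEOM: the Consumer and Vdev classes and the attach() and orphan() helpers are hypothetical stand-ins, and a Python list plays the role of the linked list stored in g_consumer's private field.

```python
class Vdev:
    def __init__(self, name):
        self.name = name
        self.state = "ONLINE"


class Consumer:
    def __init__(self):
        # Previously a single vdev pointer; now every attached vdev is kept.
        self.private = []


def attach(consumer, vdev):
    """Associate one more vdev with an existing consumer."""
    consumer.private.append(vdev)


def orphan(consumer):
    """Model of a GEOM event (disk removed): notify every attached vdev."""
    for vdev in consumer.private:
        vdev.state = "REMOVED"
```

With a single pointer, only the vdev attached last would have seen the event; iterating over the list reaches all of them.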
PR: 218634 Reviewed by: gibbs Sponsored by: Spectra Logic Corp Differential Revision: https://reviews.freebsd.org/D10391 |
319262 |
30-May-2017 |
asomers |
MFC r316760:
Fix vdev_geom_attach_by_guids for partitioned disks
When opening a vdev whose path is unknown, vdev_geom must find a geom provider with a label whose guids match the desired vdev. However, due to partitioning, it is possible that two non-synonymous providers will share some labels. For example, if the first partition starts at the beginning of the drive, then ada0 and ada0p1 will share the first label. More troubling, if the last partition runs to the end of the drive, then ada0p3 and ada0 will share the last label. If vdev_geom opens ada0 when it should've opened ada0p3, then the pool won't be readable. If it opens ada0 when it should've opened ada0p1, then it will corrupt some other partition when it writes the 3rd and 4th labels.
The easiest way to reproduce this problem is to install a mirrored root pool with the default partition layout, then swap the positions of the two boot drives and reboot. Whether the bug manifests depends on the order in which geom lists its providers, which is arbitrary.
Fix this situation by modifying the search algorithm to prefer geom providers that have all four labels intact. If no such provider exists, then open whichever provider has the most.
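The selection rule can be sketched in Python as follows. This is an illustrative model only: pick_provider() and its input format (a mapping from provider name to the guid pairs read from its labels) are assumptions made for the example, not the vdev_geom interface.

```python
def pick_provider(providers, pool_guid, vdev_guid):
    """Return the provider with the most labels matching the wanted guids.

    providers maps a provider name to a list of up to four labels, each
    either None (unreadable or absent) or a (pool_guid, vdev_guid) tuple.
    Returns None if no provider has a matching label.
    """
    wanted = (pool_guid, vdev_guid)
    best, best_count = None, 0
    for name, labels in providers.items():
        count = sum(1 for label in labels if label == wanted)
        if count > best_count:  # a provider with all four labels always wins
            best, best_count = name, count
    return best


# ada0 sees only the last label of the pool on ada0p3; ada0p3 sees all four.
providers = {
    "ada0": [None, None, None, (1, 2)],
    "ada0p3": [(1, 2), (1, 2), (1, 2), (1, 2)],
}
chosen = pick_provider(providers, 1, 2)
```

Counting matches rather than stopping at the first match is what makes the result independent of the order in which providers are listed.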
Reviewed by: mav Sponsored by: Spectra Logic Corp Differential Revision: https://reviews.freebsd.org/D10365 |
318786 |
24-May-2017 |
avg |
MFC r316854: rename vfs.zfs.debug_flags to vfs.zfs.debugflags
Since this is a stable branch, the vfs.zfs.debug_flags sysctl is also kept. The corresponding tunable could never work. |
316850 |
14-Apr-2017 |
avg |
MFC r315852: zfs: add zio_buf_alloc_nowait and use it in vdev_queue_aggregate |
316848 |
14-Apr-2017 |
avg |
MFC r315853: zfs_putpages: use TXG_WAIT |
315848 |
23-Mar-2017 |
avg |
MFC r315076: zfs: provide a special vptocnp method for the .zfs vnode |
315844 |
23-Mar-2017 |
avg |
MFC r314048,r314194: reimplement zfsctl (.zfs) support |
315835 |
23-Mar-2017 |
avg |
MFC r314913: MFV r314911: 7867 ARC space accounting leak |
315833 |
23-Mar-2017 |
avg |
MFC r314912: MFV r314910: 7843 get_clones_stat() is suboptimal for lots of clones |
315388 |
16-Mar-2017 |
mav |
MFC r314549: Execute last ZIO of log commit synchronously.
For short transactions overhead of context switch can be too large. Skipping it gives significant latency reduction. For large ones, including multiple ZIOs, latency is less critical, while throughput there may become limited by checksumming speed of single CPU core. To get best of both cases, execute last ZIO directly from calling thread context to save latency, while all others (if there are any) enqueue to taskqueues in traditional way. |
315385 |
16-Mar-2017 |
mav |
MFC r314548: Completely skip cache flushing for log devices that do not support it. |
315073 |
11-Mar-2017 |
avg |
MFC r314274: l2arc: fix write size calculation broken by Compressed ARC commit |
314874 |
07-Mar-2017 |
jpaetzel |
MFC r313879:
MFV r313876:
7504 kmem_reap hangs spa_sync and administrative tasks
illumos/illumos-gate@405a5a0f5c3ab36cb76559467d1a62ba648bd809 https://github.com/illumos/illumos-gate/commit/405a5a0f5c3ab36cb76559467d1a62ba648bd80
https://www.illumos.org/issues/7504
We see long spa_sync(). We are waiting to hold dp_config_rwlock for writer. Some other thread holds dp_config_rwlock for reader, then calls arc_get_data_buf(), which finds that arc_is_overflowing()==B_TRUE. So it waits (while holding dp_config_rwlock for reader) for arc_reclaim_thread to signal arc_reclaim_waiters_cv. Before signaling, arc_reclaim_thread does arc_kmem_reap_now(), which takes ~seconds.
Author: Matthew Ahrens <mahrens@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Prakash Surya <prakash.surya@delphix.com> Approved by: Dan McDonald <danmcd@omniti.com> |
314857 |
07-Mar-2017 |
avg |
MFC r314058: zfs: lower priority of zio_write_issue threads by four
Obtained from: Panzura Sponsored by: Panzura |
314711 |
05-Mar-2017 |
mm |
MFC r314572:
Fix null pointer dereference in zfs_freebsd_setacl().
Prevents unprivileged users from panicking the kernel by calling __acl_delete_*() on files or directories inside a ZFS mount. |
314668 |
04-Mar-2017 |
avg |
MFC r314273: zfs: call spa_deadman on a taskqueue thread |
314667 |
04-Mar-2017 |
avg |
MFC r283291: don't use CALLOUT_MPSAFE with callout_init()
The main purpose of this MFC is to reduce conflicts for other merges. Parts of the original change have already "trickled down" via individual MFCs. |
314356 |
27-Feb-2017 |
avg |
MFC r314059: zfs: move zio_taskq_basedc under SYSDC |
314327 |
27-Feb-2017 |
avg |
MFC r292782: Replace sys/crypto/sha2/sha2.c with lib/libmd/sha512c.c
cperciva's libmd implementation is 5-30% faster.
The same was done for SHA256 previously in r263218.
Approved by: secteam |
314032 |
21-Feb-2017 |
avg |
MFC r313687: remove l2_padding_needed statistic from zfs arc |
314029 |
21-Feb-2017 |
avg |
MFC r313686: check remaining space in zfs implementations of vptocnp
PR: 216939 |
313486 |
09-Feb-2017 |
ngie |
MFC r258903,r264487,r271699,r288415:
r258903 (by markj):
Enable some previously-disabled DTrace tests for umod, ufunc and usym. They expect the installed ksh binary to be named "ksh", which is not the case when it's installed on FreeBSD via the shells/ksh93 port. Allow for it to be "ksh93" as well so that the tests can actually pass.
r264487 (by markj):
Replace a few Solarisisms with their corresponding FreeBSDisms to make a few printf tests pass.
r271699 (by markj):
Implement a workaround to allow this test program to be compiled with clang. It seems that if a pragma is used to define a weak alias for a local function, the pragma must appear after the function is defined.
PR: 193056
r288415 (by markj):
MFV r288408: 6266 harden dtrace_difo_chunksize() with respect to malicious DIF
illumos/illumos-gate@395c7a3dcfc66b8b671dc4b3c4a2f0ca26449922
Author: Bryan Cantrill <bryan@joyent.com> |
311150 |
03-Jan-2017 |
markj |
MFC r310647: Remove an obsolete pragma from dtrace.h. |
310516 |
24-Dec-2016 |
avg |
MFC r309250: MFV r309249: 3821 Race in rollback, zil close, and zil flush |
310514 |
24-Dec-2016 |
avg |
MFC r309099: MFV r308990: 7181 race between zfs_mount and zfs_ioc_rollback |
310512 |
24-Dec-2016 |
avg |
MFC r309098: MFV r308988: 7199, 7200 dsl_dataset_rollback_sync may try to free already free blocks |
310510 |
24-Dec-2016 |
avg |
MFC r309097: MFV r308987: 7180 potential race between zfs_suspend_fs+zfs_resume_fs and zfs_ioc_rename |
310106 |
15-Dec-2016 |
mav |
MFC 309714: Fix spa_alloc_tree sorting by offset in r305331.
Original commit "7090 zfs should improve allocation order" declares alloc queue sorted by time and offset. But in practice io_offset is always zero, so sorting happened only by time, while order of writes with equal time was completely random. On Illumos this did not matter much thanks to the use of high resolution timestamps. On FreeBSD, due to the use of much faster but low resolution timestamps, it caused bad data placement on disks, affecting further read performance.
This change switches zio_timestamp_compare() from comparing uninitialized io_offset to really populated io_bookmark values. I haven't decided yet what to do with timestamps, but on simple tests this change gives the same performance results by just making the code work as declared. |
310067 |
14-Dec-2016 |
avg |
MFC r308887,309090: fix unsafe modification of zfs_vnodeops when DIAGNOSTIC is enabled |
308915 |
21-Nov-2016 |
avg |
MFC r308089: zfsbootcfg: a simple tool to set next boot (one time) options for zfsboot
There is a branch-specific change in sbin/zfsbootcfg/Makefile because of LIBADD vs LDADD/DPADD. |
308765 |
17-Nov-2016 |
avg |
Revert r308753: some unrelated changes were included into the commit |
308753 |
17-Nov-2016 |
avg |
MFC r308040,308479: nap time between pats is forced to be at most half of the timeout
Note that in this branch the default nap period is 1 second unlike the head where the period is 10 seconds. |
308596 |
13-Nov-2016 |
mav |
MFC r308173: Fix ZIL records ordering when ZVOL opened both with and without FSYNC.
Before this, earlier writes to a ZVOL opened without FSYNC could get to ZIL after later writes to the same ZVOL opened with FSYNC. Fix this by replicating functionality of ZPL (zv_sync_cnt equivalent to z_sync_cnt), marking all log records sync if anybody opened the ZVOL with FSYNC. |
308594 |
12-Nov-2016 |
mav |
MFC r308169: Pass to zvol_log_truncate() same sync values as to zvol_log_write().
Surplus marking of TX_TRUNCATE records as sync could result in putting them into ZIL before previous writes if ones were async. |
308592 |
12-Nov-2016 |
mav |
MFC r308055: Add vdev_reopening support to vdev_geom.
This avoids extra flapping of GEOM providers without significant need. Since GEOM got resize support, we don't need to reopen the provider to get the new size. If the provider was orphaned and is no longer valid, ZFS should already know that, and in such a case the reopen should be done in full as expected. |
308590 |
12-Nov-2016 |
mav |
MFC r308051: Matching GUIDs, handle possible race on vdev detach.
In case of vdev detach, causing top level mirror vdev destruction, leaf vdev changes its GUID to one of the destroyed mirror, that creates race condition when GUID in vdev label may not match one in the pool config.
This change replicates logic nuance of vdev_validate() by adding special exception, matching the vdev GUID against the top level vdev GUID. Since this exception is not completely reliable (may give false positives if we fail to erase label on detached vdev), use it only as last resort.
A quick way to reproduce this scenario now is to detach a vdev from a pool with autoextend enabled. During vdev detach the autoextend logic tries to reopen the remaining vdev, which always fails now since the in-memory configuration is already updated, while the on-disk labels are not yet. |
308588 |
12-Nov-2016 |
mav |
MFC r308049: Improve few debugging log messages. |
308586 |
12-Nov-2016 |
mav |
MFC r307318: MFV r307314: 6988 spa_sync() spends half its time in dmu_objset_do_userquota_updates
Using a benchmark which creates 2 million files in one TXG, I observe that the thread running spa_sync() is on CPU almost the entire time we are syncing, and therefore can be a performance bottleneck. About 50% of the time in spa_sync() is in dmu_objset_do_userquota_updates().
The problem is that dmu_objset_do_userquota_updates() calls zap_increment_int(DMU_USERUSED_OBJECT) once for every file that was modified (or created). In this benchmark, all the files are owned by the same user/group, so all 2 million calls to zap_increment_int() are modifying the same entry in the zap. The same issue exists for the DMU_GROUPUSED_OBJECT.
We should keep an in-memory map from user to space delta while we are syncing, and when we finish, iterate over the in-memory map and modify the ZAP once per entry. This reduces the number of calls to zap_increment_int() from "number of objects modified" to "number of owners/groups of modified files".
This reduced the time spent in spa_sync() in the file create benchmark by ~33%, from 11 seconds to 7 seconds.
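The batching described above can be sketched in Python. This is a toy model under stated assumptions: the Zap class with its increment_int() method only mimics the role of zap_increment_int(), and userquota_updates() and the owner keys are hypothetical names for the example.

```python
from collections import defaultdict


class Zap:
    """Stand-in for an on-disk ZAP object that counts how often it is updated."""

    def __init__(self):
        self.entries = defaultdict(int)
        self.updates = 0

    def increment_int(self, key, delta):
        self.entries[key] += delta
        self.updates += 1


def userquota_updates(zap, modified):
    """modified yields one (owner, space_delta) pair per modified object."""
    deltas = defaultdict(int)
    for owner, delta in modified:
        deltas[owner] += delta  # accumulate in memory while syncing
    for owner, delta in deltas.items():
        zap.increment_int(owner, delta)  # one ZAP update per owner


zap = Zap()
userquota_updates(zap, [("uid:1000", 512)] * 2000 + [("uid:1001", 1024)])
```

In this model the number of ZAP updates depends on the number of distinct owners (two here) rather than on the 2001 modified objects.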
Closes #107
Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Steve Gonczi <steve.gonczi@delphix.com> Reviewed by: Ned Bass <bass6@llnl.gov> Reviewed by: Jinshan Xiong <jinshan.xiong@intel.com> Author: Matthew Ahrens <mahrens@delphix.com>
openzfs/openzfs@5fc46359c569369d87728ca09f8705cdff6cc8e2 |
308448 |
08-Nov-2016 |
mav |
MFC r307857: Fix panic after ZVOL renamed to name invalid for DEVFS. |
308246 |
03-Nov-2016 |
avg |
MFC r307994: 3746 ZRLs are racy
PR: 204037 |
308087 |
29-Oct-2016 |
mav |
MFC r306456: Add #ifdef _KERNEL around send_holes_without_birth_time sysctl. |
308086 |
29-Oct-2016 |
mav |
MFC r306425: MFV r306423: 7402 Create tunable to ignore hole_birth feature
Until we can resolve the numerous hole_birth bugs that have cropped up recently, and come up with a way going forwards to protect users from corruption, we should disable the hole_birth feature. Using a tunable allows those who are confident that their data is correct to continue to take advantage of the feature.
Closes #188
Reviewed by: Matthew Ahrens <mahrens@delphix.com> Author: Paul Dagnelie <pcd@delphix.com> |
308083 |
29-Oct-2016 |
mav |
MFC r306424: MFV r306422: 7254 ztest failed assertion in ztest_dataset_dirobj_verify: dirobjs + 1 == usedobjs
dsl_dataset_space is looking at the ds_bp's fill count while dmu_objset_write_ready() is concurrently modifying it. This fix adds an rrwlock to protect the ds_bp.
Closes #180
Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Steve Gonczi <steve.gonczi@delphix.com> Author: Paul Dagnelie <pcd@delphix.com> |
308061 |
28-Oct-2016 |
mav |
MFC r300881, r302058 (by asomers): Avoid issuing spa config updates for physical path when not necessary
ZFS's configuration needs to be updated whenever the physical path for a device changes, but not when a new device is introduced. This is because new devices necessarily cause config updates, but only if they are actually accepted into the pool.
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c Split vdev_geom_set_physpath out of vdev_geom_attrchanged. When setting the vdev's physical path, only request a config update if the physical path has changed. Don't request it when opening a device for the first time, because the config sync will happen anyway upstack.
sys/geom/geom_dev.c Split g_dev_set_physpath and g_dev_set_media out of g_dev_attrchanged |
308060 |
28-Oct-2016 |
mav |
MFC r300059 (by asomers): Speed up vdev_geom_open_by_guids
Speedup is hard to measure because the only time vdev_geom_open_by_guids gets called on many drives at the same time is during boot. But with vdev_geom_open hacked to always call vdev_geom_open_by_guids, operations like "zpool create" speed up by 65%.
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
* Read all of a vdev's labels in parallel instead of sequentially.
* In vdev_geom_read_config, don't read the entire label, including the uberblock. That's a waste of RAM. Just read the vdev config nvlist. Reduces the IO and RAM involved with tasting from 1MB to 448KB. |
308059 |
28-Oct-2016 |
mav |
MFC r298814 (by asomers): Fix a use-after-free when "zpool import" fails
clear vd->vdev_tsd in vdev_geom_close_locked instead of vdev_geom_detach. In the latter function, it would fail to happen in certain circumstances where cp->private was unset. Ideally, the latter should never happen, but it can happen when vdev open fails, or where spares are involved. |
308058 |
28-Oct-2016 |
mav |
MFC r298786 (by asomers): Refactor vdev_geom_attach and friends to reduce code duplication
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c Move checks for provider's sectorsize and mediasize into a single location in vdev_geom_attach. Remove the zfs::vdev::taste class; it's ok to use the regular vdev class for tasting. Consolidate guid checks into a single location in vdev_attach_ok. Consolidate some error handling code from vdev_geom_attach into vdev_geom_detach, closing a resource leak of geom consumers in the process. |
308057 |
28-Oct-2016 |
mav |
MFC r294329 (by asomers): Disallow zvol-backed ZFS pools
Using zvols as backing devices for ZFS pools is fraught with panics and deadlocks. For example, attempting to online a missing device in the presence of a zvol can cause a panic when vdev_geom tastes the zvol. Better to completely disable vdev_geom from ever opening a zvol. The solution relies on setting a thread-local variable during vdev_geom_open, and returning EOPNOTSUPP during zvol_open if that thread-local variable is set.
Remove the check for MUTEX_HELD(&zfsdev_state_lock) in zvol_open. Its intent was to prevent a recursive mutex acquisition panic. However, the new check for the thread-local variable also fixes that problem.
Also, fix a panic in vdev_geom_taste_orphan. For an unknown reason, this function was set to panic. But it can occur that a device disappears during tasting, and it causes no problems to ignore this departure. |
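The thread-local guard described in r308057 can be illustrated in userland C. This is a minimal sketch: the flag and both functions are hypothetical analogs of the kernel mechanism, with C11 _Thread_local standing in for the kernel's per-thread storage.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical per-thread flag; the real change uses a kernel thread-local. */
static _Thread_local bool toy_in_vdev_open = false;

/* Stand-in for zvol_open: refuse when called from inside a vdev open. */
int
toy_zvol_open(void)
{
	if (toy_in_vdev_open)
		return (EOPNOTSUPP);
	return (0);
}

/* Stand-in for vdev_geom_open: mark the thread while probing providers. */
int
toy_vdev_open(void)
{
	int error;

	assert(!toy_in_vdev_open);	/* no nested opens in this sketch */
	toy_in_vdev_open = true;
	error = toy_zvol_open();	/* tasting happens to reach a zvol */
	toy_in_vdev_open = false;
	return (error);
}
```

Because the flag is per-thread, an ordinary open of the zvol from another context is unaffected; only opens reached from within the vdev open path are refused.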
307996 |
27-Oct-2016 |
avg |
MFC r306801: implement zfs_vptocnp() using z_parent property |
307672 |
20-Oct-2016 |
kib |
MFC r307218: Fix a race in vm_page_busy_sleep(9). |
307300 |
14-Oct-2016 |
mav |
MFC r305563: MFV r305562: 7259 DS_FIELD_LARGE_BLOCKS is unused
The DS_FIELD_LARGE_BLOCKS macro has been unused since the integration of this patch:
commit ca0cc3918a1789fa839194af2a9245f801a06b1a Author: Matthew Ahrens <mahrens@delphix.com> Date: Fri Jul 24 09:53:55 2015 -0700
5959 clean up per-dataset feature count code Reviewed by: Toomas Soome <tsoome@me.com> Reviewed by: George Wilson <george@delphix.com> Reviewed by: Alex Reece <alex@delphix.com> Approved by: Richard Lowe <richlowe@richlowe.net>
This patch simply removes this macro from dsl_dataset.h.
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Dan McDonald <danmcd@omniti.com> Reviewed by: Igor Kozhukhov <ikozhukhov@gmail.com> Author: Matthew Ahrens <mahrens@delphix.com> |
307298 |
14-Oct-2016 |
mav |
MFC r305561: MFV r305560: 7278 tuning zfs_arc_max does not impact arc_c_min
When changing zfs_arc_max (e.g. as zdb does), it may be set to less than the default arc_c_min. arc_c_min should decrease to not be more than arc_c_max, but it doesn't; therefore tuning of arc_c_max is ineffective.
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Reviewed by: Paul Dagnelie <paul.dagnelie@delphix.com> Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Igor Kozhukhov <ikozhukhov@gmail.com> Author: Matthew Ahrens <mahrens@delphix.com>
openzfs/openzfs@608764beadaf4bb71c5d8fe1818e8392ac66a61b |
307296 |
14-Oct-2016 |
mav |
MFC r305456 (by avg): fix zfs pool creation accidentally broken by r305331
The upstream change introduced a new load state, SPA_LOAD_CREATE, and vdev_geom code needs to be aware of it. |
307294 |
14-Oct-2016 |
mav |
MFC r305342: Missed FreeBSD-specific piece of r305338. |
307292 |
14-Oct-2016 |
mav |
MFC r305340: MFC r305337: 7004 dmu_tx_hold_zap() does dnode_hold() 7x on same object
Using a benchmark which has 32 threads creating 2 million files in the same directory, on a machine with 16 CPU cores, I observed poor performance. I noticed that dmu_tx_hold_zap() was using about 30% of all CPU, and doing dnode_hold() 7 times on the same object (the ZAP object that is being held).
dmu_tx_hold_zap() keeps a hold on the dnode_t the entire time it is running, in dmu_tx_hold_t:txh_dnode, so it would be nice to use the dnode_t that we already have in hand, rather than repeatedly calling dnode_hold(). To do this, we need to pass the dnode_t down through all the intermediate calls that dmu_tx_hold_zap() makes, making these routines take the dnode_t* rather than an objset_t* and a uint64_t object number. In particular, the following routines will need to have analogous *_by_dnode() variants created:
dmu_buf_hold_noread() dmu_buf_hold() zap_lookup() zap_lookup_norm() zap_count_write() zap_lockdir() zap_count_write()
This can improve performance on the benchmark described above by 100%, from 30,000 file creations per second to 60,000. (This improvement is on top of that provided by working around the object allocation issue. Peak performance of ~90,000 creations per second was observed with 8 CPUs; adding CPUs past that decreased performance due to lock contention.) The CPU used by dmu_tx_hold_zap() was reduced by 88%, from 340 CPU-seconds to 40 CPU-seconds.
Sponsored by: Intel Corp.
Closes #109
Reviewed by: Steve Gonczi <steve.gonczi@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com> Reviewed by: Ned Bass <bass6@llnl.gov> Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Author: Matthew Ahrens <mahrens@delphix.com>
openzfs/openzfs@d3e523d489a169ab36f9ec1b2a111a60a5563a9f |
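The idea behind the *_by_dnode() variants in r307292 is to pass down the object the caller already holds instead of looking it up again by number. A toy illustration follows; all types and functions are hypothetical, and the hold counter exists only to make the repeated lookups visible.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct toy_dnode {
	uint64_t object;
	uint64_t value;
};

struct toy_objset {
	struct toy_dnode *dnodes;
	size_t ndnodes;
	unsigned holds;		/* counts lookups, to make the cost visible */
};

/* Analog of dnode_hold(): find a dnode by object number. */
struct toy_dnode *
toy_dnode_hold(struct toy_objset *os, uint64_t object)
{
	os->holds++;
	for (size_t i = 0; i < os->ndnodes; i++)
		if (os->dnodes[i].object == object)
			return (&os->dnodes[i]);
	return (NULL);
}

/* Original style: each helper takes (objset, object) and re-holds. */
uint64_t
toy_lookup(struct toy_objset *os, uint64_t object)
{
	struct toy_dnode *dn = toy_dnode_hold(os, object);

	return (dn != NULL ? dn->value : 0);
}

/* *_by_dnode style: the caller passes the dnode it already holds. */
uint64_t
toy_lookup_by_dnode(struct toy_dnode *dn)
{
	assert(dn != NULL);
	return (dn->value);
}
```

Calling toy_lookup() twice performs two holds, while one toy_dnode_hold() followed by any number of toy_lookup_by_dnode() calls performs one.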
307289 |
14-Oct-2016 |
mav |
MFC r305339: MFV r305336: 7247 zfs receive of deduplicated stream fails
This resolves two 'zfs recv' issues. First, when receiving into an existing filesystem, a snapshot created during the receive process is not added to the guid->dataset map for the stream, resulting in failed lookups for deduped streams when a WRITE_BYREF record refers to a snapshot received earlier in the stream. Second, the newly created snapshot was also not set properly, referencing the snapshot before the new receiving dataset rather than the existing filesystem.
Closes #159
Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Author: Chris Williamson <chris.williamson@delphix.com>
openzfs/openzfs@b09697c8c18be68abfe538de9809938239402ae8 |
307287 |
14-Oct-2016 |
mav |
MFC r305338: MFV r305335: 7003 zap_lockdir() should tag hold
zap_lockdir() / zap_unlockdir() should take a "void *tag" argument which tags the hold on the zap. This will help diagnose programming errors which misuse the hold on the ZAP.
Sponsored by: Intel Corp.
Closes #108
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com> Reviewed by: Steve Gonczi <steve.gonczi@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Author: Matthew Ahrens <mahrens@delphix.com>
openzfs/openzfs@0780b3eab5a2c13e04328b39ecd2a6d0d3c4f7cb |
307285 |
14-Oct-2016 |
mav |
MFC r305334: MFV r304157: 7230 add assertions to dmu_send_impl() to verify that stream includes BEGIN and END records
illumos/illumos-gate@12b90ee2d3b10689fc45f4930d2392f5fe1d9cfa https://github.com/illumos/illumos-gate/commit/12b90ee2d3b10689fc45f4930d2392f5fe1d9cfa
https://www.illumos.org/issues/7230 A test failure occurred where a send stream had only a BEGIN record. This should not be possible if the send returns without error. Prevented this from happening in the future by adding an assertion to dmu_send_impl() to verify that if the function returns 0 (success) both a BEGIN and END record are present. Did this by adding flags to dmu_sendarg_t (indicating whether BEGIN or END records sent), having dump_record() set flags appropriately, adding VERIFY statement to dmu_send_impl().
Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Paul Dagnelie <pcd@delphix.com> Reviewed by: Igor Kozhukhov <ikozhukhov@gmail.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Matt Krantz <matt.krantz@delphix.com> |
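The flag-and-verify pattern described in r307285 can be sketched as follows. The types and functions are hypothetical analogs of dmu_sendarg_t, dump_record() and dmu_send_impl(), and a plain assert() stands in for VERIFY.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_record_type { TOY_BEGIN, TOY_WRITE, TOY_END };

/* Hypothetical analog of the flags added to dmu_sendarg_t. */
struct toy_sendarg {
	bool sent_begin;
	bool sent_end;
};

/* Analog of dump_record(): note which framing records went out. */
void
toy_dump_record(struct toy_sendarg *sa, enum toy_record_type type)
{
	if (type == TOY_BEGIN)
		sa->sent_begin = true;
	else if (type == TOY_END)
		sa->sent_end = true;
}

/* Analog of the dmu_send_impl() check: success implies both records. */
int
toy_send(struct toy_sendarg *sa, int nwrites)
{
	toy_dump_record(sa, TOY_BEGIN);
	for (int i = 0; i < nwrites; i++)
		toy_dump_record(sa, TOY_WRITE);
	toy_dump_record(sa, TOY_END);

	assert(sa->sent_begin && sa->sent_end);
	return (0);
}
```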
307283 |
14-Oct-2016 |
mav |
MFC r305333: MFV r304156: 7235 remove unused func dsl_dataset_set_blkptr
illumos/illumos-gate@bd56f80007857b960e0981ed0797ad8ec844a96b https://github.com/illumos/illumos-gate/commit/bd56f80007857b960e0981ed0797ad8ec844a96b
https://www.illumos.org/issues/7235 The function dsl_dataset_set_blkptr() is unused. We should remove it.
Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Alex Reece <alex@delphix.com> Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Igor Kozhukhov <ikozhukhov@gmail.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Matthew Ahrens <mahrens@delphix.com> |
307279 |
14-Oct-2016 |
mav |
MFC r305331: MFV r304155: 7090 zfs should improve allocation order and throttle allocations
illumos/illumos-gate@0f7643c7376dd69a08acbfc9d1d7d548b10c846a https://github.com/illumos/illumos-gate/commit/0f7643c7376dd69a08acbfc9d1d7d548b10c846a
https://www.illumos.org/issues/7090 When write I/Os are issued, they are issued in block order but the ZIO pipeline will drive them asynchronously through the allocation stage which can result in blocks being allocated out-of-order. It would be nice to preserve as much of the logical order as possible. In addition, the allocations are equally scattered across all top-level VDEVs but not all top-level VDEVs are created equally. The pipeline should be able to detect devices that are more capable of handling allocations and should allocate more blocks to those devices. This allows for dynamic allocation distribution when devices are imbalanced as fuller devices will tend to be slower than empty devices. The change includes a new pool-wide allocation queue which would throttle and order allocations in the ZIO pipeline. The queue would be ordered by issued time and offset and would provide an initial amount of allocation of work to each top-level vdev. The allocation logic utilizes a reservation system to reserve allocations that will be performed by the allocator. Once an allocation is successfully completed it's scheduled on a given top-level vdev. Each top-level vdev maintains a maximum number of allocations that it can handle (mg_alloc_queue_depth). The pool-wide reserved allocations (top-levels * mg_alloc_queue_depth) are distributed across the top-level vdevs metaslab groups and round robin across all eligible metaslab groups to distribute the work. As top-levels complete their work, they receive additional work from the pool-wide allocation queue until the allocation queue is emptied.
Reviewed by: Adam Leventhal <ahl@delphix.com> Reviewed by: Alex Reece <alex@delphix.com> Reviewed by: Christopher Siden <christopher.siden@delphix.com> Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Paul Dagnelie <paul.dagnelie@delphix.com> Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Sebastien Roy <sebastien.roy@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: George Wilson <george.wilson@delphix.com> |
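The reservation idea in r307279 (a per-group cap on outstanding allocations, handed out round robin across eligible groups) can be illustrated with a heavily simplified sketch. It omits the ordering by issue time and offset entirely, and every name in it is hypothetical rather than taken from the ZFS sources.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical analog of a metaslab group with a queue-depth cap. */
struct toy_group {
	unsigned queue_depth;	/* allocations currently reserved */
	unsigned max_depth;	/* cap, in the spirit of mg_alloc_queue_depth */
};

struct toy_pool {
	struct toy_group *groups;
	size_t ngroups;
	size_t rotor;		/* next group to try, for round robin */
};

/*
 * Reserve one allocation slot. Returns the index of the chosen group,
 * or -1 when every group is at its cap (the caller is throttled).
 */
int
toy_reserve(struct toy_pool *p)
{
	for (size_t i = 0; i < p->ngroups; i++) {
		size_t g = (p->rotor + i) % p->ngroups;

		if (p->groups[g].queue_depth < p->groups[g].max_depth) {
			p->groups[g].queue_depth++;
			p->rotor = (g + 1) % p->ngroups;
			return ((int)g);
		}
	}
	return (-1);
}

/* Release a slot when the allocation on group g completes. */
void
toy_complete(struct toy_pool *p, int g)
{
	assert(g >= 0 && (size_t)g < p->ngroups);
	assert(p->groups[g].queue_depth > 0);
	p->groups[g].queue_depth--;
}
```

A group with a larger cap absorbs more reservations once the smaller ones are full, which is the sketch's rough counterpart of giving more work to more capable devices.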
307270 |
14-Oct-2016 |
mav |
MFC r305325: MFV r303078: 7086 ztest attempts dva_get_dsize_sync on an embedded blockpointer
illumos/illumos-gate@926549256b71acd595f69b236779ff6b78fa08ef https://github.com/illumos/illumos-gate/commit/926549256b71acd595f69b236779ff6b78fa08ef
https://www.illumos.org/issues/7086 In dbuf_dirty(), we need to grab the dn_struct_rwlock before looking at the db_blkptr, to prevent it from being changed by syncing context. Otherwise we may see that ztest got a segfault from this stack:
libzpool.so.1`dva_get_dsize_sync+0x98(872f000, b32b240, fed7811b, 0, b4cda20, 0)
libzpool.so.1`bp_get_dsize+0x60(872f000, b32b240, 0, 97cb780, 9d4c1a8, 0)
libzpool.so.1`dbuf_dirty+0x9b3(ce0a100, 97cb780, 9, fecd2530)
libzpool.so.1`dmu_buf_will_dirty+0xc3(ce0a100, 97cb780, ea293d6c, 1)
libzpool.so.1`zap_lockdir+0x1a0(8aaa3c0, 1, 0, 97cb780, 1, 1)
libzpool.so.1`zap_remove_norm+0x30(8aaa3c0, 1, 0, 8728b10, 0, 97cb780)
libzpool.so.1`zap_remove+0x29(8aaa3c0, 1, 0, 8728b10, 97cb780, a)
ztest_replay_remove+0x225(ea294588, 8728ae8, 0, 38010000, 0, 0)
ztest_remove+0x9f(ea294588, ea293f50, 4, 3)
ztest_object_init+0x78(ea294588, ea293f50, 4e0, 1)
ztest_dmu_object_alloc_free+0x71(ea294588, 13)
ztest_dmu_objset_create_destroy+0x224(80cef08, 13, 0, 805d36c, 9017ad44, 0)
ztest_execute+0x89(a, 807c720, 13, 0)
ztest_thread+0xea(13, 0, 0, 0)
libc.so.1`_thrp_setup+0x88(f0983240)
libc.so.1`_lwp_start(f0983240, 0, 0, 0, 0, 0)
Looking into it a bit, we see that this is an embedded blockpointer, so BP_GET_NDVAS should have returned 0:
b32b240::blkptr
EMBEDDED [L0 ZAP_OTHER] et=0 LZ4 size=200L/4aP birth=80L
Instead, it looks like another thread is modifying this blockpointer:
b32b240::ugrep | ::whatis
f47a0e0c is in [ stack tid=0x19f ]
ebd6ec40 is in [ stack tid=0x226 ]
ea293bd0 is in [ stack tid=0x244 ]
ea293be4 is in [ stack tid=0x244 ]
Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Matthew Ahrens <mahrens@delphix.com> |
307268 |
14-Oct-2016 |
mav |
MFC r305324: MFV r303077: 7072 zfs fails to expand if lun added when os is in shutdown state
illumos/illumos-gate@c39a2aae1e2c439d156021edfc20910dad7f9891 https://github.com/illumos/illumos-gate/commit/c39a2aae1e2c439d156021edfc20910dad7f9891
https://www.illumos.org/issues/7072 upstream: 38733 zfs fails to expand if lun added when os is in shutdown state DLPX-36910 spares and caches should not display expandable space DLPX-39262 vdev_disk_open spam zfs_dbgmsg buffer
Reviewed by: Igor Kozhukhov <ikozhukhov@gmail.com> Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Alex Reece <alex@delphix.com> Approved by: Dan McDonald <danmcd@omniti.com> Author: George Wilson <george.wilson@delphix.com> |
307266 |
14-Oct-2016 |
mav |
MFC r305323: MFV r302991: 6950 ARC should cache compressed data
illumos/illumos-gate@dcbf3bd6a1f1360fc1afcee9e22c6dcff7844bf2 https://github.com/illumos/illumos-gate/commit/dcbf3bd6a1f1360fc1afcee9e22c6dcff7844bf2
https://www.illumos.org/issues/6950 When reading compressed data from disk, the ARC should keep the compressed block cached and only decompress it when consumers access the block. The uncompressed data should be short-lived allowing the ARC to cache a much larger amount of data. The DMU would also maintain a smaller cache of uncompressed blocks to minimize the impact of decompressing frequently accessed blocks.
Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Reviewed by: Matt Ahrens <mahrens@delphix.com> Reviewed by: Paul Dagnelie <pcd@delphix.com> Reviewed by: Don Brady <don.brady@intel.com> Reviewed by: Richard Elling <Richard.Elling@RichardElling.com> Approved by: Richard Lowe <richlowe@richlowe.net> Author: George Wilson <george.wilson@delphix.com> |
307143 |
12-Oct-2016 |
avg |
MFC r306665: zfs: fix a wrong assertion for extended attributes
PR: 213112 |
307127 |
12-Oct-2016 |
mav |
MFC r305224: MFV r304158: 7136 ESC_VDEV_REMOVE_AUX ought to always include vdev information
7115 6922 generates ESC_ZFS_VDEV_REMOVE_AUX a bit too often
illumos/illumos-gate@b72b6bb10ad55121a1b352c6f68ebdc8e20c9086 https://github.com/illumos/illumos-gate/commit/b72b6bb10ad55121a1b352c6f68ebdc8e20c9086
https://www.illumos.org/issues/7136 6922 added ESC_ZFS_VDEV_REMOVE_AUX and ESC_ZFS_VDEV_REMOVE_DEV sysevents whenever an aux device gets removed from a pool. However, those sysevents will be created without the vdev_guid and vdev_path fields. It would be better to always populate those fields.
https://www.illumos.org/issues/7115 The addition of spa_event_notify in vdev removal code (see #6922) causes events to be generated even if the spare failed to be removed with EBUSY.
Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Josef 'Jeff' Sipek <jeffpc@josefsipek.net> Approved by: Robert Mustacchi <rm@joyent.com> Author: Alan Somers <asomers@gmail.com> |
307126 |
12-Oct-2016 |
mav |
MFC r305222: MFV r302993: 7104 increase indirect block size
illumos/illumos-gate@4b5c8e93cab28d3c65ba9d407fd8f46e3be1db1c https://github.com/illumos/illumos-gate/commit/4b5c8e93cab28d3c65ba9d407fd8f46e3be1db1c
https://www.illumos.org/issues/7104 The current default indirect block size is 16KB. We can improve performance by increasing it to 128KB. This is especially helpful for any workload that needs to read most of the metadata, e.g. scrub/resilver, file deletion, filesystem deletion, and zfs send. We also need to fix a few space estimation errors to make the tests pass.
Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Paul Dagnelie <pcd@delphix.com> Reviewed by: Dan McDonald <danmcd@omniti.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Matthew Ahrens <mahrens@delphix.com> |
307125 |
12-Oct-2016 |
mav |
MFC r305221: MFV r302992: 7071 lzc_snapshot does not fill in errlist on ENOENT
illumos/illumos-gate@25f7d993adbfb3452ac4625b3791670746d35ae3 https://github.com/illumos/illumos-gate/commit/25f7d993adbfb3452ac4625b3791670746d35ae3
https://www.illumos.org/issues/7071 upstream DLPX-40482 lzc_snapshot does not fill in errlist on ENOENT
Reviewed by: Igor Kozhukhov <ikozhukhov@gmail.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Matthew Ahrens <mahrens@delphix.com> |
307124 |
12-Oct-2016 |
mav |
MFC r305211: MFV r302662: 6447 handful of nvpair cleanups
illumos/illumos-gate@759e89be359f2af635e4122d147df56bce948773 https://github.com/illumos/illumos-gate/commit/759e89be359f2af635e4122d147df56bce948773
https://www.illumos.org/issues/6447 I got a patch from someone who uses nvpair code outside of illumos. It fixes a couple of gcc warnings/bugs for him. 1. silence uninitialized use warnings 2. add parentheses around assignment used as truth value 3. fix printf format specifier (ll is for integers only) 4. strstr, strspn, strcspn, and strcmp are declared in string.h, not strings.h. 5. avoid scanning integer into boolean variable
Reviewed by: Josef 'Jeff' Sipek <jeffpc@josefsipek.net> Reviewed by: Andy Stormont <astormont@racktopsystems.com> Reviewed by: Garrett D'Amore <garrett@damore.org> Approved by: Robert Mustacchi <rm@joyent.com> Author: Steve Dougherty <sdougherty@barracuda.com> |
307123 |
12-Oct-2016 |
mav |
MFC r305210: MFV r302661: 7082 bptree_iterate() passes wrong args to zfs_dbgmsg()
illumos/illumos-gate@10e67aa0db0823d5464aafdd681f3c966155c68e https://github.com/illumos/illumos-gate/commit/10e67aa0db0823d5464aafdd681f3c966155c68e
https://www.illumos.org/issues/7082 upstream DLPX-40542 bptree_iterate() passes wrong args to zfs_dbgmsg()
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Igor Kozhukhov <ikozhukhov@gmail.com> Approved by: Dan McDonald <danmcd@omniti.com> Author: Matthew Ahrens <mahrens@delphix.com> |
307122 |
12-Oct-2016 |
mav |
MFC r305209: MFV r302660: 6314 buffer overflow in dsl_dataset_name
illumos/illumos-gate@9adfa60d484ce2435f5af77cc99dcd4e692b6660 https://github.com/illumos/illumos-gate/commit/9adfa60d484ce2435f5af77cc99dcd4e692b6660
https://www.illumos.org/issues/6314 Callers of dsl_dataset_name pass a buffer of size ZFS_MAXNAMELEN, but dsl_dataset_name copies the datasets' name PLUS the snapshot name to it, resulting in a max of 2 * ZFS_MAXNAMELEN + '@'.
Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Igor Kozhukhov <ikozhukhov@gmail.com> Approved by: Dan McDonald <danmcd@omniti.com> Author: Matthew Ahrens <mahrens@delphix.com> |
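The overflow described in r307122 comes from writing "dataset@snapshot" into a buffer sized for one name. One generic way to make that kind of formatting safe is to bound the write and report truncation. This is an illustration, not the upstream change; the function name and the constant's value are assumptions.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

#define TOY_MAXNAMELEN 256	/* assumed value, for illustration */

/*
 * Build "dataset@snapshot" into buf without writing past buflen.
 * Returns 0 on success or ENAMETOOLONG if the result would not fit.
 */
int
toy_dataset_name(char *buf, size_t buflen, const char *ds, const char *snap)
{
	int n;

	assert(buf != NULL && ds != NULL && snap != NULL);

	n = snprintf(buf, buflen, "%s@%s", ds, snap);
	if (n < 0 || (size_t)n >= buflen)
		return (ENAMETOOLONG);
	return (0);
}
```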
307058 |
11-Oct-2016 |
mav |
MFC r305207: MFV r302659: 6931 lib/libzfs: cleanup gcc warnings
illumos/illumos-gate@88f61dee20b358671b1b643e9d1dbf220a1d69be https://github.com/illumos/illumos-gate/commit/88f61dee20b358671b1b643e9d1dbf220a1d69be
https://www.illumos.org/issues/6931 need cleanup: CERRWARN += -_gcc=-Wno-switch CERRWARN += -_gcc=-Wno-parentheses CERRWARN += -_gcc=-Wno-unused-function
Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Igor Kozhukhov <ikozhukhov@gmail.com> |
307057 |
11-Oct-2016 |
mav |
MFC r305200: MFV r302651: 7054 dmu_tx_hold_t should use refcount_t to track space
illumos/illumos-gate@0c779ad424a92a84d1e07d47cab7f8009189202b https://github.com/illumos/illumos-gate/commit/0c779ad424a92a84d1e07d47cab7f8009189202b
https://www.illumos.org/issues/7054 upstream: ee0003de7d3e598499be7ac3fe6b61efcc47cb7f DLPX-40399 dmu_tx_hold_t should use refcount_t to track space
Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Paul Dagnelie <pcd@delphix.com> Reviewed by: Igor Kozhukhov <ikozhukhov@gmail.com> Approved by: Dan McDonald <danmcd@omniti.com> Author: Matthew Ahrens <mahrens@delphix.com> |
307056 |
11-Oct-2016 |
mav |
MFC r305199: MFV r302648: 7019 zfsdev_ioctl skips secpolicy when FKIOCTL is set
Note that the bulk of the upstream change is not applicable to FreeBSD and the affected files are not even in the vendor area.
illumos/illumos-gate@45b1747515a17db45e8971501ee84a26bdff37b2 https://github.com/illumos/illumos-gate/commit/45b1747515a17db45e8971501ee84a26bdff37b2
https://www.illumos.org/issues/7019 Currently zfsdev_ioctl, when confronted by a request with the FKIOCTL flag set, skips all processing of secpolicy functions. This means that ZFS is not doing any kind of verification of the credentials or access rights of the caller and assuming that (as it is an in-kernel client) all such checks have already been done. This turns out to be quite a dangerous assumption, especially with respect to sdev. In general I don't think it's particularly reasonable to offload this enforcement of access rights onto other kernel subsystems when ZFS has some particular local semantics in this area (delegated datasets etc) and does not provide any kind of API to allow other subsystems to avoid code duplication when doing it. ZFS should apply its normal access policy to requests from within the kernel, and callers should take care to give it the correct credentials and call it from the correct context in order to get the results they need. You can observe the currently unfortunate consequences of this bug in any non-global zone that has access to /dev/zvol or any subset of it via sdev profiles. In particular, a zone used to contain a KVM or similar which has a single zvol passed through to it using a <device match= block in its zone XML. Even though sdev makes something of an attempt to control for whether the caller should have access to nodes in /dev/zvol, it doesn't do this correctly, or really at all in the lookup call path. So, if we have a zone that's been given access to any part of /dev/zvol, it can simply look up the full path to any other zvol on the entire system, and the node will appear and be able to be used.
Reviewed by: Robert Mustacchi <rm@joyent.com> Reviewed by: Richard Lowe <richlowe@richlowe.net> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Dan McDonald <danmcd@omniti.com> Author: Alex Wilson <alex.wilson@joyent.com> |
307055 |
11-Oct-2016 |
mav |
MFC r305198: MFV r302647: 6922 Emit ESC_ZFS_VDEV_REMOVE_AUX after removing an aux device
illumos/illumos-gate@63364b0ee2604783e7a55f8425888867768eafa4 https://github.com/illumos/illumos-gate/commit/63364b0ee2604783e7a55f8425888867768eafa4
https://www.illumos.org/issues/6922 ZFS does not do a config_sync after removing an aux (spare, log, or cache) device. AFAICT this isn't being done because it is slow and was deemed unnecessary. However, it should be such a rare operation that speed doesn't matter, and not doing it results in two problems: 1) It is theoretically possible to remove an aux device from one pool and attach it to another, then lose power. When power is restored, both pools would think that they own the aux device. 2) Removal of the aux device doesn't send any useful sysevents to userland.
Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Dan McDonald <danmcd@omniti.com> Author: Alan Somers <asomers@gmail.com> |
307054 |
11-Oct-2016 |
mav |
MFC r305197: MFV r302646: 6980 6902 causes zfs send to break due to 32-bit/64-bit struct mismatch
illumos/illumos-gate@ea4a67f462de0a39a9adea8197bcdef849de5371 https://github.com/illumos/illumos-gate/commit/ea4a67f462de0a39a9adea8197bcdef849de5371
https://www.illumos.org/issues/6980 doing zfs send -i snap1 snap2 >testfile results in internal error: Invalid argument Abort (core dumped)
Reviewed by: Paul Dagnelie <pcd@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Matthew Ahrens <mahrens@delphix.com> |
307053 |
11-Oct-2016 |
mav |
MFC r305195: MFV r302643: 6902 speed up listing of snapshots if requesting name only and sorting by name
This was our change from the beginning, so just reduce the upstream diff. |
307052 |
11-Oct-2016 |
mav |
MFC r305193: MFV r302642: 6876 Stack corruption after importing a pool with a too-long name
illumos/illumos-gate@c971037baa5d64dfecf6d87ed602fc3116ebec41 https://github.com/illumos/illumos-gate/commit/c971037baa5d64dfecf6d87ed602fc3116ebec41
https://www.illumos.org/issues/6876 Calling dsl_dataset_name on a dataset with a 256 byte buffer is asking for trouble. We should check every dataset on import, using a 1024 byte buffer and checking each time to see if the dataset's new name is longer than 256 bytes.
Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com> Approved by: Richard Lowe <richlowe@richlowe.net> Author: Paul Dagnelie <pcd@delphix.com> |
306819 |
07-Oct-2016 |
avg |
MFC r306292: fix vnode lock assertion for extended attributes directory |
305800 |
14-Sep-2016 |
mav |
MFC r305123: Fix kernel panic when inheriting properties without default.
There are two writable hidden properties, "iscsioptions" and "stmf_sbd_lu", that have no default string value. An attempt to unset or replicate them caused a kernel panic. This simple band-aid seems to fix the problem nicely. |
305273 |
02-Sep-2016 |
ngie |
MFstable/11 r305271:
MFC r303576:
Conditionalize code which defines sysctls per _KERNEL #ifdef guard
This resolves several issues when compiling libzpool (userspace library), i.e. -Wimplicit-function-declaration and -Wmissing-declarations issues.
Tested with: clang 3.8.1, gcc 4.2.1, gcc 5.3.0 |
304671 |
23-Aug-2016 |
avg |
MFC r303763,303791,303869: zfs: honour and make use of vfs vnode locking protocol
PR: 209158 |
304139 |
15-Aug-2016 |
avg |
MFC r302838: 6513 partially filled holes lose birth time |
304136 |
15-Aug-2016 |
avg |
MFC r302837: 6844 dnode_next_offset can detect fictional holes |
304135 |
15-Aug-2016 |
avg |
MFC r302836: 6874 rollback and receive need to reset ZPL state to what's on disk |
304128 |
15-Aug-2016 |
avg |
MFC r302840: 6878 Add scrub completion info to "zpool history" |
304121 |
15-Aug-2016 |
avg |
MFC r302839: 6940 Cannot unlink directories when over quota |
302989 |
18-Jul-2016 |
avg |
MFC r302772: re-apply r299908: zfsctl_snapdir_lookup: clear VV_ROOT of snapshot's root |
302909 |
15-Jul-2016 |
markj |
MFC r302507: Avoid truncating the return value of DTrace predicates. |
302764 |
13-Jul-2016 |
avg |
MFC r300134: move zfsctl_freebsd_root_lookup right next to zfsctl_root_lookup |
302762 |
13-Jul-2016 |
avg |
MFC r300130: zfsctl_freebsd_root_lookup: gfs_vop_lookup may return a doomed vnode |
302760 |
13-Jul-2016 |
avg |
MFC r299951: do not destroy 'snapdir' when it becomes inactive |
302756 |
13-Jul-2016 |
avg |
MFC r299949: try to recycle "snap" vnodes as soon as possible |
302752 |
13-Jul-2016 |
avg |
MFC r299947: fix locking in zfsctl_root_lookup |
302750 |
13-Jul-2016 |
avg |
MFC r299946: gfs_lookup_dot() does not have to acquire any locks |
302748 |
13-Jul-2016 |
avg |
MFC r299945: avoid deadlock between zfsctl_snapdir_lookup and zfsctl_snapshot_reclaim |
302746 |
13-Jul-2016 |
avg |
MFC r299940: fix a vnode reference leak caused by illumos compat traverse() |
302743 |
13-Jul-2016 |
avg |
MFC r299914: zfsctl_ops_snapshot: remove methods should never be called |
302740 |
13-Jul-2016 |
avg |
MFC r301273: zfs_root: fix a potential root vnode reference leak |
302738 |
13-Jul-2016 |
avg |
MFC r299908,300131,301275: zfs: set VROOT / VV_ROOT consistently and in a single place |
302735 |
13-Jul-2016 |
avg |
MFC r299902,299938: mount_snapshot: consolidate all error handling |
302732 |
13-Jul-2016 |
avg |
MFC r299906,301870: add zfs_vptocnp with special handling for snapshots under .zfs
Note that the change is adjusted for the lack of LK_VNHELD in this branch. |
302730 |
13-Jul-2016 |
avg |
MFC r300145: add vop_print methods to vnode operations of various zfsctl node types |
302729 |
13-Jul-2016 |
avg |
MFC r301873: l2arc: reset b_tmp_cdata to NULL in the case of unset b_daddr |
302727 |
13-Jul-2016 |
avg |
MFC r302123: fix deadlock-prone code in getzfsvfs() |
302724 |
13-Jul-2016 |
avg |
MFC r299900: zfsctl: fix several problems with reference counts
PR: 207464 |
302721 |
13-Jul-2016 |
avg |
MFC r298105: zfs: enable vn_io_fault support |
302719 |
13-Jul-2016 |
avg |
MFC r300133: zfsctl_common_fid: remove redundant assignment |
302717 |
13-Jul-2016 |
avg |
MFC r300132: zfsctl: tighten assertion and remove unused definition |
302714 |
13-Jul-2016 |
smh |
MFC r302265, r302382
Allow ZFS ARC min / max to be tuned at runtime
Relnotes: YES Sponsored by: Multiplay |
301695 |
08-Jun-2016 |
ngie |
MFC r300870,r300884:
r300870:
Unbreak the zfs(4) build
vm/vm_pageout.h grew a dependency on the bool typedef in r300865
arc.c didn't include sys/types.h, which included the definition for the typedef
Other items (ofed, drm2) might need to be chased for this commit.
Pointyhat to: alc
r300884:
Fix up r300870
The sys/types.h fix I proposed was only tested with zfs(4), not with libzpool, which is where the build failure actually existed
Remove vm/vm_pageout.h from arc.c and zfs_vnops.c because they're both unneeded
In collaboration with: kib |
300482 |
23-May-2016 |
avg |
MFC r300024: zfs_ioc_rename: fix a reversed condition
PR: 209093 |
300039 |
17-May-2016 |
avg |
MFC r297848: l2arc: make sure that all writes honor ashift of a cache device
Note: no MFC to stable/9 because it has become quite out of date with head, so the merge would be quite laborious and, thus, risky. |
300034 |
17-May-2016 |
avg |
MFC r298106: zfs_rezget: z_vnode can not be NULL if zp is valid |
300028 |
17-May-2016 |
avg |
MFC r298472: MFV r298471: 6052 decouple lzc_create() from the implementation details |
299958 |
16-May-2016 |
asomers |
MFC r298072
Don't corrupt ZFS label's physpath attribute when booting while a disk is missing
Prior to this change, vdev_geom_open_by_path would call vdev_geom_attach prior to verifying the device's GUIDs. vdev_geom_attach calls vdev_geom_attrchange to set the physpath in the vdev object. The result is that if the disk could not be found, then the labels for other disks in the same TLD would overwrite the missing disk's physpath with the physpath of whichever disk currently has the same devname as the missing one used to have. |
299536 |
12-May-2016 |
asomers |
MFC r297986, r298017 to vdev_geom.c
r297986 | asomers | 2016-04-14 13:20:31 -0600 (Thu, 14 Apr 2016) | 6 lines
Update a debugging message in vdev_geom_open_by_guids for consistency with similar messages elsewhere in the file.
r298017 | asomers | 2016-04-14 17:14:41 -0600 (Thu, 14 Apr 2016) | 8 lines
Add more debugging statements in vdev_geom.c
Log a debugging message whenever geom functions fail in vdev_geom_attach. Printing these messages is controlled by vfs.zfs.debug |
299433 |
11-May-2016 |
mav |
MFC r297832: MFV r297831: 6322 ZFS indirect block predictive prefetch
Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Paul Dagnelie <pcd@delphix.com> Author: Alexander Motin <mav@FreeBSD.org>
Improve speculative prefetch of indirect blocks.
Scalability of many operations on a wide ZFS pool can be limited by the requirement to prefetch indirect blocks first. The recently added asynchronous indirect block read partially helped, but did not solve the problem completely. This patch extends the existing prefetcher functionality to explicitly work with indirect blocks.
Before this change, the prefetcher issued reads for up to 8MB of data in advance. With this change it also issues indirect block reads for up to 64MB of data in advance, so that when the time comes to actually read that data, it can be done immediately. A similar effect can be achieved by just increasing the maximal data prefetch distance, but at higher memory cost.
This change also introduces indirect block prefetch for rewrite operations, which was never done before. Previously, ARC misses for indirect blocks regularly blocked rewrites, converting perfectly aligned asynchronous operations into synchronous read-write pairs and significantly reducing maximal rewrite speed.
While here, this issue was also fixed: prefetch was always done, even if caching for the dataset was completely disabled.
Testing on FreeBSD with a zvol on top of a 6x striped 2x mirrored pool of 12 assorted HDDs showed these performance numbers:
------- BEFORE --------
Write 491363677 bytes/sec
Read 312430631 bytes/sec
Rewrite 97680464 bytes/sec
-------- AFTER --------
Write 493524146 bytes/sec
Read 438598079 bytes/sec
Rewrite 277506044 bytes/sec
Closes #65 Closes #80
openzfs/openzfs@792fd28ac04f78cc5e43ead2d72a96f244ea84e8 |
299432 |
11-May-2016 |
mav |
MFC r297509: MFV r297506: 6738 zfs send stream padding needs documentation
Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Reviewed by: Paul Dagnelie <pcd@delphix.com> Reviewed by: Dan McDonald <danmcd@omniti.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Eli Rosenthal <eli.rosenthal@delphix.com>
illumos/illumos-gate@c20404ff77119516354b0d112d28b7ea0dadd303 |
299431 |
11-May-2016 |
mav |
MFC r297507: MFV r297504: 6681 zfs list burning lots of time in dodefault() via dsl_prop_*
Reviewed by: Patrick Mooney <patrick.mooney@joyent.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Dan McDonald <danmcd@omniti.com> Approved by: Matthew Ahrens <mahrens@delphix.com> Author: Alex Wilson <alex.wilson@joyent.com>
illumos/illumos-gate@d09e4475f635b6f66ee68d8c17a32bba7be17c96 |
299430 |
11-May-2016 |
mav |
MFC r297763: MFV r297760: 6418 zpool should have a label clearing command
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Author: Will Andrews <will@firepipe.net>
Closes #83 Closes #32
openzfs/openzfs@9663688425131744221ea99f9e66b9ed964492ae
FreeBSD already had `zpool labelclear` functionality, so this is mostly just a diff reduction. |
299376 |
10-May-2016 |
asomers |
MFC 297868
Fix rare double free in vdev_geom_attrchanged
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c Don't drop the g_topology_lock before freeing old_physpath. That opens up a race where one thread can call vdev_geom_attrchanged, set old_physpath, drop the g_topology_lock, then block trying to acquire the SCL_STATE lock. Then another thread can come into vdev_geom_attrchanged, set old_physpath to the same value, and proceed to free it. When the first thread resumes, it will free the same location.
It turns out that the SCL_STATE lock isn't needed. It was originally added by gibbs to protect vd->vdev_physpath while updating the same. However, the update process subsequently was switched to an atomic operation (a pointer swap). Now, there is no need for the SCL_STATE lock, and hence no need to drop the g_topology_lock. |
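The pointer-swap idea mentioned in the entry above can be illustrated in userland C11. This is only a sketch under stated assumptions, not the kernel code: `struct demo_vdev` and `demo_set_physpath()` are hypothetical names, and C11 `atomic_exchange()` stands in for whatever atomic primitive the kernel uses. The point it shows is that an exchange hands each old pointer to exactly one caller, so two racing updaters cannot free the same buffer.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the vdev field that stores the physical path. */
struct demo_vdev {
	_Atomic(char *) physpath;
};

/*
 * Publish a copy of "path" and free the previously published string.
 * atomic_exchange() returns the old pointer to exactly one caller, so
 * concurrent updaters never free the same allocation twice.
 * Returns 0 on success, -1 if the copy could not be allocated.
 */
static int
demo_set_physpath(struct demo_vdev *vd, const char *path)
{
	size_t len;
	char *copy, *old;

	assert(vd != NULL && path != NULL);
	len = strlen(path) + 1;
	copy = malloc(len);
	if (copy == NULL)
		return (-1);
	memcpy(copy, path, len);

	old = atomic_exchange(&vd->physpath, copy);
	free(old);	/* free(NULL) is a no-op on the first update */
	return (0);
}
```

The sketch only addresses the race between updaters. Readers that dereference the old pointer still need their own protection, which in the entry above is the g_topology_lock that is no longer dropped.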
299277 |
09-May-2016 |
markj |
MFC r298589: Allow DOF sections with excessively long probe function components.
PR: 207735 |
299061 |
04-May-2016 |
avg |
MFC r297812: zio: align use of "no dump" flag between use_uma and !use_uma cases |
299003 |
03-May-2016 |
markj |
MFC r296479: Fix fasttrap tracepoint locking. |
298533 |
24-Apr-2016 |
avg |
MFC r297513: remove emulation of VFS_HOLD and VFS_RELE from opensolaris compat |
298469 |
22-Apr-2016 |
avg |
MFC r297709: zio write issue threads should have lower (numerically greater) priority |
297957 |
14-Apr-2016 |
mav |
MFC r297672: As in r293708, relax the pool check in vdev_geom_open_by_path().
The check made it impossible to open a spare disk by its known path; this only kind of worked because the same fix was applied to vdev_geom_attach_by_guids() in r293708. |
297549 |
04-Apr-2016 |
mav |
MFC r297421: Plug open count leak on zvol rename. |
297548 |
04-Apr-2016 |
mav |
MFC r297420: Switch from using make_dev_p() to make_dev_s() to close races. |
297547 |
04-Apr-2016 |
mav |
MFC r297337: Pass through error code from make_dev_p().
ENAMETOOLONG is much more informative in logs than ENXIO. |
297546 |
04-Apr-2016 |
mav |
MFC r297232: Unify ignoring EEXIST from zvol_create_minor().
This fixes creation of zvol devices for snapshots during zfs receive, which previously failed with a "ZFS WARNING: Unable to create ZVOL" message. This solution is not perfect, but IMHO better than it was before. |
297544 |
04-Apr-2016 |
mav |
MFC r277504 (by will): Remove commented log messages. |
297543 |
04-Apr-2016 |
mav |
MFC r277450 (by will): Use the "zfs_gfs" tag for GFS vnodes to make them easier to identify. |
297542 |
04-Apr-2016 |
mav |
MFC r270382 (by delphij): MFV r270197:
Illumos issue: 5066 remove support for non-ANSI compilation 5068 Remove SCCSID() macro from <macros.h> |
297144 |
21-Mar-2016 |
mav |
MFC r277629 (by will): When creating or updating a node, use vfs_timestamp() for "now" instead of gethrestime(), to allow the administrator to decide the appropriate timestamp precision instead of always using nanosecond precision. |
297123 |
21-Mar-2016 |
mav |
MFC r296615: Make ZFS ignore stripe sizes above SPA_MAXASHIFT (8KB).
If a device has a stripe size bigger than the maximal sector size supported by ZFS, nothing can be done to avoid read-modify-write cycles. Taking that stripe size into account will only reduce space efficiency and pointlessly bother the user with warnings that cannot be fixed. |
297122 |
21-Mar-2016 |
mav |
MFC r296613: Make ZFS more picky to GEOM stripe sizes and offsets.
Use of misaligned or non-power-of-2 stripes is not really useful for ZFS, since an increased ashift won't help to avoid read-modify-write cycles, and will only reduce pool space efficiency and compression rates. |
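The two stripe-size entries above (ignore stripes above the 8KB limit; ignore misaligned or non-power-of-2 stripes) can be sketched as one small decision function. This is a simplified illustration, not the vdev_geom.c code: the `demo_` names are hypothetical, the sketch collapses everything into a single ashift value, and it assumes the sector size itself is a power of two.

```c
#include <assert.h>
#include <stdint.h>

#define	DEMO_MAXASHIFT	13	/* 8KB, the limit named in the log entry */

/* Nonzero if v is a power of two. */
static int
demo_is_pow2(uint64_t v)
{
	return (v != 0 && (v & (v - 1)) == 0);
}

/* Base-2 logarithm of a power of two. */
static int
demo_log2(uint64_t v)
{
	int shift = 0;

	while (v > 1) {
		v >>= 1;
		shift++;
	}
	return (shift);
}

/*
 * Pick an ashift for a provider. The stripe size is honored only if it
 * is a power of two, starts at offset 0, is larger than the sector size
 * and does not exceed 1 << DEMO_MAXASHIFT. Otherwise the sector size
 * alone decides.
 */
static int
demo_pick_ashift(uint64_t sectorsize, uint64_t stripesize,
    uint64_t stripeoffset)
{
	assert(demo_is_pow2(sectorsize));

	if (demo_is_pow2(stripesize) && stripeoffset == 0 &&
	    stripesize > sectorsize &&
	    stripesize <= ((uint64_t)1 << DEMO_MAXASHIFT))
		return (demo_log2(stripesize));
	return (demo_log2(sectorsize));
}
```

With these rules, a 512-byte-sector device reporting an aligned 4096-byte stripe gets ashift 12, while the same device reporting a 64KB stripe, a 3072-byte stripe, or a stripe at a nonzero offset falls back to ashift 9.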
297116 |
21-Mar-2016 |
mav |
MFC r296530: MFV r296529: 6672 arc_reclaim_thread() should use gethrtime() instead of ddi_get_lbolt() 6673 want a macro to convert seconds to nanoseconds and vice-versa
Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Josef 'Jeff' Sipek <jeffpc@josefsipek.net> Reviewed by: Robert Mustacchi <rm@joyent.com> Approved by: Dan McDonald <danmcd@omniti.com> Author: Eli Rosenthal <eli.rosenthal@delphix.com>
illumos/illumos-gate@a8f6344fa0921599e1f4511e41b5f9a25c38c0f9 |
297115 |
21-Mar-2016 |
mav |
MFC r296528: MFV r296527: 6659 nvlist_free(NULL) is a no-op
Reviewed by: Toomas Soome <tsoome@me.com> Reviewed by: Marcel Telka <marcel@telka.sk> Approved by: Robert Mustacchi <rm@joyent.com> Author: Josef 'Jeff' Sipek <jeffpc@josefsipek.net>
illumos/illumos-gate@aab83bb83be7342f6cfccaed8d5fe0b2f404855d |
297114 |
21-Mar-2016 |
mav |
MFC r296523: MFV r296522: 6541 Pool feature-flag check defeated if "verify" is included in the dedup property value
Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Richard Laager <rlaager@wiktel.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: ilovezfs <ilovezfs@icloud.com>
illumos/illumos-gate@971640e6aa954c91b0706543741aa4570299f4d7 |
297113 |
21-Mar-2016 |
mav |
MFC r296521: MFV r296520: 6562 Refquota on receive doesn't account for overage
Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com> Reviewed by: Toomas Soome <tsoome@me.com> Approved by: Gordon Ross <gwr@nexenta.com> Author: Dan McDonald <danmcd@omniti.com>
illumos/illumos-gate@5f7a8e6d750cb070a3347f045201c6206caee6aa |
297112 |
21-Mar-2016 |
mav |
MFC r296519: MFV r296518: 5027 zfs large block support (add copyright)
Author: Matthew Ahrens <matt@mahrens.org>
illumos/illumos-gate@c3d26abc9ee97b4f60233556aadeb57e0bd30bb9 |
297111 |
21-Mar-2016 |
mav |
MFC r296516: MFV r296515: 6536 zfs send: want a way to disable setting of DRR_FLAG_FREERECORDS
Reviewed by: Anil Vijarnia <avijarnia@racktopsystems.com> Reviewed by: Kim Shrier <kshrier@racktopsystems.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Dan McDonald <danmcd@omniti.com> Author: Andrew Stormont <astormont@racktopsystems.com>
illumos/illumos-gate@880094b6062aebeec8eda6a8651757611c83b13e |
297110 |
21-Mar-2016 |
mav |
MFC r296514: MFV r296513: 6450 scrub/resilver unnecessarily traverses snapshots created after the scrub started
Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Richard Elling <Richard.Elling@RichardElling.com> Approved by: Richard Lowe <richlowe@richlowe.net> Author: Matthew Ahrens <mahrens@delphix.com>
illumos/illumos-gate@38d61036746e2273cc18f6698392e1e29f87d1bf |
297109 |
21-Mar-2016 |
mav |
MFC r296512: MFV r296511: 6537 Panic on zpool scrub with DEBUG kernel
Reviewed by: Steve Gonczi <gonczi@comcast.net> Reviewed by: Dan McDonald <danmcd@omniti.com> Reviewed by: Igor Kozhukhov <ikozhukhov@gmail.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Matthew Ahrens <mahrens@delphix.com> Author: Gary Mills <gary_mills@fastmail.fm>
illumos/illumos-gate@8c04a1fa3f7d569d48fe9b5342d0bd4c533179b9 |
297108 |
21-Mar-2016 |
mav |
MFC r296510, r296563, r296567: MFV r296505: 6531 Provide mechanism to artificially limit disk performance
Reviewed by: Paul Dagnelie <pcd@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Approved by: Dan McDonald <danmcd@omniti.com> Author: Prakash Surya <prakash.surya@delphix.com>
illumos/illumos-gate@97e81309571898df9fdd94aab1216dfcf23e060b |
297107 |
21-Mar-2016 |
mav |
MFC r296021 (by smh): Removed unused label and fix mutex_exit order
Remove unused done label from zfs_setacl fixing PVS-Studio V729.
Fix mutex_exit order to mirror the mutex_enter order. |
297106 |
21-Mar-2016 |
mav |
MFC r295125: MFV r294821: 6529 Properly handle updates of variably-sized SA entries.
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Ned Bass <bass6@llnl.gov> Reviewed by: Tim Chase <tim@chase2k.com> Approved by: Gordon Ross <gwr@nexenta.com> Author: Andriy Gapon <avg@icyb.net.ua>
illumos/illumos-gate@e7e978b1f75353cb29673af9b35453c20c2827bf
During the update process in sa_modify_attrs(), the sizes of existing variably-sized SA entries are obtained from sa_lengths[]. The case where a variably-sized SA was being replaced neglected to increment the index into sa_lengths[], so subsequent variable-length SAs would be rewritten with the wrong length. This patch adds the missing increment operation so all variably-sized SA entries are stored with their correct lengths.
Another problem was that index into attr_desc[] was increased even when an attribute was removed. If that attribute was not the last attribute, then the last attribute was lost. |
297104 |
21-Mar-2016 |
mav |
MFC r294820: MFV r294819: 6495 Fix mutex leak in dmu_objset_find_dp
Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Albert Lee <trisk@omniti.com> Author: Steven Hartland <steven.hartland@multiplay.co.uk>
illumos/illumos-gate@2bad22584defe4667f99737e3158d336e4dcca11 |
297103 |
21-Mar-2016 |
mav |
MFC r294817: MFV r294816: 4986 receiving replication stream fails if any snapshot exceeds refquota
Reviewed by: John Kennedy <john.kennedy@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Gordon Ross <gordon.ross@nexenta.com> Author: Dan McDonald <danmcd@omniti.com>
illumos/illumos-gate@5878fad70d76d8711f6608c1f80b0447601261c6 |
297102 |
21-Mar-2016 |
mav |
MFC r294815: MFV r294814: 6393 zfs receive a full send as a clone
Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Richard Elling <Richard.Elling@RichardElling.com> Approved by: Dan McDonald <danmcd@omniti.com> Author: Paul Dagnelie <pcd@delphix.com>
illumos/illumos-gate@68ecb2ec930c4b0f00acaf8e0abb2b19c4b8b76f
This allows doing a full (non-incremental) send and receiving it as a clone of an existing dataset. It can leverage nopwrite to share blocks with the origin. This can be used to change the relationship of datasets on the target. For example, maybe on the source you have:
A ---- B ---- C
And you have sent to the target a full of B, and the incremental B->C:
B ---- C
You later realize that you want to have A on the target. You will have to do a full send of A, but nopwrite can save you space on the target if you receive it as a clone of B, assuming that A and B have some blocks in common:
B ---- C
 \
  A |
297101 |
21-Mar-2016 |
mav |
MFC r294813: MFV r294812: 6434 sa_find_sizes() may compute wrong SA header size
Reviewed-by: Ned Bass <bass6@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed by: Andriy Gapon <avg@freebsd.org> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: James Pan <jiaming.pan@yahoo.com>
illumos/illumos-gate@3502ed6e7cb3f3d2e781960ab8fe465fdc884834 |
297100 |
21-Mar-2016 |
mav |
MFC r294811: MFV r294810: 6414 vdev_config_sync could be simpler
Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Will Andrews <will@firepipe.net>
illumos/illumos-gate@eb5bb58421f46cee79155a55688e6c675e7dd361 |
297099 |
21-Mar-2016 |
mav |
MFC r294809: MFV r294808: 6421 Add missing multilist_destroy calls to arc_fini
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Jorgen Lundman <lundman@lundman.net> Approved by: Robert Mustacchi <rm@joyent.com> Author: Prakash Surya <prakash.surya@delphix.com>
illumos/illumos-gate@57deb2328260c447bf1db25fe74e0eece102733e |
297098 |
21-Mar-2016 |
mav |
MFC r294807: MFV r294806: 6388 Failure of userland copy should return EFAULT
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Richard Yao <ryao@gentoo.org>
illumos/illumos-gate@c71c00bbe8a9cdc7e3f4048b751f48e80441d506 |
297097 |
20-Mar-2016 |
mav |
MFC r294805: MFV r294804: 6386 Fix function call with uninitialized value in vdev_inuse
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Richard Yao <ryao@gentoo.org>
illumos/illumos-gate@5bdd995ddb777f538bfbcc5e2d5ff1bed07ae56e |
297096 |
21-Mar-2016 |
mav |
MFC r294803: MFV r294802: 6334 Cannot unlink files when over quota
Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Toomas Soome <tsoome@me.com> Approved by: Dan McDonald <danmcd@omniti.com> Author: Simon Klinkert <simon.klinkert@gmail.com>
illumos/illumos-gate@6575bca01367958c7237253d88e5fa9ef0b1650a |
297095 |
21-Mar-2016 |
mav |
MFC r294801: MFV r294800: 6385 Fix unlocking order in zfs_zget
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Reviewed by: Andriy Gapon <avg@freebsd.org> Approved by: Robert Mustacchi <rm@joyent.com> Author: Richard Yao <ryao@gentoo.org>
illumos/illumos-gate@eaef6a96de3f6afbbccc69bd7a0aed4463689d0a |
297094 |
21-Mar-2016 |
mav |
MFC r294799: MFV r294798: 6292 exporting a pool while an async destroy is running can leave entries in the deferred tree
Reviewed by: Paul Dagnelie <pcd@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Andriy Gapon <avg@FreeBSD.org> Reviewed by: Fabian Keil <fk@fabiankeil.de> Approved by: Gordon Ross <gordon.ross@nexenta.com>
illumos/illumos-gate@a443cc80c742af740aa82130db840f02b4389365 |
297093 |
20-Mar-2016 |
mav |
MFC r294797: MFV r294796: 6319 assertion failed in zio_ddt_write: bp->blk_birth == txg
Reviewed by: George Wilson <george.wilson@delphix.com> Approved by: Dan McDonald <danmcd@omniti.com>
illumos/illumos-gate@b39b744be78c6327db43c1f69d11c2f5909f73cb
This is a revert of 5693. |
297092 |
20-Mar-2016 |
mav |
MFC r294794: MFV r294793: 6367 spa_config_tryenter incorrectly handles the multiple-lock case
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com> Reviewed by: Josef 'Jeff' Sipek <josef.sipek@nexenta.com> Reviewed by: Prashanth Sreenivasa <prashksp@gmail.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Dan McDonald <danmcd@omniti.com> Reviewed by: Steven Hartland <steven.hartland@multiplay.co.uk> Approved by: Matthew Ahrens <mahrens@delphix.com>
illumos/illumos-gate@e495b6e6735b803e422025a630352ef9bba788c5 |
297091 |
21-Mar-2016 |
mav |
MFC r294625 (by trasz): Fix ru_oublocks accounting for ZFS. There are two code paths that can be called from zfs_write() - one of them, through dmu_write(), was handled correctly; the other wasn't.
Differential Revision: https://reviews.freebsd.org/D4923 |
297090 |
20-Mar-2016 |
mav |
MFC r293677 (by asomers): Record physical path information in ZFS Vdevs
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c: If available, record the physical path of a vdev in ZFS meta-data. Do this both when opening the vdev, and when receiving an attribute change notification from GEOM.
Make vdev_geom_close() synchronous instead of deferring its work to a GEOM event handler. There is no benefit to deferring the work and this prevents a future open call from referencing a consumer that is scheduled for destruction. The close followed by an immediate open will occur during a vdev reprobe triggered by any type of I/O error.
Consolidate vdev_geom_close() and vdev_geom_detach() into vdev_geom_close() and vdev_geom_close_locked(). This also moves the cross linking operations between vdev and GEOM consumer into a single place (linking in vdev_geom_attach() and unlinking in vdev_geom_close_locked()).
Differential Revision: https://reviews.freebsd.org/D4524 |
297087 |
20-Mar-2016 |
mav |
MFC r290266 (by avg): zfs: allow the lookup of extended attributes of an unlinked file
That's required for extattr_get_fd(2) and the like to work properly.
PR: 203201 |
297085 |
20-Mar-2016 |
mav |
MFC r274627 (by avg): Revert r269093 which introduced physical zio alignment transform
The size of physical ZIOs must never be implicitly adjusted; it is the caller's responsibility to make sure that such a ZIO has the proper offset and size.
Discussed with: delphij, gibbs |
297084 |
20-Mar-2016 |
mav |
MFV r258597 (by pjd): When the append-only, immutable or read-only flag is set, don't allow hard link creation. This matches UFS behaviour.
Reported by: Oleg Ginzburg <olevole@olevole.ru> |
297083 |
20-Mar-2016 |
mav |
MFC r262990: MFV r262983:
4638 Panic in ZFS via rfs3_setattr()/rfs3_write(): dirtying snapshot!
illumos/illumos-gate@2144b121c08e0eb676cc6ca4662ebbc9f9c22fe3 |
297082 |
20-Mar-2016 |
mav |
MFC r272359 (by will): zfsvfs_create(): Refuse to mount datasets whose names are too long.
This is checked for in the zfs_snapshot_004_neg STF/ATF test (currently still in projects/zfsd rather than head).
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c: - zfsvfs_create(): Check whether the objset name fits into statfs.f_mntfromname, and return ENAMETOOLONG if not. Although the filesystem can be unmounted via the umount(8) command, any interface that relies on iterating on statfs (e.g. libzfs) will fail to find the filesystem by its objset name, and thus assume it's not mounted. This causes "zfs unmount", "zfs destroy", etc. to fail on these filesystems, whether or not -f is passed.
MFSpectraBSD: 974872 on 2013/08/09 |
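The length check described in the entry above can be sketched as follows. This is a minimal userland illustration, not the zfsvfs_create() code: `demo_check_objset_name()` is a hypothetical name, and `DEMO_MNAMELEN` is an assumed buffer size standing in for the real size of statfs.f_mntfromname, which comes from the system headers.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Assumed size of the mount-from-name buffer; the real value is system-defined. */
#define	DEMO_MNAMELEN	88

/*
 * Refuse names that would not fit, including the terminating NUL,
 * into a buffer of DEMO_MNAMELEN bytes.
 * Returns 0 if the name fits, ENAMETOOLONG otherwise.
 */
static int
demo_check_objset_name(const char *osname)
{
	assert(osname != NULL);

	if (strlen(osname) >= DEMO_MNAMELEN)
		return (ENAMETOOLONG);
	return (0);
}
```

Rejecting the mount up front avoids the situation the entry describes, where a mounted filesystem cannot later be found by its truncated name.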
297081 |
20-Mar-2016 |
mav |
MFC r277503 (by will): Ignore sync requests from the system syncher, i.e. VFS_SYNC(waitfor=MNT_LAZY).
ZFS already commits outstanding data every zfs_txg_timeout seconds, so these syncs are unnecessarily intrusive.
MFSpectraBSD: 1105759 on 2014/12/11 |
297080 |
20-Mar-2016 |
mav |
MFC r277501 (by will): Eliminate an #ifdef illumos for zfs_ioc_rename().
Since allow_mounted is a FreeBSD-specific change, default to B_TRUE, then locally check for the magic bit. Unconditionally check allow_mounted below. Convert the setting of allow_mounted to an explicit boolean.
MFSpectraBSD: 672578 (in part) on 2013/07/19 |
297079 |
20-Mar-2016 |
mav |
MFC r286223 (by smh): Fix KSTACK_PAGES check in ZFS module
The check introduced by r285946 failed to add the dependency on opt_kstack_pages.h which meant the default value for the platform instead of the customised options KSTACK_PAGES=X was being tested.
Also wrap in #ifdef __FreeBSD__ for portability. |
297078 |
20-Mar-2016 |
mav |
MFC r274304 (by delphij): MFV r274272 and diff reduction with upstream.
Illumos issue: 5244 zio pipeline callers should explicitly invoke next stage |
297077 |
20-Mar-2016 |
mav |
MFC r277300 (by smh): Mechanically convert cddl sun #ifdef's to illumos
Since the upstream for cddl code is now illumos not sun, mechanically convert all sun #ifdef's to illumos #ifdef's which have been used in all newer code for some time.
Also do a manual pass to correct the use of #ifdef comments as per style(9) as well as a few uses of #if defined(__FreeBSD__) vs #ifndef illumos. |
297076 |
20-Mar-2016 |
mav |
MFC r271785: Reorder sysctls for spa.c global tunables; add sysctl for ccw_retry_interval. |
297075 |
20-Mar-2016 |
mav |
MFC r269222: Reschedule the 'deadman' callout after handling, this makes our code behave more like it is on Solaris.
Differential Revision: https://phabric.freebsd.org/D457 |
297074 |
20-Mar-2016 |
mav |
MFC r271788 (by will): Enable ZFS debug flags to be modified via vfs.zfs.debug_flags.
This is primarily only of interest to ZFS developers, but it makes it easier to get additional debugging.
MFSpectraBSD: 517074 on 2011/12/15 (by will), 662343 on 2013/03/20 (by gibbs) |
297073 |
20-Mar-2016 |
mav |
MFC r277492 (by will): Add vfs.zfs.reference_tracking_enable sysctl/tunable.
This is primarily for developer/debugging use; it enables built-in tagged tracking of refcounts inside ZFS. It can only be enabled from the loader, since it modifies how in-core state is managed. Default remains disabled. |
297071 |
20-Mar-2016 |
mav |
MFC r271781 (by will): bpobj_iterate_impl(): Close a refcount leak iterating on a sublist.
If bpobj_space() returned non-zero here, the sublist would have been left open, along with the bonus buffer hold it requires. This call does not invoke any calls to bpobj_close() itself.
This bug doesn't have any known vector, but was found on inspection.
MFC after: 1 week Sponsored by: Spectra Logic Affects: All ZFS versions starting 21 May 2010 (illumos cde58dbc) MFSpectraBSD: r1050998 on 2014/03/26 |
297067 |
20-Mar-2016 |
mav |
MFC r264670: MFV r264667:
4752 fan out read zio taskqs
illumos/illumos-gate@1b497ab83e8f1c58bba5da59c649207a442a4720 |
296629 |
10-Mar-2016 |
smh |
MFC r296610:
ZFS send fails to transmit some holes
PR: 207714 Approved by: re (gjb) Sponsored by: Multiplay |
294843 |
26-Jan-2016 |
asomers |
MFC r292066, r292069, r293708, r294027, and r294358, mostly to vdev_geom.c
r292066 | asomers | 2015-12-10 14:46:21 -0700 (Thu, 10 Dec 2015) | 25 lines
During vdev_geom_open, require that the vdev guids match the device's label except during split, add, or create operations. This fixes a bug where the wrong disk could be returned, and higher layers of ZFS would immediately eject it again.
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c: o When opening by GUID, require both the pool and vdev GUIDs to match. While it is highly unlikely for two vdevs to have the same vdev GUIDs, the ZFS storage pool allocator only guarantees they are unique within a pool.
o Modify the open behavior to:
- If we are opening a vdev that hasn't previously been opened, open by path without checking GUIDs.
- Otherwise, open by path and verify GUIDs.
- If that fails, search all geom providers for a device with matching GUIDs.
- If that fails, return ENOENT.
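The open order listed above can be sketched as a lookup over a table of providers. This is an illustration only: `struct demo_provider` and `demo_vdev_open()` are hypothetical, an in-memory array stands in for the GEOM provider list, and the sketch assumes that a first-time open which finds nothing at the path simply returns ENOENT.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a GEOM provider and the GUIDs in its label. */
struct demo_provider {
	const char *path;
	uint64_t pool_guid;
	uint64_t vdev_guid;
};

/* Nonzero if both the pool and vdev GUIDs match. */
static int
demo_guids_match(const struct demo_provider *pp, uint64_t pool_guid,
    uint64_t vdev_guid)
{
	return (pp->pool_guid == pool_guid && pp->vdev_guid == vdev_guid);
}

/* Returns 0 and sets *out on success, ENOENT if no provider qualifies. */
static int
demo_vdev_open(const struct demo_provider *pps, size_t n, const char *path,
    uint64_t pool_guid, uint64_t vdev_guid, int first_open,
    const struct demo_provider **out)
{
	size_t i;

	assert(out != NULL);
	*out = NULL;

	/* Open by path; verify GUIDs unless this is the first open. */
	for (i = 0; i < n; i++) {
		if (strcmp(pps[i].path, path) != 0)
			continue;
		if (first_open ||
		    demo_guids_match(&pps[i], pool_guid, vdev_guid)) {
			*out = &pps[i];
			return (0);
		}
	}
	if (first_open)
		return (ENOENT);

	/* Fall back to searching every provider for matching GUIDs. */
	for (i = 0; i < n; i++) {
		if (demo_guids_match(&pps[i], pool_guid, vdev_guid)) {
			*out = &pps[i];
			return (0);
		}
	}
	return (ENOENT);
}
```

Requiring both GUIDs in the fallback search mirrors the point made in the entry: vdev GUIDs are only guaranteed to be unique within a pool.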
r292069 | asomers | 2015-12-10 17:04:13 -0700 (Thu, 10 Dec 2015) | 6 lines
Change an important error message from ZFS_LOG to printf
r293708 | asomers | 2016-01-11 15:15:46 -0700 (Mon, 11 Jan 2016) | 16 lines
Fix importing l2arc device by guid
After r292066, vdev_geom verifies both the vdev and pool guids of device labels during open. However, spare and l2arc devices don't have pool guids, so opening them by guid will fail (opening by path, when the pathname is known, still succeeds). This change allows a vdev to be opened by guid if the label contains no pool_guid, which is the case for inactive spares and l2arc devices.
r294027 | asomers | 2016-01-14 11:19:05 -0700 (Thu, 14 Jan 2016) | 14 lines
Fix race condition involving ZFS remove events
When a ZFS drive disappears, ZFS sends a resource.fs.zfs.removed event to userland. A userland program like zfsd(8) can use that event, for example to activate a hotspare. The current code contains a race condition: vdev_geom would send the sysevent _before_ spa.c updated the vdev's status, causing userland processes to see pool state that does not reflect the device removal. This change moves the sysevent to spa.c, closing the race.
r294358 | asomers | 2016-01-19 16:16:24 -0700 (Tue, 19 Jan 2016) | 10 lines
Quell harmless CID about unchecked return value in nvlist_get_guids.
The return value doesn't need to be checked, because nvlist_get_guid's callers check the returned values of the guids. |
294334 |
19-Jan-2016 |
dim |
MFC r294102:
MFV r294101: 6527 Possible access beyond end of string in zpool comment
Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Dan McDonald <danmcd@omniti.com> Approved by: Gordon Ross <gwr@nexenta.com>
illumos/illumos-gate@2bd7a8d078223b122d65fea49bb8641f858b1409
This fixes erroneous double increments of the 'check' variable in a loop in spa_prop_validate(). I ran into this in the clang380-import branch, where clang 3.8.0 warns about it. (It is already fixed there.) |
294188 |
16-Jan-2016 |
allanjude |
MFC: r287337 Apply the noinline attribute to vdev_queue_max_async_writes
This makes it possible to analyze the performance of the new ZFS write throttle with dtrace
PR: 200316 Sponsored by: FreeBSD Mastery: Advanced ZFS |
293413 |
08-Jan-2016 |
stas |
MFC r291545: make the number of fasttrap probes and the size of the trace points hash table tunable via sysctl or kernel tunables. |
292973 |
31-Dec-2015 |
ngie |
MFC nv(3) and part of nv(9) to stable/10
This includes the following revisions from head:
r258065,r258594,r259430,r260222,r261407,r261408,r263479,r264021,r266351, r269603,r271026,r271027,r271028,r271241,r271578,r271579,r271847,r272102, r272843,r273752,r277920,r277921,r277925,r277926,r277927,r279421,r279422, r279423,r279424,r279425,r279426,r279427,r279428,r279429,r279430,r279431, r279432,r279434,r279435,r279436,r279438,r279439,r279440,r279760,r282122, r282254,r282257,r282304,r282312,r285339,r288340
This change reverts stable/10@r282122 and stable/10@r288340, and re-MFCs the series again (r282122, r285339, and r288340).
More changes are pending to nv(9)/pci(4) after further review/work. Please see the Phabricator review for more details (both https://reviews.freebsd.org/D4232 and https://reviews.freebsd.org/D4249 ).
- Tested with:
-- Booting VMware Fusion 8.1.0 running on a Haswell Apple Macbook Pro
-- Booting a Haswell machine with zfs and running some stress workloads with VirtualBox guests
-- make tinderbox
-- kyua test -k /usr/tests/lib/libnv
Differential Revision: https://reviews.freebsd.org/D4249 (part of a larger diff) Relnotes: yes Reviewed by: oshogbo (implicit), sbruno (implicit) Submitted by: Kevin Bowling <kevin.bowling@kev009.com> Sponsored by: EMC / Isilon Storage Division |
290893 |
15-Nov-2015 |
ngie |
MFC r289195:
Integrate the tests from lib/libarchive, usr.bin/cpio, and usr.bin/tar in to the FreeBSD test suite
functional_test.sh was ported from bin/sh/tests/functional_test.sh, as a small wrapper around libarchive_test, bsdcpio_test, and bsdtar_test provided by upstream.
A handful of testcases in lib/libarchive/tests have been disabled as they were failing when run with kyua test (see BROKEN_TESTS in lib/libarchive/tests/Makefile)
As a sidenote: this removes the check/test targets from the Makefiles as they don't match the pattern used in the rest of the FreeBSD test suite.
Sponsored by: EMC / Isilon Storage Division
Conflicts: lib/libarchive/test usr.bin/cpio/test |
290766 |
13-Nov-2015 |
mav |
MFC r290191 (by avg): l2arc: do not call trim_map_free() for blocks with zero b_asize
b_asize can be zero if the block is compressed into an empty block (ZIO_COMPRESS_EMPTY) and the trim code asserts that meaningless zero-sized trimming is not attempted. The logic for calling trim_map_free() is extracted into a new function l2arc_trim() to minimize code duplication.
PR: 203473 Reported by: Willem Jan Withagen <wjw@digiware.nl> Tested by: Willem Jan Withagen <wjw@digiware.nl> |
290765 |
13-Nov-2015 |
mav |
MFC r289562: 6328 Fix cstyle errors in zfs codebase
Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Alex Reece <alex@delphix.com> Reviewed by: Richard Elling <Richard.Elling@RichardElling.com> Reviewed by: Jorgen Lundman <lundman@lundman.net> Approved by: Robert Mustacchi <rm@joyent.com> Author: Paul Dagnelie <pcd@delphix.com>
illumos/illumos-gate@9a686fbc186e8e2a64e9a5094d44c7d6fa0ea167 |
290761 |
13-Nov-2015 |
mav |
MFC r289527: 5561 support root pools on EFI/GPT partitioned disks 5125 update zpool/libzfs to manage bootable whole disk pools (EFI/GPT labeled disks)
Reviewed by: Jean McCormack <jean.mccormack@nexenta.com> Reviewed by: Josef 'Jeff' Sipek <josef.sipek@nexenta.com> Approved by: Dan McDonald <danmcd@omniti.com> Author: Hans Rosenfeld <hans.rosenfeld@nexenta.com>
illumos/illumos-gate@1a902ef8628b0dffd6df5442354ab59bb8530962
These changes are a no-op for FreeBSD. |
290757 |
13-Nov-2015 |
mav |
MFC r289422: 4185 add new cryptographic checksums to ZFS: SHA-512, Skein, Edon-R
Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com> Reviewed by: Richard Lowe <richlowe@richlowe.net> Approved by: Garrett D'Amore <garrett@damore.org> Author: Matthew Ahrens <mahrens@delphix.com>
illumos/illumos-gate@45818ee124adeaaf947698996b4f4c722afc6d1f
This is only a partial merge of the respective ZFS infrastructure changes. At this moment the FreeBSD kernel lacks those crypto algorithms, so the parts of the code that enable them are commented out. When they are implemented, it will be trivial to plug them in. |
290756 |
13-Nov-2015 |
mav |
MFC r289362, r289445: 2605 want to resume interrupted zfs send
Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Paul Dagnelie <pcd@delphix.com> Reviewed by: Richard Elling <Richard.Elling@RichardElling.com> Reviewed by: Xin Li <delphij@freebsd.org> Reviewed by: Arne Jansen <sensille@gmx.net> Approved by: Dan McDonald <danmcd@omniti.com> Author: Matthew Ahrens <mahrens@delphix.com>
illumos/illumos-gate@9c3fd1216fa7fb02cfbc78a2518a686d54b48ab8
For more info, see:
- slides: http://www.slideshare.net/MatthewAhrens/openzfs-send-and-receive
- video: https://www.youtube.com/watch?v=iY44jPMvxog
- manpage changes (for zfs receive -s and zfs send -t)
- upcoming talk at the OpenZFS Developer Summit
The TL;DR is:
- Use "zfs receive -s" to save the partially received state on failure.
- On failure, get the receive token with "zfs get receive_resume_token <fs>".
- Resume the send with "zfs send -t <token_value>".
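The three steps of the TL;DR can be sketched as argument lists, for instance for use with a process launcher. This is only an illustration: the helper names and the dataset name are hypothetical, and the -H and -o value options of "zfs get" are added here to obtain a bare token value; they are not part of the commit message.

```python
def receive_resumable_cmd(target_fs):
    """Receive a stream, saving partially received state on failure (-s)."""
    return ["zfs", "receive", "-s", target_fs]


def get_resume_token_cmd(target_fs):
    """Query the resume token; -H/-o value print just the property value."""
    return ["zfs", "get", "-H", "-o", "value", "receive_resume_token", target_fs]


def resume_send_cmd(token):
    """Resume the interrupted send from the saved token (-t)."""
    return ["zfs", "send", "-t", token]


if __name__ == "__main__":
    fs = "tank/backup"  # hypothetical dataset name
    for cmd in (receive_resumable_cmd(fs),
                get_resume_token_cmd(fs),
                resume_send_cmd("<token_value>")):
        print(" ".join(cmd))
```

The functions only build the commands; nothing is executed, so the sketch can be read without a ZFS pool at hand.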
Relnotes: yes |
290754 |
13-Nov-2015 |
mav |
MFC r289309: 6267 dn_bonus evicted too early
Reviewed by: Richard Yao <ryao@gentoo.org> Reviewed by: Xin LI <delphij@freebsd.org> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Richard Lowe <richlowe@richlowe.net> Author: Justin T. Gibbs <gibbs@FreeBSD.org>
illumos/illumos-gate@d2058105c61ec61df3a2dd3f839fed8c3fe7bfd6 |
290753 |
13-Nov-2015 |
mav |
MFC r289307: 6295 metaslab_condense's dbgmsg should include vdev id
Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Andriy Gapon <avg@freebsd.org> Reviewed by: Xin Li <delphij@freebsd.org> Reviewed by: Justin Gibbs <gibbs@scsiguy.com> Approved by: Richard Lowe <richlowe@richlowe.net> Author: Joe Stein <joe.stein@delphix.com>
illumos/illumos-gate@daec38ecb4fb5e73e4ca9e99be84f6b8c50c02fa |
290752 |
13-Nov-2015 |
mav |
MFC r289305: 6293 ztest failure: error == 28 (0xc == 0x1c) in ztest_tx_assign()
Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Richard Elling <Richard.Elling@RichardElling.com> Approved by: Richard Lowe <richlowe@richlowe.net> Author: Matthew Ahrens <mahrens@delphix.com>
illumos/illumos-gate@8fe00bfb8790ad51653f67b01d5ac14256cbb404 |
290751 |
13-Nov-2015 |
mav |
MFC r289299: 6286 ZFS internal error when set large block on bootfs
Reviewed by: Paul Dagnelie <pcd@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Andriy Gapon <avg@FreeBSD.org> Approved by: Robert Mustacchi <rm@joyent.com> Author: Matthew Ahrens <mahrens@delphix.com>
illumos/illumos-gate@6de9bb5603e65b16816b7ab29e39bac820e2da2b |
290750 |
13-Nov-2015 |
mav |
MFC r289297: 6288 dmu_buf_will_dirty could be faster
Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Paul Dagnelie <pcd@delphix.com> Reviewed by: Justin Gibbs <gibbs@scsiguy.com> Reviewed by: Richard Elling <Richard.Elling@RichardElling.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Matthew Ahrens <mahrens@delphix.com>
illumos/illumos-gate@0f2e7d03b8f588387cb8dd8dd500cbe5ff4484e0 |
290749 |
13-Nov-2015 |
mav |
MFC r289295: 5219 l2arc_write_buffers() may write beyond target_sz
Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Saso Kiselkov <skiselkov@gmail.com> Reviewed by: George Wilson <george@delphix.com> Reviewed by: Steven Hartland <steven.hartland@multiplay.co.uk> Reviewed by: Justin Gibbs <gibbs@FreeBSD.org> Approved by: Matthew Ahrens <mahrens@delphix.com> Author: Andriy Gapon <avg@freebsd.org>
illumos/illumos-gate@d7d9a6d919f92d74ea0510a53f8441396048e800 |
290748 |
13-Nov-2015 |
mav |
MFC r289192: 6281 prefetching should apply to 1MB reads
Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Paul Dagnelie <pcd@delphix.com> Reviewed by: Alexander Motin <mav@freebsd.org> Reviewed by: Dan McDonald <danmcd@omniti.com> Reviewed by: Justin Gibbs <gibbs@scsiguy.com> Reviewed by: Xin Li <delphij@freebsd.org> Approved by: Gordon Ross <gordon.ross@nexenta.com> Author: George Wilson <george.wilson@delphix.com>
illumos/illumos-gate@632802744ef6d17e06d6980a95f631615c3b060f |
290747 |
13-Nov-2015 |
mav |
MFC r289191, r289194: 6251 add tunable to disable free_bpobj processing
Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Simon Klinkert <simon.klinkert@gmail.com> Reviewed by: Richard Elling <Richard.Elling@RichardElling.com> Reviewed by: Albert Lee <trisk@omniti.com> Reviewed by: Xin Li <delphij@freebsd.org> Approved by: Garrett D'Amore <garrett@damore.org> Author: George Wilson <george.wilson@delphix.com>
illumos/illumos-gate@139510fb6efa97dbe5f5479594b308d940cab8d1 |
290746 |
13-Nov-2015 |
mav |
MFC r289190: 6250 zvol_dump_init() can hold txg open
Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Albert Lee <trisk@omniti.com> Reviewed by: Xin Li <delphij@freebsd.org> Approved by: Garrett D'Amore <garrett@damore.org> Author: George Wilson <george.wilson@delphix.com>
illumos/illumos-gate@b10bba72460aeaa53119c76ff5e647fd5585bece |
290745 |
13-Nov-2015 |
mav |
MFC r287745: 5997 FRU field not set during pool creation and never updated
ZFS already supports storing the vdev FRU in a vdev property. There is code in libzfs to work with this property, and there is code in the zfs-retire FMA module that looks for that information. But there is no code actually setting or updating the FRU.
To address this, ZFS is changed to send a handful of new events whenever a vdev is added, attached, cleared, or onlined, as well as when a pool is created or imported.
Note that syseventd is not currently available on FreeBSD, so more work is needed (e.g. in zfsd) to support the new ZFS events and actually use this capability. This changeset is mostly a diff reduction from upstream.
illumos/illumos-gate@1437283407f89cab03860accf49408f94559bc34
Illumos issues:
5997 FRU field not set during pool creation and never updated https://www.illumos.org/issues/5997 |
290713 |
12-Nov-2015 |
smh |
MFC r290401 & r290466
Provide information about bad DVA
Sponsored by: Multiplay |
290712 |
12-Nov-2015 |
smh |
MFC r290399:
Allow zfs_recover to be changed at runtime
Sponsored by: Multiplay |
290402 |
05-Nov-2015 |
smh |
MFC r276450
Correct zpool list displaying invalid EXPANDSZ for unavailable pool vdevs
Sponsored by: Multiplay |
289805 |
23-Oct-2015 |
avg |
MFC r288340: define aok in libnvpair which is linked to all zfs libraries that need aok |
289100 |
10-Oct-2015 |
delphij |
MFC r288204: MFV r288063:
make dataset property de-registration operation O(1) |
288825 |
05-Oct-2015 |
mav |
MFC r288579: Restore original array_rd_sz semantics.
Before r278702, prefetch was blocked for I/Os > 1MB; after it, for I/Os >= 1MB. 1MB I/Os are used for bulk operations in CTL (XCOPY, VERIFY), and disabling prefetch for them reduced the performance. |
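The boundary difference is easiest to see side by side. A minimal sketch, assuming a 1MB threshold as described in the message; the function names are hypothetical and do not come from the ZFS sources.

```python
MIB = 1 << 20
ARRAY_RD_SZ = MIB  # threshold from the commit message (1MB)


def prefetch_allowed_original(io_size, limit=ARRAY_RD_SZ):
    """Original semantics: only I/Os strictly larger than the limit are blocked."""
    return io_size <= limit


def prefetch_allowed_after_r278702(io_size, limit=ARRAY_RD_SZ):
    """Semantics after r278702: I/Os of exactly the limit are blocked as well."""
    return io_size < limit


if __name__ == "__main__":
    for size in (MIB - 1, MIB, MIB + 1):
        print(size,
              prefetch_allowed_original(size),
              prefetch_allowed_after_r278702(size))
```

The two predicates differ only for an I/O of exactly 1MB, which is the size the message says CTL uses for bulk operations.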
288599 |
03-Oct-2015 |
mav |
MFC r288064 (by avg): 6220 memleak in l2arc on debug build
illumos/illumos-gate/commit/c546f36aa898d913ff77674fb5ff97f15b2e08b4 https://www.illumos.org/issues/6220 5408 introduced a memleak in l2arc, namely the member b_thawed gets leaked when an arc_hdr is realloced from full to l2only.
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com> Reviewed by: Simon Klinkert <simon.klinkert@gmail.com> Reviewed by: George Wilson <george@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Arne Jansen <sensille@gmx.net> |
288597 |
03-Oct-2015 |
mav |
MFC r287744 (by delphij): Reduce diff against upstream. |
288596 |
03-Oct-2015 |
mav |
MFC r287706 (by delphij): 6214 zpools going south
In r286570 (MFV of r277426) an unprotected write to b_flags to set the compression mode was introduced. This would open a race window where data is partially decompressed, modified, checksummed and written to the pool, resulting in pool corruption due to the partial decompression.
Prevent this by reintroducing b_compress
illumos/illumos-gate@d4cd038c92c36fd0ae35945831a8fc2975b5272c
Illumos issues:
6214 zpools going south https://www.illumos.org/issues/6214 |
288595 |
03-Oct-2015 |
mav |
MFV r287703, r287705 (by delphij): 6091 avl_add doesn't assert on non-debug builds
Use assfail() from libuutil instead of ASSERT() in userland AVL avl_add.
illumos/illumos-gate@faa2b6be2fc102adf9ed584fc1a667b4ddf50d78
Illumos issues:
6091 avl_add doesn't assert on non-debug builds https://www.illumos.org/issues/6091 |
288594 |
03-Oct-2015 |
mav |
MFC r287702: 5987 zfs prefetch code needs work
Rewrite the ZFS prefetch code to detect only forward, sequential streams.
The following kstats have been added:
kstat.zfs.misc.arcstats.sync_wait_for_async
How many sync reads have waited for an async read to complete. (less is better)
kstat.zfs.misc.arcstats.demand_hit_predictive_prefetch
How many demand reads didn't have to wait for I/O because of predictive prefetch. (more is better)
zfetch kstats have been simplified to hits, misses, and max_streams, with max_streams counting the times when we were not able to create a new stream because we already had the maximum number of streams for a file.
The sysctl variable/loader tunable vfs.zfs.zfetch.block_cap has been replaced by vfs.zfs.zfetch.max_distance, which controls the maximum number of bytes to prefetch per stream.
illumos/illumos-gate@cf6106c8a0d6598b045811f9650d66e07eb332af
Illumos ZFS issues:
5987 zfs prefetch code needs work https://www.illumos.org/issues/5987 |
288592 |
03-Oct-2015 |
mav |
MFC r287283 (by delphij): Fix a buffer overrun which may lead to data corruption, introduced in r286951 by reinstating changes in r274628.
In l2arc_compress_buf(), we allocate a buffer, 'cdata', of l2hdr->b_asize bytes to stash away the compressed data.
We then ask zio_compress_data() to compress the buffer, b_l1hdr.b_tmp_cdata, which is of l2hdr->b_asize bytes, and have the compressed size (or the original size, if compression didn't gain enough) stored in csize.
To pad the buffer to fit the optimal write size, we round up the compressed size to a multiple of 1 << vdev_ashift of the L2 device.
Illumos code rounds up the size by at most SPA_MINBLOCKSIZE. Because we know csize <= b_asize, and b_asize is an integer multiple of SPA_MINBLOCKSIZE, we are guaranteed that the rounded-up csize is <= b_asize. However, this is not necessarily true when we round up to 1 << vdev_ashift, because it could be larger than SPA_MINBLOCKSIZE.
So, in the worst case scenario, we are overwriting at most
((1 << vdev_ashift) - SPA_MINBLOCKSIZE)
bytes of memory next to the compressed data buffer.
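The worst case can be checked with a few lines of arithmetic. A minimal sketch, assuming SPA_MINBLOCKSIZE is 512 bytes; the helper names are hypothetical and the code only models the rounding, not the actual l2arc_compress_buf() logic.

```python
SPA_MINBLOCKSIZE = 512  # assumed value (1 << 9)


def roundup(x, align):
    """Round x up to the next multiple of align."""
    return -(-x // align) * align


def overrun_bytes(csize, b_asize, vdev_ashift):
    """Bytes written past a b_asize-byte buffer after padding csize to 1 << vdev_ashift."""
    padded = roundup(csize, 1 << vdev_ashift)
    return max(0, padded - b_asize)


def worst_case_overrun(vdev_ashift):
    """Upper bound from the commit message: (1 << vdev_ashift) - SPA_MINBLOCKSIZE."""
    return max(0, (1 << vdev_ashift) - SPA_MINBLOCKSIZE)


if __name__ == "__main__":
    # A 512-byte buffer on a device with 4K sectors (ashift = 12).
    print(overrun_bytes(csize=300, b_asize=512, vdev_ashift=12))
    print(worst_case_overrun(12))
    # With ashift = 9 the padding never exceeds the buffer.
    print(overrun_bytes(csize=300, b_asize=512, vdev_ashift=9))
```

With a 512-byte buffer on a 4K-sector device the padded size is 4096 bytes, so the overrun reaches the bound; with ashift 9 the rounding unit equals SPA_MINBLOCKSIZE and no overrun is possible.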
Andriy's original change in r274628 reorganized the code a little by moving the padding to after the point where we determine that the compression was beneficial. At that point we check the rounded size against the allocated buffer size, so the buffer overrun is not possible. |
288591 |
03-Oct-2015 |
mav |
MFC r287280 (by delphij): In r286705 (Illumos 5960/a2cdcdd), a separate thread is created with curproc as parent. In the case of a send or receive, curproc would be the userland application that issues the ioctl. This would trigger an assertion failure introduced in the Solaris compatibility shims in r196458 when the kernel is compiled with INVARIANTS.
Fix this by using p0 (proc0 or kernel) as the parent thread when creating the kernel threads. |
288590 |
03-Oct-2015 |
mav |
MFC r287103 (by avg): 5692 expose the number of hole blocks in a file
FreeBSD porting notes:
- only kernel-side changes are merged
- the new ioctl is not actually implemented yet
- thus, the goal is to synchronize DMU code
illumos/illumos-gate@2bcf0248e992f292c7b814458bcdce2f004925d6
https://www.illumos.org/issues/5692 We would like to expose the number of hole (sparse) blocks in a file. This can be useful, for example, if you want to fill in the holes with some data; knowing the number of holes in advance allows you to report progress on hole filling. We could use SEEK_HOLE to do that, but it would be O(n), where n is the number of holes present in the file.
Author: Max Grossman <max.grossman@delphix.com> Reviewed by: Adam Leventhal <ahl@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Boris Protopopov <bprotopopov@hotmail.com> Approved by: Richard Lowe <richlowe@richlowe.net> |
288589 |
03-Oct-2015 |
mav |
MFC r286983 (by avg): fix a mismerge in r286539 (MFV 286538: 5562 ZFS sa_handle's violate...) |
288588 |
03-Oct-2015 |
mav |
MFC r286951: Restore part of r274628, reverted at r286776. |
288587 |
03-Oct-2015 |
mav |
MFC r286776: Remove some random accumulated diff from Illumos. |
288586 |
03-Oct-2015 |
mav |
MFC r286774: 2618 arc.c mistypes in the comments
Reviewed by: Jason King <jason.brian.king@gmail.com> Reviewed by: Josef Sipek <jeffpc@josefsipek.net> Approved by: Richard Lowe <richlowe@richlowe.net> Author: Bart Coddens <bart.coddens@gmail.com>
illumos/illumos-gate@fc98fea58e89224f6f13d7fae246d6cb5dfa35ea |
288585 |
03-Oct-2015 |
mav |
MFC r286770: Fix r286766 build with debug. |
288584 |
03-Oct-2015 |
mav |
MFC r286767: Fix a minor mismerge from some time earlier. |
288583 |
03-Oct-2015 |
mav |
MFC r286766: 5817 change type of arcs_size from uint64_t to refcount_t
Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Paul Dagnelie <paul.dagnelie@delphix.com> Reviewed by: Adam Leventhal <ahl@delphix.com> Reviewed by: Alex Reece <alex@delphix.com> Reviewed by: Richard Elling <richard.elling@richardelling.com> Approved by: Garrett D'Amore <garrett@damore.org> Author: Prakash Surya <prakash.surya@delphix.com>
illumos/illumos-gate@2fd872a734cf486007a8dba532cec52bfb4d40e5
As a way to make it more difficult to introduce bugs into the ARC, and to make it easier to diagnose issues when bugs do creep in, it would be beneficial to change the type of the arc_state_t's arcs_size field to be a refcount_t instead of a uint64_t. This would allow us to make stricter checks when incrementing and decrementing the value with debugging enabled, but still fallback to simple, fast atomic operations when debugging is disabled. |
288582 |
03-Oct-2015 |
mav |
MFC r286764: 6033 arc_adjust() should search MFU lists for oldest buffer when adjusting MFU size.
illumos/illumos-gate@31c46cf23cd1cf4d66390a983dc5072d7d299ba2
https://www.illumos.org/issues/6033 When we're looking for the list containing the oldest buffer, we never actually look at the MFU lists, even when we try to evict from MFU. This looks like a copy-paste error; the fix is here:
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com> Reviewed by: Xin Li <delphij@delphij.net> Reviewed by: Prakash Surya <me@prakashsurya.com> Approved by: Matthew Ahrens <mahrens@delphix.com> Author: Alek Pinchuk <alek@nexenta.com> Obtained from: illumos |
288581 |
03-Oct-2015 |
mav |
MFC r286763: 5497 lock contention on arcs_mtx
Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Richard Elling <richard.elling@richardelling.com> Approved by: Dan McDonald <danmcd@omniti.com> Author: Prakash Surya <prakash.surya@delphix.com>
illumos/illumos-gate@244781f10dcd82684fd8163c016540667842f203
This patch attempts to reduce lock contention on the current arc_state_t mutexes. These mutexes are used liberally to protect the number of LRU lists within the ARC (e.g. ARC_mru, ARC_mfu, etc). The granularity at which these locks are acquired has been shown to greatly affect the performance of highly concurrent, cached workloads. |
288580 |
03-Oct-2015 |
mav |
MFC r286762: Revert part of r205231, introducing multiple ARC state locks.
This local implementation will be replaced by one from Illumos to reduce code divergence and make further merges easier. |
288576 |
03-Oct-2015 |
mav |
Fix r288549 build on stable.
For some reason this (odd) code builds on head, but not on stable. |
288574 |
03-Oct-2015 |
mav |
MFC r286712: 6096 ZFS_SMB_ACL_RENAME needs to cleanup better
Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Gordon Ross <gordon.w.ross@gmail.com> Reviewed by: George Wilson <gwilson@zfsmail.com> Approved by: Robert Mustacchi <rm@joyent.com>
illumos/illumos-gate@8f5190a540d64d2debee6664bbc740e4c38f5b7f |
288573 |
03-Oct-2015 |
mav |
MFC r286710: 6093 zfsctl_shares_lookup should only VN_RELE() on zfs_zget() success
Reviewed by: Gordon Ross <gwr@nexenta.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Dan McDonald <danmcd@omniti.com>
illumos/illumos-gate@0f92170f1ec2737ee5a0e51b5f74093904811452 |
288572 |
03-Oct-2015 |
mav |
MFC r286708: 5959 clean up per-dataset feature count code
Reviewed by: Toomas Soome <tsoome@me.com> Reviewed by: George Wilson <george@delphix.com> Reviewed by: Alex Reece <alex@delphix.com> Approved by: Richard Lowe <richlowe@richlowe.net> Author: Matthew Ahrens <mahrens@delphix.com>
illumos/illumos-gate@ca0cc3918a1789fa839194af2a9245f801a06b1a
A ZFS feature flag (large blocks) tracks its refcount as the number of datasets that have ever used the feature. Several features of this type are planned (new checksum functions). This code should be made common infrastructure rather than being duplicated for each feature. |
288571 |
03-Oct-2015 |
mav |
MFC r286705: 5960 zfs recv should prefetch indirect blocks 5925 zfs receive -o origin=
Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Author: Paul Dagnelie <pcd@delphix.com>
While running 'zfs recv' we noticed that every 128th 8K block required a read. We were seeing that restore_write() was calling dmu_tx_hold_write() and the indirect block was not cached. We should prefetch upcoming indirect blocks to avoid having to go to disk and blocking the restore_write().
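One plausible reading of "every 128th 8K block" is that a new level-1 indirect block is needed each time the previous one runs out of block pointers. A minimal sketch under two assumptions that are not stated in the commit message: block pointers are 128 bytes and indirect blocks are 16KB. All names are hypothetical.

```python
BLKPTR_BYTES = 128                # assumed size of one block pointer
INDIRECT_BLOCK_BYTES = 16 * 1024  # assumed indirect block size


def pointers_per_indirect(indirect_bytes=INDIRECT_BLOCK_BYTES,
                          blkptr_bytes=BLKPTR_BYTES):
    """How many data blocks one level-1 indirect block can point to."""
    return indirect_bytes // blkptr_bytes


def uncached_indirect_reads(num_data_blocks, per_indirect):
    """Count writes that land on the first slot of a new indirect block."""
    return sum(1 for blkid in range(num_data_blocks)
               if blkid % per_indirect == 0)


if __name__ == "__main__":
    per = pointers_per_indirect()
    print(per, uncached_indirect_reads(1024, per))
```

Under these assumptions one indirect block covers 128 data blocks, so without prefetch a sequential receive would stall on a read once per 128 writes.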
Allow an incremental send stream to be received as a clone, even if the stream does not mark it as a clone. |
288570 |
03-Oct-2015 |
mav |
MFC r286689: 5981 Deadlock in dmu_objset_find_dp
illumos/illumos-gate@1d3f896f5469c69c1339890ec3d68e9feddb0343
https://www.illumos.org/issues/5981 When dmu_objset_find_dp gets called with a read lock held, it fans out the work to the task queue. Each task in turn acquires its own read lock before calling the callback. If during this process anyone tries to acquire a write lock, it will stall all read lock requests. Thus the tasks will never finish, the read lock of the caller will never get freed, and the write lock will never be acquired: a deadlock.
Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Dan McDonald <danmcd@omniti.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Arne Jansen <jansen@webgods.de> |
288569 |
03-Oct-2015 |
mav |
MFC r286686: 5269 zpool import slow
illumos/illumos-gate@12380e1e701fda28c9e9f32d01cafb54af279eb5
https://www.illumos.org/issues/5269 When importing a pool (at boot or with zpool import) with many filesystems, the process can take minutes. It doesn't matter whether the pool has been exported cleanly or uncleanly. The problem is that each dataset has its own log chain. On import, all datasets have to be checked for logs to replay. The idea is to speed up this process by parallelizing it.
Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: George Wilson <george@delphix.com> Reviewed by: Dan McDonald <danmcd@omniti.com> Approved by: Dan McDonald <danmcd@omniti.com> Author: Arne Jansen <jansen@webgods.de> |
288568 |
03-Oct-2015 |
mav |
MFC r286683: 5765 add support for estimating send stream size with lzc_send_space when source is a bookmark
Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Christopher Siden <christopher.siden@delphix.com> Reviewed by: Steven Hartland <killing@multiplay.co.uk> Reviewed by: Bayard Bell <buffer.g.overflow@gmail.com> Approved by: Albert Lee <trisk@nexenta.com> Author: Max Grossman <max.grossman@delphix.com>
illumos/illumos-gate@643da460c8ca583e39ce053081754e24087f84c8 |
288567 |
03-Oct-2015 |
mav |
MFC r286677: 5695 dmu_sync'ed holes do not retain birth time
illumos/illumos-gate@70163ac57e58ace1c5c94dfbe85dca5a974eff36
https://www.illumos.org/issues/5695 In dmu_sync_ready(), a hole block pointer will have its logical size explicitly set as it's necessary for replay purposes. To "undo" this, dmu_sync_done() will zero out any hole that it finds. This becomes a problem when using the "hole_birth" feature, as this will also wipe out any birth time that might have happened to be set on the hole. ... As a fix, the logic to zero out a hole is only applied to old style holes with a birth time of zero. Holes created with the "hole_birth" feature enabled will have a non-zero birth time, and will be skipped (thus preserving the ltime, type, and level information as well). In addition, zdb was updated to also print the ltime, type, and level information for these new style holes. Previously, only the logical birth time would be printed.
Author: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: George Wilson <george@delphix.com> Reviewed by: Christopher Siden <christopher.siden@delphix.com> Reviewed by: Bayard Bell <buffer.g.overflow@gmail.com> Approved by: Dan McDonald <danmcd@omniti.com> |
288566 |
03-Oct-2015 |
mav |
MFC r286655: Fix set of sign extension bugs in r286625. |
288565 |
03-Oct-2015 |
mav |
MFC r286647: Fix assertion panic caused by combination of r286598 and TRIM. |
288564 |
03-Oct-2015 |
mav |
MFC r286628: Fix r286625 build on i386. |
288563 |
03-Oct-2015 |
mav |
MFC r286626: Fix minor mismerge in r286574. |
288562 |
03-Oct-2015 |
mav |
MFC r286625: 5376 arc_kmem_reap_now() should not result in clearing arc_no_grow
Reviewed by: Christopher Siden <christopher.siden@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Steven Hartland <killing@multiplay.co.uk> Reviewed by: Richard Elling <richard.elling@richardelling.com> Approved by: Dan McDonald <danmcd@omniti.com> Author: Matthew Ahrens <mahrens@delphix.com>
illumos/illumos-gate@2ec99e3e987d8aa273f1e9ba2b983557d058198c |
288561 |
03-Oct-2015 |
mav |
MFC r286623: Remove extra lock, that IMO only creates potential problems now. |
288560 |
03-Oct-2015 |
mav |
MFC r286605: 5812 assertion failed in zrl_tryenter(): zr_owner==NULL
Reviewed by: George Wilson <george@delphix.com> Reviewed by: Alex Reece <alex@delphix.com> Reviewed by: Will Andrews <will@freebsd.org> Approved by: Gordon Ross <gwr@nexenta.com> Author: Matthew Ahrens <mahrens@delphix.com>
illumos/illumos-gate@8df173054ca442cd8845a7364c3edad9d6822351 |
288559 |
03-Oct-2015 |
mav |
MFC r286603: 5810 zdb should print details of bpobj
Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Alex Reece <alex@delphix.com> Reviewed by: George Wilson <george@delphix.com> Reviewed by: Will Andrews <will@freebsd.org> Reviewed by: Simon Klinkert <simon.klinkert@gmail.com> Approved by: Gordon Ross <gwr@nexenta.com> Author: Matthew Ahrens <mahrens@delphix.com>
illumos/illumos-gate@732885fca09e11183dd0ea69aaaab5588fb7dbff |
288558 |
03-Oct-2015 |
mav |
MFC r286600: 5808 spa_check_logs is not necessary on readonly pools
Reviewed by: George Wilson <george@delphix.com> Reviewed by: Paul Dagnelie <paul.dagnelie@delphix.com> Reviewed by: Simon Klinkert <simon.klinkert@gmail.com> Reviewed by: Will Andrews <will@freebsd.org> Approved by: Gordon Ross <gwr@nexenta.com> Author: Matthew Ahrens <mahrens@delphix.com>
illumos/illumos-gate@23367a2f2caec1ccb4d918bdd0f2fc2c9cadcd06 |
288557 |
03-Oct-2015 |
mav |
MFC r286598: 5701 zpool list reports incorrect "alloc" value for cache devices |
288555 |
03-Oct-2015 |
mav |
MFC r286593: Local addition and mismerge fix for r286579. |
288554 |
03-Oct-2015 |
mav |
MFC r286589: 5820 verify failed in zio_done(): BP_EQUAL(bp, io_bp_orig)
Reviewed by: Alex Reece <alex@delphix.com> Reviewed by: George Wilson <george@delphix.com> Reviewed by: Steven Hartland <killing@multiplay.co.uk> Approved by: Garrett D'Amore <garrett@damore.org> Author: Matthew Ahrens <mahrens@delphix.com>
illumos/illumos-gate@34e8acef009195effafdcf6417aec385e241796e |
288553 |
03-Oct-2015 |
mav |
MFC r286587: 5746 more checksumming in zfs send
Reviewed by: Christopher Siden <christopher.siden@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Bayard Bell <buffer.g.overflow@gmail.com> Approved by: Albert Lee <trisk@omniti.com> Author: Matthew Ahrens <mahrens@delphix.com>
illumos/illumos-gate@98110f08fa182032082d98be2ddb9391fcd62bf1 |
288552 |
03-Oct-2015 |
mav |
MFC r286579: 5313 Allow I/Os to be aggregated across ZIO priority classes
Reviewed by: Andriy Gapon <avg@FreeBSD.org> Reviewed by: Will Andrews <willa@SpectraLogic.com> Reviewed by: Matt Ahrens <mahrens@delphix.com> Reviewed by: George Wilson <george@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Justin T. Gibbs <justing@spectralogic.com>
illumos/illumos-gate@fe319232d24f4ae183730a5a24a09423d8ab4429 |
288550 |
03-Oct-2015 |
mav |
MFC r286576: Fix r286570 build with debug. |
288549 |
03-Oct-2015 |
mav |
MFC r286575: 5056 ZFS deadlock on db_mtx and dn_holds
Reviewed by: Will Andrews <willa@spectralogic.com> Reviewed by: Matt Ahrens <mahrens@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Approved by: Dan McDonald <danmcd@omniti.com> Author: Justin Gibbs <justing@spectralogic.com>
illumos/illumos-gate@bc9014e6a81272073b9854d9f65dd59e18d18c35 |
288548 |
03-Oct-2015 |
mav |
MFC r286574: 5445 Add more visibility via arcstats; specifically arc_state_t stats and differentiate between "data" and "metadata"
Reviewed by: Basil Crow <basil.crow@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Bayard Bell <bayard.bell@nexenta.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Prakash Surya <prakash.surya@delphix.com>
illumos/illumos-gate@4076b1bf41cfd9f968a33ed54a7ae76d9e996fe8 |
288547 |
03-Oct-2015 |
mav |
MFC r286570: 5408 managing ZFS cache devices requires lots of RAM
Reviewed by: Christopher Siden <christopher.siden@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Don Brady <dev.fs.zfs@gmail.com> Reviewed by: Josef 'Jeff' Sipek <josef.sipek@nexenta.com> Approved by: Garrett D'Amore <garrett@damore.org> Author: Chris Williamson <Chris.Williamson@delphix.com>
illumos/illumos-gate@89c86e32293a30cdd7af530c38b2073fee01411c
Currently, every buffer cached in the L2ARC is accompanied by a 240-byte header in memory, leading to very high memory consumption when using very large cache devices. These changes significantly reduce this overhead.
Currently:
L1-only header = 176 bytes
L1 + L2 or L2-only header = 176 bytes + 32 byte checksum + 32 byte l2hdr = 240 bytes
Memory-optimized:
L1-only header = 176 bytes
L1 + L2 header = 176 bytes + 32 byte checksum = 208 bytes
L2-only header = 96 bytes + 32 byte checksum = 128 bytes
So overall:
           Trunk   Optimized
         +-------+-----------+
L1-only  | 176 B |   176 B   | (same)
         +-------+-----------+
L1 & L2  | 240 B |   208 B   | (saved 32 bytes)
         +-------+-----------+
L2-only  | 240 B |   128 B   | (saved 112 bytes)
         +-------+-----------+
For an average blocksize of 8KB, this means that for the L2ARC, the ratio of metadata to data has gone down from about 2.92% to 1.56%. For a 'storage optimized' EC2 instance with 1600GB of SSD and 60GB of RAM, this means that we expect a completely full L2ARC to use (1600 GB * 0.0156) / 60GB = 41% of the available memory, down from 78%.
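The percentages above follow from dividing the per-buffer header size by the block size. A minimal sketch that reproduces the arithmetic, assuming 8KB means 8192 bytes and that every cached block has exactly one in-memory header; the function names are hypothetical.

```python
def metadata_ratio(header_bytes, block_bytes=8 * 1024):
    """In-memory header bytes per byte of data cached on the L2ARC device."""
    return header_bytes / block_bytes


def ram_fraction(cache_gb, ram_gb, header_bytes, block_bytes=8 * 1024):
    """Fraction of RAM used by headers when the cache device is completely full."""
    return cache_gb * metadata_ratio(header_bytes, block_bytes) / ram_gb


if __name__ == "__main__":
    # 240-byte headers before the change, 128-byte L2-only headers after it.
    for label, hdr in (("before", 240), ("after", 128)):
        print(label,
              f"{metadata_ratio(hdr):.2%}",
              f"{ram_fraction(1600, 60, hdr):.1%}")
```

Computed without intermediate rounding, the RAM fractions come out near 78.1% and 41.7%, in line with the 78% and 41% quoted in the message.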
Relnotes: yes |
288546 |
03-Oct-2015 |
mav |
MFC r286556: Avoid 128K kmem allocations in mzap_upgrade()
Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: George Wilson <george@delphix.com> Reviewed by: Steven Hartland <killing@multiplay.co.uk> Approved by: Rich Lowe <richlowe@richlowe.net>
illumos/illumos-gate@be3e2ab906b80af79c7b22885f279e45ad8fb995 |
288545 |
03-Oct-2015 |
mav |
MFC r286554: 5769 Cast 'zfs bad bloc' to ULL for x86
Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Paul Dagnelie <paul.dagnelie@delphix.com> Reviewed by: Richard PALO <richard@NetBSD.org> Approved by: Dan McDonald <danmcd@omniti.com>
illumos/illumos-gate@8c76e0763bcf0029556e106377da859f6492a7ee |
288544 |
03-Oct-2015 |
mav |
MFC r286551: 5694 traverse_prefetcher does not prefetch enough
Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Alex Reece <alex@delphix.com> Reviewed by: Christopher Siden <christopher.siden@delphix.com> Reviewed by: Josef 'Jeff' Sipek <josef.sipek@nexenta.com> Reviewed by: Bayard Bell <buffer.g.overflow@gmail.com> Approved by: Garrett D'Amore <garrett@damore.org> Author: George Wilson <george.wilson@delphix.com>
illumos/illumos-gate@34d7ce052c4565b078f73b95ccbd49274e98edaa |
288543 |
03-Oct-2015 |
mav |
MFC r286549: 5693 ztest fails in dbuf_verify: buf[i] == 0, due to dedup and bp_override
Reviewed by: George Wilson <george@delphix.com> Reviewed by: Christopher Siden <christopher.siden@delphix.com> Reviewed by: Bayard Bell <buffer.g.overflow@gmail.com> Approved by: Dan McDonald <danmcd@omniti.com> Author: Matthew Ahrens <mahrens@delphix.com>
illumos/illumos-gate@7f7ace370074e350853da254c65688fd43ddc695 |
288542 |
03-Oct-2015 |
mav |
MFC r286547: 5661 ZFS: "compression = on" should use lz4 if feature is enabled
Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Josef 'Jeff' Sipek <jeffpc@josefsipek.net> Reviewed by: Xin LI <delphij@freebsd.org> Approved by: Robert Mustacchi <rm@joyent.com> Author: Justin T. Gibbs <justing@spectralogic.com>
illumos/illumos-gate@db1741f555ec79def5e9846e6bfd132248514ffe |
288541 |
03-Oct-2015 |
mav |
MFC r286545: 5630 stale bonus buffer in recycled dnode_t leads to data corruption
Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: George Wilson <george@delphix.com> Reviewed by: Will Andrews <will@freebsd.org> Approved by: Robert Mustacchi <rm@joyent.com> Author: Justin T. Gibbs <justing@spectralogic.com> |
288539 |
03-Oct-2015 |
mav |
MFC r286543: 5592 NULL pointer dereference in dsl_prop_notify_all_cb()
Reviewed by: Dan McDonald <danmcd@omniti.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: George Wilson <george@delphix.com> Reviewed by: Will Andrews <will@freebsd.org> Approved by: Robert Mustacchi <rm@joyent.com>
illumos/illumos-gate@9d47dec0481d8cd53b2c1053c96bfa3f78357d6a |
288538 |
03-Oct-2015 |
mav |
MFC r286541: 5531 NULL pointer dereference in dsl_prop_get_ds()
Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Dan McDonald <danmcd@omniti.com> Reviewed by: George Wilson <george@delphix.com> Reviewed by: Bayard Bell <buffer.g.overflow@gmail.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Justin T. Gibbs <justing@spectralogic.com>
illumos/illumos-gate@e57a022b8f718889ffa92adbde47a8f08abcdb25 |
288537 |
03-Oct-2015 |
mav |
MFC r286539: 5562 ZFS sa_handle's violate kmem invariants, debug kernels panic on boot
Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Robert Mustacchi <rm@fingolfin.org> Reviewed by: George Wilson <george@delphix.com> Reviewed by: Rich Lowe <richlowe@richlowe.net> Approved by: Dan McDonald <danmcd@omniti.com> Author: Justin T. Gibbs <justing@spectralogic.com>
illumos/illumos-gate@0fda3cc5c1c5a1d9bdea6d52637bef6e781549c9 |
288536 |
03-Oct-2015 |
mav |
MFC r281109: Add DTrace probe to the new ARC reclaim cause added in r281026. |
288521 |
02-Oct-2015 |
mav |
MFC r284591 (by avg): illumos compat: use flsl/flsll for highbit/highbit64
Do that only when fast inline versions are available. At the moment that can be the case only in the kernel and not for all platforms.
The original code uses a binary search and that's kept as a fallback. This is a micro optimization. |
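The two approaches can be compared in a short sketch. It is an illustration only: the binary search is written from the general technique rather than copied from the ZFS sources, and Python's int.bit_length() stands in for flsll(), since both return the 1-based index of the highest set bit (0 for a zero argument).

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def highbit64_search(i):
    """Binary-search fallback: 1-based index of the highest set bit, 0 if none."""
    i &= MASK64
    if i == 0:
        return 0
    h = 1
    # Halve the search window each step: 32, 16, 8, 4, 2, 1 bits.
    for shift in (32, 16, 8, 4, 2, 1):
        if i >> shift:
            h += shift
            i >>= shift
    return h


def highbit64_fast(i):
    """Stand-in for flsll(): a single bit-scan style operation."""
    return (i & MASK64).bit_length()


if __name__ == "__main__":
    for value in (0, 1, 2, 0x80, 1 << 40, MASK64):
        print(hex(value), highbit64_search(value), highbit64_fast(value))
```

Both functions agree on every input; the fast path simply replaces six compare-and-shift steps with one operation where the platform provides it.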
288520 |
02-Oct-2015 |
mav |
MFC r279996 (by smh): Allow zvol_geom_worker to process BIO_DELETE's
If zvol_geom_start is called with a BIO_DELETE from a thread which can sleep, it queues it for later processing by the zvol_geom_worker. The zvol_geom_worker didn't have a delete case, so it would simply lose the bio, preventing the original caller from ever completing. In addition, any other unknown types would suffer the same fate.
Allow zvol_geom_worker to process BIO_DELETE's via zvol_strategy and return unsupported for all unknown bio types. |
288519 |
02-Oct-2015 |
mav |
MFC r277826 (by delphij): Diff reduction with upstream. The actual change was merged in r272483 already. |
288518 |
02-Oct-2015 |
mav |
MFC r277452 (by will): Fix arc__shrink DTrace probe's to_free argument.
Remove the unnecessary #ifdef _KERNEL, which did not differ in the true or false cases. Actually set the value of to_free before using it. |
288517 |
02-Oct-2015 |
mav |
MFC r275780 (by delphij): Add a loader tunable, vfs.zfs.arc_meta_min, which controls how much metadata ZFS should keep in ARC at minimum.
In arc_evict(), when doing recycle, take more factors into account by applying the following policy:
1. If no evictable data, evict metadata;
2. If no evictable metadata, evict data;
3. If we hit arc_meta_limit, evict metadata;
4. If we haven't hit arc_meta_min, evict data;
5. (Illumos only, not yet present in the new FreeBSD code) evict the oldest cached element from data and metadata. (FreeBSD) evict the data type specified by the caller, which is the existing behavior.
Note that because of our split locks (implemented in r205231 to improve scalability by reducing lock contention), implementing the fifth Illumos behavior will not be cheap, so for now just implement 1-4 and fall back to the current behavior for 5.
Illumos issue: 5368 ARC should cache more metadata |
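The policy listed in the entry above can be read as a short decision function. The sketch below is illustrative only: the struct, the enum, the function name, and the exact comparison operators are assumptions, not the code in arc_evict().

```c
#include <assert.h>
#include <stdint.h>

typedef enum { EVICT_DATA, EVICT_METADATA } evict_type_t;

/* Hypothetical snapshot of the counters the policy consults. */
struct arc_snapshot {
    uint64_t evictable_data;
    uint64_t evictable_metadata;
    uint64_t meta_used;
    uint64_t meta_limit;    /* arc_meta_limit */
    uint64_t meta_min;      /* arc_meta_min (vfs.zfs.arc_meta_min) */
};

static evict_type_t
choose_evict_type(const struct arc_snapshot *s, evict_type_t requested)
{
    if (s->evictable_data == 0)
        return (EVICT_METADATA);    /* rule 1 */
    if (s->evictable_metadata == 0)
        return (EVICT_DATA);        /* rule 2 */
    if (s->meta_used >= s->meta_limit)
        return (EVICT_METADATA);    /* rule 3 */
    if (s->meta_used <= s->meta_min)
        return (EVICT_DATA);        /* rule 4 */
    return (requested);             /* rule 5, FreeBSD fallback */
}
```

The order of the checks matters: the "nothing evictable" cases come first so that the limit checks never select a type that has nothing to give up.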
288516 |
02-Oct-2015 |
mav |
MFC r269117: Make sysctls under vfs.zfs.zfetch writeable.
I don't see any reason for them to be read-only, while tuning them without reboot is much more convenient for experiments. |
287667 |
11-Sep-2015 |
avg |
MFC r287100: spa_import_rootpool: prevent lock and resource leak
PR: 198563 |
287665 |
11-Sep-2015 |
avg |
MFC r287099: account for ashift when gathering buffers to be written to l2arc device
The change differs from that in head because of other changes that have not been MFC-ed yet. |
287661 |
11-Sep-2015 |
avg |
MFC r285021: zfs_mount(MS_REMOUNT): protect zfs_(un)register_callbacks calls |
287658 |
11-Sep-2015 |
avg |
MFC r286985: try to fix lor between z_teardown_lock and spa_namespace_lock |
287656 |
11-Sep-2015 |
avg |
MFC r284513: l2arc: pass correct size to trim requests |
287230 |
27-Aug-2015 |
markj |
MFC r286167: Avoid dereferencing curthread->td_proc->p_cred in DTrace probe context. |
286120 |
31-Jul-2015 |
smh |
MFC: r285946 and r285947
Add warning about low KSTACK_PAGES for ZFS use.
Sponsored by: Multiplay |
285717 |
20-Jul-2015 |
jpaetzel |
MFC 278040:
Prevent inlining txg_quiesce
This allows dtrace to monitor the calls to txg_quiesce which can be really helpful.
Also standardize __noinline order for arc_kmem_reap_now.
Sponsored by: Multiplay
Approved by: re |
285202 |
06-Jul-2015 |
avg |
MFC r284593: MFV r284412: 5911 ZFS "hangs" while deleting file
illumos/illumos-gate@46e1baa6cf6d5432f5fd231bb588df8f9570c858 https://www.illumos.org/issues/5911 Sometimes ZFS appears to hang while deleting a file. It is actually making slow progress at the file deletion, but other operations (administrative and writes via the data path) "hang" until the file removal completes, which can take a long time if the file has many blocks. The deletion (or most of it) happens in a single txg, and the sync thread spends most of its time reading indirect blocks...
Reviewed by: Bayard Bell <buffer.g.overflow@gmail.com> Reviewed by: Alek Pinchuk <alek@nexenta.com> Reviewed by: Simon Klinkert <simon.klinkert@gmail.com> Reviewed by: Dan McDonald <danmcd@omniti.com> Approved by: Richard Lowe <richlowe@richlowe.net> Author: Matthew Ahrens <mahrens@delphix.com>
PR: 199775 Approved by: re(kib) |
285001 |
01-Jul-2015 |
avg |
MFC r284304: MFV r284030: 5818 zfs {ref}compressratio is incorrect with 4k sector size
Note: no MFC to stable/9 because r268075 (vendor r267565) has not been MFC-ed. |
284760 |
24-Jun-2015 |
avg |
MFC r284306: MFV r284036: 5961 Fix stack overflow in zfs_create_fs |
284758 |
24-Jun-2015 |
avg |
MFC r284303: MFV r283534: 5515 dataset user hold doesn't reject empty tags |
284757 |
24-Jun-2015 |
avg |
MFC r284301: MFV r284040: check that datasets are snapshots |
284313 |
12-Jun-2015 |
avg |
MFC r283525: zfs: fixes for a full stream received into an existing dataset |
284203 |
10-Jun-2015 |
kib |
MFC r283602: Prevent dounmount() from acting on the freed (although type-stable) memory by changing the interface to require the mount point to be referenced.
MFC r283629: Add missed {}. |
284138 |
07-Jun-2015 |
pfg |
MFC r278167, MFV r266995:
4767 dtrace_probe() always has the timestamp
Reference: https://illumos.org/issues/4767
Obtained from: Illumos |
284136 |
07-Jun-2015 |
pfg |
MFC r278166, MFV r266993:
4469 DTrace helper tracing should be dynamic
Reference: https://illumos.org/issues/4469
Obtained from: Illumos Phabric: D1551 Reviewed by: markj |
284134 |
07-Jun-2015 |
markj |
MFC r278136, r278137, r278370: Diff reduction with illumos, in preparation for merging r266993 from the vendor branch. No functional change. |
284028 |
05-Jun-2015 |
avg |
MFC r283524: dsl_dataset_promote_check: ensure that shared snaps do not become too long |
284026 |
05-Jun-2015 |
avg |
MFC r282766: zfs ioctls: use fget_write / fget_read instead of getf wrapper for fget |
283872 |
01-Jun-2015 |
kib |
MFC r283515: Remove excess Giant acquisition around the dounmount() call. |
283677 |
29-May-2015 |
markj |
MFC r277915: Don't attempt to disable enabled fasttrap probes in an exiting process.
MFC r277914: fasttrap_sigtrap(): use tdsendsignal() to send SIGTRAP. |
283676 |
29-May-2015 |
markj |
MFC r281915: Make vpanic() externally visible.
MFC r281916: Fix DTrace's panic() action. |
283522 |
25-May-2015 |
avg |
MFC r282475: zfs: do not hold an extra reference on a root vnode |
283520 |
25-May-2015 |
avg |
MFC r282473: dmu_recv_end_check: don't leak hold if dsl_destroy_snapshot_check_impl fails |
283519 |
25-May-2015 |
avg |
MFC r282632: MFV r282630: 5809 Blowaway full receive in v1 pool causes kernel panic |
282995 |
16-May-2015 |
smh |
MFC r282880:
Add copyright info missing from r282205
Sponsored by: Multiplay |
282813 |
12-May-2015 |
smh |
MFC r282205:
Fix misuse of input argument in traverse_visitbp
Obtained from: zfsonlinux (a585f2f844ed3d4270221fed88f5e494eb55d932) Sponsored by: Multiplay |
282764 |
11-May-2015 |
avg |
MFC r282131: replace a comment about zfs recv -F corner case with a longer one |
282760 |
11-May-2015 |
avg |
MFC r282130: zfs_onexit_fd_hold: return EBADF even if devfs_get_cdevpriv gave ENOENT |
282758 |
11-May-2015 |
avg |
MFC r282127: dsl_dir_rename_check: return EXDEV on cross-pool rename attempt |
282756 |
11-May-2015 |
avg |
MFC r282126: MFV r282123: 5610 zfs clone from different source and target pools |
282755 |
11-May-2015 |
avg |
MFC r282125: MFV r282124: 5393 spurious failures from dsl_dataset_hold_obj() |
282753 |
11-May-2015 |
avg |
MFC r282122: nvpair_type_is_array: DATA_TYPE_INT8_ARRAY was not recognized |
282748 |
11-May-2015 |
avg |
MFC r275576: remove opensolaris cyclic code, replace with high-precision callouts |
282361 |
03-May-2015 |
mav |
MFC r281026, r281108, r281109: Make ZFS ARC track both KVA usage and fragmentation.
Even on Illumos, with its much larger KVA, ZFS ARC steps back if KVA usage reaches a certain threshold (3/4 on i386 or 16/17 otherwise). FreeBSD has even less KVA, but had no such limit on architectures with a direct map, such as amd64. As a result, on machines with a lot of RAM, during load with very small user-space memory pressure, such as `zfs send`, it was possible to reach a state where there is enough of both physical RAM and KVA (I've seen up to 25-30%), but no contiguous KVA range to allocate even a single 128KB I/O request.
Address this situation from two sides:
- restore KVA usage limitations in a way as close to Illumos as possible;
- introduce a new requirement for KVA fragmentation, specifying that we should have at least one sequential KVA range of zfs_max_recordsize bytes.
Experiments show that the first limitation alone is not sufficient. On a machine with 64GB of RAM it is sometimes necessary to drop up to half of the ARC size to get at least one 1MB KVA chunk. Statically limiting ARC to half of KVA/RAM is too strict, so the second limitation makes it work in cycles: accumulate trash up to a certain critical mass, do massive spring-cleaning, and then start littering again. |
281958 |
25-Apr-2015 |
delphij |
MFC r281667:
Remove vfs.zfs.snapshot_list_prefetch, the corresponding code was gone in r248571 already. |
281104 |
05-Apr-2015 |
mav |
MFC r280822: Some cosmetic polishing. No functional change. |
280753 |
27-Mar-2015 |
mav |
MFC r279927: Make DIOCGATTR in device mode handle "GEOM::candelete". |
278142 |
03-Feb-2015 |
mav |
MFC r277419: Allow skipping dmu_buf_will_dirty() call in dsl_dir_transfer_space().
dsl_dir_transfer_space() is mostly called after dsl_dir_diduse_space(), which already calls dmu_buf_will_dirty() for the same dbuf and tx, so its duplicate call in those cases will change nothing, only spend time.
Skipping this call reduces by a factor of four the time spent in dbuf_write_done() and descendants, which update dataset statistics with several congested lock acquisitions. When rewriting 8K zvol blocks at 1GB/s rate, this reduces CPU time spent inside dbuf_write_done(), according to profiling, from 45% of 683K samples to 18% of 422K. |
278028 |
01-Feb-2015 |
smh |
MFC r276123: Always sync the global ZFS config cache to reflect the new mosconfig
MFC r277351: Clean ZFS spa config before syncing
Sponsored by: Multiplay |
277819 |
28-Jan-2015 |
mav |
MFC r277185: Fix overflow bug from r248577, turning 30s TRIM timeout into ~4s. |
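The log does not say where the overflow occurred. One mechanism that reproduces the stated numbers, assumed here purely for illustration, is a seconds-to-nanoseconds product truncated to 32 bits: 30 * 10^9 mod 2^32 = 4,230,196,224 ns, or about 4.2 s. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define NANOSEC 1000000000U     /* nanoseconds per second */

/* Product computed in 32 bits: wraps modulo 2^32. */
static uint64_t
timeout_ns_truncated(uint32_t seconds)
{
    uint32_t ns = seconds * NANOSEC;

    return (ns);
}

/* Widen one operand first so the product is computed in 64 bits. */
static uint64_t
timeout_ns_widened(uint32_t seconds)
{
    return ((uint64_t)seconds * NANOSEC);
}
```

With a 30-second input the truncated version yields 4,230,196,224 ns while the widened one yields the intended 30,000,000,000 ns.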
277818 |
28-Jan-2015 |
mav |
MFC r277169: Reimplement TRIM throttling added in r248577.
The previous throttling implementation approached the problem from the wrong side. It significantly limited useful delaying of TRIM requests and aggregation potential, while doing little to control TRIM burstiness under heavy load.
With this change random 4K write benchmarks (probably the worst case for TRIM) show me IOPS increase by 20%, average latency reduction by 30%, peak TRIM bursts reduction by 3 times and same peak TRIM map size (memory usage).
Also the new logic does not force map size down so heavily, really allowing to keep deleted data for 32 TXG or 30 seconds under moderate load. It was practically impossible with old throttling logic, which pushed map down to only 64 segments. |
277760 |
26-Jan-2015 |
mav |
MFC r277096: Skip extra bcopy() when scrubbing vdev without redundancy.
According to profiler, this bcopy() can use about 10% of CPU time. |
277701 |
25-Jan-2015 |
mav |
MFC r276983: When aggregating TRIM segments, move the new one to the end.
A new segment at the list head may block all TRIM requests until the txg of that segment can be processed. On my random I/O tests this change reduces peak TRIM list length from 650 to 450 segments. Hopefully it should reduce TRIM burstiness when list processing is unblocked. |
277700 |
25-Jan-2015 |
mav |
MFC r276952: Add LBA as secondary sort key for synchronous I/O requests.
On FreeBSD gethrtime() is implemented via getnanouptime(), which has 1ms (1/hz) precision. This makes collisions of the primary sort key (timestamp) quite likely. In such situations sorting by a secondary key of LBA is much more reasonable than sorting by a totally meaningless zio pointer value.
With this change on multi-threaded synchronous ZVOL read I've measured 10% throughput increase and average latency reduction. |
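A comparator with this key order might look like the sketch below. The struct is a hypothetical stand-in for a zio, and the field names are assumptions; only the ordering of the keys (timestamp, then LBA/offset, then address as a last-resort tie-breaker) comes from the entry above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a queued I/O request. */
struct io_req {
    int64_t  timestamp;     /* primary key; coarse (about 1ms) on FreeBSD */
    uint64_t offset;        /* secondary key: on-disk offset (LBA) */
};

static int
io_req_compare(const struct io_req *a, const struct io_req *b)
{
    if (a->timestamp != b->timestamp)
        return (a->timestamp < b->timestamp ? -1 : 1);
    if (a->offset != b->offset)
        return (a->offset < b->offset ? -1 : 1);
    /* Last resort: the address keeps distinct requests distinct. */
    if (a != b)
        return ((uintptr_t)a < (uintptr_t)b ? -1 : 1);
    return (0);
}
```

When many requests share a timestamp, they now come out in ascending offset order, which is friendlier to the underlying device than an order derived from memory addresses.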
277699 |
25-Jan-2015 |
mav |
MFC r276913: Use new optimized dmu_read_uio_dbuf() for ZVOLs in device mode.
This slightly reduces overhead by avoiding dnode_hold()/dnode_rele() calls. |
277618 |
23-Jan-2015 |
delphij |
MFC r275923:
Add missing continue: we can't proceed further if the kernel does not panic with zfs_panic_recover.
Illumos issue: 5438 zfs_blkptr_verify should continue after zfs_panic_recover
Reported by: Coverity CID: 1232014 |
277588 |
23-Jan-2015 |
delphij |
MFC r275922: MFV r275914:
As of r270383, the dbuf_compare comparator compares the dbuf attributes in the following order:
- db_level (indirect level)
- db_blkid (block number)
- db_state (current state)
- the address of the element
Because db_state is considered before the element's address, changing db_state would affect the balance of the AVL tree, even when the address of the element compares differently. For instance, in dbuf_create, db_state may be altered after the node is inserted into the AVL tree and may break the AVL tree's balance.
Instead of using db_state as a comparison criterion (introduced in r270383), consider it only when we are doing a lookup, that is, when one of the two dbuf pointers contains DB_SEARCH.
Illumos issue: 5422 preserve AVL invariants in dn_dbufs |
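The revised ordering can be sketched as below. The types are simplified, hypothetical stand-ins for the real dbuf structures; the point is that the state only matters when one side is a DB_SEARCH lookup key, so changing the state of an inserted node cannot change its position in the tree.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified state set; DB_SEARCH marks a lookup key, not a real node. */
enum dbuf_state { DB_UNCACHED, DB_CACHED, DB_SEARCH };

struct dbuf {
    uint8_t         level;  /* indirect level */
    uint64_t        blkid;  /* block number */
    enum dbuf_state state;
};

static int
dbuf_compare_sketch(const struct dbuf *d1, const struct dbuf *d2)
{
    if (d1->level != d2->level)
        return (d1->level < d2->level ? -1 : 1);
    if (d1->blkid != d2->blkid)
        return (d1->blkid < d2->blkid ? -1 : 1);
    /* A search key sorts before any real node with the same keys. */
    if (d1->state == DB_SEARCH)
        return (-1);
    if (d2->state == DB_SEARCH)
        return (1);
    /* Real nodes: fall back to the address, ignoring the state. */
    if (d1 != d2)
        return ((uintptr_t)d1 < (uintptr_t)d2 ? -1 : 1);
    return (0);
}
```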
277586 |
23-Jan-2015 |
delphij |
MFC r275811: MFV r275783:
Convert ARC flags to use enum. Previously, public flags are defined in arc.h and private flags are defined in arc.c which can lead to confusion and programming errors.
Consistently use 'hdr' (when referencing arc_buf_hdr_t) instead of 'buf' or 'ab' because arc_buf_t are often named 'buf' as well.
Illumos issue: 5369 arc flags should be an enum 5370 consistent arc_buf_hdr_t naming scheme |
277585 |
23-Jan-2015 |
delphij |
MFC r275782: MFV r275551:
Remove "dbuf phys" db->db_data pointer aliases.
Use function accessors that cast db->db_data to the appropriate "phys" type, removing the need for clients of the dmu buf user API to keep properly typed pointer aliases to db->db_data in order to conveniently access their data.
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zap_leaf.c: In zap_leaf() and zap_leaf_byteswap, now that the pointer alias field l_phys has been removed, use the db_data field in an on stack dmu_buf_t to point to the leaf's phys data.
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dbuf.c: Remove the db_user_data_ptr_ptr field from dbuf and all logic to maintain it.
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dbuf.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dnode.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dbuf.h: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dmu.h: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_dataset.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_dir.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sa.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zap.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zap_micro.c: Modify the DMU buf user API to remove the ability to specify a db_data aliasing pointer (db_user_data_ptr_ptr).
cddl/contrib/opensolaris/cmd/zdb/zdb.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_diff.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_objset.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_send.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_traverse.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_tx.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_bookmark.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_dataset.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_deadlist.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_deleg.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_destroy.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_dir.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_pool.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_prop.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_scan.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_synctask.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_userhold.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sa.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa_history.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zap.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zap_leaf.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zap_micro.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dsl_dataset.h: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dsl_dir.h: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zap_impl.h: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zap_leaf.h: Create and use the new "phys data" accessor functions dsl_dir_phys(), dsl_dataset_phys(), zap_m_phys(), zap_f_phys(), and zap_leaf_phys().
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dsl_dataset.h: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dsl_dir.h: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zap_impl.h: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zap_leaf.h: Remove now unused "phys pointer" aliases to db->db_data from clients of the DMU buf user API.
Illumos issue: 5314 Remove "dbuf phys" db->db_data pointer aliases in ZFS |
277584 |
23-Jan-2015 |
delphij |
MFC r275781: MFV r275550:
In addition to r273158, make the code in spa_sync() that checks if the current TXG is a no-op TXG less fragile.
Illumos issue: 5347 idle pool may run itself out of space |
277583 |
23-Jan-2015 |
delphij |
MFC r275748: MFV r247174:
Expose arc_meta_limit, et al via kstats.
Note that as a result, vfs.zfs.arc_meta_used is removed. The existing vfs.zfs.arc_meta_limit sysctl/tunable is retained with a SYSCTL_PROC wrapper.
Illumos ZFS issues: 3561 arc_meta_limit should be exposed via kstats
Relnotes: yes |
277582 |
23-Jan-2015 |
delphij |
MFC r275740: MFV r275548:
Verify that the block pointer is structurally valid, before attempting to read it in. It can only be invalid in the case of a ZFS bug, but this change will help identify such bugs in a more transparent way, by panic'ing with a relevant message, rather than indexing off the end of an array or something.
Illumos issue: 5349 verify that block pointer is plausible before reading |
277576 |
23-Jan-2015 |
delphij |
MFC r275738: MFV r275546:
Reduce scrub activities when there is enough dirty data, namely when dirty data is more than zfs_vdev_async_write_active_min_dirty_percent (once we start to increase the number of concurrent async writes).
While there, also correct a rounding error which would make scrub end up pausing for (zfs_txg_timeout + 1) seconds instead of the desired zfs_txg_timeout seconds.
Illumos issue: 5351 scrub goes for an extra second each txg 5352 scrub should pause when there is some dirty data |
277575 |
23-Jan-2015 |
delphij |
MFC r275737: MFV r275545:
If zio_checksum_error() returns other than ECKSUM (e.g. EINVAL), it does not fill in the "zio_bad_cksum_t *info" parameter. Caller should not attempt to use it in this case.
Illumos issue: 5348 zio_checksum_error() only fills in info if ECKSUM |
277574 |
23-Jan-2015 |
delphij |
MFC r275736: MFV r275544:
Clean up some duplicated code in dnode_sync() around freeing spill blocks.
Illumos issue: 5350 clean up code in dnode_sync() |
277573 |
23-Jan-2015 |
delphij |
MFC r275735: MFV r275543:
Remove always true tests for ds->ds_phys' presence.
Clean up assertions in dsl_dataset_disown.
Remove unreachable code in dsl_dataset_disown().
Illumos issue: 5310 Remove always true tests for non-NULL ds->ds_phys |
277572 |
23-Jan-2015 |
delphij |
MFC r275734: MFV r275542:
If a dnode has a spill block and there is an error while accessing a data block then traverse_dnode() loses information about that error and returns a status of visiting the spill block.
This issue is discovered by Spectra Logic.
Illumos issue: 5311 traverse_dnode may report success when it should not
Original author: gibbs |
277553 |
23-Jan-2015 |
delphij |
MFC r275594: MFV r275540:
When importing a pool, don't assume that the pool configuration passed at vdev_load is always valid. A stale configuration may come with extra vdevs, in which case metaslab_init() would fail because a lower layer returns an error.
Make metaslab_init() handle errors from the lower layer and return them to the upper layer, which handles them there.
Illumos issue: 5213 panic in metaslab_init due to space_map_open returning ENXIO |
277547 |
23-Jan-2015 |
delphij |
MFC r275562: MFV r275535:
Unexpand ISP2() and MSEC2NSEC().
Illumos issue: 5255 uts shouldn't open-code ISP2 |
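For readers unfamiliar with the two macros, the sketch below shows what they abbreviate. The definitions are written from memory in the style of the illumos headers and should be treated as assumptions; the two helper functions are hypothetical and exist only to exercise them.

```c
#include <assert.h>
#include <stdint.h>

/* True when x has at most one bit set (note: also true for 0). */
#define ISP2(x)         (((x) & ((x) - 1)) == 0)

/* Milliseconds to nanoseconds, widened to 64 bits before multiplying. */
#define MSEC2NSEC(m)    ((int64_t)(m) * 1000000LL)

/* Using the macro instead of open-coding the bit trick at each call site. */
static int
is_valid_block_size(uint64_t size)
{
    return (size != 0 && ISP2(size));
}

static int64_t
delay_in_ns(int delay_ms)
{
    return (MSEC2NSEC(delay_ms));
}
```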
277546 |
23-Jan-2015 |
delphij |
MFC r275561: MFV r275534:
Sync with Illumos. This has no effect on FreeBSD.
Illumos issue: 5285 pass in cpu_pause_func via pause_cpus |
277545 |
23-Jan-2015 |
delphij |
MFC r275533:
Sync with Illumos. This has no effect on FreeBSD.
Illumos issue: 5100 sparc build failed after 5004 |
277483 |
21-Jan-2015 |
smh |
MFC r276063: Standardise on illumos for #ifdef's in zvol.c
MFC r276066: Refactor zvol locking to minimise diff with upstream
MFC r276069: Fix panic when resizing ZFS zvol's
Sponsored by: Multiplay |
277482 |
21-Jan-2015 |
smh |
MFC r272509 (by delphij): Diff reduction with upstream
Sponsored by: Multiplay |
276900 |
10-Jan-2015 |
delphij |
MFC r265218 (smh):
Removed pointless / duplicated call to trim_map_first. |
276899 |
10-Jan-2015 |
delphij |
MFC r264392 (davide):
Fix a panic in zfs_rename(). This is due to a wrong dereference of a vnode when it's not locked and can be (potentially) recycled. 'sdvp' cannot be locked on zfs_rename() entry point because the VFS can't be sure that this scenario is LOR-free (it might violate the parent->child lock acquisition rule). Dereference 'tdvp' instead, which is already locked on entry, and access 'sdvp' fields only when it's safe, i.e. under ZFS_ENTER scope.
While at it, remove the usage of VOP_REALVP, as long as this is a NOP on FreeBSD. |
276648 |
04-Jan-2015 |
kib |
MFC r276007: Handle MAKEENTRY cnp flag in the VOP_CREATE(). |
276500 |
01-Jan-2015 |
kib |
MFC r275897: Set NOCACHE flag for CREATE namei() calls, do not specially handle MAKEENTRY in VOP_LOOKUP(). |
276082 |
22-Dec-2014 |
delphij |
MFC r275530:
Use %d instead of %u for error number. This way we see ERESTART as -1 not 4294967295 when doing DTrace. |
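The difference is easy to reproduce in user space. The sketch below assumes a 32-bit unsigned int and uses a hypothetical helper; the value -1 for ERESTART is taken from the entry above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define ERESTART_VALUE  (-1)    /* kernel-internal value, per the log entry */

/*
 * Format an error number either as signed (%d) or unsigned (%u).
 * Returns a pointer to a static buffer, so the result must be used
 * before the next call.
 */
static const char *
format_errno(int err, int as_signed)
{
    static char buf[32];

    if (as_signed)
        snprintf(buf, sizeof(buf), "%d", err);
    else
        snprintf(buf, sizeof(buf), "%u", (unsigned int)err);
    return (buf);
}
```

Formatting ERESTART_VALUE as signed gives "-1"; as unsigned it gives "4294967295", which is much harder to recognize in DTrace output.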
276081 |
22-Dec-2014 |
delphij |
MFC r274337,r274673,r274681,r275515:
ZFS large block support. The default recordsize remains at 128KB.
A new tunable/sysctl variable, vfs.zfs.max_recordsize is added to allow adjusting the permitted maximum record size, or zfs_max_recordsize, with a default of 1MB. ZFS will not allow setting recordsize greater than zfs_max_recordsize as a safety belt, because larger recordsize means greater read and write latency and more memory usage.
Please note that booting from datasets that have recordsize greater than 128KB is not supported (but it's Okay to enable the feature on the pool).
Limited safety belt is provided for mounted root filesystem but use caution when using a larger value.
Illumos issue: 5027 zfs large block support |
275901 |
18-Dec-2014 |
avg |
MFC r275401: zfs_putpages: actually update mtime and ctime |
275892 |
18-Dec-2014 |
mav |
MFC r275474: Add GET LBA STATUS command support to CTL.
It is implemented for LUNs backed by ZVOLs in "dev" mode and files. GEOM has no such API, so for LUNs backed by raw devices all LBAs will be reported as mapped/unknown.
Sponsored by: iXsystems, Inc. |
275609 |
08-Dec-2014 |
avg |
MFC r274628: l2arc: restore correct rounding up of asize of compressed data |
275492 |
05-Dec-2014 |
delphij |
MFC r274172 (avg)
fix l2arc compression buffers leak
We have observed that arc_release() can be called concurrently with a l2arc in-flight write. Also, we have observed that arc_hdr_destroy() can be called from arc_write_done() for a zio with ZIO_FLAG_IO_REWRITE flag in similar circumstances.
Previously the l2arc headers would be freed while leaking their associated compression buffers. Now the buffers are placed on l2arc_free_on_write list for delayed freeing. This is similar to what was already done to arc buffers that were supposed to be freed concurrently with in-flight writes of those buffers.
In addition to fixing the discovered leaks this change also adds some protective code to assert that a compression buffer associated with a l2arc header is never leaked.
A new kstat l2_cdata_free_on_write is added. It keeps a count of delayed compression buffer frees which previously would have been leaks.
Tested by: Vitalij Satanivskij <satan@ukr.net> et al Requested by: many Sponsored by: HybridCluster / ClusterHQ
This is a 10.1-RELEASE errata candidate. |
275490 |
04-Dec-2014 |
delphij |
MFC r274674:
Add a tunable for spa_slop_shift which controls how much space we would reserve by default. Tuning is not recommended.
Relnotes: yes |
275488 |
04-Dec-2014 |
delphij |
MFC r274276: MFV r274271:
Improve zdb -b performance:
- Reduce gethrtime() calls to 1/100th of blkptrs;
- Skip manipulating the size-ordered tree;
- Issue more (10, previously 3) async reads;
- Use lighter weight testing in traverse_visitbp().
Illumos issue: 5243 zdb -b could be much faster |
274800 |
21-Nov-2014 |
smh |
MFC r274619: Disable TRIM on file backed ZFS vdevs and fix TRIM on init
Sponsored by: Multiplay |
274732 |
20-Nov-2014 |
mav |
MFC r274154, r274163: Add to CTL support for logical block provisioning threshold notifications.
For ZVOL-backed LUNs this allows to inform initiators if storage's used or available spaces get above/below the configured thresholds.
Sponsored by: iXsystems, Inc. |
274625 |
17-Nov-2014 |
avg |
MFC r272708: l2arc_write_buffers: reduce headroom value |
274326 |
09-Nov-2014 |
jpaetzel |
MFC: 273641
This change addresses 4 bugs in ZFS exposed by Richard Kojedzinszky's crash.sh script attached to FreeNAS bug 4109: https://bugs.freenas.org/issues/4109
Three are in the snapshot layer: a) AVG explains in his notes: https://wiki.freebsd.org/AvgVfsSolarisVsFreeBSD
"VOP_INACTIVE must not do any destructive actions to a vnode and its filesystem node, nor invalidate them in any way." gfs_vop_inactive and zfsctl_snapshot_inactive did just that. In OpenSolaris VOP_INACTIVE is much closer to FreeBSD's VOP_RECLAIM. Rename & move them to gfs_vop_reclaim and zfsctl_snapshot_reclaim and merge in the requisite vnode_destroy from zfsctl_common_reclaim.
b) gfs_lookup_dot and various zfsctl functions do not honor the FreeBSD VFS convention of only locking from the root downward. When looking up ".." the convention is to drop the current leaf vnode lock before acquiring the directory vnode and then subsequently re-acquiring the lock on the leaf vnode. This fixes that in all the places that are exercised by crash.sh.
c) The snapshot may already be unmounted when the directory vnode is reclaimed. Check for this case and return.
One in the common layer: d) Callers of traverse expect the reference to the vnode passed in to be maintained. Don't release it.
This last one may be an unclear contract. There may in fact be some callers that do expect the reference to be dropped on success in addition to callers that expect it to be released. In this case a further audit of the callers is needed and a consensus on the correct behavior.
PR: 184677 Submitted by: kmacy Reviewed by: delphij, will, avg Sponsored by: iXsystems |
273984 |
02-Nov-2014 |
delphij |
MFC r273026:
Add a tunable for arc_shrink_shift (vfs.zfs.arc_shrink_shift) that controls what fraction, 1/2^arc_shrink_shift, should be reclaimed when there is memory pressure.
Submitted by: Richard Kojedzinszky <krichy at tvnetwork.hu> |
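The fraction 1/2^arc_shrink_shift amounts to a right shift of the current size. The helper below is a hypothetical illustration of that arithmetic, not the ARC code; the shift value 5 used in the example is arbitrary.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Amount to reclaim under memory pressure: size / 2^shrink_shift.
 * Shifts of 64 or more would be undefined in C, so treat them as
 * "reclaim nothing".
 */
static uint64_t
arc_shrink_amount(uint64_t arc_size, unsigned int shrink_shift)
{
    if (shrink_shift >= 64)
        return (0);
    return (arc_size >> shrink_shift);
}
```

For example, with an 8 GiB cache and a shift of 5, each reclaim step targets 8 GiB / 32 = 256 MiB; a larger shift makes each step smaller.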
273983 |
02-Nov-2014 |
delphij |
MFC r273267:
Add tunable vfs.zfs.space_map_blksz for space map's maximum block size. |
273736 |
27-Oct-2014 |
hselasky |
MFC r263710, r273377, r273378, r273423 and r273455:
- De-vnet hash sizes and hash masks.
- Fix multiple issues related to arguments passed to SYSCTL macros.
Sponsored by: Mellanox Technologies |
273510 |
23-Oct-2014 |
delphij |
MFC r272810: MFV r272804:
Refactor the code and stop restore_object from creating two transactions.
Illumos issue: 3693 restore_object uses at least two transactions to restore an object |
273509 |
23-Oct-2014 |
delphij |
MFC r272809: MFV r272803:
Illumos issue: 5175 implement dmu_read_uio_dbuf() to improve cached read performance |
273350 |
20-Oct-2014 |
delphij |
MFC r272601: MFV r272591:
Use loaned ARC buffer for zfs receive to avoid copy.
Illumos issue: 5162 zfs recv should use loaned arc buffer to avoid copy |
273348 |
20-Oct-2014 |
delphij |
MFC r272598: MFV r272585:
Split the godfather zio into one zio per CPU to reduce lock contention.
Illumos issue: 5176 lock contention on godfather zio |
273347 |
20-Oct-2014 |
delphij |
MFC r272584: MFV r272501:
Illumos issue: 5177 remove dead code from dsl_scan.c |
273346 |
20-Oct-2014 |
delphij |
MFC r272511: MFV r272499:
Illumos issue: 5174 add sdt probe for blocked read in dbuf_read() |
273345 |
20-Oct-2014 |
delphij |
MFC r272510: MFV r272498:
Add a new sysctl, vfs.zfs.vol.unmap_enabled, which allows the system administrator to toggle whether ZFS should ignore UNMAP requests.
Illumos issue: 5149 zvols need a way to ignore DKIOCFREE |
273343 |
20-Oct-2014 |
delphij |
MFC r272507: MFV r272496:
Add tunable for number of metaslabs per vdev (vfs.zfs.vdev.metaslabs_per_vdev). The default remains at 200.
Illumos issue: 5161 add tunable for number of metaslabs per vdev |
273341 |
20-Oct-2014 |
delphij |
MFC r272504: MFV r272494:
Make space_map_truncate() always do space_map_reallocate(). Without this, setting space_map_max_blksz would cause a panic for an existing pool, as dmu_objset_set_blocksize would fail if the object has multiple blocks.
Illumos issues: 5164 space_map_max_blksz causes panic, does not work 5165 zdb fails assertion when run on pool with recently-enabled spacemap_histogram feature |
273195 |
16-Oct-2014 |
delphij |
MFC r272583: MFV r272500:
Don't inherit flags other than DS_FLAG_CI_DATASET and DS_FLAG_INCONSISTENT when cloning. This prevents DS_FLAG_DEFER_DESTROY being inherited from a clone that is marked for deferred destroy, which causes snapshots of the clone being destroyed when getting a hold or clone.
Illumos issue: 5150 zfs clone of a defer_destroy snapshot causes strangeness |
273194 |
16-Oct-2014 |
delphij |
MFC r272527:
Don't make nested definition for range_seg_cache.
Reported by: ian |
273193 |
16-Oct-2014 |
delphij |
MFC r272506: MFV r272495:
In arc_kmem_reap_now(), reap range_seg_cache too, to reclaim memory in response to memory pressure.
Illumos issue: 5163 arc should reap range_seg_cache |
273191 |
16-Oct-2014 |
delphij |
MFV r273060:
Use write_psize instead of write_asize when doing vdev_space_update. Without this change the accounting of L2ARC usage would be wrong and report 16EB of free space because the number became negative and overflowed.
Obtained from: FreeNAS (issue #6239) |
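The 16EB figure is what an unsigned 64-bit counter shows after being driven below zero. The log does not spell out the exact accounting path, so the sketch below only demonstrates the general effect under an assumed scenario: a counter credited with one size measure and debited with a larger one. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Unsigned space accounting: credit 'added', debit 'removed'.
 * If more is removed than was ever added, the result wraps around
 * to a value just below 2^64 bytes (about 16 EiB).
 */
static uint64_t
space_after_update(uint64_t used, uint64_t added, uint64_t removed)
{
    return (used + added - removed);
}
```

Crediting 4096 bytes and later debiting 8192 leaves 2^64 - 4096, which a reporting tool rounds to 16E. Using the same size measure on both sides keeps the counter consistent.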
273161 |
16-Oct-2014 |
smh |
MFC r273158: Prevent ZFS leaking pool free space
Early MFC approved by re@
Approved by: re@ (glebius) Sponsored by: Multiplay |
273110 |
14-Oct-2014 |
pfg |
MFC r267851:
Continue the crusade towards a dev_clone()-free kernel, removing its usage from dtrace. The dtrace code already uses cdevpriv(9) since FreeBSD 8, so this change is quite harmless.
Originally by: davide Reviewed by: markj |
273057 |
13-Oct-2014 |
delphij |
Fix a missed merge introduced in r272883. |
272883 |
10-Oct-2014 |
smh |
MFC r272474: Fix various issues with zvols
Sponsored by: Multiplay |
272882 |
10-Oct-2014 |
smh |
MFC r271589: Added missing ZFS sysctls
This also includes small additional direct changes as it still uses the old way of handling tunables.
Sponsored by: Multiplay |
272879 |
10-Oct-2014 |
smh |
MFC r271754: Remove unused ZFS ARC functions
Sponsored by: Multiplay |
272875 |
10-Oct-2014 |
smh |
MFC r270759: Refactor ZFS ARC reclaim logic to be more VM cooperative
MFC r270861: Ensure that ZFS ARC free memory checks include cached pages
MFC r272483: Refactor ZFS ARC reclaim checks and limits
Sponsored by: Multiplay |
272676 |
07-Oct-2014 |
araujo |
Make external NFS clients aware when files have their attributes changed, and avoid caching the file's state indefinitely. The va_filerev is what is sent to the client as the "change" attribute; the client periodically fetches the attributes, and without this change the attribute remains some garbage value.
Phabric: D905 Reported by: Kevin Buhr <buhr@asaurus.net> Reviewed by: rmacklem, delphij Approved by: delphij Obtained from: r272467 Sponsored by: QNAP Systems Inc. |
272665 |
06-Oct-2014 |
delphij |
MFC r271532: MFV r271515:
Add a new tunable/sysctl, vfs.zfs.free_max_blocks, which can be used to limit how many blocks can be freed before a new transaction group is created. The default is no limit (infinite), but we should probably have a lower default, e.g. 100,000.
With this limit, we can guard against the case where ZFS could run out of memory when destroying large numbers of blocks in a single transaction group, as the entire DDT needs to be brought into memory.
Illumos issue: 5138 add tunable for maximum number of blocks freed in one txg |
272615 |
06-Oct-2014 |
mav |
MFC r271308: Make ZVOL writes in device mode support IO_SYNC flag. |
272456 |
02-Oct-2014 |
delphij |
MFC r271528: MFV r271512:
Illumos issue: 5136 fix write throttle comment in dsl_pool.c
Approved by: re (gjb) |
272332 |
30-Sep-2014 |
delphij |
MFC r271526: MFV r271510:
Enforce 4K as smallest indirect block size (previously the smallest indirect block size was 1K but that was never used).
This makes some space estimates more accurate and uses less memory for some data structures.
Illumos issue: 5141 zfs minimum indirect block size is 4K
Approved by: re (gjb) |
272134 |
25-Sep-2014 |
delphij |
MFC r271536: MFV r271518:
Correctly report hole at end of file.
When asked to find a hole, the DMU sees that there are no holes in the object, and returns ESRCH. The ZPL interprets this as "no holes before the end of the file", and therefore inserts the "virtual hole" at the end of the file. Because DMU and ZPL have different ideas of where the end of an object/file is, we will end up returning the end of file, which is generally larger, instead of returning the end of object.
The fix is to handle the "virtual hole" in the DMU. If no hole is found, the DMU will return a hole at the end of the file, rather than an error.
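The intended SEEK_HOLE behaviour can be illustrated with a toy model. This is only a sketch of the semantics described above, not the DMU code; the extent list representation and the function name are assumptions made for the example:

```python
def find_hole(data_extents, file_size, offset):
    """Return the offset of the first hole at or after `offset`.

    data_extents: sorted, non-overlapping (start, end) byte ranges holding data.
    If no real hole exists before file_size, report the virtual hole at EOF.
    """
    if offset >= file_size:
        raise ValueError("offset is at or beyond end of file")
    pos = offset
    for start, end in data_extents:
        if end <= pos:
            continue      # extent lies entirely before the search position
        if start > pos:
            return pos    # gap before this extent: a real hole
        pos = end         # inside data, skip to the end of the extent
    # Either a trailing hole, or no hole at all: never report past EOF.
    return min(pos, file_size)


print(find_hole([(0, 4096)], 4096, 0))
print(find_hole([(0, 4096), (8192, 12288)], 12288, 0))
```

A fully populated file yields its own size (the virtual hole), while a sparse file yields the start of the first gap.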
Illumos issue: 5139 SEEK_HOLE failed to report a hole at end of file
Approved by: re (gjb) |
272133 |
25-Sep-2014 |
delphij |
MFC r271534: MFV r271517:
In zil_claim, don't issue warning if we get EBUSY (inconsistent) when opening an objset, instead, ignore it silently.
Illumos issue:
5140 message about "%recv could not be opened" is printed when booting after crash
Approved by: re (gjb) |
271776 |
18-Sep-2014 |
smh |
MFC r271429: Persist vdev_resilver_txg changes to avoid panic caused by validation vs a vdev_resilver_txg value from a previous resilver.
Approved by: re (glebius) Sponsored by: Multiplay |
271683 |
16-Sep-2014 |
smh |
MFC 265253: Don't treat TRIM requests returning ENOTSUP as an unexpected error
Approved by: re (gjb) Sponsored by: Multiplay |
271435 |
11-Sep-2014 |
smh |
MFC r266497: Add sysctls for ZFS dirty data tuning.
MFC r266533: Improve sysctl descriptions for new ZFS sysctls.
Approved by: re (marius) Sponsored by: Multiplay |
271392 |
10-Sep-2014 |
delphij |
MFC r271226: MFV r271223:
In dnode_sync(), do dnode_increase_indirection() before processing the dn_next_nblkptr.
Illumos issue: 5117 space map reallocation can cause corruption
Approved by: re (gjb) |
271238 |
07-Sep-2014 |
smh |
MFC r256956: Improve ZFS N-way mirror read performance by using load and locality information.
MFC r260713: Fix ZFS mirror code for handling multiple DVA's
Also make the addition of the d_rotation_rate binary compatible. This allows storage drivers compiled for 10.0 to work by preserving the ABI for disks.
Approved by: re (gjb) Sponsored by: Multiplay |
271002 |
03-Sep-2014 |
delphij |
MFC r270248: MFV r270196:
Illumos issue: 5047 don't use atomic_*_nv if you discard the return value |
271001 |
03-Sep-2014 |
delphij |
MFC r270247: MFV r270195:
Illumos issue: 5045 use atomic_{inc,dec}_* instead of atomic_add_* |
270998 |
03-Sep-2014 |
delphij |
MFC r270239: MFV r270193:
Illumos issues: 5042 stop using deprecated atomic functions |
270809 |
29-Aug-2014 |
delphij |
MFC r270383: MFV r270198:
Instead of using timestamp in the AVL, use the memory address when comparing.
Illumos issue: 5095 panic when adding a duplicate dbuf to dn_dbufs |
270312 |
21-Aug-2014 |
smh |
MFC r265152 - Reintroduce priority for the TRIM ZIOs instead of using the "NOW" priority
MFC r265321 - Fix double fault panic when returning EOPNOTSUPP
MFC r269407 - Don't return ZIO_PIPELINE_CONTINUE from vdev_op_io_start methods
Sponsored by: Multiplay |
270294 |
21-Aug-2014 |
markj |
MFC r269525: Return 0 for the PPID of threads in process 0, as process 0 doesn't have a parent process. |
270128 |
18-Aug-2014 |
delphij |
MFC r269543: MFV r269542:
In vdev_get_stats, check that the vdev is not a hole before computing the fragmentation. This fixes a panic when removing log device.
Illumos issue: 5049 panic when removing log device |
270127 |
18-Aug-2014 |
delphij |
MFC r269431: MFV r269427:
In dnode_children_t, use C99's "[]" idiom for declaring the variable sized array dnc_children at the end of the structure.
This prevents the compiler from mistakenly optimizing away accesses beyond the array's defined size.
Illumos issue: 5038 Remove "old-style" flexible array usage in ZFS. Author: Justin T. Gibbs <justing@spectralogic.com> |
269846 |
12-Aug-2014 |
delphij |
MFC r269230: MFV r269224:
Increase default ARC buf_hash_table size. When typical block size is small, the hash table could be too small, which would lead to long hash chains and limit performance for cached reads.
A new loader tunable, vfs.zfs.arc_average_blocksize, has been added which allows users to override the default assumption of average (typical) block size. Old default was 65536 (64 KiB) and new default is 8192 (8 KiB).
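The effect of the smaller default can be estimated with a short calculation. The sketch below assumes the table size starts at a small power of two and doubles until table size times average block size covers physical memory; the starting size and the function name are assumptions for illustration:

```python
def buf_hash_buckets(physmem_bytes, average_blocksize, start=1 << 12):
    """Smallest power-of-two bucket count covering physmem at one block per bucket."""
    hsize = start
    while hsize * average_blocksize < physmem_bytes:
        hsize <<= 1
    return hsize


GIB = 1 << 30
old = buf_hash_buckets(16 * GIB, 65536)  # old 64 KiB assumption
new = buf_hash_buckets(16 * GIB, 8192)   # new 8 KiB assumption
print(old, new, new // old)
```

Under these assumptions a 16 GiB machine gets eight times as many buckets, which shortens hash chains when blocks are small.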
Illumos issue: 5034 ARC's buf_hash_table is too small |
269845 |
12-Aug-2014 |
delphij |
MFC r269229,269404,269466: MFV r269223:
Change dn->dn_dbufs from linked list to AVL tree.
Illumos issues: 4873 zvol unmap calls can take a very long time for larger datasets |
269774 |
10-Aug-2014 |
delphij |
MFC r269138:
Add two sysctls for newly added tunables. |
269773 |
10-Aug-2014 |
delphij |
MFC r269118: MFV r269010:
Import Illumos changes to address the following Illumos issues:
4976 zfs should only avoid writing to a failing non-redundant top-level vdev
4978 ztest fails in get_metaslab_refcount()
4979 extend free space histogram to device and pool
4980 metaslabs should have a fragmentation metric
4981 remove fragmented ops vector from block allocator
4982 space_map object should proactively upgrade when feature is enabled
4984 device selection should use fragmentation metric |
269756 |
09-Aug-2014 |
markj |
MFC r259211: Correct the check for errors from proc_rwmem(). |
269733 |
08-Aug-2014 |
delphij |
MFC r269093:
Transform the I/O when vdev_physical_ashift is greater than SPA_MINBLOCKSHIFT. |
269732 |
08-Aug-2014 |
delphij |
MFC r269086:
As of r268075, the responsibility of rounding up the buffer to optimal size has been transferred from zio_compress_data to its caller. Therefore, passing the 'minblocksize' down will be a no-op.
Eliminate the parameter to reduce diff against upstream. |
269557 |
05-Aug-2014 |
markj |
MFC r267759, r267761
r267759: Fix a couple of bugs on amd64 when fetching probe arguments beyond the first five for probes entered through a UD fault (i.e. FBT probes).
Specifically, handle the fact that dtrace_invop_callsite must be 16 byte-aligned and thus may not immediately follow the call to dtrace_invop() in dtrace_invop_start(). Also fetch register arguments and the stack pointer through a struct trapframe instead of a struct reg.
r267761: Fix some bugs when fetching probe arguments in i386. Firstly ensure that the 4 byte-aligned dtrace_invop_callsite can be found and that it immediately follows the call to dtrace_invop(). Secondly, fix some pointer arithmetic to account for differences between struct i386_frame and illumos' struct frame. Finally, ensure that dtrace_getarg() isn't inlined. It works by following a fixed number of frame pointers to the probe site, so inlining breaks it.
PR: 191260 |
269531 |
04-Aug-2014 |
markj |
MFC r256822: When fetching function arguments out of a frame on amd64, explicitly select the register based on the argument index rather than relying on the fields in struct reg to be in the right order. This assumption is incorrect on FreeBSD and generally led to bogus argument values for the sixth argument of PID and USDT probes; the first five are passed directly to dtrace_probe() via the fasttrap trap handler and so were correctly handled. |
269520 |
04-Aug-2014 |
markj |
MFC r256571: Add a function, memstr, which can be used to convert a buffer of null-separated strings to a single string. This can be used to print the full arguments of a process using execsnoop (from the DTrace toolkit) or with the following one-liner:
dtrace -n 'syscall::execve:return {trace(curpsinfo->pr_psargs);}'
Note that this relies on the process arguments being cached via the struct proc, which means that it will not work for argvs longer than kern.ps_arg_cache_limit. However, the following rather non-portable script can be used to extract any argv at exec time:
fbt::kern_execve:entry { printf("%s", memstr(args[1]->begin_argv, ' ', args[1]->begin_envv - args[1]->begin_argv)); }
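What memstr does to an argument buffer can be modelled in a few lines. This is a sketch of the described behaviour, not the DTrace implementation; dropping the trailing NUL terminators is an assumption:

```python
def memstr(buf, sep, length):
    """Join the NUL-separated strings in the first `length` bytes of buf."""
    data = buf[:length].rstrip(b"\0")  # assumed: trailing NULs are dropped
    return data.replace(b"\0", sep.encode()).decode(errors="replace")


argv = b"ls\0-l\0/tmp\0"
print(memstr(argv, " ", len(argv)))
```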
The debug.dtrace.memstr_max sysctl limits the maximum argument size to memstr(). |
269494 |
04-Aug-2014 |
kib |
MFC r269189: Initialize zfs vnode v_hash when the vnode is allocated. |
269419 |
02-Aug-2014 |
delphij |
MFC r268865: MFV r268852:
Reduce lock contention on the z_teardown_lock under heavily cached read workload by splitting the single teardown rrw lock into RRM_NUM_LOCKS (17) of them.
Read acquisitions are randomly distributed among these locks based on curthread pointer. Write acquisitions are going to all the locks, which for the usage of this type of lock should be rare.
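The distribution scheme can be sketched as follows. Reducing the low 32 bits of the thread pointer modulo the lock count is an assumption about how "based on curthread pointer" is implemented, and the thread addresses are invented for the example:

```python
RRM_NUM_LOCKS = 17  # value given in the commit message


def reader_lock_index(thread_ptr):
    """Pick one lock for a reader from the low 32 bits of its thread pointer."""
    return (thread_ptr & 0xFFFFFFFF) % RRM_NUM_LOCKS


def writer_lock_indices():
    """A writer has to take every lock."""
    return list(range(RRM_NUM_LOCKS))


# Hypothetical thread addresses, spaced like separately allocated structures.
threads = [0xFFFFF80012340000 + i * 0x4A0 for i in range(1000)]
used = {reader_lock_index(t) for t in threads}
print(len(used), len(writer_lock_indices()))
```

Readers on different threads mostly land on different locks, so they stop contending on a single cache line, while the rare writer pays the cost of taking all of them.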
Illumos issue: 5008 lock contention (rrw_exit) while running a read only load |
269418 |
02-Aug-2014 |
delphij |
MFC r268859: MFV r268851:
When a sync task is waiting for a txg to complete, we should hurry it along by increasing the number of outstanding async writes (i.e. make vdev_queue_max_async_writes() return a larger number).
Illumos issue: 4753 increase number of outstanding async writes when sync task is waiting |
269417 |
02-Aug-2014 |
delphij |
MFC r268858: MFV r268850:
Change the interaction between the DMU and ARC so that when the DMU is shutting down an objset, we do not evict the data from the ARC. Instead we simply coordinate the destruction of the DMU's data with the ARC.
The only case where we actually need to explicitly evict from the ARC is when dbuf_rele_and_unlock() determines that the administrator has requested that it not be kept in memory, via the primarycache/secondarycache properties. In this case, we evict the data from the ARC by its blkptr_t, the same way as when a block is freed we explicitly evict it from the ARC.
Illumos issue: 4631 zvol_get_stats triggering too many reads |
269416 |
02-Aug-2014 |
delphij |
MFC r268855: MFV r268848:
Instead of asserting all zio's be properly aligned, only assert on the logical ones.
Cap uberblocks at 8k, otherwise with ashift=17, there would be only one uberblock.
This fixes a problem that zdb would trip assert on pools with ashift >= 0xe (8k).
While there, also change the code so it only attempts to condense the space map if the uncondensed size consumes more than zfs_metaslab_condense_block_threshold blocks.
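The uberblock cap mentioned above is easy to check with arithmetic. The sketch assumes a 128 KiB uberblock ring per label and a 1 KiB minimum slot size; both constants are assumptions here, while the 8 KiB cap comes from the commit message:

```python
UBERBLOCK_RING_BYTES = 128 * 1024  # assumed size of the per-label uberblock ring
MIN_UBERBLOCK_SHIFT = 10           # assumed 1 KiB minimum uberblock slot
MAX_UBERBLOCK_SHIFT = 13           # the 8 KiB cap described above


def uberblock_slots(ashift, capped=True):
    """Number of uberblock slots that fit in the ring for a given ashift."""
    shift = max(ashift, MIN_UBERBLOCK_SHIFT)
    if capped:
        shift = min(shift, MAX_UBERBLOCK_SHIFT)
    return UBERBLOCK_RING_BYTES >> shift


for ashift in (9, 12, 13, 17):
    print(ashift, uberblock_slots(ashift, capped=False), uberblock_slots(ashift))
```

Without the cap, ashift=17 leaves room for a single uberblock; with it, the ring keeps 16 slots.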
Illumos issue: 4958 zdb trips assert on pools with ashift >= 0xe |
269342 |
31-Jul-2014 |
markj |
MFC r264434: DTrace's pid provider works by inserting breakpoint instructions at probe sites and installing a hook at the kernel's trap handler. The fasttrap code will emulate the overwritten instruction in some common cases, but otherwise copies it out into some scratch space in the traced process' address space and ensures that it's executed after returning from the trap.
In Solaris and illumos, this (per-thread) scratch space comes from some reserved space in TLS, accessible via the fs segment register. This approach is somewhat unappealing on FreeBSD since it would require some modifications to rtld and jemalloc (for static TLS) to ensure that TLS is executable, and would thus introduce dependencies on their implementation details. I think it would also be impossible to safely trace static binaries compiled without these modifications.
This change implements the functionality in a different way, by having fasttrap map pages into the target process' address space on demand. Each page is divided into 64-byte chunks for use by individual threads, and fasttrap's process descriptor struct has been extended to keep track of any scratch space allocated for the corresponding process.
With this change it's possible to trace all libc functions in a program, e.g. with
pid$target:libc.so.*::entry {@[probefunc] = count();}
Previously this would generally cause the victim process to crash, as tracing memcpy on amd64 requires the functionality described above. |
269219 |
29-Jul-2014 |
delphij |
MFC r268720: MFV r268714:
Improve extreme rewind import.
When doing an "extreme rewind" import ("zpool import -XF"), we attempt to verify all data in the pool, essentially scrubbing the entire pool. The problem is that spa_load_verify_cb() issues an unbounded number of concurrent scrub i/os. This can lead to all of memory being used for these zio's, wedging the system. Like normal scrub, we need to put a cap on the number of outstanding i/os, and have the traverse thread block when we reach this cap.
For this purpose the cap can be very large (10,000) to optimize the elevator algorithm. Three kernel tunables have been added:
vfs.zfs.spa_load_verify_maxinflight
vfs.zfs.spa_load_verify_metadata
vfs.zfs.spa_load_verify_data
The latter two tunables control whether metadata and/or user data are verified when doing extreme rewind.
Make 'zpool import -T' imply scrub.
Make zpool import -T <txg> accept hexadecimal values for the txg when prefixed with 0x.
Skip txg's for which there is no uberblock when doing extreme rewind.
Skip reading all user data twice by skipping prefetches when doing extreme rewinds as we do not access via the ARC.
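The in-flight cap described above, where the traversal blocks once too many I/Os are outstanding, follows a common pattern. A minimal sketch using a condition variable; the class and function names are hypothetical and the "I/O" completes immediately on a worker thread:

```python
import threading


class InflightLimiter:
    """Block the issuer while `max_inflight` operations are outstanding."""

    def __init__(self, max_inflight):
        self._max = max_inflight
        self.inflight = 0
        self.peak = 0
        self._cv = threading.Condition()

    def issue(self):
        with self._cv:
            while self.inflight >= self._max:
                self._cv.wait()  # the traversing thread blocks at the cap
            self.inflight += 1
            self.peak = max(self.peak, self.inflight)

    def done(self):
        with self._cv:
            self.inflight -= 1
            self._cv.notify()    # wake the issuer if it is waiting


def run_demo(total_ios, max_inflight):
    limiter = InflightLimiter(max_inflight)
    workers = []
    for _ in range(total_ios):
        limiter.issue()
        worker = threading.Thread(target=limiter.done)  # stands in for I/O completion
        worker.start()
        workers.append(worker)
    for worker in workers:
        worker.join()
    return limiter.peak, limiter.inflight


peak, remaining = run_demo(50, 4)
print(peak <= 4, remaining)
```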
Illumos issues:
4970 need controls on i/o issued by zpool import -XF
4971 zpool import -T should accept hex values
4972 zpool import -T implies extreme rewind, and thus a scrub
4973 spa_load_retry retries the same txg
4974 spa_load_verify() reads all data twice |
269218 |
29-Jul-2014 |
delphij |
MFC r268713: MFV r268702:
Add missing *_destroy() calls in various places with ZFS.
Illumos issue: 4975 missing mutex_destroy() calls in zfs |
269061 |
24-Jul-2014 |
mav |
MFC r268420: Remove IO_SYNC flag when writing extended file attributes on ZFS.
While it is possible to create and write a file, modify its permissions, etc. without ever doing a sync, it looks odd that one is required for setting extended file attributes on ZFS. UFS does not sync there either.
Samba uses those extended attributes to store some of its data, and writing them synchronously reduces file creation performance many times over on systems without a SLOG device. |
269006 |
23-Jul-2014 |
delphij |
MFC r268473: MFV r268455:
Use reserved space for ZFS administrative commands. |
269002 |
23-Jul-2014 |
delphij |
MFC r268464: MFV r268452:
Explicitly mark file removal transactions as "presumed to result in a net free of space" so they will not fail with ENOSPC.
Illumos issue: 4950 files sometimes can't be removed from a full filesystem |
268786 |
17-Jul-2014 |
delphij |
MFC r268116:
- Fix handling of "new" style of ioctl in compatibility mode [1];
- Reorganize code and reduce diff from upstream;
- Improve forward compatibility shims for previous kernel;
Reported by: sbruno [1] |
268734 |
16-Jul-2014 |
pfg |
MFC r268097:
MFV r260708 4427 pid provider rejects probes with valid UTF-8 names
This makes use of Solaris' u8_validate() which we happen to use since r185029 for ZFS. Use of u8_textprep.c required -Wno-cast-qual for powerpc.
Illumos Revision: 1444d846b126463eb1059a572ff114d51f7562e5
Reference: https://www.illumos.org/issues/4427
Obtained from: Illumos |
268659 |
15-Jul-2014 |
delphij |
MFC r268128: MFV r268122:
4929 want prevsnap property |
268658 |
15-Jul-2014 |
delphij |
MFC r268126: MFV r268121:
4924 LZ4 Compression for metadata |
268657 |
15-Jul-2014 |
delphij |
MFC r268123: MFV r268119:
4914 zfs on-disk bookmark structure should be named *_phys_t |
268656 |
15-Jul-2014 |
delphij |
MFC r268086: MFV r267570:
4756 metaslab_group_preload() could deadlock |
268654 |
15-Jul-2014 |
delphij |
MFC r268085: MFV r267569:
4897 Space accounting mismatch in L2ARC/zpool |
268651 |
15-Jul-2014 |
delphij |
MFC r268082: MFV r267567:
4881 zfs send performance degradation when embedded block pointers are encountered |
268650 |
15-Jul-2014 |
delphij |
MFC r268079: MFV r267566:
4390 i/o errors when deleting filesystem/zvol can lead to space map corruption |
268649 |
15-Jul-2014 |
delphij |
MFC r268075: MFV r267565:
4757 ZFS embedded-data block pointers ("zero block compression")
4913 zfs release should not be subject to space checks |
268647 |
15-Jul-2014 |
delphij |
MFC r266771: MFV r266766:
Add a new zfs property, "redundant_metadata" which can have values "all" or "most". The default will be "all", which is the current behavior. When set to all, ZFS stores an extra copy of all metadata. If a single on-disk block is corrupt, at worst a single block of user data (which is recordsize bytes long) can be lost.
Setting to "most" will cause us to only store 1 copy of level-1 indirect blocks of user data files. This can improve performance of random writes, because less metadata has to be written. In practice, at worst about 100 blocks (of recordsize bytes each) of user data can be lost if a single on-disk block is corrupt.
The exact behavior of which metadata blocks are stored redundantly may change in future releases.
Illumos issue: 3835 zfs need not store 2 copies of all metadata |
268595 |
13-Jul-2014 |
pfg |
MFC r268290: Merge from OpenSolaris (24-Jul-2010):
6679140 asymmetric alloc/dealloc activity can induce dynamic variable drops
6679193 dtrace_dynvar walker produces flood of dtrace_dynhash_sink
This finishes a set of merges from the older OpenSolaris releases. Still the FreeBSD port has many differences that are difficult to account for but that seems normal given that the kernels are different.
Obtained from: OpenSolaris (through Illumos) |
268578 |
12-Jul-2014 |
rpaulo |
MFC 267929, 267937, 267939, 267940, 267941, 267942, 267987, 268006:
2915 DTrace in a zone should see "cpu", "curpsinfo", et al
2916 DTrace in a zone should be able to access fds[]
2917 DTrace in a zone should have limited provider access
4477 DTrace should speak JSON
Add stubs for CTF functions which are not yet implemented.
4474 DTrace Userland CTF Support
4475 DTrace userland Keyword
4476 DTrace tests should be better citizens
4479 pid provider types
4480 dof emulation is missing checks
4471 DTrace count() with histogram
4472 DTrace full width distribution histograms
4473 DTrace frequency trails |
268572 |
12-Jul-2014 |
pfg |
MFC r268130, r268224, r268230, r268231:
Various DTrace Merges from OpenSolaris/Illumos:
15-Sep-2008: 6735480 race between probe enabling and provider registration
20-Apr-2008: 6822482 DOF validation needs to handle loadable sections flagged as unloadable
22-Apr-2009: 6823388 DTrace ioctl handlers must validate all structure members
30-Jun-2009: 6851093 system drops to kmdb with anonymous dtrace probes + kmdb
Obtained from: OpenSolaris |
268323 |
06-Jul-2014 |
pfg |
MFC r268125:
Small merges from OpenSolaris:
These have no effect on FreeBSD (in fact they are ifdef'ed out), but they make future merges easier:
6699767 panic in spec_open()
6718877 crgetzoneid() use can cause problems when forking processes with USDT providers in a non global zone |
268274 |
05-Jul-2014 |
mav |
MFC r268178: Fix bug in sync control in new "dev" mode of ZVOL (r265678).
Don't check ZVOL_WCE flag, used in Solaris to control device "write cache". It is not applicable on FreeBSD and by default set to "disable". |
268132 |
02-Jul-2014 |
pfg |
MFC r268014: Reduce some warnings in the Solaris unicode support.
Clean some warnings from parenthesis and minor style issues. |
267571 |
17-Jun-2014 |
mav |
MFC r267029, r267038: Replace gethrtime() with cpu_ticks() as the source of randomness for the taskqueue selection. gethrtime() in our port is updated at HZ rate, which makes it unusable for this specific purpose and completely negates the benefit of multiple taskqueues. |
267138 |
06-Jun-2014 |
delphij |
MFC r266915: MFV 266913+266914:
3897 zfs filesystem and snapshot limits (fix leak)
4901 zfs filesystem/snapshot limit leaks |
266720 |
26-May-2014 |
smh |
MFC r264885
Eliminate duplicate checks in vdev_geom_io_intr error handling
Sponsored by: Multiplay |
266667 |
25-May-2014 |
markj |
MFC r262329: Define the KM_NORMALPRI flag for kmem_alloc(), as it is used in some upstream DTrace code.
MFC r262330: 1452 DTrace buffer autoscaling should be less violent
illumos/illumos-gate@6fb4854bed54ce82bd8610896b64ddebcd4af706 |
266122 |
15-May-2014 |
smh |
MFC r264850
Add the ability to set a minimum ashift size for ZFS pool creation or root level vdev addition.
Change max_auto_ashift sysctl to error when an invalid value is requested instead of silently limiting it.
Sponsored by: Multiplay |
266102 |
15-May-2014 |
markj |
MFC r262665: Expose a few DTrace parameters as sysctls under kern.dtrace and add descriptions for several existing sysctls.
PR: 187027 |
265746 |
09-May-2014 |
delphij |
MFC r265458: Import George Wilson's change for Illumos #4730:
4730 metaslab group taskq should be destroyed in metaslab_group_destroy()
Reviewed by: Alex Reece <alex.reece@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Sebastien Roy <sebastien.roy@delphix.com>
Original author: George Wilson |
265745 |
09-May-2014 |
delphij |
MFC r264836 (MFV r264830):
4745 fix AVL code misspellings |
265744 |
09-May-2014 |
delphij |
MFC r264835 (MFV r264829):
3897 zfs filesystem and snapshot limits |
265741 |
09-May-2014 |
delphij |
MFC r264671 (MFV r264668):
4754 io issued to near-full luns even after setting noalloc threshold
4755 mg_alloc_failures is no longer needed
illumos/illumos@b6240e830b871f59c22a3918aebb3b36c872edba |
265740 |
09-May-2014 |
delphij |
MFC r264669: MFV r264666:
4374 dn_free_ranges should use range_tree_t
illumos/illumos-gate@bf16b11e8deb633dd6c4296d46e92399d1582df4 |
265678 |
08-May-2014 |
mav |
MFC r264145: Add property and sysctl to control how ZVOLs are exposed to OS.
New ZFS property volmode and sysctl vfs.zfs.vol.mode allow switching ZVOL between three modes: geom -- existing fully functional behavior (default); dev -- exposing volumes only as raw disk device file in devfs; none -- not exposing volumes outside ZFS.
The "dev" mode is less functional (can't be partitioned, mounted, etc), but it is faster, and in some scenarios with untrusted consumers safer. It can be useful for NAS, VM block storages, etc. The "none" mode may be convenient for backup servers, etc. that don't need direct data access.
Due to the way ZVOL is integrated with the main ZFS code, the property and sysctl are checked only during pool import and volume creation. |
265677 |
08-May-2014 |
mav |
MFC r264086: 3580 Want zvols to return volblocksize when queried for physical block size
illumos/illumos-gate@a0b60564dfc644f4bfaef1ce26d343b44cf68bc5
It is irrelevant for FreeBSD, just reducing diff. |
265346 |
05-May-2014 |
markj |
MFC r262661: Fix emulation of call and jmp instructions on i386 and for 32-bit processes on amd64. |
265273 |
03-May-2014 |
markj |
MFC r262542: Move some files that are identical on i386 and amd64 to an x86 subdirectory rather than keeping duplicate copies. |
265234 |
02-May-2014 |
pfg |
MFC r264040:
4248 dtrace(1M) should never create DOF with empty probes section
4249 Only probes from the first DTrace object file will be included
Illumos Revision: 4a20ab41aadcb81c53e72fc65886e964e9add59
Reference: https://www.illumos.org/issues/4248 https://www.illumos.org/issues/4249
Obtained from: Illumos |
265143 |
30-Apr-2014 |
smh |
MFC r265046
Fix ZIO reordering issue which could cause data loss / corruption.
Sponsored by: Multiplay |
264796 |
23-Apr-2014 |
markj |
MFC r262596: 4478 dtrace_dof_maxsize is far too small
illumos/illumos-gate@d339a29bb4765c4b6883a935cf69b669cd05bca0 |
264733 |
21-Apr-2014 |
mav |
MFC r264193: In addition to r264077, tell GEOM that we do support BIO_DELETE now. |
264732 |
21-Apr-2014 |
mav |
MFC r264077: Add BIO_DELETE support to ZVOL.
It is an adapted merge from the vendor branch of:
701 UNMAP support for COMSTAR (in part related to ZFS)
2130 zvol DKIOCFREE uses nested DMU transactions |
264729 |
21-Apr-2014 |
mav |
MFC r264341: Create zvol devices on zfs clone.
While big and shiny patch is not ready, it is better to have something.
PR: kern/178999 |
263987 |
01-Apr-2014 |
mav |
MFC r263118: Report ZVOL block size as GEOM stripesize. |
263407 |
20-Mar-2014 |
delphij |
MFC r260183: MFV r260154 + 260182:
4369 implement zfs bookmarks
4368 zfs send filesystems from readonly pools
Illumos/illumos-gate@78f171005391b928aaf1642b3206c534ed644332 |
263401 |
20-Mar-2014 |
delphij |
MFC r260181:
Fix build on platforms where atomic_swap_64 is not available. |
263399 |
20-Mar-2014 |
delphij |
MFC r260157: MFV r260153:
4121 vdev_label_init should treat request as succeeded when pool is read only
illumos/illumos-gate@973c78e94bf9634782164382c9e291bf81161fa5 |
263397 |
19-Mar-2014 |
delphij |
MFC r260150: MFV r259170:
4370 avoid transmitting holes during zfs send
4371 DMU code clean up
illumos/illumos-gate@43466aae47bfcd2ad9bf501faec8e75c08095e4f
NOTE: Make sure the boot code is updated if a zpool upgrade is done on boot zpool. |
263395 |
19-Mar-2014 |
delphij |
MFC r260141: MFV r258385:
(Note: this change is not applicable to FreeBSD and the file is not included in build. It's integrated for completeness).
4128 disks in zpools never go away when pulled
illumos/illumos-gate@39cddb10a31c1c2e66aed69e6871d09caa4c8147 |
263393 |
19-Mar-2014 |
delphij |
MFC r260138: MFV r242733:
3306 zdb should be able to issue reads in parallel
3321 'zpool reopen' command should be documented in the man page and help message
illumos/illumos-gate@31d7e8fa33fae995f558673adb22641b5aa8b6e1
FreeBSD porting notes: the kernel part of this changeset depends on Solaris buf(9S) interfaces and is not really applicable for our use. vdev_disk.c is patched as-is to reduce divergence from upstream, but vdev_file.c is left intact. |
263390 |
19-Mar-2014 |
delphij |
MFC r259813 + r259813: MFV r258374:
4171 clean up spa_feature_*() interfaces
4172 implement extensible_dataset feature for use by other zpool features
illumos/illumos-gate@2acef22db7808606888f8f92715629ff3ba555b9 |
263281 |
18-Mar-2014 |
markj |
MFC r259535: The fasttrap fork handler is responsible for removing tracepoints in the child process that were inherited from its parent. However, this should not be done in the case of a vfork, since the fork handler ends up removing the tracepoints from the shared vm space, and userland DTrace probes in the parent will no longer fire as a result.
Now the child of a vfork may trigger userland DTrace probes enabled in its parent, so modify the fasttrap probe handler to handle this case and handle the child process in the same way that it would handle the traced process. In particular, if one traces function foo() in a process that vforks, and the child calls foo(), fasttrap will treat this call as having come from the parent. This is the behaviour of the upstream code.
While here, add #ifdef guards to some code that isn't present upstream. |
263269 |
17-Mar-2014 |
delphij |
MFC r262676:
All callers of the static method load_nvlist() in spa.c handle the error case, so there is no reason to assert that we won't hit an error. Instead, just return that error to the caller and have the upper layer handle it.
Obtained from: FreeNAS Reported by: rodrigc Reviewed by: Matthew Ahrens |
262320 |
22-Feb-2014 |
delphij |
MFC r261620: MFV r261619:
4574 get_clones_stat does not call zap_count in non-debug kernel
zap_count(...) is never called in a non-DEBUG kernel. As a result the "count" variable is always 0, and "goto fail" is always reached. This means the get_clones_stat function never builds the list of clones for the "clones" property. |
262179 |
18-Feb-2014 |
avg |
MFC r259052: Expose spa_asize_inflation |
262167 |
18-Feb-2014 |
mav |
MFC r260236: In dmu_zfetch_stream_reclaim() replace division with multiplication and move it out of the loop and lock. |
262120 |
17-Feb-2014 |
avg |
MFC r260185: MFV r260155: 4391 panic system rather than corrupting pool if we hit bug 4390 |
262115 |
17-Feb-2014 |
avg |
MFC r260835: MFV r260834: Fix memory leak of compressed buffers in l2arc_write_done |
262112 |
17-Feb-2014 |
avg |
MFC r260704,260717: zfs: getnewvnode_reserve must be called outside of a zfs transaction |
262107 |
17-Feb-2014 |
avg |
MFC r260812: traverse_visitbp: visit DMU_GROUPUSED_OBJECT before DMU_USERUSED_OBJECT |
262096 |
17-Feb-2014 |
avg |
MFC r260706: zfs_deleteextattr: name buffer from namei is needed by zfs_remove |
262093 |
17-Feb-2014 |
avg |
MFC r258717: MFV r258371,r258372: 4101 metaslab_debug should allow for fine-grained control |
262048 |
17-Feb-2014 |
avg |
MFC r258291: change the ioctl definition so that the fasttrap ioctl handler is responsible for copying in userland data |
262047 |
17-Feb-2014 |
avg |
MFC r257679: Use suword32 and suword64 instead of copyout(9) in fasttrap |
262044 |
17-Feb-2014 |
avg |
MFC r257143: Fix a couple of bugs in the fasttrap emulation of a "push %rbp" |
260786 |
16-Jan-2014 |
avg |
MFC r258744-258746: zfs: add zfs_freebsd_putpages |
260776 |
16-Jan-2014 |
avg |
MFC r258720: MFV r258665: 4347 ZPL can use dmu_tx_assign(TXG_WAIT) |
260773 |
16-Jan-2014 |
avg |
MFC r258739: zfs mappedread_sf: assert that a page is never partially valid |
260769 |
16-Jan-2014 |
avg |
MFC r258634: MFV r258376: 3964 L2ARC should always compress metadata buffers |
260768 |
16-Jan-2014 |
avg |
MFC r258633: MFV r255256: 3954 metaslabs continue to load even after hitting zfs_mg_alloc_failure limit |
260763 |
16-Jan-2014 |
avg |
MFC r258632,258704: MFV r255255: 4045 zfs write throttle & i/o scheduler performance work
Sponsored by: HybridCluster [merge] |
260750 |
16-Jan-2014 |
avg |
MFC r258631: MFV r247578
3581 spa_zio_taskq[ZIO_TYPE_FREE][ZIO_TASKQ_ISSUE]->tq_lock is piping hot |
260745 |
16-Jan-2014 |
avg |
MFC r258743: drop ZUT_OBJ |
260742 |
16-Jan-2014 |
avg |
MFC r258630: 734 taskq_dispatch_prealloc() desired |
260739 |
16-Jan-2014 |
avg |
MFC r258628: opensolaris taskq: some cosmetic changes |
260731 |
16-Jan-2014 |
avg |
MFC r258638,258642: expose zfs_flags as debug.zfs_flags r/w tunable and sysctl
Sponsored by: HybridCluster |
260670 |
15-Jan-2014 |
jhibbits |
MFC r256543,r259245,r259421,r259668,r259674
r256543:
Add fasttrap for PowerPC. This is the last piece of the DTrace/ppc puzzle. It's incomplete, it doesn't contain full instruction emulation, but it should be sufficient for most cases.
r259245,r259421: (FBT)
FBT now does work fully on PowerPC.
Save r3 before using it for the trap check, else we end up saving the new r3, containing the trap instruction encoding (0x7c810808), and restoring it back with the frame on return. This caused it to panic on my ppc32 machine.
r259668,r259674: Fix a typo in the FBT code. |
260617 |
14-Jan-2014 |
delphij |
MFC r259811:
MFV r258373:
4168 ztest assertion failure in dbuf_undirty
4169 verbatim import causes zdb to segfault
4170 zhack leaves pool in ACTIVE state
illumos/illumos-gate@7fdd916c474ea52896c671bbe7b56ba34a1ca132 |
260517 |
10-Jan-2014 |
asomers |
MFC 259240
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
When a da or ada device disappears, outstanding IOs fail with ENXIO, not EIO. The check for EIO was probably copied from Illumos, where that is indeed the correct errno.
Without this change, pulling a busy drive from a zpool would usually turn it into UNAVAIL, even though pulling an idle drive would turn it into REMOVED. With this change, it is REMOVED every time.
Also, vdev_geom_io_intr shouldn't do zfs_post_remove, because that results in devd getting two resource.fs.zfs.removed events. The comment said that the event had to be sent directly instead of through the async removal thread because "the DE engine is using this information to discard previous I/O errors". However, the fact that vdev_geom_io_intr was never actually sending the events until now, and that vdev_geom_orphan never sent them at all, and that vdev_geom_orphan usually gets called about 2 seconds after the actual removal, means that FreeBSD's userland can cope with a late event just fine. |
260385 |
07-Jan-2014 |
scottl |
MFC Alexander Motin's GEOM direct dispatch work:
r256603: Introduce new function devstat_end_transaction_bio_bt(), adding new argument to specify present time. Use this function to move binuptime() out of lock, substantially reducing lock congestion when slow timecounter is used.
r256606: Move g_io_deliver() out of the lock, as required for direct dispatch. Move g_destroy_bio() out too to reduce lock scope even more.
r256607: Fix passing uninitialized bio_resid argument to g_trace().
r256610: Add unmapped I/O support to GEOM RAID.
r256830: Restore BIO_UNMAPPED and BIO_TRANSIENT_MAPPING in biodone() when unmapping a temporarily mapped buffer. That fixes a double unmap if biodone() is called twice for the same BIO (but with different done methods).
r256880: Merge GEOM direct dispatch changes from the projects/camlock branch.
When safety requirements are met, this avoids passing I/O requests to the GEOM g_up/g_down threads by executing them directly in the caller's context. That avoids CPU bottlenecks in the g_up/g_down threads, plus several context switches per I/O.
r259247: Fix bug introduced at r256607. We have to recalculate bp_resid here since sizes of original and completed requests may differ due to end of media.
Testing of the stable/10 merge was done by Netflix, but all of the credit goes to Alexander and iX Systems.
Submitted by: mav Sponsored by: iX Systems |
260339 |
05-Jan-2014 |
mav |
MFC r259168: Don't even try to read vdev labels from devices smaller than SPA_MINDEVSIZE (64MB). Even if we somehow found one, the ZFS kernel code rejects such devices. It is funny to watch attempts to read four 256K vdev labels from a 1.44MB floppy, though it is not very practical and quite slow. |
260338 |
05-Jan-2014 |
mav |
MFC r258342: Reenable vfs.zfs.zio.use_uma for amd64, disabled at r209261.
On machines with several CPUs and enough RAM this can easily double ZFS performance or halve CPU usage. It was disabled three years ago due to memory and KVA exhaustion reports, but our VM subsystem has improved a lot since then, hopefully enough to justify another try. |
260337 |
05-Jan-2014 |
mav |
MFC r258137: Introduce allocation cache to store LZ4 compression contexts without kicking VM subsystem twice for every written record.
Tests on a 24-core system show a twofold reduction in CPU time spent copying a single large, well-compressed file.
This patch is not really needed on illumos (though it does no harm either), since their memory allocator by default uses caching for all requests up to 128K. |
259734 |
22-Dec-2013 |
pjd |
MFC r259576:
MFV r258923: 4188 assertion failed in dmu_tx_hold_free(): dn_datablkshift != 0
illumos/illumos-gate@bb411a08b05466bfe0c7095b6373bbc1587e259a |
259483 |
16-Dec-2013 |
asomers |
MFC r258311
opensolaris/uts/common/dtrace/fasttrap.c Fix several problems that can cause panics on kldload and kldunload.
* kproc_create(fasttrap_pid_cleanup_cb, ...) gets called before fasttrap_provs.fth_table gets allocated. This can lead to a panic on module load, because fasttrap_pid_cleanup_cb references fasttrap_provs.fth_table. Move kproc_create down after the point that fasttrap_provs.fth_table gets allocated, and modify the error handling accordingly.
* dtrace_fasttrap_{fork,exec,exit} weren't getting NULLed until after fasttrap_provs.fth_table got freed. That caused panics on module unload because fasttrap_exec_exit calls fasttrap_provider_retire, which references fasttrap_provs.fth_table. NULL those function pointers earlier.
* There wasn't any code to destroy the fasttrap_{tpoints,provs,procs}.fth_table mutexes on module unload, leading to a resource leak when WITNESS is enabled. Destroy those mutexes during fasttrap_unload().
Sponsored by: Spectra Logic Corporation |
259073 |
07-Dec-2013 |
peter |
Hoist all the mergeinfo up to the root in preparation for enforcing merges to the root only. All MFC's were rerecorded to the root.
Going forward, if an MFC includes mergeinfo, it will need to be made to the root and committed from the root. Merges with --ignore-ancestry or diff | patch can go anywhere.
The mergeinfo in HEAD is in a bad state from years of neglect and manual tampering and this was branched into 10.x. This confuses the coalescing code and prevents it from doing its job.
Approved by: re (gjb, implicit) |
258595 |
25-Nov-2013 |
smh |
MFC r258294: Fix ZFS deadlock when sending a snapshot which is mounted.
Approved by: re (glebius) Sponsored by: Multiplay |
258566 |
25-Nov-2013 |
avg |
MFV r258378: 4089 NULL pointer dereference in arc_read()
illumos/illumos-gate@57815f6b95a743697e148327725b7f568e75e6ea
Tested by: adrian Approved by: re (gjb) |
258565 |
25-Nov-2013 |
avg |
MFV r258377: 4088 use after free in arc_release()
illumos/illumos-gate@ccc22e130479b5bd7c0002267fee1e0602d3f772
Approved by: re (gjb) |
258563 |
25-Nov-2013 |
avg |
MFC r258353: zfs page_busy: fix the boundaries of the cleared range
This is a fix for a regression introduced in r246293.
vm_page_clear_dirty expects the range to have DEV_BSIZE aligned boundaries, otherwise it extends them. Thus it can happen that the whole page is marked clean while actually having some small dirty region(s). This commit makes the range properly aligned and ensures that only the clean data is marked as such.
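The alignment requirement above can be illustrated with a small sketch: shrink the written range inward to block boundaries, so that only blocks that were completely overwritten are treated as clean. The names `aligned_inner_range`, `struct range` and `BLK` are hypothetical, and `BLK` is assumed to be 512 as a stand-in for DEV_BSIZE; this is not the committed code.

```c
#include <assert.h>
#include <stddef.h>

#define BLK	512	/* stand-in for DEV_BSIZE; assumed value */

struct range {
	size_t off;	/* aligned start offset within the page */
	size_t len;	/* aligned length, 0 if no whole block is covered */
};

/* Shrink [off, off + nbytes) inward to BLK-aligned boundaries. */
static struct range
aligned_inner_range(size_t off, size_t nbytes)
{
	const size_t mask = (size_t)BLK - 1;
	size_t start = (off + mask) & ~mask;	/* round start up */
	size_t end = (off + nbytes) & ~mask;	/* round end down */
	struct range r;

	assert((BLK & (BLK - 1)) == 0);		/* BLK must be a power of two */
	r.off = start;
	r.len = end > start ? end - start : 0;
	return (r);
}
```

For a write of 1000 bytes at offset 100, only the block at offsets 512 to 1023 is fully covered, so the sketch yields offset 512 and length 512; the partially written first and last blocks stay dirty.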
It would be interesting to evaluate how much benefit clearing with DEV_BSIZE granularity produces. Perhaps instead we should clear the whole page when it is completely overwritten and not bother clearing any bits if only a portion of a page is written.
Reviewed by: kib Approved by: re (gjb) |
257058 |
24-Oct-2013 |
smh |
MFC r256889:
Use the vdev's ashift to calculate the supported min block size passed to zio_compress_data(..) when compressing l2arc buffers.
This eliminates L2ARC I/O errors, which resulted in very poor performance on vdevs configured with a block size greater than 512b, due to compression assuming a smaller min block size than the vdev supports.
Approved by: re (glebius) |
256281 |
10-Oct-2013 |
gjb |
Copy head (r256279) to stable/10 as part of the 10.0-RELEASE cycle.
Approved by: re (implicit) Sponsored by: The FreeBSD Foundation
|
256259 |
10-Oct-2013 |
avg |
MFV r255257: 4082 zfs receive gets EFBIG from dmu_tx_hold_free()
illumos change 14172:be36a38bac3d: illumos ZFS issues: 4082 zfs receive gets EFBIG from dmu_tx_hold_free()
Please note that this change is slightly different from r255257, because it is merged out of order with other (larger) upstream changes.
PR: kern/182570 Reported by: Keith White <kwhite@site.uottawa.ca> Tested by: Keith White <kwhite@site.uottawa.ca> Approved by: re (glebius) MFC after: 1 week X-MFC after: r254753
|
256148 |
08-Oct-2013 |
markj |
Initialize and free the DTrace taskqueue in the dtrace module load/unload handlers rather than in the dtrace device open/close methods. The current approach can cause a panic if the device is closed while the taskqueue thread is active, or if a kernel module containing a provider is unloaded while retained enablings are present and the dtrace device isn't opened.
Submitted by: gibbs (original version) Reviewed by: gibbs Approved by: re (glebius) MFC after: 2 weeks
|
256132 |
08-Oct-2013 |
delphij |
Improve lzjb decompress performance by reorganizing the code to tighten the copy loop.
Submitted by: Denis Ahrens <denis h3q com> MFC after: 2 weeks Approved by: re (gjb)
|
255753 |
21-Sep-2013 |
gibbs |
Optimize the block size used on ZFS cache devices as is already done for data and log devices.
Reported by: Dmitryy Makarov Submitted by: smh Reviewed by: gibbs Approved by: re (delphij) MFC after: 2 weeks
|
255750 |
21-Sep-2013 |
delphij |
MFV r254750:
Add support of Illumos dumps on zvol over RAID-Z.
Note that this only adds the features. FreeBSD would still need more work to support dumping on zvols.
Illumos ZFS issues: 2932 support crash dumps to raidz, etc. pools
MFC after: 1 month Approved by: re (ZFS blanket)
|
255748 |
20-Sep-2013 |
davide |
Fixup cross-device rename checks in ZFS. Add a check for the case where 'fdvp' is a directory, 'tvp' is an already existing directory and they have different mount points.
Reported by: avg, pjd Reviewed by: pjd Approved by: re (rodrigc)
|
255437 |
10-Sep-2013 |
delphij |
MFV r247844 (illumos-gate 13975:ef6409bc370f)
Illumos ZFS issues:
  3582 zfs_delay() should support a variable resolution
  3584 DTrace sdt probes for ZFS txg states
Provide a compatibility shim for Solaris's cv_timedwait_hires to help aid future porting.
Approved by: re (ZFS blanket)
|
255226 |
05-Sep-2013 |
pjd |
Add sysctl/tunables for various metaslab variables.
|
255219 |
05-Sep-2013 |
pjd |
Change the cap_rights_t type from uint64_t to a structure that we can extend in the future in a backward compatible (API and ABI) way.
The cap_rights_t represents capability rights. We used to use one bit to represent one right, but we are running out of spare bits. Currently the new structure provides place for 114 rights (so 50 more than the previous cap_rights_t), but it is possible to grow the structure to hold at least 285 rights, although we can make it even larger if 285 rights won't be enough.
The structure definition looks like this:
	struct cap_rights {
		uint64_t	cr_rights[CAP_RIGHTS_VERSION + 2];
	};
The initial CAP_RIGHTS_VERSION is 0.
The top two bits in the first element of the cr_rights[] array contain total number of elements in the array - 2. This means if those two bits are equal to 0, we have 2 array elements.
The top two bits in all remaining array elements should be 0. The next five bits in all array elements contain array index. Only one bit is used and bit position in this five-bits range defines array index. This means there can be at most five array elements in the future.
To define a new right, the CAPRIGHT() macro must be used. The macro takes two arguments - an array index and a bit to set, e.g.:
#define CAP_PDKILL CAPRIGHT(1, 0x0000000000000800ULL)
We still support aliases that combine a few rights, but the rights have to belong to the same array element, e.g.:
	#define	CAP_LOOKUP	CAPRIGHT(0, 0x0000000000000400ULL)
	#define	CAP_FCHMOD	CAPRIGHT(0, 0x0000000000002000ULL)
	#define	CAP_FCHMODAT	(CAP_FCHMOD | CAP_LOOKUP)
There is new API to manage the new cap_rights_t structure:
	cap_rights_t *cap_rights_init(cap_rights_t *rights, ...);
	void cap_rights_set(cap_rights_t *rights, ...);
	void cap_rights_clear(cap_rights_t *rights, ...);
	bool cap_rights_is_set(const cap_rights_t *rights, ...);
	bool cap_rights_is_valid(const cap_rights_t *rights);
	void cap_rights_merge(cap_rights_t *dst, const cap_rights_t *src);
	void cap_rights_remove(cap_rights_t *dst, const cap_rights_t *src);
	bool cap_rights_contains(const cap_rights_t *big, const cap_rights_t *little);
Capability rights to the cap_rights_init(), cap_rights_set(), cap_rights_clear() and cap_rights_is_set() functions are provided by separating them with commas, eg:
cap_rights_t rights;
cap_rights_init(&rights, CAP_READ, CAP_WRITE, CAP_FSTAT);
There is no need to terminate the list of rights, as those functions are actually macros that take care of the termination, eg:
	#define	cap_rights_set(rights, ...)				\
		__cap_rights_set((rights), __VA_ARGS__, 0ULL)
	void __cap_rights_set(cap_rights_t *rights, ...);
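The termination trick can be shown in isolation with a minimal, self-contained sketch. The names `or_masks` and `or_masks_impl` are hypothetical and exist only for illustration; the sketch assumes every argument is a non-zero `unsigned long long` constant, since the variadic function has no other way to learn the argument types.

```c
#include <assert.h>
#include <stdarg.h>

/*
 * Hypothetical helper: OR together a list of bit masks that is
 * terminated by 0ULL.  Every argument must be unsigned long long.
 */
static unsigned long long
or_masks_impl(unsigned long long first, ...)
{
	va_list ap;
	unsigned long long acc = 0, m;

	va_start(ap, first);
	for (m = first; m != 0; m = va_arg(ap, unsigned long long))
		acc |= m;
	va_end(ap);
	return (acc);
}

/* The macro appends the terminator so callers never have to. */
#define	or_masks(...)	or_masks_impl(__VA_ARGS__, 0ULL)

/* Usage: no explicit terminator appears at the call site. */
static void
or_masks_example(void)
{
	assert(or_masks(0x1ULL, 0x4ULL) == 0x5ULL);
	assert(or_masks(0x8ULL) == 0x8ULL);
}
```

One consequence of this design is that 0 cannot itself be passed as a meaningful value, which is harmless when every argument is a non-empty bit mask.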
Thanks to using one bit as an array index we can assert in those functions that there are no two rights belonging to different array elements provided together. For example this is illegal and will be detected, because CAP_LOOKUP belongs to element 0 and CAP_PDKILL to element 1:
cap_rights_init(&rights, CAP_LOOKUP | CAP_PDKILL);
Providing several rights that belong to the same array element this way is correct, but is not advised. It should only be used for alias definitions.
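The bit layout and the same-element check described above can be sketched as follows. This is an illustration built from the description only: it assumes the five one-hot index bits sit directly below the top two bits (bits 57 to 61), and the names `SKETCH_RIGHT`, `R_*`, `same_element` and `element_of` are hypothetical rather than the committed definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: bits 62-63 hold the size field, bits 57-61 a one-hot index. */
#define	INDEX_SHIFT	57
#define	INDEX_MASK	(0x1FULL << INDEX_SHIFT)
#define	SKETCH_RIGHT(idx, bit)	((1ULL << (INDEX_SHIFT + (idx))) | (bit))

/* Illustrative rights reusing the bit values quoted in the text. */
#define	R_LOOKUP	SKETCH_RIGHT(0, 0x0000000000000400ULL)
#define	R_FCHMOD	SKETCH_RIGHT(0, 0x0000000000002000ULL)
#define	R_PDKILL	SKETCH_RIGHT(1, 0x0000000000000800ULL)

/* True when exactly one index bit is set, i.e. all rights share an element. */
static bool
same_element(uint64_t right)
{
	uint64_t idx = right & INDEX_MASK;

	return (idx != 0 && (idx & (idx - 1)) == 0);
}

/* Array element (0..4) selected by a right that passed same_element(). */
static int
element_of(uint64_t right)
{
	int i = 0;

	assert(same_element(right));
	while (((right >> (INDEX_SHIFT + i)) & 1) == 0)
		i++;
	return (i);
}
```

With one bit per index, OR-ing rights from different elements leaves two index bits set, which is what makes the mistake detectable with a single power-of-two test.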
This commit also breaks compatibility with some existing Capsicum system calls, but I see no other way to do that. This should be fine as Capsicum is still experimental and this change is not going to 9.x.
Sponsored by: The FreeBSD Foundation
|
254982 |
28-Aug-2013 |
delphij |
Previously, both zfs_rename and zfs_link did a check on whether the passed vnode belongs to the same mount point (v_vfsp, also known as v_mount in FreeBSD). This check prevented the code from proceeding further on vnodes that do not belong to ZFS, for instance, on UFS or NULLFS.
The recent upstream change (merged as r254585) replaces the check of v_vfsp with a check of the znode's z_zfsvfs. On Illumos this works because, when the vnode comes from lofs, VOP_REALVP() gives the right vnode. This is not true on FreeBSD, where our VOP_REALVP is a no-op; as such, tdvp is not guaranteed to be a ZFS vnode, and will later trigger a failed assertion when the vnode is verified.
This changeset modifies our local shims (zfs_freebsd_rename and zfs_freebsd_link) to check if v_mount matches before proceeding further.
Reported by: many Diagnostic work by: avg
|
254813 |
24-Aug-2013 |
markj |
Rename the kld_unload event handler to kld_unload_try, and add a new kld_unload event handler which gets invoked after a linker file has been successfully unloaded. The kld_unload and kld_load event handlers are now invoked with the shared linker lock held, while kld_unload_try is invoked with the lock exclusively held.
Convert hwpmc(4) to use these event handlers instead of having kern_kldload() and kern_kldunload() invoke hwpmc(4) hooks whenever files are loaded or unloaded. This has no functional effect, but simplifies the linker code somewhat.
Reviewed by: jhb
|
254757 |
24-Aug-2013 |
delphij |
MFV r254749:
Don't hold dd_lock for long by breaking it when not doing dsl_dir accounting. It is not necessary to hold the lock while manipulating the parent's accounting, because there is no interface for userland to see a consistent picture of both parent and child at the same time anyway.
Illumos ZFS issues: 4046 dsl_dataset_t ds_dir->dd_lock is highly contended
|
254753 |
24-Aug-2013 |
delphij |
MFV r254747:
Fix a panic from dbuf_free_range() from dmu_free_object() while doing zfs receive. This is a regression from FreeBSD r253821.
Illumos ZFS issues: 4047 panic from dbuf_free_range() from dmu_free_object() while doing zfs receive
|
254744 |
23-Aug-2013 |
delphij |
MFV r254422:
Illumos DTrace issues:
  3089 want ::typedef
  3094 libctf should support removing a dynamic type
  3095 libctf does not validate arrays correctly
  3096 libctf does not validate function types correctly
|
254714 |
23-Aug-2013 |
avg |
zfs: do not reject any operations on a pool just because it's a boot pool
Unlike upstream, FreeBSD supports booting from all kinds of pools.
Requested by: many Tested by: sbruno MFC after: 12 days
|
254711 |
23-Aug-2013 |
avg |
zfs: inline and remove zfs_vnode_lock
It didn't serve any useful purpose, but obscured file and line information useful for debugging.
MFC after: 5 days X-MFC with: r254445
|
254649 |
22-Aug-2013 |
kib |
Remove the deprecated VM_ALLOC_RETRY flag for the vm_page_grab(9). The flag was mandatory since r209792, where vm_page_grab(9) was changed to only support the alloc retry semantic.
Suggested and reviewed by: alc Sponsored by: The FreeBSD Foundation
|
254627 |
21-Aug-2013 |
ken |
Expand the use of stat(2) flags to allow storing some Windows/DOS and CIFS file attributes as BSD stat(2) flags.
This work is intended to be compatible with ZFS, the Solaris CIFS server's interaction with ZFS, somewhat compatible with MacOS X, and of course compatible with Windows.
The Windows attributes that are implemented were chosen based on the attributes that ZFS already supports.
The summary of the flags is as follows:
UF_SYSTEM:
	Command line name: "system" or "usystem"
	ZFS name: XAT_SYSTEM, ZFS_SYSTEM
	Windows: FILE_ATTRIBUTE_SYSTEM
This flag means that the file is used by the operating system. FreeBSD does not enforce any special handling when this flag is set.
UF_SPARSE:
	Command line name: "sparse" or "usparse"
	ZFS name: XAT_SPARSE, ZFS_SPARSE
	Windows: FILE_ATTRIBUTE_SPARSE_FILE
This flag means that the file is sparse. Although ZFS may modify this in some situations, there is not generally any special handling for this flag.
UF_OFFLINE:
	Command line name: "offline" or "uoffline"
	ZFS name: XAT_OFFLINE, ZFS_OFFLINE
	Windows: FILE_ATTRIBUTE_OFFLINE
This flag means that the file has been moved to offline storage. FreeBSD does not have any special handling for this flag.
UF_REPARSE:
	Command line name: "reparse" or "ureparse"
	ZFS name: XAT_REPARSE, ZFS_REPARSE
	Windows: FILE_ATTRIBUTE_REPARSE_POINT
This flag means that the file is a Windows reparse point. ZFS has special handling code for reparse points, but we don't currently have the other supporting infrastructure for them.
UF_HIDDEN:
	Command line name: "hidden" or "uhidden"
	ZFS name: XAT_HIDDEN, ZFS_HIDDEN
	Windows: FILE_ATTRIBUTE_HIDDEN
This flag means that the file may be excluded from a directory listing if the application honors it. FreeBSD has no special handling for this flag.
The name and bit definition for UF_HIDDEN are identical to the definition in MacOS X.
UF_READONLY:
	Command line name: "urdonly", "rdonly", "readonly"
	ZFS name: XAT_READONLY, ZFS_READONLY
	Windows: FILE_ATTRIBUTE_READONLY
This flag means that the file may not be written or appended, but its attributes may be changed.
ZFS currently enforces this flag, but Illumos developers have discussed disabling enforcement.
The behavior of this flag is different than MacOS X. MacOS X uses UF_IMMUTABLE to represent the DOS readonly permission, but that flag has a stronger meaning than the semantics of DOS readonly permissions.
UF_ARCHIVE:
	Command line name: "uarch", "uarchive"
	ZFS name: XAT_ARCHIVE, ZFS_ARCHIVE
	Windows: FILE_ATTRIBUTE_ARCHIVE
The UF_ARCHIVE flag means that the file has changed and needs to be archived. The meaning is the same as the Windows FILE_ATTRIBUTE_ARCHIVE attribute, and the ZFS XAT_ARCHIVE and ZFS_ARCHIVE attributes.
msdosfs and ZFS have special handling for this flag, i.e. they will set it when the file changes.
sys/param.h: Bump __FreeBSD_version to 1000047 for the addition of new stat(2) flags.
chflags.1: Document the new command line flag names (e.g. "system", "hidden") available to the user.
ls.1: Reference chflags(1) for a list of file flags and their meanings.
strtofflags.c: Implement the mapping between the new command line flag names and new stat(2) flags.
chflags.2: Document all of the new stat(2) flags, and explain the intended behavior in a little more detail. Explain how they map to Windows file attributes.
Different filesystems behave differently with respect to flags, so warn the application developer to take care when using them.
zfs_vnops.c: Add support for getting and setting the UF_ARCHIVE, UF_READONLY, UF_SYSTEM, UF_HIDDEN, UF_REPARSE, UF_OFFLINE, and UF_SPARSE flags.
All of these flags are implemented using attributes that ZFS already supports, so the on-disk format has not changed.
ZFS currently doesn't allow setting the UF_REPARSE flag, and we don't really have the other infrastructure to support reparse points.
msdosfs_denode.c, msdosfs_vnops.c: Add support for getting and setting UF_HIDDEN, UF_SYSTEM and UF_READONLY in MSDOSFS.
It supported SF_ARCHIVED, but this has been changed to be UF_ARCHIVE, which has the same semantics as the DOS archive attribute instead of inverse semantics like SF_ARCHIVED.
After discussion with Bruce Evans, change several things in the msdosfs behavior:
Use UF_READONLY to indicate whether a file is writeable instead of file permissions, but don't actually enforce it.
Refuse to change attributes on the root directory, because it is special in FAT filesystems, but allow most other attribute changes on directories.
Don't set the archive attribute on a directory when its modification time is updated. Windows and DOS don't set the archive attribute in that scenario, so we are now bug-for-bug compatible.
smbfs_node.c, smbfs_vnops.c: Add support for UF_HIDDEN, UF_SYSTEM, UF_READONLY and UF_ARCHIVE in SMBFS.
This is similar to changes that Apple has made in their version of SMBFS (as of smb-583.8, posted on opensource.apple.com), but not quite the same.
We map SMB_FA_READONLY to UF_READONLY, because UF_READONLY is intended to match the semantics of the DOS readonly flag. The MacOS X code maps both UF_IMMUTABLE and SF_IMMUTABLE to SMB_FA_READONLY, but the immutable flags have stronger meaning than the DOS readonly bit.
stat.h: Add definitions for UF_SYSTEM, UF_SPARSE, UF_OFFLINE, UF_REPARSE, UF_ARCHIVE, UF_READONLY and UF_HIDDEN.
The definition of UF_HIDDEN is the same as the MacOS X definition.
Add commented-out definitions of UF_COMPRESSED and UF_TRACKED. They are defined in MacOS X (as of 10.8.2), but we do not implement them (yet).
ufs_vnops.c: Add support for getting and setting UF_ARCHIVE, UF_HIDDEN, UF_OFFLINE, UF_READONLY, UF_REPARSE, UF_SPARSE, and UF_SYSTEM in UFS. Alphabetize the flags that are supported.
These new flags are only stored; UFS does not take any action if the flag is set.
Sponsored by: Spectra Logic Reviewed by: bde (earlier version)
|
254608 |
21-Aug-2013 |
gibbs |
Add kstat entries for ZFS compression statistics.
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zio_compress.h: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio_compress.c: Add module lifetime functions to allocate and teardown state data.
Report:
- Compression attempts.
- Buffers found to be empty.
- Compression calls that are skipped because the data length is already less than or equal to the minimum block length.
- Compression attempts that fail to yield a 12.5% compression ratio.
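The four counters listed above can be pictured with a small accounting sketch. Everything here is hypothetical (the `compress_stats` fields, `account_compress`, and the result labels), and the ordering of the checks and the exact 12.5% threshold arithmetic are assumptions; zio_compress.c may count differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical counters mirroring the four reported statistics. */
struct compress_stats {
	uint64_t attempts;
	uint64_t empty;
	uint64_t skipped;
	uint64_t failed;
};

enum compress_result { C_EMPTY, C_SKIPPED, C_FAILED, C_OK };

/* Account for one compression call and say how it was classified. */
static enum compress_result
account_compress(struct compress_stats *st, size_t src_len,
    size_t compressed_len, size_t minblock, bool all_zero)
{
	/* Assumed target: the output must save at least 12.5% (one eighth). */
	size_t target = src_len - (src_len >> 3);

	assert(st != NULL);
	st->attempts++;
	if (all_zero) {
		st->empty++;
		return (C_EMPTY);
	}
	if (src_len <= minblock) {
		st->skipped++;
		return (C_SKIPPED);
	}
	if (compressed_len > target) {
		st->failed++;
		return (C_FAILED);
	}
	return (C_OK);
}
```

For a 4096-byte buffer the assumed target is 3584 bytes, so an output of 3600 bytes counts as a failed attempt while 3584 bytes is accepted.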
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu.c: Add calls to the zio_compress.c module's init and fini functions.
Sponsored by: Spectra Logic Corporation MFC after: 2 weeks
|
254591 |
21-Aug-2013 |
gibbs |
Enhance the ZFS vdev layer to maintain both a logical and a physical minimum allocation size for devices. Use this information to automatically increase ZFS's minimum allocation size for new top-level vdevs to a value that more closely matches the optimum device allocation size.
Use GEOM's stripesize attribute, if set, as the physical sector size of the GEOM.
Calculate the minimum blocksize of each metaslab class. Use the calculated value instead of SPA_MINBLOCKSIZE (512b) when determining the likelihood of compression yielding a reduction in physical space usage.
Report devices with sub-optimal block size configuration in "zpool status". Also properly fail attempts to attach devices with a logical block size greater than 8kB, since this will cause corruption to ZFS's label area.
Sponsored by: Spectra Logic Corporation MFC after: 2 weeks
Background
==========
Many modern devices use physical allocation units that are much larger than the minimum logical allocation size accessible by external commands. Two prevalent examples of this are 512e disk drives (512b logical sector, 4K physical sector) and flash devices (512b logical sector, 4K or larger allocation block size, and 128k or larger erase block size). Operations that modify less than the physical sector size result in a costly read-modify-write or garbage collection sequence on these devices.
Simply exporting the true physical sector of the device to ZFS would yield optimal performance, but has two serious drawbacks:
1) Existing pools created with devices that have different logical and physical block sizes, but were configured to use the logical block size (e.g. because the OS version used for pool construction reported the logical block size instead of the physical block size) will suddenly find that the vdev allocation size has increased. This can be easily tolerated for active members of the array, but ZFS would prevent replacement of a vdev with another identical device because it now appears that the smaller allocation size required by the pool is not supported by the new device.
2) The device's physical block size may be too large to be supported by ZFS. The optimal allocation size for the vdev may be quite large. For example, a RAID controller may export a vdev that requires read-modify-write cycles unless accessed using 64k aligned/sized requests. ZFS currently has an 8k minimum block size limit.
Reporting both the logical and physical allocation sizes for vdevs solves these problems. A device may be used so long as the logical block size is compatible with the configuration. By comparing the logical and physical block sizes, new configurations can be optimized and administrators can be notified of any existing pools that are sub-optimal.
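The idea of comparing the two sizes can be sketched as a pure function. This is a hedged illustration, not vdev_ashift_optimize(): the names `choose_ashift`, `ashift_to_bytes` and `SKETCH_MAXASHIFT` are hypothetical, and the `max_auto` parameter only loosely models the "max_auto_ashift" limit mentioned in this commit message.

```c
#include <assert.h>
#include <stdint.h>

#define	SKETCH_MAXASHIFT	13	/* 8k upper limit taken from the text */

/* An ashift is the base-2 logarithm of the block size in bytes. */
static uint64_t
ashift_to_bytes(unsigned ashift)
{
	return (1ULL << ashift);
}

/*
 * Pick an allocation shift for a new top-level vdev: never go below
 * the logical size, prefer the physical size, and cap any automatic
 * increase at max_auto.
 */
static unsigned
choose_ashift(unsigned logical, unsigned physical, unsigned max_auto)
{
	unsigned ashift = logical;

	assert(logical <= SKETCH_MAXASHIFT);
	assert(max_auto <= SKETCH_MAXASHIFT);
	if (physical > logical)
		ashift = physical < max_auto ? physical : max_auto;
	if (ashift < logical)
		ashift = logical;
	return (ashift);
}
```

In this sketch a 512e drive (logical 9, physical 12) gets ashift 12, while a controller reporting a 64k optimum (physical 16) is capped at 13, matching the two drawbacks discussed above.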
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/spa.h: Add the SPA_MAXASHIFT constant. ZFS currently has a hard upper limit of 13 (8k) for ashift and this constant is used to both document and enforce this limit.
sys/cddl/contrib/opensolaris/uts/common/sys/fs/zfs.h: Add the VDEV_AUX_ASHIFT_TOO_BIG error code.
Add fields for exporting the configured, logical, and physical ashift to the vdev_stat_t structure.
Add VDEV_STAT_VALID() macro which can be used to verify the presence of required vdev_stat_t fields in nvlist data.
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev.c: Provide a SYSCTL_PROC handler for "max_auto_ashift". Since the limit is only referenced long after boot when a create operation occurs, there's no compelling need for it to be a boot time configurable tunable. This also allows the validation code for the max_auto_ashift value to be contained within the sysctl handler.
Populate the new fields in the vdev_stat_t structure.
Fail vdev opens if the vdev reports an ashift larger than SPA_MAXASHIFT.
Propagate vdev_logical_ashift and vdev_physical_ashift between child and parent vdevs as is done for vdev_ashift.
In vdev_open(), restore code that fails opens for devices where vdev_ashift grows. This can only happen now if the device's logical ashift grows, which means it really isn't safe to use the device.
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/vdev_impl.h: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_file.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_mirror.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_missing.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_raidz.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_root.c: Update the vdev_open() API so that both logical (what was just ashift before) and physical ashift are reported.
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/vdev_impl.h: Add two new fields, vdev_physical_ashift and vdev_logical_ashift, to vdev_t.
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa_config.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c: Add vdev_ashift_optimize(). Call it anytime a new top-level vdev is allocated.
cddl/contrib/opensolaris/cmd/zpool/zpool_main.c: Add text for the VDEV_AUX_ASHIFT_TOO_BIG error.
For each sub-optimally configured leaf vdev, report configured and native block sizes.
cddl/contrib/opensolaris/cmd/zpool/zpool_main.c: cddl/contrib/opensolaris/lib/libzfs/common/libzfs.h: cddl/contrib/opensolaris/lib/libzfs/common/libzfs_status.c: Introduce a new zpool status: ZPOOL_STATUS_NON_NATIVE_ASHIFT. This status is reported on healthy pools containing vdevs configured to use a block size smaller than their reported physical block size.
cddl/contrib/opensolaris/lib/libzfs/common/libzfs_status.c: Update find_vdev_problem() and supporting functions to provide the full vdev_stat_t structure to problem checking routines, and to allow descent into replacing vdevs.
Add a vdev_non_native_ashift() validator which is used on the full vdev tree to check for ZPOOL_STATUS_NON_NATIVE_ASHIFT.
cddl/contrib/opensolaris/lib/libzpool/common/kernel.c: cddl/contrib/opensolaris/lib/libzpool/common/sys/zfs_context.h: Enhance sysctl userland stubs now that a SYSCTL_PROC handler is used in vdev.c.
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/metaslab.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/metaslab_impl.h: When the group membership of a metaslab class changes (i.e. when a vdev is added or removed from a pool), walk the group list to determine the smallest block size currently available and record this in the metaslab class.
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/metaslab.h: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/metaslab.c: Add the metaslab_class_get_minblocksize() accessor.
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zio_compress.h: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio_compress.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c: In zio_compress_data(), take the minimum blocksize as an input parameter instead of assuming SPA_MINBLOCKSIZE.
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c: In l2arc_compress_buf(), pass SPA_MINBLOCKSIZE as the minimum blocksize of the device. The l2arc code has its own logic for deciding whether compression is worthwhile, so this effectively prevents zio_compress_data() from second-guessing the original decision.
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c: In zio_write_bp_init(), use the minimum blocksize of the normal metaslab class when compressing data.
|
254587 |
21-Aug-2013 |
delphij |
MFV r254421:
Illumos ZFS issues: 3996 want a libzfs_core API to rollback to latest snapshot
|
254585 |
20-Aug-2013 |
delphij |
MFV r254220:
Illumos ZFS issues: 4039 zfs_rename()/zfs_link() needs stronger test for XDEV
|
254445 |
17-Aug-2013 |
pjd |
Remove redundant variable.
|
254309 |
14-Aug-2013 |
markj |
Use kld_{load,unload} instead of mod_{load,unload} for the linker file load and unload event handlers added in r254266.
Reported by: jhb X-MFC with: r254266
|
254268 |
13-Aug-2013 |
markj |
FreeBSD's DTrace implementation has a few problems with respect to handling probes declared in a kernel module when that module is unloaded. In particular,
* Unloading a module with active SDT probes will cause a panic. [1]
* A module's (FBT/SDT) probes aren't destroyed when the module is unloaded; trying to use them after the fact will generally cause a panic.
This change fixes both problems by porting the DTrace module load/unload handlers from illumos and registering them with the corresponding EVENTHANDLER(9) handlers. This allows the DTrace framework to destroy all probes defined in a module when that module is unloaded, and to prevent a module unload from proceeding if some of its probes are active. The latter problem has already been fixed for FBT probes by checking lf->nenabled in kern_kldunload(), but moving the check into the DTrace framework generalizes it to all kernel providers and also fixes a race in the current implementation (since a probe may be activated between the check and the call to linker_file_unload()).
Additionally, the SDT implementation has been reworked to define SDT providers/probes/argtypes in linker sets rather than using SYSINIT/SYSUNINIT to create and destroy SDT probes when a module is loaded or unloaded. This simplifies things quite a bit since it means that pretty much all of the SDT code can live in sdt.ko, and since it becomes easier to integrate SDT with the DTrace framework. Furthermore, this allows FreeBSD to be quite flexible in that SDT providers spanning multiple modules can be created on the fly when a module is loaded; at the moment it looks like illumos' SDT implementation requires all SDT probes to be statically defined in a single kernel table.
PR: 166927, 166926, 166928 Reported by: davide [1] Reviewed by: avg, trociny (earlier version) MFC after: 1 month
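The unload behaviour described in this message (veto the unload while any probe is active, otherwise destroy every probe the module defined) can be sketched as a small model. This is an illustration only: struct probe_model and module_unload_model() are hypothetical names, not the handlers ported from illumos, and the real code performs these steps under the DTrace framework's locks so that a probe cannot be enabled between the check and the unload.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a probe defined by a kernel module. */
struct probe_model {
	bool enabled;	/* a consumer currently has the probe active */
	bool destroyed;	/* the framework has removed the probe */
};

/*
 * Model of an unload handler: returns -1 to veto the unload when any
 * probe is still enabled, otherwise destroys all probes and returns 0.
 */
static int
module_unload_model(struct probe_model *probes, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (probes[i].enabled)
			return (-1);
	}
	for (i = 0; i < n; i++)
		probes[i].destroyed = true;
	return (0);
}
```

Doing the check and the destruction in one handler is what lets the framework close the window that existed when the check lived in a separate place from the unload itself.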
|
254198 |
11-Aug-2013 |
rpaulo |
fasttrap_fork(): unlock the processes before removing the tracepoints.
In the future, we'll need to come up with new proc_*() functions that accept locked processes. For now, this prevents postgresql + DTrace from crashing the system.
MFC after: 1 month
|
254138 |
09-Aug-2013 |
attilio |
The soft and hard busy mechanisms rely on the vm object lock to work. Unify the two concepts into a real, minimal sxlock, where the shared acquisition represents the soft busy and the exclusive acquisition represents the hard busy. The old VPO_WANTED mechanism becomes the hard path for this new lock, and it becomes per-page rather than per-object. The vm_object lock becomes an interlock for this functionality: it can be held in either read or write mode. However, if the vm_object lock is held in read mode while acquiring or releasing the busy state, the thread owner cannot make any assumption about the busy state unless it is also busying it.
Also:
- Add a new flag to directly shared-busy pages while vm_page_alloc and vm_page_grab are being executed. This will be very helpful once these functions happen under a read object lock.
- Move the swapping sleep into its own per-object flag.
The KPI is heavily changed; this is why the version is bumped. It is very likely that some VM ports users will need to change their own code.
Sponsored by: EMC / Isilon storage division Discussed with: alc Reviewed by: jeff, kib Tested by: gavin, bapt (older version) Tested by: pho, scottl
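The shared/exclusive semantics described in this message can be illustrated with a small single-threaded model. It is only a sketch under stated assumptions: struct page_busy_model and the busy_* helpers are hypothetical names, not the KPI introduced by this revision, and the model ignores sleeping, wakeups, and the object-lock interlock.

```c
#include <stdbool.h>

/* Hypothetical model of a per-page busy state; not the kernel KPI. */
struct page_busy_model {
	unsigned shared;	/* number of soft (shared) busy holders */
	bool exclusive;		/* true while the page is hard busied */
};

/* Soft busy: may coexist with other soft busy holders only. */
static bool
busy_try_shared(struct page_busy_model *pb)
{
	if (pb->exclusive)
		return (false);
	pb->shared++;
	return (true);
}

/* Hard busy: requires that nobody else holds the page busy. */
static bool
busy_try_exclusive(struct page_busy_model *pb)
{
	if (pb->exclusive || pb->shared != 0)
		return (false);
	pb->exclusive = true;
	return (true);
}

/* Drop one soft busy reference, if any is held. */
static void
busy_release_shared(struct page_busy_model *pb)
{
	if (pb->shared > 0)
		pb->shared--;
}

/* Drop the hard busy state. */
static void
busy_release_exclusive(struct page_busy_model *pb)
{
	pb->exclusive = false;
}
```

In this model several readers of the page content can soft-busy it at once, while a hard busy excludes everyone, which mirrors the shared and exclusive acquisitions of an sx-style lock.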
|
254112 |
08-Aug-2013 |
delphij |
MFV r254079:
Illumos ZFS issues:
3957 ztest should update the cachefile before killing itself
3958 multiple scans can lead to partial resilvering
3959 ddt entries are not always resilvered
3960 dsl_scan can skip over dedup-ed blocks if physical birth != logical birth
3961 freed gang blocks are not resilvered and can cause pool to suspend
3962 ztest should print out zfs debug buffer before exiting
|
254077 |
07-Aug-2013 |
delphij |
MFV r254071:
Fix a regression introduced by fix for Illumos bug #3834. Quote from Matthew Ahrens on the Illumos issue:
ztest fails this assertion because ztest_dmu_read_write() does
dmu_tx_hold_free(tx, bigobj, bigoff, bigsize);
and then
dmu_object_set_checksum(os, bigobj, (enum zio_checksum)ztest_random_dsl_prop(ZFS_PROP_CHECKSUM), tx);
If the region to free is past the end of the file, the DMU assumes that there will be nothing to do for this object. However, ztest does set_checksum(), which must modify the dnode. The fix is for ztest to also call
dmu_tx_hold_bonus(tx, bigobj);
so we can account for the dirty data associated with setting the checksum
Illumos ZFS issues: 3955 ztest failure: assertion refcount_count(&tx->tx_space_written) + delta <= tx->tx_space_towrite
|
254074 |
07-Aug-2013 |
delphij |
MFV r254070:
Merge vendor bugfix for ZFS test suite that triggers false positives.
Illumos ZFS issues:
3949 ztest fault injection should avoid resilvering devices
3950 ztest: deadman fires when we're doing a scan
3951 ztest hang when running dedup test
3952 ztest: ztest_reguid test and ztest_fault_inject don't play nice together
|
254014 |
06-Aug-2013 |
delphij |
MFV r254013 (dummy merge to note that the change is already merged):
Illumos ZFS issues: 3973 zfs_ioc_rename alters passed in zc->zc_name
|
254012 |
06-Aug-2013 |
delphij |
MFV r254011:
This change has no effect on FreeBSD but is integrated for completeness.
Illumos ZFS issues: 348 ZFS should handle DKIOCGMEDIAINFOEXT failure
|
253993 |
06-Aug-2013 |
mav |
Block reporting of ZFS features for suspended pools.
Before executing any subcommand, the zpool tool fetches the pools' configuration from the kernel. Before features support was added, the kernel regenerated that configuration from data that is always present in memory. Unfortunately, the pool features list and activity counters are not such data. They are stored in ZAP, which normally resides in ARC but may be swapped out under heavy memory pressure. If the pool is suspended at this point, there is no way to recover it, since any zpool command will get stuck.
This change has one predictable flaw: `zpool upgrade` always wants to upgrade suspended pools, but fortunately it can't do so due to the suspension.
|
253992 |
06-Aug-2013 |
mav |
Disable r252840 when ZFS TRIM is enabled (vfs.zfs.trim.enabled=1) and really disable TRIM otherwise.
r252840 (illumos bug 3836) is based on the assumption that zio_free_sync() has no lock dependencies and should complete immediately. Unfortunately, with our TRIM implementation that is not true, due to ZIO_STAGE_VDEV_IO_START being added to the ZIO_FREE_PIPELINE, which, while not really accessing devices, still acquires the SCL_ZIO lock for read to be sure devices won't disappear.
When TRIM is disabled, this patch enables direct free execution from r252840 and removes the ZIO_STAGE_VDEV_IO_START and ZIO_STAGE_VDEV_IO_ASSESS stages from the pipeline to avoid lock acquisition. Otherwise it queues the free request as it was before r252840.
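The resulting decision can be sketched as a small function. This is a model under stated assumptions rather than the committed code: struct free_req_model, enum free_path, and choose_free_path() are hypothetical names, and the gang/dedup condition is taken from the description of illumos issue 3836 in the r252840 entry of this log.

```c
#include <stdbool.h>

/* Hypothetical description of a block being freed. */
struct free_req_model {
	bool is_gang;	/* gang block: freeing may need to issue reads */
	bool is_dedup;	/* dedup block: freeing may need to issue reads */
};

enum free_path {
	FREE_DIRECT,	/* process the free as it is encountered */
	FREE_QUEUED	/* put it on a list for later processing */
};

/*
 * With TRIM enabled every free is queued, as before r252840. With TRIM
 * disabled only frees that might need reads are queued.
 */
static enum free_path
choose_free_path(const struct free_req_model *fr, bool trim_enabled)
{
	if (trim_enabled)
		return (FREE_QUEUED);
	if (fr->is_gang || fr->is_dedup)
		return (FREE_QUEUED);
	return (FREE_DIRECT);
}
```

The point of the model is that the direct path is only taken when it cannot block on a lock or on I/O.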
|
253991 |
06-Aug-2013 |
mav |
Make `zpool clear` also reopen reconnected cache and spare devices. Since `zpool status` reports such kinds of errors, it is strange that they are not cleared by `zpool clear`.
|
253990 |
06-Aug-2013 |
mav |
Make ZFS use a separate thread to handle SPA_ASYNC_REMOVE async events. The existing async thread runs only on successful spa_sync() completion, which is impossible when the pool has lost its required (last) disk(s). That indefinite delay of SPA_ASYNC_REMOVE processing kept ZFS from closing the lost disks, preventing GEOM/CAM from destroying devices and reusing names on later disk reattach.
In an earlier version of the patch I tried to just run the existing thread immediately, independent of spa_sync() completion, but that exposed a number of situations where it could get stuck due to locks held by the stuck spa_sync() that are required for other kinds of async events.
Experiments with an OpenIndiana snapshot confirmed that it also has this issue with reattaching lost disks.
|
253953 |
05-Aug-2013 |
attilio |
Revert r253939: We cannot busy a page before doing pagefaults. In fact, it can deadlock against the vnode lock, as it tries to vget(). Other functions, right now, have an opposite lock ordering, like vm_object_sync(), which acquires the vnode lock first and then sleeps on the busy mechanism.
Before this patch is reinserted we need to break this ordering.
Sponsored by: EMC / Isilon storage division Reported by: kib
|
253939 |
04-Aug-2013 |
attilio |
The page hold mechanism is fast but it has a couple of drawbacks:
- It does not let pages respect the LRU policy
- It bloats the active/inactive queues by a few pages
Try to avoid it as much as possible with the long-term target to completely remove it. Use the soft-busy mechanism to protect page content accesses during short-term operations (like uiomove_fromphys()).
After this change only vm_fault_quick_hold_pages() is still using the hold mechanism for page content access. There is an additional complexity there, as the quick path cannot immediately access the page object to busy the page and the slow path cannot busy more than one page at a time (to avoid deadlocks).
Fixing this primitive can lead to complete removal of the page hold mechanism.
Sponsored by: EMC / Isilon storage division Discussed with: alc Reviewed by: jeff Tested by: pho
|
253926 |
04-Aug-2013 |
smh |
zfs_ioc_rename should not leave the value of zc_name passed in via zc altered on return.
MFC after: 1 week
|
253821 |
30-Jul-2013 |
delphij |
MFV r253783:
Skip eviction step of processing free records when doing ZFS receive to avoid the expensive search operation of non-existent dbufs in dn_dbufs.
Illumos ZFS issues: 3834 incremental replication of 'holey' file systems is slow
MFC after: 2 weeks
|
253820 |
30-Jul-2013 |
delphij |
MFV r253782:
To quote Illumos issue #3888:
When 'zfs recv -F' is used with an incremental recv it rolls back any changes made since the last snapshot in case new changes were made to the file system while the recv is in progress (without -F the recv would fail when it does its final check to commit the recv-ed data as the recv-ed data conflicts with the newly written data).
However, if there is a snapshot taken after the recv began, rolling back to the 'latest' snapshot will not help and the recv will still fail. 'zfs recv -F' should be extended to destroy any snapshots created since the source snapshot when finishing the recv (effectively rolling back through all snapshots, instead of just to the latest snapshot).
Illumos ZFS issues: 3888 zfs recv -F should destroy any snapshots created since the incremental source
MFC after: 2 weeks
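The selection rule quoted from the issue (destroy every snapshot created since the incremental source) can be sketched as follows. This is a hypothetical illustration: snaps_newer_than_source() and the use of a creation TXG per snapshot are assumptions made for the example, not the code merged by this change.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Count the snapshots that are newer than the incremental source and
 * would therefore have to be destroyed when finishing the receive.
 * creation_txg[] holds one (hypothetical) creation TXG per snapshot.
 */
static size_t
snaps_newer_than_source(const uint64_t *creation_txg, size_t n,
    uint64_t source_txg)
{
	size_t i, count = 0;

	for (i = 0; i < n; i++) {
		if (creation_txg[i] > source_txg)
			count++;
	}
	return (count);
}
```

A result of zero corresponds to the case that already worked before this change, where rolling back to the latest snapshot is sufficient.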
|
253819 |
30-Jul-2013 |
delphij |
MFV r253781 + r253871:
Illumos ZFS issues: 3894 zfs should not allow snapshot of inconsistent dataset
MFC after: 2 weeks
|
253816 |
30-Jul-2013 |
delphij |
MFV r253780:
To quote Illumos #3875:
The problem here is that if we ever end up in the error path, we drop the locks protecting access to the zfsvfs_t prior to forcibly unmounting the filesystem. Because z_os is NULL, any thread that had already picked up the zfsvfs_t and was sitting in ZFS_ENTER() when we dropped our locks in zfs_resume_fs() will now acquire the lock, attempt to use z_os, and panic.
Illumos ZFS issues: 3875 panic in zfs_root() after failed rollback
MFC after: 2 weeks
|
253806 |
30-Jul-2013 |
mav |
Allow three IOCTLs to be used on a suspended pool, restoring the state that existed before the IOCTL code refactoring merged change 4445fffb from illumos at r248571.
This change allows `zpool clear` to be used again to recover a suspended pool. It seems to be the only way the code provides to restore pool operation after reconnecting lost disks that were required for data completeness. There are still cases where the `zpool clear` command can just get stuck due to deadlocks inside the ZFS kernel part, but that is probably better than having no chance to recover at all.
|
253754 |
28-Jul-2013 |
mav |
Partially close the race between calls of the orphan() method from GEOM and the close() method from the ZFS core, which reliably causes a use-after-free panic if an SSD vdev is detached during the initial erase.
|
253643 |
25-Jul-2013 |
mav |
Following r222950, revert unintentional change cls -> class in argument name in r245264. Aside from non-uniformity, that again confused C++ compilers.
|
253606 |
24-Jul-2013 |
avg |
zfs module: perform cleanup during shutdown in addition to module unload
- move init and fini code into separate functions (like it is done upstream)
- invoke fini code via shutdown_post_sync event hook
This should make zfs close its underlying devices during shutdown, which may be important for their drivers.
MFC after: 20 days
|
253603 |
24-Jul-2013 |
avg |
zfs: move vnode creation from zfs_znode_cache_constructor to zfs_znode_alloc
All other places where a znode is allocated do not need z_vnode at all. These are:
- zfs_create_share_dir
- zfs_create_fs
This change ensures two things:
- VN_LOCK_ASHARE is not erroneously called for VFIFO vnodes
- vn_lock is called on a fully constructed vnode with correct v_ops
The change also makes it possible to turn zfs_znode_cache_constructor into a normal kmem_cache constructor again (as it is upstream). This avoids a problem where zfs_znode_cache_destructor may be called on un-constructed znodes.
MFC after: 17 days
|
253441 |
18-Jul-2013 |
delphij |
Manually merge part of vendor import r238583 from Illumos.
Illumos changeset: 13680:2bd022a765e2
Illumos ZFS issue: 2671 zpool import should not fail if vdev ashift has increased
MFC after: 3 days
|
253079 |
09-Jul-2013 |
avg |
dtrace/fasttrap: install hook functions only after all data is initialized
Sponsored by: HybridCluster MFC after: 7 days
|
253073 |
09-Jul-2013 |
avg |
zfs: try to properly handle i/o errors in mappedread_sf
Unconditionally freeing a page is not good, especially if it is the page that was wired by the caller. The checks are picked up from kern_sendfile.
MFC after: 3 weeks
|
253070 |
09-Jul-2013 |
avg |
zfs: load zpool.cache after a root fs is mounted
MFC after: 3 weeks
|
252850 |
05-Jul-2013 |
markj |
Hide references to mod_lock. In FreeBSD it is always acquired with the provider lock held, so its use has no effect.
|
252840 |
05-Jul-2013 |
mm |
MFV r252839:
Quoting illumos issue #3836: Currently zio_free() always puts the zio on a list for subsequent processing by zio_free_sync(). This is only necessary for frees that might need to issue reads (gang and dedup blocks).
By processing the majority of the frees as we encounter them, we reduce the amount of time that the spa_sync() thread spends burning CPU and not doing any i/o, thus increasing the overall write throughput of the system.
Illumos ZFS issues: 3836 zio_free() can be processed immediately in the common case
MFC after: 1 week
|
252493 |
01-Jul-2013 |
markj |
Be sure to destroy the fasttrap cleanup mutex when unloading the fasttrap module. This should be MFCed with r250953.
|
252431 |
30-Jun-2013 |
rmh |
Enable kernel-specific code for FreeBSD also on other systems that use the kernel of FreeBSD.
Reviewed by: pjd
|
252390 |
29-Jun-2013 |
smh |
Remove invalid ASSERT which causes a panic on zfs renames when run with ASSERTS. Removal was missed in merge of illumos 3464 (r248571)
MFC after: 2 days
|
252380 |
29-Jun-2013 |
mm |
Unbreak "zfs jail" and "zfs unjail" (broken since r248571)
I forgot to register zfs_ioc_jail and zfs_ioc_unjail as legacy ioctls with the new zfs_ioctl_register_legacy() function.
These operations do not modify pools or datasets so there is no need to log them to pool history.
Reported by: Alexander Leidinger <ale@FreeBSD.org> and others on current@ MFC after: 3 days
|
252337 |
28-Jun-2013 |
gavin |
Don't try to re-insert an already present but invalid page.
This could happen if a thread doing a page-in loses a ZFS range lock race to a thread writing to the same range
This fixes "panic: vm_page_alloc: pindex already allocated" in http://docs.FreeBSD.org/cgi/mid.cgi?1372165971.96049.42.camel
Submitted by: avg MFC after: 1 week
|
252219 |
25-Jun-2013 |
delphij |
MFV r252215:
Restore the behavior from before r251646: when destroying ZFS snapshots, the ioctl returns ENOENT when any of the snapshots in the errlist hit it (the new behavior was to return ENOENT only when all of them returned an error).
Illumos ZFS issues: 3829 fix for 3740 changed behavior of zfs destroy/hold/release ioctl
MFC after: 1 week
|
252060 |
21-Jun-2013 |
smh |
Fix intermittent ZFS lock panic when kernel is compiled with debugging caused by access of uninitialized smlock in mmutex_init.
MFC after: 1 week
|
252056 |
21-Jun-2013 |
smh |
Fixed import of destroyed ZFS pools failing due to vdev_geom incorrectly preventing config loads from devices associated with destroyed pools.
Reviewed by: avg MFC after: 1 week
|
251646 |
12-Jun-2013 |
delphij |
MFV r251644:
Poor ZFS send / receive performance due to snapshot hold / release processing (by smh@)
Illumos ZFS issues: 3740 Poor ZFS send / receive performance due to snapshot hold / release processing
MFC after: 2 weeks
|
251636 |
11-Jun-2013 |
delphij |
MFV r251626:
ZFS event processing should work on R/O root filesystems
Illumos ZFS issues: 3749 zfs event processing should work on R/O root filesystems
MFC after: 2 weeks
|
251635 |
11-Jun-2013 |
delphij |
MFV r251624:
txg commit callbacks don't work
Illumos ZFS issues: 3747 txg commit callbacks don't work
MFC after: 2 weeks
|
251633 |
11-Jun-2013 |
delphij |
MFV r251622:
ZFS shouldn't ignore errors unmounting snapshots
Illumos ZFS issues: 3744 zfs shouldn't ignore errors unmounting snapshots
MFC after: 2 weeks
|
251632 |
11-Jun-2013 |
delphij |
MFV r251621:
ZFS needs a refcount audit
Illumos ZFS issues: 3743 zfs needs a refcount audit
MFC after: 2 weeks
|
251631 |
11-Jun-2013 |
delphij |
MFV r251620:
ZFS comments need cleaner, more consistent style
Illumos ZFS issues: 3742 zfs comments need cleaner, more consistent style
MFC after: 2 weeks
|
251629 |
11-Jun-2013 |
delphij |
MFV r251619:
ZFS needs better comments.
Illumos ZFS issues: 3741 zfs needs better comments
MFC after: 2 weeks
|
251520 |
08-Jun-2013 |
delphij |
MFV r251519:
* Illumos ZFS issue #3805 arc shouldn't cache freed blocks
Quote from the Illumos issue:
ZFS should proactively evict freed blocks from the cache.
Even though these freed blocks will never be used again, and thus will eventually be evicted, this causes us to use memory inefficiently for 2 reasons:
1. A block that is freed has no chance of being accessed again, but will be kept in memory preferentially to a block that was accessed before it (and is thus older) but has not been freed and thus has at least some chance of being accessed again.
2. We partition the ARC into several buckets:
- user data that has been accessed only once (MRU)
- metadata that has been accessed only once (MRU)
- user data that has been accessed more than once (MFU)
- metadata that has been accessed more than once (MFU)
The user data vs metadata split is somewhat arbitrary, and the primary control on how much memory is used to cache data vs metadata is to simply try to keep the proportion the same as it has been in the past (each bucket "evicts against" itself). The secondary control is to evict data before evicting metadata.
Because of this bucketing, we may end up with one bucket mostly containing freed blocks that are very old, while another bucket has more recently accessed, still-allocated blocks. Data in the useful bucket (with still-allocated blocks) may be evicted in preference to data in the useless bucket (with old, freed blocks).
On dcenter, we saw that the MFU metadata bucket was 230MB, while the MFU data bucket was 27GB and the MRU metadata bucket was 256GB. However, the vast majority of data in the MRU metadata bucket (256GB) was freed blocks, and thus useless. Meanwhile, the MFU metadata bucket (230MB) was constantly evicting useful blocks that will be soon needed.
The problem of cache segmentation is a larger problem that needs more investigation. However, if we stop caching freed blocks, it should reduce the impact of this more fundamental issue.
MFC after: 2 weeks
|
251478 |
06-Jun-2013 |
delphij |
MFV r251474:
* Illumos zfs issue #3137 L2ARC compression
Whether or not to compress buffers entering the L2ARC is controlled by the "compression" setting on the dataset: when compression is not "off", L2ARC compression is enabled.
The compression method is always LZ4 for L2ARC when enabled because it works best for the scenario.
MFC after: 2 weeks
|
250953 |
24-May-2013 |
markj |
The fasttrap provider cleans up probes asynchronously when a process with USDT probes exits. This was previously done with a callout; however, it is possible to sleep while holding the DTrace mutexes, so a panic will occur on INVARIANTS kernels if the callout handler can't immediately acquire one of these mutexes. This panic will be frequently triggered on systems where a USDT-enabled program (perl, for instance) is often run.
This revision changes the fasttrap cleanup mechanism so that a dedicated thread is used instead of a callout. The old behaviour is otherwise preserved.
Reviewed by: rpaulo MFC after: 1 month
|
250574 |
12-May-2013 |
markj |
Bring back part of r249367 by adding DTrace's temporal option, which allows users to guarantee that the output of DTrace scripts will be time-ordered. This option is enabled by adding the line
#pragma D option temporal
to the beginning of a script, or by adding '-x temporal' to the arguments of dtrace(1).
This change fixes a bug in the original port of the temporal option. This bug was causing some assertions to fail, so they had been disabled; in this revision the assertions are working properly and are enabled.
The DTrace version number has been bumped from 1.9.0 to 1.9.1 to reflect the language change that's being introduced.
This change corresponds to part of illumos-gate commit e5803b76927480: 3021 option for time-ordered output from dtrace(1M)
Reviewed by: pfg Obtained from: illumos MFC after: 1 month
|
250149 |
01-May-2013 |
davide |
In case ZFS doesn't use UMA for buffers there's no need to waste memory creating zones that will remain empty.
Reviewed by: pjd
|
249921 |
26-Apr-2013 |
smh |
Changed ZFS TRIM sysctl from vfs.zfs.trim_disable -> vfs.zfs.trim.enabled
Enabled ZFS TRIM by default
Reviewed by: pjd (mentor) Approved by: pjd (mentor) MFC after: 2 weeks
|
249858 |
24-Apr-2013 |
mm |
MFV r249857:
Merge vendor bugfix for a possible deadlock related to async destroy and improve write performance by introducing a new lock protecting tx_open_txg.
Illumos ZFS issues:
3642 dsl_scan_active() should not issue I/O to determine if async destroying is active
3643 txg_delay should not hold the tc_lock
MFC after: 1 week
|
249787 |
23-Apr-2013 |
mm |
The zfs synctask code restructuring introduced a new bug that makes it impossible to set quota and reservation on pools lower than version 22. Problem has been reported and a solution discussed with vendor.
Illumos ZFS issues: 3739 cannot set zfs quota or reservation on pool version < 22
Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reported by: Steve Wills <swills@FreeBSD.org> MFC after: 3 days
|
249573 |
17-Apr-2013 |
pfg |
DTrace: Revert r249367
The following change from illumos caused DTrace to pause in an interactive environment:
3026 libdtrace should set LD_NOLAZYLOAD=1 to help the pid provider
This was not detected during testing because it doesn't affect scripts.
We shouldn't be changing the environment, especially since the LD_NOLAZYLOAD option doesn't apply to our (GNU) ld. Unfortunately the change from upstream was made in such a way that it is very difficult to separate this change from the others so, at least for now, it's better to just revert everything.
Reference: https://www.illumos.org/issues/3026
Reported by: Navdeep Parhar and Mark Johnston
|
249367 |
11-Apr-2013 |
pfg |
DTrace: option for time-ordered output
Merge changes from illumos:
3021 option for time-ordered output from dtrace(1M)
3022 DTrace: keys should not affect the sort order when sorting by value
3023 it should be possible to dereference dynamic variables
3024 D integer narrowing needs some work
3025 register leak in D code generation
3026 libdtrace should set LD_NOLAZYLOAD=1 to help the pid provider
This brings yet another feature implemented in upstream DTrace. A complete description is available here: http://dtrace.org/blogs/ahl/2012/07/28/my-new-dtrace-favorite/
This change bumps the DT_VERS_* number to 1.9.1 in accordance with what is done in illumos.
This change was somewhat complicated because upstream mixed many changes into a single commit and some of the tests don't really apply to us.
There also appear to be differences in timestamping with Solaris, so we had to work around some assertions while making sure no regression happened.
Special thanks to Fabian Keil for changes and testing.
Illumos Revisions: 13758:23432da34147
Reference: https://www.illumos.org/issues/3021 https://www.illumos.org/issues/3022 https://www.illumos.org/issues/3023 https://www.illumos.org/issues/3024 https://www.illumos.org/issues/3025 https://www.illumos.org/issues/1694
Tested by: Fabian Keil Obtained from: Illumos MFC after: 1 month
|
249356 |
11-Apr-2013 |
mm |
MFV r249354: Merge bugfixes accepted and integrated by vendor. Underlying problems have been reported by us and fixed in r240942 and r249196.
Illumos ZFS issues:
3645 dmu_send_impl: possibility of pool hold leak
3692 Panic on zfs receive of a recursive deduplicated stream
MFC after: 8 days
|
249326 |
10-Apr-2013 |
mm |
Cast to (void *)(uintptr_t) on copyout and copyin of zfs_iocparm_t.zfs_cmd
MFC after: 9 days
|
249319 |
09-Apr-2013 |
mm |
ZFS expects a copyout of zfs_cmd_t on an ioctl error. Our sys_ioctl() doesn't copyout in this case.
To solve this issue a new struct zfs_iocparm_t is introduced consisting of:
- zfs_ioctl_version (future backwards compatibility purposes)
- user space pointer to zfs_cmd_t (copyin and copyout)
- size of zfs_cmd_t (verification purposes)
The copyin and copyout of zfs_cmd_t is now done the illumos (vendor) way, which makes porting of new changes easier and ensures correct behavior when returning an error.
MFC after: 10 days
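The three fields listed in this message can be pictured with a small userland model. Everything in it is an assumption made for illustration: the field names and widths, the SKETCH_IOCTL_VERSION value, and struct sketch_cmd (a stand-in for zfs_cmd_t) are hypothetical and are not taken from the committed header.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define	SKETCH_IOCTL_VERSION	1	/* hypothetical version number */

/* Stand-in for zfs_cmd_t; the real structure is much larger. */
struct sketch_cmd {
	char name[64];
	uint64_t cookie;
};

/* Hypothetical layout mirroring the three fields named in the message. */
struct sketch_iocparm {
	uint32_t ioctl_version;	/* future backwards compatibility */
	uint64_t cmd_ptr;	/* user-space address of the command */
	uint64_t cmd_size;	/* size of the command, for verification */
};

/*
 * Checks a handler could perform before copying the command in: known
 * version, non-NULL pointer, and the size it was compiled against.
 */
static bool
sketch_iocparm_valid(const struct sketch_iocparm *p)
{
	return (p->ioctl_version == SKETCH_IOCTL_VERSION &&
	    p->cmd_ptr != 0 &&
	    p->cmd_size == sizeof(struct sketch_cmd));
}
```

Because only the small wrapper travels through the generic ioctl path, the handler itself can copy the command structure back out to user space even when it returns an error.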
|
249209 |
06-Apr-2013 |
mm |
MFV r249186:
Do not list read-only pools in zpool.cache
Reduce diff against vendor in unused vdev_disk.c
Illumos ZFS issues:
3639 zpool.cache should skip over readonly pools
3640 want automatic devid updates
MFC after: 1 week
|
249206 |
06-Apr-2013 |
mm |
MFV r248660: Merge vendor change - modify time processing in deadman thread.
Illumos ZFS issues: 3618 ::zio dcmd does not show timestamp data
MFC after: 3 weeks
|
249196 |
06-Apr-2013 |
mm |
Provide a fix for kernel panic if receiving recursive deduplicated streams. Problem reported to vendor.
Illumos ZFS issues: 3692 Panic on zfs receive of a recursive deduplicated stream
MFC after: 2 weeks
|
249195 |
06-Apr-2013 |
mm |
MFV r248217: Merge change from vendor to reduce diff only. ZFS dtrace probes are not supported on FreeBSD yet.
Illumos ZFS issues: 3598 want to dtrace when errors are generated in zfs
MFC after: 3 weeks
|
249188 |
06-Apr-2013 |
mm |
MFV r242816: Import vendor change to reduce diff, no effect on FreeBSD.
Illumos ZFS issues: 3517 importing pool with autoreplace=on and "hole" vdevs crashes syseventd
|
249047 |
03-Apr-2013 |
avg |
spa_open_common: fix argument to zvol_create_minors
Prior to r248571 spa_open was always called with a bare pool name, but now it is called with a dataset name instead (spa_lookup handles that). So, when a ZFS root is mounted spa_open is called with a name of a root dataset, which can very well be different from the pool name. But zvol_create_minors should be called with the pool name, because it performs a recursive traversal of all datasets under the name to find all those that are volumes.
MFC after: 7 days
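The distinction between a dataset name and its pool name can be shown with a short helper. This is a sketch, not the code changed in spa_open_common: pool_name_of() is a hypothetical function, and it assumes the usual naming where the pool is the component before the first '/' or '@'.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy the pool component of a dataset name into buf. Returns 0 on
 * success or -1 if the pool name (plus terminator) does not fit.
 */
static int
pool_name_of(const char *dsname, char *buf, size_t buflen)
{
	/* Length of the prefix that contains neither '/' nor '@'. */
	size_t len = strcspn(dsname, "/@");

	if (len >= buflen)
		return (-1);
	memcpy(buf, dsname, len);
	buf[len] = '\0';
	return (0);
}
```

For a root dataset such as "tank/ROOT/default" the helper yields "tank", which is the kind of name a recursive traversal of all volumes in the pool has to start from.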
|
249042 |
03-Apr-2013 |
mm |
Fix possible pool hold leak in dmu_send_impl()
Problem reported to vendor: https://www.illumos.org/issues/3645
Reported by: Andriy Gapon <avg@FreeBSD.org> MFC after: 15 days
|
249004 |
02-Apr-2013 |
mm |
Do not check against uninitialized rc and comment out vendor code
MFC after: 16 days
|
248983 |
01-Apr-2013 |
pfg |
Dtrace: enablings on defunct providers prevent providers from unregistering
Merge change from illumos:
1368 enablings on defunct providers prevent providers from unregistering
We try to address some underlying differences between the Solaris and FreeBSD implementations: dtrace_attach() / dtrace_detach() are currently unimplemented in FreeBSD but the new code from illumos makes use of taskq so some adaptations were made to dtrace_open() and dtrace_close() to handle them appropriately.
Illumos Revision: r13430:8e6add739e38
Reference: https://www.illumos.org/issues/1368
Reviewed by: gnn Tested by: Fabian Keil Obtained from: Illumos MFC after: 3 weeks
|
248976 |
01-Apr-2013 |
mm |
Call dmu_snapshot_list_next() in zvol.c with dsl_pool_config lock held
Submitted by: Andriy Gapon <avg@FreeBSD.org> MFC after: 17 days
|
248752 |
26-Mar-2013 |
pfg |
Dtrace: dtrace.c erroneously checks for memory alignment on amd64.
Merge change from illumos:
3511 dtrace.c erroneously checks for memory alignment on amd64
Illumos Revision: c93cc65
Reference: https://www.illumos.org/issues/3511
Obtained from: Illumos MFC after: 3 weeks
|
248708 |
25-Mar-2013 |
pfg |
Dtrace: Add SUN MDB-like type-aware print() action.
Merge change from illumos:
1694 Add type-aware print() action
This is a very nice feature implemented in upstream Dtrace. A complete description is available here: http://dtrace.org/blogs/eschrock/2011/10/26/your-mdb-fell-into-my-dtrace/
This change bumps the DT_VERS_* number to 1.9.0 in accordance to what is done in illumos.
While here also include some minor cleanups to ease further merging and appease clang with a fix by Fabian Keil.
Illumos Revisions: 13501:c3a7090dbc16 13483:f413e6c5d297
Reference: https://www.illumos.org/issues/1560 https://www.illumos.org/issues/1694
Tested by: Fabian Keil Obtained from: Illumos MFC after: 1 month
|
248706 |
25-Mar-2013 |
pfg |
Dtrace: add toupper()/tolower() and enhancements to lltostr().
Merge changes from illumos:
1451 DTrace needs toupper()/tolower() subroutines
1457 lltostr() D subroutine should take an optional base
This change bumps the DT_VERS_* number to 1.8.1 in accordance to what is done in illumos.
The test suite we currently include is outdated and doesn't support some updates in tst.subr.d, which had to be left out for now.
Illumos Revisions: r13458 5e394d8db762 r13459 c3454574dd1a
Reference: https://www.illumos.org/issues/1451 https://www.illumos.org/issues/1457
Tested by: Fabian Keil Obtained from: Illumos MFC after: 1 month
|
248690 |
24-Mar-2013 |
pfg |
Dtrace: add optional size argument to tracemem().
Merge change from illumos:
1455 DTrace tracemem() should take an optional size argument
Our local enhancements to dt_print_bytes were equivalent to those in illumos but we made it match the illumos version to ease further code merges.
For now leave out tst.smallsize.d and tst.smallsize.d.out since those don't seem to work cleanly on FreeBSD.
This change bumps the DT_VERS_* number to 1.7.1 in accordance to what is done in illumos.
Illumos Revision: 13457:571b0355c2e3
Reference: https://www.illumos.org/issues/1455
Tested by: Fabian Keil Obtained from: Illumos MFC after: 1 month
|
248653 |
23-Mar-2013 |
will |
ZFS: Fix a panic while unmounting a busy filesystem.
This particular scenario was easily reproduced using a NFS export. When the first 'zfs unmount' occurred, it returned EBUSY via this path, while vflush() had flushed references on the filesystem's root vnode, which in turn caused its v_interlock to be destroyed. The next time 'zfs unmount' was called, vflush() tried to obtain this lock, which caused this panic.
Since vflush() on FreeBSD is a definitive call, there is no need to check vfsp->vfs_count after it completes. Simply #ifdef sun this check.
Submitted by: avg Reviewed by: avg Approved by: ken (mentor) MFC after: 1 month
|
248602 |
21-Mar-2013 |
smh |
Fix for building libzpool under i386.
Reviewed by: pjd (mentor) Approved by: pjd (mentor) MFC after: 2 weeks
|
248579 |
21-Mar-2013 |
smh |
Add missing descriptions for ZFS sysctls
Reviewed by: pjd (mentor) Approved by: pjd (mentor) MFC after: 2 weeks
|
248577 |
21-Mar-2013 |
smh |
Optimisation of TRIM processing.
Previously TRIM processing was very bursty. This was made worse by the fact that TRIM requests on SSDs are typically much slower than reads or writes. This often resulted in stalls while large numbers of TRIMs were processed.
In addition, due to the way the TRIM thread was only woken by writes, deletes could stall in the queue for extensive periods of time.
This patch adds a number of controls over how often the TRIM thread for each SPA processes its outstanding delete requests:
vfs.zfs.trim.timeout: Delay TRIMs by up to this many seconds
vfs.zfs.trim.txg_delay: Delay TRIMs by up to this many TXGs (reduced to 32)
vfs.zfs.vdev.trim_max_bytes: Maximum pending TRIM bytes for a vdev
vfs.zfs.vdev.trim_max_pending: Maximum pending TRIM segments for a vdev
vfs.zfs.trim.max_interval: Maximum interval between TRIM queue processing (seconds)
Given the most common TRIM implementation is ATA TRIM the current defaults are targeted at that.
Reviewed by: pjd (mentor) Approved by: pjd (mentor) MFC after: 2 weeks
|
248576 |
21-Mar-2013 |
smh |
Names the ZFS TRIM thread
Reviewed by: pjd (mentor) Approved by: pjd (mentor) MFC after: 2 weeks
|
248575 |
21-Mar-2013 |
smh |
TRIM cache devices based on time instead of TXGs. Currently, the trim module uses the same algorithm for data and cache devices when deciding to issue TRIM requests, based on how far in the past the TXG is.
Unfortunately, this is not ideal for cache devices, because the L2ARC doesn't use the concept of TXGs at all. In fact, when using a pool for reading only, the L2ARC is written but the TXG counter doesn't increase, and so no new TRIM requests are issued to the cache device.
This patch fixes the issue by using time instead of the TXG number as the criterion for trimming on cache devices. The basic delay principle stays the same, but parameters are expressed in seconds instead of TXGs. The new parameters are named trim_l2arc_limit and trim_l2arc_batch, and both default to 30 seconds.
Reviewed by: pjd (mentor) Approved by: pjd (mentor) Obtained from: https://github.com/dechamps/zfs/commit/17122c31ac7f82875e837019205c21651c05f8cd MFC after: 2 weeks
|
248574 |
21-Mar-2013 |
smh |
Improve TXG handling in the TRIM module. This patch adds some improvements to the way the trim module considers TXGs:
- Free ZIOs are registered with the TXG from the ZIO itself, not the current SPA syncing TXG (which may be out of date);
- L2ARC frees are registered with a zero TXG number, as L2ARC has no concept of TXGs;
- The TXG limit for issuing TRIMs is now computed from the last synced TXG, not the currently syncing TXG. Indeed, under extremely unlikely race conditions, there is a risk we could trim blocks which have been freed in a TXG that has not finished syncing, resulting in potential data corruption in case of a crash.
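The last point can be illustrated with a minimal sketch. It is not the trim module's code: the struct, the function name and the idea of subtracting a fixed TXG delay are assumptions made for illustration. It only shows the limit being derived from the last synced TXG, so that a free recorded in a TXG that is still syncing can never qualify.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tunables; txg_delay is an assumed "wait this many TXGs" knob. */
struct sk_trim_tunables {
	uint64_t txg_delay;
};

/*
 * Return true when a block freed in free_txg may be trimmed.
 * The limit is computed from the last *synced* TXG, never from the
 * TXG that is currently syncing.
 */
static bool
sk_trim_txg_ready(const struct sk_trim_tunables *t, uint64_t free_txg,
    uint64_t last_synced_txg)
{
	assert(t != NULL);

	/* Too early in the pool's life: nothing is old enough yet. */
	if (last_synced_txg < t->txg_delay)
		return (false);
	return (free_txg <= last_synced_txg - t->txg_delay);
}
```

With a delay of 32 and a last synced TXG of 50, frees from TXG 18 or earlier qualify and anything newer has to wait.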
Reviewed by: pjd (mentor) Approved by: pjd (mentor) Obtained from: https://github.com/dechamps/zfs/commit/5b46ad40d9081d75505d6f3bf04ac652445df366 MFC after: 2 weeks
|
248573 |
21-Mar-2013 |
smh |
Don't register repair writes in the trim map.
The trim map inflight writes tree assumes non-conflicting writes, i.e. that there will never be two simultaneous write I/Os to the same range on the same vdev. This seemed like a sane assumption; however, in actual testing, it appears that repair I/Os can very well conflict with "normal" writes.
I'm not quite sure if these conflicting writes are supposed to happen or not, but in the meantime, let's ignore repair writes. This should be safe considering that, by definition, we never repair blocks that are freed.
Reviewed by: pjd (mentor) Approved by: pjd (mentor) Obtained from: https://github.com/dechamps/zfs/commit/6a3cebaf7c5fcc92007280b5d403c15d0e61dfe3
|
248572 |
21-Mar-2013 |
smh |
Add TRIM support for L2ARC.
This adds TRIM support to cache vdevs. When ARC buffers are removed from the L2ARC in arc_hdr_destroy(), arc_release() or l2arc_evict(), the size previously occupied by the buffer gets scheduled for TRIMming. As always, actual TRIMs are only issued to the L2ARC after txg_trim_limit.
Reviewed by: pjd (mentor) Approved by: pjd (mentor) Obtained from: https://github.com/dechamps/zfs/commit/31aae373994fd112256607edba7de2359da3e9dc MFC after: 2 weeks
|
248571 |
21-Mar-2013 |
mm |
Merge libzfs_core branch: includes MFV 238590, 238592, 247580
MFV 238590, 238592: In the first zfs ioctl restructuring phase, the libzfs_core library was introduced. It is a new thin library that wraps around kernel ioctls. The idea is to provide a forward-compatible way of dealing with new features. Arguments are passed in nvlists and not random zfs_cmd fields, and new-style ioctls are logged to pool history using a new method of history logging.
http://blog.delphix.com/matt/2012/01/17/the-future-of-libzfs/
MFV 247580 [1]: To address issues of several deadlocks and race conditions the locking code around dsl_dataset was rewritten and the interface to synctasks was changed.
User-Visible Changes:
"zfs snapshot" can create multiple arbitrary snapshots at once (atomically)
"zfs destroy" destroys multiple snapshots at once
"zfs recv" has improved performance
Backward Compatibility: I have extended the compatibility layer to support full backward compatibility by remapping or rewriting the responsible ioctl arguments. Old utilities are fully supported by the new kernel module.
Forward Compatibility: New utilities work with old kernels with the following restrictions: - creating, destroying, holding and releasing of multiple snapshots at once is not supported, this includes recursive (-r) commands
Illumos ZFS issues: 2882 implement libzfs_core 2900 "zfs snapshot" should be able to create multiple, arbitrary snapshots at once 3464 zfs synctask code needs restructuring
References: https://www.illumos.org/issues/2882 https://www.illumos.org/issues/2900 https://www.illumos.org/issues/3464 [1]
MFC after: 1 month Sponsored by: Hybrid Logic Inc. [1]
|
248493 |
19-Mar-2013 |
mm |
Plug memory leak in dsl_check_snap_cb(). This went unnoticed because the function is very rarely used.
MFC after: 3 days
|
248457 |
18-Mar-2013 |
jhibbits |
Add FBT for PowerPC DTrace. Also, clean up the DTrace assembly code, much of which is not necessary for PowerPC.
The FBT module can likely be factored into 3 separate files: common, intel, and powerpc, rather than duplicating most of the code between the x86 and PowerPC flavors.
All DTrace modules for PowerPC will be MFC'd together once Fasttrap is completed.
|
248426 |
17-Mar-2013 |
mm |
Fix typo in sysctl description
Reported by: Jeremy Chadwick MFC after: 3 days
|
248084 |
09-Mar-2013 |
attilio |
Switch the vm_object mutex to be a rwlock. This will enable further optimizations in the future, where the vm_object lock is held in read mode most of the time when the page-cache resident pool of pages is accessed for reading.
The change is mostly mechanical, but a few notes are in order:
* The KPI changes as follows:
  - VM_OBJECT_LOCK() -> VM_OBJECT_WLOCK()
  - VM_OBJECT_TRYLOCK() -> VM_OBJECT_TRYWLOCK()
  - VM_OBJECT_UNLOCK() -> VM_OBJECT_WUNLOCK()
  - VM_OBJECT_LOCK_ASSERT(MA_OWNED) -> VM_OBJECT_ASSERT_WLOCKED() (in order to avoid visibility of implementation details)
  - The read-mode operations are added: VM_OBJECT_RLOCK(), VM_OBJECT_TRYRLOCK(), VM_OBJECT_RUNLOCK(), VM_OBJECT_ASSERT_RLOCKED(), VM_OBJECT_ASSERT_LOCKED()
* The vm/vm_pager.h namespace pollution avoidance (forcing consumers to include sys/mutex.h directly to cater for its inline functions using VM_OBJECT_LOCK()) means that all vm/vm_pager.h consumers must now also include sys/rwlock.h.
* zfs requires a quite convoluted fix to include FreeBSD rwlocks into the compat layer because the name clash between FreeBSD and solaris versions must be avoided. To this end, zfs redefines the vm_object locking functions directly, isolating the FreeBSD components in specific compat stubs.
The KPI is heavily broken by this commit. Third-party ports must be updated accordingly (I can think off-hand of VirtualBox, for example).
Sponsored by: EMC / Isilon storage division Reviewed by: jeff Reviewed by: pjd (ZFS specific review) Discussed with: alc Tested by: pho
|
247852 |
05-Mar-2013 |
mm |
MFV r247845: Import ZFS bpobj bugfix from vendor.
Illumos ZFS issues: 3603 panic from bpobj_enqueue_subobj() 3604 zdb should print bpobjs more verbosely
References: https://www.illumos.org/issues/3603 https://www.illumos.org/issues/3604
MFC after: 1 week
|
247820 |
04-Mar-2013 |
gibbs |
Fix assertion failure when using userland DTrace probes from the pid provider on a kernel compiled with INVARIANTS.
sys/cddl/contrib/opensolaris/uts/intel/dtrace/fasttrap_isa.c: In fasttrap_probe_pid(), attempts to write to the address space of the thread that fired the probe must be performed with the process of the thread held. Use _PHOLD() to ensure this is the case.
In fasttrap_probe_pid(), use proc_write_regs() instead of calling set_regs() directly. proc_write_regs() performs invariant checks to verify the calling environment of set_regs(). PROC_LOCK()/UNLOCK() around the call to proc_write_regs() so that its invariants are satisfied.
Sponsored by: Spectra Logic Corporation Reviewed by: gnn, rpaulo MFC after: 1 week
|
247602 |
02-Mar-2013 |
pjd |
Merge Capsicum overhaul:
- Capability is no longer a separate descriptor type. Now every descriptor has its own set of capability rights.
- The cap_new(2) system call is left, but it is no longer documented and should not be used in new code.
- The new cap_rights_limit(2) syscall, which limits the capability rights of the given descriptor without creating a new one, should be used instead of cap_new(2).
- The cap_getrights(2) syscall is renamed to cap_rights_get(2).
- If the CAP_IOCTL capability right is present we can further reduce the list of allowed ioctls with the new cap_ioctls_limit(2) syscall. The list of allowed ioctls can be retrieved with the cap_ioctls_get(2) syscall.
- If the CAP_FCNTL capability right is present we can further reduce the fcntls that can be used with the new cap_fcntls_limit(2) syscall and retrieve them with cap_fcntls_get(2).
- To support ioctl and fcntl white-listing the filedesc structure was heavily modified.
- The audit subsystem, kdump and procstat tools were updated to recognize new syscalls.
- Capability rights were revised and even though I tried hard to provide backward API and ABI compatibility there are some incompatible changes that are described in detail below:
CAP_CREATE old behaviour: - Allow for openat(2)+O_CREAT. - Allow for linkat(2). - Allow for symlinkat(2). CAP_CREATE new behaviour: - Allow for openat(2)+O_CREAT.
Added CAP_LINKAT: - Allow for linkat(2). ABI: Reuses CAP_RMDIR bit. - Allow to be target for renameat(2).
Added CAP_SYMLINKAT: - Allow for symlinkat(2).
Removed CAP_DELETE. Old behaviour: - Allow for unlinkat(2) when removing non-directory object. - Allow to be source for renameat(2).
Removed CAP_RMDIR. Old behaviour: - Allow for unlinkat(2) when removing directory.
Added CAP_RENAMEAT: - Required for source directory for the renameat(2) syscall.
Added CAP_UNLINKAT (effectively it replaces CAP_DELETE and CAP_RMDIR): - Allow for unlinkat(2) on any object. - Required if target of renameat(2) exists and will be removed by this call.
Removed CAP_MAPEXEC.
CAP_MMAP old behaviour: - Allow for mmap(2) with any combination of PROT_NONE, PROT_READ and PROT_WRITE. CAP_MMAP new behaviour: - Allow for mmap(2)+PROT_NONE.
Added CAP_MMAP_R: - Allow for mmap(PROT_READ). Added CAP_MMAP_W: - Allow for mmap(PROT_WRITE). Added CAP_MMAP_X: - Allow for mmap(PROT_EXEC). Added CAP_MMAP_RW: - Allow for mmap(PROT_READ | PROT_WRITE). Added CAP_MMAP_RX: - Allow for mmap(PROT_READ | PROT_EXEC). Added CAP_MMAP_WX: - Allow for mmap(PROT_WRITE | PROT_EXEC). Added CAP_MMAP_RWX: - Allow for mmap(PROT_READ | PROT_WRITE | PROT_EXEC).
Renamed CAP_MKDIR to CAP_MKDIRAT. Renamed CAP_MKFIFO to CAP_MKFIFOAT. Renamed CAP_MKNOD to CAP_MKNODAT.
CAP_READ old behaviour: - Allow pread(2). - Disallow read(2), readv(2) (if there is no CAP_SEEK). CAP_READ new behaviour: - Allow read(2), readv(2). - Disallow pread(2) (CAP_SEEK was also required).
CAP_WRITE old behaviour: - Allow pwrite(2). - Disallow write(2), writev(2) (if there is no CAP_SEEK). CAP_WRITE new behaviour: - Allow write(2), writev(2). - Disallow pwrite(2) (CAP_SEEK was also required).
Added convenient defines:
#define CAP_PREAD (CAP_SEEK | CAP_READ)
#define CAP_PWRITE (CAP_SEEK | CAP_WRITE)
#define CAP_MMAP_R (CAP_MMAP | CAP_SEEK | CAP_READ)
#define CAP_MMAP_W (CAP_MMAP | CAP_SEEK | CAP_WRITE)
#define CAP_MMAP_X (CAP_MMAP | CAP_SEEK | 0x0000000000000008ULL)
#define CAP_MMAP_RW (CAP_MMAP_R | CAP_MMAP_W)
#define CAP_MMAP_RX (CAP_MMAP_R | CAP_MMAP_X)
#define CAP_MMAP_WX (CAP_MMAP_W | CAP_MMAP_X)
#define CAP_MMAP_RWX (CAP_MMAP_R | CAP_MMAP_W | CAP_MMAP_X)
#define CAP_RECV CAP_READ
#define CAP_SEND CAP_WRITE
#define CAP_SOCK_CLIENT \
	(CAP_CONNECT | CAP_GETPEERNAME | CAP_GETSOCKNAME | CAP_GETSOCKOPT | \
	 CAP_PEELOFF | CAP_RECV | CAP_SEND | CAP_SETSOCKOPT | CAP_SHUTDOWN)
#define CAP_SOCK_SERVER \
	(CAP_ACCEPT | CAP_BIND | CAP_GETPEERNAME | CAP_GETSOCKNAME | \
	 CAP_GETSOCKOPT | CAP_LISTEN | CAP_PEELOFF | CAP_RECV | CAP_SEND | \
	 CAP_SETSOCKOPT | CAP_SHUTDOWN)
Added defines for backward API compatibility:
#define CAP_MAPEXEC CAP_MMAP_X
#define CAP_DELETE CAP_UNLINKAT
#define CAP_MKDIR CAP_MKDIRAT
#define CAP_RMDIR CAP_UNLINKAT
#define CAP_MKFIFO CAP_MKFIFOAT
#define CAP_MKNOD CAP_MKNODAT
#define CAP_SOCK_ALL (CAP_SOCK_CLIENT | CAP_SOCK_SERVER)
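To make the CAP_READ/CAP_PREAD change concrete, here is a small user-space sketch of how composite rights behave as bit masks. The SK_ names, the bit values and sk_cap_check() are hypothetical placeholders, not the values or functions from sys/capability.h; only the composition pattern (CAP_PREAD = CAP_SEEK | CAP_READ) is taken from the defines above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit values, NOT the real kernel constants. */
#define SK_CAP_READ	0x0000000000000001ULL
#define SK_CAP_WRITE	0x0000000000000002ULL
#define SK_CAP_SEEK	0x0000000000000004ULL

/* Composite rights follow the same pattern as the defines in the log. */
#define SK_CAP_PREAD	(SK_CAP_SEEK | SK_CAP_READ)
#define SK_CAP_PWRITE	(SK_CAP_SEEK | SK_CAP_WRITE)

/* A request is allowed only if every needed bit is present. */
static bool
sk_cap_check(uint64_t have, uint64_t need)
{
	assert(need != 0);	/* an empty requirement is a caller bug */
	return ((have & need) == need);
}
```

Under this model a descriptor limited to SK_CAP_READ passes the read(2)-style check but fails the pread(2)-style check until SK_CAP_SEEK is added, which matches the new CAP_READ behaviour described above.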
Sponsored by: The FreeBSD Foundation Reviewed by: Christoph Mallon <christoph.mallon@gmx.de> Many aspects discussed with: rwatson, benl, jonathan ABI compatibility discussed with: kib
|
247592 |
01-Mar-2013 |
delphij |
MFV r247575:
Import a fix to tighten the assertion on SPA versions from vendor (Illumos).
Illumos ZFS issue:
3543 Feature flags causes assertion in spa.c to miss certain cases
MFC after: 2 weeks
|
247585 |
01-Mar-2013 |
mm |
MFV r247316: Merge new read-only zfs properties from vendor (illumos)
Illumos ZFS issues: 3588 provide zfs properties for logical (uncompressed) space used and referenced
References: https://www.illumos.org/issues/3588
MFC after: 2 weeks
|
247540 |
01-Mar-2013 |
mm |
Fix the zfs_ioctl compat layer to support the zfs_cmd size change introduced in r247265 (ZFS deadman thread). The new utilities now support the old kernel, and the new kernel properly detects old utilities.
For future backwards compatibility, the vfs.zfs.version.ioctl read-only sysctl has been introduced. With this sysctl, zfs utilities will be able to detect the ioctl interface version of the currently loaded zfs module.
As a side effect, the zfs utilities between r247265 and this revision don't support the old kernel module. If you are using HEAD at r247265 or newer, install the new kernel module (or whole kernel) first.
MFC after: 10 days
|
247398 |
27-Feb-2013 |
mm |
MFV 247176, 247178, 247315: Import metaslab_sync() speedup from vendor (illumos).
Illumos ZFS issues: 3552 condensing one space map burns 3 seconds of CPU in spa_sync() thread 3564 spa_sync() spends 5-10% of its time in metaslab_sync() (when not condensing) 3578 transferring the freed map to the defer map should be constant time 3579 ztest trips assertion in metaslab_weight()
References: https://www.illumos.org/issues/3552 https://www.illumos.org/issues/3564 https://www.illumos.org/issues/3578 https://www.illumos.org/issues/3579
MFC after: 2 weeks
|
247348 |
26-Feb-2013 |
mm |
Be more verbose on ZFS deadman I/O panic. Patch suggested upstream.
Suggested by: Olivier Cinquin MFC after: 12 days
|
247265 |
25-Feb-2013 |
mm |
MFV r242732:
Merge the ZFS I/O deadman thread from vendor (illumos). This feature panics the system on hanging ZFS I/O, helps debugging and resumes failed service.
The panic behavior can be controlled with the loader-only tunables: vfs.zfs.deadman_enabled (enable or disable panic on stalled ZFS I/O) vfs.zfs.deadman_synctime (expiration time for stalled ZFS I/O)
By default, ZFS I/O deadman is enabled on amd64 and i386 excluding virtual guest machines.
Illumos ZFS issues: 3246 ZFS I/O deadman thread
References: https://www.illumos.org/issues/3246
MFC after: 2 weeks
|
247187 |
23-Feb-2013 |
mm |
MFV r246653: Import vendor change to avoid "uninitialized variable" warnings.
Illumos ZFS issues: 3522 zfs module should not allow uninitialized variables
References: https://www.illumos.org/issues/3522
|
247049 |
20-Feb-2013 |
gibbs |
Avoid panic when tearing down the DTrace pid provider for a process that has crashed.
sys/cddl/contrib/opensolaris/uts/common/dtrace/fasttrap.c: In fasttrap_pid_disable(), we cannot PHOLD the proc structure for a process that no longer exists, but we still have other, fasttrap specific, state that must be cleaned up for probes that existed in the dead process. Instead of returning early if the process related to our probes isn't found, conditionalize the locking and carry on with a NULL proc pointer. The rest of the fasttrap code already understands that a NULL proc is possible and does the right things in this case.
Sponsored by: Spectra Logic Corporation Reviewed by: rpaulo, gnn MFC after: 1 week
|
246808 |
14-Feb-2013 |
delphij |
Eliminate real_LZ4_uncompress. It's unused and does not sufficiently check the input stream (i.e. it could read beyond the specified input buffer).
|
246773 |
13-Feb-2013 |
mm |
Change vfs.zfs.write_to_degraded from CTLFLAG_RW to CTLFLAG_RWTUN
Suggested by: pjd
|
246768 |
13-Feb-2013 |
delphij |
Restore the De Bruijn algorithm for sparc64, where the compiler relies on a library function for __builtin_c?z.
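For readers unfamiliar with the technique, the following is a minimal sketch of the widely known 32-bit De Bruijn multiply-and-lookup method for counting trailing zeros without a compiler builtin. It is a generic illustration, not the code from the LZ4 sources; the names sk_debruijn_pos and sk_ctz32 are made up here.

```c
#include <assert.h>
#include <stdint.h>

/* Position table for the 32-bit De Bruijn constant 0x077CB531. */
static const uint8_t sk_debruijn_pos[32] = {
	0, 1, 28, 2, 29, 14, 24, 3, 30, 22, 20, 15, 25, 17, 4, 8,
	31, 27, 13, 23, 21, 19, 16, 7, 26, 12, 18, 6, 11, 5, 10, 9
};

/* Count trailing zero bits of a non-zero 32-bit value. */
static unsigned
sk_ctz32(uint32_t v)
{
	assert(v != 0);		/* undefined for zero, like __builtin_ctz */

	/* Isolate the lowest set bit (two's complement trick). */
	uint32_t lowest = v & (~v + 1u);

	/* Multiplying by the constant puts a unique 5-bit pattern on top. */
	return (sk_debruijn_pos[(uint32_t)(lowest * 0x077CB531u) >> 27]);
}
```

The appeal on a platform where the builtin expands to a library call is that this version needs only a multiply, a shift and one table load.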
Tested by: Michael Moll <kvedulv kvedulv de>
|
246688 |
11-Feb-2013 |
mm |
Merge zfs_ioctl.c code that should have been merged together with ZFS v28. Fixes several problems if working with read-only pools.
Changed code originally introduced in onnv-gate 13061:bda0decf867b. Contains changes up to illumos-gate 13700:4bc0783f6064
PR: kern/175897 Suggested by: avg
MFC after: 2 weeks
|
246678 |
11-Feb-2013 |
mm |
MFV r246633: Import vendor bugfixes regarding SA rounding, header size and layout. This was already partially fixed by avg.
Illumos ZFS issues: 3512 rounding discrepancy in sa_find_sizes() 3513 mismatch between SA header size and layout
References: https://www.illumos.org/issues/3512 https://www.illumos.org/issues/3513
MFC after: 2 weeks
|
246675 |
11-Feb-2013 |
mm |
MFV r246394: Add tunable to allow block allocation on degraded vdevs.
Illumos ZFS issues: 3507 Tunable to allow block allocation even on degraded vdevs
References: https://www.illumos.org/issues/3507
MFC after: 2 weeks
|
246666 |
11-Feb-2013 |
mm |
MFV r246392: Import vendor ZFS bugfix fixing a possible deadlock in arc_read().
Illumos ZFS issues: 3498 panic in arc_read(): !refcount_is_zero(&pbuf->b_hdr->b_refcnt)
References: https://www.illumos.org/issues/3498
MFC after: 2 weeks
|
246651 |
11-Feb-2013 |
mm |
MFV r246390: Import minor type change in refcount.h header from vendor (illumos).
MFC after: 2 weeks
|
246631 |
10-Feb-2013 |
mm |
MFV r246388:
Import vendor bugfixes
Illumos ZFS issues: 3422 zpool create/syseventd race yield non-importable pool 3425 first write to a new zvol can fail with EFBIG
References: https://www.illumos.org/issues/3422 https://www.illumos.org/issues/3425
MFC after: 2 weeks
|
246586 |
09-Feb-2013 |
delphij |
MFV r245512:
* Illumos zfs issue #3035 [1] LZ4 compression support in ZFS.
LZ4 is a new high-speed BSD-licensed compression algorithm created by Yann Collet that delivers very high compression and decompression performance compared to lzjb (>50% faster on compression, >80% faster on decompression and around 3x faster on compression of incompressible data), while giving better compression ratio [1].
This version of LZ4 corresponds to upstream's [2] revision 85.
Please note that for obvious reasons this is not backward read compatible. This means once a pool has LZ4 compressed data, that data can no longer be read by older ZFS implementations.
Local changes:
- The on-stack hash table is disabled for now in favor of the kernel slab allocator, since the on-stack table requires a larger kernel thread stack for zio workers. This may change in the future should we adjust the zio workers' thread stack size.
- likely and unlikely will be undefined if they are already defined; this is required for the i386 XEN build.
- Removed the De Bruijn sequence based fallback for the __builtin_ctz family of builtins in favor of the builtins themselves. Both GCC and clang support these builtins.
- Changed the way the LZ4 code detects endianness.
- Manual pages modifications to mention the feature based on Illumos counterpart.
- Boot loader changes to make it support LZ4 decompression.
[1] https://www.illumos.org/issues/3035
[2] http://code.google.com/p/lz4/source/list
Obtained from: Illumos (13921:9d721847e469) Tested on: FreeBSD/amd64 MFC after: 1 month
|
246532 |
08-Feb-2013 |
avg |
zfs_vget, zfs_fhtovp: properly handle the z_shares_dir object
A special gfs vnode corresponds to that object. A regular zfs vnode must not be returned.
This should be upstreamed.
Reported by: pluknet Submitted by: rmacklem Tested by: pluknet MFC after: 10 days
|
246531 |
08-Feb-2013 |
avg |
zfs: update comments about zfid_long_t to match the FreeBSD definitions
MFC after: 1 week
|
246293 |
03-Feb-2013 |
avg |
zfs: fix, improve and re-organize page_lookup and page_unlock
Now they are split into two pairs: page_hold/page_unhold for mappedread and page_busy/page_unbusy for update_pages.
For mappedread we simply hold a page that is to be used as a source if it is resident and valid (and not busy). This is sufficient since we are only doing page -> user buffer copying. There is no page <-> backing storage I/O involved.
update_pages is now better split to properly handle the putpages case (page -> arc) and the regular write case (arc -> page).
For the latter we use complete protocol of marking an object with paging-in-progress and marking a page with io_start (busy count). Also, in this case we remove the write bit from all page mappings and clear dirty bits of the pages, the former is needed to ensure that the latter does the right thing. Additionally we update a page if it is cached instead of just freeing it as was done before. This needs to be verified.
A minor detail: ZFS-backed pages should always be either fully valid or fully invalid. Assert this and use simpler API that does not deal with sub-page blocks.
Reviewed by: kib MFC after: 26 days
|
246242 |
02-Feb-2013 |
avg |
zfs: add MODULE_VERSION for zfsctrl
This should allow the kernel linker to easily detect a situation when the module is present both in a kernel and in a preloaded file (zfs.ko).
Reviewed by: jhb MFC after: 5 days
|
245945 |
26-Jan-2013 |
avg |
spa_generate_rootconf: add support for old vdev labels
It seems that old ZFS versions (v15) completely omit "vdev_children" property when there is a single child.
Reported by: jase Tested by: jase MFC after: 1 week
|
245511 |
16-Jan-2013 |
delphij |
MFV r245510:
improve the comment in txg.c
Obtained from: Illumos (13910:f3454e0a097c) MFC after: 2 weeks
|
245409 |
14-Jan-2013 |
kib |
For zfs vnodes, use the standard inode number based hash algorithm.
Reviewed and tested by: peter Sponsored by: The FreeBSD Foundation MFC after: 5 days
|
245264 |
10-Jan-2013 |
delphij |
The current ZFS code expects ddt_zap_count to always succeed by asserting that the underlying zap_count() returns no errors. However, it is possible for the pool to reach a state where zap_count returns an error, leading to panics when the pool is imported.
This commit changes ddt_zap_count to return the error from zap_count and handles the error appropriately. With this change, it's now possible to let zpool roll back damaged transaction groups and import the pool.
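The shape of that change (propagating an error instead of asserting on it) can be sketched in plain C. Everything below is hypothetical: sk_zap_count() merely stands in for a lower-level call that may fail, and none of the names or signatures come from the ZFS sources.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for a lower-level counter that fails on a damaged object. */
static int
sk_zap_count(int damaged, uint64_t *count)
{
	assert(count != NULL);
	if (damaged)
		return (EIO);
	*count = 1;
	return (0);
}

/* New style: hand the error back instead of asserting it cannot happen. */
static int
sk_ddt_count(int damaged, uint64_t *count)
{
	return (sk_zap_count(damaged, count));
}

/* Caller: stop at the first error so a higher layer can recover. */
static int
sk_total_count(const int *damaged, int n, uint64_t *total)
{
	*total = 0;
	for (int i = 0; i < n; i++) {
		uint64_t c;
		int error = sk_ddt_count(damaged[i], &c);

		if (error != 0)
			return (error);
		*total += c;
	}
	return (0);
}
```

The caller now sees EIO and can choose a recovery path, where an assertion would have ended in a panic.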
Obtained from: ZFS on Linux github (e8fd45a0f975c6b8ae8cd644714fc21f14fac2bf) MFC after: 1 month
|
244635 |
23-Dec-2012 |
avg |
zfs: solaris doesn't have KM_ZERO, kmem_zalloc should be used instead
To do: remove KM_ZERO declaration Pointyhat to: avg (for mindlessly using the pseudo-flag) MFC after: instantly (to fix stable/8 build)
|
244188 |
13-Dec-2012 |
smh |
Added the vfs.zfs.vdev.trim_on_init sysctl, which allows full vdev trim on initialisation to be enabled (1) or disabled (0); it defaults to enabled.
This is useful for devices which have a slow trim speed and are either new or have otherwise already been wiped e.g. secure erase.
PR: kern/173116 Submitted by: Steven Hartland Approved by: pjd (mentor)
|
244187 |
13-Dec-2012 |
smh |
Upgrades trim free request sizes before inserting them into the free map, making range consolidation much more effective, particularly for small deletes.
This reduces memory used by the free map as well as reducing the number of bio requests down to geom required to process all deletes.
In tests this achieved a factor of 10 reduction of trim ranges / geom call downs.
While I'm here correct the description of zio_vdev_io_start.
PR: kern/173254 Submitted by: Steven Hartland Approved by: pjd (mentor)
|
244155 |
12-Dec-2012 |
smh |
Renamed zfs trim stats, removing the duplicate zio_trim identifier from the name. Added description option to kstats. Added descriptions for zio_trim kstats.
PR: kern/173113 Submitted by: Steven Hartland Reviewed by: pjd Approved by: pjd MFC after: 2 weeks
|
243807 |
03-Dec-2012 |
delphij |
Use SA_ZPL_CRTIME instead of SA_ZPL_CTIME for creation time.
Submitted by: phil.stone at gmx.com MFC after: 2 weeks
|
243763 |
01-Dec-2012 |
avg |
zfs_getpages: make use of vm_page_readahead_finish
Suggested by: kib MFC after: 5 days
|
243762 |
01-Dec-2012 |
avg |
gfs_file_inactive: replace bad code with ugly code
Also, make it explicit that V_XATTRDIR is not properly supported in gfs code yet.
The bad code was plain incorrect: (a) it spoiled handling of v_usecount reaching zero and (b) it leaked v_holdcnt.
The ugly code employs potentially unsafe locking tricks.
Ideally we should separate vnode lifecycle and gfs node lifecycle. A gfs node should have its own reference count where its child nodes should be accounted.
PR: kern/151111 Reviewed by: kib MFC after: 13 days
|
243560 |
26-Nov-2012 |
mm |
MFV r243395:
Introduce a new dataset aclmode setting "restricted" to protect ACLs from being destroyed or corrupted by a drive-by chmod.
illumos-gate 13889:a67716f16746 3254 add support in zfs for aclmode=restricted
References: https://www.illumos.org/issues/3254
MFC after: 2 weeks
|
243525 |
25-Nov-2012 |
mm |
Add loader(8) tunable to enable/disable nopwrite functionality: vfs.zfs.nopwrite_enabled
MFC after: 2 weeks
|
243524 |
25-Nov-2012 |
mm |
MFV r243013 and r243267:
Import the zio nop-write improvement from Illumos. To reduce I/O, nop-write omits overwriting data if the checksum (cryptographically secure) of new data matches the checksum of existing data. It also saves space if snapshots are in use.
It currently works only on datasets with enabled compression, disabled deduplication and sha256 checksums.
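The eligibility rule and the checksum comparison described above can be sketched as a small predicate. This is an illustration only: the struct, the field names and the 32-byte checksum length (matching a SHA-256 digest) are assumptions, and the real zio pipeline is considerably more involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SK_CKSUM_LEN	32	/* size of a SHA-256 digest in bytes */

/* Hypothetical per-dataset settings relevant to nop-write. */
struct sk_props {
	bool compression;	/* compression enabled */
	bool dedup;		/* deduplication enabled */
	bool sha256;		/* checksum algorithm is sha256 */
};

/* nop-write applies only with compression on, dedup off, sha256 checksums. */
static bool
sk_nopwrite_eligible(const struct sk_props *p)
{
	assert(p != NULL);
	return (p->compression && !p->dedup && p->sha256);
}

/* The overwrite can be skipped when the new checksum equals the old one. */
static bool
sk_can_skip_write(const struct sk_props *p, const uint8_t *old_cksum,
    const uint8_t *new_cksum)
{
	assert(old_cksum != NULL && new_cksum != NULL);
	return (sk_nopwrite_eligible(p) &&
	    memcmp(old_cksum, new_cksum, SK_CKSUM_LEN) == 0);
}
```

The comparison is only safe because the checksum is cryptographically strong, which is why the sketch refuses to skip the write for any other configuration.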
IllumOS 13887:196932ec9e6a and 13888:7204b3392a58 3236 zio nop-write
References: https://www.illumos.org/issues/3236
MFC after: 2 weeks
|
243521 |
25-Nov-2012 |
avg |
zfs_freebsd_reclaim: remove a stray variable
... which leaked from a subsequent local change. Unfortunately I noticed that only after commit.
MFC after: 5 weeks X-MFC with: r243520
|
243520 |
25-Nov-2012 |
avg |
zfs: overhaul zfs-vfs glue for vnode life-cycle management
* There is no need for the delayed destruction of znodes via taskqueue, now that we do not need to fear recursion from getnewvnode into zfs_inactive and zfs_freebsd_reclaim, thus making znode/vnode state machine a bit simpler.
* More complete porting of zfs_inactive from Solaris VFS model to FreeBSD vop_inactive and vop_reclaim model. All destructive actions are done in zfs_freebsd_reclaim. This allows zfs_zget logic to be simplified.
* Allow zfs_zget to return a doomed vnode if the current thread already has an exclusive lock on the vnode.
* Clean up Solaris-isms like bailing out of reclaim/inactive on certain values of v_usecount (aka v_count) or directly messing with this counter.
* Do not clear z_vnode while znode is still accessible. z_vnode should be cleared only after zfs_znode_dmu_fini. Otherwise zfs_zget may get an effectively half-deconstructed znode. This allows zfs_zget logic to be simplified further.
The above changes fix at least two known/reported problems:
o An indefinite wait in the following code path: vgone -> VOP_RECLAIM -> zfs_freebsd_reclaim -> vnode_destroy_vobject -> put_pages -> zfs_write -> zil_commit -> zfs_zget This happened because vgone marks a vnode as VI_DOOMED before calling VOP_RECLAIM, but zfs_zget would not return a doomed vnode under any circumstances. The fix in this change is not complete as it won't fix a deadlock between two threads doing VOP_RECLAIM where one thread is in zil_commit trying to zfs_zget a znode/vnode being reclaimed by the other thread, which would be blocked trying to enter zil_commit. This type of deadlock has not been reported as of now.
o An indefinite wait in the unmount path caused by a znode "falling through the cracks" in inactive+reclaim. This would happen if the znode is unlinked while its vnode is still active.
To Do: pass locking flags parameter to zfs_zget, so that the zfs-vfs glue code doesn't have to re-lock a vnode but could ask for proper locking from the very start. This would also allow for the higher level code to obtain a doomed vnode when it is expected/requested. Or to avoid blocking when it is not allowed (see zil_commit example above).
ffs_vgetf seems like a good source of inspiration.
Tested by: Willem Jan Withagen <wjw@digiware.nl> MFC after: 6 weeks
|
243519 |
25-Nov-2012 |
avg |
zfs_fhtovp: there is no reason to amend lock flags with LK_RETRY here
MFC after: 12 days
|
243518 |
25-Nov-2012 |
avg |
add zfs_bmap to aid vnode_pager_haspage
... otherwise zfs_getpages would mostly be called with one page at a time.
It is expected that ZFS VOP_BMAP is only called from vnode_pager_haspage. Since ZFS files can have variable block sizes and also because we don't really know if any given blocks are consecutive, we can not really report any additional blocks behind or ahead of a given block. Since physical block numbers do not make sense for ZFS, we do not do any real translation and thus pass back blk = lblk. The net effect is that vnode_pager_haspage knows that the block exists and that the pages backed by the block can be accessed. vnode_pager_haspage may be wrong about the exact count of the pages backed by the block, because of a variable block size, which vnode_pager_haspage doesn't really know - it only knows max block size in a filesystem. So pages from multiple blocks can be passed to zfs_getpages, but that is expected and correctly handled.
vnode_pager should not call zfs_bmap for any other reason, because ZFS implements VOP_PUTPAGES and thus vnode_pager_generic_getpages is not used.
vfs_cluster code and vfs_bio code should not be called for ZFS, because ZFS does not use the buffer cache layer.
Also, ZFS does not use vn_bmap_seekhole; it has its private mechanism for working with holes.
The above list should cover all the current calls to VOP_BMAP.
Reviewed by: kib MFC after: 6 weeks
|
243517 |
25-Nov-2012 |
avg |
zfs_getpages: optimize for large block sizes
MFC after: 6 weeks
|
243505 |
25-Nov-2012 |
mm |
MFV r243012:
Illumos 13886:e3261d03efbf
3349 zpool upgrade -V bumps the on disk version number, but leaves the in core version
References: https://www.illumos.org/issues/3349
MFC after: 1 week
|
243503 |
25-Nov-2012 |
mm |
MFV r242735:
Illumos 13879:4eac7a87eff2: 3329 spa_sync() spends 10-20% of its time in spa_free_sync_cb() 3330 space_seg_t should have its own kmem_cache 3331 deferred frees should happen after sync_pass 1 3335 make SYNC_PASS_* constants tunable
New loader-only tunables: vfs.zfs.sync_pass_deferred_free vfs.zfs.sync_pass_dont_compress vfs.zfs.sync_pass_rewrite
References: https://www.illumos.org/issues/3329 https://www.illumos.org/issues/3330 https://www.illumos.org/issues/3331 https://www.illumos.org/issues/3335
MFC after: 2 weeks
|
243502 |
24-Nov-2012 |
avg |
zfs rootpool: add support for multi-vdev configurations
Tested by: madpilot MFC after: 10 days
|
243501 |
24-Nov-2012 |
avg |
spa_import_rootpool: initialize ub_version before calling spa_config_parse
... because the latter makes some decision based on the version. This is especially important for raidz vdevs. This is similar to what spa_load does.
This is not an issue for upstream because they do not seem to support using raidz as a root pool.
Reported by: Andrei Lavreniyuk <andy.lavr@gmail.com> Tested by: Andrei Lavreniyuk <andy.lavr@gmail.com> MFC after: 6 days
|
243500 |
24-Nov-2012 |
avg |
spa_import_rootpool: do not call spa_history_log_version
The call is a NOP, because pool version in spa_ubsync.ub_version is not initialized and thus appears to be zero. If the version is properly set then the call leads to a NULL pointer dereference because the spa object is still under-constructed.
The same change was independently made in the upstream as a part of a larger change (4445fffbbb1ea25fd0e9ea68b9380dd7a6709025).
MFC after: 6 days
|
243497 |
24-Nov-2012 |
avg |
zfs: create devices/geoms from zvols after receiving them
PR: kern/167066 Tested by: Andreas Nilsson <andrnils@gmail.com> MFC after: 13 days
|
243270 |
19-Nov-2012 |
avg |
zfs_remove: assert that delete_now case is never true on FreeBSD
That case is specific to Solaris VFS and it would violate pretty fundamental contracts of FreeBSD VFS.
Discussed with: pjd MFC after: 12 days
|
243268 |
19-Nov-2012 |
avg |
zfs_remove: set VV_NOSYNC flag if a node is unlinked
Suggested by: kib MFC after: 12 days
|
243213 |
18-Nov-2012 |
avg |
spa_import_rootpool: fall back to use configuration from zpool.cache...
if we fail to generate a proper root pool configuration based on disk probing. Currently we can not properly generate the configuration for multi-vdev pools. Make that explicit.
Reported by: madpilot, Bartosz Stec <bartosz.stec@it4pro.pl> Tested by: madpilot, Bartosz Stec <bartosz.stec@it4pro.pl> MFC after: 4 days
|
242958 |
13-Nov-2012 |
kib |
Add the wait6(2) system call. It takes POSIX waitid()-like process designator to select a process which is waited for. The system call optionally returns siginfo_t which would be otherwise provided to SIGCHLD handler, as well as extended structure accounting for child and cumulative grandchild resource usage.
Allow to get the current rusage information for non-exited processes as well, similar to Solaris.
The explicit WEXITED flag is required to wait for exited processes, allowing for more fine-grained control of the events the waiter is interested in.
Fix the handling of siginfo for WNOWAIT option for all wait*(2) family, by not removing the queued signal state.
PR: standards/170346 Submitted by: "Jukka A. Ukkonen" <jau@iki.fi> MFC after: 1 month
|
242862 |
10-Nov-2012 |
avg |
zfs_ioc_destroy_snaps_nvl: remove disk device entries for zvol snapshots
... before trying to destroy the zvol snapshots themselves.
PR: kern/173442 Reported by: Petri Helenius <petri@helenius.fi>, mm Obtained from: Brian Behlendorf <behlendorf1@llnl.gov>, Illumos Bug #3170 Tested by: Petri Helenius <petri@helenius.fi> MFC after: 10 days
|
242845 |
10-Nov-2012 |
delphij |
MFV r242729 (mm):
Illumos r13840:97fd5cdf328a:
3145 single-copy arc 3212 ztest: race condition between vdev_online() and spa_vdev_remove()
Illumos r13849:3468a95b27cd:
3258 ztest's use of file descriptors is unstable
|
242833 |
09-Nov-2012 |
attilio |
Complete MPSAFE VFS interface and remove MNTK_MPSAFE flag. Porters should refer to __FreeBSD_version 1000021 for this change as it may have happened at the same timeframe.
|
242723 |
07-Nov-2012 |
jhibbits |
Implement DTrace for PowerPC. This includes both 32-bit and 64-bit.
There is one known issue: Some probes will display an error message along the lines of: "Invalid address (0)"
I tested this with both a simple dtrace probe and dtruss on a few different binaries on 32-bit. I only compiled 64-bit, did not run it, but I don't expect problems without the modules loaded. Volunteers are welcome.
MFC after: 1 month
|
242575 |
04-Nov-2012 |
avg |
zfs_dirlook: bailout early if directory is unlinked
Otherwise we could fail with an incorrect error if e.g. parent object id is removed too or we can even return a wrong vnode if parent object has been already re-used.
Discussed with: pjd Also see: http://article.gmane.org/gmane.os.freebsd.devel.file-systems/13863 MFC after: 26 days
|
242574 |
04-Nov-2012 |
avg |
zfsctl_snapdir_lookup: obtain a snapname in the remount case
... which is triggered if somebody did regular umount on a snapshot mount.
Reviewed by: Matthew Ahrens <mahrens@delphix.com> MFC after: 20 days
|
242573 |
04-Nov-2012 |
avg |
zfs: set MNTK_EXTENDED_SHARED flag
Discussed with: kib MFC after: 20 days
|
242571 |
04-Nov-2012 |
avg |
zfs_vnode_forget: dispose of larvae vnode using public vfs api (mostly)
Reviewed by: kib MFC after: 19 days
|
242570 |
04-Nov-2012 |
avg |
zfs_umount: no need to set MNTK_UNMOUNTF here, dounmount handles that
Reviewed by: kib MFC after: 19 days
|
242568 |
04-Nov-2012 |
avg |
zfs_vnode_lock: no need to double-guess caller's intentions here
vn_lock should do the right thing with respect to given vnode lock flags. If a caller doesn't mind a doomed vnode, then zfs should deliver.
Reviewed by: kib MFC after: 19 days
|
242567 |
04-Nov-2012 |
avg |
zfs_mount: drop vfs.zfs.rootpool.prefer_cached_config tunable
It turned out to be not that useful, because its default value may lead to a problem when a root pool is present in zpool.cache, but its on-disk status is 'exported'. This may happen if the pool was imported in a different environment with -f flag and then exported.
MFC after: 12 days
|
242566 |
04-Nov-2012 |
avg |
zfs_freebsd_close: call zfs_close with count=1 instead of count=0
Otherwise we may be leaking z_sync_cnt, which may lead to unnecessary ZIL sync-ing.
MFC after: 12 days
|
242332 |
30-Oct-2012 |
delphij |
s/dettach/detach/g
Approved by: pjd MFC after: 1 month
|
242135 |
26-Oct-2012 |
avg |
zfs: fix label validation code in vdev_geom_read_config
POOL_STATE_SPARE and POOL_STATE_L2CACHE were not handled correctly and thus the cache and spare disks would not be correctly probed.
Reported by: Michael Schmiedgen <schmiedgen@gmx.net>, Matthew D. Fuller <fullermd@over-yonder.net> Tested by: Michael Schmiedgen <schmiedgen@gmx.net>, flo MFC after: 5 days
|
241896 |
22-Oct-2012 |
kib |
Remove the support for using non-mpsafe filesystem modules.
In particular, do not lock Giant conditionally when calling into the filesystem module, remove the VFS_LOCK_GIANT() and related macros. Stop handling buffers belonging to non-mpsafe filesystems.
The VFS_VERSION is bumped to indicate the interface change which does not result in the interface signatures changes.
Conducted and reviewed by: attilio Tested by: pho
|
241773 |
20-Oct-2012 |
avg |
zfs: wait in arc_lowmem only if curproc == pageproc
... otherwise the current thread might be holding ARC locks and thus run into a deadlock. This happens, for example, when a thread does memory allocation in the ARC code and runs into KVA shortage. Also, it really makes the most sense to wait in pageproc, so that the results of ARC reclamation are seen before the page cache is acted upon. In other cases where vm_lowmem is invoked, e.g. on KVA space shortage, the callers perform multiple attempts (up to 8) and wait for rather long intervals between them (up to 4 seconds), so ARC reclaim results should become visible even without explicit waiting on the ARC thread.
Note that this is not a critical issue for typical ZFS usages where KVA space should already be large enough. On amd64 systems setting KVA size to twice the physical memory size is known to mitigate KVA fragmentation issues in practice.
Side note: perhaps vm_lowmem 'how' parameter should be used to differentiate between causes of the event.
Reported by: Nikolay Denev <ndenev@gmail.com> MFC after: 19 days
|
241628 |
17-Oct-2012 |
avg |
zfs: make use of getnewvnode_reserve in zfs_mknode and zfs_zget
getnewvnode_reserve helps to avoid "recursing" back into zfs code via getnewvnode when that latter needs to reclaim some vnodes. zfs code may hold a number of locks around getnewvnode and doesn't expect any recursion to happen on those locks, because that never happens in solaris.
I believe that this change also eliminates a need for the delayed znode destruction via the taskqueue.
Many thanks to kib for devising getnewvnode_reserve.
Reported by: flo Tested by: bapt, kwm, swills MFC after: 2 weeks X-MFC after: r241556
|
241394 |
10-Oct-2012 |
kevlo |
Revert previous commit...
Pointyhat to: kevlo (myself)
|
241370 |
09-Oct-2012 |
kevlo |
Prefer NULL over 0 for pointers
|
241297 |
06-Oct-2012 |
avg |
zvol: set mediasize in geom provider right upon its creation
... instead of deferring the action until first open. Unlike in upstream, deferring has no benefit on FreeBSD: we know that as soon as the provider is created it is going to be tasted and thus opened. An initial mediasize of zero causes a tasting failure and subsequent retasting because of the size change.
MFC after: 14 days
|
241286 |
06-Oct-2012 |
avg |
zfs_mount: taste geom providers for root pool config
This should allow to mount a dataset as a root filesystem even if it belongs to a pool that is not described in zpool.cache. This adds some overhead to the boot process though.
If the root filesystem's pool is found in zpool.cache, then by default its cached configuration will be used for import. vfs.zfs.rootpool.prefer_cached_config could be set to zero to force the config to be retasted.
Discussed with: gibbs, pjd, des MFC after: 25 days
|
240955 |
26-Sep-2012 |
mm |
Merge recent vendor changes in ZFS.
Illumos issues covered: 2811 missing implementation: zfs send -r 3139 zdb dies when it tries to determine path of unlinked file 3189 kernel panic in ZFS test suite during hotspare_onoffline_004_neg 3208 moving zpool cross-endian results in incorrect user/group accounting
References: https://www.illumos.org/issues/ + [issue_id]
Obtained from: illumos (vendor/illumos, vendor/illumos-sys) MFC after: 2 weeks
|
240870 |
23-Sep-2012 |
pjd |
It is possible to recursively destroy snapshots even if the snapshot doesn't exist on a dataset we are starting from. For example if we have the following configuration:
tank tank/foo tank/foo@snap tank/bar tank/bar@snap
We can execute:
# zfs destroy -r tank@snap
even though tank@snap doesn't exist.
Unfortunately it is not possible to do the same with recursive rename:
# zfs rename -r tank@snap tank@pans cannot open 'tank@snap': dataset does not exist
...until now. This change allows to recursively rename snapshots even if snapshot doesn't exist on the starting dataset.
Sponsored by: rsync.net MFC after: 2 weeks
|
240868 |
23-Sep-2012 |
pjd |
Add TRIM support.
The code builds a map of regions that were freed. On every write the code consults the map and eventually removes ranges that were freed before, but are now overwritten.
Freed blocks are not TRIMed immediately. There is a tunable that defines how many txg we should wait with TRIMming freed blocks (64 by default).
There is a low priority thread that TRIMs ranges when the time comes. During TRIM we keep in-flight ranges on a list to detect colliding writes - we have to delay writes that collide with in-flight TRIMs in case something gets reordered and the write reaches the disk before the TRIM. We don't have to do the same for in-flight writes, as colliding writes just remove ranges to TRIM.
Sponsored by: multiplay.co.uk
This work includes some important fixes and some improvements obtained from the zfsonlinux project, including TRIMming entire vdevs on pool create/add/attach and on pool import for spare and cache vdevs.
Obtained from: zfsonlinux Submitted by: Etienne Dechamps <etienne.dechamps@ovh.net>
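The bookkeeping described above, where a write removes the overlapping part of a previously freed range, can be sketched as follows. This is only an illustration under stated assumptions, not the code from this commit: `range_t` and `range_subtract` are hypothetical names, and ranges are assumed to be half-open byte intervals.

```c
#include <stdint.h>

/* Hypothetical half-open byte range [start, end). */
typedef struct {
	uint64_t start;
	uint64_t end;
} range_t;

/*
 * Remove the written range w from the freed range r.
 * The surviving pieces (0, 1 or 2 of them) are stored in out[];
 * the return value is the number of pieces.
 */
static int
range_subtract(range_t r, range_t w, range_t out[2])
{
	int n = 0;

	/* No overlap: the freed range survives unchanged. */
	if (w.end <= r.start || w.start >= r.end) {
		out[n++] = r;
		return (n);
	}
	/* Part of the freed range lies before the write. */
	if (w.start > r.start)
		out[n++] = (range_t){ r.start, w.start };
	/* Part of the freed range lies after the write. */
	if (w.end < r.end)
		out[n++] = (range_t){ w.end, r.end };
	return (n);
}
```

A write into the middle of a freed range leaves two smaller ranges still eligible for TRIM; a write that covers it completely leaves none.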
|
240831 |
22-Sep-2012 |
avg |
zfs: allow a zvol to be used as a pool vdev, again
Do this by checking if spa_namespace_lock is already held and not taking it again in that case. Add a comment explaining why that is done and why it is safe.
Reviewed by: pjd MFC after: 24 days
|
240829 |
22-Sep-2012 |
pjd |
As in r226967, r226987 and r232401 changes to UFS and TMPFS remove cache entries associated with the source and the target of rename().
MFC after: 1 week
|
240632 |
18-Sep-2012 |
avg |
zfs: correctly calculate dn_bonuslen for saving SAs to disk
Since all attribute values start at 8-byte aligned boundary, we would previously incorrectly calculate dn_bonuslen if any attribute but the last had a variable-length value with length not multiple of 8.
Reported by: Nicolas Rachinsky <fbsd-mas-0@ml.turing-complete.org> Tested by: Nicolas Rachinsky <fbsd-mas-0@ml.turing-complete.org> Reviewed by: Matthew Ahrens <mahrens@delphix.com> (for upstream) MFC after: 2 weeks
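The alignment effect described in this entry can be illustrated with a small sketch. The function names below are hypothetical and the layout is simplified (no header, lengths only); it merely shows why summing raw lengths undercounts when a value other than the last has a length that is not a multiple of 8.

```c
#include <stddef.h>

/* Round n up to the next multiple of 8. */
static size_t
round_up8(size_t n)
{
	return ((n + 7) & ~(size_t)7);
}

/* Naive total: ignores the padding inserted between values. */
static size_t
packed_len_naive(const size_t *lens, size_t n)
{
	size_t total = 0;

	for (size_t i = 0; i < n; i++)
		total += lens[i];
	return (total);
}

/* Total when every value must start on an 8-byte boundary. */
static size_t
packed_len_aligned(const size_t *lens, size_t n)
{
	size_t off = 0;

	for (size_t i = 0; i < n; i++) {
		off = round_up8(off);	/* pad up to the value's start */
		off += lens[i];
	}
	return (off);
}
```

With lengths {5, 8} the two totals differ (13 versus 16), while with {8, 5} they agree, which matches the "any attribute but the last" condition in the message.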
|
240631 |
18-Sep-2012 |
avg |
zfs: allow both DEBUG and ZFS_DEBUG to be defined on command line
Discussed with: pjd MFC after: 10 days
|
240415 |
12-Sep-2012 |
mm |
Merge recent zfs vendor changes, sync code and adjust userland DEBUG.
Illumos issues covered: 1884 Empty "used" field for zfs *space commands 3006 VERIFY[S,U,P] and ASSERT[S,U,P] frequently check if first argument is zero 3028 zfs {group,user}space -n prints (null) instead of numeric GID/UID 3048 zfs {user,group}space [-s|-S] is broken 3049 zfs {user,group}space -t doesn't really filter the results 3060 zfs {user,group}space -H output isn't tab-delimited 3061 zfs {user,group}space -o doesn't use specified fields order 3064 usr/src/cmd/zpool/zpool_main.c misspells "successful" 3093 zfs {user,group}space's -i is noop 3098 zfs userspace/groupspace fail without saying why when run as non-root
References: https://www.illumos.org/issues/ + [issue_id]
Obtained from: illumos (vendor/illumos, vendor/illumos-sys) MFC after: 2 weeks
|
240345 |
11-Sep-2012 |
avg |
zfs: fix sa_modify_attrs handling of variable-sized attributes
- skip length_idx index for a replaced variable-sized attribute - skip length_idx index for a removed variable-sized attribute - also re-arranged code to make sure that length_idx is always incremented for variable-sized attributes - additionally add an assertion that the number of actually produced attributes is the same as the expected number of resulting attributes
In cooperation with: Matthew Ahrens <mahrens@delphix.com> Tested by: Trent Nelson <trent@snakebite.org> Reviewed by: Matthew Ahrens <mahrens@delphix.com> (for upstream) To do: get this upstreamed MFC after: 2 weeks
|
240303 |
10-Sep-2012 |
mm |
Add assfail() and assfail3() to the opensolaris module. Remove obsoleted intermediate cddl/compat/opensolaris/sys/debug.h.
MFC after: 2 weeks
|
240133 |
05-Sep-2012 |
mm |
Merge recent vendor changes and sync code: 1862 incremental zfs receive fails for sparse file > 8PB 3112 ztest does not honor ZFS_DEBUG 3122 zfs destroy filesystem should prefetch blocks 3129 'zpool reopen' restarts resilvers 3130 ztest failure: Assertion failed: 0 == dmu_objset_destroy(name, B_FALSE) (0x0 == 0x10)
References: https://www.illumos.org/issues/1862 https://www.illumos.org/issues/3112 https://www.illumos.org/issues/3122 https://www.illumos.org/issues/3129 https://www.illumos.org/issues/3130
Obtained from: illumos (vendor/illumos, vendor/illumos-sys) MFC after: 2 weeks
|
239786 |
28-Aug-2012 |
ed |
Use a proper destructor function.
When calling a revoke(2) on a dtrace device, dtrace_close() could be called, even if threads are still stuck in the device. Defer the actual deallocation of datastructures to the cdevpriv destructor.
While there, remove the unneeded D_TRACKCLOSE and D_NEEDMINOR flags. For the helper device, we never need it. For the regular dtrace devices, we only need these flags on FreeBSD pre-8.
MFC after: 1 month
|
239774 |
28-Aug-2012 |
mm |
Merge recent vendor changes: 3100 zvol rename fails with EBUSY when dirty 3104 eliminate empty bpobjs 3120 zinject hangs in zfsdev_ioctl() due to uninitialized zc
References: https://www.illumos.org/issues/3100 https://www.illumos.org/issues/3104 https://www.illumos.org/issues/3120
Obtained from: illumos (vendor/illumos, vendor/illumos-sys) MFC after: 2 weeks
|
239620 |
23-Aug-2012 |
mm |
Merge recent vendor changes: 3086 unnecessarily setting DS_FLAG_INCONSISTENT on async destroyed datasets 3090 vdev_reopen() during reguid causes vdev to be treated as corrupt 3102 vdev_uberblock_load() and vdev_validate() may read the wrong label
References: https://www.illumos.org/issues/3086 https://www.illumos.org/issues/3090 https://www.illumos.org/issues/3102
PR: kern/170912, kern/170914 Obtained from: illumos (changeset #13776, #13777) MFC after: 2 weeks
|
239389 |
19-Aug-2012 |
mm |
Backport fix for vendor issue #3085 3085 zfs diff panics, then panics in a loop on booting
References: https://www.illumos.org/issues/3085
PR: kern/170763 Obtained from: ssh://anonhg@hg.illumos.org/illumos-gate (r13772) MFC after: 1 week
|
239303 |
15-Aug-2012 |
hselasky |
Streamline use of cdevpriv and correct some corner cases.
1) It is not useful to call "devfs_clear_cdevpriv()" from "d_close" callbacks, because for example read, write, ioctl and so on might be sleeping at the time of "d_close" being called and then the freed private data can still be accessed. Examples: dtrace, linux_compat, ksyms (all fixed by this patch)
2) In sys/dev/drm* there are some cases in which memory will be freed twice, if open fails, first by code in the open routine, secondly by the cdevpriv destructor. Move registration of the cdevpriv to the end of the drm open routines.
3) devfs_clear_cdevpriv() is not called if the "d_open" callback registered cdevpriv data and the "d_open" callback function returned an error. Fix this.
Discussed with: phk MFC after: 2 weeks
|
239077 |
05-Aug-2012 |
marius |
Include <vm/vm_param.h> for PA_LOCK_COUNT in order to fix kernel build with options ZFS after r239065.
|
238926 |
30-Jul-2012 |
mm |
Partial MFV (illumos-gate 13753:2aba784c276b) 2762 zpool command should have better support for feature flags
References: https://www.illumos.org/issues/2762
MFC after: 2 weeks
|
238656 |
20-Jul-2012 |
trasz |
Make ZVOL resizing ('zfs set volsize') properly resize the GEOM provider.
Sponsored by: FreeBSD Foundation
|
238113 |
04-Jul-2012 |
pjd |
vdev_io_done stage is not used for ioctls.
MFC after: 1 week
|
237972 |
02-Jul-2012 |
mm |
Expose scrub and resilver tunables. This allows the user to tune the priority trade-off between scrub/resilver and other ZFS I/O.
MFC after: 2 weeks Discussed with: pjd
|
237817 |
29-Jun-2012 |
pfg |
Bump dtrace_helper_actions_max from 32 to 128
Dave Pacheco from Joyent (and Dtrace.org) bumped the cap to 1024 but, according to his blog, 128 is the recommended minimum.
For now bump it safely to 128 although we may have to bump it further if there is demand in the future.
Reference:
http://www.illumos.org/issues/2558 http://dtrace.org/blogs/dap/2012/01/50/where-does-your-node-program-spend-its-time/
|
237624 |
27-Jun-2012 |
pfg |
Bring llquantize support into Dtrace.
Bryan Cantrill implemented the equivalent of semi-log graph paper for Dtrace so llquantize will use one logarithmic and one linear scale.
Special thanks to Mark Peek for providing fix to an assertion and to Fabian Keill for testing the port.
Illumos Revision: 13355:15b74a2a9a9d
Reference: https://www.illumos.org/issues/905
Obtained from: Illumos Tested by: Fabian Keill, mp MFC after: 4 days
|
237458 |
22-Jun-2012 |
mm |
Import Illumos revision 13736:9f1d48e1681f 2901 ZFS receive fails for exabyte sparse files
References: https://www.illumos.org/issues/2901
Obtained from: illumos (issue #2901) MFC after: 1 week
|
236884 |
11-Jun-2012 |
mm |
Introduce "feature flags" for ZFS pools (bump SPA version to 5000). Add first feature "com.delphix:async_destroy" (asynchronous destroy of ZFS datasets). Implement features support in ZFS boot code.
Illumos revisions merged: 13700:2889e2596bd6 13701:1949b688d5fb 2619 asynchronous destruction of ZFS file systems 2747 SPA versioning with zfs feature flags
References: https://www.illumos.org/issues/2619 https://www.illumos.org/issues/2747
Obtained from: illumos (issue #2619, #2747) MFC after: 1 month
|
236823 |
09-Jun-2012 |
pjd |
ds_guid of 0 is special, as it is used by snapshot receive code to differentiate between an incremental and full stream. Be sure not to generate guid equal to 0.
Reported by: someone who saw 0 being generated as 64bit random guid MFC after: 3 days
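A minimal sketch of the "never hand out 0" rule, assuming a generic 64-bit random source. The names `rand64_fn`, `nonzero_guid` and `demo_source` are hypothetical and are not the identifiers used in the ZFS code; `demo_source` is a deterministic stand-in so the retry is visible.

```c
#include <stdint.h>

/* Hypothetical source of 64-bit values (random in real use). */
typedef uint64_t (*rand64_fn)(void);

/* Draw values until one is non-zero, since 0 is reserved. */
static uint64_t
nonzero_guid(rand64_fn next)
{
	uint64_t guid;

	do {
		guid = next();
	} while (guid == 0);
	return (guid);
}

/* Deterministic stand-in: yields 0, 1, 2, ... on successive calls. */
static uint64_t
demo_source(void)
{
	static uint64_t counter = 0;

	return (counter++);
}
```

The first call to `nonzero_guid(demo_source)` discards the initial 0 and retries, so a reserved value can never be returned no matter how unlikely it is.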
|
236250 |
29-May-2012 |
pjd |
Tighten up the assertion: because size can't be 0 and even if sm_space is equal to sm_size, any 'sm_space - size' will be less than sm_size.
MFC after: 3 days
|
236249 |
29-May-2012 |
pjd |
Eliminate 'where' argument, we don't use it.
MFC after: 3 days
|
236248 |
29-May-2012 |
pjd |
Remove unused variable.
MFC after: 3 days
|
236247 |
29-May-2012 |
pjd |
Remove unused sysctl.
MFC after: 3 days
|
236155 |
27-May-2012 |
mm |
Import illumos changeset 13570:3411fd5f1589 1948 zpool list should show more detailed pool information
Display per-vdev information with "zpool list -v". The added expandsize property has currently no value on FreeBSD. This changeset allows adding expansion support to individual vdevs in the future.
References: https://www.illumos.org/issues/1948
Obtained from: illumos (issue #1948) MFC after: 2 weeks
|
236146 |
27-May-2012 |
mm |
Import illumos changeset 13605:b5c2b5db80d6 (partial) 763 FMD msg URLs should refer to something visible
Replace sun.com URL's with illumos.org
References: https://www.illumos.org/issues/763
Obtained from: illumos (issue #763) MFC after: 1 week
|
235781 |
22-May-2012 |
trasz |
Fix enforcement of file size limit with O_APPEND on ZFS.
vn_rlimit_fsize takes uio->uio_offset and uio->uio_resid into account when determining whether given write would exceed RLIMIT_FSIZE.
When APPEND flag is specified, ZFS updates uio->uio_offset to point to the end of file.
But this happens after a call to vn_rlimit_fsize, so vn_rlimit_fsize check can be rendered ineffective by thread that opens some file with O_APPEND and lseeks below RLIMIT_FSIZE before calling write.
Submitted by: Mateusz Guzik <mjguzik at gmail dot com> MFC after: 2 weeks
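The ordering problem can be sketched with plain integers. This is an illustration only: the helper names are hypothetical, and the check is reduced to "offset plus residual exceeds the limit", which is the behaviour the message attributes to vn_rlimit_fsize.

```c
#include <stdint.h>

/* Would a write of resid bytes at offset exceed the size limit? */
static int
write_exceeds_limit(int64_t offset, int64_t resid, int64_t limit)
{
	return (offset + resid > limit);
}

/*
 * Old order: the limit is checked against the caller's seek offset,
 * and only afterwards is the offset moved to end of file.
 */
static int
check_before_append(int64_t seek_off, int64_t file_size, int64_t resid,
    int64_t limit)
{
	(void)file_size;	/* not yet taken into account */
	return (write_exceeds_limit(seek_off, resid, limit));
}

/* Fixed order: move the offset to end of file first, then check. */
static int
check_after_append(int64_t seek_off, int64_t file_size, int64_t resid,
    int64_t limit)
{
	(void)seek_off;		/* O_APPEND overrides the seek offset */
	return (write_exceeds_limit(file_size, resid, limit));
}
```

For a 900-byte file with a 1000-byte limit, a 200-byte append issued after seeking to offset 0 passes the old check but is rejected by the fixed one.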
|
235222 |
10-May-2012 |
mm |
Import illumos changeset 13686:4bc0783f6064 2703 add mechanism to report ZFS send progress
If the zfs send command is used with the -v flag, the amount of bytes transmitted is reported in per second updates.
References: https://www.illumos.org/issues/2703
Obtained from: illumos (issue #2703) MFC after: 2 weeks
|
234795 |
29-Apr-2012 |
marius |
Partially revert r232938; ZFS only requires nfs4 but not posix1e.
Submitted by: jhb
|
234691 |
26-Apr-2012 |
rstone |
Implement the D "cpu" variable, which returns curcpu. I have chosen not to follow the example of OpenSolaris and its descendants, which implemented cpu as an inline that took a value out of curthread. At certain points in the FreeBSD scheduler curthread->td_oncpu will no longer be valid (in particular, just before the thread gets descheduled) so instead I have implemented this as its own built-in variable.
Sponsored by: Sandvine Inc. MFC after: 1 week
|
234607 |
23-Apr-2012 |
trasz |
Remove unused thread argument to vrecycle().
Reviewed by: kib
|
234064 |
09-Apr-2012 |
attilio |
- Introduce a cache-miss optimization for consistency with other accesses of the cache member of vm_object objects. - Use novel vm_page_is_cached() for checks outside of the vm subsystem.
Reviewed by: alc MFC after: 2 weeks X-MFC: r234039
|
233918 |
05-Apr-2012 |
avg |
zfs_ioctl: no need for ddi_copyin/out here because sys_ioctl handles that
On FreeBSD the direct ioctl argument is automatically copied in/out as necessary by the kernel ioctl entry point.
PR: kern/164445 Submitted by: Luis Garces-Erice <lge@ieee.org> Tested by: Attila Nagy <bra@fsn.hu> MFC after: 5 days
|
233408 |
24-Mar-2012 |
gonzo |
Add MIPS support to cddl/contrib part:
- header and stub .c file for fasttrap module. It's not supported on MIPS yet, but there is no way to disable support completely - Do as amd64 trying to limit allocated memory
|
232938 |
13-Mar-2012 |
adrian |
Add dependencies onto acl_posix1e and acl_nfs4.
|
232186 |
26-Feb-2012 |
mm |
Analogous to r232059, add a parameter for the ZFS file system:
allow.mount.zfs: allow mounting the zfs filesystem inside a jail
This way the permissions for mounting all current VFCF_JAIL filesystems inside a jail are controlled via allow.mount.* jail parameters.
Update sysctl descriptions. Update jail(8) and zfs(8) manpages.
TODO: document the connection of allow.mount.* and VFCF_JAIL for kernel developers
MFC after: 10 days
|
231852 |
17-Feb-2012 |
bz |
Merge multi-FIB IPv6 support from projects/multi-fibv6/head/:
Extend the so far IPv4-only support for multiple routing tables (FIBs) introduced in r178888 to IPv6 providing feature parity.
This includes an extended rtalloc(9) KPI for IPv6, the necessary adjustments to the network stack, and user land support as in netstat.
Sponsored by: Cisco Systems, Inc. Reviewed by: melifaro (basically) MFC after: 10 days
|
230945 |
03-Feb-2012 |
mm |
Revert r230913 and r230914.
The initialization was correct, the problem needs deeper analysis.
|
230914 |
02-Feb-2012 |
mm |
Add copyright information on last commits to comply with CDDL.
Discussed with: pluknet@ MFC after: 3 days
|
230913 |
02-Feb-2012 |
mm |
Fix out of bounds write causing random panics, uncovered by the change in r230256
Reviewed by: pluknet@ MFC after: 3 days
|
230689 |
29-Jan-2012 |
kmacy |
always exclude data bufs regardless of debug settings
|
230647 |
28-Jan-2012 |
kmacy |
add tunable for developers working on areas outside of ZFS to further reduce core size by excluding ARC metadata buffers from core dumps
|
230623 |
27-Jan-2012 |
kmacy |
exclude kmem_alloc'ed ARC data buffers from kernel minidumps on amd64 excluding other allocations including UMA now entails the addition of a single flag to kmem_alloc or uma zone create
Reviewed by: alc, avg MFC after: 2 weeks
|
230514 |
24-Jan-2012 |
mm |
Merge illumos revisions 13572, 13573, 13574:
Rev. 13572: disk sync write perf regression when slog is used post oi_148 [1]
Rev. 13573: crash during reguid causes stale config [2] allow and unallow missing from zpool history since removal of pyzfs [5]
Rev. 13574: leaking a vdev when removing an l2cache device [3] memory leak when adding a file-based l2arc device [4] leak in ZFS from metaslab_group_create and zfs_ereport_checksum [6]
References: https://www.illumos.org/issues/1909 [1] https://www.illumos.org/issues/1949 [2] https://www.illumos.org/issues/1951 [3] https://www.illumos.org/issues/1952 [4] https://www.illumos.org/issues/1953 [5] https://www.illumos.org/issues/1954 [6]
Obtained from: illumos (issues #1909, #1949, #1951, #1952, #1953, #1954) MFC after: 2 weeks
|
230438 |
21-Jan-2012 |
pjd |
Dramatically optimize listing snapshots when user requests only snapshot names and wants to sort them by name, ie. when executes:
# zfs list -t snapshot -o name -s name
Because only name is needed we don't have to read all snapshot properties.
Below you can find how long does it take to list 34509 snapshots from a single disk pool before and after this change with cold and warm cache:
before:
# time zfs list -t snapshot -o name -s name > /dev/null cold cache: 525s warm cache: 218s
after:
# time zfs list -t snapshot -o name -s name > /dev/null cold cache: 1.7s warm cache: 1.1s
MFC after: 1 week
|
230397 |
20-Jan-2012 |
pjd |
By default turn off prefetch when listing snapshots. In my tests it makes listing snapshots 19% faster with cold cache and 47% faster with warm cache.
MFC after: 1 week
|
230256 |
17-Jan-2012 |
pluknet |
Fix the "lock &zrl->zr_mtx already initialized" assertion by initializing the allocated memory before calling mtx_init(9) on mtx pointing to it. Otherwise, random contents of uninitialized memory might occasionally trigger the assertion.
Reported by: Pavel Polyakov <bsd kobyla org> Reviewed by: pjd MFC after: 1 week
|
229663 |
05-Jan-2012 |
pjd |
- Allow to change vfs.zfs.arc_meta_limit at runtime. - Change vfs.zfs.arc_meta_used from CTLFLAG_RDTUN to CTLFLAG_RD, as it is not a tunable.
MFC after: 3 days
|
229425 |
03-Jan-2012 |
dim |
In sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c, check the number of links against LINK_MAX (which is INT16_MAX), not against UINT32_MAX. Otherwise, the constant would implicitly be converted to -1.
Reviewed by: pjd MFC after: 1 week
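The conversion mentioned above can be illustrated as follows. This sketch assumes the comparison ends up in a signed 32-bit type, which is an assumption rather than a statement about the actual types in zfs_vnops.c; `DEMO_LINK_MAX` and both helper names are hypothetical.

```c
#include <stdint.h>

/* Stand-in for LINK_MAX, which the message states is INT16_MAX. */
#define	DEMO_LINK_MAX	INT16_MAX

/*
 * Buggy form: UINT32_MAX squeezed into a signed 32-bit value becomes -1
 * on the usual two's-complement targets, so every non-negative link
 * count looks like it has already reached the limit.
 */
static int
at_limit_buggy(int32_t nlinks)
{
	int32_t limit = (int32_t)UINT32_MAX;

	return (nlinks >= limit);
}

/* Fixed form: compare against the real limit. */
static int
at_limit_fixed(int32_t nlinks)
{
	return (nlinks >= DEMO_LINK_MAX);
}
```

With a single link, the buggy check already reports the limit as reached, while the fixed check only does so at 32767 links.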
|
228686 |
18-Dec-2011 |
pjd |
From time to time people report space map corruption resulting in panic (ss == NULL) on pool import. I had such a panic recently. With current version of ZFS it is still possible to import the pool in readonly mode and backup all the data, but in case it is impossible for some reason add tunable vfs.zfs.space_map_last_hope, which when set to '1' will tell ZFS to remove colliding range and retry. This seems to have worked for me, but I consider it highly risky to use.
MFC after: 1 week
|
228685 |
18-Dec-2011 |
pjd |
Implement replaying of ACL updates. ACL changes should go to ZIL only if the 'sync' property is set to 'always', so replaying them is not common.
MFC after: 1 month
|
228448 |
12-Dec-2011 |
attilio |
Revert the approach for skipping lockstat_probe_func call when doing lock_success/lock_failure, introduced in r228424, by directly skipping in dtrace_probe.
This mainly helps in avoiding namespace pollution and thus lockstat.h dependency by systm.h.
As an added bonus, this also helps in MFC case. Reviewed by: avg MFC after: 3 months (or never) X-MFC: r228424
|
228392 |
10-Dec-2011 |
pjd |
Move ru_inblock increment into arc_read_nolock() so we don't account for cached reads.
Discussed with: gibbs No objections from: avg Tested by: Marcus Reid <marcus@blazingdot.com> MFC after: 1 week
|
228363 |
09-Dec-2011 |
pjd |
The vfs.zfs.txg.timeout sysctl can be safely modified at run time.
MFC after: 1 week
|
228104 |
28-Nov-2011 |
mm |
Fix typo in copyright notice.
MFC after: 1 month
|
228103 |
28-Nov-2011 |
mm |
Merge new ZFS features from illumos:
1644 add ZFS "clones" property https://www.illumos.org/issues/1644
1645 add ZFS "written" and "written@..." properties https://www.illumos.org/issues/1645
1646 "zfs send" should estimate size of stream https://www.illumos.org/issues/1646
1647 "zfs destroy" should determine space reclaimed by destroying multiple snapshots https://www.illumos.org/issues/1647
1693 persistent 'comment' field for a zpool https://www.illumos.org/issues/1693
1708 adjust size of zpool history data https://www.illumos.org/issues/1708
1748 desire support for reguid in zfs https://www.illumos.org/issues/1748
Obtained from: illumos (changesets 13514, 13524, 13525) MFC after: 1 month
|
227697 |
19-Nov-2011 |
kib |
Existing VOP_VPTOCNP() interface has a fatal flaw that is critical for nullfs. The problem is that the resulting vnode is only required to be held on return from the successful call to vop, instead of being referenced.
Nullfs VOP_INACTIVE() method reclaims the vnode, which in combination with the VOP_VPTOCNP() interface means that the directory vnode returned from VOP_VPTOCNP() is reclaimed in advance, causing vn_fullpath() to error with EBADF or like.
Change the interface for VOP_VPTOCNP(), now the dvp must be referenced. Convert all in-tree implementations of VOP_VPTOCNP(), which is trivial, because vhold(9) and vref(9) are similar in the locking prerequisites. Out-of-tree fs implementation of VOP_VPTOCNP(), if any, should have no trouble with the fix.
Tested by: pho Reviewed by: mckusick MFC after: 3 weeks (subject of re approval)
|
227291 |
07-Nov-2011 |
rstone |
Replace fasttrap_copyout() with uwrite(). FreeBSD copyout() is not able to write to the .text section of a process.
Obtained from: rpaulo MFC after: 3 days
|
227111 |
05-Nov-2011 |
pjd |
Correct typo in comment.
Reported by: Fabian Keil <fk@fabiankeil.de> MFC after: 3 days
|
227110 |
05-Nov-2011 |
pjd |
In zvol_open() if the spa_namespace_lock is already held, it means that ZFS is trying to open and taste ZVOL as its VDEV. This is not supported, so return an error instead of panicking on spa_namespace_lock recursion.
Reported by: Robert Millan <rmh@debian.org> PR: kern/162008 MFC after: 3 days
|
226732 |
25-Oct-2011 |
mm |
Fix typo in copyright notice introduced in r226724 (missing character in e-mail address)
Reported by: pjd MFC after: 3 days
|
226724 |
25-Oct-2011 |
mm |
Update copyright information in several ZFS files, as the clause 3.3 of the CDDL licence explicitly requires every Contributor to add a copyright notice.
This also reflects the copyright notices for the changes recently added by Illumos.
MFC after: 3 days
|
226707 |
24-Oct-2011 |
pjd |
- Use better naming now that we allow to rename any mounted file system (not only legacy). - Update copyright to include myself.
MFC after: 2 weeks
|
226700 |
24-Oct-2011 |
pjd |
Don't forget to rename mounted snapshots of the file system being renamed.
MFC after: 2 weeks
|
226678 |
24-Oct-2011 |
pjd |
Include <sys/zfs_vfsops.h> only when compiling kernel module.
MFC after: 2 weeks
|
226676 |
24-Oct-2011 |
pjd |
Allow to rename file systems without remounting if it is possible. It is possible for file systems with 'mountpoint' property set to 'legacy' or 'none' - we don't have to change mount directory for them. Currently such file systems are unmounted on rename and not even mounted back.
This introduces layering violation, as we need to update 'f_mntfromname' field in statfs structure related to mountpoint (for the dataset we are renaming and all its children).
In my opinion it is worth it, as it allow to update FreeBSD in even cleaner way - in ZFS-only configuration root file system is ZFS file system with 'mountpoint' property set to 'legacy'. If root dataset is named system/rootfs, we can snapshot it (system/rootfs@upgrade), clone it (system/oldrootfs), update FreeBSD and if it doesn't boot we can boot back from system/oldrootfs and rename it back to system/rootfs while it is mounted as /. Before it was not possible, because unmounting / was not possible.
MFC after: 2 weeks
|
226620 |
21-Oct-2011 |
pjd |
Update per-thread I/O statistics collection in ZFS. This makes per-process I/O activity visible in 'top -m io' output.
PR: kern/156218 Reported by: Marcus Reid <marcus@blazingdot.com> Patch by: avg MFC after: 3 days
|
226617 |
21-Oct-2011 |
pjd |
zfs vdev_file_io_start: validate vdev before using vdev_tsd
vdev_tsd can be NULL for certain vdev states. At least in userland testing with ztest.
Submitted by: avg MFC after: 3 days
|
226512 |
18-Oct-2011 |
mm |
Import fix for Illumos bug #1475 to reduce diff against upstream.
Panic caused by this bug was already partially fixed by pjd@ in p4 CH 185940 and 185942.
Reference: 1475 zfs spill block hold can access invalid spill blkptr https://www.illumos.org/issues/1475
Reviewed by: delphij Obtained from: Illumos (issue 1475, changeset 13469:b8e89e5c4167) MFC after: 1 week
|
226483 |
17-Oct-2011 |
delphij |
Fix a bug in sa_find_sizes() which could lead to panic:
When calculating space needed for SA_BONUS buffers, hdrsize is always rounded up to next 8-aligned boundary. However, in two places the round up was done against sum of 'total' plus hdrsize. On the other hand, hdrsize increments by 4 each time, which means in certain conditions, we would end up returning with will_spill == 0 and (total + hdrsize) larger than full_space, leading to a failed assertion because it's invalid for dmu_set_bonus.
Sponsored by: iXsystems, Inc. Reviewed by: mm MFC after: 3 days
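The rounding mismatch described in this commit message can be illustrated with a small sketch. The helper names (`round_up8`, `fits_rounding_header`, `fits_rounding_sum`) and the sample numbers are hypothetical and are not taken from sa_find_sizes(); the sketch only shows that rounding the header alone and rounding the sum can give different answers to "does it fit?".

```c
#include <assert.h>
#include <stddef.h>

/* Round n up to the next multiple of 8 (hypothetical helper). */
static size_t
round_up8(size_t n)
{
	return ((n + 7) & ~(size_t)7);
}

/* Space check that rounds only the header before adding the payload. */
static int
fits_rounding_header(size_t total, size_t hdrsize, size_t full_space)
{
	return (total + round_up8(hdrsize) <= full_space);
}

/* Space check that rounds the sum instead; this can be too optimistic. */
static int
fits_rounding_sum(size_t total, size_t hdrsize, size_t full_space)
{
	return (round_up8(total + hdrsize) <= full_space);
}
```

With total = 20, hdrsize = 12 and full_space = 32, the sum-based check accepts the layout while the header-based check rejects it, which is the kind of disagreement that could end in a failed assertion later.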
|
225617 |
16-Sep-2011 |
kmacy |
In order to maximize the re-usability of kernel code in user space this patch modifies makesyscalls.sh to prefix all of the non-compatibility calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel entry points and all places in the code that use them. It also fixes an additional name space collision between the kernel function psignal and the libc function of the same name by renaming the kernel psignal kern_psignal(). By introducing this change now we will ease future MFCs that change syscalls.
Reviewed by: rwatson Approved by: re (bz)
|
225418 |
06-Sep-2011 |
kib |
Split the vm_page flags PG_WRITEABLE and PG_REFERENCED into atomic flags field. Updates to the atomic flags are performed using the atomic ops on the containing word, do not require any vm lock to be held, and are non-blocking. The vm_page_aflag_set(9) and vm_page_aflag_clear(9) functions are provided to modify aflags.
Document the changes to flags field to only require the page lock.
Introduce vm_page_reference(9) function to provide a stable KPI and KBI for filesystems like tmpfs and zfs which need to mark a page as referenced.
Reviewed by: alc, attilio Tested by: marius, flo (sparc64); andreast (powerpc, powerpc64) Approved by: re (bz)
|
225166 |
25-Aug-2011 |
mm |
Generalize ffs_pages_remove() into vn_pages_remove().
Remove mapped pages for all dataset vnodes in zfs_rezget() using new vn_pages_remove() to fix mmapped files changed by zfs rollback or zfs receive -F.
PR: kern/160035, kern/156933 Reviewed by: kib, pjd Approved by: re (kib) MFC after: 1 week
|
225153 |
24-Aug-2011 |
pjd |
We need to unlock and destroy vnode attached to znode which we are freeing.
Reviewed by: kib Approved by: re (bz) MFC after: 1 week
|
224855 |
13-Aug-2011 |
mm |
zfs_ioctl.c: improve code readability in zfs_ioc_dataset_list_next()
zvol.c: fix calling of dmu_objset_prefetch() in zvol_create_minors() by passing full instead of relative dataset name and prefetching all visible datasets to be processed later instead of just the pool name
Reviewed by: pjd Approved by: re (kib) MFC after: 1 week
|
224814 |
13-Aug-2011 |
mm |
Fix race between dmu_objset_prefetch() invoked from zfs_ioc_dataset_list_next() and dsl_dir_destroy_check() indirectly invoked from dmu_recv_existing_end() via dsl_dataset_destroy() by not prefetching temporary clones, as these count as always inconsistent. In addition, do not prefetch hidden datasets at all as we are not going to process these later.
Filed as Illumos Bug #1346
PR: kern/157728 Tested by: Borja Marcos <borjam@sarenet.es>, mm Reviewed by: pjd Approved by: re (kib) MFC after: 1 week
|
224791 |
12-Aug-2011 |
pjd |
Eliminate the zfsdev_state_lock entirely and replace it with the spa_namespace_lock. This fixes LOR between the spa_namespace_lock and spa_config lock. LOR can cause deadlock on vdevs removal/insertion.
Reported by: gibbs, delphij Tested by: delphij Approved by: re (kib) MFC after: 1 week
|
224605 |
02-Aug-2011 |
mm |
Fix panic in zfs_read() if IO_SYNC flag supplied by checking for zfsvfs->z_log before calling zil_commit(). [1] Do not call zfs_read() from zfs_getextattr() with the IO_SYNC flag.
Submitted by: Alexander Zagrebin <alex@zagrebin.ru> [1] Reviewed by: pjd@ Approved by: re (kib) MFC after: 3 days
|
224579 |
01-Aug-2011 |
mm |
Fix integer overflow in txg_delay() by initializing the variable "timeout" as clock_t.
Filed as Illumos Bug #1313
Reviewed by: avg Approved by: re (kib) MFC after: 3 days
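A minimal sketch of the class of bug fixed here, assuming a tick counter wider than 32 bits. The type `ticks_t` and both helper functions are hypothetical stand-ins and do not reproduce txg_delay(); they only show why a deadline computed from a wide counter should be stored in an equally wide variable.

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t ticks_t;	/* stand-in for a wide clock_t (assumption) */

/* Deadline kept in the wide type: no information is lost. */
static ticks_t
deadline_wide(ticks_t now, int delay)
{
	return (now + delay);
}

/*
 * Deadline squeezed into 32 bits. Once the sum exceeds INT32_MAX the
 * stored value can no longer match the real deadline (the conversion is
 * implementation-defined; common ABIs wrap around).
 */
static int32_t
deadline_narrow(ticks_t now, int delay)
{
	return ((int32_t)(now + delay));
}
```

For small counter values both helpers agree; they diverge only after the counter has grown past what 32 bits can hold, which is why such bugs tend to show up on long-running systems.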
|
224526 |
30-Jul-2011 |
mm |
Fix serious bug in ZIL that can lead to pool corruption in the case of a held dataset during remount.
Detailed description is available at: https://www.illumos.org/issues/883
illumos-gate revision: 13380:161b964a0e10
Reviewed by: pjd Approved by: re (kib) Obtained from: Illumos (Bug #883) MFC after: 3 days
|
224252 |
21-Jul-2011 |
delphij |
Bring the code more in-line with OpenSolaris source to ease future port.
Reviewed by: pjd, mm Approved by: re (kib)
|
224251 |
21-Jul-2011 |
delphij |
A different implementation of r224231 proposed by pjd@, which does not require change in the znode structure. Specifically, it queries rdev from the znode in the same sa_bulk_lookup already done in zfs_getattr().
Submitted by: pjd (with some revisions) Reviewed by: pjd, mm Approved by: re (kib)
|
224231 |
20-Jul-2011 |
delphij |
Add a new field to in-core znode, z_rdev, to represent device nodes.
PR: kern/159010 Reviewed by: mm@ Approved by: re (kib) MFC after: 2 weeks
|
224177 |
18-Jul-2011 |
mm |
ZFS tries to allocate blocks evenly across all devices. This means when devices are imbalanced zfs will spend lots of CPU searching for space on devices which tend to be pretty full. It should instead fail quickly on the full devices and move onto devices which have more availability.
New loader tunable: vfs.zfs.mg_alloc_failures (min = 8)
Illumos-gate changeset: 13379:4df42cc92254
Obtained from: Illumos (Bug #1051) MFC after: 2 weeks
|
224174 |
18-Jul-2011 |
mm |
Resurrect the ZFS "aclmode" property. Change default of "aclmode" to "discard".
Illumos-gate changeset: 13370:8c04143bd318
Obtained from: Illumos (Feature #742) MFC after: 2 weeks
|
223758 |
04-Jul-2011 |
attilio |
With retirement of cpumask_t and usage of cpuset_t for representing a mask of CPUs, pc_other_cpus and pc_cpumask become highly inefficient.
Remove them and replace their usage with custom pc_cpuid magic (as, atm, pc_cpumask can be easily represented by (1 << pc_cpuid) and pc_other_cpus by (all_cpus & ~(1 << pc_cpuid))).
This change is not targeted for MFC because of struct pcpu members removal and dependency by cpumask_t retirement.
MD review by: marcel, marius, alc Tested by: pluknet MD testing by: marcel, marius, gonzo, andreast
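The two expressions quoted in this commit message can be sketched with a single 64-bit word standing in for the CPU set. This is an illustration only: a real cpuset_t is not a single word, and the function names `self_mask` and `other_cpus` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Mask with only this CPU's bit set; valid for cpuid < 64 in this sketch. */
static uint64_t
self_mask(unsigned int cpuid)
{
	return ((uint64_t)1 << cpuid);
}

/* Every CPU in all_cpus except the given one. */
static uint64_t
other_cpus(uint64_t all_cpus, unsigned int cpuid)
{
	return (all_cpus & ~self_mask(cpuid));
}
```

Because both values can be derived from the CPU id on demand, there is no need to store them per CPU, which is the point the commit message makes.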
|
223623 |
28-Jun-2011 |
mm |
Add a new "REFCOMPRESSRATIO" property.
For snapshots, this is the same as COMPRESSRATIO, but for filesystems/volumes, the COMPRESSRATIO is based on the data "USED" (ie, includes blocks in children, but not blocks shared with the origin).
This is needed to figure out how much space a filesystem would use if it were not compressed (ignoring snapshots).
Illumos-gate revision: 13387
Obtained from: Illumos (Feature #1092) MFC after: 2 weeks
|
223622 |
28-Jun-2011 |
mm |
Disable vdev cache (readahead) by default.
The vdev cache is very underutilized (hit ratio 30%-70%) and may consume excessive memory on systems with many vdevs.
Illumos-gate revision: 13346
Obtained from: Illumos (Bug #175) MFC after: 1 week
|
223262 |
18-Jun-2011 |
benl |
Fix clang warnings.
Approved by: philip (mentor)
|
222950 |
10-Jun-2011 |
gibbs |
Remove C constructs that are incompatible with C++ from various OpenSolaris and ZFS header files. These changes are sufficient to allow a C++ program to use the libzfs library.
Note: The majority of these files already included 'extern "C"' declarations, so the intention of providing C++ compatibility already existed even if it wasn't provided.
cddl/compat/opensolaris/include/assert.h: Wrap our compatibility assert implementation in 'extern "C"'. Since this is a compatibility header I matched the Solaris style of doing this explicitly rather than rely on FreeBSD's __BEGIN/END_DECLS macro.
sys/cddl/compat/opensolaris/sys/kstat.h: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/arc.h: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dsl_pool.h: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/ddt.h: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/spa.h: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zio.h: Rename parameters in function declarations that conflict with C++ keywords. This was the solution preferred by members of the Illumos community.
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_ioctl.h: In C, nested structures are visible in the global namespace, but in C++, they take on the namespace of the structure in which they are contained. Flatten nested structure definitions within struct zfs_cmd so these structures are visible in the global namespace when compiled in both languages.
Sponsored by: Spectra Logic Corporation
|
222835 |
07-Jun-2011 |
mm |
Silence notice on pool creation, import and access.
Suggested by: Jeremy Chadwick (freebsd-stable@) Discussed with: pjd MFC after: 1 week
|
222813 |
07-Jun-2011 |
attilio |
Retire the cpumask_t type and replace it with cpuset_t usage.
This is intended to fix the bug where cpu mask objects are capped to 32. MAXCPU, then, can now be arbitrarily bumped to whatever value. Anyway, as long as several structures in the kernel are statically allocated and sized as MAXCPU, it is suggested to keep it as low as possible for the time being.
Technical notes on this commit itself: - More functions to handle cpuset_t objects are introduced. The most notable are cpusetobj_ffs() (which calculates a ffs(3) for a cpuset_t object), cpusetobj_strprint() (which prepares a string representing a cpuset_t object) and cpusetobj_strscan() (which creates a valid cpuset_t starting from a string representation). - pc_cpumask and pc_other_cpus are targeted for removal soon. With the move from cpumask_t to cpuset_t they are now inefficient and not really useful. Anyway, for the time being, please note that access to pcpu data is protected by sched_pin() in order to avoid migrating the CPU while reading more than one (possible) word. - Please note that size of cpuset_t objects may differ between kernel and userland. While this is not directly related to the patch itself, it is good to understand that concept and possibly use the patch as a reference on how to deal with cpuset_t objects in userland, when accessing kernel members. - KTR_CPUMASK is changed and now is represented through a string, to be set as the example reported in NOTES.
Please additionally note that no MAXCPU is bumped in this patch, but private testing has been done up to MAXCPU=128 on a real 8x8x2(htt) machine (amd64).
Please note that the FreeBSD version is not yet bumped because of the upcoming pcpu changes. However, note that this patch is not targeted for MFC.
People to thank for the time spent on this patch: - sbruno, pluknet and Nicholas Esborn (nick AT desert DOT net) tested several revisions of the patches and really helped in improving stability of this work. - marius fixed several bugs in the sparc64 implementation and reviewed patches related to ktr. - jeff and jhb discussed the basic approach followed. - kib and marcel made targeted review on some specific part of the patch. - marius, art, nwhitehorn and andreast reviewed MD specific part of the patch. - marius, andreast, gonzo, nwhitehorn and jceel tested MD specific implementations of the patch. - Other people have made contributions on other patches that have been already committed and have been listed separately.
Companies that should be mentioned for having participated at several degrees: - Yahoo! for having offered the machines used for testing on big count of CPUs. - The FreeBSD Foundation for having sponsored my devsummit attendance, which has been instrumental. - Sandvine for having offered offices and infrastructure during development.
(I really hope I didn't forget anyone, if it happened I apologize in advance).
|
222268 |
24-May-2011 |
pjd |
Don't pass pointer to name buffer which is on the stack to another thread, because the stack might be paged out once the other thread tries to use the data. Instead, just allocate memory.
MFC after: 2 weeks
|
222267 |
24-May-2011 |
pjd |
Don't access task structure once we call task function. The task structure might be no longer available. This also eliminates the need for two tasks in the zio structure.
Submitted by: anonymous MFC after: 2 weeks
|
222199 |
22-May-2011 |
rmacklem |
Fix the zfs file system so that it uses the lock flags argument added to VFS_FHTOVP() by r222167.
Reviewed by: pjd
|
222167 |
22-May-2011 |
rmacklem |
Add a lock flags argument to the VFS_FHTOVP() file system method, so that callers can indicate the minimum vnode locking requirement. This will allow some file systems to choose to return a LK_SHARED locked vnode when LK_SHARED is specified for the flags argument. This patch only adds the flag. It does not change any file system to use it and all callers specify LK_EXCLUSIVE, so file system semantics are not changed.
Reviewed by: kib
|
222050 |
18-May-2011 |
mm |
Restore old (v15) behaviour for a recursive snapshot destroy. (zfs destroy -r pool/dataset@snapshot)
To destroy all descendent snapshots with the same name the top level snapshot was not required to exist. So if the top level snapshot does not exist, check permissions of the parent dataset instead.
Filed as Illumos Bug #1043
Reviewed by: delphij Approved by: pjd MFC after: together with v28
|
221409 |
03-May-2011 |
marius |
Convert the last use of xcopyout() to ddi_copyout() and remove the now unused xcopyin() as well as xcopyout(). MFC together with r219089.
Approved by: mm
|
221263 |
30-Apr-2011 |
mm |
Fix deduplicated zfs receive (dmu_recv_stream builds incomplete guid_to_ds_map)
Illumos-gate changeset: 13329:c48b8bf84ab7 MFC together with v28
Approved by: pjd Obtained from: Illumos (Bug #755)
|
221112 |
27-Apr-2011 |
marcel |
Fix copy-paste bug.
|
220447 |
08-Apr-2011 |
mm |
Partially fix ZFS compat code for sparc64. Some endianness bugs still need to be resolved.
Submitted by: marius (parts of the fix) MFC after: 1 month
|
219973 |
24-Mar-2011 |
pjd |
Checking file access on size change is bogus. The checks are done earlier by VFS where we know if this is truncate(2) or ftruncate(2). If this is the latter we should depend on the mode the file was opened and not on the current permission.
PR: standards/154873 Reported by: Mark Martinec <Mark.Martinec@ijs.si> Discussed with: Eric Schrock <eric.schrock@delphix.com> Discussed with: Mark Maybee <Mark.Maybee@Oracle.COM> MFC after: 1 month
|
219636 |
14-Mar-2011 |
pjd |
Fix potential panic in dbuf_sync_list() related to spill block handling.
Obtained from: IllumOS MFC after: 1 month
|
219404 |
08-Mar-2011 |
pjd |
Correct readdir over ZFS handling.
Reported by: Pierre Beyssac <pb@fasterix.frmug.org> MFC after: 1 month
|
219320 |
06-Mar-2011 |
pjd |
Fix libzpool build.
MFC after: 1 month
|
219317 |
05-Mar-2011 |
pjd |
Make renaming of a ZVOL, ZVOL's parent directory and ZVOL snapshot work.
Reported by: avg MFC after: 1 month
|
219316 |
05-Mar-2011 |
pjd |
Simplify zvol_remove_minors() a bit.
MFC after: 1 month
|
219089 |
27-Feb-2011 |
pjd |
Finally... Import the latest open-source ZFS version - (SPA) 28.
Few new things available from now on:
- Data deduplication. - Triple parity RAIDZ (RAIDZ3). - zfs diff. - zpool split. - Snapshot holds. - zpool import -F. Allows to rewind corrupted pool to earlier transaction group. - Possibility to import pool in read-only mode.
MFC after: 1 month
|
218550 |
11-Feb-2011 |
kib |
For UIO_NOCOPY case of reading request on zfs vnode, which has vm object attached, activate the page after the successful read, and free the page if read was unsuccessful.
Freshly allocated page is not on any queue yet, and not activating (or deactivating) the page leaves it on no queue, excluding the page from pagedaemon scans and making the memory disappear until the vnode is reclaimed.
Reviewed by: avg MFC after: 1 week
|
218386 |
06-Feb-2011 |
trasz |
Make it impossible to clear the MNT_NFS4ACLS flag on ZFS filesystem by using "mount -uw".
Reviewed by: pjd MFC after: 2 weeks
|
218278 |
04-Feb-2011 |
ae |
vdev's sectorsize should not be greater than 8 Kbytes and also it should be power of 2. This prevents non-aligned access while probing vdev's labels.
PR: kern/147852 Reviewed by: pjd MFC after: 1 week
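The constraint stated in this commit message (a power of 2, no greater than 8 Kbytes) can be checked as in the sketch below. The function name `sectorsize_ok` and the macro `MAX_SECTORSIZE` are hypothetical and are not the names used in the actual change.

```c
#include <assert.h>
#include <stdint.h>

#define	MAX_SECTORSIZE	8192u	/* 8 Kbytes, per the commit message */

/* Nonzero when sectorsize is a power of 2 no larger than 8 Kbytes. */
static int
sectorsize_ok(uint32_t sectorsize)
{
	if (sectorsize == 0 || sectorsize > MAX_SECTORSIZE)
		return (0);
	/* A power of 2 has exactly one bit set. */
	return ((sectorsize & (sectorsize - 1)) == 0);
}
```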
|
217588 |
19-Jan-2011 |
trasz |
Add MNT_NFS4ACLS to ZFS mount flags. It's not conditional, since there is no way to disable NFSv4 ACLs in ZFS. This should make it easier for the NFS server to figure out whether the exported filesystem supports ACLs or not.
Reviewed by: pjd MFC after: 2 weeks
|
217367 |
13-Jan-2011 |
mdf |
Re-commit the zfs sysctl(9) type-safety changes.
Thanks to dim and pjd for the pointer to zfs_context.h for building userland.
|
217332 |
12-Jan-2011 |
mdf |
Revert cddl changes for sysctl(9) until I understand why this isn't building on universe.
|
217319 |
12-Jan-2011 |
mdf |
sysctl(9) cleanup checkpoint: amd64 GENERIC builds cleanly.
Commit the zfs piece.
|
216919 |
03-Jan-2011 |
mm |
MFp4 r186485, r186859:
Fix a race by defining two tasks in the zio structure as we can still be returning from issue task when interrupt task is used.
Tested by: pjd Approved by: pjd, delphij (mentor) MFC after: 3 days
|
216378 |
11-Dec-2010 |
pjd |
Remove redundant semicolon and empty line.
|
216256 |
07-Dec-2010 |
ivoras |
Undo r216230: the interaction between saved ashift in metadata and detected ashift does not support this. With this change, pools created while stripesize=512 could not be imported when stripesize becomes larger (on the same drive).
Noticed by: pjd
|
216230 |
06-Dec-2010 |
ivoras |
Use GEOM stripesize field when calculating ashift. This will enable correct alignment on drives with large sector sizes (e.g. 4 KiB) but the implementation might need to be revisited if devices with large stripesizes appear (e.g. if RAID controllers or flash drives start using the field), probably by introducing a physsectorsize field in GEOM providers.
Discussed with: mav, mostly silence on freebsd-geom@ and freebsd-fs@
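As a rough illustration of what "calculating ashift" from a size means, the sketch below takes the base-2 logarithm of the larger of two sizes. Both the policy (use the larger of sector size and stripe size) and the names `log2_floor` and `ashift_for` are assumptions made for this example; they are not taken from the commit itself.

```c
#include <assert.h>
#include <stdint.h>

/* floor(log2(n)) for n > 0; returns 0 for n == 0 in this sketch. */
static unsigned int
log2_floor(uint64_t n)
{
	unsigned int shift = 0;

	while (n > 1) {
		n >>= 1;
		shift++;
	}
	return (shift);
}

/* Pick the shift from whichever size is larger (assumed policy). */
static unsigned int
ashift_for(uint64_t sectorsize, uint64_t stripesize)
{
	uint64_t size = sectorsize > stripesize ? sectorsize : stripesize;

	return (log2_floor(size));
}
```

Under these assumptions a 512-byte sector gives a shift of 9, while the same sector size with a 4 KiB stripe gives 12.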
|
215681 |
22-Nov-2010 |
jhb |
Remove some bogus, self-referential mergeinfo.
|
215401 |
16-Nov-2010 |
avg |
zfs+sendfile: populate all requested pages, not just those already cached
kern_sendfile() uses vm_rdwr() to read-ahead blocks of data to populate page cache. When sendfile stumbles upon a page that is not populated yet, it sends out all the mbufs that it collected so far. This resulted in very poor performance with ZFS when file data is not in the page cache, because ZFS vop_read for UIO_NOCOPY case populated only those pages that are already in cache, but not valid. Which means that most of the time it populated only the first requested page in the described above scenario.
Reported by: Alexander Zagrebin <alexz@visp.ru> Tested by: Alexander Zagrebin <alexz@visp.ru>, Artemiev Igor <ai@kliksys.ru> MFC after: 12 days
|
215397 |
16-Nov-2010 |
avg |
fix misspelling in a comment
Reported by: Daniel Braniss <danny@cs.huji.ac.il> MFC after: 3 days
|
215260 |
13-Nov-2010 |
mm |
Disable VFS_HOLD placed on mnt_vnodecovered during the mount of a snapshot and VFS_RELE on a non-existing hold on snapshot parent's z_vfs.
This disables the changes from OpenSolaris onnv-revision 9234:bffdc4fc05c4 (bug IDs: 6792139, 6794830) - not applicable to FreeBSD.
This fixes the process hang if umounting a manually mounted snapshot.
Reported by: Alexander Zagrebin <alexz@visp.ru> Approved by: delphij (mentor) MFC after: 1 week
|
214854 |
05-Nov-2010 |
delphij |
Validate whether the zfs_cmd_t submitted from userland is not smaller than what we have. Without the check the kernel could access memory that does not belong to the request struct.
Note that we do not test if the struct equals in size at this time, which may facilitate forward compatibility with newer binaries.
Reviewed by: pjd at MeetBSD CA '2010 MFC after: 1 week
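The size check described in this commit message can be sketched as follows. `struct request`, `check_request_size` and the choice of EINVAL as the error code are hypothetical; they stand in for zfs_cmd_t and the real ioctl path, which are not reproduced here.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's request structure. */
struct request {
	char	name[64];
	int	flags;
};

/*
 * Accept a caller-supplied buffer only if it is at least as large as the
 * structure the kernel is about to read. Larger buffers are allowed.
 */
static int
check_request_size(size_t user_len)
{
	if (user_len < sizeof(struct request))
		return (EINVAL);
	return (0);
}
```

Only the "too small" case is rejected; accepting larger buffers mirrors the forward-compatibility note in the message.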
|
214378 |
26-Oct-2010 |
mm |
Bugfix merge from OpenSolaris:
OpenSolaris onnv-revision: 10209:91f47f0e7728 6830541 zfs_get_data_trips on a verify 6696242 multiple zfs_fillpage() zfs: accessing past end of object panics 6785914 zfs fails to drop dn_struct_rwlock in recovery code path
Approved by: delphij (mentor) Obtained from: OpenSolaris (Bug ID 6830541, 6696242, 6785914) MFC after: 2 weeks
|
213937 |
16-Oct-2010 |
avg |
zfs: add vop_getpages method implementation
This should make vnode_pager_getpages path a bit shorter and clearer. Also this should eliminate problems with partially valid pages. Having this method opens room for future optimizations.
To do: try to satisfy other pages besides the required one taking into account tradeoffs between number of page faults, read throughput and read latency. Also, eventually vop_putpages should be added too.
Reviewed by: kib, mm, pjd MFC after: 3 weeks
|
213790 |
13-Oct-2010 |
rpaulo |
In zfs_post_common(), use %d instead of %hhu.
Found with: clang
|
213730 |
12-Oct-2010 |
avg |
zfs + sendfile: do not produce partially valid pages for vnode's tail
Since r212650 and before this change sendfile(2) could produce a partially valid page for a trailing portion of a ZFS vnode. vm_fault() always wants to see a fully valid page even if it's the last page that partially extends beyond vnode's end. Otherwise it calls vop_getpages() to bring in the page. In the case of ZFS this means that the data is read from the page into the same page and this breaks checks in ZFS mappedread() - a thread that set VPO_BUSY on the page in vm_fault() will get blocked forever waiting for it to be cleared.
Many thanks to Kai and Jeremy for reproducing the issue and providing important debugging information and help.
Reported by: Kai Gallasch <gallasch@free.de>, Jeremy Chadwick <freebsd@jdc.parodius.com> Tested by: Kai Gallasch <gallasch@free.de>, Jeremy Chadwick <freebsd@jdc.parodius.com> Reviewed by: kib MFC after: 3 days To-Do: apply the same treatment to tmpfs + sendfile
|
213673 |
10-Oct-2010 |
pjd |
Provide internal ioflags() function that converts ioflag provided by FreeBSD's VFS to OpenSolaris-specific ioflag expected by ZFS. Use it for read and write operations.
Reviewed by: mm MFC after: 1 week
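A flag-translation helper of the kind described in this commit message might look like the sketch below. All constant names and bit values are made up for illustration (the real ones live in kernel headers), and `convert_ioflags` is a hypothetical name, not the function added by the commit.

```c
#include <assert.h>

/* Hypothetical bit values; the real constants live in kernel headers. */
#define	SRC_IO_APPEND	0x0002
#define	SRC_IO_SYNC	0x0080
#define	DST_FAPPEND	0x0008
#define	DST_FSYNC	0x0010

/* Translate each known source bit to its destination counterpart. */
static int
convert_ioflags(int ioflag)
{
	int flags = 0;

	if (ioflag & SRC_IO_APPEND)
		flags |= DST_FAPPEND;
	if (ioflag & SRC_IO_SYNC)
		flags |= DST_FSYNC;
	return (flags);
}
```

Translating bit by bit, rather than passing the value through, means that unknown source bits are dropped instead of being misread as unrelated destination flags.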
|
213634 |
08-Oct-2010 |
mm |
Change FAPPEND to IO_APPEND as this is a ioflag and not a fflag. This corrects writing to append-only files on ZFS.
PR: kern/149495 [1], kern/151082 [2] Submitted by: Daniel Zhelev <daniel@zhelev.biz> [1], Michael Naef <cal@linu.gs> [2] Approved by: delphij (mentor) MFC after: 1 week
|
213198 |
27-Sep-2010 |
mm |
Properly handle IO with B_FAILFAST. Retry IO once with ZIO_FLAG_TRYHARD before declaring a pool faulted.
OpenSolaris revision and Bug IDs:
9725:0bf7402e8022 6843014 ZFS B_FAILFAST handling is broken
Approved by: delphij (mentor) Obtained from: OpenSolaris (Bug ID 6843014) MFC after: 3 weeks
|
213197 |
27-Sep-2010 |
mm |
Enable offlining of log devices.
OpenSolaris revision and Bug IDs:
9701:cc5b64682e64 6803605 should be able to offline log devices 6726045 vdev_deflate_ratio is not set when offlining a log device 6599442 zpool import has faults in the display
Approved by: delphij (mentor) Obtained from: OpenSolaris (Bug ID 6803605, 6726045, 6599442) MFC after: 3 weeks
|
212951 |
21-Sep-2010 |
avg |
zfs_map_page/zfs_unmap_page: do not use sched_pin() and SFB_CPUPRIVATE
zfs_map_page/zfs_unmap_page are mostly called around potential I/O paths and it seems to be a not very good idea to do cpu pinning there.
Suggested by: kib MFC after: 2 weeks
|
212950 |
21-Sep-2010 |
avg |
zfs_vnops: use zfs_map_page/zfs_unmap_page helper functions in another place
MFC after: 2 weeks
|
212783 |
17-Sep-2010 |
avg |
zfs arc_reclaim_needed: fix typo in mismerge in r212780
PR: kern/146410, kern/138790 MFC after: 3 weeks X-MFC with: r212780
|
212782 |
17-Sep-2010 |
avg |
zfs+sendfile: advance uio_offset upon reading as well
Picked from analogous code in tmpfs.
MFC after: 1 week
|
212781 |
17-Sep-2010 |
avg |
zfs arc_reclaim_needed: remove redundant checks for arc_c_max and arc_c_min
Those checks are not present in upstream code and they are enforced in actual calculations of delta by which ARC size can be grown or should be reduced.
MFC after: 3 weeks
|
212780 |
17-Sep-2010 |
avg |
zfs arc_reclaim_needed: more reasonable threshold for available pages
vm_paging_target() is not a trigger of any kind for pagedaemon, but rather a "soft" target for it when it's already triggered. Thus, trying to keep 2048 pages above that level at the expense of ARC was simply driving ARC size into the ground even with normal memory loads. Instead, use a threshold at which a pagedaemon scan is triggered, so that ARC reclaiming helps with pagedaemon's task, but the latter still recycles active and inactive pages.
PR: kern/146410, kern/138790 MFC after: 3 weeks
|
212694 |
15-Sep-2010 |
mm |
Fix kernel panic when moving a file to .zfs/shares. Fix possible loss of correct error return code in ZFS mount.
OpenSolaris revisions and Bug IDs:
11824:53128e5db7cf 6863610 ZFS mount can lose correct error return
12079:13822b941977 6939941 problem with moving files in zfs (142901-12)
Approved by: delphij (mentor) Obtained from: OpenSolaris (Bug ID 6863610, 6939941) MFC after: 3 days
|
212657 |
15-Sep-2010 |
avg |
zfs vn_has_cached_data: take into account v_object->cache != NULL
This mirrors code in tmpfs. This change shouldn't affect the read path much; it may cause unnecessary vm_page_lookup calls in the case where v_object has no active or inactive pages but has some cache pages. I believe this situation to be non-essential.
In write path this change should allow us to properly detect the above case and free a cache page when we write to a range that corresponds to it. If this situation is undetected then we could have a discrepancy between data in page cache and in ARC or on disk.
This change allows us to re-enable vn_has_cached_data() check in zfs_write.
NOTE: strictly speaking resident_page_count and cache fields of v_object should be examined under VM_OBJECT_LOCK, but for this particular usage we may get away with it.
Discussed with: alc, kib Approved by: pjd Tested with: tools/regression/fsx MFC after: 3 weeks
|
212655 |
15-Sep-2010 |
avg |
zfs mappedread, update_pages: use int for offset and length within a page
uint64_t, int64_t were redundant there
Approved by: pjd Tested by: tools/regression/fsx MFC after: 2 weeks
|
212654 |
15-Sep-2010 |
avg |
zfs mappedread: use uiomove_fromphys where possible
Reviewed by: alc Approved by: pjd Tested by: tools/regression/fsx MFC after: 2 weeks
|
212652 |
15-Sep-2010 |
avg |
zfs: catch up with vm_page_sleep_if_busy changes
Reviewed by: alc Approved by: pjd Tested by: tools/regression/fsx MFC after: 2 weeks
|
212650 |
15-Sep-2010 |
avg |
tmpfs, zfs + sendfile: mark page bits as valid after populating it with data
Otherwise, adding insult to injury, in addition to double-caching of data we would always copy the data into a vnode's vm object page from backend. This is specific to sendfile case only (VOP_READ with UIO_NOCOPY).
PR: kern/141305 Reported by: Wiktor Niesiobedzki <bsd@vink.pl> Reviewed by: alc Tested by: tools/regression/sockets/sendfile MFC after: 2 weeks
|
212611 |
14-Sep-2010 |
mm |
Remove duplicated VFS_HOLD due to a mismerge.
PR: kern/150544 Approved by: delphij (mentor) MFC after: 1 day
|
212605 |
14-Sep-2010 |
mm |
Add missing vop_vector zfsctl_ops_shares. Add missing locks around VOP_READDIR and VOP_GETATTR with z_shares_dir.
PR: kern/150544 Approved by: delphij (mentor) Obtained from: perforce (pjd) MFC after: 1 day
|
212573 |
13-Sep-2010 |
pjd |
Remove the page queues lock around vm_page_undirty() - it is no longer needed.
Reviewed by: alc
|
212494 |
12-Sep-2010 |
rpaulo |
Revamp locking a bit. This fixes three problems: * processes now can't go away while we are inserting probes (fixes a panic) * if a trap happens, we won't be holding the process lock (fixes a hang) * fix a LOR between the process lock and the fasttrap bucket list lock
Thanks to kib for pointing out some problems. Sponsored by: The FreeBSD Foundation
|
212465 |
11-Sep-2010 |
rpaulo |
Avoid a LOR (sleepable after non-sleepable) in fasttrap_tracepoint_enable().
Sponsored by: The FreeBSD Foundation
|
212425 |
10-Sep-2010 |
mdf |
Replace sbuf_overflowed() with sbuf_error(), which returns any error code associated with overflow or with the drain function. While this function is not expected to be used often, it produces more information in the form of an errno than sbuf_overflowed() did.
|
212385 |
09-Sep-2010 |
pjd |
On FreeBSD we can boot from pools that have multiple top-level vdevs or log vdevs, so don't deny adding new vdevs if bootfs property is set.
MFC after: 2 weeks
|
212357 |
09-Sep-2010 |
rpaulo |
Fix two bugs in DTrace: * when the process exits, remove the associated USDT probes * when the process forks, duplicate the USDT probes.
Sponsored by: The FreeBSD Foundation
|
212160 |
02-Sep-2010 |
gibbs |
Correct bioq_disksort so that bioq_insert_tail() offers barrier semantics. Add the BIO_ORDERED flag for struct bio and update bio clients to use it.
The barrier semantics of bioq_insert_tail() were broken in two ways:
o In bioq_disksort(), an added bio could be inserted at the head of the queue, even when a barrier was present, if the sort key for the new entry was less than that of the last queued barrier bio.
o The last_offset used to generate the sort key for newly queued bios did not stay at the position of the barrier until either the barrier was de-queued, or a new barrier (which updates last_offset) was queued. When a barrier is in effect, we know that the disk will pass through the barrier position just before the "blocked bios" are released, so using the barrier's offset for last_offset is the optimal choice.
sys/geom/sched/subr_disk.c: sys/kern/subr_disk.c: o Update last_offset in bioq_insert_tail().
o Only update last_offset in bioq_remove() if the removed bio is at the head of the queue (typically due to a call via bioq_takefirst()) and no barrier is active.
o In bioq_disksort(), if we have a barrier (insert_point is non-NULL), set prev to the barrier and cur to its next element. Now that last_offset is kept at the barrier position, this change isn't strictly necessary, but since we have to take a decision branch anyway, it does avoid one, no-op, loop iteration in the while loop that immediately follows.
o In bioq_disksort(), bypass the normal sort for bios with the BIO_ORDERED attribute and instead insert them into the queue with bioq_insert_tail(). bioq_insert_tail() not only gives the desired command order during insertion, but also provides barrier semantics so that commands disksorted in the future cannot pass the just enqueued transaction.
sys/sys/bio.h: Add BIO_ORDERED as bit 4 of the bio_flags field in struct bio.
sys/cam/ata/ata_da.c: sys/cam/scsi/scsi_da.c: Use an ordered command for SCSI/ATA-NCQ commands issued in response to bios with the BIO_ORDERED flag set.
sys/cam/scsi/scsi_da.c: Use an ordered tag when issuing a synchronize cache command.
Wrap some lines to 80 columns.
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c: sys/geom/geom_io.c: Mark bios with the BIO_FLUSH command as BIO_ORDERED.
Sponsored by: Spectra Logic Corporation MFC after: 1 month
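The barrier rule described in this entry can be sketched in userland C. This is a simplified illustration, not the kernel code: struct req, struct queue, queue_insert_tail() and queue_disksort() are hypothetical stand-ins for struct bio, the bio queue, bioq_insert_tail() and bioq_disksort(), and the sort is a plain ascending order by offset rather than a key derived from last_offset.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified request; stands in for struct bio. */
struct req {
	long offset;		/* disk offset used as the sort key */
	int ordered;		/* non-zero: treated like BIO_ORDERED */
	struct req *next;
};

/* Hypothetical queue; insert_point plays the role of the last barrier. */
struct queue {
	struct req *head;
	struct req *tail;
	struct req *insert_point;	/* NULL when no barrier is queued */
};

/* Append at the tail and make the new entry the current barrier. */
static void
queue_insert_tail(struct queue *q, struct req *r)
{
	assert(q != NULL && r != NULL);
	r->next = NULL;
	if (q->tail != NULL)
		q->tail->next = r;
	else
		q->head = r;
	q->tail = r;
	q->insert_point = r;
}

/* Sorted insert that never places an entry ahead of the last barrier. */
static void
queue_disksort(struct queue *q, struct req *r)
{
	struct req *prev, *cur;

	assert(q != NULL && r != NULL);
	if (r->ordered) {
		/* Ordered requests bypass the sort and become barriers. */
		queue_insert_tail(q, r);
		return;
	}
	/* Start the scan just past the barrier, if there is one. */
	prev = q->insert_point;
	cur = (prev != NULL) ? prev->next : q->head;
	while (cur != NULL && cur->offset <= r->offset) {
		prev = cur;
		cur = cur->next;
	}
	r->next = cur;
	if (prev != NULL)
		prev->next = r;
	else
		q->head = r;
	if (cur == NULL)
		q->tail = r;
}
```

In this sketch a request with a small offset that arrives after a barrier is sorted only among the entries queued behind that barrier, which is the property the first bullet above says was previously violated.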
|
212002 |
30-Aug-2010 |
jh |
execve(2) has a special check for file permissions: a file must have at least one execute bit set, otherwise execve(2) will return EACCES even for a user with the PRIV_VFS_EXEC privilege.
Add the same check to vaccess(9), vaccess_acl_nfs4(9) and vaccess_acl_posix1e(9). This makes access(2) agree better with execve(2). Because ZFS doesn't use vaccess(9) for VEXEC, add the check to zfs_freebsd_access() too. There may be other file systems that do not use the vaccess*() functions and need to be handled separately.
PR: kern/125009 Reviewed by: bde, trasz Approved by: pjd (ZFS part)
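A minimal userland sketch of the rule in this entry, assuming a hypothetical helper exec_allowed(); it is not the vaccess(9) implementation. It shows that a privilege which would otherwise override the permission check does not grant execute access when none of the three execute bits is set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/stat.h>

/*
 * Hypothetical helper, for illustration only.
 * st:            file attributes; only st_mode is examined
 * normal_check:  result of the ordinary owner/group/other execute check
 * has_exec_priv: caller holds a privilege such as PRIV_VFS_EXEC
 */
static bool
exec_allowed(const struct stat *st, bool normal_check, bool has_exec_priv)
{
	assert(st != NULL);
	if (normal_check)
		return (true);
	/* The privilege only helps if at least one execute bit is set. */
	if (has_exec_priv &&
	    (st->st_mode & (S_IXUSR | S_IXGRP | S_IXOTH)) != 0)
		return (true);
	return (false);
}
```

With mode 0644 the privileged caller is refused, while mode 0744 is accepted even though the caller fails the ordinary check.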
|
211948 |
28-Aug-2010 |
pjd |
Return NULL pointer instead of B_FALSE as it is done in the vendor code.
Obtained from: //depot/user/pjd/zfs/...
|
211947 |
28-Aug-2010 |
pjd |
Move ZUT_OBJS in the same place that is used in vendor code.
Obtained from: //depot/user/pjd/zfs/...
|
211932 |
28-Aug-2010 |
mm |
Import changes from OpenSolaris that provide
- better ACL caching and speedup of ACL permission checks
- faster handling of stat()
- lowered mutex contention in the read/writer lock (rrwlock)
- several related bugfixes
Detailed information (OpenSolaris onnv changesets and Bug IDs):
9749:105f407a2680 6802734 Support for Access Based Enumeration (not used on FreeBSD) 6844861 inconsistent xattr readdir behavior with too-small buffer
9866:ddc5f1d8eb4e 6848431 zfs with rstchown=0 or file_chown_self privilege allows user to "take" ownership
9981:b4907297e740 6775100 stat() performance on files on zfs should be improved 6827779 rrwlock is overly protective of its counters
10143:d2d432dfe597 6857433 memory leaks found at: zfs_acl_alloc/zfs_acl_node_alloc 6860318 truncate() on zfsroot succeeds when file has a component of its path set without access permission
10232:f37b85f7e03e 6865875 zfs sometimes incorrectly giving search access to a dir
10250:b179ceb34b62 6867395 zpool_upgrade_007_pos testcase panic'd with BAD TRAP: type=e (#pf Page fault)
10269:2788675568fd 6868276 zfs_rezget() can be hazardous when znode has a cached ACL
10295:f7a18a1e9610 6870564 panic in zfs_getsecattr
Approved by: delphij (mentor) Obtained from: OpenSolaris (multiple Bug IDs) MFC after: 2 weeks
|
211931 |
28-Aug-2010 |
mm |
Update ZFS metaslab code from OpenSolaris. This provides a noticeable write speedup, especially on pools with less than 30% of free space.
Detailed information (OpenSolaris onnv changesets and Bug IDs):
11146:7e58f40bcb1c 6826241 Sync write IOPS drops dramatically during TXG sync 6869229 zfs should switch to shiny new metaslabs more frequently
11728:59fdb3b856f6 6918420 zdb -m has issues printing metaslab statistics
12047:7c1fcc8419ca 6917066 zfs block picking can be improved
Approved by: delphij (mentor) Obtained from: OpenSolaris (Bug ID 6826241, 6869229, 6918420, 6917066) MFC after: 2 weeks
|
211929 |
28-Aug-2010 |
rpaulo |
Remove debugging.
Sponsored by: The FreeBSD Foundation
|
211925 |
28-Aug-2010 |
rpaulo |
Replace a memory barrier with a mutex barrier.
Sponsored by: The FreeBSD Foundation
|
211900 |
27-Aug-2010 |
pjd |
Use ZFS_CTLDIR_NAME instead of hardcoding ".zfs".
|
211855 |
26-Aug-2010 |
pjd |
Update comment now that I finally committed r211854.
MFC after: 1 month
|
211762 |
24-Aug-2010 |
avg |
zfs arc_reclaim_thread: no need to call arc_reclaim_needed when resetting needfree
needfree is checked at the very start of arc_reclaim_needed. This change makes the code easier to follow and maintain in the face of potential changes in arc_reclaim_needed.
Also, put the whole sub-block under _KERNEL because needfree can be set only in kernel code.
To do: rename needfree to something else to avoid confusion with the OpenSolaris global variable of the same name, which is used in the same code but has a different meaning (page deficit).
Note: I have an impression that locking around accesses to this variable as well as mutual notifications between arc_reclaim_thread and arc_lowmem are not proper.
MFC after: 1 week
|
211745 |
24-Aug-2010 |
rpaulo |
Replace a pksignal() call with tdksignal().
Pointed out by: kib
|
211744 |
24-Aug-2010 |
rpaulo |
MD fasttrap implementation.
Sponsored by: The FreeBSD Foundation
|
211738 |
24-Aug-2010 |
rpaulo |
Port the fasttrap provider to FreeBSD. This provider is responsible for injecting debugging probes in the userland programs and is the basis for the pid provider and the usdt provider.
Sponsored by: The FreeBSD Foundation
|
211618 |
22-Aug-2010 |
rpaulo |
Port this to FreeBSD. We miss some suword functions, so we use copyout.
Sponsored by: The FreeBSD Foundation
|
211608 |
22-Aug-2010 |
rpaulo |
Kernel DTrace support for:
o uregs (sson@)
o ustack (sson@)
o /dev/dtrace/helper device (needed for USDT probes)
The work done by me was: Sponsored by: The FreeBSD Foundation
|
211606 |
22-Aug-2010 |
rpaulo |
Add the FreeBSD definition for the fasttrap ioctls.
Sponsored by: The FreeBSD Foundation
|
211555 |
21-Aug-2010 |
rpaulo |
Port the DTrace helper ioctls to FreeBSD and add a helper member to dof_helper_t (needed by drti.o).
Sponsored by: The FreeBSD Foundation
|
211484 |
19-Aug-2010 |
imp |
First cut at mips n64 ABI support
|
210999 |
07-Aug-2010 |
pjd |
In FreeBSD we use 'jailed' property.
MFC after: 2 weeks
|
210470 |
25-Jul-2010 |
mm |
Import two changesets from OpenSolaris to make future updates easier.
The changes do not affect FreeBSD code because zfs_znode_move(), cleanlocks() and cleanshares() are not used.
OpenSolaris onnv changeset: 9788:f660bc44f2e8, 9909:aa280f585a3e
Approved by: pjd, delphij (mentor) Obtained from: OpenSolaris (Bug ID 6843700, 6790232) MFC after: 7 weeks
|
210457 |
24-Jul-2010 |
mm |
Consider snapshots as descendants via zfs allow -d
OpenSolaris onnv changeset: 9847:2f3ba86e857a
Approved by: pjd, delphij (mentor) Obtained from: OpenSolaris (Bug ID 6809340) MFC after: 1 week
|
210427 |
23-Jul-2010 |
avg |
zfs arc_memory_throttle: available memory is free + cache
OpenSolaris freemem has the same meaning as our v_free_count + v_cache_count.
Obtained from: Artem Belevich <fbsdlist@src.cx>, Peter Jeremy <peterjeremy@acm.org> Discussed with: pjd MFC after: 2 weeks
|
210398 |
22-Jul-2010 |
mm |
Enable fake resolving of SMB RIDs by using nulldomain and UID_NOBODY
- fixes panics when Solaris/OpenSolaris pools that contain files uploaded with the SMB protocol are accessed
Enable setting/unsetting the sharesmb property (dummy action)
- allows users who import pools from Solaris/OpenSolaris to unset the sharesmb property and get rid of annoying messages
PR: kern/145778, kern/148709 Approved by: pjd, delphij (mentor) MFC after: 7 weeks
|
210282 |
20-Jul-2010 |
mm |
To improve latency, lower default vfs.zfs.vdev.max_pending from 35 to 10
OpenSolaris onnv changeset (partial): 10801:e0bf032e8673
Approved by: pjd, delphij (mentor) Obtained from: OpenSolaris (Bug ID 6891731) MFC after: 1 week
|
210193 |
17-Jul-2010 |
nwhitehorn |
Add OpenSolaris atomics for powerpc64 and connect ZFS to the build on this platform.
Reviewed by: pjd
|
210192 |
17-Jul-2010 |
nwhitehorn |
Increase stack size for ZFS sync thread. This is required to make ZFS function on 64-bit PowerPC.
Reviewed by: pjd Obtained from: OpenSolaris changeset 14653:7cf402a7f374
|
210172 |
16-Jul-2010 |
jhb |
Revert the previous commit. The race is not applicable to the lockmgr implementation in 8.0 and later as its flags field does not hold dynamic state such as waiters flags, but is only modified in lockinit() aside from VN_LOCK_*().
Discussed with: attilio
|
210171 |
16-Jul-2010 |
jhb |
When the MNTK_EXTENDED_SHARED mount option was added, some filesystems were changed to defer the setting of VN_LOCK_ASHARE() (which clears LK_NOSHARE in the vnode lock's flags) until after they had determined if the vnode was a FIFO. This occurs after the vnode has been inserted into a VFS hash or some similar table, so it is possible for another thread to find this vnode via vget() on an i-node number and block on the vnode lock. If the lockmgr interlock (vnode interlock for vnode locks) is not held when clearing the LK_NOSHARE flag, then the lk_flags field can be clobbered. As a result, the thread blocked on the vnode lock may never get woken up. Fix this by holding the vnode interlock while modifying the lock flags in this case.
MFC after: 3 days
|
209962 |
13-Jul-2010 |
mm |
Merge ZFS version 15 and almost all OpenSolaris bugfixes referenced in Solaris 10 updates 141445-09 and 142901-14.
Detailed information: (OpenSolaris revisions and Bug IDs, Solaris 10 patch numbers)
7844:effed23820ae 6755435 zfs_open() and zfs_close() needs to use ZFS_ENTER/ZFS_VERIFY_ZP (141445-01)
7897:e520d8258820 6748436 inconsistent zpool.cache in boot_archive could panic a zfs root filesystem upon boot-up (141445-01)
7965:b795da521357 6740164 zpool attach can create an illegal root pool (141909-02)
8084:b811cc60d650 6769612 zpool_import() will continue to write to cachefile even if altroot is set (N/A)
8121:7fd09d4ebd9c 6757430 want an option for zdb to disable space map loading and leak tracking (141445-01)
8129:e4f45a0bfbb0 6542860 ASSERT: reason != VDEV_LABEL_REMOVE||vdev_inuse(vd, crtxg, reason, 0) (141445-01)
8188:fd00c0a81e80 6761100 want zdb option to select older uberblocks (141445-01)
8190:6eeea43ced42 6774886 zfs_setattr() won't allow ndmp to restore SUNWattr_rw (141445-01)
8225:59a9961c2aeb 6737463 panic while trying to write out config file if root pool import fails (141445-01)
8227:f7d7be9b1f56 6765294 Refactor replay (141445-01)
8228:51e9ca9ee3a5 6572357 libzfs should do more to avoid mnttab lookups (141909-01) 6572376 zfs_iter_filesystems and zfs_iter_snapshots get objset stats twice (141909-01)
8241:5a60f16123ba 6328632 zpool offline is a bit too conservative (141445-01) 6739487 ASSERT: txg <= spa_final_txg due to scrub/export race (141445-01) 6767129 ASSERT: cvd->vdev_isspare, in spa_vdev_detach() (141445-01) 6747698 checksum failures after offline -t / export / import / scrub (141445-01) 6745863 ZFS writes to disk after it has been offlined (141445-01) 6722540 50% slowdown on scrub/resilver with certain vdev configurations (141445-01) 6759999 resilver logic rewrites ditto blocks on both source and destination (141445-01) 6758107 I/O should never suspend during spa_load() (141445-01) 6776548 codereview(1) runs off the page when faced with multi-line comments (N/A) 6761406 AMD errata 91 workaround doesn't work on 64-bit systems (141445-01)
8242:e46e4b2f0a03 6770866 GRUB/ZFS should require physical path or devid, but not both (141445-01)
8269:03a7e9050cfd 6674216 "zfs share" doesn't work, but "zfs set sharenfs=on" does (141445-01) 6621164 $SRC/cmd/zfs/zfs_main.c seems to have a syntax error in the translation note (141445-01) 6635482 i18n problems in libzfs_dataset.c and zfs_main.c (141445-01) 6595194 "zfs get" VALUE column is as wide as NAME (141445-01) 6722991 vdev_disk.c: error checking for ddi_pathname_to_dev_t() must test for NODEV (141445-01) 6396518 ASSERT strings shouldn't be pre-processed (141445-01)
8274:846b39508aff 6713916 scrub/resilver needlessly decompress data (141445-01)
8343:655db2375fed 6739553 libzfs_status msgid table is out of sync (141445-01) 6784104 libzfs unfairly rejects numerical values greater than 2^63 (141445-01) 6784108 zfs_realloc() should not free original memory on failure (141445-01)
8525:e0e0e525d0f8 6788830 set large value to reservation cause core dump (141445-01) 6791064 want sysevents for ZFS scrub (141445-01) 6791066 need to be able to set cachefile on faulted pools (141445-01) 6791071 zpool_do_import() should not enable datasets on faulted pools (141445-01) 6792134 getting multiple properties on a faulted pool leads to confusion (141445-01)
8547:bcc7b46e5ff7 6792884 Vista clients cannot access .zfs (141445-01)
8632:36ef517870a3 6798384 It can take a village to raise a zio (141445-01)
8636:7e4ce9158df3 6551866 deadlock between zfs_write(), zfs_freesp(), and zfs_putapage() (141909-01) 6504953 zfs_getpage() misunderstands VOP_GETPAGE() interface (141909-01) 6702206 ZFS read/writer lock contention throttles sendfile() benchmark (141445-01) 6780491 Zone on a ZFS filesystem has poor fork/exec performance (141445-01) 6747596 assertion failed: DVA_EQUAL(BP_IDENTITY(&zio->io_bp_orig), BP_IDENTITY(zio->io_bp))); (141445-01)
8692:692d4668b40d 6801507 ZFS read aggregation should not mind the gap (141445-01)
8697:e62d2612c14d 6633095 creating a filesystem with many properties set is slow (141445-01)
8768:dfecfdbb27ed 6775697 oracle crashes when overwriting after hitting quota on zfs (141909-01)
8811:f8deccf701cf 6790687 libzfs mnttab caching ignores external changes (141445-01) 6791101 memory leak from libzfs_mnttab_init (141445-01)
8845:91af0d9c0790 6800942 smb_session_create() incorrectly stores IP addresses (N/A) 6582163 Access Control List (ACL) for shares (141445-01) 6804954 smb_search - shortname field should be space padded following the NULL terminator (N/A) 6800184 Panic at smb_oplock_conflict+0x35() (N/A)
8876:59d2e67b4b65 6803822 Reboot after replacement of system disk in a ZFS mirror drops to grub> prompt (141445-01)
8924:5af812f84759 6789318 coredump when issue zdb -uuuu poolname/ (141445-01) 6790345 zdb -dddd -e poolname coredump (141445-01) 6797109 zdb: 'zdb -dddddd pool_name/fs_name inode' coredump if the file with inode was deleted (141445-01) 6797118 zdb: 'zdb -dddddd poolname inum' coredump if I miss the fs name (141445-01) 6803343 shareiscsi=on failed, iscsitgtd failed request to share (141445-01)
9030:243fd360d81f 6815893 hang mounting a dataset after booting into a new boot environment (141445-01)
9056:826e1858a846 6809691 'zpool create -f' no longer overwrites ufs infomation (141445-01)
9179:d8fbd96b79b3 6790064 zfs needs to determine uid and gid earlier in create process (141445-01)
9214:8d350e5d04aa 6604992 forced unmount + being in .zfs/snapshot/<snap1> = not happy (141909-01) 6810367 assertion failed: dvp->v_flag & VROOT, file: ../../common/fs/gfs.c, line: 426 (141909-01)
9229:e3f8b41e5db4 6807765 ztest_dsl_dataset_promote_busy needs to clean up after ENOSPC (141445-01)
9230:e4561e3eb1ef 6821169 offlining a device results in checksum errors (141445-01) 6821170 ZFS should not increment error stats for unavailable devices (141445-01) 6824006 need to increase issue and interrupt taskqs threads in zfs (141445-01)
9234:bffdc4fc05c4 6792139 recovering from a suspended pool needs some work (141445-01) 6794830 reboot command hangs on a failed zfs pool (141445-01)
9246:67c03c93c071 6824062 System panicked in zfs_mount due to NULL pointer dereference when running btts and svvs tests (141909-01)
9276:a8a7fc849933 6816124 System crash running zpool destroy on broken zpool (141445-03)
9355:09928982c591 6818183 zfs snapshot -r is slow due to set_snap_props() doing txg_wait_synced() for each new snapshot (141445-03)
9391:413d0661ef33 6710376 log device can show incorrect status when other parts of pool are degraded (141445-03)
9396:f41cf682d0d3 (part already merged) 6501037 want user/group quotas on ZFS (141445-03) 6827260 assertion failed in arc_read(): hdr == pbuf->b_hdr (141445-03) 6815592 panic: No such hold X on refcount Y from zfs_znode_move (141445-03) 6759986 zfs list shows temporary %clone when doing online zfs recv (141445-03)
9404:319573cd93f8 6774713 zfs ignores canmount=noauto when sharenfs property != off (141445-03)
9412:4aefd8704ce0 6717022 ZFS DMU needs zero-copy support (141445-03)
9425:e7ffacaec3a8 6799895 spa_add_spares() needs to be protected by config lock (141445-03) 6826466 want to post sysevents on hot spare activation (141445-03) 6826468 spa 'allowfaulted' needs some work (141445-03) 6826469 kernel support for storing vdev FRU information (141445-03) 6826470 skip posting checksum errors from DTL regions of leaf vdevs (141445-03) 6826471 I/O errors after device remove probe can confuse FMA (141445-03) 6826472 spares should enjoy some of the benefits of cache devices (141445-03)
9443:2a96d8478e95 6833711 gang leaders shouldn't have to be logical (141445-03)
9463:d0bd231c7518 6764124 want zdb to be able to checksum metadata blocks only (141445-03)
9465:8372081b8019 6830237 zfs panic in zfs_groupmember() (141445-03)
9466:1fdfd1fed9c4 6833162 phantom log device in zpool status (141445-03)
9469:4f68f041ddcd 6824968 add ZFS userquota support to rquotad (141445-03)
9470:6d827468d7b5 6834217 godfather I/O should reexecute (141445-03)
9480:fcff33da767f 6596237 Stop looking and start ganging (141909-02)
9493:9933d599bc93 6623978 lwb->lwb_buf != NULL, file ../../../uts/common/fs/zfs/zil.c, line 787, function zil_lwb_commit (141445-06)
9512:64cafcbcc337 6801810 Commit of aligned streaming rewrites to ZIL device causes unwanted disk reads (N/A)
9515:d3b739d9d043 6586537 async zio taskqs can block out userland commands (142901-09)
9554:787363635b6a 6836768 zfs_userspace() callback has no way to indicate failure (N/A)
9574:1eb6a6ab2c57 6838062 zfs panics when an error is encountered in space_map_load() (141909-02)
9583:b0696cd037cc 6794136 Panic BAD TRAP: type=e when importing degraded zraid pool. (141909-03)
9630:e25a03f552e0 6776104 "zfs import" deadlock between spa_unload() and spa_async_thread() (141445-06)
9653:a70048a304d1 6664765 Unable to remove files when using fat-zap and quota exceeded on ZFS filesystem (141445-06)
9688:127be1845343 6841321 zfs userspace / zfs get userused@ doesn't work on mounted snapshot (N/A) 6843069 zfs get userused@S-1-... doesn't work (N/A)
9873:8ddc892eca6e 6847229 assertion failed: refcount_count(&tx->tx_space_written) + delta <= tx->tx_space_towrite in dmu_tx.c (141445-06)
9904:d260bd3fd47c 6838344 kernel heap corruption detected on zil while stress testing (141445-06)
9951:a4895b3dd543 6844900 zfs_ioc_userspace_upgrade leaks (N/A)
10040:38b25aeeaf7a 6857012 zfs panics on zpool import (141445-06)
10000:241a51d8720c 6848242 zdb -e no longer works as expected (N/A)
10100:4a6965f6bef8 6856634 snv_117 not booting: zfs_parse_bootfs: error2 (141445-07)
10160:a45b03783d44 6861983 zfs should use new name <-> SID interfaces (N/A) 6862984 userquota commands can hang (141445-06)
10299:80845694147f 6696858 zfs receive of incremental replication stream can dereference NULL pointer and crash (N/A)
10302:a9e3d1987706 6696858 zfs receive of incremental replication stream can dereference NULL pointer and crash (fix lint) (N/A)
10575:2a8816c5173b (partial merge) 6882227 spa_async_remove() shouldn't do a full clear (142901-14)
10800:469478b180d9 6880764 fsync on zfs is broken if writes are greater than 32kb on a hard crash and no log attached (142901-09) 6793430 zdb -ivvvv assertion failure: bp->blk_cksum.zc_word[2] == dmu_objset_id(zilog->zl_os) (N/A)
10801:e0bf032e8673 (partial merge) 6822816 assertion failed: zap_remove_int(ds_next_clones_obj) returns ENOENT (142901-09)
10810:b6b161a6ae4a 6892298 buf->b_hdr->b_state != arc_anon, file: ../../common/fs/zfs/arc.c, line: 2849 (142901-09)
10890:499786962772 6807339 spurious checksum errors when replacing a vdev (142901-13)
11249:6c30f7dfc97b 6906110 bad trap panic in zil_replay_log_record (142901-13) 6906946 zfs replay isn't handling uid/gid correctly (142901-13)
11454:6e69bacc1a5a 6898245 suspended zpool should not cause rest of the zfs/zpool commands to hang (142901-10)
11546:42ea6be8961b (partial merge) 6833999 3-way deadlock in dsl_dataset_hold_ref() and dsl_sync_task_group_sync() (142901-09)
Discussed with: pjd Approved by: delphij (mentor) Obtained from: OpenSolaris (multiple Bug IDs) MFC after: 2 months
|
209721 |
06-Jul-2010 |
rpaulo |
Merge from vendor-sys/opensolaris: * add fasttrap files
|
209275 |
17-Jun-2010 |
mm |
Import latest ARC change from OpenSolaris:
- large ghost eviction causes high write latency
- arc_adjust might adjust MRU unnecessarily
- arc_adapt can lead to wild arc_p adjustment
OpenSolaris onnv-revision: 12636:13b5d698941e
Submitted by: avg Approved by: pjd, delphij (mentor) Obtained from: OpenSolaris (Bug ID 6950219, 6953403, 6951024) MFC after: 1 month
|
209261 |
17-Jun-2010 |
pjd |
Turn off UMA allocations on all archs by default. It isn't stable even on amd64.
Reported by: many MFC after: 3 days
|
209230 |
16-Jun-2010 |
pjd |
Remove redundant assignment.
MFC after: 3 days
|
209101 |
12-Jun-2010 |
mm |
Fix arc_read_done may try to byteswap undefined data (sparc related)
OpenSolaris onnv-revision: 10839:cf83b553a2ab
Obtained from: OpenSolaris (Bug ID 6836714) Approved by: pjd, delphij (mentor) MFC after: 3 days
|
209100 |
12-Jun-2010 |
mm |
Fix panic in zfs_getsecattr
OpenSolaris onnv-revision: 10295:f7a18a1e9610
Obtained from: OpenSolaris (Bug ID 6870564) Approved by: pjd, delphij (mentor) MFC after: 3 days
|
209099 |
12-Jun-2010 |
mm |
Fix possible zfs panic on zpool import
OpenSolaris onnv-revision: 10040:38b25aeeaf7a
Obtained from: OpenSolaris (Bug ID 6857012) Approved by: pjd, delphij (mentor) MFC after: 3 days
|
209098 |
12-Jun-2010 |
mm |
Fix zpool resilver stalls with spa_scrub_thread in a 3 way deadlock
OpenSolaris onnv-revision: 9997:174d75a29a1c
Obtained from: OpenSolaris (Bug ID 6843235) Approved by: pjd, delphij (mentor) MFC after: 3 days
|
209097 |
12-Jun-2010 |
mm |
Fix ZFS panic deadlock: cycle in blocking chain via zfs_zget
OpenSolaris onnv-revision: 9774:0bb234ab2287
Obtained from: OpenSolaris (Bug ID 6788152) Approved by: pjd, delphij (mentor) MFC after: 3 days
|
209096 |
12-Jun-2010 |
mm |
Fix vdev_probe() starvation brings txg train to a screeching halt
OpenSolaris onnv-revision: 9722:e3866bad4e96
Obtained from: OpenSolaris (Bug ID 6844069) Approved by: pjd, delphij (mentor) MFC after: 3 days
|
209095 |
12-Jun-2010 |
mm |
Fix incomplete resilvering after disk replacement (raidz)
OpenSolaris onnv-revision: 9434:3bebded7c76a
Obtained from: OpenSolaris (Bug ID 6794570) Approved by: pjd, delphij (mentor) MFC after: 3 days
|
209094 |
12-Jun-2010 |
mm |
Fix zfs destroy fails to free object in open context, stops up txg train
OpenSolaris onnv-revision: 9409:9dc3f17354ed
Obtained from: OpenSolaris (Bug ID 6809683) Approved by: pjd, delphij (mentor) MFC after: 3 days
|
209093 |
12-Jun-2010 |
mm |
Fix unable to remove a file over NFS after hitting refquota limit
OpenSolaris onnv-revision: 8890:8c2bd5f17bf2
Obtained from: OpenSolaris (Bug ID 6798878) Approved by: pjd, delphij (mentor) MFC after: 3 days
|
209059 |
11-Jun-2010 |
jhb |
Update several places that iterate over CPUs to use CPU_FOREACH().
|
208775 |
03-Jun-2010 |
mm |
Fix freeing space after deleting large files with holes.
OpenSolaris onnv revision: 9950:78fc41aa9bc5
Approved by: pjd, delphij (mentor) Obtained from: OpenSolaris (Bug ID 6792701) MFC after: 3 days
|
208689 |
01-Jun-2010 |
mm |
Fix ZIL close when doing zfs rollback or zfs receive on a mounted dataset.
The fix is a partial import and merge of OpenSolaris onnv revisions 8227:f7d7be9b1f56 and 9292:e112194b5b73.
Approved by: pjd, delphij (mentor) Obtained from: OpenSolaris (Bug ID 6798298) MFC after: 3 days
|
208683 |
31-May-2010 |
pjd |
Fix a bug where resilver is not started automatically on pool import or load. If a disk was missing on pool load or import and was present on the next pool load or import, resilver wasn't started automatically and ZFS reported all disks as ONLINE and healthy. Then, when another disk died, the pool became inaccessible, because in a 2-way mirror or RAIDZ1 two vdevs were then out of sync.
To fix the problem, start resilver automatically on pool load or import.
Obtained from: OpenSolaris MFC after: 3 days
|
208682 |
31-May-2010 |
pjd |
Fix panic when reading label from provider with non power of 2 sector size.
Reported by: James R. Van Artsdalen <james-freebsd-fs2@jrv.org> MFC after: 3 days
|
208474 |
23-May-2010 |
mm |
Remove kstat.zfs.arcstats.l2_write_bytes_written
The arcstats.l2_write_bytes_written kstat counter introduced in r205231 duplicated the vendor's arcstats.l2_write_bytes counter imported in r208373 (OpenSolaris revision 8582:df9361868dbe).
Approved by: pjd, delphij (mentor) MFC after: 3 days
|
208472 |
23-May-2010 |
mm |
Fix zfs receive temporarily changing unchanged stream properties. Fix possible panic with zfs_enable_datasets.
OpenSolaris onnv revision: 8536:33bd5de3260e
Approved by: pjd, delphij (mentor) Obtained from: OpenSolaris (Bug ID 6748561, 6757075) MFC after: 3 days
|
208458 |
23-May-2010 |
pjd |
Create UMA zones unconditionally.
MFC after: 3 days
|
208454 |
23-May-2010 |
pjd |
Remove ZIO_USE_UMA from arc.c as well.
MFC after: 3 days
|
208443 |
23-May-2010 |
mm |
Fix kernel panic when calling spa_tryimport() on a corrupted pool.
OpenSolaris onnv revision: 8680:005fe27123ba
Approved by: delphij (mentor) Obtained from: OpenSolaris (Bug ID 6786321) MFC after: 1 day
|
208442 |
23-May-2010 |
mm |
Fix mutex_exit misorder that can cause a kernel panic.
OpenSolaris onnv revision: 8667:5c308a17eb7c
Approved by: delphij (mentor) Obtained from: OpenSolaris (Bug ID 6795440) MFC after: 1 day
|
208373 |
21-May-2010 |
mm |
Update L2ARC code and fix several bugs.
- improve ARC memory consumption (Bug ID 6488341)
- ARC/L2ARC metadata accounting (Bug ID 6748019)
- L2ARC turbo warmup (Bug ID 6748023)
- kstats for ARC content (Bug ID 6748023)
- kstats for evicted bytes from ARC by L2ARC state (Bug ID 6871680)
- fix panic on i386 systems (Bug ID 6821260)
OpenSolaris onnv revisions: 8582:df9361868dbe, 8628:97dcded6e556, 9215:7c4584f76b47, 9274:a10f8bd993c1, 10357:29060492b29d
OpenSolaris Bug IDs: 6748019, 6748023, 6748030, 6488341, 6798268, 6821260, 6790261, 6871680
Approved by: pjd, delphij (mentor) Obtained from: OpenSolaris (multiple bug IDs) MFC after: 3 days
|
208372 |
21-May-2010 |
mm |
Reorder some already introduced locking variables.
OpenSolaris onnv revision: 8214:d7abf7c1f1c1
Approved by: pjd, delphij (mentor) Obtained from: OpenSolaris (Bug ID 6747934) MFC after: 3 days
|
208371 |
21-May-2010 |
mm |
Fix stack overflow in zfs send.
OpenSolaris onnv-revision: 8012:8ea30813950f
Approved by: pjd, delphij (mentor) Obtained from: OpenSolaris (Bug ID 6765626) MFC after: 3 days
|
208370 |
21-May-2010 |
mm |
Fix: vdev_reopen() can lead to failed allocations
OpenSolaris onnv-revision: 7980:589f37f25048
Approved by: pjd, delphij (mentor) Obtained from: OpenSolaris (Bug ID 6764914) MFC after: 3 days
|
208166 |
16-May-2010 |
pjd |
Fix userland build by making io_task available only for the kernel and by providing taskq_dispatch_safe() macro.
MFC after: 1 week
|
208148 |
16-May-2010 |
pjd |
Allow configuring UMA usage for ZIO data via the loader and turn it on by default for amd64. On i386 I saw performance degradation when UMA was used, but for amd64 it should help.
MFC after: 3 days
|
208147 |
16-May-2010 |
pjd |
Add task structure to zio and use it instead of allocating one. This eliminates the only place where we can sleep when calling zio_interrupt(). As a side-effect this can actually improve performance a little as we allocate one less thing for every I/O.
Prodded by: kib MFC after: 1 week
|
208142 |
16-May-2010 |
pjd |
The whole point of having dedicated worker thread for each leaf VDEV was to avoid calling zio_interrupt() from geom_up thread context. It turns out that when provider is forcibly removed from the system and we kill worker thread there can still be some ZIOs pending. To complete pending ZIOs when there is no worker thread anymore we still have to call zio_interrupt() from geom_up context. To avoid this race just remove use of worker threads altogether. This should be more or less fine, because I also thought that zio_interrupt() does more work, but it only makes small UMA allocation with M_WAITOK. It also saves one context switch per I/O request.
PR: kern/145339 Reported by: Alex Bakhtin <Alex.Bakhtin@gmail.com> MFC after: 1 week
|
208131 |
16-May-2010 |
mm |
Fix deadlock between zfs_dirent_lock and zfs_rmdir
OpenSolaris onnv revision: 11321:506b7043a14c
Approved by: pjd, delphij (mentor) Obtained from: OpenSolaris (Bug ID 6847615) MFC after: 3 days
|
208130 |
16-May-2010 |
mm |
Fix performance problem with ZFS prefetch caching [1]
Add statistics for ZFS prefetch (sysctl kstat.zfs.misc.zfetchstats)
Partial import of OpenSolaris onnv revision 10474:0e96dd3b905a
Reported by: jhell@dataix.net (private e-mail) [1] Approved by: pjd, delphij (mentor) Obtained from: OpenSolaris (Bug ID 6859997, 6868951) MFC after: 3 days
|
208050 |
13-May-2010 |
mm |
Fix ZIL-related panic on zfs rollback.
OpenSolaris onnv-revision: 8746:e1d96ca6808c
Approved by: pjd, delphij (mentor) Obtained from: OpenSolaris (Bug ID 6796377) MFC after: 1 week
|
208047 |
13-May-2010 |
mm |
Import OpenSolaris revision 7837:001de5627df3
It includes the following changes:
- parallel reads in traversal code (Bug ID 6333409)
- faster traversal for zfs send (Bug ID 6418042)
- traversal code cleanup (Bug ID 6725675)
- fix for two scrub related bugs (Bug ID 6729696, 6730101)
- fix assertion in dbuf_verify (Bug ID 6752226)
- fix panic during zfs send with i/o errors (Bug ID 6577985)
- replace P2CROSS with P2BOUNDARY (Bug ID 6725680)
List of OpenSolaris Bug IDs: 6333409, 6418042, 6757112, 6725668, 6725675, 6725680, 6725698, 6729696, 6730101, 6752226, 6577985, 6755042
Approved by: pjd, delphij (mentor) Obtained from: OpenSolaris (multiple Bug IDs) MFC after: 1 week
|
208030 |
13-May-2010 |
trasz |
Add a missing check to prevent local users from panicking the kernel by trying to set a malformed ACL.
MFC after: 3 days
|
207956 |
12-May-2010 |
mm |
Fix possible hang when replaying large truncations.
OpenSolaris onnv revision: 7904:6a124a4ca9c5
Approved by: pjd, delphij (mentor) Obtained from: OpenSolaris (Bug ID 6761624) MFC after: 3 days
|
207936 |
11-May-2010 |
pjd |
Even though r203504 eliminates taste traffic provoked by vdev_geom.c, ZFS still likes to open all vdevs, close them and open them again, which in turn provokes taste traffic anyway.
I don't know of any clean way to fix it, so do it the hard way: if we can't open a provider for writing, just retry 5 times with 0.5-second pauses. This should eliminate accidental races caused by other classes tasting providers created on top of our vdevs.
MFC after: 3 days Reported by: James R. Van Artsdalen <james-freebsd-fs2@jrv.org> Reported by: Yuri Pankov <yuri.pankov@gmail.com>
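The retry described in this entry can be sketched as a generic loop. This is a userland illustration, not the vdev_geom.c code: open_with_retry(), its callback type, the example callback fail_until_zero() and the use of nanosleep(2) for the pause are all assumptions. Only the five attempts and the 0.5 pause come from the commit message, and the pause is read here as half a second.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical callback: returns 0 on success, non-zero on failure. */
typedef int (*open_fn_t)(void *arg);

/*
 * Call open_fn up to `attempts` times, sleeping `pause_ns` nanoseconds
 * between failed attempts. Returns 0 on success or the last error.
 */
static int
open_with_retry(open_fn_t open_fn, void *arg, int attempts, long pause_ns)
{
	struct timespec ts;
	int error = -1;
	int i;

	assert(open_fn != NULL);
	assert(pause_ns >= 0 && pause_ns < 1000000000L);
	ts.tv_sec = 0;
	ts.tv_nsec = pause_ns;
	for (i = 0; i < attempts; i++) {
		error = open_fn(arg);
		if (error == 0)
			break;
		/* Do not sleep after the final attempt. */
		if (i + 1 < attempts)
			nanosleep(&ts, NULL);
	}
	return (error);
}

/* Example callback: fails until the counter behind arg reaches zero. */
static int
fail_until_zero(void *arg)
{
	int *remaining = arg;

	if (*remaining > 0) {
		(*remaining)--;
		return (1);
	}
	return (0);
}
```

Under these assumptions, the behaviour in the commit message corresponds to open_with_retry(fn, arg, 5, 500000000L). A bounded retry of this kind only narrows the race window; it does not remove the underlying race.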
|
207934 |
11-May-2010 |
pjd |
Add missing new line characters to the warnings.
MFC after: 3 days
|
207911 |
11-May-2010 |
mm |
Fix failed assertion on destroying datasets from an older pool version.
OpenSolaris onnv revision: 9390:887948510f80
PR: kern/146471 Approved by: pjd, delphij (mentor) Obtained from: OpenSolaris (Bug ID 6826861) MFC after: 3 days
|
207910 |
11-May-2010 |
mm |
Fix possible panic with zfs destroy.
OpenSolaris onnv revision: 8779:f164e0e90508
PR: kern/146471 Approved by: pjd, delphij (mentor) Obtained from: OpenSolaris (Bug ID 6784924) MFC after: 3 days
|
207909 |
11-May-2010 |
mm |
Fix zfs rename (may occasionally fail with dataset busy).
OpenSolaris onnv revision: 8517:41a0783dde17
PR: kern/146471 Approved by: pjd, delphij (mentor) Obtained from: OpenSolaris (Bug ID 6784757) MFC after: 3 days
|
207908 |
11-May-2010 |
mm |
Fix endianness bug in ZFS intent log (ZIL).
OpenSolaris onnv revision: 8109:6147a1bdd359
Approved by: pjd, delphij (mentor) Obtained from: OpenSolaris (Bug ID 6760048) MFC after: 3 days
|
207745 |
07-May-2010 |
trasz |
Enforce RLIMIT_FSIZE in ZFS.
Reviewed by: pjd@
|
207736 |
07-May-2010 |
mckusick |
Merger of the quota64 project into head.
This joint work of Dag-Erling Smørgrav and myself updates the FFS quota system to support both traditional 32-bit and new 64-bit quotas (for those of you who want to put 2+Tb quotas on your users).
By default quotas are not compiled into the kernel. To include them in your kernel configuration you need to specify:
options QUOTA # Enable FFS quotas
If you are already running with the current 32-bit quotas, they should continue to work just as they have in the past. If you wish to convert to using 64-bit quotas, use `quotacheck -c 64'; if you wish to revert from 64-bit quotas back to 32-bit quotas, use `quotacheck -c 32'.
There is a new library of functions to simplify the use of the quota system, do `man quotafile' for details. If your application is currently using the quotactl(2), it is highly recommended that you convert your application to use the quotafile interface. Note that existing binaries will continue to work.
Special thanks to John Kozubik of rsync.net for getting me interested in pursuing 64-bit quota support and for funding part of my development time on this project.
|
207683 |
05-May-2010 |
marius |
- Fix broken symlinks on cross platform zfs send/recv. [1]
- Enable zfs_ace_byteswap() on FreeBSD as it works just fine (tested between amd64 and sparc64 in both directions by Michael Moll).
PR: 146272 Approved by: mm, pjd Obtained from: OpenSolaris (onnv rev. 8283:1ca59f393041; Bug ID 6764193) [1] MFC after: 3 days
|
207670 |
05-May-2010 |
mm |
Introduce hardforce export option (-F) for "zpool export". When exporting with this flag, zpool.cache remains untouched.
OpenSolaris onnv revision: 8211:32722be6ad3b
Approved by: pjd, delphij (mentor) Obtained from: OpenSolaris (Bug ID: 6775357)
|
207626 |
04-May-2010 |
mm |
Speed up ZFS list operation with objset prefetching.
Partial import of OpenSolaris onnv revisions: 8415:8809e849f63e, 10474:0e96dd3b905a
PR: kern/146297 Submitted by: myself Approved by: pjd, delphij (mentor) Obtained from: OpenSolaris (Bug ID 6386929, 6755389, 6847118) MFC after: 2 weeks
|
207624 |
04-May-2010 |
mm |
Fix deadlock during zfs receive.
OpenSolaris onnv revision: 9299:8809e849f63e
PR: kern/146296 Submitted by: myself Approved by: pjd, delphij (mentor) Obtained from: OpenSolaris (Bug ID 6783818, 6826836) MFC after: 1 week
|
207481 |
01-May-2010 |
mm |
Add sysctl and loader tunable vfs.zfs.txg.write_limit_override. This tunable improves fine-tuning of ZFS write throttling.
PR: kern/146108 Suggested by: Nikolay Denev <ndenev at gmail.com> Approved by: pjd, delphij (mentor) MFC after: 2 weeks
|
207480 |
01-May-2010 |
mm |
Change description of tunable group vfs.zfs.txg to be more understandable.
Approved by: pjd, delphij (mentor) MFC after: 3 days
|
207427 |
30-Apr-2010 |
mm |
Fix improper pool write throughput calculation.
OpenSolaris onnv revision: 9366:17553395a745
PR: kern/146108 Approved by: pjd, delphij (mentor) Obtained from: OpenSolaris, Bug ID 6817339 MFC after: 2 weeks
|
207334 |
28-Apr-2010 |
pjd |
Backport fix for 'zfs_znode_dmu_init: existing znode for dbuf' panic from OpenSolaris.
PR: kern/144402 Reported by: Alex Bakhtin <alex.bakhtin@gmail.com> Tested by: Alex Bakhtin <alex.bakhtin@gmail.com> Obtained from: OpenSolaris, Bug ID 6895088 MFC after: 3 days
|
207068 |
22-Apr-2010 |
pjd |
Allow modifying a directory's content even if the ZFS_NOUNLINK (SF_NOUNLINK, sunlnk) flag is set. We only deny the directory's removal or rename.
PR: kern/143343 Reported by: marck MFC after: 3 days
|
206797 |
18-Apr-2010 |
pjd |
Restore previous order.
|
206796 |
18-Apr-2010 |
pjd |
Style fixes.
|
206795 |
18-Apr-2010 |
pjd |
Add missing list and lock destruction.
|
206794 |
18-Apr-2010 |
pjd |
Extend locks scope to match OpenSolaris.
|
206793 |
18-Apr-2010 |
pjd |
Remove racy assertion.
Obtained from: OpenSolaris
|
206792 |
18-Apr-2010 |
pjd |
Set ARC_L2_WRITING on L2ARC header creation.
Obtained from: OpenSolaris
|
206667 |
15-Apr-2010 |
pjd |
Fix 3-way deadlock that can happen because of ZFS and vnode lock order reversal.
thread0 (vfs_fhtovp)    thread1 (vop_getattr)   thread2 (zfs_recv)
--------------------    ---------------------   ------------------
                        vn_lock
rrw_enter_read
                                                rrw_enter_write (hangs)
                        rrw_enter_read (hangs)
vn_lock (hangs)
Submitted by: Attila Nagy <bra@fsn.hu> MFC after: 3 days
|
205346 |
19-Mar-2010 |
pjd |
The same code is used to import and to create a pool. The order of operations is the following:
1. Try to open vdev by remembered path and guid.
2. If 1 failed, try to find a vdev whose guid matches and ignore the path.
3. If 2 failed, this means either that the vdev we're looking for is gone or that the pool is being created and the vdev doesn't contain a proper guid yet. To be able to handle pool creation we open vdev by path anyway.
Because of 3 it is possible that we open the wrong vdev on import, which can lead to confusion.
The solution for this is to check spa_load_state. On pool creation it will be equal to SPA_LOAD_NONE and we can open vdev only by path immediately. If it is not equal to SPA_LOAD_NONE, we first open by path+guid and, when that fails, we open by guid. We no longer open the wrong vdev on import.
MFC after: 2 weeks
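The selection logic in this entry can be sketched as follows. This is a simplified userland illustration, not the vdev_geom.c implementation: `struct provider`, `lookup`, `pick_vdev` and the two-value `enum load_state` are hypothetical stand-ins, with `LOAD_NONE` playing the role of SPA_LOAD_NONE.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-ins for spa_load_state values. */
enum load_state { LOAD_NONE, LOAD_OTHER };

struct provider {
    const char *path;
    uint64_t guid;
};

/* Return the first provider matching the requested criteria, or NULL. */
const struct provider *
lookup(const struct provider *p, size_t n, const char *path, bool by_path,
    uint64_t guid, bool by_guid)
{
    for (size_t i = 0; i < n; i++) {
        if (by_path && strcmp(p[i].path, path) != 0)
            continue;
        if (by_guid && p[i].guid != guid)
            continue;
        return (&p[i]);
    }
    return (NULL);
}

const struct provider *
pick_vdev(enum load_state state, const struct provider *p, size_t n,
    const char *path, uint64_t guid)
{
    const struct provider *found;

    /* Pool creation: there is no guid on disk yet, so use the path. */
    if (state == LOAD_NONE)
        return (lookup(p, n, path, true, 0, false));

    /* Otherwise: path + guid first, then guid alone. Never path alone. */
    found = lookup(p, n, path, true, guid, true);
    if (found == NULL)
        found = lookup(p, n, NULL, false, guid, true);
    return (found);
}
```

Dropping the path-only fallback outside of creation is what keeps an import from attaching to a device that merely happens to carry the remembered name.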
|
205264 |
17-Mar-2010 |
kmacy |
- cache line align arcs_lock array (h/t Marius Nuennerich)
- fix ARCS_LOCK_PAD to use architecture defined CACHE_LINE_SIZE
- cache line align buf_hash_table ht_locks array
MFC after: 7 days
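As a rough illustration of cache-line padding and alignment for an array of locks, consider the C11 sketch below. The names `toy_lock`, `padded_lock`, `lock_table` and `NUM_LOCKS` are hypothetical, and the fallback value of 64 for CACHE_LINE_SIZE is an assumption for userland builds; the kernel takes the value from the architecture headers.

```c
#ifndef CACHE_LINE_SIZE
#define CACHE_LINE_SIZE 64  /* assumed; defined per architecture in the kernel */
#endif

#define NUM_LOCKS 16

/* Placeholder for a real kernel mutex. */
struct toy_lock {
    volatile int held;
};

/* Pad each lock out to a full cache line so neighbours never share one. */
struct padded_lock {
    struct toy_lock pl_lock;
    unsigned char pl_pad[CACHE_LINE_SIZE - sizeof(struct toy_lock)];
};

/* Align the array itself so element 0 starts on a cache-line boundary. */
_Alignas(CACHE_LINE_SIZE) struct padded_lock lock_table[NUM_LOCKS];
```

Padding alone only fixes the spacing between elements; without the alignment on the array, every element could still straddle two lines.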
|
205253 |
17-Mar-2010 |
kmacy |
use CACHE_LINE_SIZE instead of hardcoding 128 for lock pad
pointed out by Marius Nuennerich and jhb@
|
205231 |
16-Mar-2010 |
kmacy |
- reduce contention by breaking up ARC state locks into 16 for data and 16 for metadata
- export L2ARC tunables as sysctls
- add several kstats to track L2ARC state more precisely
- avoid holding a contended lock when atomically incrementing a contended counter (no lock protection needed for atomics)
|
205133 |
13-Mar-2010 |
kmacy |
fix compilation under ZIO_USE_UMA
|
205132 |
13-Mar-2010 |
kmacy |
Don't bottleneck on acquiring the stream locks - this avoids a massive drop off in throughput with large numbers of simultaneous reads
MFC after: 7 days
|
205079 |
12-Mar-2010 |
pjd |
Remove bogus assertion.
Reported by: Johan Ström <johan@stromnet.se> Obtained from: OpenSolaris, Bug ID 6827260 MFC after: 1 week
|
204804 |
06-Mar-2010 |
pjd |
Remove racy assertion.
Reported by: Attila Nagy <bra@fsn.hu> Obtained from: OpenSolaris, Bug ID 6827260 MFC after: 1 week
|
204185 |
22-Feb-2010 |
marcel |
Use mf and not mf.a. The latter doesn't force memory ordering and applies to sequential memory.
|
204101 |
19-Feb-2010 |
pjd |
Don't set f_bsize to recordsize. It might confuse some software (like squid).
Submitted by: Alexander Zagrebin <alexz@visp.ru> MFC after: 2 weeks
|
204073 |
18-Feb-2010 |
pjd |
Add tunable and sysctl to skip hostid check on pool import.
|
203533 |
05-Feb-2010 |
delphij |
Remove two files that are not needed by FreeBSD.
Approved by: pjd MFC after: 2 weeks
|
203504 |
04-Feb-2010 |
pjd |
Open provider for writing when we find the right one. Opening too many providers for writing provokes huge traffic related to taste events sent by GEOM on close. This can lead to various problems with opening GEOM providers that are created on top of other GEOM providers.
Reported by: Kurt Touet <ktouet@gmail.com>, mr Tested by: mr, Baginski Darren <kickbsd@ya.ru> MFC after: 2 weeks
|
202129 |
11-Jan-2010 |
delphij |
Report ZFS filesystem version instead of the zpool version when we say it.
Reported by: Yuri Pankov (on -fs@) Submitted by: delphij Approved by: pjd MFC after: 1 week
|
201756 |
07-Jan-2010 |
delphij |
Re-apply onnv-gate revisions 7994 and 8986 (corresponds to FreeBSD revision 200726 and 200727). It looks like the two revisions were not applied in the right sequence; I found this when comparing with the OpenSolaris code.
MFC after: 3 days Reviewed by: mm@
|
201406 |
02-Jan-2010 |
delphij |
Reduce diff against OpenSolaris - move Giant acquire/release to zfs_znode.c. As a side effect this also eliminates two potential Giant leaks.
Approved by: pjd MFC after: 1 month
|
201143 |
28-Dec-2009 |
delphij |
Apply OpenSolaris revision 8012 which brings our zpool to version 14, making it possible for zpools created on OpenSolaris 2009.06 to be used on FreeBSD.
PR: kern/141800 Submitted by: mm Reviewed by: pjd, trasz Obtained from: OpenSolaris MFC after: 2 weeks
|
200727 |
19-Dec-2009 |
delphij |
Apply fix for Solaris bug 6462803: zfs snapshot -r failed because filesystem was busy (onnv revision 8989)
Submitted by: mm Approved by: pjd Obtained from: OpenSolaris MFC after: 2 weeks
|
200726 |
19-Dec-2009 |
delphij |
Apply fix for Solaris bug 6801979: zfs recv can fail with E2BIG (onnv revision 8986)
Requested by: mm Submitted by: pjd Obtained from: OpenSolaris MFC after: 2 weeks
|
200724 |
19-Dec-2009 |
delphij |
Apply fix for Solaris bug 6462803: zfs snapshot -r failed because filesystem was busy.
Submitted by: mm Approved by: pjd MFC after: 2 weeks
|
200162 |
05-Dec-2009 |
kib |
Change VOP_FSYNC for zfs vnode from VOP_PANIC to zfs_freebsd_fsync(), both to not panic when fsync(2) is called for fifo on zfs filedescriptor, and to actually fsync fifo inode to permanent storage.
PR: kern/141177 Reviewed by: pjd MFC after: 1 week
|
200158 |
05-Dec-2009 |
pjd |
We have to eventually look for provider without checking guid as this is needed for attaching when there is no metadata yet.
Before r200125 the order of looking for providers was wrong. It was:
1. Find provider by name.
2. Find provider by guid.
3. Find provider by name and guid.
Where it should have been:
1. Find provider by name and guid.
2. Find provider by guid.
3. Find provider by name.
MFC after: 1 week
|
200126 |
05-Dec-2009 |
pjd |
Fix deadlock when ZVOLs are present and we are replacing dead component or calling scrub when pool is in a degraded state. It will try to taste ZVOLs, which will lead to deadlock, as ZVOL will try to acquire the same locks as replace/scrub is holding already.
We can't simply skip providers based on their GEOM class, because ZVOL can have providers built on top of it and we need to skip those as well.
We do it by asking for ZFS::iszvol attribute. Any ZVOL-based provider will give us positive answer and we have to skip those providers.
This way we remove possibility to create ZFS pools on top of ZVOLs, but it is not very useful anyway.
I believe deadlock is still possible in some very complex situations like when we have MD provider on top of UFS file on top of ZVOL. When we try to replace dead component in the pool mentioned ZVOL is based on, there might be a deadlock when ZFS will try to taste MD provider. There is no easy way to detect that, but it isn't very common.
MFC after: 1 week
|
200125 |
05-Dec-2009 |
pjd |
Always check guid when opening by path, because we may end up with provider that does have the same name, but only by accident.
MFC after: 1 week
|
200124 |
05-Dec-2009 |
pjd |
Avoid using additional variable for storing an error if we are not going to do anything with it.
|
199157 |
10-Nov-2009 |
pjd |
Be careful which vattr fields are set during setattr replay. Without this fix strange things can appear after unclean shutdown like files with mode set to 07777.
Reported by: des MFC after: 3 days
|
199156 |
10-Nov-2009 |
pjd |
Avoid passing invalid mountpoint to getnewvnode().
Reported by: rwatson Tested by: rwatson MFC after: 3 days
|
198703 |
30-Oct-2009 |
pjd |
- zfs_zaccess() can handle VAPPEND too, so map V_APPEND to VAPPEND and call zfs_access() instead of vaccess() in this case as well.
- If VADMIN is specified with another V* flag (unlikely) call both zfs_access() and vaccess() after splitting V* flags.
This fixes "dirtying snapshot!" panic.
PR: kern/139806 Reported by: Carl Chave <carl@chave.us> In co-operation with: jh MFC after: 3 days
|
197861 |
08-Oct-2009 |
pjd |
Allow file system owner to modify system flags if securelevel permits.
MFC after: 3 days
|
197843 |
07-Oct-2009 |
pjd |
On FreeBSD it is enough to report provider removal when orphan event is received, we don't have to do it on every ENXIO error in I/O path. Solaris has no GEOM so they have to handle it in a less clean way.
MFC after: 3 days
|
197842 |
07-Oct-2009 |
pjd |
Fix white-spaces.
MFC after: 3 days
|
197831 |
07-Oct-2009 |
pjd |
Fix situation where Mac OS X NFS client creates a file and when it tries to set ownership and mode in the same setattr operation, the mode was overwritten by secpolicy_vnode_setattr().
PR: kern/118320 Submitted by: Mark Thompson <info-gentoo@mark.thompson.bz> MFC after: 3 days
|
197816 |
06-Oct-2009 |
kmacy |
Prevent paging pressure from draining arc too much
- always drain arc if above arc_c_max
- never drain arc if arc is below arc_c_max
MFC after: 3 days
|
197683 |
01-Oct-2009 |
delphij |
Return EOPNOTSUPP instead of EINVAL when doing chflags(2) over an old format ZFS, as defined in the manual page.
Submitted by: pjd (response of my original patch but bugs are mine) MFC after: 3 days
|
197515 |
26-Sep-2009 |
pjd |
Handle cases where virtual (GFS) vnodes are referenced when doing forced unmount. In that case we cannot depend on the proper order of invalidating vnodes, so we have to free resources when we have a chance.
PR: kern/139062 Reported by: trasz MFC after: 3 days
|
197514 |
26-Sep-2009 |
pjd |
On lookup error VFS expects *vpp to be set to NULL, be sure to do that.
MFC after: 3 days
|
197513 |
26-Sep-2009 |
pjd |
Use traverse() function to find and return mount point's vnode instead of covered vnode when snapshot is already mounted.
MFC after: 3 days
|
197512 |
26-Sep-2009 |
pjd |
- Don't depend on value returned by gfs_*_inactive(), it doesn't work well with forced unmounts when GFS vnodes are referenced.
- Make other preparations to GFS for forced unmounts.
PR: kern/139062 Reported by: trasz MFC after: 3 days
|
197497 |
25-Sep-2009 |
pjd |
Switch to fletcher4 as the default checksum algorithm. Fletcher2 was proven to be a bit weak and OpenSolaris also switched to fletcher4.
PR: kern/139072 Reported by: Daniel Grund <bugs@dgrund.de> MFC after: 3 days
|
197459 |
24-Sep-2009 |
pjd |
Before calling vflush(FORCECLOSE) mark file system as unmounted so the following vnops will fail. This is very important, because without this change vnode could be reclaimed at any point, even if we increased usecount. The only way to ensure that vnode won't be reclaimed was to lock it, which would be very hard to do in ZFS without changing a lot of code. With this change simply increasing usecount is enough to be sure vnode won't be reclaimed from under us. To be precise it can still be reclaimed but we won't be able to see it, because every try to enter ZFS through VFS will result in EIO.
The only function that cannot return EIO, because it is needed for vflush() is zfs_root(). Introduce ZFS_ENTER_NOERROR() macro that only locks z_teardown_lock and never returns EIO.
MFC after: 3 days
|
197458 |
24-Sep-2009 |
pjd |
Close race in zfs_zget(). We have to increase usecount first and then check for VI_DOOMED flag. Before this change vnode could be reclaimed between checking for the flag and increasing usecount.
MFC after: 3 days
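The ordering fix in this entry can be illustrated with a toy model using C11 atomics. This is only a sketch of the "reference first, check second" pattern: `toy_vnode` and `toy_get` are hypothetical, `doomed` stands in for VI_DOOMED, and the sketch assumes, as the surrounding entries describe, that a held reference keeps the reclaimer away.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_vnode {
    atomic_int usecount;
    atomic_bool doomed;     /* stand-in for the VI_DOOMED flag */
};

/*
 * Take a reference first, then check the flag. Checking first would leave
 * a window in which the node is doomed after the check but before the
 * reference is taken.
 */
struct toy_vnode *
toy_get(struct toy_vnode *vp)
{
    atomic_fetch_add(&vp->usecount, 1);
    if (atomic_load(&vp->doomed)) {
        atomic_fetch_sub(&vp->usecount, 1);     /* drop it again */
        return (NULL);
    }
    return (vp);
}
```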
|
197435 |
23-Sep-2009 |
trasz |
In VOP_SETACL(9) and VOP_GETACL(9), specifying wrong ACL type should result in EINVAL, not EOPNOTSUPP.
|
197426 |
23-Sep-2009 |
pjd |
Restore BSD behaviour - when creating new directory entry use parent directory gid to set group ownership and not process gid.
This was overlooked during v6 -> v13 switch.
PR: kern/139076 Reported by: Sean Winn <sean@gothic.net.au> MFC after: 3 days
|
197351 |
20-Sep-2009 |
pjd |
Purge namecache in the same place OpenSolaris does.
|
197289 |
17-Sep-2009 |
pjd |
Purge file system namecache when receiving incremental stream and rolling back to it.
MFC after: 3 days
|
197287 |
17-Sep-2009 |
pjd |
Purge namecache for the file system being rolled back, so it doesn't point at invalid vnodes after the rollback resulting in EIO errors when trying to access files which are in the namecache.
Reported by: des MFC after: 3 days
|
197219 |
15-Sep-2009 |
pjd |
Forced unmounts work just fine in my tests under heavy load. There might still be a problem, but it isn't worth a warning.
|
197218 |
15-Sep-2009 |
pjd |
We believe ZFS is ready for production use. Remove a warning about it being experimental. :)
|
197201 |
14-Sep-2009 |
pjd |
- Mount ZFS snapshots with MNT_IGNORE flag, so they are not visible in regular df(1) and mount(8) output. This is a bit similar to OpenSolaris and follows ZFS route of not listing snapshots by default with 'zfs list' command.
- Add UPDATING entry to note that ZFS snapshots are no longer visible in mount(8) and df(1) output by default.
Reviewed by: kib MFC after: 3 days
|
197177 |
13-Sep-2009 |
pjd |
Support both cases: when snapshot is already mounted and when it is not yet mounted.
MFC after: 3 days
|
197172 |
13-Sep-2009 |
pjd |
Add missing \n.
Reported by: marck
|
197167 |
13-Sep-2009 |
pjd |
Work-around READDIRPLUS problem with .zfs/ and .zfs/snapshot/ directories by just returning EOPNOTSUPP. This will allow NFS server to fall back to regular READDIR.
Note that converting an inode number to a snapshot's vnode is an expensive operation. Snapshots are stored in an AVL tree, but based on their names, not inode numbers, so to convert an inode to a snapshot vnode we have to iterate over all snapshots.
This is not a problem in OpenSolaris, because in their READDIRPLUS implementation they use VOP_LOOKUP() on d_name, instead of VFS_VGET() on d_fileno as we do.
PR: kern/125149 Reported by: Weldon Godfrey <wgodfrey@ena.com> Analysis by: Jaakko Heinonen <jh@saunalahti.fi> MFC after: 3 days
|
197153 |
13-Sep-2009 |
pjd |
When zfs.ko is compiled with debug, make sure that znode and vnode point at each other.
MFC after: 3 days
|
197152 |
13-Sep-2009 |
pjd |
Extend scope of the z_teardown_lock lock for consistency and "just in case".
MFC after: 3 days
|
197151 |
13-Sep-2009 |
pjd |
Be sure not to overflow struct fid.
MFC after: 3 days
|
197150 |
13-Sep-2009 |
pjd |
There is a bug where mze_insert() can trigger an assert() of inserting the same entry twice. This bug is not fixed yet, but leads to a situation where trying to access a corrupted directory will panic the kernel. Until the bug is properly fixed, try to recover from it and log that it happened.
Reported by: marck OpenSolaris bug: 6709336 MFC after: 3 days
|
197133 |
12-Sep-2009 |
pjd |
- Protect reclaim with z_teardown_inactive_lock.
- Be prepared for dbuf to disappear in zfs_reclaim_complete() and check if z_dbuf field is NULL - this might happen in case of rollback or forced unmount between zfs_freebsd_reclaim() and zfs_reclaim_complete().
- On forced unmount wait for all znodes to be destroyed - destruction can be done asynchronously via zfs_reclaim_complete().
MFC after: 1 week
|
197131 |
12-Sep-2009 |
pjd |
Tighten up the check for race in zfs_zget() - ZTOV(zp) can not only contain NULL, but also can point to dead vnode, take that into account.
PR: kern/132068 Reported by: Edward Fisk <7ogcg7g02@sneakemail.com>, kris Fix based on patch from: Jaakko Heinonen <jh@saunalahti.fi> MFC after: 1 week
|
196985 |
08-Sep-2009 |
pjd |
Only log successful commands! Without this fix we log even unsuccessful commands executed by unprivileged users. Action is not really taken, but it is logged to pool history, which might be confusing.
Reported by: Denis Ahrens <denis@h3q.com> MFC after: 3 days
|
196982 |
08-Sep-2009 |
pjd |
We don't export individual snapshots, so mnt_export field in snapshot's mount point is NULL. That's why when we try to access snapshots over NFS use mnt_export field from the parent file system.
MFC after: 1 week
|
196980 |
08-Sep-2009 |
pjd |
When we automatically mount snapshot we want to return vnode of the mount point from the lookup and not covered vnode. This is one of the fixes for using .zfs/ over NFS.
MFC after: 1 week
|
196979 |
08-Sep-2009 |
pjd |
On FreeBSD we don't have to look for snapshot's mount point, because fhtovp method is already called with proper mount point.
MFC after: 1 week
|
196978 |
08-Sep-2009 |
pjd |
Call ZFS_EXIT() after locking the vnode.
MFC after: 1 week
|
196965 |
08-Sep-2009 |
pjd |
Fix reference count leak for a case where snapshot's mount point is updated. Such situation is not supported.
This problem was triggered by something like this:
# zpool create tank da0
# zfs snapshot tank@snap
# cd /tank/.zfs/snapshot/snap (this will mount the snapshot)
# cd
# mount -u nosuid /tank/.zfs/snapshot/snap (refcount leak)
# zpool export tank
cannot export 'tank': pool is busy
MFC after: 1 week
|
196954 |
07-Sep-2009 |
pjd |
If we have to use avl_find(), optimize a bit and use avl_insert() instead of avl_add() (the latter is actually a wrapper around avl_find() + avl_insert()).
Fix similar case in the code that is currently commented out.
|
196953 |
07-Sep-2009 |
pjd |
When snapshot mount point is busy (for example we are still in it) we will fail to unmount it, but it won't be removed from the tree, so in that case there is no need to reinsert it.
This fixes a panic reproducible in the following steps:
# zfs create tank/foo
# zfs snapshot tank/foo@snap
# cd /tank/foo/.zfs/snapshot/snap
# umount /tank/foo
panic: avl_find() succeeded inside avl_add()
Reported by: trasz MFC after: 3 days
|
196949 |
07-Sep-2009 |
trasz |
Enable NFSv4 ACL support in ZFS.
Reviewed by: pjd
|
196944 |
07-Sep-2009 |
pjd |
Don't recheck ownership on update mount. This will eliminate LOR between vfs_busy() and mount mutex. We check ownership in vfs_domount() anyway.
Noticed by: kib Reviewed by: kib MFC after: 1 week
|
196941 |
07-Sep-2009 |
trasz |
Prevent the line from wrapping.
|
196927 |
07-Sep-2009 |
pjd |
Changing provider size is not really supported by GEOM, but doing so when provider is closed should be ok.
When administrator requests to change ZVOL size do it immediately if ZVOL is closed or do it on last ZVOL close.
PR: kern/136942 Requested by: Bernard Buri <bsd@ask-us.at> MFC after: 1 week
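The deferred-resize behaviour described here can be modelled with a small state machine. This is an illustrative sketch only: `toy_zvol` and its functions are hypothetical names and do not mirror the actual zvol.c structures.

```c
#include <stdbool.h>
#include <stdint.h>

struct toy_zvol {
    int open_count;
    uint64_t size;
    uint64_t pending_size;
    bool resize_pending;
};

void
toy_zvol_open(struct toy_zvol *zv)
{
    zv->open_count++;
}

/* Apply the new size now if nobody has the volume open, else remember it. */
void
toy_zvol_set_size(struct toy_zvol *zv, uint64_t new_size)
{
    if (zv->open_count == 0) {
        zv->size = new_size;
        zv->resize_pending = false;
    } else {
        zv->pending_size = new_size;
        zv->resize_pending = true;
    }
}

/* On the last close, apply any resize that was deferred. */
void
toy_zvol_close(struct toy_zvol *zv)
{
    if (--zv->open_count == 0 && zv->resize_pending) {
        zv->size = zv->pending_size;
        zv->resize_pending = false;
    }
}
```

Either way the visible size only ever changes while the volume is closed, which is the condition the entry names as safe for GEOM.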
|
196919 |
07-Sep-2009 |
pjd |
bzero() on-stack argument, so mutex_init() won't misinterpret that the lock is already initialized if we have some garbage on the stack.
PR: kern/135480 Reported by: Emil Mikulic <emikulic@gmail.com> MFC after: 3 days
|
196863 |
05-Sep-2009 |
trasz |
Improve wording.
Discussed with: pjd, cperciva, rink, wkoszek and des, in order of appearance.
|
196703 |
31-Aug-2009 |
pjd |
Backport the 'dirtying dbuf' panic fix from newer ZFS version.
Reported by: Thomas Backman <serenity@exscape.org> MFC after: 1 week
|
196702 |
31-Aug-2009 |
pjd |
Remove empty directory.
|
196662 |
30-Aug-2009 |
pjd |
Add missing mountpoint vnode locking.
This fixes panic on assertion with DEBUG_VFS_LOCKS and vfs.usermount=1 when regular user tries to mount dataset owned by him.
MFC after: 1 week
|
196458 |
23-Aug-2009 |
pjd |
- Hide ZFS kernel threads under zfskern process.
- Use better (shorter) thread names:
  'zvol:worker zvol/tank/vol00' -> 'zvol tank/vol00'
  'vdev:worker da0' -> 'vdev da0'
|
196457 |
23-Aug-2009 |
pjd |
Set priority of vdev_geom threads and zvol threads to PRIBIO.
|
196309 |
17-Aug-2009 |
pjd |
getcwd() (when __getcwd() fails) works by calling stat(2) on the current directory, going up (..), calling readdir and looking for the previous directory's inode. In case of the .zfs/ directory this doesn't work, because .zfs/ is hidden by default, so it won't be visible in readdir output.
Fix this by implementing VPTOCNP for snapshot directories, so __getcwd() doesn't fail and getcwd() doesn't have to use readdir method.
This fixes /bin/pwd from within .zfs/snapshot/<name>/.
Suggested by: kib Approved by: re (rwatson)
|
196307 |
17-Aug-2009 |
pjd |
Manage asynchronous vnode release just like Solaris.
Discussed with: kmacy Approved by: re (kib)
|
196303 |
17-Aug-2009 |
pjd |
- Reduce z_teardown_lock lock scope a bit.
- The error variable is int, not bool.
- Convert spaces to tabs where needed.
Approved by: re (kib)
|
196301 |
17-Aug-2009 |
pjd |
If z_buf is NULL, we should free znode immediately.
Noticed by: avg Approved by: re (kib)
|
196299 |
17-Aug-2009 |
pjd |
- We need to recycle vnode instead of freeing znode.
Submitted by: avg
- Add missing vnode interlock unlock.
- Remove redundant znode locking.
Approved by: re (kib)
|
196297 |
17-Aug-2009 |
pjd |
Fix panic in zfs recv code. The last vnode (mountpoint's vnode) can have 0 usecount.
Reported by: Thomas Backman <serenity@exscape.org> Approved by: re (kib)
|
196295 |
17-Aug-2009 |
pjd |
Remove OpenSolaris taskq port (it performs very poorly in our kernel) and replace it with wrappers around our taskqueue(9). To make this possible, implement a taskqueue_member() function which returns 1 if the given thread was created by the given taskqueue.
Approved by: re (kib)
|
196291 |
17-Aug-2009 |
pjd |
- Fix a race where /dev/zfs control device is created before ZFS is fully initialized. Also destroy /dev/zfs before doing other deinitializations.
- Initialization through taskq is no longer needed and there is a race where one of the zpool/zfs command loads zfs.ko and tries to do some work immediately, but /dev/zfs is not there yet.
Reported by: pav Approved by: re (kib)
|
196289 |
17-Aug-2009 |
pjd |
Remove files that are no longer used.
Discussed with: kmacy Approved by: re (kib)
|
196269 |
16-Aug-2009 |
marcel |
Fix misalignment in nvpair_native_embedded() caused by the compiler replacing the bzero(). See also revision 195627, which fixed the misalignment in nvpair_native_embedded_array().
Approved by: re (kensmith)
|
195909 |
27-Jul-2009 |
pjd |
We don't support ephemeral IDs in FreeBSD and without this fix ZFS can panic in zfs_fuid_create_cred() when userid is negative. It is converted to an unsigned value, which makes the IS_EPHEMERAL() macro incorrectly report that this is an ephemeral ID. The most reasonable solution for now is to always report that the given ID is not ephemeral.
PR: kern/132337 Submitted by: Matthew West <freebsd@r.zeeb.org> Tested by: Thomas Backman <serenity@exscape.org>, Michael Reifenberger <mike@reifenberger.com> Approved by: re (kib) MFC after: 2 weeks
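The signed-to-unsigned conversion problem described above can be reproduced in a few lines. The sketch assumes the check has the shape "ID greater than MAXUID" with MAXUID equal to 2147483647; both that shape and the names `MAXUID_ASSUMED`, `is_ephemeral` and `is_ephemeral_fixed` are assumptions made for illustration, not the macro from the ZFS headers.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAXUID_ASSUMED 2147483647u  /* assumption: Solaris-style MAXUID */

/* Assumed shape of the check: an ID above MAXUID counts as ephemeral. */
bool
is_ephemeral(uint32_t id)
{
    return (id > MAXUID_ASSUMED);
}

/* Behaviour after the fix as described: never report an ephemeral ID. */
bool
is_ephemeral_fixed(uint32_t id)
{
    (void)id;
    return (false);
}
```

A negative ID such as -2 becomes 4294967294 when converted to `uint32_t`, which is above the assumed limit and therefore misclassified by the first function.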
|
195822 |
22-Jul-2009 |
trasz |
Fix extattr_list_file(2) on ZFS in case the attribute directory doesn't exist and user doesn't have write access to the file. Without this fix, it returns bogus value instead of 0. For some reason this didn't manifest on my kernel compiled with -O0.
PR: kern/136601 Submitted by: Jaakko Heinonen <jh at saunalahti dot fi> Approved by: re (kib)
|
195785 |
20-Jul-2009 |
trasz |
Fix permission handling for extended attributes in ZFS. Without this change, ZFS uses SunOS Alternate Data Streams semantics - each EA has its own permissions, which are set at EA creation time and - unlike SunOS - invisible to the user and impossible to change. From the user point of view, it's just broken: sometimes access is granted when it shouldn't be, sometimes it's denied when it shouldn't be.
This patch makes it behave just like UFS, i.e. depend on current file permissions. Also, it fixes returned error codes (ENOATTR instead of ENOENT) and makes listextattr(2) return 0 instead of EPERM where there is no EA directory (i.e. the file never had any EA).
Reviewed by: pjd (idea, not actual code) Approved by: re (kib)
|
195627 |
11-Jul-2009 |
marcel |
In nvpair_native_embedded_array(), meaningless pointers are zeroed. The programmer was aware that alignment was not guaranteed in the packed structure and used bzero() to NULL out the pointers. However, on ia64, the compiler is quite aggressive in finding ILP and calls to bzero() are often replaced by simple assignments (i.e. stores), especially when the width or size in question corresponds with a store instruction (i.e. st1, st2, st4 or st8).
The problem here is not a compiler bug. The address of the memory to zero-out was given by '&packed->nvl_priv' and given the type of the 'packed' pointer the compiler could assume proper alignment for the replacement of bzero() with an 8-byte wide store to be valid. The problem is with the programmer. The programmer knew that the address did not have the alignment guarantees needed for a regular assignment, but failed to inform the compiler of that fact. In fact, the programmer told the compiler the opposite: alignment is guaranteed.
The fix is to avoid using a pointer of type "nvlist_t *" and instead use a "char *" pointer as the basis for calculating the address. This tells the compiler that only 1-byte alignment can be assumed and the compiler will either keep the bzero() call or instead replace it with a sequence of byte-wise stores. Both are valid.
Approved by: re (kib)
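The idea behind the fix can be sketched in portable C as below. This is not the nvpair code: `zero_packed_field` is a hypothetical helper, and it uses `unsigned char *` with memset() where the entry speaks of `char *` and bzero(); the point is only that the address is derived from a byte pointer, so the compiler may assume nothing stronger than 1-byte alignment.

```c
#include <stddef.h>
#include <string.h>

/*
 * Zero `len` bytes at byte offset `off` inside a packed buffer. The
 * address comes from a byte pointer, not from a pointer to a wider type,
 * so no alignment beyond 1 byte is promised to the compiler.
 */
void
zero_packed_field(void *packed, size_t off, size_t len)
{
    unsigned char *p = (unsigned char *)packed + off;

    memset(p, 0, len);
}
```

Whether the compiler keeps the call or expands it inline, it has to pick stores that are valid for an arbitrarily aligned address.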
|
194586 |
21-Jun-2009 |
kib |
Add another flags argument to vn_open_cred. Use it to specify that some vn_open_cred invocations shall not audit namei path.
In particular, specify VN_OPEN_NOAUDIT for dotdot lookup performed by default implementation of vop_vptocnp, and for the open done for core file. vn_fullpath is called from the audit code, and vn_open there needs to disable audit to avoid infinite recursion. Core file is created on return to user mode, which, in particular, happens during syscall return. The creation of the core file is audited by direct calls, and we do not want to overwrite audit information for syscall.
Reported, reviewed and tested by: rwatson
|
194453 |
18-Jun-2009 |
jhb |
Bootstrap mergeinfo for the OpenSolaris contrib bits.
|
194300 |
16-Jun-2009 |
jhb |
Remove confusing mergeinfo caused by renaming files.
|
194118 |
13-Jun-2009 |
jamie |
Rename the host-related prison fields to be the same as the host.* parameters they represent, and the variables they replaced, instead of abbreviated versions of them.
Approved by: bz (mentor)
|
194043 |
11-Jun-2009 |
kmacy |
pjd has requested that I keep the tunable as zfs_prefetch_disable to minimize gratuitous differences with OpenSolaris' ZFS
Sorry for the churn
|
193980 |
11-Jun-2009 |
kmacy |
check against prefetch_enable
|
193953 |
10-Jun-2009 |
kmacy |
use default policy for enabling prefetching unless the TUNABLE is set
|
193878 |
10-Jun-2009 |
kmacy |
As far as I can tell, systems that have less than 4GB are more often hurt by prefetch than helped. On i386 systems and systems with less than 4GB, prefetch is now disabled by default. I've added a prefetch enable tunable, to enable prefetching for those systems. The prefetch disable tunable will continue to unconditionally disable prefetching.
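The default policy described in this entry (r193878 only; the entries above it revise the tunable handling) can be written as a small decision function. The function name and parameters are hypothetical, and the sketch reads "4GB" as 4 GiB of physical memory.

```c
#include <stdbool.h>
#include <stdint.h>

#define FOUR_GB (4ULL * 1024 * 1024 * 1024)

/*
 * The disable tunable always wins, the enable tunable overrides the
 * default, and the default is off on i386 or with less than 4GB.
 */
bool
prefetch_enabled(bool is_i386, uint64_t physmem_bytes, bool enable_tunable,
    bool disable_tunable)
{
    if (disable_tunable)
        return (false);
    if (enable_tunable)
        return (true);
    return (!is_i386 && physmem_bytes >= FOUR_GB);
}
```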
|
193440 |
04-Jun-2009 |
ps |
Support shared vnode locks for write operations when the offset is provided on filesystems that support it. This really improves mysql + innodb performance on ZFS.
Reviewed by: jhb, kmacy, jeffr
|
193163 |
31-May-2009 |
dfr |
Allow the bootfs property to be set for raidz pools on FreeBSD.
Reviewed by: pjd
|
193128 |
30-May-2009 |
kmacy |
- fix xdrmem_control to be safe in an if statement
- fix zfs to depend on krpc
- remove xdr from zfs makefile
Submitted by: dchagin@freebsd.org
|
193110 |
30-May-2009 |
kmacy |
work around snapshot shutdown race reported by Henri Hennebert
|
192971 |
28-May-2009 |
kmacy |
MFdevbranch 192944
- add FreeBSD implementation of xdrmem_control needed by zfs
- have zfs define xdr_ops using FreeBSD's definition
- remove solaris xdr files from zfs compile
|
192853 |
26-May-2009 |
sson |
Add the OpenSolaris dtrace lockstat provider. The lockstat provider adds probes for mutexes, reader/writer and shared/exclusive locks to gather contention statistics and other locking information for dtrace scripts, the lockstat(1M) command and other potential consumers.
Reviewed by: attilio jhb jb Approved by: gnn (mentor)
|
192800 |
26-May-2009 |
trasz |
MFp4 changes necessary for NFSv4 ACLs support in ZFS. This is mostly about removing a few #ifdefs and providing compatibility wrappers and VOP implementations to get and set an ACL; ZFS does ACL enforcement all by itself.
Note that the VOPs are ifdefed out for now, so this change should be a no-op.
Reviewed by: pjd
|
192689 |
24-May-2009 |
trasz |
Fix comment.
|
192360 |
19-May-2009 |
kmacy |
- back out direct map hack - it is no longer needed
|
192240 |
17-May-2009 |
kmacy |
set createtxg prop name
PR: bin/130105
|
192237 |
17-May-2009 |
kmacy |
SAVESTART implies SAVENAME
|
192211 |
16-May-2009 |
kmacy |
- allow forced unmounts - don't assume snapshot was auto-mounted
|
192209 |
16-May-2009 |
kmacy |
only use direct map if system has more than 2GB
|
192207 |
16-May-2009 |
kmacy |
apply band-aid to x86_64 systems with more physical memory than kmem by allocating from the direct map
|
191990 |
11-May-2009 |
attilio |
Remove the thread argument from the FSD (File-System Dependent) parts of the VFS. The VFS_* functions and related parts no longer take the context, since it always refers to curthread.
In some points, in particular when dealing with VOPs and functions living in the same namespace (eg. vflush) which still need to be converted, pass curthread explicitly in order to retain the old behaviour. Such loose ends will be fixed ASAP.
While here fix a bug: now, UFS_EXTATTR can be compiled alone without the UFS_EXTATTR_AUTOSTART option.
The VFS KPI is heavily changed by this commit, so third-party modules need to be recompiled. Bump __FreeBSD_version in order to signal this.
|
191984 |
11-May-2009 |
kmacy |
rename xdr support files to avoid conflicts when linking in to the kernel
|
191931 |
09-May-2009 |
kmacy |
- rename atomic.S and crc32.c to avoid collisions when linking zfs in to the kernel - update Makefile - ifdef out acl_{alloc, free}, they aren't used by zfs and conflict with existing in-kernel routines
|
191907 |
07-May-2009 |
kmacy |
don't call vn_rele_async_fini in the !_KERNEL case
|
191905 |
07-May-2009 |
kmacy |
move VN_RELE_ASYNC to the compatibility layer with the rest of the VN_* defines
|
191903 |
07-May-2009 |
kmacy |
avoid LOR and gratuitous extra lock acquisitions by moving user_evict list buffers to a temporary list
|
191902 |
07-May-2009 |
kmacy |
Allow the VM to provide backpressure on the ARC cache as it does on Solaris.
|
191900 |
07-May-2009 |
kmacy |
Asynchronously release vnodes to avoid blocking on range locks when calling back in to zfs. This is based on a fix that went in to opensolaris on March 9th. However, it uses a dedicated thread instead of a Solaris taskq to avoid doing a blocking memory allocation with the vnode interlock held.
This fixes a long-time deadlock in ZFS. This is not, strictly speaking, an LOR. The spa_zio thread releases a vnode, which calls in to vn_reclaim, which in turn needs to acquire range locks to sync dirty data out to disk. The range locks are already held by a user-level process that is waiting on a condition variable for a spa_zio thread to signal it. The process could not be signalled because the spa_zio thread could not proceed.
The nature of this problem was not apparent due to ZFS locks opting out of witness which meant that DDB did not know about the locks that were held by ZFS.
Reviewed by: pjd MFC after: 7 days
|
190888 |
10-Apr-2009 |
rwatson |
Remove VOP_LEASE and supporting functions. This hasn't been used since the removal of NQNFS, but was left in in case it was required for NFSv4. Since our new NFSv4 client and server can't use it for their requirements, GC the old mechanism, as well as other unused lease-related code and interfaces.
Due to its impact on kernel programming and binary interfaces, this change should not be MFC'd.
Proposed by: jeff Reviewed by: jeff Discussed with: rmacklem, zach loafman @ isilon
|
190878 |
10-Apr-2009 |
thompsa |
Revert r190676,190677
The geom and CAM changes for root_hold are the wrong solution for USB design quirks.
Requested by: scottl
|
190676 |
03-Apr-2009 |
thompsa |
Add a how argument to root_mount_hold() so it can be passed NOWAIT and be called in situations where sleeping isn't allowed.
|
189967 |
18-Mar-2009 |
jhb |
The zfs_get_xattrdir() function is used to find the extended attribute directory for a znode. When the directory already exists, it returns a referenced but unlocked vnode. When a directory does not yet exist, it calls zfs_make_xattrdir() to create a new one. zfs_make_xattrdir() returns the vnode both referenced and locked, and zfs_get_xattrdir() was leaking this vnode lock to its callers. Fix this by dropping the vnode lock if zfs_make_xattrdir() successfully creates a new extended attribute directory.
Reviewed by: pjd
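The lock leak fixed here can be illustrated with a toy model. The sketch below is not the kernel code: `struct toy_vnode` and the `toy_*` functions are hypothetical stand-ins that only track a reference count and a lock flag, to show that both paths should hand back a referenced but unlocked vnode.

```c
#include <stdbool.h>

/* Toy stand-in for a vnode: just a reference count and a lock flag. */
struct toy_vnode {
	int	refs;
	bool	locked;
};

/* Models zfs_make_xattrdir(): the new vnode comes back referenced AND locked. */
static void
toy_make_xattrdir(struct toy_vnode *vp)
{
	vp->refs = 1;
	vp->locked = true;
}

/*
 * Models the fixed zfs_get_xattrdir(): whichever path is taken, the
 * caller gets a referenced but unlocked vnode.
 */
static struct toy_vnode *
toy_get_xattrdir(struct toy_vnode *xattrdir, bool exists)
{
	if (exists) {
		xattrdir->refs++;		/* reference only, no lock */
		return (xattrdir);
	}
	toy_make_xattrdir(xattrdir);
	xattrdir->locked = false;		/* the fix: drop the leaked lock */
	return (xattrdir);
}
```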
|
189696 |
11-Mar-2009 |
jhb |
Add a new internal mount flag (MNTK_EXTENDED_SHARED) to indicate that a filesystem supports additional operations using shared vnode locks. Currently this is used to enable shared locks for open() and close() of read-only file descriptors.
- When an ISOPEN namei() request is performed with LOCKSHARED, use a shared vnode lock for the leaf vnode only if the mount point has the extended shared flag set.
- Set LOCKSHARED in vn_open_cred() for requests that specify O_RDONLY but not O_CREAT.
- Use a shared vnode lock around VOP_CLOSE() if the file was opened with O_RDONLY and the mountpoint has the extended shared flag set.
- Adjust md(4) to upgrade the vnode lock on the vnode it gets back from vn_open() since it now may only have a shared vnode lock.
- Don't enable shared vnode locks on FIFO vnodes in ZFS and UFS since FIFO's require exclusive vnode locks for their open() and close() routines. (My recent MPSAFE patches for UDF and cd9660 already included this change.)
- Enable extended shared operations on UFS, cd9660, and UDF.
Submitted by: ups Reviewed by: pjd (ZFS bits) MFC after: 1 month
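The conditions listed in this entry for taking a shared lock on open() can be condensed into one predicate. This is a simplified sketch, not the kernel logic: `open_may_lock_shared` is a hypothetical name, `mnt_extended_shared` stands in for the MNTK_EXTENDED_SHARED mount flag, and the FIFO case (handled per vnode in the real change) is folded into a boolean argument.

```c
#include <fcntl.h>
#include <stdbool.h>

/*
 * Hypothetical predicate: may open() take a shared vnode lock?
 * A shared lock is only considered for O_RDONLY opens without O_CREAT,
 * on mounts with the extended shared flag, and never for FIFOs.
 */
static bool
open_may_lock_shared(int oflags, bool mnt_extended_shared, bool is_fifo)
{
	if (!mnt_extended_shared || is_fifo)
		return (false);
	if ((oflags & O_ACCMODE) != O_RDONLY)
		return (false);
	if (oflags & O_CREAT)
		return (false);
	return (true);
}
```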
|
188588 |
13-Feb-2009 |
jhb |
Use shared vnode locks when invoking VOP_READDIR().
MFC after: 1 month
|
187830 |
28-Jan-2009 |
ed |
Last step of splitting up minor and unit numbers: remove minor().
Inside the kernel, the minor() function was responsible for obtaining the device minor number of a character device. Because we made device numbers dynamically allocated and independent of the unit number passed to make_dev() a long time ago, it was actually a misnomer. If you really want to obtain the device number, you should use dev2udev().
We already converted all the drivers to use dev2unit() to obtain the device unit number, which is still used by a lot of drivers. I've noticed not a single driver passes NULL to dev2unit(). Even if they would, its behaviour would make little sense. This is why I've removed the NULL check.
This commit removes minor(), minor2unit() and unit2minor() from the kernel. Because there was a naming collision with uminor(), we can rename umajor() and uminor() back to major() and minor(). This means that the makedev(3) manual page also applies to kernel space code now.
I suspect umajor() and uminor() aren't used that often in external code, but to make it easier for other parties to port their code, I've increased __FreeBSD_version to 800062.
|
185614 |
04-Dec-2008 |
imp |
Put the MIPS support back in after it was removed in r185029.
|
185321 |
25-Nov-2008 |
trasz |
MFp4: We don't support TX_CREATE_ACL_ATTR nor TX_MKDIR_ACL_ATTR; code found in zfs_replay.c will panic if it encounters transactions of this type. Make sure we don't put these into the ZIL.
Approved by: rwatson (mentor), pjd
|
185319 |
25-Nov-2008 |
pjd |
Fix locking (file descriptor table and Giant around VFS).
Most submitted by: kib Reviewed by: kib
|
185174 |
22-Nov-2008 |
pjd |
IFp4: Don't rely on disk IDs and always use vdev guids, which means always looking up components by reading metadata. This might be slower when there is a large number of disks in the system, but is definitely more reliable.
|
185172 |
22-Nov-2008 |
pjd |
IFp4: Finish implementation of chflags(2) for ZFS. While doing this I found that zfs_access() can only handle VREAD, VWRITE and VEXEC; for the rest we need to use vaccess(9).
|
185171 |
22-Nov-2008 |
pjd |
IFp4: Don't free pathname too soon, debugging code is still using it.
|
185029 |
17-Nov-2008 |
pjd |
Update ZFS from version 6 to 13 and bring some FreeBSD-specific changes.
This brings a huge number of changes; I'll enumerate only the user-visible ones:
- Delegated Administration
Allows regular users to perform ZFS operations, like file system creation, snapshot creation, etc.
- L2ARC
Level 2 cache for ZFS - allows using additional disks for cache. Huge performance improvements, mostly for random reads of mostly static content.
- slog
Allows using additional disks for the ZFS Intent Log to speed up operations like fsync(2).
- vfs.zfs.super_owner
Allows regular users to perform privileged operations on files stored on ZFS file systems owned by them. Be very careful with this one.
- chflags(2)
Not all the flags are supported. This still needs work.
- ZFSBoot
Support for booting off of a ZFS pool. Not finished, AFAIK.
Submitted by: dfr
- Snapshot properties
- New failure modes
Previously, if a write request failed, the system panicked. Now one can select one of three failure modes:
panic - panic on write error
wait - wait for disk to reappear
continue - serve read requests if possible, block write requests
- Refquota, refreservation properties
Just quota and reservation properties, but don't count space consumed by children file systems, clones and snapshots.
- Sparse volumes
ZVOLs that don't reserve space in the pool.
- External attributes
Compatible with extattr(2).
- NFSv4-ACLs
Not sure about the status, might not be complete yet.
Submitted by: trasz
- Creation-time properties
- Regression tests for zpool(8) command.
Obtained from: OpenSolaris
|
184770 |
08-Nov-2008 |
trasz |
Require write access on a directory being moved from one parent directory to another in ZFS.
Approved by: rwatson (mentor), pjd
|
184740 |
06-Nov-2008 |
trasz |
Back out the last patch. It was overly restrictive - we want to check for write permission on target only when moving the target between two directories.
Approved by: rwatson (mentor)
|
184737 |
06-Nov-2008 |
trasz |
Change ZFS behaviour to match UFS: when moving (rename(2)) a subdirectory from one parent directory to another, in addition to the usual access checks one also needs write access to the subdirectory being moved.
Approved by: rwatson (mentor), pjd
|
184698 |
05-Nov-2008 |
rodrigc |
Merge latest DTrace changes from Perforce.
|
184413 |
28-Oct-2008 |
trasz |
Introduce accmode_t. This is required for NFSv4 ACLs - it will be necessary to add more V* constants, and the variables changed by this patch were often being assigned to mode_t variables, which is 16 bit.
Approved by: rwatson (mentor)
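The motivation for a wider type can be shown with a short sketch. The types and the flag below are hypothetical stand-ins (`toy_mode_t` for a 16-bit mode_t, `toy_accmode_t` for accmode_t, `TOY_VEXTRA` for some future V* constant); the point is only that a flag above bit 15 is silently lost when stored in a 16-bit variable.

```c
#include <stdint.h>

typedef uint16_t toy_mode_t;	/* 16 bits wide, like mode_t in the message */
typedef int	 toy_accmode_t;	/* wider type, standing in for accmode_t */

/* Hypothetical access flag that needs more than 16 bits. */
#define TOY_VEXTRA	0x00010000

/* Returns nonzero if the flag survives a round trip through the 16-bit type. */
static int
flag_survives_mode_t(toy_accmode_t flag)
{
	toy_mode_t narrowed = (toy_mode_t)flag;	/* high bits are lost here */

	return ((toy_accmode_t)narrowed == flag);
}
```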
|
183754 |
10-Oct-2008 |
attilio |
Remove the useless struct thread argument from the bufobj interface. In particular, the KPI of the following functions is modified: - bufobj_invalbuf() - bufsync()
and the BO_SYNC() "virtual method" of the buffer objects set. The main consumers of the bufobj functions are affected by this change too; in particular, the functions whose KPI changed are: - vinvalbuf() - g_vfs_close()
Due to the KPI breakage, __FreeBSD_version will be bumped in a later commit.
As a side note, please consider the 'curthread' argument passed to VOP_SYNC() (in bufsync()) as temporary; it will be axed ASAP.
Reviewed by: kib Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>
|
183417 |
27-Sep-2008 |
jb |
Disable use of the user credentials until there is code to set the levels that DTrace uses.
This fixes a bug that would have affected kernels built with MAC and all kernels built after the mpsafetty integration.
The bug will be apparent in RELENG7 on MAC kernels.
Reported by: kan
|
183397 |
27-Sep-2008 |
ed |
Replace all calls to minor() with dev2unit().
After I removed all the unit2minor()/minor2unit() calls from the kernel yesterday, I realised calling minor() everywhere is quite confusing. Character devices now only have the ability to store a unit number, not a minor number. Remove the confusion by using dev2unit() everywhere.
This commit could also be considered as a bug fix. A lot of drivers call minor(), while they should actually be calling dev2unit(). In -CURRENT this isn't a problem, but it turns out we never had any problem reports related to that issue in the past. I suspect not many people connect more than 256 pieces of the same hardware.
Reviewed by: kib
|
183037 |
15-Sep-2008 |
pjd |
Add missing ZFS_EXIT().
PR: kern/124899 Submitted by: Masakazu Asama <m-asama@ginzado.ne.jp>
|
182905 |
10-Sep-2008 |
trasz |
Remove VSVTX, VSGID and VSUID. This should be a no-op, as VSVTX == S_ISVTX, VSGID == S_ISGID and VSUID == S_ISUID.
Approved by: rwatson (mentor)
|
182840 |
07-Sep-2008 |
pjd |
Initialize vp, so we don't call VOP_UNLOCK() with NULL vnode pointer.
Confirmed by: marcus
|
182824 |
06-Sep-2008 |
pjd |
Lock vnode exclusively around insmntque().
|
182781 |
05-Sep-2008 |
pjd |
Catch up after last insmntque() changes: - The vnode has to be locked exclusively before calling insmntque(). - Until I find a way to handle insmntque() failures use VV_FORCEINSMQ flag to force insmntque() to always succeed.
Reported by: kris, trasz, des, others Suggested by: kib Tested by: trasz
|
182371 |
28-Aug-2008 |
attilio |
Decontextualize the couplet VOP_GETATTR / VOP_SETATTR, as the passed thread was always curthread and totally useless.
Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>
|
182031 |
23-Aug-2008 |
imp |
Add MIPS support.
Reviewed by: jb@
|
181879 |
19-Aug-2008 |
jb |
Add calls to callout_drain() to ensure the callouts are flushed before we free memory from underneath them.
This fixes an occasional panic I've been seeing in softclock() where a bad pointer would be encountered when pushing DTrace hard.
|
180660 |
21-Jul-2008 |
pjd |
We want to use LBOLT instead of lbolt on FreeBSD. I've already fixed this in p4, but the fix was never integrated into HEAD.
Reported by: ed
|
179758 |
12-Jun-2008 |
ed |
Remove the $FreeBSD$ tag again, now I know fbsd:nokeywords exists.
Requested by: pjd Approved by: philip (mentor)
|
179757 |
12-Jun-2008 |
ed |
Turn dev2unit(), minor(), unit2minor() and minor2unit() into macros.
Now that we got rid of the minor-to-unit conversion and the constraints on device minor numbers, we can convert the functions that operate on minor and unit numbers to simple macros. The unit2minor() and minor2unit() macros are now no-ops.
The ZFS code also defined a macro named `minor'. Change the ZFS code to use umajor() and uminor() here, as it is the correct approach to do this. Also add $FreeBSD$ to keep SVN happy.
Approved by: philip (mentor), pjd
|
179726 |
11-Jun-2008 |
ed |
Don't enforce unique device minor number policy anymore.
Except for the case where we use the cloner library (clone_create() and friends), there is no reason to enforce a unique device minor number policy. There are various drivers in the source tree that allocate unr pools and such to provide minor numbers, without using them themselves.
Because we still need to support unique device minor numbers for the cloner library, introduce a new flag called D_NEEDMINOR. All cdevsw's that are used in combination with the cloner library should be marked with this flag to make the cloning work.
This means drivers can now freely use si_drv0 to store their own flags and state, making it effectively the same as si_drv1 and si_drv2. We still keep the minor() and dev2unit() routines around to make drivers happy.
The NTFS code also used the minor number in its hash table. We should not do this anymore. If the si_drv0 field would be changed, it would no longer end up in the same list.
Approved by: philip (mentor)
|
179469 |
01-Jun-2008 |
jb |
Merge a recent change from the OpenSolaris source tree. (Don't ask for a vendor import of this yet, we're in the early days of svn)
Instead of using cyclic timers to call the state clean and deadman callbacks, use a callout on FreeBSD to avoid the deadlock on FreeBSD due to trying to send interprocessor interrupts with interrupts disabled.
Reported by: ps, jhb, peter, thompsa
|
179310 |
25-May-2008 |
pjd |
Fix namespace collision after src/sys/sys/file.h:1.78.
|
179307 |
25-May-2008 |
jb |
Comment out the code that breaks with invariants. This is stuff that is still WIP along with the lockstat provider, so there is no harm leaving it out for now.
|
179280 |
24-May-2008 |
jb |
Make the zfs module depend on the opensolaris module in preparation for it to share stuff with the DTrace modules.
|
179264 |
23-May-2008 |
jb |
Delete a couple of OpenSolaris headers which get in the way of our implementation.
|
179198 |
22-May-2008 |
jb |
FreeBSD changes to vendor source.
|
179194 |
22-May-2008 |
jb |
This commit was generated by cvs2svn to compensate for changes in r179193, which included commits to RCS files with non-trunk default branches.
|
178243 |
16-Apr-2008 |
kib |
Move the head of byte-level advisory lock list from the filesystem-specific vnode data to the struct vnode. Provide the default implementation for the vop_advlock and vop_advlockasync. Purge the locks on the vnode reclaim by using the lf_purgelocks(). The default implementation is augmented for the nfs and smbfs. In the nfs_advlock, push the Giant inside the nfs_dolock.
Before the change, the vop_advlock and vop_advlockasync have taken the unlocked vnode and dereferenced the fs-private inode data, racing with the vnode reclamation due to forced unmount. Now, the vop_getattr under the shared vnode lock is used to obtain the inode size, and later, in the lf_advlockasync, after locking the vnode interlock, the VI_DOOMED flag is checked to prevent an operation on the doomed vnode.
The implementation of the lf_purgelocks() is submitted by dfr.
Reported by: kris Tested by: kris, pho Discussed with: jeff, dfr MFC after: 2 weeks
|
178129 |
11-Apr-2008 |
marius |
Add atomic operations for ZFS/sparc64.
Approved by: core, pjd Obtained from: OpenSolaris (w/ adaptations) MFC after: 2 weeks
|
178127 |
11-Apr-2008 |
marius |
- Fix the path encoded in the multiple inclusion protection. - GCC uses 32-byte function alignment for UltraSPARC CPUs. - Remove code duplication.
Approved by: core, pjd MFC after: 2 weeks
|
177633 |
26-Mar-2008 |
dfr |
Add the new kernel-mode NFS Lock Manager. To use it instead of the user-mode lock manager, build a kernel with the NFSLOCKD option and add '-k' to 'rpc_lockd_flags' in rc.conf.
Highlights include:
* Thread-safe kernel RPC client - many threads can use the same RPC client handle safely with replies being de-multiplexed at the socket upcall (typically driven directly by the NIC interrupt) and handed off to whichever thread matches the reply. For UDP sockets, many RPC clients can share the same socket. This allows the use of a single privileged UDP port number to talk to an arbitrary number of remote hosts.
* Single-threaded kernel RPC server. Adding support for multi-threaded server would be relatively straightforward and would follow approximately the Solaris KPI. A single thread should be sufficient for the NLM since it should rarely block in normal operation.
* Kernel mode NLM server supporting cancel requests and granted callbacks. I've tested the NLM server reasonably extensively - it passes both my own tests and the NFS Connectathon locking tests running on Solaris, Mac OS X and Ubuntu Linux.
* Userland NLM client supported. While the NLM server doesn't have support for the local NFS client's locking needs, it does have to field async replies and granted callbacks from remote NLMs that the local client has contacted. We relay these replies to the userland rpc.lockd over a local domain RPC socket.
* Robust deadlock detection for the local lock manager. In particular it will detect deadlocks caused by a lock request that covers more than one blocking request. As required by the NLM protocol, all deadlock detection happens synchronously - a user is guaranteed that if a lock request isn't rejected immediately, the lock will eventually be granted. The old system allowed for a 'deferred deadlock' condition where a blocked lock request could wake up and find that some other deadlock-causing lock owner had beaten them to the lock.
* Since both local and remote locks are managed by the same kernel locking code, local and remote processes can safely use file locks for mutual exclusion. Local processes have no fairness advantage compared to remote processes when contending to lock a region that has just been unlocked - the local lock manager enforces a strict first-come first-served model for both local and remote lockers.
Sponsored by: Isilon Systems PR: 95247 107555 115524 116679 MFC after: 2 weeks
|
177253 |
16-Mar-2008 |
rwatson |
In keeping with style(9)'s recommendations on macros, use a ';' after each SYSINIT() macro invocation. This makes a number of lightweight C parsers much happier with the FreeBSD kernel source, including cflow's prcc and lxr.
MFC after: 1 month Discussed with: imp, rink
|
177230 |
15-Mar-2008 |
pjd |
Fix mmap(2) on ZFS after some changes in VM subsystem.
Submitted by: alc Reported by: kris (originally) and many others Tested with: fsx MFC after: 1 week
|
176559 |
25-Feb-2008 |
attilio |
Axe the 'thread' argument from VOP_ISLOCKED() and lockstatus() as it is always curthread.
As KPI gets broken by this patch, manpages and __FreeBSD_version will be updated by further commits.
Tested by: Andrea Barberio <insomniac at slackware dot it>
|
176519 |
24-Feb-2008 |
attilio |
Introduce some functions in the vnode locks namespace and in the ffs namespace in order to handle lockmgr fields in a controlled way instead of spreading bogus stubs all around: - VN_LOCK_AREC() allows lock recursion for a specified vnode - VN_LOCK_ASHARE() allows lock sharing for a specified vnode
In FFS land: - BUF_AREC() allows lock recursion for a specified buffer lock - BUF_NOREC() disallows recursion for a specified buffer lock
Side note: union_subr.c::unionfs_node_update() is the only other function directly handling lockmgr fields. As this is not simple to fix, it has been left behind as "sole" exception.
|
175633 |
24-Jan-2008 |
pjd |
- Reduce how much ZFS caches by default. This is another change to mitigate 'kmem_map too small panics'. - Print two warnings if there is not enough memory and not enough address space. - Improve comment.
|
175294 |
13-Jan-2008 |
attilio |
VOP_LOCK1() (and so VOP_LOCK()) and VOP_UNLOCK() are only used in conjunction with a 'thread' argument which is always curthread. Remove the useless extra argument and pass curthread explicitly to lower layer functions when necessary.
The KPI is broken by this change, which should affect several ports, so the version bump and manpage update will be committed separately.
Tested by: kris, pho, Diego Sardina <siarodx at gmail dot com>
|
175202 |
10-Jan-2008 |
attilio |
vn_lock() is currently only used with 'curthread' passed as argument. Remove this argument and pass curthread directly to the underlying VOP_LOCK1() VFS method. This modification makes the code cleaner and in particular removes an annoying dependency, helping the next lockmgr() cleanup. The KPI, obviously, changed.
Manpage and FreeBSD_version will be updated through further commits.
As a side note, it is worth saying that the next commits will address a similar cleanup of VFS methods, in particular vop_lock1 and vop_unlock.
Tested by: Diego Sardina <siarodx at gmail dot com>, Andrea Di Pasquale <whyx dot it at gmail dot com>
|
174049 |
28-Nov-2007 |
jb |
* Check endianness the FreeBSD way.
* Use LBOLT rather than lbolt to avoid a clash with a FreeBSD global variable.
|
174048 |
28-Nov-2007 |
jb |
Fix a prototype definition.
|
174047 |
28-Nov-2007 |
jb |
Check endianness the FreeBSD way.
|
174046 |
28-Nov-2007 |
jb |
Include an extra header to get this to compile cleanly.
|
173419 |
07-Nov-2007 |
pjd |
Warn if kmem_map size is set to less than 512MB. Previous warning was a bit pointless, because default is set to something around 300MB and also insufficient.
MFC after: 3 days
|
173374 |
05-Nov-2007 |
pjd |
Remove unused header.
MFC after: 3 days
|
173373 |
05-Nov-2007 |
pjd |
If setting a state to anything but open state, close access to vdev. This fixes replacing drive in place, eg. zpool replace tank da1 da1. Before it complained that device is already open.
MFC after: 1 week
|
173268 |
02-Nov-2007 |
lulf |
- Add sysctl for sizeof(znode_t), which will be used by fstat(1).
Approved by: pjd (mentor)
|
173250 |
01-Nov-2007 |
pjd |
Call zil_commit() (if ZIL is not disabled) after every non-read request (BIO_WRITE and BIO_FLUSH) as it is done in Solaris. The difference is that Solaris calls it only for sync requests, but we can't tell in GEOM whether the request is sync or async, so we do it for every request.
MFC after: 1 week
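The rule described in this entry reduces to a simple predicate. The following is a minimal sketch under stated assumptions: the `toy_bio_cmd` enum and `needs_zil_commit` are hypothetical names that mirror the request types named in the message, not the GEOM or ZFS definitions.

```c
#include <stdbool.h>

/* Toy request types mirroring the commands named in the message. */
enum toy_bio_cmd { TOY_BIO_READ, TOY_BIO_WRITE, TOY_BIO_FLUSH };

/*
 * Hypothetical predicate: commit the ZIL after every non-read request,
 * unless the ZIL is disabled. Sync and async requests are not told
 * apart, so every write and flush qualifies.
 */
static bool
needs_zil_commit(enum toy_bio_cmd cmd, bool zil_disabled)
{
	if (zil_disabled)
		return (false);
	return (cmd != TOY_BIO_READ);
}
```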
|
172836 |
20-Oct-2007 |
julian |
Rename the kthread_xxx (e.g. kthread_create()) calls to kproc_xxx as they actually make whole processes. This makes way for us to add REAL kthread_create() and friends that actually make threads. It turns out that most of these calls actually end up being moved back to the thread version when it's added, but we need to make this cosmetic change first.
I'd LOVE to do this rename in 7.0 so that we can eventually MFC the new kthread_xxx() calls.
|
172645 |
14-Oct-2007 |
thompsa |
ZFS_LOG adds a newline by itself.
Pointed out by: pjd
|
172624 |
14-Oct-2007 |
thompsa |
Print the ZFS ereport to the console if vfs.zfs.debug is set to help diagnose problems with zfs-on-root since devd isn't running yet.
Reviewed by: pjd
|
172443 |
04-Oct-2007 |
pjd |
Fix lock leak leading to the 'System call <name> returning with 1 locks held' panic.
Reported by: kris Approved by: re (kensmith)
|
172301 |
23-Sep-2007 |
pjd |
Now that we have CDDLed code in the tree, add CDDL license.
Discussed with: core Approved by: re (kensmith)
|
172135 |
10-Sep-2007 |
pjd |
Reduce the limit of vnodes on i386 when ZFS is loaded to 3/4 of the original value, so we don't run out of KVA. The default vnodes limit fits better for UFS, but ZFS allocates more file system specific memory for a vnode than UFS.
Don't touch vnodes limit if we detect it was tuned by system administrator and restore original value when ZFS is unloaded.
This isn't final fix, but before we implement something better, this will help to stabilize ZFS under heavy load on i386.
Approved by: re (bmah)
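The load/unload behaviour described above can be sketched as follows. This is a hedged illustration, not the committed code: the struct and function names are hypothetical, and detecting administrator tuning by comparing the current limit against a system default is an assumption, since the message does not say how tuning is detected.

```c
#include <stdbool.h>

/* Hypothetical bookkeeping for the adjustment made at module load. */
struct vnode_limit_state {
	long	original;	/* limit seen at load time */
	bool	reduced;	/* true if we lowered it ourselves */
};

/* On load: lower the limit to 3/4 unless the administrator tuned it. */
static long
vnode_limit_on_load(struct vnode_limit_state *st, long current,
    long system_default)
{
	st->original = current;
	st->reduced = (current == system_default);	/* assumed detection */
	return (st->reduced ? current / 4 * 3 : current);
}

/* On unload: restore the original value only if we changed it. */
static long
vnode_limit_on_unload(const struct vnode_limit_state *st, long current)
{
	return (st->reduced ? st->original : current);
}
```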
|
172130 |
10-Sep-2007 |
pjd |
After dfr@ vnode leak fix, we can allow ARC to consume more memory.
Tested by: kris Approved by: re (bmah)
|
172030 |
01-Sep-2007 |
pjd |
Use CTLFLAG_RDTUN for tunable sysctls.
Approved by: re (bmah)
|
171567 |
24-Jul-2007 |
pjd |
Update assertion after revision 1.23.
Reviewed by: dfr Approved by: re (rwatson)
|
171316 |
09-Jul-2007 |
dfr |
Correct a reference-counting mistake in the ZFS code which led to abnormal memory usage and pessimal cache performance.
Reviewed by: pjd Approved by: re (rwatson)
|
171063 |
27-Jun-2007 |
dfr |
In zfs_vget, if we fail to translate an inode number to the corresponding vnode, make sure we return an error code to the caller.
Reviewed by: pjd Approved by: re
|
170437 |
08-Jun-2007 |
marcel |
Add my copyright.
Requested by: pjd@
|
170431 |
08-Jun-2007 |
pjd |
- Reduce number of atomic operations needed to be implemented in asm by implementing some of them using existing ones. - Allow to compile ZFS on all archs and use atomic operations surrounded by global mutex on archs we don't have or can't have all atomic operations needed by ZFS.
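Both ideas in this entry, deriving operations from existing ones and falling back to a global mutex, can be illustrated in userland C. This is a rough sketch, not the kernel implementation: the `toy_atomic_*` names are hypothetical and a pthread mutex stands in for the kernel mutex.

```c
#include <pthread.h>
#include <stdint.h>

/* One global mutex serialises every emulated atomic operation. */
static pthread_mutex_t atomic_mtx = PTHREAD_MUTEX_INITIALIZER;

/* Base operation: add delta and return the new value. */
static uint64_t
toy_atomic_add_64_nv(volatile uint64_t *target, int64_t delta)
{
	uint64_t newval;

	pthread_mutex_lock(&atomic_mtx);
	newval = (*target += (uint64_t)delta);
	pthread_mutex_unlock(&atomic_mtx);
	return (newval);
}

/* Derived operations built on the base one instead of new assembly. */
static void
toy_atomic_add_64(volatile uint64_t *target, int64_t delta)
{
	(void)toy_atomic_add_64_nv(target, delta);
}

static void
toy_atomic_inc_64(volatile uint64_t *target)
{
	toy_atomic_add_64(target, 1);
}
```

The trade-off is the usual one: a single global lock is portable to any architecture but serialises all callers, so it is a fallback rather than a replacement for native atomics.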
|
170430 |
08-Jun-2007 |
pjd |
Missing atomic operations for ZFS/ia64.
Submitted by: marcel
|
170281 |
04-Jun-2007 |
pjd |
Reimplement traverse() helper function: 1. Pass locking flags to VFS_ROOT(). 2. Check v_mountedhere while the vnode is locked. 3. Always return locked vnode on success.
Change 1 fixes problem reported by Stephen M. Rumble - after zfs_vfsops.c,1.9 change, zfs_root() no longer locks the vnode unconditionally and traverse() didn't pass right lock type to VFS_ROOT(). The result was that kernel panicked when .zfs/ directory was accessed via NFS.
|
170044 |
28-May-2007 |
pjd |
Adjust va_mask for setattr. FreeBSD doesn't have va_mask, so we initialize it based on individual fields being set. This doesn't work for setattr replay, because va_type is set there, so we add AT_TYPE flag to va_mask, which won't be accepted by zfs_setattr().
Reported by: kris
|
170040 |
28-May-2007 |
pjd |
Because we allocate componentname structures on stack, bzero() them before use just in case.
|
169929 |
24-May-2007 |
pjd |
Initialize ZFS a bit earlier and block root mounting until initialization is complete. This fixes some root-on-ZFS configurations.
Reported by: Bruno Damour <freebsd.ruomad@free.fr> Tested by: Bruno Damour <freebsd.ruomad@free.fr>
|
169920 |
23-May-2007 |
pjd |
FreeBSD's namecache works quite well with ZFS, so remove DNLC.
|
169919 |
23-May-2007 |
pjd |
All objects we create using GFS are directories, so initialize d_type properly, but add XXX comment saying that it can eventually change in the future.
|
169884 |
22-May-2007 |
pjd |
Lock vnode on lookup. This fixes ZIL replay for rmdir/unlink/rename.
Reported by: des
|
169430 |
09-May-2007 |
pjd |
Increase debug level - this message is not that important.
|
169325 |
06-May-2007 |
pjd |
- Add missing lock destruction and remove duplicate initializations. With this change it is possible to unload zfs.ko module from WITNESS-enabled kernel. - Remove bogus comment.
|
169303 |
06-May-2007 |
pjd |
Use provider's ident to handle situations when disks are moved around and show up with different names: first try to open provider using remembered name and compare its ident, if equal, this is our provider, if not equal or there is no provider with such name, find provider with remembered ident and don't care about the name.
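The two-step lookup described in this entry can be sketched as below. This is an illustrative model only: `struct toy_provider` and `toy_find_provider` are hypothetical and do not use the GEOM API; they just show the order of the checks.

```c
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a provider: a device name plus a stable ident. */
struct toy_provider {
	const char *name;
	const char *ident;
};

static const struct toy_provider *
toy_find_provider(const struct toy_provider *provs, size_t n,
    const char *name, const char *ident)
{
	size_t i;

	/* Step 1: try the remembered name and verify its ident. */
	for (i = 0; i < n; i++) {
		if (strcmp(provs[i].name, name) == 0 &&
		    strcmp(provs[i].ident, ident) == 0)
			return (&provs[i]);
	}
	/* Step 2: fall back to any provider with the remembered ident. */
	for (i = 0; i < n; i++) {
		if (strcmp(provs[i].ident, ident) == 0)
			return (&provs[i]);
	}
	return (NULL);
}
```

For example, if a disk remembered as da0 now shows up as da1, the first pass fails on the ident comparison and the second pass finds it under its new name.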
|
169302 |
06-May-2007 |
pjd |
MFp4: We don't need to cover vnode_pager_setsize() with the z_map_lock.
|
169199 |
02-May-2007 |
pjd |
Share-lock a vnode where possible.
|
169198 |
02-May-2007 |
pjd |
When parent directory has to be unlocked, lock it back with the same lock type. Before this change, if directory was shared-locked, it was relocked exclusively.
|
169197 |
02-May-2007 |
pjd |
Lock vnode using cn_lkflags in case the caller wants the vnode to be shared-locked.
|
169196 |
02-May-2007 |
pjd |
The getnewvnode() function sets LK_NOSHARE by default, so if we want to support shared vnodes locking, we need to remove that flag. Also add LK_CANRECURSE flag as found in nfsclient.
|
169195 |
02-May-2007 |
pjd |
ZFS should update timestamps upon the creat() of an existing file.
Obtained from: OpenSolaris Bug: http://bugs.opensolaris.org/view_bug.do?bug_id=6465105
|
169194 |
02-May-2007 |
pjd |
- Lock vnode with flags passed in as argument in zfs_vget() and zfs_root().
Pointed out by: ups Also reported by: kris
- Add comments where I'm not sure if LK_RETRY should be used.
|
169172 |
01-May-2007 |
pjd |
MFp4: Remove LK_RETRY flag when locking vnode in zfs_lookup, we don't want dead vnodes here.
Suggested by: kib
|
169170 |
01-May-2007 |
pjd |
White space fixes.
|
169167 |
01-May-2007 |
pjd |
Add a comment explaining why we call dmu_write() unconditionally, even if uiomove() fails, especially that it is different from what OpenSolaris does (I'm not entirely sure they are right).
Suggested by: darrenr
|
169108 |
29-Apr-2007 |
pjd |
- Define d_type for ".", ".." and ".zfs" directories. - Add a TODO comment where d_type is still not defined.
|
169107 |
29-Apr-2007 |
pjd |
Oops, correct important typo in last commit.
|
169106 |
29-Apr-2007 |
pjd |
Avoid freeing NULL pointer in case of an error.
|
169087 |
29-Apr-2007 |
pjd |
Fix two use-after-free cases.
|
169059 |
26-Apr-2007 |
pjd |
MFp4: Optimize mappedwrite() and mappedread() functions to write/read as much non-mapped data as possible at once and not page-by-page. With this change we combine I/Os and also save many VM_OBJECT_UNLOCK()/VM_OBJECT_LOCK() operations.
Simple 'fsx -l 33554432 -o 524288 -N 10000 /tank/fsx' test shows ~23% performance increase.
|
169057 |
26-Apr-2007 |
pjd |
- Always try to write one whole page at a time.
- vm_page_undirty() is enough (instead of vm_page_set_validclean()), but it has to be called before we write the data, in case someone makes the page dirty after our write but before our vm_page_undirty() call.
- Always call dmu_write(), no matter whether uiomove() succeeded, because it could have partially succeeded and we would lose some changes.
All good ideas from: ups
|
169056 |
26-Apr-2007 |
pjd |
MFV: Free znodes immediately, allowing the ARC to hold onto less memory.
Full description at: http://bugs.opensolaris.org/view_bug.do?bug_id=6543706
|
169055 |
26-Apr-2007 |
pjd |
MFV: Functions name change.
|
169028 |
24-Apr-2007 |
pjd |
ZIL (ZFS Intent Log) can be safely turned on and off at run time, because it is only used when a dataset is being mounted to decide if the log should also be opened.
|
169027 |
24-Apr-2007 |
pjd |
MFp4: Now that ZFS can use FreeBSD's namecache, turn it off by default and turn off DNLC, but don't remove DNLC yet just in case.
|
169025 |
24-Apr-2007 |
pjd |
MFp4: Rearrange the code so vobject is destroyed from the reclaim() method, as in all other file systems on FreeBSD (instead of from the inactive() method).
A nice side effect of this change, besides speeding up the file system when mmapped files are often opened and closed, is that it makes FreeBSD's namecache work:)
|
169024 |
24-Apr-2007 |
pjd |
MFp4: Once page is written successfully, we should clear the dirty bits. This fixes slow operations on mmaped files, because without this fix, pages were written to disk multiple times.
Anyone looking for an even greater speedup of such operations can disable ZIL (by setting vfs.zfs.zil_disable to 1 in /boot/loader.conf). Disabling ZIL makes fsx run ~9 times faster.
|
169023 |
24-Apr-2007 |
pjd |
MFp4: Reduce diff against vendor.
|
169022 |
24-Apr-2007 |
pjd |
MFp4: We have stronger 'lock already initialized' check now, so we can reduce diff against the vendor by removing bzero of this mutex.
|
168987 |
23-Apr-2007 |
bmah |
Mostly-cosmetic fixes in low-memory warning messages:
o Fix linewrap issues.
o Fix two typos (s/Recomended/Recommended/ and s/tunning/tuning/)
o Remove a couple of extra instances of the word "of".
o Update names of kmem_size variables.
Approved by: pjd
|
168978 |
23-Apr-2007 |
pjd |
Too much diff reduction. 'cmd' has to be u_long.
Reported by: delphij
|
168962 |
23-Apr-2007 |
pjd |
MFp4: Reduce diff against vendor code:
- Move FreeBSD-specific code to zfs_freebsd_*() functions in zfs_vnops.c and keep original functions as similar to vendor's code as possible.
- Add various includes back, now that we have them.
|
168959 |
22-Apr-2007 |
pjd |
Fix 'zpool status -v'. To get object number we should use ZFS_DIRENT_OBJ() macro, as za_first_integer field also contains type. This should be fixed in ZFS itself, but this bug is not visible on Solaris, because there, type is not stored in za_first_integer. On the other hand it will be visible on MacOS X.
Reported by: Barry Pederson <bp@barryp.org>
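To illustrate why the raw za_first_integer value cannot be used as an object number, here is a minimal sketch of a packed directory entry. The bit layout (type in the top 4 bits, object number in the low 48 bits) and all names below are assumptions made for the example, not something this commit states.

```c
#include <stdint.h>

/*
 * Illustrative layout only (an assumption for this sketch):
 * bits 60..63 hold the entry type, bits 0..47 hold the object number.
 */
#define	DIRENT_TYPE_SHIFT	60
#define	DIRENT_OBJ_MASK		((UINT64_C(1) << 48) - 1)

/* Pack a type and an object number into one 64-bit directory entry value. */
uint64_t
dirent_pack(uint64_t type, uint64_t obj)
{
	return (((type & 0xF) << DIRENT_TYPE_SHIFT) | (obj & DIRENT_OBJ_MASK));
}

/* Extract only the object number, masking off the type bits. */
uint64_t
dirent_obj(uint64_t de)
{
	return (de & DIRENT_OBJ_MASK);
}

/* Extract only the type. */
uint64_t
dirent_type(uint64_t de)
{
	return (de >> DIRENT_TYPE_SHIFT);
}
```

When the type bits are nonzero, the packed value differs from the object number, so printing it directly would report the wrong object.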
|
168958 |
22-Apr-2007 |
pjd |
Fix st_rdev handling (implement it, actually).
Reported by: gj
|
168926 |
21-Apr-2007 |
pjd |
MFp4:
@118370 Correct typo.
@118371 Integrate changes from vendor.
@118491 Show backtrace on unexpected code paths.
@118494 Integrate changes from vendor.
@118504 Fix sendfile(2). I had two ways of fixing it:
1. Fixing sendfile(2) itself to use VOP_GETPAGES() instead of hacking around with vn_rdwr(UIO_NOCOPY), which was suggested by ups.
2. Modify ZFS behaviour to handle this special case.
Although 1 is more correct, I've chosen 2, because the hack from 1 has a side effect of being faster: it reads ahead MAXBSIZE bytes instead of reading page by page. This is not easy to implement with VOP_GETPAGES(), at least not for me at this very moment.
Reported by: Andrey V. Elsukov <bu7cher@yandex.ru>
@118525 Reorganize the code to reduce diff.
@118526 This code path is expected. It is simply taken when a file is opened with the O_FSYNC flag.
Reported by: kris Reported by: Michal Suszko <dry@dry.pl>
|
168839 |
18-Apr-2007 |
pjd |
MFp4: We check for PRIV_VFS_MOUNT already in mount(2) syscall and we don't want to do the check when snapshot is automatically mounted by an unprivileged user doing lookup on a snapshot directory.
|
168826 |
17-Apr-2007 |
pjd |
Simplify.
|
168821 |
17-Apr-2007 |
pjd |
Ignore hostid check for root-on-ZFS configurations. Making hostid available before the root is mounted is tricky and having it in /boot/ is not really desirable.
Reported by: Zephiris <zephiris@gmail.com>
|
168775 |
16-Apr-2007 |
pjd |
Uncomment forgotten check. Without this check in place, ZFS will panic on unload instead of returning EBUSY. This check tells if there are mounted ZFS file systems or not. We can't unload if there are mounted file systems.
Reported by: Andrey V. Elsukov <bu7cher@yandex.ru>
|
168753 |
15-Apr-2007 |
pjd |
MFp4: Start DNLC after desiredvnodes variable is initialized. Before this change if zfs.ko was loaded by the loader, DNLC was automatically disabled.
Reported by: Zephiris <zephiris@gmail.com>
|
168738 |
14-Apr-2007 |
pjd |
Fix RAID-Z resilvering.
Obtained from: OpenSolaris
|
168724 |
14-Apr-2007 |
pjd |
MFp4: Hmm, it seems to work now.
|
168715 |
14-Apr-2007 |
pjd |
MFp4: Use max_ncpus, which is used in other places in the code.
|
168714 |
14-Apr-2007 |
pjd |
MFp4: Add more debug, so we can see if zpool.cache was loaded or why it wasn't loaded.
|
168713 |
14-Apr-2007 |
pjd |
MFp4: Allow tuning vfs.zfs.debug from loader.conf.
|
168712 |
14-Apr-2007 |
pjd |
MFp4:
- Allow tuning the number of spa_zio_* threads.
- Reduce default number of spa_zio_* threads to N*spa_zio_issue plus N*spa_zio_intr threads per ZIO type, where N is the number of CPUs.
- Put ZIO type number in thread's name.
|
168696 |
13-Apr-2007 |
pjd |
Fix overflow, which was causing endless loops when a 32-bit machine had more than 2GB of RAM. This was because our physmem is long and 'physmem*PAGESIZE' can be negative for more than 2GB of memory.
Reported by: Andrey V. Elsukov <bu7cher@yandex.ru>
It is not yet tested by Andrey, so there can be other problems, but this was definitely a bug, so I'm committing a fix now.
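The arithmetic behind the overflow can be sketched as below. The function names are hypothetical, and widening to 64 bits is just one common way to avoid the problem; the log does not say that the committed fix does exactly this. The 32-bit result is computed through unsigned arithmetic so that the example itself does not rely on undefined signed overflow.

```c
#include <stdint.h>

/*
 * Value that a 32-bit signed 'pages * pagesize' ends up holding on a
 * two's-complement machine, computed without signed overflow.
 */
int32_t
product_as_int32(uint32_t pages, uint32_t pagesize)
{
	uint32_t wrapped = pages * pagesize;	/* wraps modulo 2^32 */

	if (wrapped > (uint32_t)INT32_MAX)
		return ((int32_t)((int64_t)wrapped - INT64_C(4294967296)));
	return ((int32_t)wrapped);
}

/* Widen before multiplying so the byte count cannot go negative. */
uint64_t
product_as_uint64(uint32_t pages, uint32_t pagesize)
{
	return ((uint64_t)pages * pagesize);
}
```

For 3GB of RAM with 4096-byte pages (786432 pages), the 32-bit signed result is negative while the widened result is the expected 3221225472 bytes.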
|
168683 |
13-Apr-2007 |
pjd |
Fix vnodes starvation caused by DNLC (ZFS name cache):
- Tune number of namecache entries better (based on desiredvnodes).
- Handle vfs_lowvnodes event by releasing requested number of name cache entries, but no less than 5%.
Reported by: simokawa
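The "requested number, but no less than 5%" rule could look roughly like this. The function name is hypothetical, and the sketch assumes the 5% floor is taken from the current number of cached entries, which the log does not spell out.

```c
/*
 * How many name cache entries to release on a low-vnodes event.
 * Assumption: the 5% floor is relative to the current cache size.
 */
unsigned long
entries_to_release(unsigned long cached, unsigned long requested)
{
	unsigned long floor = cached / 20;	/* 5% of current entries */
	unsigned long n = requested > floor ? requested : floor;

	/* Never try to release more entries than the cache holds. */
	return (n > cached ? cached : n);
}
```

A small request against a large cache is raised to the 5% floor, and a request larger than the cache is capped at the cache size.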
|
168676 |
12-Apr-2007 |
pjd |
MFp4: Synchronize with vendor (mostly 'zfs rename -r').
|
168675 |
12-Apr-2007 |
pjd |
MFp4: Bring back comments.
Requested by: jhb
|
168583 |
10-Apr-2007 |
pjd |
MFp4: Allow setting zfs_recover via vfs.zfs.recover from /boot/loader.conf.
|
168582 |
10-Apr-2007 |
pjd |
MFp4: Hide under '#ifdef _KERNEL' only what's really needed.
|
168566 |
10-Apr-2007 |
pjd |
Try to stabilize ZFS with regard to memory consumption:
- Allow shrinking ARC down to 16MB (instead of 64MB).
- Set arc_max to 1/2 of kmem_map by default.
- Start freeing things earlier when a low memory situation is detected.
- Serialize execution of arc_lowmem().
I decided to set the minimum ZFS memory requirements to 512MB of RAM and 256MB of kmem_map size. If there is less RAM or kmem_map, a warning will be printed. World is cruel, be no better. In other words: modern file system requires modern hardware:)
From ZFS administration guide:
"Currently the minimum amount of memory recommended to install a Solaris system is 512 Mbytes. However, for good ZFS performance, at least one Gbyte or more of memory is recommended."
|
168565 |
10-Apr-2007 |
pjd |
Reduce diff against vendor - we have now stronger check for "mutex already initialized", so we can go back to kmem_alloc().
|
168559 |
09-Apr-2007 |
pjd |
Remove unused #define.
|
168511 |
09-Apr-2007 |
pjd |
We don't have to wait for the root file system to be mounted anymore, now that kobj KPI supports operating on files loaded by the loader.
|
168510 |
09-Apr-2007 |
pjd |
Drop the Giant lock, which is held when mounting the root file system, before calling zfs_domount().
|
168509 |
08-Apr-2007 |
pjd |
Move zpool.cache from /etc/zfs/ to /boot/zfs/, so we can keep it on dedicated /boot/ file system and use ZFS for the root file system.
|
168498 |
08-Apr-2007 |
pjd |
MFp4: Synchronize with recent OpenSolaris changes.
|
168494 |
08-Apr-2007 |
pjd |
- Use 'name=value' so it can be properly recognized by devd(8).
- Use only subclass as devd's type.
|
168488 |
08-Apr-2007 |
pjd |
Take vnode pointer and hold it under znode lock, so we won't race with zfs_reclaim(). This may or may not fix the problem reported by kris, but it's definitely better that way.
|
168482 |
07-Apr-2007 |
pjd |
Move atomic.S files to directories that better fit OpenSolaris directory layout.
|
168481 |
07-Apr-2007 |
pjd |
Fix libzpool compilation.
Reported by: des
|
168478 |
07-Apr-2007 |
pjd |
Limit the number of system taskq threads to the number of CPUs. They are only used when there is a need for reducing namecache.
Observed by: kris, csjp
|
168474 |
07-Apr-2007 |
des |
Fix some type mismatches.
Reviewed by: pjd@
|
168473 |
07-Apr-2007 |
pjd |
Allow tuning the maximum and minimum memory used by ARC.
|
168460 |
07-Apr-2007 |
pjd |
Add missing mutex_init(), the lack of which was causing an assertion panic on clone destruction.
Reported by: kris
|
168404 |
06-Apr-2007 |
pjd |
Please welcome ZFS - The last word in file systems.
The ZFS file system was ported from the OpenSolaris operating system. The code is under the CDDL license.
I'd like to thank all SUN developers that created this great piece of software.
Supported by: Wheel LTD (http://www.wheel.pl/) Supported by: The FreeBSD Foundation (http://www.freebsdfoundation.org/) Supported by: Sentex (http://www.sentex.net/)
|