History log of /freebsd-10-stable/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c
Revision Date Author Comments
# 315835 23-Mar-2017 avg

MFC r314913: MFV r314911: 7867 ARC space accounting leak


# 315073 11-Mar-2017 avg

MFC r314274: l2arc: fix write size calculation broken by Compressed ARC commit


# 314874 07-Mar-2017 jpaetzel

MFC 313879

MFV: 313876

7504 kmem_reap hangs spa_sync and administrative tasks

illumos/illumos-gate@405a5a0f5c3ab36cb76559467d1a62ba648bd809
https://github.com/illumos/illumos-gate/commit/405a5a0f5c3ab36cb76559467d1a62ba648bd80

https://www.illumos.org/issues/7504

We see long spa_sync(). We are waiting to hold dp_config_rwlock for writer. Some
other thread holds dp_config_rwlock for reader, then calls arc_get_data_buf(),
which finds that arc_is_overflowing()==B_TRUE. So it waits (while holding
dp_config_rwlock for reader) for arc_reclaim_thread to signal arc_reclaim_waiters_cv.
Before signaling, arc_reclaim_thread does arc_kmem_reap_now(), which takes ~seconds.

Author: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Approved by: Dan McDonald <danmcd@omniti.com>


# 314032 21-Feb-2017 avg

MFC r313687: remove l2_padding_needed statistic from zfs arc


# 308765 17-Nov-2016 avg

Revert r308753: some unrelated changes were included into the commit


# 308753 17-Nov-2016 avg

MFC r308040,308479: nap time between pats is forced to be at most half
of the timeout

Note that in this branch the default nap period is 1 second unlike the
head where the period is 10 seconds.


# 307298 14-Oct-2016 mav

MFC r305561: MFV r305560:
7278 tuning zfs_arc_max does not impact arc_c_min

When changing zfs_arc_max (e.g. as zdb does), it may be set to less
than the default arc_c_min. arc_c_min should decrease to not be more than
arc_c_max, but it doesn't; therefore tuning of arc_c_max is ineffective.
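A minimal sketch of the invariant the fix restores, assuming the tunables are plain globals. The function name `arc_set_max` is hypothetical and this is not the actual arc.c code: when a new maximum is applied, the minimum is lowered so that arc_c_min never exceeds arc_c_max.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the ARC limits named in the commit. */
static uint64_t arc_c_min;
static uint64_t arc_c_max;

/*
 * Apply a new zfs_arc_max value. If it is below the current minimum,
 * lower arc_c_min too, so that arc_c_min <= arc_c_max always holds and
 * the lower bound cannot silently override the tuning.
 */
static void
arc_set_max(uint64_t new_max)
{
	arc_c_max = new_max;
	if (arc_c_min > arc_c_max)
		arc_c_min = arc_c_max;
}
```

Raising the maximum afterwards leaves the (already lowered) minimum alone; only the ordering constraint is enforced.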

Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Paul Dagnelie <paul.dagnelie@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Igor Kozhukhov <ikozhukhov@gmail.com>
Author: Matthew Ahrens <mahrens@delphix.com>

openzfs/openzfs@608764beadaf4bb71c5d8fe1818e8392ac66a61b


# 307266 14-Oct-2016 mav

MFC r305323: MFV r302991: 6950 ARC should cache compressed data

illumos/illumos-gate@dcbf3bd6a1f1360fc1afcee9e22c6dcff7844bf2
https://github.com/illumos/illumos-gate/commit/dcbf3bd6a1f1360fc1afcee9e22c6dcff7844bf2

https://www.illumos.org/issues/6950
When reading compressed data from disk, the ARC should keep the compressed
block cached and only decompress it when consumers access the block. The
uncompressed data should be short-lived, allowing the ARC to cache a much larger
amount of data. The DMU would also maintain a smaller cache of uncompressed
blocks to minimize the impact of decompressing frequently accessed blocks.

Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Don Brady <don.brady@intel.com>
Reviewed by: Richard Elling <Richard.Elling@RichardElling.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: George Wilson <george.wilson@delphix.com>


# 304139 15-Aug-2016 avg

MFC r302838: 6513 partially filled holes lose birth time


# 302729 13-Jul-2016 avg

MFC r301873: l2arc: reset b_tmp_cdata to NULL in the case of unset b_daddr


# 302714 13-Jul-2016 smh

MFC r302265, r302382

Allow ZFS ARC min / max to be tuned at runtime

Relnotes: YES
Sponsored by: Multiplay


# 301695 08-Jun-2016 ngie

MFC r300870,r300884:

r300870:

Unbreak the zfs(4) build

vm/vm_pageout.h grew a dependency on the bool typedef in r300865

arc.c didn't include sys/types.h, which included the definition for the typedef

Other items (ofed, drm2) might need to be chased for this commit.

Pointyhat to: alc

r300884:

Fix up r300870

The sys/types.h fix I proposed was only tested with zfs(4), not with
libzpool, which is where the build failure actually existed

Remove vm/vm_pageout.h from arc.c and zfs_vnops.c because they're both
unneeded

In collaboration with: kib


# 300039 17-May-2016 avg

MFC r297848: l2arc: make sure that all writes honor ashift of a cache device

Note: no MFC to stable/9 because it has become quite out of date with head,
so the merge would be quite laborious and, thus, risky.


# 297116 20-Mar-2016 mav

MFC r296530: MFV r296529:
6672 arc_reclaim_thread() should use gethrtime() instead of ddi_get_lbolt()
6673 want a macro to convert seconds to nanoseconds and vice-versa

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Josef 'Jeff' Sipek <jeffpc@josefsipek.net>
Reviewed by: Robert Mustacchi <rm@joyent.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Eli Rosenthal <eli.rosenthal@delphix.com>

illumos/illumos-gate@a8f6344fa0921599e1f4511e41b5f9a25c38c0f9


# 297099 20-Mar-2016 mav

MFC r294809: MFV r294808:
6421 Add missing multilist_destroy calls to arc_fini

Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Jorgen Lundman <lundman@lundman.net>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Prakash Surya <prakash.surya@delphix.com>

illumos/illumos-gate@57deb2328260c447bf1db25fe74e0eece102733e


# 297077 20-Mar-2016 mav

MFC r277300 (by smh): Mechanically convert cddl sun #ifdef's to illumos

Since the upstream for cddl code is now illumos not sun, mechanically
convert all sun #ifdef's to illumos #ifdef's which have been used in all
newer code for some time.

Also do a manual pass to correct the use of #ifdef comments as per style(9),
as well as a few uses of #if defined(__FreeBSD__) vs #ifndef illumos.


# 290766 13-Nov-2015 mav

MFC r290191 (by avg):
l2arc: do not call trim_map_free() for blocks with zero b_asize

b_asize can be zero if the block is compressed into an empty block
(ZIO_COMPRESS_EMPTY) and the trim code asserts that meaningless
zero-sized trimming is not attempted.
The logic for calling trim_map_free() is extracted into a new function
l2arc_trim() to minimize code duplication.
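The guard can be sketched as follows. Both functions here are hypothetical models: `trim_map_free_stub` stands in for trim_map_free() and only reproduces its "no zero-sized trim" assertion, and `l2arc_trim_sketch` shows the early return for blocks whose b_asize is zero.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for trim_map_free(); records what it was asked to do. */
static uint64_t trimmed_bytes;
static unsigned trim_calls;

static void
trim_map_free_stub(uint64_t offset, uint64_t size)
{
	(void)offset;
	assert(size > 0);	/* models the trim code's assertion */
	trimmed_bytes += size;
	trim_calls++;
}

/*
 * Sketch of the extracted helper: a block compressed to
 * ZIO_COMPRESS_EMPTY occupies no space on the cache device,
 * so there is nothing to trim.
 */
static void
l2arc_trim_sketch(uint64_t daddr, uint64_t asize)
{
	if (asize == 0)
		return;
	trim_map_free_stub(daddr, asize);
}
```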

PR: 203473
Reported by: Willem Jan Withagen <wjw@digiware.nl>
Tested by: Willem Jan Withagen <wjw@digiware.nl>


# 290757 13-Nov-2015 mav

MFC r289422:
4185 add new cryptographic checksums to ZFS: SHA-512, Skein, Edon-R

Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
Reviewed by: Richard Lowe <richlowe@richlowe.net>
Approved by: Garrett D'Amore <garrett@damore.org>
Author: Matthew Ahrens <mahrens@delphix.com>

illumos/illumos-gate@45818ee124adeaaf947698996b4f4c722afc6d1f

This is only a partial merge of respective ZFS infrastructure changes.
At this moment the FreeBSD kernel does not have those crypto algorithms, so
the parts of the code that enable them are commented out. When they are
implemented, it will be trivial to plug them in.


# 290752 13-Nov-2015 mav

MFC r289305: 6293 ztest failure: error == 28 (0xc == 0x1c) in ztest_tx_assign()

Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Richard Elling <Richard.Elling@RichardElling.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Matthew Ahrens <mahrens@delphix.com>

illumos/illumos-gate@8fe00bfb8790ad51653f67b01d5ac14256cbb404


# 290749 13-Nov-2015 mav

MFC r289295: 5219 l2arc_write_buffers() may write beyond target_sz

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Saso Kiselkov <skiselkov@gmail.com>
Reviewed by: George Wilson <george@delphix.com>
Reviewed by: Steven Hartland <steven.hartland@multiplay.co.uk>
Reviewed by: Justin Gibbs <gibbs@FreeBSD.org>
Approved by: Matthew Ahrens <mahrens@delphix.com>
Author: Andriy Gapon <avg@freebsd.org>

illumos/illumos-gate@d7d9a6d919f92d74ea0510a53f8441396048e800


# 288599 03-Oct-2015 mav

MFC r288064 (by avg): 6220 memleak in l2arc on debug build

illumos/illumos-gate/commit/c546f36aa898d913ff77674fb5ff97f15b2e08b4
https://www.illumos.org/issues/6220
5408 introduced a memleak in l2arc, namely the member b_thawed gets leaked
when an arc_hdr is realloced from full to l2only.

Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
Reviewed by: Simon Klinkert <simon.klinkert@gmail.com>
Reviewed by: George Wilson <george@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Arne Jansen <sensille@gmx.net>


# 288596 03-Oct-2015 mav

MFC r287706 (by delphij):
6214 zpools going south

In r286570 (MFV of r277426) an unprotected write to b_flags to
set the compression mode was introduced. This would open a race
window where data is partially decompressed, modified, checksummed
and written to the pool, resulting in pool corruption due to the
partial decompression.

Prevent this by reintroducing b_compress

illumos/illumos-gate@d4cd038c92c36fd0ae35945831a8fc2975b5272c

Illumos issues:

6214 zpools going south
https://www.illumos.org/issues/6214


# 288594 03-Oct-2015 mav

MFC r287702: 5987 zfs prefetch code needs work

Rewrite the ZFS prefetch code to detect only forward, sequential
streams.

The following kstats have been added:

kstat.zfs.misc.arcstats.sync_wait_for_async

How many sync reads have waited for an async read
to complete. (less is better)

kstat.zfs.misc.arcstats.demand_hit_predictive_prefetch

How many demand reads did not have to wait for I/O
because of predictive prefetch. (more is better)

zfetch kstats have been simplified to hits, misses, and max_streams,
with max_streams representing times when we were not able to create
a new stream because we already had the maximum number of sequences
for a file.

The sysctl variable/loader tunable vfs.zfs.zfetch.block_cap has been
replaced by vfs.zfs.zfetch.max_distance, which controls the maximum number
of bytes to prefetch per stream.

illumos/illumos-gate@cf6106c8a0d6598b045811f9650d66e07eb332af

Illumos ZFS issues:

5987 zfs prefetch code needs work
https://www.illumos.org/issues/5987


# 288592 03-Oct-2015 mav

MFC r287283 (by delphij):
Fix a buffer overrun which may lead to data corruption, introduced in
r286951 by reinstating changes in r274628.

In l2arc_compress_buf(), we allocate a buffer of l2hdr->b_asize bytes,
'cdata', to stash away the compressed data.

We then ask zio_compress_data() to compress the buffer, b_l1hdr.b_tmp_cdata,
which is of l2hdr->b_asize bytes, and have the compressed size (or original
size, if compress didn't gain enough) stored in csize.

To pad the buffer to fit the optimal write size, we round up the compressed
size to L2 device's vdev_ashift.

Illumos code rounds up the size by at most SPA_MINBLOCKSIZE. Because we
know csize <= b_asize, and b_asize is integer multiple of SPA_MINBLOCKSIZE,
we are guaranteed that the rounded up csize would be <= b_asize. However,
this is not necessarily true when we round up to 1 << vdev_ashift, because
it could be larger than SPA_MINBLOCKSIZE.

So, in the worst case scenario, we are overwriting at most

((1 << vdev_ashift) - SPA_MINBLOCKSIZE)

bytes of memory next to the compressed data buffer.

Andriy's original change in r274628 reorganized the code a little bit
by moving the padding to after we had determined that compression was
beneficial, at which point we would check the rounded size against the
allocated buffer size, so the buffer overrun would not be possible.
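The arithmetic above can be checked with a small sketch. The helper names are hypothetical and the real l2arc_compress_buf() differs; `pad_overrun` computes how far padding a compressed size up to 1 << ashift would write past a b_asize-byte buffer, and `compression_beneficial` models the r274628 ordering, where the padded size is compared against the allocation before the compressed copy is used.

```c
#include <stdint.h>

#define SPA_MINBLOCKSIZE	(1ULL << 9)	/* 512 bytes, as in ZFS */

/* Round x up to a multiple of align; align must be a power of two. */
static uint64_t
roundup_pow2(uint64_t x, uint64_t align)
{
	return ((x + align - 1) & ~(align - 1));
}

/*
 * Bytes written past the end of an asize-byte buffer when a compressed
 * size csize (csize <= asize) is padded to 1 << ashift; 0 if it fits.
 */
static uint64_t
pad_overrun(uint64_t csize, uint64_t asize, unsigned ashift)
{
	uint64_t padded = roundup_pow2(csize, 1ULL << ashift);

	return (padded > asize ? padded - asize : 0);
}

/*
 * Pad first, then keep the compressed copy only if the padded size is
 * still smaller than the allocated buffer.
 */
static int
compression_beneficial(uint64_t csize, uint64_t asize, unsigned ashift)
{
	return (roundup_pow2(csize, 1ULL << ashift) < asize);
}
```

With a 512-byte b_asize and a 4 KiB cache device (ashift 12), padding 300 compressed bytes yields 4096 bytes, overrunning the buffer by 3584 bytes, which is exactly (1 << 12) - SPA_MINBLOCKSIZE.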


# 288588 03-Oct-2015 mav

MFC r286951: Restore part of r274628, reverted at r286776.


# 288587 03-Oct-2015 mav

MFC r286776: Remove some random accumulated diff from Illumos.


# 288586 03-Oct-2015 mav

MFC r286774: 2618 arc.c mistypes in the comments

Reviewed by: Jason King <jason.brian.king@gmail.com>
Reviewed by: Josef Sipek <jeffpc@josefsipek.net>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Bart Coddens <bart.coddens@gmail.com>

illumos/illumos-gate@fc98fea58e89224f6f13d7fae246d6cb5dfa35ea


# 288585 03-Oct-2015 mav

MFC r286770: Fix r286766 build with debug.


# 288584 03-Oct-2015 mav

MFC r286767: Fix minor mismerge sometime earlier.


# 288583 03-Oct-2015 mav

MFC r286766: 5817 change type of arcs_size from uint64_t to refcount_t

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Paul Dagnelie <paul.dagnelie@delphix.com>
Reviewed by: Adam Leventhal <ahl@delphix.com>
Reviewed by: Alex Reece <alex@delphix.com>
Reviewed by: Richard Elling <richard.elling@richardelling.com>
Approved by: Garrett D'Amore <garrett@damore.org>
Author: Prakash Surya <prakash.surya@delphix.com>

illumos/illumos-gate@2fd872a734cf486007a8dba532cec52bfb4d40e5

As a way to make it more difficult to introduce bugs into the ARC, and to
make it easier to diagnose issues when bugs do creep in, it would be
beneficial to change the type of the arc_state_t's arcs_size field to be
a refcount_t instead of a uint64_t. This would allow us to make stricter
checks when incrementing and decrementing the value with debugging enabled,
but still fallback to simple, fast atomic operations when debugging is
disabled.
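The idea of a counter that is checked in debug builds but is a plain atomic otherwise can be sketched as below. The type and function names and the `ZFS_DEBUG` switch are assumptions for illustration; the real refcount_t can also track individual holders when reference tracking is enabled, while this sketch only checks for underflow.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical simplified counter standing in for refcount_t. */
typedef struct {
	_Atomic uint64_t rc_count;
} sketch_refcount_t;

static uint64_t
sketch_refcount_add_many(sketch_refcount_t *rc, uint64_t n)
{
	return (atomic_fetch_add(&rc->rc_count, n) + n);
}

static uint64_t
sketch_refcount_remove_many(sketch_refcount_t *rc, uint64_t n)
{
	uint64_t old = atomic_fetch_sub(&rc->rc_count, n);

#ifdef ZFS_DEBUG
	/* Debug builds catch removing more than was ever added. */
	assert(old >= n);
#endif
	return (old - n);
}
```

In a non-debug build both operations compile down to a single atomic add or subtract, which is the "simple, fast" path the commit message refers to.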


# 288582 03-Oct-2015 mav

MFC r286764: 6033 arc_adjust() should search MFU lists for oldest buffer
when adjusting MFU size.

illumos/illumos-gate@31c46cf23cd1cf4d66390a983dc5072d7d299ba2

https://www.illumos.org/issues/6033
When we're looking for the list containing the oldest buffer, we never
actually look at the MFU lists, even when we try to evict from MFU.
This looks like a copy-paste error; the fix is here:

Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
Reviewed by: Xin Li <delphij@delphij.net>
Reviewed by: Prakash Surya <me@prakashsurya.com>
Approved by: Matthew Ahrens <mahrens@delphix.com>
Author: Alek Pinchuk <alek@nexenta.com>
Obtained from: illumos


# 288581 03-Oct-2015 mav

MFC r286763: 5497 lock contention on arcs_mtx

Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Richard Elling <richard.elling@richardelling.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Prakash Surya <prakash.surya@delphix.com>

illumos/illumos-gate@244781f10dcd82684fd8163c016540667842f203

This patch attempts to reduce lock contention on the current arc_state_t
mutexes. These mutexes are used liberally to protect a number of LRU
lists within the ARC (e.g. ARC_mru, ARC_mfu, etc). The granularity at
which these locks are acquired has been shown to greatly affect the
performance of highly concurrent, cached workloads.
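A rough sketch of the approach, assuming a fixed number of sublists: instead of one mutex per ARC state, each state's list is split into sublists with their own locks, and every buffer is mapped to a sublist by a hash (e.g. of the block's identity), so concurrent inserts usually take different locks. All names and the sublist count below are hypothetical; the actual multilist code is more involved.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_SUBLISTS	8	/* assumed; not the real default */

struct sublist {
	pthread_mutex_t	lock;
	uint64_t	count;	/* stands in for the actual list contents */
};

static struct sublist sublists[NUM_SUBLISTS];

static void
sublists_init(void)
{
	for (int i = 0; i < NUM_SUBLISTS; i++) {
		pthread_mutex_init(&sublists[i].lock, NULL);
		sublists[i].count = 0;
	}
}

/* Map a per-buffer hash to one sublist. */
static unsigned
sublist_index(uint64_t hash)
{
	return ((unsigned)(hash % NUM_SUBLISTS));
}

/* Only the chosen sublist's lock is taken, not a state-wide mutex. */
static void
sublist_insert(uint64_t hash)
{
	struct sublist *sl = &sublists[sublist_index(hash)];

	pthread_mutex_lock(&sl->lock);
	sl->count++;
	pthread_mutex_unlock(&sl->lock);
}
```

The trade-off is that finding the globally oldest buffer now means inspecting several sublists, which is what eviction code such as arc_adjust() has to account for.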


# 288580 03-Oct-2015 mav

MFC r286762: Revert part of r205231, introducing multiple ARC state locks.

This local implementation will be replaced by one from Illumos to reduce
code divergence and make further merges easier.


# 288566 03-Oct-2015 mav

MFC r286655: Fix set of sign extension bugs in r286625.


# 288565 03-Oct-2015 mav

MFC r286647: Fix assertion panic caused by combination of r286598 and TRIM.


# 288564 03-Oct-2015 mav

MFC r286628: Fix r286625 build on i386.


# 288563 03-Oct-2015 mav

MFC r286626: Fix minor mismerge in r286574.


# 288562 03-Oct-2015 mav

MFC r286625:
5376 arc_kmem_reap_now() should not result in clearing arc_no_grow

Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Steven Hartland <killing@multiplay.co.uk>
Reviewed by: Richard Elling <richard.elling@richardelling.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Matthew Ahrens <mahrens@delphix.com>

illumos/illumos-gate@2ec99e3e987d8aa273f1e9ba2b983557d058198c


# 288561 03-Oct-2015 mav

MFC r286623: Remove an extra lock that, IMO, only creates potential problems now.


# 288557 03-Oct-2015 mav

MFC r286598: 5701 zpool list reports incorrect "alloc" value for cache devices


# 288550 03-Oct-2015 mav

MFC r286576: Fix r286570 build with debug.


# 288548 03-Oct-2015 mav

MFC r286574: 5445 Add more visibility via arcstats; specifically
arc_state_t stats and differentiate between "data" and "metadata"

Reviewed by: Basil Crow <basil.crow@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Bayard Bell <bayard.bell@nexenta.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Prakash Surya <prakash.surya@delphix.com>

illumos/illumos-gate@4076b1bf41cfd9f968a33ed54a7ae76d9e996fe8


# 288547 03-Oct-2015 mav

MFC r286570: 5408 managing ZFS cache devices requires lots of RAM

Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Don Brady <dev.fs.zfs@gmail.com>
Reviewed by: Josef 'Jeff' Sipek <josef.sipek@nexenta.com>
Approved by: Garrett D'Amore <garrett@damore.org>
Author: Chris Williamson <Chris.Williamson@delphix.com>

illumos/illumos-gate@89c86e32293a30cdd7af530c38b2073fee01411c

Currently, every buffer cached in the L2ARC is accompanied by a 240-byte
header in memory, leading to very high memory consumption when using very
large cache devices. These changes significantly reduce this overhead.

Currently:

L1-only header = 176 bytes
L1 + L2 or L2-only header = 176 bytes + 32 byte checksum + 32 byte l2hdr
= 240 bytes

Memory-optimized:

L1-only header = 176 bytes
L1 + L2 header = 176 bytes + 32 byte checksum = 208 bytes
L2-only header = 96 bytes + 32 byte checksum = 128 bytes

So overall:

          Trunk   Optimized
        +-------+-----------+
L1-only | 176 B |   176 B   | (same)
        +-------+-----------+
L1 & L2 | 240 B |   208 B   | (saved 32 bytes)
        +-------+-----------+
L2-only | 240 B |   128 B   | (saved 112 bytes)
        +-------+-----------+

For an average blocksize of 8KB, this means that for the L2ARC, the ratio
of metadata to data has gone down from about 2.92% to 1.56%. For a
'storage optimized' EC2 instance with 1600GB of SSD and 60GB of RAM, this
means that we expect a completely full L2ARC to use (1600 GB * 0.0156) /
60GB = 41% of the available memory, down from 78%.
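The percentages above can be reproduced with a short calculation. The helper names are illustrative only; the inputs are the header sizes from the table (240 B before, 128 B for an L2-only header after), an 8 KB average block, 1600 GB of L2ARC and 60 GB of RAM.

```c
/* Header bytes per cached block as a percentage of the block size. */
static double
hdr_overhead_pct(double hdr_bytes, double block_bytes)
{
	return (100.0 * hdr_bytes / block_bytes);
}

/* Share of RAM consumed by headers when the L2ARC is completely full. */
static double
ram_share_pct(double l2arc_gb, double hdr_bytes, double block_bytes,
    double ram_gb)
{
	return (100.0 * l2arc_gb * (hdr_bytes / block_bytes) / ram_gb);
}
```

For example, ram_share_pct(1600, 128, 8192, 60) evaluates to about 41.7, and ram_share_pct(1600, 240, 8192, 60) to 78.125, matching the 41% and 78% quoted above.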

Relnotes: yes


# 288536 03-Oct-2015 mav

MFC r281109: Add DTrace probe to the new ARC reclaim cause added in r281026.


# 288519 02-Oct-2015 mav

MFC r277826 (by delphij):
Diff reduction with upstream. The actual change was merged in r272483
already.


# 288518 02-Oct-2015 mav

MFC r277452 (by will): Fix arc__shrink DTrace probe's to_free argument.

Remove the unnecessary #ifdef _KERNEL, which did not differ in the true or
false cases. Actually set the value of to_free before using it.


# 288517 02-Oct-2015 mav

MFC r275780 (by delphij):
Add a loader tunable, vfs.zfs.arc_meta_min, which controls how much metadata
ZFS should keep in ARC at minimum.

In arc_evict(), when doing recycle, take more factors into account by
applying the following policy:

1. If no evictable data, evict metadata;
2. If no evictable metadata, evict data;
3. If we hit arc_meta_limit, evict metadata;
4. If we haven't hit arc_meta_min, evict data;
5* (Illumos only, not present in new FreeBSD code, yet) evict the oldest
cached element from data and metadata.
(FreeBSD) evict the data type specified by caller, which is the
existing behavior.

Note that because of our split locks (implemented in r205231 to improve
scalability by reducing lock contention), implementing the fifth Illumos
behavior will not be cheap, so for now just implement rules 1-4 and fall
back to the current behavior for 5.
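Rules 1-4 and the FreeBSD fallback for rule 5 can be expressed as a single decision function. This is a hypothetical sketch: the parameters are simplified stand-ins for the ARC's evictable-size counters, arc_meta_used, arc_meta_limit and arc_meta_min, and the actual arc_evict() code is organized differently.

```c
#include <stdint.h>

enum evict_type { EVICT_DATA, EVICT_METADATA };

static enum evict_type
choose_evict_type(uint64_t evictable_data, uint64_t evictable_meta,
    uint64_t meta_used, uint64_t meta_limit, uint64_t meta_min,
    enum evict_type requested)
{
	if (evictable_data == 0)	/* 1. no evictable data */
		return (EVICT_METADATA);
	if (evictable_meta == 0)	/* 2. no evictable metadata */
		return (EVICT_DATA);
	if (meta_used >= meta_limit)	/* 3. hit arc_meta_limit */
		return (EVICT_METADATA);
	if (meta_used < meta_min)	/* 4. below arc_meta_min */
		return (EVICT_DATA);
	return (requested);		/* 5. FreeBSD: caller's choice */
}
```

The order of the checks matters: the "nothing evictable" rules come first, so the metadata limits are only consulted when both kinds of buffers are available.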

Illumos issue:
5368 ARC should cache more metadata


# 287665 11-Sep-2015 avg

MFC r287099: account for ashift when gathering buffers to be written to l2arc device

The change differs from that in head because of other changes that have not
been MFC-ed yet.


# 287656 11-Sep-2015 avg

MFC r284513: l2arc: pass correct size to trim requests


# 285717 20-Jul-2015 jpaetzel

MFC 278040:

Prevent inlining txg_quiesce

This allows DTrace to monitor the calls to txg_quiesce, which can be
really helpful.

Also standardize __noinline order for arc_kmem_reap_now.

Sponsored by: Multiplay

Approved by: re


# 282361 03-May-2015 mav

MFC r281026, r281108, r281109:
Make ZFS ARC track both KVA usage and fragmentation.

Even on Illumos, with its much larger KVA, ZFS ARC steps back if KVA usage
reaches a certain threshold (3/4 on i386 or 16/17 otherwise). FreeBSD has
even less KVA, but had no such limit on architectures with a direct map,
such as amd64. As a result, on machines with a lot of RAM, during load with
very small user-space memory pressure, such as `zfs send`, it was possible
to reach a state where there was enough of both physical RAM and KVA (I've
seen up to 25-30%), but no contiguous KVA range to allocate even a single
128KB I/O request.

Address this situation from two sides:
- restore KVA usage limitations in a way as close as possible to Illumos;
- introduce a new requirement on KVA fragmentation, specifying that we
should have at least one contiguous KVA range of zfs_max_recordsize bytes.

Experiments show that the first limitation alone is not sufficient. On a
machine with 64GB of RAM it is sometimes necessary to drop up to half of the
ARC size to get at least one 1MB KVA chunk. Statically limiting the ARC to
half of KVA/RAM is too strict, so the second limitation makes it work in
cycles: accumulate trash up to a certain critical mass, do massive
spring-cleaning, and then start littering again.
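The two triggers described above can be sketched as one predicate. This is a hypothetical model: kva_total, kva_free and kva_largest_free are assumed inputs rather than real kernel interfaces. "Used above 16/17" is written as used * 17 > total * 16 to stay in integer arithmetic, which is equivalent to free being less than 1/16 of the allocated amount.

```c
#include <stdbool.h>
#include <stdint.h>

static bool
arc_kva_needs_reclaim(uint64_t kva_total, uint64_t kva_free,
    uint64_t kva_largest_free, uint64_t max_recordsize, bool is_i386)
{
	uint64_t used = kva_total - kva_free;

	/* 1. Usage limit: more than 3/4 (i386) or 16/17 of KVA in use. */
	if (is_i386 ? used * 4 > kva_total * 3 : used * 17 > kva_total * 16)
		return (true);
	/* 2. Fragmentation: no contiguous free range of max_recordsize. */
	if (kva_largest_free < max_recordsize)
		return (true);
	return (false);
}
```

The second check is what lets the ARC fill up freely and then shed a large amount at once when fragmentation, rather than total usage, becomes the limiting factor.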


# 281104 05-Apr-2015 mav

MFC r280822: Some cosmetic polishing. No functional change.


# 277586 23-Jan-2015 delphij

MFC r275811: MFV r275783:

Convert ARC flags to use enum. Previously, public flags are defined in
arc.h and private flags are defined in arc.c which can lead to confusion
and programming errors.

Consistently use 'hdr' (when referencing arc_buf_hdr_t) instead of 'buf'
or 'ab' because arc_buf_t are often named 'buf' as well.

Illumos issue:
5369 arc flags should be an enum
5370 consistent arc_buf_hdr_t naming scheme


# 277583 23-Jan-2015 delphij

MFC r275748: MFV r247174:

Expose arc_meta_limit, et al via kstats.

Note that as a result, vfs.zfs.arc_meta_used is removed.
The existing vfs.zfs.arc_meta_limit sysctl/tunable is retained
with a SYSCTL_PROC wrapper.

Illumos ZFS issues:
3561 arc_meta_limit should be exposed via kstats

Relnotes: yes


# 275609 08-Dec-2014 avg

MFC r274628: l2arc: restore correct rounding up of asize of compressed data


# 275492 04-Dec-2014 delphij

MFC r274172 (avg)

fix l2arc compression buffers leak

We have observed that arc_release() can be called concurrently with an
in-flight l2arc write.
Also, we have observed that arc_hdr_destroy() can be called from
arc_write_done() for a zio with ZIO_FLAG_IO_REWRITE flag in similar
circumstances.

Previously the l2arc headers would be freed while leaking their
associated compression buffers. Now the buffers are placed on
l2arc_free_on_write list for delayed freeing. This is similar to what
was already done to arc buffers that were supposed to be freed
concurrently with in-flight writes of those buffers.

In addition to fixing the discovered leaks, this change also adds some
protective code to assert that a compression buffer associated with an
l2arc header is never leaked.

A new kstat l2_cdata_free_on_write is added. It keeps a count of
delayed compression buffer frees which previously would have been leaks.

Tested by: Vitalij Satanivskij <satan@ukr.net> et al
Requested by: many
Sponsored by: HybridCluster / ClusterHQ

This is a 10.1-RELEASE errata candidate.


# 274625 17-Nov-2014 avg

MFC r272708: l2arc_write_buffers: reduce headroom value


# 273984 02-Nov-2014 delphij

MFC r273026:

Add a tunable for arc_shrink_shift (vfs.zfs.arc_shrink_shift) that
controls what fraction of the ARC, 1/2^arc_shrink_shift, should be
reclaimed when there is memory pressure.
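A minimal sketch of how such a shift is typically applied, assuming the amount is taken from the ARC target size arc_c and the result is clamped at arc_c_min. The function name is hypothetical; the shrink code in arc.c also adjusts other state.

```c
#include <stdint.h>

/*
 * Reduce the ARC target by arc_c / 2^shift, but never below arc_c_min.
 * A larger shift means gentler shrinking (shift 7 frees 1/128 of arc_c).
 */
static uint64_t
arc_shrink_target(uint64_t arc_c, uint64_t arc_c_min, unsigned shift)
{
	uint64_t to_free = arc_c >> shift;

	if (arc_c > arc_c_min + to_free)
		return (arc_c - to_free);
	return (arc_c_min);
}
```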

Submitted by: Richard Kojedzinszky <krichy at tvnetwork.hu>


# 273194 16-Oct-2014 delphij

MFC r272527:

Don't make nested definition for range_seg_cache.

Reported by: ian


# 273193 16-Oct-2014 delphij

MFC r272506: MFV r272495:

In arc_kmem_reap_now(), reap range_seg_cache too, to reclaim memory in
response to memory pressure.

Illumos issue:
5163 arc should reap range_seg_cache


# 273191 16-Oct-2014 delphij

MFV r273060:

Use write_psize instead of write_asize when doing vdev_space_update.
Without this change, the accounting of L2ARC usage would be wrong and
report 16EB of free space, because the number became negative and wrapped
around.
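The "16EB" figure comes from unsigned wraparound. A small sketch, assuming the space counter is an unsigned 64-bit value that receives signed deltas (the helper name is hypothetical): if the amount subtracted later is computed differently from, and larger than, the amount added, the counter wraps to just below 2^64 bytes, which is 16 EiB.

```c
#include <stdint.h>

/* Apply a signed space delta to an unsigned 64-bit counter. */
static uint64_t
space_after(uint64_t alloc, int64_t delta)
{
	/* Conversion to uint64_t is modular, so underflow wraps. */
	return (alloc + (uint64_t)delta);
}
```

For example, space_after(4096, -8192) yields UINT64_MAX - 4095, roughly 2^64 bytes.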

Obtained from: FreeNAS (issue #6239)


# 272879 09-Oct-2014 smh

MFC r271754:
Remove unused ZFS ARC functions

Sponsored by: Multiplay


# 272875 09-Oct-2014 smh

MFC r270759:
Refactor ZFS ARC reclaim logic to be more VM cooperative

MFC r270861:
Ensure that ZFS ARC free memory checks include cached pages

MFC r272483:
Refactor ZFS ARC reclaim checks and limits

Sponsored by: Multiplay


# 269846 11-Aug-2014 delphij

MFC r269230: MFV r269224:

Increase default ARC buf_hash_table size. When typical block size is small,
the hash table could be too small, which would lead to long hash chains and
limit performance for cached reads.

A new loader tunable, vfs.zfs.arc_average_blocksize, has been added, which
allows users to override the default assumption of average (typical) block
size. The old default was 65536 (64 KiB) and the new default is 8192 (8 KiB).
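A sketch of why the average block size drives the table size, assuming a sizing rule of roughly one bucket per expected cached block: start at 4096 buckets and double until the buckets times the average block size cover physical memory. The function name is hypothetical, and the exact rule in buf_init() should be checked against the source.

```c
#include <stdint.h>

static uint64_t
buf_hash_table_size(uint64_t physmem_bytes, uint64_t avg_blocksize)
{
	uint64_t hsize = 1ULL << 12;

	/* Grow until hsize * avg_blocksize >= physical memory. */
	while (hsize * avg_blocksize < physmem_bytes)
		hsize <<= 1;
	return (hsize);
}
```

With 16 GiB of RAM, dropping the assumed average from 64 KiB to 8 KiB grows the table from 2^18 to 2^21 buckets, i.e. eight times as many, which shortens the hash chains for small-block workloads.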

Illumos issue:
5034 ARC's buf_hash_table is too small


# 269732 08-Aug-2014 delphij

MFC r269086:

As of r268075, the responsibility for rounding up the buffer to the optimal
size has been transferred from zio_compress_data to its caller. Therefore,
passing the 'minblocksize' down is a no-op.

Eliminate the parameter to reduce diff against upstream.


# 269417 02-Aug-2014 delphij

MFC r268858: MFV r268850:

Change the interaction between the DMU and ARC so that when the DMU is
shutting down an objset, we do not evict the data from the ARC. Instead
we simply coordinate the destruction of the DMU's data with the ARC.

The only case where we actually need to explicitly evict from the ARC is
when dbuf_rele_and_unlock() determines that the administrator has requested
that it not be kept in memory, via the primarycache/secondarycache properties.
In this case, we evict the data from the ARC by its blkptr_t, the same way
as when a block is freed we explicitly evict it from the ARC.

Illumos issue:
4631 zvol_get_stats triggering too many reads


# 268657 15-Jul-2014 delphij

MFC r268123: MFV r268119:

4914 zfs on-disk bookmark structure should be named *_phys_t


# 268654 15-Jul-2014 delphij

MFC r268085: MFV r267569:

4897 Space accounting mismatch in L2ARC/zpool


# 268649 15-Jul-2014 delphij

MFC r268075: MFV r267565:

4757 ZFS embedded-data block pointers ("zero block compression")
4913 zfs release should not be subject to space checks


# 263397 19-Mar-2014 delphij

MFC r260150: MFV r259170:

4370 avoid transmitting holes during zfs send

4371 DMU code clean up

illumos/illumos-gate@43466aae47bfcd2ad9bf501faec8e75c08095e4f

NOTE: Make sure the boot code is updated if a zpool upgrade is
done on boot zpool.


# 262115 17-Feb-2014 avg

MFC r260835: MFV r260834: Fix memory leak of compressed buffers in l2arc_write_done


# 260763 16-Jan-2014 avg

MFC r258632,258704: MFV r255255: 4045 zfs write throttle & i/o scheduler
performance work

Sponsored by: HybridCluster [merge]


# 258566 25-Nov-2013 avg

MFV r258378: 4089 NULL pointer dereference in arc_read()

illumos/illumos-gate@57815f6b95a743697e148327725b7f568e75e6ea

Tested by: adrian
Approved by: re (gjb)


# 258565 25-Nov-2013 avg

MFV r258377: 4088 use after free in arc_release()

illumos/illumos-gate@ccc22e130479b5bd7c0002267fee1e0602d3f772

Approved by: re (gjb)


# 257058 24-Oct-2013 smh

MFC r256889:

Use the vdev's ashift to calculate the supported min block size passed to
zio_compress_data(..) when compressing l2arc buffers.

This eliminates L2ARC I/O errors, which resulted in very poor performance on
vdevs configured with a block size greater than 512 bytes, due to compression
assuming a smaller minimum block size than the vdev supports.

Approved by: re (glebius)


# 288599 03-Oct-2015 mav

MFC r288064 (by avg): 6220 memleak in l2arc on debug build

illumos/illumos-gate/commit/c546f36aa898d913ff77674fb5ff97f15b2e08b4
https://www.illumos.org/issues/6220
5408 introduced a memleak in l2arc, namely the member b_thawed gets leaked
when an arc_hdr is realloced from full to l2only.

Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
Reviewed by: Simon Klinkert <simon.klinkert@gmail.com>
Reviewed by: George Wilson <george@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Arne Jansen <sensille@gmx.net>


# 288596 03-Oct-2015 mav

MFC r287706 (by delphij):
6214 zpools going south

In r286570 (MFV of r277426) an unprotected write to b_flags to
set the compression mode was introduced. This would open a race
window where data is partially decompressed, modified, checksummed
and written to the pool, resulting in pool corruption due to the
partial decompression.

Prevent this by reintroducing b_compress

illumos/illumos-gate@d4cd038c92c36fd0ae35945831a8fc2975b5272c

Illumos issues:

6214 zpools going south
https://www.illumos.org/issues/6214


# 288594 03-Oct-2015 mav

MFC r287702: 5987 zfs prefetch code needs work

Rewrite the ZFS prefetch code to detect only forward, sequential
streams.

The following kstats have been added:

kstat.zfs.misc.arcstats.sync_wait_for_async

How many sync reads have waited for async read
to complete. (less is better)

kstat.zfs.misc.arcstats.demand_hit_predictive_prefetch

How many demand read didn't have to wait for I/O
because of predictive prefetch. (more is better)

zfetch kstats have been similified to hits, misses, and max_streams,
with max_streams representing times when we were not able to create
new stream because we already have the maximum number of sequences
for a file.

The sysctl variable/loader tunable vfs.zfs.zfetch.block_cap have been
replaced by vfs.zfs.zfetch.max_distance, which controls maximum bytes
to prefetch per stream.

illumos/illumos-gate@cf6106c8a0d6598b045811f9650d66e07eb332af

Illumos ZFS issues:

5987 zfs prefetch code needs work
https://www.illumos.org/issues/5987


# 288592 03-Oct-2015 mav

MFC r287283 (by delphij):
Fix a buffer overrun which may lead to data corruption, introduced in
r286951 by reinstating changes in r274628.

In l2arc_compress_buf(), we allocate a buffer to stash away the compressed
data in 'cdata', allocated of l2hdr->b_asize bytes.

We then ask zio_compress_data() to compress the buffer, b_l1hdr.b_tmp_cdata,
which is of l2hdr->b_asize bytes, and have the compressed size (or original
size, if compress didn't gain enough) stored in csize.

To pad the buffer to fit the optimal write size, we round up the compressed
size to L2 device's vdev_ashift.

Illumos code rounds up the size by at most SPA_MINBLOCKSIZE. Because we
know csize <= b_asize, and b_asize is integer multiple of SPA_MINBLOCKSIZE,
we are guaranteed that the rounded up csize would be <= b_asize. However,
this is not necessarily true when we round up to 1 << vdev_ashift, because
it could be larger than SPA_MINBLOCKSIZE.

So, in the worst case scenario, we are overwriting at most

(1 << vdev_ashift - SPA_MINBLOCKSIZE)

bytes of memory next to the compressed data buffer.

Andriy's original change in r274628 reorganized the code a little bit,
by moving the padding to after we determined that the compression was
beneficial. At which point, we would check rounded size against the
allocated buffer size, and the buffer overrun would not be possible.


# 288588 03-Oct-2015 mav

MFC r286951: Restore part of r274628, reverted at r286776.


# 288587 03-Oct-2015 mav

MFC r286776: Remove some random accumulated diff from Illumos.


# 288586 03-Oct-2015 mav

MFC r286774: 2618 arc.c mistypes in the comments

Reviewed by: Jason King <jason.brian.king@gmail.com>
Reviewed by: Josef Sipek <jeffpc@josefsipek.net>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Bart Coddens <bart.coddens@gmail.com>

illumos/illumos-gate@fc98fea58e89224f6f13d7fae246d6cb5dfa35ea


# 288585 03-Oct-2015 mav

MFC r286770: Fix r286766 build with debug.


# 288584 03-Oct-2015 mav

MFC r286767: Fix a minor mismerge from some time earlier.


# 288583 03-Oct-2015 mav

MFC r286766: 5817 change type of arcs_size from uint64_t to refcount_t

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Paul Dagnelie <paul.dagnelie@delphix.com>
Reviewed by: Adam Leventhal <ahl@delphix.com>
Reviewed by: Alex Reece <alex@delphix.com>
Reviewed by: Richard Elling <richard.elling@richardelling.com>
Approved by: Garrett D'Amore <garrett@damore.org>
Author: Prakash Surya <prakash.surya@delphix.com>

illumos/illumos-gate@2fd872a734cf486007a8dba532cec52bfb4d40e5

As a way to make it more difficult to introduce bugs into the ARC, and to
make it easier to diagnose issues when bugs do creep in, it would be
beneficial to change the type of the arc_state_t's arcs_size field to be
a refcount_t instead of a uint64_t. This would allow us to make stricter
checks when incrementing and decrementing the value with debugging enabled,
but still fall back to simple, fast atomic operations when debugging is
disabled.
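The idea can be sketched in C11 as a counter whose decrements are checked only in debug builds. This is an illustrative stand-in, not ZFS's refcount_t (which can also track individual holders); sketch_refcount_t and SKETCH_DEBUG are hypothetical names:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical counter type standing in for refcount_t. */
typedef struct {
	_Atomic uint64_t rc_count;
} sketch_refcount_t;

/* Always a plain atomic add; returns the new value. */
static uint64_t
sketch_refcount_add_many(sketch_refcount_t *rc, uint64_t n)
{
	return (atomic_fetch_add(&rc->rc_count, n) + n);
}

/* Atomic subtract; with SKETCH_DEBUG, also catch underflow. */
static uint64_t
sketch_refcount_remove_many(sketch_refcount_t *rc, uint64_t n)
{
	uint64_t old = atomic_fetch_sub(&rc->rc_count, n);

#ifdef SKETCH_DEBUG
	/* A bare uint64_t would silently wrap here instead. */
	assert(old >= n);
#endif
	return (old - n);
}
```

Without SKETCH_DEBUG the functions compile down to the same atomic operations a uint64_t counter would use, which is the trade-off the change describes.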


# 288582 03-Oct-2015 mav

MFC r286764: 6033 arc_adjust() should search MFU lists for oldest buffer
when adjusting MFU size.

illumos/illumos-gate@31c46cf23cd1cf4d66390a983dc5072d7d299ba2

https://www.illumos.org/issues/6033
When we're looking for the list containing oldest buffer we never
actually look at the MFU lists even when we try to evict from MFU.
It looks like a copy-paste error.

Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
Reviewed by: Xin Li <delphij@delphij.net>
Reviewed by: Prakash Surya <me@prakashsurya.com>
Approved by: Matthew Ahrens <mahrens@delphix.com>
Author: Alek Pinchuk <alek@nexenta.com>
Obtained from: illumos
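A simplified model of the decision, assuming each state keeps separate data and metadata lists and that the age of each list's oldest buffer is known. The names are hypothetical; the point of the fix is that the state examined must be the one being evicted from (MFU when adjusting MFU), not MRU:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { SKETCH_DATA, SKETCH_METADATA } sketch_type_t;

/* Hypothetical: access time of a list's oldest buffer, 0 if empty. */
typedef struct {
	uint64_t oldest_access;	/* smaller value means older */
} sketch_list_t;

typedef struct {
	sketch_list_t data;
	sketch_list_t metadata;
} sketch_state_t;

/* Which list of this state holds the oldest buffer? */
static sketch_type_t
oldest_type(const sketch_state_t *state)
{
	assert(state != NULL);
	if (state->data.oldest_access == 0)
		return (SKETCH_METADATA);
	if (state->metadata.oldest_access == 0)
		return (SKETCH_DATA);
	return (state->data.oldest_access <= state->metadata.oldest_access ?
	    SKETCH_DATA : SKETCH_METADATA);
}
```

With MRU and MFU in different shapes, asking the wrong state gives the wrong answer: here oldest_type(&mru) would pick data while the MFU state actually has its oldest buffer on the metadata list.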


# 288581 03-Oct-2015 mav

MFC r286763: 5497 lock contention on arcs_mtx

Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Richard Elling <richard.elling@richardelling.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Prakash Surya <prakash.surya@delphix.com>

illumos/illumos-gate@244781f10dcd82684fd8163c016540667842f203

This patch attempts to reduce lock contention on the current arc_state_t
mutexes. These mutexes are used liberally to protect the many LRU
lists within the ARC (e.g. ARC_mru, ARC_mfu, etc). The granularity at
which these locks are acquired has been shown to greatly affect the
performance of highly concurrent, cached workloads.
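One way to picture the contention fix is a list split into several sublists, each with its own lock, so that concurrent inserts usually contend on different mutexes. This is a hypothetical sketch using POSIX threads and a simple modulo index; the illumos change introduced its own multilist implementation, whose details differ:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_NSUBLISTS 4

typedef struct sketch_node {
	struct sketch_node *next;
	uint64_t key;		/* used to pick a sublist */
} sketch_node_t;

typedef struct {
	pthread_mutex_t lock;	/* protects only this sublist */
	sketch_node_t *head;
	size_t count;
} sketch_sublist_t;

typedef struct {
	sketch_sublist_t sub[SKETCH_NSUBLISTS];
} sketch_multilist_t;

static void
sketch_multilist_init(sketch_multilist_t *ml)
{
	for (int i = 0; i < SKETCH_NSUBLISTS; i++) {
		pthread_mutex_init(&ml->sub[i].lock, NULL);
		ml->sub[i].head = NULL;
		ml->sub[i].count = 0;
	}
}

/* Inserts with different keys usually take different locks. */
static void
sketch_multilist_insert(sketch_multilist_t *ml, sketch_node_t *n)
{
	assert(n != NULL);
	sketch_sublist_t *s = &ml->sub[n->key % SKETCH_NSUBLISTS];

	pthread_mutex_lock(&s->lock);
	n->next = s->head;
	s->head = n;
	s->count++;
	pthread_mutex_unlock(&s->lock);
}
```

The cost of this layout is that operations needing the whole list, such as finding the globally oldest buffer, must visit every sublist.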


# 288580 03-Oct-2015 mav

MFC r286762: Revert part of r205231, introducing multiple ARC state locks.

This local implementation will be replaced by one from Illumos to reduce
code divergence and make further merges easier.


# 288566 03-Oct-2015 mav

MFC r286655: Fix set of sign extension bugs in r286625.


# 288565 03-Oct-2015 mav

MFC r286647: Fix assertion panic caused by combination of r286598 and TRIM.


# 288564 03-Oct-2015 mav

MFC r286628: Fix r286625 build on i386.


# 288563 03-Oct-2015 mav

MFC r286626: Fix minor mismerge in r286574.


# 288562 03-Oct-2015 mav

MFC r286625:
5376 arc_kmem_reap_now() should not result in clearing arc_no_grow

Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Steven Hartland <killing@multiplay.co.uk>
Reviewed by: Richard Elling <richard.elling@richardelling.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Matthew Ahrens <mahrens@delphix.com>

illumos/illumos-gate@2ec99e3e987d8aa273f1e9ba2b983557d058198c


# 288561 03-Oct-2015 mav

MFC r286623: Remove an extra lock that, IMO, now only creates potential problems.


# 288557 03-Oct-2015 mav

MFC r286598: 5701 zpool list reports incorrect "alloc" value for cache devices


# 288550 03-Oct-2015 mav

MFC r286576: Fix r286570 build with debug.


# 288548 03-Oct-2015 mav

MFC r286574: 5445 Add more visibility via arcstats; specifically
arc_state_t stats and differentiate between "data" and "metadata"

Reviewed by: Basil Crow <basil.crow@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Bayard Bell <bayard.bell@nexenta.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Prakash Surya <prakash.surya@delphix.com>

illumos/illumos-gate@4076b1bf41cfd9f968a33ed54a7ae76d9e996fe8


# 288547 03-Oct-2015 mav

MFC r286570: 5408 managing ZFS cache devices requires lots of RAM
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Don Brady <dev.fs.zfs@gmail.com>
Reviewed by: Josef 'Jeff' Sipek <josef.sipek@nexenta.com>
Approved by: Garrett D'Amore <garrett@damore.org>
Author: Chris Williamson <Chris.Williamson@delphix.com>

illumos/illumos-gate@89c86e32293a30cdd7af530c38b2073fee01411c

Currently, every buffer cached in the L2ARC is accompanied by a 240-byte
header in memory, leading to very high memory consumption when using very
large cache devices. These changes significantly reduce this overhead.

Currently:

L1-only header = 176 bytes
L1 + L2 or L2-only header = 176 bytes + 32 byte checksum + 32 byte l2hdr
= 240 bytes

Memory-optimized:

L1-only header = 176 bytes
L1 + L2 header = 176 bytes + 32 byte checksum = 208 bytes
L2-only header = 96 bytes + 32 byte checksum = 128 bytes

So overall:

            Trunk    Optimized
          +--------+-----------+
L1-only   | 176 B  |   176 B   | (same)
          +--------+-----------+
L1 & L2   | 240 B  |   208 B   | (saved 32 bytes)
          +--------+-----------+
L2-only   | 240 B  |   128 B   | (saved 112 bytes)
          +--------+-----------+

For an average blocksize of 8KB, this means that for the L2ARC, the ratio
of metadata to data has gone down from about 2.92% to 1.56%. For a
'storage optimized' EC2 instance with 1600GB of SSD and 60GB of RAM, this
means that we expect a completely full L2ARC to use (1600 GB * 0.0156) /
60GB = 41% of the available memory, down from 78%.
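The percentages above can be reproduced with integer arithmetic. This small C sketch (helper names are hypothetical) computes the per-buffer overhead in basis points and the share of RAM used by headers for a full L2ARC, assuming every cached buffer is exactly 8KB:

```c
#include <assert.h>
#include <stdint.h>

/* Header overhead per buffer, in basis points (1/100 of a percent). */
static uint64_t
overhead_bp(uint64_t hdr_bytes, uint64_t blocksize)
{
	assert(blocksize > 0);
	return (hdr_bytes * 10000 / blocksize);
}

/* Whole-percent share of RAM taken by headers for a full L2ARC. */
static uint64_t
ram_percent(uint64_t l2_bytes, uint64_t hdr_bytes, uint64_t blocksize,
    uint64_t ram_bytes)
{
	uint64_t nbufs = l2_bytes / blocksize;	/* buffers on the device */

	assert(ram_bytes > 0);
	return (nbufs * hdr_bytes * 100 / ram_bytes);
}
```

Computing exactly instead of from the rounded 1.56% figure still gives 41% (and 78% for 240-byte headers), since the result depends only on the ratios of L2ARC size to RAM and of header size to block size.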

Relnotes: yes


# 288536 03-Oct-2015 mav

MFC r281109: Add DTrace probe to the new ARC reclaim cause added in r281026.


# 288519 02-Oct-2015 mav

MFC r277826 (by delphij):
Diff reduction with upstream. The actual change was merged in r272483
already.


# 288518 02-Oct-2015 mav

MFC r277452 (by will): Fix arc__shrink DTrace probe's to_free argument.

Remove the unnecessary #ifdef _KERNEL, which did not differ in the true or
false cases. Actually set the value of to_free before using it.


# 288517 02-Oct-2015 mav

MFC r275780 (by delphij):
Add a loader tunable, vfs.zfs.arc_meta_min, which controls how much metadata
ZFS should keep in ARC at minimum.

In arc_evict(), when doing recycle, take more factors into account by
applying the following policy:

1. If no evictable data, evict metadata;
2. If no evictable metadata, evict data;
3. If we hit arc_meta_limit, evict metadata;
4. If we haven't hit arc_meta_min, evict data;
5. (Illumos only, not yet present in the new FreeBSD code) evict the oldest
   cached element from data and metadata.
   (FreeBSD) evict the data type specified by the caller, which is the
   existing behavior.
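Policy steps 1-4 and the FreeBSD fallback for step 5 can be sketched as a single C function. The parameter names are illustrative rather than the actual arc.c variables, and treating "hit arc_meta_limit" as used >= limit is an assumption:

```c
#include <assert.h>
#include <stdint.h>

typedef enum { EVICT_DATA, EVICT_METADATA } evict_type_t;

/* Hypothetical decision function following steps 1-4, then 5 (FreeBSD). */
static evict_type_t
choose_evict_type(uint64_t evictable_data, uint64_t evictable_meta,
    uint64_t meta_used, uint64_t meta_limit, uint64_t meta_min,
    evict_type_t requested)
{
	if (evictable_data == 0)	/* 1: no evictable data */
		return (EVICT_METADATA);
	if (evictable_meta == 0)	/* 2: no evictable metadata */
		return (EVICT_DATA);
	if (meta_used >= meta_limit)	/* 3: at arc_meta_limit */
		return (EVICT_METADATA);
	if (meta_used < meta_min)	/* 4: below arc_meta_min */
		return (EVICT_DATA);
	return (requested);		/* 5: caller's choice */
}
```

The order matters: steps 1 and 2 only prevent asking for a type that has nothing evictable, and the metadata bounds in 3 and 4 are consulted only when both types are available.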

Note that because of our split locks (implemented in r205231 to improve
scalability by reducing lock contention), implementing the fifth Illumos
behavior will not be cheap, so for now we just implement 1-4 and fall back
to the current behavior for 5.

Illumos issue:
5368 ARC should cache more metadata


# 287665 11-Sep-2015 avg

MFC r287099: account for ashift when gathering buffers to be written to l2arc device

The change differs from that in head because of other changes that have not
been MFC-ed yet.


# 287656 11-Sep-2015 avg

MFC r284513: l2arc: pass correct size to trim requests


# 285717 20-Jul-2015 jpaetzel

MFC 278040:

Prevent inlining txg_quiesce

This allows dtrace to monitor the calls to txg_quiesce which can be
really helpful.

Also standardize __noinline order for arc_kmem_reap_now.

Sponsored by: Multiplay

Approved by: re


# 282361 03-May-2015 mav

MFC r281026, r281108, r281109:
Make ZFS ARC track both KVA usage and fragmentation.

Even on Illumos, with its much larger KVA, ZFS ARC steps back if KVA usage
reaches a certain threshold (3/4 on i386 or 16/17 otherwise). FreeBSD has
even less KVA, but had no such limit on architectures with a direct map, such
as amd64. As a result, on machines with a lot of RAM, under load with very
little user-space memory pressure, such as `zfs send`, it was possible to
reach a state where there was enough physical RAM and free KVA (I've seen up
to 25-30%), but no contiguous KVA range large enough to allocate even a
single 128KB I/O request.

Address this situation from two sides:
- restore KVA usage limits in a way as close as possible to Illumos;
- introduce a new requirement on KVA fragmentation, specifying that we
should have at least one contiguous KVA range of zfs_max_recordsize bytes.

Experiments show that the first limitation alone is not sufficient. On a
machine with 64GB of RAM it is sometimes necessary to drop up to half of the
ARC size to get at least one 1MB KVA chunk. Statically limiting the ARC to
half of KVA/RAM is too strict, so the second limitation makes it work in
cycles: accumulate trash up to a certain critical mass, do a massive spring
cleaning, and then start littering again.
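A hypothetical sketch of a reclaim test combining both conditions. It takes the sizes of the free KVA ranges as an array (how those are obtained is outside the sketch), treats the usage threshold as a fraction num/den (for example 3/4), and asks for reclaim when either the threshold is exceeded or no single free range can hold zfs_max_recordsize bytes:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: should the ARC shrink, given the free KVA ranges? */
static bool
kva_needs_reclaim(const uint64_t *free_ranges, size_t n, uint64_t kva_total,
    uint64_t num, uint64_t den, uint64_t max_recordsize)
{
	uint64_t free_total = 0, largest = 0;

	for (size_t i = 0; i < n; i++) {
		free_total += free_ranges[i];
		if (free_ranges[i] > largest)
			largest = free_ranges[i];
	}
	assert(free_total <= kva_total);

	/* Condition 1: KVA usage above num/den of the total. */
	if ((kva_total - free_total) * den > kva_total * num)
		return (true);

	/* Condition 2: no contiguous free range fits one max-size record. */
	return (largest < max_recordsize);
}
```

Condition 2 can fire while plenty of KVA is free in total, which is exactly the fragmented state described above.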


# 281104 05-Apr-2015 mav

MFC r280822: Some cosmetic polishing. No functional change.


# 277586 23-Jan-2015 delphij

MFC r275811: MFV r275783:

Convert ARC flags to use an enum. Previously, public flags were defined in
arc.h and private flags in arc.c, which could lead to confusion
and programming errors.

Consistently use 'hdr' (when referencing arc_buf_hdr_t) instead of 'buf'
or 'ab' because arc_buf_t are often named 'buf' as well.

Illumos issue:
5369 arc flags should be an enum
5370 consistent arc_buf_hdr_t naming scheme
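For illustration, a single enum covering both public and private flags might look like this. The SKETCH_ARC_FLAG_* names loosely mirror ARC flag names and the bit values are arbitrary; this is not the actual arc_flags_t definition:

```c
#include <assert.h>

/* Hypothetical single flag namespace; bit values are arbitrary. */
typedef enum sketch_arc_flags {
	/* Formerly "public" flags (arc.h). */
	SKETCH_ARC_FLAG_WAIT		= 1 << 0,
	SKETCH_ARC_FLAG_NOWAIT		= 1 << 1,
	/* Formerly "private" flags (arc.c). */
	SKETCH_ARC_FLAG_IN_HASH_TABLE	= 1 << 2,
	SKETCH_ARC_FLAG_L2CACHE		= 1 << 3,
} sketch_arc_flags_t;

/* Test a flag on a header's flag word (named hdr_flags per the new scheme). */
static int
sketch_hdr_has(sketch_arc_flags_t hdr_flags, sketch_arc_flags_t f)
{
	return ((hdr_flags & f) != 0);
}
```

Keeping every bit in one enum means a new flag cannot silently reuse a value already taken in the other header file.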


# 277583 23-Jan-2015 delphij

MFC r275748: MFV r247174:

Expose arc_meta_limit, et al via kstats.

Note that as a result, vfs.zfs.arc_meta_used is removed.
The existing vfs.zfs.arc_meta_limit sysctl/tunable is retained
with a SYSCTL_PROC wrapper.

Illumos ZFS issues:
3561 arc_meta_limit should be exposed via kstats

Relnotes: yes


# 275609 08-Dec-2014 avg

MFC r274628: l2arc: restore correct rounding up of asize of compressed data


# 275492 04-Dec-2014 delphij

MFC r274172 (avg)

fix l2arc compression buffers leak

We have observed that arc_release() can be called concurrently with an
in-flight l2arc write.
Also, we have observed that arc_hdr_destroy() can be called from
arc_write_done() for a zio with ZIO_FLAG_IO_REWRITE flag in similar
circumstances.

Previously the l2arc headers would be freed while leaking their
associated compression buffers. Now the buffers are placed on
l2arc_free_on_write list for delayed freeing. This is similar to what
was already done to arc buffers that were supposed to be freed
concurrently with in-flight writes of those buffers.

In addition to fixing the discovered leaks this change also adds some
protective code to assert that a compression buffer associated with a
l2arc header is never leaked.

A new kstat l2_cdata_free_on_write is added. It keeps a count of
delayed compression buffer frees which previously would have been leaks.
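The delayed-free pattern can be sketched as a list that collects buffers while a write may still be reading them and is drained once the write completes. This single-threaded sketch uses hypothetical names (the real list is l2arc_free_on_write); a real implementation would need a lock around the list:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct free_node {
	struct free_node *next;
	void *buf;
} free_node_t;

static free_node_t *free_on_write;	/* pending frees (hypothetical) */
static unsigned long cdata_free_on_write; /* counter, like the new kstat */

/* Queue a buffer that an in-flight write may still be reading. */
static int
defer_free(void *buf)
{
	free_node_t *n = malloc(sizeof (*n));

	if (n == NULL)
		return (-1);
	n->buf = buf;
	n->next = free_on_write;
	free_on_write = n;
	cdata_free_on_write++;
	return (0);
}

/* Called once the device write has finished; frees everything queued. */
static size_t
drain_free_on_write(void)
{
	size_t freed = 0;

	while (free_on_write != NULL) {
		free_node_t *n = free_on_write;

		free_on_write = n->next;
		free(n->buf);
		free(n);
		freed++;
	}
	return (freed);
}
```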

Tested by: Vitalij Satanivskij <satan@ukr.net> et al
Requested by: many
Sponsored by: HybridCluster / ClusterHQ

This is a 10.1-RELEASE errata candidate.


# 274625 17-Nov-2014 avg

MFC r272708: l2arc_write_buffers: reduce headroom value


# 273984 02-Nov-2014 delphij

MFC r273026:

Add a tunable for arc_shrink_shift (vfs.zfs.arc_shrink_shift) that
controls what fraction, 1/2^arc_shrink_shift, should be reclaimed
when there is memory pressure.
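A minimal sketch of how such a shift turns into a reclaim amount, assuming the target size arc_c is reduced by arc_c >> arc_shrink_shift but never below arc_c_min. shrink_target() is a hypothetical helper, not the arc.c function:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: new ARC target after one shrink step. */
static uint64_t
shrink_target(uint64_t arc_c, uint64_t arc_c_min, unsigned shrink_shift)
{
	assert(shrink_shift < 64);
	uint64_t to_free = arc_c >> shrink_shift;	/* 1/2^shift of arc_c */

	if (arc_c <= arc_c_min)
		return (arc_c);		/* already at or below the floor */
	if (arc_c > arc_c_min + to_free)
		return (arc_c - to_free);
	return (arc_c_min);		/* clamp to the floor */
}
```

A larger shift reclaims a smaller slice per step: with a shift of 5, each step frees 1/32 of the target; with 7, 1/128.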

Submitted by: Richard Kojedzinszky <krichy at tvnetwork.hu>


# 273194 16-Oct-2014 delphij

MFC r272527:

Don't make nested definition for range_seg_cache.

Reported by: ian


# 273193 16-Oct-2014 delphij

MFC r272506: MFV r272495:

In arc_kmem_reap_now(), reap range_seg_cache too, to reclaim memory in
response to memory pressure.

Illumos issue:
5163 arc should reap range_seg_cache


# 273191 16-Oct-2014 delphij

MFV r273060:

Use write_psize instead of write_asize when doing vdev_space_update.
Without this change the accounting of L2ARC usage would be wrong and would
report 16EB of free space, because the number became negative and wrapped
around.
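Why the result reads as roughly 16EB: space counters are unsigned 64-bit values, so removing more than was added wraps around to a value near 2^64 bytes (16 EiB). A tiny illustration with a hypothetical helper:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical unsigned space counter updated by add and remove amounts. */
static uint64_t
space_after(uint64_t space, uint64_t added, uint64_t removed)
{
	/* Unsigned arithmetic wraps modulo 2^64 instead of going negative. */
	return (space + added - removed);
}
```

Adding 4096 bytes and later removing 8192 leaves 2^64 - 4096, which shifted right by 60 is 15, i.e. just under 16 EiB.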

Obtained from: FreeNAS (issue #6239)


# 272879 09-Oct-2014 smh

MFC r271754:
Remove unused ZFS ARC functions

Sponsored by: Multiplay


# 272875 09-Oct-2014 smh

MFC r270759:
Refactor ZFS ARC reclaim logic to be more VM cooperative

MFC r270861:
Ensure that ZFS ARC free memory checks include cached pages

MFC r272483:
Refactor ZFS ARC reclaim checks and limits

Sponsored by: Multiplay


# 269846 11-Aug-2014 delphij

MFC r269230: MFV r269224:

Increase default ARC buf_hash_table size. When typical block size is small,
the hash table could be too small, which would lead to long hash chains and
limit performance for cached reads.

A new loader tunable, vfs.zfs.arc_average_blocksize, has been added, which
allows users to override the default assumption of average (typical) block
size. The old default was 65536 (64 KiB) and the new default is 8192 (8 KiB).
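A sketch of sizing the hash table from the average block size, assuming the bucket count is the smallest power of two (starting at 4096) whose product with the average block size covers physical memory. hash_table_size() is hypothetical and only meant to show how the tunable scales the table:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: buckets so that one bucket covers avg_blocksize of memory. */
static uint64_t
hash_table_size(uint64_t mem_bytes, uint64_t avg_blocksize)
{
	assert(avg_blocksize > 0);
	uint64_t hsize = 1ULL << 12;

	while (hsize * avg_blocksize < mem_bytes)
		hsize <<= 1;
	return (hsize);
}
```

With 8 GiB of memory the table grows from 2^17 buckets under the old 64 KiB assumption to 2^20 under 8 KiB, so workloads with small blocks see shorter chains.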

Illumos issue:
5034 ARC's buf_hash_table is too small


# 269732 08-Aug-2014 delphij

MFC r269086:

As of r268075, the responsibility for rounding up the buffer to the optimal
size has been transferred from zio_compress_data to its caller. Therefore,
passing the 'minblocksize' down is a no-op.

Eliminate the parameter to reduce diff against upstream.


# 269417 02-Aug-2014 delphij

MFC r268858: MFV r268850:

Change the interaction between the DMU and ARC so that when the DMU is
shutting down an objset, we do not evict the data from the ARC. Instead
we simply coordinate the destruction of the DMU's data with the ARC.

The only case where we actually need to explicitly evict from the ARC is
when dbuf_rele_and_unlock() determines that the administrator has requested
that it not be kept in memory, via the primarycache/secondarycache properties.
In this case, we evict the data from the ARC by its blkptr_t, the same way
as when a block is freed we explicitly evict it from the ARC.

Illumos issue:
4631 zvol_get_stats triggering too many reads


# 268657 15-Jul-2014 delphij

MFC r268123: MFV r268119:

4914 zfs on-disk bookmark structure should be named *_phys_t


# 268654 15-Jul-2014 delphij

MFC r268085: MFV r267569:

4897 Space accounting mismatch in L2ARC/zpool


# 268649 15-Jul-2014 delphij

MFC r268075: MFV r267565:

4757 ZFS embedded-data block pointers ("zero block compression")
4913 zfs release should not be subject to space checks


# 263397 19-Mar-2014 delphij

MFC r260150: MFV r259170:

4370 avoid transmitting holes during zfs send

4371 DMU code clean up

illumos/illumos-gate@43466aae47bfcd2ad9bf501faec8e75c08095e4f

NOTE: Make sure the boot code is updated if a zpool upgrade is
done on boot zpool.


# 262115 17-Feb-2014 avg

MFC r260835: MFV r260834: Fix memory leak of compressed buffers in l2arc_write_done


# 260763 16-Jan-2014 avg

MFC r258632,258704: MFV r255255: 4045 zfs write throttle & i/o scheduler
performance work

Sponsored by: HybridCluster [merge]


# 258566 25-Nov-2013 avg

MFV r258378: 4089 NULL pointer dereference in arc_read()

illumos/illumos-gate@57815f6b95a743697e148327725b7f568e75e6ea

Tested by: adrian
Approved by: re (gjb)


# 258565 25-Nov-2013 avg

MFV r258377: 4088 use after free in arc_release()

illumos/illumos-gate@ccc22e130479b5bd7c0002267fee1e0602d3f772

Approved by: re (gjb)


# 257058 24-Oct-2013 smh

MFC r256889:

Use the vdev's ashift to calculate the supported min block size passed to
zio_compress_data(..) when compressing l2arc buffers.

This eliminates L2ARC I/O errors, which resulted in very poor performance on
vdevs configured with a block size greater than 512b, due to compression
assuming a smaller min block size than the vdev supports.
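A sketch of the difference, assuming compressed sizes are rounded up to a minimum block size: rounding to 512 bytes can produce a size that a 4K-sector vdev cannot accept, while deriving the minimum from ashift cannot. The helpers are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Smallest I/O size the vdev accepts, derived from its ashift. */
static uint64_t
vdev_min_block(unsigned ashift)
{
	assert(ashift < 64);
	return (1ULL << ashift);
}

/* Round a compressed size up to the vdev's minimum block size. */
static uint64_t
round_to_vdev(uint64_t csize, unsigned ashift)
{
	uint64_t min = vdev_min_block(ashift);

	return ((csize + min - 1) & ~(min - 1));
}
```

For example, 1500 bytes rounded with ashift 9 gives 1536, which is not a multiple of 4096; with ashift 12 it becomes 4096.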

Approved by: re (glebius)