README.attrcache revision 310490
		 NFS Attribute Caching OS Problems and Amd
		      Last updated September 18, 2005

* Summary:

Some OSs don't seem to have a way to turn off the NFS attribute cache, which
breaks the Amd automounter so badly that we do not recommend using Amd on
such OSs for heavy use until this is fixed.


* Details:

Amd is a user-level NFSv2 server that manages automounts of all other file
systems.  The kernel contacts Amd via RPCs, and Amd in turn performs the
actual mounts, and then responds back to the kernel's RPCs.  Every kernel
caches attributes of files, in a cache called the Directory Name Lookup
Cache (DNLC), or a Directory Cache (dcache).

Amd manages its namespace at the user level, but the kernel caches names
itself.  So the two must coordinate to ensure that both namespaces are in
sync.  If the kernel uses a cached entry from the DNLC, without consulting
Amd, users may see corruption of the automounter namespace (symlinks
pointing to the wrong places, ESTALE errors, and more).  For example,
suppose Amd timed out an entry and removed the entry from Amd's namespace.
Amd has to tell the kernel to purge its corresponding DNLC entry too.  The
way Amd often does that is by incrementing the last modification time
(mtime) of the parent directory.  This is the most common method for kernels
to check if their DNLC entries are stale: if the parent directory mtime is
newer, the kernel will discard all cached entries for that directory, and
will re-issue lookups.  Those lookups will result in
NFS_GETATTR/NFS_LOOKUP calls sent from the kernel down to Amd, and Amd can
then properly inform the kernel of the new state of automounted entries.

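The mtime check described above can be sketched as a small Python
simulation.  This is only an illustration, not Amd or kernel code: the
AmdDirectory and KernelNameCache classes and their methods are made-up
names, and a real kernel would fetch the parent mtime with NFS_GETATTR
rather than read it directly from an object.

```python
class AmdDirectory:
    """Hypothetical stand-in for a directory served by Amd."""

    def __init__(self):
        self.entries = {}   # name -> symlink target
        self.mtime = 0      # bumped on every namespace change

    def set_entry(self, name, target):
        self.entries[name] = target
        self.mtime += 1

    def remove_entry(self, name):
        del self.entries[name]
        self.mtime += 1


class KernelNameCache:
    """Hypothetical name cache that trusts its entries only while the
    parent directory mtime is unchanged."""

    def __init__(self, directory):
        self.directory = directory
        self.cached = {}
        self.seen_mtime = directory.mtime
        self.lookups_sent = 0   # lookups that had to go to Amd

    def lookup(self, name):
        # A newer parent mtime means every cached name may be stale.
        if self.directory.mtime > self.seen_mtime:
            self.cached.clear()
            self.seen_mtime = self.directory.mtime
        if name not in self.cached:
            self.lookups_sent += 1
            self.cached[name] = self.directory.entries.get(name)
        return self.cached[name]


amd_dir = AmdDirectory()
amd_dir.set_entry("home", "/a/host1/home")
kernel = KernelNameCache(amd_dir)
first = kernel.lookup("home")               # miss: asks Amd
again = kernel.lookup("home")               # hit: served from the cache
amd_dir.set_entry("home", "/a/host2/home")  # Amd changes the entry
fresh = kernel.lookup("home")               # newer mtime: purge, ask Amd
```
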
In order to ensure that Amd is "in charge" of its namespace without
interference from the kernel, Amd will try to turn off the NFS attribute
cache.  It does so by using the NFSMNT_NOAC flag, if it exists, or by
setting various "cache timeout" fields in struct nfs_args to 0 (acregmin,
acregmax, acdirmin, or acdirmax).

We released a major new version of am-utils, version 6.1, in June 2005.
Since then, a lot of people have experimented with Amd, in anticipation of
migrating from the very old am-utils 6.0 to the new 6.1.  For a couple of
months after the release of 6.1, we received reports of problems with
Amd, especially under heavy use.  Users reported getting ESTALE errors from
time to time, or seeing automounted entries whose symlinks didn't point
where they should.  After much debugging, we traced it to a few places in
Amd where it wasn't updating the parent directory mtime as it should have;
in some places where Amd was indeed updating the mtime, it was using a
resolution of only 1 second, which was not fine enough under heavy load.  We
fixed this problem and switched to using a microsecond resolution mtime.

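To see why a 1 second mtime was too coarse, here is a minimal Python
sketch.  The helper names (mtime_seconds, next_mtime_us) and the timestamps
are hypothetical, and the "always move forward" rule is an assumption made
for this example about what incrementing the mtime needs to guarantee; it
is not copied from the Amd sources.

```python
def mtime_seconds(now_us):
    """Directory mtime as reported with 1 second resolution."""
    return now_us // 1_000_000


def next_mtime_us(prev_us, now_us):
    """Microsecond mtime that always moves forward (assumed rule)."""
    return max(now_us, prev_us + 1)


# Two namespace changes 250 ms apart, inside the same wall-clock second.
first_change_us = 1_126_000_000_100_000
second_change_us = first_change_us + 250_000

# With 1 second resolution both changes report the same mtime, so a kernel
# comparing mtimes cannot tell that the second change happened.
same_second = mtime_seconds(first_change_us) == mtime_seconds(second_change_us)

# With microsecond resolution the second mtime is strictly newer.
newer = next_mtime_us(first_change_us, second_change_us) > first_change_us
```
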
After fixing this in Amd, we went on to verify that things work on other
OSs.  When we tested certain BSDs, we found that they always cache
directory entries, and there is no way to turn this caching off completely.
Specifically, if we set the ac{reg,dir}{min,max} fields in struct nfs_args
all to zero, the kernel seems to cache the entries for a default number of
seconds (something like 5-30 seconds).  On some OSs, setting these four
fields to 0 turns off the attribute cache, but not on some BSDs.  We were
able to verify this using Amd and a script that exercises the interaction of
the kernel's attrcache and Amd.  (If you're interested, the script can be
made available.)

We then experimented by setting the ac{reg,dir}{min,max} fields in struct
nfs_args all to 1, the smallest non-zero value we could use.  When we ran
the Amd exercising script, we found that the value of 1 narrowed the race
between the DNLC and Amd, and the script took a little longer to run before
it detected an incoherency.  That makes sense: the smaller the DNLC cache
interval is, the shorter the window of vulnerability is.  (BTW, the man
pages on some OSs say that the ac{reg,dir}{min,max} fields use a 1 second
resolution, but experimentation indicated it was in 0.1 second units.)

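The "window of vulnerability" can be made concrete with a toy model.  The
sketch below is Python, not kernel code; stale_for_ms is a hypothetical
helper, the 10 ms polling interval is arbitrary, and the timeouts are
assumptions based on the numbers above: 100 ms stands for a field value of
1 in 0.1 second units, and 5000 ms for the low end of the 5-30 second
default.

```python
def stale_for_ms(timeout_ms, change_at_ms, poll_every_ms=10):
    """How long a client keeps using the old directory mtime after the
    server changed it, if cached attributes are trusted for timeout_ms."""

    def server_mtime(now_ms):
        # The directory mtime goes from 0 to 1 at change_at_ms.
        return 1 if now_ms >= change_at_ms else 0

    cached, fetched_at = server_mtime(0), 0
    now = 0
    while True:
        # Only ask the server again once the cached attributes expired.
        if now - fetched_at >= timeout_ms:
            cached, fetched_at = server_mtime(now), now
        if cached == 1:
            return now - change_at_ms
        now += poll_every_ms


# Timeout in ms -> how long the stale mtime was used after a change at 30 ms.
windows = {t: stale_for_ms(t, change_at_ms=30) for t in (0, 100, 5000)}
# windows == {0: 0, 100: 70, 5000: 4970}
```
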
Clearly, setting the ac{reg,dir}{min,max} fields to 0 is worse than setting
them to 1 on those OSs that don't have a way to turn off the attribute cache.
So the current workaround I've implemented in am-utils is to create a
configuration parameter called "broken_attrcache" which, if turned on, will
set these nfs_args fields to 1 instead of 0.  I wish I didn't have to create
such ugly workaround features in Amd, but I've got no choice.

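The resulting choice of mount settings can be summarized in a few lines of
Python.  This is a sketch of the policy as described in this file, not the
actual am-utils C code: attrcache_settings and its arguments are
hypothetical names, and letting the noac flag take precedence over
broken_attrcache is an assumption made for this example.

```python
AC_FIELDS = ("acregmin", "acregmax", "acdirmin", "acdirmax")


def attrcache_settings(has_noac_flag, broken_attrcache=False):
    """Return the attribute cache settings to request for an Amd mount."""
    if has_noac_flag:
        # The OS offers NFSMNT_NOAC: ask for no attribute caching at all.
        return {"noac": True}
    # No noac flag: fall back to the cache timeout fields in nfs_args.
    # 0 is meant to disable caching; 1 is the workaround for OSs that
    # seem to fall back to a default timeout when given 0.
    timeout = 1 if broken_attrcache else 0
    settings = {"noac": False}
    settings.update({field: timeout for field in AC_FIELDS})
    return settings


with_noac = attrcache_settings(has_noac_flag=True)
zeroed = attrcache_settings(has_noac_flag=False)
workaround = attrcache_settings(has_noac_flag=False, broken_attrcache=True)
```
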
The near term solution is for every OS to support a true 'noac' flag, which
can be added fairly easily.  This would make Amd work reliably.

The long term solution is to implement Autofs support for all OSs and to
support it in Amd.  Currently, Amd supports autofs on Solaris and Linux;
FreeBSD is next.  Still, we found that even with autofs support, many
sysadmins still prefer to use the good ol' non-autofs mode.


* Confirmed Status:

This is the confirmed status of various OSs' vulnerability to this attribute
cache bug.  We are slowly checking the status of other OSs.  The status of
any OS not listed is unknown as of the date at the top of this file.

** Not Vulnerable (support a proper "noac" flag):

Sun Solaris 8 and 9 (10 probably works fine)
Linux: 2.6.11 kernel (2.4.latest probably works fine)
FreeBSD 5.4 and 6.0-SNAP001 (older versions probably work fine)
OpenBSD 3.7 (older versions probably work fine)

** Vulnerable (don't support a proper "noac" flag natively):

NetBSD 2.0.2 (older versions are also probably affected)

Note: NetBSD has promised to support a noac flag, hopefully after 2.1.0 is
released (maybe in 3.0 or 2.2).  In the meantime, you can apply one of
these two kernel patches to support a 'noac' flag in NetBSD 2.x or 3.x:
	ftp://ftp.netbsd.org/pub/NetBSD/misc/christos/2x.nfs.noac.diff
	ftp://ftp.netbsd.org/pub/NetBSD/misc/christos/3x.nfs.noac.diff
After applying the patch and rebuilding your kernel, reboot with the new
kernel.  Then copy the new nfs.h and nfsmount.h from /sys/nfs/ to
/usr/include/nfs/, and finally rebuild am-utils from scratch.

** Testing:

When you build am-utils, a script named scripts/test-attrcache is built,
which can be used to test the NFS attribute cache behavior of the current
OS.  You can run this script as root as follows:

# make install
# cd scripts
# sh test-attrcache

If you run this script on an OS whose status is not listed above, please
report the result to us via Bugzilla or the am-utils mailing list (see
www.am-utils.org), so we can record it in this file.

Sincerely,
Erez.