README.attrcache revision 174294
		 NFS Attribute Caching OS Problems and Amd
		      Last updated September 18, 2005

* Summary:

Some OSs do not appear to provide any way to turn off the NFS attribute
cache.  This breaks the Amd automounter so badly that we do not recommend
using Amd heavily on such OSs until this is fixed.



* Details:

Amd is a user-level NFSv2 server that manages automounts of all other file
systems.  The kernel contacts Amd via RPCs; Amd in turn performs the actual
mounts and then responds to the kernel's RPCs.  Every kernel also caches
the results of name lookups (and file attributes), in a cache called the
Directory Name Lookup Cache (DNLC) or Directory Cache (dcache).

Amd manages its namespace at user level, but the kernel caches names
itself, so the two must coordinate to keep both namespaces in sync.  If
the kernel uses a cached DNLC entry without consulting Amd, users may see
corruption of the automounter namespace (symlinks pointing to the wrong
places, ESTALE errors, and more).  For example, suppose Amd timed out an
entry and removed it from Amd's namespace.  Amd then has to tell the kernel
to purge the corresponding DNLC entry too.  Amd usually does so by
incrementing the last modification time (mtime) of the parent directory.
This is the most common way for kernels to check whether their DNLC entries
are stale: if the parent directory's mtime is newer, the kernel discards
all cached entries for that directory and re-issues lookups.  Those lookups
result in NFS_GETATTR/NFS_LOOKUP calls sent from the kernel to Amd, and Amd
can then properly inform the kernel of the new state of automounted
entries.
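
The mtime-based invalidation described above can be modelled in a few
lines of C.  This is a minimal sketch, not Amd's or any kernel's actual
code: struct dnlc_entry, dnlc_revalidate() and amd_bump_mtime() are
hypothetical names, and mtimes are simplified to plain integer counters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified name-cache entry: the cached name plus the
 * parent directory mtime that was current when the entry was cached. */
struct dnlc_entry {
    const char *name;
    long parent_mtime;
    bool valid;
};

/* Kernel side: given fresh parent-directory attributes (e.g. from an
 * NFS_GETATTR reply), decide whether the cached entry may still be used.
 * A newer parent mtime means the entry is stale: it is purged, and the
 * caller must send a new lookup to Amd. */
bool dnlc_revalidate(struct dnlc_entry *e, long parent_mtime_now)
{
    assert(e != NULL);
    if (!e->valid)
        return false;
    if (parent_mtime_now > e->parent_mtime) {
        e->valid = false;
        return false;
    }
    return true;
}

/* Amd side: after removing an entry from its namespace, bump the parent
 * directory's mtime so that kernels discard their cached names. */
void amd_bump_mtime(long *dir_mtime)
{
    assert(dir_mtime != NULL);
    (*dir_mtime)++;
}
```

The check is only as good as the mtime: if Amd fails to bump it, or bumps
it to a value the kernel cannot distinguish from the old one, the stale
entry survives.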

To ensure that Amd is "in charge" of its namespace without interference
from the kernel, Amd tries to turn off the NFS attribute cache.  It does so
by using the NFSMNT_NOAC flag, if it exists, or by setting the "cache
timeout" fields in struct nfs_args (acregmin, acregmax, acdirmin, and
acdirmax) to 0.
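
The choice between the two mechanisms can be sketched as follows.  This is
an illustration under stated assumptions, not am-utils code: struct
example_nfs_args and EXAMPLE_NFSMNT_NOAC are hypothetical stand-ins,
because the real struct nfs_args layout and NFSMNT_NOAC value differ
between OSs, and real code would normally test for NFSMNT_NOAC with #ifdef
rather than with a runtime argument.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct nfs_args; real layouts differ by OS. */
struct example_nfs_args {
    int flags;
    int acregmin, acregmax;   /* regular-file attribute cache timeouts */
    int acdirmin, acdirmax;   /* directory attribute cache timeouts */
};

/* Hypothetical flag bit standing in for the OS's NFSMNT_NOAC. */
#define EXAMPLE_NFSMNT_NOAC 0x1

/* Ask the kernel not to cache attributes.  have_noac models whether the
 * OS defines a noac mount flag; if it does not, fall back to zeroing the
 * four cache timeout fields. */
void disable_attrcache(struct example_nfs_args *na, bool have_noac)
{
    assert(na != NULL);
    if (have_noac) {
        na->flags |= EXAMPLE_NFSMNT_NOAC;
    } else {
        na->acregmin = na->acregmax = 0;
        na->acdirmin = na->acdirmax = 0;
    }
}
```

As the rest of this file explains, the fallback branch only works on OSs
that actually interpret zero timeouts as "no caching".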

We released a major new version of am-utils, version 6.1, in June 2005.
Since then, many people have experimented with Amd in anticipation of
migrating from the very old am-utils 6.0 to the new 6.1.  In the couple of
months since the 6.1 release, we have received reports of problems with
Amd, especially under heavy use.  Users reported occasional ESTALE errors,
or automounted entries whose symlinks did not point where they should.
After much debugging, we traced these to a few places in Amd where it was
not updating the parent directory's mtime as it should have; and in some
places where Amd was updating the mtime, it used a resolution of only 1
second, which was not fine enough under heavy load.  We fixed these
problems and switched to a microsecond-resolution mtime.
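
The resolution problem can be illustrated with struct timeval from
<sys/time.h>.  This is a minimal sketch with hypothetical helper names
(mtime_newer_sec(), mtime_newer_usec(), bump_mtime()), not the code in
am-utils 6.1.  bump_mtime() takes the current time as a parameter (a
caller would typically obtain it from gettimeofday()) and, as an extra
assumption of this sketch, steps forward by 1 microsecond if the clock has
not advanced, so every update is strictly newer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/time.h>

/* Is b newer than a when only whole seconds are compared? */
bool mtime_newer_sec(struct timeval a, struct timeval b)
{
    return b.tv_sec > a.tv_sec;
}

/* Is b newer than a at microsecond resolution? */
bool mtime_newer_usec(struct timeval a, struct timeval b)
{
    return b.tv_sec > a.tv_sec ||
           (b.tv_sec == a.tv_sec && b.tv_usec > a.tv_usec);
}

/* Set a directory mtime to "now", but never let it stay the same or go
 * backwards: if now is not newer, advance the old value by 1 usec. */
void bump_mtime(struct timeval *mtime, struct timeval now)
{
    assert(mtime != NULL);
    if (mtime_newer_usec(*mtime, now)) {
        *mtime = now;
    } else if (++mtime->tv_usec >= 1000000) {
        mtime->tv_usec = 0;
        mtime->tv_sec++;
    }
}
```

Two changes half a second apart within the same second compare as equal
under mtime_newer_sec(), so a kernel relying on that comparison would keep
serving its stale cache entries.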

After fixing this in Amd, we went on to verify that things work on other
OSs.  When we tested certain BSDs, we found that they always cache
directory entries, and there is no way to turn this caching off
completely.  Specifically, if we set the ac{reg,dir}{min,max} fields in
struct nfs_args all to zero, the kernel appears to cache entries for a
default number of seconds (roughly 5-30 seconds).  On some OSs, setting
these four fields to 0 turns off the attribute cache, but not on some
BSDs.  We verified this using Amd and a script that exercises the
interaction between the kernel's attribute cache and Amd.  (If you're
interested, the script can be made available.)

We then experimented by setting the ac{reg,dir}{min,max} fields in struct
nfs_args all to 1, the smallest non-zero value possible.  When we ran the
Amd exercising script, we found that a value of 1 narrowed the race between
the DNLC and Amd: the script ran a little longer before it detected an
incoherency.  That makes sense: the shorter the DNLC cache interval, the
shorter the window of vulnerability.  (BTW, the man pages on some OSs say
that the ac{reg,dir}{min,max} fields use a 1-second resolution, but
experimentation indicated that they are in 0.1-second units.)
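
The trade-off can be made concrete with a small model of the behavior
observed on the affected BSDs.  Everything here is an assumption drawn
from the measurements above rather than from kernel source:
stale_window_ms() is a hypothetical name, 0 is modelled as falling back to
a 5-second default (the low end of the 5-30 seconds observed), and
non-zero values are taken to be in 0.1-second units.

```c
#include <assert.h>

/* Assumed fallback when all ac{reg,dir}{min,max} fields are 0; observed
 * values ranged from roughly 5 to 30 seconds, so this is a lower bound. */
#define ASSUMED_DEFAULT_WINDOW_MS 5000L

/* Approximate time, in milliseconds, during which the kernel may keep
 * using a stale cached entry after Amd changes its namespace. */
long stale_window_ms(int ac_value)
{
    assert(ac_value >= 0);
    if (ac_value == 0)
        return ASSUMED_DEFAULT_WINDOW_MS;   /* 0 does not mean "no cache" */
    return (long)ac_value * 100L;           /* 0.1-second units */
}
```

Under these assumptions a value of 1 shrinks the window by a factor of at
least 50 (100 ms versus 5 s or more), but never closes it; only a real
noac flag does that.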

Clearly, setting the ac{reg,dir}{min,max} fields to 0 is worse than setting
them to 1 on those OSs that don't have a way to turn off the attribute
cache.  So the current workaround I've implemented in am-utils is a
configuration parameter called "broken_attrcache" which, if turned on,
sets these nfs_args fields to 1 instead of 0.  I wish I didn't have to
create such ugly workaround features in Amd, but I've got no choice.
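
In code, the workaround amounts to choosing 1 rather than 0 for all four
timeout fields.  The sketch below is illustrative only: its
broken_attrcache argument stands for the configuration parameter of that
name, and struct example_nfs_args and set_attrcache_timeouts() are
hypothetical, not the actual am-utils types or functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the cache timeout fields of struct nfs_args. */
struct example_nfs_args {
    int acregmin, acregmax;
    int acdirmin, acdirmax;
};

/* With broken_attrcache off, request no caching (0).  With it on, use 1,
 * the smallest non-zero value, because on the affected OSs 0 silently
 * falls back to a much longer default interval. */
void set_attrcache_timeouts(struct example_nfs_args *na, bool broken_attrcache)
{
    int value = broken_attrcache ? 1 : 0;

    assert(na != NULL);
    na->acregmin = na->acregmax = value;
    na->acdirmin = na->acdirmax = value;
}
```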

The near-term solution is for every OS to support a true 'noac' flag, which
can be added fairly easily.  This would make Amd work reliably.

The long-term solution is to implement autofs support for all OSs and to
support it in Amd.  Currently, Amd supports autofs on Solaris and Linux;
FreeBSD is next.  Still, we have found that even where autofs is supported,
many sysadmins prefer the good ol' non-autofs mode.


* Confirmed Status

This is the confirmed status of various OSs' vulnerability to this attribute
cache bug.  We are slowly checking the status of other OSs.  The status of
any OS not listed is unknown as of the date at the top of this file.

** Not Vulnerable (support a proper "noac" flag):

Sun Solaris 8 and 9 (10 probably works fine)
Linux: 2.6.11 kernel (2.4.latest probably works fine)
FreeBSD 5.4 and 6.0-SNAP001 (older versions probably work fine)
OpenBSD 3.7 (older versions probably work fine)

** Vulnerable (don't support a proper "noac" flag natively):

NetBSD 2.0.2 (older versions are also probably affected)

Note: NetBSD has promised to support a noac flag, hopefully after 2.1.0 is
released (maybe in 3.0 or 2.2).  In the meantime, you can apply one of
these two kernel patches to add a 'noac' flag to NetBSD 2.x or 3.x:
	ftp://ftp.netbsd.org/pub/NetBSD/misc/christos/2x.nfs.noac.diff
	ftp://ftp.netbsd.org/pub/NetBSD/misc/christos/3x.nfs.noac.diff
After applying the appropriate patch and rebuilding your kernel, reboot
with the new kernel.  Then copy the new nfs.h and nfsmount.h from /sys/nfs/
to /usr/include/nfs/, and finally rebuild am-utils from scratch.

** Testing

When you build am-utils, a script named scripts/test-attrcache is built,
which can be used to test the NFS attribute cache behavior of the current
OS.  You can run this script as root as follows:

# make install
# cd scripts
# sh test-attrcache

If you run this script on an OS whose status is not listed above, please
report the results to am-utils@am-utils.org, so we can record them in
this file.

Sincerely,
Erez.
