		 NFS Attribute Caching OS Problems and Amd
		      Last updated September 18, 2005

* Summary:

Some OSs don't seem to have a way to turn off the NFS attribute cache, which
breaks the Amd automounter so badly that we do not recommend using Amd on
such OSs for heavy use until this is fixed.


* Details:

Amd is a user-level NFSv2 server that manages automounts of all other file
systems.  The kernel contacts Amd via RPCs, and Amd in turn performs the
actual mounts and then responds to the kernel's RPCs.  Every kernel caches
the names and attributes of files; the name cache is called the Directory
Name Lookup Cache (DNLC) or the Directory Cache (dcache).

Amd manages its namespace at user level, but the kernel caches names
itself, so the two must coordinate to keep both namespaces in sync.  If
the kernel uses a cached entry from the DNLC without consulting Amd, users
may see corruption of the automounter namespace (symlinks pointing to the
wrong places, ESTALE errors, and more).  For example, suppose Amd timed out
an entry and removed it from its namespace.  Amd then has to tell the kernel
to purge the corresponding DNLC entry too.  Amd usually does that by
incrementing the last modification time (mtime) of the parent directory.
This is the most common way for kernels to check whether their DNLC entries
are stale: if the parent directory's mtime is newer, the kernel discards all
cached entries for that directory and re-issues the lookups.  Those lookups
result in NFS_GETATTR/NFS_LOOKUP calls sent from the kernel down to Amd, and
Amd can then properly inform the kernel of the new state of automounted
entries.
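As a rough illustration of the mtime handshake described above, here is a
toy Python model.  It is a sketch only, not Amd or kernel code: the
AutomountDir and KernelNameCache classes are made up for this example, and
the model assumes the kernel always sees the directory's current mtime
(that is, no attribute caching gets in the way).

```python
from dataclasses import dataclass, field


@dataclass
class AutomountDir:
    """Toy stand-in for an Amd-managed directory (hypothetical, not Amd code)."""
    entries: dict = field(default_factory=dict)  # name -> symlink target
    mtime: int = 0                               # bumped on every change

    def add(self, name, target):
        self.entries[name] = target
        self.mtime += 1

    def remove(self, name):
        del self.entries[name]
        self.mtime += 1  # tells the kernel its cached names are stale


class KernelNameCache:
    """Toy DNLC that validates cached names against the parent mtime."""

    def __init__(self, directory):
        self.directory = directory
        self.seen_mtime = directory.mtime
        self.cache = {}
        self.lookups_sent = 0  # counts simulated NFS_LOOKUP calls to Amd

    def lookup(self, name):
        # Assumption: the kernel sees the current mtime (no attribute cache).
        if self.directory.mtime > self.seen_mtime:
            self.cache.clear()
            self.seen_mtime = self.directory.mtime
        if name not in self.cache:
            self.lookups_sent += 1
            self.cache[name] = self.directory.entries.get(name)
        return self.cache[name]


amd_dir = AutomountDir()
amd_dir.add("home", "/a/host1/home")
kernel = KernelNameCache(amd_dir)
kernel.lookup("home")                  # miss: goes to Amd
kernel.lookup("home")                  # hit: served from the cache
amd_dir.remove("home")                 # entry times out
amd_dir.add("home", "/a/host2/home")   # remounted elsewhere
print(kernel.lookup("home"), kernel.lookups_sent)
```

In this model, a change made without bumping the mtime would leave the
kernel returning the old symlink target, which is the kind of namespace
corruption described above.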

In order to ensure that Amd is "in charge" of its namespace without
interference from the kernel, Amd will try to turn off the NFS attribute
cache.  It does so by using the NFSMNT_NOAC flag, if it exists, or by
setting the "cache timeout" fields in struct nfs_args (acregmin, acregmax,
acdirmin, and acdirmax) to 0.
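A minimal sketch of that either/or logic, written in Python for brevity.
NfsArgs is a simplified model holding only the fields named above; the
value of NFSMNT_NOAC and the non-zero defaults are placeholders (the real
ones are OS-specific), and disable_attrcache() is a hypothetical helper,
not Amd's actual function.

```python
from dataclasses import dataclass

NFSMNT_NOAC = 0x1  # placeholder bit; the real value is OS-specific


@dataclass
class NfsArgs:
    """Simplified model of the struct nfs_args fields named in the text."""
    flags: int = 0
    acregmin: int = 3    # illustrative non-zero defaults only,
    acregmax: int = 60   # not any particular OS's real values
    acdirmin: int = 30
    acdirmax: int = 60


def disable_attrcache(args, have_noac_flag):
    """Ask for no attribute caching using whichever mechanism exists."""
    if have_noac_flag:
        args.flags |= NFSMNT_NOAC
    else:
        # No flag available: zero all four cache timeouts instead.
        args.acregmin = args.acregmax = 0
        args.acdirmin = args.acdirmax = 0
    return args


print(disable_attrcache(NfsArgs(), have_noac_flag=False))
```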

We released a major new version of am-utils, version 6.1, in June 2005.
Since then, a lot of people have experimented with Amd in anticipation of
migrating from the very old am-utils 6.0 to the new 6.1.  For a couple of
months after the release of 6.1, we received reports of problems with
Amd, especially under heavy use.  Users reported getting ESTALE errors from
time to time, or seeing automounted entries whose symlinks didn't point
where they should.  After much debugging, we traced it to a few places in
Amd where it wasn't updating the parent directory mtime as it should have;
in some places where Amd was indeed updating the mtime, it was using a
resolution of only 1 second, which was not fine enough under heavy load.  We
fixed this problem and switched to using a microsecond-resolution mtime.
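To see why a 1-second mtime is too coarse, consider two namespace changes
that land within the same second.  The sketch below is an illustration
only: the helper names are invented for this example, and timestamps are
modeled as (seconds, microseconds) pairs.

```python
def mtime_1s(now_usec):
    """Directory mtime kept at 1-second resolution (microseconds dropped)."""
    return (now_usec // 1_000_000, 0)


def mtime_1us(now_usec):
    """Directory mtime kept at microsecond resolution."""
    return divmod(now_usec, 1_000_000)


def looks_modified(cached_mtime, current_mtime):
    """The kernel's staleness check: is the directory mtime newer?"""
    return current_mtime > cached_mtime


first_change = 1_127_000_000_200_000      # microseconds since the epoch
second_change = first_change + 250_000    # 250 ms later, same second

# With 1-second resolution the second change is invisible to the kernel.
print(looks_modified(mtime_1s(first_change), mtime_1s(second_change)))
# With microsecond resolution the kernel sees a newer mtime.
print(looks_modified(mtime_1us(first_change), mtime_1us(second_change)))
```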

After fixing this in Amd, we went on to verify that things work on other
OSs.  When we got to test certain BSDs, we found out that they always cache
directory entries, and there is no way to turn that off completely.
Specifically, if we set the ac{reg,dir}{min,max} fields in struct nfs_args
all to zero, the kernel seems to cache the entries for a default number of
seconds (something like 5-30 seconds).  On some OSs, setting these four
fields to 0 turns off the attribute cache, but not on some BSDs.  We were
able to verify this using Amd and a script that exercises the interaction of
the kernel's attrcache and Amd.  (If you're interested, the script can be
made available.)

We then experimented by setting the ac{reg,dir}{min,max} fields in struct
nfs_args all to 1, the smallest non-zero value we could.  When we ran the
Amd exercising script, we found that the value of 1 narrowed the race between
the DNLC and Amd, and the script took a little longer to run before it
detected an incoherency.  That makes sense: the shorter the DNLC cache
interval, the shorter the window of vulnerability.  (BTW, the man pages on
some OSs say that the ac{reg,dir}{min,max} fields use a 1-second resolution,
but experimentation indicated that they are in 0.1-second units.)
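The relationship between the cache interval and the window of
vulnerability can be put in numbers with a simple model.  This is a
back-of-the-envelope sketch, not a measurement: stale_window() is a
hypothetical helper, times are in milliseconds, and it assumes the kernel
trusts cached data until the interval expires.

```python
def stale_window(cache_interval, fill_time, change_time):
    """Time after change_time during which stale cached data is still used."""
    expiry = fill_time + cache_interval
    if change_time < fill_time or change_time >= expiry:
        # Either the cache was filled after the change (so it is fresh),
        # or it had already expired when Amd made its change.
        return 0
    return expiry - change_time


# Cache filled at t=0 ms; Amd changes its namespace at t=40 ms.
for interval in (100, 5000, 30000):
    print(interval, stale_window(interval, fill_time=0, change_time=40))
```

With an interval of 0 (a true "noac"), the model never reports a stale
window at all.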

Clearly, on those OSs that don't have a way to turn off the attribute cache,
setting the ac{reg,dir}{min,max} fields to 0 (which apparently falls back to
the kernel's default timeout) is worse than setting them to 1.  So the
current workaround I've implemented in am-utils is a configuration parameter
called "broken_attrcache" which, if turned on, sets these nfs_args fields to
1 instead of 0.  I wish I didn't have to create such ugly workaround features
in Amd, but I've got no choice.
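Stripped to its essentials, the workaround amounts to the following.  This
Python sketch only mirrors the description above; attrcache_timeouts() is
a hypothetical helper and not the actual am-utils implementation.

```python
AC_FIELDS = ("acregmin", "acregmax", "acdirmin", "acdirmax")


def attrcache_timeouts(broken_attrcache):
    """Return the value to store in each nfs_args cache-timeout field."""
    # 0 asks for no caching; 1 is the smallest non-zero timeout, used on
    # OSs that (per the text) do not honor 0 as "no caching".
    value = 1 if broken_attrcache else 0
    return {name: value for name in AC_FIELDS}


print(attrcache_timeouts(broken_attrcache=True))
```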

The near-term solution is for every OS to support a true 'noac' flag, which
can be added fairly easily.  This would make Amd work reliably.

The long-term solution is to implement Autofs support for all OSs and to
support it in Amd.  Currently, Amd supports autofs on Solaris and Linux;
FreeBSD is next.  Still, we found that even with autofs support, many
sysadmins prefer to use the good ol' non-autofs mode.


* Confirmed Status:

This is the confirmed status of various OSs' vulnerability to this attribute
cache bug.  We are slowly checking the status of other OSs.  The status of
any OS not listed is unknown as of the date at the top of this file.

** Not Vulnerable (support a proper "noac" flag):

Sun Solaris 8 and 9 (10 probably works fine)
Linux: 2.6.11 kernel (2.4.latest probably works fine)
FreeBSD 5.4 and 6.0-SNAP001 (older versions probably work fine)
OpenBSD 3.7 (older versions probably work fine)

** Vulnerable (don't support a proper "noac" flag natively):

NetBSD 2.0.2 (older versions are also probably affected)

Note: NetBSD has promised to support a noac flag, hopefully after 2.1.0 is
released (maybe in 3.0 or 2.2).  In the meantime, you can apply one of
these two kernel patches to support a 'noac' flag in NetBSD 2.x or 3.x:
	ftp://ftp.netbsd.org/pub/NetBSD/misc/christos/2x.nfs.noac.diff
	ftp://ftp.netbsd.org/pub/NetBSD/misc/christos/3x.nfs.noac.diff
After applying the patch and rebuilding your kernel, reboot with the new
kernel.  Then copy the new nfs.h and nfsmount.h from /sys/nfs/ to
/usr/include/nfs/, and finally rebuild am-utils from scratch.

** Testing

When you build am-utils, a script named scripts/test-attrcache is built,
which can be used to test the NFS attribute cache behavior of the current
OS.  You can run this script as root as follows:

# make install
# cd scripts
# sh test-attrcache

If you run this script on an OS whose status is not listed above, please
report the result to am-utils@am-utils.org, so we can record it in this
file.

Sincerely,
Erez.