README.attrcache revision 310490
          NFS Attribute Caching OS Problems and Amd
               Last updated September 18, 2005

* Summary:

Some OSs don't seem to have a way to turn off the NFS attribute cache, which
breaks the Amd automounter so badly that we do not recommend using Amd on
such OSs for heavy use until this is fixed.


* Details:

Amd is a user-level NFSv2 server that manages automounts of all other file
systems.  The kernel contacts Amd via RPCs, and Amd in turn performs the
actual mounts, and then responds back to the kernel's RPCs.  Every kernel
caches attributes of files, in a cache called the Directory Name Lookup
Cache (DNLC), or a Directory Cache (dcache).

Amd manages its namespace at the user level, but the kernel caches names
itself.  So the two must coordinate to ensure that both namespaces are in
sync.  If the kernel uses a cached entry from the DNLC without consulting
Amd, users may see corruption of the automounter namespace (symlinks
pointing to the wrong places, ESTALE errors, and more).  For example,
suppose Amd timed out an entry and removed the entry from Amd's namespace.
Amd has to tell the kernel to purge its corresponding DNLC entry too.  The
way Amd usually does that is by incrementing the last modification time
(mtime) of the parent directory.  This is the most common method for kernels
to check if their DNLC entries are stale: if the parent directory mtime is
newer, the kernel will discard all cached entries for that directory, and
will re-issue lookups.  Those lookups will result in
NFS_GETATTR/NFS_LOOKUP calls sent from the kernel down to Amd, and Amd can
then properly inform the kernel of the new state of automounted entries.
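
The mtime check described above can be sketched as a small Python model.
This is only an illustration, not code from Amd or from any kernel: the
AmdDir and KernelNameCache classes, their fields, and the paths are all
hypothetical.  The model reads the directory mtime directly, so it assumes
that no attribute cache sits between the kernel and Amd.

```python
from dataclasses import dataclass, field


@dataclass
class AmdDir:
    """Hypothetical stand-in for one directory served by Amd."""
    entries: dict = field(default_factory=dict)  # name -> symlink target
    mtime: int = 0                               # illustrative counter

    def update(self, name, target):
        """Change an entry (or remove it if target is None), then bump mtime."""
        if target is None:
            self.entries.pop(name, None)
        else:
            self.entries[name] = target
        self.mtime += 1  # any strictly newer value signals a change


class KernelNameCache:
    """Hypothetical model of the kernel's DNLC for that one directory."""

    def __init__(self, server):
        self.server = server
        self.cached = {}
        self.seen_mtime = server.mtime
        self.lookups_sent = 0  # counts simulated NFS_LOOKUP calls to Amd

    def lookup(self, name):
        # A newer parent mtime means every cached entry may be stale.
        # Reading server.mtime directly assumes there is no attribute cache.
        if self.server.mtime > self.seen_mtime:
            self.cached.clear()
            self.seen_mtime = self.server.mtime
        if name not in self.cached:
            self.lookups_sent += 1
            self.cached[name] = self.server.entries.get(name)
        return self.cached[name]


amd = AmdDir()
amd.update("home", "/a/host1/home")
kernel = KernelNameCache(amd)
first = kernel.lookup("home")    # goes to Amd
second = kernel.lookup("home")   # served from the cache
amd.update("home", "/a/host2/home")
third = kernel.lookup("home")    # mtime is newer, so the kernel asks Amd again
```

If the kernel also cached the directory's mtime for some interval, the
third lookup in this model could return the old target; that is the
failure this file describes.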

In order to ensure that Amd is "in charge" of its namespace without
interference from the kernel, Amd will try to turn off the NFS attribute
cache.  It does so by using the NFSMNT_NOAC flag, if it exists, or by
setting the various "cache timeout" fields in struct nfs_args to 0
(acregmin, acregmax, acdirmin, and acdirmax).

We released a major new version of am-utils, version 6.1, in June 2005.
Since then, a lot of people have experimented with Amd, in anticipation of
migrating from the very old am-utils 6.0 to the new 6.1.  For a couple of
months after the release of 6.1, we received reports of problems with
Amd, especially under heavy use.  Users reported getting ESTALE errors from
time to time, or seeing automounted entries whose symlinks didn't point
where they should.  After much debugging, we traced it to a few places in
Amd where it wasn't updating the parent directory mtime as it should have;
in some places where Amd was indeed updating the mtime, it was using a
resolution of only 1 second, which was not fine enough under heavy load.  We
fixed this problem and switched to using a microsecond resolution mtime.

After fixing this in Amd, we went on to verify that things work on other
OSs.  When we tested certain BSDs, we found that they always cache
directory entries, and there is no way to turn this off completely.
Specifically, if we set the ac{reg,dir}{min,max} fields in struct nfs_args
all to zero, the kernel seems to cache the entries for a default number of
seconds (something like 5-30 seconds).  On some OSs, setting these four
fields to 0 turns off the attribute cache, but not on some BSDs.
We were able to verify this using Amd and a script that exercises the
interaction of the kernel's attrcache and Amd.  (If you're interested, the
script can be made available.)

We then experimented by setting the ac{reg,dir}{min,max} fields in struct
nfs_args all to 1, the smallest non-zero value possible.  When we ran the
Amd exercising script, we found that the value of 1 reduced the race between
the DNLC and Amd, and the script took a little longer to run before it
detected an incoherency.  That makes sense: the smaller the DNLC cache
interval is, the shorter the window of vulnerability is.  (BTW, the man
pages on some OSs say that the ac{reg,dir}{min,max} fields use a 1 second
resolution, but experimentation indicated it was in 0.1 second units.)

Clearly, setting the ac{reg,dir}{min,max} fields to 0 is worse than setting
them to 1 on those OSs that don't have a way to turn off the attribute cache.
So the current workaround I've implemented in am-utils is to create a
configuration parameter called "broken_attrcache" which, if turned on, will
set these nfs_args fields to 1 instead of 0.  I wish I didn't have to create
such ugly workaround features in Amd, but I've got no choice.

The near term solution is for every OS to support a true 'noac' flag, which
can be added fairly easily.  This would make Amd work reliably.

The long term solution is to implement Autofs support for all OSs and to
support it in Amd.  Currently, Amd supports autofs on Solaris and Linux;
FreeBSD is next.  Still, we found that even with autofs support, many
sysadmins still prefer to use the good ol' non-autofs mode.
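
The effect of the "broken_attrcache" workaround on the four nfs_args
fields can be sketched in Python as follows.  This is a rough model under
stated assumptions, not the code in am-utils: the function names, the
honors_zero parameter, and the kernel default of 5 are made up for
illustration, and all values are assumed to share one time unit.

```python
CACHE_FIELDS = ("acregmin", "acregmax", "acdirmin", "acdirmax")


def attrcache_timeouts(broken_attrcache):
    """Return the value requested for each nfs_args cache timeout field.

    With the workaround on, request 1 (the smallest non-zero value)
    instead of 0.
    """
    value = 1 if broken_attrcache else 0
    return {name: value for name in CACHE_FIELDS}


def effective_timeout(requested, honors_zero, kernel_default):
    """Model how long the kernel may keep serving a cached entry.

    honors_zero is a hypothetical flag: True for an OS where 0 really
    disables the attribute cache, False for an OS that falls back to its
    own default when asked for 0.
    """
    if requested == 0:
        return 0 if honors_zero else kernel_default
    return requested


# Illustrative default of 5, taken from the low end of the observed range.
good_os = effective_timeout(0, honors_zero=True, kernel_default=5)
bad_os_zero = effective_timeout(0, honors_zero=False, kernel_default=5)
bad_os_one = effective_timeout(1, honors_zero=False, kernel_default=5)
```

In this model, asking for 1 on a vulnerable OS leaves a shorter window
than asking for 0, which matches what the exercising script showed.  The
window is still not zero, so the race is reduced, not removed.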


* Confirmed Status

This is the confirmed status of various OSs' vulnerability to this attribute
cache bug.  We are slowly checking the status of other OSs.  The status of
any OS not listed is unknown as of the date at the top of this file.

** Not Vulnerable (support a proper "noac" flag):

Sun Solaris 8 and 9 (10 probably works fine)
Linux: 2.6.11 kernel (2.4.latest probably works fine)
FreeBSD 5.4 and 6.0-SNAP001 (older versions probably work fine)
OpenBSD 3.7 (older versions probably work fine)

** Vulnerable (don't support a proper "noac" flag natively):

NetBSD 2.0.2 (older versions are also probably affected)

Note: NetBSD has promised to support a noac flag, hopefully after 2.1.0 is
released (maybe in 3.0 or 2.2).  In the meantime, you can apply one of
these two kernel patches to support a 'noac' flag in NetBSD 2.x or 3.x:
	ftp://ftp.netbsd.org/pub/NetBSD/misc/christos/2x.nfs.noac.diff
	ftp://ftp.netbsd.org/pub/NetBSD/misc/christos/3x.nfs.noac.diff
After applying the patch and rebuilding your kernel, reboot with the new
kernel.  Then copy the new nfs.h and nfsmount.h from /sys/nfs/ to
/usr/include/nfs/, and finally rebuild am-utils from scratch.

** Testing

When you build am-utils, a script named scripts/test-attrcache is built,
which can be used to test the NFS attribute cache behavior of the current
OS.
You can run this script as root as follows:

# make install
# cd scripts
# sh test-attrcache

If you run this script on an OS whose status is not listed above, please
report the results to us via Bugzilla or the am-utils mailing list
(see www.am-utils.org), so we can record them in this file.

Sincerely,
Erez.