#
24457f1b |
|
04-Apr-2024 |
Kuniyuki Iwashima <kuniyu@amazon.com> |
nfs: Handle error of rpc_proc_register() in nfs_net_init(). syzkaller reported a warning [0] triggered while destroying immature netns. rpc_proc_register() was called in init_nfs_fs(), but its error has been ignored since at least the initial commit 1da177e4c3f4 ("Linux-2.6.12-rc2"). Recently, commit d47151b79e32 ("nfs: expose /proc/net/sunrpc/nfs in net namespaces") converted the procfs to per-netns and made the problem more visible. Even when rpc_proc_register() fails, nfs_net_init() could succeed, and thus nfs_net_exit() will be called while destroying the netns. Then, remove_proc_entry() will be called for non-existing proc directory and trigger the warning below. Let's handle the error of rpc_proc_register() properly in nfs_net_init(). [0]: name 'nfs' WARNING: CPU: 1 PID: 1710 at fs/proc/generic.c:711 remove_proc_entry+0x1bb/0x2d0 fs/proc/generic.c:711 Modules linked in: CPU: 1 PID: 1710 Comm: syz-executor.2 Not tainted 6.8.0-12822-gcd51db110a7e #12 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 RIP: 0010:remove_proc_entry+0x1bb/0x2d0 fs/proc/generic.c:711 Code: 41 5d 41 5e c3 e8 85 09 b5 ff 48 c7 c7 88 58 64 86 e8 09 0e 71 02 e8 74 09 b5 ff 4c 89 e6 48 c7 c7 de 1b 80 84 e8 c5 ad 97 ff <0f> 0b eb b1 e8 5c 09 b5 ff 48 c7 c7 88 58 64 86 e8 e0 0d 71 02 eb RSP: 0018:ffffc9000c6d7ce0 EFLAGS: 00010286 RAX: 0000000000000000 RBX: ffff8880422b8b00 RCX: ffffffff8110503c RDX: ffff888030652f00 RSI: ffffffff81105045 RDI: 0000000000000001 RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000001 R11: ffffffff81bb62cb R12: ffffffff84807ffc R13: ffff88804ad6fcc0 R14: ffffffff84807ffc R15: ffffffff85741ff8 FS: 00007f30cfba8640(0000) GS:ffff88807dd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007ff51afe8000 CR3: 000000005a60a005 CR4: 0000000000770ef0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: <TASK> rpc_proc_unregister+0x64/0x70 net/sunrpc/stats.c:310 nfs_net_exit+0x1c/0x30 fs/nfs/inode.c:2438 ops_exit_list+0x62/0xb0 net/core/net_namespace.c:170 setup_net+0x46c/0x660 net/core/net_namespace.c:372 copy_net_ns+0x244/0x590 net/core/net_namespace.c:505 create_new_namespaces+0x2ed/0x770 kernel/nsproxy.c:110 unshare_nsproxy_namespaces+0xae/0x160 kernel/nsproxy.c:228 ksys_unshare+0x342/0x760 kernel/fork.c:3322 __do_sys_unshare kernel/fork.c:3393 [inline] __se_sys_unshare kernel/fork.c:3391 [inline] __x64_sys_unshare+0x1f/0x30 kernel/fork.c:3391 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0x4f/0x110 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x46/0x4e RIP: 0033:0x7f30d0febe5d Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 73 9f 1b 00 f7 d8 64 89 01 48 RSP: 002b:00007f30cfba7cc8 EFLAGS: 00000246 ORIG_RAX: 0000000000000110 RAX: ffffffffffffffda RBX: 00000000004bbf80 RCX: 00007f30d0febe5d RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000000006c020600 RBP: 00000000004bbf80 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000002 R13: 000000000000000b R14: 00007f30d104c530 R15: 0000000000000000 </TASK> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: syzkaller <syzkaller@googlegroups.com> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
#
1548036e |
|
15-Feb-2024 |
Josef Bacik <josef@toxicpanda.com> |
nfs: make the rpc_stat per net namespace Now that we're exposing the rpc stats on a per-network namespace basis, move this struct into struct nfs_net and use that to make sure only the per-network namespace stats are exposed. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
#
d47151b7 |
|
15-Feb-2024 |
Josef Bacik <josef@toxicpanda.com> |
nfs: expose /proc/net/sunrpc/nfs in net namespaces We're using nfs mounts inside of containers in production and noticed that the nfs stats are not exposed in /proc. This is a problem for us as we use these stats for monitoring, and have to do this awkward bind mount from the main host into the container in order to get to these states. Add the rpc_proc_register call to the pernet operations entry and exit points so these stats can be exposed inside of network namespaces. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
#
f88c3fb8 |
|
12-Mar-2024 |
Linus Torvalds <torvalds@linux-foundation.org> |
mm, slab: remove last vestiges of SLAB_MEM_SPREAD Yes, yes, I know the slab people were planning on going slow and letting every subsystem fight this thing on their own. But let's just rip off the band-aid and get it over and done with. I don't want to see a number of unnecessary pull requests just to get rid of a flag that no longer has any meaning. This was mainly done with a couple of 'sed' scripts and then some manual cleanup of the end result. Link: https://lore.kernel.org/all/CAHk-=wji0u+OOtmAOD-5JV3SXcRJF___k_+8XNKmak0yd5vW1Q@mail.gmail.com/ Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
41d581a9 |
|
04-Oct-2023 |
Jeff Layton <jlayton@kernel.org> |
nfs: convert to new timestamp accessors Convert to using the new inode timestamp accessor functions. Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/r/20231004185347.80880-49-jlayton@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
|
#
0d72b928 |
|
07-Aug-2023 |
Jeff Layton <jlayton@kernel.org> |
fs: pass the request_mask to generic_fillattr generic_fillattr just fills in the entire stat struct indiscriminately today, copying data from the inode. There is at least one attribute (STATX_CHANGE_COOKIE) that can have side effects when it is reported, and we're looking at adding more with the addition of multigrain timestamps. Add a request_mask argument to generic_fillattr and have most callers just pass in the value that is passed to getattr. Have other callers (e.g. ksmbd) just pass in STATX_BASIC_STATS. Also move the setting of STATX_CHANGE_COOKIE into generic_fillattr. Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: "Paulo Alcantara (SUSE)" <pc@manguebit.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Jeff Layton <jlayton@kernel.org> Message-Id: <20230807-mgctime-v7-2-d1dec143a704@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
|
#
55e04e9c |
|
05-Jul-2023 |
Jeff Layton <jlayton@kernel.org> |
nfs: convert to ctime accessor functions In later patches, we're going to change how the inode's ctime field is used. Switch to using accessor functions instead of raw accesses of inode->i_ctime. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Message-Id: <20230705190309.579783-55-jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
|
#
cded49ba |
|
12-Jun-2023 |
Jeff Layton <jlayton@kernel.org> |
nfs: don't report STATX_BTIME in ->getattr NFS doesn't properly support reporting the btime in getattr (yet), but 61a968b4f05e mistakenly added it to the request_mask. This causes statx for STATX_BTIME to report a zeroed out btime instead of properly clearing the flag. Cc: stable@vger.kernel.org # v6.3+ Fixes: 61a968b4f05e ("nfs: report the inode version in getattr if requested") Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://bugzilla.redhat.com/show_bug.cgi?id=2214134 Reported-by: Boyang Xue <bxue@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
#
3db63daa |
|
21-Mar-2023 |
NeilBrown <neilb@suse.de> |
NFSv3: handle out-of-order write replies. NFSv3 includes pre/post wcc attributes which allow the client to determine if all changes to the file have been made by the client itself, or if any might have been made by some other client. If there are gaps in the pre/post ctime sequence it must be assumed that some other client changed the file in that gap and the local cache must be suspect. The next time the file is opened the cache should be invalidated. Since Commit 1c341b777501 ("NFS: Add deferred cache invalidation for close-to-open consistency violations") in linux 5.3 the Linux client has been triggering this invalidation. The chunk in nfs_update_inode() in particularly triggers. Unfortunately Linux NFS assumes that all replies will be processed in the order sent, and will arrive in the order processed. This is not true in general. Consequently Linux NFS might ignore the wcc info in a WRITE reply because the reply is in response to a WRITE that was sent before some other request for which a reply has already been seen. This is detected by Linux using the gencount tests in nfs_inode_attr_cmp(). Also, when the gencount tests pass it is still possible that the request were processed on the server in a different order, and a gap seen in the ctime sequence might be filled in by a subsequent reply, so gaps should not immediately trigger delayed invalidation. The net result is that writing to a server and then reading the file back can result in going to the server for the read rather than serving it from cache - all because a couple of replies arrived out-of-order. This is a performance regression over kernels before 5.3, though the change in 5.3 is a correctness improvement. This has been seen with Linux writing to a Netapp server which occasionally re-orders requests. In testing the majority of requests were in-order, but a few (maybe 2 or three at a time) could be re-ordered. This patch addresses the problem by recording any gaps seen in the pre/post ctime sequence and not triggering invalidation until either there are too many gaps to fit in the table, or until there are no more active writes and the remaining gaps cannot be resolved. We allocate a table of 16 gaps on demand. If the allocation fails we revert to current behaviour which is of little cost as we are unlikely to be able to cache the writes anyway. In the table we store "start->end" pair when iversion is updated and "end<-start" pairs pre/post pairs reported by the server. Usually these exactly cancel out and so nothing is stored. When there are out-of-order replies we do store gaps and these will eventually be cancelled against later replies when this client is the only writer. If the final write is out-of-order there may be one gap remaining when the file is closed. This will be noticed and if there is precisely on gap and if the iversion can be advanced to match it, then we do so. This patch makes no attempt to handle directories correctly. The same problem potentially exists in the out-of-order replies to create/unlink requests can cause future lookup requires to be sent to the server unnecessarily. A similar scheme using the same primitives could be used to notice and handle out-of-order replies. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
#
000dbe0b |
|
20-Feb-2023 |
Dave Wysochanski <dwysocha@redhat.com> |
NFS: Convert buffered read paths to use netfs when fscache is enabled Convert the NFS buffered read code paths to corresponding netfs APIs, but only when fscache is configured and enabled. The netfs API defines struct netfs_request_ops which must be filled in by the network filesystem. For NFS, we only need to define 5 of the functions, the main one being the issue_read() function. The issue_read() function is called by the netfs layer when a read cannot be fulfilled locally, and must be sent to the server (either the cache is not active, or it is active but the data is not available). Once the read from the server is complete, netfs requires a call to netfs_subreq_terminated() which conveys either how many bytes were read successfully, or an error. Note that issue_read() is called with a structure, netfs_io_subrequest, which defines the IO requested, and contains a start and a length (both in bytes), and assumes the underlying netfs will return a either an error on the whole region, or the number of bytes successfully read. The NFS IO path is page based and the main APIs are the pgio APIs defined in pagelist.c. For the pgio APIs, there is no way for the caller to know how many RPCs will be sent and how the pages will be broken up into underlying RPCs, each of which will have their own completion and return code. In contrast, netfs is subrequest based, a single subrequest may contain multiple pages, and a single subrequest is initiated with issue_read() and terminated with netfs_subreq_terminated(). Thus, to utilze the netfs APIs, NFS needs some way to accommodate the netfs API requirement on the single response to the whole subrequest, while also minimizing disruptive changes to the NFS pgio layer. The approach taken with this patch is to allocate a small structure for each nfs_netfs_issue_read() call, store the final error and number of bytes successfully transferred in the structure, and update these values as each RPC completes. The refcount on the structure is used as a marker for the last RPC completion, is incremented in nfs_netfs_read_initiate(), and decremented inside nfs_netfs_read_completion(), when a nfs_pgio_header contains a valid pointer to the data. On the final put (which signals the final outstanding RPC is complete) in nfs_netfs_read_completion(), call netfs_subreq_terminated() with either the final error value (if one or more READs complete with an error) or the number of bytes successfully transferred (if all RPCs complete successfully). Note that when all RPCs complete successfully, the number of bytes transferred is capped to the length of the subrequest. Capping the transferred length to the subrequest length prevents "Subreq overread" warnings from netfs. This is due to the "aligned_len" in nfs_pageio_add_page(), and the corner case where NFS requests a full page at the end of the file, even when i_size reflects only a partial page (NFS overread). Signed-off-by: Dave Wysochanski <dwysocha@redhat.com> Tested-by: Daire Byrne <daire@dneg.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
#
4f704d9a |
|
13-Mar-2023 |
Christian Brauner <brauner@kernel.org> |
nfs: use vfs setgid helper We've aligned setgid behavior over multiple kernel releases. The details can be found in the following two merge messages: cf619f891971 ("Merge tag 'fs.ovl.setgid.v6.2') 426b4ca2d6a5 ("Merge tag 'fs.setgid.v6.0') Consistent setgid stripping behavior is now encapsulated in the setattr_should_drop_sgid() helper which is used by all filesystems that strip setgid bits outside of vfs proper. Switch nfs to rely on this helper as well. Without this patch the setgid stripping tests in xfstests will fail. Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Message-Id: <20230313-fs-nfs-setgid-v2-1-9a59f436cfc0@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
|
#
61a968b4 |
|
05-Aug-2022 |
Jeff Layton <jlayton@kernel.org> |
nfs: report the inode version in getattr if requested Allow NFS to report the i_version in getattr requests. Since the cost to fetch it is relatively cheap, do it unconditionally and just set the flag if it looks like it's valid. Also, conditionally enable the MONOTONIC flag when the server reports its change attr type as such. Reviewed-by: NeilBrown <neilb@suse.de> Signed-off-by: Jeff Layton <jlayton@kernel.org>
|
#
b74d24f7 |
|
12-Jan-2023 |
Christian Brauner <brauner@kernel.org> |
fs: port ->getattr() to pass mnt_idmap Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
|
#
c1632a0f |
|
12-Jan-2023 |
Christian Brauner <brauner@kernel.org> |
fs: port ->setattr() to pass mnt_idmap Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
|
#
6f1c1d95 |
|
22-Sep-2022 |
ChenXiaoSong <chenxiaosong2@huawei.com> |
NFS: make sure open context mode have FMODE_EXEC when file open for exec Because file f_mode never have FMODE_EXEC, open context mode won't get FMODE_EXEC from file f_mode. Open context mode only care about FMODE_READ/ FMODE_WRITE/FMODE_EXEC, and all info about open context mode can be convert from file f_flags, so convert file f_flags to open context mode by flags_to_mode(). Signed-off-by: ChenXiaoSong <chenxiaosong2@huawei.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
#
7e7ce2cc |
|
14-Jun-2022 |
yuzhe <yuzhe@nfschina.com> |
nfs: remove unnecessary (void*) conversions. remove unnecessary void* type castings. Signed-off-by: yuzhe <yuzhe@nfschina.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
#
f5d39b02 |
|
22-Aug-2022 |
Peter Zijlstra <peterz@infradead.org> |
freezer,sched: Rewrite core freezer logic Rewrite the core freezer to behave better wrt thawing and be simpler in general. By replacing PF_FROZEN with TASK_FROZEN, a special block state, it is ensured frozen tasks stay frozen until thawed and don't randomly wake up early, as is currently possible. As such, it does away with PF_FROZEN and PF_FREEZER_SKIP, freeing up two PF_flags (yay!). Specifically; the current scheme works a little like: freezer_do_not_count(); schedule(); freezer_count(); And either the task is blocked, or it lands in try_to_freezer() through freezer_count(). Now, when it is blocked, the freezer considers it frozen and continues. However, on thawing, once pm_freezing is cleared, freezer_count() stops working, and any random/spurious wakeup will let a task run before its time. That is, thawing tries to thaw things in explicit order; kernel threads and workqueues before doing bringing SMP back before userspace etc.. However due to the above mentioned races it is entirely possible for userspace tasks to thaw (by accident) before SMP is back. This can be a fatal problem in asymmetric ISA architectures (eg ARMv9) where the userspace task requires a special CPU to run. As said; replace this with a special task state TASK_FROZEN and add the following state transitions: TASK_FREEZABLE -> TASK_FROZEN __TASK_STOPPED -> TASK_FROZEN __TASK_TRACED -> TASK_FROZEN The new TASK_FREEZABLE can be set on any state part of TASK_NORMAL (IOW. TASK_INTERRUPTIBLE and TASK_UNINTERRUPTIBLE) -- any such state is already required to deal with spurious wakeups and the freezer causes one such when thawing the task (since the original state is lost). The special __TASK_{STOPPED,TRACED} states *can* be restored since their canonical state is in ->jobctl. With this, frozen tasks need an explicit TASK_FROZEN wakeup and are free of undue (early / spurious) wakeups. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Ingo Molnar <mingo@kernel.org> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://lore.kernel.org/r/20220822114649.055452969@infradead.org
|
#
67f4b5dc |
|
13-Aug-2022 |
Trond Myklebust <trond.myklebust@hammerspace.com> |
NFS: Fix another fsync() issue after a server reboot Currently, when the writeback code detects a server reboot, it redirties any pages that were not committed to disk, and it sets the flag NFS_CONTEXT_RESEND_WRITES in the nfs_open_context of the file descriptor that dirtied the file. While this allows the file descriptor in question to redrive its own writes, it violates the fsync() requirement that we should be synchronising all writes to disk. While the problem is infrequent, we do see corner cases where an untimely server reboot causes the fsync() call to abandon its attempt to sync data to disk and causing data corruption issues due to missed error conditions or similar. In order to tighted up the client's ability to deal with this situation without introducing livelocks, add a counter that records the number of times pages are redirtied due to a server reboot-like condition, and use that in fsync() to redrive the sync to disk. Fixes: 2197e9b06c22 ("NFS: Fix up fsync() when the server rebooted") Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
#
ab0fc21b |
|
29-Mar-2022 |
ChenXiaoSong <chenxiaosong2@huawei.com> |
Revert "NFSv4: Handle the special Linux file open access mode" This reverts commit 44942b4e457beda00981f616402a1a791e8c616e. After secondly opening a file with O_ACCMODE|O_DIRECT flags, nfs4_valid_open_stateid() will dereference NULL nfs4_state when lseek(). Reproducer: 1. mount -t nfs -o vers=4.2 $server_ip:/ /mnt/ 2. fd = open("/mnt/file", O_ACCMODE|O_DIRECT|O_CREAT) 3. close(fd) 4. fd = open("/mnt/file", O_ACCMODE|O_DIRECT) 5. lseek(fd) Reported-by: Lyu Tao <tao.lyu@epfl.ch> Signed-off-by: ChenXiaoSong <chenxiaosong2@huawei.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
#
ad1e109a |
|
17-Feb-2022 |
Trond Myklebust <trond.myklebust@hammerspace.com> |
NFS: Don't ask for readdirplus unless it can help nfs_getattr() If attribute caching is turned off, then use of readdirplus is not going to help stat() performance. Readdirplus also doesn't help if a file is being written to, since we will have to flush those writes in order to sync the mtime/ctime. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
#
230bc98f |
|
17-Feb-2022 |
Trond Myklebust <trond.myklebust@hammerspace.com> |
NFS: Improve heuristic for readdirplus The heuristic for readdirplus is designed to try to detect 'ls -l' and similar patterns. It does so by looking for cache hit/miss patterns in both the attribute cache and in the dcache of the files in a given directory, and then sets a flag for the readdirplus code to interpret. The problem with this approach is that a single attribute or dcache miss can cause the NFS code to force a refresh of the attributes for the entire set of files contained in the directory. To be able to make a more nuanced decision, let's sample the number of hits and misses in the set of open directory descriptors. That allows us to set thresholds at which we start preferring READDIRPLUS over regular READDIR, or at which we start to force a re-read of the remaining readdir cache using READDIRPLUS. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
#
f1ec501d |
|
23-Feb-2022 |
Trond Myklebust <trond.myklebust@hammerspace.com> |
NFS: Remove unnecessary XATTR cache invalidation in nfs_fhget() We should never expect the 'xattr_cache' to be non-null in that case, hence nfs_set_cache_invalid() is just going to optimise it away. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
#
41e97b7f |
|
09-Feb-2022 |
Trond Myklebust <trond.myklebust@hammerspace.com> |
NFS: Remove unused flag NFS_INO_REVAL_PAGECACHE Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
#
88a6099f |
|
09-Feb-2022 |
Trond Myklebust <trond.myklebust@hammerspace.com> |
NFS: Replace last uses of NFS_INO_REVAL_PAGECACHE Now that we have more fine grained attribute revalidation, let's just get rid of NFS_INO_REVAL_PAGECACHE. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
#
da48f267 |
|
29-Jan-2022 |
Trond Myklebust <trond.myklebust@hammerspace.com> |
NFS: Convert GFP_NOFS to GFP_KERNEL Assume that sections that should not re-enter the filesystem are already protected with memalloc_nofs_save/restore call, so relax those GFP_NOFS instances which might be used by other contexts. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
#
d7867712 |
|
29-Jan-2022 |
Trond Myklebust <trond.myklebust@hammerspace.com> |
NFS: Charge open/lock file contexts to kmemcg Allow kmemcg to limit the number of open/lock file contexts, in the same way that it limits the parent file descriptors. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
#
fd60b288 |
|
22-Mar-2022 |
Muchun Song <songmuchun@bytedance.com> |
fs: allocate inode by using alloc_inode_sb() The inode allocation is supposed to use alloc_inode_sb(), so convert kmem_cache_alloc() of all filesystems to alloc_inode_sb(). Link: https://lkml.kernel.org/r/20220228122126.37293-5-songmuchun@bytedance.com Signed-off-by: Muchun Song <songmuchun@bytedance.com> Acked-by: Theodore Ts'o <tytso@mit.edu> [ext4] Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Cc: Alex Shi <alexs@kernel.org> Cc: Anna Schumaker <Anna.Schumaker@Netapp.com> Cc: Chao Yu <chao@kernel.org> Cc: Dave Chinner <david@fromorbit.com> Cc: Fam Zheng <fam.zheng@bytedance.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kari Argillander <kari.argillander@gmail.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Xiongchun Duan <duanxiongchun@bytedance.com> Cc: Yang Shi <shy828301@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
d19e0183 |
|
15-Feb-2022 |
Trond Myklebust <trond.myklebust@hammerspace.com> |
NFS: Do not report writeback errors in nfs_getattr() The result of the writeback, whether it is an ENOSPC or an EIO, or anything else, does not inhibit the NFS client from reporting the correct file timestamps. Fixes: 79566ef018f5 ("NFS: Getattr doesn't require data sync semantics") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
#
a6b5a28e |
|
14-Nov-2020 |
Dave Wysochanski <dwysocha@redhat.com> |
nfs: Convert to new fscache volume/cookie API Change the nfs filesystem to support fscache's indexing rewrite and reenable caching in nfs. The following changes have been made: (1) The fscache_netfs struct is no more, and there's no need to register the filesystem as a whole. (2) The session cookie is now an fscache_volume cookie, allocated with fscache_acquire_volume(). That takes three parameters: a string representing the "volume" in the index, a string naming the cache to use (or NULL) and a u64 that conveys coherency metadata for the volume. For nfs, I've made it render the volume name string as: "nfs,<ver>,<family>,<address>,<port>,<fsidH>,<fsidL>*<,param>[,<uniq>]" (3) The fscache_cookie_def is no more and needed information is passed directly to fscache_acquire_cookie(). The cache no longer calls back into the filesystem, but rather metadata changes are indicated at other times. fscache_acquire_cookie() is passed the same keying and coherency information as before. (4) fscache_enable/disable_cookie() have been removed. Call fscache_use_cookie() and fscache_unuse_cookie() when a file is opened or closed to prevent a cache file from being culled and to keep resources to hand that are needed to do I/O. If a file is opened for writing, we invalidate it with FSCACHE_INVAL_DIO_WRITE in lieu of doing writeback to the cache, thereby making it cease caching until all currently open files are closed. This should give the same behaviour as the uptream code. Making the cache store local modifications isn't straightforward for NFS, so that's left for future patches. (5) fscache_invalidate() now needs to be given uptodate auxiliary data and a file size. It also takes a flag to indicate if this was due to a DIO write. (6) Call nfs_fscache_invalidate() with FSCACHE_INVAL_DIO_WRITE on a file to which a DIO write is made. (7) Call fscache_note_page_release() from nfs_release_page(). (8) Use a killable wait in nfs_vm_page_mkwrite() when waiting for PG_fscache to be cleared. (9) The functions to read and write data to/from the cache are stubbed out pending a conversion to use netfslib. Changes ======= ver #3: - Added missing =n fallback for nfs_fscache_release_file()[1][2]. ver #2: - Use gfpflags_allow_blocking() rather than using flag directly. - fscache_acquire_volume() now returns errors. - Remove NFS_INO_FSCACHE as it's no longer used. - Need to unuse a cookie on file-release, not inode-clear. Signed-off-by: Dave Wysochanski <dwysocha@redhat.com> Co-developed-by: David Howells <dhowells@redhat.com> Signed-off-by: David Howells <dhowells@redhat.com> Tested-by: Dave Wysochanski <dwysocha@redhat.com> Acked-by: Jeff Layton <jlayton@kernel.org> cc: Trond Myklebust <trond.myklebust@hammerspace.com> cc: Anna Schumaker <anna.schumaker@netapp.com> cc: linux-nfs@vger.kernel.org cc: linux-cachefs@redhat.com Link: https://lore.kernel.org/r/202112100804.nksO8K4u-lkp@intel.com/ [1] Link: https://lore.kernel.org/r/202112100957.2oEDT20W-lkp@intel.com/ [2] Link: https://lore.kernel.org/r/163819668938.215744.14448852181937731615.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/163906979003.143852.2601189243864854724.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/163967182112.1823006.7791504655391213379.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/164021575950.640689.12069642327533368467.stgit@warthog.procyon.org.uk/ # v4
|
#
93c2e5e0 |
|
16-Nov-2021 |
Benjamin Coddington <bcodding@redhat.com> |
NFS: Add a tracepoint to show the results of nfs_set_cache_invalid() This provides some insight into the client's invalidation behavior to show both when the client uses the helper, and the results of calling the helper which can vary depending on how the helper is called. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
#
dd225cb3 |
|
22-Oct-2021 |
Anna Schumaker <Anna.Schumaker@Netapp.com> |
NFS: Remove the nfs4_label argument from nfs_setsecurity Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
#
cf7ab00a |
|
22-Oct-2021 |
Anna Schumaker <Anna.Schumaker@Netapp.com> |
NFS: Remove the nfs4_label argument from nfs_fhget() Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
#
1b00ad65 |
|
22-Oct-2021 |
Anna Schumaker <Anna.Schumaker@Netapp.com> |
NFS: Remove the nfs4_label from the nfs_setattrres Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
#
2ef61e0e |
|
22-Oct-2021 |
Anna Schumaker <Anna.Schumaker@Netapp.com> |
NFS: Remove the nfs4_label from the nfs4_getattr_res Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
#
d755ad8d |
|
22-Oct-2021 |
Anna Schumaker <Anna.Schumaker@Netapp.com> |
NFS: Create a new nfs_alloc_fattr_with_label() function For creating fattrs with the label field already allocated for us. I also update nfs_free_fattr() to free the label in the end. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
#
d4a95a7e |
|
04-Nov-2021 |
Trond Myklebust <trond.myklebust@hammerspace.com> |
NFS: Always initialise fattr->label in nfs_fattr_alloc() We're about to add a check in nfs_free_fattr() for whether or not the label is non-zero. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
#
e48c81bb |
|
05-Nov-2021 |
Trond Myklebust <trond.myklebust@hammerspace.com> |
NFSv4: Remove unnecessary 'minor version' check It is completely redundant to the server capability check. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
#
e591b298 |
|
28-Sep-2021 |
Trond Myklebust <trond.myklebust@hammerspace.com> |
NFS: Save some space in the inode Save some space in the nfs_inode by setting up an anonymous union with the fields that are peculiar to a specific type of filesystem object. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
#
110cb2d2 |
|
04-Oct-2021 |
Chuck Lever <chuck.lever@oracle.com> |
NFS: Instrument i_size_write() Generate a trace event whenever the NFS client modifies the size of a file. These new events aid troubleshooting workloads that trigger races around size updates. There are four new trace points, all named nfs_size_something so they are easy to grep for or enable as a group with a single glob. Size updated on the server: kworker/u24:10-194 [010] 369.939174: nfs_size_update: fileid=00:28:2 fhandle=0x36fbbe51 version=1752899344277980615 cursize=250471 newsize=172083 Server-side size update reported via NFSv3 WCC attributes: fsx-1387 [006] 380.760686: nfs_size_wcc: fileid=00:28:2 fhandle=0x36fbbe51 version=1752899355909932456 cursize=146792 newsize=171216 File has been truncated locally: fsx-1387 [007] 369.437421: nfs_size_truncate: fileid=00:28:2 fhandle=0x36fbbe51 version=1752899231200117272 cursize=215244 newsize=0 File has been extended locally: fsx-1387 [007] 369.439213: nfs_size_grow: fileid=00:28:2 fhandle=0x36fbbe51 version=1752899343704248410 cursize=258048 newsize=262144 Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
#
36a10a3c |
|
02-Oct-2021 |
Trond Myklebust <trond.myklebust@hammerspace.com> |
NFS: Remove unnecessary page cache invalidations Remove cache invalidations that are already covered by change attribute updates. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
#
488796ec |
|
28-Sep-2021 |
Trond Myklebust <trond.myklebust@hammerspace.com> |
NFS: Don't set NFS_INO_DATA_INVAL_DEFER and NFS_INO_INVALID_DATA NFS_INO_DATA_INVAL_DEFER and NFS_INO_INVALID_DATA should be considered mutually exclusive. Fixes: 1c341b777501 ("NFS: Add deferred cache invalidation for close-to-open consistency violations") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Tested-by: Benjamin Coddington <bcodding@redhat.com> Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
|
#
eea41330 |
|
26-Sep-2021 |
Trond Myklebust <trond.myklebust@hammerspace.com> |
NFS: Default change_attr_type to NFS4_CHANGE_TYPE_IS_UNDEFINED Both NFSv3 and NFSv2 generate their change attribute from the ctime value that was supplied by the server. However the problem is that there are plenty of servers out there with ctime resolutions of 1ms or worse. In a modern performance system, this is insufficient when trying to decide which is the most recent set of attributes when, for instance, a READ or GETATTR call races with a WRITE or SETATTR. For this reason, let's revert to labelling the NFSv2/v3 change attributes as NFS4_CHANGE_TYPE_IS_UNDEFINED. This will ensure we protect against such races. Fixes: 7b24dacf0840 ("NFS: Another inode revalidation improvement") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Tested-by: Chuck Lever <chuck.lever@oracle.com>
|
#
ca05cbae |
|
10-Jul-2021 |
Trond Myklebust <trond.myklebust@hammerspace.com> |
NFS: Fix up nfs_ctx_key_to_expire() If the cached credential exists but doesn't have any expiration callback then exit early. Fix up atomicity issues when replacing the credential with a new one since the existing code could lead to refcount leaks. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
#
e97bc663 |
|
11-May-2021 |
Trond Myklebust <trond.myklebust@hammerspace.com> |
NFS: nfs_find_open_context() may only select open files If a file has already been closed, then it should not be selected to support further I/O. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> [Trond: Fix an invalid pointer deref reported by Colin Ian King]
|
#
a9601ac5 |
|
26-Jun-2021 |
Trond Myklebust <trond.myklebust@hammerspace.com> |
NFS: Avoid duplicate resets of attribute cache timeouts We know that the attributes changed on the server if and only if the change attribute is different. Otherwise, we're just refreshing our cache with values that were already known to be stale. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
#
213bb584 |
|
26-Jun-2021 |
Trond Myklebust <trond.myklebust@hammerspace.com> |
NFS: Fix up inode attribute revalidation timeouts The inode is considered revalidated when we've checked the value of the change attribute against our cached value since that suffices to establish whether or not the other cached values are valid. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
#
ce62b114 |
|
05-Mar-2018 |
Trond Myklebust <trond.myklebust@primarydata.com> |
NFS: Split attribute support out from the server capabilities There are lots of attributes, and they are crowding out the bit space. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
#
cc7f2dae |
|
11-Apr-2021 |
Trond Myklebust <trond.myklebust@hammerspace.com> |
NFS: Don't store NFS_INO_REVAL_FORCED NFS_INO_REVAL_FORCED is intended to tell us that the cache needs revalidation despite the fact that we hold a delegation. We shouldn't need to store it anymore, though. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
#
7b24dacf |
|
09-Apr-2021 |
Trond Myklebust <trond.myklebust@hammerspace.com> |
NFS: Another inode revalidation improvement If we're trying to update the inode because a previous update left the cache in a partially unrevalidated state, then allow the update if the change attrs match. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
#
6f9be83d |
|
26-Mar-2021 |
Trond Myklebust <trond.myklebust@hammerspace.com> |
NFS: Use information about the change attribute to optimise updates If the NFSv4.2 server supports the 'change_attr_type' attribute, then allow the client to optimise its attribute cache update strategy. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
#
04c63498 |
|
25-Mar-2021 |
Trond Myklebust <trond.myklebust@hammerspace.com> |
NFS: Simplify cache consistency in nfs_check_inode_attributes() We should not be invalidating the access or acl caches in nfs_check_inode_attributes(), since the point is we're unsure about whether the contents of the struct nfs_fattr are fully up to date. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
#
c88c696c |
|
25-Mar-2021 |
Trond Myklebust <trond.myklebust@hammerspace.com> |
NFS: Remove a line of code that has no effect in nfs_update_inode() Commit 0b467264d0db ("NFS: Fix attribute revalidation") changed the way we populate the 'invalid' attribute, and made the line that strips away the NFS_INO_INVALID_ATTR bits redundant. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
#
709fa576 |
|
25-Mar-2021 |
Trond Myklebust <trond.myklebust@hammerspace.com> |
NFS: Fix up handling of outstanding layoutcommit in nfs_update_inode() If there is an outstanding layoutcommit, then the list of attributes whose values are expected to change is not the full set. So let's be explicit about the full list. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
#
720869eb |
|
13-Apr-2021 |
Trond Myklebust <trond.myklebust@hammerspace.com> |
NFS: Separate tracking of file mode cache validity from the uid/gid chown()/chgrp() and chmod() are separate operations, and in addition, there are mode operations that are performed automatically by the server. So let's track mode validity separately from the file ownership validity. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
#
fabf2b34 |
|
25-Mar-2021 |
Trond Myklebust <trond.myklebust@hammerspace.com> |
NFS: Separate tracking of file nlinks cache validity from the mode/uid/gid Rename can cause us to revalidate the access cache, so lets track the nlinks separately from the mode/uid/gid. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
#
36a9346c |
|
25-Mar-2021 |
Trond Myklebust <trond.myklebust@hammerspace.com> |
NFS: Don't set NFS_INO_REVAL_PAGECACHE in the inode cache validity It is no longer necessary to preserve the NFS_INO_REVAL_PAGECACHE flag. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
#
13c0b082 |
|
25-Mar-2021 |
Trond Myklebust <trond.myklebust@hammerspace.com> |
NFS: Replace use of NFS_INO_REVAL_PAGECACHE when checking cache validity When checking cache validity, be more specific than just 'we want to check the page cache validity'. In almost all cases, we want to check that change attribute, and possibly also the size. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
#
1f3208b2 |
|
25-Mar-2021 |
Trond Myklebust <trond.myklebust@hammerspace.com> |
NFS: Add a cache validity flag argument to nfs_revalidate_inode() Add an argument to nfs_revalidate_inode() to allow callers to specify which attributes they need to check for validity. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
#
1f9f4328 |
|
12-Apr-2021 |
Trond Myklebust <trond.myklebust@hammerspace.com> |
NFS: nfs_setattr_update_inode() should clear the suid/sgid bits When we do a 'chown' or 'chgrp', the server will clear the suid/sgid bits. Ensure that we mirror that in nfs_setattr_update_inode(). Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
#
63cdd7ed |
|
24-Mar-2021 |
Trond Myklebust <trond.myklebust@hammerspace.com> |
NFS: Fix up statx() results If statx has valid attributes available that weren't asked for, then return them and set the result mask appropriately. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
#
e8764a6f |
|
24-Mar-2021 |
Trond Myklebust <trond.myklebust@hammerspace.com> |
NFS: Don't revalidate attributes that are not being asked for If the user doesn't set STATX_UID/GID/MODE, then don't care if they are known to be stale. Ditto if we're not being asked for the file size. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
#
4cdfeb64 |
|
23-Mar-2021 |
Trond Myklebust <trond.myklebust@hammerspace.com> |
NFS: Fix up revalidation of space used Ensure that when the change attribute or the size change, we also remember to revalidate the space used. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
#
50c7a799 |
|
25-Mar-2021 |
Trond Myklebust <trond.myklebust@hammerspace.com> |
NFS: NFS_INO_REVAL_PAGECACHE should mark the change attribute invalid When we're looking to revalidate the page cache, we should just ensure that we mark the change attribute invalid. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
#
4eb6a823 |
|
25-Mar-2021 |
Trond Myklebust <trond.myklebust@hammerspace.com> |
NFS: Mask out unsupported attributes in nfs_getattr() We don't currently support STATX_BTIME, so don't advertise it in the return values for nfs_getattr(). Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
#
9fdbfad1 |
|
29-Mar-2021 |
Trond Myklebust <trond.myklebust@hammerspace.com> |
NFS: Deal correctly with attribute generation counter overflow We need to use unsigned long subtraction and then convert to signed in order to deal correcly with C overflow rules. Fixes: f5062003465c ("NFS: Set an attribute barrier on all updates") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
#
beab450d |
|
26-Mar-2021 |
Trond Myklebust <trond.myklebust@hammerspace.com> |
NFS: Fix fscache invalidation in nfs_set_cache_invalid() Ensure that we invalidate the fscache before we strip the NFS_INO_INVALID_DATA flag. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
#
6e3e2c43 |
|
01-Mar-2021 |
Al Viro <viro@zeniv.linux.org.uk> |
new helper: inode_wrong_type() inode_wrong_type(inode, mode) returns true if setting inode->i_mode to given value would've changed the inode type. We have enough of those checks open-coded to make a helper worthwhile. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
b6f80a2e |
|
08-Mar-2021 |
Trond Myklebust <trond.myklebust@hammerspace.com> |
NFS: Fix open coded versions of nfs_set_cache_invalid() in NFSv4 nfs_set_cache_invalid() has code to handle delegations, and other optimisations, so let's use it when appropriate. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
#
ac46b3d7 |
|
08-Mar-2021 |
Trond Myklebust <trond.myklebust@hammerspace.com> |
NFS: Fix open coded versions of nfs_set_cache_invalid() nfs_set_cache_invalid() has code to handle delegations, and other optimisations, so let's use it when appropriate. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
#
fd6d3fee |
|
08-Mar-2021 |
Trond Myklebust <trond.myklebust@hammerspace.com> |
NFS: Clean up function nfs_mark_dir_for_revalidate() Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
#
848fdd62 |
|
08-Feb-2021 |
Trond Myklebust <trond.myklebust@hammerspace.com> |
NFS: Don't set NFS_INO_INVALID_XATTR if there is no xattr cache Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
#
28aa2f9e |
|
08-Feb-2021 |
Trond Myklebust <trond.myklebust@hammerspace.com> |
NFS: Always clear an invalid mapping when attempting a buffered write If the page cache is invalid, then we can't do read-modify-write, so ensure that we do clear it when we know it is invalid. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
#
549c7297 |
|
21-Jan-2021 |
Christian Brauner <christian.brauner@ubuntu.com> |
fs: make helpers idmap mount aware Extend some inode methods with an additional user namespace argument. A filesystem that is aware of idmapped mounts will receive the user namespace the mount has been marked with. This can be used for additional permission checking and also to enable filesystems to translate between uids and gids if they need to. We have implemented all relevant helpers in earlier patches. As requested we simply extend the exisiting inode method instead of introducing new ones. This is a little more code churn but it's mostly mechanical and doesnt't leave us with additional inode methods. Link: https://lore.kernel.org/r/20210121131959.646623-25-christian.brauner@ubuntu.com Cc: Christoph Hellwig <hch@lst.de> Cc: David Howells <dhowells@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
|
#
0d56a451 |
|
21-Jan-2021 |
Christian Brauner <christian.brauner@ubuntu.com> |
stat: handle idmapped mounts The generic_fillattr() helper fills in the basic attributes associated with an inode. Enable it to handle idmapped mounts. If the inode is accessed through an idmapped mount map it into the mount's user namespace before we store the uid and gid. If the initial user namespace is passed nothing changes so non-idmapped mounts will see identical behavior as before. Link: https://lore.kernel.org/r/20210121131959.646623-12-christian.brauner@ubuntu.com Cc: Christoph Hellwig <hch@lst.de> Cc: David Howells <dhowells@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: James Morris <jamorris@linux.microsoft.com> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
|
#
bf701b76 |
|
26-Nov-2020 |
NeilBrown <neilb@suse.de> |
NFS: switch nfsiod to be an UNBOUND workqueue. nfsiod is currently a concurrency-managed workqueue (CMWQ). This means that workitems scheduled to nfsiod on a given CPU are queued behind all other work items queued on any CMWQ on the same CPU. This can introduce unexpected latency. Occaionally nfsiod can even cause excessive latency. If the work item to complete a CLOSE request calls the final iput() on an inode, the address_space of that inode will be dismantled. This takes time proportional to the number of in-memory pages, which on a large host working on large files (e.g.. 5TB), can be a large number of pages resulting in a noticable number of seconds. We can avoid these latency problems by switching nfsiod to WQ_UNBOUND. This causes each concurrent work item to gets a dedicated thread which can be scheduled to an idle CPU. There is precedent for this as several other filesystems use WQ_UNBOUND workqueue for handling various async events. Signed-off-by: NeilBrown <neilb@suse.de> Fixes: ada609ee2ac2 ("workqueue: use WQ_MEM_RECLAIM instead of WQ_RESCUER") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
#
b593c09f |
|
02-Nov-2020 |
Trond Myklebust <trond.myklebust@hammerspace.com> |
NFS: Improve handling of directory verifiers If the server insists on using the readdir verifiers in order to allow cookies to expire, then we should ensure that we cache the verifier with the cookie, so that we can return an error if the application tries to use the expired cookie. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Tested-by: Benjamin Coddington <bcodding@redhat.com> Tested-by: Dave Wysochanski <dwysocha@redhat.com>
|
#
95ad37f9 |
|
23-Jun-2020 |
Frank van der Linden <fllinden@amazon.com> |
NFSv4.2: add client side xattr caching. Implement client side caching for NFSv4.2 extended attributes. The cache is a per-inode hashtable, with name/value entries. There is one special entry for the listxattr cache. NFS inodes have a pointer to a cache structure. The cache structure is allocated on demand, freed when the cache is invalidated. Memory shrinkers keep the size in check. Large entries (> PAGE_SIZE) are collected by a separate shrinker, and freed more aggressively than others. Signed-off-by: Frank van der Linden <fllinden@amazon.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
#
0f44da51 |
|
23-Jun-2020 |
Frank van der Linden <fllinden@amazon.com> |
nfs: define and use the NFS_INO_INVALID_XATTR flag Define the NFS_INO_INVALID_XATTR flag, to be used for the NFSv4.2 xattr cache, and use it where appropriate. No functional change as yet. Signed-off-by: Frank van der Linden <fllinden@amazon.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
#
ac7cbb22 |
|
04-Jun-2020 |
Trond Myklebust <trond.myklebust@hammerspace.com> |
NFS: Allow applications to speed up readdir+statx() using AT_STATX_DONT_SYNC If the application uses the AT_STATX_DONT_SYNC flag after doing readdir(), then we should still mark the parent inode as seeing a readdirplus hit. That ensures that we continue to use readdirplus in the 'ls -l' type of workflow to do fast lookups of the dentries. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
#
3a39e778 |
|
21-May-2020 |
Zheng Bin <zhengbin13@huawei.com> |
nfs: set invalid blocks after NFSv4 writes Use the following command to test nfsv4(size of file1M is 1MB): mount -t nfs -o vers=4.0,actimeo=60 127.0.0.1/dir1 /mnt cp file1M /mnt du -h /mnt/file1M -->0 within 60s, then 1M When write is done(cp file1M /mnt), will call this: nfs_writeback_done nfs4_write_done nfs4_write_done_cb nfs_writeback_update_inode nfs_post_op_update_inode_force_wcc_locked(change, ctime, mtime nfs_post_op_update_inode_force_wcc_locked nfs_set_cache_invalid nfs_refresh_inode_locked nfs_update_inode nfsd write response contains change, ctime, mtime, the flag will be clear after nfs_update_inode. Howerver, write response does not contain space_used, previous open response contains space_used whose value is 0, so inode->i_blocks is still 0. nfs_getattr -->called by "du -h" do_update |= force_sync || nfs_attribute_cache_expired -->false in 60s cache_validity = READ_ONCE(NFS_I(inode)->cache_validity) do_update |= cache_validity & (NFS_INO_INVALID_ATTR -->false if (do_update) { __nfs_revalidate_inode } Within 60s, does not send getattr request to nfsd, thus "du -h /mnt/file1M" is 0. Add a NFS_INO_INVALID_BLOCKS flag, set it when nfsv4 write is done. Fixes: 16e143751727 ("NFS: More fine grained attribute tracking") Signed-off-by: Zheng Bin <zhengbin13@huawei.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
#
93ce4af7 |
|
06-Apr-2020 |
Trond Myklebust <trond.myklebust@hammerspace.com> |
NFS: Clean up process of marking inode stale. Instead of the various open coded calls to set the NFS_INO_STALE bit and call nfs_zap_caches(), consolidate them into a single function nfs_set_inode_stale(). Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
#
1d179d6b |
|
07-Feb-2020 |
Trond Myklebust <trond.myklebust@hammerspace.com> |
NFS: alloc_nfs_open_context() must use the file cred when available If we're creating a nfs_open_context() for a specific file pointer, we must use the cred assigned to that file. Fixes: a52458b48af1 ("NFS/NFSD/SUNRPC: replace generic creds with 'struct cred'.") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
#
efeda80d |
|
05-Feb-2020 |
Trond Myklebust <trondmy@gmail.com> |
NFSv4: Fix revalidation of dentries with delegations If a dentry was not initially looked up while we were holding a delegation, then we do still need to revalidate that it still holds the same name. If there are multiple hard links to the same file, then all the hard links need validation. Reported-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Tested-by: Benjamin Coddington <bcodding@redhat.com> [Anna: Put nfs_unset_verifier_delegated() under CONFIG_NFS_V4] Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
#
65f51603 |
|
26-Jan-2020 |
Trond Myklebust <trondmy@gmail.com> |
NFS: nfs_find_open_context() should use cred_fscmp() We want to find open contexts that match our filesystem access properties. They don't have to exactly match the cred. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
#
c74dfe97 |
|
06-Jan-2020 |
Trond Myklebust <trondmy@gmail.com> |
NFS: Add mount option 'softreval' Add a mount option 'softreval' that allows attribute revalidation 'getattr' calls to time out, and causes them to fall back to using the cached attributes. The use case for this option is for ensuring that we can still (slowly) traverse paths and use cached information even when the server is down. Once the server comes back up again, the getattr calls start succeeding, and the caches will revalidate as usual. The 'softreval' mount option is automatically enabled if you have specified 'softerr'. It can be turned off using the options 'nosoftreval', or 'hard'. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
#
e86d5a02 |
|
04-Oct-2019 |
Trond Myklebust <trond.myklebust@hammerspace.com> |
NFS: Convert struct nfs_fattr to use struct timespec64 NFSv4 supports 64-bit times, so we should switch to using struct timespec64 when decoding attributes. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
#
eb3d8f42 |
|
28-Aug-2019 |
Trond Myklebust <trond.myklebust@hammerspace.com> |
NFS: Fix inode fileid checks in attribute revalidation code We want to throw out the attrbute if it refers to the mounted on fileid, and not the real fileid. However we do not want to block cache consistency updates from NFSv4 writes. Reported-by: Murphy Zhou <jencce.kernel@gmail.com> Fixes: 7e10cc25bfa0 ("NFS: Don't refresh attributes with mounted-on-file...") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
#
7e10cc25 |
|
08-Aug-2019 |
Trond Myklebust <trond.myklebust@hammerspace.com> |
NFS: Don't refresh attributes with mounted-on-file information If we've been given the attributes of the mounted-on-file, then do not use those to check or update the attributes on the application-visible inode. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
#
1c341b77 |
|
22-May-2019 |
Trond Myklebust <trond.myklebust@hammerspace.com> |
NFS: Add deferred cache invalidation for close-to-open consistency violations If the client detects that close-to-open cache consistency has been violated, and that the file or directory has been changed on the server, then do a cache invalidation when we're done working with the file. The reason we don't do an immediate cache invalidation is that we want to avoid performance problems due to false positives. Also, note that we cannot guarantee cache consistency in this situation even if we do invalidate the cache. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
#
10b7a70c |
|
06-Feb-2019 |
Trond Myklebust <trond.myklebust@hammerspace.com> |
NFS: Cleanup - add nfs_clients_exit to mirror nfs_clients_init Add a helper to clean up the struct nfs_net when it is being destroyed. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
#
996bc4f4 |
|
24-Jan-2019 |
Trond Myklebust <trond.myklebust@hammerspace.com> |
NFS: Create a root NFS directory in /sys/fs/nfs Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
#
44942b4e |
|
27-Jun-2019 |
Trond Myklebust <trond.myklebust@hammerspace.com> |
NFSv4: Handle the special Linux file open access mode According to the open() manpage, Linux reserves the access mode 3 to mean "check for read and write permission on the file and return a file descriptor that can't be used for reading or writing." Currently, the NFSv4 code will ask the server to open the file, and will use an incorrect share access mode of 0. Since it has an incorrect share access mode, the client later forgets to send a corresponding close, meaning it can leak stateids on the server. Fixes: ce4ef7c0a8a05 ("NFS: Split out NFS v4 file operations") Cc: stable@vger.kernel.org # 3.6+ Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
#
09c434b8 |
|
19-May-2019 |
Thomas Gleixner <tglx@linutronix.de> |
treewide: Add SPDX license identifier for more missed files Add SPDX license identifiers to all files which: - Have no license information of any form - Have MODULE_LICENCE("GPL*") inside which was used in the initial scan/conversion to ignore the file These files fall under the project license, GPL v2 only. The resulting SPDX license identifier is: GPL-2.0-only Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
ca1a199e |
|
15-Apr-2019 |
Al Viro <viro@zeniv.linux.org.uk> |
nfs{,4}: switch to ->free_inode() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
15494511 |
|
07-Apr-2019 |
Trond Myklebust <trondmy@gmail.com> |
NFS: Ensure that all nfs lock contexts have a valid open context Force the lock context to keep a reference to the parent open context so that we can guarantee the validity of the latter. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
#
302fad7b |
|
18-Feb-2019 |
Trond Myklebust <trond.myklebust@hammerspace.com> |
NFS: Fix up documentation warnings Fix up some compiler warnings about function parameters, etc not being correctly described or formatted. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
#
a52458b4 |
|
02-Dec-2018 |
NeilBrown <neilb@suse.com> |
NFS/NFSD/SUNRPC: replace generic creds with 'struct cred'. SUNRPC has two sorts of credentials, both of which appear as "struct rpc_cred". There are "generic credentials" which are supplied by clients such as NFS and passed in 'struct rpc_message' to indicate which user should be used to authorize the request, and there are low-level credentials such as AUTH_NULL, AUTH_UNIX, AUTH_GSS which describe the credential to be sent over the wires. This patch replaces all the generic credentials by 'struct cred' pointers - the credential structure used throughout Linux. For machine credentials, there is a special 'struct cred *' pointer which is statically allocated and recognized where needed as having a special meaning. A look-up of a low-level cred will map this to a machine credential. Signed-off-by: NeilBrown <neilb@suse.com> Acked-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
#
ddf529ee |
|
02-Dec-2018 |
NeilBrown <neilb@suse.com> |
NFS: move credential expiry tracking out of SUNRPC into NFS. NFS needs to know when a credential is about to expire so that it can modify write-back behaviour to finish the write inside the expiry time. It currently uses functions in SUNRPC code which make use of a fairly complex callback scheme and flags in the generic credientials. As I am working to discard the generic credentials, this has to change. This patch moves the logic into NFS, in part by finding and caching the low-level credential in the open_context. We then make direct cred-api calls on that. This makes the code much simpler and removes a dependency on generic rpc credentials. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
#
0de43976 |
|
02-Sep-2018 |
Trond Myklebust <trond.myklebust@hammerspace.com> |
NFS: Convert lookups of the open context to RCU Reduce contention on the inode->i_lock by ensuring that we use RCU when looking up the NFS open context. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
#
6ba0c4e5 |
|
02-Sep-2018 |
Trond Myklebust <trond.myklebust@hammerspace.com> |
NFS: Simplify internal check for whether file is open for write Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
#
1db97eaa |
|
02-Sep-2018 |
Trond Myklebust <trond.myklebust@hammerspace.com> |
NFS: Convert lookups of the lock context to RCU Speed up lookups of an existing lock context by avoiding the inode->i_lock, and using RCU instead. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
#
95582b00 |
|
08-May-2018 |
Deepa Dinamani <deepa.kernel@gmail.com> |
vfs: change inode times to use struct timespec64 struct timespec is not y2038 safe. Transition vfs to use y2038 safe struct timespec64 instead. The change was made with the help of the following cocinelle script. This catches about 80% of the changes. All the header file and logic changes are included in the first 5 rules. The rest are trivial substitutions. I avoid changing any of the function signatures or any other filesystem specific data structures to keep the patch simple for review. The script can be a little shorter by combining different cases. But, this version was sufficient for my usecase. virtual patch @ depends on patch @ identifier now; @@ - struct timespec + struct timespec64 current_time ( ... ) { - struct timespec now = current_kernel_time(); + struct timespec64 now = current_kernel_time64(); ... - return timespec_trunc( + return timespec64_trunc( ... ); } @ depends on patch @ identifier xtime; @@ struct \( iattr \| inode \| kstat \) { ... - struct timespec xtime; + struct timespec64 xtime; ... } @ depends on patch @ identifier t; @@ struct inode_operations { ... int (*update_time) (..., - struct timespec t, + struct timespec64 t, ...); ... } @ depends on patch @ identifier t; identifier fn_update_time =~ "update_time$"; @@ fn_update_time (..., - struct timespec *t, + struct timespec64 *t, ...) { ... } @ depends on patch @ identifier t; @@ lease_get_mtime( ... , - struct timespec *t + struct timespec64 *t ) { ... } @te depends on patch forall@ identifier ts; local idexpression struct inode *inode_node; identifier i_xtime =~ "^i_[acm]time$"; identifier ia_xtime =~ "^ia_[acm]time$"; identifier fn_update_time =~ "update_time$"; identifier fn; expression e, E3; local idexpression struct inode *node1; local idexpression struct inode *node2; local idexpression struct iattr *attr1; local idexpression struct iattr *attr2; local idexpression struct iattr attr; identifier i_xtime1 =~ "^i_[acm]time$"; identifier i_xtime2 =~ "^i_[acm]time$"; identifier ia_xtime1 =~ "^ia_[acm]time$"; identifier ia_xtime2 =~ "^ia_[acm]time$"; @@ ( ( - struct timespec ts; + struct timespec64 ts; | - struct timespec ts = current_time(inode_node); + struct timespec64 ts = current_time(inode_node); ) <+... when != ts ( - timespec_equal(&inode_node->i_xtime, &ts) + timespec64_equal(&inode_node->i_xtime, &ts) | - timespec_equal(&ts, &inode_node->i_xtime) + timespec64_equal(&ts, &inode_node->i_xtime) | - timespec_compare(&inode_node->i_xtime, &ts) + timespec64_compare(&inode_node->i_xtime, &ts) | - timespec_compare(&ts, &inode_node->i_xtime) + timespec64_compare(&ts, &inode_node->i_xtime) | ts = current_time(e) | fn_update_time(..., &ts,...) | inode_node->i_xtime = ts | node1->i_xtime = ts | ts = inode_node->i_xtime | <+... attr1->ia_xtime ...+> = ts | ts = attr1->ia_xtime | ts.tv_sec | ts.tv_nsec | btrfs_set_stack_timespec_sec(..., ts.tv_sec) | btrfs_set_stack_timespec_nsec(..., ts.tv_nsec) | - ts = timespec64_to_timespec( + ts = ... -) | - ts = ktime_to_timespec( + ts = ktime_to_timespec64( ...) | - ts = E3 + ts = timespec_to_timespec64(E3) | - ktime_get_real_ts(&ts) + ktime_get_real_ts64(&ts) | fn(..., - ts + timespec64_to_timespec(ts) ,...) ) ...+> ( <... when != ts - return ts; + return timespec64_to_timespec(ts); ...> ) | - timespec_equal(&node1->i_xtime1, &node2->i_xtime2) + timespec64_equal(&node1->i_xtime2, &node2->i_xtime2) | - timespec_equal(&node1->i_xtime1, &attr2->ia_xtime2) + timespec64_equal(&node1->i_xtime2, &attr2->ia_xtime2) | - timespec_compare(&node1->i_xtime1, &node2->i_xtime2) + timespec64_compare(&node1->i_xtime1, &node2->i_xtime2) | node1->i_xtime1 = - timespec_trunc(attr1->ia_xtime1, + timespec64_trunc(attr1->ia_xtime1, ...) | - attr1->ia_xtime1 = timespec_trunc(attr2->ia_xtime2, + attr1->ia_xtime1 = timespec64_trunc(attr2->ia_xtime2, ...) | - ktime_get_real_ts(&attr1->ia_xtime1) + ktime_get_real_ts64(&attr1->ia_xtime1) | - ktime_get_real_ts(&attr.ia_xtime1) + ktime_get_real_ts64(&attr.ia_xtime1) ) @ depends on patch @ struct inode *node; struct iattr *attr; identifier fn; identifier i_xtime =~ "^i_[acm]time$"; identifier ia_xtime =~ "^ia_[acm]time$"; expression e; @@ ( - fn(node->i_xtime); + fn(timespec64_to_timespec(node->i_xtime)); | fn(..., - node->i_xtime); + timespec64_to_timespec(node->i_xtime)); | - e = fn(attr->ia_xtime); + e = fn(timespec64_to_timespec(attr->ia_xtime)); ) @ depends on patch forall @ struct inode *node; struct iattr *attr; identifier i_xtime =~ "^i_[acm]time$"; identifier ia_xtime =~ "^ia_[acm]time$"; identifier fn; @@ { + struct timespec ts; <+... ( + ts = timespec64_to_timespec(node->i_xtime); fn (..., - &node->i_xtime, + &ts, ...); | + ts = timespec64_to_timespec(attr->ia_xtime); fn (..., - &attr->ia_xtime, + &ts, ...); ) ...+> } @ depends on patch forall @ struct inode *node; struct iattr *attr; struct kstat *stat; identifier ia_xtime =~ "^ia_[acm]time$"; identifier i_xtime =~ "^i_[acm]time$"; identifier xtime =~ "^[acm]time$"; identifier fn, ret; @@ { + struct timespec ts; <+... ( + ts = timespec64_to_timespec(node->i_xtime); ret = fn (..., - &node->i_xtime, + &ts, ...); | + ts = timespec64_to_timespec(node->i_xtime); ret = fn (..., - &node->i_xtime); + &ts); | + ts = timespec64_to_timespec(attr->ia_xtime); ret = fn (..., - &attr->ia_xtime, + &ts, ...); | + ts = timespec64_to_timespec(attr->ia_xtime); ret = fn (..., - &attr->ia_xtime); + &ts); | + ts = timespec64_to_timespec(stat->xtime); ret = fn (..., - &stat->xtime); + &ts); ) ...+> } @ depends on patch @ struct inode *node; struct inode *node2; identifier i_xtime1 =~ "^i_[acm]time$"; identifier i_xtime2 =~ "^i_[acm]time$"; identifier i_xtime3 =~ "^i_[acm]time$"; struct iattr *attrp; struct iattr *attrp2; struct iattr attr ; identifier ia_xtime1 =~ "^ia_[acm]time$"; identifier ia_xtime2 =~ "^ia_[acm]time$"; struct kstat *stat; struct kstat stat1; struct timespec64 ts; identifier xtime =~ "^[acmb]time$"; expression e; @@ ( ( node->i_xtime2 \| attrp->ia_xtime2 \| attr.ia_xtime2 \) = node->i_xtime1 ; | node->i_xtime2 = \( node2->i_xtime1 \| timespec64_trunc(...) \); | node->i_xtime2 = node->i_xtime1 = node->i_xtime3 = \(ts \| current_time(...) \); | node->i_xtime1 = node->i_xtime3 = \(ts \| current_time(...) \); | stat->xtime = node2->i_xtime1; | stat1.xtime = node2->i_xtime1; | ( node->i_xtime2 \| attrp->ia_xtime2 \) = attrp->ia_xtime1 ; | ( attrp->ia_xtime1 \| attr.ia_xtime1 \) = attrp2->ia_xtime2; | - e = node->i_xtime1; + e = timespec64_to_timespec( node->i_xtime1 ); | - e = attrp->ia_xtime1; + e = timespec64_to_timespec( attrp->ia_xtime1 ); | node->i_xtime1 = current_time(...); | node->i_xtime2 = node->i_xtime1 = node->i_xtime3 = - e; + timespec_to_timespec64(e); | node->i_xtime1 = node->i_xtime3 = - e; + timespec_to_timespec64(e); | - node->i_xtime1 = e; + node->i_xtime1 = timespec_to_timespec64(e); ) Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Cc: <anton@tuxera.com> Cc: <balbi@kernel.org> Cc: <bfields@fieldses.org> Cc: <darrick.wong@oracle.com> Cc: <dhowells@redhat.com> Cc: <dsterba@suse.com> Cc: <dwmw2@infradead.org> Cc: <hch@lst.de> Cc: <hirofumi@mail.parknet.co.jp> Cc: <hubcap@omnibond.com> Cc: <jack@suse.com> Cc: <jaegeuk@kernel.org> Cc: <jaharkes@cs.cmu.edu> Cc: <jslaby@suse.com> Cc: <keescook@chromium.org> Cc: <mark@fasheh.com> Cc: <miklos@szeredi.hu> Cc: <nico@linaro.org> Cc: <reiserfs-devel@vger.kernel.org> Cc: <richard@nod.at> Cc: <sage@redhat.com> Cc: <sfrench@samba.org> Cc: <swhiteho@redhat.com> Cc: <tj@kernel.org> Cc: <trond.myklebust@primarydata.com> Cc: <tytso@mit.edu> Cc: <viro@zeniv.linux.org.uk>
|
#
3f0b3cf4 |
|
03-Jun-2018 |
Trond Myklebust <trond.myklebust@hammerspace.com> |
NFS: Filter cache invalidation when holding a delegation If the client holds a delegation, then ensure we filter out attempts to invalidate the size, owner, group owner, or mode unless we made the change, in which case, check that NFS_INO_REVAL_FORCED is set by the caller. Always filter out attempts to invalidate the change attribute and size, since we are authoritative for those. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
#
4ebe83af |
|
02-Jun-2018 |
Trond Myklebust <trond.myklebust@hammerspace.com> |
NFS: Ignore NFS_INO_REVAL_FORCED in nfs_check_inode_attributes() If we hold a delegation, we should not need to call nfs_check_inode_attributes() since we already know which attributes are valid, and which ones may still need revalidation. The state of the NFS_INO_REVAL_FORCED flag is therefore irrelevant. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
#
c80d17c5 |
|
03-Jun-2018 |
Trond Myklebust <trond.myklebust@hammerspace.com> |
NFS: Improve caching while holding a delegation Make sure that the client completely ignores change attribute and size changes on the server when it holds a delegation. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
#
0b467264 |
|
03-Jun-2018 |
Trond Myklebust <trond.myklebust@hammerspace.com> |
NFS: Fix attribute revalidation Don't mark attributes as invalid just because they have changed. Instead, for the purposes of adjusting the attribute cache timeout, keep a separate variable that tracks whether or not a change occurred. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
#
6a97d02d |
|
08-Apr-2018 |
Trond Myklebust <trond.myklebust@primarydata.com> |
NFS: fix up nfs_setattr_update_inode Always try to set the attributes, even if we don't have a valid struct nfs_fattr. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
#
a841b54d |
|
07-Apr-2018 |
Trond Myklebust <trond.myklebust@primarydata.com> |
NFS: Pass the inode down to the getattr() callback Allow the getattr() callback to check things like whether or not we hold a delegation so that it can adjust the attributes that it is asking for. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
#
d554168f |
|
29-May-2018 |
Trond Myklebust <trond.myklebust@hammerspace.com> |
NFS: Fix up nfs_post_op_update_inode() to force ctime updates We do not want to ignore ctime updates that originate from functions such as link(). Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
#
821a868a |
|
27-Mar-2018 |
Trond Myklebust <trond.myklebust@primarydata.com> |
NFS: Set the force revalidate flag if the inode is not completely initialised Ensure that a delegation doesn't cause us to skip initialising the inode if it was incomplete when we exited nfs_fhget() Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
#
0a2dfbec |
|
09-May-2018 |
Deepa Dinamani <deepa.kernel@gmail.com> |
fs: nfs: get rid of memcpys for inode times Subsequent patches in the series convert inode timestamps to use struct timespec64 instead of struct timespec as part of solving the y2038 problem. This will lead to type mismatch for memcpys. Use regular assignments instead. Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Cc: trond.myklebust@primarydata.com
|
#
f6cdfa6d |
|
27-Mar-2018 |
Trond Myklebust <trond.myklebust@primarydata.com> |
NFSv4: Declare the size up to date after it was set. When we've changed the file size, then ensure we declare it to be up to date in the inode attributes. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
#
d943f2dd |
|
20-Mar-2018 |
Trond Myklebust <trond.myklebust@primarydata.com> |
NFSv4: Ignore change attribute invalidations if we hold a delegation Don't bother even recording an invalid change attribute if we hold a delegation since we already know the state of our attribute cache. We can rely on the fact that we will pick up a copy from the server when we return the delegation. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
#
16e14375 |
|
20-Mar-2018 |
Trond Myklebust <trond.myklebust@primarydata.com> |
NFS: More fine grained attribute tracking Currently, if the NFS_INO_INVALID_ATTR flag is set, for instance by a call to nfs_post_op_update_inode_locked(), then it will not be cleared until all the attributes have been revalidated. This means, for instance, that NFSv4 writes will always force a full attribute revalidation. Track the ctime, mtime, size and change attribute separately from the other attributes so that we can have nfs_post_op_update_inode_locked() set them correctly, and later have the cache consistency bitmask be able to clear them. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
#
cac88f94 |
|
20-Mar-2018 |
Trond Myklebust <trond.myklebust@primarydata.com> |
NFS: Don't force unnecessary cache invalidation in nfs_update_inode() If we managed to revalidate all the attributes, then there is no reason to mark them as invalid again. We do, however want to ensure that we set nfsi->attrtimeo correctly. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
#
783b194c |
|
20-Mar-2018 |
Trond Myklebust <trond.myklebust@primarydata.com> |
NFS: Don't redirty the attribute cache in nfs_wcc_update_inode() If we received weak cache consistency data from the server, then those attributes are up to date, and there is no reason to mark them as dirty in the attribute cache. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
#
8619ddd0 |
|
20-Mar-2018 |
Trond Myklebust <trond.myklebust@primarydata.com> |
NFS: Don't force a revalidation of all attributes if change is missing Even if the change attribute is missing, it is still OK to mark the other attributes as being up to date. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
#
199366f0 |
|
20-Mar-2018 |
Trond Myklebust <trond.myklebust@primarydata.com> |
NFS: Move the delegation return down into _nfs4_do_setattr() Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
#
2f635cee |
|
27-Mar-2018 |
Kirill Tkhai <ktkhai@virtuozzo.com> |
net: Drop pernet_operations::async Synchronous pernet_operations are not allowed anymore. All are asynchronous. So, drop the structure member. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
723c921e |
|
15-Mar-2018 |
Peter Zijlstra <peterz@infradead.org> |
sched/wait, fs/nfs: Convert wait_on_atomic_t() usage to the new wait_var_event() API The old wait_on_atomic_t() is going to get removed, use the more flexible wait_var_event() API instead. No change in functionality. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Anna Schumaker <anna.schumaker@netapp.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
7300bd94 |
|
26-Feb-2018 |
Kirill Tkhai <ktkhai@virtuozzo.com> |
net: Convert nfs_net_ops These pernet_operations just create and destroy /proc entries and net_generic()->cb_ident_idr IDR. So, we are able to mark them async. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c472c07b |
|
01-Feb-2018 |
Goffredo Baroncelli <kreijack@inwind.it> |
iversion: Rename make inode_cmp_iversion{+raw} to inode_eq_iversion{+raw} The function inode_cmp_iversion{+raw} is counter-intuitive, because it returns true when the counters are different and false when these are equal. Rename it to inode_eq_iversion{+raw}, which will returns true when the counters are equal and false otherwise. Signed-off-by: Goffredo Baroncelli <kreijack@inwind.it> Signed-off-by: Jeff Layton <jlayton@redhat.com>
|
#
1eb5d98f |
|
09-Jan-2018 |
Jeff Layton <jlayton@kernel.org> |
nfs: convert to new i_version API For NFS, we just use the "raw" API since the i_version is mostly managed by the server. The exception there is when the client holds a write delegation, but we only need to bump it once there anyway to handle CB_GETATTR. Tested-by: Krzysztof Kozlowski <krzk@kernel.org> Signed-off-by: Jeff Layton <jlayton@redhat.com>
|
#
128159f2 |
|
28-Jan-2018 |
Trond Myklebust <trond.myklebust@primarydata.com> |
NFS: Remove a redundant call to unmap_mapping_range() We don't need to call unmap_mapping_range() prior to calling nfs_sync_mapping(). Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
#
9ccee940 |
|
04-Jan-2018 |
Trond Myklebust <trond.myklebust@primarydata.com> |
Support statx() mask and query flags parameters Support the query flags AT_STATX_FORCE_SYNC by forcing an attribute revalidation, and AT_STATX_DONT_SYNC by returning cached attributes only. Use the mask to optimise away server revalidation for attributes that are not being requested by the user. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
#
1751e8a6 |
|
27-Nov-2017 |
Linus Torvalds <torvalds@linux-foundation.org> |
Rename superblock flags (MS_xyz -> SB_xyz) This is a pure automated search-and-replace of the internal kernel superblock flags. The s_flags are now called SB_*, with the names and the values for the moment mirroring the MS_* flags that they're equivalent to. Note how the MS_xyz flags are the ones passed to the mount system call, while the SB_xyz flags are what we then use in sb->s_flags. The script to do this was: # places to look in; re security/*: it generally should *not* be # touched (that stuff parses mount(2) arguments directly), but # there are two places where we really deal with superblock flags. FILES="drivers/mtd drivers/staging/lustre fs ipc mm \ include/linux/fs.h include/uapi/linux/bfs_fs.h \ security/apparmor/apparmorfs.c security/apparmor/include/lib.h" # the list of MS_... constants SYMS="RDONLY NOSUID NODEV NOEXEC SYNCHRONOUS REMOUNT MANDLOCK \ DIRSYNC NOATIME NODIRATIME BIND MOVE REC VERBOSE SILENT \ POSIXACL UNBINDABLE PRIVATE SLAVE SHARED RELATIME KERNMOUNT \ I_VERSION STRICTATIME LAZYTIME SUBMOUNT NOREMOTELOCK NOSEC BORN \ ACTIVE NOUSER" SED_PROG= for i in $SYMS; do SED_PROG="$SED_PROG -e s/MS_$i/SB_$i/g"; done # we want files that contain at least one of MS_..., # with fs/namespace.c and fs/pnode.c excluded. L=$(for i in $SYMS; do git grep -w -l MS_$i $FILES; done| sort|uniq|grep -v '^fs/namespace.c'|grep -v '^fs/pnode.c') for f in $L; do sed -i $f $SED_PROG; done Requested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
b0b5352d |
|
12-Nov-2017 |
Vasily Averin <vvs@virtuozzo.com> |
nfs client: exit_net cleanup check added Be sure that nfs_client_list and nfs_volume_list lists initialized in net_init hook were return to initial state in net_exit hook. Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
#
2f62b5aa |
|
19-Oct-2017 |
Elena Reshetova <elena.reshetova@intel.com> |
fs, nfs: convert nfs_lock_context.count from atomic_t to refcount_t atomic_t variables are currently used to implement reference counters with the following properties: - counter is initialized to 1 using atomic_set() - a resource is freed upon counter reaching zero - once counter reaches zero, its further increments aren't allowed - counter schema uses basic atomic operations (set, inc, inc_not_zero, dec_and_test, etc.) Such atomic variables should be converted to a newly provided refcount_t type and API that prevents accidental counter overflows and underflows. This is important since overflows and underflows can lead to use-after-free situation and be exploitable. The variable nfs_lock_context.count is used as pure reference counter. Convert it to refcount_t and fix up the operations. Suggested-by: Kees Cook <keescook@chromium.org> Reviewed-by: David Windsor <dwindsor@gmail.com> Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com> Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
#
5e4def20 |
|
02-Nov-2017 |
David Howells <dhowells@redhat.com> |
Pass mode to wait_on_atomic_t() action funcs and provide default actions Make wait_on_atomic_t() pass the TASK_* mode onto its action function as an extra argument and make it 'unsigned int throughout. Also, consolidate a bunch of identical action functions into a default function that can do the appropriate thing for the mode. Also, change the argument name in the bit_wait*() function declarations to reflect the fact that it's the mode and not the bit number. [Peter Z gives this a grudging ACK, but thinks that the whole atomic_t wait should be done differently, though he's not immediately sure as to how] Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Peter Zijlstra <peterz@infradead.org> cc: Ingo Molnar <mingo@kernel.org>
|
#
5cb953d4 |
|
01-Aug-2017 |
Trond Myklebust <trond.myklebust@primarydata.com> |
NFS: Use an atomic_long_t to count the number of commits Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
#
a6b6d5b8 |
|
01-Aug-2017 |
Trond Myklebust <trond.myklebust@primarydata.com> |
NFS: Use an atomic_long_t to count the number of requests Rather than forcing us to take the inode->i_lock just in order to bump the number. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
#
e824f99a |
|
01-Aug-2017 |
Trond Myklebust <trond.myklebust@primarydata.com> |
NFSv4: Use a mutex to protect the per-inode commit lists The commit lists can get very large, so using the inode->i_lock can end up affecting general metadata performance. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
#
f174ff7a |
|
29-Jun-2017 |
Peng Tao <tao.peng@primarydata.com> |
nfs: add a nfs_ilookup helper This helper will allow to find an existing NFS inode by the file handle and fattr. Signed-off-by: Peng Tao <tao.peng@primarydata.com> [hch: split from a larger patch] Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
#
26fde4df |
|
02-Jul-2017 |
NeilBrown <neilb@suse.com> |
NFS: check for nfs_refresh_inode() errors in nfs_fhget() If an NFS server returns a filehandle that we have previously seen, and reports a different type, then nfs_refresh_inode() will log a warning and return an error. nfs_fhget() does not check for this error and may return an inode with a different type than the one that the server reported. This is likely to cause confusion, and is one way that ->open_context() could return a directory inode as discussed in the previous patch. So if nfs_refresh_inode() returns and error, return that error from nfs_fhget() to avoid the confusion propagating. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
#
cc89684c |
|
04-Jul-2017 |
NeilBrown <neilb@suse.com> |
NFS: only invalidate dentrys that are clearly invalid. Since commit bafc9b754f75 ("vfs: More precise tests in d_invalidate") in v3.18, a return of '0' from ->d_revalidate() will cause the dentry to be invalidated even if it has filesystems mounted on or it or on a descendant. The mounted filesystem is unmounted. This means we need to be careful not to return 0 unless the directory referred to truly is invalid. So -ESTALE or -ENOENT should invalidate the directory. Other errors such a -EPERM or -ERESTARTSYS should be returned from ->d_revalidate() so they are propagated to the caller. A particular problem can be demonstrated by: 1/ mount an NFS filesystem using NFSv3 on /mnt 2/ mount any other filesystem on /mnt/foo 3/ ls /mnt/foo 4/ turn off network, or otherwise make the server unable to respond 5/ ls /mnt/foo & 6/ cat /proc/$!/stack # note that nfs_lookup_revalidate is in the call stack 7/ kill -9 $! # this results in -ERESTARTSYS being returned 8/ observe that /mnt/foo has been unmounted. This patch changes nfs_lookup_revalidate() to only treat -ESTALE from nfs_lookup_verify_inode() and -ESTALE or -ENOENT from ->lookup() as indicating an invalid inode. Other errors are returned. Also nfs_check_inode_attributes() is changed to return -ESTALE rather than -EIO. This is consistent with the error returned in similar circumstances from nfs_update_inode(). As this bug allows any user to unmount a filesystem mounted on an NFS filesystem, this fix is suitable for stable kernels. Fixes: bafc9b754f75 ("vfs: More precise tests in d_invalidate") Cc: stable@vger.kernel.org (v3.18+) Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
#
59b86d85 |
|
28-Apr-2017 |
Hou Tao <houtao1@huawei.com> |
NFS: always treat the invocation of nfs_getattr as cache hit when noac is on When using 'ls -l' to display a large directory, if noac option is used, in function nfs_getattr() nfs_need_revalidate_inode() will always be true for NFSv3 and the nfs_entry cache of the directory will be flushed. The flush will lead to a fully reread of the directory entries from server. To prevent the unnecessary RPCs, we need to check whether or not the noac option is used, and always report the invocation of nfs_getattr() as cache hit instead cache miss when it's on. Signed-off-by: Hou Tao <houtao1@huawei.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
#
a528d35e |
|
31-Jan-2017 |
David Howells <dhowells@redhat.com> |
statx: Add a system call to make enhanced file info available Add a system call to make extended file information available, including file creation and some attribute flags where available through the underlying filesystem. The getattr inode operation is altered to take two additional arguments: a u32 request_mask and an unsigned int flags that indicate the synchronisation mode. This change is propagated to the vfs_getattr*() function. Functions like vfs_stat() are now inline wrappers around new functions vfs_statx() and vfs_statx_fd() to reduce stack usage. ======== OVERVIEW ======== The idea was initially proposed as a set of xattrs that could be retrieved with getxattr(), but the general preference proved to be for a new syscall with an extended stat structure. A number of requests were gathered for features to be included. The following have been included: (1) Make the fields a consistent size on all arches and make them large. (2) Spare space, request flags and information flags are provided for future expansion. (3) Better support for the y2038 problem [Arnd Bergmann] (tv_sec is an __s64). (4) Creation time: The SMB protocol carries the creation time, which could be exported by Samba, which will in turn help CIFS make use of FS-Cache as that can be used for coherency data (stx_btime). This is also specified in NFSv4 as a recommended attribute and could be exported by NFSD [Steve French]. (5) Lightweight stat: Ask for just those details of interest, and allow a netfs (such as NFS) to approximate anything not of interest, possibly without going to the server [Trond Myklebust, Ulrich Drepper, Andreas Dilger] (AT_STATX_DONT_SYNC). (6) Heavyweight stat: Force a netfs to go to the server, even if it thinks its cached attributes are up to date [Trond Myklebust] (AT_STATX_FORCE_SYNC). And the following have been left out for future extension: (7) Data version number: Could be used by userspace NFS servers [Aneesh Kumar]. Can also be used to modify fill_post_wcc() in NFSD which retrieves i_version directly, but has just called vfs_getattr(). It could get it from the kstat struct if it used vfs_xgetattr() instead. (There's disagreement on the exact semantics of a single field, since not all filesystems do this the same way). (8) BSD stat compatibility: Including more fields from the BSD stat such as creation time (st_btime) and inode generation number (st_gen) [Jeremy Allison, Bernd Schubert]. (9) Inode generation number: Useful for FUSE and userspace NFS servers [Bernd Schubert]. (This was asked for but later deemed unnecessary with the open-by-handle capability available and caused disagreement as to whether it's a security hole or not). (10) Extra coherency data may be useful in making backups [Andreas Dilger]. (No particular data were offered, but things like last backup timestamp, the data version number and the DOS archive bit would come into this category). (11) Allow the filesystem to indicate what it can/cannot provide: A filesystem can now say it doesn't support a standard stat feature if that isn't available, so if, for instance, inode numbers or UIDs don't exist or are fabricated locally... (This requires a separate system call - I have an fsinfo() call idea for this). (12) Store a 16-byte volume ID in the superblock that can be returned in struct xstat [Steve French]. (Deferred to fsinfo). (13) Include granularity fields in the time data to indicate the granularity of each of the times (NFSv4 time_delta) [Steve French]. (Deferred to fsinfo). (14) FS_IOC_GETFLAGS value. These could be translated to BSD's st_flags. Note that the Linux IOC flags are a mess and filesystems such as Ext4 define flags that aren't in linux/fs.h, so translation in the kernel may be a necessity (or, possibly, we provide the filesystem type too). (Some attributes are made available in stx_attributes, but the general feeling was that the IOC flags were to ext[234]-specific and shouldn't be exposed through statx this way). (15) Mask of features available on file (eg: ACLs, seclabel) [Brad Boyer, Michael Kerrisk]. (Deferred, probably to fsinfo. Finding out if there's an ACL or seclabal might require extra filesystem operations). (16) Femtosecond-resolution timestamps [Dave Chinner]. (A __reserved field has been left in the statx_timestamp struct for this - if there proves to be a need). (17) A set multiple attributes syscall to go with this. =============== NEW SYSTEM CALL =============== The new system call is: int ret = statx(int dfd, const char *filename, unsigned int flags, unsigned int mask, struct statx *buffer); The dfd, filename and flags parameters indicate the file to query, in a similar way to fstatat(). There is no equivalent of lstat() as that can be emulated with statx() by passing AT_SYMLINK_NOFOLLOW in flags. There is also no equivalent of fstat() as that can be emulated by passing a NULL filename to statx() with the fd of interest in dfd. Whether or not statx() synchronises the attributes with the backing store can be controlled by OR'ing a value into the flags argument (this typically only affects network filesystems): (1) AT_STATX_SYNC_AS_STAT tells statx() to behave as stat() does in this respect. (2) AT_STATX_FORCE_SYNC will require a network filesystem to synchronise its attributes with the server - which might require data writeback to occur to get the timestamps correct. (3) AT_STATX_DONT_SYNC will suppress synchronisation with the server in a network filesystem. The resulting values should be considered approximate. mask is a bitmask indicating the fields in struct statx that are of interest to the caller. The user should set this to STATX_BASIC_STATS to get the basic set returned by stat(). It should be noted that asking for more information may entail extra I/O operations. buffer points to the destination for the data. This must be 256 bytes in size. ====================== MAIN ATTRIBUTES RECORD ====================== The following structures are defined in which to return the main attribute set: struct statx_timestamp { __s64 tv_sec; __s32 tv_nsec; __s32 __reserved; }; struct statx { __u32 stx_mask; __u32 stx_blksize; __u64 stx_attributes; __u32 stx_nlink; __u32 stx_uid; __u32 stx_gid; __u16 stx_mode; __u16 __spare0[1]; __u64 stx_ino; __u64 stx_size; __u64 stx_blocks; __u64 __spare1[1]; struct statx_timestamp stx_atime; struct statx_timestamp stx_btime; struct statx_timestamp stx_ctime; struct statx_timestamp stx_mtime; __u32 stx_rdev_major; __u32 stx_rdev_minor; __u32 stx_dev_major; __u32 stx_dev_minor; __u64 __spare2[14]; }; The defined bits in request_mask and stx_mask are: STATX_TYPE Want/got stx_mode & S_IFMT STATX_MODE Want/got stx_mode & ~S_IFMT STATX_NLINK Want/got stx_nlink STATX_UID Want/got stx_uid STATX_GID Want/got stx_gid STATX_ATIME Want/got stx_atime{,_ns} STATX_MTIME Want/got stx_mtime{,_ns} STATX_CTIME Want/got stx_ctime{,_ns} STATX_INO Want/got stx_ino STATX_SIZE Want/got stx_size STATX_BLOCKS Want/got stx_blocks STATX_BASIC_STATS [The stuff in the normal stat struct] STATX_BTIME Want/got stx_btime{,_ns} STATX_ALL [All currently available stuff] stx_btime is the file creation time, stx_mask is a bitmask indicating the data provided and __spares*[] are where as-yet undefined fields can be placed. Time fields are structures with separate seconds and nanoseconds fields plus a reserved field in case we want to add even finer resolution. Note that times will be negative if before 1970; in such a case, the nanosecond fields will also be negative if not zero. The bits defined in the stx_attributes field convey information about a file, how it is accessed, where it is and what it does. The following attributes map to FS_*_FL flags and are the same numerical value: STATX_ATTR_COMPRESSED File is compressed by the fs STATX_ATTR_IMMUTABLE File is marked immutable STATX_ATTR_APPEND File is append-only STATX_ATTR_NODUMP File is not to be dumped STATX_ATTR_ENCRYPTED File requires key to decrypt in fs Within the kernel, the supported flags are listed by: KSTAT_ATTR_FS_IOC_FLAGS [Are any other IOC flags of sufficient general interest to be exposed through this interface?] New flags include: STATX_ATTR_AUTOMOUNT Object is an automount trigger These are for the use of GUI tools that might want to mark files specially, depending on what they are. Fields in struct statx come in a number of classes: (0) stx_dev_*, stx_blksize. These are local system information and are always available. (1) stx_mode, stx_nlinks, stx_uid, stx_gid, stx_[amc]time, stx_ino, stx_size, stx_blocks. These will be returned whether the caller asks for them or not. The corresponding bits in stx_mask will be set to indicate whether they actually have valid values. If the caller didn't ask for them, then they may be approximated. For example, NFS won't waste any time updating them from the server, unless as a byproduct of updating something requested. If the values don't actually exist for the underlying object (such as UID or GID on a DOS file), then the bit won't be set in the stx_mask, even if the caller asked for the value. In such a case, the returned value will be a fabrication. Note that there are instances where the type might not be valid, for instance Windows reparse points. (2) stx_rdev_*. This will be set only if stx_mode indicates we're looking at a blockdev or a chardev, otherwise will be 0. (3) stx_btime. Similar to (1), except this will be set to 0 if it doesn't exist. ======= TESTING ======= The following test program can be used to test the statx system call: samples/statx/test-statx.c Just compile and run, passing it paths to the files you want to examine. The file is built automatically if CONFIG_SAMPLES is enabled. Here's some example output. Firstly, an NFS directory that crosses to another FSID. Note that the AUTOMOUNT attribute is set because transiting this directory will cause d_automount to be invoked by the VFS. [root@andromeda ~]# /tmp/test-statx -A /warthog/data statx(/warthog/data) = 0 results=7ff Size: 4096 Blocks: 8 IO Block: 1048576 directory Device: 00:26 Inode: 1703937 Links: 125 Access: (3777/drwxrwxrwx) Uid: 0 Gid: 4041 Access: 2016-11-24 09:02:12.219699527+0000 Modify: 2016-11-17 10:44:36.225653653+0000 Change: 2016-11-17 10:44:36.225653653+0000 Attributes: 0000000000001000 (-------- -------- -------- -------- -------- -------- ---m---- --------) Secondly, the result of automounting on that directory. [root@andromeda ~]# /tmp/test-statx /warthog/data statx(/warthog/data) = 0 results=7ff Size: 4096 Blocks: 8 IO Block: 1048576 directory Device: 00:27 Inode: 2 Links: 125 Access: (3777/drwxrwxrwx) Uid: 0 Gid: 4041 Access: 2016-11-24 09:02:12.219699527+0000 Modify: 2016-11-17 10:44:36.225653653+0000 Change: 2016-11-17 10:44:36.225653653+0000 Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
174cd4b1 |
|
02-Feb-2017 |
Ingo Molnar <mingo@kernel.org> |
sched/headers: Prepare to move signal wakeup & sigpending methods from <linux/sched.h> into <linux/sched/signal.h> Fix up affected files that include this signal functionality via sched.h. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
7c0f6ba6 |
|
24-Dec-2016 |
Linus Torvalds <torvalds@linux-foundation.org> |
Replace <asm/uaccess.h> with <linux/uaccess.h> globally This was entirely automated, using the script by Al: PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>' sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \ $(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h) to do the replacement at the end of the merge window. Requested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
a5f925bc |
|
19-Dec-2016 |
Trond Myklebust <trond.myklebust@primarydata.com> |
NFS: Don't abuse NFS_INO_REVAL_FORCED in nfs_post_op_update_inode_locked() The NFS_INO_REVAL_FORCED flag now really only has meaning for the case when we've just been handed a delegation for a file that was already cached, and we're unsure about that cache. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
#
187e593d |
|
16-Dec-2016 |
Trond Myklebust <trond.myklebust@primarydata.com> |
NFS: Clean up nfs_attribute_timeout() It can be made static. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
#
3f642a13 |
|
16-Dec-2016 |
Trond Myklebust <trond.myklebust@primarydata.com> |
NFS: Remove unused function nfs_revalidate_inode_rcu() Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
#
61540bf6 |
|
08-Dec-2016 |
Trond Myklebust <trond.myklebust@primarydata.com> |
NFS: Clean up cache validity checking Consolidate the open-coded checking of NFS_I(inode)->cache_validity into a couple of helper functions. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
#
58ff4184 |
|
16-Dec-2016 |
Trond Myklebust <trond.myklebust@primarydata.com> |
NFS: Don't revalidate the file on close if we hold a delegation If we're holding a delegation, we can skip sending the close-to-open GETATTR until we're returning that delegation. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
#
1cd9cb05 |
|
04-Dec-2016 |
Trond Myklebust <trond.myklebust@primarydata.com> |
NFS: Only look at the change attribute cache state in nfs_check_verifier When looking at whether or not our dcache is valid, we really don't care about the general state of the directory attribute cache. Instead, we we only care about the state of the change attribute. This fixes a performance issue when the client is responsible for changing the directory contents; a number of NFSv4 operations will atomically update the directory change attribute, but may not return all the other attributes. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
#
10727772 |
|
04-Dec-2016 |
Trond Myklebust <trond.myklebust@primarydata.com> |
NFS: Fix incorrect mapping revalidation when holding a delegation We should only care about checking the attributes if the page cache is marked as dubious (using NFS_INO_REVAL_PAGECACHE) and the NFS_INO_REVAL_FORCED flag is set. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
#
1bcf4c5c |
|
02-Dec-2016 |
Trond Myklebust <trond.myklebust@primarydata.com> |
NFS: Allow getattr to also report readdirplus cache hits If the use called stat() on an 'ls -l' workload, and the attribute cache was successfully revalidate by READDIRPLUS, then we want to report that back so that the readdir code continues to use readdirplus. Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Tested-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
#
d51fdb87 |
|
12-Oct-2016 |
NeilBrown <neilb@suse.com> |
NFS: discard nfs_lockowner structure. It now has only one field and is only used in one structure. So replaced it in that structure by the field it contains. Signed-off-by: NeilBrown <neilb@suse.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
#
532d4def |
|
12-Oct-2016 |
NeilBrown <neilb@suse.com> |
NFSv4: add flock_owner to open context An open file description (struct file) in a given process can be associated with two different lock owners. It can have a Posix lock owner which will be different in each process that has a fd on the file. It can have a Flock owner which will be the same in all processes. When searching for a lock stateid to use, we need to consider both of these owners So add a new "flock_owner" to the "nfs_open_context" (of which there is one for each open file description). This flock_owner does not need to be reference-counted as there is a 1-1 relation between 'struct file' and nfs open contexts, and it will never be part of a list of contexts. So there is no need for a 'flock_context' - just the owner is enough. The io_count included in the (Posix) lock_context provides no guarantee that all read-aheads that could use the state have completed, so not supporting it for flock locks in not a serious problem. Synchronization between flock and read-ahead can be added later if needed. When creating an open_context for a non-openning create call, we don't have a 'struct file' to pass in, so the lock context gets initialized with a NULL owner, but this will never be used. The flock_owner is not used at all in this patch, that will come later. Acked-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: NeilBrown <neilb@suse.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
#
b184b5c3 |
|
12-Oct-2016 |
NeilBrown <neilb@suse.com> |
NFS: remove l_pid field from nfs_lockowner this field is not used in any important way and probably should have been removed by Commit: 8003d3c4aaa5 ("nfs4: treat lock owners as opaque values") which removed the pid argument from nfs4_get_lock_state. Except in unusual and uninteresting cases, two threads with the same ->tgid will have the same ->files pointer, so keeping them both for comparison brings no benefit. Acked-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: NeilBrown <neilb@suse.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
#
1ad13dbc |
|
27-Oct-2016 |
Trond Myklebust <trond.myklebust@primarydata.com> |
NFSv4: Optimise away forced revalidation when we know the attributes are OK The NFS_INO_REVAL_FORCED flag needs to be set if we just got a delegation, and we see that there might still be some ambiguity as to whether or not our attribute or data cache are valid. In practice, this means that a call to nfs_check_inode_attributes() will have noticed a discrepancy between cached attributes and measured ones, so let's move the setting of NFS_INO_REVAL_FORCED to there. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
#
c7d03a00 |
|
16-Nov-2016 |
Alexey Dobriyan <adobriyan@gmail.com> |
netns: make struct pernet_operations::id unsigned int Make struct pernet_operations::id unsigned. There are 2 reasons to do so: 1) This field is really an index into an zero based array and thus is unsigned entity. Using negative value is out-of-bound access by definition. 2) On x86_64 unsigned 32-bit data which are mixed with pointers via array indexing or offsets added or subtracted to pointers are preffered to signed 32-bit data. "int" being used as an array index needs to be sign-extended to 64-bit before being used. void f(long *p, int i) { g(p[i]); } roughly translates to movsx rsi, esi mov rdi, [rsi+...] call g MOVSX is 3 byte instruction which isn't necessary if the variable is unsigned because x86_64 is zero extending by default. Now, there is net_generic() function which, you guessed it right, uses "int" as an array index: static inline void *net_generic(const struct net *net, int id) { ... ptr = ng->ptr[id - 1]; ... } And this function is used a lot, so those sign extensions add up. Patch snipes ~1730 bytes on allyesconfig kernel (without all junk messing with code generation): add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730) Unfortunately some functions actually grow bigger. This is a semmingly random artefact of code generation with register allocator being used differently. gcc decides that some variable needs to live in new r8+ registers and every access now requires REX prefix. Or it is shifted into r12, so [r12+0] addressing mode has to be used which is longer than [r8] However, overall balance is in negative direction: add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730) function old new delta nfsd4_lock 3886 3959 +73 tipc_link_build_proto_msg 1096 1140 +44 mac80211_hwsim_new_radio 2776 2808 +32 tipc_mon_rcv 1032 1058 +26 svcauth_gss_legacy_init 1413 1429 +16 tipc_bcbase_select_primary 379 392 +13 nfsd4_exchange_id 1247 1260 +13 nfsd4_setclientid_confirm 782 793 +11 ... put_client_renew_locked 494 480 -14 ip_set_sockfn_get 730 716 -14 geneve_sock_add 829 813 -16 nfsd4_sequence_done 721 703 -18 nlmclnt_lookup_host 708 686 -22 nfsd4_lockt 1085 1063 -22 nfs_get_client 1077 1050 -27 tcf_bpf_init 1106 1076 -30 nfsd4_encode_fattr 5997 5930 -67 Total: Before=154856051, After=154854321, chg -0.00% Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
944171cb |
|
28-Jul-2016 |
Benjamin Coddington <bcodding@redhat.com> |
pNFS: Actively set attributes as invalid if LAYOUTCOMMIT is outstanding A LAYOUTCOMMIT then subsequent GETATTR may both return the same attributes, and in that case NFS_INO_INVALID_ATTR is never set on the second pass through nfs_update_inode(). The existing check to skip the clearing of NFS_INO_INVALID_ATTR if a LAYOUTCOMMIT is outstanding does not help in this case (see commit 10b7e9ad4488: "pNFS: Don't mark the inode as revalidated if a LAYOUTCOMMIT is outstanding"). We know that if a LAYOUTCOMMIT is outstanding then attributes will need upating, so always set NFS_INO_INVALID_ATTR. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
#
10b7e9ad |
|
17-Jul-2016 |
Trond Myklebust <trond.myklebust@primarydata.com> |
pNFS: Don't mark the inode as revalidated if a LAYOUTCOMMIT is outstanding We know that the attributes will need updating if there is still a LAYOUTCOMMIT outstanding. Reported-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
#
79566ef0 |
|
25-Jun-2016 |
Trond Myklebust <trond.myklebust@primarydata.com> |
NFS: Getattr doesn't require data sync semantics When retrieving stat() information, NFS unfortunately does require us to sync writes to disk in order to ensure that mtime and ctime are up to date. However we shouldn't have to ensure that those writes are persisted. Relaxing that requirement does mean that we may see an mtime/ctime change if the server reboots and forces us to replay all writes. The exception to this rule are pNFS clients that are required to send layoutcommit, however that is dealt with by the call to pnfs_sync_inode() in _nfs_revalidate_inode(). Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
#
651b0e70 |
|
25-Jun-2016 |
Trond Myklebust <trond.myklebust@primarydata.com> |
NFS: Do not aggressively cache file attributes in the case of O_DIRECT A file that is open for O_DIRECT is by definition not obeying close-to-open cache consistency semantics, so let's not cache the attributes too aggressively either. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
#
be527494 |
|
22-Jun-2016 |
Trond Myklebust <trond.myklebust@primarydata.com> |
NFS: Remove unused function nfs_revalidate_mapping_protected() Clean up... Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
#
ac46bd37 |
|
05-Jul-2016 |
Trond Myklebust <trond.myklebust@primarydata.com> |
pNFS: Ensure we layoutcommit before revalidating attributes If we need to update the cached attributes, then we'd better make sure that we also layoutcommit first. Otherwise, the server may have stale attributes. Prior to this patch, the revalidation code tried to "fix" this problem by simply disabling attributes that would be affected by the layoutcommit. That approach breaks nfs_writeback_check_extend(), leading to a file size corruption. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
#
916ec34d |
|
17-Jun-2016 |
Trond Myklebust <trond.myklebust@primarydata.com> |
NFS: Fix potential race in nfs_fhget() If we don't set the mode correctly in nfs_init_locked(), then there is potential for a race with a second call to nfs_fhget that will cause inode aliasing. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
#
ca0daa27 |
|
08-Jun-2016 |
Trond Myklebust <trond.myklebust@primarydata.com> |
NFS: Cache aggressively when file is open for writing Unless the user is using file locking, we must assume close-to-open cache consistency when the file is open for writing. Adjust the caching algorithm so that it does not clear the cache on out-of-order writes and/or attribute revalidations. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
#
38512aa98 |
|
07-Jun-2016 |
Trond Myklebust <trond.myklebust@primarydata.com> |
NFS: Don't flush caches for a getattr that races with writeback If there were outstanding writes then chalk up the unexpected change attribute on the server to them. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
#
884be175 |
|
28-Apr-2016 |
Al Viro <viro@zeniv.linux.org.uk> |
nfs: per-name sillyunlink exclusion use d_alloc_parallel() for sillyunlink/lookup exclusion and explicit rwsem (nfs_rmdir() being a writer and nfs_call_unlink() - a reader) for rmdir/sillyunlink one. That ought to make lookup/readdir/!O_CREAT atomic_open really parallel on NFS. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
be62a1a8 |
|
26-Mar-2016 |
Miklos Szeredi <mszeredi@redhat.com> |
nfs: use file_dentry() NFS may be used as lower layer of overlayfs and accessing f_path.dentry can lead to a crash. Fix by replacing direct access of file->f_path.dentry with the file_dentry() accessor, which will always return a native object. Fixes: 4bacc9c9234c ("overlayfs: Make f_path always point to the overlay and f_inode to the underlay") Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Tested-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Acked-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: <stable@vger.kernel.org> # v4.2 Cc: David Howells <dhowells@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk>
|
#
95d9f6c3 |
|
02-Mar-2016 |
Christoph Hellwig <hch@lst.de> |
nfs: remove nfs_inode_dio_wait Just call inode_dio_wait directly instead of through a pointless wrapper. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
#
5955102c |
|
22-Jan-2016 |
Al Viro <viro@zeniv.linux.org.uk> |
wrappers for ->i_mutex access parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested}, inode_foo(inode) being mutex_foo(&inode->i_mutex). Please, use those for access to ->i_mutex; over the coming cycle ->i_mutex will become rwsem, with ->lookup() done with it held only shared. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
5d097056 |
|
14-Jan-2016 |
Vladimir Davydov <vdavydov.dev@gmail.com> |
kmemcg: account certain kmem allocations to memcg Mark those kmem allocations that are known to be easily triggered from userspace as __GFP_ACCOUNT/SLAB_ACCOUNT, which makes them accounted to memcg. For the list, see below: - threadinfo - task_struct - task_delay_info - pid - cred - mm_struct - vm_area_struct and vm_region (nommu) - anon_vma and anon_vma_chain - signal_struct - sighand_struct - fs_struct - files_struct - fdtable and fdtable->full_fds_bits - dentry and external_name - inode for all filesystems. This is the most tedious part, because most filesystems overwrite the alloc_inode method. The list is far from complete, so feel free to add more objects. Nevertheless, it should be close to "account everything" approach and keep most workloads within bounds. Malevolent users will be able to breach the limit, but this was possible even with the former "account everything" approach (simply because it did not account everything in fact). [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Tejun Heo <tj@kernel.org> Cc: Greg Thelen <gthelen@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
210c7c17 |
|
06-Jan-2016 |
Benjamin Coddington <bcodding@redhat.com> |
NFS: Use wait_on_atomic_t() for unlock after readahead The use of wait_on_atomic_t() for waiting on I/O to complete before unlocking allows us to git rid of the NFS_IO_INPROGRESS flag, and thus the nfs_iocounter's flags member, and finally the nfs_iocounter altogether. The count of I/O is moved to the lock context, and the counter increment/decrement functions become simple enough to open-code. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> [Trond: Fix up conflict with existing function nfs_wait_atomic_killable()] Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
#
ade14a7d |
|
29-Dec-2015 |
Trond Myklebust <trond.myklebust@primarydata.com> |
NFS: Fix attribute cache revalidation If a NFSv4 client uses the cache_consistency_bitmask in order to request only information about the change attribute, timestamps and size, then it has not revalidated all attributes, and hence the attribute timeout timestamp should not be updated. Reported-by: Donald Buczek <buczek@molgen.mpg.de> Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
#
0bcbf039 |
|
05-Dec-2015 |
Peng Tao <tao.peng@primarydata.com> |
nfs: handle request add failure properly When we fail to queue a read page to IO descriptor, we need to clean it up otherwise it is hanging around preventing nfs module from being removed. When we fail to queue a write page to IO descriptor, we need to clean it up and also save the failure status to open context. Then at file close, we can try to write pages back again and drop the page if it fails to writeback in .launder_page, which will be done in the next patch. Signed-off-by: Peng Tao <tao.peng@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
#
dfd01f02 |
|
13-Dec-2015 |
Peter Zijlstra <peterz@infradead.org> |
sched/wait: Fix the signal handling fix Jan Stancek reported that I wrecked things for him by fixing things for Vladimir :/ His report was due to an UNINTERRUPTIBLE wait getting -EINTR, which should not be possible, however my previous patch made this possible by unconditionally checking signal_pending(). We cannot use current->state as was done previously, because the instruction after the store to that variable it can be changed. We must instead pass the initial state along and use that. Fixes: 68985633bccb ("sched/wait: Fix signal handling in bit wait helpers") Reported-by: Jan Stancek <jstancek@redhat.com> Reported-by: Chris Mason <clm@fb.com> Tested-by: Jan Stancek <jstancek@redhat.com> Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> Tested-by: Chris Mason <clm@fb.com> Reviewed-by: Paul Turner <pjt@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: tglx@linutronix.de Cc: Oleg Nesterov <oleg@redhat.com> Cc: hpa@zytor.com Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
0d0def49 |
|
17-Nov-2015 |
Al Viro <viro@zeniv.linux.org.uk> |
teach nfs_get_link() to work in RCU mode based upon the corresponding patch from Neil's March patchset, again with kmap-related horrors removed. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
21fc61c7 |
|
16-Nov-2015 |
Al Viro <viro@zeniv.linux.org.uk> |
don't put symlink bodies in pagecache into highmem kmap() in page_follow_link_light() needed to go - allowing to hold an arbitrary number of kmaps for long is a great way to deadlocking the system. new helper (inode_nohighmem(inode)) needs to be used for pagecache symlinks inodes; done for all in-tree cases. page_follow_link_light() instrumented to yell about anything missed. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
c812012f |
|
25-Nov-2015 |
Jeff Layton <jlayton@kernel.org> |
nfs: if we have no valid attrs, then don't declare the attribute cache valid If we pass in an empty nfs_fattr struct to nfs_update_inode, it will (correctly) not update any of the attributes, but it then clears the NFS_INO_INVALID_ATTR flag, which indicates that the attributes are up to date. Don't clear the flag if the fattr struct has no valid attrs to apply. Reviewed-by: Steve French <steve.french@primarydata.com> Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
#
616c3196 |
|
25-Nov-2015 |
Jeff Layton <jlayton@kernel.org> |
nfs: ensure that attrcache is revalidated after a SETATTR If we get no post-op attributes back from a SETATTR operation, then no attributes will of course be updated during the call to nfs_update_inode. We know however that the attributes are invalid at that point, since we just changed some of them. At the very least, the ctime will be bogus. If we get no post-op attributes back on the call, mark the attrcache invalid to reflect that fact. Reviewed-by: Steve French <steve.french@primarydata.com> Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
#
4eae5014 |
|
04-Sep-2015 |
Trond Myklebust <trond.myklebust@primarydata.com> |
Revert "NFS: Make close(2) asynchronous when closing NFS O_DIRECT files" This reverts commit f895c53f8ace3c3e49ebf9def90e63fc6d46d2bf. This commit causes a NFSv4 regression in that close()+unlink() can end up failing. The reason is that we no longer have a guarantee that the CLOSE has completed on the server, meaning that the subsequent call to REMOVE may fail with NFS4ERR_FILE_OPEN if the server implements Windows unlink() semantics. Reported-by: <Olga Kornievskaia <aglo@umich.edu> Cc: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
#
5cf9d706 |
|
04-Sep-2015 |
Trond Myklebust <trond.myklebust@primarydata.com> |
NFS: Optimise away the close-to-open getattr if there is no cached data If there is no cached data, then there is no need to track the file change attribute on close. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
#
ae57ca0f |
|
26-Aug-2015 |
Kinglong Mee <kinglongmee@gmail.com> |
NFS: Check size by inode_newsize_ok in nfs_setattr Set rlimit for NFS's files is useless right now. For local process's rlimit, it should be checked by nfs client. The same, CIFS also call inode_change_ok checking rlimit at its client in cifs_setattr_nounix() and cifs_setattr_unix(). v3, fix bad using of error Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
#
aaae3f00 |
|
20-Aug-2015 |
Trond Myklebust <trond.myklebust@primarydata.com> |
NFSv4: Force a post-op attribute update when holding a delegation If the ctime or mtime or change attribute have changed because of an operation we initiated, we should make sure that we force an attribute update. However we do not want to mark the page cache for revalidation. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Cc: stable@vger.kernel.org # v4.0+
|
#
7c2dad99 |
|
05-Aug-2015 |
Trond Myklebust <trond.myklebust@primarydata.com> |
NFS: Don't let the ctime override attribute barriers. Chuck reports seeing cases where a GETATTR that happens to race with an asynchronous WRITE is overriding the file size, despite the attribute barrier being set by the writeback code. The culprit turns out to be the check in nfs_ctime_need_update(), which sees that the ctime is newer than the cached ctime, and assumes that it is safe to override the attribute barrier. This patch removes that override, and ensures that attribute barriers are always respected. Reported-by: Chuck Lever <chuck.lever@oracle.com> Fixes: a08a8cd375db9 ("NFS: Add attribute update barriers to NFS writebacks") Cc: stable@vger.kernel.org # v4.0+ Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
#
aff8d8dc |
|
13-Jul-2015 |
Anna Schumaker <Anna.Schumaker@netapp.com> |
NFS: Remove nfs_release() And call nfs_file_clear_open_context() directly. This makes it obvious that nfs_file_release() will always return 0. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
#
cd812599 |
|
05-Jul-2015 |
Trond Myklebust <trond.myklebust@primarydata.com> |
NFS: Remove the "NFS_CAP_CHANGE_ATTR" capability Setting the change attribute has been mandatory for all NFS versions, since commit 3a1556e8662c ("NFSv2/v3: Simulate the change attribute"). We should therefore not have anything be conditional on it being set/unset. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
#
5c675d64 |
|
05-Jul-2015 |
Trond Myklebust <trond.myklebust@primarydata.com> |
NFS: Set NFS_INO_REVAL_PAGECACHE if the change attribute is uninitialised We can't allow caching of data until the change attribute has been initialised correctly. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
#
85a23cee |
|
05-Jul-2015 |
Trond Myklebust <trond.myklebust@primarydata.com> |
NFS: Don't revalidate the mapping if both size and change attr are up to date If we've ensured that the size and the change attribute are both correct, then there is no point in marking those attributes as needing revalidation again. Only do so if we know the size is incorrect and was not updated. Fixes: f2467b6f64da ("NFS: Clear NFS_INO_REVAL_PAGECACHE when...") Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
#
cd738ee9 |
|
30-Jun-2015 |
Kinglong Mee <kinglongmee@gmail.com> |
nfs: Remove unneeded micro checking of CONFIG_PROC_FS Have checking CONFIG_PROC_FS in include/linux/sunrpc/stats.h. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
#
7ef5ca4f |
|
07-May-2015 |
NeilBrown <neilb@suse.de> |
NFS: report more appropriate block size for directories. In glibc 2.21 (and several previous), a call to opendir() will result in a 32K (BUFSIZ*4) buffer being allocated and passed to getdents. However a call to fdopendir() results in an 'fstat' request to determine block size and a matching buffer allocated for subsequent use with getdents. This will typically be 1M. The first getdents call on an NFS directory will always use READDIR_PLUS (or NFSv4 equivalent) if available. Subsequent getdents calls only use this more expensive version if some 'stat' requests are made between the getdents calls. For this reason it is good to keep at least that first getdents call relatively short. When fdopendir() and readdir() is used on a large directory, it takes approximately 32 times as long to complete as using "opendir". Current versions of 'find' use fdopendir() and demonstrate this slowness. 'stat' on a directory currently returns the 'wsize'. This number has no meaning on directories. Actual READDIR requests are limited to ->dtsize, which itself is capped at 4 pages, coincidently the same as BUFSIZ*4. So this is a meaningful number to use as the blocksize on directories, and has the effect of making 'find' on large directories go a lot faster. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
#
c456aacf |
|
23-Apr-2015 |
Firo Yang <firogm@gmail.com> |
nfs: Remove unneeded casts in nfs Don't unnecessarily cast allocation return value in fs/nfs/inode.c::nfs_alloc_inode(). Signed-off-by: Firo Yang <firogm@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
#
ea96d1ec |
|
03-Apr-2015 |
Anna Schumaker <Anna.Schumaker@netapp.com> |
nfs: Fetch MOUNTED_ON_FILEID when updating an inode 2ef47eb1 (NFS: Fix use of nfs_attr_use_mounted_on_fileid()) was a good start to fixing a circular directory structure warning for NFS v4 "junctioned" mountpoints. Unfortunately, further testing continued to generate this error. My server is configured like this: anna@nfsd ~ % df Filesystem Size Used Avail Use% Mounted on /dev/vda1 9.1G 2.0G 6.5G 24% / /dev/vdc1 1014M 33M 982M 4% /exports /dev/vdc2 1014M 33M 982M 4% /exports/vol1 /dev/vdc3 1014M 33M 982M 4% /exports/vol1/vol2 anna@nfsd ~ % cat /etc/exports /exports/ *(rw,async,no_subtree_check,no_root_squash) /exports/vol1/ *(rw,async,no_subtree_check,no_root_squash) /exports/vol1/vol2 *(rw,async,no_subtree_check,no_root_squash) I've been running chown across the entire mountpoint twice in a row to hit this problem. The first run succeeds, but the second one fails with the circular directory warning along with: anna@client ~ % dmesg [Apr 3 14:28] NFS: server 192.168.100.204 error: fileid changed fsid 0:39: expected fileid 0x100080, got 0x80 WHere 0x80 is the mountpoint's fileid and 0x100080 is the mounted-on fileid. This patch fixes the issue by requesting an updated mounted-on fileid from the server during nfs_update_inode(), and then checking that the fileid stored in the nfs_inode matches either the fileid or mounted-on fileid returned by the server. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
#
9a51940b |
|
16-Mar-2015 |
Anna Schumaker <Anna.Schumaker@netapp.com> |
NFS: Don't zap caches on fallocate() This patch adds a GETATTR to the end of ALLOCATE and DEALLOCATE operations so we can set the updated inode size and change attribute directly. DEALLOCATE will still need to release pagecache pages, so nfs42_proc_deallocate() now calls truncate_pagecache_range() before contacting the server. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
#
2b0143b5 |
|
17-Mar-2015 |
David Howells <dhowells@redhat.com> |
VFS: normal filesystems (and lustre): d_inode() annotations that's the bulk of filesystem drivers dealing with inodes of their own Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
8c18d76b |
|
25-Mar-2015 |
Trond Myklebust <trond.myklebust@primarydata.com> |
NFS: Block new writes while syncing data in nfs_getattr() Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
#
9e1681c2 |
|
25-Mar-2015 |
Trond Myklebust <trond.myklebust@primarydata.com> |
NFSv4: Truncating file opens should also sync O_DIRECT writes We don't just want to sync out buffered writes, but also O_DIRECT ones. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
#
4d346bea |
|
25-Mar-2015 |
Trond Myklebust <trond.myklebust@primarydata.com> |
NFS: Add a helper to sync both O_DIRECT and buffered writes Then apply it to nfs_setattr() and nfs_getattr(). Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
#
ef070dcb |
|
02-Mar-2015 |
Trond Myklebust <trond.myklebust@primarydata.com> |
NFS: Don't write enable new pages while an invalidation is proceeding nfs_vm_page_mkwrite() should wait until the page cache invalidation is finished. This is the second patch in a 2 patch series to deprecate the NFS client's reliance on nfs_release_page() in the context of nfs_invalidate_mapping(). Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
#
874f9463 |
|
02-Mar-2015 |
Trond Myklebust <trond.myklebust@primarydata.com> |
NFS: Fix a regression in the read() syscall When invalidating the page cache for a regular file, we want to first sync all dirty data to disk and then call invalidate_inode_pages2(). The latter relies on nfs_launder_page() and nfs_release_page() to deal respectively with dirty pages, and unstable written pages. When commit 9590544694bec ("NFS: avoid deadlocks with loop-back mounted NFS filesystems.") changed the behaviour of nfs_release_page(), then it made it possible for invalidate_inode_pages2() to fail with an EBUSY. Unfortunately, that error is then propagated back to read(). Let's therefore work around the problem for now by protecting the call to sync the data and invalidate_inode_pages2() so that they are atomic w.r.t. the addition of new writes. Later on, we can revisit whether or not we still need nfs_launder_page() and nfs_release_page(). Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
#
3235b403 |
|
26-Feb-2015 |
Trond Myklebust <trond.myklebust@primarydata.com> |
NFSv4: Set a barrier in the update_changeattr() helper Ensure that we don't regress the changes that were made to the directory. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Tested-by: Chuck Lever <chuck.lever@oracle.com>
|
#
92d64e47 |
|
26-Feb-2015 |
Trond Myklebust <trond.myklebust@primarydata.com> |
NFS: Fix nfs_post_op_update_inode() to set an attribute barrier nfs_post_op_update_inode() is called after a self-induced attribute update. Ensure that it also sets the barrier. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Tested-by: Chuck Lever <chuck.lever@oracle.com>
|
#
00fb4c9f |
|
26-Feb-2015 |
Trond Myklebust <trond.myklebust@primarydata.com> |
NFS: Remove size hack in nfs_inode_attrs_need_update() Prior to this patch, we used to always OK attribute updates that extended the file size on the assumption that we might be performing writeback. Now that we have attribute barriers to protect the writeback related updates, we should remove this hack, as it can cause truncate() operations to apparently be reverted if/when a readahead or getattr RPC call races with our on-the-wire SETATTR. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Tested-by: Chuck Lever <chuck.lever@oracle.com>
|
#
8f8ba1d7 |
|
26-Feb-2015 |
Trond Myklebust <trond.myklebust@primarydata.com> |
NFSv4: Add attribute update barriers to delegreturn and pNFS layoutcommit Ensure that other operations that race with delegreturn and layoutcommit cannot revert the attribute updates that were made on the server. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Tested-by: Chuck Lever <chuck.lever@oracle.com>
|
#
a08a8cd3 |
|
26-Feb-2015 |
Trond Myklebust <trond.myklebust@primarydata.com> |
NFS: Add attribute update barriers to NFS writebacks Ensure that other operations that race with our write RPC calls cannot revert the file size updates that were made on the server. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Tested-by: Chuck Lever <chuck.lever@oracle.com>
|
#
f5062003 |
|
26-Feb-2015 |
Trond Myklebust <trond.myklebust@primarydata.com> |
NFS: Set an attribute barrier on all updates Ensure that we update the attribute barrier even if there were no invalidations, provided that this value is newer than the old one. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Tested-by: Chuck Lever <chuck.lever@oracle.com>
|
#
f044636d |
|
26-Feb-2015 |
Trond Myklebust <trond.myklebust@primarydata.com> |
NFS: Add attribute update barriers to nfs_setattr_update_inode() Ensure that other operations which raced with our setattr RPC call cannot revert the file attribute changes that were made on the server. To do so, we artificially bump the attribute generation counter on the inode so that all calls to nfs_fattr_init() that precede ours will be dropped. The motivation for the patch came from Chuck Lever's reports of readaheads racing with truncate operations and causing the file size to be reverted. Reported-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Tested-by: Chuck Lever <chuck.lever@oracle.com>
|
#
140e049c |
|
26-Feb-2015 |
Trond Myklebust <trond.myklebust@primarydata.com> |
NFS: Add a helper to set attribute barriers Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Tested-by: Chuck Lever <chuck.lever@oracle.com>
|
#
bf40e556 |
|
13-Feb-2015 |
Trond Myklebust <trond.myklebust@primarydata.com> |
NFSv4: Kill unused nfs_inode->delegation_state field Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
#
3a7ed3ff |
|
08-Jan-2015 |
Omar Sandoval <osandov@osandov.com> |
nfs: prevent truncate on active swapfile Most filesystems prevent truncation of an active swapfile by way of inode_newsize_ok, called from inode_change_ok. NFS doesn't call either from nfs_setattr, presumably because most of these checks are expected to be done server-side. However, the IS_SWAPFILE check can only be done client-side, and truncating a swapfile can't possibly be good. Signed-off-by: Omar Sandoval <osandov@osandov.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
#
2ef47eb1 |
|
09-Dec-2014 |
Anna Schumaker <Anna.Schumaker@netapp.com> |
NFS: Fix use of nfs_attr_use_mounted_on_fileid() This function call was being optimized out during nfs_fhget(), leading to situations where we have a valid fileid but still want to use the mounted_on_fileid. For example, imagine we have our server configured like this: server % df Filesystem Size Used Avail Use% Mounted on /dev/vda1 9.1G 6.5G 1.9G 78% / /dev/vdb1 487M 2.3M 456M 1% /exports /dev/vdc1 487M 2.3M 456M 1% /exports/vol1 /dev/vdd1 487M 2.3M 456M 1% /exports/vol2 If our client mounts /exports and tries to do a "chown -R" across the entire mountpoint, we will get a nasty message warning us about a circular directory structure. Running chown with strace tells me that each directory has the same device and inode number: newfstatat(AT_FDCWD, "/nfs/", {st_dev=makedev(0, 38), st_ino=2, ...}) = 0 newfstatat(4, "vol1", {st_dev=makedev(0, 38), st_ino=2, ...}) = 0 newfstatat(4, "vol2", {st_dev=makedev(0, 38), st_ino=2, ...}) = 0 With this patch the mounted_on_fileid values are used for st_ino, so the directory loop warning isn't reported. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
#
b83ae6d4 |
|
14-Jan-2015 |
Christoph Hellwig <hch@lst.de> |
fs: remove mapping->backing_dev_info Now that we never use the backing_dev_info pointer in struct address_space we can simply remove it and save 4 to 8 bytes in every inode. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Reviewed-by: Tejun Heo <tj@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com>
|
#
f4ac1674 |
|
25-Nov-2014 |
Anna Schumaker <Anna.Schumaker@netapp.com> |
nfs: Add ALLOCATE support This patch adds support for using the NFS v4.2 operation ALLOCATE to preallocate data in a file. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
#
cb1410c7 |
|
11-Nov-2014 |
Weston Andros Adamson <dros@primarydata.com> |
NFS: fix subtle change in COMMIT behavior Recent work in the pgio layer made it possible for there to be more than one request per page. This caused a subtle change in commit behavior, because write.c:nfs_commit_unstable_pages compares the number of *pages* waiting for writeback against the number of requests on a commit list to choose when to send a COMMIT in a non-blocking flush. This is probably hard to hit in normal operation - you have to be using rsize/wsize < PAGE_SIZE, or pnfs with lots of boundaries that are not page aligned to have a noticeable change in behavior. Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
#
16caf5b6 |
|
23-Oct-2014 |
Jan Kara <jack@suse.cz> |
nfs: Fix use of uninitialized variable in nfs_getattr() Variable 'err' needn't be initialized when nfs_getattr() uses it to check whether it should call generic_fillattr() or not. That can result in spurious error returns. Initialize 'err' properly. Signed-off-by: Jan Kara <jack@suse.cz> Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
#
1c6dcbe5 |
|
26-Sep-2014 |
Anna Schumaker <Anna.Schumaker@netapp.com> |
NFS: Implement SEEK The SEEK operation is used when an application makes an lseek call with either the SEEK_HOLE or SEEK_DATA flags set. I fall back on nfs_file_llseek() if the server does not have SEEK support. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
#
08a899d5 |
|
07-Sep-2014 |
Christoph Hellwig <hch@lst.de> |
nfs: setattr can only change regular file sizes The VFS never calls setattr with ATTR_SIZE on anything but regular files. Remove the if check and turn it into an assert similar to what some other file systems do. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
#
65b38851 |
|
31-Jul-2014 |
Eric W. Biederman <ebiederm@xmission.com> |
NFS: Fix /proc/fs/nfsfs/servers and /proc/fs/nfsfs/volumes The usage of pid_ns->child_reaper->nsproxy->net_ns in nfs_server_list_open and nfs_client_list_open is not safe. /proc for a pid namespace can remain mounted after the all of the process in that pid namespace have exited. There are also times before the initial process in a pid namespace has started or after the initial process in a pid namespace has exited where pid_ns->child_reaper can be NULL or stale. Making the idiom pid_ns->child_reaper->nsproxy a double whammy of problems. Luckily all that needs to happen is to move /proc/fs/nfsfs/servers and /proc/fs/nfsfs/volumes under /proc/net to /proc/net/nfsfs/servers and /proc/net/nfsfs/volumes and add a symlink from the original location, and to use seq_open_net as it has been designed. Cc: stable@vger.kernel.org Cc: Trond Myklebust <trond.myklebust@primarydata.com> Cc: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
#
912a108d |
|
13-Jul-2014 |
NeilBrown <neilb@suse.de> |
NFS: teach nfs_neg_need_reval to understand LOOKUP_RCU This requires nfs_check_verifier to take an rcu_walk flag, and requires an rcu version of nfs_revalidate_inode which returns -ECHILD rather than making an RPC call. With this, nfs_lookup_revalidate can call nfs_neg_need_reval in RCU-walk mode. We can also move the LOOKUP_RCU check past the nfs_check_verifier() call in nfs_lookup_revalidate. If RCU_WALK prevents nfs_check_verifier or nfs_neg_need_reval from doing a full check, they return a status indicating that a revalidation is required. As this revalidation will not be possible in RCU_WALK mode, -ECHILD will ultimately be returned, which is the desired result. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
#
c1221321 |
|
06-Jul-2014 |
NeilBrown <neilb@suse.de> |
sched: Allow wait_on_bit_action() functions to support a timeout It is currently not possible for various wait_on_bit functions to implement a timeout. While the "action" function that is called to do the waiting could certainly use schedule_timeout(), there is no way to carry forward the remaining timeout after a false wake-up. As false-wakeups a clearly possible at least due to possible hash collisions in bit_waitqueue(), this is a real problem. The 'action' function is currently passed a pointer to the word containing the bit being waited on. No current action functions use this pointer. So changing it to something else will be a little noisy but will have no immediate effect. This patch changes the 'action' function to take a pointer to the "struct wait_bit_key", which contains a pointer to the word containing the bit so nothing is really lost. It also adds a 'private' field to "struct wait_bit_key", which is initialized to zero. An action function can now implement a timeout with something like static int timed_out_waiter(struct wait_bit_key *key) { unsigned long waited; if (key->private == 0) { key->private = jiffies; if (key->private == 0) key->private -= 1; } waited = jiffies - key->private; if (waited > 10 * HZ) return -EAGAIN; schedule_timeout(waited - 10 * HZ); return 0; } If any other need for context in a waiter were found it would be easy to use ->private for some other purpose, or even extend "struct wait_bit_key". My particular need is to support timeouts in nfs_release_page() to avoid deadlocks with loopback mounted NFS. While wait_on_bit_timeout() would be a cleaner interface, it will not meet my need. I need the timeout to be sensitive to the state of the connection with the server, which could change. So I need to use an 'action' interface. Signed-off-by: NeilBrown <neilb@suse.de> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steve French <sfrench@samba.org> Cc: David Howells <dhowells@redhat.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20140707051604.28027.41257.stgit@notabene.brown Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
74316201 |
|
06-Jul-2014 |
NeilBrown <neilb@suse.de> |
sched: Remove proliferation of wait_on_bit() action functions The current "wait_on_bit" interface requires an 'action' function to be provided which does the actual waiting. There are over 20 such functions, many of them identical. Most cases can be satisfied by one of just two functions, one which uses io_schedule() and one which just uses schedule(). So: Rename wait_on_bit and wait_on_bit_lock to wait_on_bit_action and wait_on_bit_lock_action to make it explicit that they need an action function. Introduce new wait_on_bit{,_lock} and wait_on_bit{,_lock}_io which are *not* given an action function but implicitly use a standard one. The decision to error-out if a signal is pending is now made based on the 'mode' argument rather than being encoded in the action function. All instances of the old wait_on_bit and wait_on_bit_lock which can use the new version have been changed accordingly and their action functions have been discarded. wait_on_bit{_lock} does not return any specific error code in the event of a signal so the caller must check for non-zero and interpolate their own error code as appropriate. The wait_on_bit() call in __fscache_wait_on_invalidate() was ambiguous as it specified TASK_UNINTERRUPTIBLE but used fscache_wait_bit_interruptible as an action function. David Howells confirms this should be uniformly "uninterruptible" The main remaining user of wait_on_bit{,_lock}_action is NFS which needs to use a freezer-aware schedule() call. A comment in fs/gfs2/glock.c notes that having multiple 'action' functions is useful as they display differently in the 'wchan' field of 'ps'. (and /proc/$PID/wchan). As the new bit_wait{,_io} functions are tagged "__sched", they will not show up at all, but something higher in the stack. So the distinction will still be visible, only with different function names (gds2_glock_wait versus gfs2_glock_dq_wait in the gfs2/glock.c case). Since first version of this patch (against 3.15) two new action functions appeared, on in NFS and one in CIFS. CIFS also now uses an action function that makes the same freezer aware schedule call as NFS. Signed-off-by: NeilBrown <neilb@suse.de> Acked-by: David Howells <dhowells@redhat.com> (fscache, keys) Acked-by: Steven Whitehouse <swhiteho@redhat.com> (gfs2) Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steve French <sfrench@samba.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20140707051603.28027.72349.stgit@notabene.brown Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
6edf9609 |
|
20-Jun-2014 |
Trond Myklebust <trond.myklebust@primarydata.com> |
NFS: Don't mark the data cache as invalid if it has been flushed Now that we have functions such as nfs_write_pageuptodate() that use the cache_validity flags to check if the data cache is valid or not, it is a little more important to keep the flags in sync with the state of the data cache. In particular, we'd like to ensure that if the data cache is empty, we don't start marking it as needing revalidation. Reported-by: Scott Mayhew <smayhew@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
#
f2467b6f |
|
20-Jun-2014 |
Trond Myklebust <trond.myklebust@primarydata.com> |
NFS: Clear NFS_INO_REVAL_PAGECACHE when we update the file size In nfs_update_inode(), if the change attribute is seen to change on the server, then we set NFS_INO_REVAL_PAGECACHE in order to make sure that we check the file size. However, if we also update the file size in the same function, we don't need to check it again. So make sure that we clear the NFS_INO_REVAL_PAGECACHE that was set earlier. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
#
4e857c58 |
|
17-Mar-2014 |
Peter Zijlstra <peterz@infradead.org> |
arch: Mass conversion of smp_mb__*() Mostly scripted conversion of the smp_mb__* barriers. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Link: http://lkml.kernel.org/n/tip-55dhyhocezdw1dg7u19hmh1u@git.kernel.org Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: linux-arch@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
43b6535e |
|
15-Apr-2014 |
Trond Myklebust <trond.myklebust@primarydata.com> |
NFS: Don't declare inode uptodate unless all attributes were checked Fix a bug, whereby nfs_update_inode() was declaring the inode to be up to date despite not having checked all the attributes. The bug occurs because the temporary variable in which we cache the validity information is 'sanitised' before reapplying to nfsi->cache_validity. Reported-by: Kinglong Mee <kinglongmee@gmail.com> Cc: stable@vger.kernel.org # 3.5+ Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
#
91b0abe3 |
|
03-Apr-2014 |
Johannes Weiner <hannes@cmpxchg.org> |
mm + fs: store shadow entries in page cache Reclaim will be leaving shadow entries in the page cache radix tree upon evicting the real page. As those pages are found from the LRU, an iput() can lead to the inode being freed concurrently. At this point, reclaim must no longer install shadow pages because the inode freeing code needs to ensure the page tree is really empty. Add an address_space flag, AS_EXITING, that the inode freeing code sets under the tree lock before doing the final truncate. Reclaim will check for this flag before installing shadow pages. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Rik van Riel <riel@redhat.com> Reviewed-by: Minchan Kim <minchan@kernel.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Bob Liu <bob.liu@oracle.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Dave Chinner <david@fromorbit.com> Cc: Greg Thelen <gthelen@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jan Kara <jack@suse.cz> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Luigi Semenzato <semenzato@google.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Metin Doslu <metin@citusdata.com> Cc: Michel Lespinasse <walken@google.com> Cc: Ozgun Erdogan <ozgun@citusdata.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Roman Gushchin <klamm@yandex-team.ru> Cc: Ryan Mallon <rmallon@gmail.com> Cc: Tejun Heo <tj@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
311324ad |
|
07-Feb-2014 |
Trond Myklebust <trond.myklebust@primarydata.com> |
NFS: Be more aggressive in using readdirplus for 'ls -l' situations Try to detect 'ls -l' by having nfs_getattr() look at whether or not there is an opendir() file descriptor for the parent directory. If so, then assume that we want to force use of readdirplus in order to avoid the multiple GETATTR calls over the wire. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
#
fd1defc2 |
|
06-Feb-2014 |
Trond Myklebust <trond.myklebust@primarydata.com> |
NFS: Do not set NFS_INO_INVALID_LABEL unless server supports labeled NFS Commit aa9c2669626c (NFS: Client implementation of Labeled-NFS) introduces a performance regression. When nfs_zap_caches_locked is called, it sets the NFS_INO_INVALID_LABEL flag irrespectively of whether or not the NFS server supports security labels. Since that flag is never cleared, it means that all calls to nfs_revalidate_inode() will now trigger an on-the-wire GETATTR call. This patch ensures that we never set the NFS_INO_INVALID_LABEL unless the server advertises support for labeled NFS. It also causes nfs_setsecurity() to clear NFS_INO_INVALID_LABEL when it has successfully set the security label for the inode. Finally it gets rid of the NFS_INO_INVALID_LABEL cruft from nfs_update_inode, which has nothing to do with labeled NFS. Reported-by: Neil Brown <neilb@suse.de> Cc: stable@vger.kernel.org # 3.11+ Tested-by: Neil Brown <neilb@suse.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
#
4db72b40 |
|
28-Jan-2014 |
Jeff Layton <jlayton@kernel.org> |
nfs: add memory barriers around NFS_INO_INVALID_DATA and NFS_INO_INVALIDATING If the setting of NFS_INO_INVALIDATING gets reordered to before the clearing of NFS_INO_INVALID_DATA, then another task may hit a race window where both appear to be clear, even though the inode's pages are still in need of invalidation. Fix this by adding the appropriate memory barriers. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
#
17dfeb91 |
|
28-Jan-2014 |
Trond Myklebust <trond.myklebust@primarydata.com> |
NFS: Fix races in nfs_revalidate_mapping Commit d529ef83c355f97027ff85298a9709fe06216a66 (NFS: fix the handling of NFS_INO_INVALID_DATA flag in nfs_revalidate_mapping) introduces a potential race, since it doesn't test the value of nfsi->cache_validity and set the bitlock in nfsi->flags atomically. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Cc: Jeff Layton <jlayton@redhat.com>
|
#
d529ef83 |
|
27-Jan-2014 |
Jeff Layton <jlayton@kernel.org> |
NFS: fix the handling of NFS_INO_INVALID_DATA flag in nfs_revalidate_mapping There is a possible race in how the nfs_invalidate_mapping function is handled. Currently, we go and invalidate the pages in the file and then clear NFS_INO_INVALID_DATA. The problem is that it's possible for a stale page to creep into the mapping after the page was invalidated (i.e., via readahead). If another writer comes along and sets the flag after that happens but before invalidate_inode_pages2 returns then we could clear the flag without the cache having been properly invalidated. So, we must clear the flag first and then invalidate the pages. Doing this however, opens another race: It's possible to have two concurrent read() calls that end up in nfs_revalidate_mapping at the same time. The first one clears the NFS_INO_INVALID_DATA flag and then goes to call nfs_invalidate_mapping. Just before calling that though, the other task races in, checks the flag and finds it cleared. At that point, it trusts that the mapping is good and gets the lock on the page, allowing the read() to be satisfied from the cache even though the data is no longer valid. These effects are easily manifested by running diotest3 from the LTP test suite on NFS. That program does a series of DIO writes and buffered reads. The operations are serialized and page-aligned but the existing code fails the test since it occasionally allows a read to come out of the cache incorrectly. While mixing direct and buffered I/O isn't recommended, I believe it's possible to hit this in other ways that just use buffered I/O, though that situation is much harder to reproduce. The problem is that the checking/clearing of that flag and the invalidation of the mapping really need to be atomic. Fix this by serializing concurrent invalidations with a bitlock. At the same time, we also need to allow other places that check NFS_INO_INVALID_DATA to check whether we might be in the middle of invalidating the file, so fix up a couple of places that do that to look for the new NFS_INO_INVALIDATING flag. Doing this requires us to be careful not to set the bitlock unnecessarily, so this code only does that if it believes it will be doing an invalidation. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
#
013cdf10 |
|
20-Dec-2013 |
Christoph Hellwig <hch@infradead.org> |
nfs: use generic posix ACL infrastructure for v3 Posix ACLs This causes a small behaviour change in that we don't bother to set ACLs on file creation if the mode bit can express the access permissions fully, and thus behaving identical to local filesystems. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
d8c951c3 |
|
12-Jan-2014 |
Trond Myklebust <trond.myklebust@primarydata.com> |
NFSv4.1: Don't trust attributes if a pNFS LAYOUTCOMMIT is outstanding If a LAYOUTCOMMIT is outstanding, then chances are that the metadata server may still be returning incorrect values for the change attribute, ctime, mtime and/or size. Just ignore those attributes for now, and wait for the LAYOUTCOMMIT rpc call to finish. Reported-by: shaobingqing <shaobingqing@bwstor.com.cn> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
#
1e8968c5 |
|
17-Dec-2013 |
Niels de Vos <ndevos@redhat.com> |
NFS: dprintk() should not print negative fileids and inode numbers A fileid in NFS is a uint64. There are some occurrences where dprintk() outputs a signed fileid. This leads to confusion and more difficult to read debugging (negative fileids matching positive inode numbers). Signed-off-by: Niels de Vos <ndevos@redhat.com> CC: Santosh Pradhan <spradhan@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
#
829e57d7 |
|
18-Nov-2013 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Fix a warning in nfs_setsecurity Fix the following warning: linux-nfs/fs/nfs/inode.c:315:1: warning: ‘inline’ is not at beginning of declaration [-Wold-style-declaration] Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
fab99ebe |
|
04-Nov-2013 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFSv4.2: Remove redundant checks in nfs_setsecurity+nfs4_label_init_security We already check for nfs_server_capable(inode, NFS_CAP_SECURITY_LABEL) in nfs4_label_alloc() We check the minor version in _nfs4_server_capabilities before setting NFS_CAP_SECURITY_LABEL. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
3da580aa |
|
02-Nov-2013 |
Jeff Layton <jlayton@kernel.org> |
nfs: set security label when revalidating inode Currently, we fetch the security label when revalidating an inode's attributes, but don't apply it. This is in contrast to the readdir() codepath where we do apply label changes. Cc: Dave Quigley <dpquigl@davequigley.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
9e6ee76d |
|
17-Oct-2013 |
Chuck Lever <chuck.lever@oracle.com> |
NFS: Export _nfs_display_fhandle() Allow code in nfsv4.ko to use _nfs_display_fhandle(). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
f1fe29b4 |
|
27-Sep-2013 |
David Howells <dhowells@redhat.com> |
NFS: Use i_writecount to control whether to get an fscache cookie in nfs_open() Use i_writecount to control whether to get an fscache cookie in nfs_open() as NFS does not do write caching yet. I *think* this is the cause of a problem encountered by Mark Moseley whereby __fscache_uncache_page() gets a NULL pointer dereference because cookie->def is NULL: BUG: unable to handle kernel NULL pointer dereference at 0000000000000010 IP: [<ffffffff812a1903>] __fscache_uncache_page+0x23/0x160 PGD 0 Thread overran stack, or stack corrupted Oops: 0000 [#1] SMP Modules linked in: ... CPU: 7 PID: 18993 Comm: php Not tainted 3.11.1 #1 Hardware name: Dell Inc. PowerEdge R420/072XWF, BIOS 1.3.5 08/21/2012 task: ffff8804203460c0 ti: ffff880420346640 RIP: 0010:[<ffffffff812a1903>] __fscache_uncache_page+0x23/0x160 RSP: 0018:ffff8801053af878 EFLAGS: 00210286 RAX: 0000000000000000 RBX: ffff8800be2f8780 RCX: ffff88022ffae5e8 RDX: 0000000000004c66 RSI: ffffea00055ff440 RDI: ffff8800be2f8780 RBP: ffff8801053af898 R08: 0000000000000001 R09: 0000000000000003 R10: 0000000000000000 R11: 0000000000000000 R12: ffffea00055ff440 R13: 0000000000001000 R14: ffff8800c50be538 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff88042fc60000(0063) knlGS:00000000e439c700 CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033 CR2: 0000000000000010 CR3: 0000000001d8f000 CR4: 00000000000607f0 Stack: ... Call Trace: [<ffffffff81365a72>] __nfs_fscache_invalidate_page+0x42/0x70 [<ffffffff813553d5>] nfs_invalidate_page+0x75/0x90 [<ffffffff811b8f5e>] truncate_inode_page+0x8e/0x90 [<ffffffff811b90ad>] truncate_inode_pages_range.part.12+0x14d/0x620 [<ffffffff81d6387d>] ? __mutex_lock_slowpath+0x1fd/0x2e0 [<ffffffff811b95d3>] truncate_inode_pages_range+0x53/0x70 [<ffffffff811b969d>] truncate_inode_pages+0x2d/0x40 [<ffffffff811b96ff>] truncate_pagecache+0x4f/0x70 [<ffffffff81356840>] nfs_setattr_update_inode+0xa0/0x120 [<ffffffff81368de4>] nfs3_proc_setattr+0xc4/0xe0 [<ffffffff81357f78>] nfs_setattr+0xc8/0x150 [<ffffffff8122d95b>] notify_change+0x1cb/0x390 [<ffffffff8120a55b>] do_truncate+0x7b/0xc0 [<ffffffff8121f96c>] do_last+0xa4c/0xfd0 [<ffffffff8121ffbc>] path_openat+0xcc/0x670 [<ffffffff81220a0e>] do_filp_open+0x4e/0xb0 [<ffffffff8120ba1f>] do_sys_open+0x13f/0x2b0 [<ffffffff8126aaf6>] compat_SyS_open+0x36/0x50 [<ffffffff81d7204c>] sysenter_dispatch+0x7/0x24 The code at the instruction pointer was disassembled: > (gdb) disas __fscache_uncache_page > Dump of assembler code for function __fscache_uncache_page: > ... > 0xffffffff812a18ff <+31>: mov 0x48(%rbx),%rax > 0xffffffff812a1903 <+35>: cmpb $0x0,0x10(%rax) > 0xffffffff812a1907 <+39>: je 0xffffffff812a19cd <__fscache_uncache_page+237> These instructions make up: ASSERTCMP(cookie->def->type, !=, FSCACHE_COOKIE_TYPE_INDEX); That cmpb is the faulting instruction (%rax is 0). So cookie->def is NULL - which presumably means that the cookie has already been at least partway through __fscache_relinquish_cookie(). What I think may be happening is something like a three-way race on the same file: PROCESS 1 PROCESS 2 PROCESS 3 =============== =============== =============== open(O_TRUNC|O_WRONLY) open(O_RDONLY) open(O_WRONLY) -->nfs_open() -->nfs_fscache_set_inode_cookie() nfs_fscache_inode_lock() nfs_fscache_disable_inode_cookie() __fscache_relinquish_cookie() nfs_inode->fscache = NULL <--nfs_fscache_set_inode_cookie() -->nfs_open() -->nfs_fscache_set_inode_cookie() nfs_fscache_inode_lock() nfs_fscache_enable_inode_cookie() __fscache_acquire_cookie() nfs_inode->fscache = cookie <--nfs_fscache_set_inode_cookie() <--nfs_open() -->nfs_setattr() ... ... -->nfs_invalidate_page() -->__nfs_fscache_invalidate_page() cookie = nfsi->fscache -->nfs_open() -->nfs_fscache_set_inode_cookie() nfs_fscache_inode_lock() nfs_fscache_disable_inode_cookie() -->__fscache_relinquish_cookie() -->__fscache_uncache_page(cookie) <crash> <--__fscache_relinquish_cookie() nfs_inode->fscache = NULL <--nfs_fscache_set_inode_cookie() What is needed is something to prevent process #2 from reacquiring the cookie - and I think checking i_writecount should do the trick. It's also possible to have a two-way race on this if the file is opened O_TRUNC|O_RDONLY instead. Reported-by: Mark Moseley <moseleymark@gmail.com> Signed-off-by: David Howells <dhowells@redhat.com>
|
#
7caef267 |
|
12-Sep-2013 |
Kirill A. Shutemov <kirill.shutemov@linux.intel.com> |
truncate: drop 'oldsize' truncate_pagecache() parameter truncate_pagecache() doesn't care about old size since commit cedabed49b39 ("vfs: Fix vmtruncate() regression"). Let's drop it. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
f4ce1299 |
|
19-Aug-2013 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Add event tracing for generic NFS events Add tracepoints for inode attribute updates, attribute revalidation, writeback start/end fsync start/end, attribute change start/end, permission check start/end. The intention is to enable performance tracing using 'perf'as well as improving debugging. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
1264a2f0 |
|
12-Aug-2013 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: refactor code for calculating the crc32 hash of a filehandle We want to be able to display the crc32 hash of the filehandle in tracepoints. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
eddffa40 |
|
06-Aug-2013 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Remove unnecessary call to nfs_setsecurity in nfs_fhget() We only need to call it on the creation of the inode. Reported-by: Julia Lawall <Julia.Lawall@lip6.fr> Cc: Steve Dickson <SteveD@redhat.com> Cc: Dave Quigley <dpquigl@davequigley.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
f8806c84 |
|
05-Aug-2013 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Fix writeback performance issue on cache invalidation If a cache invalidation is triggered, and we happen to have a lot of writebacks cached at the time, then the call to invalidate_inode_pages2() will end up calling ->launder_page() on each and every dirty page in order to sync its contents to disk, thus defeating write coalescing. The following patch ensures that we try to sync the inode to disk before calling invalidate_inode_pages2() so that we do the writeback as efficiently as possible. Reported-by: William Dauchy <william@gandi.net> Reported-by: Pascal Bouchareine <pascal@gandi.net> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Tested-by: William Dauchy <william@gandi.net> Reviewed-by: Jeff Layton <jlayton@redhat.com>
|
#
43f291cd |
|
05-Jul-2013 |
Scott Mayhew <smayhew@redhat.com> |
NFS: Make nfs_attribute_cache_expired() non-static NFS: Make nfs_attribute_cache_expired() non-static so we can call it from nfs_readdir(). Signed-off-by: Scott Mayhew <smayhew@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
c8d74d9b |
|
01-Jun-2013 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFSv4: Move the DNS resolver into the NFSv4 module The other protocols don't use it, so make it local to NFSv4, and remove the EXPORT. Also ensure that we only compile in cache_lib.o if we're using the legacy DNS resolver. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Bryan Schumaker <bjschuma@netapp.com>
|
#
aa9c2669 |
|
21-May-2013 |
David Quigley <dpquigl@davequigley.com> |
NFS: Client implementation of Labeled-NFS This patch implements the client transport and handling support for labeled NFS. The patch adds two functions to encode and decode the security label recommended attribute which makes use of the LSM hooks added earlier. It also adds code to grab the label from the file attribute structures and encode the label to be sent back to the server. Acked-by: James Morris <james.l.morris@oracle.com> Signed-off-by: Matthew N. Dodd <Matthew.Dodd@sparta.com> Signed-off-by: Miguel Rodel Felipe <Rodel_FM@dsi.a-star.edu.sg> Signed-off-by: Phua Eu Gene <PHUA_Eu_Gene@dsi.a-star.edu.sg> Signed-off-by: Khin Mi Mi Aung <Mi_Mi_AUNG@dsi.a-star.edu.sg> Signed-off-by: Steve Dickson <steved@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
14c43f76 |
|
21-May-2013 |
David Quigley <dpquigl@davequigley.com> |
NFS: Add label lifecycle management This patch adds the lifecycle management for the security label structure introduced in an earlier patch. The label is not used yet but allocations and freeing of the structure is handled. Signed-off-by: Matthew N. Dodd <Matthew.Dodd@sparta.com> Signed-off-by: Miguel Rodel Felipe <Rodel_FM@dsi.a-star.edu.sg> Signed-off-by: Phua Eu Gene <PHUA_Eu_Gene@dsi.a-star.edu.sg> Signed-off-by: Khin Mi Mi Aung <Mi_Mi_AUNG@dsi.a-star.edu.sg> Signed-off-by: Steve Dickson <steved@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
1775fd3e |
|
21-May-2013 |
David Quigley <dpquigl@davequigley.com> |
NFS:Add labels to client function prototypes After looking at all of the nfsv4 operations the label structure has been added to the prototypes of the functions which can transmit label data. Signed-off-by: Matthew N. Dodd <Matthew.Dodd@sparta.com> Signed-off-by: Miguel Rodel Felipe <Rodel_FM@dsi.a-star.edu.sg> Signed-off-by: Phua Eu Gene <PHUA_Eu_Gene@dsi.a-star.edu.sg> Signed-off-by: Khin Mi Mi Aung <Mi_Mi_AUNG@dsi.a-star.edu.sg> Signed-off-by: Steve Dickson <steved@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
e058f70b |
|
21-May-2013 |
Steve Dickson <steved@redhat.com> |
NFSv4: Introduce new label structure In order to mimic the way that NFSv4 ACLs are implemented we have created a structure to be used to pass label data up and down the call chain. This patch adds the new structure and new members to the required NFSv4 call structures. Signed-off-by: Matthew N. Dodd <Matthew.Dodd@sparta.com> Signed-off-by: Miguel Rodel Felipe <Rodel_FM@dsi.a-star.edu.sg> Signed-off-by: Phua Eu Gene <PHUA_Eu_Gene@dsi.a-star.edu.sg> Signed-off-by: Khin Mi Mi Aung <Mi_Mi_AUNG@dsi.a-star.edu.sg> Signed-off-by: Steve Dickson <steved@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
c45ffdd2 |
|
29-May-2013 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFSv4: Close another NFSv4 recovery race State recovery currently relies on being able to find a valid nfs_open_context in the inode->open_files list. We therefore need to put the nfs_open_context on the list while we're still protected by the sp->so_reclaim_seqcount in order to avoid reboot races. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
416ad3c9 |
|
06-May-2013 |
Colin Cross <ccross@android.com> |
freezer: add unsafe versions of freezable helpers for NFS NFS calls the freezable helpers with locks held, which is unsafe and will cause lockdep warnings when 6aa9707 "lockdep: check that no locks held at freeze time" is reapplied (it was reverted in dbf520a). NFS shouldn't be doing this, but it has long-running syscalls that must hold a lock but also shouldn't block suspend. Until NFS freeze handling is rewritten to use a signal to exit out of the critical section, add new *_unsafe versions of the helpers that will not run the lockdep test when 6aa9707 is reapplied, and call them from NFS. In practice the likley result of holding the lock while freezing is that a second task blocked on the lock will never freeze, aborting suspend, but it is possible to manufacture a case using the cgroup freezer, the lock, and the suspend freezer to create a deadlock. Silencing the lockdep warning here will allow problems to be found in other drivers that may have a more serious deadlock risk, and prevent new problems from being added. Signed-off-by: Colin Cross <ccross@android.com> Acked-by: Pavel Machek <pavel@ucw.cz> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
#
577b4232 |
|
08-Apr-2013 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Add functionality to allow waiting on all outstanding reads to complete This will later allow NFS locking code to wait for readahead to complete before releasing byte range locks. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
8c86899f |
|
15-Mar-2013 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: __nfs_find_lock_context needs to check ctx->lock_context for a match too Currently, we're forcing an unnecessary duplication of the initial nfs_lock_context in calls to nfs_get_lock_context, since __nfs_find_lock_context ignores the ctx->lock_context. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
f6488c9b |
|
27-Feb-2013 |
Jeff Layton <jlayton@kernel.org> |
nfs: don't allow nfs_find_actor to match inodes of the wrong type Benny Halevy reported the following oops when testing RHEL6: <7>nfs_update_inode: inode 892950 mode changed, 0040755 to 0100644 <1>BUG: unable to handle kernel NULL pointer dereference at (null) <1>IP: [<ffffffffa02a52c5>] nfs_closedir+0x15/0x30 [nfs] <4>PGD 81448a067 PUD 831632067 PMD 0 <4>Oops: 0000 [#1] SMP <4>last sysfs file: /sys/kernel/mm/redhat_transparent_hugepage/enabled <4>CPU 6 <4>Modules linked in: fuse bonding 8021q garp ebtable_nat ebtables be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi softdog bridge stp llc xt_physdev ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 xt_multiport iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 dm_round_robin dm_multipath objlayoutdriver2(U) nfs(U) lockd fscache auth_rpcgss nfs_acl sunrpc vhost_net macvtap macvlan tun kvm_intel kvm be2net igb dca ptp pps_core microcode serio_raw sg iTCO_wdt iTCO_vendor_support i7core_edac edac_core shpchp ext4 mbcache jbd2 sd_mod crc_t10dif ahci dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan] <4> <4>Pid: 6332, comm: dd Not tainted 2.6.32-358.el6.x86_64 #1 HP ProLiant DL170e G6 /ProLiant DL170e G6 <4>RIP: 0010:[<ffffffffa02a52c5>] [<ffffffffa02a52c5>] nfs_closedir+0x15/0x30 [nfs] <4>RSP: 0018:ffff88081458bb98 EFLAGS: 00010292 <4>RAX: ffffffffa02a52b0 RBX: 0000000000000000 RCX: 0000000000000003 <4>RDX: ffffffffa02e45a0 RSI: ffff88081440b300 RDI: ffff88082d5f5760 <4>RBP: ffff88081458bba8 R08: 0000000000000000 R09: 0000000000000000 <4>R10: 0000000000000772 R11: 0000000000400004 R12: 0000000040000008 <4>R13: ffff88082d5f5760 R14: ffff88082d6e8800 R15: ffff88082f12d780 <4>FS: 00007f728f37e700(0000) GS:ffff8800456c0000(0000) knlGS:0000000000000000 <4>CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b <4>CR2: 0000000000000000 CR3: 0000000831279000 CR4: 00000000000007e0 <4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 <4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 <4>Process dd (pid: 6332, threadinfo ffff88081458a000, task ffff88082fa0e040) <4>Stack: <4> 0000000040000008 ffff88081440b300 ffff88081458bbf8 ffffffff81182745 <4><d> ffff88082d5f5760 ffff88082d6e8800 ffff88081458bbf8 ffffffffffffffea <4><d> ffff88082f12d780 ffff88082d6e8800 ffffffffa02a50a0 ffff88082d5f5760 <4>Call Trace: <4> [<ffffffff81182745>] __fput+0xf5/0x210 <4> [<ffffffffa02a50a0>] ? do_open+0x0/0x20 [nfs] <4> [<ffffffff81182885>] fput+0x25/0x30 <4> [<ffffffff8117e23e>] __dentry_open+0x27e/0x360 <4> [<ffffffff811c397a>] ? inotify_d_instantiate+0x2a/0x60 <4> [<ffffffff8117e4b9>] lookup_instantiate_filp+0x69/0x90 <4> [<ffffffffa02a6679>] nfs_intent_set_file+0x59/0x90 [nfs] <4> [<ffffffffa02a686b>] nfs_atomic_lookup+0x1bb/0x310 [nfs] <4> [<ffffffff8118e0c2>] __lookup_hash+0x102/0x160 <4> [<ffffffff81225052>] ? selinux_inode_permission+0x72/0xb0 <4> [<ffffffff8118e76a>] lookup_hash+0x3a/0x50 <4> [<ffffffff81192a4b>] do_filp_open+0x2eb/0xdd0 <4> [<ffffffff8104757c>] ? __do_page_fault+0x1ec/0x480 <4> [<ffffffff8119f562>] ? alloc_fd+0x92/0x160 <4> [<ffffffff8117de79>] do_sys_open+0x69/0x140 <4> [<ffffffff811811f6>] ? sys_lseek+0x66/0x80 <4> [<ffffffff8117df90>] sys_open+0x20/0x30 <4> [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b <4>Code: 65 48 8b 04 25 c8 cb 00 00 83 a8 44 e0 ff ff 01 5b 41 5c c9 c3 90 55 48 89 e5 53 48 83 ec 08 0f 1f 44 00 00 48 8b 9e a0 00 00 00 <48> 8b 3b e8 13 0c f7 ff 48 89 df e8 ab 3d ec e0 48 83 c4 08 31 <1>RIP [<ffffffffa02a52c5>] nfs_closedir+0x15/0x30 [nfs] <4> RSP <ffff88081458bb98> <4>CR2: 0000000000000000 I think this is ultimately due to a bug on the server. The client had previously found a directory dentry. It then later tried to do an atomic open on a new (regular file) dentry. The attributes it got back had the same filehandle as the previously found directory inode. It then tried to put the filp because it failed the aops tests for O_DIRECT opens, and oopsed here because the ctx was still NULL. Obviously the root cause here is a server issue, but we can take steps to mitigate this on the client. When nfs_fhget is called, we always know what type of inode it is. In the event that there's a broken or malicious server on the other end of the wire, the client can end up crashing because the wrong ops are set on it. Have nfs_find_actor check that the inode type is correct after checking the fileid. The fileid check should rarely ever match, so it should only rarely ever get to this check. In the case where we have a broken server, we may see two different inodes with the same i_ino, but the client should be able to cope with them without crashing. This should fix the oops reported here: https://bugzilla.redhat.com/show_bug.cgi?id=913660 Reported-by: Benny Halevy <bhalevy@tonian.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
496ad9aa |
|
23-Jan-2013 |
Al Viro <viro@zeniv.linux.org.uk> |
new helper: file_inode(file) Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
9ff593c4 |
|
01-Feb-2013 |
Eric W. Biederman <ebiederm@xmission.com> |
nfs: kuid and kgid conversions for nfs/inode.c - Use uid_eq and gid_eq when comparing kuids and kgids. - Use make_kuid(&init_user_ns, -2) and make_kgid(&init_user_ns, -2) as the initial uid and gid on nfs inodes, instead of using the typeunsafe value of -2. Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
#
322b2b90 |
|
11-Jan-2013 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
Revert "NFS: add nfs_sb_deactive_async to avoid deadlock" This reverts commit 324d003b0cd82151adbaecefef57b73f7959a469. The deadlock turned out to be caused by a workqueue limitation that has now been worked around in the RPC code (see comment in rpc_free_task). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
de242c0b |
|
20-Dec-2012 |
David Howells <dhowells@redhat.com> |
NFS: Use FS-Cache invalidation Use the new FS-Cache invalidation facility from NFS to deal with foreign changes being detected on the server rather than attempting to retire the old cookie and get a new one. The problem with the old method was that NFS did not wait for all outstanding storage and retrieval ops on the cache to complete. There was no automatic wait between the calls to ->readpages() and calls to invalidate_inode_pages2() as the latter can only wait on locked pages that have been added to the pagecache (which they haven't yet on entry to ->readpages()). This was leading to oopses like the one below when an outstanding read got cut off from its cookie by a premature release. BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8 IP: [<ffffffffa0075118>] __fscache_read_or_alloc_pages+0x1dd/0x315 [fscache] PGD 15889067 PUD 15890067 PMD 0 Oops: 0000 [#1] SMP CPU 0 Modules linked in: cachefiles nfs fscache auth_rpcgss nfs_acl lockd sunrpc Pid: 4544, comm: tar Not tainted 3.1.0-rc4-fsdevel+ #1064 /DG965RY RIP: 0010:[<ffffffffa0075118>] [<ffffffffa0075118>] __fscache_read_or_alloc_pages+0x1dd/0x315 [fscache] RSP: 0018:ffff8800158799e8 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff8800070d41e0 RCX: ffff8800083dc1b0 RDX: 0000000000000000 RSI: ffff880015879960 RDI: ffff88003e627b90 RBP: ffff880015879a28 R08: 0000000000000002 R09: 0000000000000002 R10: 0000000000000001 R11: ffff880015879950 R12: ffff880015879aa4 R13: 0000000000000000 R14: ffff8800083dc158 R15: ffff880015879be8 FS: 00007f671e9d87c0(0000) GS:ffff88003bc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 00000000000000a8 CR3: 000000001587f000 CR4: 00000000000006f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process tar (pid: 4544, threadinfo ffff880015878000, task ffff880015875040) Stack: ffffffffa00b1759 ffff8800070dc158 ffff8800000213da ffff88002a286508 ffff880015879aa4 ffff880015879be8 0000000000000001 ffff88002a2866e8 ffff880015879a88 ffffffffa00b20be 00000000000200da ffff880015875040 Call Trace: [<ffffffffa00b1759>] ? nfs_fscache_wait_bit+0xd/0xd [nfs] [<ffffffffa00b20be>] __nfs_readpages_from_fscache+0x7e/0x13f [nfs] [<ffffffff81095fe7>] ? __alloc_pages_nodemask+0x156/0x662 [<ffffffffa0098763>] nfs_readpages+0xee/0x187 [nfs] [<ffffffff81098a5e>] __do_page_cache_readahead+0x1be/0x267 [<ffffffff81098942>] ? __do_page_cache_readahead+0xa2/0x267 [<ffffffff81098d7b>] ra_submit+0x1c/0x20 [<ffffffff8109900a>] ondemand_readahead+0x28b/0x29a [<ffffffff810990ce>] page_cache_sync_readahead+0x38/0x3a [<ffffffff81091d8a>] generic_file_aio_read+0x2ab/0x67e [<ffffffffa008cfbe>] nfs_file_read+0xa4/0xc9 [nfs] [<ffffffff810c22c4>] do_sync_read+0xba/0xfa [<ffffffff810a62c9>] ? might_fault+0x4e/0x9e [<ffffffff81177a47>] ? security_file_permission+0x7b/0x84 [<ffffffff810c25dd>] ? rw_verify_area+0xab/0xc8 [<ffffffff810c29a4>] vfs_read+0xaa/0x13a [<ffffffff810c2a79>] sys_read+0x45/0x6c [<ffffffff813ac37b>] system_call_fastpath+0x16/0x1b Reported-by: Mark Moseley <moseleymark@gmail.com> Signed-off-by: David Howells <dhowells@redhat.com>
|
#
eed99357 |
|
14-Dec-2012 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Ensure that we always drop inodes that have been marked as stale There is no need to cache stale inodes. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
f48407dd |
|
15-Oct-2012 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Remove BUG_ON()s in the fs/nfs/inode.c Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
324d003b |
|
30-Oct-2012 |
Weston Andros Adamson <dros@netapp.com> |
NFS: add nfs_sb_deactive_async to avoid deadlock Use nfs_sb_deactive_async instead of nfs_sb_deactive when in a workqueue context. This avoids a deadlock where rpc_shutdown_client loops forever in a workqueue kworker context, trying to kill all RPC tasks associated with the client, while one or more of these tasks have already been assigned to the same kworker (and will never run rpc_exit_task). This approach is needed because RPC tasks that have already been assigned to a kworker by queue_work cannot be canceled, as explained in the comment for workqueue.c:insert_wq_barrier. Signed-off-by: Weston Andros Adamson <dros@netapp.com> [Trond: add module_get/put.] Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
8c0a8537 |
|
25-Sep-2012 |
Kirill A. Shutemov <kirill.shutemov@linux.intel.com> |
fs: push rcu_barrier() from deactivate_locked_super() to filesystems There's no reason to call rcu_barrier() on every deactivate_locked_super(). We only need to make sure that all delayed rcu free inodes are flushed before we destroy related cache. Removing rcu_barrier() from deactivate_locked_super() affects some fast paths. E.g. on my machine exit_group() of a last process in IPC namespace takes 0.07538s. rcu_barrier() takes 0.05188s of that time. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
2a369153 |
|
13-Aug-2012 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Clean up helper function nfs4_select_rw_stateid() We want to be able to pass on the information that the page was not dirtied under a lock. Instead of adding a flag parameter, do this by passing a pointer to a 'struct nfs_lock_owner' that may be NULL. Also reuse this structure in struct nfs_lock_context to carry the fl_owner_t and pid_t. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
b3c54de6 |
|
13-Aug-2012 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Convert nfs_get_lock_context to return an ERR_PTR on failure We want to be able to distinguish between allocation failures, and the case where the lock context is not needed (because there are no locks). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
c3f52af3 |
|
03-Sep-2012 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Fix the initialisation of the readdir 'cookieverf' array When the NFS_COOKIEVERF helper macro was converted into a static inline function in commit 99fadcd764 (nfs: convert NFS_*(inode) helpers to static inline), we broke the initialisation of the readdir cookies, since that depended on doing a memset with an argument of 'sizeof(NFS_COOKIEVERF(inode))' which therefore changed from sizeof(be32 cookieverf[2]) to sizeof(be32 *). At this point, NFS_COOKIEVERF seems to be more of an obfuscation than a helper, so the best thing would be to just get rid of it. Also see: https://bugzilla.kernel.org/show_bug.cgi?id=46881 Reported-by: Andi Kleen <andi@firstfloor.org> Reported-by: David Binderman <dcb314@hotmail.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@vger.kernel.org
|
#
29418aa4 |
|
31-Jul-2012 |
Mel Gorman <mgorman@suse.de> |
nfs: disable data cache revalidation for swapfiles The VM does not like PG_private set on PG_swapcache pages. As suggested by Trond in http://lkml.org/lkml/2006/8/25/348, this patch disables NFS data cache revalidation on swap files. as it does not make sense to have other clients change the file while it is being used as swap. This avoids setting PG_private on swap pages, since there ought to be no further races with invalidate_inode_pages2() to deal with. Since we cannot set PG_private we cannot use page->private which is already used by PG_swapcache pages to store the nfs_page. Thus augment the new nfs_page_find_request logic. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Mel Gorman <mgorman@suse.de> Acked-by: Rik van Riel <riel@redhat.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: David S. Miller <davem@davemloft.net> Cc: Eric B Munson <emunson@mgebm.net> Cc: Eric Paris <eparis@redhat.com> Cc: James Morris <jmorris@namei.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Christie <michaelc@cs.wisc.edu> Cc: Neil Brown <neilb@suse.de> Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Xiaotian Feng <dfeng@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
89d77c8f |
|
30-Jul-2012 |
Bryan Schumaker <bjschuma@netapp.com> |
NFS: Convert v4 into a module This patch exports symbols needed by the v4 module. In addition, I also switch over to using IS_ENABLED() to check if CONFIG_NFS_V4 or CONFIG_NFS_V4_MODULE are set. The module (nfs4.ko) will be created in the same directory as nfs.ko and will be automatically loaded the first time you try to mount over NFS v4. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
1c606fb7 |
|
30-Jul-2012 |
Bryan Schumaker <bjschuma@netapp.com> |
NFS: Convert v3 into a module This patch exports symbols and moves over the final structures needed by the v3 module. In addition, I also switch over to using IS_ENABLED() to check if CONFIG_NFS_V3 or CONFIG_NFS_V3_MODULE are set. The module (nfs3.ko) will be created in the same directory as nfs.ko and will be automatically loaded the first time you try to mount over NFS v3. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
ddda8e0a |
|
30-Jul-2012 |
Bryan Schumaker <bjschuma@netapp.com> |
NFS: Convert v2 into a module The module (nfs2.ko) will be created in the same directory as nfs.ko and will be automatically loaded the first time you try to mount over NFS v2. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
19d87ca3 |
|
30-Jul-2012 |
Bryan Schumaker <bjschuma@netapp.com> |
NFS: Split out remaining NFS v4 inode functions Somehow I missed this in my previous patch series, but these functions are only needed by the v4 code and should be moved to a v4-only file. I wasn't exactly sure where I should put these functions, so I moved them into nfs4super.c where I could make them static. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
ab7017a3 |
|
30-Jul-2012 |
Bryan Schumaker <bjschuma@netapp.com> |
NFS: Add version registering framework This patch adds in the code to track multiple versions of the NFS protocol. I created default structures for v2, v3 and v4 so that each version can continue to work while I convert them into kernel modules. I also removed the const parameter from the rpc_version array so that I can change it at runtime. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
129d1977 |
|
16-Jul-2012 |
Bryan Schumaker <bjschuma@netapp.com> |
NFS: Create an init_nfs_v4() function I want to initialize all of NFS v4 in a single function that will eventually be used as the v4 module init function. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
57ec14c5 |
|
20-Jun-2012 |
Bryan Schumaker <bjschuma@netapp.com> |
NFS: Create a return_delegation rpc op Delegations are a v4 feature, so push return_delegation out of the generic client by creating a new rpc_op and renaming the old function to be in the nfs v4 "namespace" Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
011e2a7f |
|
20-Jun-2012 |
Bryan Schumaker <bjschuma@netapp.com> |
NFS: Create a have_delegation rpc_op Delegations are a v4 feature, so push them out of the generic code. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
1a0de48a |
|
19-Jun-2012 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Initialise commit_info.rpc_out when !defined(CONFIG_NFS_V4) Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Fred Isaman <iisaman@netapp.com>
|
#
1d59d61f |
|
30-May-2012 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Ensure that setattr and getattr wait for O_DIRECT write completion Use the same mechanism as the block devices are using, but move the helper functions from fs/direct-io.c into fs/inode.c to remove the dependency on CONFIG_BLOCK. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Fred Isaman <iisaman@netapp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
2701d086 |
|
24-May-2012 |
Andy Adamson <andros@netapp.com> |
NFSv4.1 add nfs_inode book keeping for mdsthreshold Keep track of the number of bytes read or written via buffered, direct, and mem-mapped i/o for use by mdsthreshold size_io hints. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
82be417a |
|
23-May-2012 |
Andy Adamson <andros@netapp.com> |
NFSv4.1 cache mdsthreshold values on OPEN Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
e73e6c9e |
|
30-Apr-2012 |
Matthew Treinish <treinish@linux.vnet.ibm.com> |
Fixed goto readability in nfs_update_inode. Simplified error gotos to make it slightly easier to read, it doesn't affect the functionality of the routine. Signed-off-by: Matthew Treinish <treinish@linux.vnet.ibm.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
dbd5768f |
|
03-May-2012 |
Jan Kara <jack@suse.cz> |
vfs: Rename end_writeback() to clear_inode() After we moved inode_sync_wait() from end_writeback() it doesn't make sense to call the function end_writeback() anymore. Rename it to clear_inode() which well says what the function really does - set I_CLEAR flag. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
|
#
d69ee9b8 |
|
01-May-2012 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Adapt readdirplus to application usage patterns While the use of READDIRPLUS is significantly more efficient than READDIR followed by many LOOKUP calls, it is still less efficient than just READDIR if the attributes are not required. This patch tracks when lookups are attempted on the directory, and uses that information to selectively disable READDIRPLUS on that directory. The first 'readdir' call is always served using READDIRPLUS. Subsequent calls only use READDIRPLUS if there was a successful lookup or revalidation on a child in the mean time. Credit for the original idea should go to Neil Brown. See: http://www.spinics.net/lists/linux-nfs/msg19996.html However, the implementation in this patch differs from Neil's in that it focuses on tracking lookups rather than calls to stat(). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Neil Brown <neilb@suse.de>
|
#
fee7fe19 |
|
27-Apr-2012 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Simplify the cache invalidation code Now that NFSv2 and NFSv3 have simulated change attributes, instead of using all three of mtime, ctime and change attribute to manage data cache consistency, we can simplify the code to just use the change attribute. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
6a4506c0 |
|
27-Apr-2012 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Change attribute updates should set NFS_INO_REVAL_PAGECACHE Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
4124bbc5 |
|
27-Apr-2012 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Simplify nfs_fhget() If the inode is being initialised, there is no point in setting flags such as NFS_INO_INVALID_ACCESS, NFS_INO_INVALID_ACL or NFS_INO_INVALID_DATA since there are no cached access calls, acls or data caches to invalidate. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
01da47bd |
|
28-Apr-2012 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Optimise away nfs_check_inode_attributes() when holding a delegation We already know that the attribute cache is valid. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
b4b1eadf |
|
29-Apr-2012 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Don't force page cache revalidations when holding a delegation If we're holding a delegation, then we already know that our page cache is valid. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
ea2cf228 |
|
20-Apr-2012 |
Fred Isaman <iisaman@netapp.com> |
NFS: create struct nfs_commit_info It is COMMIT that is handled the most differently between the paged and direct paths. Create a structure that encapsulates everything either path needs to know about the commit state. We could use void to hide some of the layout driver stuff, but Trond suggests pulling it out to ensure type checking, given the huge changes being made, and the fact that it doesn't interfere with other drivers. Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
9ffc93f2 |
|
28-Mar-2012 |
David Howells <dhowells@redhat.com> |
Remove all #inclusions of asm/system.h Remove all #inclusions of asm/system.h preparatory to splitting and killing it. Performed with the following command: perl -p -i -e 's!^#\s*include\s*<asm/system[.]h>.*\n!!' `grep -Irl '^#\s*include\s*<asm/system[.]h>' *` Signed-off-by: David Howells <dhowells@redhat.com>
|
#
e27d359e |
|
18-Mar-2012 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
SUNRPC/NFS: Add Kbuild dependencies for NFS_DEBUG/RPC_DEBUG This allows us to turn on/off the dprintk() debugging interfaces for those distributions that don't ship the 'rpcdebug' utility. It also allows us to add Kbuild dependencies. Specifically, we already know that dprintk() in general relies on CONFIG_SYSCTL. Now it turns out that the NFS dprintks depend on CONFIG_CRC32 after we added support for the filehandle hash. Reported-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
d6d6dc7c |
|
08-Mar-2012 |
Fred Isaman <iisaman@netapp.com> |
NFS: remove nfs_inode radix tree The radix tree is only being used to compile lists of reqs needing commit. It is simpler to just put the reqs directly into a list. Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
4f1abd22 |
|
06-Mar-2012 |
Weston Andros Adamson <dros@netapp.com> |
NFS: add fh_crc to debug output Print the filehandle crc in two debug messages Signed-off-by: Weston Andros Adamson <dros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
d8e0539e |
|
06-Mar-2012 |
Weston Andros Adamson <dros@netapp.com> |
NFS: add filehandle crc for debug display Match wireshark's CRC-32 hash for easier debugging Signed-off-by: Weston Andros Adamson <dros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
fa68a1ba |
|
06-Mar-2012 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Fix a typo in _nfs_display_fhandle The check for 'fh == NULL' needs to come _before_ we dereference fh. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
20d27e92 |
|
01-Mar-2012 |
Chuck Lever <chuck.lever@oracle.com> |
NFS: Add a client-side function to display NFS file handles For debugging, introduce a simplistic function to print NFS file handles on the system console. The main function is hooked into the dprintk debugging facility, but you can directly call the helper, _nfs_display_fhandle(), if you want to print a handle unconditionally. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
28cd1b3f |
|
23-Jan-2012 |
Stanislav Kinsbursky <skinsbursky@parallels.com> |
NFS: make cb_ident_idr per net ns This patch makes ID's infrastructure network namespace aware. This was done mainly because of nfs_client_lock, which is desired to be per network namespace, but protects NFS clients ID's. NOTE: NFS client's net pointer have to be set prior to ID initialization, proper assignment was moved. Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
6b13168b |
|
23-Jan-2012 |
Stanislav Kinsbursky <skinsbursky@parallels.com> |
NFS: make nfs_client_list per net ns This patch splits global list of NFS clients into per-net-ns array of lists. This looks more strict and clearer. BTW, this patch also makes "/proc/fs/nfsfs/servers" entry content depends on /proc mount owner pid namespace. See below for details. NOTE: few words about how was /proc/fs/nfsfs/ entries content show per network namespace done. This is a little bit tricky and not the best is could be. But it's cheap (proper fix for /proc conteinerization is a hard nut to crack). The idea is simple: take proper network namespace from pid namespace child reaper nsproxy of /proc/ mount creator. This actually means, that if there are 2 containers with different net namespace sharing pid namespace, then read of /proc/fs/nfsfs/ entries will always return content, taken from net namespace of pid namespace creator task (and thus second namespace set wil be unvisible). Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
a030889a |
|
26-Jan-2012 |
Weston Andros Adamson <dros@netapp.com> |
NFS: start printks w/ NFS: even if __func__ shown This patch addresses printks that have some context to show that they are from fs/nfs/, but for the sake of consistency now start with NFS: Signed-off-by: Weston Andros Adamson <dros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
ec7652aa |
|
06-Dec-2011 |
Stanislav Kinsbursky <skinsbursky@parallels.com> |
SUNRPC: register RPC stats /proc entries in passed network namespace context This patch makes it possible to create NFS program entry ("/proc/net/rpc/nfs") in passed network namespace context instead of hard-coded "init_net". Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
536e43d1 |
|
17-Jan-2012 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Optimise away unnecessary setattrs for open(O_TRUNC); Currently, we will correctly optimise away a truncate that doesn't change the file size. However, in the case of open(O_TRUNC), we also want to optimise away the time changes. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
9e2e74db |
|
10-Jan-2012 |
Stanislav Kinsbursky <skinsbursky@parallels.com> |
NFS: blocklayout pipe creation per network namespace context introduced This patch implements blocklayout pipe creation and registration per each existent network namespace. This was achived by registering NFS per-net operations, responsible for blocklayout pipe allocation/register and unregister/destruction instead of initialization and destruction of static "bl_device_pipe" pipe (this one was removed). Note, than pointer to network blocklayout pipe is stored in per-net "nfs_net" structure, because allocating of one more per-net structure for blocklayout module looks redundant. This patch also changes dev_remove() function prototype (and all it's callers, where it' requied) by adding network namespace pointer parameter, which is used to discover proper blocklayout pipe for rpc_queue_upcall() call. Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
1b340d01 |
|
25-Nov-2011 |
Stanislav Kinsbursky <skinsbursky@parallels.com> |
NFS: DNS resolver cache per network namespace context introduced This patch implements DNS resolver cache creation and registration for each alive network namespace context. This was done by registering NFS per-net operations, responsible for DNS cache allocation/register and unregister/destructioning instead of initialization and destruction of static "nfs_dns_resolve" cache detail (this one was removed). Pointer to network dns resolver cache is stored in new per-net "nfs_net" structure. This patch also changes nfs_dns_resolve_name() function prototype (and it's calls) by adding network pointer parameter, which is used to get proper DNS resolver cache pointer for do_cache_lookup_wait() call. Note: empty nfs_dns_resolver_init() and nfs_dns_resolver_destroy() functions will be used in next patch in the series. Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
90ab5ee9 |
|
12-Jan-2012 |
Rusty Russell <rusty@rustcorp.com.au> |
module_param: make bool parameters really bool (drivers & misc) module_param(bool) used to counter-intuitively take an int. In fddd5201 (mid-2009) we allowed bool or int/unsigned int using a messy trick. It's time to remove the int/unsigned int option. For this version it'll simply give a warning, but it'll break next kernel version. Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
#
6926afd1 |
|
07-Jan-2012 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFSv4: Save the owner/group name string when doing open ...so that we can do the uid/gid mapping outside the asynchronous RPC context. This fixes a bug in the current NFSv4 atomic open code where the client isn't able to determine what the true uid/gid fields of the file are, (because the asynchronous nature of the OPEN call denies it the ability to do an upcall) and so fills them with default values, marking the inode as needing revalidation. Unfortunately, in some cases, the VFS will do some additional sanity checks on the file, and may override the server's decision to allow the open because it sees the wrong owner/group fields. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
6b520e05 |
|
12-Dec-2011 |
Al Viro <viro@zeniv.linux.org.uk> |
vfs: fix the stupidity with i_dentry in inode destructors Seeing that just about every destructor got that INIT_LIST_HEAD() copied into it, there is no point whatsoever keeping this INIT_LIST_HEAD in inode_init_once(); the cost of taking it into inode_init_always() will be negligible for pipes and sockets and negative for everything else. Not to mention the removal of boilerplate code from ->destroy_inode() instances... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
5ede7b1c |
|
23-Oct-2011 |
Al Viro <viro@zeniv.linux.org.uk> |
pull manipulations of rpc_cred inside alloc_nfs_open_context() No need to duplicate them in both callers; make it return ERR_PTR(-ENOMEM) on allocation failure instead of NULL and it'll be able to report rpc_lookup_cred() failures just fine. Callers are much happier that way... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
d310310c |
|
01-Dec-2011 |
Jeff Layton <jlayton@kernel.org> |
Freezer / sunrpc / NFS: don't allow TASK_KILLABLE sleeps to block the freezer Allow the freezer to skip wait_on_bit_killable sleeps in the sunrpc layer. This should allow suspend and hibernate events to proceed, even when there are RPC's pending on the wire. Also, wrap the TASK_KILLABLE sleeps in NFS layer in freezer_do_not_count and freezer_count calls. This allows the freezer to skip tasks that are sleeping while looping on EJUKEBOX or NFS4ERR_DELAY sorts of errors. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
|
#
1788ea6e |
|
04-Nov-2011 |
Jeff Layton <jlayton@kernel.org> |
nfs: when attempting to open a directory, fall back on normal lookup (try #5) commit d953126 changed how nfs_atomic_lookup handles an -EISDIR return from an OPEN call. Prior to that patch, that caused the client to fall back to doing a normal lookup. When that patch went in, the code began returning that error to userspace. The d_revalidate codepath however never had the corresponding change, so it was still possible to end up with a NULL ctx->state pointer after that. That patch caused a regression. When we attempt to open a directory that does not have a cached dentry, that open now errors out with EISDIR. If you attempt the same open with a cached dentry, it will succeed. Fix this by reverting the change in nfs_atomic_lookup and allowing attempts to open directories to fall back to a normal lookup Also, add a NFSv4-specific f_ops->open routine that just returns -ENOTDIR. This should never be called if things are working properly, but if it ever is, then the dprintk may help in debugging. To facilitate this, a new file_operations field is also added to the nfs_rpc_ops struct. Cc: stable@kernel.org Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
bfe86848 |
|
28-Oct-2011 |
Miklos Szeredi <mszeredi@suse.cz> |
filesystems: add set_nlink() Replace remaining direct i_nlink updates with a new set_nlink() updater function. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Tested-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
|
#
6d6b77f1 |
|
28-Oct-2011 |
Miklos Szeredi <mszeredi@suse.cz> |
filesystems: add missing nlink wrappers Replace direct i_nlink updates with the respective updater function (inc_nlink, drop_nlink, clear_nlink, inode_dec_link_count). Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
|
#
a9a4a87a |
|
17-Oct-2011 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Use the inode->i_version to cache NFSv4 change attribute information Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
3d4ff43d |
|
22-Jun-2011 |
Al Viro <viro@zeniv.linux.org.uk> |
nfs_open_context doesn't need struct path either just dentry, please... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
533eb461 |
|
13-Jun-2011 |
Andy Adamson <andros@netapp.com> |
NFSv4.1: allow nfs_fhget to succeed with mounted on fileid Commit 28331a46d88459788c8fca72dbb0415cd7f514c9 "Ensure we request the ordinary fileid when doing readdirplus" changed the meaning of NFS_ATTR_FATTR_FILEID which used to be set when FATTR4_WORD1_MOUNTED_ON_FILED was requested. Allow nfs_fhget to succeed with only a mounted on fileid when crossing a mountpoint or a referral. Ask for the fileid of the absent file system if mounted_on_fileid is not supported. Signed-off-by: Andy Adamson <andros@netapp.com> cc:stable@kernel.org [2.6.39] Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
0f66b598 |
|
16-Oct-2010 |
Peng Tao <bergwolf@gmail.com> |
NFS41: do not update isize if inode needs layoutcommit nfs_update_inode will update isize if there is no queued pages. For pNFS, layoutcommit is supposed to change file size on server, the same effect as queued pages. nfs_update_inode may be called when dirty pages are written back (nfsi->npages==0) but layoutcommit is not sent, and it will change client file size according to server file size. Then client ends up losing what it just writes back in pNFS path. So we should skip updating client file size if file needs layoutcommit. Signed-off-by: Peng Tao <peng_tao@emc.com> Cc: stable@kernel.org [2.6.39] Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
cbe82603 |
|
22-May-2011 |
Benny Halevy <bhalevy@panasas.com> |
pnfs: layoutreturn NFSv4.1 LAYOUTRETURN implementation Currently, does not support layout-type payload encoding. Signed-off-by: Alexandros Batsakis <batsakis@netapp.com> Signed-off-by: Andy Adamson <andros@citi.umich.edu> Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Dean Hildebrand <dhildeb@us.ibm.com> Signed-off-by: Fred Isaman <iisaman@citi.umich.edu> Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Marc Eshel <eshel@almaden.ibm.com> Signed-off-by: Zhang Jingwang <zhangjingwang@nrchpc.ac.cn> [call pnfs_return_layout right before pnfs_destroy_layout] [remove assert_spin_locked from pnfs_clear_lseg_list] [remove wait parameter from the layoutreturn path.] [remove return_type field from nfs4_layoutreturn_args] [remove range from nfs4_layoutreturn_args] [no need to send layoutcommit from _pnfs_return_layout] [don't wait on sync layoutreturn] [fix layout stateid in layoutreturn args] [fixed NULL deref in _pnfs_return_layout] [removed recaim member of nfs4_layoutreturn_args] Signed-off-by: Benny Halevy <bhalevy@panasas.com>
|
#
60c16ea8 |
|
23-May-2011 |
Harshula Jayasuriya <harshula@redhat.com> |
NFS: nfs_update_inode: print current and new inode size in debug output Hi Trond, In nfs_update_inode debug output, print the current and new inode size when the file size changes on the NFS server. Signed-off-by: Harshula Jayasuriya <harshula@redhat.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
7ebb9315 |
|
24-Mar-2011 |
Bryan Schumaker <bjschuma@netapp.com> |
NFS: use secinfo when crossing mountpoints A submount may use different security than the parent mount does. We should figure out what sec flavor the submount uses at mount time. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
e0c2b380 |
|
23-Mar-2011 |
Fred Isaman <iisaman@netapp.com> |
NFSv4.1: filelayout driver specific code for COMMIT Implement all the hooks created in the previous patches. This requires exporting quite a few functions and adding a few structure fields. Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
480c2006 |
|
23-Mar-2011 |
Bryan Schumaker <bjschuma@netapp.com> |
NFS: Create nfs_open_dir_context nfs_opendir() created a context that held much more information than we need for a readdir. This patch introduces a slimmed-down nfs_open_dir_context that contains only the cookie and the cred used for RPC operations. The new context will eventually be used to help detect readdir loops. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
3fa0b4e2 |
|
02-Dec-2010 |
Frank Filz <ffilzlnx@us.ibm.com> |
(try3-resend) Fix nfs_compat_user_ino64 so it doesn't cause problems if bit 31 or 63 are set in fileid The problem was use of an int32, which when converted to a uint64 is sign extended resulting in a fileid that doesn't fit in 32 bits even though the intent of the function is to fit the fileid into 32 bits. Signed-off-by: Frank Filz <ffilzlnx@us.ibm.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> [Trond: Added an include for compat.h] Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
27dc1cd3 |
|
25-Jan-2011 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: nfs_wcc_update_inode() should set nfsi->attr_gencount If the call to nfs_wcc_update_inode() results in an attribute update, we need to ensure that the inode's attr_gencount gets bumped too, otherwise we are not protected against races with other GETATTR calls. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
ada609ee |
|
25-Jan-2011 |
Tejun Heo <tj@kernel.org> |
workqueue: use WQ_MEM_RECLAIM instead of WQ_RESCUER WQ_RESCUER is now an internal flag and should only be used in the workqueue implementation proper. Use WQ_MEM_RECLAIM instead. This doesn't introduce any functional difference. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: dm-devel@redhat.com Cc: Neil Brown <neilb@suse.de>
|
#
36d43a43 |
|
14-Jan-2011 |
David Howells <dhowells@redhat.com> |
NFS: Use d_automount() rather than abusing follow_link() Make NFS use the new d_automount() dentry operation rather than abusing follow_link() on directories. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com> Acked-by: Ian Kent <raven@themaw.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
fa0d7e3d |
|
06-Jan-2011 |
Nick Piggin <npiggin@kernel.dk> |
fs: icache RCU free inodes RCU free the struct inode. This will allow: - Subsequent store-free path walking patch. The inode must be consulted for permissions when walking, so an RCU inode reference is a must. - sb_inode_list_lock to be moved inside i_lock because sb list walkers who want to take i_lock no longer need to take sb_inode_list_lock to walk the list in the first place. This will simplify and optimize locking. - Could remove some nested trylock loops in dcache code - Could potentially simplify things a bit in VM land. Do not need to take the page lock to follow page->mapping. The downsides of this is the performance cost of using RCU. In a simple creat/unlink microbenchmark, performance drops by about 10% due to inability to reuse cache-hot slab objects. As iterations increase and RCU freeing starts kicking over, this increases to about 20%. In cases where inode lifetimes are longer (ie. many inodes may be allocated during the average life span of a single inode), a lot of this cache reuse is not applicable, so the regression caused by this patch is smaller. The cache-hot regression could largely be avoided by using SLAB_DESTROY_BY_RCU, however this adds some complexity to list walking and store-free path walking, so I prefer to implement this at a later date, if it is shown to be a win in real situations. I haven't found a regression in any non-micro benchmark so I doubt it will be a problem. Signed-off-by: Nick Piggin <npiggin@kernel.dk>
|
#
4541d16c |
|
06-Jan-2011 |
Fred Isaman <iisaman@netapp.com> |
pnfs: change how lsegs are removed from layout list This is to prepare the way for sensible io draining. Instead of just removing the lseg from the list, we instead clear the VALID flag (preventing new io from grabbing references to the lseg) and remove the reference holding it in the list. Thus the lseg will be removed once any io in progress completes and any references still held are dropped. Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
f4eecd5d |
|
05-Jan-2011 |
Andy Adamson <andros@netapp.com> |
NFS implement v4.0 callback_ident Use the small id to pointer translator service to provide a unique callback identifier per SETCLIENTID call used to identify the v4.0 callback service associated with the clientid. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
11de3b11 |
|
01-Dec-2010 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Fix a memory leak in nfs_readdir We need to ensure that the entries in the nfs_cache_array get cleared when the page is removed from the page cache. To do so, we use the freepage address_space operation. Change nfs_readdir_clear_array to use kmap_atomic(), so that the function can be safely called from all contexts. Finally, modify the cache_page_release helper to call nfs_readdir_clear_array directly, when dealing with an anonymous page from 'uncached_readdir'. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
e5e94017 |
|
19-Oct-2010 |
Benny Halevy <bhalevy@panasas.com> |
NFS: create and destroy inode's layout cache At the start of the io paths, try to grab the relevant layout information. This will initiate the inode's layout cache, but stubs ensure the cache stays empty. Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Dean Hildebrand <dhildebz@umich.edu> Signed-off-by: Marc Eshel <eshel@almaden.ibm.com> Signed-off-by: Tao Guo <guotao@nrchpc.ac.cn> Signed-off-by: Ricardo Labiaga <ricardo.labiaga@netapp.com> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
0715dc63 |
|
24-Sep-2010 |
Bryan Schumaker <bjschuma@netapp.com> |
NFS: remove readdir plus limit We will now use readdir plus even on directories that are very large. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
955a857e |
|
29-Sep-2010 |
Bryan Schumaker <bjschuma@netapp.com> |
NFS: new idmapper This patch creates a new idmapper system that uses the request-key function to place a call into userspace to map user and group ids to names. The old idmapper was single threaded, which prevented more than one request from running at a single time. This means that a user would have to wait for an upcall to finish before accessing a cached result. The upcall result is stored on a keyring of type id_resolver. See the file Documentation/filesystems/nfs/idmapper.txt for instructions. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> [Trond: fix up the return value of nfs_idmap_lookup_name and clean up code] Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
5c78f58e |
|
27-Sep-2010 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Really fix put_nfs_open_context() In nfs_open_revalidate(), if the open_context() call returns an inode that is not the same as dentry->d_inode, then we will call put_nfs_open_context() with a valid dentry->d_inode, but without the context being part of the nfsi->open_files list. In this case too, we want to just skip the list removal, but we do want to call the ->close_context() callback in order to close the NFSv4 state. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Acked-by: Jeff Layton <jlayton@redhat.com>
|
#
ef84303e |
|
22-Sep-2010 |
Benny Halevy <bhalevy@panasas.com> |
NFS: handle inode==NULL in __put_nfs_open_context inode may be NULL when put_nfs_open_context is called from nfs_atomic_lookup before d_add_unique(dentry, inode) Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
60958892 |
|
21-Sep-2010 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Convert nfsiod to use alloc_workqueue() create_singlethread_workqueue() is deprecated. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
cd9a1c0e |
|
17-Sep-2010 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFSv4: Clean up nfs4_atomic_open Start moving the 'struct nameidata' dependent code out of the lower level NFS code in preparation for the removal of open intents. Instead of the struct nameidata, we pass down a partially initialised struct nfs_open_context that will be fully initialised by the atomic open upon success. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
b57922d9 |
|
07-Jun-2010 |
Al Viro <viro@zeniv.linux.org.uk> |
convert remaining ->clear_inode() to ->evict_inode() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
1b924e5f |
|
31-Jul-2010 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Clean up the callers of nfs_wb_all() There is no need to flush out writes before calling nfs_wb_all(). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
f11ac8db |
|
25-Jun-2010 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFSv4: Ensure that we track the NFSv4 lock state in read/write requests. This patch fixes bugzilla entry 14501: https://bugzilla.kernel.org/show_bug.cgi?id=14501 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
d7cf8dd0 |
|
16-Apr-2010 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFSv4: Allow attribute caching with 'noac' mounts if client holds a delegation If the server has given us a delegation on a file, we _know_ that we can cache the attribute information even when the user has specified 'noac'. Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
987f8dfc |
|
16-Apr-2010 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Reduce stack footprint of nfs_setattr() Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
a3cba2aa |
|
16-Apr-2010 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Reduce stack footprint of nfs_revalidate_inode() Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
2d36bfde |
|
16-Apr-2010 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Add helper functions for allocating filehandles and fattr structs NFS Filehandles and struct fattr are really too large to be allocated on the stack. This patch adds in a couple of helper functions to allocate them dynamically instead. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
1544fa0f |
|
25-Mar-2010 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Fix the mode calculation in nfs_find_open_context Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
5a0e3ad6 |
|
24-Mar-2010 |
Tejun Heo <tj@kernel.org> |
include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_*.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). * x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
|
#
b4d2314b |
|
10-Mar-2010 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFSv4: Don't ignore the NFS_INO_REVAL_FORCED flag in nfs_revalidate_inode() If the NFS_INO_REVAL_FORCED flag is set, that means that we don't yet have an up to date attribute cache. Even if we hold a delegation, we must put a GETATTR on the wire. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@kernel.org
|
#
1cda707d |
|
19-Feb-2010 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Remove requirement for inode->i_mutex from nfs_invalidate_mapping Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
5cf95214 |
|
19-Feb-2010 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Clean up nfs_sync_mapping Remove the redundant call to filemap_write_and_wait(). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
acdc53b2 |
|
19-Feb-2010 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Replace __nfs_write_mapping with sync_inode() Now that we have correct COMMIT semantics in writeback_single_inode, we can reduce and simplify nfs_wb_all(). Also replace nfs_wb_nocommit() with a call to filemap_write_and_wait(), which doesn't need to hold the inode->i_mutex. With that done, we can eliminate nfs_write_mapping() altogether. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
ff778d02 |
|
19-Feb-2010 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Add a count of the number of unstable writes carried by an inode In order to know when we should do opportunistic commits of the unstable writes, when the VM is doing a background flush, we add a field to count the number of unstable writes. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
8fc795f7 |
|
19-Feb-2010 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Cleanup - move nfs_write_inode() into fs/nfs/write.c The sole purpose of nfs_write_inode is to commit unstable writes, so move it into fs/nfs/write.c, and make nfs_commit_inode static. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
a9185b41 |
|
05-Mar-2010 |
Christoph Hellwig <hch@lst.de> |
pass writeback_control to ->write_inode This gives the filesystem more information about the writeback that is happening. Trond requested this for the NFS unstable write handling, and other filesystems might benefit from this too by beeing able to distinguish between the different callers in more detail. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
26821ed4 |
|
05-Mar-2010 |
Christoph Hellwig <hch@lst.de> |
make sure data is on disk before calling ->write_inode Similar to the fsync issue fixed a while ago in commit 2daea67e966dc0c42067ebea015ddac6834cef88 we need to write for data to actually hit the disk before writing out the metadata to guarantee data integrity for filesystems that modify the inode in the data I/O completion path. Currently XFS and NFS handle this manually, and AFS has a write_inode method that does nothing but waiting for data, while others are possibly missing out on this. Fortunately this change has a lot less impact than the fsync change as none of the write_inode methods starts data writeout of any form by itself. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
6eae7974 |
|
30-Jan-2010 |
Al Viro <viro@zeniv.linux.org.uk> |
Switch alloc_nfs_open_context() to struct path Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
f895c53f |
|
01-Feb-2010 |
Chuck Lever <chuck.lever@oracle.com> |
NFS: Make close(2) asynchronous when closing NFS O_DIRECT files For NFSv2 and v3: O_DIRECT writes are always synchronous, and aren't cached, so nothing should be flushed when closing an NFS O_DIRECT file descriptor. Thus there are no write errors to report on close(2). In addition, there's no cached data to verify on the next open(2), so we don't need clean GETATTR results at close time to compare with. Thus, there's no need for the nfs_revalidate_inode() call when closing an NFS O_DIRECT file. This reduces the number of synchronous on-the-wire requests for a simple open-write-close of an NFS O_DIRECT file by roughly 20%. For NFSv4: Call nfs4_do_close() with wait set to zero when closing an NFS O_DIRECT file. The CLOSE will go on the wire, but the application won't wait for it to complete. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
9b4b3513 |
|
03-Feb-2010 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Don't clobber the attribute type in nfs_update_inode() If the NFS_ATTR_FATTR_TYPE field isn't set in fattr->valid, then we should not set the S_IFMT part of inode->i_mode. Reported-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
c08d3b0e |
|
20-Aug-2009 |
npiggin@suse.de <npiggin@suse.de> |
truncate: use new helpers Update some fs code to make use of new helper functions introduced in the previous patch. Should be no significant change in behaviour (except CIFS now calls send_sig under i_lock, via inode_newsize_ok). Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Miklos Szeredi <miklos@szeredi.hu> Cc: linux-nfs@vger.kernel.org Cc: Trond.Myklebust@netapp.com Cc: linux-cifs-client@lists.samba.org Cc: sfrench@samba.org Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
e571cbf1 |
|
19-Aug-2009 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Add a dns resolver for use with NFSv4 referrals and migration The NFSv4 and NFSv4.1 protocols both allow for the redirection of a client from one server to another in order to support filesystem migration and replication. For full protocol support, we need to add the ability to convert a DNS host name into an IP address that we can feed to the RPC client. We'll reuse the sunrpc cache, now that it has been converted to work with rpc_pipefs. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
62ab460c |
|
09-Aug-2009 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFSv4: Add 'server capability' flags for NFSv4 recommended attributes If the NFSv4 server doesn't support a POSIX attribute, the generic NFS code needs to know that, so that it don't keep trying to poll for it. However, by the same count, if the NFSv4 server does support that attribute, then we should ensure that the inode metadata is appropriately labelled as being untrusted. For instance, if we don't know the correct value of the file's uid, we should certainly not be caching ACLs or ACCESS results. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
405f5571 |
|
11-Jul-2009 |
Alexey Dobriyan <adobriyan@gmail.com> |
headers: smp_lock.h redux * Remove smp_lock.h from files which don't need it (including some headers!) * Add smp_lock.h to files which do need it * Make smp_lock.h include conditional in hardirq.h It's needed only for one kernel_locked() usage which is under CONFIG_PREEMPT This will make hardirq.h inclusion cheaper for every PREEMPT=n config (which includes allmodconfig/allyesconfig, BTW) Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
ef79c097 |
|
03-Apr-2009 |
David Howells <dhowells@redhat.com> |
NFS: Use local disk inode cache Bind data storage objects in the local cache to NFS inodes. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Steve Dickson <steved@redhat.com> Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Tested-by: Daire Byrne <Daire.Byrne@framestore.com>
|
#
8ec442ae |
|
03-Apr-2009 |
David Howells <dhowells@redhat.com> |
NFS: Register NFS for caching and retrieve the top-level index Register NFS for caching and retrieve the top-level cache index object cookie. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Steve Dickson <steved@redhat.com> Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Tested-by: Daire Byrne <Daire.Byrne@framestore.com>
|
#
7fe5c398 |
|
19-Mar-2009 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Optimise NFS close() Close-to-open cache consistency rules really only require us to flush out writes on calls to close(), and require us to revalidate attributes on the very last close of the file. Currently we appear to be doing a lot of extra attribute revalidation and cache flushes. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
72cb77f4 |
|
11-Mar-2009 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Throttle page dirtying while we're flushing to disk The following patch is a combination of a patch by myself and Peter Staubach. Trond: If we allow other processes to dirty pages while a process is doing a consistency sync to disk, we can end up never making progress. Peter: Attached is a patch which addresses a continuing problem with the NFS client generating out of order WRITE requests. While this is compliant with all of the current protocol specifications, there are servers in the market which can not handle out of order WRITE requests very well. Also, this may lead to sub-optimal block allocations in the underlying file system on the server. This may cause the read throughputs to be reduced when reading the file from the server. Peter: There has been a lot of work recently done to address out of order issues on a systemic level. However, the NFS client is still susceptible to the problem. Out of order WRITE requests can occur when pdflush is in the middle of writing out pages while the process dirtying the pages calls generic_file_buffered_write which calls generic_perform_write which calls balance_dirty_pages_rate_limited which ends up calling writeback_inodes which ends up calling back into the NFS client to writes out dirty pages for the same file that pdflush happens to be working with. Signed-off-by: Peter Staubach <staubach@redhat.com> [modification by Trond to merge the two similar patches] Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
fb8a1f11 |
|
11-Mar-2009 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: cleanup - remove struct nfs_inode->ncommit Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
9e6e70f8 |
|
11-Mar-2009 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFSv4: Support NFSv4 optional attributes in the struct nfs_fattr Currently, filling struct nfs_fattr is more or less an all or nothing operation, since NFSv2 and NFSv3 have only mandatory attributes. In NFSv4, some attributes are optional, and so we may simply not be able to fill in those fields. Furthermore, NFSv4 allows you to specify which attributes you are interested in retrieving, thus permitting you to optimise away retrieval of attributes that you know will no change... Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
37d9d76d |
|
11-Mar-2009 |
NeilBrown <neilb@suse.de> |
NFS: flush cached directory information slightly more readily. If cached directory contents becomes incorrect, there is no way to flush the contents. This contrasts with files where file locking is the recommended way to ensure cache consistency between multiple applications (a read-lock always flushes the cache). Also while changes to files often change the size of the file (thus triggering a cache flush), changes to directories often do not change the apparent size (as the size is often rounded to a block size). So it is particularly important with directories to avoid the possibility of an incorrect cache wherever possible. When the link count on a directory changes it implies a change in the number of child directories, and so a change in the contents of this directory. So use that as a trigger to flush cached contents. When the ctime changes but the mtime does not, there are two possible reasons. 1/ The owner/mode information has been changed. 2/ utimes has been used to set the mtime backwards. In the first case, a data-cache flush is not required. In the second case it is. So on the basis that correctness trumps performance, flush the directory contents cache in this case also. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
2b57dc6c |
|
11-Mar-2009 |
Suresh Jayaraman <sjayaraman@suse.de> |
NFS: Minor __nfs_revalidate_inode cleanup Remove redundant NFS_STALE() check, a leftover due to the commit 691beb13cdc88358334ef0ba867c080a247a760f Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
64672d55 |
|
23-Dec-2008 |
Peter Staubach <staubach@redhat.com> |
optimize attribute timeouts for "noac" and "actimeo=0" Hi. I've been looking at a bugzilla which describes a problem where a customer was advised to use either the "noac" or "actimeo=0" mount options to solve a consistency problem that they were seeing in the file attributes. It turned out that this solution did not work reliably for them because sometimes, the local attribute cache was believed to be valid and not timed out. (With an attribute cache timeout of 0, the cache should always appear to be timed out.) In looking at this situation, it appears to me that the problem is that the attribute cache timeout code has an off-by-one error in it. It is assuming that the cache is valid in the region, [read_cache_jiffies, read_cache_jiffies + attrtimeo]. The cache should be considered valid only in the region, [read_cache_jiffies, read_cache_jiffies + attrtimeo). With this change, the options, "noac" and "actimeo=0", work as originally expected. This problem was previously addressed by special casing the attrtimeo == 0 case. However, since the problem is only an off- by-one error, the cleaner solution is address the off-by-one error and thus, not require the special case. Thanx... ps Signed-off-by: Peter Staubach <staubach@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
dc0b027d |
|
23-Dec-2008 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFSv4: Convert the open and close ops to use fmode Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
ae05f269 |
|
28-Oct-2008 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Convert nfs_attr_generation_counter into an atomic_long The most important property we need from nfs_attr_generation_counter is monotonicity, which is not guaranteed by the current system of smp memory barriers. We should convert it to an atomic_long_t, and drop the memory barriers. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
526719ba |
|
27-Oct-2008 |
Alan Cox <alan@lxorguk.ukuu.org.uk> |
Switch to a valid email address... Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
011935a0 |
|
14-Oct-2008 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Fix a resolution problem with nfs_inode->cache_change_attribute The cache_change_attribute is used to decide whether or not a directory has changed, in which case we may need to look it up again. Again, the use of 'jiffies' leads to an issue of resolution. Once again, the fix is to change nfs_inode->cache_change_attribute, and just make it a simple counter. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
4704f0e2 |
|
14-Oct-2008 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Fix the resolution problem with nfs_inode_attrs_need_update() It appears that 'jiffies' timestamps do not have high enough resolution for nfs_inode_attrs_need_update(). One problem is that a GETATTR can be launched within < 1 jiffy of the last operation that updated the attribute. Another problem is that RPC calls can take < 1 jiffy to execute. We can fix this by switching the variables to use a simple global counter that gets incremented every time we start another GETATTR call. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
921615f1 |
|
14-Oct-2008 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Changes to inode->i_nlinks must set the NFS_INO_INVALID_ATTR flag Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
03254e65 |
|
09-Oct-2008 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Fix attribute updates This fixes a regression seen when running the Connectathon testsuite against an ext3 filesystem. The reason was that the inode was constantly being marked as 'just updated' by the jiffy wraparound test. This again meant that newer GETATTR calls were failing to pass the nfs_inode_attrs_need_update() test unless the changes caused a ctime update on the server, since they were perceived as having been started before the latest inode update. Given that nfs_inode_attrs_need_update() already checks for wraparound of nfsi->last_updated, we can drop the buggy "protection" in nfs_update_inode(). Also make a slight micro-optimisation of nfs_inode_attrs_need_update(): we are more often going to see time_after(fattr->time_start, nfsi->last_updated) be true, rather than seeing an update of ctime/size, so put that test first to ensure that we optimise away the ctime/size tests. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
691beb13 |
|
05-Oct-2008 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Allow concurrent inode revalidation Currently, if two processes are both trying to revalidate metadata for the same inode, they will find themselves being serialised. There is no good justification for this now that we have improved our ability to detect stale attribute data, so we should remove that serialisation. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
2f28ea61 |
|
05-Oct-2008 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Fix up nfs_setattr_update_inode() Ensure that it sets the inode metadata under the correct spinlock. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
076f1fc9 |
|
05-Oct-2008 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Don't clear nfsi->cache_validity in nfs_check_inode_attributes() If we're merely checking the inode attributes because we suspect that the 'updated' attributes returned by the RPC call are stale, then we shouldn't be doing weak cache consistency updates or clearing the cache_validity flags. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
4dc05efb |
|
23-Sep-2008 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Convert __nfs_revalidate_inode() to use nfs_refresh_inode() In the case where there are parallel RPC calls to the same inode, we may receive stale metadata due to the lack of ordering, hence the sanity checking of metadata in nfs_refresh_inode(). Currently, __nfs_revalidate_inode() is calling nfs_update_inode() directly, without any further sanity checks, and hence may end up setting the inode up with stale metadata. Fix is to use nfs_refresh_inode() instead of nfs_update_inode(). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
d65f557f |
|
04-Oct-2008 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Fix nfs_post_op_update_inode_force_wcc() If we believe that the attributes are old (see nfs_refresh_inode()), then we shouldn't force an update. Also ensure that we hold the inode->i_lock across attribute checks and the call to nfs_refresh_inode_locked() to ensure that we don't race with other attribute updates. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
a10ad176 |
|
23-Sep-2008 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Fix the NFS attribute update Currently nfs_refresh_inode() will only update the inode metadata if it sees that the RPC call that returned the nfs_fattr was started after the last update of the inode. This means that if we have parallel RPC calls to the same inode (when sending WRITE calls, for instance), we may often miss updates. This patch attempts to recover those missed updates by also accepting them if the ctime in the nfs_fattr is more recent than the inode's cached ctime. It also recovers the case where the file size has increased, but the ctime has not been updated due to limited ctime resolution. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
870a5be8 |
|
04-Oct-2008 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Clean up nfs_refresh_inode() and nfs_post_op_update_inode() Try to avoid taking and dropping the inode->i_lock more than once. Do so by moving the code in nfs_refresh_inode() that needs to be done under the spinlock into a function nfs_refresh_inode_locked(), and then having both nfs_refresh_inode() and nfs_post_op_update_inode() call it directly. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
51cc5068 |
|
25-Jul-2008 |
Alexey Dobriyan <adobriyan@gmail.com> |
SL*B: drop kmem cache argument from constructor Kmem cache passed to constructor is only needed for constructors that are themselves multiplexeres. Nobody uses this "feature", nor does anybody uses passed kmem cache in non-trivial way, so pass only pointer to object. Non-trivial places are: arch/powerpc/mm/init_64.c arch/powerpc/mm/hugetlbpage.c This is flag day, yes. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Acked-by: Christoph Lameter <cl@linux-foundation.org> Cc: Jon Tollefson <kniht@linux.vnet.ibm.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Matt Mackall <mpm@selenic.com> [akpm@linux-foundation.org: fix arch/powerpc/mm/hugetlbpage.c] [akpm@linux-foundation.org: fix mm/slab.c] [akpm@linux-foundation.org: fix ubifs] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
fa6dc9dc |
|
11-Jun-2008 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Remove attribute update related BKL references Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
a3d01454 |
|
10-Jun-2008 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Remove BKL requirement from attribute updates The main problem is dealing with inode->i_size: we need to set the inode->i_lock on all attribute updates, and so vmtruncate won't cut it. Make an NFS-private version of vmtruncate that has the necessary locking semantics. The result should be that the following inode attribute updates are protected by inode->i_lock nfsi->cache_validity nfsi->read_cache_jiffies nfsi->attrtimeo nfsi->attrtimeo_timestamp nfsi->change_attr nfsi->last_updated nfsi->cache_change_attribute nfsi->access_cache nfsi->access_cache_entry_lru nfsi->access_cache_inode_lru nfsi->acl_access nfsi->acl_default nfsi->nfs_page_tree nfsi->ncommit nfsi->npages nfsi->open_files nfsi->silly_list nfsi->acl nfsi->open_states inode->i_size inode->i_atime inode->i_mtime inode->i_ctime inode->i_nlink inode->i_uid inode->i_gid The following is protected by dir->i_mutex nfsi->cookieverf Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
f41f7418 |
|
11-Jun-2008 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Ensure we zap only the access and acl caches when setting new acls ...and ensure that we obey the NFS_INO_INVALID_ACL flag when retrieving the acls. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
659bfcd6 |
|
10-Jun-2008 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Fix the ftruncate() credential problem ftruncate() access checking is supposed to be performed at open() time, just like reads and writes. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
31f31db1 |
|
02-May-2008 |
Jan Blunck <jblunck@suse.de> |
nfs: path_{get,put}() cleanups Here are some more places where path_{get,put}() can be used instead of dput()/mntput() pair. Signed-off-by: Jan Blunck <jblunck@suse.de> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: "J. Bruce Fields" <bfields@fieldses.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
3110ff80 |
|
02-May-2008 |
Harvey Harrison <harvey.harrison@gmail.com> |
nfs: replace remaining __FUNCTION__ occurrences __FUNCTION__ is gcc-specific, use __func__ Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: "J. Bruce Fields" <bfields@fieldses.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
b0b53973 |
|
05-May-2008 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Ensure that 'noac' and/or 'actimeo=0' turn off attribute caching Both the 'noac' and 'actimeo=0' mount options should ensure that attributes are not cached, however a bug in nfs_attribute_timeout() means that currently, the attributes may in fact get cached for up to one jiffy. This has been seen to cause corruption in some applications. The reason for the bug is that the time_in_range() test returns 'true' as long as the current time lies between nfsi->read_cache_jiffies and nfsi->read_cache_jiffies + nfsi->attrtimeo. In other words, if jiffies equals nfsi->read_cache_jiffies, then we still cache the attribute data. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
66d3aac0 |
|
31-Mar-2008 |
Jeff Layton <jlayton@kernel.org> |
NFS: initialize flags field in nfs_open_context The nfs_open_context struct had a "flags" field added recently, but the allocator isn't initializing it. It also looks like the allocator isn't initializing the mode or list either, but they seem to be overwritten by the caller, so that's less of an issue. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
98a8e323 |
|
11-Mar-2008 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
SUNRPC: Add a helper rpcauth_lookup_generic_cred() The NFSv4 protocol allows clients to negotiate security protocols on the fly in the case where an administrator on the server changes the export settings and/or in the case where we may have a filesystem migration event. Instead of having the NFS client code cache credentials that are tied to a particular AUTH method it is therefore preferable to have a generic credential that can be converted into whatever AUTH is in use by the RPC client when the read/write/sillyrename/... is put on the wire. We do this by means of the new "generic" credential, which basically just caches the minimal information that is needed to look up an RPCSEC_GSS, AUTH_SYS, or AUTH_NULL credential. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
c37dcd33 |
|
05-Mar-2008 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Fix the fsid revalidation in nfs_update_inode() When we detect that we've crossed a mountpoint on the remote server, we must take care not to use that inode to revalidate the fsid on our current superblock. To do so, we label the inode as a remote mountpoint, and check for that in nfs_update_inode(). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
5746006f |
|
19-Feb-2008 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Add an nfsiod workqueue NFS post-rpciod cleanups often involve tasks that cannot be safely performed within the rpciod context (due to deadlock concerns). We therefore add a dedicated NFS workqueue that can perform tasks like cleaning up state after an interrupted NFSv4 open() call, or calling put_nfs_open_context() after an asynchronous read or write call. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
383ba719 |
|
19-Feb-2008 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Fix a deadlock with lazy umount We can't allow rpc callback functions like task->tk_ops->rpc_call_prepare() and task->tk_ops->rpc_call_done() to call mntput() in any way, since that will cause a deadlock when the call to rpc_shutdown_client() attempts to wait on 'task' to complete. We can avoid the above deadlock by moving calls to mntput to task->tk_ops->rpc_release() callback, since at that time the task will be marked as completed, and so rpc_shutdown_client won't attempt to wait on it. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
e6f81075 |
|
24-Jan-2008 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Add an asynchronous delegreturn operation for use in nfs_clear_inode Otherwise, there is a potential deadlock if the last dput() from an NFSv4 close() or other asynchronous operation leads to nfs_clear_inode calling the synchronous delegreturn. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
99fadcd7 |
|
22-Jan-2008 |
Benny Halevy <bhalevy@panasas.com> |
nfs: convert NFS_*(inode) helpers to static inline Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
3a10c30a |
|
22-Jan-2008 |
Benny Halevy <bhalevy@panasas.com> |
nfs: obliterate NFS_FLAGS macro use NFS_I(inode)->flags instead Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
bfc69a45 |
|
15-Oct-2007 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: define a function to update nfsi->cache_change_attribute Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
28c494c5 |
|
26-Oct-2007 |
Chuck Lever <chuck.lever@oracle.com> |
NFS: Prevent nfs_getattr() hang during heavy write workloads POSIX requires that ctime and mtime, as reported by the stat(2) call, reflect the activity of the most recent write(2). To that end, nfs_getattr() flushes pending dirty writes to a file before doing a GETATTR to allow the NFS server to set the file's size, ctime, and mtime properly. However, nfs_getattr() can be starved when a constant stream of application writes to a file prevents nfs_wb_nocommit() from completing. This usually results in hangs of programs doing a stat against an NFS file that is being written. "ls -l" is a common victim of this behavior. To prevent starvation, hold the file's i_mutex in nfs_getattr() to freeze applications writes temporarily so the client can more quickly obtain clean values for a file's size, mtime, and ctime. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
8a8c74bf |
|
26-Oct-2007 |
Chuck Lever <chuck.lever@oracle.com> |
NFS: Ensure nfs_wcc_update_inode always converts file size to loff_t The nfs_wcc_update_inode() function omits logic to convert the type of the NFS on-the-wire value of a file's size (__u64) to the type of file size value stored in struct inode (loff_t, which is signed). Everywhere else in the NFS client I checked already correctly converts the file size type. This effects only very large files. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
150030b7 |
|
06-Dec-2007 |
Matthew Wilcox <willy@infradead.org> |
NFS: Switch from intr mount option to TASK_KILLABLE By using the TASK_KILLABLE infrastructure, we can get rid of the 'intr' mount option. We have to use _killable everywhere instead of _interruptible as we get rid of rpc_clnt_sigmask/sigunmask. Signed-off-by: Liam R. Howlett <howlett@gmail.com> Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
|
#
a49c3c77 |
|
18-Oct-2007 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFSv4: Ensure that we wait for the CLOSE request to complete Otherwise, we do end up breaking close-to-open semantics. We also end up breaking some of the silly-rename tests in Connectathon on some setups. Please refer to the bug-report at http://bugzilla.linux-nfs.org/show_bug.cgi?id=150 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
565277f6 |
|
15-Oct-2007 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Fix a race in sillyrename lookup() and sillyrename() can race one another because the sillyrename() completion cannot take the parent directory's inode->i_mutex since the latter may be held by whoever is calling dput(). We therefore have little option but to add extra locking to ensure that nfs_lookup() and nfs_atomic_open() do not race with the sillyrename completion. If somebody has looked up the sillyrenamed file in the meantime, we just transfer the sillydelete information to the new dentry. Please refer to the bug-report at http://bugzilla.linux-nfs.org/show_bug.cgi?id=150 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
188b95dd |
|
18-Oct-2007 |
Jeff Layton <jlayton@kernel.org> |
NFS: if ATTR_KILL_S*ID bits are set, then skip mode change If the ATTR_KILL_S*ID bits are set then any mode change is only for clearing the setuid/setgid bits. For NFS, skip the mode change and let the server handle it. Signed-off-by: Jeff Layton <jlayton@redhat.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
4ba9b9d0 |
|
17-Oct-2007 |
Christoph Lameter <clameter@sgi.com> |
Slab API: remove useless ctor parameter and reorder parameters Slab constructors currently have a flags parameter that is never used. And the order of the arguments is opposite to other slab functions. The object pointer is placed before the kmem_cache pointer. Convert ctor(void *object, struct kmem_cache *s, unsigned long flags) to ctor(struct kmem_cache *s, void *object) throughout the kernel [akpm@linux-foundation.org: coupla fixes] Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
f43bf0be |
|
08-Oct-2007 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Add a boot parameter to disable 64 bit inode numbers This boot parameter will allow legacy 32-bit applications which call stat() to continue to function even if the NFSv3/v4 server uses 64-bit inode numbers. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
2a3f5fd4 |
|
08-Oct-2007 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: nfs_refresh_inode should clear cache_validity flags on success If the cached attributes match the ones supplied in the fattr, then assume we've revalidated the inode. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
40d24704 |
|
08-Oct-2007 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Fix a connectathon regression in NFSv3 and NFSv4 We're failing basic test6 against Linux servers because they lack a correct change attribute. The fix is to assume that we always want to invalidate the readdir caches when we call update_changeattr and/or nfs_post_op_update_inode on a directory. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
c7c20973 |
|
28-Sep-2007 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Get rid of some obsolete macros - NFS_READTIME, NFS_CHANGE_ATTR are completely unused. - Inline the few remaining uses of NFS_ATTRTIMEO, and remove. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
6d2b2966 |
|
01-Oct-2007 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Reset nfsi->last_updated only if the attribute changed Otherwise set it to nfsi->read_cache_jiffies in order to prevent jiffy wraparound issues. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
60ccd4ec |
|
29-Sep-2007 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Remove nfs_begin_data_update/nfs_end_data_update The lower level routines in fs/nfs/proc.c, fs/nfs/nfs3proc.c and fs/nfs/nfs4proc.c should already be dealing with the revalidation issues. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
80eb209d |
|
29-Sep-2007 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Remove NFS_I(inode)->data_updates We have no more users... Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
7668fdbe |
|
01-Oct-2007 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: nfs_post_op_update_inode don't update cache_change_attribute If nfs_post_op_update_inode fails because the server didn't return any attributes, then we let the subsequent inode revalidation update cache_change_attribute. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
12b373eb |
|
01-Oct-2007 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Don't revalidate dentries on directory size or ctime changes We only need to look at the mtime changes... Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
2f78e431 |
|
30-Sep-2007 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Don't set cache_change_attribute in nfs_revalidate_mapping The attribute revalidation code will already have taken care of resetting nfsi->cache_change_attribute. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
70ca8852 |
|
30-Sep-2007 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Fake up 'wcc' attributes to prevent cache invalidation after write NFSv2 and v4 don't offer weak cache consistency attributes on WRITE calls. In NFSv3, returning wcc data is optional. In all cases, we want to prevent the client from invalidating our cached data whenever ->write_done() attempts to update the inode attributes. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
b64e8a5e |
|
30-Sep-2007 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Remove bogus check of cache_change_attribute in nfs_update_inode Remove the bogus 'data_stable' check in nfs_update_inode. The cache_change_attribute tells you if the directory changed on the server, and should have nothing to do with the file length. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
7fdc49c4 |
|
28-Sep-2007 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Fix the ESTALE "revalidation" in _nfs_revalidate_inode() For one thing, the test NFS_ATTRTIMEO() == 0 makes no sense: we're testing whether or not the cache timeout length is zero, which is totally unrelated to the issue of whether or not we trust the file staleness. Secondly, we do not want to retry the GETATTR once a file has been declared stale by the server: we rather want to discard that inode as soon as possible, since there are broken servers still in use out there that reuse filehandles on new files. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
c4812998 |
|
28-Sep-2007 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Fix atime revalidation in readdir() NFSv3 will correctly update atime on a readdir call, so there is no need to set the NFS_INO_INVALID_ATIME flag unless the call to nfs_refresh_inode() fails. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
47aabaa7 |
|
27-Sep-2007 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFSv4: Don't use ctime/mtime for determining when to invalidate the caches In NFSv4 we should only be looking at the change attribute. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
17cadc95 |
|
27-Sep-2007 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Don't force a dcache revalidation if nfs_wcc_update_inode succeeds The reason is that if the weak cache consistency update was successful, then we know that our client must be the only one that changed the directory, and we've already updated the dcache to reflect the change. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
e323ea46 |
|
30-Sep-2007 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: nfs_wcc_update_inode: directory caches are always invalidated We must ensure that the readdir data is always invalidated whether or not the weak cache consistency data update succeeds. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
6ecc5e8f |
|
28-Sep-2007 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Fix dcache revalidation bugs We don't need to force a dentry lookup just because we're making changes to the directory. Don't update nfsi->cache_change_attribute in nfs_end_data_update: that overrides the NFSv3/v4 weak consistency checking that tells us our update was the only one, and that tells us the dcache is still valid. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
7957c141 |
|
28-Sep-2007 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: fix nfs_verify_change_attribute We always want to check that the verifier and directory cache_change_attribute match. This also allows us to remove the 'wraparound hack' for the cache_change_attribute. If we're only checking for equality, then we don't care about wraparound issues. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
68e8a70d |
|
14-Aug-2007 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: nfs_post_op_update_inode() should call nfs_refresh_inode() Ensure that we don't clobber the results from a more recent getattr call... Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
f2115dc9 |
|
14-Aug-2007 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Fix over-conservative attribute invalidation in nfs_update_inode() We should always be declaring the attribute cache as valid after having updated it. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
cd3758e3 |
|
10-Aug-2007 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Replace file->private_data with calls to nfs_file_open_context() Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
c7e15961 |
|
26-Jul-2007 |
Fabio Olive Leite <fleite@redhat.com> |
Re: [NFS] [PATCH] Attribute timeout handling and wrapping u32 jiffies I would like to discuss the idea that the current checks for attribute timeout using time_after are inadequate for 32bit architectures, since time_after works correctly only when the two timestamps being compared are within 2^31 jiffies of each other. The signed overflow caused by comparing values more than 2^31 jiffies apart will flip the result, causing incorrect assumptions of validity. 2^31 jiffies is a fairly large period of time (~25 days) when compared to the lifetime of most kernel data structures, but for long lived NFS mounts that can sit idle for months (think that for some reason autofs cannot be used), it is easy to compare inode attribute timestamps with very disparate or even bogus values (as in when jiffies have wrapped many times, where the comparison doesn't even make sense). Currently the code tests for attribute timeout by simply adding the desired amount of jiffies to the stored timestamp and comparing that with the current timestamp of obtained attribute data with time_after. This is incorrect, as it returns true for the desired timeout period and another full 2^31 range of jiffies. In testing with artificial jumps (several small jumps, not one big crank) of the jiffies I was able to reproduce a problem found in a server with very long lived NFS mounts, where attributes would not be refreshed even after touching files and directories in the server: Initial uptime: 03:42:01 up 6 min, 0 users, load average: 0.01, 0.12, 0.07 NFS volume is mounted and time is advanced: 03:38:09 up 25 days, 2 min, 0 users, load average: 1.22, 1.05, 1.08 # ls -l /local/A/foo/bar /nfs/A/foo/bar -rw-r--r-- 1 root root 0 Dec 17 03:38 /local/A/foo/bar -rw-r--r-- 1 root root 0 Nov 22 00:36 /nfs/A/foo/bar # touch /local/A/foo/bar # ls -l /local/A/foo/bar /nfs/A/foo/bar -rw-r--r-- 1 root root 0 Dec 17 03:47 /local/A/foo/bar -rw-r--r-- 1 root root 0 Nov 22 00:36 /nfs/A/foo/bar We can see the local mtime is updated, but the NFS mount still shows the old value. The patch below makes it work: Initial setup... 07:11:02 up 25 days, 1 min, 0 users, load average: 0.15, 0.03, 0.04 # ls -l /local/A/foo/bar /nfs/A/foo/bar -rw-r--r-- 1 root root 0 Jan 11 07:11 /local/A/foo/bar -rw-r--r-- 1 root root 0 Jan 11 07:11 /nfs/A/foo/bar # touch /local/A/foo/bar # ls -l /local/A/foo/bar /nfs/A/foo/bar -rw-r--r-- 1 root root 0 Jan 11 07:14 /local/A/foo/bar -rw-r--r-- 1 root root 0 Jan 11 07:14 /nfs/A/foo/bar Signed-off-by: Fabio Olive Leite <fleite@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
4e769b93 |
|
03-Aug-2007 |
Peter Staubach <staubach@redhat.com> |
64 bit ino support for NFS client Hi. Attached is a patch to modify the NFS client code to support 64 bit ino's, as appropriate for the system and the NFS protocol version. The code basically just expand the NFS interfaces for routines which handle ino's from using ino_t to u64 and then uses the fileid in the nfs_inode instead of i_ino in the inode. The code paths that were updated are in the getattr method and the readdir methods. This should be no real change on 64 bit platforms. Since the ino_t is an unsigned long, it would already be 64 bits wide. Thanx... ps Signed-off-by: Peter Staubach <staubach@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
ed90ef51 |
|
20-Jul-2007 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Clean up NFS writeback flush code The only user of nfs_sync_mapping_range() is nfs_getattr(), which uses it to flush out the entire inode without sending a commit. We therefore replace nfs_sync_mapping_range with a more appropriate helper. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
5e11934d |
|
25-Jul-2007 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Fix put_nfs_open_context We need to grab the inode->i_lock atomically with the last reference put in order to remove the open context that is being freed from the nfsi->open_files list. Fix by converting the kref to a standard atomic counter and then using atomic_dec_and_lock()... Thanks to Arnd Bergmann for pointing out the problem. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
20c2df83 |
|
19-Jul-2007 |
Paul Mundt <lethal@linux-sh.org> |
mm: Remove slab destructors from kmem_cache_create(). Slab destructors were no longer supported after Christoph's c59def9f222d44bb7e2f0a559f2906191a0862d7 change. They've been BUGs for both slab and slub, and slob never supported them either. This rips out support for the dtor pointer from kmem_cache_create() completely and fixes up every single callsite in the kernel (there were about 224, not including the slab allocator definitions themselves, or the documentation references). Signed-off-by: Paul Mundt <lethal@linux-sh.org>
|
#
412c77ce |
|
03-Jul-2007 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFSv4: Defer inode revalidation when setting up a delegation Currently we force a synchronous call to __nfs_revalidate_inode() in nfs_inode_set_delegation(). This not only ensures that we cannot call nfs_inode_set_delegation from an asynchronous context, but it also slows down any call to open(). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
587142f8 |
|
02-Jul-2007 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Replace NFS_I(inode)->req_lock with inode->i_lock There is no justification for keeping a special spinlock for the exclusive use of the NFS writeback code. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
3bec63db |
|
17-Jun-2007 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Convert struct nfs_open_context to use a kref Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
2aefa104 |
|
17-Jun-2007 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Remove the redundant 'dirty' and 'commit' lists from nfs_inode Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
a0356862 |
|
05-Jun-2007 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Fix nfs_reval_fsid() We don't need to revalidate the fsid on the root directory. It suffices to revalidate it on the current directory. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
4a35bd41 |
|
05-Jun-2007 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFSv4: Ensure that nfs4_do_close() doesn't race with umount nfs4_do_close() does not currently have any way to ensure that the user won't attempt to unmount the partition while the asynchronous RPC call is completing. This again may cause Oopses in nfs_update_inode(). Add a vfsmount argument to nfs4_close_state to ensure that the partition remains mounted while we're closing the file. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
88be9f99 |
|
05-Jun-2007 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Replace vfsmount and dentry in nfs_open_context with struct path Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
e8edc6e0 |
|
20-May-2007 |
Alexey Dobriyan <adobriyan@gmail.com> |
Detach sched.h from mm.h First thing mm.h does is including sched.h solely for can_do_mlock() inline function which has "current" dereference inside. By dealing with can_do_mlock() mm.h can be detached from sched.h which is good. See below, why. This patch a) removes unconditional inclusion of sched.h from mm.h b) makes can_do_mlock() normal function in mm/mlock.c c) exports can_do_mlock() to not break compilation d) adds sched.h inclusions back to files that were getting it indirectly. e) adds less bloated headers to some files (asm/signal.h, jiffies.h) that were getting them indirectly Net result is: a) mm.h users would get less code to open, read, preprocess, parse, ... if they don't need sched.h b) sched.h stops being dependency for significant number of files: on x86_64 allmodconfig touching sched.h results in recompile of 4083 files, after patch it's only 3744 (-8.3%). Cross-compile tested on all arm defconfigs, all mips defconfigs, all powerpc defconfigs, alpha alpha-up arm i386 i386-up i386-defconfig i386-allnoconfig ia64 ia64-up m68k mips parisc parisc-up powerpc powerpc-up s390 s390-up sparc sparc-up sparc64 sparc64-up um-x86_64 x86_64 x86_64-up x86_64-defconfig x86_64-allnoconfig as well as my two usual configs. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
a35afb83 |
|
16-May-2007 |
Christoph Lameter <clameter@sgi.com> |
Remove SLAB_CTOR_CONSTRUCTOR SLAB_CTOR_CONSTRUCTOR is always specified. No point in checking it. Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: David Howells <dhowells@redhat.com> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Steven French <sfrench@us.ibm.com> Cc: Michael Halcrow <mhalcrow@us.ibm.com> Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Roman Zippel <zippel@linux-m68k.org> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Dave Kleikamp <shaggy@austin.ibm.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Anton Altaparmakov <aia21@cantab.net> Cc: Mark Fasheh <mark.fasheh@oracle.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Jan Kara <jack@ucw.cz> Cc: David Chinner <dgc@sgi.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
7a13e932 |
|
26-Apr-2007 |
Jesper Juhl <jesper.juhl@gmail.com> |
NFS: Kill the obsolete NFS_PARANOIA Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com> Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
50953fe9 |
|
06-May-2007 |
Christoph Lameter <clameter@sgi.com> |
slab allocators: Remove SLAB_DEBUG_INITIAL flag I have never seen a use of SLAB_DEBUG_INITIAL. It is only supported by SLAB. I think its purpose was to have a callback after an object has been freed to verify that the state is the constructor state again? The callback is performed before each freeing of an object. I would think that it is much easier to check the object state manually before the free. That also places the check near the code object manipulation of the object. Also the SLAB_DEBUG_INITIAL callback is only performed if the kernel was compiled with SLAB debugging on. If there would be code in a constructor handling SLAB_DEBUG_INITIAL then it would have to be conditional on SLAB_DEBUG otherwise it would just be dead code. But there is no such code in the kernel. I think SLUB_DEBUG_INITIAL is too problematic to make real use of, difficult to understand and there are easier ways to accomplish the same effect (i.e. add debug code before kfree). There is a related flag SLAB_CTOR_VERIFY that is frequently checked to be clear in fs inode caches. Remove the pointless checks (they would even be pointless without removeal of SLAB_DEBUG_INITIAL) from the fs constructors. This is the last slab flag that SLUB did not support. Remove the check for unimplemented flags from SLUB. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
e1552e19 |
|
14-Apr-2007 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Fix an Oops in nfs_setattr() It looks like nfs_setattr() and nfs_rename() also need to test whether the target is a regular file before calling nfs_wb_all()... Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
63470738 |
|
16-Mar-2007 |
Trond Myklebust <trond.myklebust@fys.uio.no> |
[PATCH] nfs: nfs_getattr() can't call nfs_sync_mapping_range() for non-regular files Looks like we need a check in nfs_getattr() for a regular file. It makes no sense to call nfs_sync_mapping_range() on anything else. I think that should fix your problem: it will stop the NFS client from interfering with dirty pages on that inode's mapping. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Acked-by: Olof Johansson <olof@lixom.net> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
b0c4fddc |
|
05-Feb-2007 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Cleanup - avoid rereading 'jiffies' more than once in the same routine Micro-optimisations for nfs_fhget() and nfs_wcc_update_inode(). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
3e7d950a |
|
05-Feb-2007 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Fix a wraparound issue with nfsi->cache_change_attribute Fix wraparound issue with nfsi->cache_change_attribute. If it is found to lie in the future, then update it to lie in the past. Patch based on a suggestion by Neil Brown. ..and minor micro-optimisation: avoid reading 'jiffies' more than once in nfs_update_inode(). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
d30c8348 |
|
13-Dec-2006 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: nfs_writepages() cleanup Strip out the call to nfs_commit_inode(), and allow that to be done by nfs_write_inode(). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
717d44e8 |
|
24-Jan-2007 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
[PATCH] NFS: Fix races in nfs_revalidate_mapping() Prevent the call to invalidate_inode_pages2() from racing with file writes by taking the inode->i_mutex across the page cache flush and invalidate. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
01cce933 |
|
08-Dec-2006 |
Josef "Jeff" Sipek <jsipek@cs.sunysb.edu> |
[PATCH] nfs: change uses of f_{dentry,vfsmnt} to use f_path Change all the uses of f_{dentry,vfsmnt} to f_path.{dentry,mnt} in the nfs client code. Signed-off-by: Josef "Jeff" Sipek <jsipek@cs.sunysb.edu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
e18b890b |
|
06-Dec-2006 |
Christoph Lameter <clameter@sgi.com> |
[PATCH] slab: remove kmem_cache_t Replace all uses of kmem_cache_t with struct kmem_cache. The patch was generated using the following script: #!/bin/sh # # Replace one string by another in all the kernel sources. # set -e for file in `find * -name "*.c" -o -name "*.h"|xargs grep -l $1`; do quilt add $file sed -e "1,\$s/$1/$2/g" $file >/tmp/$$ mv /tmp/$$ $file quilt refresh done The script was run like this sh replace kmem_cache_t "struct kmem_cache" Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
e94b1766 |
|
06-Dec-2006 |
Christoph Lameter <clameter@sgi.com> |
[PATCH] slab: remove SLAB_KERNEL SLAB_KERNEL is an alias of GFP_KERNEL. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
1c75950b |
|
09-Oct-2006 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: cleanup of nfs_sync_inode_wait() Allow callers to directly pass it a struct writeback_control. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
85233a7a |
|
20-Oct-2006 |
Chuck Lever <chuck.lever@oracle.com> |
[PATCH] NFS: __nfs_revalidate_inode() can use "inode" before checking it is non-NULL The "!inode" check in __nfs_revalidate_inode() occurs well after the first time it is dereferenced, so get rid of it. Coverity: #cid 1372, 1373 Test plan: Code review; recheck with Coverity. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
cd9ae2b6 |
|
20-Oct-2006 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
[PATCH] NFS: Deal with failure of invalidate_inode_pages2() If invalidate_inode_pages2() fails, then it should in principle just be because the current process was signalled. In that case, we just want to ensure that the inode's page cache remains marked as invalid. Also add a helper to allow the O_DIRECT code to simply mark the page cache as invalid once it is finished writing, instead of calling invalidate_inode_pages2() itself. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
ba52de12 |
|
27-Sep-2006 |
Theodore Ts'o <tytso@mit.edu> |
[PATCH] inode-diet: Eliminate i_blksize from the inode structure This eliminates the i_blksize field from struct inode. Filesystems that want to provide a per-inode st_blksize can do so by providing their own getattr routine instead of using the generic_fillattr() function. Note that some filesystems were providing pretty much random (and incorrect) values for i_blksize. [bunk@stusta.de: cleanup] [akpm@osdl.org: generic_fillattr() fix] Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
1a1d92c1 |
|
27-Sep-2006 |
Alexey Dobriyan <adobriyan@gmail.com> |
[PATCH] Really ignore kmem_cache_destroy return value * Rougly half of callers already do it by not checking return value * Code in drivers/acpi/osl.c does the following to be sure: (void)kmem_cache_destroy(cache); * Those who check it printk something, however, slab_error already printed the name of failed cache. * XFS BUGs on failed kmem_cache_destroy which is not the decision low-level filesystem driver should make. Converted to ignore. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
f52720ca |
|
27-Sep-2006 |
Panagiotis Issaris <takis@issaris.org> |
[PATCH] fs: Removing useless casts * Removing useless casts * Removing useless wrapper * Conversion from kmalloc+memset to kzalloc Signed-off-by: Panagiotis Issaris <takis@issaris.org> Acked-by: Dave Kleikamp <shaggy@austin.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
f551e44f |
|
20-Sep-2006 |
Chuck Lever <chuck.lever@oracle.com> |
NFS: add comments clarifying the use of nfs_post_op_update() Comments-only change to clarify a detail of the NFS protocol and how it is implemented in Linux. Test plan: None. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
97db8f41 |
|
14-Sep-2006 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Don't invalidate the symlink we just stuffed into the cache And slight optimisation of nfs_end_data_update(): directories never have delegations anyway. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
6aaca566 |
|
22-Aug-2006 |
David Howells <dhowells@redhat.com> |
NFS: Add server and volume lists to /proc Make two new proc files available: /proc/fs/nfsfs/servers /proc/fs/nfsfs/volumes The first lists the servers with which we are currently dealing (struct nfs_client), and the second lists the volumes we have on those servers (struct nfs_server). Signed-Off-By: David Howells <dhowells@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
54ceac45 |
|
22-Aug-2006 |
David Howells <dhowells@redhat.com> |
NFS: Share NFS superblocks per-protocol per-server per-FSID The attached patch makes NFS share superblocks between mounts from the same server and FSID over the same protocol. It does this by creating each superblock with a false root and returning the real root dentry in the vfsmount presented by get_sb(). The root dentry set starts off as an anonymous dentry if we don't already have the dentry for its inode, otherwise it simply returns the dentry we already have. We may thus end up with several trees of dentries in the superblock, and if at some later point one of anonymous tree roots is discovered by normal filesystem activity to be located in another tree within the superblock, the anonymous root is named and materialises attached to the second tree at the appropriate point. Why do it this way? Why not pass an extra argument to the mount() syscall to indicate the subpath and then pathwalk from the server root to the desired directory? You can't guarantee this will work for two reasons: (1) The root and intervening nodes may not be accessible to the client. With NFS2 and NFS3, for instance, mountd is called on the server to get the filehandle for the tip of a path. mountd won't give us handles for anything we don't have permission to access, and so we can't set up NFS inodes for such nodes, and so can't easily set up dentries (we'd have to have ghost inodes or something). With this patch we don't actually create dentries until we get handles from the server that we can use to set up their inodes, and we don't actually bind them into the tree until we know for sure where they go. (2) Inaccessible symbolic links. If we're asked to mount two exports from the server, eg: mount warthog:/warthog/aaa/xxx /mmm mount warthog:/warthog/bbb/yyy /nnn We may not be able to access anything nearer the root than xxx and yyy, but we may find out later that /mmm/www/yyy, say, is actually the same directory as the one mounted on /nnn. What we might then find out, for example, is that /warthog/bbb was actually a symbolic link to /warthog/aaa/xxx/www, but we can't actually determine that by talking to the server until /warthog is made available by NFS. This would lead to having constructed an errneous dentry tree which we can't easily fix. We can end up with a dentry marked as a directory when it should actually be a symlink, or we could end up with an apparently hardlinked directory. With this patch we need not make assumptions about the type of a dentry for which we can't retrieve information, nor need we assume we know its place in the grand scheme of things until we actually see that place. This patch reduces the possibility of aliasing in the inode and page caches for inodes that may be accessed by more than one NFS export. It also reduces the number of superblocks required for NFS where there are many NFS exports being used from a server (home directory server + autofs for example). This in turn makes it simpler to do local caching of network filesystems, as it can then be guaranteed that there won't be links from multiple inodes in separate superblocks to the same cache file. Obviously, cache aliasing between different levels of NFS protocol could still be a problem, but at least that gives us another key to use when indexing the cache. This patch makes the following changes: (1) The server record construction/destruction has been abstracted out into its own set of functions to make things easier to get right. These have been moved into fs/nfs/client.c. All the code in fs/nfs/client.c has to do with the management of connections to servers, and doesn't touch superblocks in any way; the remaining code in fs/nfs/super.c has to do with VFS superblock management. (2) The sequence of events undertaken by NFS mount is now reordered: (a) A volume representation (struct nfs_server) is allocated. (b) A server representation (struct nfs_client) is acquired. This may be allocated or shared, and is keyed on server address, port and NFS version. (c) If allocated, the client representation is initialised. The state member variable of nfs_client is used to prevent a race during initialisation from two mounts. (d) For NFS4 a simple pathwalk is performed, walking from FH to FH to find the root filehandle for the mount (fs/nfs/getroot.c). For NFS2/3 we are given the root FH in advance. (e) The volume FSID is probed for on the root FH. (f) The volume representation is initialised from the FSINFO record retrieved on the root FH. (g) sget() is called to acquire a superblock. This may be allocated or shared, keyed on client pointer and FSID. (h) If allocated, the superblock is initialised. (i) If the superblock is shared, then the new nfs_server record is discarded. (j) The root dentry for this mount is looked up from the root FH. (k) The root dentry for this mount is assigned to the vfsmount. (3) nfs_readdir_lookup() creates dentries for each of the entries readdir() returns; this function now attaches disconnected trees from alternate roots that happen to be discovered attached to a directory being read (in the same way nfs_lookup() is made to do for lookup ops). The new d_materialise_unique() function is now used to do this, thus permitting the whole thing to be done under one set of locks, and thus avoiding any race between mount and lookup operations on the same directory. (4) The client management code uses a new debug facility: NFSDBG_CLIENT which is set by echoing 1024 to /proc/net/sunrpc/nfs_debug. (5) Clone mounts are now called xdev mounts. (6) Use the dentry passed to the statfs() op as the handle for retrieving fs statistics rather than the root dentry of the superblock (which is now a dummy). Signed-Off-By: David Howells <dhowells@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
8fa5c000 |
|
22-Aug-2006 |
David Howells <dhowells@redhat.com> |
NFS: Move rpc_ops from nfs_server to nfs_client Move the rpc_ops from the nfs_server struct to the nfs_client struct as they're common to all server records of a particular NFS protocol version. Signed-Off-By: David Howells <dhowells@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
cfcea3e8 |
|
25-Jul-2006 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Add a global LRU list for the ACCESS cache ...in order to allow the addition of a memory shrinker. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
1c3c07e9 |
|
25-Jul-2006 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Add a new ACCESS rpc call cache to the linux nfs client The current access cache only allows one entry at a time to be cached for each inode. Add a per-inode red-black tree in order to allow more than one to be cached at a time. Should significantly cut down the time spent in path traversal for shared directories such as ${PATH}, /usr/share, etc. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
6ab3d562 |
|
30-Jun-2006 |
Jörn Engel <joern@wohnheim.fh-wedel.de> |
Remove obsolete #include <linux/config.h> Signed-off-by: Jörn Engel <joern@wohnheim.fh-wedel.de> Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
#
266bee88 |
|
27-Jun-2006 |
David Brownell <david-b@pacbell.net> |
[PATCH] fix static linking of NFS Builds on ARM report link problems with common configurations like statically linked NFS (for nfsroot). The symptom is that __init section code references __exit section code; that won't work since the exit sections are discarded (since they can never be called). The best fix for these particular cases would be an "__init_or_exit" section annotation. Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
d75d5414 |
|
25-Jun-2006 |
Andrew Morton <akpm@osdl.org> |
git-nfs-build-fixes Fix various problems with nfs4 disabled. And various other things. In file included from fs/nfs/inode.c:50: fs/nfs/internal.h:24: error: static declaration of 'nfs_do_refmount' follows non-static declaration include/linux/nfs_fs.h:320: error: previous declaration of 'nfs_do_refmount' was here fs/nfs/internal.h:65: warning: 'struct nfs4_fs_locations' declared inside parameter list fs/nfs/internal.h:65: warning: its scope is only this definition or declaration, which is probably not what you want fs/nfs/internal.h: In function 'nfs4_path': fs/nfs/internal.h:97: error: 'struct nfs_server' has no member named 'mnt_path' fs/nfs/inode.c: In function 'init_once': fs/nfs/inode.c:1116: error: 'struct nfs_inode' has no member named 'open_states' fs/nfs/inode.c:1116: error: 'struct nfs_inode' has no member named 'delegation' fs/nfs/inode.c:1116: error: 'struct nfs_inode' has no member named 'delegation_state' fs/nfs/inode.c:1116: error: 'struct nfs_inode' has no member named 'rwsem' distcc[26452] ERROR: compile fs/nfs/inode.c on g5/64 failed make[1]: *** [fs/nfs/inode.o] Error 1 make: *** [fs/nfs/inode.o] Error 2 make: *** Waiting for unfinished jobs.... In file included from fs/nfs/nfs3xdr.c:26: fs/nfs/internal.h:24: error: static declaration of 'nfs_do_refmount' follows non-static declaration include/linux/nfs_fs.h:320: error: previous declaration of 'nfs_do_refmount' was here fs/nfs/internal.h:65: warning: 'struct nfs4_fs_locations' declared inside parameter list fs/nfs/internal.h:65: warning: its scope is only this definition or declaration, which is probably not what you want fs/nfs/internal.h: In function 'nfs4_path': fs/nfs/internal.h:97: error: 'struct nfs_server' has no member named 'mnt_path' distcc[26486] ERROR: compile fs/nfs/nfs3xdr.c on g5/64 failed make[1]: *** [fs/nfs/nfs3xdr.o] Error 1 make: *** [fs/nfs/nfs3xdr.o] Error 2 In file included from fs/nfs/nfs3proc.c:24: fs/nfs/internal.h:24: error: static declaration of 'nfs_do_refmount' follows non-static declaration include/linux/nfs_fs.h:320: error: previous declaration of 'nfs_do_refmount' was here fs/nfs/internal.h:65: warning: 'struct nfs4_fs_locations' declared inside parameter list fs/nfs/internal.h:65: warning: its scope is only this definition or declaration, which is probably not what you want fs/nfs/internal.h: In function 'nfs4_path': fs/nfs/internal.h:97: error: 'struct nfs_server' has no member named 'mnt_path' distcc[26469] ERROR: compile fs/nfs/nfs3proc.c on bix/32 failed make[1]: *** [fs/nfs/nfs3proc.o] Error 1 make: *** [fs/nfs/nfs3proc.o] Error 2 **FAILED** Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Andreas Gruenbacher <agruen@suse.de> Cc: Andy Adamson <andros@citi.umich.edu> Cc: Chuck Lever <cel@netapp.com> Cc: David Howells <dhowells@redhat.com> Cc: J. Bruce Fields <bfields@fieldses.org> Cc: Manoj Naik <manoj@almaden.ibm.com> Cc: Marc Eshel <eshel@almaden.ibm.com> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
726c3342 |
|
23-Jun-2006 |
David Howells <dhowells@redhat.com> |
[PATCH] VFS: Permit filesystem to perform statfs with a known root dentry Give the statfs superblock operation a dentry pointer rather than a superblock pointer. This complements the get_sb() patch. That reduced the significance of sb->s_root, allowing NFS to place a fake root there. However, NFS does require a dentry to use as a target for the statfs operation. This permits the root in the vfsmount to be used instead. linux/mount.h has been added where necessary to make allyesconfig build successfully. Interest has also been expressed for use with the FUSE and XFS filesystems. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Cc: Nathan Scott <nathans@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
454e2398 |
|
23-Jun-2006 |
David Howells <dhowells@redhat.com> |
[PATCH] VFS: Permit filesystem to override root dentry on mount Extend the get_sb() filesystem operation to take an extra argument that permits the VFS to pass in the target vfsmount that defines the mountpoint. The filesystem is then required to manually set the superblock and root dentry pointers. For most filesystems, this should be done with simple_set_mnt() which will set the superblock pointer and then set the root dentry to the superblock's s_root (as per the old default behaviour). The get_sb() op now returns an integer as there's now no need to return the superblock pointer. This patch permits a superblock to be implicitly shared amongst several mount points, such as can be done with NFS to avoid potential inode aliasing. In such a case, simple_set_mnt() would not be called, and instead the mnt_root and mnt_sb would be set directly. The patch also makes the following changes: (*) the get_sb_*() convenience functions in the core kernel now take a vfsmount pointer argument and return an integer, so most filesystems have to change very little. (*) If one of the convenience function is not used, then get_sb() should normally call simple_set_mnt() to instantiate the vfsmount. This will always return 0, and so can be tail-called from get_sb(). (*) generic_shutdown_super() now calls shrink_dcache_sb() to clean up the dcache upon superblock destruction rather than shrink_dcache_anon(). This is required because the superblock may now have multiple trees that aren't actually bound to s_root, but that still need to be cleaned up. The currently called functions assume that the whole tree is rooted at s_root, and that anonymous dentries are not the roots of trees which results in dentries being left unculled. However, with the way NFS superblock sharing are currently set to be implemented, these assumptions are violated: the root of the filesystem is simply a dummy dentry and inode (the real inode for '/' may well be inaccessible), and all the vfsmounts are rooted on anonymous[*] dentries with child trees. [*] Anonymous until discovered from another tree. (*) The documentation has been adjusted, including the additional bit of changing ext2_* into foo_* in the documentation. [akpm@osdl.org: convert ipath_fs, do other stuff] Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Cc: Nathan Scott <nathans@sgi.com> Cc: Roland Dreier <rolandd@cisco.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
f7b422b1 |
|
09-Jun-2006 |
David Howells <dhowells@redhat.com> |
NFS: Split fs/nfs/inode.c As fs/nfs/inode.c is rather large, heterogenous and unwieldy, the attached patch splits it up into a number of files: (*) fs/nfs/inode.c Strictly inode specific functions. (*) fs/nfs/super.c Superblock management functions for NFS and NFS4, normal access, clones and referrals. The NFS4 superblock functions _could_ move out into a separate conditionally compiled file, but it's probably not worth it as there're so many common bits. (*) fs/nfs/namespace.c Some namespace-specific functions have been moved here. (*) fs/nfs/nfs4namespace.c NFS4-specific namespace functions (this could be merged into the previous file). This file is conditionally compiled. (*) fs/nfs/internal.h Inter-file declarations, plus a few simple utility functions moved from fs/nfs/inode.c. Additionally, all the in-.c-file externs have been moved here, and those files they were moved from now includes this file. For the most part, the functions have not been changed, only some multiplexor functions have changed significantly. I've also: (*) Added some extra banner comments above some functions. (*) Rearranged the function order within the files to be more logical and better grouped (IMO), though someone may prefer a different order. (*) Reduced the number of #ifdefs in .c files. (*) Added missing __init and __exit directives. Signed-Off-By: David Howells <dhowells@redhat.com>
|
#
4e5ccf60 |
|
09-Jun-2006 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Fix typo in nfs_do_clone_mount() Doh! Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
860de071 |
|
09-Jun-2006 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Fix compile errors introduced by referrals patches Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
87e4ba1a |
|
09-Jun-2006 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFSv4: Ensure that referral mounts bind to a reserved port Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
6b97fd3d |
|
09-Jun-2006 |
Manoj Naik <manoj@almaden.ibm.com> |
NFSv4: Follow a referral Respond to a moved error on NFS lookup by setting up the referral. Note: We don't actually follow the referral during lookup/getattr, but later when we detect fsid mismatch in inode revalidation (similar to the processing done for cloning submounts). Referrals will have fake attributes until they are actually followed or traversed. Signed-off-by: Manoj Naik <manoj@almaden.ibm.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
9cdb3883 |
|
09-Jun-2006 |
Manoj Naik <manoj@almaden.ibm.com> |
NFSv4: Ensure client submounts when following a referral Set up mountpoint when hitting a referral on moved error by getting fs_locations. Signed-off-by: Manoj Naik <manoj@almaden.ibm.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
61f5164c |
|
09-Jun-2006 |
Manoj Naik <manoj@almaden.ibm.com> |
NFS: Expand clone mounts to include other servers Signed-off-by: Manoj Naik <manoj@almaden.ibm.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
c818ba43 |
|
09-Jun-2006 |
Manoj Naik <manoj@almaden.ibm.com> |
NFSv4: Create NFSv4 transport and client Move existing code into a separate function so that it can be also used by referral code. Signed-off-by: Manoj Naik <manoj@almaden.ibm.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
51d8fa6a |
|
09-Jun-2006 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Add timeout to submounts Make automounted partitions expire using the mark_mounts_for_expiry() function. The timeout is controlled via a sysctl. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
55a97593 |
|
09-Jun-2006 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Ensure the client submounts, when it crosses a server mountpoint. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
8b4bdcf8 |
|
09-Jun-2006 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Store the file system "fsid" value in the NFS super block. This should enable us to detect if we are crossing a mountpoint in the case where the server is exporting "nohide" mounts. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
8b512d9a |
|
09-Jun-2006 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
VFS: Remove dependency of ->umount_begin() call on MNT_FORCE Allow filesystems to decide to perform pre-umount processing whether or not MNT_FORCE is set. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
da6d503a |
|
01-Jun-2006 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Remove nfs_delete_inode() Now that we have a real nfs_invalidate_page() to ensure that truncate_inode_pages() does the right thing when there are pending dirty pages, we can get rid of nfs_delete_inode(). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
1842bfb4 |
|
24-May-2006 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Fix up inode revalidation accounting Currently, we are accounting for all calls to nfs_revalidate_inode(), but not to nfs_revalidate_mapping(), or nfs_lookup_verify_inode(), etc... Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
44b11874 |
|
24-May-2006 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Separate metadata and page cache revalidation mechanisms Separate out the function of revalidating the inode metadata, and revalidating the mapping. The former may be called by lookup(), and only really needs to check that permissions, ctime, etc haven't changed whereas the latter needs only done when we want to read data from the page cache, and may need to sync and then invalidate the mapping. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
f1bb0b92 |
|
24-May-2006 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Fix page cache revalidation Fix up a bug in the handling of NFS_INO_REVAL_PAGECACHE: make sure that nfs_update_inode() clears it when we're sure we're not racing with other updates. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
73a3d07c |
|
24-May-2006 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Clean up inode metadata updates Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
9d1e9232 |
|
24-May-2006 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFSv4: Some NFSv4 servers have broken behaviour for the change attribute The Linux NFSv4 server violates RFC3530 in that the change attribute is not guaranteed to be updated for every change to the inode. Our optimisation for checking whether or not the inode metadata has changed or not is broken too. Grr.... Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
b9d9506d |
|
19-Apr-2006 |
John Hawkes <hawkes@sgi.com> |
NFS: nfs_show_stats; for_each_possible_cpu(), not NR_CPUS Convert a for-loop that explicitly references "NR_CPUS" into the potentially more efficient for_each_possible_cpu() construct. Signed-off-by: John Hawkes <hawkes@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
fffb60f9 |
|
24-Mar-2006 |
Paul Jackson <pj@sgi.com> |
[PATCH] cpuset memory spread: slab cache format Rewrap the overly long source code lines resulting from the previous patch's addition of the slab cache flag SLAB_MEM_SPREAD. This patch contains only formatting changes, and no function change. Signed-off-by: Paul Jackson <pj@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
4b6a9316 |
|
24-Mar-2006 |
Paul Jackson <pj@sgi.com> |
[PATCH] cpuset memory spread: slab cache filesystems Mark file system inode and similar slab caches subject to SLAB_MEM_SPREAD memory spreading. If a slab cache is marked SLAB_MEM_SPREAD, then anytime that a task that's in a cpuset with the 'memory_spread_slab' option enabled goes to allocate from such a slab cache, the allocations are spread evenly over all the memory nodes (task->mems_allowed) allowed to that task, instead of favoring allocation on the node local to the current cpu. The following inode and similar caches are marked SLAB_MEM_SPREAD: file cache ==== ===== fs/adfs/super.c adfs_inode_cache fs/affs/super.c affs_inode_cache fs/befs/linuxvfs.c befs_inode_cache fs/bfs/inode.c bfs_inode_cache fs/block_dev.c bdev_cache fs/cifs/cifsfs.c cifs_inode_cache fs/coda/inode.c coda_inode_cache fs/dquot.c dquot fs/efs/super.c efs_inode_cache fs/ext2/super.c ext2_inode_cache fs/ext2/xattr.c (fs/mbcache.c) ext2_xattr fs/ext3/super.c ext3_inode_cache fs/ext3/xattr.c (fs/mbcache.c) ext3_xattr fs/fat/cache.c fat_cache fs/fat/inode.c fat_inode_cache fs/freevxfs/vxfs_super.c vxfs_inode fs/hpfs/super.c hpfs_inode_cache fs/isofs/inode.c isofs_inode_cache fs/jffs/inode-v23.c jffs_fm fs/jffs2/super.c jffs2_i fs/jfs/super.c jfs_ip fs/minix/inode.c minix_inode_cache fs/ncpfs/inode.c ncp_inode_cache fs/nfs/direct.c nfs_direct_cache fs/nfs/inode.c nfs_inode_cache fs/ntfs/super.c ntfs_big_inode_cache_name fs/ntfs/super.c ntfs_inode_cache fs/ocfs2/dlm/dlmfs.c dlmfs_inode_cache fs/ocfs2/super.c ocfs2_inode_cache fs/proc/inode.c proc_inode_cache fs/qnx4/inode.c qnx4_inode_cache fs/reiserfs/super.c reiser_inode_cache fs/romfs/inode.c romfs_inode_cache fs/smbfs/inode.c smb_inode_cache fs/sysv/inode.c sysv_inode_cache fs/udf/super.c udf_inode_cache fs/ufs/super.c ufs_inode_cache net/socket.c sock_inode_cache net/sunrpc/rpc_pipe.c rpc_inode_cache The choice of which slab caches to so mark was quite simple. I marked those already marked SLAB_RECLAIM_ACCOUNT, except for fs/xfs, dentry_cache, inode_cache, and buffer_head, which were marked in a previous patch. Even though SLAB_RECLAIM_ACCOUNT is for a different purpose, it marks the same potentially large file system i/o related slab caches as we need for memory spreading. Given that the rule now becomes "wherever you would have used a SLAB_RECLAIM_ACCOUNT slab cache flag before (usually the inode cache), use the SLAB_MEM_SPREAD flag too", this should be easy enough to maintain. Future file system writers will just copy one of the existing file system slab cache setups and tend to get it right without thinking. Signed-off-by: Paul Jackson <pj@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
e8c96f8c |
|
24-Mar-2006 |
Tobias Klauser <tklauser@nuerscht.ch> |
[PATCH] fs: Use ARRAY_SIZE macro Use ARRAY_SIZE macro instead of sizeof(x)/sizeof(x[0]) and remove a duplicate of ARRAY_SIZE. Some trailing whitespaces are also deleted. Signed-off-by: Tobias Klauser <tklauser@nuerscht.ch> Cc: David Howells <dhowells@redhat.com> Cc: Dave Kleikamp <shaggy@austin.ibm.com> Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: Neil Brown <neilb@cse.unsw.edu.au> Cc: Chris Mason <mason@suse.com> Cc: Jeff Mahoney <jeffm@suse.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Nathan Scott <nathans@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
9b04c997 |
|
24-Mar-2006 |
Theodore Ts'o <tytso@mit.edu> |
[PATCH] vfs: MS_VERBOSE should be MS_SILENT The meaning of MS_VERBOSE is backwards; if the bit is set, it really means, "don't be verbose". This is confusing and counter-intuitive. In addition, there is also no way to set the MS_VERBOSE flag in the mount(8) program in util-linux, but interesting, it does define options which would do the right thing if MS_SILENT were defined, which unfortunately we do not: #ifdef MS_SILENT { "quiet", 0, 0, MS_SILENT }, /* be quiet */ { "loud", 0, 1, MS_SILENT }, /* print out messages. */ #endif So the obvious fix is to deprecate the use of MS_VERBOSE and replace it with MS_SILENT. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
c42de9dd |
|
20-Mar-2006 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Fix a race in nfs_sync_inode() Kudos to Neil Brown for spotting the problem: "in nfs_sync_inode, there is effectively the sequence: nfs_wait_on_requests nfs_flush_inode nfs_commit_inode This seems a bit racy to me as if the only requests are on the ->commit list, and nfs_commit_inode is called separately after nfs_wait_on_requests completes, and before nfs_commit_inode start (say: by nfs_write_inode) then none of these function will return >0, yet there will be some pending request that aren't waited for." The solution is to search for requests to wait upon, search for dirty requests, and search for uncommitted requests while holding the nfsi->req_lock The patch also cleans up nfs_sync_inode(), getting rid of the redundant FLUSH_WAIT flag. It turns out that we were always setting it. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
03f28e3a |
|
20-Mar-2006 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Make nfs_fhget() return appropriate error values Currently it returns NULL, which usually gets interpreted as ENOMEM. In fact it can mean a host of issues. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
01d0ae8b |
|
20-Mar-2006 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFSv4: Fix an oops in nfs4_fill_super The mount statistics patches introduced a call to nfs_free_iostats that is not only redundant, but actually causes an oops. Also fix a memory leak due to the lack of a call to nfs_free_iostats on unmount. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
4ece3a2d |
|
20-Mar-2006 |
Chuck Lever <cel@netapp.com> |
NFS: add RPC I/O statistics to /proc/self/mountstats NFS client now shows various RPC I/O metrics in /proc/self/mountstats. Test plan: Mount/umount while doing "cat /proc/self/mountstats", multiple iterations of connectathon locking suite. Test with NFS version 2, 3, and 4. Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
67ec9f46 |
|
20-Mar-2006 |
Chuck Lever <cel@netapp.com> |
NFS: report how long an NFS file system has been mounted Add a field in nfs_server to record a timestamp when a mount succeeds. Report the number of seconds the file system has been mounted via nfs_show_stats(). Test plan: Mount an NFS file system, watch the mountstats reports and compare with clock time. Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
91d5b470 |
|
20-Mar-2006 |
Chuck Lever <cel@netapp.com> |
NFS: add I/O performance counters Invoke the byte and event counter macros where we want to count bytes and events. Clean-up: fix a possible NULL dereference in nfs_lock, and simplify nfs_file_open. Test-plan: fsx and iozone on UP and SMP systems, with and without pre-emption. Watch for memory overwrite bugs, and performance loss (significantly more CPU required per op). Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
d9ef5a8c |
|
20-Mar-2006 |
Chuck Lever <cel@netapp.com> |
NFS: introduce mechanism for tracking NFS client metrics Add a per-superblock performance counter facility to the NFS client. This facility mimics the counters available for block devices and for networking. Expose these new counters via the new /proc/self/mountstats interface. Thanks to Andrew Morton and Trond Myklebust for their review and comments. Test plan: fsx and iozone on UP and SMP systems, with and without pre-emption. Watch for memory overwrite bugs, and performance loss (significantly more CPU required per op). Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
c8bded96 |
|
20-Mar-2006 |
Chuck Lever <cel@netapp.com> |
NFS: clean up some mount options Get rid of "lock" and "posix", and spell out "vers=". Test plan: None. Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
7a480e25 |
|
20-Mar-2006 |
Chuck Lever <cel@netapp.com> |
NFS: show retransmit settings when displaying mount options Sometimes it's important to know the exact RPC retransmit settings the kernel is using for an NFS mount point. Add this facility to the NFS client's show_options method. Test plan: Set various retransmit settings via the mount command, and check that the settings are reflected in /proc/mounts. Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
bd647545 |
|
20-Mar-2006 |
Eric Sesterhenn <snakebyte@gmx.de> |
NFS: kzalloc conversion in fs/nfs this converts fs/nfs to kzalloc() usage. compile tested with make allyesconfig Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
967b9281 |
|
20-Mar-2006 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFSv4: Do not call rpciod_down() before call to destroy_nfsv4_state() The reason is that the idmapper cleanup may call flush_workqueue() on rpciod_workqueue. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
fb374d24 |
|
20-Mar-2006 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: reduce the number of false cache invalidations. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
ca62b9c3 |
|
20-Mar-2006 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFSv4: Don't invalidate cached attributes if change attribute is unchanged Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
755c1e20 |
|
20-Mar-2006 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: writes should not clobber utimes() calls Ensure that we flush out writes in the case when someone calls utimes() in order to set the file times. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
b92dccf6 |
|
20-Mar-2006 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Fix a busy inodes issue... The nfs_open_context may live longer than the file descriptor that spawned it, so it needs to carry a reference to the vfsmount. If not, then generic_shutdown_super() may end up being called before reads and writes have been flushed out. Make a couple of functions static while we're at it... Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
fc33a7bb |
|
09-Jan-2006 |
Christoph Hellwig <hch@lst.de> |
[PATCH] per-mountpoint noatime/nodiratime Turn noatime and nodiratime into per-mount instead of per-sb flags. After all the preparations this is a rather trivial patch. The mount code needs to treat the two options as per-mount instead of per-superblock, and touch_atime needs to be changed to check the new MNT_ flags in addition to the MS_ flags that are kept for filesystems that are always noatime/nodiratime but not user settable anymore. Besides that core code only nfs needed an update because it's leaving atime updates to the server and thus sets the S_NOATIME flag on every inode, but needs to know whether it's a real noatime mount for an getattr optimization. While we're at it I've killed the IS_NOATIME/IS_NODIRATIME macros that were only used by touch_atime. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
28fd1298 |
|
08-Jan-2006 |
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> |
[PATCH] Fix and add EXPORT_SYMBOL(filemap_write_and_wait) This patch add EXPORT_SYMBOL(filemap_write_and_wait) and use it. See mm/filemap.c: And changes the filemap_write_and_wait() and filemap_write_and_wait_range(). Current filemap_write_and_wait() doesn't wait if filemap_fdatawrite() returns error. However, even if filemap_fdatawrite() returned an error, it may have submitted the partially data pages to the device. (e.g. in the case of -ENOSPC) <quotation> Andrew Morton writes, If filemap_fdatawrite() returns an error, this might be due to some I/O problem: dead disk, unplugged cable, etc. Given the generally crappy quality of the kernel's handling of such exceptions, there's a good chance that the filemap_fdatawait() will get stuck in D state forever. </quotation> So, this patch doesn't wait if filemap_fdatawrite() returns the -EIO. Trond, could you please review the nfs part? Especially I'm not sure, nfs must use the "filemap_fdatawrite(inode->i_mapping) == 0", or not. Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
58df095b |
|
03-Jan-2006 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFSv4: Allow entries in the idmap cache to expire If someone changes the uid/gid mapping in userland, then we do eventually want those changes to be propagated to the kernel. Currently the kernel assumes that it may cache entries forever. Add an expiration time + garbage collector for idmap entries. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
f518e35a |
|
03-Jan-2006 |
Chuck Lever <cel@netapp.com> |
SUNRPC: get rid of cl_chatty Clean up: Every ULP that uses the in-kernel RPC client, except the NLM client, sets cl_chatty. There's no reason why NLM shouldn't set it, so just get rid of cl_chatty and always be verbose. Test-plan: Compile with CONFIG_NFS enabled. Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
a72b4422 |
|
03-Jan-2006 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFSv4: Allow user to set the port used by the NFSv4 callback channel Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
a895b4a1 |
|
03-Jan-2006 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Clean up weak cache consistency code ...and ensure that nfs_update_inode() respects wcc Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
70b9ecbd |
|
03-Jan-2006 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Make stat() return updated mtimes after a write() The SuS states that a call to write() will cause mtime to be updated on the file. In order to satisfy that requirement, we need to flush out any cached writes in nfs_getattr(). Speed things up slightly by not committing the writes. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
40859d7e |
|
30-Nov-2005 |
Chuck Lever <cel@netapp.com> |
NFS: support large reads and writes on the wire Most NFS server implementations allow up to 64KB reads and writes on the wire. The Solaris NFS server allows up to a megabyte, for instance. Now the Linux NFS client supports transfer sizes up to 1MB, too. This will help reduce protocol and context switch overhead on read/write intensive NFS workloads, and support larger atomic read and write operations on servers that support them. Test-plan: Connectathon and iozone on mount point with wsize=rsize>32768 over TCP. Tests with NFS over UDP to verify the maximum RPC payload size cap. Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
325cfed9 |
|
30-Nov-2005 |
Chuck Lever <cel@netapp.com> |
NFS: make "inode number mismatch" message more useful To help NFS users and server developers, make the "inode number mismatch" message display more useful information. Test-plan: None. Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
dc20f803 |
|
30-Nov-2005 |
Chuck Lever <cel@netapp.com> |
NFS: get rid of useless kernel log message nfs_statfs() generates a log message when GETATTR returns an error. This is usually a useless message. Make it a dprintk. Test plan: None Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
6b59a754 |
|
30-Nov-2005 |
Chuck Lever <cel@netapp.com> |
NFS: Fix error recovery code in fs/nfs/inode.c:__init_nfs() Red Hat found a problem in the error recovery logic in __init_nfs. Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
286d7d6a |
|
03-Jan-2006 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFSv4: Remove requirement for machine creds for the "setclientid" operation Use a cred from the nfs4_client->cl_state_owners list. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
29884df0 |
|
13-Dec-2005 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Fix another O_DIRECT race Ensure we call unmap_mapping_range() and sync dirty pages to disk before doing an NFS direct write. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
24aa1fe6 |
|
03-Dec-2005 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Fix a few further cache consistency regressions Steve Dickson writes: Doing the following: 1. On server: $ mkdir ~/t $ echo Hello > ~/t/tmp 2. On client, wait for a string to appear in this file: $ until grep -q foo t/tmp ; do echo -n . ; sleep 1 ; done 3. On server, create a *new* file with the same name containing that string: $ mv ~/t/tmp ~/t/tmp.old; echo foo > ~/t/tmp will show how the client will never (and I mean never ;-) ) see the updated file. The problem is that we do not update nfsi->cache_change_attribute when the file changes on the server (we only update it when our client makes the changes). This again means that functions like nfs_check_verifier() will fail to register when the parent directory has changed and should trigger a dentry lookup revalidation. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
223db122 |
|
30-Nov-2005 |
Steve Dickson <steved@redhat.com> |
NFS: Fix cache consistency regression Make sure cache_change_attribute is initialized to jiffies so when the mtime changes on directory, the directory will be refreshed. Signed-off by: Steve Dickson <steved@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
b37b03b7 |
|
25-Nov-2005 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Fix a spinlock recursion inside nfs_update_inode() In cases where the server has gone insane, nfs_update_inode() may end up calling nfs_invalidate_inode(), which again calls stuff that takes the inode->i_lock that we're already holding. In addition, given the sort of things we have in NFS these days that need to be cleaned up on inode release, I'm not sure we should ever be calling make_bad_inode(). Fix up spinlock recursion, and limit nfs_invalidate_inode() to clearing the caches, and marking the inode as being stale. Thanks to Steve Dickson <SteveD@redhat.com> for spotting this. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
f99d49ad |
|
07-Nov-2005 |
Jesper Juhl <jesper.juhl@gmail.com> |
[PATCH] kfree cleanup: fs This is the fs/ part of the big kfree cleanup patch. Remove pointless checks for NULL prior to calling kfree() in fs/. Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
d530838b |
|
04-Nov-2005 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFSv4: Fix problem with OPEN_DOWNGRADE RFC 3530 states that for OPEN_DOWNGRADE "The share_access and share_deny bits specified must be exactly equal to the union of the share_access and share_deny bits specified for some subset of the OPENs in effect for current openowner on the current file. Setattr is currently violating the NFSv4 rules for OPEN_DOWNGRADE in that it may cause a downgrade from OPEN4_SHARE_ACCESS_BOTH to OPEN4_SHARE_ACCESS_WRITE despite the fact that there exists no open file with O_WRONLY access mode. Fix the problem by replacing nfs4_find_state() with a modified version of nfs_find_open_context(). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
d3f8cf48 |
|
30-Oct-2005 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
[PATCH] NFS: Remove unbalanced spin_unlock() calls from nfs_refresh_inode() Doh! Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
bec273b4 |
|
27-Oct-2005 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Allow files that are open for write to invalidate caches Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
decf491f |
|
27-Oct-2005 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Don't let nfs_end_data_update() clobber attribute update information Since we almost always call nfs_end_data_update() after we called nfs_refresh_inode(), we now end up marking the inode metadata as needing revalidation immediately after having updated it. This patch rearranges things so that we mark the inode as needing revalidation _before_ we call nfs_refresh_inode() on those operations that need it. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
33801147 |
|
27-Oct-2005 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Optimise inode attribute cache updates Allow nfs_refresh_inode() also to update attributes on the inode if the RPC call was sent after the last call to nfs_update_inode(). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
913a70fc |
|
27-Oct-2005 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Convert cache_change_attribute into a jiffy-based value Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
642ac549 |
|
18-Oct-2005 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFSv4: Return delegations in case we're changing ACLs Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
cae7a073 |
|
18-Oct-2005 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFSv4: Return delegation upon rename or removal of file. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
6ce96917 |
|
17-Oct-2005 |
Trond Myklebust <trond.myklebust@fys.uio.no> |
[PATCH] NFS: Fix Oopsable/unnecessary i_count manipulations in nfs_wait_on_inode() Oopsable since nfs_wait_on_inode() can get called as part of iput_final(). Unnecessary since the caller had better be damned sure that the inode won't disappear from underneath it anyway. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
b3c52da3 |
|
17-Oct-2005 |
Trond Myklebust <trond.myklebust@fys.uio.no> |
[PATCH] NFS: Fix cache consistency races If the data cache has been marked as potentially invalid by nfs_refresh_inode, we should invalidate it rather than assume that changes are due to our own activity. Also ensure that we always start with a valid cache before declaring it to be protected by a delegation. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
3063d8a1 |
|
25-Aug-2005 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Make /proc/mounts display the protocol used by NFSv4 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
03bf4b70 |
|
25-Aug-2005 |
Chuck Lever <cel@netapp.com> |
[PATCH] RPC: parametrize various transport connect timeouts Each transport implementation can now set unique bind, connect, reestablishment, and idle timeout values. These are variables, allowing the values to be modified dynamically. This permits exponential backoff of any of these values, for instance. As an example, we implement exponential backoff for the connection reestablishment timeout. Test-plan: Destructive testing (unplugging the network temporarily). Connectathon with UDP and TCP. Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
eab5c084 |
|
11-Aug-2005 |
Chuck Lever <cel@citi.umich.edu> |
[PATCH] NFS: use a constant value for TCP retransmit timeouts Implement a best practice: don't use exponential backoff when computing retransmit timeout values on TCP connections, but simply retransmit at regular intervals. This also fixes a bug introduced when xprt_reset_majortimeo() was added. Test-plan: Enable RPC debugging and watch timeout behavior on a NFS/TCP mount. Version: Thu, 11 Aug 2005 16:02:19 -0400 Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
fef26658 |
|
09-Sep-2005 |
Mark Fasheh <mark.fasheh@oracle.com> |
[PATCH] update filesystems for new delete_inode behavior Update the file systems in fs/ implementing a delete_inode() callback to call truncate_inode_pages(). One implementation note: In developing this patch I put the calls to truncate_inode_pages() at the very top of those filesystems delete_inode() callbacks in order to retain the previous behavior. I'm guessing that some of those could probably be optimized. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com> Acked-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
dc59250c |
|
18-Aug-2005 |
Chuck Lever <cel@citi.umich.edu> |
[PATCH] NFS: Introduce the use of inode->i_lock to protect fields in nfsi Down the road we want to eliminate the use of the global kernel lock entirely from the NFS client. To do this, we need to protect the fields in the nfs_inode structure adequately. Start by serializing updates to the "cache_validity" field. Note this change addresses an SMP hang found by njw@osdl.org, where processes deadlock because nfs_end_data_update and nfs_revalidate_mapping update the "cache_validity" field without proper serialization. Test plan: Millions of fsx ops on SMP clients. Run Nick Wilson's breaknfs program on large SMP clients. Signed-off-by: Chuck Lever <cel@netapp.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
412d582e |
|
18-Aug-2005 |
Chuck Lever <cel@citi.umich.edu> |
[PATCH] NFS: use atomic bitops to manipulate flags in nfsi->flags Introduce atomic bitops to manipulate the bits in the nfs_inode structure's "flags" field. Using bitops means we can use a generic wait_on_bit call instead of an ad hoc locking scheme in fs/nfs/inode.c, so we can remove the "nfs_i_wait" field from nfs_inode at the same time. The other new flags field will continue to use bitmask and logic AND and OR. This permits several flags to be set at the same time efficiently. The following patch adds a spin lock to protect these flags, and this spin lock will later cover other fields in the nfs_inode structure, amortizing the cost of using this type of serialization. Test plan: Millions of fsx ops on SMP clients. Signed-off-by: Chuck Lever <cel@netapp.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
55296809 |
|
18-Aug-2005 |
Chuck Lever <cel@citi.umich.edu> |
[PATCH] NFS: split nfsi->flags into two fields Certain bits in nfsi->flags can be manipulated with atomic bitops, and some are better manipulated via logical bitmask operations. This patch splits the flags field into two. The next patch introduces atomic bitops for one of the fields. Test plan: Millions of fsx ops on SMP clients. Signed-off-by: Chuck Lever <cel@netapp.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
65e4308d |
|
16-Aug-2005 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
[PATCH] NFS: Ensure we always update inode->i_mode when doing O_EXCL creates When the client performs an exclusive create and opens the file for writing, a Netapp filer will first create the file using the mode 01777. It does this since an NFSv3/v4 exclusive create cannot immediately set the mode bits. The 01777 mode then gets put into the inode->i_mode. After the file creation is successful, we then do a setattr to change the mode to the correct value (as per the NFS spec). The problem is that nfs_refresh_inode() no longer updates inode->i_mode, so the latter retains the 01777 mode. A bit later, the VFS notices this, and calls remove_suid(). This of course now resets the file mode to inode->i_mode & 0777. Hey presto, the file mode on the server is now magically changed to 0777. Duh... Fixes http://bugzilla.linux-nfs.org/show_bug.cgi?id=32 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
3da28eb1 |
|
22-Jun-2005 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
[PATCH] NFS: Replace nfs_page insertion sort with a radix sort Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
fe51beec |
|
22-Jun-2005 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
[PATCH] NFS: Ensure that fstat() always returns the correct mtime Even if the file is open for writes. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
7d52e862 |
|
22-Jun-2005 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
[PATCH] NFS: Cleanup of caching code, and slight optimization of writes. Unless we're doing O_APPEND writes, we really don't care about revalidating the file length. Just make sure that we catch any page cache invalidations. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
951a143b |
|
22-Jun-2005 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
[PATCH] NFS: Fix the file size revalidation Instead of looking at whether or not the file is open for writes before we accept to update the length using the server value, we should rather be looking at whether or not we are currently caching any writes. Failure to do so means in particular that we're not updating the file length correctly after obtaining a POSIX or BSD lock. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
f0dd2136 |
|
22-Jun-2005 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
[PATCH] NFS: Clean up readdir changes. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
00a92642 |
|
22-Jun-2005 |
Olivier Galibert <galibert@pobox.com> |
[PATCH] NFS: Hide NFS server-generated readdir cookies from userland NFSv3 currently returns the unsigned 64-bit cookie directly to userspace. The following patch causes the kernel to generate loff_t offsets for the benefit of userland. The current server-generated READDIR cookie is cached in the nfs_open_context instead of in filp->f_pos, so we still end up work correctly under directory insertions/deletion. Signed-off-by: Olivier Galibert <galibert@pobox.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
458818ed |
|
22-Jun-2005 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
[PATCH] NFS: Fix up v3 ACL caching code Initialize the inode cache values correctly. Clean up __nfs3_forget_cached_acls() Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
055ffbea |
|
22-Jun-2005 |
Andreas Gruenbacher <agruen@suse.de> |
[PATCH] NFS: Fix handling of the umask when an NFSv3 default acl is present. NFSv3 has no concept of a umask on the server side: The client applies the umask locally, and sends the effective permissions to the server. This behavior is wrong when files are created in a directory that has a default ACL. In this case, the umask is supposed to be ignored, and only the default ACL determines the file's effective permissions. Usually its the server's task to conditionally apply the umask. But since the server knows nothing about the umask, we have to do it on the client side. This patch tries to fetch the parent directory's default ACL before creating a new file, computes the appropriate create mode to send to the server, and finally sets the new file's access and default acl appropriately. Many thanks to Buck Huppmann <buchk@pobox.com> for sending the initial version of this patch, as well as for arguing why we need this change. Signed-off-by: Andreas Gruenbacher <agruen@suse.de> Acked-by: Olaf Kirch <okir@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
b7fa0554 |
|
22-Jun-2005 |
Andreas Gruenbacher <agruen@suse.de> |
[PATCH] NFS: Add support for NFSv3 ACLs This adds acl support fo nfs clients via the NFSACL protocol extension, by implementing the getxattr, listxattr, setxattr, and removexattr iops for the system.posix_acl_access and system.posix_acl_default attributes. This patch implements a dumb version that uses no caching (and thus adds some overhead). (Another patch in this patchset adds caching as well.) Signed-off-by: Andreas Gruenbacher <agruen@suse.de> Acked-by: Olaf Kirch <okir@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
6a19275a |
|
22-Jun-2005 |
J. Bruce Fields <bfields@citi.umich.edu> |
[PATCH] RPC: [PATCH] improve rpcauthauth_create error returns Currently we return -ENOMEM for every single failure to create a new auth. This is actually accurate for auth_null and auth_unix, but for auth_gss it's a bit confusing. Allow rpcauth_create (and the ->create methods) to return errors. With this patch, the user may sometimes see an EINVAL instead. Whee. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
e50a1c2e |
|
22-Jun-2005 |
J. Bruce Fields <bfields@citi.umich.edu> |
[PATCH] NFSv4: client-side caching NFSv4 ACLs Add nfs4_acl field to the nfs_inode, and use it to cache acls. Only cache acls of size up to a page. Also prepare for up to a page of acl data even when the user doesn't pass in a buffer, as when they want to get the acl length to decide what size buffer to allocate. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
ada70d94 |
|
22-Jun-2005 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
[PATCH] NFS: Add hooks to allow common NFS attribute code to clear cached acls Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
92cfc62c |
|
22-Jun-2005 |
J. Bruce Fields <bfields@citi.umich.edu> |
[PATCH] NFS: Allow NFS versions to support different sets of inode operations. ACL support will require supporting additional inode operations in v4 (getxattr, setxattr, listxattr). This patch allows different protocol versions to support different inode operations by adding a file_inode_ops to the nfs_rpc_ops (to match the existing dir_inode_ops). Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
464a98bd |
|
22-Jun-2005 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
[PATCH] NFS: cleanup: shrink struct nfs_open_context Remove the wait queue, and replace the functions that depended on it with wait_on_bit(). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
4ce79717 |
|
22-Jun-2005 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
[PATCH] NFS: Header file cleanup... - Move NFSv4 state definitions into a private header file. - Clean up gunk in nfs_fs.h Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
9085bbcb |
|
22-Jun-2005 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
[PATCH] NFS: Kill annoying mount version mismatch printks Ensure that we fix up the missing fields in the nfs_mount_data with sane defaults for older versions of mount, and return errors in the cases where we cannot. Convert a bunch of annoying warnings into dprintks() Return -EPROTONOSUPPORT rather than EIO if mount() tries to set NFSv3 without it actually being compiled in. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
5b616f5d |
|
22-Jun-2005 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
[PATCH] RPC: Make rpc_create_client() destroy the transport on failure. This saves us a couple of lines of cleanup code for each call. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
75c96f85 |
|
05-May-2005 |
Adrian Bunk <bunk@stusta.de> |
[PATCH] make some things static This patch makes some needlessly global identifiers static. Signed-off-by: Adrian Bunk <bunk@stusta.de> Acked-by: Arjan van de Ven <arjanv@infradead.org> Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
1da177e4 |
|
16-Apr-2005 |
Linus Torvalds <torvalds@ppc970.osdl.org> |
Linux-2.6.12-rc2 Initial git repository build. I'm not bothering with the full history, even though we have it. We can create a separate "historical" git archive of that later if we want to, and in the meantime it's about 3.2GB when imported into git - space that would just make the early git days unnecessarily complicated, when we don't have a lot of good infrastructure for it. Let it rip!
|