[LU-4050] NFS reexport issue Created: 02/Oct/13 Updated: 03/Jun/14 Resolved: 23/Oct/13 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.5.0 |
| Fix Version/s: | Lustre 2.5.0, Lustre 2.6.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Dmitry Eremin (Inactive) | Assignee: | Dmitry Eremin (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Environment: | RHEL 6.4, Lustre from master at 02a976b |
| Issue Links: | |
| Severity: | 3 |
| Rank (Obsolete): | 10863 |
| Description |
|
Re-export works, but the following is logged in the kernel messages: |
| Comments |
| Comment by Dmitry Eremin (Inactive) [ 02/Oct/13 ] |
|
Also, the listing of files is not refreshed until something changes on the NFS share side. You can create a lot of files on Lustre, but those files will not be visible until you create something via NFS. |
| Comment by Andreas Dilger [ 02/Oct/13 ] |
|
It is worthwhile to check whether fid [0x200000400:0x1:0x0] actually exists or not. If this is a deleted file and NFS is just requesting attributes for a stale file handle, then we should probably just turn this console error message off. |
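One quick way to check this (a minimal sketch, not part of the original discussion; it assumes the /mnt/lustre mount point used throughout this ticket, and relies on lfs fid2path, which maps a FID back to a path and fails for objects that no longer exist):

#!/bin/sh
# Sketch: check whether the FID from the console message still resolves
# to a path. The mount point and FID value are taken from this ticket.
FID='[0x200000400:0x1:0x0]'
if lfs fid2path /mnt/lustre "$FID"; then
    echo "object still exists"
else
    # fid2path failing (typically with ENOENT) would support the theory
    # that NFS is presenting a stale handle for a deleted file.
    echo "FID does not resolve; likely a deleted object"
fi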
| Comment by Dmitry Eremin (Inactive) [ 02/Oct/13 ] |
|
It looks like NFS is just requesting attributes for a removed file:

# mkdir /mnt/lustre/2
# lfs path2fid /mnt/lustre/2
[0x200000400:0x9b9:0x0]
[ Make a file or dir on the NFS share to sync ]
# dmesg -c
# rm -rf /mnt/lustre/2
[ Re-read from the NFS share (just ls) ]
# dmesg -c
LustreError: 3815:0:(llite_nfs.c:105:search_inode_for_lustre()) can't get object attrs, fid [0x200000400:0x9b9:0x0], rc -2
LustreError: 3815:0:(llite_nfs.c:105:search_inode_for_lustre()) Skipped 4 previous similar messages

The error message keeps appearing until a file or dir is made on the NFS share to sync. |
| Comment by Andreas Dilger [ 03/Oct/13 ] |
|
I think in this case the console message should just be D_INFO for -ENOENT, or maybe always. As for the NFS refresh problem, I think that is a bug: shouldn't NFS revalidate the directory after a few seconds? |
| Comment by Dmitry Eremin (Inactive) [ 03/Oct/13 ] |
|
It's synced by a timeout after about 1.5 minutes, which I think is not convenient for the user. I measured the time at which NFS observed that something changed in the directory. Changed time: 18:40:53.694375047. Is this a bug? |
| Comment by Dmitry Eremin (Inactive) [ 03/Oct/13 ] |
|
The time of modification is not changed after the directory creation.

  File: `/mnt/lustre/export'
  Size: 4096            Blocks: 8          IO Block: 4096   directory
Device: 2c54f966h/743766374d    Inode: 144115205255727587  Links: 4
Access: (0777/drwxrwxrwx)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2013-10-03 19:38:54.000000000 +0400
Modify: 2013-10-03 19:41:10.000000000 +0400
Change: 2013-10-03 19:41:10.000000000 +0400

mkdir /mnt/lustre/export/ack

  File: `/mnt/lustre/export'
  Size: 4096            Blocks: 8          IO Block: 4096   directory
Device: 2c54f966h/743766374d    Inode: 144115205255727587  Links: 5
Access: (0777/drwxrwxrwx)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2013-10-03 19:38:54.000000000 +0400
Modify: 2013-10-03 19:41:10.000000000 +0400
Change: 2013-10-03 19:41:10.000000000 +0400 |
| Comment by Peter Jones [ 04/Oct/13 ] |
|
Patch to fix the message: http://review.whamcloud.com/#/c/7850/ |
| Comment by Andreas Dilger [ 04/Oct/13 ] |
|
I suspect that the UPDATE IBITS lock is not being revoked or similar, so the Lustre client is incorrectly caching the timestamps. |
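One way to probe this hypothesis (a hedged sketch, not from the original thread; ldlm.namespaces.*.lru_size=clear is the standard knob for flushing a client's cached DLM locks): if the timestamps refresh only after the flush, the client was indeed serving stat() from a stale cached lock.

#!/bin/sh
# Sketch: flush all cached LDLM locks on the re-exporting Lustre client,
# then re-stat the directory. A fresh result only after the flush would
# support the stale-cached-lock theory. Paths are the ones in this ticket.
stat /mnt/lustre/export                          # possibly stale attributes
lctl set_param ldlm.namespaces.*.lru_size=clear  # drop cached client locks
stat /mnt/lustre/export                          # should be fetched fresh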
| Comment by John Hammond [ 04/Oct/13 ] |
|
If the UPDATE lock is not being revoked then why does the link count reported by stat change? |
| Comment by John Hammond [ 04/Oct/13 ] |
|
Dmitry, could you clarify your reproducer? Could you also post your /etc/exports and the output of mount on all relevant nodes? |
| Comment by Andreas Dilger [ 05/Oct/13 ] |
|
Could this be related to |
| Comment by Dmitry Eremin (Inactive) [ 05/Oct/13 ] |
|
I have checked with patch http://review.whamcloud.com/6460. It does not fix the issue. The reproducer is the following.

On the Lustre node I launch llmount.sh and then service nfs start:

# mount
l201u2@tcp:/lustre on /mnt/lustre type lustre (rw,user_xattr,flock)
# cat /etc/exports
/mnt/lustre 192.168.1.0/24(rw,no_root_squash,no_all_squash)

On the NFS node I mount the NFS share:

# mount
l201u2:/mnt/lustre on /net/l201u2/mnt/lustre type nfs (rw,nosuid,nodev,intr,sloppy,vers=4,addr=192.168.1.2,clientaddr=192.168.1.1)

Then I launch the test script on the Lustre node:

#!/bin/sh
D=/mnt/lustre/nfs
S=$(date +%T.%N)
while [ ! -d $D/ask ]; do
    echo -n .
done
echo
stat $D
mkdir $D/ack && echo mkdir $D/ack
stat $D
F=$(date +%T.%N)
echo Start: $S mkdir: $F

The output is the following:

.............................................................
  File: `/mnt/lustre/nfs'
  Size: 4096            Blocks: 8          IO Block: 4096   directory
Device: 2c54f966h/743766374d    Inode: 144115205255725057  Links: 3
Access: (0755/drwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2013-10-05 14:20:13.000000000 +0400
Modify: 2013-10-05 14:31:24.000000000 +0400
Change: 2013-10-05 14:31:24.000000000 +0400
mkdir /mnt/lustre/nfs/ack
  File: `/mnt/lustre/nfs'
  Size: 4096            Blocks: 8          IO Block: 4096   directory
Device: 2c54f966h/743766374d    Inode: 144115205255725057  Links: 4
Access: (0755/drwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2013-10-05 14:20:13.000000000 +0400
Modify: 2013-10-05 14:31:24.000000000 +0400
Change: 2013-10-05 14:31:24.000000000 +0400
Start: 14:31:21.749700343 mkdir: 14:31:24.247125347

Then I launch the test script on the NFS node:

#!/bin/sh
D=/net/l201u2/mnt/lustre/nfs
S=$(date +%T.%N)
echo Start: $S
mkdir $D/ask
stat $D
while [ -d $D/ask ]; do
    if [ -d $D/ack ]; then
        stat $D
        F=$(date +%T.%N)
        echo mkdir: $F
        rm -rf $D/ask $D/ack
    fi
done
echo Start: $S mkdir: $F

The output is the following:

Start: 14:32:12.174908734
  File: `/net/l201u2/mnt/lustre/nfs'
  Size: 4096            Blocks: 8          IO Block: 524288 directory
Device: 1ah/26d Inode: 144115205255725057  Links: 3
Access: (0755/drwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2013-10-05 14:20:13.000000000 +0400
Modify: 2013-10-05 14:31:24.000000000 +0400
Change: 2013-10-05 14:31:24.000000000 +0400

The script ran for a long time and finished only after I executed "ls -la /net/l201u2/mnt/lustre/nfs" in another console:

  File: `/net/l201u2/mnt/lustre/nfs'
  Size: 4096            Blocks: 8          IO Block: 524288 directory
Device: 1ah/26d Inode: 144115205255725057  Links: 4
Access: (0755/drwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2013-10-05 14:20:13.000000000 +0400
Modify: 2013-10-05 14:31:24.000000000 +0400
Change: 2013-10-05 14:31:24.000000000 +0400
mkdir: 14:37:50.287751827
Start: 14:32:12.174908734 mkdir: 14:37:50.287751827

I observed that "ls" in the other console immediately unblocks the script, and it then sees the new directory that was created on the Lustre node. I observe this behavior with patch http://review.whamcloud.com/6460 applied. |
| Comment by Dmitry Eremin (Inactive) [ 05/Oct/13 ] |
|
With patch http://review.whamcloud.com/6460 I observe immediate propagation of changes when I read a directory listing. For example, "while true; do ls; done" on the NFS node shows the correct listing immediately when I change something on the Lustre node. |
| Comment by Dmitry Eremin (Inactive) [ 05/Oct/13 ] |
|
Just to clarify the status: the patch http://review.whamcloud.com/6460 makes the situation much better. With this patch, readdir fully synchronizes the directory content and attributes. But stat alone is still not updated in time. For example:

# stat /net/l201u2/mnt/lustre/nfs
  File: `/net/l201u2/mnt/lustre/nfs'
  Size: 4096            Blocks: 8          IO Block: 524288 directory
Device: 1bh/27d Inode: 144115205255725057  Links: 3
Access: (0755/drwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2013-10-05 14:20:13.000000000 +0400
Modify: 2013-10-05 22:55:53.000000000 +0400
Change: 2013-10-05 22:55:53.000000000 +0400
# stat /net/l201u2/mnt/lustre/nfs
  File: `/net/l201u2/mnt/lustre/nfs'
  Size: 4096            Blocks: 8          IO Block: 524288 directory
Device: 1bh/27d Inode: 144115205255725057  Links: 3
Access: (0755/drwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2013-10-05 14:20:13.000000000 +0400
Modify: 2013-10-05 22:55:53.000000000 +0400
Change: 2013-10-05 22:55:53.000000000 +0400
# ls /net/l201u2/mnt/lustre/nfs
123  234  456  test_ack.sh  test_ask.sh
# stat /net/l201u2/mnt/lustre/nfs
  File: `/net/l201u2/mnt/lustre/nfs'
  Size: 4096            Blocks: 8          IO Block: 524288 directory
Device: 1bh/27d Inode: 144115205255725057  Links: 5
Access: (0755/drwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2013-10-05 14:20:13.000000000 +0400
Modify: 2013-10-05 22:56:59.000000000 +0400
Change: 2013-10-05 22:56:59.000000000 +0400

As you can see, only the ls command updates the stat information. |
| Comment by John Hammond [ 06/Oct/13 ] |
|
Thanks Dmitry. There may be more than one issue at play here. There is certainly a spurious negative dentry caching effect on the NFS side. To see this, assume /mnt/lustre is exported and mounted via NFS at /mnt/lustre-export, and that /mnt/lustre/dir exists but is empty.

# cat lu-4050.sh
#!/bin/bash
lustre_dir=/mnt/lustre/dir
export_dir=/mnt/lustre-export/dir
mkdir $lustre_dir/0
stat $export_dir/1
mkdir $lustre_dir/1
stat $export_dir/1

# bash -x lu-4050.sh
+ lustre_dir=/mnt/lustre/dir
+ export_dir=/mnt/lustre-export/dir
+ mkdir /mnt/lustre/dir/0
+ stat /mnt/lustre-export/dir/1
stat: cannot stat `/mnt/lustre-export/dir/1': No such file or directory
+ mkdir /mnt/lustre/dir/1
+ stat /mnt/lustre-export/dir/1
stat: cannot stat `/mnt/lustre-export/dir/1': No such file or directory

The same effect is seen using touch rather than mkdir. Dmitry's two scripts can be simplified to:

# cat lu-4050-bis.sh
#!/bin/bash
lustre_dir=/mnt/lustre/dir
export_dir=/mnt/lustre-export/dir
stat $lustre_dir
mkdir $export_dir/0
stat $export_dir
stat $export_dir/1
stat $lustre_dir
mkdir $lustre_dir/1
stat $lustre_dir
stat $export_dir/1
stat $export_dir

# bash -x lu-4050-bis.sh
+ lustre_dir=/mnt/lustre/dir
+ export_dir=/mnt/lustre-export/dir
+ stat /mnt/lustre/dir
  File: `/mnt/lustre/dir'
  Size: 4096            Blocks: 8          IO Block: 4096   directory
Device: 2c54f966h/743766374d    Inode: 144115205255725102  Links: 2
Access: (0755/drwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2013-10-05 20:13:23.000000000 -0500
Modify: 2013-10-05 20:13:23.000000000 -0500
Change: 2013-10-05 20:13:23.000000000 -0500
+ mkdir /mnt/lustre-export/dir/0
+ stat /mnt/lustre-export/dir
  File: `/mnt/lustre-export/dir'
  Size: 4096            Blocks: 8          IO Block: 1048576 directory
Device: 1dh/29d Inode: 144115205255725102  Links: 3
Access: (0755/drwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2013-10-05 20:13:23.000000000 -0500
Modify: 2013-10-05 20:13:45.000000000 -0500
Change: 2013-10-05 20:13:45.000000000 -0500
+ stat /mnt/lustre-export/dir/1
stat: cannot stat `/mnt/lustre-export/dir/1': No such file or directory
+ stat /mnt/lustre/dir
  File: `/mnt/lustre/dir'
  Size: 4096            Blocks: 8          IO Block: 4096   directory
Device: 2c54f966h/743766374d    Inode: 144115205255725102  Links: 3
Access: (0755/drwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2013-10-05 20:13:23.000000000 -0500
Modify: 2013-10-05 20:13:45.000000000 -0500
Change: 2013-10-05 20:13:45.000000000 -0500
+ mkdir /mnt/lustre/dir/1
+ stat /mnt/lustre/dir
  File: `/mnt/lustre/dir'
  Size: 4096            Blocks: 8          IO Block: 4096   directory
Device: 2c54f966h/743766374d    Inode: 144115205255725102  Links: 4
Access: (0755/drwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2013-10-05 20:13:23.000000000 -0500
Modify: 2013-10-05 20:13:45.000000000 -0500
Change: 2013-10-05 20:13:45.000000000 -0500
+ stat /mnt/lustre-export/dir/1
stat: cannot stat `/mnt/lustre-export/dir/1': No such file or directory
+ stat /mnt/lustre-export/dir
  File: `/mnt/lustre-export/dir'
  Size: 4096            Blocks: 8          IO Block: 1048576 directory
Device: 1dh/29d Inode: 144115205255725102  Links: 3
Access: (0755/drwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2013-10-05 20:13:23.000000000 -0500
Modify: 2013-10-05 20:13:45.000000000 -0500
Change: 2013-10-05 20:13:45.000000000 -0500

Notice that there is no rmdir in the script and that the link counts reported for the directory are 2, 3, 3, 4, 3. |
| Comment by John Hammond [ 07/Oct/13 ] |
|
I think this can be explained by the fact that we don't maintain i_version and we have second-resolution timestamps. Thus an nfsd getattr followed by a Lustre mkdir or create in the same second defeats NFS's change_attr logic. |
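A minimal sketch of that race (the directory path is assumed from the earlier reproducers; stat -c %Y prints mtime in whole seconds): when the getattr and the mkdir land within the same second, the mtime-derived change attribute is identical before and after, so the NFS client sees nothing to invalidate.

#!/bin/sh
# Sketch: a mkdir in the same second leaves the directory mtime
# (second resolution) unchanged, so NFS's change_attr comparison
# cannot detect the modification.
D=/mnt/lustre/dir
touch "$D"                     # set the directory mtime to "now"
before=$(stat -c %Y "$D")      # mtime in whole seconds (the "getattr")
mkdir "$D/sub-$$"              # modify the directory right away
after=$(stat -c %Y "$D")
if [ "$before" = "$after" ]; then
    echo "mtime unchanged ($before): the mkdir is invisible to change_attr"
else
    echo "operations straddled a second boundary; rerun to hit the race"
fi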
| Comment by Andreas Dilger [ 07/Oct/13 ] |
|
I was thinking we could export the directory version as the Lustre inode version for the directory, but in fact that may not work. The inode version is typically the last transaction number that modified the inode, which seems reasonable; but to avoid contention in the case of parallel directory operations and version-based recovery, I think the parent directory version is not updated for every create, just for operations on the directory inode itself. We need to ask Mike about this to confirm. If I'm wrong, then the inode version would be perfect for this. |
| Comment by Dmitry Eremin (Inactive) [ 10/Oct/13 ] |
|
Also patch set #6 of http://review.whamcloud.com/#/c/6460/ helps with this issue. |
| Comment by Patrick Farrell (Inactive) [ 10/Oct/13 ] |
|
Dmitry, can you clarify what you mean by "helps"? Does it resolve the issue? Or, if it doesn't resolve the issue, how exactly does it help? |
| Comment by Dmitry Eremin (Inactive) [ 11/Oct/13 ] |
|
With this patch, a directory on the NFS share stays in sync with Lustre when readdir() is used. But stat() is not updated if mtime has not changed at second granularity. That is probably fine, but I'd like to investigate what can be done to update stat() properly. |
| Comment by John Hammond [ 11/Oct/13 ] |
|
Andreas, you're not wrong. The parent directory is not updated for creates. From looking at this and the NFS side, it seems like nanosecond timestamps are the right fix here. |
| Comment by Peter Jones [ 11/Oct/13 ] |
|
This may simply be a duplicate of |
| Comment by Jodi Levi (Inactive) [ 21/Oct/13 ] |
|
Dmitry, |