[LU-4050] NFS reexport issue Created: 02/Oct/13  Updated: 03/Jun/14  Resolved: 23/Oct/13

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.5.0
Fix Version/s: Lustre 2.5.0, Lustre 2.6.0

Type: Bug Priority: Minor
Reporter: Dmitry Eremin (Inactive) Assignee: Dmitry Eremin (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Environment:

RHEL 6.4, Lustre from master at 02a976b LU-3974 libcfs: replace num_physpages with totalram_pages


Issue Links:
Related
is related to LU-1158 nanosecond timestamp support for Lustre Reopened
is related to LU-3240 The link count is not updated after t... Resolved
Severity: 3
Rank (Obsolete): 10863

 Description   

Re-export works, but the following is logged in the kernel messages:
Lustre: ctl-lustre-MDT0000: super-sequence allocation rc = 0 [0x0000000200000400-0x0000000240000400):0:mdt
LustreError: 3819:0:(llite_nfs.c:105:search_inode_for_lustre()) can't get object attrs, fid [0x200000400:0x1:0x0], rc -2
LustreError: 3819:0:(llite_nfs.c:105:search_inode_for_lustre()) can't get object attrs, fid [0x200000400:0x1:0x0], rc -2
LustreError: 3819:0:(llite_nfs.c:105:search_inode_for_lustre()) can't get object attrs, fid [0x200000400:0x1:0x0], rc -2
LustreError: 3819:0:(llite_nfs.c:105:search_inode_for_lustre()) can't get object attrs, fid [0x200000400:0x1:0x0], rc -2
LustreError: 3819:0:(llite_nfs.c:105:search_inode_for_lustre()) can't get object attrs, fid [0x200000400:0x1:0x0], rc -2
LustreError: 3819:0:(llite_nfs.c:105:search_inode_for_lustre()) can't get object attrs, fid [0x200000400:0x1:0x0], rc -2
LustreError: 3819:0:(llite_nfs.c:105:search_inode_for_lustre()) Skipped 1 previous similar message
LustreError: 3819:0:(llite_nfs.c:105:search_inode_for_lustre()) can't get object attrs, fid [0x200000400:0x1:0x0], rc -2
LustreError: 3819:0:(llite_nfs.c:105:search_inode_for_lustre()) Skipped 2 previous similar messages



 Comments   
Comment by Dmitry Eremin (Inactive) [ 02/Oct/13 ]

Also, the file listing is not refreshed until something changes on the NFS share side. You can create many files on Lustre, but they will not be visible until something is created via NFS.

Comment by Andreas Dilger [ 02/Oct/13 ]

It is worthwhile to check whether fid [0x200000400:0x1:0x0] actually exists. If this is a deleted file and NFS is just requesting attributes for a stale file handle, then we should probably just turn this console error message off.
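One way to check this on a client node is lfs fid2path, which maps a FID back to a path; a "No such file or directory" result would confirm the object is gone and NFS is merely holding a stale handle. An illustrative session (the mount point is assumed):

```shell
# Resolve the FID from the console message back to a path (sketch,
# assuming the client mount is /mnt/lustre). An ENOENT result means
# the object was deleted and the NFS file handle is stale.
# lfs fid2path /mnt/lustre '[0x200000400:0x1:0x0]'
```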

Comment by Dmitry Eremin (Inactive) [ 02/Oct/13 ]

It looks like NFS is just requesting attributes for a removed file.

# mkdir /mnt/lustre/2
# lfs path2fid /mnt/lustre/2
[0x200000400:0x9b9:0x0]

[ Create a file or directory on the NFS share to force a sync ]

# dmesg -c
# rm -rf /mnt/lustre/2

[ Re-read from the NFS share (just ls) ]

# dmesg -c
LustreError: 3815:0:(llite_nfs.c:105:search_inode_for_lustre()) can't get object attrs, fid [0x200000400:0x9b9:0x0], rc -2
LustreError: 3815:0:(llite_nfs.c:105:search_inode_for_lustre()) Skipped 4 previous similar messages

The error message keeps appearing until a file or directory is created on the NFS share to force a sync.

Comment by Andreas Dilger [ 03/Oct/13 ]

I think in this case that the console message should just be D_INFO for -ENOENT, or maybe always?

As for the NFS refresh problem, I think that is a bug? Shouldn't NFS revalidate the directory after a few seconds?

Comment by Dmitry Eremin (Inactive) [ 03/Oct/13 ]

It is synced by a timeout after about 1.5 minutes, which is not convenient for users, I think. I measured the time until NFS observes that something has changed in the directory:

changed time: 18:40:53.694375047
observe time: 18:41:23.707787906

Is this a bug?
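For reference, the ~30-second gap between the two timestamps above is consistent with the NFS client's directory attribute caching (acdirmin defaults to 30 seconds, acdirmax to 60, per nfs(5)). A hedged sketch of mount options that shrink that window; the values are illustrative, not a recommendation from this ticket:

```shell
# Sketch only: tighten the directory attribute-cache window on the
# NFS client so changes made directly on Lustre are noticed sooner.
# acdirmin/acdirmax values here are illustrative.
mount -t nfs -o vers=4,acdirmin=1,acdirmax=5 \
    l201u2:/mnt/lustre /net/l201u2/mnt/lustre
```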

Comment by Dmitry Eremin (Inactive) [ 03/Oct/13 ]

The modification time does not change after the directory is created:

  File: `/mnt/lustre/export'
  Size: 4096            Blocks: 8          IO Block: 4096   directory
Device: 2c54f966h/743766374d    Inode: 144115205255727587  Links: 4
Access: (0777/drwxrwxrwx)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2013-10-03 19:38:54.000000000 +0400
Modify: 2013-10-03 19:41:10.000000000 +0400
Change: 2013-10-03 19:41:10.000000000 +0400

mkdir /mnt/lustre/export/ack

  File: `/mnt/lustre/export'
  Size: 4096            Blocks: 8          IO Block: 4096   directory
Device: 2c54f966h/743766374d    Inode: 144115205255727587  Links: 5
Access: (0777/drwxrwxrwx)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2013-10-03 19:38:54.000000000 +0400
Modify: 2013-10-03 19:41:10.000000000 +0400
Change: 2013-10-03 19:41:10.000000000 +0400
Comment by Peter Jones [ 04/Oct/13 ]

Patch to fix the message: http://review.whamcloud.com/#/c/7850/

Comment by Andreas Dilger [ 04/Oct/13 ]

I suspect that the UPDATE IBITS lock is not being revoked or similar, so the Lustre client is incorrectly caching the timestamps.

Comment by John Hammond [ 04/Oct/13 ]

If the UPDATE lock is not being revoked then why does the link count reported by stat change?

Comment by John Hammond [ 04/Oct/13 ]

Dmitry, could you clarify your reproducer? Could you also post your /etc/exports and the output of mount on all relevant nodes?

Comment by Andreas Dilger [ 05/Oct/13 ]

Could this be related to LU-3240 (patch http://review.whamcloud.com/6460)? I thought that patch had landed, but it has not.

Comment by Dmitry Eremin (Inactive) [ 05/Oct/13 ]

I have checked with patch http://review.whamcloud.com/6460. It does not fix the issue.

The reproducer is as follows:

On the Lustre node I run llmount.sh and then service nfs start.

# mount
l201u2@tcp:/lustre on /mnt/lustre type lustre (rw,user_xattr,flock)
# cat /etc/exports
/mnt/lustre 192.168.1.0/24(rw,no_root_squash,no_all_squash)

On the NFS node I mount the NFS share:

# mount
l201u2:/mnt/lustre on /net/l201u2/mnt/lustre type nfs (rw,nosuid,nodev,intr,sloppy,vers=4,addr=192.168.1.2,clientaddr=192.168.1.1)

Then I launch a test script on the Lustre node:

#!/bin/sh

D=/mnt/lustre/nfs

S=$(date +%T.%N)
while [ ! -d $D/ask ]; do
    echo -n .
done
echo
stat $D
mkdir $D/ack && echo mkdir $D/ack
stat $D
F=$(date +%T.%N)
echo Start: $S mkdir: $F

The output is as follows:

.............................................................
  File: `/mnt/lustre/nfs'
  Size: 4096            Blocks: 8          IO Block: 4096   directory
Device: 2c54f966h/743766374d    Inode: 144115205255725057  Links: 3
Access: (0755/drwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2013-10-05 14:20:13.000000000 +0400
Modify: 2013-10-05 14:31:24.000000000 +0400
Change: 2013-10-05 14:31:24.000000000 +0400
mkdir /mnt/lustre/nfs/ack
  File: `/mnt/lustre/nfs'
  Size: 4096            Blocks: 8          IO Block: 4096   directory
Device: 2c54f966h/743766374d    Inode: 144115205255725057  Links: 4
Access: (0755/drwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2013-10-05 14:20:13.000000000 +0400
Modify: 2013-10-05 14:31:24.000000000 +0400
Change: 2013-10-05 14:31:24.000000000 +0400
Start: 14:31:21.749700343 mkdir: 14:31:24.247125347

Then I launch a test script on the NFS node:

#!/bin/sh

D=/net/l201u2/mnt/lustre/nfs

S=$(date +%T.%N)
echo Start: $S
mkdir $D/ask
stat $D
while [ -d $D/ask ]; do
    if [ -d $D/ack ]; then
        stat $D
        F=$(date +%T.%N)
        echo mkdir: $F
        rm -rf $D/ask $D/ack
    fi
done
echo Start: $S mkdir: $F

The output is as follows:

Start: 14:32:12.174908734
  File: `/net/l201u2/mnt/lustre/nfs'
  Size: 4096            Blocks: 8          IO Block: 524288 directory
Device: 1ah/26d Inode: 144115205255725057  Links: 3
Access: (0755/drwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2013-10-05 14:20:13.000000000 +0400
Modify: 2013-10-05 14:31:24.000000000 +0400
Change: 2013-10-05 14:31:24.000000000 +0400

The script ran for a long time and only finished after I executed "ls -la /net/l201u2/mnt/lustre/nfs" in another console:

  File: `/net/l201u2/mnt/lustre/nfs'
  Size: 4096            Blocks: 8          IO Block: 524288 directory
Device: 1ah/26d Inode: 144115205255725057  Links: 4
Access: (0755/drwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2013-10-05 14:20:13.000000000 +0400
Modify: 2013-10-05 14:31:24.000000000 +0400
Change: 2013-10-05 14:31:24.000000000 +0400
mkdir: 14:37:50.287751827
Start: 14:32:12.174908734 mkdir: 14:37:50.287751827

I observed that running "ls" in another console immediately unblocks the script, which then sees the new directory that was created on the Lustre node. I observe this behavior with patch http://review.whamcloud.com/6460 applied.

Comment by Dmitry Eremin (Inactive) [ 05/Oct/13 ]

With patch http://review.whamcloud.com/6460 I observe immediate propagation of changes if I read the directory listing. For example, "while true; do ls; done" on the NFS node shows the correct listing immediately after I change something on the Lustre node.

Comment by Dmitry Eremin (Inactive) [ 05/Oct/13 ]

Just to clarify the status: patch http://review.whamcloud.com/6460 makes the situation much better. With it, readdir fully synchronizes directory content and attributes, but stat alone is still not updated in time. For example:

# stat /net/l201u2/mnt/lustre/nfs
  File: `/net/l201u2/mnt/lustre/nfs'
  Size: 4096            Blocks: 8          IO Block: 524288 directory
Device: 1bh/27d Inode: 144115205255725057  Links: 3
Access: (0755/drwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2013-10-05 14:20:13.000000000 +0400
Modify: 2013-10-05 22:55:53.000000000 +0400
Change: 2013-10-05 22:55:53.000000000 +0400
# stat /net/l201u2/mnt/lustre/nfs
  File: `/net/l201u2/mnt/lustre/nfs'
  Size: 4096            Blocks: 8          IO Block: 524288 directory
Device: 1bh/27d Inode: 144115205255725057  Links: 3
Access: (0755/drwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2013-10-05 14:20:13.000000000 +0400
Modify: 2013-10-05 22:55:53.000000000 +0400
Change: 2013-10-05 22:55:53.000000000 +0400
# ls /net/l201u2/mnt/lustre/nfs
123  234  456  test_ack.sh  test_ask.sh
# stat /net/l201u2/mnt/lustre/nfs
  File: `/net/l201u2/mnt/lustre/nfs'
  Size: 4096            Blocks: 8          IO Block: 524288 directory
Device: 1bh/27d Inode: 144115205255725057  Links: 5
Access: (0755/drwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2013-10-05 14:20:13.000000000 +0400
Modify: 2013-10-05 22:56:59.000000000 +0400
Change: 2013-10-05 22:56:59.000000000 +0400

As you can see, only the ls command updates the stat information.

Comment by John Hammond [ 06/Oct/13 ]

Thanks Dmitry. There may be more than one issue at play here. There is certainly a spurious negative-dentry caching effect on the NFS side. To see this, assume /mnt/lustre is exported and mounted via NFS at /mnt/lustre-export, and that /mnt/lustre/dir exists but is empty.

# cat lu-4050.sh
#!/bin/bash

lustre_dir=/mnt/lustre/dir
export_dir=/mnt/lustre-export/dir

mkdir $lustre_dir/0
stat  $export_dir/1
mkdir $lustre_dir/1
stat  $export_dir/1
# bash -x lu-4050.sh
+ lustre_dir=/mnt/lustre/dir
+ export_dir=/mnt/lustre-export/dir
+ mkdir /mnt/lustre/dir/0
+ stat /mnt/lustre-export/dir/1
stat: cannot stat `/mnt/lustre-export/dir/1': No such file or directory
+ mkdir /mnt/lustre/dir/1
+ stat /mnt/lustre-export/dir/1
stat: cannot stat `/mnt/lustre-export/dir/1': No such file or directory

The same effect is seen using touch rather than mkdir.
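The negative-dentry caching effect shown above is what the NFS client's lookupcache mount option governs (nfs(5)); a hedged sketch, with the command otherwise mirroring the setup in this ticket:

```shell
# Sketch only: lookupcache=positive stops the NFS client from caching
# negative lookups, so a name created behind its back is found on the
# next lookup rather than after the cached ENOENT expires.
mount -t nfs -o vers=4,lookupcache=positive \
    l201u2:/mnt/lustre /mnt/lustre-export
```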

Dmitry's two scripts can be simplified to:

# cat lu-4050-bis.sh
#!/bin/bash

lustre_dir=/mnt/lustre/dir
export_dir=/mnt/lustre-export/dir

stat  $lustre_dir
mkdir $export_dir/0
stat  $export_dir
stat  $export_dir/1
stat  $lustre_dir
mkdir $lustre_dir/1
stat  $lustre_dir
stat  $export_dir/1
stat  $export_dir
# bash -x lu-4050-bis.sh 
+ lustre_dir=/mnt/lustre/dir
+ export_dir=/mnt/lustre-export/dir
+ stat /mnt/lustre/dir
  File: `/mnt/lustre/dir'
  Size: 4096      	Blocks: 8          IO Block: 4096   directory
Device: 2c54f966h/743766374d	Inode: 144115205255725102  Links: 2
Access: (0755/drwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2013-10-05 20:13:23.000000000 -0500
Modify: 2013-10-05 20:13:23.000000000 -0500
Change: 2013-10-05 20:13:23.000000000 -0500
+ mkdir /mnt/lustre-export/dir/0
+ stat /mnt/lustre-export/dir
  File: `/mnt/lustre-export/dir'
  Size: 4096      	Blocks: 8          IO Block: 1048576 directory
Device: 1dh/29d	Inode: 144115205255725102  Links: 3
Access: (0755/drwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2013-10-05 20:13:23.000000000 -0500
Modify: 2013-10-05 20:13:45.000000000 -0500
Change: 2013-10-05 20:13:45.000000000 -0500
+ stat /mnt/lustre-export/dir/1
stat: cannot stat `/mnt/lustre-export/dir/1': No such file or directory
+ stat /mnt/lustre/dir
  File: `/mnt/lustre/dir'
  Size: 4096      	Blocks: 8          IO Block: 4096   directory
Device: 2c54f966h/743766374d	Inode: 144115205255725102  Links: 3
Access: (0755/drwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2013-10-05 20:13:23.000000000 -0500
Modify: 2013-10-05 20:13:45.000000000 -0500
Change: 2013-10-05 20:13:45.000000000 -0500
+ mkdir /mnt/lustre/dir/1
+ stat /mnt/lustre/dir
  File: `/mnt/lustre/dir'
  Size: 4096      	Blocks: 8          IO Block: 4096   directory
Device: 2c54f966h/743766374d	Inode: 144115205255725102  Links: 4
Access: (0755/drwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2013-10-05 20:13:23.000000000 -0500
Modify: 2013-10-05 20:13:45.000000000 -0500
Change: 2013-10-05 20:13:45.000000000 -0500
+ stat /mnt/lustre-export/dir/1
stat: cannot stat `/mnt/lustre-export/dir/1': No such file or directory
+ stat /mnt/lustre-export/dir
  File: `/mnt/lustre-export/dir'
  Size: 4096      	Blocks: 8          IO Block: 1048576 directory
Device: 1dh/29d	Inode: 144115205255725102  Links: 3
Access: (0755/drwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2013-10-05 20:13:23.000000000 -0500
Modify: 2013-10-05 20:13:45.000000000 -0500
Change: 2013-10-05 20:13:45.000000000 -0500

Notice that there is no rmdir in the script and that the link counts reported for the directory are 2, 3, 3, 4, 3.

Comment by John Hammond [ 07/Oct/13 ]

I think this can be explained by the fact that we don't maintain i_version and we have second-resolution timestamps. Thus an nfsd getattr followed by a Lustre mkdir or create in the same second defeats NFS's change_attr logic.
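The interaction can be modeled locally with fixed timestamps (a sketch of the granularity problem only, not the actual nfsd code path): with one-second resolution, two modifications that land in the same second produce identical timestamps, so a change attribute derived from them cannot distinguish the two states.

```shell
#!/bin/sh
# Model the problem with explicit timestamps so it is deterministic:
# two "modifications" within the same second leave the directory's
# mtime (at seconds resolution) identical, so a change attribute
# based on it never changes.
d=$(mktemp -d)
touch -d "2013-10-05 14:31:24" "$d"   # state after the first mkdir
t1=$(stat -c %Y "$d")                 # mtime in whole seconds
touch -d "2013-10-05 14:31:24" "$d"   # a second change, same second
t2=$(stat -c %Y "$d")
[ "$t1" = "$t2" ] && echo "change attribute unchanged"
rmdir "$d"
```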

Comment by Andreas Dilger [ 07/Oct/13 ]

I was thinking we could export the directory version as the Lustre inode version for the directory, but in fact that may not work. The inode version is typically the last transaction number that modified it, which seems reasonable. However, to avoid contention with parallel directory operations and version-based recovery, I think the parent directory version is not updated for every create, only for operations on the directory inode itself.

Need to ask Mike about this to confirm.

If I'm wrong then the inode version would be perfect for this.

Comment by Dmitry Eremin (Inactive) [ 10/Oct/13 ]

Also patch set #6 of http://review.whamcloud.com/#/c/6460/ helps with this issue.

Comment by Patrick Farrell (Inactive) [ 10/Oct/13 ]

Dmitry,

Can you clarify what you mean by helps? Does it resolve the issue? Or if it doesn't resolve the issue, how exactly does it help?

Comment by Dmitry Eremin (Inactive) [ 11/Oct/13 ]

With this patch, a directory on the NFS share stays in sync with Lustre when readdir() is used. But stat() is not updated if the mtime has not changed at second granularity. Probably that is fine, but I'd like to investigate what can be done to update stat() properly.

Comment by John Hammond [ 11/Oct/13 ]

Andreas, you're not wrong. The parent directory is not updated for creates. From looking at this and the NFS side, it seems like nanosecond timestamps are the right fix here.

Comment by Peter Jones [ 11/Oct/13 ]

This may simply be a duplicate of LU-3240 (now marked as a 2.5.0 blocker), in which case it can be closed as a duplicate of that ticket. If there is still any residual work, it is lower priority and can be handled in a later release.

Comment by Jodi Levi (Inactive) [ 21/Oct/13 ]

Dmitry,
Would you open a new ticket for the remaining work that needs to be completed for this? Then close this ticket with a FixVersion of 2.5.0?
Thank you!

Generated at Sat Feb 10 01:39:13 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.