Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version/s: Lustre 2.5.0, Lustre 2.6.0
    • Affects Version/s: Lustre 2.5.0
    • Labels: None
    • Environment: RHEL 6.4, Lustre from master at 02a976b LU-3974 libcfs: replace num_physpages with totalram_pages
    • Severity: 3
    • Rank: 10863

    Description

      Reexport works, but the kernel logs the following messages:
      Lustre: ctl-lustre-MDT0000: super-sequence allocation rc = 0 [0x0000000200000400-0x0000000240000400):0:mdt
      LustreError: 3819:0:(llite_nfs.c:105:search_inode_for_lustre()) can't get object attrs, fid [0x200000400:0x1:0x0], rc -2
      LustreError: 3819:0:(llite_nfs.c:105:search_inode_for_lustre()) can't get object attrs, fid [0x200000400:0x1:0x0], rc -2
      LustreError: 3819:0:(llite_nfs.c:105:search_inode_for_lustre()) can't get object attrs, fid [0x200000400:0x1:0x0], rc -2
      LustreError: 3819:0:(llite_nfs.c:105:search_inode_for_lustre()) can't get object attrs, fid [0x200000400:0x1:0x0], rc -2
      LustreError: 3819:0:(llite_nfs.c:105:search_inode_for_lustre()) can't get object attrs, fid [0x200000400:0x1:0x0], rc -2
      LustreError: 3819:0:(llite_nfs.c:105:search_inode_for_lustre()) can't get object attrs, fid [0x200000400:0x1:0x0], rc -2
      LustreError: 3819:0:(llite_nfs.c:105:search_inode_for_lustre()) Skipped 1 previous similar message
      LustreError: 3819:0:(llite_nfs.c:105:search_inode_for_lustre()) can't get object attrs, fid [0x200000400:0x1:0x0], rc -2
      LustreError: 3819:0:(llite_nfs.c:105:search_inode_for_lustre()) Skipped 2 previous similar messages
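
      For reading the messages above: a Lustre FID is printed as [sequence:object id:version], and rc -2 corresponds to -ENOENT (no such file or directory) on Linux. The following is a minimal sketch for pulling those fields out of one logged line. The helper names parse_fid and extract_fid are hypothetical and are not part of any Lustre tool.

```shell
#!/bin/bash
# Split a Lustre FID of the form [seq:oid:ver] into its three fields.
# parse_fid is a hypothetical helper name used only for this sketch.
parse_fid() {
    local fid=${1#\[}     # drop the leading '['
    fid=${fid%\]}         # drop the trailing ']'
    local seq oid ver
    IFS=: read -r seq oid ver <<< "$fid"
    echo "seq=$seq oid=$oid ver=$ver"
}

# Extract the bracketed FID from a "can't get object attrs" log line.
extract_fid() {
    local rest=${1##*fid }    # keep everything after the last "fid "
    echo "${rest%%,*}"        # cut at the first comma
}

line="LustreError: 3819:0:(llite_nfs.c:105:search_inode_for_lustre()) can't get object attrs, fid [0x200000400:0x1:0x0], rc -2"

fid=$(extract_fid "$line")
rc=${line##*, rc }            # return code is the last field on the line

parse_fid "$fid"
echo "rc=$rc"
```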

          Activity

            [LU-4050] NFS reexport issue

            Dmitry,
            Would you open a new ticket for the remaining work that needs to be completed for this? Then close this ticket with a FixVersion of 2.5.0?
            Thank you!

            jlevi Jodi Levi (Inactive) added a comment -
            pjones Peter Jones added a comment -

            This may simply be a duplicate of LU-3240 (now marked as a 2.5.0 blocker), in which case it can be closed as a duplicate of that ticket. If there is still any residual work, then it is a lower priority and can be handled in a later release.

            jhammond John Hammond added a comment -

            Andreas, you're not wrong. The parent directory is not updated for creates. From looking at this and the NFS side, it seems like nanosecond timestamps are the rightest thing here.

            With this patch, a directory on the NFS share stays in sync with Lustre when readdir() is used. However, stat() results are not updated unless mtime changes at second granularity. That is probably fine, but I'd like to investigate what can be done to update stat() properly.

            dmiter Dmitry Eremin (Inactive) added a comment -

            Dmitry,

            Can you clarify what you mean by helps? Does it resolve the issue? Or if it doesn't resolve the issue, how exactly does it help?

            paf Patrick Farrell (Inactive) added a comment -

            Also patch set #6 of http://review.whamcloud.com/#/c/6460/ helps with this issue.

            dmiter Dmitry Eremin (Inactive) added a comment -

            I was thinking we could export the directory version as the Lustre inode version for the directory, but in fact that may not work. The inode version is typically the last transaction number that modified it, which seems reasonable, but to avoid contention in the case of parallel directory operations and version based recovery, I think that the parent directory version is not updated for every create, just for operations on the directory inode itself.

            Need to ask Mike about this to confirm.

            If I'm wrong then the inode version would be perfect for this.

            adilger Andreas Dilger added a comment -
            jhammond John Hammond added a comment -

            I think this can be explained by the fact that we don't maintain i_version and we have second resolution timestamps. Thus a nfsd getattr followed by a Lustre mkdir or create in the same second defeats NFS's change_attr logic.
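
            The effect described above can be illustrated with a toy model. This is only a sketch under the assumption that the change attribute is derived purely from ctime when i_version is not maintained; it is not nfsd's actual code, and the function names and timestamp values are made up for illustration.

```shell
#!/bin/bash
# Toy model: a change attribute derived only from ctime.
# Granularity "s" keeps whole seconds; "ns" keeps nanoseconds as well.
change_attr() {
    local sec=$1 nsec=$2 granularity=$3
    if [ "$granularity" = "ns" ]; then
        printf '%s.%s\n' "$sec" "$nsec"
    else
        printf '%s\n' "$sec"
    fi
}

# Report whether a client comparing two attributes would revalidate.
client_sees_change() {
    if [ "$1" != "$2" ]; then
        echo changed
    else
        echo unchanged
    fi
}

# A getattr and a later mkdir land in the same (arbitrary) second 1000,
# at 0.1 s and 0.9 s into that second.
before_s=$(change_attr 1000 100000000 s)
after_s=$(change_attr 1000 900000000 s)
before_ns=$(change_attr 1000 100000000 ns)
after_ns=$(change_attr 1000 900000000 ns)

client_sees_change "$before_s" "$after_s"     # second resolution
client_sees_change "$before_ns" "$after_ns"   # nanosecond resolution
```

            With second resolution the two attributes compare equal, so the client has no reason to drop its cached (negative) lookup result; with nanosecond resolution they differ.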

            jhammond John Hammond added a comment -

            Thanks Dmitry. There may be more than one issue at play here. There is certainly a spurious negative dentry caching effect on the NFS side. To see this assume /mnt/lustre is exported and mounted via NFS at /mnt/lustre-export and that /mnt/lustre/dir exists but is empty.

            # cat lu-4050.sh
            #!/bin/bash
            
            lustre_dir=/mnt/lustre/dir
            export_dir=/mnt/lustre-export/dir
            
            mkdir $lustre_dir/0
            stat  $export_dir/1
            mkdir $lustre_dir/1
            stat  $export_dir/1
            # bash -x lu-4050.sh
            + lustre_dir=/mnt/lustre/dir
            + export_dir=/mnt/lustre-export/dir
            + mkdir /mnt/lustre/dir/0
            + stat /mnt/lustre-export/dir/1
            stat: cannot stat `/mnt/lustre-export/dir/1': No such file or directory
            + mkdir /mnt/lustre/dir/1
            + stat /mnt/lustre-export/dir/1
            stat: cannot stat `/mnt/lustre-export/dir/1': No such file or directory
            

            The same effect is seen using touch rather than mkdir.

            Dmitry's two scripts can be simplified to:

            # cat lu-4050-bis.sh
            #!/bin/bash
            
            lustre_dir=/mnt/lustre/dir
            export_dir=/mnt/lustre-export/dir
            
            stat  $lustre_dir
            mkdir $export_dir/0
            stat  $export_dir
            stat  $export_dir/1
            stat  $lustre_dir
            mkdir $lustre_dir/1
            stat  $lustre_dir
            stat  $export_dir/1
            stat  $export_dir
            # bash -x lu-4050-bis.sh 
            + lustre_dir=/mnt/lustre/dir
            + export_dir=/mnt/lustre-export/dir
            + stat /mnt/lustre/dir
              File: `/mnt/lustre/dir'
              Size: 4096      	Blocks: 8          IO Block: 4096   directory
            Device: 2c54f966h/743766374d	Inode: 144115205255725102  Links: 2
            Access: (0755/drwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
            Access: 2013-10-05 20:13:23.000000000 -0500
            Modify: 2013-10-05 20:13:23.000000000 -0500
            Change: 2013-10-05 20:13:23.000000000 -0500
            + mkdir /mnt/lustre-export/dir/0
            + stat /mnt/lustre-export/dir
              File: `/mnt/lustre-export/dir'
              Size: 4096      	Blocks: 8          IO Block: 1048576 directory
            Device: 1dh/29d	Inode: 144115205255725102  Links: 3
            Access: (0755/drwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
            Access: 2013-10-05 20:13:23.000000000 -0500
            Modify: 2013-10-05 20:13:45.000000000 -0500
            Change: 2013-10-05 20:13:45.000000000 -0500
            + stat /mnt/lustre-export/dir/1
            stat: cannot stat `/mnt/lustre-export/dir/1': No such file or directory
            + stat /mnt/lustre/dir
              File: `/mnt/lustre/dir'
              Size: 4096      	Blocks: 8          IO Block: 4096   directory
            Device: 2c54f966h/743766374d	Inode: 144115205255725102  Links: 3
            Access: (0755/drwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
            Access: 2013-10-05 20:13:23.000000000 -0500
            Modify: 2013-10-05 20:13:45.000000000 -0500
            Change: 2013-10-05 20:13:45.000000000 -0500
            + mkdir /mnt/lustre/dir/1
            + stat /mnt/lustre/dir
              File: `/mnt/lustre/dir'
              Size: 4096      	Blocks: 8          IO Block: 4096   directory
            Device: 2c54f966h/743766374d	Inode: 144115205255725102  Links: 4
            Access: (0755/drwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
            Access: 2013-10-05 20:13:23.000000000 -0500
            Modify: 2013-10-05 20:13:45.000000000 -0500
            Change: 2013-10-05 20:13:45.000000000 -0500
            + stat /mnt/lustre-export/dir/1
            stat: cannot stat `/mnt/lustre-export/dir/1': No such file or directory
            + stat /mnt/lustre-export/dir
              File: `/mnt/lustre-export/dir'
              Size: 4096      	Blocks: 8          IO Block: 1048576 directory
            Device: 1dh/29d	Inode: 144115205255725102  Links: 3
            Access: (0755/drwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
            Access: 2013-10-05 20:13:23.000000000 -0500
            Modify: 2013-10-05 20:13:45.000000000 -0500
            Change: 2013-10-05 20:13:45.000000000 -0500
            

            Notice that there is no rmdir in the script and that the link counts reported for the directory are 2, 3, 3, 4, 3.
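
            A small helper can make the link count mismatch easier to spot than reading full stat output. This is a sketch only: compare_views and link_count are hypothetical names, and it assumes GNU stat (the -c option).

```shell
#!/bin/bash
# Print the hard-link count of a path (GNU stat).
link_count() {
    stat -c %h "$1"
}

# compare_views NATIVE_PATH EXPORT_PATH
# Compare the link count seen on the Lustre mount with the one seen
# through the NFS export. Returns 0 if equal, 1 if they differ,
# 2 if either path cannot be stat'ed.
compare_views() {
    local native=$1 exported=$2 n e
    n=$(link_count "$native") || return 2
    e=$(link_count "$exported") || return 2
    if [ "$n" = "$e" ]; then
        echo "consistent: links=$n"
    else
        echo "stale: native=$n export=$e"
        return 1
    fi
}
```

            With the paths from the scripts above it would be called as compare_views /mnt/lustre/dir /mnt/lustre-export/dir after each mkdir.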

            Just to clarify the status: the patch http://review.whamcloud.com/6460 makes the situation much better. With this patch, readdir fully synchronizes directory content and attributes. Only stat still does not update in time. For example:

            # stat /net/l201u2/mnt/lustre/nfs
              File: `/net/l201u2/mnt/lustre/nfs'
              Size: 4096            Blocks: 8          IO Block: 524288 directory
            Device: 1bh/27d Inode: 144115205255725057  Links: 3
            Access: (0755/drwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
            Access: 2013-10-05 14:20:13.000000000 +0400
            Modify: 2013-10-05 22:55:53.000000000 +0400
            Change: 2013-10-05 22:55:53.000000000 +0400
            # stat /net/l201u2/mnt/lustre/nfs
              File: `/net/l201u2/mnt/lustre/nfs'
              Size: 4096            Blocks: 8          IO Block: 524288 directory
            Device: 1bh/27d Inode: 144115205255725057  Links: 3
            Access: (0755/drwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
            Access: 2013-10-05 14:20:13.000000000 +0400
            Modify: 2013-10-05 22:55:53.000000000 +0400
            Change: 2013-10-05 22:55:53.000000000 +0400
            # ls /net/l201u2/mnt/lustre/nfs
            123  234  456  test_ack.sh  test_ask.sh
            # stat /net/l201u2/mnt/lustre/nfs
              File: `/net/l201u2/mnt/lustre/nfs'
              Size: 4096            Blocks: 8          IO Block: 524288 directory
            Device: 1bh/27d Inode: 144115205255725057  Links: 5
            Access: (0755/drwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
            Access: 2013-10-05 14:20:13.000000000 +0400
            Modify: 2013-10-05 22:56:59.000000000 +0400
            Change: 2013-10-05 22:56:59.000000000 +0400
            

            As you can see, only the ls command updates the stat information.
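
            Based on that observation, listing the directory before calling stat may serve as a stopgap on the NFS client. This is a hedged sketch, not a fix: stat_after_readdir is a hypothetical helper name, it assumes GNU stat, and whether the listing actually refreshes attributes depends on the patch and the client's caching behavior.

```shell
#!/bin/bash
# stat_after_readdir DIR
# List DIR first (which issues a readdir), then print its link count
# and mtime. Returns non-zero if DIR cannot be listed.
stat_after_readdir() {
    local dir=$1
    ls -f "$dir" > /dev/null || return 1
    stat -c 'links=%h mtime=%Y' "$dir"
}
```

            For the directory in the example above it would be called as stat_after_readdir /net/l201u2/mnt/lustre/nfs.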

            dmiter Dmitry Eremin (Inactive) added a comment -

            People

              dmiter Dmitry Eremin (Inactive)
              dmiter Dmitry Eremin (Inactive)