
Stale file handle on mount when mounting Lustre 2.4 via NFS

Details

    • Bug
    • Resolution: Fixed
    • Major
    • Lustre 2.5.0
    • Lustre 2.4.0
    • None
    • 3
    • 8928

    Description

      When attempting to mount NFS exported Lustre, the mount operation reports 'stale file handle' and fails to complete. This happens with 2.4 servers and a 2.4 client. It does NOT happen with a 2.4 client and 2.2 servers.

      Investigation of the NFS traffic between the NFS client and NFS server (Lustre client) shows the NFS client requesting the file handle for the mount, then receiving a file handle back from the server. There is a bit more chatter, then the client sends back the same file handle as part of an info request. Then the server responds with a stale file handle error.

      This is happening on both CentOS 6.4 and SLES11SP2 clients.

      I'm attaching a series of logs of this issue.
      Here's a description of what's in those logs:
      Lustre MDS (2.4). (Full DK logs provided)
      Lustre Client(2.4)/NFS Server [The source of the NFS export] (Full DK logs & /var/log/messages with nfsd debug on full (0x7FFF))
      NFS Client (/var/log/messages with nfs debug set to 1, and a tcpdump of all traffic)

      For analyzing the tcpdump (if you need it - I suspect the NFS debug logs will make it irrelevant), the IP addresses:
      NFS Server: 172.29.53.155
      NFS Client: 172.29.53.160

      The /var/log/messages logs are not trimmed, sorry. Look for the last debug markers from Lustre in those files and you can line them up with the rest of the logs.

          Activity

            [LU-3550] Stale file handle on mount when mounting Lustre 2.4 via NFS

            We have submitted a related patch to the kernel maintainer and hope the issue can be resolved at its root. From the Intel side, we cannot do more than wait for the response. If things are working for you, we can close this ticket and reopen it in the future if needed.

            yong.fan nasf (Inactive) added a comment

            I'm not sure what the long-term plan is regarding this bug. The fundamental limitation isn't in Lustre, and we've got an acceptable workaround in setting the FSID manually.

            Is further work planned on the Intel side, or should this bug be closed? Cray is getting along fine with the workaround.

            paf Patrick Farrell (Inactive) added a comment

            Hi Patrick,

            I downloaded the nfs-utils-1.2.3 source, then patched, compiled, and tested it on RHEL6 (2.6.32-358.6.1.el6). I did not concern myself with the proc changes.

            yong.fan nasf (Inactive) added a comment

            nasf,

            Has WC tested the latest nfs-utils with CentOS 6.4? I thought I saw a proc interface change between the CentOS 6.4 kernel and the kernels targeted by 1.2.8, but I could be wrong about that.

            • Patrick
            paf Patrick Farrell (Inactive) added a comment

            The patch in my earlier comment is for the latest nfs-utils. If you want to use nfs-utils-1.2.3, use the following one instead:

            343,344c343,344
            < 	unsigned int inode=0;
            < 	unsigned long long inode64;
            ---
            > 	uint64_t inode=0;
            > 	uint64_t inode64;
            
            yong.fan nasf (Inactive) added a comment (edited)

            nasf,

            I've been trying to build nfs-utils 1.2.3 [the default in CentOS 6.4] (without patches, just to verify I can), and I am stuck in dependency hell: the build does not find various installed packages. A bit of searching shows that nfs-utils has since been patched to clean up a lot of unnecessary dependencies, including the ones I'm dealing with.
            (http://www.spinics.net/lists/linux-nfs/msg26388.html)

            However, as I understand it, the kernel nfsd /proc interface has changed since CentOS 6.4 and SLES11SP2, so I can't just go grab the latest nfs-utils and expect it to work.

            Do you have a particular version you recommend building, or any tips on this?

            I may be able to land that linking patch by itself without problem and will try that next, but I thought I'd ask you as well.

            • Patrick
            paf Patrick Farrell (Inactive) added a comment

            There are two issues for this topic:

            1) Originally, Lustre did not return the FSID via statfs() to nfs-utils. This issue has been resolved by the patch http://review.whamcloud.com/6493, which has already landed on master (Lustre-2.5).

            2) nfs-utils has a defect: it converts the 64-bit inode number to 32 bits, which loses information, so it cannot locate the right root. It can be resolved by this patch:

            diff --git a/utils/mountd/cache.c b/utils/mountd/cache.c
            index 517aa62..a7212e7 100644
            --- a/utils/mountd/cache.c
            +++ b/utils/mountd/cache.c
            @@ -388,7 +388,7 @@ struct parsed_fsid {
                    int fsidtype;
                    /* We could use a union for this, but it would be more
                     * complicated; why bother? */
            -       unsigned int inode;
            +       uint64_t inode;
                    unsigned int minor;
                    unsigned int major;
                    unsigned int fsidnum;
            -- 
            1.7.1
            

            If you have a chance, you can test the above two patches together for verification. Thanks!

            yong.fan nasf (Inactive) added a comment
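The effect of the second issue can be sketched in a few lines of C. This is a simplified model and not the nfs-utils source: the struct names `parsed_fsid32` and `parsed_fsid64` and both helper functions are hypothetical, and only the field types mirror the patch above. It also assumes `unsigned int` is 32 bits wide, as it is on the platforms discussed here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for struct parsed_fsid before and after the patch. */
struct parsed_fsid32 { unsigned int inode; };  /* pre-patch field type  */
struct parsed_fsid64 { uint64_t inode; };      /* post-patch field type */

/* Returns true if the stored inode still equals the 64-bit inode number
 * that a stat() of the export root would report. */
bool root_matches32(uint64_t st_ino)
{
        struct parsed_fsid32 p;

        p.inode = (unsigned int)st_ino;  /* upper 32 bits are dropped */
        return (uint64_t)p.inode == st_ino;
}

bool root_matches64(uint64_t st_ino)
{
        struct parsed_fsid64 p;

        p.inode = st_ino;                /* full value is preserved */
        return p.inode == st_ino;
}
```

With an inode number that needs more than 32 bits (for example the made-up value `0x0200000007000001`), `root_matches32()` returns false while `root_matches64()` returns true, which is the "cannot locate the right root" symptom in miniature.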

            I've discussed this internally at Cray with someone with NFS expertise.

            He agrees that this workaround (using the -o fsid= option to exportfs when exporting Lustre over NFS) is the appropriate solution, as the only other option is a fairly invasive patch to the NFS code in the Linux kernel. In light of that, WC may want to update the documentation for exporting Lustre over NFS, but no code changes are necessary.

            paf Patrick Farrell (Inactive) added a comment
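For readers applying the workaround, an export entry might look like the fragment below. It is only a sketch: the mount point `/mnt/lustre`, the `rw` option, and the value `1` are assumptions chosen for illustration; the client address is the one from the logs described in this ticket. Any fsid value should be unique among the exports on that server.

```
# /etc/exports on the Lustre client that acts as the NFS server
# (path and fsid value are illustrative assumptions)
/mnt/lustre  172.29.53.160(rw,fsid=1)

# Equivalent one-off export from the command line:
#   exportfs -o rw,fsid=1 172.29.53.160:/mnt/lustre
```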

            Andreas's patch is for an issue with parsing 64-bit inode numbers in nfs-utils, and so isn't involved here.

            The problem is this:
            In NFS, for mount requests, the root of the file system is identified by only the FSID. The FSID is defined as a 64 bit integer type in NFSv3 (See http://tools.ietf.org/html/rfc1813 for the NFSv3 RFC), but in the default case in Linux, it's built as two 32 bit integers, one of which is the inode. For NFSv4, it is two 64 bit types (http://www.ietf.org/rfc/rfc3530.txt).

            Linux does include an FSID type that carries a 64-bit inode:

            case FSID_UUID16_INUM:
                    /* 8 byte inode and 16 byte fsid */
                    *(u64*)fsidv = (u64)ino;
                    memcpy(fsidv+2, uuid, 16);
                    break;
            


            The vers value set here is a function of the export options and whether or not it's a root export.
            Here's the relevant code, from set_version_and_fsid_type:

                    } else if (exp->ex_flags & NFSEXP_FSID) {
                            fsid_type = FSID_NUM;
                    } else if (exp->ex_uuid) {
                            if (fhp->fh_maxsize >= 64) {
                                    if (is_root_export(exp))
                                            fsid_type = FSID_UUID16;
                                    else
                                            fsid_type = FSID_UUID16_INUM;
                            } else {
                                    if (is_root_export(exp))
                                            fsid_type = FSID_UUID8;
                                    else
                                            fsid_type = FSID_UUID4_INUM;
                            }
                    } else if (!old_valid_dev(exp_sb(exp)->s_dev))
                            /* for newer device numbers, we must use a newer fsid format */
                            fsid_type = FSID_ENCODE_DEV;
                    else
                            fsid_type = FSID_DEV;
            

            The export option in question (ex_uuid) is one I can't quite figure out how to set in the export options.
            The code which parses the export options seems to be expecting the character sequence "uuid" (svc_export_parse), but when specifying -o uuid=[some uuid], I get an error.
            I can't find out how to actually set this flag.

            On the other hand, when I use -o fsid= on the export, I can specify an integer or a UUID.
            Either one allows me to mount the NFS export and do normal operations on it with NFSv4 or NFSv3, at least on the client side. I will be testing with an NFSv3-only server today.

            Presumably this is hitting this case in set_version_and_fsid_type:

            } else if (exp->ex_flags & NFSEXP_FSID) {
                    fsid_type = FSID_NUM;
            

            In any case, this appears to be a workaround. Longer term, if we don't wish to have to specify -o fsid=, the NFS code in the kernel would need to change somehow to support 64-bit inodes in FSIDs.

            paf Patrick Farrell (Inactive) added a comment (edited)
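The branch order quoted from set_version_and_fsid_type can be modelled in userspace to see why setting fsid= changes the outcome. The sketch below is an illustration only, not kernel code: `struct export_model`, the `M_`-prefixed enum, and `choose_fsid_type()` are hypothetical names, and the model covers only the branches quoted in the comment above (the excerpt starts mid-chain, so any earlier branch is not represented).

```c
#include <stdbool.h>

/* Hypothetical mirror of the FSID types named in the quoted excerpt. */
enum fsid_type_model {
        M_FSID_DEV, M_FSID_NUM, M_FSID_ENCODE_DEV,
        M_FSID_UUID8, M_FSID_UUID16, M_FSID_UUID4_INUM, M_FSID_UUID16_INUM
};

/* Hypothetical summary of the inputs the quoted branch chain looks at. */
struct export_model {
        bool has_fsid_option;  /* NFSEXP_FSID set, i.e. exported with fsid= */
        bool has_uuid;         /* exp->ex_uuid is set */
        bool is_root_export;
        bool old_valid_dev;    /* device number fits the old format */
        int  fh_maxsize;
};

enum fsid_type_model choose_fsid_type(const struct export_model *e)
{
        if (e->has_fsid_option)
                return M_FSID_NUM;  /* takes precedence over uuid and device cases */
        if (e->has_uuid) {
                if (e->fh_maxsize >= 64)
                        return e->is_root_export ? M_FSID_UUID16
                                                 : M_FSID_UUID16_INUM;
                return e->is_root_export ? M_FSID_UUID8 : M_FSID_UUID4_INUM;
        }
        if (!e->old_valid_dev)
                return M_FSID_ENCODE_DEV;  /* newer device numbers */
        return M_FSID_DEV;
}
```

In this model, an export with the fsid option always yields `M_FSID_NUM` regardless of the other fields, while an export with neither fsid nor uuid falls through to one of the device-based types.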

            Excuse me - A closer look at the patch from Andreas suggests it's for a related issue but not exactly the one we're facing.

            The issue I'm looking at comes up in mk_fsid, called from fh_compose, which is called from exp_rootfh:

            static inline void mk_fsid(int vers, u32 *fsidv, dev_t dev, ino_t ino,
                                       u32 fsid, unsigned char *uuid)
            {
                    u32 *up;
                    switch(vers) {
                    case FSID_DEV:
                            fsidv[0] = htonl((MAJOR(dev)<<16) |
                                             MINOR(dev));
                            /* the inode number is cut down to 32 bits here */
                            fsidv[1] = ino_t_to_u32(ino);
                            break;
                    /* ... remaining cases omitted from this excerpt ... */
                    }
            }
            

            Here the inode is coerced to 32 bits. This is what goes out on the wire to the client, even though Lustre has 64-bit inodes.

            I will have to look more closely at Andreas's patch and the issue it's resolving, as well as the code I noted above, to understand fully what's going on.

            paf Patrick Farrell (Inactive) added a comment (edited)
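The consequence of the FSID_DEV case can be shown with a small userspace model. This is a sketch under stated assumptions rather than the kernel code: `mk_fsid_dev_model()` and `fsid_dev_collides()` are hypothetical names, the byte-order conversion (htonl) is left out, and ino_t_to_u32() is assumed to be a plain truncation, in line with the "coerced to 32 bits" observation above.

```c
#include <stdint.h>

/* Userspace model of the FSID_DEV case: word 0 carries the device
 * numbers, word 1 carries the inode number cut down to 32 bits. */
void mk_fsid_dev_model(uint32_t fsidv[2], uint32_t major, uint32_t minor,
                       uint64_t ino)
{
        fsidv[0] = (major << 16) | minor;
        fsidv[1] = (uint32_t)ino;  /* assumed equivalent of ino_t_to_u32() */
}

/* Returns 1 if two inode numbers on the same device produce
 * identical FSID words, 0 otherwise. */
int fsid_dev_collides(uint64_t ino_a, uint64_t ino_b)
{
        uint32_t a[2], b[2];

        mk_fsid_dev_model(a, 0, 0, ino_a);
        mk_fsid_dev_model(b, 0, 0, ino_b);
        return a[0] == b[0] && a[1] == b[1];
}
```

Two inode numbers that differ only in their upper 32 bits, such as `0x0000000100000005` and `0x0000000200000005`, are indistinguishable in this encoding, so the value sent to the client no longer identifies a 64-bit inode uniquely.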

            People

              yong.fan nasf (Inactive)
              paf Patrick Farrell (Inactive)
              Votes: 0
              Watchers: 10
