Lustre / LU-2904

parallel-scale-nfsv3: FAIL: setup nfs failed!

Details


    Description

      The parallel-scale-nfsv3 test failed as follows:

      Mounting NFS clients (version 3)...
      CMD: client-12vm1,client-12vm2 mkdir -p /mnt/lustre
      CMD: client-12vm1,client-12vm2 mount -t nfs -o nfsvers=3,async                 client-12vm3:/mnt/lustre /mnt/lustre
      client-12vm2: mount.nfs: Connection timed out
      client-12vm1: mount.nfs: Connection timed out
       parallel-scale-nfsv3 : @@@@@@ FAIL: setup nfs failed! 
      

      The syslog on the Lustre MDS / Lustre client / NFS server node client-12vm3 showed:

      Mar  4 17:34:15 client-12vm3 mrshd[4254]: root@client-12vm1.lab.whamcloud.com as root: cmd='(PATH=$PATH:/usr/lib64/lustre/utils:/usr/lib64/lustre/tests:/sbin:/usr/sbin; cd /usr/lib64/lustre/tests; LUSTRE="/usr/lib64/lustre"  sh -c "exportfs -o rw,async,no_root_squash *:/mnt/lustre         && exportfs -v");echo XXRETCODE:$?'
      Mar  4 17:34:15 client-12vm3 xinetd[1640]: EXIT: mshell status=0 pid=4253 duration=0(sec)
      Mar  4 17:34:16 client-12vm3 rpc.mountd[4165]: authenticated mount request from 10.10.4.207:894 for /mnt/lustre (/mnt/lustre)
      Mar  4 17:34:16 client-12vm3 rpc.mountd[4165]: authenticated mount request from 10.10.4.206:713 for /mnt/lustre (/mnt/lustre)
      Mar  4 17:34:17 client-12vm3 rpc.mountd[4165]: authenticated mount request from 10.10.4.207:784 for /mnt/lustre (/mnt/lustre)
      Mar  4 17:34:17 client-12vm3 rpc.mountd[4165]: authenticated mount request from 10.10.4.206:877 for /mnt/lustre (/mnt/lustre)
      Mar  4 17:34:19 client-12vm3 rpc.mountd[4165]: authenticated mount request from 10.10.4.207:946 for /mnt/lustre (/mnt/lustre)
      Mar  4 17:34:19 client-12vm3 rpc.mountd[4165]: authenticated mount request from 10.10.4.206:1013 for /mnt/lustre (/mnt/lustre)
      Mar  4 17:34:23 client-12vm3 rpc.mountd[4165]: authenticated mount request from 10.10.4.207:797 for /mnt/lustre (/mnt/lustre)
      Mar  4 17:34:23 client-12vm3 rpc.mountd[4165]: authenticated mount request from 10.10.4.206:701 for /mnt/lustre (/mnt/lustre)
      Mar  4 17:34:31 client-12vm3 rpc.mountd[4165]: authenticated mount request from 10.10.4.207:719 for /mnt/lustre (/mnt/lustre)
      Mar  4 17:34:31 client-12vm3 rpc.mountd[4165]: authenticated mount request from 10.10.4.206:941 for /mnt/lustre (/mnt/lustre)
      Mar  4 17:34:41 client-12vm3 rpc.mountd[4165]: authenticated mount request from 10.10.4.207:943 for /mnt/lustre (/mnt/lustre)
      Mar  4 17:34:41 client-12vm3 rpc.mountd[4165]: authenticated mount request from 10.10.4.206:810 for /mnt/lustre (/mnt/lustre)
      Mar  4 17:34:51 client-12vm3 rpc.mountd[4165]: authenticated mount request from 10.10.4.207:849 for /mnt/lustre (/mnt/lustre)
      Mar  4 17:34:51 client-12vm3 rpc.mountd[4165]: authenticated mount request from 10.10.4.206:740 for /mnt/lustre (/mnt/lustre)
      Mar  4 17:35:01 client-12vm3 rpc.mountd[4165]: authenticated mount request from 10.10.4.207:846 for /mnt/lustre (/mnt/lustre)
      Mar  4 17:35:01 client-12vm3 rpc.mountd[4165]: authenticated mount request from 10.10.4.206:667 for /mnt/lustre (/mnt/lustre)
      Mar  4 17:35:11 client-12vm3 rpc.mountd[4165]: authenticated mount request from 10.10.4.207:955 for /mnt/lustre (/mnt/lustre)
      Mar  4 17:35:11 client-12vm3 rpc.mountd[4165]: authenticated mount request from 10.10.4.206:1006 for /mnt/lustre (/mnt/lustre)
      Mar  4 17:35:21 client-12vm3 rpc.mountd[4165]: authenticated mount request from 10.10.4.207:828 for /mnt/lustre (/mnt/lustre)
      Mar  4 17:35:21 client-12vm3 rpc.mountd[4165]: authenticated mount request from 10.10.4.206:739 for /mnt/lustre (/mnt/lustre)
      Mar  4 17:35:31 client-12vm3 rpc.mountd[4165]: authenticated mount request from 10.10.4.207:1011 for /mnt/lustre (/mnt/lustre)
      Mar  4 17:35:31 client-12vm3 rpc.mountd[4165]: authenticated mount request from 10.10.4.206:994 for /mnt/lustre (/mnt/lustre)
      Mar  4 17:35:41 client-12vm3 rpc.mountd[4165]: authenticated mount request from 10.10.4.207:847 for /mnt/lustre (/mnt/lustre)
      Mar  4 17:35:41 client-12vm3 rpc.mountd[4165]: authenticated mount request from 10.10.4.206:756 for /mnt/lustre (/mnt/lustre)
      Mar  4 17:35:51 client-12vm3 rpc.mountd[4165]: authenticated mount request from 10.10.4.207:892 for /mnt/lustre (/mnt/lustre)
      Mar  4 17:35:51 client-12vm3 rpc.mountd[4165]: authenticated mount request from 10.10.4.206:749 for /mnt/lustre (/mnt/lustre)
      Mar  4 17:36:01 client-12vm3 rpc.mountd[4165]: authenticated mount request from 10.10.4.207:1017 for /mnt/lustre (/mnt/lustre)
      Mar  4 17:36:01 client-12vm3 rpc.mountd[4165]: authenticated mount request from 10.10.4.206:873 for /mnt/lustre (/mnt/lustre)
      Mar  4 17:36:11 client-12vm3 rpc.mountd[4165]: authenticated mount request from 10.10.4.207:874 for /mnt/lustre (/mnt/lustre)
      Mar  4 17:36:11 client-12vm3 rpc.mountd[4165]: authenticated mount request from 10.10.4.206:749 for /mnt/lustre (/mnt/lustre)
      Mar  4 17:36:21 client-12vm3 rpc.mountd[4165]: authenticated mount request from 10.10.4.207:916 for /mnt/lustre (/mnt/lustre)
      Mar  4 17:36:21 client-12vm3 rpc.mountd[4165]: authenticated mount request from 10.10.4.206:841 for /mnt/lustre (/mnt/lustre)
      Mar  4 17:36:21 client-12vm3 xinetd[1640]: START: mshell pid=4286 from=::ffff:10.10.4.206
      Mar  4 17:36:21 client-12vm3 mrshd[4287]: root@client-12vm1.lab.whamcloud.com as root: cmd='/usr/sbin/lctl mark "/usr/sbin/lctl mark  parallel-scale-nfsv3 : @@@@@@ FAIL: setup nfs failed! ";echo XXRETCODE:$?'
      Mar  4 17:36:21 client-12vm3 kernel: Lustre: DEBUG MARKER: /usr/sbin/lctl mark  parallel-scale-nfsv3 : @@@@@@ FAIL: setup nfs failed!
      

      Maloo report: https://maloo.whamcloud.com/test_sets/5cbf6978-853e-11e2-bfd3-52540035b04c


          Activity

            [LU-2904] parallel-scale-nfsv3: FAIL: setup nfs failed!

            yong.fan nasf (Inactive) added a comment -

            1) In theory, we can fill the low 32 bits of the FSID with s_dev and fill the high 32 bits with anything, such as a Lustre magic value. But regardless of what is filled in, we cannot control how the caller uses the returned FSID. And there is no clear advantage to replacing the current patch, since the 64-bit FSID is only generated at mount time.
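
            For illustration only, a minimal sketch of the fill described above, assuming it is done in the client statfs path; the constant LL_FSID_MAGIC and the function name are made up for this sketch, not taken from the actual patch:

                #include <linux/fs.h>
                #include <linux/statfs.h>

                /* Sketch only: s_dev in the low 32 bits, a constant "magic"
                 * value in the high 32 bits.  LL_FSID_MAGIC and the function
                 * name are illustrative assumptions. */
                #define LL_FSID_MAGIC 0x0BD00BD0U

                static void ll_fill_fsid_sketch(struct super_block *sb,
                                                struct kstatfs *buf)
                {
                        buf->f_fsid.val[0] = (u32)sb->s_dev; /* low 32 bits: device number */
                        buf->f_fsid.val[1] = LL_FSID_MAGIC;  /* high 32 bits: any constant */
                }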

            shadow Alexey Lyashkov added a comment -

            1) I understand (and agree) about returning an fs id from statfs, but I think we can use s_dev for it, since 32 bits is enough, and we could fill the high part of the fs id with a Lustre magic value if needed.

            2) Give me until Monday to look into the mountd code carefully.

            yong.fan nasf (Inactive) added a comment -

            The root issue is in the user-space nfs-utils.

            1) The FSID returned via statfs() to nfs-utils is 64 bits, whether the newly generated 64-bit FSID is used or the old 32-bit FSID is reused. If the old 32-bit FSID is used, then __kernel_fsid_t::val[1] (or val[0]) is zero. Before this patch was applied, Lustre did not return an FSID via statfs() at all.

            2) The root NFS handle contains the root inode number. The Lustre root inode number is 64 bits, but when nfs-utils parses the root handle it is truncated to 32 bits, so it cannot locate the right "inode". I have made a patch for that and sent it to the related kernel maintainers; I hope the patch can be accepted and land in the next nfs-utils release.

            diff --git a/utils/mountd/cache.c b/utils/mountd/cache.c
            index 517aa62..a7212e7 100644
            --- a/utils/mountd/cache.c
            +++ b/utils/mountd/cache.c
            @@ -388,7 +388,7 @@ struct parsed_fsid {
             	int fsidtype;
             	/* We could use a union for this, but it would be more
             	 * complicated; why bother? */
            -	unsigned int inode;
            +	uint64_t inode;
             	unsigned int minor;
             	unsigned int major;
             	unsigned int fsidnum;
            --
            1.7.1
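
            To see why the truncation matters, a small stand-alone illustration (the inode number below is a made-up value, not one from this test run):

                #include <stdint.h>
                #include <stdio.h>

                /* A 64-bit inode number stored into a 32-bit field loses its
                 * high bits, so mountd ends up looking up the wrong inode. */
                int main(void)
                {
                        uint64_t lustre_root_ino = 0x200000007ULL; /* hypothetical 64-bit inode# */
                        unsigned int truncated = (unsigned int)lustre_root_ino;

                        printf("64-bit inode#:           %#llx\n",
                               (unsigned long long)lustre_root_ino);
                        printf("after 32-bit truncation: %#x\n", truncated);
                        return 0;
                }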

            shadow Alexey Lyashkov added a comment -

            Anyway, most kernel filesystems use:

                    u64 id = huge_encode_dev(sb->s_bdev->bd_dev);
                    buf->f_fsid.val[0] = (u32)id;
                    buf->f_fsid.val[1] = (u32)(id >> 32);

            I don't understand why we don't use the same approach.

            shadow Alexey Lyashkov added a comment -

            Do you really think 2^32 Lustre mounts will exist on a single node? The fsid is just a unique id for a mount point.
            The reason it is 64 bits is 32 bits for the block device id and 32 bits for the slice number inside the block device; it just needs to identify a mount point correctly. In the Lustre case we don't have slices inside a device and don't need to fill that part, but the device id exactly identifies a mount point.

            As for the root node for an export, let me look, but as I remember nfs-utils also takes the NFS handle from the kernel.

            yong.fan nasf (Inactive) added a comment -

            Honestly, I am not sure whether 32 bits is enough for all kinds of statfs() users. It is true that in a mixed environment a new client will export a 64-bit FSID and an old client will export a 32-bit FSID; such a difference may cause issues if users want to access Lustre via different clients with the same handle. But I do not know whether anyone really wants to use Lustre that way. In the long run we need to upgrade the FSID to 64 bits; otherwise, if 32 bits were always enough, the statfs() API could be shrunk...

            As for the NFS handle with lu_fid, it works for objects under the export point, but the root NFS handle does not contain the lu_fid (it does not go down to Lustre). That is why we made this patch.
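
            For context, a simplified sketch of what a FID-carrying NFS handle might look like; the struct and field names here are illustrative assumptions, not the exact Lustre definitions:

                #include <stdint.h>

                /* Per-object NFS handles can carry full 128-bit FIDs, so they
                 * do not depend on a 32-bit inode#; the root handle built by
                 * nfsd/mountd carries no FID, which is where the truncation
                 * problem appears. */
                struct lu_fid_sketch {
                        uint64_t f_seq; /* sequence number */
                        uint32_t f_oid; /* object id within the sequence */
                        uint32_t f_ver; /* version */
                };

                struct lustre_nfs_fid_sketch {
                        struct lu_fid_sketch child;  /* FID of the object itself */
                        struct lu_fid_sketch parent; /* FID of its parent directory */
                };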


            shadow Alexey Lyashkov added a comment -

            The fsid has a single requirement: it should be the same across the cluster and unique on a node.
            I think a 32-bit id is enough to encode the fs id in statfs.
            But using a single fsid has an interoperability benefit: different nodes (with older and newer nfsd) have the same fs id in their NFS handles, so they may be used in a failover pair.

            I also have a question about the second patch: we have prepared our own NFS handle structure with lu_fid inside, and it should not have a 32-bit limitation. If we have lost a code path somewhere and the NFS handle is created in the wrong format, we need to investigate that.

            yong.fan nasf (Inactive) added a comment -

            A 32-bit uuid may work for this case, but since the POSIX API is 64 bits, and statfs() is used not only for re-exporting via NFS but also by others, we prefer to generate and return a 64-bit uuid as expected.

            shadow Alexey Lyashkov added a comment -

            The last patch
            http://git.whamcloud.com/?p=fs/lustre-release.git;a=commitdiff;h=8c4f4a47e051b097358818f4d3777d02124abbe7

            looks invalid - the Lustre client already has such code:

                    /* We set sb->s_dev equal on all lustre clients in order to support
                     * NFS export clustering.  NFSD requires that the FSID be the same
                     * on all clients. */
                    /* s_dev is also used in lt_compare() to compare two fs, but that is
                     * only a node-local comparison. */
                    uuid = obd_get_uuid(sbi->ll_md_exp);
                    if (uuid != NULL)
                            sb->s_dev = get_uuid2int(uuid->uuid, strlen(uuid->uuid));
                    sbi->ll_mnt = mnt;
            

            In that case, exporting s_dev via statfs should be enough.
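
            A minimal sketch of that suggestion, assuming the fill happens in Lustre's statfs path; the function name is made up for this sketch:

                #include <linux/fs.h>
                #include <linux/kdev_t.h>
                #include <linux/statfs.h>

                /* Sketch only: derive the FSID from the (cluster-wide,
                 * uuid-derived) sb->s_dev that the client code above already
                 * sets up for NFS export clustering. */
                static void ll_fill_fsid_from_s_dev(struct super_block *sb,
                                                    struct kstatfs *buf)
                {
                        u64 id = huge_encode_dev(sb->s_dev);

                        buf->f_fsid.val[0] = (u32)id;
                        buf->f_fsid.val[1] = (u32)(id >> 32);
                }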

            yujian Jian Yu added a comment - edited

            Patch for the Lustre b2_1 branch to add the "32bitapi" Lustre client mount option when exporting the Lustre client as an NFSv3 server: http://review.whamcloud.com/6457
            Patch for Lustre b1_8 branch: http://review.whamcloud.com/6663
            Patch for Lustre master branch: http://review.whamcloud.com/6649

            yujian Jian Yu added a comment -

            Lustre b2_1 client build: http://build.whamcloud.com/job/lustre-b2_1/204
            Lustre master server build: http://build.whamcloud.com/job/lustre-master/1508
            Distro/Arch: RHEL6.4/x86_64

            The issue still occurred: https://maloo.whamcloud.com/test_sets/b5a0c146-c624-11e2-9bf1-52540035b04c

            CMD: client-26vm3 exportfs -o rw,async,no_root_squash *:/mnt/lustre         && exportfs -v
            /mnt/lustre   	<world>(rw,async,wdelay,no_root_squash,no_subtree_check)
            
            Mounting NFS clients (version 3)...
            CMD: client-26vm5,client-26vm6.lab.whamcloud.com mkdir -p /mnt/lustre
            CMD: client-26vm5,client-26vm6.lab.whamcloud.com mount -t nfs -o nfsvers=3,async                 client-26vm3:/mnt/lustre /mnt/lustre
            client-26vm6: mount.nfs: Connection timed out
            client-26vm5: mount.nfs: Connection timed out
             parallel-scale-nfsv3 : @@@@@@ FAIL: setup nfs failed! 
            

            Somehow, this issue can be resolved by specifying "fsid=1" in the NFS export options (without the "32bitapi" Lustre mount option) when re-exporting Lustre via NFS (v3 or v4). For example: "/mnt/lustre 10.211.55.*(rw,no_root_squash,fsid=1)". (Verified on 2.6.32-358.2.1.el6.)

            We need a patch on Lustre b2_1 branch to resolve the interop issue.


            People

              yong.fan nasf (Inactive)
              yujian Jian Yu
              Votes: 0
              Watchers: 17
