[LU-2904] parallel-scale-nfsv3: FAIL: setup nfs failed!


    Description

      The parallel-scale-nfsv3 test failed as follows:

      Mounting NFS clients (version 3)...
      CMD: client-12vm1,client-12vm2 mkdir -p /mnt/lustre
      CMD: client-12vm1,client-12vm2 mount -t nfs -o nfsvers=3,async                 client-12vm3:/mnt/lustre /mnt/lustre
      client-12vm2: mount.nfs: Connection timed out
      client-12vm1: mount.nfs: Connection timed out
       parallel-scale-nfsv3 : @@@@@@ FAIL: setup nfs failed! 
      

      Syslog on the Lustre MDS / Lustre client / NFS server node client-12vm3 showed:

      Mar  4 17:34:15 client-12vm3 mrshd[4254]: root@client-12vm1.lab.whamcloud.com as root: cmd='(PATH=$PATH:/usr/lib64/lustre/utils:/usr/lib64/lustre/tests:/sbin:/usr/sbin; cd /usr/lib64/lustre/tests; LUSTRE="/usr/lib64/lustre"  sh -c "exportfs -o rw,async,no_root_squash *:/mnt/lustre         && exportfs -v");echo XXRETCODE:$?'
      Mar  4 17:34:15 client-12vm3 xinetd[1640]: EXIT: mshell status=0 pid=4253 duration=0(sec)
      Mar  4 17:34:16 client-12vm3 rpc.mountd[4165]: authenticated mount request from 10.10.4.207:894 for /mnt/lustre (/mnt/lustre)
      Mar  4 17:34:16 client-12vm3 rpc.mountd[4165]: authenticated mount request from 10.10.4.206:713 for /mnt/lustre (/mnt/lustre)
      Mar  4 17:34:17 client-12vm3 rpc.mountd[4165]: authenticated mount request from 10.10.4.207:784 for /mnt/lustre (/mnt/lustre)
      Mar  4 17:34:17 client-12vm3 rpc.mountd[4165]: authenticated mount request from 10.10.4.206:877 for /mnt/lustre (/mnt/lustre)
      Mar  4 17:34:19 client-12vm3 rpc.mountd[4165]: authenticated mount request from 10.10.4.207:946 for /mnt/lustre (/mnt/lustre)
      Mar  4 17:34:19 client-12vm3 rpc.mountd[4165]: authenticated mount request from 10.10.4.206:1013 for /mnt/lustre (/mnt/lustre)
      Mar  4 17:34:23 client-12vm3 rpc.mountd[4165]: authenticated mount request from 10.10.4.207:797 for /mnt/lustre (/mnt/lustre)
      Mar  4 17:34:23 client-12vm3 rpc.mountd[4165]: authenticated mount request from 10.10.4.206:701 for /mnt/lustre (/mnt/lustre)
      Mar  4 17:34:31 client-12vm3 rpc.mountd[4165]: authenticated mount request from 10.10.4.207:719 for /mnt/lustre (/mnt/lustre)
      Mar  4 17:34:31 client-12vm3 rpc.mountd[4165]: authenticated mount request from 10.10.4.206:941 for /mnt/lustre (/mnt/lustre)
      Mar  4 17:34:41 client-12vm3 rpc.mountd[4165]: authenticated mount request from 10.10.4.207:943 for /mnt/lustre (/mnt/lustre)
      Mar  4 17:34:41 client-12vm3 rpc.mountd[4165]: authenticated mount request from 10.10.4.206:810 for /mnt/lustre (/mnt/lustre)
      Mar  4 17:34:51 client-12vm3 rpc.mountd[4165]: authenticated mount request from 10.10.4.207:849 for /mnt/lustre (/mnt/lustre)
      Mar  4 17:34:51 client-12vm3 rpc.mountd[4165]: authenticated mount request from 10.10.4.206:740 for /mnt/lustre (/mnt/lustre)
      Mar  4 17:35:01 client-12vm3 rpc.mountd[4165]: authenticated mount request from 10.10.4.207:846 for /mnt/lustre (/mnt/lustre)
      Mar  4 17:35:01 client-12vm3 rpc.mountd[4165]: authenticated mount request from 10.10.4.206:667 for /mnt/lustre (/mnt/lustre)
      Mar  4 17:35:11 client-12vm3 rpc.mountd[4165]: authenticated mount request from 10.10.4.207:955 for /mnt/lustre (/mnt/lustre)
      Mar  4 17:35:11 client-12vm3 rpc.mountd[4165]: authenticated mount request from 10.10.4.206:1006 for /mnt/lustre (/mnt/lustre)
      Mar  4 17:35:21 client-12vm3 rpc.mountd[4165]: authenticated mount request from 10.10.4.207:828 for /mnt/lustre (/mnt/lustre)
      Mar  4 17:35:21 client-12vm3 rpc.mountd[4165]: authenticated mount request from 10.10.4.206:739 for /mnt/lustre (/mnt/lustre)
      Mar  4 17:35:31 client-12vm3 rpc.mountd[4165]: authenticated mount request from 10.10.4.207:1011 for /mnt/lustre (/mnt/lustre)
      Mar  4 17:35:31 client-12vm3 rpc.mountd[4165]: authenticated mount request from 10.10.4.206:994 for /mnt/lustre (/mnt/lustre)
      Mar  4 17:35:41 client-12vm3 rpc.mountd[4165]: authenticated mount request from 10.10.4.207:847 for /mnt/lustre (/mnt/lustre)
      Mar  4 17:35:41 client-12vm3 rpc.mountd[4165]: authenticated mount request from 10.10.4.206:756 for /mnt/lustre (/mnt/lustre)
      Mar  4 17:35:51 client-12vm3 rpc.mountd[4165]: authenticated mount request from 10.10.4.207:892 for /mnt/lustre (/mnt/lustre)
      Mar  4 17:35:51 client-12vm3 rpc.mountd[4165]: authenticated mount request from 10.10.4.206:749 for /mnt/lustre (/mnt/lustre)
      Mar  4 17:36:01 client-12vm3 rpc.mountd[4165]: authenticated mount request from 10.10.4.207:1017 for /mnt/lustre (/mnt/lustre)
      Mar  4 17:36:01 client-12vm3 rpc.mountd[4165]: authenticated mount request from 10.10.4.206:873 for /mnt/lustre (/mnt/lustre)
      Mar  4 17:36:11 client-12vm3 rpc.mountd[4165]: authenticated mount request from 10.10.4.207:874 for /mnt/lustre (/mnt/lustre)
      Mar  4 17:36:11 client-12vm3 rpc.mountd[4165]: authenticated mount request from 10.10.4.206:749 for /mnt/lustre (/mnt/lustre)
      Mar  4 17:36:21 client-12vm3 rpc.mountd[4165]: authenticated mount request from 10.10.4.207:916 for /mnt/lustre (/mnt/lustre)
      Mar  4 17:36:21 client-12vm3 rpc.mountd[4165]: authenticated mount request from 10.10.4.206:841 for /mnt/lustre (/mnt/lustre)
      Mar  4 17:36:21 client-12vm3 xinetd[1640]: START: mshell pid=4286 from=::ffff:10.10.4.206
      Mar  4 17:36:21 client-12vm3 mrshd[4287]: root@client-12vm1.lab.whamcloud.com as root: cmd='/usr/sbin/lctl mark "/usr/sbin/lctl mark  parallel-scale-nfsv3 : @@@@@@ FAIL: setup nfs failed! ";echo XXRETCODE:$?'
      Mar  4 17:36:21 client-12vm3 kernel: Lustre: DEBUG MARKER: /usr/sbin/lctl mark  parallel-scale-nfsv3 : @@@@@@ FAIL: setup nfs failed!
      

      Maloo report: https://maloo.whamcloud.com/test_sets/5cbf6978-853e-11e2-bfd3-52540035b04c


          Activity


            yong.fan nasf (Inactive) added a comment -

            Generate the FSID from super_block::s_dev.

            The patch for master:
            http://review.whamcloud.com/#/c/7434/
            The patch for b2_4:
            http://review.whamcloud.com/#/c/7435/

            paf Patrick Farrell (Inactive) added a comment -

            I think I agree with Alexey - what is the purpose of requiring a patched version of nfs-utils? Obviously we eventually want to fix the entire nfs-utils and kernel NFS/NFSD problem with 64-bit root inodes, but until complete fixes are available, shouldn't we avoid requiring a patch? (Also, having an nfs-utils patch adds another package that Lustre users must build themselves or that must be provided with Lustre, like e2fsprogs.)

            It seems like the better solution is to document and require the -o fsid= export option while pushing for fixes upstream. (This is Cray's plan going forward whether or not this specific patch remains in Lustre.)
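            For reference, a minimal sketch of such an export, assuming the NFS server re-exports the Lustre client mount at /mnt/lustre (the fsid value is illustrative, not taken from this ticket):

                # On the NFS server (e.g. client-12vm3): add an explicit fsid so the
                # export's identifier no longer depends on how nfs-utils derives an
                # FSID for the 64-bit Lustre root inode.
                exportfs -o rw,async,no_root_squash,fsid=1 '*:/mnt/lustre'
                exportfs -v        # verify the export and its options

                # Equivalent /etc/exports entry:
                # /mnt/lustre  *(rw,async,no_root_squash,fsid=1)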

            shadow Alexey Lyashkov added a comment -

            I strongly disagree with that patch; please revert it and commit a correct version that does not break clustered NFS interoperability.
            yujian Jian Yu added a comment -

            The patch http://review.whamcloud.com/6493 was landed on the Lustre b2_4 branch.

            yujian Jian Yu added a comment -

            Hi Oleg,

            Could you please cherry-pick the patch to Lustre b2_4 branch? Thanks.

            The failure occurred regularly on Lustre b2_4 branch:
            https://maloo.whamcloud.com/test_sets/0c61eedc-fdad-11e2-9fd5-52540035b04c
            https://maloo.whamcloud.com/test_sets/f1c60464-fd16-11e2-9fdb-52540035b04c
            https://maloo.whamcloud.com/test_sets/13499228-fcda-11e2-b90c-52540035b04c
            https://maloo.whamcloud.com/test_sets/3d64d0f8-fcc2-11e2-9fdb-52540035b04c
            https://maloo.whamcloud.com/test_sets/512fc62e-fcb8-11e2-9222-52540035b04c

            shadow Alexey Lyashkov added a comment -

            You have broken a clustered NFS or NFS failover configuration where the two NFS servers in a pair differ: the first with older nfs-utils tools, where the fsid is generated from s_dev, and the second with this patch.

            So you have broken interoperability with older versions.
            yujian Jian Yu added a comment -

            The patch http://review.whamcloud.com/6493 also needs to be backported to the Lustre b1_8 and b2_1 branches to pass testing on the following interop configurations:

            NFS clients + Lustre b1_8/b2_1 NFS server/Lustre client + Lustre b2_4/master Lustre servers

            yujian Jian Yu added a comment -

            Hi Oleg,

            Could you please cherry-pick http://review.whamcloud.com/6493 to Lustre b2_4 branch? Thanks.

            The parallel-scale-nfsv3 test also failed on the current Lustre b2_4 branch:
            https://maloo.whamcloud.com/test_sets/9f30063c-ed8f-11e2-8e3a-52540035b04c


            yong.fan nasf (Inactive) added a comment -

            Currently, llite can export both the 32-bit "s_dev" and the 64-bit "FSID"; which one is used depends on the caller (nfs-utils or other applications). Even if they are different, they can both be used to indicate/locate the same device (FS). Using "s_dev" will ignore "FSID", and the same holds for the reverse case. They are not required to be the same.

            I am not sure I caught the point you are worried about, but could you give me a detailed example of the patch breaking something?
            shadow Alexey Lyashkov added a comment - edited

            Well, you can't control how the caller will use the FSID anyway; the FSID's purpose is just to separate one NFS handle from another in the hash. Consider the case where a single FS is exported via different paths and NFS may do round-robin access to the same files, or load balancing (in NFSv4). In that case any number (as you see in this ticket, setting fsid=1 in the exports file) is enough, but it needs to be unique on the host and the same across the cluster. Since you don't know which versions the NFS servers in a load-balancing pair are running, you need to present the same FSID in both cases - when it is generated from s_dev and when it comes from the stat() call.
            This also avoids using private kernel types in Lustre includes/structures.
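            To illustrate that requirement, both servers of a hypothetical failover/load-balancing pair would carry the same explicit fsid for the shared export (hostnames and the fsid value below are made up, not from this ticket):

                # /etc/exports on nfs-server-a AND nfs-server-b - identical entries,
                # so NFS file handles stay valid no matter which server answers
                /mnt/lustre  *(rw,async,no_root_squash,fsid=1)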

            yong.fan nasf (Inactive) added a comment -

            1) In theory, we can fill the low 32 bits of the FSID with s_dev and fill the high 32 bits with anything, such as a Lustre magic value. But regardless of what is filled in, we cannot control how the caller uses the returned FSID. And there is no explicit advantage in replacing the current patch, since the 64-bit FSID is only generated at mount time.
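            Purely to illustrate the bit layout being discussed (not how any actual patch computes it; the magic value is made up), a shell sketch of a 64-bit fsid with s_dev in the low 32 bits:

                # st_dev of the mounted filesystem, as reported to userspace
                DEV=$(stat -c '%d' /mnt/lustre)
                MAGIC=$(( 0x12345678 ))                   # made-up "magic" for the high 32 bits
                FSID=$(( (MAGIC << 32) | (DEV & 0xffffffff) ))
                printf 'fsid = 0x%016x\n' "$FSID"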

            People

              yong.fan nasf (Inactive)
              yujian Jian Yu