LU-2904

parallel-scale-nfsv3: FAIL: setup nfs failed!


    Description

      The parallel-scale-nfsv3 test failed as follows:

      Mounting NFS clients (version 3)...
      CMD: client-12vm1,client-12vm2 mkdir -p /mnt/lustre
      CMD: client-12vm1,client-12vm2 mount -t nfs -o nfsvers=3,async                 client-12vm3:/mnt/lustre /mnt/lustre
      client-12vm2: mount.nfs: Connection timed out
      client-12vm1: mount.nfs: Connection timed out
       parallel-scale-nfsv3 : @@@@@@ FAIL: setup nfs failed! 
      

      Syslog on Lustre MDS/Lustre Client/NFS Server client-12vm3 showed that:

      Mar  4 17:34:15 client-12vm3 mrshd[4254]: root@client-12vm1.lab.whamcloud.com as root: cmd='(PATH=$PATH:/usr/lib64/lustre/utils:/usr/lib64/lustre/tests:/sbin:/usr/sbin; cd /usr/lib64/lustre/tests; LUSTRE="/usr/lib64/lustre"  sh -c "exportfs -o rw,async,no_root_squash *:/mnt/lustre         && exportfs -v");echo XXRETCODE:$?'
      Mar  4 17:34:15 client-12vm3 xinetd[1640]: EXIT: mshell status=0 pid=4253 duration=0(sec)
      Mar  4 17:34:16 client-12vm3 rpc.mountd[4165]: authenticated mount request from 10.10.4.207:894 for /mnt/lustre (/mnt/lustre)
      Mar  4 17:34:16 client-12vm3 rpc.mountd[4165]: authenticated mount request from 10.10.4.206:713 for /mnt/lustre (/mnt/lustre)
      Mar  4 17:34:17 client-12vm3 rpc.mountd[4165]: authenticated mount request from 10.10.4.207:784 for /mnt/lustre (/mnt/lustre)
      Mar  4 17:34:17 client-12vm3 rpc.mountd[4165]: authenticated mount request from 10.10.4.206:877 for /mnt/lustre (/mnt/lustre)
      Mar  4 17:34:19 client-12vm3 rpc.mountd[4165]: authenticated mount request from 10.10.4.207:946 for /mnt/lustre (/mnt/lustre)
      Mar  4 17:34:19 client-12vm3 rpc.mountd[4165]: authenticated mount request from 10.10.4.206:1013 for /mnt/lustre (/mnt/lustre)
      Mar  4 17:34:23 client-12vm3 rpc.mountd[4165]: authenticated mount request from 10.10.4.207:797 for /mnt/lustre (/mnt/lustre)
      Mar  4 17:34:23 client-12vm3 rpc.mountd[4165]: authenticated mount request from 10.10.4.206:701 for /mnt/lustre (/mnt/lustre)
      Mar  4 17:34:31 client-12vm3 rpc.mountd[4165]: authenticated mount request from 10.10.4.207:719 for /mnt/lustre (/mnt/lustre)
      Mar  4 17:34:31 client-12vm3 rpc.mountd[4165]: authenticated mount request from 10.10.4.206:941 for /mnt/lustre (/mnt/lustre)
      Mar  4 17:34:41 client-12vm3 rpc.mountd[4165]: authenticated mount request from 10.10.4.207:943 for /mnt/lustre (/mnt/lustre)
      Mar  4 17:34:41 client-12vm3 rpc.mountd[4165]: authenticated mount request from 10.10.4.206:810 for /mnt/lustre (/mnt/lustre)
      Mar  4 17:34:51 client-12vm3 rpc.mountd[4165]: authenticated mount request from 10.10.4.207:849 for /mnt/lustre (/mnt/lustre)
      Mar  4 17:34:51 client-12vm3 rpc.mountd[4165]: authenticated mount request from 10.10.4.206:740 for /mnt/lustre (/mnt/lustre)
      Mar  4 17:35:01 client-12vm3 rpc.mountd[4165]: authenticated mount request from 10.10.4.207:846 for /mnt/lustre (/mnt/lustre)
      Mar  4 17:35:01 client-12vm3 rpc.mountd[4165]: authenticated mount request from 10.10.4.206:667 for /mnt/lustre (/mnt/lustre)
      Mar  4 17:35:11 client-12vm3 rpc.mountd[4165]: authenticated mount request from 10.10.4.207:955 for /mnt/lustre (/mnt/lustre)
      Mar  4 17:35:11 client-12vm3 rpc.mountd[4165]: authenticated mount request from 10.10.4.206:1006 for /mnt/lustre (/mnt/lustre)
      Mar  4 17:35:21 client-12vm3 rpc.mountd[4165]: authenticated mount request from 10.10.4.207:828 for /mnt/lustre (/mnt/lustre)
      Mar  4 17:35:21 client-12vm3 rpc.mountd[4165]: authenticated mount request from 10.10.4.206:739 for /mnt/lustre (/mnt/lustre)
      Mar  4 17:35:31 client-12vm3 rpc.mountd[4165]: authenticated mount request from 10.10.4.207:1011 for /mnt/lustre (/mnt/lustre)
      Mar  4 17:35:31 client-12vm3 rpc.mountd[4165]: authenticated mount request from 10.10.4.206:994 for /mnt/lustre (/mnt/lustre)
      Mar  4 17:35:41 client-12vm3 rpc.mountd[4165]: authenticated mount request from 10.10.4.207:847 for /mnt/lustre (/mnt/lustre)
      Mar  4 17:35:41 client-12vm3 rpc.mountd[4165]: authenticated mount request from 10.10.4.206:756 for /mnt/lustre (/mnt/lustre)
      Mar  4 17:35:51 client-12vm3 rpc.mountd[4165]: authenticated mount request from 10.10.4.207:892 for /mnt/lustre (/mnt/lustre)
      Mar  4 17:35:51 client-12vm3 rpc.mountd[4165]: authenticated mount request from 10.10.4.206:749 for /mnt/lustre (/mnt/lustre)
      Mar  4 17:36:01 client-12vm3 rpc.mountd[4165]: authenticated mount request from 10.10.4.207:1017 for /mnt/lustre (/mnt/lustre)
      Mar  4 17:36:01 client-12vm3 rpc.mountd[4165]: authenticated mount request from 10.10.4.206:873 for /mnt/lustre (/mnt/lustre)
      Mar  4 17:36:11 client-12vm3 rpc.mountd[4165]: authenticated mount request from 10.10.4.207:874 for /mnt/lustre (/mnt/lustre)
      Mar  4 17:36:11 client-12vm3 rpc.mountd[4165]: authenticated mount request from 10.10.4.206:749 for /mnt/lustre (/mnt/lustre)
      Mar  4 17:36:21 client-12vm3 rpc.mountd[4165]: authenticated mount request from 10.10.4.207:916 for /mnt/lustre (/mnt/lustre)
      Mar  4 17:36:21 client-12vm3 rpc.mountd[4165]: authenticated mount request from 10.10.4.206:841 for /mnt/lustre (/mnt/lustre)
      Mar  4 17:36:21 client-12vm3 xinetd[1640]: START: mshell pid=4286 from=::ffff:10.10.4.206
      Mar  4 17:36:21 client-12vm3 mrshd[4287]: root@client-12vm1.lab.whamcloud.com as root: cmd='/usr/sbin/lctl mark "/usr/sbin/lctl mark  parallel-scale-nfsv3 : @@@@@@ FAIL: setup nfs failed! ";echo XXRETCODE:$?'
      Mar  4 17:36:21 client-12vm3 kernel: Lustre: DEBUG MARKER: /usr/sbin/lctl mark  parallel-scale-nfsv3 : @@@@@@ FAIL: setup nfs failed!
      

      Maloo report: https://maloo.whamcloud.com/test_sets/5cbf6978-853e-11e2-bfd3-52540035b04c

    Activity

            pjones Peter Jones added a comment -

            Landed for 2.4.1 and 2.5


            yong.fan nasf (Inactive) added a comment -

            Generate the FSID from super_block::s_dev.

            The patch for master:
            http://review.whamcloud.com/#/c/7434/
            The patch for b2_4:
            http://review.whamcloud.com/#/c/7435/
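
            For illustration only, here is a minimal sketch of that approach, assuming a kernel statfs-style hook (the function name and placement are hypothetical, not the actual patch):

                #include <linux/fs.h>
                #include <linux/statfs.h>

                /* Hypothetical sketch: report an FSID derived from the
                 * superblock's 32-bit device number, so that every
                 * consumer keys the export on the same value. */
                static void ll_fsid_from_sdev(struct super_block *sb,
                                              struct kstatfs *sfs)
                {
                        /* s_dev is the anonymous device number assigned
                         * to the llite mount */
                        sfs->f_fsid.val[0] = (u32)sb->s_dev;
                        sfs->f_fsid.val[1] = 0;
                }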


            paf Patrick Farrell (Inactive) added a comment -

            I think I agree with Alexey: what is the purpose of requiring a patched version of nfs-utils? Obviously we eventually want to fix all of the nfs-utils and kernel NFS/NFSD problems with 64-bit root inodes, but until complete fixes are available, shouldn't we avoid requiring a patch? (Also, an nfs-utils patch adds another package that Lustre users must build themselves or that must be provided with Lustre, like e2fsprogs.)

            It seems like the better solution is to document and require the -o fsid= option while pushing for fixes upstream. (This is Cray's plan going forward, whether or not this specific patch remains in Lustre.)
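
            For reference, pinning the FSID at export time could look like the following, mirroring the test's exportfs call (the fsid=1 value is an arbitrary example):

                exportfs -o rw,async,no_root_squash,fsid=1 "*:/mnt/lustre"

            With the same fsid= on every server exporting the filesystem, all nodes present identical file handles regardless of their local s_dev.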


            shadow Alexey Lyashkov added a comment -

            I strongly disagree with that patch; please revert it and commit a correct version that does not break clustered NFS interoperability.

            yujian Jian Yu added a comment -

            The patch http://review.whamcloud.com/6493 was landed on the Lustre b2_4 branch.

            yujian Jian Yu added a comment -

            Hi Oleg,

            Could you please cherry-pick the patch to the Lustre b2_4 branch? Thanks.

            The failure occurred regularly on the Lustre b2_4 branch:
            https://maloo.whamcloud.com/test_sets/0c61eedc-fdad-11e2-9fd5-52540035b04c
            https://maloo.whamcloud.com/test_sets/f1c60464-fd16-11e2-9fdb-52540035b04c
            https://maloo.whamcloud.com/test_sets/13499228-fcda-11e2-b90c-52540035b04c
            https://maloo.whamcloud.com/test_sets/3d64d0f8-fcc2-11e2-9fdb-52540035b04c
            https://maloo.whamcloud.com/test_sets/512fc62e-fcb8-11e2-9222-52540035b04c

            shadow Alexey Lyashkov added a comment -

            You have broken a clustered NFS or NFS failover configuration where two NFS servers run in a pair: the first with older nfs-utils, where the FSID is generated from s_dev, and the second with this patch.

            So you have broken interoperability with older versions.

            yujian Jian Yu added a comment -

            The patch http://review.whamcloud.com/6493 also needs to be backported to the Lustre b1_8 and b2_1 branches to pass testing in the following interop configurations:

            NFS clients + Lustre b1_8/b2_1 NFS server/Lustre client + Lustre b2_4/master Lustre servers

            yujian Jian Yu added a comment -

            Hi Oleg,

            Could you please cherry-pick http://review.whamcloud.com/6493 to Lustre b2_4 branch? Thanks.

            The parallel-scale-nfsv3 test also failed on the current Lustre b2_4 branch:
            https://maloo.whamcloud.com/test_sets/9f30063c-ed8f-11e2-8e3a-52540035b04c


            yong.fan nasf (Inactive) added a comment -

            Currently, llite can export both the 32-bit "s_dev" and the 64-bit "FSID"; which one is used depends on the consumer (nfs-utils or other applications). Even though they differ, both can be used to identify/locate the same device (filesystem). Using "s_dev" ignores "FSID", and the same holds in the reverse case; they are not required to be the same.

            I am not sure I understand the point you are worried about, but please give me a detailed example of the patch breaking something.
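
            As a concrete illustration of the two identifiers, a minimal userspace sketch (the mount point path is just an example):

                /* Print both identifiers for the same filesystem:
                 * statfs() exposes f_fsid, while stat() exposes the 32-bit
                 * st_dev (s_dev); a consumer may key on either one. */
                #include <stdio.h>
                #include <sys/stat.h>
                #include <sys/statfs.h>

                int main(void)
                {
                        struct statfs sfs;
                        struct stat st;

                        if (statfs("/mnt/lustre", &sfs) != 0 ||
                            stat("/mnt/lustre", &st) != 0) {
                                perror("stat");
                                return 1;
                        }
                        printf("f_fsid = %#x:%#x\n",
                               sfs.f_fsid.__val[0], sfs.f_fsid.__val[1]);
                        printf("st_dev = %#lx\n", (unsigned long)st.st_dev);
                        return 0;
                }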


            People

              Assignee: yong.fan nasf (Inactive)
              Reporter: yujian Jian Yu
              Votes: 0
              Watchers: 17