[LU-14294] parallel-scale-nfsv4 fails to start with “setup nfs failed!” for RHEL8.3

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version/s: Lustre 2.16.0, Lustre 2.15.4
    • Affects Version/s: Lustre 2.14.0, Lustre 2.15.1, Lustre 2.15.3
    • Environment: RHEL8.3 server
    • Severity: 3

    Description

      The parallel-scale-nfsv4 test suite fails during NFS setup, so no tests are run. We are seeing this on RHEL8.3 servers.

      Looking at a recent failure at https://testing.whamcloud.com/test_sets/d76032dc-6074-406f-824c-a7f3676496cb, we see:

      CMD: trevis-202vm4 { [[ -e /etc/SuSE-release ]] &&
      				 service nfsserver restart; } ||
      				 service nfs restart ||
      				 service nfs-server restart
      trevis-202vm4: Redirecting to /bin/systemctl restart nfs.service
      trevis-202vm4: Failed to restart nfs.service: Unit nfs.service not found.
      trevis-202vm4: Redirecting to /bin/systemctl restart nfs-server.service
      trevis-202vm4: Job for nfs-server.service canceled.
      pdsh@trevis-202vm1: trevis-202vm4: ssh exited with exit code 1
       parallel-scale-nfsv4 : @@@@@@ FAIL: setup nfs failed! 
        Trace dump:
        = /usr/lib64/lustre/tests/test-framework.sh:6273:error()
        = /usr/lib64/lustre/tests/parallel-scale-nfs.sh:68:main()
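      The restart command above tries service names in order: nfsserver on SUSE, then nfs, then nfs-server. The log shows nfs.service is missing on RHEL8.3 and that the final nfs-server attempt fails with "Job for nfs-server.service canceled."

      Below is a minimal sketch that separates "which unit exists" from "did the restart succeed", so the error that surfaces belongs to the restart itself. It assumes systemd unit-file names are piped in on stdin. The function name pick_nfs_unit is hypothetical: it is not part of test-framework.sh, and it is not the change made in the LU-14294 patches.

```shell
#!/bin/bash
# Hypothetical helper, NOT part of Lustre's test-framework.sh or the LU-14294
# patches: pick the NFS server unit name up front, instead of relying on
# "a || b || c" fallthrough, which treats "unit not found" and
# "restart failed" the same way.
#
# stdin: one service unit file name per line.
# $1:    SUSE release file to test (defaults to /etc/SuSE-release).
pick_nfs_unit() {
	local suse_release=${1:-/etc/SuSE-release}
	local units name

	units=$(cat)

	# SUSE names its NFS server unit "nfsserver"
	if [[ -e $suse_release ]]; then
		echo nfsserver
		return 0
	fi

	# Otherwise try the legacy "nfs" unit, then "nfs-server";
	# the RHEL8.3 log above shows nfs.service is not found there
	for name in nfs nfs-server; do
		if grep -qxF "$name.service" <<< "$units"; then
			echo "$name"
			return 0
		fi
	done

	echo "pick_nfs_unit: no NFS server unit found" >&2
	return 1
}

# Example use on a systemd host:
#   unit=$(systemctl list-unit-files --type=service --no-legend |
#          awk '{print $1}' | pick_nfs_unit) &&
#       systemctl restart "$unit"
```

      With the unit chosen first, a failed restart of nfs-server is reported as exactly that. It is not mixed in with the expected "Unit nfs.service not found." message.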
      

      So far, every time we have seen this failure, node-provisioning/lustre-initialization took place immediately before parallel-scale-nfsv4 was run.

      Logs for failures:
      https://testing.whamcloud.com/test_sets/fe77403a-33b6-4c9b-9fb3-51cc04edd4aa
      https://testing.whamcloud.com/test_sets/3f882eb8-2355-4108-adce-ed73e10f054c
      https://testing.whamcloud.com/test_sets/d76032dc-6074-406f-824c-a7f3676496cb
      https://testing.whamcloud.com/test_sets/4d2fe1de-af46-4313-933d-7c36b9024138

          Activity

            Gerrit Updater added a comment:

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/51283/
            Subject: LU-14294 tests: fixed NFS configuration issue
            Project: fs/lustre-release
            Branch: b2_15
            Current Patch Set:
            Commit: cef89c354f22f873f1f2e09536de7c690852828b

            Gerrit Updater added a comment:

            "Alex Deiter <alex.deiter@gmail.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/51283
            Subject: LU-14294 tests: fixed NFS configuration issue
            Project: fs/lustre-release
            Branch: b2_15
            Current Patch Set: 1
            Commit: dfbddc3a4367eb1096134233b8ef3d80c981b78a

            Peter Jones added a comment:

            Landed for 2.16

            Gerrit Updater added a comment:

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/49062/
            Subject: LU-14294 tests: fixed NFS configuration issue
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 1a8fe55b17ac2bc2195aaba446467ccdac67b564

            Gerrit Updater added a comment:

            "Alex Deiter <alex.deiter@gmail.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/49062
            Subject: LU-14294 tests: fixed NFS configuration issue
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: f63c4399c084f9c3380ceb2990722c3f477a297b

            Sarah Liu added a comment:

            +1 in interop testing between master (el8.5) and a 2.12 client (el7.9) in nfsv3 testing:
            https://testing.whamcloud.com/test_sets/6f42d6c1-3777-463a-aed3-ce12f028983c

            James Nunez (Inactive) added a comment:

            parallel-scale-nfsv3 runs before parallel-scale-nfsv4. In fact, parallel-scale-nfsv3 runs and hangs, which causes the cluster to run node-provisioning/lustre-initialization before parallel-scale-nfsv4; parallel-scale-nfsv3 itself does start the NFS servers.

            Looking at the suite_log for parallel-scale-nfsv3, at https://testing.whamcloud.com/test_sets/bc5183ad-2cad-4b97-aba4-604b73b9765f, the NFS server starts:

            CMD: trevis-202vm4 { [[ -e /etc/SuSE-release ]] &&
            				 service nfsserver restart; } ||
            				 service nfs restart ||
            				 service nfs-server restart
            trevis-202vm4: Redirecting to /bin/systemctl restart nfs.service
            trevis-202vm4: Failed to restart nfs.service: Unit nfs.service not found.
            trevis-202vm4: Redirecting to /bin/systemctl restart nfs-server.service
            CMD: trevis-202vm1.trevis.whamcloud.com,trevis-202vm2 chkconfig --list rpcidmapd 2>/dev/null |
            			       grep -q rpcidmapd && service rpcidmapd restart ||
            			       true
            
            Mounting NFS clients (version 3)...

            Looking at the MDS (vm4) console log, we see acknowledgment from NFSD before parallel-scale-nfsv3 starts running tests:

            [64667.180020] Lustre: DEBUG MARKER: { [[ -e /etc/SuSE-release ]] &&
            [64667.180020] 				 service nfsserver restart; } ||
            [64667.180020] 				 service nfs restart ||
            [64667.180020] 				 service nfs-server restart
            [64667.719483] Installing knfsd (copyright (C) 1996 okir@monad.swb.de).
            [64668.019298] NFSD: Using nfsdcld client tracking operations.
            [64668.020325] NFSD: no clients to reclaim, skipping NFSv4 grace period (net f0000098)
            [64671.281631] Lustre: DEBUG MARKER: /usr/sbin/lctl mark excepting tests: 

            Before parallel-scale-nfsv4 starts, we don't see the same NFSD messages:

            [  344.752738] Lustre: DEBUG MARKER: { [[ -e /etc/SuSE-release ]] &&
            [  344.752738] 				 service nfsserver restart; } ||
            [  344.752738] 				 service nfs restart ||
            [  344.752738] 				 service nfs-server restart
            [  345.178077] Installing knfsd (copyright (C) 1996 okir@monad.swb.de).
            [  345.638306] Lustre: DEBUG MARKER: /usr/sbin/lctl mark  parallel-scale-nfsv4 : @@@@@@ FAIL: setup nfs failed! 
            [  346.014296] Lustre: DEBUG MARKER: parallel-scale-nfsv4 : @@@@@@ FAIL: setup nfs failed!

            So, the NFS RPMs were loaded on the servers.
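
            The console-log comparison above can be checked mechanically. In the passing parallel-scale-nfsv3 run, "NFSD:" messages follow "Installing knfsd". In the failing parallel-scale-nfsv4 run, they do not.

            Below is a minimal sketch, assuming the console log is available as a file or on stdin. The helper name nfsd_ready_in_log is hypothetical and not part of the Lustre test suite. The "NFSD:" messages it looks for are the ones seen in these RHEL8.3 logs; other kernels may print different ones. A missing message is a symptom consistent with the failure, not proof of its cause.

```shell
#!/bin/bash
# Hypothetical triage helper (not from the Lustre test suite): succeed only if
# an "NFSD:" kernel message follows the most recent "Installing knfsd" line.
# Reads the named console log, or stdin when no file is given.
nfsd_ready_in_log() {
	local log=${1:--}

	awk '
		/Installing knfsd/ { installed = 1; ready = 0; next }
		installed && /NFSD: / { ready = 1 }
		END { exit !(installed && ready) }
	' "$log"
}

# Example (the log file name is illustrative):
#   nfsd_ready_in_log console.trevis-202vm4.log && echo "nfsd came up"
#   dmesg | nfsd_ready_in_log
```

            The readiness flag resets on every "Installing knfsd" line. A log that spans several provisioning cycles is therefore judged only by the most recent nfsd start.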

            Andreas Dilger added a comment:

            Have the NFS tools RPMs been installed?
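
            On RHEL, the NFS server userspace tools come from the nfs-utils package, so "rpm -q nfs-utils" answers the question above at the package level. A package-independent check is to look for the binaries themselves.

            Below is a minimal sketch. The helper name have_nfs_server_tools is hypothetical. These binaries normally live in /usr/sbin, so the check assumes it runs as root or with /usr/sbin in PATH.

```shell
#!/bin/bash
# Hypothetical pre-flight check: are the NFS server userspace tools present?
# On RHEL these binaries are shipped by the nfs-utils package.
# Prints each missing tool to stderr; returns 0 only if all are found.
have_nfs_server_tools() {
	local tool missing=0

	for tool in exportfs rpc.nfsd rpc.mountd; do
		if ! command -v "$tool" >/dev/null 2>&1; then
			echo "missing: $tool" >&2
			missing=1
		fi
	done

	return "$missing"
}

# Example:
#   have_nfs_server_tools || echo "install nfs-utils on this server"
```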

            People

              Assignee: Alex Deiter
              Reporter: James Nunez (Inactive)
              Votes: 0
              Watchers: 7
