[LU-14294] parallel-scale-nfsv4 fails to start with "setup nfs failed!" for RHEL8.3 Created: 05/Jan/21  Updated: 02/Aug/23  Resolved: 19/May/23

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.14.0, Lustre 2.15.1, Lustre 2.15.3
Fix Version/s: Lustre 2.16.0, Lustre 2.15.4

Type: Bug
Priority: Minor
Reporter: James Nunez (Inactive)
Assignee: Alex Deiter
Resolution: Fixed
Votes: 0
Labels: rhel8, rhel8.3
Environment:

RHEL8.3 server


Issue Links:
Duplicate
is duplicated by LU-12231 parallel-scale-nfsv4 test racer_on_nf... Resolved
Related
is related to LU-9892 parallel-scale-nfsv3 no sub tests fai... Reopened
is related to LU-12230 parallel-scale-nfsv3 test_connectatho... Resolved
is related to LU-13219 parallel-scale-nfsv3: FAIL: setup nfs... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

The parallel-scale-nfsv4 test suite is failing in NFS setup and, thus, no tests are run. We are seeing this for RHEL8.3 servers.

Looking at a recent failure at https://testing.whamcloud.com/test_sets/d76032dc-6074-406f-824c-a7f3676496cb, we see

CMD: trevis-202vm4 { [[ -e /etc/SuSE-release ]] &&
				 service nfsserver restart; } ||
				 service nfs restart ||
				 service nfs-server restart
trevis-202vm4: Redirecting to /bin/systemctl restart nfs.service
trevis-202vm4: Failed to restart nfs.service: Unit nfs.service not found.
trevis-202vm4: Redirecting to /bin/systemctl restart nfs-server.service
trevis-202vm4: Job for nfs-server.service canceled.
pdsh@trevis-202vm1: trevis-202vm4: ssh exited with exit code 1
 parallel-scale-nfsv4 : @@@@@@ FAIL: setup nfs failed! 
  Trace dump:
  = /usr/lib64/lustre/tests/test-framework.sh:6273:error()
  = /usr/lib64/lustre/tests/parallel-scale-nfs.sh:68:main()
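
The restart command in the log above is a fallback chain: each `service ... restart` is tried only if the previous one failed, and the chain fails only when the last candidate also fails. The following is a minimal sketch of that behavior, not code from test-framework.sh. The function name `restart_first_available` and the `RESTART_CMD` override are hypothetical, introduced only so the logic can be exercised without touching a real service.

```shell
# Hypothetical helper (not from the Lustre test framework): try each
# candidate unit name in order and stop at the first successful restart.
# RESTART_CMD is an assumed override hook; it defaults to "systemctl restart".
restart_first_available() {
    local restart_cmd="${RESTART_CMD:-systemctl restart}"
    local unit
    for unit in "$@"; do
        # $restart_cmd is intentionally unquoted so that "systemctl restart"
        # splits into a command and its first argument.
        if $restart_cmd "$unit"; then
            echo "$unit"
            return 0
        fi
    done
    # Every candidate failed, as with nfs.service and then
    # nfs-server.service in the log above.
    return 1
}
```

For example, `restart_first_available nfs nfs-server` mirrors the last two steps of the logged command and prints the name of the unit that actually restarted.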

So far, every occurrence of this failure has come when node-provisioning/lustre-initialization ran right before parallel-scale-nfsv4.
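
Because the failure has only been seen right after provisioning, one way to test whether it is a transient startup race would be to retry the restart a bounded number of times. This is only a diagnostic sketch under that assumption, and it is not the change that later landed for this ticket. The `retry` function and its arguments are hypothetical.

```shell
# Hypothetical diagnostic helper: run a command up to ATTEMPTS times,
# sleeping DELAY seconds between tries.
# Usage: retry ATTEMPTS DELAY command [args...]
retry() {
    local attempts=$1 delay=$2
    shift 2
    local i=1
    while :; do
        # Stop as soon as the command succeeds.
        "$@" && return 0
        # Give up once the attempt budget is spent.
        [ "$i" -ge "$attempts" ] && return 1
        i=$((i + 1))
        sleep "$delay"
    done
}
```

A call such as `retry 3 5 service nfs-server restart` would show whether a second or third attempt succeeds once the node has settled. If it does, that points at a timing problem rather than a missing package.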

Logs for failures
https://testing.whamcloud.com/test_sets/fe77403a-33b6-4c9b-9fb3-51cc04edd4aa
https://testing.whamcloud.com/test_sets/3f882eb8-2355-4108-adce-ed73e10f054c
https://testing.whamcloud.com/test_sets/d76032dc-6074-406f-824c-a7f3676496cb
https://testing.whamcloud.com/test_sets/4d2fe1de-af46-4313-933d-7c36b9024138



 Comments   
Comment by Andreas Dilger [ 05/Jan/21 ]

Have the NFS tools RPMs been installed?

Comment by James Nunez (Inactive) [ 05/Jan/21 ]

parallel-scale-nfsv3 runs before parallel-scale-nfsv4. In fact, parallel-scale-nfsv3 runs and hangs, which causes the cluster to run node-provisioning/lustre-initialization. parallel-scale-nfsv3 does start the NFS servers.

Looking at the suite_log for parallel-scale-nfsv3, at https://testing.whamcloud.com/test_sets/bc5183ad-2cad-4b97-aba4-604b73b9765f, the NFS server starts

CMD: trevis-202vm4 { [[ -e /etc/SuSE-release ]] &&
				 service nfsserver restart; } ||
				 service nfs restart ||
				 service nfs-server restart
trevis-202vm4: Redirecting to /bin/systemctl restart nfs.service
trevis-202vm4: Failed to restart nfs.service: Unit nfs.service not found.
trevis-202vm4: Redirecting to /bin/systemctl restart nfs-server.service
CMD: trevis-202vm1.trevis.whamcloud.com,trevis-202vm2 chkconfig --list rpcidmapd 2>/dev/null |
			       grep -q rpcidmapd && service rpcidmapd restart ||
			       true

Mounting NFS clients (version 3)...

Looking at the MDS (vm4) console log, we see acknowledgment from NFSD before parallel-scale-nfsv3 starts running tests

[64667.180020] Lustre: DEBUG MARKER: { [[ -e /etc/SuSE-release ]] &&
[64667.180020] 				 service nfsserver restart; } ||
[64667.180020] 				 service nfs restart ||
[64667.180020] 				 service nfs-server restart
[64667.719483] Installing knfsd (copyright (C) 1996 okir@monad.swb.de).
[64668.019298] NFSD: Using nfsdcld client tracking operations.
[64668.020325] NFSD: no clients to reclaim, skipping NFSv4 grace period (net f0000098)
[64671.281631] Lustre: DEBUG MARKER: /usr/sbin/lctl mark excepting tests: 

Before parallel-scale-nfsv4 starts, we don't see the same NFSD acknowledgment; knfsd is installed, but no NFSD lines follow:

[  344.752738] Lustre: DEBUG MARKER: { [[ -e /etc/SuSE-release ]] &&
[  344.752738] 				 service nfsserver restart; } ||
[  344.752738] 				 service nfs restart ||
[  344.752738] 				 service nfs-server restart
[  345.178077] Installing knfsd (copyright (C) 1996 okir@monad.swb.de).
[  345.638306] Lustre: DEBUG MARKER: /usr/sbin/lctl mark  parallel-scale-nfsv4 : @@@@@@ FAIL: setup nfs failed! 
[  346.014296] Lustre: DEBUG MARKER: parallel-scale-nfsv4 : @@@@@@ FAIL: setup nfs failed!

So, the NFS RPMs are installed on the servers.
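
The difference between the two console logs can be checked mechanically: in the working run an `NFSD:` line follows `Installing knfsd`, and in the failing run none does. A minimal sketch, assuming the console log is supplied on standard input. The function name `nfsd_started` is hypothetical.

```shell
# Hypothetical check: succeed only if an "NFSD:" message appears after
# the "Installing knfsd" line in the log read from standard input.
nfsd_started() {
    awk '
        /Installing knfsd/ { seen = 1; next }   # knfsd module was loaded
        seen && /NFSD:/    { ok = 1 }           # nfsd reported in afterwards
        END                { exit !ok }         # status 0 only if both were seen
    '
}
```

On a live server this could be run as `dmesg | nfsd_started && echo "NFSD came up"`.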

Comment by Sarah Liu [ 23/Mar/22 ]

+1, seen in nfsv3 interop testing between master (el8.5) and a 2.12 client (el7.9):
https://testing.whamcloud.com/test_sets/6f42d6c1-3777-463a-aed3-ce12f028983c

Comment by Gerrit Updater [ 07/Nov/22 ]

"Alex Deiter <alex.deiter@gmail.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/49062
Subject: LU-14294 tests: fixed NFS configuration issue
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: f63c4399c084f9c3380ceb2990722c3f477a297b

Comment by Gerrit Updater [ 19/May/23 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/49062/
Subject: LU-14294 tests: fixed NFS configuration issue
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 1a8fe55b17ac2bc2195aaba446467ccdac67b564

Comment by Peter Jones [ 19/May/23 ]

Landed for 2.16

Comment by Gerrit Updater [ 12/Jun/23 ]

"Alex Deiter <alex.deiter@gmail.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/51283
Subject: LU-14294 tests: fixed NFS configuration issue
Project: fs/lustre-release
Branch: b2_15
Current Patch Set: 1
Commit: dfbddc3a4367eb1096134233b8ef3d80c981b78a

Comment by Gerrit Updater [ 02/Aug/23 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/51283/
Subject: LU-14294 tests: fixed NFS configuration issue
Project: fs/lustre-release
Branch: b2_15
Current Patch Set:
Commit: cef89c354f22f873f1f2e09536de7c690852828b

Generated at Sat Feb 10 03:08:30 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.