[LU-9892] parallel-scale-nfsv3 no sub tests failed: setup nfs failed! Created: 18/Aug/17  Updated: 07/Jun/23

Status: Reopened
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.10.1, Lustre 2.11.0, Lustre 2.10.2, Lustre 2.10.3, Lustre 2.10.6
Fix Version/s: None

Type: Bug Priority: Major
Reporter: James Casper Assignee: Minh Diep
Resolution: Unresolved Votes: 0
Labels: None
Environment:

Trevis2, full
server: SLES12sp2, ldiskfs, branch b2_10, v2.10.0.49, b13
client: SLES12sp2, branch b2_10, v2.10.0.49, b13


Issue Links:
Related
is related to LU-10566 parallel-scale-nfsv4 test_metabench: ... Reopened
is related to LU-14294 parallel-scale-nfsv4 fails to start w... Resolved
is related to LU-10292 parallel-scale-nfsv3: FAIL: setup nfs... Closed
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

https://testing.hpdd.intel.com/test_sessions/1002375a-10d7-42d6-9686-f936404ef858

FYI: The "setup nfs failed" message shows up in several Jira tickets opened (and resolved) between 2012 and 2016.

From suite_log:

Mounting NFS clients (version 3)...
CMD: trevis-50vm1,trevis-50vm2 mkdir -p /mnt/lustre
CMD: trevis-50vm1,trevis-50vm2 mount -t nfs -o nfsvers=3,async 			trevis-50vm7:/mnt/lustre /mnt/lustre
trevis-50vm2: mount.nfs: requested NFS version or transport protocol is not supported
trevis-50vm1: mount.nfs: requested NFS version or transport protocol is not supported
parallel-scale-nfsv3 : @@@@@@ FAIL: setup nfs failed! 
  Trace dump:
  = /usr/lib64/lustre/tests/test-framework.sh:5291:error()
  = /usr/lib64/lustre/tests/parallel-scale-nfs.sh:61:main()


 Comments   
Comment by James Nunez (Inactive) [ 18/Aug/17 ]

For b2_10, this failure started on 2017-06-28 18:19:34, with logs at https://testing.hpdd.intel.com/test_sets/b6f83e48-5c2f-11e7-a749-5254006e85c2

So far, these NFS setup failures are only seen on SLES12SP2 (server) testing.

The "setup nfs failed" error is also seen on the master branch. It was first seen on master on 2017-09-11, with logs at https://testing.hpdd.intel.com/test_sets/43826966-9695-11e7-b760-5254006e85c2. The NFS setup failure is sometimes followed by a hang on umount. One example of this is at https://testing.hpdd.intel.com/test_sets/d4b6316e-7779-11e7-9a57-5254006e85c2.

We are seeing the same failure on parallel-scale-nfsv4, which is no surprise because the failure is happening in setup-nfs.sh, which both the parallel-scale-nfsv3 and parallel-scale-nfsv4 scripts call.

There was a comment that this failure may be due to using obsolete (deprecated?) commands, like ‘chkconfig’ and ‘service’, in setup-nfs.sh, and that we should use systemctl to control services in SLES12 and versions of RHEL7. We need to test this.

In setup-nfs.sh, we see use of chkconfig for:

        do_nodes $LUSTRE_CLIENT "chkconfig --list nfsserver > /dev/null 2>&1 &&
                                 service nfsserver restart ||
                                 service nfs restart" || return 1

        do_nodes $NFS_CLIENTS "chkconfig --list rpcidmapd 2>/dev/null |
                               grep -q rpcidmapd && service rpcidmapd restart ||
                               true"
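
A minimal sketch of the systemctl idea mentioned above, for experimentation only. It is not a change made for this ticket, and have_cmd and nfs_restart_cmd are hypothetical helper names that do not exist in setup-nfs.sh. The service names passed in (nfsserver, nfs) are the ones the existing script already uses.

```shell
# have_cmd NAME: succeed if NAME is found on PATH (hypothetical helper).
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# nfs_restart_cmd SERVICE: print the command that would restart SERVICE,
# preferring systemctl and falling back to the legacy 'service' wrapper.
nfs_restart_cmd() {
    if have_cmd systemctl; then
        echo "systemctl restart $1"
    else
        echo "service $1 restart"
    fi
}

# Example: print the command first, then run it by hand as root.
#   nfs_restart_cmd nfsserver
```

The helper only prints a command, and the systemctl check happens on the node where the helper runs. Inside the test framework the check would have to execute on each remote node, for example inside the string passed to do_nodes.
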
Comment by Peter Jones [ 04/Dec/17 ]

Bob/Nikolay

Any idea why this test is failing for SLES servers?

Peter

Comment by Oleg Drokin [ 04/Dec/17 ]

We need to make sure that NFSv3 support is there on the server side, that all the modules are loaded, and so on.
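
One way to check this on the NFS server, sketched under assumptions: on Linux, once the nfsd module is loaded, /proc/fs/nfsd/versions lists each protocol version prefixed with + (enabled) or - (disabled). nfsd_has_version is a hypothetical helper, not part of the Lustre test framework.

```shell
# nfsd_has_version VERSIONS N: succeed if "+N" is one of the space-separated
# tokens in VERSIONS (the format of /proc/fs/nfsd/versions, e.g. "-2 +3 +4").
nfsd_has_version() {
    local versions want tok
    versions=$1
    want=$2
    for tok in $versions; do
        [ "$tok" = "+$want" ] && return 0
    done
    return 1
}

# On the NFS server, as root, one might then run:
#   lsmod | grep -w nfsd                   # is the nfsd module loaded?
#   nfsd_has_version "$(cat /proc/fs/nfsd/versions)" 3 && echo "NFSv3 enabled"
#   rpcinfo -p | grep -w nfs               # which versions are registered?
```

The exact-token comparison means that "+4.1" does not count as "+4", so each minor version is checked separately.
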

Comment by Bob Glossman (Inactive) [ 04/Dec/17 ]

As James already mentioned, setup-nfs.sh is using obsolete commands. However, these commands are just as obsolete on RHEL 7 as on SLES. If nfs is being set up and run correctly on RHEL, that suggests some install and setup of nfs server filesystems and services is being done outside the lustre tests themselves. This makes it likely to be a DCO issue: some setup that is being done properly on RHEL 7 but not done correctly, or not done at all, on SLES.

Comment by Minh Diep [ 08/Dec/17 ]

Perhaps SUSE dropped support for NFSv3? Just a thought.

Comment by Bob Glossman (Inactive) [ 08/Dec/17 ]

This is not so. I tried a manual nfs mount on sles12sp2 with -o nfsvers=3 and it worked fine.
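
A sketch of how such a manual check could be repeated. The host and paths in the example are taken from the suite_log in the description, not from the manual test itself, and nfs_mount_cmd is a hypothetical helper that only prints the command so it can be reviewed before being run as root.

```shell
# nfs_mount_cmd SERVER EXPORT MOUNTPOINT VERSION: print the mount command
# line for a manual NFS mount check (printed, not executed).
nfs_mount_cmd() {
    echo "mount -t nfs -o nfsvers=$4 $1:$2 $3"
}

# Example (assumed host and paths, taken from the suite_log):
#   nfs_mount_cmd trevis-50vm7 /mnt/lustre /mnt/lustre 3
# After running the printed command as root, confirm the mount with:
#   grep ' /mnt/lustre nfs' /proc/mounts
```
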

Comment by Gerrit Updater [ 11/Dec/17 ]

Minh Diep (minh.diep@intel.com) uploaded a new patch: https://review.whamcloud.com/30476
Subject: LU-9892 test: use systemctl and chkconfig
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 5b9c53215366d4b15976925dc8c8511ae1be930b

Comment by Gerrit Updater [ 09/Jan/18 ]

Minh Diep (minh.diep@intel.com) uploaded a new patch: https://review.whamcloud.com/30809
Subject: LU-9892 test: fix SuSe nfsserver setup
Project: fs/lustre-release
Branch: b2_10
Current Patch Set: 1
Commit: 13d27979cac2b2280e24fb2038f6ddd847663baa

Comment by Gerrit Updater [ 14/Jan/18 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/30476/
Subject: LU-9892 test: fix SuSe nfsserver setup
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 35e86c52cd36edd7b5b87c0f7f1da33ed90d5140

Comment by Peter Jones [ 14/Jan/18 ]

Landed for 2.11

Comment by Gerrit Updater [ 02/Feb/18 ]

John L. Hammond (john.hammond@intel.com) merged in patch https://review.whamcloud.com/30809/
Subject: LU-9892 test: fix SuSe nfsserver setup
Project: fs/lustre-release
Branch: b2_10
Current Patch Set:
Commit: 2d3105038f49ec4da5d42faccfc74b458d81f75e

Comment by James Nunez (Inactive) [ 15/Mar/18 ]

We're seeing this issue again with the 2.10.59 master tag for parallel-scale-nfsv3 and nfsv4. Here are a couple of failures:

https://testing.hpdd.intel.com/test_sets/994a37f2-274a-11e8-9e0e-52540065bddc

https://testing.hpdd.intel.com/test_sets/a07bee60-276a-11e8-9e0e-52540065bddc
