[LU-9892] parallel-scale-nfsv3 no sub tests failed: setup nfs failed! Created: 18/Aug/17 Updated: 07/Jun/23 |
|
| Status: | Reopened |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.10.1, Lustre 2.11.0, Lustre 2.10.2, Lustre 2.10.3, Lustre 2.10.6 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major |
| Reporter: | James Casper | Assignee: | Minh Diep |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None |
| Environment: | Trevis2, full |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
https://testing.hpdd.intel.com/test_sessions/1002375a-10d7-42d6-9686-f936404ef858
FYI: "setup nfs failed" message shows up in several Jira tickets opened (and resolved) between 2012 and 2016.
From suite_log:
Mounting NFS clients (version 3)...
CMD: trevis-50vm1,trevis-50vm2 mkdir -p /mnt/lustre
CMD: trevis-50vm1,trevis-50vm2 mount -t nfs -o nfsvers=3,async trevis-50vm7:/mnt/lustre /mnt/lustre
trevis-50vm2: mount.nfs: requested NFS version or transport protocol is not supported
trevis-50vm1: mount.nfs: requested NFS version or transport protocol is not supported
 parallel-scale-nfsv3 : @@@@@@ FAIL: setup nfs failed!
  Trace dump:
  = /usr/lib64/lustre/tests/test-framework.sh:5291:error()
  = /usr/lib64/lustre/tests/parallel-scale-nfs.sh:61:main() |
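The mount.nfs error above usually means the server did not offer the requested NFS version or transport. One way to check this from a client is to look at the RPC services the server registers. The following is a minimal sketch, not part of the test suite: the function name nfs_version_registered is hypothetical, and it only inspects the text produced by `rpcinfo -p`, assuming the usual column layout (program, vers, proto, port, service).

```shell
# Hypothetical helper: read `rpcinfo -p <server>` output on stdin and
# succeed only if the NFS program (100003) is registered over TCP with
# the requested version.
nfs_version_registered() {
	local want_vers=$1
	# Columns of `rpcinfo -p`: program vers proto port service
	awk -v v="$want_vers" '
		$1 == 100003 && $2 == v && $3 == "tcp" { found = 1 }
		END { exit found ? 0 : 1 }'
}

# Possible usage against the server named in the log above:
#   rpcinfo -p trevis-50vm7 | nfs_version_registered 3 ||
#       echo "NFSv3 over TCP is not registered on the server"
```

If the check fails for version 3 but passes for version 4, that would point at the server configuration rather than at the client mount options.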
| Comments |
| Comment by James Nunez (Inactive) [ 18/Aug/17 ] |
|
For b2_10, this failure started on 2017-06-28 18:19:34 with logs at https://testing.hpdd.intel.com/test_sets/b6f83e48-5c2f-11e7-a749-5254006e85c2 . So far, these NFS setup failures are only seen on SLES12SP2 (server) testing.
The "setup nfs failed" error is also seen on the master branch. It was first seen on master on 2017-09-11, with logs at https://testing.hpdd.intel.com/test_sets/43826966-9695-11e7-b760-5254006e85c2.
The NFS setup failure is sometimes followed by a hang on umount. One example of this is at https://testing.hpdd.intel.com/test_sets/d4b6316e-7779-11e7-9a57-5254006e85c2.
We are seeing the same failure on parallel-scale-nfsv4, which is no surprise because the failure happens in setup-nfs.sh, which both the parallel-scale-nfsv3 and parallel-scale-nfsv4 scripts call.
There was a comment that this failure may be due to using obsolete (deprecated?) commands, like 'chkconfig' and 'service', in setup-nfs.sh, and that we should use systemctl to control services on SLES12 and RHEL7. We need to test this. In setup-nfs.sh, chkconfig and service are used as follows:
    do_nodes $LUSTRE_CLIENT "chkconfig --list nfsserver > /dev/null 2>&1 &&
            service nfsserver restart ||
            service nfs restart" || return 1

    do_nodes $NFS_CLIENTS "chkconfig --list rpcidmapd 2>/dev/null |
            grep -q rpcidmapd && service rpcidmapd restart ||
            true" |
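To make the systemctl suggestion concrete, here is a rough sketch of how the restart command could be chosen per init system. This is an illustration only, not the patch that was later uploaded: the function restart_cmd and its "systemd"/"sysv" argument are hypothetical, and the service names are simply the ones the existing script already uses.

```shell
# Hypothetical helper: print the restart command for service $1.
# $2 is the init style, "systemd" or "sysv"; when omitted it is guessed
# from whether systemctl is on the PATH.
restart_cmd() {
	local unit=$1
	local style=${2:-}

	if [ -z "$style" ]; then
		if command -v systemctl >/dev/null 2>&1; then
			style=systemd
		else
			style=sysv
		fi
	fi

	case "$style" in
	systemd) echo "systemctl restart $unit" ;;
	sysv)    echo "service $unit restart" ;;
	*)       echo "unknown init style: $style" >&2; return 1 ;;
	esac
}

# Possible wiring into setup-nfs.sh (an assumption, untested):
#   do_nodes $LUSTRE_CLIENT "$(restart_cmd nfsserver systemd)" || return 1
```

Printing the command instead of running it keeps the choice easy to inspect before it is handed to do_nodes. Note that the detection branch runs on the local node, so for remote nodes the style would have to be passed explicitly.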
| Comment by Peter Jones [ 04/Dec/17 ] |
|
Bob/Nikolay, any idea why this test is failing for SLES servers? Peter |
| Comment by Oleg Drokin [ 04/Dec/17 ] |
|
We need to make sure that NFSv3 support is present on the server side and that all the required modules are loaded. |
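A server-side check along these lines could look like the sketch below. It assumes the Linux nfsd filesystem is mounted so that /proc/fs/nfsd/versions exists; that file lists each version prefixed with "+" (enabled) or "-" (disabled), for example "-2 +3 +4 +4.1 +4.2". The function name nfsd_version_enabled is hypothetical.

```shell
# Hypothetical helper: succeed if NFS version $1 is enabled in nfsd.
# $2 optionally supplies the versions string; by default it is read
# from /proc/fs/nfsd/versions on the local (server) node.
nfsd_version_enabled() {
	local want=$1
	local versions=${2:-$(cat /proc/fs/nfsd/versions 2>/dev/null)}
	local v

	for v in $versions; do
		[ "$v" = "+$want" ] && return 0
	done
	return 1
}

# Possible usage on the NFS server, together with a module check:
#   lsmod | grep -q '^nfsd ' || echo "nfsd module is not loaded"
#   nfsd_version_enabled 3 || echo "NFSv3 is not enabled in nfsd"
```

An empty versions string (for instance when nfsd is not running) is reported as "not enabled", which matches the failure mode being investigated here.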
| Comment by Bob Glossman (Inactive) [ 04/Dec/17 ] |
|
As James already mentioned, setup-nfs.sh is using obsolete commands. However, these commands are just as obsolete on RHEL 7 as on SLES. If NFS is being set up and run correctly on RHEL, that suggests some installation and setup of NFS server filesystems and services is being done outside the Lustre tests themselves. This makes it likely to be a DCO issue: some setup that is done properly on RHEL 7 but is done incorrectly, or not at all, on SLES. |
| Comment by Minh Diep [ 08/Dec/17 ] |
|
Perhaps SUSE dropped support for NFSv3? Just a thought. |
| Comment by Bob Glossman (Inactive) [ 08/Dec/17 ] |
|
This is not so. I tried a manual NFS mount on SLES12SP2 with -o nfsvers=3 and it worked fine. |
| Comment by Gerrit Updater [ 11/Dec/17 ] |
|
Minh Diep (minh.diep@intel.com) uploaded a new patch: https://review.whamcloud.com/30476 |
| Comment by Gerrit Updater [ 09/Jan/18 ] |
|
Minh Diep (minh.diep@intel.com) uploaded a new patch: https://review.whamcloud.com/30809 |
| Comment by Gerrit Updater [ 14/Jan/18 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/30476/ |
| Comment by Peter Jones [ 14/Jan/18 ] |
|
Landed for 2.11 |
| Comment by Gerrit Updater [ 02/Feb/18 ] |
|
John L. Hammond (john.hammond@intel.com) merged in patch https://review.whamcloud.com/30809/ |
| Comment by James Nunez (Inactive) [ 15/Mar/18 ] |
|
We're seeing this issue again with the 2.10.59 master tag for parallel-scale-nfsv3 and nfsv4. Here are a couple of failures: https://testing.hpdd.intel.com/test_sets/994a37f2-274a-11e8-9e0e-52540065bddc https://testing.hpdd.intel.com/test_sets/a07bee60-276a-11e8-9e0e-52540065bddc |