[LU-5524] parallel-scale-nfsv3: FAIL: setup nfs failed! Created: 20/Aug/14  Updated: 28/Aug/14  Resolved: 28/Aug/14

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.7.0, Lustre 2.5.3
Fix Version/s: None

Type: Bug Priority: Blocker
Reporter: Jian Yu Assignee: WC Triage
Resolution: Cannot Reproduce Votes: 0
Labels: None
Environment:

Lustre Build: https://build.hpdd.intel.com/job/lustre-b2_5/82/
Distro/Arch: RHEL6.5/x86_64


Severity: 3
Rank (Obsolete): 15382

 Description   

parallel-scale-nfsv3 test failed as follows:

CMD: shadow-40vm7 service nfs restart
shadow-40vm7: Cannot register service: RPC: Unable to receive; errno = Connection refused
shadow-40vm7: rpc.rquotad: unable to register (RQUOTAPROG, RQUOTAVERS, udp).
shadow-40vm7: rpc.nfsd: writing fd to kernel failed: errno 5 (Input/output error)
shadow-40vm7: rpc.nfsd: writing fd to kernel failed: errno 5 (Input/output error)
shadow-40vm7: rpc.nfsd: unable to set any sockets for nfsd
Shutting down NFS daemon: [  OK  ]
Shutting down NFS mountd: [  OK  ]
Shutting down NFS quotas: [  OK  ]
Shutting down RPC idmapd: [  OK  ]
Starting NFS services:  [  OK  ]
Starting NFS quotas: [FAILED]
Starting NFS mountd: [FAILED]
Starting NFS daemon: [FAILED]
CMD: shadow-40vm1,shadow-40vm2.shadow.whamcloud.com chkconfig --list rpcidmapd 2>/dev/null |
			       grep -q rpcidmapd && service rpcidmapd restart ||
			       true
CMD: shadow-40vm7 exportfs -o rw,async,no_root_squash *:/mnt/lustre         && exportfs -v
/mnt/lustre   	<world>(rw,async,wdelay,no_root_squash,no_subtree_check)

Mounting NFS clients (version 3)...
CMD: shadow-40vm1,shadow-40vm2.shadow.whamcloud.com mkdir -p /mnt/lustre
CMD: shadow-40vm1,shadow-40vm2.shadow.whamcloud.com mount -t nfs -o nfsvers=3,async                 shadow-40vm7:/mnt/lustre /mnt/lustre
shadow-40vm1: mount.nfs: Connection timed out
shadow-40vm2: mount.nfs: Connection timed out
 parallel-scale-nfsv3 : @@@@@@ FAIL: setup nfs failed!

parallel-scale-nfsv4 hit the same failure.

Maloo reports:
https://testing.hpdd.intel.com/test_sets/46866536-28a2-11e4-901f-5254006e85c2
https://testing.hpdd.intel.com/test_sets/46cc3106-28a2-11e4-901f-5254006e85c2
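
The "Cannot register service: RPC: Unable to receive; errno = Connection refused" message and the subsequent mount timeouts are the symptoms one would expect if the RPC portmapper (rpcbind on RHEL6) were not running on the NFS server when "service nfs restart" was issued. A minimal diagnostic sketch, assuming a stock RHEL6.5 layout, to run on the server node (shadow-40vm7 here) the next time this reproduces:

# Is the portmapper up, and are the NFS services registered with it?
service rpcbind status
rpcinfo -p localhost        # should list portmapper, mountd, nfs, rquotad
# If rpcbind is down, start it before restarting NFS
service rpcbind start && service nfs restart
# From a client, confirm the server's portmapper is reachable
rpcinfo -p shadow-40vm7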



 Comments   
Comment by Jian Yu [ 20/Aug/14 ]

Lustre client build: https://build.hpdd.intel.com/job/lustre-b2_4/73/ (2.4.3)
Lustre server build: https://build.hpdd.intel.com/job/lustre-b2_5/80/
Distro/Arch: RHEL6.5/x86_64

The same failure occurred:
https://testing.hpdd.intel.com/test_sets/bd0e71cc-2658-11e4-8af9-5254006e85c2
https://testing.hpdd.intel.com/test_sets/bd2553a6-2658-11e4-8af9-5254006e85c2

Is this related to the change of http://review.whamcloud.com/11246 ?

Comment by Jian Yu [ 21/Aug/14 ]

This is blocking parallel-scale-nfsv{3,4} testing on the Lustre b2_5 branch:
https://testing.hpdd.intel.com/test_sets/3cd2c68e-28e2-11e4-85c7-5254006e85c2
https://testing.hpdd.intel.com/test_sets/3ce98bee-28e2-11e4-85c7-5254006e85c2
https://testing.hpdd.intel.com/test_sets/bf690ca0-28d0-11e4-85c7-5254006e85c2
https://testing.hpdd.intel.com/test_sets/bf8ec968-28d0-11e4-85c7-5254006e85c2

Comment by Oleg Drokin [ 22/Aug/14 ]

I guess the kernel update to RHEL might have changed something without us noticing and broken NFS.
I wonder if master still works?
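
One way to test the kernel-update theory would be to record the NFS-related package versions on the NFS server in a passing and a failing session and compare them; a rough sketch, assuming stock RHEL6.5 package names:

# Record the NFS stack versions on the server node
rpm -q kernel nfs-utils nfs-utils-lib rpcbind
uname -r    # kernel actually booted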

Comment by Jian Yu [ 23/Aug/14 ]

Here are some instances that occurred in the past month on the master branch:
https://testing.hpdd.intel.com/test_sets/13400e6e-2845-11e4-901f-5254006e85c2
https://testing.hpdd.intel.com/test_sets/fc293d3a-2818-11e4-8e75-5254006e85c2
https://testing.hpdd.intel.com/test_sets/0dc52508-22ca-11e4-b8ac-5254006e85c2
https://testing.hpdd.intel.com/test_sets/6239ab1c-15c9-11e4-818c-5254006e85c2
https://testing.hpdd.intel.com/test_sets/659e6e3e-11e4-11e4-8a56-5254006e85c2

And here are all of the instances of this failure against parallel-scale-nfsv3 on the master branch:
http://tinyurl.com/qaduwde

The kernel update on RHEL6.5 is not the cause.

Comment by Jian Yu [ 23/Aug/14 ]

By looking into the test sessions on Lustre b2_5 build #83, I found that only the SLES11SP3 client + RHEL6.5 server test session hit this issue:
https://testing.hpdd.intel.com/test_sessions/52a73644-28e1-11e4-85c7-5254006e85c2

Other test sessions did not hit this issue. Maybe this is a sporadic test environment issue? Let's wait for the test results of Lustre b2_5 build #84.

Comment by Jian Yu [ 24/Aug/14 ]

For Lustre b2_5 build #84, again only the SLES11SP3 client + RHEL6.5 server test session hit this issue:
https://testing.hpdd.intel.com/test_sessions/8198b614-2b7d-11e4-8687-5254006e85c2
Other test sessions did not hit this issue.

Comment by Jian Yu [ 28/Aug/14 ]

The issue did not occur on Lustre b2_5 build #85. It seems to be a sporadic test environment issue. Let's close this ticket for now. If it occurs again, please reopen it.
