[LU-747] lnet-selftest test smoke: lst regression test failed Created: 10/Oct/11  Updated: 29/May/17  Resolved: 29/May/17

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.3.0
Fix Version/s: None

Type: Bug
Priority: Minor
Reporter: Maloo
Assignee: Lai Siyao
Resolution: Cannot Reproduce
Votes: 0
Labels: None

Issue Links:
Related
Severity: 3
Rank (Obsolete): 5312

Description

This issue was created by maloo for nasf <yong.fan@whamcloud.com>

This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/741005c6-f3a2-11e0-908b-52540025f9af.

==========
/tmp/smoke.sh: line 35: 14410 Killed /usr/sbin/lst stat --delay 10 --timeout 10 c s
killing 14410 ...
c:
Total 0 error nodes in c
s:
Total 0 error nodes in s
4 batch in stopping
Batch is stopped
session is ended
Total 0 error nodes in c
Total 0 error nodes in s
pdsh@client-28vm1: client-28vm1: rcmd: xpoll: protocol failure in circuit setup
pdsh@client-28vm1: client-28vm2: rcmd: xpoll: protocol failure in circuit setup
pdsh@client-28vm1: client-28vm4: rcmd: xpoll: protocol failure in circuit setup
lnet-selftest test_smoke: @@@@@@ FAIL: test_smoke failed with 254
Dumping lctl log to /logdir/test_logs/2011-10-10/lustre-reviews-el5-x86_64_2617_-7f5c37529258/lnet-selftest.test_smoke.*.1318292496.log



Comments
Comment by nasf (Inactive) [ 05/Nov/11 ]

Another instance of a similar failure: "rcmd: xpoll: protocol failure in circuit setup". I am not sure what caused it.

https://maloo.whamcloud.com/test_sets/7b1c412a-0803-11e1-b0d9-52540025f9af

Comment by Lai Siyao [ 16/Apr/12 ]

Another similar failure: https://maloo.whamcloud.com/test_sets/14900348-8827-11e1-8e21-525400d2bfa6

Comment by Peter Jones [ 03/May/12 ]

Hi Lai

Could you please look into this one?

Thanks

Peter

Comment by Doug Oucharek (Inactive) [ 05/Jun/12 ]

Are these tests run as root? The message "protocol failure in circuit setup" from rcmd indicates that one of two things has happened: (1) rcmd is trying to use a port number below 1024 without root privileges (root is needed to bind ports below 1024), or (2) the other end has refused the connection (too busy?).
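
The two conditions could be checked by hand on a test node with something like the sketch below. It is only an illustration, not part of the test suite: the function names are hypothetical, the 512-1023 range is the one traditionally scanned by rresvport(3), and port 514 assumes pdsh is using the rsh-style "shell" service, which the logs do not confirm.

```python
import os
import socket

# Assumption: range traditionally searched by rresvport(3) for a source port.
RESERVED_PORT_MIN = 512
RESERVED_PORT_MAX = 1023


def is_privileged_port(port):
    """Return True if binding `port` traditionally requires root."""
    return 0 < port < 1024


def can_bind_reserved_port():
    """Try to bind a local port in the reserved range, scanning downward.

    Returns True on the first successful bind, False if permission is
    denied or no port in the range is free.
    """
    for port in range(RESERVED_PORT_MAX, RESERVED_PORT_MIN - 1, -1):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind(("", port))
            return True
        except PermissionError:
            return False  # condition 1: not allowed to use ports below 1024
        except OSError:
            continue  # port already in use, try the next one
        finally:
            sock.close()
    return False


def remote_accepts(host, port=514, timeout=5.0):
    """Return True if a TCP connection to host:port is accepted.

    Assumption: 514 is the rsh "shell" port; adjust for the rcmd module in use.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False  # condition 2: refused, timed out, or unreachable


if __name__ == "__main__":
    print("effective uid:", os.geteuid())
    print("can bind reserved port:", can_bind_reserved_port())
```

A False result from can_bind_reserved_port() while running as a non-root user would point at the first condition; a False result from remote_accepts() for one of the client nodes would point at the second. A single successful probe does not rule out a transient refusal under load.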

Comment by Lai Siyao [ 13/Jun/12 ]

This failure is generally triggered by a pdsh command, and the Maloo logs show that the failed commands were executed as root, so the second cause is the more likely one, although that is hard to confirm. I haven't seen this failure for a while; if we can find a way to reproduce it, the cause will be easier to find.
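
One possible way to try to reproduce it is to run the same pdsh command many times and count how often the error string appears. The sketch below is an assumption about how that could be done, not an existing test: the helper name, attempt count, and timeout are made up for illustration.

```python
import subprocess

# Error text taken from the failing test log.
PATTERN = "protocol failure in circuit setup"


def count_rcmd_failures(cmd, attempts=100, timeout=60):
    """Run `cmd` repeatedly and count the runs whose output contains PATTERN.

    `cmd` is an argument list as accepted by subprocess.run. Runs that
    exceed `timeout` seconds are skipped rather than counted.
    """
    failures = 0
    for _ in range(attempts):
        try:
            result = subprocess.run(
                cmd, capture_output=True, text=True, timeout=timeout
            )
        except subprocess.TimeoutExpired:
            continue
        # Check both streams, since the error may be written to either one.
        if PATTERN in result.stderr or PATTERN in result.stdout:
            failures += 1
    return failures
```

For example, count_rcmd_failures(["pdsh", "-w", "client-28vm[1,2,4]", "true"]) would exercise the three nodes named in the log; the choice of "true" as the remote command is arbitrary. Running several such loops in parallel might make a "too busy" refusal more likely, but that is a guess.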

Comment by Peter Jones [ 14/Jun/12 ]

This is no longer failing very often, if at all, so I am dropping the priority.

Comment by Andreas Dilger [ 29/May/17 ]

Close old ticket.

Generated at Sat Feb 10 01:10:02 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.