[LU-3785] lnet-selftest: test_smoke hung at "6 batch in stopping"
Created: 20/Aug/13  Updated: 11/Dec/19  Resolved: 11/Dec/19

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.1, Lustre 2.5.0
Fix Version/s: None

Type: Bug
Priority: Major
Reporter: Jian Yu
Assignee: WC Triage
Resolution: Cannot Reproduce
Votes: 1
Labels: yuc2
Environment:

Lustre Build: http://build.whamcloud.com/job/lustre-b2_4/33/
MDSCOUNT=4


Severity: 3
Rank (Obsolete): 9788

Description

lnet-selftest test_smoke hung as follows:

Failed to stat on 6 nodes
[LNet Rates of c]
[R] Avg: 191      RPC/s Min: 183      RPC/s Max: 199      RPC/s
[W] Avg: 196      RPC/s Min: 181      RPC/s Max: 211      RPC/s
[LNet Bandwidth of c]
[R] Avg: 18.91    MB/s  Min: 17.58    MB/s  Max: 20.24    MB/s
[W] Avg: 11.84    MB/s  Min: 10.29    MB/s  Max: 13.39    MB/s
killing 24261 ...
/tmp/smoke.sh: line 87: 24261 Killed                  /usr/sbin/lst stat --delay 10 --timeout 10 c s
c:
Total 0 error nodes in c
RPC failure, can't show error on 12345-10.10.17.58@tcp
RPC failure, can't show error on 12345-10.10.17.59@tcp
RPC failure, can't show error on 12345-10.10.17.60@tcp
RPC failure, can't show error on 12345-10.10.17.61@tcp
RPC failure, can't show error on 12345-10.10.17.64@tcp
RPC failure, can't show error on 12345-10.10.17.65@tcp
s:
Total 6 error nodes in s
8 batch in stopping
7 batch in stopping
6 batch in stopping

Maloo report: https://maloo.whamcloud.com/test_sets/01bbbfdc-092c-11e3-a9b0-52540035b04c
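For triage across the repeated occurrences, the hang signature in the console output above can be pulled out mechanically. The following is a minimal sketch only: the patterns are copied from the log lines in this report, while the function name `summarize_smoke_log` and the shape of the returned dictionary are hypothetical and not part of `lst` or the Lustre test suite.

```python
import re

# Patterns taken verbatim from the console output quoted in this report.
# Everything else (names, return shape) is an illustrative assumption.
RPC_FAILURE = re.compile(r"^RPC failure, can't show error on (\S+)$")
ERROR_NODES = re.compile(r"^Total (\d+) error nodes in (\S+)$")
STOPPING = re.compile(r"^(\d+) batch in stopping$")


def summarize_smoke_log(text):
    """Collect hang indicators from test_smoke console output."""
    summary = {"rpc_failures": [], "error_nodes": {}, "last_stopping": None}
    for raw in text.splitlines():
        line = raw.strip()
        m = RPC_FAILURE.match(line)
        if m:
            # NID of a node for which the error query itself failed.
            summary["rpc_failures"].append(m.group(1))
            continue
        m = ERROR_NODES.match(line)
        if m:
            # Error-node count reported per group, keyed by group name.
            summary["error_nodes"][m.group(2)] = int(m.group(1))
            continue
        m = STOPPING.match(line)
        if m:
            # Keep only the last "N batch in stopping" value printed.
            summary["last_stopping"] = int(m.group(1))
    return summary
```

Fed the log in this description, such a helper would report six RPC-failure NIDs, the per-group error-node counts for `c` and `s`, and 6 as the last "batch in stopping" value printed before the output ends, which makes it easier to compare this run against the later Maloo reports.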



Comments
Comment by Jian Yu [ 20/Aug/13 ]

Another instance with DNE configuration:
https://maloo.whamcloud.com/test_sets/d75f8c58-0284-11e3-b384-52540035b04c

Comment by Jian Yu [ 06/Sep/13 ]

Lustre Build: http://build.whamcloud.com/job/lustre-b2_4/44/ (2.4.1 RC1)
MDSCOUNT=4

lnet-selftest failed again:
https://maloo.whamcloud.com/test_sets/ae4a21b6-1657-11e3-aa2a-52540035b04c

Comment by Jian Yu [ 03/Dec/13 ]

Lustre Build: http://build.whamcloud.com/job/lustre-b2_4/59/
Distro/Arch: RHEL6.4/x86_64 (server), SLES11SP2/x86_64 (client)
MDSCOUNT=1

The same failure occurred:
https://maloo.whamcloud.com/test_sets/6d5cdb92-5803-11e3-b1ae-52540035b04c

Comment by Jian Yu [ 06/Jan/14 ]

Lustre Build: http://build.whamcloud.com/job/lustre-b2_5/5/
Distro/Arch: RHEL6.4/x86_64(server), SLES11SP3/x86_64(client)
MDSCOUNT=1

The same failure occurred:
https://maloo.whamcloud.com/test_sets/ea6de240-7642-11e3-b3c0-52540035b04c

Comment by Andreas Dilger [ 11/Dec/19 ]

Closing this old bug; it has not been seen since 2.5.
