[LU-318] lnet-selftest test_smoke: FAIL: lst Error found Created: 12/May/11  Updated: 15/Sep/13  Resolved: 15/Sep/13

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.1.0
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Sarah Liu Assignee: Liang Zhen (Inactive)
Resolution: Cannot Reproduce Votes: 0
Labels: None

Attachments: Text File lnet-selftest.suite_log.client-5-ib.log     Text File lnet-selftest.test_smoke.debug_log.client-5-ib.1305260968.log     Text File lnet-selftest.test_smoke.dmesg.client-5-ib.1305260968.log     Text File mds-debug.log     Text File mds-dmesg.log     Text File ost-dmesg.log    
Severity: 3
Rank (Obsolete): 4342

 Description   

lnet-selftest failed when run with quota enabled. The result could not be uploaded to Maloo; please see the attached logs.



 Comments   
Comment by Liang Zhen (Inactive) [ 13/May/11 ]

The script lustre/tests/lnet-selftest.sh is not quite correct: we can't check for errors after end_session, because end_session kills work items and raises some errors that are not real test failures. Also, end_session releases all test nodes, so show_error may try to talk to unassigned nodes.
I will fix the test script.

Comment by Liang Zhen (Inactive) [ 20/May/11 ]

After looking at these logs again, I think I was wrong in my previous comment: we actually did run show_error before end_session.

I found this in the logs:
[LNet Rates of c]
[R] Avg: 19048 RPC/s Min: 19017 RPC/s Max: 19079 RPC/s
[W] Avg: 20920 RPC/s Min: 20878 RPC/s Max: 20961 RPC/s
[LNet Bandwidth of c]
[R] Avg: 1750.06 MB/s Min: 1747.19 MB/s Max: 1752.93 MB/s
[W] Avg: 1749.21 MB/s Min: 1742.16 MB/s Max: 1756.25 MB/s
Failed to stat on 1 nodes <=================================== this is not correct
[LNet Rates of s]
[R] Avg: 19050 RPC/s Min: 19050 RPC/s Max: 19050 RPC/s
[W] Avg: 22734 RPC/s Min: 22734 RPC/s Max: 22734 RPC/s
[LNet Bandwidth of s]
[R] Avg: 1761.24 MB/s Min: 1761.24 MB/s Max: 1761.24 MB/s
[W] Avg: 1759.02 MB/s Min: 1759.02 MB/s Max: 1759.02 MB/s
Failed to stat on 1 nodes <=================================== this is not correct

It looks like two of the test nodes rebooted, crashed, or hit a soft lockup.
However, I can't determine the reason because I don't see any logs or console output from those two nodes (we only have logs from client-5). Could you please collect information from all test nodes next time?
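
For triaging similar runs, the "Failed to stat" lines in the excerpt above can be picked out automatically. The following is only a minimal sketch, not part of lnet-selftest.sh: the helper name find_stat_failures and the SAMPLE text are hypothetical, and it assumes each failure line follows the stats of the group it belongs to, as it does in the excerpt.

```python
import re

# Hypothetical helper for log triage; not part of lnet-selftest.sh or lst.
# Assumption: a "Failed to stat on N nodes" line follows the stats of the
# group it refers to, as in the log excerpt in this ticket.
GROUP_RE = re.compile(r"^\[LNet (?:Rates|Bandwidth) of (\S+)\]$")
FAIL_RE = re.compile(r"^Failed to stat on (\d+) nodes?\b")


def find_stat_failures(text):
    """Return (group, failed_node_count) pairs in order of appearance."""
    failures = []
    current_group = None
    for raw in text.splitlines():
        line = raw.strip()
        group_match = GROUP_RE.match(line)
        if group_match:
            # Remember the most recent group header seen.
            current_group = group_match.group(1)
            continue
        fail_match = FAIL_RE.match(line)
        if fail_match:
            failures.append((current_group, int(fail_match.group(1))))
    return failures


# Shortened copy of the excerpt from this ticket, used as sample input.
SAMPLE = """\
[LNet Rates of c]
[R] Avg: 19048 RPC/s Min: 19017 RPC/s Max: 19079 RPC/s
[LNet Bandwidth of c]
[W] Avg: 1749.21 MB/s Min: 1742.16 MB/s Max: 1756.25 MB/s
Failed to stat on 1 nodes <=== this is not correct
[LNet Rates of s]
[R] Avg: 19050 RPC/s Min: 19050 RPC/s Max: 19050 RPC/s
[LNet Bandwidth of s]
[W] Avg: 1759.02 MB/s Min: 1759.02 MB/s Max: 1759.02 MB/s
Failed to stat on 1 nodes <=== this is not correct
"""

if __name__ == "__main__":
    for group, count in find_stat_failures(SAMPLE):
        print(f"group {group}: {count} node(s) did not answer stat")
```

A non-empty result only says which group lost nodes during the run; finding out why still needs dmesg and console output from those nodes.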

Thanks
Liang

Comment by Sarah Liu [ 22/May/11 ]

Please see the attachments for more information; this bug can be reproduced.

Comment by Liang Zhen (Inactive) [ 15/Sep/13 ]

We haven't seen this for a long time, so I'm closing it.

Generated at Sat Feb 10 01:05:51 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.