[LU-12247] sanityn test 34 fails with 'test_34 returned 1' Created: 30/Apr/19  Updated: 24/Jun/19  Resolved: 24/Jun/19

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.13.0, Lustre 2.12.1
Fix Version/s: Lustre 2.13.0

Type: Bug Priority: Minor
Reporter: James Nunez (Inactive) Assignee: WC Triage
Resolution: Fixed Votes: 0
Labels: ppc
Environment:

ppc clients


Issue Links:
Duplicate
duplicates LU-11803 sanity test 255c fails with 'Ladvise ... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

sanityn test_34 fails with 'test_34 returned 1'. We see this test fail for PPC only.

The suite_log for a recent failure (logs at https://testing.whamcloud.com/test_sets/7d68fa34-668f-11e9-8bb1-52540065bddc) shows that wait_osc_import_ready() does not return in time:

CMD: trevis-55vm11 lctl set_param -n fail_loc=0 2>/dev/null || true
CMD: trevis-77vm1.trevis.whamcloud.com lctl get_param -n at_min
can't get osc.lustre-OST0000-osc-ffff*.ost_server_uuid in 40 secs
CMD: trevis-77vm1.trevis.whamcloud.com lctl get_param -n at_min
can't get osc.lustre-OST0001-osc-ffff*.ost_server_uuid in 40 secs
CMD: trevis-77vm1.trevis.whamcloud.com lctl get_param -n at_min
can't get osc.lustre-OST0002-osc-ffff*.ost_server_uuid in 40 secs
CMD: trevis-77vm1.trevis.whamcloud.com lctl get_param -n at_min
can't get osc.lustre-OST0003-osc-ffff*.ost_server_uuid in 40 secs
CMD: trevis-77vm1.trevis.whamcloud.com lctl get_param -n at_min
can't get osc.lustre-OST0004-osc-ffff*.ost_server_uuid in 40 secs
CMD: trevis-77vm1.trevis.whamcloud.com lctl get_param -n at_min
can't get osc.lustre-OST0005-osc-ffff*.ost_server_uuid in 40 secs
CMD: trevis-77vm1.trevis.whamcloud.com lctl get_param -n at_min
can't get osc.lustre-OST0006-osc-ffff*.ost_server_uuid in 40 secs
CMD: trevis-55vm11 lctl set_param -n fail_loc=0 2>/dev/null || true
CMD: trevis-77vm1.trevis.whamcloud.com lctl get_param -n at_min
can't get osc.lustre-OST0000-osc-ffff*.ost_server_uuid in 40 secs
test_34 returned 1
FAIL 34 (353s)
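
The repeated "can't get ... in 40 secs" lines are consistent with a poll-until-timeout loop that never sees the state it is waiting for. The following is a minimal sketch of that pattern, not the actual test-framework code: the function name wait_for_state and its argument order are hypothetical, and `echo FULL` stands in for a real `lctl get_param` call. If the command prints nothing (for example, because a parameter glob matches no device), the loop simply runs until the timeout expires.

```shell
#!/bin/bash
# Hypothetical helper: poll a command until its output contains the expected
# state, or give up after MAXTIME seconds.
# Usage: wait_for_state EXPECTED MAXTIME CMD [ARGS...]
wait_for_state() {
    local expected=$1 maxtime=$2
    shift 2
    local elapsed=0 current

    while (( elapsed < maxtime )); do
        # Errors from the command are discarded; empty output never matches.
        current=$("$@" 2>/dev/null)
        if [[ $current == *"$expected"* ]]; then
            return 0
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done

    echo "can't get $* in $maxtime secs"
    return 1
}

# Stand-in command for illustration only.
wait_for_state FULL 3 echo "FULL" && echo "import ready"
```

With a command that never reports the expected state, the helper returns 1 after the timeout, and a caller that propagates that status would fail the test.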

When sanityn test 34 succeeds, we should see client 2 checking that each OST is in the IDLE state. When this test fails, we don't see any activity on the console of client 2 (vm2):

 [10987.532565] Lustre: DEBUG MARKER: /usr/sbin/lctl mark == sanityn test 34: no lock timeout under IO ========================================================= 04:24:20 \(1555993460\)
[10987.714805] Lustre: DEBUG MARKER: == sanityn test 34: no lock timeout under IO ========================================================= 04:24:20 

The OSS console log shows no errors beyond those seen when sanityn test 34 succeeds.

Logs for other failures are at:
https://testing.whamcloud.com/test_sets/07ef1416-2970-11e9-b901-52540065bddc
https://testing.whamcloud.com/test_sets/052e8140-2555-11e9-830a-52540065bddc



 Comments   
Comment by Andreas Dilger [ 30/Apr/19 ]

This looks like a test script problem: the script checks for -ffff in the OSC name, which is x86-specific.
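
To illustrate the point, here is a small sketch of how a hard-coded `ffff*` glob behaves against two device names. The names are made up for illustration: the hex suffixes are assumed to be kernel addresses, with an `ffff...` prefix as typically seen on x86_64 and a `c000...` prefix as typically seen on ppc64. The `[0-9a-f]*` alternative is only one possible architecture-neutral pattern and is not necessarily what the landed patch uses.

```shell
#!/bin/bash
# Hypothetical OSC device names; the hex suffix stands in for a kernel address.
x86_name="lustre-OST0000-osc-ffff88007a5c3800"
ppc_name="lustre-OST0000-osc-c0000000f1e2d000"

# Return 0 if NAME matches the shell glob PATTERN.
matches() {
    local name=$1 pattern=$2
    # The unquoted right-hand side of == inside [[ ]] is treated as a glob.
    [[ $name == $pattern ]]
}

for name in "$x86_name" "$ppc_name"; do
    for pattern in 'lustre-OST0000-osc-ffff*' 'lustre-OST0000-osc-[0-9a-f]*'; do
        if matches "$name" "$pattern"; then
            echo "$name matches $pattern"
        else
            echo "$name does NOT match $pattern"
        fi
    done
done
```

Under these assumptions, the `ffff*` pattern matches only the x86-style name, while the hex-class pattern matches both.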

Comment by James A Simmons [ 22/Jun/19 ]

Is this still true now that https://review.whamcloud.com/33894 landed?

Comment by James Nunez (Inactive) [ 24/Jun/19 ]

sanityn test 34 hasn't failed for ppc clients since May 1, 2019. As James pointed out, it looks like https://review.whamcloud.com/#/c/33894/ fixed this issue.

Generated at Sat Feb 10 02:50:55 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.