[LU-2135] Test failure on test suite sanityn, subtest test_34
Created: 10/Oct/12
Updated: 10/Dec/17
Resolved: 07/May/15

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.3.0, Lustre 2.4.0
Fix Version/s: None

Type: Bug
Priority: Minor
Reporter: Maloo
Assignee: WC Triage
Resolution: Cannot Reproduce
Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 5137

 Description   

This issue was created by maloo for yujian <yujian@whamcloud.com>

This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/d2dceea4-12a8-11e2-bd97-52540035b04c.

Lustre Tag: v2_3_0_RC2
Lustre Build: http://build.whamcloud.com/job/lustre-b2_3/32
Distro/Arch: RHEL6.3/x86_64

The sub-test test_34 failed with the following error:

test failed to respond and timed out

Info required for matching: sanityn 34

The suite_stdout log showed that:

02:30:22:== sanityn test 33a: commit on sharing, cross crete/delete, 2 clients, benchmark == 02:30:18 (1349775018)
<~snip~>
03:06:27:=== START createmany old: 0 transaction
03:06:27:CMD: client-27vm2.lab.whamcloud.com,client-27vm1 createmany -o /mnt/lustre/d0.sanityn/d33-\$(hostname)-3/f- -r /mnt/lustre2/d0.sanityn/d33-\$(hostname)-3/f- 10000 > /dev/null 2>&1
03:30:43:********** Timeout by autotest system **********
03:32:05:CMD: client-27vm7 lctl get_param -n osd*.lustre-MDT0000.mntdev

However, the suite_log showed that test 33a passed and that test 34 was the one that hung:

PASS 33a (3700s)

== sanityn test 34: no lock timeout under IO == 03:31:57 (1349778717)
CMD: client-27vm8 lctl get_param -n ldlm.namespaces.filter-*.lock_timeouts
CMD: client-27vm8 lctl set_param fail_loc=0x512
fail_loc=0x512
CMD: client-27vm8 lctl set_param fail_loc=0x512
fail_loc=0x512
CMD: client-27vm8 lctl set_param fail_loc=0x512
fail_loc=0x512
CMD: client-27vm8 lctl set_param fail_loc=0x512
fail_loc=0x512
CMD: client-27vm8 lctl set_param fail_loc=0x512
fail_loc=0x512
CMD: client-27vm8 lctl set_param fail_loc=0x512
fail_loc=0x512
CMD: client-27vm8 lctl set_param fail_loc=0x512
fail_loc=0x512
lock should not expire
writing on client1
reading on client2
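
The timestamps in the two logs are consistent with that reading. As a quick check, the sketch below adds the reported 3700 s duration of test 33a to its start epoch and compares the result with the start epoch of test 34. All three values are copied from the log banners above; the function name is illustrative and not part of the test framework.

```python
# Values copied from the logs quoted above.
TEST_33A_START = 1349775018   # epoch printed in the "sanityn test 33a" banner
TEST_33A_DURATION = 3700      # seconds, from "PASS 33a (3700s)"
TEST_34_START = 1349778717    # epoch printed in the "sanityn test 34" banner


def gap_after_33a(start_33a: int, duration_33a: int, start_34: int) -> int:
    """Return the seconds between the computed end of 33a and the start of 34."""
    return start_34 - (start_33a + duration_33a)


gap = gap_after_33a(TEST_33A_START, TEST_33A_DURATION, TEST_34_START)
print(f"gap between end of 33a and start of 34: {gap} s")
```

The gap is -1 s, which is within the rounding of the reported duration, so test 34 started as soon as 33a finished and the autotest timeout fell inside test 34.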


 Comments   
Comment by Andreas Dilger [ 17/Dec/12 ]

Possibly a duplicate: https://maloo.whamcloud.com/test_sessions/abb408b6-475b-11e2-876e-52540035b04c

Comment by Keith Mannthey (Inactive) [ 08/Feb/13 ]

Another possible dup on Master: https://maloo.whamcloud.com/sub_tests/9bcb6ab2-7234-11e2-9936-52540035b04c

Comment by Nikitas Angelinas [ 27/Mar/13 ]

Some possible duplicates from testing with unlanded NRS patches on master:
https://maloo.whamcloud.com/test_sessions/b65824aa-9607-11e2-8c64-52540035b04c
https://maloo.whamcloud.com/test_sessions/21c4eeb8-9611-11e2-8c64-52540035b04c
https://maloo.whamcloud.com/test_sessions/cd1e4cc8-9620-11e2-9abb-52540035b04c
https://maloo.whamcloud.com/test_sessions/8246a25e-9666-11e2-9abb-52540035b04c
https://maloo.whamcloud.com/test_sessions/bf8a5e06-961d-11e2-9abb-52540035b04c
https://maloo.whamcloud.com/test_sessions/eb3c2e2a-9650-11e2-9abb-52540035b04c

Comment by Niu Yawei (Inactive) [ 01/Apr/13 ]

1.8.9 client -> 2.4 server Interop test: https://maloo.whamcloud.com/test_sets/25418d26-948b-11e2-93c6-52540035b04c.

Comment by Li Wei (Inactive) [ 15/Apr/13 ]

https://maloo.whamcloud.com/test_sets/12adf294-a18d-11e2-8fc0-52540035b04c

Comment by Andreas Dilger [ 18/Dec/14 ]

https://testing.hpdd.intel.com/test_sets/3cf1c250-8616-11e4-b909-5254006e85c2

Comment by Andreas Dilger [ 07/May/15 ]

Haven't seen this for a long time.

Comment by Andreas Dilger [ 10/Dec/17 ]

test_34 FAIL 2017-12-07 09:25:59 32 1 lock timeout happened   gerrit:30146, jira:LU-5152
test_34 FAIL 2017-12-01 08:01:28 31 1 lock timeout happened   gerrit:26930, jira:LU-8856
test_34 FAIL 2017-11-29 08:23:15 32 1 lock timeout happened   gerrit:28834, jira:LU-10193
test_34 FAIL 2017-11-06 16:32:03 23 1 lock timeout happened   gerrit:28846, jira:LU-9899
test_34 FAIL 2017-10-03 01:01:52 50 1 no lock timeout happened   gerrit:29295, jira:LU-9019
test_34 FAIL 2017-09-03 16:14:00 49 1 Error in dmesg detected
test_34 FAIL 2017-08-26 14:43:49 51 1 Error in dmesg detected
test_34 FAIL 2017-08-22 14:45:04 47 1 Error in dmesg detected
test_34 FAIL 2017-08-21 15:14:58 51 1 Error in dmesg detected
test_34 FAIL 2017-08-03 17:29:37 43 1 test_34 returned 1   gerrit:16682, jira:LU-7236
test_34 FAIL 2017-07-27 12:57:24 43 1 test_34 returned 1   gerrit:16682, jira:LU-7236
test_34 FAIL 2017-05-12 20:26:52 56 1 Error in dmesg detected
test_34 FAIL 2017-05-02 12:14:02 710 1 lock timeout happened   gerrit:26752, jira:LU-9372
test_34 FAIL 2017-04-25 17:45:35 55 1 Error in dmesg detected

This test is still failing once every couple of weeks.
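
To put a rough number on that rate, the sketch below tallies the failure list above by reported error and computes the mean interval between failures. The dates and messages are transcribed from the list; the `summarize` helper is hypothetical and written only for this illustration.

```python
from collections import Counter
from datetime import date

# (date, reported error) pairs transcribed from the failure list above.
FAILURES = [
    ("2017-12-07", "lock timeout happened"),
    ("2017-12-01", "lock timeout happened"),
    ("2017-11-29", "lock timeout happened"),
    ("2017-11-06", "lock timeout happened"),
    ("2017-10-03", "no lock timeout happened"),
    ("2017-09-03", "Error in dmesg detected"),
    ("2017-08-26", "Error in dmesg detected"),
    ("2017-08-22", "Error in dmesg detected"),
    ("2017-08-21", "Error in dmesg detected"),
    ("2017-08-03", "test_34 returned 1"),
    ("2017-07-27", "test_34 returned 1"),
    ("2017-05-12", "Error in dmesg detected"),
    ("2017-05-02", "lock timeout happened"),
    ("2017-04-25", "Error in dmesg detected"),
]


def summarize(failures):
    """Count failures per error message and compute the mean gap in days."""
    by_message = Counter(message for _, message in failures)
    days = sorted(date.fromisoformat(day) for day, _ in failures)
    span_days = (days[-1] - days[0]).days
    mean_gap = span_days / (len(days) - 1)
    return by_message, mean_gap


by_message, mean_gap = summarize(FAILURES)
for message, count in by_message.most_common():
    print(f"{count:2d}  {message}")
print(f"mean gap between failures: {mean_gap:.1f} days")
```

Over the 226 days covered by the list, the 14 failures average out to one roughly every 17 days. The failures are also not uniform: they split between "Error in dmesg detected", "lock timeout happened", "test_34 returned 1" and a single "no lock timeout happened", so they may not all share one cause.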

Generated at Sat Feb 10 01:22:39 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.