[LU-1793] Test failure on test suite replay-single, subtest test_44a Created: 27/Aug/12 Updated: 21/Dec/12 Resolved: 20/Dec/12 |
|
| Status: | Closed |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.3.0, Lustre 2.4.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Maloo | Assignee: | Hongchao Zhang |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | LB |
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 4081 |
| Description |
|
This issue was created by maloo for sarah <sarah@whamcloud.com>. It relates to the following test suite run: https://maloo.whamcloud.com/test_sets/c0bd39b0-ee84-11e1-9426-52540035b04c. The sub-test test_44a failed with the following error:
== replay-single test 44a: race in target handle connect ============================================= 21:37:30 (1345783050)
3 13=3 13
replay-single test_44a: @@@@@@ FAIL: test_44a failed with 3
Trace dump:
= /usr/lib64/lustre/tests/test-framework.sh:3638:error_noexit()
= /usr/lib64/lustre/tests/test-framework.sh:3660:error()
= /usr/lib64/lustre/tests/test-framework.sh:3893:run_one()
= /usr/lib64/lustre/tests/test-framework.sh:3922:run_one_logged()
= /usr/lib64/lustre/tests/test-framework.sh:3796:run_test()
= /usr/lib64/lustre/tests/replay-single.sh:922:main()
Dumping lctl log to /logdir/test_logs/2012-08-23/lustre-b2_3-el6-x86_64-el5-x86_64__3__-7fad99b000d8/replay-single.test_44a.*.1345783052.log
CMD: client-11vm1.lab.whamcloud.com,client-11vm2,client-11vm3,client-11vm4 /usr/sbin/lctl dk > /logdir/test_logs/2012-08-23/lustre-b2_3-el6-x86_64-el5-x86_64__3__-7fad99b000d8/replay-single.test_44a.debug_log.\$(hostname -s).1345783052.log;
dmesg > /logdir/test_logs/2012-08-23/lustre-b2_3-el6-x86_64-el5-x86_64__3__-7fad99b000d8/replay-single.test_44a.dmesg.\$(hostname -s).1345783052.log
|
| Comments |
| Comment by Sarah Liu [ 10/Sep/12 ] |
|
another failure: https://maloo.whamcloud.com/test_sets/68aa2ccc-f683-11e1-8eb0-52540035b04c |
| Comment by Sarah Liu [ 24/Sep/12 ] |
|
server/client lustre-b2_3-RC1 RHEL6 https://maloo.whamcloud.com/test_sets/b0389fb2-0653-11e2-9b17-52540035b04c |
| Comment by Peter Jones [ 25/Sep/12 ] |
|
Hongchao, could you please comment on this one? Thanks, Peter |
| Comment by Hongchao Zhang [ 25/Sep/12 ] |
|
In test_44a, '` ... more than one 'MDC' was found! We'd better print "$mdcdev" to see what these MDCs are. The patch is tracked at http://review.whamcloud.com/#change,4088 |
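For illustration only, the kind of check described above could be sketched as follows. This is not the code from replay-single.sh or from the patch under review; the helper names `find_mdc_devices` and `check_single_mdc`, the `MDT0000-mdc-` pattern, and the sample listing format are all assumptions made for the example. The listing is read from stdin so the sketch can be tried without a Lustre client.

```shell
#!/bin/sh
# Hypothetical helper (not from test-framework.sh): reads a device listing
# on stdin and prints the first field (the device index) of every line that
# contains the given substring.
find_mdc_devices() {
    pattern=$1
    awk -v pat="$pattern" 'index($0, pat) { print $1 }'
}

# Hypothetical check: succeeds only when exactly one device matches.
# Otherwise it prints what was found, so the log shows which devices
# caused the failure, and returns 1.
check_single_mdc() {
    devs=$(find_mdc_devices "$1" | tr '\n' ' ' | sed 's/ $//')
    # Split the space-separated list into positional parameters to count it.
    set -- $devs
    case $# in
        0) echo "no MDC device found"; return 1 ;;
        1) echo "mdcdev=$devs"; return 0 ;;
        *) echo "expected 1 MDC device, found $#: $devs"; return 1 ;;
    esac
}
```

With a made-up two-device listing, the check fails and names both device indexes:

```shell
printf ' 3 UP mdc lustre-MDT0000-mdc-aaaa uuid 5\n13 UP mdc lustre-MDT0000-mdc-bbbb uuid 5\n' |
    check_single_mdc MDT0000-mdc-
```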
| Comment by Sarah Liu [ 25/Sep/12 ] |
|
lustre-b2_3-RC1 OFED build |
| Comment by Jian Yu [ 10/Oct/12 ] |
|
Lustre Tag: v2_3_0_RC2 The same issue occurred again: https://maloo.whamcloud.com/test_sets/0dd9033e-12aa-11e2-bd97-52540035b04c |
| Comment by Jian Yu [ 10/Oct/12 ] |
|
Lustre Tag: v2_3_0_RC2 This was hit regularly on different distros/archs: |
| Comment by Jodi Levi (Inactive) [ 10/Oct/12 ] |
|
Reducing from blocker per Oleg's comments in 2.3 channel: 1793 is not a blocker - it's a test script or test environment issue; somehow we ended up having 3 mdc devices instead of just one. There was a debug patch somewhere to print the list of devices in that case. |
| Comment by Sarah Liu [ 02/Nov/12 ] |
|
another failure on SLES11 SP2 client: |
| Comment by Peter Jones [ 28/Nov/12 ] |
|
Sarah reports that this consistently fails 100% of the time so it should be possible to debug and fix this test script/env issue now. |
| Comment by Oleg Drokin [ 03/Dec/12 ] |
|
This problem potentially got fixed by |
| Comment by Andreas Dilger [ 03/Dec/12 ] |
|
Peter, I think the bug that is failing 100% of the time is |
| Comment by Peter Jones [ 03/Dec/12 ] |
|
ok - thanks Andreas |
| Comment by Hongchao Zhang [ 20/Dec/12 ] |
|
This test has passed 100% of the time since 28 Nov. Consider dropping its priority? |
| Comment by Jodi Levi (Inactive) [ 20/Dec/12 ] |
|
Duplicate of |