[LU-802] Test failure on test suite parallel-scale, subtest test_write_append_truncate Created: 28/Oct/11 Updated: 31/Oct/13 Resolved: 31/Oct/13 |
|
| Status: | Closed |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.2.0, Lustre 2.1.1, Lustre 2.1.2, Lustre 1.8.8, Lustre 2.4.1 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Maloo | Assignee: | Chris Gearing (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Severity: | 3 |
| Rank (Obsolete): | 4130 |
| Description |
|
This issue was created by maloo for Chris Gearing <chris@whamcloud.com>
This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/0219bc62-00f6-11e1-bb4f-52540025f9af. See dmesg on client vm1.
The sub-test test_write_append_truncate failed with the following error:
Info required for matching: parallel-scale write_append_truncate |
| Comments |
| Comment by Jian Yu [ 13/Feb/12 ] |
|
Lustre Tag: v2_1_1_0_RC2
The same issue occurred: https://maloo.whamcloud.com/test_sets/e04b00fe-55d3-11e1-9aa8-5254004bbbd3
The stack trace on Client 1 (client-11vm1) showed that:
12:26:40:write_append_ S 0002b90d 0 23901 23898 0x00000080
12:26:40: f63c5570 00000086 4f381feb 0002b90d 00000000 000003c8 00005d5d 0000a53a
12:26:40: 00000000 c3318200 00002643 00000000 00002643 c0b0f0c0 c0b0f0c0 f63c5818
12:26:40: c0b0f0c0 c0b0aaa4 c0b0f0c0 f63c5818 c69b4000 f98351ed f982663f 0000008f
12:26:40:Call Trace:
12:26:40: [<f980de2e>] ? cl_lock_mutex_put+0x3e/0x80 [obdclass]
12:26:40: [<f980e328>] ? cl_lock_state_wait+0x1d8/0x340 [obdclass]
12:26:40: [<f980b8f5>] ? cl_lock_used_mod+0x25/0x60 [obdclass]
12:26:40: [<f980bbad>] ? cl_lock_user_add+0x4d/0x200 [obdclass]
12:26:40: [<c044d4d0>] ? default_wake_function+0x0/0x10
12:26:40: [<f9813969>] ? cl_enqueue_locked+0x1a9/0x2d0 [obdclass]
12:26:40: [<f93faee6>] ? lov_io_call+0x76/0x250 [lov]
12:26:40: [<f9813d69>] ? cl_lock_request+0x89/0x2d0 [obdclass]
12:26:40: [<f93fccc2>] ? lov_io_lock+0x72/0x260 [lov]
12:26:40: [<f89ec572>] ? vvp_io_init+0xe2/0x380 [lustre]
12:26:40: [<f84c4eaf>] ? libcfs_debug_vmsg2+0x34f/0x820 [libcfs]
12:26:40: [<f981bbe0>] ? cl_io_lock+0x4e0/0xa10 [obdclass]
12:26:40: [<f9814fee>] ? cl_io_iter_init+0x6e/0x240 [obdclass]
12:26:41: [<f981c232>] ? cl_io_loop+0x122/0x2a0 [obdclass]
12:26:41: [<f896497a>] ? ll_file_io_generic+0x41a/0x6c0 [lustre]
12:26:41: [<f9801af6>] ? cl_env_get+0x16/0x480 [obdclass]
12:26:41: [<f84c8ce2>] ? cfs_hash_dual_bd_unlock+0x22/0x50 [libcfs]
12:26:41: [<f84ccc4e>] ? cfs_hash_find_or_add+0x7e/0x160 [libcfs]
12:26:41: [<f8964d41>] ? ll_file_aio_read+0x121/0x4d0 [lustre]
12:26:41: [<c052ee2e>] ? cp_new_stat64+0xee/0x100
12:26:41: [<f8970c65>] ? ll_file_read+0x165/0x440 [lustre]
12:26:41: [<c05a24fc>] ? security_file_permission+0xc/0x10
12:26:41: [<c052a656>] ? rw_verify_area+0x66/0xe0
12:26:41: [<f8970b00>] ? ll_file_read+0x0/0x440 [lustre]
12:26:41: [<c052b01d>] ? vfs_read+0x9d/0x190
12:26:41: [<c04afccc>] ? audit_syscall_entry+0x21c/0x240
12:26:41: [<c052b151>] ? sys_read+0x41/0x70
12:26:41: [<c0409a9f>] ? sysenter_do_call+0x12/0x28 |
| Comment by Peter Jones [ 15/Feb/12 ] |
|
Jinshan, could you please look at this one? Thanks, Peter |
| Comment by Jinshan Xiong (Inactive) [ 15/Feb/12 ] |
|
From the debug log, it seems everything was fine. What timeout was set for this test? Let's try a longer time before timing it out. |
| Comment by Jian Yu [ 16/Feb/12 ] |
|
> From the debug log, it seems everything was fine. What timeout was set for this test? Let's try a longer time before timing it out.
It seems the TIMEOUT value set by the autotest system is 3600. The write_append_truncate test took a very long time (longer than 2000s) to run on the VMs, but less than 10 minutes on the physical nodes with real devices. Per the parallel-scale test script:
#
# write_append_truncate
#
# threads per client
write_THREADS=${write_THREADS:-8}
write_REP=${write_REP:-10000}
[ "$SLOW" = "no" ] && write_REP=100
I think the autotest system could specify smaller values for write_REP/write_THREADS to reduce the run time on the VMs; a reduced run could look like the sketch below.
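As a rough sketch only (assuming the usual Lustre test-framework conventions, where ONLY= selects a single subtest and the write_* variables above are read from the environment; the thread and repetition counts here are illustrative, not tuned recommendations):

# Hypothetical reduced-load run of the write_append_truncate subtest.
# SLOW=yes keeps the write_REP value from the environment (otherwise
# the [ "$SLOW" = "no" ] check above forces write_REP=100); 4 threads
# and 1000 repetitions are example values meant to fit inside the VM
# timeout window.
SLOW=yes write_THREADS=4 write_REP=1000 \
    ONLY=write_append_truncate sh parallel-scale.sh

With SLOW=yes the explicit write_REP value is honoured rather than being capped at 100; with SLOW=no the script itself already reduces write_REP to 100, per the snippet above. |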
| Comment by Jinshan Xiong (Inactive) [ 16/Feb/12 ] |
|
Thank you, yujian. Please let me know if you hit this problem again. |
| Comment by Sarah Liu [ 05/Mar/12 ] |
|
Hit a similar issue: https://maloo.whamcloud.com/test_sets/9ff40214-6701-11e1-a9a4-5254004bbbd3
Build: lustre-master/493/RHEL6-x86_64 |
| Comment by Jian Yu [ 11/May/12 ] |
|
Lustre Build: http://build.whamcloud.com/job/lustre-b1_8/194/
A similar issue occurred: https://maloo.whamcloud.com/test_sets/3c4bbb96-9b3e-11e1-a0a0-52540035b04c |
| Comment by Jian Yu [ 31/May/12 ] |
|
Lustre Tag: v2_1_2_RC2
The same issue occurred: https://maloo.whamcloud.com/test_sets/19c1ae10-aac9-11e1-bd84-52540035b04c |
| Comment by Jian Yu [ 18/Dec/12 ] |
|
Lustre Client: 2.3.0
The same issue occurred: https://maloo.whamcloud.com/test_sets/4f1c6314-4864-11e2-8cdc-52540035b04c |
| Comment by Jian Yu [ 09/Sep/13 ] |
|
The timeout issue keeps occurring while testing Lustre 2.4.1 RC2: the test timed out after 3600s. However, the same test passed on Lustre 2.4.1 RC1 in about 5570s several days ago. Chris, could you please check whether the timeout mechanism in autotest was changed or not? Thanks. The timeout failure prevents the remaining sub-tests in parallel-scale.sh from running, so I have flagged this ticket as a blocker. A rough sketch of the timeout arithmetic follows below.
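As a rough illustration only (this is not how autotest actually enforces its limit, and the ONLY= invocation is just the usual way to run a single subtest locally), a 3600s window applied to a run that needs roughly 5570s will always kill it:

# Illustration only, not autotest's real mechanism: bound the suite to
# 3600s with coreutils timeout. A run needing ~5570s is killed, and
# timeout reports exit status 124.
timeout 3600 env ONLY=write_append_truncate sh parallel-scale.sh
echo "exit status: $?"

So if RC1 passed in about 5570s under autotest, its effective window must have been larger than 3600s, which is why the autotest change needs checking. |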
| Comment by Jian Yu [ 31/Oct/13 ] |
|
The parallel-scale write_append_truncate test passed in the latest full-group test sessions without the timeout issue. Let's close this ticket. If it hits the timeout failure again, please feel free to reopen it. |