[LU-4458] Interop 2.5.0<->2.6 failure on test suite recovery-small test_9 Created: 08/Jan/14 Updated: 19/Jan/15 Resolved: 19/Jan/15 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.6.0, Lustre 2.7.0 |
| Fix Version/s: | Lustre 2.7.0, Lustre 2.5.4 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Maloo | Assignee: | Mikhail Pershin |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | HB | ||
| Environment: |
server: lustre-master build # 1823 RHEL6 ldiskfs |
||
| Severity: | 3 | ||||||||
| Rank (Obsolete): | 12221 | ||||||||
| Description |
|
This issue was created by maloo for sarah <sarah@whamcloud.com>

This issue relates to the following test suite run: http://maloo.whamcloud.com/test_sets/3ba4558e-77f3-11e3-a6a3-52540035b04c.

The sub-test test_9 failed with the following error:

Found D process on OST:

22:51:46:Lustre: DEBUG MARKER: == recovery-small test 9: pause bulk on OST (bug 1420) == 22:48:54 (1389077334)
22:51:47:Lustre: DEBUG MARKER: lctl set_param fail_loc=0x214
22:51:47:LustreError: 2046:0:(fail.c:133:__cfs_fail_timeout_set()) cfs_fail_timeout id 214 sleeping for 20000000ms
22:51:47:INFO: task ll_ost_io00_002:2046 blocked for more than 120 seconds.
22:51:47:"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
22:51:47:ll_ost_io00_0 D 0000000000000001 0 2046 2 0x00000080
22:51:47: ffff8802f8759a60 0000000000000046 ffff8802f8759ac0 0000000016734040
22:57:44: ffffffffa0566ab0 ffff880316255389 0000004e359ea090 ffffffffa053c044
22:57:44: ffff8803167345f8 ffff8802f8759fd8 000000000000fb88 ffff8803167345f8
22:57:44:Call Trace:
22:57:44: [<ffffffff8150f3f2>] schedule_timeout+0x192/0x2e0
22:57:44: [<ffffffff810811e0>] ? process_timeout+0x0/0x10
22:57:45: [<ffffffffa0520d0f>] __cfs_fail_timeout_set+0xcf/0x150 [libcfs]
22:57:45: [<ffffffffa0eaaec9>] cfs_fail_timeout_set.clone.2+0x29/0x30 [ptlrpc]
22:57:45: [<ffffffffa0eae94b>] tgt_brw_write+0x34b/0x1550 [ptlrpc]
22:57:45: [<ffffffffa0525921>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
22:57:45: [<ffffffffa0eb0fea>] tgt_handle_request0+0x2ea/0x1490 [ptlrpc]
22:57:45: [<ffffffffa0525921>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
22:57:45: [<ffffffffa0eb25ca>] tgt_request_handle+0x43a/0x980 [ptlrpc]
22:57:45: [<ffffffffa0e65725>] ptlrpc_main+0xd25/0x1970 [ptlrpc]
22:57:45: [<ffffffffa0e64a00>] ? ptlrpc_main+0x0/0x1970 [ptlrpc]
22:57:46: [<ffffffff81096a36>] kthread+0x96/0xa0
22:57:46: [<ffffffff8100c0ca>] child_rip+0xa/0x20
22:57:46: [<ffffffff810969a0>] ? kthread+0x0/0xa0
22:57:46: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
22:57:46:INFO: task ll_ost_io00_002:2046 blocked for more than 120 seconds.
22:57:46:"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
22:57:46:ll_ost_io00_0 D 0000000000000001 0 2046 2 0x00000080
22:57:46: ffff8802f8759a60 0000000000000046 ffff8802f8759ac0 0000000016734040
22:57:46: ffffffffa0566ab0 ffff880316255389 0000004e359ea090 ffffffffa053c044
22:57:46: ffff8803167345f8 ffff8802f8759fd8 000000000000fb88 ffff8803167345f8
22:57:46:Call Trace: |
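For triage, it may help to compare the sleep duration reported by `cfs_fail_timeout` with the kernel's hung-task threshold. The log above reports a 20000000 ms sleep, which is 20000 seconds (roughly 5.5 hours), against a 120-second threshold, so the hung-task report on the `ll_ost_io` thread appears consistent with the thread simply sleeping inside the fail-injection path. The following is a minimal sketch, not part of the Lustre test framework: the `summarize` helper, the regular expressions, and the returned field names are all hypothetical and chosen only to match the two message formats seen in this ticket.

```python
import re

# Hypothetical patterns, written only against the message formats quoted in this ticket.
SLEEP_RE = re.compile(
    r"cfs_fail_timeout id (?P<id>[0-9a-fA-F]+) sleeping for (?P<ms>\d+)ms"
)
HUNG_TASK_RE = re.compile(r"blocked for more than (?P<secs>\d+) seconds")


def summarize(log_text):
    """Return the injected sleep and hung-task threshold found in log_text.

    Returns None if either message is missing from the log.
    """
    sleep = SLEEP_RE.search(log_text)
    hung = HUNG_TASK_RE.search(log_text)
    if sleep is None or hung is None:
        return None
    sleep_secs = int(sleep.group("ms")) / 1000.0
    threshold_secs = int(hung.group("secs"))
    return {
        "fail_id": sleep.group("id"),
        "sleep_secs": sleep_secs,
        "hung_task_threshold_secs": threshold_secs,
        "exceeds_threshold": sleep_secs > threshold_secs,
    }


# Two lines copied from the console log in this ticket.
SAMPLE = (
    "22:51:47:LustreError: 2046:0:(fail.c:133:__cfs_fail_timeout_set()) "
    "cfs_fail_timeout id 214 sleeping for 20000000ms\n"
    "22:51:47:INFO: task ll_ost_io00_002:2046 blocked for more than 120 seconds.\n"
)

if __name__ == "__main__":
    print(summarize(SAMPLE))
```

A sleep that exceeds the threshold by this margin suggests checking how the timeout value is interpreted on each side of the interop pair, although the log alone does not establish the cause.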
| Comments |
| Comment by Jodi Levi (Inactive) [ 15/Jan/14 ] |
|
Mike, |
| Comment by Mikhail Pershin [ 11/Apr/14 ] |
|
The pause_bulk() function was changed in 2.6, and this now affects compatibility with 2.5. |
| Comment by Mikhail Pershin [ 21/Apr/14 ] |
|
The |
| Comment by Andreas Dilger [ 14/Nov/14 ] |
|
This test is still failing on average once or twice a day. If these failures are a different bug, then this one should be closed and a new one opened. |
| Comment by Gerrit Updater [ 30/Dec/14 ] |
|
Mike Pershin (mike.pershin@intel.com) uploaded a new patch: http://review.whamcloud.com/13205 |
| Comment by Gerrit Updater [ 16/Jan/15 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/13205/ |
| Comment by Jodi Levi (Inactive) [ 19/Jan/15 ] |
|
Patch landed to Master. |