[LU-5128] ASSERTION( atomic_read(&obd->obd_req_replay_clients) == 0 ) failed Created: 01/Jun/14 Updated: 18/Aug/14 Resolved: 06/Aug/14 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.4.3 |
| Fix Version/s: | Lustre 2.7.0, Lustre 2.5.3 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Shuichi Ihara (Inactive) | Assignee: | Hongchao Zhang |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | duu, mn4 | ||
| Environment: |
Lustre-2.4.3 |
||
| Severity: | 3 |
| Rank (Obsolete): | 14153 |
| Description |
|
The MDS failed over, and once MDS recovery finished, many OSSes crashed due to the following ASSERTION.

2014-05-30 17:39:07 Lustre: Skipped 3 previous similar messages
2014-05-30 17:39:07 LustreError: 18967:0:(ldlm_lib.c:1851:target_next_replay_req()) ASSERTION( atomic_read(&obd->obd_req_replay_clients) == 0 ) failed:
2014-05-30 17:39:07 LustreError: 18967:0:(ldlm_lib.c:1851:target_next_replay_req()) LBUG
2014-05-30 17:39:07 Pid: 18967, comm: tgt_recov
2014-05-30 17:39:07
2014-05-30 17:39:07 Call Trace:
2014-05-30 17:39:07 [<ffffffffa0353895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
2014-05-30 17:39:07 [<ffffffffa0353e97>] lbug_with_loc+0x47/0xb0 [libcfs]
2014-05-30 17:39:07 [<ffffffffa066f48c>] target_recovery_thread+0x14ac/0x1970 [ptlrpc]
2014-05-30 17:39:07 [<ffffffffa066dfe0>] ? target_recovery_thread+0x0/0x1970 [ptlrpc]
2014-05-30 17:39:07 [<ffffffff8100c0ca>] child_rip+0xa/0x20
2014-05-30 17:39:07 [<ffffffffa066dfe0>] ? target_recovery_thread+0x0/0x1970 [ptlrpc]
2014-05-30 17:39:07 [<ffffffffa066dfe0>] ? target_recovery_thread+0x0/0x1970 [ptlrpc]
2014-05-30 17:39:07 [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
2014-05-30 17:39:07
2014-05-30 17:39:07 Kernel panic - not syncing: LBUG
2014-05-30 17:39:07 Pid: 18967, comm: tgt_recov Not tainted 2.6.32-358.18.1.el6_lustre.x86_64 #1
2014-05-30 17:39:07 Call Trace:
2014-05-30 17:39:07 [<ffffffff8150de58>] ? panic+0xa7/0x16f
2014-05-30 17:39:07 [<ffffffffa0353eeb>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
2014-05-30 17:39:07 [<ffffffffa066f48c>] ? target_recovery_thread+0x14ac/0x1970 [ptlrpc]
2014-05-30 17:39:07 [<ffffffffa066dfe0>] ? target_recovery_thread+0x0/0x1970 [ptlrpc]
2014-05-30 17:39:07 [<ffffffff8100c0ca>] ? child_rip+0xa/0x20
2014-05-30 17:39:07 [<ffffffffa066dfe0>] ? target_recovery_thread+0x0/0x1970 [ptlrpc]
2014-05-30 17:39:07 [<ffffffffa066dfe0>] ? target_recovery_thread+0x0/0x1970 [ptlrpc]
2014-05-30 17:39:07 [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
|
| Comments |
| Comment by Peter Jones [ 02/Jun/14 ] |
|
Hongchao, could you please advise on this one? Thanks, Peter |
| Comment by Hongchao Zhang [ 06/Jun/14 ] |
|
Hi, could you please attach the complete logs for this issue? Thanks! |
| Comment by Hongchao Zhang [ 06/Jun/14 ] |
|
There could be a race between "target_process_req_flags" and "class_export_recovery_cleanup", and if the replay request contains the flag
The patch against b2_4 is tracked at http://review.whamcloud.com/#/c/10628/ |
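The comment above does not spell out the mechanism, so the following is only a sketch of one common shape such a race can take: two code paths each test a per-client flag and then decrement a shared counter, with the test not serialized against the update. All names here (export_sketch, target_sketch, replay_needed, replay_clients, and the helper functions) are hypothetical stand-ins and are not the Lustre code; the interleaving is simulated deterministically in a single thread rather than with real threads.

```c
#include <stdbool.h>

/* Hypothetical stand-in for per-client (export) state. */
struct export_sketch {
    bool replay_needed;   /* assumed flag: client still counted as replaying */
};

/* Hypothetical stand-in for per-target state. */
struct target_sketch {
    int replay_clients;   /* models a counter like obd_req_replay_clients */
};

/* Step 1 of either path: test the flag. */
static bool path_checks_flag(const struct export_sketch *exp)
{
    return exp->replay_needed;
}

/* Step 2 of either path: clear the flag and drop the counter. */
static void path_clears_and_decrements(struct export_sketch *exp,
                                       struct target_sketch *tgt)
{
    exp->replay_needed = false;
    tgt->replay_clients--;
}

/*
 * Racy interleaving: both paths pass the check before either one
 * clears the flag, so the counter is decremented twice.
 */
int racy_interleaving(void)
{
    struct export_sketch exp = { .replay_needed = true };
    struct target_sketch tgt = { .replay_clients = 1 };

    bool a_sees = path_checks_flag(&exp);   /* path A: request-flag handling */
    bool b_sees = path_checks_flag(&exp);   /* path B: export cleanup */

    if (a_sees)
        path_clears_and_decrements(&exp, &tgt);
    if (b_sees)
        path_clears_and_decrements(&exp, &tgt);

    return tgt.replay_clients;
}

/*
 * Serialized variant: each path's check and update run as one step,
 * as they would if both were done under the same lock.
 */
int serialized_interleaving(void)
{
    struct export_sketch exp = { .replay_needed = true };
    struct target_sketch tgt = { .replay_clients = 1 };

    if (path_checks_flag(&exp))
        path_clears_and_decrements(&exp, &tgt);
    if (path_checks_flag(&exp))
        path_clears_and_decrements(&exp, &tgt);

    return tgt.replay_clients;
}
```

In this sketch racy_interleaving() leaves the counter at -1 while serialized_interleaving() leaves it at 0. A counter that is not exactly 0 at the end of the replay phase is the condition the failed ASSERTION checks for; whether the real bug follows this exact pattern should be confirmed against the patches linked in this ticket.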
| Comment by Shuichi Ihara (Inactive) [ 24/Jun/14 ] |
|
Does this only happen on the b2_4 branch, or can the same problem also occur on b2_5? |
| Comment by Hongchao Zhang [ 25/Jun/14 ] |
|
The issue tracked at http://review.whamcloud.com/#/c/10628/ also exists on b2_5. |
| Comment by Hongchao Zhang [ 26/Jun/14 ] |
|
The patch against master is tracked at http://review.whamcloud.com/#/c/10849/ |
| Comment by wu libin (Inactive) [ 15/Jul/14 ] |
|
Here is the patch for b2_5: http://review.whamcloud.com/#/c/11102/ |
| Comment by Peter Jones [ 06/Aug/14 ] |
|
Landed for 2.7 |