[LU-3892] osp_sync.c:356:osp_sync_interpret()) ASSERTION( req->rq_transno == 0 ) failed Created: 06/Sep/13 Updated: 10/Oct/14 Resolved: 24/Sep/13 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.5.0 |
| Fix Version/s: | Lustre 2.5.0, Lustre 2.4.2 |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Oleg Drokin | Assignee: | Alex Zhuravlev |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Rank (Obsolete): | 10156 |
| Description |
|
I started to hit this recently running sanity in a loop, in different tests, same crash every time:
<0>[20903.330989] LustreError: 9397:0:(osp_sync.c:356:osp_sync_interpret()) ASSERTION( req->rq_transno == 0 ) failed:
<0>[20903.331969] LustreError: 9397:0:(osp_sync.c:356:osp_sync_interpret()) LBUG
<4>[20903.332470] Pid: 9397, comm: ptlrpcd_2
<4>[20903.332898]
<4>[20903.332899] Call Trace:
<4>[20903.333610] [<ffffffffa0ac78a5>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
<4>[20903.334192] [<ffffffffa0ac7ea7>] lbug_with_loc+0x47/0xb0 [libcfs]
<4>[20903.334705] [<ffffffffa07f1092>] osp_sync_interpret+0x492/0x500 [osp]
<4>[20903.335252] [<ffffffffa127b2aa>] ptlrpc_check_set+0x2ca/0x1da0 [ptlrpc]
<4>[20903.335824] [<ffffffffa12a700b>] ptlrpcd_check+0x55b/0x590 [ptlrpc]
<4>[20903.336580] [<ffffffffa12a7553>] ptlrpcd+0x233/0x390 [ptlrpc]
<4>[20903.337121] [<ffffffff8105ad10>] ? default_wake_function+0x0/0x20
<4>[20903.337649] [<ffffffffa12a7320>] ? ptlrpcd+0x0/0x390 [ptlrpc]
<4>[20903.338147] [<ffffffff81094606>] kthread+0x96/0xa0
<4>[20903.343428] [<ffffffff8100c10a>] child_rip+0xa/0x20
<4>[20903.343939] [<ffffffff81094570>] ? kthread+0x0/0xa0
<4>[20903.344452] [<ffffffff8100c100>] ? child_rip+0x0/0x20
<4>[20903.345028]
<0>[20903.348400] Kernel panic - not syncing: LBUG
Crash and modules: /exports/crashdumps/192.168.10.219-2013-09-05-20\:55\:33/ |
| Comments |
| Comment by Alex Zhuravlev [ 16/Sep/13 ] |
| Comment by Alex Zhuravlev [ 16/Sep/13 ] |
|
hopefully a better approach: http://review.whamcloud.com/#/c/7672/ |
| Comment by Alex Zhuravlev [ 18/Sep/13 ] |
|
I was able to reproduce the issue locally. The last patch should fix the root cause. |
| Comment by Peter Jones [ 24/Sep/13 ] |
|
Landed for 2.5.0 |
| Comment by Lukasz Flis [ 25/Sep/13 ] |
|
Our MDS server panicked today due to this bug. Regards |
| Comment by Lukasz Flis [ 25/Sep/13 ] |
|
Peter, should I report this bug in a new ticket pointing to 2.4 explicitly?
Sep 25 20:56:14 <user.notice> mds01.storage 3450:0:(osp_sync.c:359:osp_sync_interpret()) ASSERTION(
Sep 25 20:56:14 <user.notice> mds01.storage Kernel[]: panic - not syncing: LBUG
It's exactly the same issue. |
| Comment by Peter Jones [ 25/Sep/13 ] |
|
Hi Lukasz,
It's OK. This issue is under consideration for 2.4.2. There is no need to open a new ticket.
Peter |
| Comment by Patrick Farrell (Inactive) [ 08/Oct/14 ] |
|
Was this patched in master as well? Cray saw this issue in 2.6 (
Sorry, please forget that comment. For some reason I thought the original patch was against 2.5. Not sure what I was thinking here. |