[LU-3618] osp_sync.c:324:osp_sync_request_commit_cb()) ASSERTION( list_empty(&req->rq_exp_list) ) failed
Created: 23/Jul/13  Updated: 14/Feb/14  Resolved: 11/Feb/14

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.5.0
Fix Version/s: Lustre 2.6.0, Lustre 2.5.1

Type: Bug
Priority: Major
Reporter: Oleg Drokin
Assignee: Hongchao Zhang
Resolution: Fixed
Votes: 0
Labels: None

Issue Links:
Duplicate
Severity: 3
Rank (Obsolete): 9310

 Description   

Hit this while running replay-single test 61a:

<0>[604824.463203] LustreError: 30555:0:(osp_sync.c:324:osp_sync_request_commit_cb()) ASSERTION( list_empty(&req->rq_exp_list) ) failed: 
<0>[604824.464322] LustreError: 30555:0:(osp_sync.c:324:osp_sync_request_commit_cb()) LBUG
<4>[604824.481997] Pid: 30555, comm: ptlrpcd_rcv
<4>[604824.482492] 
<4>[604824.482493] Call Trace:
<4>[604824.483471]  [<ffffffffa0e068a5>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
<4>[604824.484216]  [<ffffffffa0e06ea7>] lbug_with_loc+0x47/0xb0 [libcfs]
<4>[604824.485725]  [<ffffffffa0858b15>] osp_sync_request_commit_cb+0x155/0x1c0 [osp]
<4>[604824.486917]  [<ffffffffa125c7ab>] ptlrpc_free_committed+0x14b/0x620 [ptlrpc]
<4>[604824.487688]  [<ffffffffa125e523>] after_reply+0x7a3/0xd90 [ptlrpc]
<4>[604824.492014]  [<ffffffffa1263463>] ptlrpc_check_set+0x1093/0x1da0 [ptlrpc]
<4>[604824.492801]  [<ffffffffa128fa8b>] ptlrpcd_check+0x55b/0x590 [ptlrpc]
<4>[604824.493498]  [<ffffffffa128ffd3>] ptlrpcd+0x233/0x390 [ptlrpc]
<4>[604824.494131]  [<ffffffff8105ad10>] ? default_wake_function+0x0/0x20
<4>[604824.494817]  [<ffffffffa128fda0>] ? ptlrpcd+0x0/0x390 [ptlrpc]
<4>[604824.506292]  [<ffffffff81094606>] kthread+0x96/0xa0
<4>[604824.506922]  [<ffffffff8100c10a>] child_rip+0xa/0x20
<4>[604824.507515]  [<ffffffff81094570>] ? kthread+0x0/0xa0
<4>[604824.508122]  [<ffffffff8100c100>] ? child_rip+0x0/0x20
<4>[604824.508745] 
<0>[604824.547194] Kernel panic - not syncing: LBUG

This looks somewhat similar to ORI-634, but this time I have a crashdump and matching modules in
/exports/crashdumps/192.168.10.221-2013-07-22-02\:04\:40
Source tag in my tree: master-20130722



 Comments   
Comment by Liang Zhen (Inactive) [ 12/Jan/14 ]

I hit this too: https://maloo.whamcloud.com/test_sets/040d0c90-7b60-11e3-a66e-52540035b04c
It seems to me this happens because req::rq_commit_cb is called twice
when a request that is already on imp::imp_replay_list is replayed and gets a reply:

after_reply()
                } else if (req->rq_commit_cb != NULL &&
                           cfs_list_empty(&req->rq_replay_list)) {
                        spin_unlock(&imp->imp_lock);
                        req->rq_commit_cb(req); // called the first time 
                        spin_lock(&imp->imp_lock);
                }
                ......
                // because the request is already on imp_replay_list, rq_commit_cb will
                // be called again
                ptlrpc_free_committed(imp);  

Patch is here: http://review.whamcloud.com/#/c/8815/
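
To make the double invocation easier to see, here is a minimal standalone sketch of the sequence described above. It is not Lustre code: struct mock_request, mock_commit_cb(), mock_free_committed() and mock_after_reply() are hypothetical stand-ins, and the "guard" flag only models a check like cfs_list_empty(&req->rq_replay_list) in the reply path. Whether this matches the patch as landed is an assumption; the ticket does not show the diff.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct ptlrpc_request. */
struct mock_request {
	bool on_replay_list;  /* models !list_empty(&req->rq_replay_list) */
	int  commit_cb_calls; /* how many times the commit callback ran */
	void (*commit_cb)(struct mock_request *req);
};

/* Models a commit callback that is only safe to run once per request. */
static void mock_commit_cb(struct mock_request *req)
{
	req->commit_cb_calls++;
}

/* Models the free-committed pass: it fires the callback for a request
 * that is still on the replay list, then drops it from that list. */
static void mock_free_committed(struct mock_request *req)
{
	if (req->on_replay_list && req->commit_cb != NULL) {
		req->commit_cb(req);
		req->on_replay_list = false;
	}
}

/* Models the reply path. With guard == false the callback is fired
 * regardless of replay-list membership, so a request that is on the
 * replay list gets a second call from mock_free_committed().
 * Returns the total number of callback invocations. */
static int mock_after_reply(struct mock_request *req, bool guard)
{
	if (req->commit_cb != NULL && (!guard || !req->on_replay_list))
		req->commit_cb(req);

	mock_free_committed(req);
	return req->commit_cb_calls;
}
```

In this model a request that is on the replay list sees the callback twice when the reply path does not check the list, and exactly once when it does. A request that is not on the replay list sees it once either way.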

Comment by Peter Jones [ 11/Feb/14 ]

Landed for 2.6
