[LU-2293] Assertion triggered in osp_sync_thread Created: 06/Nov/12  Updated: 07/Nov/12  Resolved: 06/Nov/12

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.0
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Prakash Surya (Inactive) Assignee: Alex Zhuravlev
Resolution: Duplicate Votes: 0
Labels: topsequoia

Severity: 3
Rank (Obsolete): 5484

 Description   

Triggered this assertion while bringing up the MDS after a version change:

LustreError: 33030:0:(osp_sync.c:584:osp_sync_process_record()) processed all old entries: 0x3e03:1
LustreError: 33030:0:(osp_sync.c:584:osp_sync_process_record()) Skipped 28 previous similar messages
LustreError: 33027:0:(llog_cat.c:187:llog_cat_id2handle()) lstest-OST01c8-osc-MDT0000: error opening log id 0x5a5a5a5a5a5a5a5a:5a5a5a5a: rc = -2
LustreError: 33027:0:(llog_cat.c:513:llog_cat_cancel_records()) Cannot find log 0x5a5a5a5a5a5a5a5a
LustreError: 33027:0:(llog_cat.c:552:llog_cat_cancel_records()) lstest-OST01c8-osc-MDT0000: fail to cancel 0 of 1 llog-records: rc = -2
LustreError: 33027:0:(osp_sync.c:714:osp_sync_process_committed()) @@@ lstest-OST01c8-osc-MDT0000: can't cancel record: -2
  req@ffff880f7bd3b800 x1417921862573367/t0(0) o6->lstest-OST01c8-osc-MDT0000@172.20.3.56@o2ib500:28/4 lens 664/400 e 0 to 0 dl 1352235775 ref 1 fl Complete:R/0/0 rc 0/-2
LustreError: 33027:0:(llog_cat.c:187:llog_cat_id2handle()) lstest-OST01c8-osc-MDT0000: error opening log id 0x5a5a5a5a5a5a5a5a:5a5a5a5a: rc = -2
LustreError: 33027:0:(llog_cat.c:513:llog_cat_cancel_records()) Cannot find log 0x5a5a5a5a5a5a5a5a
LustreError: 33027:0:(llog_cat.c:552:llog_cat_cancel_records()) lstest-OST01c8-osc-MDT0000: fail to cancel 0 of 1 llog-records: rc = -2
LustreError: 33027:0:(osp_sync.c:714:osp_sync_process_committed()) @@@ lstest-OST01c8-osc-MDT0000: can't cancel record: -2
  req@ffff880fd2d16000 x1417921862573368/t0(0) o6->lstest-OST01c8-osc-MDT0000@172.20.3.56@o2ib500:28/4 lens 664/400 e 0 to 0 dl 1352235775 ref 1 fl Complete:R/0/0 rc 0/-2
LustreError: 33027:0:(osp_sync.c:866:osp_sync_thread()) ASSERTION( rc == 0 || rc == LLOG_PROC_BREAK ) failed: 0 changes, 7 in progress, 7 in flight: -22
LustreError: 33027:0:(osp_sync.c:866:osp_sync_thread()) LBUG
Pid: 33027, comm: osp-syn-456


Call Trace:
 [<ffffffffa05ae965>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
 [<ffffffffa05aef77>] lbug_with_loc+0x47/0xb0 [libcfs]
 [<ffffffffa1006440>] osp_sync_thread+0x630/0x700 [osp]
 [<ffffffffa1005e10>] ? osp_sync_thread+0x0/0x700 [osp]
 [<ffffffff8100c14a>] child_rip+0xa/0x20
 [<ffffffffa1005e10>] ? osp_sync_thread+0x0/0x700 [osp]
 [<ffffffffa1005e10>] ? osp_sync_thread+0x0/0x700 [osp]
 [<ffffffff8100c140>] ? child_rip+0x0/0x20

Kernel panic - not syncing: LBUG
Pid: 33027, comm: osp-syn-456
 Tainted: P        W  ----------------   2.6.32-220.23.1.2chaos.ch5.x86_64 #1
Call Trace:
 [<ffffffff814eea92>] ? panic+0x78/0x143
 [<ffffffffa05aefcb>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
 [<ffffffffa1006440>] ? osp_sync_thread+0x630/0x700 [osp]
 [<ffffffffa1005e10>] ? osp_sync_thread+0x0/0x700 [osp]
 [<ffffffff8100c14a>] ? child_rip+0xa/0x20
 [<ffffffffa1005e10>] ? osp_sync_thread+0x0/0x700 [osp]
 [<ffffffffa1005e10>] ? osp_sync_thread+0x0/0x700 [osp]
 [<ffffffff8100c140>] ? child_rip+0x0/0x20

Lustre Version:

Lustre: Lustre: Build Version: 2.3.54-2chaos-2chaos--PRISTINE-2.6.32-220.23.1.2chaos.ch5.x86_64


 Comments   
Comment by Prakash Surya (Inactive) [ 06/Nov/12 ]

Looks like a duplicate of LU-2109

Comment by Peter Jones [ 06/Nov/12 ]

Alex

Could you please assign someone to this one?

Peter

Comment by Peter Jones [ 06/Nov/12 ]

Ah, our comments crossed.

Comment by Li Wei (Inactive) [ 06/Nov/12 ]

Prakash,

I went to https://github.com/chaos/lustre and could not find the 2.3.54-2chaos-2chaos tag in the branch/tag drop-down list. Was I looking in the wrong place?

Comment by Prakash Surya (Inactive) [ 07/Nov/12 ]

Sorry, it looks like we have not pushed that tag to GitHub yet. Of what is there, this branch corresponds to what we tagged as 2.3.54-2chaos: https://github.com/chaos/lustre/commits/2.3.54-llnl

I did not have these two patches of yours from LU-2109 applied when this hit:

commit 0748ca16b672798ca213b8582979ae5481de19d2
Author: Li Wei <wei.g.li@intel.com>
Date:   Fri Nov 2 15:21:01 2012 +0800

    LU-2109 llog: Diagnostic patch
    
    To hunt down those who free log handles that are still being
    processed.
    
    Change-Id: Ib65c5fb8881cfeeb5cbf5b891ae235b97dde5e82
    Signed-off-by: Li Wei <wei.g.li@intel.com>

commit d0f28b8d78ec86041c79d77e9f423e48f9812c6e
Author: Li Wei <wei.g.li@intel.com>
Date:   Thu Nov 1 21:47:57 2012 +0800

    LU-2109 osp: Tell more when unable to cancel log records
    
    This is to debug LU-2109, but I think it may be useful to be landed to
    master.
    
    Change-Id: I7b487271608eb7ecbd9869c6e44643a463f08416
    Signed-off-by: Li Wei <wei.g.li@intel.com>

But I've pulled them in since hitting this, so they should be there the next time it occurs.

Generated at Sat Feb 10 01:23:59 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.