[LU-2843] ASSERTION( last_rec->lrh_index == tail->lrt_index ) Created: 20/Feb/13  Updated: 03/Apr/13  Resolved: 03/Apr/13

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.0
Fix Version/s: Lustre 2.4.0

Type: Bug Priority: Blocker
Reporter: Ned Bass Assignee: Hongchao Zhang
Resolution: Fixed Votes: 0
Labels: MB
Environment:

2.6.32-279.9.1.1chaos.ch5.1.x86_64
KVM x86_64 virtual machine


Severity: 3
Rank (Obsolete): 6882

 Description   

Steps leading up to:

  1. create new filesystem
  2. register changelog user
  3. run createmany and unlinkmany in infinite loop until catalog full (LU-1586)
  4. lctl --device lustre-MDT0000 changelog_deregister cl1

Build contains patches for LU-2129 and LU-2102.

LustreError: 30248:0:(llog_osd.c:596:llog_osd_next_block()) ASSERTION( last_rec->lrh_index == tail->lrt_index ) failed: 
LustreError: 30248:0:(llog_osd.c:596:llog_osd_next_block()) LBUG
Pid: 30248, comm: lctl
PID: 30248  TASK: ffff88004bd7e080  CPU: 3   COMMAND: "lctl"
 #0 [ffff88005029b590] machine_kexec at ffffffff8103283b
 #1 [ffff88005029b5f0] crash_kexec at ffffffff810ba492
 #2 [ffff88005029b6c0] panic at ffffffff814fdcf2
 #3 [ffff88005029b740] lbug_with_loc at ffffffffa0fc7fbb [libcfs]
 #4 [ffff88005029b760] llog_osd_next_block at ffffffffa06ba3ed [obdclass]
 #5 [ffff88005029b800] llog_process_thread at ffffffffa068a7f6 [obdclass]
 #6 [ffff88005029b8b0] llog_process_or_fork at ffffffffa068cb0d [obdclass]
 #7 [ffff88005029b900] llog_cat_process_cb at ffffffffa068f282 [obdclass]
 #8 [ffff88005029b950] llog_process_thread at ffffffffa068adfb [obdclass]
 #9 [ffff88005029ba00] llog_process_or_fork at ffffffffa068cb0d [obdclass]
#10 [ffff88005029ba50] llog_cat_process_or_fork at ffffffffa068d5b9 [obdclass]
#11 [ffff88005029bae0] llog_cat_process at ffffffffa068d7c9 [obdclass]
#12 [ffff88005029bb00] llog_changelog_cancel at ffffffffa0c730ff [mdd]
#13 [ffff88005029bb40] llog_cancel at ffffffffa0691538 [obdclass]
#14 [ffff88005029bb80] mdd_changelog_llog_cancel at ffffffffa0c77dae [mdd]
#15 [ffff88005029bbc0] mdd_changelog_user_purge at ffffffffa0c78220 [mdd]
#16 [ffff88005029bc10] mdd_iocontrol at ffffffffa0c78776 [mdd]
#17 [ffff88005029bc70] mdt_ioc_child at ffffffffa0d50129 [mdt]
#18 [ffff88005029bcf0] mdt_iocontrol at ffffffffa0d54608 [mdt]
#19 [ffff88005029bd90] class_handle_ioctl at ffffffffa069ba4f [obdclass]
#20 [ffff88005029be40] obd_class_ioctl at ffffffffa06852ab [obdclass]
#21 [ffff88005029be60] vfs_ioctl at ffffffff8118ddb2
#22 [ffff88005029bea0] do_vfs_ioctl at ffffffff8118df54
#23 [ffff88005029bf30] sys_ioctl at ffffffff8118e4d1
#24 [ffff88005029bf80] system_call_fastpath at ffffffff8100b0f2
    RIP: 0000003ae74dde07  RSP: 00007fff6c48fc00  RFLAGS: 00010246
    RAX: 0000000000000010  RBX: ffffffff8100b0f2  RCX: 0000000000000000
    RDX: 00007fff6c48fc40  RSI: 00000000424066b2  RDI: 0000000000000003
    RBP: 0000000000000001   R8: 0000000000000000   R9: 0000000000000240
    R10: 0000000000000000  R11: 0000000000000246  R12: 00000000424066b2
    R13: 00007fff6c48fc40  R14: 0000000000678380  R15: 0000000000000003
    ORIG_RAX: 0000000000000010  CS: 0033  SS: 002b


 Comments   
Comment by Peter Jones [ 21/Feb/13 ]

Hongchao

Could you please look into this one?

Thanks

Peter

Comment by Mikhail Pershin [ 23/Feb/13 ]

It makes sense to add debug into llog_osd_write_rec() to see how log records were written upon llog ends. There are several separated writes and maybe we have situation when part of data is written and other is not. Assertion in llog_osd_next_block() is just result of failed write most probably

Comment by Hongchao Zhang [ 04/Mar/13 ]

status update:
the issue has been reproduced locally and the patch is under creation & test.

Comment by Mikhail Pershin [ 04/Mar/13 ]

Hongchao, could you decribe the reason of issue? We saw similar one in Orion but it gone at some point. I wonder what can be root cause now.

Comment by Hongchao Zhang [ 06/Mar/13 ]

the patch is http://review.whamcloud.com/#change,5604

this issue is caused by the ways of writing LLOG record, one record is written into LLOG by 3 continuous write operations at most,
if the second or third write failed, then the data written previously will damage the LLOG for it did not know the exact size of
the LLOG file and it will treat all data it can read as LLOG records.

Comment by Mikhail Pershin [ 11/Mar/13 ]

yes, that is technical debt, those 3-step blob write can be partial.

Comment by Hongchao Zhang [ 13/Mar/13 ]

the single write can also cause this problem,

int osd_ldiskfs_write_record(struct inode *inode, void *buf, int bufsize,
                             int write_NUL, loff_t *offs, handle_t *handle)
{
        ...
        while (bufsize > 0) {
                if (bh != NULL)
                        brelse(bh);

                block = offset >> inode->i_blkbits;
                boffs = offset & (blocksize - 1);
                size = min(blocksize - boffs, bufsize);
                bh = ldiskfs_bread(handle, inode, block, 1, &err);      <---------- could fail for -ENOSPC
                if (!bh) {
                        CERROR("%s: error reading offset %llu (block %lu): "
                               "rc = %d\n",
                               inode->i_sb->s_id, offset, block, err);
                        break;
                }

                ...
        }

        ...
}

currently, osd_punch will cause LBUG "LASSERT(op < OSD_OT_MAX)" in "osd_trans_exec_op"

Comment by Mikhail Pershin [ 13/Mar/13 ]

it can fail, yes, but it doesn't change i_size so no need to return it back. As for OSD_OT_MAX - it need just be increased if such situation is possible and valid

Comment by Hongchao Zhang [ 14/Mar/13 ]

no, it change the size even on the failed case(-ENOSPC)

/* correct in-core and on-disk sizes */
        if (new_size > i_size_read(inode)) {
                spin_lock(&inode->i_lock);
                if (new_size > i_size_read(inode))
                        i_size_write(inode, new_size);
                if (i_size_read(inode) > LDISKFS_I(inode)->i_disksize) {
                        LDISKFS_I(inode)->i_disksize = i_size_read(inode);
                        dirty_inode = 1;
                }
                spin_unlock(&inode->i_lock);
                if (dirty_inode)
                        inode->i_sb->s_op->dirty_inode(inode);
        }

does we fix here by not change the size for error case (could it affect other cases which need this behavior) or fix it in the LLOG layer?

Comment by Mikhail Pershin [ 14/Mar/13 ]

check new_size value, it is not changed in case of failed ldiskfs_bread() so we will not enter under that 'if' clause. That would be just OSD bug if size is changed upon failed write.

Comment by Peter Jones [ 03/Apr/13 ]

Landed for 2.4

Generated at Sat Feb 10 01:28:41 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.