  Lustre / LU-2843

ASSERTION( last_rec->lrh_index == tail->lrt_index )

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Blocker
    • Affects Version/s: Lustre 2.4.0
    • Fix Version/s: Lustre 2.4.0
    • Environment: 2.6.32-279.9.1.1chaos.ch5.1.x86_64,
      KVM x86_64 virtual machine
    • Severity: 3
    • 6882

    Description

      Steps leading up to the crash:

      1. create a new filesystem
      2. register a changelog user
      3. run createmany and unlinkmany in an infinite loop until the catalog is full (LU-1586)
      4. lctl --device lustre-MDT0000 changelog_deregister cl1

      Build contains patches for LU-2129 and LU-2102.

      LustreError: 30248:0:(llog_osd.c:596:llog_osd_next_block()) ASSERTION( last_rec->lrh_index == tail->lrt_index ) failed: 
      LustreError: 30248:0:(llog_osd.c:596:llog_osd_next_block()) LBUG
      Pid: 30248, comm: lctl
      
      PID: 30248  TASK: ffff88004bd7e080  CPU: 3   COMMAND: "lctl"
       #0 [ffff88005029b590] machine_kexec at ffffffff8103283b
       #1 [ffff88005029b5f0] crash_kexec at ffffffff810ba492
       #2 [ffff88005029b6c0] panic at ffffffff814fdcf2
       #3 [ffff88005029b740] lbug_with_loc at ffffffffa0fc7fbb [libcfs]
       #4 [ffff88005029b760] llog_osd_next_block at ffffffffa06ba3ed [obdclass]
       #5 [ffff88005029b800] llog_process_thread at ffffffffa068a7f6 [obdclass]
       #6 [ffff88005029b8b0] llog_process_or_fork at ffffffffa068cb0d [obdclass]
       #7 [ffff88005029b900] llog_cat_process_cb at ffffffffa068f282 [obdclass]
       #8 [ffff88005029b950] llog_process_thread at ffffffffa068adfb [obdclass]
       #9 [ffff88005029ba00] llog_process_or_fork at ffffffffa068cb0d [obdclass]
      #10 [ffff88005029ba50] llog_cat_process_or_fork at ffffffffa068d5b9 [obdclass]
      #11 [ffff88005029bae0] llog_cat_process at ffffffffa068d7c9 [obdclass]
      #12 [ffff88005029bb00] llog_changelog_cancel at ffffffffa0c730ff [mdd]
      #13 [ffff88005029bb40] llog_cancel at ffffffffa0691538 [obdclass]
      #14 [ffff88005029bb80] mdd_changelog_llog_cancel at ffffffffa0c77dae [mdd]
      #15 [ffff88005029bbc0] mdd_changelog_user_purge at ffffffffa0c78220 [mdd]
      #16 [ffff88005029bc10] mdd_iocontrol at ffffffffa0c78776 [mdd]
      #17 [ffff88005029bc70] mdt_ioc_child at ffffffffa0d50129 [mdt]
      #18 [ffff88005029bcf0] mdt_iocontrol at ffffffffa0d54608 [mdt]
      #19 [ffff88005029bd90] class_handle_ioctl at ffffffffa069ba4f [obdclass]
      #20 [ffff88005029be40] obd_class_ioctl at ffffffffa06852ab [obdclass]
      #21 [ffff88005029be60] vfs_ioctl at ffffffff8118ddb2
      #22 [ffff88005029bea0] do_vfs_ioctl at ffffffff8118df54
      #23 [ffff88005029bf30] sys_ioctl at ffffffff8118e4d1
      #24 [ffff88005029bf80] system_call_fastpath at ffffffff8100b0f2
          RIP: 0000003ae74dde07  RSP: 00007fff6c48fc00  RFLAGS: 00010246
          RAX: 0000000000000010  RBX: ffffffff8100b0f2  RCX: 0000000000000000
          RDX: 00007fff6c48fc40  RSI: 00000000424066b2  RDI: 0000000000000003
          RBP: 0000000000000001   R8: 0000000000000000   R9: 0000000000000240
          R10: 0000000000000000  R11: 0000000000000246  R12: 00000000424066b2
          R13: 00007fff6c48fc40  R14: 0000000000678380  R15: 0000000000000003
          ORIG_RAX: 0000000000000010  CS: 0033  SS: 002b
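
      For reference, the failed assertion is a consistency check between the header of the last record in a block and the tail that closes that block. A rough sketch of the on-disk record framing (abridged from lustre/include/lustre/lustre_idl.h; the exact field set may differ between versions) and of the check that llog_osd_next_block() asserts:

      #include <linux/types.h>

      /* Every llog record carries its length and index in both a header and a
       * tail, so the log can be walked forwards and backwards. */
      struct llog_rec_hdr {
              __u32   lrh_len;        /* record length: header + body + tail */
              __u32   lrh_index;      /* index of this record in the llog file */
              __u32   lrh_type;       /* record type */
              __u32   lrh_id;         /* or padding, depending on the version */
      };

      struct llog_rec_tail {
              __u32   lrt_len;        /* should equal lrh_len of the same record */
              __u32   lrt_index;      /* should equal lrh_index of the same record */
      };

      /* The LASSERT in llog_osd_next_block() is this consistency check on the
       * last record of the block just read; a record left behind by a partial,
       * multi-step write makes the header and tail disagree. */
      static inline bool llog_last_rec_consistent(const struct llog_rec_hdr *last_rec,
                                                  const struct llog_rec_tail *tail)
      {
              return last_rec->lrh_index == tail->lrt_index;
      }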
      

      Attachments

        Activity

          pjones Peter Jones added a comment -

          Landed for 2.4


          tappro Mikhail Pershin added a comment -

          Check the new_size value; it is not changed when ldiskfs_bread() fails, so we will not enter that 'if' clause. It would simply be an OSD bug if the size were changed after a failed write.

          hongchao.zhang Hongchao Zhang added a comment -

          No, it changes the size even in the failure case (-ENOSPC):

                  /* correct in-core and on-disk sizes */
                  if (new_size > i_size_read(inode)) {
                          spin_lock(&inode->i_lock);
                          if (new_size > i_size_read(inode))
                                  i_size_write(inode, new_size);
                          if (i_size_read(inode) > LDISKFS_I(inode)->i_disksize) {
                                  LDISKFS_I(inode)->i_disksize = i_size_read(inode);
                                  dirty_inode = 1;
                          }
                          spin_unlock(&inode->i_lock);
                          if (dirty_inode)
                                  inode->i_sb->s_op->dirty_inode(inode);
                  }

          Do we fix it here by not changing the size in the error case (could that affect other cases which need this behavior), or do we fix it in the LLOG layer?
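
          One possible shape of a fix on the OSD side (a sketch with hypothetical structure, not the patch that eventually landed): advance new_size only after each block has actually been written, and skip the size-update clause above when the write failed, so that a -ENOSPC from ldiskfs_bread() part-way through can never extend i_size over a torn record:

                  /* sketch: new_size follows only fully written blocks */
                  while (bufsize > 0) {
                          ...
                          bh = ldiskfs_bread(handle, inode, block, 1, &err);
                          if (bh == NULL)
                                  break;          /* new_size not advanced */
                          memcpy(bh->b_data + boffs, buf, size);
                          ...                     /* dirty the buffer as before */
                          offset  += size;
                          buf     += size;
                          bufsize -= size;
                          new_size = offset;      /* advance only on success */
                  }
                  ...
                  /* only correct sizes when the whole record made it out */
                  if (err == 0 && new_size > i_size_read(inode)) {
                          ...
                  }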

          tappro Mikhail Pershin added a comment -

          It can fail, yes, but it doesn't change i_size, so there is no need to roll it back. As for OSD_OT_MAX, it just needs to be increased if such a situation is possible and valid.
          hongchao.zhang Hongchao Zhang added a comment - edited

          A single write can also cause this problem:

          int osd_ldiskfs_write_record(struct inode *inode, void *buf, int bufsize,
                                       int write_NUL, loff_t *offs, handle_t *handle)
          {
                  ...
                  while (bufsize > 0) {
                          if (bh != NULL)
                                  brelse(bh);
          
                          block = offset >> inode->i_blkbits;
                          boffs = offset & (blocksize - 1);
                          size = min(blocksize - boffs, bufsize);
                          bh = ldiskfs_bread(handle, inode, block, 1, &err);      <---------- could fail for -ENOSPC
                          if (!bh) {
                                  CERROR("%s: error reading offset %llu (block %lu): "
                                         "rc = %d\n",
                                         inode->i_sb->s_id, offset, block, err);
                                  break;
                          }
          
                          ...
                  }
          
                  ...
          }
          

          Currently, osd_punch will cause the LBUG "LASSERT(op < OSD_OT_MAX)" in osd_trans_exec_op().

          tappro Mikhail Pershin added a comment -

          Yes, that is technical debt; that 3-step blob write can be partial.

          hongchao.zhang Hongchao Zhang added a comment -

          The patch is http://review.whamcloud.com/#change,5604

          This issue is caused by the way llog records are written: one record is written to the llog by up to 3 consecutive write operations. If the second or third write fails, the data written previously corrupts the llog, because the reader does not know the exact size of the llog file and treats all the data it can read as llog records.
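
          For illustration only (this is not the code from http://review.whamcloud.com/#change,5604): if the torn record were instead handled on the reading side, the check that currently LBUGs could treat an inconsistent block tail as the end of the usable log, along these lines:

          /* sketch: tolerate a torn last record instead of asserting; the
           * function name and error handling here are hypothetical */
          static int llog_check_block_tail(const struct llog_rec_hdr *last_rec,
                                           const struct llog_rec_tail *tail)
          {
                  if (last_rec->lrh_index != tail->lrt_index) {
                          CERROR("llog: last record index %u does not match "
                                 "block tail index %u, treating block as "
                                 "truncated\n", last_rec->lrh_index,
                                 tail->lrt_index);
                          return -EINVAL;
                  }
                  return 0;
          }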

          tappro Mikhail Pershin added a comment -

          Hongchao, could you describe the reason for this issue? We saw a similar one in Orion, but it went away at some point. I wonder what the root cause could be now.

          hongchao.zhang Hongchao Zhang added a comment -

          Status update: the issue has been reproduced locally and the patch is being created and tested.

          tappro Mikhail Pershin added a comment -

          It makes sense to add debugging to llog_osd_write_rec() to see how log records were written at the point where the llog ends. There are several separate writes, and we may have a situation where part of the data is written and the rest is not. The assertion in llog_osd_next_block() is most probably just the result of a failed write.
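
          A minimal sketch of that kind of debugging (the variable names around the dt_record_write() call are hypothetical and the surrounding llog_osd_write_rec() code is omitted):

                  rc = dt_record_write(env, o, &buf, &pos, th);
                  CDEBUG(D_OTHER, "wrote llog rec index %u (len %u), write ends "
                         "at offset %llu: rc = %d\n", rec->lrh_index,
                         rec->lrh_len, (unsigned long long)pos, rc);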

          People

            Assignee: hongchao.zhang Hongchao Zhang
            Reporter: nedbass Ned Bass (Inactive)
            Votes: 0
            Watchers: 4

            Dates

              Created:
              Updated:
              Resolved: