Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-3233

tgt_cb_last_committed()) ASSERTION( ccb->llcc_exp->exp_obd == ccb->llcc_tgt->lut_obd ) failed:

Details

    • 3
    • 7934

    Description

      I see this running racer on a single node MDSCOUNT=2 llmount.sh filesystem. (I upgraded the LASSERT() to an LASSERTF().)

      LustreError: 10307:0:(mdt_recovery.c:390:mdt_last_rcvd_update()) Trying to overwrite bigger transno:on-disk: 4295820775, new: 4295609707 replay: 0. see LU-617.
      LustreError: 10277:0:(tgt_lastrcvd.c:412:tgt_cb_last_committed()) ASSERTION( ccb->llcc_exp->exp_obd == ccb->llcc_tgt->lut_obd ) failed: exp_obd ffff88014fb182f8 != lut_obd ffff88016c536178
      
      Breakpoint 2, lbug_with_loc (msgdata=0xffffffffa12c9360) at /root/lustre-release/libcfs/libcfs/linux/linux-debug.c:167
      167	        libcfs_debug_msg(msgdata, "LBUG\n");
      (gdb) bt
      #0  lbug_with_loc (msgdata=0xffffffffa12c9360)
          at /root/lustre-release/libcfs/libcfs/linux/linux-debug.c:167
      #1  0xffffffffa12613e9 in tgt_cb_last_committed (env=<value optimized out>, 
          th=<value optimized out>, cb=0xffff88017377cb40, err=<value optimized out>)
          at /root/lustre-release/lustre/ptlrpc/../../lustre/target/tgt_lastrcvd.c:410
      #2  0xffffffffa0874a24 in osd_trans_commit_cb (sb=<value optimized out>, 
          jcb=<value optimized out>, error=0)
          at /root/lustre-release/lustre/osd-ldiskfs/osd_handler.c:646
      #3  0xffffffffa082610a in ldiskfs_journal_commit_callback (
          journal=<value optimized out>, txn=0xffff880177622a80)
          at /root/lustre-release/ldiskfs/ldiskfs/super.c:329
      #4  0xffffffffa007584f in jbd2_journal_commit_transaction (journal=0xffff880150255800)
          at fs/jbd2/commit.c:1049
      #5  0xffffffffa007b698 in kjournald2 (arg=0xffff880150255800) at fs/jbd2/journal.c:163
      #6  0xffffffff81090626 in kthread (_create=0xffff880160ab15c8) at kernel/kthread.c:78
      #7  0xffffffff8100c0ca in ?? () at arch/x86/kernel/entry_64.S:1211
      #8  0x0000000000000000 in ?? ()
      (gdb) p *((struct obd_device *) 0xffff88014fb182f8)
      $2 = {
        obd_type = 0xffff88019705ca40, 
        obd_magic = 2874988271, 
        obd_name = "lustre-MDT0001", '\000' <repeats 113 times>, 
        obd_uuid = {
          uuid = "lustre-MDT0001_UUID", '\000' <repeats 20 times>
        }, 
      ...
      (gdb) p *((struct obd_device *) 0xffff88016c536178)
      $3 = {
        obd_type = 0xffff88019705ca40, 
        obd_magic = 2874988271, 
        obd_name = "lustre-MDT0000", '\000' <repeats 113 times>, 
        obd_uuid = {
          uuid = "lustre-MDT0000_UUID", '\000' <repeats 20 times>
        }, 
      ...
      

      Attachments

        Issue Links

          Activity

            [LU-3233] tgt_cb_last_committed()) ASSERTION( ccb->llcc_exp->exp_obd == ccb->llcc_tgt->lut_obd ) failed:
            pjones Peter Jones made changes -
            Resolution New: Fixed [ 1 ]
            Status Original: Open [ 1 ] New: Resolved [ 5 ]
            pjones Peter Jones added a comment -

            Landed for 2.4.1 and 2.5.0

            pjones Peter Jones added a comment - Landed for 2.4.1 and 2.5.0
            pjones Peter Jones made changes -
            Fix Version/s New: Lustre 2.4.1 [ 10294 ]
            jhammond John Hammond made changes -
            Link New: This issue is related to LU-3765 [ LU-3765 ]
            jhammond John Hammond added a comment -

            See Niu's patches to remove M_CHECK_STALE and make open by FID more sane.

            jhammond John Hammond added a comment - See Niu's patches to remove M_CHECK_STALE and make open by FID more sane.
            jhammond John Hammond added a comment -

            Both patches landed to master.

            Dropping to minor as Jinshan and I had discussed whether the logic of using M_CHECK_STALE and having the server return the FID of the new file (along with an open handle) really made any sense here. It appeared that someone planned an optimization around returning the new FID but it never got implemented. I believe that open might be made more robust by having the server ignore any supplied name if MDS_OPEN_BY_FID is set, and perhaps by having the client not supply a name when it sets MDS_OPEN_BY_FID. More analysis is needed however.

            jhammond John Hammond added a comment - Both patches landed to master. Dropping to minor as Jinshan and I had discussed whether the logic of using M_CHECK_STALE and having the server return the FID of the new file (along with an open handle) really made any sense here. It appeared that someone planned an optimization around returning the new FID but it never got implemented. I believe that open might be made more robust by having the server ignore any supplied name if MDS_OPEN_BY_FID is set, and perhaps by having the client not supply a name when it sets MDS_OPEN_BY_FID. More analysis is needed however.
            jhammond John Hammond made changes -
            Priority Original: Blocker [ 1 ] New: Minor [ 4 ]
            adilger Andreas Dilger made changes -
            Fix Version/s New: Lustre 2.5.0 [ 10295 ]
            jhammond John Hammond added a comment -

            Indeed. In any case I think I may have a reasonable fix for the MDT case without touching too much else.

            Please see http://review.whamcloud.com/6938.

            jhammond John Hammond added a comment - Indeed. In any case I think I may have a reasonable fix for the MDT case without touching too much else. Please see http://review.whamcloud.com/6938 .
            green Oleg Drokin added a comment -

            Right, there's normally an overtrust to those handles and underchecking their validness (mostly in locks), but granted most of the time it does not end up with such drastic crashes on server if you get the handle wrong.

            green Oleg Drokin added a comment - Right, there's normally an overtrust to those handles and underchecking their validness (mostly in locks), but granted most of the time it does not end up with such drastic crashes on server if you get the handle wrong.

            People

              jhammond John Hammond
              jhammond John Hammond
              Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: