Lustre: LU-7737

osd_handler.c:2777:osd_object_destroy()) ASSERTION( !lu_object_is_dying(dt->do_lu.lo_header)

Details


    Description

      This error might be a duplicate of LU-6699. In any case, as the bug occurred in conjunction with llog errors, it might be related to the latest changes in DNE (change 16838).

      The error happened during soak testing of build '20160203' (see: https://wiki.hpdd.intel.com/display/Releases/Soak+Testing+on+Lola#SoakTestingonLola-20160203). DNE is enabled. MDTs had been formatted with ldiskfs, OSTs with zfs. The MDSes are configured in an active-active HA failover configuration.

      The configuration for the HA pair in question reads as:

      • lola-8 - mdt-0, 1 (primary resources)
      • lola-9 - mdt-2, 3 (primary resources)

      During the umount (failback of resources) of mdt-3 on lola-8, the node crashed with an LBUG:

      <0>LustreError: 5861:0:(osd_handler.c:2777:osd_object_destroy()) ASSERTION( !lu_object_is_dying(dt->do_lu.lo_header) ) failed:
      <0>LustreError: 5861:0:(osd_handler.c:2777:osd_object_destroy()) LBUG
      <4>Pid: 5861, comm: umount
      <4>
      <4>Call Trace:
      <4> [<ffffffffa0772875>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
      <4> [<ffffffffa0772e77>] lbug_with_loc+0x47/0xb0 [libcfs]
      <4> [<ffffffffa1060fbb>] osd_object_destroy+0x52b/0x5b0 [osd_ldiskfs]
      <4> [<ffffffffa105e42d>] ? osd_object_ref_del+0x22d/0x4e0 [osd_ldiskfs]
      <4> [<ffffffffa0851dda>] llog_osd_destroy+0x1ba/0x9e0 [obdclass]
      <4> [<ffffffffa08417a6>] llog_destroy+0x2b6/0x470 [obdclass]
      <4> [<ffffffffa08438cb>] llog_cat_close+0x17b/0x220 [obdclass]
      <4> [<ffffffffa12b04e7>] lod_sub_fini_llog+0xb7/0x380 [lod]
      <4> [<ffffffff8109ec20>] ? autoremove_wake_function+0x0/0x40
      <4> [<ffffffffa12b35c4>] lod_process_config+0xbc4/0x1830 [lod]
      <4> [<ffffffffa111361f>] ? lfsck_stop+0x15f/0x4c0 [lfsck]
      <4> [<ffffffff8117523c>] ? __kmalloc+0x21c/0x230
      <4> [<ffffffff8109ec20>] ? autoremove_wake_function+0x0/0x40
      <4> [<ffffffffa1331474>] mdd_process_config+0x114/0x5d0 [mdd]
      <4> [<ffffffffa11db55e>] mdt_device_fini+0x3ee/0xf40 [mdt]
      <4> [<ffffffffa0860406>] ? class_disconnect_exports+0x116/0x2f0 [obdclass]
      <4> [<ffffffffa087a552>] class_cleanup+0x572/0xd20 [obdclass]
      <4> [<ffffffffa085b0c6>] ? class_name2dev+0x56/0xe0 [obdclass]
      <4> [<ffffffffa087cbd6>] class_process_config+0x1ed6/0x2830 [obdclass]
      <4> [<ffffffffa077dd01>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
      <4> [<ffffffff8117523c>] ? __kmalloc+0x21c/0x230
      <4> [<ffffffffa087d9ef>] class_manual_cleanup+0x4bf/0x8e0 [obdclass]
      <4> [<ffffffffa085b0c6>] ? class_name2dev+0x56/0xe0 [obdclass]
      <4> [<ffffffffa08b610c>] server_put_super+0xa0c/0xed0 [obdclass]
      <4> [<ffffffff811ac776>] ? invalidate_inodes+0xf6/0x190
      <4> [<ffffffff81190b7b>] generic_shutdown_super+0x5b/0xe0
      <4> [<ffffffff81190c66>] kill_anon_super+0x16/0x60
      <4> [<ffffffffa08808a6>] lustre_kill_super+0x36/0x60 [obdclass]
      <4> [<ffffffff81191407>] deactivate_super+0x57/0x80
      <4> [<ffffffff811b10df>] mntput_no_expire+0xbf/0x110
      <4> [<ffffffff811b1c2b>] sys_umount+0x7b/0x3a0
      <4> [<ffffffff8100b0d2>] system_call_fastpath+0x16/0x1b
      <4>
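The assertion that fired guards a lifecycle invariant: destroy must never be called on an object whose header is already marked dying. The following is a minimal generic sketch of that invariant (all names are hypothetical illustrations, not Lustre's actual osd code):

```c
/* Minimal sketch (NOT Lustre code): an object already marked "dying"
 * must never be handed to destroy again; a second destroy attempt is
 * exactly the condition the LASSERT in osd_object_destroy() catches. */
#include <assert.h>
#include <stdbool.h>

struct obj_header {
	bool dying;	/* set once teardown has started */
};

static bool obj_is_dying(const struct obj_header *h)
{
	return h->dying;
}

/* Returns 0 on success, -1 if the invariant would be violated
 * (the point where real code would trip the assertion). */
static int obj_destroy(struct obj_header *h)
{
	if (obj_is_dying(h))
		return -1;	/* double destroy: the bug in this ticket */
	h->dying = true;
	/* ... release backing resources here ... */
	return 0;
}
```

Under this model, the crash corresponds to obj_destroy() being reached a second time for the same header.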
      

      Immediately afterwards, the following errors were reported on lola-8:

      Feb  3 10:51:27 lola-8 kernel: LustreError: 5733:0:(llog.c:588:llog_process_thread()) soaked-MDT0006-osp-MDT0003 retry remote llog process
      Feb  3 10:51:27 lola-8 kernel: LustreError: 5733:0:(lod_dev.c:419:lod_sub_recovery_thread()) soaked-MDT0006-osp-MDT0003 getting update log failed: rc = -11
      Feb  3 10:51:27 lola-8 kernel: LustreError: 5727:0:(llog.c:595:llog_process_thread()) Local llog found corrupted
      Feb  3 10:51:27 lola-8 kernel: LustreError: 5730:0:(osp_object.c:588:osp_attr_get()) soaked-MDT0002-osp-MDT0003:osp_attr_get update error [0x200000009:0x2:0x0]: rc = -5
      Feb  3 10:51:27 lola-8 kernel: LustreError: 5730:0:(lod_sub_object.c:959:lod_sub_prep_llog()) soaked-MDT0003-mdtlov: can't get id from catalogs: rc = -5
      Feb  3 10:51:28 lola-8 kernel: LustreError: 5727:0:(llog.c:595:llog_process_thread()) Local llog found corrupted
      Feb  3 10:51:28 lola-8 kernel: LustreError: 5727:0:(llog.c:595:llog_process_thread()) Skipped 1 previous similar message
      Feb  3 10:51:29 lola-8 kernel: LustreError: 5727:0:(llog.c:595:llog_process_thread()) Local llog found corrupted
      Feb  3 10:51:29 lola-8 kernel: LustreError: 5727:0:(llog.c:595:llog_process_thread()) Skipped 1 previous similar message
      Feb  3 10:51:31 lola-8 kernel: Lustre: soaked-MDT0003: Not available for connect from 0@lo (stopping)
      Feb  3 10:51:32 lola-8 kernel: LustreError: 5727:0:(llog.c:595:llog_process_thread()) Local llog found corrupted
      Feb  3 10:51:32 lola-8 kernel: LustreError: 5727:0:(llog.c:595:llog_process_thread()) Skipped 2 previous similar messages
      Feb  3 10:51:35 lola-8 kernel: LustreError: 5861:0:(osd_handler.c:3291:osd_object_ref_del()) soaked-MDT0003-osd: nlink == 0 on [0x2c00042a3:0x15ec5:0x0], maybe an upgraded file? (LU-3915)
      

      The sequence of events was:

      • 2016-02-03 10:42:39 - failover started for lola-9
      • 2016-02-03 10:42:39 - lola-9 online again
      • 2016-02-03 10:51:26 - Failback of resources (umount mdt-3)
      • 2016-02-03 10:51:35 - lola-8 hit LBUG

      Attached files:
      lola-8 messages, console, vmcore-dmesg.txt
      soak.log (for injected errors)

      Note:
      A crash dump has been created. I'll add info about its storage location as soon as the ticket is created.

      Info required for matching: sanity-quota 7c

      Attachments

        1. console-lola-8.log.bz2
          116 kB
        2. messages-lola-8.log.bz2
          78 kB
        3. soak.log.bz2
          42 kB
        4. vmcore-dmesg.txt.bz2
          27 kB


          Activity

            [LU-7737] osd_handler.c:2777:osd_object_destroy()) ASSERTION( !lu_object_is_dying(dt->do_lu.lo_header)
            pjones Peter Jones added a comment -

            Thanks Alex so I'll re-resolve this

            bzzz Alex Zhuravlev added a comment - edited

            http://review.whamcloud.com/18362 got another ticket - LU-7772
            pjones Peter Jones added a comment -

            Oh! There is a second patch - sorry

            pjones Peter Jones added a comment -

            Landed for 2.8


            gerrit Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/18308/
            Subject: LU-7737 lod: not return -EIO during process updates log
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 028e65b03dac9497256978d2266acb8c20b48a99

            bogl Bob Glossman (Inactive) added a comment - another on master: https://testing.hpdd.intel.com/test_sets/f9c7e7e6-cee1-11e5-b578-5254006e85c2

            gerrit Gerrit Updater added a comment -

            Alex Zhuravlev (alexey.zhuravlev@intel.com) uploaded a new patch: http://review.whamcloud.com/18362
            Subject: LU-7737 llog: do not destroy llog twice
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 866689144744cae95116b69a992abbfaca806517


            bzzz Alex Zhuravlev added a comment -

            I think we should also avoid the case where we try to destroy the llog twice: in llog_process() (due to cancels on an error) and in llog_cat_close(). The patch is coming.

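            One generic way to avoid the double-destroy race described above is to record destruction in the handle and make destroy idempotent, so whichever of the error path and the close path runs second becomes a no-op. A minimal sketch (hypothetical names, NOT the actual fix in patch 18362):

```c
/* Minimal sketch: both an error path (cf. llog_process cancels) and a
 * close path (cf. llog_cat_close) may try to destroy the same log;
 * recording destruction in the handle makes the second call harmless. */
#include <assert.h>
#include <stdbool.h>

struct log_handle {
	bool destroyed;		/* set once the backing object is gone */
	int  destroy_count;	/* for illustration: real destroys run */
};

static int log_destroy(struct log_handle *h)
{
	if (h->destroyed)
		return 0;	/* already gone: second call is a no-op */
	h->destroyed = true;
	h->destroy_count++;	/* ... actual resource teardown here ... */
	return 0;
}

/* error path: cancels records and destroys the (now empty) log */
static void log_process_error(struct log_handle *h)
{
	log_destroy(h);
}

/* close path: would otherwise destroy the same log again */
static void log_close(struct log_handle *h)
{
	log_destroy(h);
}
```

            With this guard, running the error path and then closing performs exactly one real destroy.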
            bogl Bob Glossman (Inactive) added a comment - another on master: https://testing.hpdd.intel.com/test_sets/6f228e60-cd48-11e5-b1fa-5254006e85c2
            rhenwood Richard Henwood (Inactive) added a comment - And another over the weekend: https://testing.hpdd.intel.com/test_sets/8c4963c2-cc3c-11e5-b2cb-5254006e85c2

            People

              di.wang Di Wang
              heckes Frank Heckes (Inactive)
              Votes: 0
              Watchers: 10
