Lustre / LU-7737

osd_handler.c:2777:osd_object_destroy()) ASSERTION( !lu_object_is_dying(dt->do_lu.lo_header) ) failed



    Description

      This error might be a duplicate of LU-6699. In any case, since the bug occurred in conjunction with llog errors, it might be related to the latest changes in DNE (change 16838).

      The error happened during soak testing of build '20160203' (see: https://wiki.hpdd.intel.com/display/Releases/Soak+Testing+on+Lola#SoakTestingonLola-20160203). DNE is enabled. The MDTs had been formatted with ldiskfs, the OSTs with zfs. The MDSes are configured in an active-active HA failover configuration.

      The configuration for the HA pair in question reads as:

      • lola-8 - mdt-0, 1 (primary resources)
      • lola-9 - mdt-2, 3 (primary resources)

      During the umount of mdt-3 on lola-8 (failback of resources), the node
      crashed with the following LBUG:

      <0>LustreError: 5861:0:(osd_handler.c:2777:osd_object_destroy()) ASSERTION( !lu_object_is_dying(dt->do_lu.lo_header) ) failed: 
      <0>LustreError: 5861:0:(osd_handler.c:2777:osd_object_destroy()) LBUG
      <4>Pid: 5861, comm: umount
      <4>
      <4>Call Trace:
      <4> [<ffffffffa0772875>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
      <4> [<ffffffffa0772e77>] lbug_with_loc+0x47/0xb0 [libcfs]
      <4> [<ffffffffa1060fbb>] osd_object_destroy+0x52b/0x5b0 [osd_ldiskfs]
      <4> [<ffffffffa105e42d>] ? osd_object_ref_del+0x22d/0x4e0 [osd_ldiskfs]
      <4> [<ffffffffa0851dda>] llog_osd_destroy+0x1ba/0x9e0 [obdclass]
      <4> [<ffffffffa08417a6>] llog_destroy+0x2b6/0x470 [obdclass]
      <4> [<ffffffffa08438cb>] llog_cat_close+0x17b/0x220 [obdclass]
      <4> [<ffffffffa12b04e7>] lod_sub_fini_llog+0xb7/0x380 [lod]
      <4> [<ffffffff8109ec20>] ? autoremove_wake_function+0x0/0x40
      <4> [<ffffffffa12b35c4>] lod_process_config+0xbc4/0x1830 [lod]
      <4> [<ffffffffa111361f>] ? lfsck_stop+0x15f/0x4c0 [lfsck]
      <4> [<ffffffff8117523c>] ? __kmalloc+0x21c/0x230
      <4> [<ffffffff8109ec20>] ? autoremove_wake_function+0x0/0x40
      <4> [<ffffffffa1331474>] mdd_process_config+0x114/0x5d0 [mdd]
      <4> [<ffffffffa11db55e>] mdt_device_fini+0x3ee/0xf40 [mdt]
      <4> [<ffffffffa0860406>] ? class_disconnect_exports+0x116/0x2f0 [obdclass]
      <4> [<ffffffffa087a552>] class_cleanup+0x572/0xd20 [obdclass]
      <4> [<ffffffffa085b0c6>] ? class_name2dev+0x56/0xe0 [obdclass]
      <4> [<ffffffffa087cbd6>] class_process_config+0x1ed6/0x2830 [obdclass]
      <4> [<ffffffffa077dd01>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
      <4> [<ffffffff8117523c>] ? __kmalloc+0x21c/0x230
      <4> [<ffffffffa087d9ef>] class_manual_cleanup+0x4bf/0x8e0 [obdclass]
      <4> [<ffffffffa085b0c6>] ? class_name2dev+0x56/0xe0 [obdclass]
      <4> [<ffffffffa08b610c>] server_put_super+0xa0c/0xed0 [obdclass]
      <4> [<ffffffff811ac776>] ? invalidate_inodes+0xf6/0x190
      <4> [<ffffffff81190b7b>] generic_shutdown_super+0x5b/0xe0
      <4> [<ffffffff81190c66>] kill_anon_super+0x16/0x60
      <4> [<ffffffffa08808a6>] lustre_kill_super+0x36/0x60 [obdclass]
      <4> [<ffffffff81191407>] deactivate_super+0x57/0x80
      <4> [<ffffffff811b10df>] mntput_no_expire+0xbf/0x110
      <4> [<ffffffff811b1c2b>] sys_umount+0x7b/0x3a0
      <4> [<ffffffff8100b0d2>] system_call_fastpath+0x16/0x1b
      <4>
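
      For context, the failing check is the LASSERT() quoted in the assertion
      message: osd_object_destroy() in the ldiskfs OSD refuses to destroy an
      object whose lu_object_header is already marked dying, and here it is
      reached via llog_cat_close() -> llog_destroy() -> llog_osd_destroy()
      while the MDT is being unmounted. The snippet below is only a minimal,
      self-contained sketch of that check using simplified stand-in types
      (the flag name follows Lustre's lu_object.h; everything else is a
      simplified assumption, not the upstream code):

      #include <assert.h>
      #include <stdbool.h>

      /* simplified stand-ins for the Lustre types named in the assertion */
      enum { LU_OBJECT_HEARD_BANSHEE = 1 };        /* "object is dying" flag */

      struct lu_object_header { unsigned long loh_flags; };
      struct lu_object        { struct lu_object_header *lo_header; };
      struct dt_object        { struct lu_object do_lu; };

      static bool lu_object_is_dying(const struct lu_object_header *h)
      {
              return h->loh_flags & LU_OBJECT_HEARD_BANSHEE;
      }

      /* the check that fired at osd_handler.c:2777 during the umount */
      static void osd_object_destroy(struct dt_object *dt)
      {
              assert(!lu_object_is_dying(dt->do_lu.lo_header));
              /* ... destroy the on-disk object ... */
      }

      int main(void)
      {
              struct lu_object_header hdr = { .loh_flags = 0 };
              struct dt_object obj = { .do_lu = { .lo_header = &hdr } };

              osd_object_destroy(&obj);                 /* normal case: ok */

              hdr.loh_flags |= LU_OBJECT_HEARD_BANSHEE; /* header already dying,
                                                         * e.g. concurrent
                                                         * cleanup of the llog
                                                         * object */
              osd_object_destroy(&obj);                 /* trips the assertion,
                                                         * as in the LBUG above */
              return 0;
      }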
      

      Also, the following errors were reported on lola-8 immediately before the crash:

      Feb  3 10:51:27 lola-8 kernel: LustreError: 5733:0:(llog.c:588:llog_process_thread()) soaked-MDT0006-osp-MDT0003 retry remote llog process
      Feb  3 10:51:27 lola-8 kernel: LustreError: 5733:0:(lod_dev.c:419:lod_sub_recovery_thread()) soaked-MDT0006-osp-MDT0003 getting update log failed: rc = -11
      Feb  3 10:51:27 lola-8 kernel: LustreError: 5727:0:(llog.c:595:llog_process_thread()) Local llog found corrupted
      Feb  3 10:51:27 lola-8 kernel: LustreError: 5730:0:(osp_object.c:588:osp_attr_get()) soaked-MDT0002-osp-MDT0003:osp_attr_get update error [0x200000009:0x2:0x0]: rc = -5
      Feb  3 10:51:27 lola-8 kernel: LustreError: 5730:0:(lod_sub_object.c:959:lod_sub_prep_llog()) soaked-MDT0003-mdtlov: can't get id from catalogs: rc = -5
      Feb  3 10:51:28 lola-8 kernel: LustreError: 5727:0:(llog.c:595:llog_process_thread()) Local llog found corrupted
      Feb  3 10:51:28 lola-8 kernel: LustreError: 5727:0:(llog.c:595:llog_process_thread()) Skipped 1 previous similar message
      Feb  3 10:51:29 lola-8 kernel: LustreError: 5727:0:(llog.c:595:llog_process_thread()) Local llog found corrupted
      Feb  3 10:51:29 lola-8 kernel: LustreError: 5727:0:(llog.c:595:llog_process_thread()) Skipped 1 previous similar message
      Feb  3 10:51:31 lola-8 kernel: Lustre: soaked-MDT0003: Not available for connect from 0@lo (stopping)
      Feb  3 10:51:32 lola-8 kernel: LustreError: 5727:0:(llog.c:595:llog_process_thread()) Local llog found corrupted
      Feb  3 10:51:32 lola-8 kernel: LustreError: 5727:0:(llog.c:595:llog_process_thread()) Skipped 2 previous similar messages
      Feb  3 10:51:35 lola-8 kernel: LustreError: 5861:0:(osd_handler.c:3291:osd_object_ref_del()) soaked-MDT0003-osd: nlink == 0 on [0x2c00042a3:0x15ec5:0x0], maybe an upgraded file? (LU-3915)
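
      As a side note, the rc values in these messages are negative errno codes
      (rc = -11 is -EAGAIN, rc = -5 is -EIO); a trivial sketch that just decodes
      the reported values:

      #include <stdio.h>
      #include <string.h>

      int main(void)
      {
              /* rc values reported above; Lustre returns negative errno */
              const int rcs[] = { -11, -5 };
              for (size_t i = 0; i < sizeof(rcs) / sizeof(rcs[0]); i++)
                      printf("rc = %d -> %s\n", rcs[i], strerror(-rcs[i]));
              return 0;
      }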
      

      The sequence of events was:

      • 2016-02-03 10:42:39 - failover started for lola-9
      • 2016-02-03 10:42:39 - lola-9 online again
      • 2016-02-03 10:51:26 - Failback of resources (umount of mdt-3)
      • 2016-02-03 10:51:35 - lola-8 hit LBUG

      Attached files:
      lola-8 messages, console, vmcore-dmesg.txt
      soak.log (for injected errors)
      Note:
      A crash dump has been created. I'll add the information about the storage location as soon as the ticket is created.

      Info required for matching: sanity-quota 7c

      Attachments

        1. vmcore-dmesg.txt.bz2
          27 kB
        2. soak.log.bz2
          42 kB
        3. messages-lola-8.log.bz2
          78 kB
        4. console-lola-8.log.bz2
          116 kB


            People

              Assignee: Di Wang (di.wang)
              Reporter: Frank Heckes (heckes) (Inactive)
