Lustre / LU-8412

Intel CAS testing umount triggers lu_object.c:1224:lu_device_fini()) ASSERTION( atomic_read(&d->ld_ref) == 0 ) failed: Refcount is 3

Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Major
    • Affects Version/s: Lustre 2.7.0
    • Environment: CentOS-6.8
      kernel-2.6.32-573.26.1.el6.20160517.x86_64.lustre272
      lustre-2.7.2-1nasS_mofed32v1_2.6.32_573.26.1.el6.20160517.x86_64.lustre272.x86_64
    • Severity: 3

    Description

      While testing OST caching with Intel CAS software, stopping the targets (umount) triggered the following LBUG:

      <4>Lustre: Failing over fscache-OST000b
      <4>Lustre: Skipped 1 previous similar message
      <3>LustreError: Skipped 3 previous similar messages
      <0>LustreError: 57144:0:(lu_object.c:1224:lu_device_fini()) ASSERTION( atomic_read(&d->ld_ref) == 0 ) failed: Refcount is 3
      <0>LustreError: 57144:0:(lu_object.c:1224:lu_device_fini()) LBUG
      <4>Pid: 57144, comm: umount
      <4>
      <4>Call Trace:
      <4> [<ffffffffa05aa895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
      <4> [<ffffffffa05aae97>] lbug_with_loc+0x47/0xb0 [libcfs]
      <4> [<ffffffffa070391b>] lu_device_fini+0xbb/0xc0 [obdclass]
      <4> [<ffffffffa06e38bd>] ls_device_put+0x7d/0x2e0 [obdclass]
      <4> [<ffffffffa06e3c92>] local_oid_storage_fini+0x172/0x410 [obdclass]
      <4> [<ffffffffa13cb2af>] lfsck_instance_cleanup+0x20f/0x7e0 [lfsck]
      <4> [<ffffffffa13cc0eb>] lfsck_degister+0x4b/0x60 [lfsck]
      <4> [<ffffffffa1493e67>] ofd_device_fini+0x87/0x260 [ofd]
      <4> [<ffffffffa06f4122>] class_cleanup+0x562/0xd20 [obdclass]
      <4> [<ffffffffa06d1216>] ? class_name2dev+0x56/0xe0 [obdclass]
      <4> [<ffffffffa06f5e4a>] class_process_config+0x156a/0x1ad0 [obdclass]
      <4> [<ffffffffa06ee205>] ? lustre_cfg_new+0x435/0x630 [obdclass]
      <4> [<ffffffffa06f6525>] class_manual_cleanup+0x175/0x4c0 [obdclass]
      <4> [<ffffffffa06d1216>] ? class_name2dev+0x56/0xe0 [obdclass]
      <4> [<ffffffffa0735597>] server_put_super+0xcf7/0x1060 [obdclass]
      <4> [<ffffffff811ad166>] ? invalidate_inodes+0xf6/0x190
      <4> [<ffffffff8119127b>] generic_shutdown_super+0x5b/0xe0
      <4> [<ffffffff81191366>] kill_anon_super+0x16/0x60
      <4> [<ffffffffa06f80b6>] lustre_kill_super+0x36/0x60 [obdclass]
      <4> [<ffffffff81191b07>] deactivate_super+0x57/0x80
      <4> [<ffffffff811b1acf>] mntput_no_expire+0xbf/0x110
      <4> [<ffffffff811b261b>] sys_umount+0x7b/0x3a0
      <4> [<ffffffff8100b0d2>] system_call_fastpath+0x16/0x1b
      <4>
      <0>Kernel panic - not syncing: LBUG
      <4>Pid: 57144, comm: umount Tainted: G        W  -- ------------  T 2.6.32-573.26.1.el6.20160517.x86_64.lustre272 #1
      <4>Call Trace:
      <4> [<ffffffff8157370a>] ? panic+0xa7/0x190
      <4> [<ffffffffa05aaeeb>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
      <4> [<ffffffffa070391b>] ? lu_device_fini+0xbb/0xc0 [obdclass]
      <4> [<ffffffffa06e38bd>] ? ls_device_put+0x7d/0x2e0 [obdclass]
      <4> [<ffffffffa06e3c92>] ? local_oid_storage_fini+0x172/0x410 [obdclass]
      <4> [<ffffffffa13cb2af>] ? lfsck_instance_cleanup+0x20f/0x7e0 [lfsck]
      <4> [<ffffffffa13cc0eb>] ? lfsck_degister+0x4b/0x60 [lfsck]
      <4> [<ffffffffa1493e67>] ? ofd_device_fini+0x87/0x260 [ofd]
      <4> [<ffffffffa06f4122>] ? class_cleanup+0x562/0xd20 [obdclass]
      <4> [<ffffffffa06d1216>] ? class_name2dev+0x56/0xe0 [obdclass]
      <4> [<ffffffffa06f5e4a>] ? class_process_config+0x156a/0x1ad0 [obdclass]
      <4> [<ffffffffa06ee205>] ? lustre_cfg_new+0x435/0x630 [obdclass]
      <4> [<ffffffffa06f6525>] ? class_manual_cleanup+0x175/0x4c0 [obdclass]
      <4> [<ffffffffa06d1216>] ? class_name2dev+0x56/0xe0 [obdclass]
      <4> [<ffffffffa0735597>] ? server_put_super+0xcf7/0x1060 [obdclass]
      <4> [<ffffffff811ad166>] ? invalidate_inodes+0xf6/0x190
      <4> [<ffffffff8119127b>] ? generic_shutdown_super+0x5b/0xe0
      <4> [<ffffffff81191366>] ? kill_anon_super+0x16/0x60
      <4> [<ffffffffa06f80b6>] ? lustre_kill_super+0x36/0x60 [obdclass]
      <4> [<ffffffff81191b07>] ? deactivate_super+0x57/0x80
      <4> [<ffffffff811b1acf>] ? mntput_no_expire+0xbf/0x110
      <4> [<ffffffff811b261b>] ? sys_umount+0x7b/0x3a0
      <4> [<ffffffff8100b0d2>] ? system_call_fastpath+0x16/0x1b
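
For context on the failed check: lu_device_fini() asserts that the device's reference count (d->ld_ref) is zero at teardown, so "Refcount is 3" means three references were still held when umount reached it (here via ofd_device_fini(), lfsck_degister() and ls_device_put()). The C sketch below models only that invariant. It is a hypothetical userspace toy, not Lustre code: struct toy_device and the toy_device_* functions are made-up names, and where the kernel LBUGs, the sketch prints the message and returns -1. It does not attempt to reproduce the root cause tracked in LU-7038.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdio.h>

/* HYPOTHETICAL stand-in for a Lustre lu_device: only the reference
 * count matters for this illustration. */
struct toy_device {
	atomic_int ref;	/* plays the role of d->ld_ref */
};

static void toy_device_init(struct toy_device *d)
{
	atomic_init(&d->ref, 0);
}

/* Every holder of the device takes one reference. */
static void toy_device_get(struct toy_device *d)
{
	atomic_fetch_add(&d->ref, 1);
}

/* Every holder must drop its reference before the device is finalized. */
static void toy_device_put(struct toy_device *d)
{
	int prev = atomic_fetch_sub(&d->ref, 1);

	/* Dropping a reference that was never taken is a separate bug. */
	assert(prev > 0);
	(void)prev;	/* avoid an unused-variable warning under NDEBUG */
}

/* Teardown check modelled on the failed assertion: all references must
 * be gone. The real code LBUGs (panics) here; this sketch reports the
 * leftover count and returns -1 so a caller can observe the failure. */
static int toy_device_fini(struct toy_device *d)
{
	int ref = atomic_load(&d->ref);

	if (ref != 0) {
		fprintf(stderr, "ASSERTION( ref == 0 ) failed: Refcount is %d\n", ref);
		return -1;
	}
	return 0;
}
```

Taking three references with toy_device_get() and then calling toy_device_fini() without the matching puts prints "Refcount is 3" and returns -1, mirroring the console message above. In the kernel the equivalent check is fatal, which is why the umount ended in "Kernel panic - not syncing: LBUG".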
      

          Activity

            pjones Peter Jones added a comment -

            Nathan

            It looks like it. I'll follow up.

            Peter

            ndauchy Nathan Dauchy (Inactive) added a comment - Is the port of this fix to the 2.7 FE branch just stuck waiting on one more code review?
            bogl Bob Glossman (Inactive) added a comment - port in flight: http://review.whamcloud.com/21402
            pjones Peter Jones added a comment -

            Bob

            Triage agrees that this looks like LU-7038. Could you please port that fix to the 2.7 FE branch?

            Thanks

            Peter

            ndauchy Nathan Dauchy (Inactive) added a comment - This looks very similar to LU-7038, but I opened a new ticket for NAS tracking. We may just need a backport of the fix to 2.7.2.

            People

              Assignee: bogl Bob Glossman (Inactive)
              Reporter: ndauchy Nathan Dauchy (Inactive)
              Votes: 0
              Watchers: 4