[LU-9334] LBUG lu_device_fini()) ASSERTION( atomic_read(&d->ld_ref) == 0 ) failed: Refcount is 1

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Blocker
    • Fix Version/s: Lustre 2.10.0
    • Affects Version/s: Lustre 2.10.0
    • Labels: None
    • Environment: Soak stress cluster - Lustre-master build 3554 version=2.9.55_35_gaa32cc5
    • Severity: 3
    • Rank (Obsolete): 9223372036854775807

    Description

      After installing build 3554, an attempt to re-mount the filesystem triggered an LBUG. On the MGS/MDS node:

      Apr 13 14:14:42 soak-8 kernel: LDISKFS-fs (dm-5): recovery complete
      Apr 13 14:14:42 soak-8 kernel: LDISKFS-fs (dm-5): mounted filesystem with ordered data mode. Opts: user_xattr,user_xattr,errors=remount-ro,no_mbcache,nodelalloc
      Apr 13 14:14:44 soak-8 kernel: Lustre: MGS: Connection restored to 7703d69c-e9bc-c7f1-f9af-daf27057ff03 (at 0@lo)
      Apr 13 14:14:44 soak-8 kernel: Lustre: soaked-MDT0000: Imperative Recovery not enabled, recovery window 300-900
      Apr 13 14:14:44 soak-8 kernel: LustreError: 5015:0:(lfsck_layout.c:6780:lfsck_layout_setup()) soaked-MDT0000-osd: fail to init layout LFSCK component: rc = -20
      Apr 13 14:14:44 soak-8 kernel: LustreError: 5015:0:(mdd_device.c:1084:mdd_prepare()) soaked-MDD0000: failed to initialize lfsck: rc = -20
      Apr 13 14:14:44 soak-8 kernel: LustreError: 5015:0:(obd_mount_server.c:1840:server_fill_super()) Unable to start targets: -20
      Apr 13 14:14:44 soak-8 kernel: Lustre: Failing over soaked-MDT0000
      Apr 13 14:14:44 soak-8 kernel: LustreError: 5553:0:(osp_object.c:527:osp_attr_get()) soaked-MDT0001-osp-MDT0000:osp_attr_get update error [0x200000009:0x1:0x0]: rc = -5
      Apr 13 14:14:44 soak-8 kernel: LustreError: 5554:0:(lod_sub_object.c:959:lod_sub_prep_llog()) soaked-MDT0000-mdtlov: can't get id from catalogs: rc = -5
      Apr 13 14:14:44 soak-8 kernel: LustreError: 5554:0:(lod_dev.c:419:lod_sub_recovery_thread()) soaked-MDT0002-osp-MDT0000 getting update log failed: rc = -5
      Apr 13 14:14:44 soak-8 kernel: LustreError: 5553:0:(osp_object.c:527:osp_attr_get()) Skipped 2 previous similar messages
      Apr 13 14:14:44 soak-8 kernel: LustreError: 5015:0:(lu_object.c:1224:lu_device_fini()) ASSERTION( atomic_read(&d->ld_ref) == 0 ) failed: Refcount is 1
      Apr 13 14:14:44 soak-8 kernel: LustreError: 5015:0:(lu_object.c:1224:lu_device_fini()) LBUG
      Apr 13 14:14:44 soak-8 kernel: Pid: 5015, comm: mount.lustre
      Apr 13 14:14:44 soak-8 kernel: #012Call Trace:
      Apr 13 14:14:44 soak-8 kernel: [<ffffffffa0bcb7f3>] libcfs_debug_dumpstack+0x53/0x80 [libcfs]
      Apr 13 14:14:44 soak-8 kernel: [<ffffffffa0bcb861>] lbug_with_loc+0x41/0xb0 [libcfs]
      Apr 13 14:14:44 soak-8 kernel: [<ffffffffa0d0fe48>] lu_device_fini+0xb8/0xc0 [obdclass]
      Apr 13 14:14:44 soak-8 kernel: [<ffffffffa0cf4b12>] ls_device_put+0x82/0x2a0 [obdclass]
      Apr 13 14:14:44 soak-8 kernel: [<ffffffffa0cf4e0d>] local_oid_storage_fini+0xdd/0x210 [obdclass]
      Apr 13 14:14:44 soak-8 kernel: [<ffffffffa129a308>] mgs_fs_cleanup+0x88/0xa0 [mgs]
      Apr 13 14:14:44 soak-8 kernel: [<ffffffffa1292b96>] mgs_device_fini+0x196/0x5b0 [mgs]
      Apr 13 14:14:44 soak-8 kernel: [<ffffffffa0cff724>] class_cleanup+0x7f4/0xd80 [obdclass]
      Apr 13 14:14:44 soak-8 kernel: [<ffffffffa0d02364>] class_process_config+0x1f84/0x2c30 [obdclass]
      Apr 13 14:14:44 soak-8 kernel: [<ffffffffa0cf1659>] ? lprocfs_counter_add+0xf9/0x160 [obdclass]
      Apr 13 14:14:44 soak-8 kernel: [<ffffffffa0d030ff>] class_manual_cleanup+0xef/0x810 [obdclass]
      Apr 13 14:14:44 soak-8 kernel: [<ffffffffa0d332e0>] server_put_super+0xb20/0xcd0 [obdclass]
      Apr 13 14:14:44 soak-8 kernel: [<ffffffffa0d3708b>] server_fill_super+0xcdb/0x184c [obdclass]
      Apr 13 14:14:44 soak-8 kernel: [<ffffffffa0d0f1d8>] lustre_fill_super+0x328/0x950 [obdclass]
      Apr 13 14:14:44 soak-8 kernel: [<ffffffffa0d0eeb0>] ? lustre_fill_super+0x0/0x950 [obdclass]
      Apr 13 14:14:44 soak-8 kernel: [<ffffffff81201b1d>] mount_nodev+0x4d/0xb0
      Apr 13 14:14:44 soak-8 kernel: [<ffffffffa0d06c78>] lustre_mount+0x38/0x60 [obdclass]
      Apr 13 14:14:44 soak-8 kernel: [<ffffffff812024c9>] mount_fs+0x39/0x1b0
      Apr 13 14:14:44 soak-8 kernel: [<ffffffff8121e24f>] vfs_kern_mount+0x5f/0xf0
      Apr 13 14:14:44 soak-8 kernel: [<ffffffff812207ae>] do_mount+0x24e/0xaa0
      Apr 13 14:14:44 soak-8 kernel: [<ffffffff81185a7e>] ? __get_free_pages+0xe/0x50
      Apr 13 14:14:44 soak-8 kernel: [<ffffffff81221096>] SyS_mount+0x96/0xf0
      Apr 13 14:14:44 soak-8 kernel: [<ffffffff81696c49>] system_call_fastpath+0x16/0x1b
      Apr 13 14:14:45 soak-8 kernel:
      Apr 13 14:14:45 soak-8 kernel: Kernel panic - not syncing: LBUG
      

      vmcore-dmesg is attached; a crash dump is available on soak-8.
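      The return codes in the log above are negated Linux errno values. A minimal Python sketch for decoding them, assuming the `rc = -N` suffix format seen in these messages; the helper name `decode_rc` is hypothetical and not part of any Lustre tooling:

```python
import errno
import re

# Matches the "rc = -N" suffix that the LustreError messages in this log end with.
RC_PATTERN = re.compile(r"rc = (-?\d+)\s*$")


def decode_rc(line):
    """Return (rc, errno_name) for a log line, or None if it carries no rc."""
    match = RC_PATTERN.search(line)
    if match is None:
        return None
    rc = int(match.group(1))
    # Kernel code returns negated errno values, so flip the sign to look it up.
    return rc, errno.errorcode.get(-rc, "UNKNOWN")


lines = [
    "LustreError: 5015:0:(lfsck_layout.c:6780:lfsck_layout_setup()) "
    "soaked-MDT0000-osd: fail to init layout LFSCK component: rc = -20",
    "LustreError: 5554:0:(lod_sub_object.c:959:lod_sub_prep_llog()) "
    "soaked-MDT0000-mdtlov: can't get id from catalogs: rc = -5",
]
for line in lines:
    print(decode_rc(line))
```

      With this lookup, -20 decodes to ENOTDIR and -5 to EIO.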

      Attachments

        1. 191916.dump.txt.gz (259 kB)
        2. 191952.dump.txt.gz (61 kB)
        3. 192002.dump.txt.gz (25 kB)
        4. 192012.dump.txt.gz (17 kB)
        5. 192022.dump.txt.gz (17 kB)
        6. 192032.dump.txt.gz (19 kB)
        7. 192042.dump.txt.gz (3 kB)
        8. 192052.dump.txt.gz (18 kB)
        9. 192102.dump.txt.gz (18 kB)
        10. 192112.dump.txt.gz (19 kB)
        11. vmcore-dmesg.txt (129 kB)

          Activity

            pjones Peter Jones added a comment -

            Landed for 2.10

            gerrit Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/26716/
            Subject: LU-9334 lfsck: reset trace file for upgrade case
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: b3e30f49d3947c63b56274657a5c55af9ba85a2d

            yong.fan nasf (Inactive) added a comment -

            Still need the patch: https://review.whamcloud.com/26716

            gerrit Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/26703/
            Subject: LU-9334 lfsck: object leak in lfsck_load_one_trace_file
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 1b136613440bd81e284a12df97618f92c9729d71
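            The assertion in the original report fires when a device is finalized while something still holds a reference to it, and the patch subject above points at a leaked object in `lfsck_load_one_trace_file`. The following toy Python model illustrates that general pattern only. It assumes, purely for illustration, that the leak sits on an error path; the names `Device`, `Obj`, and `load_trace_file` are hypothetical, and the real Lustre code is C and considerably more involved:

```python
class Device:
    """Toy stand-in for a device whose ld_ref counts outstanding objects."""

    def __init__(self):
        self.ld_ref = 0

    def fini(self):
        # Mirrors the shape of the failed check: no references may remain.
        assert self.ld_ref == 0, f"Refcount is {self.ld_ref}"


class Obj:
    """Toy object that pins its device until put() is called."""

    def __init__(self, dev):
        self.dev = dev
        dev.ld_ref += 1

    def put(self):
        self.dev.ld_ref -= 1


def load_trace_file(dev, fail, leak):
    """Take an object reference; on failure, release it only if leak is False."""
    obj = Obj(dev)
    if fail:
        if not leak:
            obj.put()  # the error path must drop its reference
        return -20     # hypothetical error code, chosen to match the log
    obj.put()
    return 0


leaky = Device()
load_trace_file(leaky, fail=True, leak=True)
try:
    leaky.fini()
except AssertionError as exc:
    print("fini failed:", exc)

fixed = Device()
load_trace_file(fixed, fail=True, leak=False)
fixed.fini()  # passes: the error path released its reference
```

            In the leaky case the device is torn down with one reference outstanding, which is the same condition the `Refcount is 1` message reports.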

            gerrit Gerrit Updater added a comment -

            Jinshan Xiong (jinshan.xiong@intel.com) merged in patch https://review.whamcloud.com/26718/
            Subject: LU-9334 lfsck: reset trace file for upgrade case
            Project: fs/lustre-release
            Branch: pfl
            Current Patch Set:
            Commit: ba4a83aa9f72d7855ff77a37897619155313b198

            gerrit Gerrit Updater added a comment -

            Andreas Dilger (andreas.dilger@intel.com) uploaded a new patch: https://review.whamcloud.com/26718
            Subject: LU-9334 lfsck: reset trace file for upgrade case
            Project: fs/lustre-release
            Branch: pfl
            Current Patch Set: 1
            Commit: 740e28ad8fad95080da4b0275562378ef0b51cd3

            gerrit Gerrit Updater added a comment -

            Fan Yong (fan.yong@intel.com) uploaded a new patch: https://review.whamcloud.com/26716
            Subject: LU-9334 lfsck: reset trace file for upgrade case
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 48672c55bf83bbadec89bbd3406489251d701db0

            gerrit Gerrit Updater added a comment -

            Jinshan Xiong (jinshan.xiong@intel.com) merged in patch https://review.whamcloud.com/26715/
            Subject: LU-9334 lfsck: object leak in lfsck_load_one_trace_file
            Project: fs/lustre-release
            Branch: pfl
            Current Patch Set:
            Commit: e62a57f86027582c8c9b56a7d3d348fb38510f4b

            gerrit Gerrit Updater added a comment -

            Jinshan Xiong (jinshan.xiong@intel.com) uploaded a new patch: https://review.whamcloud.com/26715
            Subject: LU-9334 lfsck: object leak in lfsck_load_one_trace_file
            Project: fs/lustre-release
            Branch: pfl
            Current Patch Set: 1
            Commit: 806549e6ba5e96059cb41d006290bfc70ce58709

            cliffw Cliff White (Inactive) added a comment -

            Tested the latest patch:

            Apr 18 16:26:55 soak-8 kernel: Lustre: MGS: Connection restored to c060ba9e-ea2b-a109-ec49-a8f45fcb1eaf (at 0@lo)
            Apr 18 16:26:56 soak-8 kernel: Lustre: soaked-MDT0000: Imperative Recovery not enabled, recovery window 300-900
            Apr 18 16:26:56 soak-8 kernel: LustreError: 12003:0:(lfsck_layout.c:6780:lfsck_layout_setup()) soaked-MDT0000-osd: fail to init layout LFSCK component: rc = -20
            Apr 18 16:26:56 soak-8 kernel: LustreError: 12003:0:(mdd_device.c:1084:mdd_prepare()) soaked-MDD0000: failed to initialize lfsck: rc = -20
            Apr 18 16:26:56 soak-8 kernel: LustreError: 12003:0:(obd_mount_server.c:1840:server_fill_super()) Unable to start targets: -20
            Apr 18 16:26:56 soak-8 kernel: Lustre: Failing over soaked-MDT0000
            Apr 18 16:26:56 soak-8 kernel: LustreError: 12458:0:(osp_object.c:527:osp_attr_get()) soaked-MDT0001-osp-MDT0000:osp_attr_get update error [0x200000009:0x1:0x0]: rc = -5
            Apr 18 16:26:56 soak-8 kernel: LustreError: 12459:0:(lod_sub_object.c:959:lod_sub_prep_llog()) soaked-MDT0000-mdtlov: can't get id from catalogs: rc = -5
            Apr 18 16:26:56 soak-8 kernel: LustreError: 12459:0:(lod_dev.c:419:lod_sub_recovery_thread()) soaked-MDT0002-osp-MDT0000 getting update log failed: rc = -5
            Apr 18 16:26:56 soak-8 kernel: LustreError: 12458:0:(osp_object.c:527:osp_attr_get()) Skipped 2 previous similar messages
            Apr 18 16:27:02 soak-8 kernel: Lustre: 12003:0:(client.c:2113:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1492532816/real 1492532816]  req@ffff8803fc650000 x1565032298054672/t0(0) o251->MGC192.168.1.108@o2ib10@0@lo:26/25 lens 224/224 e 0 to 1 dl 1492532822 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
            Apr 18 16:27:02 soak-8 kernel: Lustre: server umount soaked-MDT0000 complete
            Apr 18 16:27:02 soak-8 kernel: LustreError: 12003:0:(obd_mount.c:1502:lustre_fill_super()) Unable to mount  (-20)
            Apr 18 16:27:02 soak-8 sshd[11981]: Received disconnect from 192.168.1.135: 11: disconnected by user
            Apr 18 16:27:02 soak-8 sshd[11981]: pam_unix(sshd:session): session closed for user root

            The mount still fails (rc = -20).
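            The two mount attempts in this ticket differ in how they fail: the log in the description ends in an LBUG and a kernel panic, while the log in this comment ends with a completed server umount and a mount error. A small Python sketch for telling the two apart in a syslog excerpt, assuming the marker strings seen in this ticket; `classify_mount_log` is a hypothetical helper, not a Lustre tool:

```python
def classify_mount_log(lines):
    """Classify a syslog excerpt as 'lbug', 'mount_failed', or 'ok'."""
    text = "\n".join(lines)
    # An LBUG is the more severe outcome, so check for it first.
    if "LBUG" in text or "ASSERTION(" in text:
        return "lbug"
    if "Unable to mount" in text or "Unable to start targets" in text:
        return "mount_failed"
    return "ok"


first_attempt = [
    "LustreError: 5015:0:(obd_mount_server.c:1840:server_fill_super()) "
    "Unable to start targets: -20",
    "LustreError: 5015:0:(lu_object.c:1224:lu_device_fini()) LBUG",
]
patched_attempt = [
    "Lustre: server umount soaked-MDT0000 complete",
    "LustreError: 12003:0:(obd_mount.c:1502:lustre_fill_super()) "
    "Unable to mount  (-20)",
]
print(classify_mount_log(first_attempt))
print(classify_mount_log(patched_attempt))
```

            The first excerpt classifies as `lbug` and the second as `mount_failed`.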

            People

              Assignee: yong.fan nasf (Inactive)
              Reporter: cliffw Cliff White (Inactive)