LU-9334: LBUG lu_device_fini()) ASSERTION( atomic_read(&d->ld_ref) == 0 ) failed: Refcount is 1

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Blocker
    • Fix Version/s: Lustre 2.10.0
    • Affects Version/s: Lustre 2.10.0
    • Labels: None
    • Environment: Soak stress cluster - Lustre-master build 3554 version=2.9.55_35_gaa32cc5
    • Severity: 3

    Description

      After installing build 3554, an attempt to re-mount the filesystem triggered an LBUG on the MGS/MDS node:

      Apr 13 14:14:42 soak-8 kernel: LDISKFS-fs (dm-5): recovery complete
      Apr 13 14:14:42 soak-8 kernel: LDISKFS-fs (dm-5): mounted filesystem with ordered data mode. Opts: user_xattr,user_xattr,errors=remount-ro,no_mbcache,nodelalloc
      Apr 13 14:14:44 soak-8 kernel: Lustre: MGS: Connection restored to 7703d69c-e9bc-c7f1-f9af-daf27057ff03 (at 0@lo)
      Apr 13 14:14:44 soak-8 kernel: Lustre: soaked-MDT0000: Imperative Recovery not enabled, recovery window 300-900
      Apr 13 14:14:44 soak-8 kernel: LustreError: 5015:0:(lfsck_layout.c:6780:lfsck_layout_setup()) soaked-MDT0000-osd: fail to init layout LFSCK component: rc = -20
      Apr 13 14:14:44 soak-8 kernel: LustreError: 5015:0:(mdd_device.c:1084:mdd_prepare()) soaked-MDD0000: failed to initialize lfsck: rc = -20
      Apr 13 14:14:44 soak-8 kernel: LustreError: 5015:0:(obd_mount_server.c:1840:server_fill_super()) Unable to start targets: -20
      Apr 13 14:14:44 soak-8 kernel: Lustre: Failing over soaked-MDT0000
      Apr 13 14:14:44 soak-8 kernel: LustreError: 5553:0:(osp_object.c:527:osp_attr_get()) soaked-MDT0001-osp-MDT0000:osp_attr_get update error [0x200000009:0x1:0x0]: rc = -5
      Apr 13 14:14:44 soak-8 kernel: LustreError: 5554:0:(lod_sub_object.c:959:lod_sub_prep_llog()) soaked-MDT0000-mdtlov: can't get id from catalogs: rc = -5
      Apr 13 14:14:44 soak-8 kernel: LustreError: 5554:0:(lod_dev.c:419:lod_sub_recovery_thread()) soaked-MDT0002-osp-MDT0000 getting update log failed: rc = -5
      Apr 13 14:14:44 soak-8 kernel: LustreError: 5553:0:(osp_object.c:527:osp_attr_get()) Skipped 2 previous similar messages
      Apr 13 14:14:44 soak-8 kernel: LustreError: 5015:0:(lu_object.c:1224:lu_device_fini()) ASSERTION( atomic_read(&d->ld_ref) == 0 ) failed: Refcount is 1
      Apr 13 14:14:44 soak-8 kernel: LustreError: 5015:0:(lu_object.c:1224:lu_device_fini()) LBUG
      Apr 13 14:14:44 soak-8 kernel: Pid: 5015, comm: mount.lustre
      Apr 13 14:14:44 soak-8 kernel: #012Call Trace:
      Apr 13 14:14:44 soak-8 kernel: [<ffffffffa0bcb7f3>] libcfs_debug_dumpstack+0x53/0x80 [libcfs]
      Apr 13 14:14:44 soak-8 kernel: [<ffffffffa0bcb861>] lbug_with_loc+0x41/0xb0 [libcfs]
      Apr 13 14:14:44 soak-8 kernel: [<ffffffffa0d0fe48>] lu_device_fini+0xb8/0xc0 [obdclass]
      Apr 13 14:14:44 soak-8 kernel: [<ffffffffa0cf4b12>] ls_device_put+0x82/0x2a0 [obdclass]
      Apr 13 14:14:44 soak-8 kernel: [<ffffffffa0cf4e0d>] local_oid_storage_fini+0xdd/0x210 [obdclass]
      Apr 13 14:14:44 soak-8 kernel: [<ffffffffa129a308>] mgs_fs_cleanup+0x88/0xa0 [mgs]
      Apr 13 14:14:44 soak-8 kernel: [<ffffffffa1292b96>] mgs_device_fini+0x196/0x5b0 [mgs]
      Apr 13 14:14:44 soak-8 kernel: [<ffffffffa0cff724>] class_cleanup+0x7f4/0xd80 [obdclass]
      Apr 13 14:14:44 soak-8 kernel: [<ffffffffa0d02364>] class_process_config+0x1f84/0x2c30 [obdclass]
      Apr 13 14:14:44 soak-8 kernel: [<ffffffffa0cf1659>] ? lprocfs_counter_add+0xf9/0x160 [obdclass]
      Apr 13 14:14:44 soak-8 kernel: [<ffffffffa0d030ff>] class_manual_cleanup+0xef/0x810 [obdclass]
      Apr 13 14:14:44 soak-8 kernel: [<ffffffffa0d332e0>] server_put_super+0xb20/0xcd0 [obdclass]
      Apr 13 14:14:44 soak-8 kernel: [<ffffffffa0d3708b>] server_fill_super+0xcdb/0x184c [obdclass]
      Apr 13 14:14:44 soak-8 kernel: [<ffffffffa0d0f1d8>] lustre_fill_super+0x328/0x950 [obdclass]
      Apr 13 14:14:44 soak-8 kernel: [<ffffffffa0d0eeb0>] ? lustre_fill_super+0x0/0x950 [obdclass]
      Apr 13 14:14:44 soak-8 kernel: [<ffffffff81201b1d>] mount_nodev+0x4d/0xb0
      Apr 13 14:14:44 soak-8 kernel: [<ffffffffa0d06c78>] lustre_mount+0x38/0x60 [obdclass]
      Apr 13 14:14:44 soak-8 kernel: [<ffffffff812024c9>] mount_fs+0x39/0x1b0
      Apr 13 14:14:44 soak-8 kernel: [<ffffffff8121e24f>] vfs_kern_mount+0x5f/0xf0
      Apr 13 14:14:44 soak-8 kernel: [<ffffffff812207ae>] do_mount+0x24e/0xaa0
      Apr 13 14:14:44 soak-8 kernel: [<ffffffff81185a7e>] ? __get_free_pages+0xe/0x50
      Apr 13 14:14:44 soak-8 kernel: [<ffffffff81221096>] SyS_mount+0x96/0xf0
      Apr 13 14:14:44 soak-8 kernel: [<ffffffff81696c49>] system_call_fastpath+0x16/0x1b
      Apr 13 14:14:45 soak-8 kernel:
      Apr 13 14:14:45 soak-8 kernel: Kernel panic - not syncing: LBUG
      

      vmcore-dmesg is attached; the crash dump is available on soak-8.
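The rc values in the log above are negated Linux errno codes: -20 is ENOTDIR and -5 is EIO. For triage, a small helper can pull the reporting function and rc out of each LustreError line. This is only a sketch: the regular expression is an assumption based on the lines pasted in this ticket, `decode_rc` is a hypothetical name, and the errno lookup assumes the script runs on Linux, where Python's errno table matches the kernel's.

```python
import errno
import os
import re

# Assumed line shape, taken from the log in this ticket:
#   "... 5015:0:(lfsck_layout.c:6780:lfsck_layout_setup()) <message>: rc = -20"
LINE_RE = re.compile(
    r"\((?P<file>[\w.]+):(?P<line>\d+):(?P<func>\w+)\(\)\)"
    r".*?rc = (?P<rc>-?\d+)\s*$"
)


def decode_rc(log_line):
    """Return (function, rc, errno name, description), or None if no 'rc = N' is found."""
    m = LINE_RE.search(log_line)
    if m is None:
        return None
    rc = int(m.group("rc"))
    code = abs(rc)  # kernel code returns negated errno values
    name = errno.errorcode.get(code, "UNKNOWN")
    return m.group("func"), rc, name, os.strerror(code)


if __name__ == "__main__":
    sample = (
        "Apr 13 14:14:44 soak-8 kernel: LustreError: "
        "5015:0:(lfsck_layout.c:6780:lfsck_layout_setup()) "
        "soaked-MDT0000-osd: fail to init layout LFSCK component: rc = -20"
    )
    print(decode_rc(sample))
```

Lines that report the code without the "rc = " prefix, such as "Unable to start targets: -20", are deliberately not matched and return None.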

      Attachments

        1. 191916.dump.txt.gz (259 kB, Cliff White)
        2. 191952.dump.txt.gz (61 kB, Cliff White)
        3. 192002.dump.txt.gz (25 kB, Cliff White)
        4. 192012.dump.txt.gz (17 kB, Cliff White)
        5. 192022.dump.txt.gz (17 kB, Cliff White)
        6. 192032.dump.txt.gz (19 kB, Cliff White)
        7. 192042.dump.txt.gz (3 kB, Cliff White)
        8. 192052.dump.txt.gz (18 kB, Cliff White)
        9. 192102.dump.txt.gz (18 kB, Cliff White)
        10. 192112.dump.txt.gz (19 kB, Cliff White)
        11. vmcore-dmesg.txt (129 kB, Cliff White)

          Activity

            pjones Peter Jones added a comment -

            Landed for 2.10

            gerrit Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/26716/
            Subject: LU-9334 lfsck: reset trace file for upgrade case
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: b3e30f49d3947c63b56274657a5c55af9ba85a2d

            yong.fan nasf (Inactive) added a comment -

            Still need the patch: https://review.whamcloud.com/26716

            gerrit Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/26703/
            Subject: LU-9334 lfsck: object leak in lfsck_load_one_trace_file
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 1b136613440bd81e284a12df97618f92c9729d71
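
The ticket does not include the code of this patch, so the following is only a toy model of how a leaked object reference can trip an assertion of the kind reported here ("Refcount is 1" at device teardown). Every name in it (`Device`, `Obj`, `load_leaky`, `load_fixed`) is hypothetical, and the -20 return value is borrowed from the log purely for illustration. It is not Lustre's implementation and not the actual fix.

```python
class Device:
    """Toy stand-in for a device that counts references held by live objects."""

    def __init__(self):
        self.ref = 0

    def fini(self):
        # Same spirit as the failed assertion: teardown requires zero references.
        if self.ref != 0:
            raise AssertionError(f"Refcount is {self.ref}")


class Obj:
    """Toy object that pins its device for as long as it is held."""

    def __init__(self, dev):
        self.dev = dev
        dev.ref += 1

    def put(self):
        self.dev.ref -= 1


def load_leaky(dev, valid):
    obj = Obj(dev)
    if not valid:
        return -20  # early return: obj is never released, so dev.ref stays at 1
    obj.put()
    return 0


def load_fixed(dev, valid):
    obj = Obj(dev)
    try:
        if not valid:
            return -20
        return 0
    finally:
        obj.put()  # released on every path, including the error path
```

With the leaky loader, a failed load leaves one reference behind and a later `fini()` raises; with the fixed loader the same failure still returns the error, but teardown succeeds.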

            gerrit Gerrit Updater added a comment -

            Jinshan Xiong (jinshan.xiong@intel.com) merged in patch https://review.whamcloud.com/26718/
            Subject: LU-9334 lfsck: reset trace file for upgrade case
            Project: fs/lustre-release
            Branch: pfl
            Current Patch Set:
            Commit: ba4a83aa9f72d7855ff77a37897619155313b198

            gerrit Gerrit Updater added a comment -

            Andreas Dilger (andreas.dilger@intel.com) uploaded a new patch: https://review.whamcloud.com/26718
            Subject: LU-9334 lfsck: reset trace file for upgrade case
            Project: fs/lustre-release
            Branch: pfl
            Current Patch Set: 1
            Commit: 740e28ad8fad95080da4b0275562378ef0b51cd3

            gerrit Gerrit Updater added a comment -

            Fan Yong (fan.yong@intel.com) uploaded a new patch: https://review.whamcloud.com/26716
            Subject: LU-9334 lfsck: reset trace file for upgrade case
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 48672c55bf83bbadec89bbd3406489251d701db0

            gerrit Gerrit Updater added a comment -

            Jinshan Xiong (jinshan.xiong@intel.com) merged in patch https://review.whamcloud.com/26715/
            Subject: LU-9334 lfsck: object leak in lfsck_load_one_trace_file
            Project: fs/lustre-release
            Branch: pfl
            Current Patch Set:
            Commit: e62a57f86027582c8c9b56a7d3d348fb38510f4b

            gerrit Gerrit Updater added a comment -

            Jinshan Xiong (jinshan.xiong@intel.com) uploaded a new patch: https://review.whamcloud.com/26715
            Subject: LU-9334 lfsck: object leak in lfsck_load_one_trace_file
            Project: fs/lustre-release
            Branch: pfl
            Current Patch Set: 1
            Commit: 806549e6ba5e96059cb41d006290bfc70ce58709

            cliffw Cliff White (Inactive) added a comment -

            Tested the latest patch:

            Apr 18 16:26:55 soak-8 kernel: Lustre: MGS: Connection restored to c060ba9e-ea2b-a109-ec49-a8f45fcb1eaf (at 0@lo)
            Apr 18 16:26:56 soak-8 kernel: Lustre: soaked-MDT0000: Imperative Recovery not enabled, recovery window 300-900
            Apr 18 16:26:56 soak-8 kernel: LustreError: 12003:0:(lfsck_layout.c:6780:lfsck_layout_setup()) soaked-MDT0000-osd: fail to init layout LFSCK component: rc = -20
            Apr 18 16:26:56 soak-8 kernel: LustreError: 12003:0:(mdd_device.c:1084:mdd_prepare()) soaked-MDD0000: failed to initialize lfsck: rc = -20
            Apr 18 16:26:56 soak-8 kernel: LustreError: 12003:0:(obd_mount_server.c:1840:server_fill_super()) Unable to start targets: -20
            Apr 18 16:26:56 soak-8 kernel: Lustre: Failing over soaked-MDT0000
            Apr 18 16:26:56 soak-8 kernel: LustreError: 12458:0:(osp_object.c:527:osp_attr_get()) soaked-MDT0001-osp-MDT0000:osp_attr_get update error [0x200000009:0x1:0x0]: rc = -5
            Apr 18 16:26:56 soak-8 kernel: LustreError: 12459:0:(lod_sub_object.c:959:lod_sub_prep_llog()) soaked-MDT0000-mdtlov: can't get id from catalogs: rc = -5
            Apr 18 16:26:56 soak-8 kernel: LustreError: 12459:0:(lod_dev.c:419:lod_sub_recovery_thread()) soaked-MDT0002-osp-MDT0000 getting update log failed: rc = -5
            Apr 18 16:26:56 soak-8 kernel: LustreError: 12458:0:(osp_object.c:527:osp_attr_get()) Skipped 2 previous similar messages
            Apr 18 16:27:02 soak-8 kernel: Lustre: 12003:0:(client.c:2113:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1492532816/real 1492532816]  req@ffff8803fc650000 x1565032298054672/t0(0) o251->MGC192.168.1.108@o2ib10@0@lo:26/25 lens 224/224 e 0 to 1 dl 1492532822 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
            Apr 18 16:27:02 soak-8 kernel: Lustre: server umount soaked-MDT0000 complete
            Apr 18 16:27:02 soak-8 kernel: LustreError: 12003:0:(obd_mount.c:1502:lustre_fill_super()) Unable to mount  (-20)
            Apr 18 16:27:02 soak-8 sshd[11981]: Received disconnect from 192.168.1.135: 11: disconnected by user
            Apr 18 16:27:02 soak-8 sshd[11981]: pam_unix(sshd:session): session closed for user root

            The mount still fails (-20).
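
When comparing a retest log like the one above against the original report, it helps to separate "the node crashed with an LBUG" from "the mount failed but the server cleaned up". The sketch below does this with plain substring checks. The marker strings are copied from the logs in this ticket; the function name and the three outcome labels are hypothetical, and a pasted excerpt may of course be incomplete.

```python
def classify_mount_log(lines):
    """Return 'lbug', 'mount-failed', or 'ok' for an iterable of log lines."""
    lines = list(lines)
    # An assertion failure or LBUG outranks any other outcome.
    if any("LBUG" in line or "ASSERTION(" in line for line in lines):
        return "lbug"
    # Markers printed by server_fill_super() / lustre_fill_super() in this ticket.
    if any(
        "Unable to mount" in line or "Unable to start targets" in line
        for line in lines
    ):
        return "mount-failed"
    return "ok"
```

Applied to the two excerpts in this ticket, the original log would be classified as an LBUG, while the retest log shows only a mount failure.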

            People

              Assignee: yong.fan nasf (Inactive)
              Reporter: cliffw Cliff White (Inactive)