Lustre / LU-1480

failure on replay-single test_74: ASSERTION( cfs_atomic_read(&d->ld_ref) == 0 ) failed: Refcount is 1

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version: Lustre 2.5.0
    • Affects Versions: Lustre 2.4.0, Lustre 2.4.1

    Description

      This issue was created by maloo for sarah <sarah@whamcloud.com>

      This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/8506fd4e-ad5b-11e1-8152-52540035b04c.

      The sub-test test_74 failed with the following error:

      test failed to respond and timed out

      Info required for matching: replay-single 74

      Activity


            adilger Andreas Dilger added a comment -

            Bobijam, can you please look at https://maloo.whamcloud.com/test_sets/be5714e2-2ce7-11e2-9af4-52540035b04c to see whether your debugging patch contains the information you need to resolve this problem?
            ian Ian Colle (Inactive) added a comment - https://maloo.whamcloud.com/test_sets/be5714e2-2ce7-11e2-9af4-52540035b04c
            bobijam Zhenyu Xu added a comment (edited) -

            status update:

            A debugging patch has landed on the master branch, and we are waiting for re-hits that include the debug message.


            adilger Andreas Dilger added a comment -

            Alex reported in LU-2070:

            please use http://review.whamcloud.com/4151 to debug

            liwei Li Wei (Inactive) added a comment -

            Promoted to Blocker for 2.4.
            yujian Jian Yu added a comment -

            Lustre Tag: v2_3_0_RC3
            Lustre Client Build: http://build.whamcloud.com/job/lustre-b1_8/198
            Lustre Server Build: http://build.whamcloud.com/job/lustre-b2_3/36
            Distro/Arch: RHEL6.3/x86_64

            The same issue occurred in parallel-scale-nfsv3 test nfsread_orphan_file:
            https://maloo.whamcloud.com/test_sets/0101c7c8-16a0-11e2-962d-52540035b04c

            yujian Jian Yu added a comment -

            Lustre Tag: v2_3_0_RC3
            Lustre Build: http://build.whamcloud.com/job/lustre-b2_3/36
            Distro/Arch: RHEL6.3/x86_64(server), FC15/x86_64(client)
            Network: TCP
            ENABLE_QUOTA=yes

            parallel-scale-nfsv3 test iorfpp failed with the same issue:
            https://maloo.whamcloud.com/test_sets/7ca42e16-168c-11e2-962d-52540035b04c

            yujian Jian Yu added a comment -

            Lustre Tag: v2_3_0_RC2
            Lustre Client Build: http://build.whamcloud.com/job/lustre-b1_8/198
            Lustre Server Build: http://build.whamcloud.com/job/lustre-b2_3/32
            Distro/Arch: RHEL6.3/x86_64

            The same issue occurred in parallel-scale-nfsv3 test nfsread_orphan_file:
            https://maloo.whamcloud.com/test_sets/d53d4086-1370-11e2-808f-52540035b04c

            bobijam Zhenyu Xu added a comment (edited) -

            http://review.whamcloud.com/4108

            LU-1480 lov: lov_delete_raid0 need wait

            If lov_delete_raid0 does not wait for its layout to become stable, deleting the lov object will leave lovsub objects hanging in memory.

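            The fix described in that patch follows a common kernel pattern: block the deleting thread until every transient user of the layout has dropped its reference, and only then tear the layout down. A minimal user-space sketch of that pattern (hypothetical names; pthreads stand in for the kernel wait-queue API, this is not the actual Lustre code):

            ```c
            #include <assert.h>
            #include <pthread.h>
            #include <stdio.h>

            /* Hypothetical stand-in for the lov layout: a refcount guarded
             * by a mutex, plus a condvar that deleters wait on until the
             * count drops to zero (the layout is "stable"). */
            struct layout {
                int refs;
                pthread_mutex_t lock;
                pthread_cond_t stable;
            };

            static void layout_get(struct layout *lo)
            {
                pthread_mutex_lock(&lo->lock);
                lo->refs++;
                pthread_mutex_unlock(&lo->lock);
            }

            static void layout_put(struct layout *lo)
            {
                pthread_mutex_lock(&lo->lock);
                if (--lo->refs == 0)
                    pthread_cond_broadcast(&lo->stable);
                pthread_mutex_unlock(&lo->lock);
            }

            /* The point of the fix: wait for refs == 0 BEFORE tearing the
             * layout down, instead of deleting while a user still holds a
             * reference (which would leave sub-objects stuck in memory). */
            static void layout_delete(struct layout *lo)
            {
                pthread_mutex_lock(&lo->lock);
                while (lo->refs != 0)
                    pthread_cond_wait(&lo->stable, &lo->lock);
                pthread_mutex_unlock(&lo->lock);
                /* now safe to free sub-objects */
            }

            /* Simulates a short-lived user releasing the reference that
             * main() took on its behalf. */
            static void *user_thread(void *arg)
            {
                layout_put((struct layout *)arg);
                return NULL;
            }

            int main(void)
            {
                struct layout lo = { .refs = 0,
                                     .lock = PTHREAD_MUTEX_INITIALIZER,
                                     .stable = PTHREAD_COND_INITIALIZER };
                pthread_t t;

                layout_get(&lo);                  /* a user pins the layout  */
                pthread_create(&t, NULL, user_thread, &lo);
                layout_delete(&lo);               /* blocks until refs == 0  */
                assert(lo.refs == 0);
                pthread_join(t, NULL);
                printf("layout stable, deletion safe\n");
                return 0;
            }
            ```

            Whichever thread drops the last reference wakes the deleter, so deletion can never race ahead of an in-flight user.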
            sarah Sarah Liu added a comment -

            Hit this issue when testing interop between 1.8 and 2.3-RC1, after parallel-scale-nfsv3 passed all of its sub-tests.

            https://maloo.whamcloud.com/test_sets/71e39924-0626-11e2-9b17-52540035b04c

            MDS console log of test_nfsread_orphan_file shows:

            19:47:03:LustreError: 20620:0:(lu_object.c:1220:lu_stack_fini()) header@ffff8800517c9bf0[0x0, 1, [0x100060000:0x4769:0x0] hash]{ 
            19:47:03:LustreError: 20620:0:(lu_object.c:1220:lu_stack_fini()) ....lovsub@ffff8800517c9c88[0]
            19:47:03:LustreError: 20620:0:(lu_object.c:1220:lu_stack_fini()) ....osc@ffff8800545352a8id: 18281 gr: 0 idx: 6 gen: 0 kms_valid: 1 kms 0 rc: 0 force_sync: 0 min_xid: 0 size: 2942 mtime: 1348454797 atime: 0 ctime: 1348454797 blocks: 0
            19:47:03:LustreError: 20620:0:(lu_object.c:1220:lu_stack_fini()) } header@ffff8800517c9bf0
            19:47:03:LustreError: 20620:0:(lu_object.c:1081:lu_device_fini()) ASSERTION( cfs_atomic_read(&d->ld_ref) == 0 ) failed: Refcount is 1
            19:47:03:LustreError: 20620:0:(lu_object.c:1081:lu_device_fini()) LBUG
            19:47:03:Pid: 20620, comm: umount
            19:47:03:
            19:47:03:Call Trace:
            19:47:03: [<ffffffffa0d31905>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
            19:47:03: [<ffffffffa0d31f17>] lbug_with_loc+0x47/0xb0 [libcfs]
            19:47:03: [<ffffffffa040b2dc>] lu_device_fini+0xcc/0xd0 [obdclass]
            19:47:03: [<ffffffffa089f114>] lovsub_device_free+0x24/0x200 [lov]
            19:47:03: [<ffffffffa040e826>] lu_stack_fini+0x96/0xf0 [obdclass]
            19:47:03: [<ffffffffa04137ae>] cl_stack_fini+0xe/0x10 [obdclass]
            19:47:03: [<ffffffffa088e6a8>] lov_device_fini+0x58/0x130 [lov]
            19:47:03: [<ffffffffa040e7d9>] lu_stack_fini+0x49/0xf0 [obdclass]
            19:47:03: [<ffffffffa04137ae>] cl_stack_fini+0xe/0x10 [obdclass]
            19:47:03: [<ffffffffa0b744cd>] cl_sb_fini+0x6d/0x190 [lustre]
            19:47:03: [<ffffffffa0b3959c>] client_common_put_super+0x14c/0xe60 [lustre]
            19:47:03: [<ffffffffa0b3a380>] ll_put_super+0xd0/0x360 [lustre]
            19:47:03: [<ffffffff811961a6>] ? invalidate_inodes+0xf6/0x190
            19:47:03: [<ffffffff8117d34b>] generic_shutdown_super+0x5b/0xe0
            19:47:03: [<ffffffff8117d436>] kill_anon_super+0x16/0x60
            19:47:03: [<ffffffffa03f8eaa>] lustre_kill_super+0x4a/0x60 [obdclass]
            19:47:03: [<ffffffff8117e4b0>] deactivate_super+0x70/0x90
            19:47:03: [<ffffffff8119a4ff>] mntput_no_expire+0xbf/0x110
            19:47:03: [<ffffffff8119af9b>] sys_umount+0x7b/0x3a0
            19:47:03: [<ffffffff810d6b12>] ? audit_syscall_entry+0x272/0x2a0
            19:47:03: [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
            19:47:03:
            19:47:03:Kernel panic - not syncing: LBUG
            19:47:03:Pid: 20620, comm: umount Not tainted 2.6.32-279.5.1.el6_lustre.x86_64 #1
            19:47:03:Call Trace:
            19:47:03: [<ffffffff814fd58a>] ? panic+0xa0/0x168
            19:47:03: [<ffffffffa0d31f6b>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
            19:47:03: [<ffffffffa040b2dc>] ? lu_device_fini+0xcc/0xd0 [obdclass]
            19:47:03: [<ffffffffa089f114>] ? lovsub_device_free+0x24/0x200 [lov]
            19:47:03: [<ffffffffa040e826>] ? lu_stack_fini+0x96/0xf0 [obdclass]
            19:47:03: [<ffffffffa04137ae>] ? cl_stack_fini+0xe/0x10 [obdclass]
            19:47:03: [<ffffffffa088e6a8>] ? lov_device_fini+0x58/0x130 [lov]
            19:47:03: [<ffffffffa040e7d9>] ? lu_stack_fini+0x49/0xf0 [obdclass]
            19:47:03: [<ffffffffa04137ae>] ? cl_stack_fini+0xe/0x10 [obdclass]
            19:47:03: [<ffffffffa0b744cd>] ? cl_sb_fini+0x6d/0x190 [lustre]
            19:47:03: [<ffffffffa0b3959c>] ? client_common_put_super+0x14c/0xe60 [lustre]
            19:47:03: [<ffffffffa0b3a380>] ? ll_put_super+0xd0/0x360 [lustre]
            19:47:03: [<ffffffff811961a6>] ? invalidate_inodes+0xf6/0x190
            19:47:03: [<ffffffff8117d34b>] ? generic_shutdown_super+0x5b/0xe0
            19:47:03: [<ffffffff8117d436>] ? kill_anon_super+0x16/0x60
            19:47:03: [<ffffffffa03f8eaa>] ? lustre_kill_super+0x4a/0x60 [obdclass]
            19:47:03: [<ffffffff8117e4b0>] ? deactivate_super+0x70/0x90
            19:47:03: [<ffffffff8119a4ff>] ? mntput_no_expire+0xbf/0x110
            19:47:03: [<ffffffff8119af9b>] ? sys_umount+0x7b/0x3a0
            19:47:03: [<ffffffff810d6b12>] ? audit_syscall_entry+0x272/0x2a0
            19:47:03: [<ffffffff8100b0f2>] ? system_call_fastpath+0x16/0x1b
            19:47:03:Initializing cgroup subsys cpuset
            19:47:03:Initializing cgroup subsys cpu
            
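            The LBUG in the log above fires because lu_device_fini() asserts that a device's reference count has dropped to zero before the device is finalized; the stuck lovsub object still pins the device, so ld_ref is 1 at umount time. A miniature illustration of that invariant (hypothetical names, not the actual obdclass code):

            ```c
            #include <assert.h>
            #include <stdio.h>

            /* Hypothetical miniature of a lu_device: objects take and drop
             * references on the device that backs them. */
            struct device {
                int ld_ref;
            };

            static void device_get(struct device *d) { d->ld_ref++; }
            static void device_put(struct device *d) { d->ld_ref--; }

            /* Mirrors the failed check in the log:
             *   ASSERTION( cfs_atomic_read(&d->ld_ref) == 0 )
             * A single leaked object reference leaves ld_ref == 1 here
             * and would trip the assertion (LBUG) at umount. */
            static void device_fini(struct device *d)
            {
                assert(d->ld_ref == 0 && "Refcount must be 0 at fini");
            }

            int main(void)
            {
                struct device d = { .ld_ref = 0 };

                device_get(&d);   /* an object pins the device            */
                device_put(&d);   /* object properly released: ref is 0   */
                device_fini(&d);  /* passes: no leaked references         */
                printf("fini ok\n");
                return 0;
            }
            ```

            Omitting the device_put() call, as the unreleased lovsub object effectively did, reproduces the assertion failure.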
            pjones Peter Jones added a comment -

            ok


            People

              Assignee: bobijam Zhenyu Xu
              Reporter: maloo Maloo
              Votes: 0
              Watchers: 13