
failure on replay-single test_74: ASSERTION( cfs_atomic_read(&d->ld_ref) == 0 ) failed: Refcount is 1

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version: Lustre 2.5.0
    • Affects Versions: Lustre 2.4.0, Lustre 2.4.1

    Description

      This issue was created by maloo for sarah <sarah@whamcloud.com>

      This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/8506fd4e-ad5b-11e1-8152-52540035b04c.

      The sub-test test_74 failed with the following error:

      test failed to respond and timed out

      Info required for matching: replay-single 74
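
      For context, the assertion in the title enforces the usual reference-counting invariant: a lu_device may only be finalized once every reference taken on it has been dropped. The sketch below is a simplified, hypothetical illustration of that invariant (the toy_* names are illustrative, not the real Lustre API); a leaked reference at umount time is exactly what trips the LASSERT and LBUG seen in the traces.

      ```c
      /* Hypothetical, simplified sketch of the invariant lu_device_fini()
       * enforces.  The toy_* names are illustrative, not Lustre's API. */
      #include <assert.h>
      #include <stdatomic.h>

      struct toy_device {
              atomic_int ld_ref;      /* mirrors lu_device::ld_ref */
      };

      static void toy_device_get(struct toy_device *d)
      {
              atomic_fetch_add(&d->ld_ref, 1);
      }

      static void toy_device_put(struct toy_device *d)
      {
              atomic_fetch_sub(&d->ld_ref, 1);
      }

      /* Returns 0 when finalization is legal; a nonzero refcount here is
       * the "Refcount is 1" failure reported in this ticket. */
      static int toy_device_fini(struct toy_device *d)
      {
              int ref = atomic_load(&d->ld_ref);

              assert(ref == 0);       /* cf. the LASSERT/LBUG in lu_device_fini() */
              return 0;
      }
      ```

      In the crashes below, an object (e.g. a lovsub object left in the site hash table) still holds such a reference when umount tears the device stack down, so the refcount is 1 instead of 0.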

          Activity

            pjones Peter Jones added a comment -

            Landed for 2.5.0

            sarah Sarah Liu added a comment -

            Also hit this error when running interop between 2.4.0 server and 2.5 client:
            https://maloo.whamcloud.com/test_sets/a58ce5fe-19c7-11e3-bb73-52540035b04c

            server: 2.4.0
            client: lustre-master build #1652

            19:44:33:LustreError: 10863:0:(lu_object.c:1141:lu_device_fini()) ASSERTION( cfs_atomic_read(&d->ld_ref) == 0 ) failed: Refcount is 1
            19:44:33:LustreError: 10863:0:(lu_object.c:1141:lu_device_fini()) LBUG
            19:44:34:Pid: 10863, comm: umount
            19:44:34:
            19:44:35:Call Trace:
            19:44:35: [<ffffffffa0478895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
            19:44:35: [<ffffffffa0478e97>] lbug_with_loc+0x47/0xb0 [libcfs]
            19:44:36: [<ffffffffa05ec4b8>] lu_device_fini+0xb8/0xc0 [obdclass]
            19:44:36: [<ffffffffa05d0727>] ls_device_put+0x87/0x1e0 [obdclass]
            19:44:36: [<ffffffffa05d0a3c>] local_oid_storage_fini+0x1bc/0x270 [obdclass]
            19:44:36: [<ffffffffa0d6bd74>] mgs_fs_cleanup+0x64/0x80 [mgs]
            19:44:36: [<ffffffffa0d68ae0>] mgs_device_fini+0x1d0/0x5a0 [mgs]
            19:44:36: [<ffffffffa05ddba7>] class_cleanup+0x577/0xda0 [obdclass]
            19:44:36: [<ffffffffa05b2b36>] ? class_name2dev+0x56/0xe0 [obdclass]
            19:44:37: [<ffffffffa05df48c>] class_process_config+0x10bc/0x1c80 [obdclass]
            19:44:37: [<ffffffffa05d8cb3>] ? lustre_cfg_new+0x353/0x7e0 [obdclass]
            19:44:38: [<ffffffffa05e01c9>] class_manual_cleanup+0x179/0x6f0 [obdclass]
            19:44:38: [<ffffffffa05b2b36>] ? class_name2dev+0x56/0xe0 [obdclass]
            19:44:38: [<ffffffffa0614d7d>] server_put_super+0x46d/0xf00 [obdclass]
            19:44:38: [<ffffffff8118334b>] generic_shutdown_super+0x5b/0xe0
            19:44:38: [<ffffffff81183436>] kill_anon_super+0x16/0x60
            19:44:39: [<ffffffffa05e2026>] lustre_kill_super+0x36/0x60 [obdclass]
            19:44:39: [<ffffffff81183bd7>] deactivate_super+0x57/0x80
            19:44:39: [<ffffffff811a1c4f>] mntput_no_expire+0xbf/0x110
            19:44:39: [<ffffffff811a26bb>] sys_umount+0x7b/0x3a0
            19:44:39: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
            19:44:39:
            19:44:40:Kernel panic - not syncing: LBUG
            19:44:40:Pid: 10863, comm: umount Not tainted 2.6.32-358.6.2.el6_lustre.g230b174.x86_64 #1
            
            yujian Jian Yu added a comment -

            Lustre build: http://build.whamcloud.com/job/lustre-b2_4/44/ (2.4.1 RC1)
            Distro/Arch: RHEL6.4/x86_64 + FC18/x86_64 (Server + Client)

            sanity test 232 hit the same failure:
            https://maloo.whamcloud.com/test_sets/0cbde1d0-14ee-11e3-ac48-52540035b04c

            yujian Jian Yu added a comment -

            Lustre Branch: b2_4
            Lustre Build: http://build.whamcloud.com/job/lustre-b2_4/27/
            Distro/Arch: RHEL6.4/x86_64 + FC18/x86_64 (Server + Client)

            Sanity test 232 failed as follows:

            == sanity test 232: failed lock should not block umount ============================================== 08:02:59 (1375628579)
            fail_loc=0x31c
            1+0 records in
            1+0 records out
            1048576 bytes (1.0 MB) copied, 0.0207861 s, 50.4 MB/s
            fail_loc=0
            10.10.4.122@tcp:/lustre /mnt/lustre lustre rw,flock,user_xattr 0 0
            CMD: client-16vm2.lab.whamcloud.com grep -c /mnt/lustre' ' /proc/mounts
            Stopping client client-16vm2.lab.whamcloud.com /mnt/lustre (opts:)
            CMD: client-16vm2.lab.whamcloud.com lsof -t /mnt/lustre
            

            Syslog on client node client-16vm2 showed the following:

            Aug  4 08:02:59 client-16vm2 kernel: [ 6221.766422] Lustre: DEBUG MARKER: == sanity test 232: failed lock should not block umount ============================================== 08:02:59 (1375628579)
            Aug  4 08:02:59 client-16vm2 mrshd[30197]: pam_unix(mrsh:session): session closed for user root
            Aug  4 08:02:59 client-16vm2 xinetd[427]: EXIT: mshell status=0 pid=30197 duration=0(sec)
            Aug  4 08:02:59 client-16vm2 systemd-logind[299]: Removed session c2300.
            Aug  4 08:02:59 client-16vm2 kernel: [ 6221.842241] Lustre: DEBUG MARKER: grep -c /mnt/lustre' ' /proc/mounts
            Aug  4 08:02:59 client-16vm2 kernel: [ 6221.853369] Lustre: DEBUG MARKER: lsof -t /mnt/lustre
            Aug  4 08:03:00 client-16vm2 kernel: [ 6221.973212] Lustre: DEBUG MARKER: umount /mnt/lustre 2>&1
            Aug  4 08:03:00 client-16vm2 kernel: [ 6222.052122] LustreError: 30246:0:(lu_object.c:1141:lu_device_fini()) ASSERTION( cfs_atomic_read(&d->ld_ref) == 0 ) failed: Refcount is 1
            Aug  4 08:03:00 client-16vm2 kernel: [ 6222.058312] LustreError: 30246:0:(lu_object.c:1141:lu_device_fini()) LBUG
            Aug  4 08:03:00 client-16vm2 kernel: [ 6222.063666] Pid: 30246, comm: umount
            Aug  4 08:03:00 client-16vm2 kernel: [ 6222.063669]
            Aug  4 08:03:00 client-16vm2 kernel: [ 6222.063669] Call Trace:
            Aug  4 08:03:00 client-16vm2 kernel: [ 6222.063690]  [<ffffffffa02477e7>] libcfs_debug_dumpstack+0x57/0x80 [libcfs]
            Aug  4 08:03:00 client-16vm2 kernel: [ 6222.063698]  [<ffffffffa0247df5>] lbug_with_loc+0x45/0xc0 [libcfs]
            Aug  4 08:03:00 client-16vm2 kernel: [ 6222.063735]  [<ffffffffa03b3c0a>] lu_device_fini+0xba/0xc0 [obdclass]
            Aug  4 08:03:00 client-16vm2 kernel: [ 6222.063754]  [<ffffffffa0704816>] lovsub_device_free+0x66/0x1b0 [lov]
            Aug  4 08:03:00 client-16vm2 kernel: [ 6222.063776]  [<ffffffffa03b7a0e>] lu_stack_fini+0x7e/0xc0 [obdclass]
            Aug  4 08:03:00 client-16vm2 kernel: [ 6222.063797]  [<ffffffffa03bd24e>] cl_stack_fini+0xe/0x10 [obdclass]
            Aug  4 08:03:00 client-16vm2 kernel: [ 6222.063807]  [<ffffffffa06f48c9>] lov_device_fini+0x59/0x120 [lov]
            Aug  4 08:03:00 client-16vm2 kernel: [ 6222.063829]  [<ffffffffa03b79d9>] lu_stack_fini+0x49/0xc0 [obdclass]
            Aug  4 08:03:00 client-16vm2 kernel: [ 6222.063861]  [<ffffffffa03bd24e>] cl_stack_fini+0xe/0x10 [obdclass]
            Aug  4 08:03:00 client-16vm2 kernel: [ 6222.063892]  [<ffffffffa07e34d1>] cl_sb_fini+0x61/0x1a0 [lustre]
            Aug  4 08:03:00 client-16vm2 kernel: [ 6222.063905]  [<ffffffffa07a5d84>] client_common_put_super+0x54/0x7c0 [lustre]
            Aug  4 08:03:00 client-16vm2 kernel: [ 6222.063918]  [<ffffffffa07a6e84>] ll_put_super+0xd4/0x370 [lustre]
            Aug  4 08:03:00 client-16vm2 kernel: [ 6222.063922]  [<ffffffff811aa7ee>] ? dispose_list+0x3e/0x60
            Aug  4 08:03:00 client-16vm2 kernel: [ 6222.063925]  [<ffffffff81192761>] generic_shutdown_super+0x61/0xe0
            Aug  4 08:03:00 client-16vm2 kernel: [ 6222.063928]  [<ffffffff81192876>] kill_anon_super+0x16/0x30
            Aug  4 08:03:00 client-16vm2 kernel: [ 6222.063949]  [<ffffffffa03a8b7a>] lustre_kill_super+0x4a/0x60 [obdclass]
            Aug  4 08:03:00 client-16vm2 kernel: [ 6222.063951]  [<ffffffff81192bf7>] deactivate_locked_super+0x57/0x90
            Aug  4 08:03:00 client-16vm2 kernel: [ 6222.063953]  [<ffffffff811937ae>] deactivate_super+0x4e/0x70
            Aug  4 08:03:00 client-16vm2 kernel: [ 6222.063956]  [<ffffffff811ae167>] mntput_no_expire+0xd7/0x130
            Aug  4 08:03:00 client-16vm2 kernel: [ 6222.063958]  [<ffffffff811af146>] sys_umount+0x76/0x390
            Aug  4 08:03:00 client-16vm2 kernel: [ 6222.063961]  [<ffffffff8162bae9>] system_call_fastpath+0x16/0x1b
            

            Maloo report: https://maloo.whamcloud.com/test_sets/996b2278-fd79-11e2-9fdb-52540035b04c

            The failure occurs regularly on FC18 client and is blocking the remaining sanity sub-tests from running.

            yujian Jian Yu added a comment -

            Lustre b1_8 client build: http://build.whamcloud.com/job/lustre-b1_8/258 (1.8.9-wc1)
            Lustre b2_1 server build: http://build.whamcloud.com/job/lustre-b2_1/215 (2.1.6 RC2)

            After running parallel-scale-nfsv3, unmounting NFS server/Lustre client on the MDS node hit the same failure:
            https://maloo.whamcloud.com/test_sets/25c65126-de8e-11e2-afb2-52540035b04c

            bobijam Zhenyu Xu added a comment -

            Updated the patchset at http://review.whamcloud.com/6105

            During file lov object initialization, we need to protect access to, and changes of, its subobj->coh_parent, since another layout change could race there, leaving an unreferenced lovsub object in the site object hash table.

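
            The race described in the comment above can be sketched roughly as follows. This is a hypothetical, simplified illustration (toy_* names and the lock placement are assumptions, not the actual patch): serialize reads and updates of the sub-object's parent pointer so a concurrent layout change cannot re-parent the sub-object mid-initialization and leave an unreferenced lovsub object behind.

            ```c
            /* Hypothetical sketch of the fix described above: accesses to a
             * sub-object's parent pointer (cf. cl_object_header::coh_parent)
             * are serialized with a lock.  All names are illustrative. */
            #include <pthread.h>
            #include <stddef.h>

            struct toy_header {
                    pthread_mutex_t    coh_lock;
                    struct toy_header *coh_parent;
            };

            /* Attach sub to parent unless a racing layout change already
             * claimed it; returns 0 on success, -1 if we lost the race
             * (the caller must then drop its reference instead of leaking
             * the sub-object into the site hash table). */
            static int toy_attach_parent(struct toy_header *sub,
                                         struct toy_header *parent)
            {
                    int rc = 0;

                    pthread_mutex_lock(&sub->coh_lock);
                    if (sub->coh_parent == NULL)
                            sub->coh_parent = parent;   /* we won the race */
                    else if (sub->coh_parent != parent)
                            rc = -1;                    /* concurrent layout change */
                    pthread_mutex_unlock(&sub->coh_lock);
                    return rc;
            }
            ```

            Without the lock, two initializations racing on the same sub-object could each believe they own it, and the loser's reference is never dropped, which is consistent with the leftover "Refcount is 1" at umount.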
            yujian Jian Yu added a comment -

            Lustre b1_8 client build: http://build.whamcloud.com/job/lustre-b1_8/258 (1.8.9-wc1)
            Lustre b2_1 server build: http://build.whamcloud.com/job/lustre-b2_1/205

            After running parallel-scale-nfsv3, unmounting NFS server/Lustre client on the MDS node hit the same failure:

            19:29:19:LustreError: 10728:0:(lu_object.c:1018:lu_device_fini()) ASSERTION( cfs_atomic_read(&d->ld_ref) == 0 ) failed: Refcount is 1
            19:29:19:LustreError: 10728:0:(lu_object.c:1018:lu_device_fini()) LBUG
            19:29:20:Pid: 10728, comm: umount
            19:29:20:
            19:29:20:Call Trace:
            19:29:20: [<ffffffffa049f785>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
            19:29:20: [<ffffffffa049fd97>] lbug_with_loc+0x47/0xb0 [libcfs]
            19:29:20: [<ffffffffa05df3dc>] lu_device_fini+0xcc/0xd0 [obdclass]
            19:29:20: [<ffffffffa0a798b4>] lovsub_device_free+0x24/0x1e0 [lov]
            19:29:20: [<ffffffffa05e25f6>] lu_stack_fini+0x96/0xf0 [obdclass]
            19:29:20: [<ffffffffa05e6bfe>] cl_stack_fini+0xe/0x10 [obdclass]
            19:29:20: [<ffffffffa0a699d8>] lov_device_fini+0x58/0x130 [lov]
            19:29:20: [<ffffffffa05e25a9>] lu_stack_fini+0x49/0xf0 [obdclass]
            19:29:20: [<ffffffffa05e6bfe>] cl_stack_fini+0xe/0x10 [obdclass]
            19:29:20: [<ffffffffa0b52b6d>] cl_sb_fini+0x6d/0x190 [lustre]
            19:29:20: [<ffffffffa0b1ac9c>] client_common_put_super+0x14c/0xe60 [lustre]
            19:29:20: [<ffffffffa0b1ba80>] ll_put_super+0xd0/0x360 [lustre]
            19:29:20: [<ffffffff8119d546>] ? invalidate_inodes+0xf6/0x190
            19:29:20: [<ffffffff8118334b>] generic_shutdown_super+0x5b/0xe0
            19:29:20: [<ffffffff81183436>] kill_anon_super+0x16/0x60
            19:29:21: [<ffffffffa05ceaca>] lustre_kill_super+0x4a/0x60 [obdclass]
            19:29:21: [<ffffffff81183bd7>] deactivate_super+0x57/0x80
            19:29:21: [<ffffffff811a1bff>] mntput_no_expire+0xbf/0x110
            19:29:21: [<ffffffff811a266b>] sys_umount+0x7b/0x3a0
            19:29:21: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
            19:29:21:
            19:29:21:Kernel panic - not syncing: LBUG
            

            Maloo report: https://maloo.whamcloud.com/test_sets/84843efe-c8e9-11e2-97fe-52540035b04c

            mdiep Minh Diep added a comment -

            I have reproduced this issue; the triggered kernel crash dump is available at /scratch/ftp/uploads/LU-1480

            bobijam Zhenyu Xu added a comment -

            pushed a debug patch at http://review.whamcloud.com/6105

            mdiep Minh Diep added a comment -

            I hit this very frequently using an FC18 client running the sanity test suite


            People

              bobijam Zhenyu Xu
              maloo Maloo
              Votes: 0
              Watchers: 13
