[LU-5713] Interop 2.5<->2.7 sanity-lfsck test_8: (mdt_handler.c:4378:mdt_fini()) ASSERTION( atomic_read(&d->ld_ref) == 0 ) failed Created: 07/Oct/14 Updated: 17/Apr/17 Resolved: 17/Apr/17 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.7.0, Lustre 2.8.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Maloo | Assignee: | WC Triage |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | None |
| Environment: | server: lustre-master build # 2671 |
| Issue Links: | |
| Severity: | 3 |
| Rank (Obsolete): | 16027 |
| Description |
|
This issue was created by maloo for sarah <sarah@whamcloud.com>

This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/34941bc6-4732-11e4-8a80-5254006e85c2.

The sub-test test_8 failed with the following error:

test failed to respond and timed out

MDS console:

09:49:10:Lustre: DEBUG MARKER: == sanity-lfsck test 8: LFSCK state machine == 15:48:59 (1411832939)
09:49:10:Lustre: DEBUG MARKER: grep -c /mnt/mds1' ' /proc/mounts
09:49:10:Lustre: DEBUG MARKER: umount -d -f /mnt/mds1
09:49:10:LustreError: 24584:0:(qsd_reint.c:54:qsd_reint_completion()) lustre-MDT0000: failed to enqueue global quota lock, glb fid:[0x200000006:0x10000:0x0], rc:-5
09:49:10:LustreError: 24584:0:(qsd_reint.c:54:qsd_reint_completion()) Skipped 2 previous similar messages
09:49:10:LustreError: 25070:0:(ldlm_lib.c:2106:target_stop_recovery_thread()) lustre-MDT0000: Aborting recovery
09:49:10:Lustre: 24587:0:(ldlm_lib.c:1781:target_recovery_overseer()) recovery is aborted, evict exports in recovery
09:49:10:LustreError: 24567:0:(osp_precreate.c:725:osp_precreate_cleanup_orphans()) lustre-OST0000-osc-MDT0000: cannot cleanup orphans: rc = -5
09:49:10:LustreError: 24569:0:(osp_precreate.c:466:osp_precreate_send()) lustre-OST0001-osc-MDT0000: can't precreate: rc = -5
09:49:10:LustreError: 24569:0:(osp_precreate.c:976:osp_precreate_thread()) lustre-OST0001-osc-MDT0000: cannot precreate objects: rc = -5
09:49:10:Lustre: *** cfs_fail_loc=1602, val=0***
09:49:10:LustreError: 2658:0:(client.c:1075:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff88007be6f400 x1480397462816312/t0(0) o13->lustre-OST0001-osc-MDT0000@10.1.6.53@tcp:7/4 lens 224/368 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1
09:49:10:LustreError: 2658:0:(client.c:1075:ptlrpc_import_delay_req()) Skipped 12 previous similar messages
09:49:10:LustreError: 2643:0:(lod_dev.c:913:lod_device_free()) ASSERTION( atomic_read(&lu->ld_ref) == 0 ) failed:
09:49:10:LustreError: 25070:0:(mdt_handler.c:4378:mdt_fini()) ASSERTION( atomic_read(&d->ld_ref) == 0 ) failed:
09:49:10:LustreError: 25070:0:(mdt_handler.c:4378:mdt_fini()) LBUG
09:49:10:Pid: 25070, comm: umount
09:49:10:
09:49:10:Call Trace:
09:49:10: [<ffffffffa0483895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
09:49:10: [<ffffffffa0483e97>] lbug_with_loc+0x47/0xb0 [libcfs]
09:49:10: [<ffffffffa0ecbb9f>] mdt_device_fini+0xc8f/0xcd0 [mdt]
09:49:10: [<ffffffffa05b95d6>] ? class_disconnect_exports+0x116/0x2f0 [obdclass]
09:49:10: [<ffffffffa05d8f52>] class_cleanup+0x562/0xd20 [obdclass]
09:49:10: [<ffffffffa05b6f16>] ? class_name2dev+0x56/0xe0 [obdclass]
09:49:10: [<ffffffffa05dac6a>] class_process_config+0x155a/0x1ac0 [obdclass]
09:49:10: [<ffffffffa05d39b5>] ? lustre_cfg_new+0x4f5/0x6f0 [obdclass]
09:49:10: [<ffffffffa05db347>] class_manual_cleanup+0x177/0x700 [obdclass]
09:49:10: [<ffffffffa05b6f16>] ? class_name2dev+0x56/0xe0 [obdclass]
09:49:10: [<ffffffffa0617a57>] server_put_super+0xb37/0xe50 [obdclass]
09:49:10: [<ffffffff8118b63b>] generic_shutdown_super+0x5b/0xe0
09:49:10: [<ffffffff8118b726>] kill_anon_super+0x16/0x60
09:49:10: [<ffffffffa05dd246>] lustre_kill_super+0x36/0x60 [obdclass]
09:49:10: [<ffffffff8118bec7>] deactivate_super+0x57/0x80
09:49:10: [<ffffffff811ab8cf>] mntput_no_expire+0xbf/0x110
09:49:10: [<ffffffff811ac41b>] sys_umount+0x7b/0x3a0
09:49:10: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
09:49:10:
09:49:10:LustreError: 2643:0:(lod_dev.c:913:lod_device_free()) LBUG
09:49:10:Pid: 2643, comm: obd_zombid

Info required for matching: sanity-lfsck 8
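To make the failed assertion easier to read, here is a minimal user-space sketch of the invariant it enforces. The names (my_device, my_device_fini) are hypothetical stand-ins, not the actual Lustre code: a lu_device-style structure must be completely unreferenced before it may be freed, and a reference leaked during an interrupted cleanup makes the teardown path assert instead of freeing memory that is still in use.

#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for a lu_device with its ld_ref counter. */
struct my_device {
        atomic_int ld_ref;
};

static void my_device_get(struct my_device *d)
{
        atomic_fetch_add(&d->ld_ref, 1);
}

static void my_device_put(struct my_device *d)
{
        atomic_fetch_sub(&d->ld_ref, 1);
}

static void my_device_fini(struct my_device *d)
{
        /* Equivalent of the LASSERT in mdt_fini()/lod_device_free():
         * the device must have no remaining users before it is freed.
         * A reference leaked earlier makes this abort (the LBUG). */
        assert(atomic_load(&d->ld_ref) == 0);
        free(d);
}

int main(void)
{
        struct my_device *d = calloc(1, sizeof(*d));

        my_device_get(d);       /* some object takes a reference */
        /* my_device_put(d) is never called, e.g. because cleanup
         * was cut short, so the reference is leaked. */
        my_device_fini(d);      /* aborts here: ld_ref is still 1 */
        return 0;
}

In the console output above, the LBUG follows immediately after "Aborting recovery" and a string of -5 (EIO) failures, which is consistent with objects left referenced when the forced umount cut the cleanup short.
|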
| Comments |
| Comment by Andreas Dilger [ 10/Feb/15 ] |
|
Hit this again: https://testing.hpdd.intel.com/test_sets/856f4938-aea4-11e4-98a3-5254006e85c2 |
| Comment by Bruno Faccini (Inactive) [ 27/Feb/15 ] |
|
Andreas, |
| Comment by Sarah Liu [ 01/Sep/15 ] |
|
Hit this error when running sanity test_65j with a 2.8 server (DNE mode) and a 2.5.3 client:

MDS console:

Lustre: DEBUG MARKER: == sanity test 65j: set default striping on root directory (bug 6367)=== 19:58:05 (1441076285)
LustreError: 46370:0:(client.c:1138:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff880821fb8cc0 x1511074876772396/t0(0) o1000->lustre-MDT0001-osp-MDT0000@0@lo:24/4 lens 248/16608 e 0 to 0 dl 0 ref 2 fl Rpc:/0/ffffffff rc 0/-1
LustreError: 46370:0:(client.c:1138:ptlrpc_import_delay_req()) Skipped 3 previous similar messages
LustreError: 46370:0:(osp_object.c:586:osp_attr_get()) lustre-MDT0001-osp-MDT0000:osp_attr_get update error [0x240000402:0x1:0x0]: rc = -5
LustreError: 46370:0:(llog.c:180:llog_cancel_rec()) lustre-MDT0001-osp-MDT0000: fail to write header for llog #0x1:1073742850#00000000: rc = -5
Lustre: lustre-MDT0000: Not available for connect from 10.2.4.56@tcp (stopping)
Lustre: Skipped 10 previous similar messages
LustreError: 11-0: lustre-MDT0000-osp-MDT0001: operation obd_ping to node 0@lo failed: rc = -107
Lustre: lustre-MDT0000-osp-MDT0002: Connection to lustre-MDT0000 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete
Lustre: Skipped 2 previous similar messages
LustreError: Skipped 7 previous similar messages
Lustre: lustre-MDT0000: Not available for connect from 10.2.4.56@tcp (stopping)
Lustre: Skipped 18 previous similar messages
Lustre: lustre-MDT0000 is waiting for obd_unlinked_exports more than 8 seconds. The obd refcount = 3. Is it stuck?
LustreError: 4752:0:(osp_dev.c:1255:osp_device_free()) header@ffff880400337c80[0x0, 1, [0x1:0x0:0x0] hash exist]{
LustreError: 4752:0:(osp_dev.c:1255:osp_device_free()) ....local_storage@ffff880400337cd0
LustreError: 4752:0:(osp_dev.c:1255:osp_device_free()) ....osd-ldiskfs@ffff8804002f4e40osd-ldiskfs-object@ffff8804002f4e40(i:ffff88041678d6e0:81/3187989141)[plain]
LustreError: 4752:0:(osp_dev.c:1255:osp_device_free()) } header@ffff880400337c80
LustreError: 4752:0:(osp_dev.c:1255:osp_device_free()) header@ffff8804358af840[0x0, 1, [0x200000003:0x0:0x0] hash exist]{
LustreError: 4752:0:(osp_dev.c:1255:osp_device_free()) ....local_storage@ffff8804358af890
LustreError: 4752:0:(osp_dev.c:1255:osp_device_free()) ....osd-ldiskfs@ffff8804012a5200osd-ldiskfs-object@ffff8804012a5200(i:ffff880401c1b560:79/3187989106)[plain]
LustreError: 4752:0:(osp_dev.c:1255:osp_device_free()) } header@ffff8804358af840
LustreError: 4752:0:(osp_dev.c:1255:osp_device_free()) header@ffff8804002476c0[0x0, 1, [0x200000003:0x2:0x0] hash exist]{
LustreError: 4752:0:(osp_dev.c:1255:osp_device_free()) ....local_storage@ffff880400247710
LustreError: 4752:0:(osp_dev.c:1255:osp_device_free()) ....osd-ldiskfs@ffff880400f2ea80osd-ldiskfs-object@ffff880400f2ea80(i:ffff88042a53b350:80/3187989107)[plain]
LustreError: 4752:0:(osp_dev.c:1255:osp_device_free()) } header@ffff8804002476c0
LustreError: 4752:0:(osp_dev.c:1255:osp_device_free()) header@ffff88040004a5c0[0x0, 1, [0xa:0x0:0x0] hash exist]{
LustreError: 4752:0:(osp_dev.c:1255:osp_device_free()) ....local_storage@ffff88040004a610
LustreError: 4752:0:(osp_dev.c:1255:osp_device_free()) ....osd-ldiskfs@ffff88040031d800osd-ldiskfs-object@ffff88040031d800(i:ffff8804166a3110:82/3187989175)[plain]
LustreError: 4752:0:(osp_dev.c:1255:osp_device_free()) } header@ffff88040004a5c0
LustreError: 4752:0:(osp_dev.c:1255:osp_device_free()) header@ffff88040031dec0[0x1, 1, [0x200000001:0x1017:0x0] hash exist]{
LustreError: 4752:0:(osp_dev.c:1255:osp_device_free()) ....local_storage@ffff88040031df10
LustreError: 4752:0:(osp_dev.c:1255:osp_device_free()) ....osd-ldiskfs@ffff88040002f840osd-ldiskfs-object@ffff88040002f840(i:ffff8804220777a0:12/1262305354)[plain]
LustreError: 4752:0:(osp_dev.c:1255:osp_device_free()) } header@ffff88040031dec0
LustreError: 46370:0:(mdt_handler.c:4275:mdt_fini()) ASSERTION( atomic_read(&d->ld_ref) == 0 ) failed:
LustreError: 46370:0:(mdt_handler.c:4275:mdt_fini()) LBUG
Message fromLustreError: 4752:0:(osp_dev.c:1255:osp_device_free()) header@ffInitializing cgroup subsys cpuset
Initializing cgroup subsys cpu
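The "waiting for obd_unlinked_exports ... Is it stuck?" line above is Lustre's bounded wait for the obd reference count to drain before teardown continues. A rough user-space model of that pattern (hypothetical names and a simple polling loop; the real implementation lives in obdclass and uses wait queues) looks like this:

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical stand-in for obd->obd_refcount; starts at 3 as in
 * the console message above. */
static atomic_int obd_refcount = 3;

static void *release_exports(void *arg)
{
        (void)arg;
        sleep(10);                      /* simulate slow export cleanup */
        atomic_store(&obd_refcount, 1); /* exports finally dropped */
        return NULL;
}

int main(void)
{
        pthread_t t;
        int waited = 0, warn_at = 8;

        pthread_create(&t, NULL, release_exports, NULL);

        /* Poll until only the final reference remains, complaining
         * at growing intervals the way the console message does. */
        while (atomic_load(&obd_refcount) > 1) {
                sleep(1);
                if (++waited >= warn_at) {
                        printf("lustre-MDT0000 is waiting for "
                               "obd_unlinked_exports more than %d seconds. "
                               "The obd refcount = %d. Is it stuck?\n",
                               waited, atomic_load(&obd_refcount));
                        warn_at *= 2;
                }
        }
        pthread_join(t, NULL);
        return 0;
}

In the log above the count apparently never drained: osp_device_free() dumped the object headers still holding references, and mdt_fini() then hit the same ld_ref assertion.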
|
| Comment by Frank Heckes (Inactive) [ 18/Nov/15 ] |
|
This error also happens during soak testing of build '20151116' (see https://wiki.hpdd.intel.com/display/Releases/Soak+Testing+on+Lola#SoakTestingonLola-20151116) upon umounting MDTs:

<0>LustreError: 6529:0:(lod_dev.c:1570:lod_device_free()) ASSERTION( atomic_read(&lu->ld_ref) == 0 ) failed: lu is ffff8804006dc000
<0>LustreError: 6529:0:(lod_dev.c:1570:lod_device_free()) LBUG
<4>Pid: 6529, comm: obd_zombid
<4>
<4>Call Trace:
<0>LustreError: 13065:0:(mdt_handler.c:4295:mdt_fini()) ASSERTION( atomic_read(&d->ld_ref) == 0 ) failed:
<0>LustreError: 13065:0:(mdt_handler.c:4295:mdt_fini()) LBUG
<4>Pid: 13065, comm: umount
<4>
<4>Call Trace:
<4> [<ffffffffa08a5875>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
<4> [<ffffffffa08a5e77>] lbug_with_loc+0x47/0xb0 [libcfs]
<4> [<ffffffffa08a5875>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
<4> [<ffffffffa08a5e77>] lbug_with_loc+0x47/0xb0 [libcfs]
<4> [<ffffffffa13d7ba1>] lod_device_free+0x2c1/0x330 [lod]
<4> [<ffffffffa1308b79>] mdt_device_fini+0xed9/0xf40 [mdt]
<4> [<ffffffffa09d00d6>] ? class_disconnect_exports+0x116/0x2f0 [obdclass]
<4> [<ffffffffa09ea1f2>] class_cleanup+0x572/0xd20 [obdclass]
<4> [<ffffffffa09e3fed>] class_decref+0x3dd/0x4c0 [obdclass]
<4> [<ffffffffa09cadb6>] ? class_name2dev+0x56/0xe0 [obdclass]
<4> [<ffffffffa09ec876>] class_process_config+0x1ed6/0x2830 [obdclass]
<4> [<ffffffffa08b16c1>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
<4> [<ffffffff8117523c>] ? __kmalloc+0x21c/0x230
<4> [<ffffffffa09ceffc>] obd_zombie_impexp_cull+0x61c/0xac0 [obdclass]
<4> [<ffffffffa09ed68f>] class_manual_cleanup+0x4bf/0x8e0 [obdclass]
<4> [<ffffffffa09cadb6>] ? class_name2dev+0x56/0xe0 [obdclass]
<4> [<ffffffffa0a2558c>] server_put_super+0xa0c/0xed0 [obdclass]
<4> [<ffffffffa09cf505>] obd_zombie_impexp_thread+0x65/0x190 [obdclass]
<4> [<ffffffff811ac776>] ? invalidate_inodes+0xf6/0x190
<4> [<ffffffff81190b7b>] generic_shutdown_super+0x5b/0xe0
<4> [<ffffffff81190c66>] kill_anon_super+0x16/0x60
<4> [<ffffffff81064c00>] ? default_wake_function+0x0/0x20
<4> [<ffffffffa09f0546>] lustre_kill_super+0x36/0x60 [obdclass]
<4> [<ffffffff81191407>] deactivate_super+0x57/0x80
<4> [<ffffffff811b10df>] mntput_no_expire+0xbf/0x110
<4> [<ffffffff811b1c2b>] sys_umount+0x7b/0x3a0
<4> [<ffffffff8100b0d2>] system_call_fastpath+0x16/0x1b
<4>
<4> [<ffffffffa09cf4a0>] ? obd_zombie_impexp_thread+0x0/0x190 [obdclass]
<4> [<ffffffff8109e78e>] kthread+0x9e/0xc0
<4> [<ffffffff8100c28a>] child_rip+0xa/0x20
<4> [<ffffffff8109e6f0>] ? kthread+0x0/0xc0
<4> [<ffffffff8100c280>] ? child_rip+0x0/0x20
<4>
<0>Kernel panic - not syncing: LBUG
<4>Pid: 13065, comm: umount Tainted: P --------------- 2.6.32-504.30.3.el6_lustre.gb64632c.x86_64 #1
<4>Call Trace:
<4> [<ffffffff81529c9c>] ? panic+0xa7/0x16f
<4> [<ffffffffa08a5ecb>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
<4> [<ffffffffa1308b79>] ? mdt_device_fini+0xed9/0xf40 [mdt]
<4> [<ffffffffa09d00d6>] ? class_disconnect_exports+0x116/0x2f0 [obdclass]
<4> [<ffffffffa09ea1f2>] ? class_cleanup+0x572/0xd20 [obdclass]
<4> [<ffffffffa09cadb6>] ? class_name2dev+0x56/0xe0 [obdclass]
<4> [<ffffffffa09ec876>] ? class_process_config+0x1ed6/0x2830 [obdclass]
<4> [<ffffffffa08b16c1>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
<4> [<ffffffff8117523c>] ? __kmalloc+0x21c/0x230
<4> [<ffffffffa09ed68f>] ? class_manual_cleanup+0x4bf/0x8e0 [obdclass]
<4> [<ffffffffa09cadb6>] ? class_name2dev+0x56/0xe0 [obdclass]
<4> [<ffffffffa0a2558c>] ? server_put_super+0xa0c/0xed0 [obdclass]
<4> [<ffffffff811ac776>] ? invalidate_inodes+0xf6/0x190
<4> [<ffffffff81190b7b>] ? generic_shutdown_super+0x5b/0xe0
<4> [<ffffffff81190c66>] ? kill_anon_super+0x16/0x60
<4> [<ffffffffa09f0546>] ? lustre_kill_super+0x36/0x60 [obdclass]
<4> [<ffffffff81191407>] ? deactivate_super+0x57/0x80
<4> [<ffffffff811b10df>] ? mntput_no_expire+0xbf/0x110
<4> [<ffffffff811b1c2b>] ? sys_umount+0x7b/0x3a0
<4> [<ffffffff8100b0d2>] ? system_call_fastpath+0x16/0x1b

A crash dump file is available via cluster lola's head node: lhn:/scratch/crashdumps/lu-5713/lola-8-127.0.0.1-2015-11-18-04:11:26
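The interleaved traces show the assertion firing on two threads at once: obd_zombid in lod_device_free() and umount in mdt_fini(). Since the MDT stack is layered (mdt on top of lod on top of osd), a sketch of that layering with hypothetical struct names illustrates how a single stray reference against a lower device aborts teardown partway down the stack; this is a simplified single-threaded model, not the actual lu_device code:

#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical layered devices, standing in for mdt -> lod -> osd. */
struct dev {
        atomic_int ld_ref;
        struct dev *child;
};

static struct dev *dev_new(struct dev *child)
{
        struct dev *d = calloc(1, sizeof(*d));

        d->child = child;
        if (child)
                atomic_fetch_add(&child->ld_ref, 1); /* parent references child */
        return d;
}

static void dev_fini(struct dev *d)
{
        /* The same invariant lod_device_free() and mdt_fini() assert:
         * no references may remain when a device is torn down. */
        assert(atomic_load(&d->ld_ref) == 0);
        if (d->child) {
                atomic_fetch_sub(&d->child->ld_ref, 1);
                dev_fini(d->child);
        }
        free(d);
}

int main(void)
{
        struct dev *osd = dev_new(NULL);
        struct dev *lod = dev_new(osd);
        struct dev *mdt = dev_new(lod);

        /* One object that never released the middle (lod) device: */
        atomic_fetch_add(&lod->ld_ref, 1);

        dev_fini(mdt);  /* mdt passes; teardown asserts at the lod level */
        return 0;
}
|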
| Comment by James Nunez (Inactive) [ 16/Dec/15 ] |
|
We hit this LBUG on master (not interop testing): |
| Comment by Andreas Dilger [ 17/Apr/17 ] |
|
Close old issue. |