Lustre / LU-4365

recovery-small test_51: LBUG: (qsd_lib.c:294:qsd_qtype_fini()) ASSERTION( atomic_read(&qqi->qqi_ref) == 1 ) failed

Details

    Description

      recovery-small test_51 hung as follows:

      Failing mds1 on client-30vm3
      CMD: client-30vm3 grep -c /mnt/mds1' ' /proc/mounts
      Stopping /mnt/mds1 (opts:) on client-30vm3
      CMD: client-30vm3 umount -d /mnt/mds1
      

      LBUG occurred on MDS:

      23:04:50:Lustre: DEBUG MARKER: umount -d /mnt/mds1
      23:04:50:LustreError: 7276:0:(ldlm_resource.c:804:ldlm_resource_complain()) lustre-MDT0000-lwp-MDT0000: namespace resource [0x200000006:0x1010000:0x0].0 (ffff8800709d3980) refcount nonzero (1) after lock cleanup; forcing cleanup.
      23:04:50:LustreError: 7276:0:(ldlm_resource.c:804:ldlm_resource_complain()) Skipped 1 previous similar message
      23:04:50:LustreError: 7276:0:(ldlm_resource.c:1415:ldlm_resource_dump()) --- Resource: [0x200000006:0x1010000:0x0].0 (ffff8800709d3980) refcount = 2
      23:04:50:LustreError: 7276:0:(ldlm_resource.c:1418:ldlm_resource_dump()) Granted locks (in reverse order):
      23:04:50:LustreError: 7276:0:(ldlm_resource.c:1421:ldlm_resource_dump()) ### ### ns: lustre-MDT0000-lwp-MDT0000 lock: ffff880063768b40/0x3b627f62d662915 lrc: 2/1,0 mode: CR/CR res: [0x200000006:0x1010000:0x0].0 rrc: 2 type: PLN flags: 0x1106400000000 nid: local remote: 0x3b627f62d66293f expref: -99 pid: 6945 timeout: 0 lvb_type: 2
      23:04:50:LustreError: 7276:0:(ldlm_resource.c:1421:ldlm_resource_dump()) Skipped 1 previous similar message
      23:04:50:LustreError: 7276:0:(ldlm_resource.c:1415:ldlm_resource_dump()) --- Resource: [0x200000006:0x10000:0x0].0 (ffff8800709d3ac0) refcount = 2
      23:04:50:LustreError: 7276:0:(ldlm_resource.c:1418:ldlm_resource_dump()) Granted locks (in reverse order):
      23:04:50:LustreError: 7276:0:(ldlm_lib.c:2137:target_stop_recovery_thread()) lustre-MDT0000: Aborting recovery
      23:04:50:Lustre: 6947:0:(ldlm_lib.c:1801:target_recovery_overseer()) recovery is aborted, evict exports in recovery
      23:04:50:Lustre: 6947:0:(ldlm_lib.c:1801:target_recovery_overseer()) Skipped 2 previous similar messages
      23:04:50:LustreError: 6937:0:(osp_precreate.c:737:osp_precreate_cleanup_orphans()) lustre-OST0000-osc-MDT0000: cannot cleanup orphans: rc = -5
      23:04:50:LustreError: 7276:0:(qsd_lib.c:294:qsd_qtype_fini()) ASSERTION( atomic_read(&qqi->qqi_ref) == 1 ) failed: 
      23:04:50:LustreError: 7276:0:(qsd_lib.c:294:qsd_qtype_fini()) LBUG
      23:04:50:Pid: 7276, comm: umount
      23:04:50:
      23:04:50:Call Trace:
      23:04:50: [<ffffffffa05c2895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
      23:04:50: [<ffffffffa05c2e97>] lbug_with_loc+0x47/0xb0 [libcfs]
      23:04:51: [<ffffffffa0efcc23>] qsd_qtype_fini+0x363/0x3a0 [lquota]
      23:04:51: [<ffffffffa0efcfc8>] qsd_fini+0xd8/0x460 [lquota]
      23:04:51: [<ffffffffa0d751a8>] osd_shutdown+0x38/0xe0 [osd_zfs]
      23:04:51: [<ffffffffa0d78571>] osd_process_config+0x141/0x190 [osd_zfs]
      23:04:51: [<ffffffffa0843b41>] lod_process_config+0x4e1/0x17a0 [lod]
      23:04:51: [<ffffffffa05d32d1>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
      23:04:51: [<ffffffff8100bb8e>] ? apic_timer_interrupt+0xe/0x20
      23:04:52: [<ffffffffa118ed00>] mdd_process_config+0x210/0x610 [mdd]
      23:04:52: [<ffffffffa1722c56>] mdt_stack_fini+0x176/0xbe0 [mdt]
      23:04:52: [<ffffffffa1189950>] ? mdd_init_capa_ctxt+0x120/0x130 [mdd]
      23:04:52: [<ffffffffa1723dca>] mdt_device_fini+0x70a/0xd90 [mdt]
      23:04:52: [<ffffffffa071fbc7>] class_cleanup+0x577/0xda0 [obdclass]
      23:04:52: [<ffffffffa06f4b06>] ? class_name2dev+0x56/0xe0 [obdclass]
      23:04:52: [<ffffffffa07214ac>] class_process_config+0x10bc/0x1c80 [obdclass]
      23:04:52: [<ffffffffa05cdd98>] ? libcfs_log_return+0x28/0x40 [libcfs]
      23:04:52: [<ffffffffa071ad11>] ? lustre_cfg_new+0x391/0x7e0 [obdclass]
      23:04:52: [<ffffffffa07221e9>] class_manual_cleanup+0x179/0x6f0 [obdclass]
      23:04:52: [<ffffffffa06f4b06>] ? class_name2dev+0x56/0xe0 [obdclass]
      23:04:52: [<ffffffffa07571dc>] server_put_super+0x5bc/0xf00 [obdclass]
      23:04:52: [<ffffffff8118366b>] generic_shutdown_super+0x5b/0xe0
      23:04:52: [<ffffffff81183756>] kill_anon_super+0x16/0x60
      23:04:52: [<ffffffffa07240a6>] lustre_kill_super+0x36/0x60 [obdclass]
      23:04:52: [<ffffffff81183ef7>] deactivate_super+0x57/0x80
      23:04:52: [<ffffffff811a21ef>] mntput_no_expire+0xbf/0x110
      23:04:52: [<ffffffff811a2c5b>] sys_umount+0x7b/0x3a0
      23:04:52: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
      23:04:52:
      23:04:52:Kernel panic - not syncing: LBUG
      

      Maloo report: https://maloo.whamcloud.com/test_sets/beee4bb4-60cb-11e3-bd66-52540035b04c

        Activity

          pjones Peter Jones added a comment -

          Landed for 2.4.2 and 2.5. Will land to 2.5.1 when work starts on that release


          jlevi Jodi Levi (Inactive) added a comment -

          Patches have landed. Can this ticket be closed, or is more work expected/needed in this ticket?
          yujian Jian Yu added a comment -

          Patch landed on master branch for 2.6.0 and b2_4 branch for 2.4.2.

          niu Niu Yawei (Inactive) added a comment - patch for b2_4: http://review.whamcloud.com/8586
          niu Niu Yawei (Inactive) added a comment - patch for master: http://review.whamcloud.com/8583

          Looks like there is a race: on server umount, if the quota reint process is holding the global locks for reintegration, the locks will not be cleared by server_put_super() -> lustre_disconnect_lwp(). Later, when qsd_fini() stops the reint process, the global locks will be dropped; however, because lock cancellation is done asynchronously, by the time qsd_fini() reaches LASSERT(cfs_atomic_read(&qqi->qqi_ref) == 1) the global locks have not been canceled yet and are still holding a qqi refcount.

          It seems we should wait a while for such a case in qsd_fini(); I'll try to cook up a patch soon.
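          The race can be sketched in userspace C: an asynchronous cancel thread drops the lock's reference some time after teardown starts, so asserting the refcount immediately races with it, while a bounded wait does not. This is a minimal illustration of the proposed fix, not the actual Lustre code; all names (qqi_ref, async_lock_cancel) are hypothetical.

          ```c
          /* Sketch: instead of asserting that qqi_ref has already dropped to 1,
           * the teardown path waits a bounded time for the asynchronous lock
           * cancel to release its reference. Names are illustrative only. */
          #include <assert.h>
          #include <pthread.h>
          #include <stdatomic.h>
          #include <stdio.h>
          #include <unistd.h>

          static atomic_int qqi_ref;   /* 1 base ref + 1 held by a global lock */

          /* Simulates the asynchronous ldlm cancel dropping the lock's ref. */
          static void *async_lock_cancel(void *arg)
          {
                  usleep(10000);       /* the cancel completes some time later */
                  atomic_fetch_sub(&qqi_ref, 1);
                  return NULL;
          }

          int main(void)
          {
                  pthread_t t;

                  atomic_init(&qqi_ref, 2);
                  pthread_create(&t, NULL, async_lock_cancel, NULL);

                  /* Proposed behavior: poll with a timeout rather than
                   * LASSERT-ing immediately that the refcount is already 1. */
                  int waited_ms = 0;
                  while (atomic_load(&qqi_ref) != 1 && waited_ms < 1000) {
                          usleep(1000);
                          waited_ms++;
                  }

                  assert(atomic_load(&qqi_ref) == 1); /* safe: cancel has run */
                  printf("qqi_ref=%d\n", atomic_load(&qqi_ref));

                  pthread_join(t, NULL);
                  return 0;
          }
          ```

          An unconditional LASSERT at the top of this function would fire whenever the cancel thread had not yet run, which is exactly the window the logs above show.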


          Hi, Bobi

          The blocking AST calls qsd_glb_ast_data_get(lock, true) to put the qqi reference held by the lock (since the lock is going to be canceled), and then puts the reference it just got in qsd_glb_ast_data_get() (for the caller). In qsd_glb_glimpse_ast(), it only needs to put the reference held for the caller.
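          A toy model of this reference pairing (purely illustrative; ast_data_get() here stands in for, and is not, the real lquota API) shows why the cancel path ends up putting the lock's reference plus the caller's, while the glimpse path is refcount-neutral:

          ```c
          /* Toy model of the pairing described above: ast_data_get() always
           * takes one reference for the caller; when "reset" is set it also
           * drops the reference the lock itself held, because the lock is
           * being canceled. Hypothetical names, not Lustre's. */
          #include <assert.h>
          #include <stdatomic.h>
          #include <stdio.h>

          struct qqi { atomic_int ref; };

          static struct qqi *ast_data_get(struct qqi *q, int reset)
          {
                  atomic_fetch_add(&q->ref, 1);     /* ref for the caller */
                  if (reset)
                          atomic_fetch_sub(&q->ref, 1); /* drop the lock's ref */
                  return q;
          }

          int main(void)
          {
                  struct qqi q;

                  atomic_init(&q.ref, 2); /* base ref + one global lock */

                  /* Blocking AST (cancel path): reset=1, then put the caller
                   * ref as well, so the lock's reference is gone: 2 -> 1. */
                  struct qqi *b = ast_data_get(&q, 1);
                  atomic_fetch_sub(&b->ref, 1);

                  /* Glimpse AST: reset=0, put only the caller ref: net zero. */
                  struct qqi *g = ast_data_get(&q, 0);
                  atomic_fetch_sub(&g->ref, 1);

                  printf("ref=%d\n", atomic_load(&q.ref)); /* base ref only */
                  assert(atomic_load(&q.ref) == 1);
                  return 0;
          }
          ```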

          23:04:50:LustreError: 7276:0:(ldlm_resource.c:804:ldlm_resource_complain()) lustre-MDT0000-lwp-MDT0000: namespace resource [0x200000006:0x1010000:0x0].0 (ffff8800709d3980) refcount nonzero (1) after lock cleanup; forcing cleanup.
          23:04:50:LustreError: 7276:0:(ldlm_resource.c:804:ldlm_resource_complain()) Skipped 1 previous similar message
          23:04:50:LustreError: 7276:0:(ldlm_resource.c:1415:ldlm_resource_dump()) --- Resource: [0x200000006:0x1010000:0x0].0 (ffff8800709d3980) refcount = 2
          23:04:50:LustreError: 7276:0:(ldlm_resource.c:1418:ldlm_resource_dump()) Granted locks (in reverse order):
          23:04:50:LustreError: 7276:0:(ldlm_resource.c:1421:ldlm_resource_dump()) ### ### ns: lustre-MDT0000-lwp-MDT0000 lock: ffff880063768b40/0x3b627f62d662915 lrc: 2/1,0 mode: CR/CR res: [0x200000006:0x1010000:0x0].0 rrc: 2 type: PLN flags: 0x1106400000000 nid: local remote: 0x3b627f62d66293f expref: -99 pid: 6945 timeout: 0 lvb_type: 2
          23:04:50:LustreError: 7276:0:(ldlm_resource.c:1421:ldlm_resource_dump()) Skipped 1 previous similar message
          23:04:50:LustreError: 7276:0:(ldlm_resource.c:1415:ldlm_resource_dump()) --- Resource: [0x200000006:0x10000:0x0].0 (ffff8800709d3ac0) refcount = 2
          23:04:50:LustreError: 7276:0:(ldlm_resource.c:1418:ldlm_resource_dump()) Granted locks (in reverse order):
          

          The qqi reference is held by the global quota locks, and the two quota global locks were not cleaned up because the lock reader count was not 0, but I don't see who is holding the locks.

          23:04:38:LustreError: 5385:0:(qsd_reint.c:58:qsd_reint_completion()) lustre-MDT0000: failed to enqueue global quota lock, glb fid:[0x200000006:0x1010000:0x0], rc:-5
          23:04:38:LustreError: 5385:0:(qsd_reint.c:58:qsd_reint_completion()) Skipped 1 previous similar message
          

          I see there were some lock enqueue failures in the log; maybe there is a bug in the error path of lock enqueue that caused the lock not to be dropped properly?

          yujian Jian Yu added a comment -

          Lustre Build: http://build.whamcloud.com/job/lustre-b2_4/67/
          Distro/Arch: RHEL6.4/x86_64 (server), SLES11SP2/x86_64 (client)

          sanity-scrub test 3 also hit this LBUG:
          https://maloo.whamcloud.com/test_sets/4bdfe456-6347-11e3-8c76-52540035b04c

          bobijam Zhenyu Xu added a comment -

          Hi Niu,

          I have a question about qsd_glb_blocking_ast(), in which

                  case LDLM_CB_CANCELING: {
                          struct qsd_qtype_info *qqi;
          
                          LDLM_DEBUG(lock, "canceling global quota lock");
          
                          qqi = qsd_glb_ast_data_get(lock, true);
                          if (qqi == NULL)
                                  break;
          
          ...
                          lu_ref_del(&qqi->qqi_reference, "ast_data_get", lock);
                          qqi_putref(qqi);
          

          Since it calls qsd_glb_ast_data_get(lock, true) to reset lock->l_ast_data, and within that call the qqi's ref has already been put, why does the blocking AST put its ref again here?

          Meanwhile in qsd_glb_glimpse_ast(), it calls qsd_glb_ast_data_get(lock, false) and the qqi's ref is kept.

          pjones Peter Jones added a comment -

          Niu

          Could you please comment?

          Thanks

          Peter


          People

            niu Niu Yawei (Inactive)
            yujian Jian Yu