[LU-10621] sanity-flr test 31 LBUG on OST umount with “ASSERTION( atomic_read(&d->ld_ref) == 0 ) failed:” Created: 06/Feb/18  Updated: 06/Feb/18

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.11.0
Fix Version/s: None

Type: Bug
Priority: Minor
Reporter: James Nunez (Inactive)
Assignee: WC Triage
Resolution: Unresolved
Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 9223372036854775807

Description

sanity-flr test_31 has hung only once so far: https://testing.hpdd.intel.com/test_sets/ececb25a-06d3-11e8-a10a-52540065bddc. In that test session, the last lines in the test_log are:

== sanity-flr test 31: make sure glimpse request can be retried ====================================== 20:48:28 (1517431708)
fail_loc=0x1A00
CMD: trevis-41vm3 grep -c /mnt/lustre-ost1' ' /proc/mounts
Stopping /mnt/lustre-ost1 (opts:) on trevis-41vm3
CMD: trevis-41vm3 umount -d /mnt/lustre-ost1

The console log of the OST node (vm3) shows:

[ 5669.616463] Lustre: DEBUG MARKER: == sanity-flr test 31: make sure glimpse request can be retried ====================================== 20:48:28 (1517431708)
[ 5669.797671] Lustre: DEBUG MARKER: grep -c /mnt/lustre-ost1' ' /proc/mounts
[ 5670.115051] Lustre: DEBUG MARKER: umount -d /mnt/lustre-ost1
[ 5670.280194] Lustre: Failing over lustre-OST0000
[ 5670.306388] LustreError: 1200:0:(ofd_dev.c:250:ofd_stack_fini()) header@ffff88005c2d0960[0x1, 1, [0x100000000:0xccf:0x0] hash exist]{
[ 5670.306388] 
[ 5670.311566] LustreError: 1200:0:(ofd_dev.c:250:ofd_stack_fini()) ....obdfilter@ffff88005c2d09b0obdfilter-object@ffff88005c2d09b0
[ 5670.311566] 
[ 5670.316596] LustreError: 1200:0:(ofd_dev.c:250:ofd_stack_fini()) ....osd-zfs@ffff88005530b270osd-zfs-object@ffff88005530b270
[ 5670.316596] 
[ 5670.320351] LustreError: 1200:0:(ofd_dev.c:250:ofd_stack_fini()) } header@ffff88005c2d0960
[ 5670.320351] 
[ 5670.324164] LustreError: 1200:0:(ofd_dev.c:3057:ofd_fini()) ASSERTION( atomic_read(&d->ld_ref) == 0 ) failed: 
[ 5670.326440] LustreError: 1200:0:(ofd_dev.c:3057:ofd_fini()) LBUG
[ 5670.328461] Pid: 1200, comm: umount
[ 5670.330250] 
[ 5670.330250] Call Trace:
[ 5670.333500]  [<ffffffffc067d7ae>] libcfs_call_trace+0x4e/0x60 [libcfs]
[ 5670.335432]  [<ffffffffc067d83c>] lbug_with_loc+0x4c/0xb0 [libcfs]
[ 5670.337340]  [<ffffffffc11bae63>] ofd_device_fini+0x2c3/0x2d0 [ofd]
[ 5670.339235]  [<ffffffffc0c8008c>] class_cleanup+0x8cc/0xc40 [obdclass]
[ 5670.341180]  [<ffffffffc0c810ac>] class_process_config+0x62c/0x28a0 [obdclass]
[ 5670.343119]  [<ffffffffc0688d47>] ? libcfs_debug_msg+0x57/0x80 [libcfs]
[ 5670.345063]  [<ffffffffc0c834e6>] class_manual_cleanup+0x1c6/0x710 [obdclass]
[ 5670.347013]  [<ffffffffc0cb30ce>] server_put_super+0x8de/0xcd0 [obdclass]
[ 5670.348965]  [<ffffffff812054d2>] generic_shutdown_super+0x72/0x100
[ 5670.350875]  [<ffffffff812058a2>] kill_anon_super+0x12/0x20
[ 5670.352772]  [<ffffffffc0c85f32>] lustre_kill_super+0x32/0x50 [obdclass]
[ 5670.354721]  [<ffffffff81205c59>] deactivate_locked_super+0x49/0x60
[ 5670.356674]  [<ffffffff812063c6>] deactivate_super+0x46/0x60
[ 5670.358685]  [<ffffffff8122376f>] cleanup_mnt+0x3f/0x80
[ 5670.360727]  [<ffffffff81223802>] __cleanup_mnt+0x12/0x20
[ 5670.362669]  [<ffffffff810aee05>] task_work_run+0xc5/0xf0
[ 5670.364564]  [<ffffffff8102ab52>] do_notify_resume+0x92/0xb0
[ 5670.366471]  [<ffffffff816b8d37>] int_signal+0x12/0x17
[ 5670.368362] 
[ 5670.369916] Kernel panic - not syncing: LBUG
[ 5670.370906] CPU: 1 PID: 1200 Comm: umount Tainted: P           OE  ------------   3.10.0-693.11.6.el7_lustre.x86_64 #1
[ 5670.370906] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
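
For triage, the relevant facts in a console log like the one above can be pulled out with a short script. The following is a minimal sketch, not part of the Lustre test framework: the function name summarize_console_log and the regular expressions are hypothetical and were written only against the line format shown in this ticket. It also assumes that the second bracketed field of the header@ dump ("1" in "[0x1, 1, ...") is the object's reference count, which would be consistent with a reference still being held when ofd_fini() checks ld_ref.

```python
import re

# LustreError prefix as it appears in the console log in this ticket:
# "[ 5670.324164] LustreError: 1200:0:(ofd_dev.c:3057:ofd_fini()) <message>"
LUSTRE_ERROR = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+LustreError:\s+(?P<pid>\d+):\d+:"
    r"\((?P<file>[^:]+):(?P<line>\d+):(?P<func>\w+)\(\)\)\s*(?P<msg>.*)"
)

# "ASSERTION( <expr> ) failed"
ASSERTION = re.compile(r"ASSERTION\(\s*(?P<expr>.*?)\s*\)\s*failed")

# Object header dump. ASSUMPTION: the second field is a reference count.
HEADER = re.compile(
    r"header@(?P<addr>[0-9a-f]+)\[(?P<flags>0x[0-9a-f]+),\s*(?P<ref>\d+),\s*"
    r"\[(?P<fid>[^\]]+)\]"
)


def summarize_console_log(text):
    """Collect the failed assertion and any dumped object headers."""
    summary = {"assertion": None, "location": None, "pid": None, "leaked": []}
    for raw in text.splitlines():
        m = LUSTRE_ERROR.search(raw)
        if not m:
            continue  # not a LustreError line (DEBUG MARKER, call trace, ...)
        msg = m.group("msg")

        a = ASSERTION.search(msg)
        if a and summary["assertion"] is None:
            # Record only the first failed assertion and where it fired.
            summary["assertion"] = a.group("expr")
            summary["location"] = "%s:%s:%s()" % (
                m.group("file"), m.group("line"), m.group("func"))
            summary["pid"] = int(m.group("pid"))

        h = HEADER.search(msg)
        if h:
            # The closing "} header@..." line has no bracketed fields,
            # so each dumped object is counted once.
            summary["leaked"].append(
                {"fid": h.group("fid"), "ref": int(h.group("ref"))})
    return summary


# Three lines copied from the console log in this ticket.
SAMPLE = """\
[ 5670.306388] LustreError: 1200:0:(ofd_dev.c:250:ofd_stack_fini()) header@ffff88005c2d0960[0x1, 1, [0x100000000:0xccf:0x0] hash exist]{
[ 5670.324164] LustreError: 1200:0:(ofd_dev.c:3057:ofd_fini()) ASSERTION( atomic_read(&d->ld_ref) == 0 ) failed:
[ 5670.326440] LustreError: 1200:0:(ofd_dev.c:3057:ofd_fini()) LBUG
"""

if __name__ == "__main__":
    print(summarize_console_log(SAMPLE))
```

Run against the full console log, this should report the assertion text, the ofd_dev.c location, the umount PID, and the FID of each object dumped by ofd_stack_fini(), which makes it easier to compare this failure with other test sessions.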
