[LU-3895] conf-sanity test 47: umount -d -f /mnt/ost1 Created: 06/Sep/13  Updated: 02/Jun/14  Resolved: 02/Jun/14

Status: Closed
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.1
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Jian Yu Assignee: Nathaniel Clark
Resolution: Duplicate Votes: 0
Labels: MB, dne, yuc2, zfs
Environment:

Lustre build: http://build.whamcloud.com/job/lustre-b2_4/44/ (2.4.1 RC1)
Distro/Arch: RHEL6.4/x86_64
FSTYPE=zfs
MDSCOUNT=4


Issue Links:
Related
is related to LU-3582 Runtests failed: old and new files ar... Resolved
is related to LU-4349 conf-sanity test_47: test failed to r... Resolved
Severity: 3
Rank (Obsolete): 10178

 Description   

conf-sanity test 47 hung as follows:

umount lustre on /mnt/lustre.....
CMD: wtm-29vm6.rosso.whamcloud.com grep -c /mnt/lustre' ' /proc/mounts
Stopping client wtm-29vm6.rosso.whamcloud.com /mnt/lustre (opts:)
CMD: wtm-29vm6.rosso.whamcloud.com lsof -t /mnt/lustre
CMD: wtm-29vm6.rosso.whamcloud.com umount  /mnt/lustre 2>&1
stop ost1 service on wtm-29vm4
CMD: wtm-29vm4 grep -c /mnt/ost1' ' /proc/mounts
Stopping /mnt/ost1 (opts:-f) on wtm-29vm4
CMD: wtm-29vm4 umount -d -f /mnt/ost1

The stack trace on OSS wtm-29vm4 showed:

umount        D 0000000000000000     0  9981   9980 0x00000080
 ffff8800740a38c8 0000000000000086 ffffffff81ead540 0000000000000282
 ffffffff8100b9ce 0000000000000282 ffff8800740a3868 ffffffff810810cc
 ffff8800647dfab8 ffff8800740a3fd8 000000000000fb88 ffff8800647dfab8
Call Trace:
 [<ffffffff8100b9ce>] ? common_interrupt+0xe/0x13
 [<ffffffff810810cc>] ? lock_timer_base+0x3c/0x70
 [<ffffffff8150f322>] schedule_timeout+0x192/0x2e0
 [<ffffffff810811e0>] ? process_timeout+0x0/0x10
 [<ffffffff8150f48e>] schedule_timeout_uninterruptible+0x1e/0x20
 [<ffffffffa04ae9da>] dnode_special_close+0x2a/0x60 [zfs]
 [<ffffffffa04a3562>] dmu_objset_evict+0x92/0x400 [zfs]
 [<ffffffffa04b4840>] dsl_dataset_evict+0x30/0x1b0 [zfs]
 [<ffffffffa0494d59>] dbuf_evict_user+0x49/0x80 [zfs]
 [<ffffffffa0495c77>] dbuf_rele_and_unlock+0xf7/0x1e0 [zfs]
 [<ffffffffa04960d0>] dmu_buf_rele+0x30/0x40 [zfs]
 [<ffffffffa04b9d60>] dsl_dataset_disown+0xb0/0x1d0 [zfs]
 [<ffffffffa04a2671>] dmu_objset_disown+0x11/0x20 [zfs]
 [<ffffffffa0db65ee>] udmu_objset_close+0x2e/0x40 [osd_zfs]
 [<ffffffffa0db4e0b>] osd_device_fini+0x34b/0x5b0 [osd_zfs]
 [<ffffffffa073fbf7>] class_cleanup+0x577/0xda0 [obdclass]
 [<ffffffffa0714b36>] ? class_name2dev+0x56/0xe0 [obdclass]
 [<ffffffffa07414dc>] class_process_config+0x10bc/0x1c80 [obdclass]
 [<ffffffffa05e0d98>] ? libcfs_log_return+0x28/0x40 [libcfs]
 [<ffffffffa073ad41>] ? lustre_cfg_new+0x391/0x7e0 [obdclass]
 [<ffffffffa0742219>] class_manual_cleanup+0x179/0x6f0 [obdclass]
 [<ffffffffa05e0d98>] ? libcfs_log_return+0x28/0x40 [libcfs]
 [<ffffffffa0db3fdd>] osd_obd_disconnect+0x1bd/0x1c0 [osd_zfs]
 [<ffffffffa07442ae>] lustre_put_lsi+0x17e/0x1100 [obdclass]
 [<ffffffffa074cff8>] lustre_common_put_super+0x5f8/0xc40 [obdclass]
 [<ffffffffa05e62d1>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
 [<ffffffffa0776b7a>] server_put_super+0x1ca/0xf00 [obdclass]
 [<ffffffff8118363b>] generic_shutdown_super+0x5b/0xe0
 [<ffffffff81183726>] kill_anon_super+0x16/0x60
 [<ffffffffa07440d6>] lustre_kill_super+0x36/0x60 [obdclass]
 [<ffffffff81183ec7>] deactivate_super+0x57/0x80
 [<ffffffff811a21bf>] mntput_no_expire+0xbf/0x110
 [<ffffffff811a2c2b>] sys_umount+0x7b/0x3a0
 [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
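
The innermost frames (schedule_timeout_uninterruptible() called from dnode_special_close()) match the hold-draining loop in the ZFS code of that vintage: the objset's special dnodes are torn down by spinning until every hold has been dropped, so a leaked hold keeps umount stuck in that loop indefinitely. The sketch below is paraphrased from memory of the zfs-0.6.x source and is only an illustration; the exact field and helper names may differ.

/*
 * Paraphrased sketch of dnode_special_close() (zfs-0.6.x era), not a
 * verified copy of the source.  delay(1) is what appears as
 * schedule_timeout_uninterruptible() in the stack above; if osd-zfs
 * (or anything else) still holds a reference on the dnode, the loop
 * never terminates and the umount thread hangs.
 */
void
dnode_special_close(dnode_handle_t *dnh)
{
	dnode_t *dn = dnh->dnh_dnode;

	/* Wait for the final references to the dnode to be dropped. */
	while (refcount_count(&dn->dn_holds) > 0)
		delay(1);	/* one-tick uninterruptible sleep */

	dnode_destroy(dn);
	dnh->dnh_dnode = NULL;
}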

The console log on OSS wtm-29vm4 showed:

22:49:17:LustreError: 167-0: lustre-MDT0000-lwp-OST0000: This client was evicted by lustre-MDT0000; in progress operations using this service will fail.
22:49:17:LustreError: 9656:0:(osd_oi.c:239:osd_fld_lookup()) lustre-OST0000-osd: cannot find FLD range for [0x200000402:0x0:0x0]: rc = -5
22:49:17:LustreError: 9656:0:(osd_oi.c:256:fid_is_on_ost()) lustre-OST0000-osd: Can not lookup fld for [0x200000402:0x0:0x0]
22:49:17:LustreError: 9883:0:(osd_oi.c:239:osd_fld_lookup()) lustre-OST0000-osd: cannot find FLD range for [0x200000400:0x0:0x0]: rc = -5
22:49:17:LustreError: 9883:0:(osd_oi.c:256:fid_is_on_ost()) lustre-OST0000-osd: Can not lookup fld for [0x200000400:0x0:0x0]
22:49:17:Lustre: lustre-MDT0000-lwp-OST0000: Connection restored to lustre-MDT0000 (at 10.10.17.33@tcp)
22:49:17:LustreError: 9885:0:(ofd_obd.c:1207:ofd_create()) lustre-OST0000: Can't find FID Sequence 0x200000400: rc = -17
22:49:17:LustreError: 9656:0:(ofd_obd.c:1207:ofd_create()) lustre-OST0000: Can't find FID Sequence 0x200000402: rc = -17
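
For context on the sequence numbers in those messages: 0x200000400 is, as I recall from the 2.4-era lustre_idl.h, the start of the "normal" FID sequence space handed out by the SEQ service to DNE MDTs, so the OST has to ask the FLD which target owns such a FID. Here the FLD lookups fail with rc = -5 right after the lwp connection to MDT0000 is evicted, and object creation for those sequences fails as well. The small stand-alone sketch below only illustrates that classification; the constant value is an assumption from memory, not a verified copy of the header.

/*
 * Stand-alone illustration (not Lustre code): decide whether a FID
 * sequence from the log needs an FLD lookup on the OST.  FID_SEQ_NORMAL
 * is quoted from memory of the 2.4-era headers and is an assumption.
 */
#include <stdint.h>
#include <stdio.h>

#define FID_SEQ_NORMAL	0x200000400ULL	/* first sequence allocated by the SEQ service */

int main(void)
{
	/* Sequences taken from the osd_fld_lookup()/ofd_create() errors above. */
	uint64_t seqs[] = { 0x200000400ULL, 0x200000402ULL };

	for (unsigned int i = 0; i < sizeof(seqs) / sizeof(seqs[0]); i++)
		printf("seq %#llx: %s\n", (unsigned long long)seqs[i],
		       seqs[i] >= FID_SEQ_NORMAL ?
		       "normal DNE sequence - needs an FLD lookup on the OST" :
		       "reserved/legacy sequence - no FLD lookup needed");
	return 0;
}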

Maloo report: https://maloo.whamcloud.com/test_sets/a0788f4c-1647-11e3-aa2a-52540035b04c



 Comments   
Comment by Jian Yu [ 06/Sep/13 ]

By searching on Maloo, I found that this test failed in review-dne-zfs test sessions on both the b2_4 and master branches.

Comment by Nathaniel Clark [ 06/Sep/13 ]

This bug appears to share some similarities with LU-3582, in that there are failed FLD lookups on the OST after an "unmount".

Comment by Nathaniel Clark [ 04/Mar/14 ]

This seems to happen only on b2_4, and the last occurrence was on 2013-11-18:
https://maloo.whamcloud.com/test_sets/43b81376-50e5-11e3-9ca9-52540035b04c

I cannot find any occurrences on b2_5 or master; all of the failures of this specific test seem to be LU-4349.

Comment by Jodi Levi (Inactive) [ 19/Mar/14 ]

Duplicate of LU-3582

Comment by Jodi Levi (Inactive) [ 02/Jun/14 ]

Reopening to remove the fix version, as this is a duplicate.
