Details
- Type: Bug
- Resolution: Duplicate
- Priority: Major
- Fix Version/s: None
- Affects Version/s: Lustre 2.4.1
- Environment:
  Lustre build: http://build.whamcloud.com/job/lustre-b2_4/44/ (2.4.1 RC1)
  Distro/Arch: RHEL6.4/x86_64
  FSTYPE=zfs
  MDSCOUNT=4
- Severity: 3
- Rank: 10178
Description
conf-sanity test 47 hung as follows:
umount lustre on /mnt/lustre.....
CMD: wtm-29vm6.rosso.whamcloud.com grep -c /mnt/lustre' ' /proc/mounts
Stopping client wtm-29vm6.rosso.whamcloud.com /mnt/lustre (opts:)
CMD: wtm-29vm6.rosso.whamcloud.com lsof -t /mnt/lustre
CMD: wtm-29vm6.rosso.whamcloud.com umount /mnt/lustre 2>&1
stop ost1 service on wtm-29vm4
CMD: wtm-29vm4 grep -c /mnt/ost1' ' /proc/mounts
Stopping /mnt/ost1 (opts:-f) on wtm-29vm4
CMD: wtm-29vm4 umount -d -f /mnt/ost1
The stack trace on OSS wtm-29vm4 showed:
umount        D 0000000000000000     0  9981   9980 0x00000080
 ffff8800740a38c8 0000000000000086 ffffffff81ead540 0000000000000282
 ffffffff8100b9ce 0000000000000282 ffff8800740a3868 ffffffff810810cc
 ffff8800647dfab8 ffff8800740a3fd8 000000000000fb88 ffff8800647dfab8
Call Trace:
 [<ffffffff8100b9ce>] ? common_interrupt+0xe/0x13
 [<ffffffff810810cc>] ? lock_timer_base+0x3c/0x70
 [<ffffffff8150f322>] schedule_timeout+0x192/0x2e0
 [<ffffffff810811e0>] ? process_timeout+0x0/0x10
 [<ffffffff8150f48e>] schedule_timeout_uninterruptible+0x1e/0x20
 [<ffffffffa04ae9da>] dnode_special_close+0x2a/0x60 [zfs]
 [<ffffffffa04a3562>] dmu_objset_evict+0x92/0x400 [zfs]
 [<ffffffffa04b4840>] dsl_dataset_evict+0x30/0x1b0 [zfs]
 [<ffffffffa0494d59>] dbuf_evict_user+0x49/0x80 [zfs]
 [<ffffffffa0495c77>] dbuf_rele_and_unlock+0xf7/0x1e0 [zfs]
 [<ffffffffa04960d0>] dmu_buf_rele+0x30/0x40 [zfs]
 [<ffffffffa04b9d60>] dsl_dataset_disown+0xb0/0x1d0 [zfs]
 [<ffffffffa04a2671>] dmu_objset_disown+0x11/0x20 [zfs]
 [<ffffffffa0db65ee>] udmu_objset_close+0x2e/0x40 [osd_zfs]
 [<ffffffffa0db4e0b>] osd_device_fini+0x34b/0x5b0 [osd_zfs]
 [<ffffffffa073fbf7>] class_cleanup+0x577/0xda0 [obdclass]
 [<ffffffffa0714b36>] ? class_name2dev+0x56/0xe0 [obdclass]
 [<ffffffffa07414dc>] class_process_config+0x10bc/0x1c80 [obdclass]
 [<ffffffffa05e0d98>] ? libcfs_log_return+0x28/0x40 [libcfs]
 [<ffffffffa073ad41>] ? lustre_cfg_new+0x391/0x7e0 [obdclass]
 [<ffffffffa0742219>] class_manual_cleanup+0x179/0x6f0 [obdclass]
 [<ffffffffa05e0d98>] ? libcfs_log_return+0x28/0x40 [libcfs]
 [<ffffffffa0db3fdd>] osd_obd_disconnect+0x1bd/0x1c0 [osd_zfs]
 [<ffffffffa07442ae>] lustre_put_lsi+0x17e/0x1100 [obdclass]
 [<ffffffffa074cff8>] lustre_common_put_super+0x5f8/0xc40 [obdclass]
 [<ffffffffa05e62d1>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
 [<ffffffffa0776b7a>] server_put_super+0x1ca/0xf00 [obdclass]
 [<ffffffff8118363b>] generic_shutdown_super+0x5b/0xe0
 [<ffffffff81183726>] kill_anon_super+0x16/0x60
 [<ffffffffa07440d6>] lustre_kill_super+0x36/0x60 [obdclass]
 [<ffffffff81183ec7>] deactivate_super+0x57/0x80
 [<ffffffff811a21bf>] mntput_no_expire+0xbf/0x110
 [<ffffffff811a2c2b>] sys_umount+0x7b/0x3a0
 [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
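The uninterruptible sleep at the top of the trace is ZFS busy-waiting for the last hold on one of the object set's special dnodes to be released. A minimal sketch of that loop, reconstructed from the ZFS 0.6.x source of this era (not quoted from this ticket), shows why a leaked hold leaves umount stuck in D state indefinitely:

/* Hedged reconstruction of dnode_special_close() from ZFS 0.6.x.
 * Under the SPL, delay(1) maps to schedule_timeout_uninterruptible(1),
 * matching the frame above dnode_special_close+0x2a in the trace. */
void
dnode_special_close(dnode_t *dn)
{
        /* dmu_objset_evict() cannot finish until every hold on the
         * object set's special dnodes is dropped; if one hold is
         * never released, this loop spins forever and the umount
         * thread never returns. */
        while (refcount_count(&dn->dn_holds) > 0)
                delay(1);
        dnode_destroy(dn);
}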
The console log on OSS wtm-29vm4 showed:
22:49:17:LustreError: 167-0: lustre-MDT0000-lwp-OST0000: This client was evicted by lustre-MDT0000; in progress operations using this service will fail.
22:49:17:LustreError: 9656:0:(osd_oi.c:239:osd_fld_lookup()) lustre-OST0000-osd: cannot find FLD range for [0x200000402:0x0:0x0]: rc = -5
22:49:17:LustreError: 9656:0:(osd_oi.c:256:fid_is_on_ost()) lustre-OST0000-osd: Can not lookup fld for [0x200000402:0x0:0x0]
22:49:17:LustreError: 9883:0:(osd_oi.c:239:osd_fld_lookup()) lustre-OST0000-osd: cannot find FLD range for [0x200000400:0x0:0x0]: rc = -5
22:49:17:LustreError: 9883:0:(osd_oi.c:256:fid_is_on_ost()) lustre-OST0000-osd: Can not lookup fld for [0x200000400:0x0:0x0]
22:49:17:Lustre: lustre-MDT0000-lwp-OST0000: Connection restored to lustre-MDT0000 (at 10.10.17.33@tcp)
22:49:17:LustreError: 9885:0:(ofd_obd.c:1207:ofd_create()) lustre-OST0000: Can't find FID Sequence 0x200000400: rc = -17
22:49:17:LustreError: 9656:0:(ofd_obd.c:1207:ofd_create()) lustre-OST0000: Can't find FID Sequence 0x200000402: rc = -17
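For reference, the rc values in these messages are negated Linux errno codes; a trivial userspace check (standard errno numbering, nothing ticket-specific) confirms the mapping:

#include <errno.h>
#include <stdio.h>

int main(void)
{
        /* rc = -5 from osd_fld_lookup() is -EIO;
         * rc = -17 from ofd_create() is -EEXIST. */
        printf("EIO = %d, EEXIST = %d\n", EIO, EEXIST); /* EIO = 5, EEXIST = 17 */
        return 0;
}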
Maloo report: https://maloo.whamcloud.com/test_sets/a0788f4c-1647-11e3-aa2a-52540035b04c