Details
Bug - Resolution: Duplicate - Critical - None - Lustre 2.8.0 - autotest - 3 - 9223372036854775807
Description
replay-single test 70d hangs while unmounting one of the MDTs in a DNE configuration. Logs for this failure are at https://testing.hpdd.intel.com/test_sets/0c5cca82-5c61-11e5-9065-5254006e85c2
From the MDS2, MDS3, MDS4 console logs, we see:
00:57:23:Lustre: lustre-MDT0001 is waiting for obd_unlinked_exports more than 32 seconds. The obd refcount = 5. Is it stuck?
00:57:23:Lustre: lustre-MDT0001 is waiting for obd_unlinked_exports more than 64 seconds. The obd refcount = 5. Is it stuck?
00:57:23:Lustre: lustre-MDT0001: Not available for connect from 10.2.4.158@tcp (stopping)
00:57:23:Lustre: Skipped 336 previous similar messages
00:57:23:INFO: task umount:4687 blocked for more than 120 seconds.
00:57:23: Not tainted 2.6.32-573.3.1.el6_lustre.gde57418.x86_64 #1
00:57:23:"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
00:57:23:umount D 0000000000000000 0 4687 4686 0x00000080
00:57:23: ffff88005cbefa38 0000000000000082 0000000000000000 ffff88005cbef9d8
00:57:23: ffff88005cbef998 000200000a0204a1 00000440a839357a 0000000000000000
00:57:23: ffff880056458044 000000010042c434 ffff8800688e9068 ffff88005cbeffd8
00:57:23:Call Trace:
01:05:22: [<ffffffff81539a62>] schedule_timeout+0x192/0x2e0
01:05:22: [<ffffffff81089c10>] ? process_timeout+0x0/0x10
01:05:22: [<ffffffffa05ce6b6>] obd_exports_barrier+0xb6/0x190 [obdclass]
01:05:22: [<ffffffffa0e9c145>] mdt_device_fini+0x475/0xf40 [mdt]
01:05:22: [<ffffffffa05d0496>] ? class_disconnect_exports+0x116/0x2f0 [obdclass]
01:05:22: [<ffffffffa05ea8b2>] class_cleanup+0x572/0xd30 [obdclass]
01:05:23: [<ffffffffa05cb156>] ? class_name2dev+0x56/0xe0 [obdclass]
01:05:23: [<ffffffffa05ecf46>] class_process_config+0x1ed6/0x2840 [obdclass]
01:05:23: [<ffffffffa04b2b61>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
01:05:23: [<ffffffff8117892c>] ? __kmalloc+0x21c/0x230
01:05:23: [<ffffffffa05edd6f>] class_manual_cleanup+0x4bf/0x8e0 [obdclass]
01:05:23: [<ffffffffa05cb156>] ? class_name2dev+0x56/0xe0 [obdclass]
01:05:23: [<ffffffffa062721c>] server_put_super+0xa0c/0xed0 [obdclass]
01:05:23: [<ffffffff811b0176>] ? invalidate_inodes+0xf6/0x190
01:05:23: [<ffffffff811943bb>] generic_shutdown_super+0x5b/0xe0
01:05:23: [<ffffffff811944a6>] kill_anon_super+0x16/0x60
01:05:23: [<ffffffffa05f0c26>] lustre_kill_super+0x36/0x60 [obdclass]
01:05:23: [<ffffffff81194c47>] deactivate_super+0x57/0x80
01:05:23: [<ffffffff811b4adf>] mntput_no_expire+0xbf/0x110
01:05:23: [<ffffffff811b562b>] sys_umount+0x7b/0x3a0
01:05:23: [<ffffffff8100b0d2>] system_call_fastpath+0x16/0x1b
01:05:23:Lustre: lustre-MDT0001 is waiting for obd_unlinked_exports more than 128 seconds. The obd refcount = 5. Is it stuck?
Other instances of this failure:
2015-09-08 23:27:49 - https://testing.hpdd.intel.com/test_sets/fe085252-56cb-11e5-84d0-5254006e85c2
2015-09-11 10:01:48 - https://testing.hpdd.intel.com/test_sets/23ba7c20-58cd-11e5-baa0-5254006e85c2
2015-09-11 19:25:28 - https://testing.hpdd.intel.com/test_sets/a55b755e-5904-11e5-a4d9-5254006e85c2
There are a few other cases of this test failing with the same stack trace, but the MDS console messages differ:
2015-09-09 05:56:14 - https://testing.hpdd.intel.com/test_sets/3dda199c-56fe-11e5-8947-5254006e85c2
2015-09-12 19:16:06 - https://testing.hpdd.intel.com/test_sets/53eb4ab0-59d0-11e5-825b-5254006e85c2
All of these failures are in review-dne-part-2.
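For triaging further console logs against this signature, the following is a minimal Python sketch, not part of the Lustre test framework. The function name summarize_console_log and both regular expressions are hypothetical; the patterns are modelled only on the messages quoted in this ticket and may need adjusting for other kernels or log formats.

```python
import re

# Hypothetical patterns, modelled on the console messages quoted in this ticket.
WAIT_RE = re.compile(
    r"Lustre: (?P<target>\S+) is waiting for obd_unlinked_exports "
    r"more than (?P<secs>\d+) seconds\. The obd refcount = (?P<ref>\d+)\."
)
# Matches one stack frame, e.g. "[<ffffffffa05ce6b6>] obd_exports_barrier+0xb6/0x190".
FRAME_RE = re.compile(r"\[<[0-9a-f]+>\] (?:\? )?(?P<func>\w+)\+0x")


def summarize_console_log(text):
    """Collect barrier-wait messages and check for the hung-umount frames."""
    waits = [
        (m.group("target"), int(m.group("secs")), int(m.group("ref")))
        for m in WAIT_RE.finditer(text)
    ]
    funcs = {m.group("func") for m in FRAME_RE.finditer(text)}
    return {
        "waits": waits,
        # True when both frames seen in this ticket's trace are present.
        "in_exports_barrier": {"obd_exports_barrier", "mdt_device_fini"} <= funcs,
    }


if __name__ == "__main__":
    # Excerpt copied from the MDS console log in this ticket.
    sample = (
        "00:57:23:Lustre: lustre-MDT0001 is waiting for obd_unlinked_exports "
        "more than 32 seconds. The obd refcount = 5. Is it stuck? "
        "01:05:22: [<ffffffffa05ce6b6>] obd_exports_barrier+0xb6/0x190 [obdclass] "
        "01:05:22: [<ffffffffa0e9c145>] mdt_device_fini+0x475/0xf40 [mdt]"
    )
    print(summarize_console_log(sample))
```

The stack-frame check is kept separate from the wait-message check because some of the instances listed above show the same stack trace with different MDS console messages.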