Details
Bug
Resolution: Unresolved
Minor
None
Lustre 2.8.0
3
Description
The HSM coordinator may still be running after the MDT has been unmounted. To reproduce, let sanity-hsm run for a while, interrupt it, and then unmount the MDT.
# export agt1_HOST=$HOSTNAME
# ./lustre/tests/llmount.sh
...
# bash lustre/tests/sanity-hsm.sh
...
== sanity-hsm test 9: Use of explicit archive number, with dedicated copytool == 09:53:53 (1429541633)
Purging archive on t
Starting copytool agt1 on t
Copytool is stopped on t
Copytool has stopped in 2s.
mdt.lustre-MDT0000.hsm_control=shutdown
Waiting 20 secs for update
^C
# umount /mnt/mds1
From the console:
...
[ 1489.933004] Lustre: DEBUG MARKER: == sanity-hsm test 9: Use of explicit archive number, with dedicated copytool == 09:53:53 (1429541633)
[ 1499.709003] Lustre: Failing over lustre-MDT0000
[ 1499.823727] Lustre: server umount lustre-MDT0000 complete
[ 1500.063068] ------------[ cut here ]------------
[ 1500.063872] WARNING: at lib/list_debug.c:48 list_del+0x6e/0xa0() (Not tainted)
[ 1500.064035] Hardware name: Bochs
[ 1500.064035] list_del corruption. prev->next should be ffff8800c792dde8, but was 6b6b6b6b6b6b6b6b
[ 1500.064035] Modules linked in: ...
[ 1500.064035] Pid: 9019, comm: hsm_cdtr Not tainted 2.6.32-431.29.2.el6.lustre.x86_64 #1
[ 1500.064035] Call Trace:
[ 1500.064035] [<ffffffff810741b7>] ? warn_slowpath_common+0x87/0xc0
[ 1500.064035] [<ffffffff810742a6>] ? warn_slowpath_fmt+0x46/0x50
[ 1500.064035] [<ffffffff812b528e>] ? list_del+0x6e/0xa0
[ 1500.064035] [<ffffffff8109efd1>] ? remove_wait_queue+0x31/0x50
[ 1500.064035] [<ffffffffa0f1e9b4>] ? mdt_coordinator+0xd94/0x1620 [mdt]
[ 1500.064035] [<ffffffff81061d90>] ? default_wake_function+0x0/0x20
[ 1500.064035] [<ffffffff81553065>] ? thread_return+0x4e/0x7e9
[ 1500.064035] [<ffffffffa0f1dc20>] ? mdt_coordinator+0x0/0x1620 [mdt]
[ 1500.064035] [<ffffffff8109e856>] ? kthread+0x96/0xa0
[ 1500.064035] [<ffffffff8100c30a>] ? child_rip+0xa/0x20
[ 1500.064035] [<ffffffff815562e0>] ? _spin_unlock_irq+0x30/0x40
[ 1500.064035] [<ffffffff8100bb10>] ? restore_args+0x0/0x30
[ 1500.064035] [<ffffffff8109e7c0>] ? kthread+0x0/0xa0
[ 1500.064035] [<ffffffff8100c300>] ? child_rip+0x0/0x20
[ 1500.064035] ---[ end trace a2fe1cd64beca73d ]---
[ 1500.064035] ------------[ cut here ]------------
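The warning above is raised from a thread whose comm is hsm_cdtr, after "server umount lustre-MDT0000 complete" has been logged. The 6b6b6b6b6b6b6b6b value is the kernel's slab poison pattern for freed memory, which suggests the thread touched a wait queue that had already been freed. A minimal sketch for checking whether that thread outlived the unmount might look like the following. The helper name coordinator_running is hypothetical and not part of the Lustre test framework. Only the thread name hsm_cdtr is taken from the trace.

```shell
#!/bin/sh
# Sketch only: detect a leftover HSM coordinator kernel thread.
# "hsm_cdtr" is the thread name shown in the "comm:" field of the trace.
# coordinator_running is a hypothetical helper, not a Lustre utility.

# Reads one process name per line on stdin.
# Returns 0 if a line is exactly "hsm_cdtr", non-zero otherwise.
coordinator_running() {
    grep -qx 'hsm_cdtr'
}

# Possible use right after "umount /mnt/mds1":
#   ps -e -o comm= | coordinator_running && echo "hsm_cdtr still running"
```

If the check succeeds after the unmount has returned, the coordinator thread is still alive at that point, which is consistent with the console output above.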