Details
-
Bug
-
Resolution: Unresolved
-
Minor
-
None
-
Lustre 2.10.0, Lustre 2.10.1, Lustre 2.11.0
-
None
-
trevis, failover
servers: EL7, zfs, master branch, v2.9.59_15_g107b2cb, b3603
clients: EL7, master branch, v2.9.59_15_g107b2cb, b3603
-
3
Description
https://testing.hpdd.intel.com/test_sessions/07818c64-6912-4446-814a-c3cdec28854c
I could not find an existing ticket that combines a replay-single timeout with a hung umount on a client. This configuration also shows hung kworker processes on several VMs, but the client umount hang may be the more likely root cause.
From Client 3 dmesg:
if [ $running -ne 0 ] ; then
echo Stopping client $(hostname) /mnt/lustre2 opts:;
lsof /mnt/lustre2 || need_kill=no;
if [ x != x -a x$need_kill != xno ]; then
pids=$(lsof -t /mnt/lustre2 | sort -u);
[11520.078055] INFO: task umount:1234 blocked for more than 120 seconds.
[11520.079506] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2.225724] intel_powerclamp: No package C-state available
[11520.081059] umount D
[11520.082569] ffff880069527dc8 0 1234 1227 0x00000080
[11520.083815] ffff8800654fbae0 0000000000000086 ffff88007a732f10 ffff8800654fbfd8
[11520.085531] ffff8800654fbfd8 ffff8800654fbfd8 ffff88007a732f10 ffff880069527dc0
[11520.087219] ffff880069527dc4 ffff88007a732f10 00000000ffffffff ffff880069527dc8
[11520.088795] Call Trace:
[11520.090178] [<ffffffff8168d6c9>] schedule_preempt_disabled+0x29/0x70
[11520.091653] [<ffffffff8168b315>] __mutex_lock_slowpath+0xc5/0x1d0
[11520.093349] [<ffffffff8168a76f>] mutex_lock+0x1f/0x2f
[11520.094874] [<ffffffffa06bb101>] mgc_process_config+0x201/0x13e0 [mgc]
[11520.096613] [<ffffffffa07a1615>] obd_process_config.constprop.13+0x85/0x2d0 [obdclass]
[11520.098405] [<ffffffffa0658b37>] ? libcfs_debug_msg+0x57/0x80 [libcfs]
[11520.100064] [<ffffffffa078e319>] ? lprocfs_counter_add+0xf9/0x160 [obdclass]
[11520.101783] [<ffffffffa07a293f>] lustre_end_log+0x1ff/0x550 [obdclass]
[11520.103515] [<ffffffffa0b9968d>] ll_put_super+0x8d/0xaa0 [lustre]
[11520.105178] [<ffffffff81243207>] ? fsnotify_clear_marks_by_inode+0xa7/0x140
[11520.106902] [<ffffffff81138fbd>] ? call_rcu_sched+0x1d/0x20
[11520.108563] [<ffffffffa0bc40ec>] ? ll_destroy_inode+0x1c/0x20 [lustre]
[11520.110314] [<ffffffff8121a718>] ? destroy_inode+0x38/0x60
[11520.111942] [<ffffffff8121a846>] ? evict+0x106/0x170
[11520.113553] [<ffffffff8121a8ee>] ? dispose_list+0x3e/0x50
[11520.115235] [<ffffffff8121b544>] ? evict_inodes+0x114/0x140
[11520.116820] [<ffffffff81200da2>] generic_shutdown_super+0x72/0xf0
[11520.118407] [<ffffffff81201172>] kill_anon_super+0x12/0x20
[11520.119958] [<ffffffffa07a0cb5>] lustre_kill_super+0x45/0x50 [obdclass]
[11520.121606] [<ffffffff81201529>] deactivate_locked_super+0x49/0x60
[11520.123217] [<ffffffff81201b26>] deactivate_super+0x46/0x60
[11520.124774] [<ffffffff8121ef65>] mntput_no_expire+0xc5/0x120
[11520.126317] [<ffffffff812200a0>] SyS_umount+0xa0/0x3b0
[11520.127822] [<ffffffff816975c9>] system_call_fastpath+0x16/0x1b
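For triaging similar sessions, a minimal sketch of pulling hung-task reports out of a saved dmesg log is given here. It assumes the EL7-style line format seen in the trace above; the function names `parse_hung_tasks` and `first_module_frame` and both regular expressions are illustrative, not part of any Lustre tooling. Frames marked with `?` (unreliable stack entries) are skipped.

```python
import re

# Assumed format: "INFO: task <comm>:<pid> blocked for more than <N> seconds."
HUNG_RE = re.compile(
    r"INFO: task (?P<comm>\S+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)
# Assumed format: "[<addr>] [? ]symbol+0xoff/0xsize [module]"
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(?P<maybe>\? )?(?P<sym>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?: \[(?P<mod>\w+)\])?"
)


def parse_hung_tasks(dmesg_text):
    """Return one dict per hung-task report, with its reliable stack frames."""
    reports = []
    current = None
    for line in dmesg_text.splitlines():
        hung = HUNG_RE.search(line)
        if hung:
            current = {
                "comm": hung.group("comm"),
                "pid": int(hung.group("pid")),
                "secs": int(hung.group("secs")),
                "frames": [],
            }
            reports.append(current)
            continue
        if current is None:
            continue
        frame = FRAME_RE.search(line)
        # Skip "?" frames: the kernel marks them as possibly stale.
        if frame and not frame.group("maybe"):
            current["frames"].append((frame.group("sym"), frame.group("mod")))
    return reports


def first_module_frame(report):
    """Return the innermost (symbol, module) frame that belongs to a module."""
    for sym, mod in report["frames"]:
        if mod is not None:
            return sym, mod
    return None


# A few lines copied from the trace above, used as sample input.
SAMPLE = """\
[11520.078055] INFO: task umount:1234 blocked for more than 120 seconds.
[11520.090178] [<ffffffff8168d6c9>] schedule_preempt_disabled+0x29/0x70
[11520.091653] [<ffffffff8168b315>] __mutex_lock_slowpath+0xc5/0x1d0
[11520.093349] [<ffffffff8168a76f>] mutex_lock+0x1f/0x2f
[11520.094874] [<ffffffffa06bb101>] mgc_process_config+0x201/0x13e0 [mgc]
[11520.098405] [<ffffffffa0658b37>] ? libcfs_debug_msg+0x57/0x80 [libcfs]
[11520.101783] [<ffffffffa07a293f>] lustre_end_log+0x1ff/0x550 [obdclass]
"""

if __name__ == "__main__":
    for report in parse_hung_tasks(SAMPLE):
        print(report["comm"], report["pid"], first_module_frame(report))
```

Run against the trace above, the innermost module frame is `mgc_process_config` in `mgc`, which matches the reading that umount is waiting on a mutex taken inside the MGC config path.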