[LU-2653] hang in recovery-small test 51 Created: 19/Jan/13  Updated: 26/Apr/17  Resolved: 26/Apr/17

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.0
Fix Version/s: None

Type: Bug Priority: Critical
Reporter: Oleg Drokin Assignee: WC Triage
Resolution: Cannot Reproduce Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 6195

 Description   

I am seeing a fairly frequent hang during Lustre cleanup (umount of the MDT). It looks like this:

[46801.719874] INFO: task umount:16892 blocked for more than 120 seconds.
[46801.720068] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[46801.720819] umount        D 0000000000000000  2608 16892  16891 0x00000000
[46801.721019]  ffff880041defa18 0000000000000086 ffff880041def9e0 ffff880041def9dc
[46801.721325]  ffff880041dee000 ffff8800bcc24100 ffff8800062d67c0 0000000000000001
[46801.721650]  ffff88002bffa6f8 ffff880041deffd8 000000000000fba8 ffff88002bffa6f8
[46801.721976] Call Trace:
[46801.722121]  [<ffffffffa0db2f7d>] osp_sync_fini+0x8d/0x170 [osp]
[46801.722304]  [<ffffffff8108fd60>] ? autoremove_wake_function+0x0/0x40
[46801.722492]  [<ffffffffa0da90fe>] ? osp_disconnect+0x11e/0x170 [osp]
[46801.722686]  [<ffffffffa0dad05e>] osp_process_config+0x4ae/0x6f0 [osp]
[46801.722882]  [<ffffffffa0d5e717>] lod_process_config+0x2f7/0xa40 [lod]
[46801.723079]  [<ffffffffa0a5cc1b>] mdd_process_config+0x20b/0x7f0 [mdd]
[46801.723282]  [<ffffffffa0c9d5b1>] ? lustre_cfg_new+0x391/0x7e0 [mdt]
[46801.723478]  [<ffffffffa0c9db71>] mdt_stack_fini+0x171/0xbc0 [mdt]
[46801.723665]  [<ffffffffa0a59e90>] ? mdd_init_capa_ctxt+0x120/0x130 [mdd]
[46801.723865]  [<ffffffffa0c9e9ea>] mdt_device_fini+0x42a/0x8e0 [mdt]
[46801.724082]  [<ffffffffa0546107>] class_cleanup+0x577/0xda0 [obdclass]
[46801.724283]  [<ffffffffa051c59c>] ? class_name2dev+0x7c/0xe0 [obdclass]
[46801.724493]  [<ffffffffa05479d5>] class_process_config+0x10a5/0x1c60 [obdclass]
[46801.724821]  [<ffffffffa0aacec8>] ? libcfs_log_return+0x28/0x40 [libcfs]
[46801.725024]  [<ffffffffa0541421>] ? lustre_cfg_new+0x391/0x7e0 [obdclass]
[46801.725213]  [<ffffffffa0548709>] class_manual_cleanup+0x179/0x6e0 [obdclass]
[46801.725420]  [<ffffffffa051c59c>] ? class_name2dev+0x7c/0xe0 [obdclass]
[46801.725626]  [<ffffffffa055515c>] server_put_super+0x58c/0x10a0 [obdclass]
[46801.725823]  [<ffffffff8117d6ab>] generic_shutdown_super+0x5b/0xe0
[46801.726014]  [<ffffffff8117d796>] kill_anon_super+0x16/0x60
[46801.726205]  [<ffffffffa054a506>] lustre_kill_super+0x36/0x60 [obdclass]
[46801.726410]  [<ffffffff8117e825>] deactivate_super+0x85/0xa0
[46801.726597]  [<ffffffff8119a89f>] mntput_no_expire+0xbf/0x110
[46801.726779]  [<ffffffff8119b34b>] sys_umount+0x7b/0x3a0
[46801.726959]  [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b

The hanging line, in osp_sync_fini(), is:

cfs_wait_event(thread->t_ctl_waitq, thread->t_flags & SVC_STOPPED);

The crash dump is in /exports/crashdumps/t/ospsyn.dmp, and the matching modules are in /exports/crashdumps/192.168.10.210-2013-01-18-21:37:33/modules
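The cfs_wait_event() line above is one half of a stop handshake. The following is a minimal user-space sketch of that handshake. It assumes the usual Lustre service-thread pattern: the finalizer sets a "stopping" flag, wakes the thread, then sleeps until the thread sets SVC_STOPPED. The sketch uses pthreads in place of cfs_wait_event() and t_ctl_waitq. All sketch_* functions and SKETCH_* flags are hypothetical and are not Lustre APIs. If the worker never reaches the point where it sets the stopped flag, sketch_fini() blocks forever, which is the umount symptom in the trace.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical flag bits standing in for SVC_STOPPING / SVC_STOPPED. */
enum {
	SKETCH_STOPPING = 0x1,
	SKETCH_STOPPED  = 0x2,
};

/* Hypothetical stand-in for a Lustre service thread descriptor. */
struct sketch_thread {
	pthread_mutex_t lock;
	pthread_cond_t  waitq;  /* plays the role of t_ctl_waitq */
	unsigned int    flags;  /* plays the role of t_flags */
	pthread_t       tid;
};

/* Worker: idles until asked to stop, then reports STOPPED and wakes waiters. */
static void *sketch_worker(void *arg)
{
	struct sketch_thread *t = arg;

	pthread_mutex_lock(&t->lock);
	while (!(t->flags & SKETCH_STOPPING))
		pthread_cond_wait(&t->waitq, &t->lock);
	t->flags |= SKETCH_STOPPED;
	pthread_cond_broadcast(&t->waitq);
	pthread_mutex_unlock(&t->lock);
	return NULL;
}

static int sketch_start(struct sketch_thread *t)
{
	int rc;

	pthread_mutex_init(&t->lock, NULL);
	pthread_cond_init(&t->waitq, NULL);
	t->flags = 0;
	rc = pthread_create(&t->tid, NULL, sketch_worker, t);
	if (rc != 0) {
		pthread_cond_destroy(&t->waitq);
		pthread_mutex_destroy(&t->lock);
	}
	return rc;
}

/*
 * Analogue of the wait at the top of the trace: request a stop, then sleep
 * until the worker sets STOPPED. If the worker never gets there, this loop
 * never exits, which is the hang.
 */
static unsigned int sketch_fini(struct sketch_thread *t)
{
	unsigned int flags;

	pthread_mutex_lock(&t->lock);
	t->flags |= SKETCH_STOPPING;
	pthread_cond_broadcast(&t->waitq);
	while (!(t->flags & SKETCH_STOPPED))
		pthread_cond_wait(&t->waitq, &t->lock);
	flags = t->flags;
	pthread_mutex_unlock(&t->lock);

	/* The worker only sets STOPPED after seeing a stop request. */
	assert(flags & SKETCH_STOPPING);

	pthread_join(t->tid, NULL);
	pthread_cond_destroy(&t->waitq);
	pthread_mutex_destroy(&t->lock);
	return flags;
}

/* Start a worker and tear it down; returns 1 on a clean stop. */
static int sketch_demo(void)
{
	struct sketch_thread t;

	if (sketch_start(&t) != 0)
		return 0;
	return (sketch_fini(&t) & SKETCH_STOPPED) != 0;
}
```

The while loops re-check the flags after every wakeup, so a spurious or early wakeup cannot end either wait prematurely. In the kernel case, the "D" state of umount in the trace is consistent with an uninterruptible sleep in that wait. That points the investigation at why the OSP sync thread never set SVC_STOPPED, rather than at the waiter itself.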


Generated at Sat Feb 10 01:27:03 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.