[LU-4449] Test failure timeout on sanity-scrub test_3: MGS stuck on umount with obd_unlinked_exports Created: 07/Jan/14  Updated: 13/Jan/14  Resolved: 09/Jan/14

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Maloo Assignee: WC Triage
Resolution: Duplicate Votes: 0
Labels: None

Issue Links:
Duplicate
duplicates LU-3230 conf-sanity fails to start run: umoun... Resolved
duplicates LU-4062 sanity test_132: MGS is waiting for o... Closed
Severity: 3
Rank (Obsolete): 12199

 Description   

This issue was created by maloo for Nathaniel Clark <nathaniel.l.clark@intel.com>

This issue relates to the following test suite run:
http://maloo.whamcloud.com/test_sets/5e3fd144-766e-11e3-b3c0-52540035b04c.

The sub-test test_3 failed with the following error:

test failed to respond and timed out

Info required for matching: sanity-scrub 3

MDS/MGS console log

09:39:21:LustreError: 166-1: MGC10.10.17.217@tcp: Connection to MGS (at 0@lo) was lost; in progress operations using this service will fail
09:39:21:Lustre: MGS is waiting for obd_unlinked_exports more than 8 seconds. The obd refcount = 5. Is it stuck?
09:39:21:Lustre: MGS is waiting for obd_unlinked_exports more than 16 seconds. The obd refcount = 5. Is it stuck?
09:39:21:Lustre: MGS is waiting for obd_unlinked_exports more than 32 seconds. The obd refcount = 5. Is it stuck?
09:39:23:Lustre: MGS is waiting for obd_unlinked_exports more than 64 seconds. The obd refcount = 5. Is it stuck?
09:39:23:Lustre: MGS is waiting for obd_unlinked_exports more than 128 seconds. The obd refcount = 5. Is it stuck?
09:39:23:INFO: task umount:19279 blocked for more than 120 seconds.
09:39:23:"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
09:39:23:umount        D 0000000000000000     0 19279  19278 0x00000080
09:39:23: ffff880067477aa8 0000000000000086 0000000000000000 ffff88007b83a000
09:39:23: ffffffffa0931ce3 0000000000000000 ffff88005b046084 ffffffffa0931ce3
09:39:24: ffff880060265af8 ffff880067477fd8 000000000000fb88 ffff880060265af8
09:39:24:Call Trace:
09:39:24: [<ffffffff8150f3f2>] schedule_timeout+0x192/0x2e0
09:39:24: [<ffffffff810811e0>] ? process_timeout+0x0/0x10
09:39:24: [<ffffffffa08b767b>] obd_exports_barrier+0xab/0x180 [obdclass]
09:39:24: [<ffffffffa12c252e>] mgs_device_fini+0xfe/0x580 [mgs]
09:39:24: [<ffffffffa08e0013>] class_cleanup+0x573/0xd30 [obdclass]
09:39:24: [<ffffffffa08b9816>] ? class_name2dev+0x56/0xe0 [obdclass]
09:39:24: [<ffffffffa08e1d3a>] class_process_config+0x156a/0x1ad0 [obdclass]
09:39:24: [<ffffffffa08da013>] ? lustre_cfg_new+0x2d3/0x6e0 [obdclass]
09:39:24: [<ffffffffa08e2419>] class_manual_cleanup+0x179/0x6f0 [obdclass]
09:45:18: [<ffffffffa08b9816>] ? class_name2dev+0x56/0xe0 [obdclass]
09:45:18: [<ffffffffa091b51b>] server_put_super+0x94b/0xe30 [obdclass]
09:45:18: [<ffffffff8118366b>] generic_shutdown_super+0x5b/0xe0
09:45:18: [<ffffffff81183756>] kill_anon_super+0x16/0x60
09:45:18: [<ffffffffa08e42d6>] lustre_kill_super+0x36/0x60 [obdclass]
09:45:18: [<ffffffff81183ef7>] deactivate_super+0x57/0x80
09:45:18: [<ffffffff811a21ef>] mntput_no_expire+0xbf/0x110
09:45:18: [<ffffffff811a2c5b>] sys_umount+0x7b/0x3a0
09:45:18: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
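
When triaging similar timeouts, it can help to pull the obd_unlinked_exports warnings out of a console log and check whether the refcount ever drops. The following is only a minimal sketch, not part of Maloo or the Lustre test framework. The regular expression is modelled on the console lines quoted above, and the function names and the 64-second threshold are hypothetical choices made for illustration.

```python
import re

# Pattern modelled on the console lines quoted in this ticket. The leading
# "HH:MM:SS:" stamp is assumed to be the prefix shown in the log above.
BARRIER_RE = re.compile(
    r"^(?P<stamp>\d{2}:\d{2}:\d{2}):Lustre: (?P<obd>\S+) is waiting for "
    r"obd_unlinked_exports more than (?P<waited>\d+) seconds\. "
    r"The obd refcount = (?P<refcount>\d+)\."
)


def find_export_barrier_waits(lines):
    """Return (obd name, seconds waited, refcount) for each barrier warning."""
    hits = []
    for line in lines:
        m = BARRIER_RE.match(line.strip())
        if m:
            hits.append(
                (m.group("obd"), int(m.group("waited")), int(m.group("refcount")))
            )
    return hits


def looks_stuck(hits, threshold=64):
    """Hypothetical heuristic: the refcount never changes between warnings
    and the reported wait reaches `threshold` seconds."""
    if not hits:
        return False
    refcounts = {refcount for _, _, refcount in hits}
    longest = max(waited for _, waited, _ in hits)
    return len(refcounts) == 1 and longest >= threshold
```

Feeding the MDS/MGS console log from this ticket to find_export_barrier_waits() would yield one entry per warning line, all for MGS with refcount 5, which looks_stuck() would flag under the assumed threshold.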

This failure may indicate that the fix for LU-4365 is incomplete.



 Comments   
Comment by Oleg Drokin [ 09/Jan/14 ]

This is a duplicate of LU-3230.
