[LU-3665] obdfilter-survey test_3a: unmount stuck in obd_exports_barrier() Created: 30/Jul/13  Updated: 06/May/18  Resolved: 06/May/18

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.5.0, Lustre 2.6.0
Fix Version/s: Lustre 2.12.0

Type: Bug Priority: Major
Reporter: Maloo Assignee: Nathaniel Clark
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Blocker
is blocked by LU-3319 Adapt to 3.10 upstream kernel proc_di... Resolved
Duplicate
duplicates LU-3230 conf-sanity fails to start run: umoun... Resolved
duplicates LU-4062 sanity test_132: MGS is waiting for o... Closed
duplicates LU-4695 Timeout at end of recovery-small Closed
is duplicated by LU-5166 Test failure conf-sanity: hung on umo... Resolved
is duplicated by LU-10631 obdfilter-survey.sh:obdflter_survey_r... Resolved
Related
is related to LU-5242 Test hang sanity test_132, test_133: ... Resolved
Severity: 3
Rank (Obsolete): 9451

 Description   

This issue was created by maloo for sarah <sarah@whamcloud.com>

This issue relates to the following test suite run: http://maloo.whamcloud.com/test_sets/14e07cdc-ed44-11e2-99b4-52540035b04c.

The sub-test test_3a failed with the following error:

test failed to respond and timed out

OST console:

00:47:41:LustreError: 19583:0:(qsd_reint.c:54:qsd_reint_completion()) Skipped 1 previous similar message
00:48:42:INFO: task umount:19559 blocked for more than 120 seconds.
00:48:42:"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
00:48:42:umount        D 0000000000000001     0 19559  19558 0x00000080
00:48:42: ffff8802d22c7aa8 0000000000000082 ffffffff00000010 ffff8802d22c7a58
00:48:42: ffff8802d22c7a18 0000000000000286 ffffffffa07db450 ffff8802fcc06f2a
00:48:42: ffff8803003725f8 ffff8802d22c7fd8 000000000000fb88 ffff8803003725f8
00:48:42:Call Trace:
00:48:42: [<ffffffff8150ee42>] schedule_timeout+0x192/0x2e0
00:48:42: [<ffffffff810810e0>] ? process_timeout+0x0/0x10
00:48:42: [<ffffffffa05f462d>] cfs_schedule_timeout_and_set_state+0x1d/0x20 [libcfs]
00:48:42: [<ffffffffa07175f8>] obd_exports_barrier+0x98/0x170 [obdclass]
00:48:42: [<ffffffffa0e5a962>] ofd_device_fini+0x42/0x230 [ofd]
00:48:42: [<ffffffffa0742f17>] class_cleanup+0x577/0xda0 [obdclass]
00:48:42: [<ffffffffa07197a6>] ? class_name2dev+0x56/0xe0 [obdclass]
00:48:42: [<ffffffffa07447fc>] class_process_config+0x10bc/0x1c80 [obdclass]
00:48:42: [<ffffffffa073e1e3>] ? lustre_cfg_new+0x2d3/0x6e0 [obdclass]
00:48:42: [<ffffffffa0745539>] class_manual_cleanup+0x179/0x6f0 [obdclass]
00:48:42: [<ffffffffa07197a6>] ? class_name2dev+0x56/0xe0 [obdclass]
00:48:42: [<ffffffffa07809ec>] server_put_super+0x5ec/0xf60 [obdclass]
00:48:42: [<ffffffff811833ab>] generic_shutdown_super+0x5b/0xe0
00:48:42: [<ffffffff81183496>] kill_anon_super+0x16/0x60
00:48:42: [<ffffffffa07473e6>] lustre_kill_super+0x36/0x60 [obdclass]
00:48:42: [<ffffffff81183c37>] deactivate_super+0x57/0x80
00:48:42: [<ffffffff811a1c8f>] mntput_no_expire+0xbf/0x110
00:48:42: [<ffffffff811a26fb>] sys_umount+0x7b/0x3a0
00:48:42: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b


 Comments   
Comment by Nathaniel Clark [ 05/Aug/13 ]

This is possibly a duplicate of LU-3230

Comment by Doug Oucharek (Inactive) [ 08/Aug/13 ]

Closing as duplicate.

Comment by Nathaniel Clark [ 21/Feb/14 ]

Comments from LU-3230:

Jian Yu added a comment - 26/Jan/14 4:09 AM - edited

More instances on Lustre b2_5 branch:
https://maloo.whamcloud.com/test_sets/91c9c6da-861a-11e3-a2cb-52540035b04c
https://maloo.whamcloud.com/test_sets/09ebb164-8477-11e3-bab5-52540035b04c
https://maloo.whamcloud.com/test_sets/2f51a8fa-8477-11e3-bab5-52540035b04c
https://maloo.whamcloud.com/test_sets/2cbbedf4-8ecb-11e3-b036-52540035b04c

Nathaniel Clark added a comment - 21/Feb/14 12:06 PM

All the b2_5 TIMEOUTs happened in obdfilter-survey/3a. In each case, however, test 1c or 2a had failed earlier, and I believe those failures left an echo client attached to the OST, which then caused the umount to TIMEOUT.

Comment by Nathaniel Clark [ 21/Feb/14 ]

Cleanup after obdfilter-survey
http://review.whamcloud.com/9350 - master
http://review.whamcloud.com/9351 - b2_5

Comment by Nathaniel Clark [ 03/Mar/14 ]

Waiting on http://review.whamcloud.com/9038 (LU-3319)

Comment by James A Simmons [ 03/Mar/14 ]

So the patch fixed the problem?

Comment by James A Simmons [ 20/May/14 ]

Patch http://review.whamcloud.com/9038 has been landed. LU-3319 shouldn't be blocking you anymore.

Comment by Nathaniel Clark [ 10/Oct/17 ]

This patch is now stuck on
ldiskfs: LU-7420
zfs: LU-6649

Comment by Peter Jones [ 26/Feb/18 ]

Reopening as this was not a duplicate and there is a patch to track - https://review.whamcloud.com/#/c/9350/ 

Comment by Cory Spitz [ 06/Mar/18 ]

Can we set the Fix Version to 2.11.0?

Comment by Peter Jones [ 06/Mar/18 ]

@Cory let's discuss in the upcoming LWG call.

Comment by Gerrit Updater [ 06/May/18 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/9350/
Subject: LU-3665 tests: Cleanup echo client after obdfilter-survey
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 471c7966eb03e6283561ba5690a6f9adab68bb9e

Comment by Peter Jones [ 06/May/18 ]

Landed for 2.12

Generated at Sat Feb 10 01:35:52 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.