[LU-13213] recovery-small test_10d: Timeout occurred after 343 mins, last suite running was recovery-small
Created: 07/Feb/20
Updated: 30/Mar/20

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug
Priority: Minor
Reporter: Maloo
Assignee: WC Triage
Resolution: Unresolved
Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This issue was created by maloo for wangshilong <wshilong@ddn.com>

This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/17db9b74-48b4-11ea-aeb7-52540065bddc

test_10d failed with the following error:

Timeout occurred after 343 mins, last suite running was recovery-small


VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
recovery-small test_10d - Timeout occurred after 343 mins, last suite running was recovery-small



 Comments   
Comment by Bruno Faccini (Inactive) [ 25/Feb/20 ]

+1 on recent master at https://testing.whamcloud.com/test_sets/c006f95e-c218-4f9b-aa8a-cd00f9a30e58 .
In case it helps, the only interesting stack/thread found across all client and server console logs is on Client 1:

[21519.072162] ll_sa_18549     S ffff9846a408d140     0 18588      2 0x00000080
[21519.073467] Call Trace:
[21519.073942]  [<ffffffffa8980a09>] schedule+0x29/0x70
[21519.074778]  [<ffffffffc0b1422e>] ldlm_lock_decref_internal+0x95e/0xa30 [ptlrpc]
[21519.076037]  [<ffffffffa82c72e0>] ? wake_up_atomic_t+0x30/0x30
[21519.077037]  [<ffffffffc0b220f5>] failed_lock_cleanup.isra.19+0x95/0x230 [ptlrpc]
[21519.078297]  [<ffffffffc0b245ac>] ldlm_cli_enqueue_fini+0x16c/0xe40 [ptlrpc]
[21519.079516]  [<ffffffffc08a4371>] ? lprocfs_counter_sub+0xc1/0x130 [obdclass]
[21519.080718]  [<ffffffffc0b28271>] ldlm_cli_enqueue+0x441/0xa20 [ptlrpc]
[21519.081835]  [<ffffffffc0b25550>] ? ldlm_expired_completion_wait+0x2a0/0x2a0 [ptlrpc]
[21519.083313]  [<ffffffffc0f29490>] ? ll_md_need_convert+0x1b0/0x1b0 [lustre]
[21519.084525]  [<ffffffffc0c76460>] ? mdc_changelog_cdev_finish+0x1f0/0x1f0 [mdc]
[21519.085743]  [<ffffffffc0c70d30>] mdc_enqueue_base+0x330/0x1d30 [mdc]
[21519.086822]  [<ffffffffc0c72e85>] mdc_intent_lock+0x135/0x560 [mdc]
[21519.087874]  [<ffffffffc0f29490>] ? ll_md_need_convert+0x1b0/0x1b0 [lustre]
[21519.089053]  [<ffffffffc0b25550>] ? ldlm_expired_completion_wait+0x2a0/0x2a0 [ptlrpc]
[21519.090361]  [<ffffffffc0c76460>] ? mdc_changelog_cdev_finish+0x1f0/0x1f0 [mdc]
[21519.091590]  [<ffffffffc0c60373>] mdc_read_page+0xb3/0x970 [mdc]
[21519.092642]  [<ffffffffc0ae2477>] lmv_striped_read_page.isra.33+0x4f0/0xa7b [lmv]
[21519.093886]  [<ffffffffc0acd7d8>] lmv_read_page+0x258/0x3e0 [lmv]
[21519.094979]  [<ffffffffc0ee9e1c>] ll_get_dir_page+0xac/0x1b0 [lustre]
[21519.096064]  [<ffffffffc0f29490>] ? ll_md_need_convert+0x1b0/0x1b0 [lustre]
[21519.097238]  [<ffffffffc0f3d460>] ? ll_agl_trigger+0x520/0x520 [lustre]
[21519.098358]  [<ffffffffc0f3d6f0>] ll_statahead_thread+0x290/0xba0 [lustre]
[21519.099508]  [<ffffffffa82d4ffe>] ? finish_task_switch+0x4e/0x1c0
[21519.100524]  [<ffffffffa89805a2>] ? __schedule+0x402/0x840
[21519.101459]  [<ffffffffc0f3d460>] ? ll_agl_trigger+0x520/0x520 [lustre]
[21519.102564]  [<ffffffffa82c61f1>] kthread+0xd1/0xe0
[21519.103400]  [<ffffffffa82c6120>] ? insert_kthread_work+0x40/0x40
[21519.104419]  [<ffffffffa898dd37>] ret_from_fork_nospec_begin+0x21/0x21
[21519.105575]  [<ffffffffa82c6120>] ? insert_kthread_work+0x40/0x40
Comment by Emoly Liu [ 26/Feb/20 ]

+1 on master: https://testing.whamcloud.com/test_sets/73d46ce4-0a1a-4731-be42-3d9c7c213762

Comment by Chris Horn [ 26/Feb/20 ]

+1 on master
https://testing.whamcloud.com/test_sessions/a5039531-f929-4bae-95b4-c9ea63ac6270

Comment by Bruno Faccini (Inactive) [ 04/Mar/20 ]

+1 on recent master at https://testing.whamcloud.com/test_sets/d91c26a6-a83a-40af-82dc-3006fff0471e , and again with the same stack in the statahead (ll_sa) thread.

Comment by Olaf Faaland [ 30/Mar/20 ]

+1 on master: https://testing.whamcloud.com/test_sets/92d08cf9-efaf-4db9-8a6d-b54509319af6 with the same stack Bruno identified.
