[LU-12148] conf-sanity test_64: timed out
Created: 02/Apr/19  Updated: 24/Mar/22  Resolved: 24/Mar/22

Status: Closed
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.12.0
Fix Version/s: None

Type: Bug
Priority: Minor
Reporter: Maloo
Assignee: WC Triage
Resolution: Duplicate
Votes: 0
Labels: None

Issue Links:
Duplicate
duplicates LU-15616 sanity-lnet test_226: Timeout occurre... Resolved
duplicates LU-15618 ksock_conn ref leak on shutdown Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This issue was created by maloo for Minh Diep <mdiep@whamcloud.com>

This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/ecc78788-5523-11e9-92fe-52540065bddc

test_64 failed with the following error:

Timeout occurred after 308 mins, last suite running was conf-sanity, restarting cluster to continue tests

[14994.822655] Lustre: setting import lustre-MDT0000_UUID INACTIVE by administrator request
[15004.893254] Lustre: Unmounted lustre-client
[15042.573217] Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null && lctl dl | grep ' ST ' || true
[15042.714100] Key type lgssc unregistered
[15044.164409] LNet: 7665:0:(socklnd.c:2600:ksocknal_shutdown()) waiting for 1 peers to disconnect
[15048.165408] LNet: 7665:0:(socklnd.c:2600:ksocknal_shutdown()) waiting for 1 peers to disconnect
[15056.166467] LNet: 7665:0:(socklnd.c:2600:ksocknal_shutdown()) waiting for 1 peers to disconnect
[15072.167392] LNet: 7665:0:(socklnd.c:2600:ksocknal_shutdown()) waiting for 1 peers to disconnect
[15104.168388] LNet: 7665:0:(socklnd.c:2600:ksocknal_shutdown()) waiting for 1 peers to disconnect
[15168.169393] LNet: 7665:0:(socklnd.c:2600:ksocknal_shutdown()) waiting for 1 peers to disconnect
[15240.322419] INFO: task socknal_sd00_00:842 blocked for more than 120 seconds.
[15240.323677] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[15240.324983] socknal_sd00_00 D ffff940fb6be8000 0 842 2 0x00000080
[15240.326214] Call Trace:
[15240.326714] [] schedule_preempt_disabled+0x29/0x70
[15240.327920] [] __mutex_lock_slowpath+0xc7/0x1d0
[15240.328985] [] mutex_lock+0x1f/0x2f
[15240.329906] [] lnet_nid2peerni_locked+0x71/0x150 [lnet]
[15240.331085] [] lnet_parse+0x791/0x11e0 [lnet]
[15240.332077] [] ksocknal_process_receive+0x46e/0xda0 [ksocklnd]
[15240.333496] [] ksocknal_scheduler+0xee/0x670 [ksocklnd]
[15240.334632] [] ? wake_up_atomic_t+0x30/0x30
[15240.335646] [] ? ksocknal_recv+0x2a0/0x2a0 [ksocklnd]
[15240.336765] [] kthread+0xd1/0xe0
[15240.337599] [] ? insert_kthread_work+0x40/0x40
[15240.338630] [] ret_from_fork_nospec_begin+0x21/0x21
[15240.339725] [] ? insert_kthread_work+0x40/0x40
[15296.170391] LNet: 7665:0:(socklnd.c:2600:ksocknal_shutdown()) waiting for 1 peers to disconnect
[15360.340409] INFO: task socknal_sd00_00:842 blocked for more than 120 seconds.
[15360.341721] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[15360.342979] socknal_sd00_00 D ffff940fb6be8000 0 842 2 0x00000080
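
The gaps between the ksocknal_shutdown() messages above appear to double on each retry, which would be consistent with the shutdown path polling at a growing interval while the peer never disconnects. A minimal Python sketch for checking this from a console log follows. The helper name shutdown_wait_gaps and the timestamp regex are illustrative assumptions, not part of Lustre or Maloo, and the sample lines are copied from the log above.

```python
import re

# Sample console lines copied from the log in this ticket.
# Timestamps are seconds since boot, as printed by the kernel.
LOG = """\
[15044.164409] LNet: 7665:0:(socklnd.c:2600:ksocknal_shutdown()) waiting for 1 peers to disconnect
[15048.165408] LNet: 7665:0:(socklnd.c:2600:ksocknal_shutdown()) waiting for 1 peers to disconnect
[15056.166467] LNet: 7665:0:(socklnd.c:2600:ksocknal_shutdown()) waiting for 1 peers to disconnect
[15072.167392] LNet: 7665:0:(socklnd.c:2600:ksocknal_shutdown()) waiting for 1 peers to disconnect
[15104.168388] LNet: 7665:0:(socklnd.c:2600:ksocknal_shutdown()) waiting for 1 peers to disconnect
[15168.169393] LNet: 7665:0:(socklnd.c:2600:ksocknal_shutdown()) waiting for 1 peers to disconnect
[15240.322419] INFO: task socknal_sd00_00:842 blocked for more than 120 seconds.
[15296.170391] LNet: 7665:0:(socklnd.c:2600:ksocknal_shutdown()) waiting for 1 peers to disconnect
"""

# Hypothetical pattern for "[seconds.micros] message" console lines.
STAMP = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")


def shutdown_wait_gaps(log_text, needle="ksocknal_shutdown()"):
    """Return the rounded gaps, in seconds, between consecutive lines
    whose message contains `needle`."""
    stamps = []
    for line in log_text.splitlines():
        match = STAMP.match(line)
        if match and needle in match.group(2):
            stamps.append(float(match.group(1)))
    return [round(later - earlier) for earlier, later in zip(stamps, stamps[1:])]


gaps = shutdown_wait_gaps(LOG)
print(gaps)  # [4, 8, 16, 32, 64, 128]
```

Each gap is twice the previous one, so the waiter keeps retrying without the peer count ever dropping, while socknal_sd00_00 stays blocked on the mutex in lnet_nid2peerni_locked.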

VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
conf-sanity test_64 - Timeout occurred after 308 mins, last suite running was conf-sanity, restarting cluster to continue tests



 Comments   
Comment by Chris Horn [ 24/Mar/22 ]

Likely a duplicate of either https://jira.whamcloud.com/browse/LU-15618 or https://jira.whamcloud.com/browse/LU-15616, or possibly both issues were hit.
