[LU-10281] conf-sanity: test_54a hung at lnet_discover_peer_locked() Created: 27/Nov/17  Updated: 27/Nov/17

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Maloo Assignee: WC Triage
Resolution: Unresolved Votes: 0
Labels: None

Issue Links:
Related
is related to LU-9917 lnet_discover_peer_locked() must refr... Resolved
is related to LU-9971 MR: ABA problem in lnet_discover_peer... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This issue was created by maloo for Jinshan Xiong <jinshan.xiong@intel.com>


This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/60aa1144-d2a7-11e7-9840-52540065bddc.

Console messages on the OSS:

[24240.470189] Lustre: srv-lustre-OST0000: No data found on store. Initialize space
[24284.799899] LNet: Service thread pid 2277 was inactive for 40.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
[24284.801683] Pid: 2277, comm: ll_ost00_002
[24284.802095] 
Call Trace:
[24284.802511]  [<ffffffff816a9569>] schedule+0x29/0x70
[24284.803038]  [<ffffffffc077a9bb>] lnet_discover_peer_locked+0x10b/0x380 [lnet]
[24284.803764]  [<ffffffff810b1920>] ? autoremove_wake_function+0x0/0x40
[24284.804550]  [<ffffffffc077aca0>] LNetPrimaryNID+0x70/0x1a0 [lnet]
[24284.805230]  [<ffffffffc0a3e35e>] ptlrpc_connection_get+0x3e/0x450 [ptlrpc]
[24284.806007]  [<ffffffffc0a422a4>] ptlrpc_send_reply+0x394/0x840 [ptlrpc]
[24284.806762]  [<ffffffffc0a482af>] ? lustre_pack_reply_flags+0x6f/0x1e0 [ptlrpc]
[24284.807514]  [<ffffffffc0a4281b>] ptlrpc_send_error+0x9b/0x1b0 [ptlrpc]
[24284.808270]  [<ffffffffc0a42940>] ptlrpc_error+0x10/0x20 [ptlrpc]
[24284.808917]  [<ffffffffc0aafb18>] tgt_request_handle+0x7d8/0x13b0 [ptlrpc]
[24284.809653]  [<ffffffffc0a53eee>] ptlrpc_server_handle_request+0x24e/0xab0 [ptlrpc]
[24284.810445]  [<ffffffffc0a50db8>] ? ptlrpc_wait_event+0x98/0x340 [ptlrpc]
[24284.811199]  [<ffffffff810c4832>] ? default_wake_function+0x12/0x20
[24284.811824]  [<ffffffff810ba598>] ? __wake_up_common+0x58/0x90
[24284.812424]  [<ffffffffc0a57692>] ptlrpc_main+0xa92/0x1e40 [ptlrpc]
[24284.813143]  [<ffffffff81029557>] ? __switch_to+0xd7/0x510
[24284.813685]  [<ffffffff816a9000>] ? __schedule+0x370/0x8b0
[24284.814259]  [<ffffffffc0a56c00>] ? ptlrpc_main+0x0/0x1e40 [ptlrpc]
[24284.814975]  [<ffffffff810b099f>] kthread+0xcf/0xe0
[24284.815450]  [<ffffffff810b08d0>] ? kthread+0x0/0xe0
[24284.815961]  [<ffffffff816b4fd8>] ret_from_fork+0x58/0x90
[24284.816565]  [<ffffffff810b08d0>] ? kthread+0x0/0xe0

My patch doesn't change this area of the code.
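In the trace, the ll_ost00_002 service thread is building an error reply (ptlrpc_error → ptlrpc_send_error → ptlrpc_send_reply). It then looks up the connection through LNetPrimaryNID(), which enters lnet_discover_peer_locked() and sleeps in schedule(). The `? autoremove_wake_function` frame is consistent with a wait-queue sleep that ends only when peer discovery completes and someone wakes the waiter. If that wakeup never arrives, the thread stays blocked and the 40 s inactivity watchdog fires. The C sketch below illustrates this wait/complete pattern in user space with a POSIX condition variable. It is an analogy under stated assumptions, not the LNet implementation. `struct discovery_state`, `wait_for_discovery()`, `complete_discovery()` and `DISCOVERY_STATE_INIT` are hypothetical names. Unlike the hung thread, the sketch uses a bounded `pthread_cond_timedwait()` so a stall is reported instead of blocking forever.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for per-peer discovery state; not an LNet type. */
struct discovery_state {
    pthread_mutex_t lock;
    pthread_cond_t  done_cv;
    bool            done;     /* set once discovery has finished */
};

#define DISCOVERY_STATE_INIT \
    { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, false }

/*
 * Wait until discovery is marked done. Returns 0 on completion, or
 * ETIMEDOUT if nothing signalled completion within timeout_sec seconds.
 * (A hypothetical bounded wait; the hung thread in the trace has no bound.)
 */
static int wait_for_discovery(struct discovery_state *ds, int timeout_sec)
{
    struct timespec deadline;
    int rc = 0;

    assert(ds != NULL && timeout_sec >= 0);

    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_sec;

    pthread_mutex_lock(&ds->lock);
    /* Re-check the predicate after every wakeup: wakeups may be spurious. */
    while (!ds->done && rc == 0)
        rc = pthread_cond_timedwait(&ds->done_cv, &ds->lock, &deadline);
    /* Completion that raced with the timeout still counts as success. */
    if (ds->done)
        rc = 0;
    pthread_mutex_unlock(&ds->lock);
    return rc;
}

/* Discovery side: record completion under the lock, then wake all waiters. */
static void complete_discovery(struct discovery_state *ds)
{
    pthread_mutex_lock(&ds->lock);
    ds->done = true;
    pthread_cond_broadcast(&ds->done_cv);
    pthread_mutex_unlock(&ds->lock);
}
```

For example, waiting on a `struct discovery_state` that no thread ever completes returns `ETIMEDOUT` after the timeout. Once `complete_discovery()` has run, the same call returns 0 immediately. In the hang above, the equivalent of `complete_discovery()` evidently never ran for the peer the OST thread was waiting on.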



 Comments   
Comment by Jinshan Xiong (Inactive) [ 27/Nov/17 ]

Probably related to multi-rail issues.

Generated at Sat Feb 10 02:33:39 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.