[LU-15377] Interop sanityn test_31a: Timeout occurred after 70 mins, last suite running was sanityn Created: 15/Dec/21  Updated: 21/Dec/22

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.12.8
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Maloo Assignee: WC Triage
Resolution: Unresolved Votes: 0
Labels: None

Issue Links:
Related
is related to LU-14548 sanityn test 31a hangs in client lock... Open
Severity: 3
Rank (Obsolete): 9223372036854775807

Description

This issue was created by maloo for sarah <sarah@whamcloud.com>

This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/3045b6e7-f5b7-4b71-b605-b6dc1e4cce73

test_31a failed with the following error:

Timeout occurred after 70 mins, last suite running was sanityn

Hit this failure in interop testing between a 2.10.8 client and a 2.12.8 server.
There is not much log output; only the client 1 console shows the following:

[  713.726260] Lustre: DEBUG MARKER: dmesg
[  714.202657] Lustre: DEBUG MARKER: /usr/sbin/lctl mark == sanityn test 31a: voluntary cancel \/ blocking ast race============================================= 21:26:57 \(1638566817\)
[  714.406656] Lustre: DEBUG MARKER: == sanityn test 31a: voluntary cancel / blocking ast race============================================= 21:26:57 (1638566817)
[  714.461305] Lustre: *** cfs_fail_loc=314, val=0***
[  715.286505] Lustre: *** cfs_fail_loc=314, val=0***
[  715.287508] Lustre: Skipped 2 previous similar messages
[  754.579148] LNet: Service thread pid 15960 was inactive for 40.11s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
[  754.582424] Pid: 15960, comm: ldlm_cb00_000 3.10.0-957.1.3.el7.x86_64 #1 SMP Thu Nov 29 14:49:43 UTC 2018
[  754.584236] Call Trace:
[  754.584765]  [<ffffffffc0bb8faa>] ldlm_handle_cp_callback+0x10a/0xb70 [ptlrpc]
[  754.586528]  [<ffffffffc0bbbf68>] ldlm_callback_handler.part.10+0xbc8/0x2110 [ptlrpc]
[  754.588072]  [<ffffffffc0bbd4e7>] ldlm_callback_handler+0x37/0xd0 [ptlrpc]
[  754.589462]  [<ffffffffc0be917b>] ptlrpc_server_handle_request+0x23b/0xaa0 [ptlrpc]
[  754.591009]  [<ffffffffc0bec8c2>] ptlrpc_main+0xa92/0x1e40 [ptlrpc]
[  754.592287]  [<ffffffffbc0c1c31>] kthread+0xd1/0xe0
[  754.593307]  [<ffffffffbc774c37>] ret_from_fork_nospec_end+0x0/0x39
[  754.594581]  [<ffffffffffffffff>] 0xffffffffffffffff
[  754.595601] LustreError: dumping log to /tmp/lustre-log.1638566857.15960
[ 1309.444983] Lustre: 15962:0:(service.c:1346:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply
[ 1309.444983]   req@ffff8acc9fee4c00 x1718161318897216/t0(0) o105->LOV_OSC_UUID@10.240.29.215@tcp:732/0 lens 360/192 e 24 to 0 dl 1638567417 ref 2 fl Interpret:/0/0 rc 0/0
[ 4361.356047] SysRq : Changing Loglevel
[ 4361.356926] Loglevel set to 8
[ 4361.358368] SysRq : Show State
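
The timestamps in the console output above can be cross-checked with a short script. This is only a minimal sketch for triage, not part of the test framework: the `summarize` helper and its field names are hypothetical, and it assumes that the number in parentheses in the DEBUG MARKER line is the test start time in epoch seconds, that the first number in the `lustre-log.<epoch>.<pid>` file name is the dump time, and that `dl` in the request line is the request deadline.

```python
import re
from datetime import datetime, timezone

# Excerpts copied verbatim from the client 1 console log in this ticket.
CONSOLE = """\
[  714.406656] Lustre: DEBUG MARKER: == sanityn test 31a: voluntary cancel / blocking ast race============================================= 21:26:57 (1638566817)
[  754.579148] LNet: Service thread pid 15960 was inactive for 40.11s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
[  754.595601] LustreError: dumping log to /tmp/lustre-log.1638566857.15960
[ 1309.444983]   req@ffff8acc9fee4c00 x1718161318897216/t0(0) o105->LOV_OSC_UUID@10.240.29.215@tcp:732/0 lens 360/192 e 24 to 0 dl 1638567417 ref 2 fl Interpret:/0/0 rc 0/0
"""


def summarize(console: str) -> dict:
    """Hypothetical helper: relate the epoch values found in the console log."""
    # Test start: the epoch in parentheses at the end of the DEBUG MARKER line (assumption).
    start = int(re.search(r"test 31a:.*\((\d+)\)", console).group(1))
    # Hung-thread report: pid and reported inactivity in seconds.
    pid, inactive = re.search(
        r"Service thread pid (\d+) was inactive for ([\d.]+)s", console
    ).groups()
    # Debug log dump: file name is read as lustre-log.<epoch>.<pid> (assumption).
    dump = int(re.search(r"lustre-log\.(\d+)\.\d+", console).group(1))
    # Request line: "dl" is read as the request deadline in epoch seconds (assumption).
    deadline = int(re.search(r"\bdl (\d+)\b", console).group(1))
    return {
        "start_utc": datetime.fromtimestamp(start, tz=timezone.utc).strftime(
            "%Y-%m-%d %H:%M:%S"
        ),
        "hung_pid": int(pid),
        "reported_inactive_s": float(inactive),
        "dump_after_start_s": dump - start,
        "deadline_after_start_s": deadline - start,
    }


if __name__ == "__main__":
    for key, value in summarize(CONSOLE).items():
        print(f"{key}: {value}")
```

Under those assumptions, the log dump lands 40 seconds after the test start, which lines up with the reported 40.11s of inactivity for pid 15960, and the request deadline falls 600 seconds after the test start.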

VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
sanityn test_31a - Timeout occurred after 70 mins, last suite running was sanityn

