[LU-10121] sanity test_102a: Timeout occurred after 106 mins Created: 14/Oct/17  Updated: 14/Oct/17

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Maloo Assignee: WC Triage
Resolution: Unresolved Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This issue was created by maloo for Bob Glossman <bob.glossman@intel.com>

This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/63f445dc-b09b-11e7-943d-5254006e85c2.

The sub-test test_102a failed with the following error:

Timeout occurred after 106 mins, last suite running was sanity, restarting cluster to continue tests

The OST hung. From the OST console log:

[ 2791.791518] Lustre: DEBUG MARKER: == sanity test 102a: user xattr test ================================================================= 00:29:20 (1507940960)
[ 3387.034404] Lustre: 7453:0:(service.c:1356:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply
[ 3387.034404]   req@ffff880063c94850 x1581187914345184/t0(0) o2->b2fc9e87-48ae-2b17-587d-dbbbaf131925@10.2.8.84@tcp:446/0 lens 560/432 e 24 to 0 dl 1507941561 ref 2 fl Interpret:/0/0 rc 0/0
[ 3388.036398] Lustre: 7453:0:(service.c:1356:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply
[ 3388.036398]   req@ffff88005e18a050 x1581187922246688/t0(0) o6->lustre-MDT0000-mdtlov_UUID@10.2.8.88@tcp:447/0 lens 664/432 e 24 to 0 dl 1507941562 ref 2 fl Interpret:/0/0 rc 0/0
[ 3393.646670] Lustre: lustre-OST0000: Client b2fc9e87-48ae-2b17-587d-dbbbaf131925 (at 10.2.8.84@tcp) reconnecting
[ 3393.647882] Lustre: lustre-OST0000: Connection restored to 2946d57c-37df-70e7-0e08-049884dae200 (at 10.2.8.84@tcp)
[ 3393.648908] Lustre: Skipped 2 previous similar messages
[ 3394.054006] Lustre: lustre-OST0000: deleting orphan objects from 0x0:55917 to 0x0:55937
[ 3994.653150] Lustre: lustre-OST0000: Client b2fc9e87-48ae-2b17-587d-dbbbaf131925 (at 10.2.8.84@tcp) reconnecting
[ 3994.654263] Lustre: Skipped 1 previous similar message
[ 3994.654924] Lustre: lustre-OST0000: Connection restored to 2946d57c-37df-70e7-0e08-049884dae200 (at 10.2.8.84@tcp)
[ 3994.655979] Lustre: Skipped 1 previous similar message
[ 4144.218382] Lustre: 9461:0:(service.c:1356:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-150), not sending early reply
[ 4144.218382]   req@ffff880063c94450 x1581187922258224/t0(0) o5->lustre-MDT0000-mdtlov_UUID@10.2.8.88@tcp:448/0 lens 432/432 e 0 to 0 dl 1507942318 ref 2 fl Interpret:/0/0 rc 0/0
[ 4320.247693] INFO: task ll_ost00_003:9461 blocked for more than 120 seconds.
[ 4320.248555] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 4320.249331] ll_ost00_003    D ffff88005e0f7a30     0  9461      2 0x00000080
[ 4320.250087]  ffff880069aa3b80 0000000000000046 ffff880060103f40 ffff880069aa3fd8
[ 4320.250914]  ffff880069aa3fd8 ffff880069aa3fd8 ffff880060103f40 ffff88005e0f7a28
[ 4320.251732]  ffff88005e0f7a2c ffff880060103f40 00000000ffffffff ffff88005e0f7a30
[ 4320.252555] Call Trace:
[ 4320.252841]  [<ffffffff816aa3e9>] schedule_preempt_disabled+0x29/0x70
[ 4320.253500]  [<ffffffff816a8317>] __mutex_lock_slowpath+0xc7/0x1d0
[ 4320.254114]  [<ffffffff816a772f>] mutex_lock+0x1f/0x2f
[ 4320.254685]  [<ffffffffc10fa4bb>] ofd_create_hdl+0xdcb/0x2090 [ofd]
[ 4320.255547]  [<ffffffffc0db9287>] ? lustre_msg_add_version+0x27/0xa0 [ptlrpc]
[ 4320.257328]  [<ffffffffc0db95df>] ? lustre_pack_reply_v2+0x14f/0x280 [ptlrpc]
[ 4320.258189]  [<ffffffffc0db9901>] ? lustre_pack_reply+0x11/0x20 [ptlrpc]
[ 4320.258946]  [<ffffffffc0e1c475>] tgt_request_handle+0x925/0x1370 [ptlrpc]
[ 4320.259731]  [<ffffffffc0dc537e>] ptlrpc_server_handle_request+0x24e/0xab0 [ptlrpc]
[ 4320.260526]  [<ffffffff810ba588>] ? __wake_up_common+0x58/0x90
[ 4320.261182]  [<ffffffffc0dc8b22>] ptlrpc_main+0xa92/0x1e40 [ptlrpc]
[ 4320.261853]  [<ffffffffc0dc8090>] ? ptlrpc_register_service+0xe80/0xe80 [ptlrpc]
[ 4320.262684]  [<ffffffff810b098f>] kthread+0xcf/0xe0
[ 4320.263168]  [<ffffffff810b08c0>] ? insert_kthread_work+0x40/0x40
[ 4320.263798]  [<ffffffff816b4f18>] ret_from_fork+0x58/0x90
[ 4320.264422]  [<ffffffff810b08c0>] ? insert_kthread_work+0x40/0x40
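A small script can pull the blocked thread and its call stack out of a console log like the one above, for example when comparing this failure against other Maloo runs. Below is a minimal Python sketch of that. `parse_hung_task` and its regular expressions are hypothetical helpers written for this report. They are not part of Lustre or Maloo, and they assume the `INFO: task NAME:PID blocked for more than N seconds` line and the `[<addr>] [?] func+off/len [module]` frame format seen in this log. Frames that the kernel printed with a leading `?` may be stale stack contents rather than real callers, so the sketch drops them.

```python
import re

# Hung-task excerpt copied from the OST console log in this report
# (the register dump lines are omitted; they are not needed for parsing).
CONSOLE = """\
[ 4320.247693] INFO: task ll_ost00_003:9461 blocked for more than 120 seconds.
[ 4320.252555] Call Trace:
[ 4320.252841]  [<ffffffff816aa3e9>] schedule_preempt_disabled+0x29/0x70
[ 4320.253500]  [<ffffffff816a8317>] __mutex_lock_slowpath+0xc7/0x1d0
[ 4320.254114]  [<ffffffff816a772f>] mutex_lock+0x1f/0x2f
[ 4320.254685]  [<ffffffffc10fa4bb>] ofd_create_hdl+0xdcb/0x2090 [ofd]
[ 4320.255547]  [<ffffffffc0db9287>] ? lustre_msg_add_version+0x27/0xa0 [ptlrpc]
[ 4320.257328]  [<ffffffffc0db95df>] ? lustre_pack_reply_v2+0x14f/0x280 [ptlrpc]
[ 4320.258189]  [<ffffffffc0db9901>] ? lustre_pack_reply+0x11/0x20 [ptlrpc]
[ 4320.258946]  [<ffffffffc0e1c475>] tgt_request_handle+0x925/0x1370 [ptlrpc]
[ 4320.259731]  [<ffffffffc0dc537e>] ptlrpc_server_handle_request+0x24e/0xab0 [ptlrpc]
[ 4320.260526]  [<ffffffff810ba588>] ? __wake_up_common+0x58/0x90
[ 4320.261182]  [<ffffffffc0dc8b22>] ptlrpc_main+0xa92/0x1e40 [ptlrpc]
[ 4320.261853]  [<ffffffffc0dc8090>] ? ptlrpc_register_service+0xe80/0xe80 [ptlrpc]
[ 4320.262684]  [<ffffffff810b098f>] kthread+0xcf/0xe0
[ 4320.263168]  [<ffffffff810b08c0>] ? insert_kthread_work+0x40/0x40
[ 4320.263798]  [<ffffffff816b4f18>] ret_from_fork+0x58/0x90
[ 4320.264422]  [<ffffffff810b08c0>] ? insert_kthread_work+0x40/0x40
"""

# Hypothetical patterns matching the formats seen in this log only.
HUNG_RE = re.compile(r"INFO: task (\S+):(\d+) blocked for more than (\d+) seconds")
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\] (\? )?([A-Za-z0-9_.]+)\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(\w+)\])?"
)


def parse_hung_task(log):
    """Return (task, pid, seconds, frames) for the first hung-task report.

    frames is a list of (function, module) tuples, innermost first. Frames
    printed with a leading '?' are skipped; built-in kernel frames (no
    [module] suffix) are labelled "kernel". Returns None if no report is found.
    """
    m = HUNG_RE.search(log)
    if m is None:
        return None
    task, pid, secs = m.group(1), int(m.group(2)), int(m.group(3))
    frames = []
    for line in log[m.end():].splitlines():
        f = FRAME_RE.search(line)
        if f and not f.group(1):
            frames.append((f.group(2), f.group(3) or "kernel"))
    return task, pid, secs, frames


if __name__ == "__main__":
    task, pid, secs, frames = parse_hung_task(CONSOLE)
    print(f"{task} (pid {pid}) blocked for more than {secs}s")
    for func, mod in frames:
        print(f"  {func} [{mod}]")
```

On this excerpt, the first non-kernel frame above `mutex_lock` is `ofd_create_hdl [ofd]`. That matches the reading of the trace in this report: an OST service thread handling a create request is waiting on a mutex. The trace alone does not identify which mutex, or which thread holds it.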

Info required for matching: sanity 102a


Generated at Sat Feb 10 02:32:14 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.