[LU-8761] replay-single test_70b: OUT threads hang on lu_object_find_at() Created: 26/Oct/16  Updated: 06/Jul/21  Resolved: 06/Jul/21

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Maloo Assignee: WC Triage
Resolution: Cannot Reproduce Votes: 0
Labels: None

Severity: 3

 Description   

This issue was created by maloo for Niu Yawei <yawei.niu@intel.com>

This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/68a7725a-9b09-11e6-80a1-5254006e85c2.

The sub-test test_70b failed with the following error:


Please provide additional information about the failure here.

Info required for matching: replay-single 70b



 Comments   
Comment by Niu Yawei (Inactive) [ 26/Oct/16 ]

Console log from one of the MDTs:

16:16:11:[ 3930.145981] Lustre: lustre-MDT0001: Recovery over after 0:04, of 5 clients 5 recovered and 0 were evicted.
16:16:11:[ 3931.489640] Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 4 sec
16:16:11:[ 3931.500570] Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 4 sec
16:16:11:[ 3932.121919] Lustre: DEBUG MARKER: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 4 sec
16:26:12:[ 3932.131738] Lustre: DEBUG MARKER: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 4 sec
16:26:12:[ 3940.005891] Lustre: DEBUG MARKER: /usr/sbin/lctl mark test_70b fail mds3 3 times
16:26:12:[ 3940.453025] Lustre: DEBUG MARKER: test_70b fail mds3 3 times
16:26:12:[ 4200.625154] INFO: task mdt_out00_002:24322 blocked for more than 120 seconds.
16:26:12:[ 4200.629097] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
16:26:12:[ 4200.631169] mdt_out00_002   D ffff88007bf00100     0 24322      2 0x00000080
16:26:12:[ 4200.633229]  ffff88005b9afae8 0000000000000046 ffff880057840b80 ffff88005b9affd8
16:26:12:[ 4200.635344]  ffff88005b9affd8 ffff88005b9affd8 ffff880057840b80 ffff88004a704000
16:26:12:[ 4200.637481]  ffff880044ce0000 ffff88007bf00118 ffff88005b9afc6c ffff88007bf00100
16:26:12:[ 4200.639563] Call Trace:
16:26:12:[ 4200.641428]  [<ffffffff8163bc39>] schedule+0x29/0x70
16:26:12:[ 4200.643324]  [<ffffffffa07ede5d>] lu_object_find_at+0x4d/0xe0 [obdclass]
16:26:12:[ 4200.645262]  [<ffffffffa0a1432f>] ? lustre_pack_reply_flags+0x6f/0x1e0 [ptlrpc]
16:26:12:[ 4200.647187]  [<ffffffff810b8940>] ? wake_up_state+0x20/0x20
16:26:12:[ 4200.648992]  [<ffffffffa07ef1c8>] dt_locate_at+0x18/0xb0 [obdclass]
16:26:12:[ 4200.650825]  [<ffffffffa0a7c7ad>] out_handle+0x108d/0x1900 [ptlrpc]
16:26:12:[ 4200.652647]  [<ffffffffa0a12022>] ? lustre_msg_get_opc+0x22/0xf0 [ptlrpc]
16:26:12:[ 4200.654478]  [<ffffffffa0a72259>] ? tgt_request_preprocess.isra.26+0x299/0x790 [ptlrpc]
16:26:12:[ 4200.656402]  [<ffffffffa0a73065>] tgt_request_handle+0x915/0x1320 [ptlrpc]
16:26:12:[ 4200.658260]  [<ffffffffa0a1efdb>] ptlrpc_server_handle_request+0x21b/0xa90 [ptlrpc]
16:26:12:[ 4200.660178]  [<ffffffffa0a1cb98>] ? ptlrpc_wait_event+0x98/0x340 [ptlrpc]
16:26:12:[ 4200.664966]  [<ffffffff810b8952>] ? default_wake_function+0x12/0x20
16:26:12:[ 4200.668601]  [<ffffffff810af0b8>] ? __wake_up_common+0x58/0x90
16:26:12:[ 4200.670327]  [<ffffffffa0a23090>] ptlrpc_main+0xaa0/0x1de0 [ptlrpc]
16:26:12:[ 4200.672064]  [<ffffffffa0a225f0>] ? ptlrpc_register_service+0xe40/0xe40 [ptlrpc]
16:26:12:[ 4200.673836]  [<ffffffff810a5b8f>] kthread+0xcf/0xe0
16:26:12:[ 4200.675448]  [<ffffffff810a5ac0>] ? kthread_create_on_node+0x140/0x140
16:26:12:[ 4200.677146]  [<ffffffff81646b98>] ret_from_fork+0x58/0x90
16:26:12:[ 4200.678748]  [<ffffffff810a5ac0>] ? kthread_create_on_node+0x140/0x140
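
When reading a trace like the one above, note that frames marked with `?` are addresses the kernel found on the stack but could not verify, so the likely blocking path is the unmarked chain (schedule, lu_object_find_at, dt_locate_at, out_handle, and so on). The following is a minimal sketch, not part of the original report, of how those unmarked frames could be pulled out of a console log. The helper name `reliable_frames` and the `"kernel"` label for frames without a module are illustrative assumptions.

```python
import re

# Matches one frame line of a Linux kernel call trace, e.g.
#   [<ffffffffa07ede5d>] lu_object_find_at+0x4d/0xe0 [obdclass]
# An optional "? " before the symbol marks a frame the unwinder
# could not verify.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<unreliable>\?\s+)?"
    r"(?P<symbol>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def reliable_frames(log_text):
    """Return (symbol, module) pairs for frames not marked with '?'.

    Hypothetical helper for illustration. Frames with no module
    suffix are labeled "kernel" (an arbitrary placeholder label).
    """
    frames = []
    for line in log_text.splitlines():
        m = FRAME_RE.search(line)
        if m and not m.group("unreliable"):
            frames.append((m.group("symbol"), m.group("module") or "kernel"))
    return frames


# A few lines copied from the console log in this ticket.
SAMPLE = """\
16:26:12:[ 4200.641428]  [<ffffffff8163bc39>] schedule+0x29/0x70
16:26:12:[ 4200.643324]  [<ffffffffa07ede5d>] lu_object_find_at+0x4d/0xe0 [obdclass]
16:26:12:[ 4200.645262]  [<ffffffffa0a1432f>] ? lustre_pack_reply_flags+0x6f/0x1e0 [ptlrpc]
16:26:12:[ 4200.648992]  [<ffffffffa07ef1c8>] dt_locate_at+0x18/0xb0 [obdclass]
16:26:12:[ 4200.650825]  [<ffffffffa0a7c7ad>] out_handle+0x108d/0x1900 [ptlrpc]
"""

if __name__ == "__main__":
    for symbol, module in reliable_frames(SAMPLE):
        print(f"{symbol} [{module}]")
```

Filtering this way only narrows down where the thread is sleeping; it does not by itself explain why lu_object_find_at() never wakes up.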