[LU-17377] racer test_1: Timeout occurred | Created: 19/Dec/23 | Updated: 25/Jan/24 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.15.4 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Maloo | Assignee: | WC Triage |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | | |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
This issue was created by maloo for sarah <sarah@whamcloud.com>

This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/36a3b3ec-c78e-4534-b151-7a623637c44b

test_1 failed with the following error:

Timeout occurred after 784 minutes, last suite running was racer

Test session details:
<<Please provide additional information about the failure here>>

[Tue Nov 28 11:19:35 2023] Lustre: DEBUG MARKER: /usr/sbin/lctl mark == racer test 1: racer on clients: trevis-51vm4.trevis.whamcloud.com,trevis-51vm5 DURATION=900 ========================================================== 11:19:36 \(1701170376\)
[Tue Nov 28 11:19:35 2023] Lustre: DEBUG MARKER: == racer test 1: racer on clients: trevis-51vm4.trevis.whamcloud.com,trevis-51vm5 DURATION=900 ========================================================== 11:19:36 (1701170376)
[Tue Nov 28 11:19:36 2023] Lustre: 2189800:0:(mdt_recovery.c:200:mdt_req_from_lrd()) @@@ restoring transno req@00000000780159d6 x1783787460031360/t399431959257(0) o101->8cf5fb4b-4b34-4192-888d-2669cccef68f@10.240.40.111@tcp:608/0 lens 376/816 e 0 to 0 dl 1701170383 ref 1 fl Interpret:H/2/0 rc 0/0 job:'dd.0'
[Tue Nov 28 11:20:31 2023] LustreError: 2200161:0:(mdt_handler.c:754:mdt_pack_acl2body()) lustre-MDT0000: unable to read [0x20002d2c1:0x431:0x0] ACL: rc = -2
[Tue Nov 28 11:20:37 2023] LustreError: 2200160:0:(mdt_handler.c:754:mdt_pack_acl2body()) lustre-MDT0000: unable to read [0x20002d2c1:0x540:0x0] ACL: rc = -2
[Tue Nov 28 11:30:07 2023] LustreError: 2188308:0:(osd_handler.c:224:osd_trans_start()) lustre-MDT0000: can't assign tx: rc = -2
[Tue Nov 28 11:31:32 2023] Lustre: mdt00_007: service thread pid 2200131 was inactive for 42.604 seconds. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
[Tue Nov 28 11:31:32 2023] Pid: 2200143, comm: mdt00_019 4.18.0-477.15.1.el8_lustre.x86_64 #1 SMP Thu Nov 23 04:41:42 UTC 2023
[Tue Nov 28 11:31:32 2023] Lustre: Skipped 1 previous similar message
[Tue Nov 28 11:31:32 2023] Call Trace TBD:
[Tue Nov 28 11:31:32 2023] [<0>] ldlm_completion_ast+0x7b0/0x900 [ptlrpc]
[Tue Nov 28 11:31:32 2023] [<0>] ldlm_cli_enqueue_local+0x30b/0x870 [ptlrpc]
[Tue Nov 28 11:31:32 2023] [<0>] mdt_object_local_lock+0x50e/0xb30 [mdt]
[Tue Nov 28 11:31:32 2023] [<0>] mdt_object_lock_internal+0x191/0x4a0 [mdt]
[Tue Nov 28 11:31:32 2023] [<0>] mdt_getattr_name_lock+0x8f3/0x2380 [mdt]
[Tue Nov 28 11:31:32 2023] [<0>] mdt_intent_getattr+0x27e/0x470 [mdt]
[Tue Nov 28 11:31:32 2023] [<0>] mdt_intent_opc+0x12c/0xbf0 [mdt]
[Tue Nov 28 11:31:32 2023] [<0>] mdt_intent_policy+0x20b/0x3a0 [mdt]
[Tue Nov 28 11:31:32 2023] [<0>] ldlm_lock_enqueue+0x4e8/0xaa0 [ptlrpc]
[Tue Nov 28 11:31:32 2023] [<0>] ldlm_handle_enqueue0+0x634/0x1750 [ptlrpc]
[Tue Nov 28 11:31:32 2023] [<0>] tgt_enqueue+0xa4/0x220 [ptlrpc]
[Tue Nov 28 11:31:32 2023] [<0>] tgt_request_handle+0xccd/0x1b10 [ptlrpc]
[Tue Nov 28 11:31:32 2023] [<0>] ptlrpc_server_handle_request+0x323/0xbe0 [ptlrpc]
[Tue Nov 28 11:31:32 2023] [<0>] ptlrpc_main+0xc0f/0x1570 [ptlrpc]
[Tue Nov 28 11:31:32 2023] [<0>] kthread+0x134/0x150
[Tue Nov 28 11:31:32 2023] [<0>] ret_from_fork+0x35/0x40
[Tue Nov 28 11:31:32 2023] Pid: 2200131, comm: mdt00_007 4.18.0-477.15.1.el8_lustre.x86_64 #1 SMP Thu Nov 23 04:41:42 UTC 2023
[Tue Nov 28 11:31:32 2023] Call Trace TBD:
[Tue Nov 28 11:31:32 2023] [<0>] ldlm_completion_ast+0x7b0/0x900 [ptlrpc]
[Tue Nov 28 11:31:32 2023] [<0>] ldlm_cli_enqueue_local+0x30b/0x870 [ptlrpc]
[Tue Nov 28 11:31:32 2023] [<0>] mdt_object_local_lock+0x50e/0xb30 [mdt]
[Tue Nov 28 11:31:32 2023] [<0>] mdt_object_lock_internal+0x191/0x4a0 [mdt]
[Tue Nov 28 11:31:32 2023] [<0>] mdt_getattr_name_lock+0x8f3/0x2380 [mdt]
[Tue Nov 28 11:31:32 2023] [<0>] mdt_intent_getattr+0x27e/0x470 [mdt]
[Tue Nov 28 11:31:32 2023] [<0>] mdt_intent_opc+0x12c/0xbf0 [mdt]
[Tue Nov 28 11:31:32 2023] [<0>] mdt_intent_policy+0x20b/0x3a0 [mdt]
[Tue Nov 28 11:31:32 2023] [<0>] ldlm_lock_enqueue+0x4e8/0xaa0 [ptlrpc]
[Tue Nov 28 11:31:32 2023] [<0>] ldlm_handle_enqueue0+0x634/0x1750 [ptlrpc]
[Tue Nov 28 11:31:32 2023] [<0>] tgt_enqueue+0xa4/0x220 [ptlrpc]
[Tue Nov 28 11:31:32 2023] [<0>] tgt_request_handle+0xccd/0x1b10 [ptlrpc]
[Tue Nov 28 11:31:32 2023] [<0>] ptlrpc_server_handle_request+0x323/0xbe0 [ptlrpc]
[Tue Nov 28 11:31:32 2023] [<0>] ptlrpc_main+0xc0f/0x1570 [ptlrpc]
[Tue Nov 28 11:31:32 2023] [<0>] kthread+0x134/0x150
[Tue Nov 28 11:31:32 2023] [<0>] ret_from_fork+0x35/0x40
[Tue Nov 28 11:31:32 2023] Pid: 2200133, comm: mdt00_009 4.18.0-477.15.1.el8_lustre.x86_64 #1 SMP Thu Nov 23 04:41:42 UTC 2023
[Tue Nov 28 11:31:32 2023] Call Trace TBD:
[Tue Nov 28 11:31:32 2023] [<0>] ldlm_completion_ast+0x7b0/0x900 [ptlrpc]
[Tue Nov 28 11:31:32 2023] [<0>] ldlm_cli_enqueue_local+0x30b/0x870 [ptlrpc]
[Tue Nov 28 11:31:32 2023] Lustre: mdt00_002: service thread pid 2188295 was inactive for 42.652 seconds. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one.
[Tue Nov 28 11:31:32 2023] [<0>] mdt_object_local_lock+0x50e/0xb30 [mdt]
[Tue Nov 28 11:31:32 2023] [<0>] mdt_object_lock_internal+0x191/0x4a0 [mdt]
[Tue Nov 28 11:31:32 2023] [<0>] mdt_getattr_name_lock+0x8f3/0x2380 [mdt]
[Tue Nov 28 11:31:32 2023] [<0>] mdt_intent_getattr+0x27e/0x470 [mdt]
[Tue Nov 28 11:31:32 2023] [<0>] mdt_intent_opc+0x12c/0xbf0 [mdt]
[Tue Nov 28 11:31:32 2023] [<0>] mdt_intent_policy+0x20b/0x3a0 [mdt]
[Tue Nov 28 11:31:32 2023] [<0>] ldlm_lock_enqueue+0x4e8/0xaa0 [ptlrpc]
[Tue Nov 28 11:31:32 2023] [<0>] ldlm_handle_enqueue0+0x634/0x1750 [ptlrpc]
[Tue Nov 28 11:31:32 2023] [<0>] tgt_enqueue+0xa4/0x220 [ptlrpc]
[Tue Nov 28 11:31:32 2023] [<0>] tgt_request_handle+0xccd/0x1b10 [ptlrpc]
[Tue Nov 28 11:31:32 2023] [<0>] ptlrpc_server_handle_request+0x323/0xbe0 [ptlrpc]
[Tue Nov 28 11:31:32 2023] [<0>] ptlrpc_main+0xc0f/0x1570 [ptlrpc]
[Tue Nov 28 11:31:32 2023] [<0>] kthread+0x134/0x150
[Tue Nov 28 11:31:32 2023] [<0>] ret_from_fork+0x35/0x40
[Tue Nov 28 11:36:06 2023] LustreError: 2200143:0:(ldlm_request.c:124:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1701171050, 317s ago); not entering recovery in server code, just going back to sleep ns: mdt-lustre-MDT0000_UUID lock: 000000008db36cb4/0x6256bf2f7e02c3f1 lrc: 3/1,0 mode: --/PR res: [0x20002d2c2:0x9:0x0].0x0 bits 0x13/0x0 rrc: 32 type: IBT gid 0 flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 2200143 timeout: 0 lvb_type: 0
[Tue Nov 28 11:36:06 2023] LustreError: 2200143:0:(ldlm_request.c:124:ldlm_expired_completion_wait()) Skipped 1 previous similar message
[Tue Nov 28 11:36:06 2023] LustreError: dumping log to /tmp/lustre-log.1701171367.2200131
[Tue Nov 28 11:40:44 2023] Lustre: 2200163:0:(service.c:1437:ptlrpc_at_send_early_reply()) @@@ Could not add any time (4/4), not sending early reply req@00000000e8b46235 x1783763196782336/t0(0) o101->e5dd43cf-d40b-4b07-91f8-6b6b6a2c6a1c@10.240.40.112@tcp:365/0 lens 576/3928 e 24 to 0 dl 1701171650 ref 2 fl Interpret:/0/0 rc 0/0 job:'ls.0'
[Tue Nov 28 11:40:50 2023] Lustre: lustre-MDT0000: Client 94c002db-c56e-4796-b0f1-e4af0f5485a1 (at 10.240.40.112@tcp) reconnecting
[Tue Nov 28 11:40:50 2023] Lustre: Skipped 1 previous similar message
[Tue Nov 28 11:50:51 2023] Lustre: lustre-MDT0000: Client 94c002db-c56e-4796-b0f1-e4af0f5485a1 (at 10.240.40.112@tcp) reconnecting
[Tue Nov 28 11:50:51 2023] Lustre: Skipped 3 previous similar messages
[Tue Nov 28 11:56:35 2023] Lustre: mdt00_031: service thread pid 2200158 was inactive for 42.147 seconds. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one.
[Tue Nov 28 11:56:35 2023] ptlrpc_watchdog_fire: 19 callbacks suppressed
[Tue Nov 28 11:56:35 2023] Lustre: mdt00_036: service thread pid 2200163 was inactive for 42.094 seconds. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
[Tue Nov 28 11:56:35 2023] Lustre: Skipped 18 previous similar messages
[Tue Nov 28 11:56:35 2023] Lustre: Skipped 1 previous similar message
[Tue Nov 28 11:56:35 2023] Pid: 2200163, comm: mdt00_036 4.18.0-477.15.1.el8_lustre.x86_64 #1 SMP Thu Nov 23 04:41:42 UTC 2023
[Tue Nov 28 11:56:35 2023] Call Trace TBD:
VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV |
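The ldlm_expired_completion_wait message above reports "dumping log to /tmp/lustre-log.1701171367.2200131" on the MDS. That file is a binary Lustre debug dump; if it was gathered with the rest of the logs from this node, it can be converted to text with lctl and searched for the blocked getattr enqueue. A minimal sketch (the output filename is arbitrary, and the grep pattern is simply the resource FID from the timed-out lock, which may be formatted differently in the text log):

  # Convert the binary debug dump named in the console message into readable text.
  lctl debug_file /tmp/lustre-log.1701171367.2200131 /tmp/lustre-log.1701171367.txt

  # Look for activity on the resource the timed-out PR lock was waiting on
  # ([0x20002d2c2:0x9:0x0] in the message above).
  grep '0x20002d2c2:0x9' /tmp/lustre-log.1701171367.txt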
| Comments |
| Comment by Sarah Liu [ 19/Dec/23 ] |
|
oss dmesg:

[Tue Nov 28 11:19:35 2023] Lustre: DEBUG MARKER: /usr/sbin/lctl mark == racer test 1: racer on clients: trevis-51vm4.trevis.whamcloud.com,trevis-51vm5 DURATION=900 ========================================================== 11:19:36 \(1701170376\)
[Tue Nov 28 11:19:35 2023] Lustre: DEBUG MARKER: == racer test 1: racer on clients: trevis-51vm4.trevis.whamcloud.com,trevis-51vm5 DURATION=900 ========================================================== 11:19:36 (1701170376)
[Tue Nov 28 11:31:28 2023] LustreError: 8:0:(ofd_obd.c:501:ofd_destroy_export()) lustre-OST0001: cli 8cf5fb4b-4b34-4192-888d-2669cccef68f/00000000d36a2675 has 1703936 pending on destroyed export
[Tue Nov 28 11:31:28 2023] LustreError: 8:0:(tgt_grant.c:257:tgt_grant_sanity_check()) ofd_destroy_export: tot_granted 1096876032 != fo_tot_granted 1098579968
[Tue Nov 28 11:31:28 2023] LustreError: 8:0:(tgt_grant.c:260:tgt_grant_sanity_check()) ofd_destroy_export: tot_pending 10223616 != fo_tot_pending 11927552
[Tue Nov 28 11:31:31 2023] LustreError: 1392352:0:(tgt_grant.c:257:tgt_grant_sanity_check()) ofd_statfs: tot_granted 1096876032 != fo_tot_granted 1098579968
[Tue Nov 28 11:31:31 2023] LustreError: 1392352:0:(tgt_grant.c:260:tgt_grant_sanity_check()) ofd_statfs: tot_pending 13631488 != fo_tot_pending 15335424
[Tue Nov 28 11:31:36 2023] LustreError: 1392351:0:(tgt_grant.c:257:tgt_grant_sanity_check()) ofd_statfs: tot_granted 1129250816 != fo_tot_granted 1130954752
[Tue Nov 28 11:31:36 2023] LustreError: 1392351:0:(tgt_grant.c:260:tgt_grant_sanity_check()) ofd_statfs: tot_pending 42598400 != fo_tot_pending 44302336
[Tue Nov 28 11:31:41 2023] LustreError: 1392352:0:(tgt_grant.c:257:tgt_grant_sanity_check()) ofd_statfs: tot_granted 1105395712 != fo_tot_granted 1107099648 |
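For anyone attempting to reproduce this outside autotest: the debug marker shows racer running on two clients with DURATION=900. A rough local equivalent, assuming an in-tree lustre/tests checkout and a client mounted at /mnt/lustre (both are assumptions, not details taken from this run, and this is not the exact Maloo/auster invocation):

  # Minimal sketch: drive the racer workload against one client mount
  # for the same 900-second window used in this run.
  cd lustre/tests
  DURATION=900 bash racer/racer.sh /mnt/lustre/racer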