Lustre / LU-17377

racer test_1: Timeout occurred


Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.15.4
    • Labels: None
    • Severity: 3
    • Rank (Obsolete): 9223372036854775807

    Description

      This issue was created by maloo for sarah <sarah@whamcloud.com>

      This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/36a3b3ec-c78e-4534-b151-7a623637c44b

      test_1 failed with the following error:

      Timeout occurred after 784 minutes, last suite running was racer
      

      Test session details:
      clients: https://build.whamcloud.com/job/lustre-b2_15/77 - 4.18.0-477.15.1.el8_8.x86_64
      servers: https://build.whamcloud.com/job/lustre-b2_15/77 - 4.18.0-477.15.1.el8_lustre.x86_64
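
      For reference, a minimal sketch of how this suite can be driven by hand
      from lustre/tests on an already-configured test cluster. The tests
      directory, the use of auster, and the options shown below are assumptions
      about a standard setup, not commands recorded from this session:

      # Sketch only; adjust the tests directory for an installed vs. source tree.
      cd /usr/lib64/lustre/tests

      # Run only racer test_1 with the same DURATION the failed session used.
      DURATION=900 ./auster -v racer --only 1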

      MDS dmesg

      [Tue Nov 28 11:19:35 2023] Lustre: DEBUG MARKER: /usr/sbin/lctl mark == racer test 1: racer on clients: trevis-51vm4.trevis.whamcloud.com,trevis-51vm5 DURATION=900 ========================================================== 11:19:36 \(1701170376\)
      [Tue Nov 28 11:19:35 2023] Lustre: DEBUG MARKER: == racer test 1: racer on clients: trevis-51vm4.trevis.whamcloud.com,trevis-51vm5 DURATION=900 ========================================================== 11:19:36 (1701170376)
      [Tue Nov 28 11:19:36 2023] Lustre: 2189800:0:(mdt_recovery.c:200:mdt_req_from_lrd()) @@@ restoring transno  req@00000000780159d6 x1783787460031360/t399431959257(0) o101->8cf5fb4b-4b34-4192-888d-2669cccef68f@10.240.40.111@tcp:608/0 lens 376/816 e 0 to 0 dl 1701170383 ref 1 fl Interpret:H/2/0 rc 0/0 job:'dd.0'
      [Tue Nov 28 11:20:31 2023] LustreError: 2200161:0:(mdt_handler.c:754:mdt_pack_acl2body()) lustre-MDT0000: unable to read [0x20002d2c1:0x431:0x0] ACL: rc = -2
      [Tue Nov 28 11:20:37 2023] LustreError: 2200160:0:(mdt_handler.c:754:mdt_pack_acl2body()) lustre-MDT0000: unable to read [0x20002d2c1:0x540:0x0] ACL: rc = -2
      [Tue Nov 28 11:30:07 2023] LustreError: 2188308:0:(osd_handler.c:224:osd_trans_start()) lustre-MDT0000: can't assign tx: rc = -2
      [Tue Nov 28 11:31:32 2023] Lustre: mdt00_007: service thread pid 2200131 was inactive for 42.604 seconds. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
      [Tue Nov 28 11:31:32 2023] Pid: 2200143, comm: mdt00_019 4.18.0-477.15.1.el8_lustre.x86_64 #1 SMP Thu Nov 23 04:41:42 UTC 2023
      [Tue Nov 28 11:31:32 2023] Lustre: Skipped 1 previous similar message
      [Tue Nov 28 11:31:32 2023] Call Trace TBD:
      [Tue Nov 28 11:31:32 2023] [<0>] ldlm_completion_ast+0x7b0/0x900 [ptlrpc]
      [Tue Nov 28 11:31:32 2023] [<0>] ldlm_cli_enqueue_local+0x30b/0x870 [ptlrpc]
      [Tue Nov 28 11:31:32 2023] [<0>] mdt_object_local_lock+0x50e/0xb30 [mdt]
      [Tue Nov 28 11:31:32 2023] [<0>] mdt_object_lock_internal+0x191/0x4a0 [mdt]
      [Tue Nov 28 11:31:32 2023] [<0>] mdt_getattr_name_lock+0x8f3/0x2380 [mdt]
      [Tue Nov 28 11:31:32 2023] [<0>] mdt_intent_getattr+0x27e/0x470 [mdt]
      [Tue Nov 28 11:31:32 2023] [<0>] mdt_intent_opc+0x12c/0xbf0 [mdt]
      [Tue Nov 28 11:31:32 2023] [<0>] mdt_intent_policy+0x20b/0x3a0 [mdt]
      [Tue Nov 28 11:31:32 2023] [<0>] ldlm_lock_enqueue+0x4e8/0xaa0 [ptlrpc]
      [Tue Nov 28 11:31:32 2023] [<0>] ldlm_handle_enqueue0+0x634/0x1750 [ptlrpc]
      [Tue Nov 28 11:31:32 2023] [<0>] tgt_enqueue+0xa4/0x220 [ptlrpc]
      [Tue Nov 28 11:31:32 2023] [<0>] tgt_request_handle+0xccd/0x1b10 [ptlrpc]
      [Tue Nov 28 11:31:32 2023] [<0>] ptlrpc_server_handle_request+0x323/0xbe0 [ptlrpc]
      [Tue Nov 28 11:31:32 2023] [<0>] ptlrpc_main+0xc0f/0x1570 [ptlrpc]
      [Tue Nov 28 11:31:32 2023] [<0>] kthread+0x134/0x150
      [Tue Nov 28 11:31:32 2023] [<0>] ret_from_fork+0x35/0x40
      [Tue Nov 28 11:31:32 2023] Pid: 2200131, comm: mdt00_007 4.18.0-477.15.1.el8_lustre.x86_64 #1 SMP Thu Nov 23 04:41:42 UTC 2023
      [Tue Nov 28 11:31:32 2023] Call Trace TBD:
      [Tue Nov 28 11:31:32 2023] [<0>] ldlm_completion_ast+0x7b0/0x900 [ptlrpc]
      [Tue Nov 28 11:31:32 2023] [<0>] ldlm_cli_enqueue_local+0x30b/0x870 [ptlrpc]
      [Tue Nov 28 11:31:32 2023] [<0>] mdt_object_local_lock+0x50e/0xb30 [mdt]
      [Tue Nov 28 11:31:32 2023] [<0>] mdt_object_lock_internal+0x191/0x4a0 [mdt]
      [Tue Nov 28 11:31:32 2023] [<0>] mdt_getattr_name_lock+0x8f3/0x2380 [mdt]
      [Tue Nov 28 11:31:32 2023] [<0>] mdt_intent_getattr+0x27e/0x470 [mdt]
      [Tue Nov 28 11:31:32 2023] [<0>] mdt_intent_opc+0x12c/0xbf0 [mdt]
      [Tue Nov 28 11:31:32 2023] [<0>] mdt_intent_policy+0x20b/0x3a0 [mdt]
      [Tue Nov 28 11:31:32 2023] [<0>] ldlm_lock_enqueue+0x4e8/0xaa0 [ptlrpc]
      [Tue Nov 28 11:31:32 2023] [<0>] ldlm_handle_enqueue0+0x634/0x1750 [ptlrpc]
      [Tue Nov 28 11:31:32 2023] [<0>] tgt_enqueue+0xa4/0x220 [ptlrpc]
      [Tue Nov 28 11:31:32 2023] [<0>] tgt_request_handle+0xccd/0x1b10 [ptlrpc]
      [Tue Nov 28 11:31:32 2023] [<0>] ptlrpc_server_handle_request+0x323/0xbe0 [ptlrpc]
      [Tue Nov 28 11:31:32 2023] [<0>] ptlrpc_main+0xc0f/0x1570 [ptlrpc]
      [Tue Nov 28 11:31:32 2023] [<0>] kthread+0x134/0x150
      [Tue Nov 28 11:31:32 2023] [<0>] ret_from_fork+0x35/0x40
      [Tue Nov 28 11:31:32 2023] Pid: 2200133, comm: mdt00_009 4.18.0-477.15.1.el8_lustre.x86_64 #1 SMP Thu Nov 23 04:41:42 UTC 2023
      [Tue Nov 28 11:31:32 2023] Call Trace TBD:
      [Tue Nov 28 11:31:32 2023] [<0>] ldlm_completion_ast+0x7b0/0x900 [ptlrpc]
      [Tue Nov 28 11:31:32 2023] [<0>] ldlm_cli_enqueue_local+0x30b/0x870 [ptlrpc]
      [Tue Nov 28 11:31:32 2023] Lustre: mdt00_002: service thread pid 2188295 was inactive for 42.652 seconds. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one.
      [Tue Nov 28 11:31:32 2023] [<0>] mdt_object_local_lock+0x50e/0xb30 [mdt]
      [Tue Nov 28 11:31:32 2023] [<0>] mdt_object_lock_internal+0x191/0x4a0 [mdt]
      [Tue Nov 28 11:31:32 2023] [<0>] mdt_getattr_name_lock+0x8f3/0x2380 [mdt]
      [Tue Nov 28 11:31:32 2023] [<0>] mdt_intent_getattr+0x27e/0x470 [mdt]
      [Tue Nov 28 11:31:32 2023] [<0>] mdt_intent_opc+0x12c/0xbf0 [mdt]
      [Tue Nov 28 11:31:32 2023] [<0>] mdt_intent_policy+0x20b/0x3a0 [mdt]
      [Tue Nov 28 11:31:32 2023] [<0>] ldlm_lock_enqueue+0x4e8/0xaa0 [ptlrpc]
      [Tue Nov 28 11:31:32 2023] [<0>] ldlm_handle_enqueue0+0x634/0x1750 [ptlrpc]
      [Tue Nov 28 11:31:32 2023] [<0>] tgt_enqueue+0xa4/0x220 [ptlrpc]
      [Tue Nov 28 11:31:32 2023] [<0>] tgt_request_handle+0xccd/0x1b10 [ptlrpc]
      [Tue Nov 28 11:31:32 2023] [<0>] ptlrpc_server_handle_request+0x323/0xbe0 [ptlrpc]
      [Tue Nov 28 11:31:32 2023] [<0>] ptlrpc_main+0xc0f/0x1570 [ptlrpc]
      [Tue Nov 28 11:31:32 2023] [<0>] kthread+0x134/0x150
      [Tue Nov 28 11:31:32 2023] [<0>] ret_from_fork+0x35/0x40
      [Tue Nov 28 11:36:06 2023] LustreError: 2200143:0:(ldlm_request.c:124:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1701171050, 317s ago); not entering recovery in server code, just going back to sleep ns: mdt-lustre-MDT0000_UUID lock: 000000008db36cb4/0x6256bf2f7e02c3f1 lrc: 3/1,0 mode: --/PR res: [0x20002d2c2:0x9:0x0].0x0 bits 0x13/0x0 rrc: 32 type: IBT gid 0 flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 2200143 timeout: 0 lvb_type: 0
      [Tue Nov 28 11:36:06 2023] LustreError: 2200143:0:(ldlm_request.c:124:ldlm_expired_completion_wait()) Skipped 1 previous similar message
      [Tue Nov 28 11:36:06 2023] LustreError: dumping log to /tmp/lustre-log.1701171367.2200131
      [Tue Nov 28 11:40:44 2023] Lustre: 2200163:0:(service.c:1437:ptlrpc_at_send_early_reply()) @@@ Could not add any time (4/4), not sending early reply  req@00000000e8b46235 x1783763196782336/t0(0) o101->e5dd43cf-d40b-4b07-91f8-6b6b6a2c6a1c@10.240.40.112@tcp:365/0 lens 576/3928 e 24 to 0 dl 1701171650 ref 2 fl Interpret:/0/0 rc 0/0 job:'ls.0'
      [Tue Nov 28 11:40:50 2023] Lustre: lustre-MDT0000: Client 94c002db-c56e-4796-b0f1-e4af0f5485a1 (at 10.240.40.112@tcp) reconnecting
      [Tue Nov 28 11:40:50 2023] Lustre: Skipped 1 previous similar message
      [Tue Nov 28 11:50:51 2023] Lustre: lustre-MDT0000: Client 94c002db-c56e-4796-b0f1-e4af0f5485a1 (at 10.240.40.112@tcp) reconnecting
      [Tue Nov 28 11:50:51 2023] Lustre: Skipped 3 previous similar messages
      [Tue Nov 28 11:56:35 2023] Lustre: mdt00_031: service thread pid 2200158 was inactive for 42.147 seconds. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one.
      [Tue Nov 28 11:56:35 2023] ptlrpc_watchdog_fire: 19 callbacks suppressed
      [Tue Nov 28 11:56:35 2023] Lustre: mdt00_036: service thread pid 2200163 was inactive for 42.094 seconds. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
      [Tue Nov 28 11:56:35 2023] Lustre: Skipped 18 previous similar messages
      [Tue Nov 28 11:56:35 2023] Lustre: Skipped 1 previous similar message
      [Tue Nov 28 11:56:35 2023] Pid: 2200163, comm: mdt00_036 4.18.0-477.15.1.el8_lustre.x86_64 #1 SMP Thu Nov 23 04:41:42 UTC 2023
      [Tue Nov 28 11:56:35 2023] Call Trace TBD:
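
      A few starting points for triage, as a sketch: the dumped log path, the
      FIDs, and the MDT lock namespace name are taken from the dmesg above,
      while the client mount point /mnt/lustre and the output file name are
      assumptions.

      # On the MDS: convert the binary debug log dumped at 11:36:06 into text.
      lctl debug_file /tmp/lustre-log.1701171367.2200131 /tmp/lustre-log.1701171367.txt

      # On the MDS: lock/resource counts for the namespace named in the
      # ldlm_expired_completion_wait message, to gauge DLM pressure while the
      # mdt00_* threads sit in ldlm_completion_ast.
      lctl get_param ldlm.namespaces.mdt-lustre-MDT0000_UUID.lock_count \
                     ldlm.namespaces.mdt-lustre-MDT0000_UUID.resource_count

      # On a client: try to map the FIDs from the mdt_pack_acl2body errors back
      # to paths. rc = -2 is -ENOENT, and racer unlinks files constantly, so
      # these may no longer resolve.
      lfs fid2path /mnt/lustre "[0x20002d2c1:0x431:0x0]"
      lfs fid2path /mnt/lustre "[0x20002d2c1:0x540:0x0]"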
      

      VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
      racer test_1 - Timeout occurred after 784 minutes, last suite running was racer

            People

              Assignee: wc-triage (WC Triage)
              Reporter: maloo (Maloo)
              Votes: 0
              Watchers: 2
