LU-15657

replay-dual test_30: operation ldlm_enqueue failed: rc = -107

Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor
    • Affects Version/s: Lustre 2.15.0

    Description

      This issue was created by maloo for sarah <sarah@whamcloud.com>

      This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/ce9f15e1-c403-44ac-8152-162384f0d6a8

      test_30 failed with the following error:

      Timeout occurred after 117 mins, last suite running was replay-dual
      
      [Wed Feb  9 10:09:01 2022] Lustre: DEBUG MARKER: == replay-dual test 30: layout lock replay is not blocked on IO ========================================================== 10:09:01 (1644401341)
      [Wed Feb  9 10:09:05 2022] LustreError: 11-0: lustre-MDT0000-lwp-OST0001: operation ldlm_enqueue to node 10.240.22.123@tcp failed: rc = -107
      [Wed Feb  9 10:09:05 2022] LustreError: Skipped 1 previous similar message
      [Wed Feb  9 10:09:25 2022] Lustre: ll_ost00_045: service thread pid 116830 was inactive for 62.178 seconds. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
      [Wed Feb  9 10:09:25 2022] Pid: 116786, comm: ll_ost00_011 4.18.0-348.2.1.el8_lustre.x86_64 #1 SMP Tue Feb 8 01:15:01 UTC 2022
      [Wed Feb  9 10:09:25 2022] Lustre: Skipped 1 previous similar message
      [Wed Feb  9 10:09:25 2022] Call Trace TBD:
      [Wed Feb  9 10:09:25 2022] [<0>] ldlm_completion_ast+0x7ac/0x900 [ptlrpc]
      [Wed Feb  9 10:09:25 2022] [<0>] ldlm_cli_enqueue_local+0x307/0x860 [ptlrpc]
      [Wed Feb  9 10:09:25 2022] [<0>] ofd_destroy_by_fid+0x235/0x4a0 [ofd]
      [Wed Feb  9 10:09:25 2022] [<0>] ofd_destroy_hdl+0x263/0xa10 [ofd]
      [Wed Feb  9 10:09:25 2022] [<0>] tgt_request_handle+0xc93/0x1a40 [ptlrpc]
      [Wed Feb  9 10:09:25 2022] [<0>] ptlrpc_server_handle_request+0x323/0xbd0 [ptlrpc]
      [Wed Feb  9 10:09:25 2022] [<0>] ptlrpc_main+0xc06/0x1560 [ptlrpc]
      [Wed Feb  9 10:09:25 2022] [<0>] kthread+0x116/0x130
      [Wed Feb  9 10:09:25 2022] [<0>] ret_from_fork+0x35/0x40
      [Wed Feb  9 10:09:25 2022] Pid: 116830, comm: ll_ost00_045 4.18.0-348.2.1.el8_lustre.x86_64 #1 SMP Tue Feb 8 01:15:01 UTC 2022
      [Wed Feb  9 10:09:25 2022] Call Trace TBD:
      [Wed Feb  9 10:09:25 2022] [<0>] ldlm_completion_ast+0x7ac/0x900 [ptlrpc]
      [Wed Feb  9 10:09:25 2022] [<0>] ldlm_cli_enqueue_local+0x307/0x860 [ptlrpc]
      [Wed Feb  9 10:09:25 2022] [<0>] ofd_destroy_by_fid+0x235/0x4a0 [ofd]
      [Wed Feb  9 10:09:25 2022] [<0>] ofd_destroy_hdl+0x263/0xa10 [ofd]
      [Wed Feb  9 10:09:25 2022] [<0>] tgt_request_handle+0xc93/0x1a40 [ptlrpc]
      [Wed Feb  9 10:09:25 2022] [<0>] ptlrpc_server_handle_request+0x323/0xbd0 [ptlrpc]
      [Wed Feb  9 10:09:25 2022] [<0>] ptlrpc_main+0xc06/0x1560 [ptlrpc]
      [Wed Feb  9 10:09:25 2022] [<0>] kthread+0x116/0x130
      [Wed Feb  9 10:09:25 2022] [<0>] ret_from_fork+0x35/0x40
      [Wed Feb  9 10:09:25 2022] Pid: 116816, comm: ll_ost00_031 4.18.0-348.2.1.el8_lustre.x86_64 #1 SMP Tue Feb 8 01:15:01 UTC 2022
      [Wed Feb  9 10:09:25 2022] Call Trace TBD:
      [Wed Feb  9 10:09:25 2022] Lustre: ll_ost00_014: service thread pid 116797 was inactive for 62.212 seconds. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one.
      [Wed Feb  9 10:09:25 2022] [<0>] ldlm_completion_ast+0x7ac/0x900 [ptlrpc]
      [Wed Feb  9 10:09:25 2022] [<0>] ldlm_cli_enqueue_local+0x307/0x860 [ptlrpc]
      [Wed Feb  9 10:09:25 2022] [<0>] ofd_destroy_by_fid+0x235/0x4a0 [ofd]
      [Wed Feb  9 10:09:25 2022] [<0>] ofd_destroy_hdl+0x263/0xa10 [ofd]
      [Wed Feb  9 10:09:25 2022] [<0>] tgt_request_handle+0xc93/0x1a40 [ptlrpc]
      [Wed Feb  9 10:09:25 2022] [<0>] ptlrpc_server_handle_request+0x323/0xbd0 [ptlrpc]
      [Wed Feb  9 10:09:25 2022] [<0>] ptlrpc_main+0xc06/0x1560 [ptlrpc]
      [Wed Feb  9 10:09:25 2022] [<0>] kthread+0x116/0x130
      [Wed Feb  9 10:09:25 2022] [<0>] ret_from_fork+0x35/0x40
      [Wed Feb  9 10:09:26 2022] Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n version 2>/dev/null
      [Wed Feb  9 10:09:27 2022] Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-26vm4.onyx.whamcloud.com: executing set_default_debug -1 all 4
      [Wed Feb  9 10:09:27 2022] Lustre: DEBUG MARKER: onyx-26vm4.onyx.whamcloud.com: executing set_default_debug -1 all 4
      [Wed Feb  9 10:11:49 2022] Lustre: ll_ost00_003: service thread pid 62623 was inactive for 144.982 seconds. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one.
      [Wed Feb  9 10:11:49 2022] Lustre: Skipped 54 previous similar messages
      [Wed Feb  9 10:11:53 2022] Lustre: ll_ost00_052: service thread pid 118905 was inactive for 143.741 seconds. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one.
      [Wed Feb  9 10:11:53 2022] Lustre: Skipped 7 previous similar messages
      [Wed Feb  9 10:11:57 2022] Lustre: ll_ost00_072: service thread pid 131846 was inactive for 143.588 seconds. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one.
      [Wed Feb  9 10:11:57 2022] Lustre: Skipped 3 previous similar messages
      [Wed Feb  9 10:12:01 2022] Lustre: ll_ost00_074: service thread pid 131848 was inactive for 144.645 seconds. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one.
      [Wed Feb  9 10:12:01 2022] Lustre: Skipped 2 previous similar messages
      [Wed Feb  9 10:13:31 2022] LustreError: 116830:0:(ldlm_request.c:123:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1644401304, 308s ago); not entering recovery in server code, just going back to sleep ns: filter-lustre-OST0006_UUID lock: 0000000096b27b45/0xca050e0025d0b7f7 lrc: 3/0,1 mode: --/PW res: [0xdee:0x0:0x0].0x0 rrc: 3 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) gid 0 flags: 0x40010080000000 nid: local remote: 0x0 expref: -99 pid: 116830 timeout: 0 lvb_type: 0
      [Wed Feb  9 10:13:31 2022] LustreError: 116830:0:(ldlm_request.c:123:ldlm_expired_completion_wait()) Skipped 47 previous similar messages
      [Wed Feb  9 10:13:31 2022] LustreError: dumping log to /tmp/lustre-log.1644401612.116786
      [Wed Feb  9 10:14:36 2022] LustreError: 131848:0:(ldlm_request.c:123:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1644401377, 300s ago); not entering recovery in server code, just going back to sleep ns: filter-lustre-OST0006_UUID lock: 000000008894b201/0xca050e0025d0be8e lrc: 3/0,1 mode: --/PW res: [0xdf5:0x0:0x0].0x0 rrc: 4 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) gid 0 flags: 0x40010080000000 nid: local remote: 0x0 expref: -99 pid: 131848 timeout: 0 lvb_type: 0
      [Wed Feb  9 10:14:36 2022] LustreError: 131848:0:(ldlm_request.c:123:ldlm_expired_completion_wait()) Skipped 25 previous similar messages
      [Wed Feb  9 10:15:09 2022] LustreError: 131858:0:(ldlm_request.c:123:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1644401380, 330s ago); not entering recovery in server code, just going back to sleep ns: filter-lustre-OST0006_UUID lock: 000000002e1d934b/0xca050e0025d0be9c lrc: 3/0,1 mode: --/PW res: [0xdf2:0x0:0x0].0x0 rrc: 4 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) gid 0 flags: 0x40010080000000 nid: local remote: 0x0 expref: -99 pid: 131858 timeout: 0 lvb_type: 0
      [Wed Feb  9 10:15:09 2022] LustreError: 131858:0:(ldlm_request.c:123:ldlm_expired_completion_wait()) Skipped 3 previous similar messages
      [Wed Feb  9 10:19:18 2022] Lustre: 131861:0:(service.c:1436:ptlrpc_at_send_early_reply()) @@@ Could not add any time (5/5), not sending early reply  req@00000000a56db3f7 x1724276360899264/t0(0) o6->lustre-MDT0000-mdtlov_UUID@10.240.22.123@tcp:639/0 lens 544/432 e 20 to 0 dl 1644401964 ref 2 fl Interpret:/0/0 rc 0/0 job:'osp-syn-2-0.0'
      [Wed Feb  9 10:19:23 2022] Lustre: 131861:0:(service.c:1436:ptlrpc_at_send_early_reply()) @@@ Could not add any time (5/5), not sending early reply  req@00000000ee42b4b0 x1724276360903360/t0(0) o6->lustre-MDT0000-mdtlov_UUID@10.240.22.123@tcp:644/0 lens 544/0 e 20 to 0 dl 1644401969 ref 2 fl New:/0/ffffffff rc 0/-1 job:'osp-syn-4-0.0'
      [Wed Feb  9 10:19:23 2022] Lustre: 131861:0:(service.c:1436:ptlrpc_at_send_early_reply()) Skipped 15 previous similar messages
      [Wed Feb  9 10:19:24 2022] Lustre: lustre-OST0001: Client lustre-MDT0000-mdtlov_UUID (at 10.240.22.123@tcp) reconnecting
      [Wed Feb  9 10:19:29 2022] Lustre: lustre-OST0003: Client lustre-MDT0000-mdtlov_UUID (at 10.240.22.123@tcp) reconnecting
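
      Two numeric values in the log above are worth decoding when triaging this failure: the rc = -107 from the failed ldlm_enqueue, and the 18446744073709551615 extent bound in the stuck PW lock dumps. The following standalone C sketch (illustrative only; it is not part of the test script or of the Lustre source) prints both:

      #include <errno.h>
      #include <stdint.h>
      #include <stdio.h>
      #include <string.h>

      int main(void)
      {
              /* Lustre reports RPC failures as negative errnos; -107 is -ENOTCONN
               * ("Transport endpoint is not connected"), i.e. the enqueue RPC was
               * issued while the connection to 10.240.22.123@tcp was down. */
              printf("rc = -107 -> -ENOTCONN = -%d (%s)\n",
                     ENOTCONN, strerror(ENOTCONN));

              /* The extent [0 -> 18446744073709551615] in the lock dumps is simply
               * [0, UINT64_MAX]: a PW extent lock covering the whole OST object. */
              printf("extent end = %llu, matches UINT64_MAX: %s\n",
                     (unsigned long long)UINT64_MAX,
                     18446744073709551615ULL == UINT64_MAX ? "yes" : "no");
              return 0;
      }

      Read together with the stack traces, this shows the ll_ost00_* service threads blocked in ldlm_completion_ast() waiting on whole-object PW locks inside ofd_destroy_by_fid(), which is consistent with the eventual 117-minute test timeout.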
      

      VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
      replay-dual test_30 - Timeout occurred after 117 mins, last suite running was replay-dual

People

  Assignee: WC Triage (wc-triage)
  Reporter: Maloo (maloo)
  Votes: 0
  Watchers: 5