Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-13692

MDS slow/hung threads at mdt_object_local_lock

    XMLWordPrintable

Details

    • Bug
    • Resolution: Fixed
    • Critical
    • Lustre 2.14.0, Lustre 2.12.6
    • Lustre 2.12.4
    • None
    • clients running l2.12.3
    • 2
    • 9223372036854775807

    Description

      MDS threads hang/slow getting stack trace dumps

      Jun 16 14:20:20 nbp8-mds1 kernel: [1660354.377038] Lustre: MGS: Connection restored to 09397a7f-3fe2-2dc8-d25a-74d157cb2008 (at 10.151.50.72@o2ib)
      Jun 16 14:20:20 nbp8-mds1 kernel: [1660354.377044] Lustre: Skipped 277 previous similar messages
      Jun 16 14:22:20 nbp8-mds1 kernel: [1660474.852981] Lustre: 8579:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (30/30), not sending early reply
      Jun 16 14:22:20 nbp8-mds1 kernel: [1660474.852981]   req@ffff8979f9e16300 x1668974534442592/t1531807577764(0) o36->67852282-a3ed-5acb-a9e2-3cae43fe0406@10.151.0.201@o2ib:5/0 lens 488/3152 e 0 to 0 dl 1592342570 ref 2 fl Interpret:/0/0 rc 0/0
      Jun 16 14:22:20 nbp8-mds1 kernel: [1660474.952277] Lustre: 8579:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 134 previous similar messages
      Jun 16 14:26:14 nbp8-mds1 kernel: [1660708.333617] LNet: Service thread pid 14118 was inactive for 551.86s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
      Jun 16 14:26:14 nbp8-mds1 kernel: [1660708.389992] LNet: Skipped 3 previous similar messages
      Jun 16 14:26:14 nbp8-mds1 kernel: [1660708.407185] Pid: 14118, comm: mdt00_115 3.10.0-1062.12.1.el7_lustre2124.x86_64 #1 SMP Tue Mar 17 13:32:19 PDT 2020
      Jun 16 14:26:14 nbp8-mds1 kernel: [1660708.407190] Call Trace:
      Jun 16 14:26:14 nbp8-mds1 kernel: [1660708.407202]  [<ffffffffc0f17c90>] ldlm_completion_ast+0x430/0x860 [ptlrpc]
      Jun 16 14:26:14 nbp8-mds1 kernel: [1660708.412473]  [<ffffffffc0f187b1>] ldlm_cli_enqueue_local+0x231/0x830 [ptlrpc]
      Jun 16 14:26:14 nbp8-mds1 kernel: [1660708.412496]  [<ffffffffc16205cb>] mdt_object_local_lock+0x50b/0xb20 [mdt]
      Jun 16 14:26:14 nbp8-mds1 kernel: [1660708.412506]  [<ffffffffc1620c50>] mdt_object_lock_internal+0x70/0x360 [mdt]
      Jun 16 14:26:14 nbp8-mds1 kernel: [1660708.412516]  [<ffffffffc1620f9c>] mdt_reint_object_lock+0x2c/0x60 [mdt]
      Jun 16 14:26:14 nbp8-mds1 kernel: [1660708.412530]  [<ffffffffc1639efc>] mdt_reint_striped_lock+0x8c/0x510 [mdt]
      Jun 16 14:26:14 nbp8-mds1 kernel: [1660708.412541]  [<ffffffffc163d866>] mdt_reint_setattr+0x676/0x1290 [mdt]
      Jun 16 14:26:14 nbp8-mds1 kernel: [1660708.412552]  [<ffffffffc163f963>] mdt_reint_rec+0x83/0x210 [mdt]
      Jun 16 14:26:14 nbp8-mds1 kernel: [1660708.412562]  [<ffffffffc161c273>] mdt_reint_internal+0x6e3/0xaf0 [mdt]
      Jun 16 14:26:14 nbp8-mds1 kernel: [1660708.412571]  [<ffffffffc16276e7>] mdt_reint+0x67/0x140 [mdt]
      Jun 16 14:26:14 nbp8-mds1 kernel: [1660708.412623]  [<ffffffffc0fb73ca>] tgt_request_handle+0xada/0x1570 [ptlrpc]
      Jun 16 14:26:14 nbp8-mds1 kernel: [1660708.412659]  [<ffffffffc0f5b47b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
      Jun 16 14:26:14 nbp8-mds1 kernel: [1660708.412691]  [<ffffffffc0f5ede4>] ptlrpc_main+0xb34/0x1470 [ptlrpc]
      Jun 16 14:26:14 nbp8-mds1 kernel: [1660708.412696]  [<ffffffff858c61f1>] kthread+0xd1/0xe0
      Jun 16 14:26:14 nbp8-mds1 kernel: [1660708.412699]  [<ffffffff85f8dd37>] ret_from_fork_nospec_end+0x0/0x39
      Jun 16 14:26:14 nbp8-mds1 kernel: [1660708.412722]  [<ffffffffffffffff>] 0xffffffffffffffff
      Jun 16 14:26:14 nbp8-mds1 kernel: [1660708.412725] LustreError: dumping log to /tmp/lustre-log.1592342774.14118
      Jun 16 14:26:14 nbp8-mds1 sec[2849]: SEC_EVENT |msg lustre hung thread
      Jun 16 14:26:15 nbp8-mds1 kernel: [1660709.178343] LNet: Service thread pid 8575 was inactive for 552.75s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
      Jun 16 14:26:15 nbp8-mds1 kernel: [1660709.234458] Pid: 8575, comm: mdt00_043 3.10.0-1062.12.1.el7_lustre2124.x86_64 #1 SMP Tue Mar 17 13:32:19 PDT 2020
      Jun 16 14:26:15 nbp8-mds1 kernel: [1660709.234459] Call Trace:
      Jun 16 14:26:16 nbp8-mds1 kernel: [1660709.234472]  [<ffffffffc0f17c90>] ldlm_completion_ast+0x430/0x860 [ptlrpc]
      Jun 16 14:26:16 nbp8-mds1 kernel: [1660709.239700]  [<ffffffffc0f187b1>] ldlm_cli_enqueue_local+0x231/0x830 [ptlrpc]
      Jun 16 14:26:16 nbp8-mds1 kernel: [1660709.239749]  [<ffffffffc16205cb>] mdt_object_local_lock+0x50b/0xb20 [mdt]
      Jun 16 14:26:16 nbp8-mds1 kernel: [1660709.239763]  [<ffffffffc1620c50>] mdt_object_lock_internal+0x70/0x360 [mdt]
      Jun 16 14:26:16 nbp8-mds1 kernel: [1660709.239773]  [<ffffffffc1620f9c>] mdt_reint_object_lock+0x2c/0x60 [mdt]
      Jun 16 14:26:16 nbp8-mds1 kernel: [1660709.239785]  [<ffffffffc1639efc>] mdt_reint_striped_lock+0x8c/0x510 [mdt]
      Jun 16 14:26:16 nbp8-mds1 kernel: [1660709.239797]  [<ffffffffc163d866>] mdt_reint_setattr+0x676/0x1290 [mdt]
      Jun 16 14:26:16 nbp8-mds1 kernel: [1660709.239809]  [<ffffffffc163f963>] mdt_reint_rec+0x83/0x210 [mdt]
      Jun 16 14:26:16 nbp8-mds1 kernel: [1660709.239819]  [<ffffffffc161c273>] mdt_reint_internal+0x6e3/0xaf0 [mdt]
      Jun 16 14:26:16 nbp8-mds1 kernel: [1660709.239829]  [<ffffffffc16276e7>] mdt_reint+0x67/0x140 [mdt]
      Jun 16 14:26:16 nbp8-mds1 kernel: [1660709.239874]  [<ffffffffc0fb73ca>] tgt_request_handle+0xada/0x1570 [ptlrpc]
      Jun 16 14:26:16 nbp8-mds1 kernel: [1660709.239909]  [<ffffffffc0f5b47b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
      Jun 16 14:26:16 nbp8-mds1 kernel: [1660709.239945]  [<ffffffffc0f5ede4>] ptlrpc_main+0xb34/0x1470 [ptlrpc]
      Jun 16 14:26:16 nbp8-mds1 kernel: [1660709.239952]  [<ffffffff858c61f1>] kthread+0xd1/0xe0
      Jun 16 14:26:16 nbp8-mds1 kernel: [1660709.239958]  [<ffffffff85f8dd37>] ret_from_fork_nospec_end+0x0/0x39
      Jun 16 14:26:16 nbp8-mds1 kernel: [1660709.239982]  [<ffffffffffffffff>] 0xffffffffffffffff
      Jun 16 14:26:16 nbp8-mds1 kernel: [1660709.239988] Pid: 14083, comm: mdt00_088 3.10.0-1062.12.1.el7_lustre2124.x86_64 #1 SMP Tue Mar 17 13:32:19 PDT 2020
      Jun 16 14:26:16 nbp8-mds1 kernel: [1660709.239988] Call Trace:
      Jun 16 14:26:16 nbp8-mds1 kernel: [1660709.240020]  [<ffffffffc0f17c90>] ldlm_completion_ast+0x430/0x860 [ptlrpc]
      Jun 16 14:26:16 nbp8-mds1 kernel: [1660709.240050]  [<ffffffffc0f187b1>] ldlm_cli_enqueue_local+0x231/0x830 [ptlrpc]
      Jun 16 14:26:16 nbp8-mds1 kernel: [1660709.240060]  [<ffffffffc16205cb>] mdt_object_local_lock+0x50b/0xb20 [mdt]
      Jun 16 14:26:16 nbp8-mds1 kernel: [1660709.240071]  [<ffffffffc1620c50>] mdt_object_lock_internal+0x70/0x360 [mdt]
      Jun 16 14:26:16 nbp8-mds1 kernel: [1660709.240082]  [<ffffffffc1620f9c>] mdt_reint_object_lock+0x2c/0x60 [mdt]
      Jun 16 14:26:16 nbp8-mds1 kernel: [1660709.240093]  [<ffffffffc1639efc>] mdt_reint_striped_lock+0x8c/0x510 [mdt]
      Jun 16 14:26:16 nbp8-mds1 kernel: [1660709.240104]  [<ffffffffc163d866>] mdt_reint_setattr+0x676/0x1290 [mdt]
      Jun 16 14:26:16 nbp8-mds1 kernel: [1660709.240116]  [<ffffffffc163f963>] mdt_reint_rec+0x83/0x210 [mdt]
      Jun 16 14:26:16 nbp8-mds1 kernel: [1660709.240127]  [<ffffffffc161c273>] mdt_reint_internal+0x6e3/0xaf0 [mdt]
      Jun 16 14:26:16 nbp8-mds1 kernel: [1660709.240137]  [<ffffffffc16276e7>] mdt_reint+0x67/0x140 [mdt]
      Jun 16 14:26:16 nbp8-mds1 kernel: [1660709.240174]  [<ffffffffc0fb73ca>] tgt_request_handle+0xada/0x1570 [ptlrpc]
      Jun 16 14:26:16 nbp8-mds1 kernel: [1660709.240207]  [<ffffffffc0f5b47b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
      Jun 16 14:26:16 nbp8-mds1 kernel: [1660709.240240]  [<ffffffffc0f5ede4>] ptlrpc_main+0xb34/0x1470 [ptlrpc]
      Jun 16 14:26:16 nbp8-mds1 kernel: [1660709.240244]  [<ffffffff858c61f1>] kthread+0xd1/0xe0
      Jun 16 14:26:16 nbp8-mds1 kernel: [1660709.240247]  [<ffffffff85f8dd37>] ret_from_fork_nospec_end+0x0/0x39
      Jun 16 14:26:16 nbp8-mds1 kernel: [1660709.240251]  [<ffffffffffffffff>] 0xffffffffffffffff
      Jun 16 14:26:16 nbp8-mds1 kernel: [1660709.240254] Pid: 10527, comm: mdt00_069 3.10.0-1062.12.1.el7_lustre2124.x86_64 #1 SMP Tue Mar 17 13:32:19 PDT 2020
      Jun 16 14:26:16 nbp8-mds1 kernel: [1660709.240254] Call Trace:
      Jun 16 14:26:16 nbp8-mds1 kernel: [1660709.240285]  [<ffffffffc0f17c90>] ldlm_completion_ast+0x430/0x860 [ptlrpc]
      Jun 16 14:26:16 nbp8-mds1 kernel: [1660709.240313]  [<ffffffffc0f187b1>] ldlm_cli_enqueue_local+0x231/0x830 [ptlrpc]
      Jun 16 14:26:16 nbp8-mds1 kernel: [1660709.240325]  [<ffffffffc16205cb>] mdt_object_local_lock+0x50b/0xb20 [mdt]
      Jun 16 14:26:16 nbp8-mds1 kernel: [1660709.240335]  [<ffffffffc1620c50>] mdt_object_lock_internal+0x70/0x360 [mdt]
      Jun 16 14:26:16 nbp8-mds1 kernel: [1660709.240344]  [<ffffffffc1620f9c>] mdt_reint_object_lock+0x2c/0x60 [mdt]
      Jun 16 14:26:16 nbp8-mds1 kernel: [1660709.240357]  [<ffffffffc1639efc>] mdt_reint_striped_lock+0x8c/0x510 [mdt]
      Jun 16 14:26:16 nbp8-mds1 kernel: [1660709.240369]  [<ffffffffc163d866>] mdt_reint_setattr+0x676/0x1290 [mdt]
      Jun 16 14:26:16 nbp8-mds1 kernel: [1660709.240380]  [<ffffffffc163f963>] mdt_reint_rec+0x83/0x210 [mdt]
      Jun 16 14:26:16 nbp8-mds1 kernel: [1660709.240389]  [<ffffffffc161c273>] mdt_reint_internal+0x6e3/0xaf0 [mdt]
      Jun 16 14:26:16 nbp8-mds1 kernel: [1660709.240400]  [<ffffffffc16276e7>] mdt_reint+0x67/0x140 [mdt]
      Jun 16 14:26:16 nbp8-mds1 kernel: [1660709.240439]  [<ffffffffc0fb73ca>] tgt_request_handle+0xada/0x1570 [ptlrpc]
      Jun 16 14:26:16 nbp8-mds1 kernel: [1660709.240478]  [<ffffffffc0f5b47b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
      Jun 16 14:26:16 nbp8-mds1 kernel: [1660709.240510]  [<ffffffffc0f5ede4>] ptlrpc_main+0xb34/0x1470 [ptlrpc]
      Jun 16 14:26:16 nbp8-mds1 kernel: [1660709.240514]  [<ffffffff858c61f1>] kthread+0xd1/0xe0
      Jun 16 14:26:16 nbp8-mds1 kernel: [1660709.240516]  [<ffffffff85f8dd37>] ret_from_fork_nospec_end+0x0/0x39
      Jun 16 14:26:16 nbp8-mds1 kernel: [1660709.240520]  [<ffffffffffffffff>] 0xffffffffffffffff
      Jun 16 14:26:16 nbp8-mds1 kernel: [1660709.240523] Pid: 14084, comm: mdt00_089 3.10.0-1062.12.1.el7_lustre2124.x86_64 #1 SMP Tue Mar 17 13:32:19 PDT 2020
      Jun 16 14:26:16 nbp8-mds1 kernel: [1660709.240523] Call Trace:
      Jun 16 14:26:16 nbp8-mds1 kernel: [1660709.240554]  [<ffffffffc0f17c90>] ldlm_completion_ast+0x430/0x860 [ptlrpc]
      Jun 16 14:26:16 nbp8-mds1 kernel: [1660709.240585]  [<ffffffffc0f187b1>] ldlm_cli_enqueue_local+0x231/0x830 [ptlrpc]
      Jun 16 14:26:16 nbp8-mds1 kernel: [1660709.240598]  [<ffffffffc16205cb>] mdt_object_local_lock+0x50b/0xb20 [mdt]
      Jun 16 14:26:16 nbp8-mds1 kernel: [1660709.240610]  [<ffffffffc1620c50>] mdt_object_lock_internal+0x70/0x360 [mdt]
      Jun 16 14:26:16 nbp8-mds1 kernel: [1660709.240620]  [<ffffffffc1620f9c>] mdt_reint_object_lock+0x2c/0x60 [mdt]
      Jun 16 14:26:16 nbp8-mds1 kernel: [1660709.240631]  [<ffffffffc1639efc>] mdt_reint_striped_lock+0x8c/0x510 [mdt]
      Jun 16 14:26:16 nbp8-mds1 kernel: [1660709.240647]  [<ffffffffc163d866>] mdt_reint_setattr+0x676/0x1290 [mdt]
      Jun 16 14:26:16 nbp8-mds1 kernel: [1660709.240657]  [<ffffffffc163f963>] mdt_reint_rec+0x83/0x210 [mdt]
      Jun 16 14:26:16 nbp8-mds1 kernel: [1660709.240666]  [<ffffffffc161c273>] mdt_reint_internal+0x6e3/0xaf0 [mdt]
      Jun 16 14:26:16 nbp8-mds1 kernel: [1660709.240677]  [<ffffffffc16276e7>] mdt_reint+0x67/0x140 [mdt]
      Jun 16 14:26:16 nbp8-mds1 kernel: [1660709.240716]  [<ffffffffc0fb73ca>] tgt_request_handle+0xada/0x1570 [ptlrpc]
      Jun 16 14:26:16 nbp8-mds1 kernel: [1660709.240749]  [<ffffffffc0f5b47b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
      Jun 16 14:26:16 nbp8-mds1 kernel: [1660709.240781]  [<ffffffffc0f5ede4>] ptlrpc_main+0xb34/0x1470 [ptlrpc]
      Jun 16 14:26:16 nbp8-mds1 kernel: [1660709.240784]  [<ffffffff858c61f1>] kthread+0xd1/0xe0
      Jun 16 14:26:16 nbp8-mds1 kernel: [1660709.240788]  [<ffffffff85f8dd37>] ret_from_fork_nospec_end+0x0/0x39
      Jun 16 14:26:16 nbp8-mds1 kernel: [1660709.240792]  [<ffffffffffffffff>] 0xffffffffffffffff
      Jun 16 14:26:16 nbp8-mds1 kernel: [1660709.240796] LNet: Service thread pid 8568 was inactive for 552.81s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one.
       

      Attachments

        1. lustre-log.1592343544.8525
          4.75 MB
          Mahmoud Hanafi
        2. messages
          51 kB
          Mahmoud Hanafi
        3. vmcore-dmesg.txt
          1021 kB
          Mahmoud Hanafi

        Issue Links

          Activity

            People

              green Oleg Drokin
              mhanafi Mahmoud Hanafi
              Votes:
              0 Vote for this issue
              Watchers:
              15 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: