Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-12354

MDT threads stuck at ldlm_completion_ast

    XMLWordPrintable

Details

    • Bug
    • Resolution: Incomplete
    • Major
    • None
    • Lustre 2.10.7
    • None
    • 2
    • 9223372036854775807

    Description

      MDT with very high load and lots of threads stuck at ldlm_completion_ast and
      osp_sync_process_queues.

      May be a dup of LU-11359

      [2418705.962173] LNet: Service thread pid 34570 was inactive for 400.29s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
      [2418706.013418] Pid: 34570, comm: mdt01_236 3.10.0-693.21.1.el7.20180508.x86_64.lustre2106 #1 SMP Wed Jan 30 00:30:34 UTC 2019
      [2418706.013419] Call Trace:
      [2418706.013433]  [<ffffffffa0beba11>] ldlm_completion_ast+0x5b1/0x920 [ptlrpc]
      [2418706.034654]  [<ffffffffa0becb53>] ldlm_cli_enqueue_local+0x233/0x860 [ptlrpc]
      [2418706.034673]  [<ffffffffa1248d72>] mdt_object_local_lock+0x512/0xb00 [mdt]
      [2418706.034680]  [<ffffffffa12493be>] mdt_object_lock_internal+0x5e/0x300 [mdt]
      [2418706.034687]  [<ffffffffa124a164>] mdt_getattr_name_lock+0x8a4/0x1910 [mdt]
      [2418706.034694]  [<ffffffffa124b480>] mdt_intent_getattr+0x2b0/0x480 [mdt]
      [2418706.034701]  [<ffffffffa124746b>] mdt_intent_opc+0x1eb/0xaf0 [mdt]
      [2418706.034708]  [<ffffffffa124fd08>] mdt_intent_policy+0x138/0x320 [mdt]
      [2418706.034727]  [<ffffffffa0bd12cd>] ldlm_lock_enqueue+0x38d/0x980 [ptlrpc]
      [2418706.034751]  [<ffffffffa0bfac13>] ldlm_handle_enqueue0+0xa83/0x1670 [ptlrpc]
      [2418706.034787]  [<ffffffffa0c80622>] tgt_enqueue+0x62/0x210 [ptlrpc]
      [2418706.034818]  [<ffffffffa0c8428a>] tgt_request_handle+0x92a/0x1370 [ptlrpc]
      [2418706.034845]  [<ffffffffa0c2c6cb>] ptlrpc_server_handle_request+0x23b/0xaa0 [ptlrpc]
      [2418706.034870]  [<ffffffffa0c306b2>] ptlrpc_main+0xa92/0x1e40 [ptlrpc]
      [2418706.034873]  [<ffffffff810b1131>] kthread+0xd1/0xe0
      [2418706.034875]  [<ffffffff816a14f7>] ret_from_fork+0x77/0xb0
      [2418706.034894]  [<ffffffffffffffff>] 0xffffffffffffffff
      [2418706.034897] LustreError: dumping log to /tmp/lustre-log.1557763688.34570
      
      PID: 32153  TASK: ffff883ed28d5ee0  CPU: 10  COMMAND: "osp-syn-153-0"
       #0 [ffff883ed292f908] __schedule at ffffffff816946af
       #1 [ffff883ed292f990] schedule at ffffffff81694cd9
       #2 [ffff883ed292f9a0] osp_sync_process_queues at ffffffffa13f8197 [osp]
       #3 [ffff883ed292fab0] llog_process_thread at ffffffffa09ad8bb [obdclass]
       #4 [ffff883ed292fb78] llog_process_or_fork at ffffffffa09ae5cc [obdclass]
       #5 [ffff883ed292fbc0] llog_cat_process_cb at ffffffffa09b3dad [obdclass]
       #6 [ffff883ed292fc30] llog_process_thread at ffffffffa09ad8bb [obdclass]
       #7 [ffff883ed292fcf8] llog_process_or_fork at ffffffffa09ae5cc [obdclass]
       #8 [ffff883ed292fd40] llog_cat_process_or_fork at ffffffffa09b2e49 [obdclass]
       #9 [ffff883ed292fdb8] llog_cat_process at ffffffffa09b2f7e [obdclass]
      #10 [ffff883ed292fdd8] osp_sync_thread at ffffffffa13f63af [osp]
      #11 [ffff883ed292fec8] kthread at ffffffff810b1131
      #12 [ffff883ed292ff50] ret_from_fork at ffffffff816a14f7
      

      And

      PID: 33442  TASK: ffff881fcfa38000  CPU: 21  COMMAND: "mdt01_073"
       #0 [ffff883eb683f718] __schedule at ffffffff816946af
       #1 [ffff883eb683f7a8] schedule at ffffffff81694cd9
       #2 [ffff883eb683f7b8] schedule_timeout at ffffffff81692574
       #3 [ffff883eb683f860] ldlm_completion_ast at ffffffffa0beba11 [ptlrpc]
       #4 [ffff883eb683f900] ldlm_cli_enqueue_local at ffffffffa0becb53 [ptlrpc]
       #5 [ffff883eb683f998] mdt_object_local_lock at ffffffffa1248d72 [mdt]
       #6 [ffff883eb683fa48] mdt_object_lock_internal at ffffffffa12493be [mdt]
       #7 [ffff883eb683fa98] mdt_getattr_name_lock at ffffffffa124a164 [mdt]
       #8 [ffff883eb683fb20] mdt_intent_getattr at ffffffffa124b480 [mdt]
       #9 [ffff883eb683fb60] mdt_intent_opc at ffffffffa124746b [mdt]
      #10 [ffff883eb683fbc0] mdt_intent_policy at ffffffffa124fd08 [mdt]
      #11 [ffff883eb683fbf8] ldlm_lock_enqueue at ffffffffa0bd12cd [ptlrpc]
      #12 [ffff883eb683fc50] ldlm_handle_enqueue0 at ffffffffa0bfac13 [ptlrpc]
      #13 [ffff883eb683fce0] tgt_enqueue at ffffffffa0c80622 [ptlrpc]
      #14 [ffff883eb683fd00] tgt_request_handle at ffffffffa0c8428a [ptlrpc]
      #15 [ffff883eb683fd48] ptlrpc_server_handle_request at ffffffffa0c2c6cb [ptlrpc]
      #16 [ffff883eb683fde8] ptlrpc_main at ffffffffa0c306b2 [ptlrpc]
      #17 [ffff883eb683fec8] kthread at ffffffff810b1131
      #18 [ffff883eb683ff50] ret_from_fork at ffffffff816a14f7
      

      Attachments

        Issue Links

          Activity

            People

              hongchao.zhang Hongchao Zhang
              mhanafi Mahmoud Hanafi
              Votes:
              0 Vote for this issue
              Watchers:
              8 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: