Details
-
Bug
-
Resolution: Incomplete
-
Major
-
None
-
Lustre 2.10.7
-
None
-
2
-
9223372036854775807
Description
The MDT is under very high load, with many threads stuck in ldlm_completion_ast and
osp_sync_process_queues.
This may be a duplicate of LU-11359.
[2418705.962173] LNet: Service thread pid 34570 was inactive for 400.29s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
[2418706.013418] Pid: 34570, comm: mdt01_236 3.10.0-693.21.1.el7.20180508.x86_64.lustre2106 #1 SMP Wed Jan 30 00:30:34 UTC 2019
[2418706.013419] Call Trace:
[2418706.013433] [<ffffffffa0beba11>] ldlm_completion_ast+0x5b1/0x920 [ptlrpc]
[2418706.034654] [<ffffffffa0becb53>] ldlm_cli_enqueue_local+0x233/0x860 [ptlrpc]
[2418706.034673] [<ffffffffa1248d72>] mdt_object_local_lock+0x512/0xb00 [mdt]
[2418706.034680] [<ffffffffa12493be>] mdt_object_lock_internal+0x5e/0x300 [mdt]
[2418706.034687] [<ffffffffa124a164>] mdt_getattr_name_lock+0x8a4/0x1910 [mdt]
[2418706.034694] [<ffffffffa124b480>] mdt_intent_getattr+0x2b0/0x480 [mdt]
[2418706.034701] [<ffffffffa124746b>] mdt_intent_opc+0x1eb/0xaf0 [mdt]
[2418706.034708] [<ffffffffa124fd08>] mdt_intent_policy+0x138/0x320 [mdt]
[2418706.034727] [<ffffffffa0bd12cd>] ldlm_lock_enqueue+0x38d/0x980 [ptlrpc]
[2418706.034751] [<ffffffffa0bfac13>] ldlm_handle_enqueue0+0xa83/0x1670 [ptlrpc]
[2418706.034787] [<ffffffffa0c80622>] tgt_enqueue+0x62/0x210 [ptlrpc]
[2418706.034818] [<ffffffffa0c8428a>] tgt_request_handle+0x92a/0x1370 [ptlrpc]
[2418706.034845] [<ffffffffa0c2c6cb>] ptlrpc_server_handle_request+0x23b/0xaa0 [ptlrpc]
[2418706.034870] [<ffffffffa0c306b2>] ptlrpc_main+0xa92/0x1e40 [ptlrpc]
[2418706.034873] [<ffffffff810b1131>] kthread+0xd1/0xe0
[2418706.034875] [<ffffffff816a14f7>] ret_from_fork+0x77/0xb0
[2418706.034894] [<ffffffffffffffff>] 0xffffffffffffffff
[2418706.034897] LustreError: dumping log to /tmp/lustre-log.1557763688.34570
PID: 32153 TASK: ffff883ed28d5ee0 CPU: 10 COMMAND: "osp-syn-153-0"
#0 [ffff883ed292f908] __schedule at ffffffff816946af
#1 [ffff883ed292f990] schedule at ffffffff81694cd9
#2 [ffff883ed292f9a0] osp_sync_process_queues at ffffffffa13f8197 [osp]
#3 [ffff883ed292fab0] llog_process_thread at ffffffffa09ad8bb [obdclass]
#4 [ffff883ed292fb78] llog_process_or_fork at ffffffffa09ae5cc [obdclass]
#5 [ffff883ed292fbc0] llog_cat_process_cb at ffffffffa09b3dad [obdclass]
#6 [ffff883ed292fc30] llog_process_thread at ffffffffa09ad8bb [obdclass]
#7 [ffff883ed292fcf8] llog_process_or_fork at ffffffffa09ae5cc [obdclass]
#8 [ffff883ed292fd40] llog_cat_process_or_fork at ffffffffa09b2e49 [obdclass]
#9 [ffff883ed292fdb8] llog_cat_process at ffffffffa09b2f7e [obdclass]
#10 [ffff883ed292fdd8] osp_sync_thread at ffffffffa13f63af [osp]
#11 [ffff883ed292fec8] kthread at ffffffff810b1131
#12 [ffff883ed292ff50] ret_from_fork at ffffffff816a14f7
And
PID: 33442 TASK: ffff881fcfa38000 CPU: 21 COMMAND: "mdt01_073"
#0 [ffff883eb683f718] __schedule at ffffffff816946af
#1 [ffff883eb683f7a8] schedule at ffffffff81694cd9
#2 [ffff883eb683f7b8] schedule_timeout at ffffffff81692574
#3 [ffff883eb683f860] ldlm_completion_ast at ffffffffa0beba11 [ptlrpc]
#4 [ffff883eb683f900] ldlm_cli_enqueue_local at ffffffffa0becb53 [ptlrpc]
#5 [ffff883eb683f998] mdt_object_local_lock at ffffffffa1248d72 [mdt]
#6 [ffff883eb683fa48] mdt_object_lock_internal at ffffffffa12493be [mdt]
#7 [ffff883eb683fa98] mdt_getattr_name_lock at ffffffffa124a164 [mdt]
#8 [ffff883eb683fb20] mdt_intent_getattr at ffffffffa124b480 [mdt]
#9 [ffff883eb683fb60] mdt_intent_opc at ffffffffa124746b [mdt]
#10 [ffff883eb683fbc0] mdt_intent_policy at ffffffffa124fd08 [mdt]
#11 [ffff883eb683fbf8] ldlm_lock_enqueue at ffffffffa0bd12cd [ptlrpc]
#12 [ffff883eb683fc50] ldlm_handle_enqueue0 at ffffffffa0bfac13 [ptlrpc]
#13 [ffff883eb683fce0] tgt_enqueue at ffffffffa0c80622 [ptlrpc]
#14 [ffff883eb683fd00] tgt_request_handle at ffffffffa0c8428a [ptlrpc]
#15 [ffff883eb683fd48] ptlrpc_server_handle_request at ffffffffa0c2c6cb [ptlrpc]
#16 [ffff883eb683fde8] ptlrpc_main at ffffffffa0c306b2 [ptlrpc]
#17 [ffff883eb683fec8] kthread at ffffffff810b1131
#18 [ffff883eb683ff50] ret_from_fork at ffffffff816a14f7
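To put a number on "many threads stuck", backtraces in the format above can be tallied by the first frame that is not a scheduler call. The following is only a rough sketch, not part of any Lustre or crash tooling: the helper names (`blocking_frames`, `tally`) and the set of frames treated as "scheduler" frames are assumptions made for illustration, and the sample input is a shortened excerpt of the two traces in this ticket.

```python
import re
from collections import Counter

# Frames that only indicate the thread is asleep. This set is an assumption;
# extend it if other wait primitives show up at the top of the stacks.
SCHEDULER_FRAMES = {"__schedule", "schedule", "schedule_timeout"}

# Header line of a crash `bt` entry: PID: n TASK: addr CPU: n COMMAND: "name"
PID_RE = re.compile(
    r'^PID:\s*(\d+)\s+TASK:\s*\S+\s+CPU:\s*\d+\s+COMMAND:\s*"([^"]*)"'
)
# Frame line: #n [stack addr] function at text addr [module]
FRAME_RE = re.compile(r'^#(\d+)\s+\[[0-9a-f]+\]\s+(\S+)\s+at\s+[0-9a-f]+')


def blocking_frames(bt_text):
    """Return (pid, command, function) for each thread, where function is
    the first frame that is not in SCHEDULER_FRAMES."""
    results = []
    current = None  # (pid, command) of the thread still awaiting a frame
    for raw in bt_text.splitlines():
        line = raw.strip()
        header = PID_RE.match(line)
        if header:
            current = (int(header.group(1)), header.group(2))
            continue
        frame = FRAME_RE.match(line)
        if frame and current is not None:
            func = frame.group(2)
            if func not in SCHEDULER_FRAMES:
                results.append((current[0], current[1], func))
                current = None  # ignore the remaining frames of this thread
    return results


def tally(bt_text):
    """Count threads per blocking function."""
    return Counter(func for _, _, func in blocking_frames(bt_text))


# Shortened excerpt of the two backtraces from this ticket.
SAMPLE = '''
PID: 32153 TASK: ffff883ed28d5ee0 CPU: 10 COMMAND: "osp-syn-153-0"
#0 [ffff883ed292f908] __schedule at ffffffff816946af
#1 [ffff883ed292f990] schedule at ffffffff81694cd9
#2 [ffff883ed292f9a0] osp_sync_process_queues at ffffffffa13f8197 [osp]
#3 [ffff883ed292fab0] llog_process_thread at ffffffffa09ad8bb [obdclass]
PID: 33442 TASK: ffff881fcfa38000 CPU: 21 COMMAND: "mdt01_073"
#0 [ffff883eb683f718] __schedule at ffffffff816946af
#1 [ffff883eb683f7a8] schedule at ffffffff81694cd9
#2 [ffff883eb683f7b8] schedule_timeout at ffffffff81692574
#3 [ffff883eb683f860] ldlm_completion_ast at ffffffffa0beba11 [ptlrpc]
#4 [ffff883eb683f900] ldlm_cli_enqueue_local at ffffffffa0becb53 [ptlrpc]
'''

if __name__ == "__main__":
    for func, count in tally(SAMPLE).most_common():
        print(count, func)
```

Feeding it the saved output of a full backtrace dump instead of `SAMPLE` would show how the stuck threads split between ldlm_completion_ast and osp_sync_process_queues.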
Issue Links
- is related to LU-14611 racer test 1 hangs in ls/locking (Resolved)