Details
Bug
Resolution: Duplicate
Blocker
None
Lustre 2.4.0
3
6794
Description
Running racer, I hit a problem multiple times where the server completion AST callback (ldlm_server_completion_ast() calling mdt_lvbo_fill()) gets stuck in mdt_object_find() looking up an object.
Alex thinks it is a not-fully-fixed race against object deletion of some sort.
The stack trace looks like this:
[175924.328073] INFO: task ptlrpc_hr01_003:16414 blocked for more than 120 seconds.
[175924.328610] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[175924.329108] ptlrpc_hr01_0 D 0000000000000006 3952 16414 2 0x00000000
[175924.329432] ffff880076a19920 0000000000000046 0000000000000040 0000000000000286
[175924.329950] ffff880076a198a0 0000000000000286 0000000000000286 ffffc9000376b040
[175924.330457] ffff8800573a67b8 ffff880076a19fd8 000000000000fba8 ffff8800573a67b8
[175924.330950] Call Trace:
[175924.331191] [<ffffffffa0743c36>] ? htable_lookup+0x1a6/0x1c0 [obdclass]
[175924.331505] [<ffffffffa041e79e>] cfs_waitq_wait+0xe/0x10 [libcfs]
[175924.331807] [<ffffffffa0744243>] lu_object_find_at+0xb3/0x360 [obdclass]
[175924.332104] [<ffffffff81057d60>] ? default_wake_function+0x0/0x20
[175924.332403] [<ffffffffa07413df>] ? keys_fill+0x6f/0x190 [obdclass]
[175924.332746] [<ffffffffa0744506>] lu_object_find+0x16/0x20 [obdclass]
[175924.333035] [<ffffffffa0549ea6>] mdt_object_find+0x56/0x170 [mdt]
[175924.333398] [<ffffffffa0586e63>] mdt_lvbo_fill+0x2f3/0x800 [mdt]
[175924.333715] [<ffffffffa0845c1a>] ldlm_server_completion_ast+0x18a/0x640 [ptlrpc]
[175924.334204] [<ffffffffa0845a90>] ? ldlm_server_completion_ast+0x0/0x640 [ptlrpc]
[175924.334655] [<ffffffffa081bbdc>] ldlm_work_cp_ast_lock+0xcc/0x200 [ptlrpc]
[175924.334976] [<ffffffffa085c18f>] ptlrpc_set_wait+0x6f/0x880 [ptlrpc]
[175924.335264] [<ffffffff81090154>] ? __init_waitqueue_head+0x24/0x40
[175924.335559] [<ffffffffa041e8a5>] ? cfs_waitq_init+0x15/0x20 [libcfs]
[175924.335867] [<ffffffffa085876e>] ? ptlrpc_prep_set+0x11e/0x300 [ptlrpc]
[175924.336134] [<ffffffffa081bb10>] ? ldlm_work_cp_ast_lock+0x0/0x200 [ptlrpc]
[175924.336444] [<ffffffffa081e19b>] ldlm_run_ast_work+0x1db/0x460 [ptlrpc]
[175924.336767] [<ffffffffa081eda4>] ldlm_reprocess_all+0x114/0x300 [ptlrpc]
[175924.337067] [<ffffffffa08372e3>] ldlm_cli_cancel_local+0x2b3/0x470 [ptlrpc]
[175924.337445] [<ffffffffa083bbab>] ldlm_cli_cancel+0x5b/0x360 [ptlrpc]
[175924.337719] [<ffffffffa083bf42>] ldlm_blocking_ast_nocheck+0x92/0x320 [ptlrpc]
[175924.338177] [<ffffffffa0819070>] ? lock_res_and_lock+0x30/0x50 [ptlrpc]
[175924.338464] [<ffffffffa0549d40>] mdt_blocking_ast+0x190/0x2a0 [mdt]
[175924.338759] [<ffffffffa042e401>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
[175924.339051] [<ffffffff814faf3e>] ? _spin_unlock+0xe/0x10
[175924.339339] [<ffffffffa083f950>] ldlm_handle_bl_callback+0x130/0x400 [ptlrpc]
[175924.339814] [<ffffffffa0820cc6>] ldlm_lock_decref_internal+0x426/0xc80 [ptlrpc]
[175924.340282] [<ffffffff814faf3e>] ? _spin_unlock+0xe/0x10
[175924.340614] [<ffffffffa0712217>] ? class_handle2object+0x97/0x170 [obdclass]
[175924.341175] [<ffffffffa0821f49>] ldlm_lock_decref+0x39/0x90 [ptlrpc]
[175924.341527] [<ffffffffa087112b>] ptlrpc_hr_main+0x39b/0x760 [ptlrpc]
[175924.341824] [<ffffffff81057d60>] ? default_wake_function+0x0/0x20
[175924.342141] [<ffffffffa0870d90>] ? ptlrpc_hr_main+0x0/0x760 [ptlrpc]
[175924.342444] [<ffffffff8100c14a>] child_rip+0xa/0x20
[175924.342734] [<ffffffffa0870d90>] ? ptlrpc_hr_main+0x0/0x760 [ptlrpc]
[175924.343068] [<ffffffffa0870d90>] ? ptlrpc_hr_main+0x0/0x760 [ptlrpc]
[175924.343376] [<ffffffff8100c140>] ? child_rip+0x0/0x20