[LU-14582] nested LDLM locks cause evictions due to RPC-in-flight limit Created: 05/Apr/21 Updated: 26/Nov/21 |
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.16.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Alex Zhuravlev | Assignee: | WC Triage |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
Clients get evicted from the MDS quite often during racer runs. This is due to nested LDLM locks in a DNE setup. Say thread T is working on the client side:
Lustre: mdt00_027: service thread pid 13168 was inactive for 40.338 seconds. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
Pid: 13168, comm: mdt00_027 4.18.0 #36 SMP Thu Mar 25 14:56:29 MSK 2021
Call Trace:
[<0>] ldlm_completion_ast+0x77c/0x8d0 [ptlrpc]
[<0>] ldlm_cli_enqueue_fini+0x9fc/0xe90 [ptlrpc]
[<0>] ldlm_cli_enqueue+0x4d9/0x990 [ptlrpc]
[<0>] osp_md_object_lock+0x154/0x290 [osp]
[<0>] lod_object_lock+0x11a/0x780 [lod]
[<0>] mdt_remote_object_lock_try+0x140/0x370 [mdt]
[<0>] mdt_remote_object_lock+0x1a/0x20 [mdt]
[<0>] mdt_reint_unlink+0x70d/0x2060 [mdt]
[<0>] mdt_reint_rec+0x117/0x240 [mdt]
[<0>] mdt_reint_internal+0x90c/0xab0 [mdt]
[<0>] mdt_reint+0x57/0x100 [mdt]
[<0>] tgt_request_handle+0xbe0/0x1970 [ptlrpc]
[<0>] ptlrpc_main+0x134f/0x30e0 [ptlrpc]
[<0>] kthread+0x100/0x140
[<0>] ret_from_fork+0x24/0x30
[<0>] 0xffffffffffffffff

Lustre: mdt00_001: service thread pid 7729 was inactive for 65.046 seconds. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
Pid: 10908, comm: mdt00_037 4.18.0 #36 SMP Thu Mar 25 14:56:29 MSK 2021
Lustre: Skipped 1 previous similar message
Call Trace:
[<0>] ldlm_completion_ast+0x77c/0x8d0 [ptlrpc]
[<0>] ldlm_cli_enqueue_local+0x27d/0x7f0 [ptlrpc]
[<0>] mdt_object_local_lock+0x479/0xad0 [mdt]
[<0>] mdt_object_lock_internal+0x1d3/0x3f0 [mdt]
[<0>] mdt_getattr_name_lock+0xdb5/0x1f80 [mdt]
[<0>] mdt_intent_getattr+0x25b/0x420 [mdt]
[<0>] mdt_intent_policy+0x659/0xee0 [mdt]
[<0>] ldlm_lock_enqueue+0x418/0x9b0 [ptlrpc]
[<0>] ldlm_handle_enqueue0+0x5d8/0x16c0 [ptlrpc]
[<0>] tgt_enqueue+0x9f/0x200 [ptlrpc]
[<0>] tgt_request_handle+0xbe0/0x1970 [ptlrpc]
[<0>] ptlrpc_main+0x134f/0x30e0 [ptlrpc]
[<0>] kthread+0x100/0x140
[<0>] ret_from_fork+0x24/0x30
[<0>] 0xffffffffffffffff

Pid: 7729, comm: mdt00_001 4.18.0 #36 SMP Thu Mar 25 14:56:29 MSK 2021
Call Trace:
[<0>] ldlm_completion_ast+0x77c/0x8d0 [ptlrpc]
[<0>] ldlm_cli_enqueue_local+0x27d/0x7f0 [ptlrpc]
[<0>] mdt_object_local_lock+0x539/0xad0 [mdt]
[<0>] mdt_object_lock_internal+0x1d3/0x3f0 [mdt]
[<0>] mdt_getattr_name_lock+0x78c/0x1f80 [mdt]
[<0>] mdt_intent_getattr+0x25b/0x420 [mdt]
[<0>] mdt_intent_policy+0x659/0xee0 [mdt]
[<0>] ldlm_lock_enqueue+0x418/0x9b0 [ptlrpc]
[<0>] ldlm_handle_enqueue0+0x5d8/0x16c0 [ptlrpc]
[<0>] tgt_enqueue+0x9f/0x200 [ptlrpc]
[<0>] tgt_request_handle+0xbe0/0x1970 [ptlrpc]
[<0>] ptlrpc_main+0x134f/0x30e0 [ptlrpc]
[<0>] kthread+0x100/0x140
[<0>] ret_from_fork+0x24/0x30
[<0>] 0xffffffffffffffff

Lustre: mdt00_018: service thread pid 10527 was inactive for 65.062 seconds. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one.
LustreError: 7718:0:(ldlm_lockd.c:260:expired_lock_main()) ### lock callback timer expired after 101s: evicting client at 0@lo ns: mdt-lustre-MDT0001_UUID lock: 00000000bd306e1a/0xfbfedfd6efc4a594 lrc: 3/0,0 mode: PR/PR res: [0x240000403:0x172f:0x0].0x0 bits 0x13/0x0 rrc: 3 type: IBT gid 0 flags: 0x60200400000020 nid: 0@lo remote: 0xfbfedfd6efc498ba expref: 705 pid: 11012 timeout: 739 lvb_type: 0
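The shape of the problem, as far as the traces show: one service thread (mdt00_027) appears to hold a local LDLM lock while blocking in a remote enqueue to the other MDT (osp_md_object_lock), while other threads (mdt00_037, mdt00_001) block in local enqueues, presumably on locks the first thread holds. The blocking callback that would break the tie cannot be serviced because of the RPC-in-flight limit, so the callback timer expires and the client is evicted.

Below is a minimal userspace model of such a dependency cycle, built on two plain mutexes. It is not Lustre code and every name in it is invented; in Lustre the cycle runs through the RPC-in-flight limit rather than bare mutexes, but the shape is the same:

/*
 * Minimal userspace model of the nested-lock cycle. NOT Lustre code;
 * all names are invented for illustration only.
 *
 * thread_t: takes lock A (think: local lock on MDT0), then blocks
 *           taking lock B (think: "remote" lock on MDT1).
 * thread_u: takes lock B, then blocks taking lock A.
 */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER; /* "local" lock */
static pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER; /* "remote" lock */

static void *thread_t(void *arg)
{
        (void)arg;
        pthread_mutex_lock(&lock_a);
        sleep(1);                    /* let the other thread win lock B */
        printf("T: holding A, enqueueing B...\n");
        pthread_mutex_lock(&lock_b); /* never returns */
        pthread_mutex_unlock(&lock_b);
        pthread_mutex_unlock(&lock_a);
        return NULL;
}

static void *thread_u(void *arg)
{
        (void)arg;
        pthread_mutex_lock(&lock_b);
        sleep(1);
        printf("U: holding B, enqueueing A...\n");
        pthread_mutex_lock(&lock_a); /* never returns either */
        pthread_mutex_unlock(&lock_a);
        pthread_mutex_unlock(&lock_b);
        return NULL;
}

int main(void)
{
        pthread_t t, u;

        pthread_create(&t, NULL, thread_t, NULL);
        pthread_create(&u, NULL, thread_u, NULL);
        sleep(5);                    /* stand-in for the LDLM watchdog */
        printf("watchdog: threads still inactive; this is where Lustre evicts\n");
        return 0;
}

Built with cc -pthread, both worker threads print their "enqueueing" message and then hang; the watchdog line after five seconds is the analogue of the expired_lock_main() eviction in the log above.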
| Comments |
| Comment by Alex Zhuravlev [ 05/Apr/21 ] |
|
I tend to think we don't really need to hold the first lock during enqueue for UPDATE to another MDS. |
| Comment by Andreas Dilger [ 05/Apr/21 ] |
|
I was going to say the same - once the lookup is complete and we have the remote FID, there is no benefit to holding the first lock. |
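Both comments suggest the same reordering: keep the first lock only long enough to produce the remote FID, then drop it before the blocking enqueue. Here is a compilable sketch of that control-flow change, with invented helper names standing in for the real paths (mdt_remote_object_lock() on the server side, the LOOKUP lock handling in lmv_intent_remote() on the client side); it illustrates the shape of the change only, not an actual patch:

/* All names invented; only the control flow matters here. */
#include <pthread.h>

typedef unsigned long long toy_fid_t;

static pthread_mutex_t parent_lock = PTHREAD_MUTEX_INITIALIZER;

static toy_fid_t lookup_remote_fid(void) { return 0x172fULL; } /* stub */
static void enqueue_remote(toy_fid_t fid) { (void)fid; } /* stands in for the blocking RPC */

/* current, deadlock-prone shape: the local lock is held across the
 * blocking remote enqueue */
static void nested(void)
{
        pthread_mutex_lock(&parent_lock);
        toy_fid_t fid = lookup_remote_fid();
        enqueue_remote(fid);                 /* blocks with parent_lock held */
        pthread_mutex_unlock(&parent_lock);
}

/* suggested shape: once the lookup has produced the remote FID, the
 * first lock has done its job and can be dropped before the enqueue */
static void flat(void)
{
        pthread_mutex_lock(&parent_lock);
        toy_fid_t fid = lookup_remote_fid();
        pthread_mutex_unlock(&parent_lock);  /* nothing held across the RPC */
        enqueue_remote(fid);
}

int main(void)
{
        nested();
        flat();
        return 0;
}

The point is simply that the lookup result (the remote FID) is all the subsequent enqueue needs, so nothing has to be held across the blocking RPC.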
| Comment by Andreas Dilger [ 05/Apr/21 ] |
|
How hard would it be to make a patch to fix this? We definitely see several different DLM timeouts in production these days, and fixing them would be great. |
| Comment by Alex Zhuravlev [ 05/Apr/21 ] |
|
Have to try... just releasing the lock sooner seems to be trivial (in lmv_intent_remote()), but AFAIU this is not enough, as normally we set the lock's data with mdc_set_lock_data() in llite. |
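A toy model of the complication described here, with all names invented: llite attaches the new inode to the LOOKUP lock via mdc_set_lock_data() only after the intent has completed, so if LMV released the lock earlier there would be nothing left to attach the inode to, and a later blocking AST could not find the inode it should invalidate:

/* Toy model; NOT the real ldlm/mdc API. */
#include <stdio.h>

struct toy_lock {
        void *ast_data;   /* models ldlm_lock::l_ast_data */
        int   held;
};

/* stands in for mdc_set_lock_data(), called from llite after the intent */
static void toy_set_lock_data(struct toy_lock *lk, void *inode)
{
        if (!lk->held) {
                printf("lock already released: inode never attached\n");
                return;
        }
        lk->ast_data = inode;
        printf("inode attached; a blocking AST can invalidate it later\n");
}

int main(void)
{
        int inode;                                /* placeholder inode */
        struct toy_lock kept  = { .held = 1 };    /* lock kept until llite runs */
        struct toy_lock early = { .held = 0 };    /* naively released in LMV */

        toy_set_lock_data(&kept, &inode);         /* current behaviour */
        toy_set_lock_data(&early, &inode);        /* what naive early release breaks */
        return 0;
}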
| Comment by Alex Zhuravlev [ 07/Apr/21 ] |
|
There are other cases with nested locks:
|
| Comment by Andreas Dilger [ 09/Apr/21 ] |
|
Oleg, Lai, any comment on this? |
| Comment by Lai Siyao [ 12/Apr/21 ] |
|
Yes, the client should avoid holding locks, but it's a bit nasty for the client layer: inode and dentry preparations need to be registered as callbacks and called in LMV after a successful LOOKUP lock. |
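A hypothetical sketch of the callback arrangement described here (all names invented, not a proposed API): llite registers its inode/dentry preparation work with the intent instead of doing it inline after LMV returns, and LMV invokes the hook as soon as the LOOKUP lock is granted, after which the lock can be released before the remote enqueue:

/* All names invented; only the registration/invocation shape follows
 * the description above. */
#include <stddef.h>

struct toy_intent;

typedef int (*intent_finish_cb)(struct toy_intent *it, void *inode);

struct toy_intent {
        intent_finish_cb it_finish;  /* registered by llite */
        void            *it_inode;
};

/* llite side: the inode/dentry preparation, packaged as a callback
 * (stands in for inode setup plus mdc_set_lock_data()) */
static int llite_finish(struct toy_intent *it, void *inode)
{
        it->it_inode = inode;
        return 0;
}

/* LMV side: run the hook right after the LOOKUP lock is granted;
 * only then release the lock and proceed to the remote enqueue */
static int lmv_lookup_granted(struct toy_intent *it, void *inode)
{
        int rc = it->it_finish ? it->it_finish(it, inode) : 0;
        /* ...drop the LOOKUP lock, enqueue on the remote MDT... */
        return rc;
}

int main(void)
{
        struct toy_intent it = { .it_finish = llite_finish };
        return lmv_lookup_granted(&it, NULL);
}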