[LU-11305] MDS' all cpu cores eaten up by ldlm_cn [100% cpu sys] Created: 30/Aug/18 Updated: 26/Jul/22 Resolved: 26/Jul/22 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.10.3 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major |
| Reporter: | anhua | Assignee: | Yang Sheng |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | ldlm | ||
| Environment: | centos 7.4, SSD MDT * 2 |
| Attachments: | |
| Issue Links: | |
| Epic/Theme: | ldlm | ||||
| Severity: | 3 | ||||
| Rank (Obsolete): | 9223372036854775807 | ||||
| Description |
|
The MDS would sometimes have all CPU cores at 100% system time, consumed entirely by ldlm_cn threads:

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
...... [a lot more]

When this occurs the MDS stops servicing requests entirely; all clients are close to stuck on every Lustre operation. I am wondering if this might be caused by too many lock cancellations?

Sample logs look like the following:

Aug 28 15:40:13 mds01 kernel: Lustre: 173646:0:(service.c:1296:ptlrpc_at_send_early_reply()) @@@ Already past deadline (
Aug 28 15:45:13 mds01 kernel: LNet: Service thread pid 5649 was inactive for 200.66s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
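One way to check whether cancellations dominate is to sample the per-namespace LDLM pool counters on the MDS. A minimal sketch, assuming the standard lctl parameter names (the exact set of pool files can vary by release):

# Current grant/cancel rates (locks per second) per server-side namespace
lctl get_param ldlm.namespaces.*.pool.grant_rate ldlm.namespaces.*.pool.cancel_rate

# Cumulative pool statistics, including total grants and cancels since mount
lctl get_param ldlm.namespaces.*.pool.stats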
|
| Comments |
| Comment by Ian Costello [ 09/Aug/19 ] |
|
I am seeing the same issue as well, but on the OSSes (DNE setup). The similarity is that the ldlm threads consume all cores except the two excluded via the libcfs cpu_pattern setting. |
| Comment by Aurelien Degremont (Inactive) [ 16/Sep/19 ] |
|
We were able to reproduce a very similar issue. We are seeing an unresponsive MDS, all CPUs at 100%, with lots of ldlm_cn_xxx threads and a few ldlm_bl threads consuming all the CPUs. The ldlm_cn threads have this kind of stack trace:

Aug 26 13:38:31 ip-198-19-95-73 kernel: Lustre: 79258:0:(service.c:2114:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (6:136s); client may timeout. req@ffff88d6dfb47850 x1642929689998160/t0(0) o103->0e4989ad-ae70-2c70-9db7-05f5fafae50e@10.26.86.9@tcp:745/0 lens 3584/0 e 0 to 0 dl 1566826575 ref 1 fl Interpret:/2/ffffffff rc 0/-1
Aug 26 13:38:31 ip-198-19-95-73 kernel: Lustre: 79258:0:(service.c:2114:ptlrpc_server_handle_request()) Skipped 11126 previous similar messages
Aug 26 13:39:11 ip-198-19-95-73 kernel: LNet: Service thread pid 14956 was inactive for 204.11s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
Aug 26 13:39:11 ip-198-19-95-73 kernel: Pid: 14956, comm: mdt01_094 4.14.123-111.109.amzn2.x86_64 #1 SMP Mon Jun 10 19:37:57 UTC 2019
Aug 26 13:39:11 ip-198-19-95-73 kernel: Call Trace:
Aug 26 13:39:11 ip-198-19-95-73 kernel: ldlm_completion_ast+0x63e/0x910 [ptlrpc]
Aug 26 13:39:11 ip-198-19-95-73 kernel: ldlm_cli_enqueue_local+0x27c/0x880 [ptlrpc]
Aug 26 13:39:11 ip-198-19-95-73 kernel: mdt_object_local_lock+0x759/0xa00 [mdt]
Aug 26 13:39:11 ip-198-19-95-73 kernel: mdt_object_lock_internal+0x5b/0x2e0 [mdt]
Aug 26 13:39:11 ip-198-19-95-73 kernel: mdt_reint_open+0xb4e/0x3020 [mdt]
Aug 26 13:39:11 ip-198-19-95-73 kernel: mdt_reint_rec+0x7f/0x250 [mdt]
Aug 26 13:39:11 ip-198-19-95-73 kernel: mdt_reint_internal+0x5ee/0x680 [mdt]
Aug 26 13:39:11 ip-198-19-95-73 kernel: mdt_intent_reint+0x173/0x3e0 [mdt]
Aug 26 13:39:11 ip-198-19-95-73 kernel: mdt_intent_opc+0x385/0xa00 [mdt]
Aug 26 13:39:11 ip-198-19-95-73 kernel: mdt_intent_policy+0x1c5/0x370 [mdt]
Aug 26 13:39:11 ip-198-19-95-73 kernel: ldlm_lock_enqueue+0x234/0x950 [ptlrpc]
Aug 26 13:39:11 ip-198-19-95-73 kernel: ldlm_handle_enqueue0+0xa0b/0x1750 [ptlrpc]
A couple of ldlm_bl_NN threads have this kind of stack, obtained from crash bt -T:

[ffffc9001eecfa58] try_to_wake_up at ffffffff810b3b74
[ffffc9001eecfac8] __update_load_avg_se at ffffffff810b9258
[ffffc9001eecfad8] __update_load_avg_se at ffffffff810b9258
[ffffc9001eecfb10] update_load_avg at ffffffff810bbacd
[ffffc9001eecfb30] account_entity_enqueue at ffffffff810bdb45
[ffffc9001eecfb68] enqueue_entity at ffffffff810bfa63
[ffffc9001eecfbf8] tgt_blocking_ast at ffffffffa0c78a28 [ptlrpc]
[ffffc9001eecfc40] ldlm_export_lock_hash at ffffffffa0bea6d0 [ptlrpc]
--------------
[ffffc9001eecfc48] cfs_hash_bd_from_key at ffffffffa084fad1 [libcfs]
[ffffc9001eecfc68] cfs_hash_rw_lock at ffffffffa084f370 [libcfs]
[ffffc9001eecfc70] cfs_hash_hh_hnode_del at ffffffffa084f3e0 [libcfs]
[ffffc9001eecfc78] cfs_hash_bd_del_locked at ffffffffa084f7df [libcfs]
[ffffc9001eecfc90] cfs_hash_rw_unlock at ffffffffa084ff20 [libcfs]
[ffffc9001eecfc98] cfs_hash_del at ffffffffa085305e [libcfs]
[ffffc9001eecfcd8] cfs_hash_bd_from_key at ffffffffa084fad1 [libcfs]
[ffffc9001eecfd18] cfs_hash_bd_from_key at ffffffffa084fad1 [libcfs]
[ffffc9001eecfd80] ldlm_lock_put at ffffffffa0bc3f73 [ptlrpc]
[ffffc9001eecfd90] cfs_hash_hh_hhead at ffffffffa084f390 [libcfs]
[ffffc9001eecfd98] cfs_hash_for_each_relax at ffffffffa0850458 [libcfs]
[ffffc9001eecfda0] ldlm_cancel_locks_for_export_cb at ffffffffa0bcbe50 [ptlrpc]
[ffffc9001eecfdb8] ldlm_export_lock_put at ffffffffa0bea740 [ptlrpc]
[ffffc9001eecfdf0] ldlm_cancel_locks_for_export_cb at ffffffffa0bcbe50 [ptlrpc]
[ffffc9001eecfe18] cfs_hash_for_each_empty at ffffffffa08538b8 [libcfs]
[ffffc9001eecfe58] ldlm_export_cancel_locks at ffffffffa0bcc10f [ptlrpc]
[ffffc9001eecfe80] ldlm_bl_thread_main at ffffffffa0bf5147 [ptlrpc]
[ffffc9001eecfec0] default_wake_function at ffffffff810b4040
[ffffc9001eecff08] ldlm_bl_thread_main at ffffffffa0bf4c60 [ptlrpc]
[ffffc9001eecff10] kthread at ffffffff810a90fa
[ffffc9001eecff20] kthread at ffffffff810a8fe0
[ffffc9001eecff50] ret_from_fork at ffffffff81800205

perf top:

Samples: 10M of event 'cpu-clock', Event count (approx.): 118365912201
Overhead  Shared Object  Symbol
  45.75%  [libcfs]       [k] cfs_hash_hh_hhead
  21.23%  [libcfs]       [k] cfs_hash_for_each_relax
   6.93%  [kernel]       [k] __x86_indirect_thunk_rax
   3.85%  [obdclass]     [k] class_handle2object
   2.72%  [kernel]       [k] _raw_read_lock
   2.46%  [libcfs]       [k] cfs_hash_rw_unlock

This issue appears when there are lots of locks in the MDT namespace.
ldlm.namespaces.mdt-lustre-MDT0000_UUID.contended_locks=32
ldlm.namespaces.mdt-lustre-MDT0000_UUID.contention_seconds=2
ldlm.namespaces.mdt-lustre-MDT0000_UUID.ctime_age_limit=10
ldlm.namespaces.mdt-lustre-MDT0000_UUID.early_lock_cancel=0
ldlm.namespaces.mdt-lustre-MDT0000_UUID.lock_count=46344331
ldlm.namespaces.mdt-lustre-MDT0000_UUID.lock_timeouts=55
ldlm.namespaces.mdt-lustre-MDT0000_UUID.lock_unused_count=0
ldlm.namespaces.mdt-lustre-MDT0000_UUID.lru_max_age=3900000
ldlm.namespaces.mdt-lustre-MDT0000_UUID.lru_size=9600
ldlm.namespaces.mdt-lustre-MDT0000_UUID.max_nolock_bytes=0
ldlm.namespaces.mdt-lustre-MDT0000_UUID.max_parallel_ast=1024
ldlm.namespaces.mdt-lustre-MDT0000_UUID.resource_count=14281836
ldlm.namespaces.mdt-lustre-MDT0000_UUID.pool.stats=
snapshot_time    1568643075.466635065 secs.nsecs
granted          59817 samples [locks] 0 55589762 171302800233 8049595240895311875
grant            153081925 samples [locks] 1 1 153081925 153081925
cancel           106817757 samples [locks] 1 1 106817757 106817757
grant_rate       59817 samples [locks/s] 0 527737 153081896 36151088814526
cancel_rate      59817 samples [locks/s] 0 315229 106817738 15370244224014
grant_plan       59817 samples [locks/s] 382861 47857748 189705106051 7747978793807775269
slv              59817 samples [slv] 1 1378303200000 77899468648378585 6008560669154165631
recalc_freed     259959295 samples [locks] 0 0 0 0
recalc_timing    59791 samples [sec] 1 52689 140658 3151201290
For anybody hitting a similar issue, I would recommend checking the number of locks on the server side and limiting it on the client side via ldlm.namespaces.<namespace>.lru_max_age and ldlm.namespaces.<namespace>.lru_size (see the sketch below).
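A minimal sketch of how this could be done, assuming the standard lctl tunables; the numeric values are placeholders, not recommendations:

# On the MDS: how many locks and resources each server-side namespace currently holds
lctl get_param ldlm.namespaces.*.lock_count ldlm.namespaces.*.resource_count

# On each client: check the current LRU settings first
lctl get_param ldlm.namespaces.*.lru_size ldlm.namespaces.*.lru_max_age

# Then cap the lock LRU (a non-zero lru_size disables dynamic LRU resizing)
# and/or lower lru_max_age; note that the lru_max_age unit (seconds vs.
# milliseconds) depends on the Lustre release, so derive the value from the
# current setting rather than copying this placeholder
lctl set_param ldlm.namespaces.*.lru_size=2000
lctl set_param ldlm.namespaces.*.lru_max_age=600000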
Is this a scalability issue in the LDLM code? Too many active locks? |
| Comment by Yang Sheng [ 18/Sep/19 ] |
|
Hi Aurelien, would it be possible to share the vmcore with us? Or else just paste the output of 'foreach bt' from crash. Many thanks for your help. Thanks, |
| Comment by Aurelien Degremont (Inactive) [ 19/Sep/19 ] |
|
It is not easy to provide you with a vmcore, but here is a 'foreach bt' log. (This is a new occurrence of the problem, so do not compare this output with the exact output I posted above.) |
| Comment by Yang Sheng [ 26/Jul/22 ] |
|
Please feel free to reopen this once you have new info. |