Details
-
Improvement
-
Resolution: Fixed
-
Major
-
None
-
None
-
9223372036854775807
Description
Here are ftrace logs for cancelling of 10k ldlm locks on client disconnect
ldlm_cancel_locks_for_export() / cfs_hash_for_each_relax():
First function calls are fast:
# tracer: function_graph # # CPU DURATION FUNCTION CALLS # | | | | | | | 0) + 30.305 us | cfs_hash_for_each_relax [libcfs](); 0) + 13.570 us | cfs_hash_for_each_relax [libcfs](); 0) + 16.458 us | cfs_hash_for_each_relax [libcfs](); 1) + 22.303 us | cfs_hash_for_each_relax [libcfs](); 0) + 11.710 us | cfs_hash_for_each_relax [libcfs](); 0) + 10.227 us | cfs_hash_for_each_relax [libcfs](); 1) + 12.137 us | cfs_hash_for_each_relax [libcfs](); 0) + 10.632 us | cfs_hash_for_each_relax [libcfs](); 1) + 11.427 us | cfs_hash_for_each_relax [libcfs]();
the last ones are critically slow:
3) ! 408.016 us | cfs_hash_for_each_relax [libcfs](); 3) ! 425.611 us | cfs_hash_for_each_relax [libcfs](); 3) ! 411.456 us | cfs_hash_for_each_relax [libcfs](); 3) ! 408.476 us | cfs_hash_for_each_relax [libcfs](); 3) ! 410.266 us | cfs_hash_for_each_relax [libcfs](); 3) ! 409.032 us | cfs_hash_for_each_relax [libcfs](); 3) ! 408.790 us | cfs_hash_for_each_relax [libcfs](); 3) ! 409.062 us | cfs_hash_for_each_relax [libcfs](); 3) ! 409.492 us | cfs_hash_for_each_relax [libcfs](); 3) ! 839.014 us | cfs_hash_for_each_relax [libcfs](); 3) ! 409.712 us | cfs_hash_for_each_relax [libcfs]();
i.e. cfs_hash_for_each_relax becoming slow when first hash buckets are emptied.
it matches the perf (cpu profiling) results in one of the above comments, that CPU consumed mostly by
cfs_hash_hh_hhead() – the function is used to check whether hlists in a bucket are empty.
3.83% 0.00% [k] ret_from_fork - -
|
--3.81%--kthread
|
|--1.85%--ldlm_bl_thread_main
| ldlm_export_cancel_locks
| cfs_hash_for_each_empty
| cfs_hash_for_each_relax
| |
| --1.09%--cfs_hash_hh_hhead
|
ldlm_export_cancel_locks() call consumes 2.3 seconds:
[root@devvm3 tests]# cat ~/trace.txt
# tracer: function_graph
#
# CPU DURATION FUNCTION CALLS
# | | | | | | |
2) $ 2397334 us | ldlm_export_cancel_locks [ptlrpc]();
1) | ldlm_export_cancel_locks [ptlrpc]() {
0) $ 2384598 us | } /* ldlm_export_cancel_locks [ptlrpc] */
[root@devvm3 tests]#
with a small libcfs_hash improvement (will be submitted soon), the function call takes only 0.1 sec:
# tracer: function_graph
#
# CPU DURATION FUNCTION CALLS
# | | | | | | |
0) | ldlm_export_cancel_locks [ptlrpc]() {
0) 8.154 us | lu_env_init [obdclass]();
0) * 91503.19 us | cfs_hash_for_each_empty [libcfs]();
3) | ldlm_export_cancel_locks [ptlrpc]() {
3) 7.818 us | lu_env_init [obdclass]();
3) * 92138.99 us | cfs_hash_for_each_empty [libcfs]();
0) + 34.008 us | ldlm_reprocess_list [ptlrpc]();
0) 1.183 us | libcfs_debug_msg [libcfs]();
0) + 13.812 us | lu_env_fini [obdclass]();
0) * 91564.21 us | }
3) + 36.190 us | ldlm_reprocess_list [ptlrpc]();
3) 1.249 us | libcfs_debug_msg [libcfs]();
3) 9.312 us | lu_env_fini [obdclass]();
3) * 92197.71 us | }
my test script, for a reference:
cd /sys/kernel/debug/tracing/ echo ldlm_export_cancel_locks > set_graph_function # echo cfs_hash_for_each_relax > set_graph_function # echo ldlm_cancel_locks_for_export_cb > set_graph_function # echo ldlm_export_cancel_locks > /sys/kernel/debug/tracing/set_ftrace_filter echo > /sys/kernel/debug/tracing/set_ftrace_filter echo 1 > max_graph_depth echo function_graph > current_tracer echo > trace cd - for n in {1..20}; do touch /mnt/lustre/foo-$n; for x in {1..1000}; do echo ladvise -a lockahead -s $(( 2 * x ))M -l 1M -m WRITE /mnt/lustre/foo-$n; done; done | ../utils/lfs ../utils/lctl get_param ldlm.namespaces.filter-*.lock_count LUUID=$(../utils/lctl get_param -n llite.*.uuid) ../utils/lctl set_param obdfilter.lustre-OST*.evict_client=$LUUID sleep 3 ../utils/lctl get_param ldlm.namespaces.filter-*.lock_count cat /sys/kernel/debug/tracing/trace > ~/trace.txt cat ~/trace.txt