Prepare lustre for adoption into the linux kernel (LU-9679)

[LU-8980] Add tracepoint support to Lustre Created: 28/Dec/16  Updated: 28/Oct/23

Status: Reopened
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.10.0
Fix Version/s: Lustre 2.11.0

Type: Technical task Priority: Critical
Reporter: James A Simmons Assignee: James A Simmons
Resolution: Unresolved Votes: 2
Labels: None

Issue Links:
Related
is related to LU-14428 Convert tracefile to use ring_buffer ... Open
is related to LU-9679 Prepare lustre for adoption into the ... Resolved
is related to LU-8145 start dump trace thread once CERROR Resolved
is related to LU-7801 Function level debug message filter Closed
is related to LU-8223 De-Noise LNet neterr logs so they can... Open
Epic/Theme: upstream
Rank (Obsolete): 9223372036854775807

 Description   

One of the major obstacles to the movement of the upstream client out of staging is our debug logging system. The libcfs debugging infrastructure was developed long before trace point existed.
Today trace point + perf can provide far more functionally than what is offered by lctl debug. This ticket covers the work need to implement this functionality.



 Comments   
Comment by James A Simmons [ 28/Dec/16 ]

I do have a first patch but I need to add the event filtering base on the debug flags. Here is the sample output I'm seeing.

# tracer: nop
#
# entries-in-buffer/entries-written: 2121/11382   #P:8
#
#                              _-----=> irqs-off
#                             / _----=> need-resched
#                            | / _---=> hardirq/softirq
#                            || / _--=> preempt-depth
#                            ||| /     delay
#           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
#              | |       |   ||||       |         |
modprobe-10993 [007] ....  6584.386683: lnet_config: (linux-crypto.c:377:cfs_crypto_performance_test) Crypto hash algorithm crc32 speed = 1427 MB/s
.....
Comment by Gerrit Updater [ 30/Dec/16 ]

James Simmons (uja.ornl@yahoo.com) uploaded a new patch: https://review.whamcloud.com/24554
Subject: LU-8980 libcfs: add tracepoint support for libcfs
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: cd9d0fcbeddb0a0a00b58251bc315aff9b92ef39

Comment by James A Simmons [ 31/Dec/16 ]

Also have llite in the works. Here is the output I have so far.

kworker/u17:2-5181 [001] .... 2135.478773: llite_vfstrace: (vvp_io.c:306:vvp_io_fini) [0x200005222:0x13a4:0x0] ignore/verify layout 0/0, layout version 0 restore needed 0

sendfile_groupl-23957 [002] .... 2135.480731: llite_vfstrace: (file.c:3992:ll_layout_conf) [0x200005222:0x13a4:0x0]: layout version change: 0 -> 0

Comment by Gerrit Updater [ 31/Dec/16 ]

James Simmons (uja.ornl@yahoo.com) uploaded a new patch: https://review.whamcloud.com/24560
Subject: LU-8980 llite: add tracepoint support for llite layer
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 7fffac5a94243a03d92381cfd82340188d66684c

Comment by James A Simmons [ 19/Jan/17 ]

Just pushed a version of tracepoint that handles printing messages to the console. Most of the infrastructure is now in place. Only a few issues to resolve.

1) Currently ratelimting is per module. For the lustre debugging code it has a individual rate limit for every single message. I doubt that       would be acceptable upstream.

2) Need to fine tune the printk rate limiting based on module parameters in libcfs.

3) Some of the debug message mask are conditional i.e "CDEBUG(warn ? D_WARNING : D_INFO". Might need to create special               macros for this. Haven't figured a clean solution. Another option is break it up into two separate trace events.

For the most part its ready for review. Peter who would be good choices to look at this work?

Comment by Dmitry Eremin (Inactive) [ 07/Feb/17 ]

James,

This is not I expected to see when I vote for this feature. I expect to see correct function name in "FUNCTION" field instead of category like "lnet_info".

# tracer: nop
#
# entries-in-buffer/entries-written: 58/61   #P:4
#
#                              _-----=> irqs-off
#                             / _----=> need-resched
#                            | / _---=> hardirq/softirq
#                            || / _--=> preempt-depth
#                            ||| /     delay
#           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
#              | |       |   ||||       |         |
      ldlm_bl_01-15147 [003] .... 189369.402891: lnet_info: (linux-crypto.c:119:cfs_crypto_hash_alloc) Using crypto hash: adler32 (adler32-zlib) speed 2454 MB/s

 ll_ost_io00_002-15366 [001] .... 189369.403051: lnet_info: (linux-crypto.c:119:cfs_crypto_hash_alloc) Using crypto hash: adler32 (adler32-zlib) speed 2454 MB/s

##### CPU 0 buffer started ####
   ptlrpcd_00_03-14400 [000] .... 189383.834528: lnet_info: (linux-crypto.c:119:cfs_crypto_hash_alloc) Using crypto hash: adler32 (adler32-zlib) speed 2454 MB/s

 ll_ost_io00_002-15366 [001] .... 189383.835278: lnet_info: (linux-crypto.c:119:cfs_crypto_hash_alloc) Using crypto hash: adler32 (adler32-zlib) speed 2454 MB/s

##### CPU 2 buffer started ####
   ptlrpcd_00_01-14398 [002] .... 189383.848857: lnet_info: (linux-crypto.c:119:cfs_crypto_hash_alloc) Using crypto hash: adler32 (adler32-zlib) speed 2454 MB/s

It would be very beneficially to be able to select appropriate function name for tracing:

# perf list | grep lnet
  lnet:lnet_config                                   [Tracepoint event]
  lnet:lnet_console                                  [Tracepoint event]
  lnet:lnet_emerg                                    [Tracepoint event]
  lnet:lnet_error                                    [Tracepoint event]
  lnet:lnet_ha                                       [Tracepoint event]
  lnet:lnet_info                                     [Tracepoint event]
  lnet:lnet_ioctl                                    [Tracepoint event]
  lnet:lnet_neterror                                 [Tracepoint event]
  lnet:lnet_other                                    [Tracepoint event]
  lnet:lnet_warning                                  [Tracepoint event]

Right now, I can select for tracing only category instead of exact function I interested in.

Comment by James A Simmons [ 09/Feb/17 ]

Function in this case means the name the compiler generates from the TRACE_EVENT macro. For example DEFINE_EVENT(lnet_log_msg, lnet_config, ... --> becomes lnet_config(...) in the source file. So they are not regular function but more like event markers. One of major goals of this project is to replace the current lctl debugging infrastructure since this is a requirement upstream. So I worked the code to map to the behavior lctl debug is doing which is category based. You will see CDEBUG(D_HA,...) maps to lnet_ha(...) etc. By the way the reason I used names like lnet_info, lnet_ioctl is to see clearly in the logs which subsys the event is showing up in. Perhaps

ptlrpcd_00_01-14398 [002] .... 189383.848857: info: lnet:(linux-crypto.c:119:cfs_crypto_hash_alloc) Using crypto hash: ...

I have to look at the logs perf creates to see if it makes sense to the user.

As for limiting debugging to an exact function you can do that with filtering.
perf ... -filter cfs_hash_*. You can do this by module as well "-filter :mod:libcfs".
 

Comment by Dmitry Eremin (Inactive) [ 09/Feb/17 ]

Unfortunately tracepoints don't allow to filter by "const char *" fields. I was not able to specify filter by file or function name because of it's interpreted as integer.

Can we change the type to fixed array of char or dynamic array of char? I encountered that only such types allow to specify filter with wild card characters (*,?).

 

Comment by Dmitry Eremin (Inactive) [ 10/Feb/17 ]

This patch allow string filtering by file and function name.

diff --git a/libcfs/include/libcfs/libcfs_trace.h b/libcfs/include/libcfs/libcfs_trace.h
index c2aecdb..5d210cb 100644
--- a/libcfs/include/libcfs/libcfs_trace.h
+++ b/libcfs/include/libcfs/libcfs_trace.h
@@ -67,20 +67,20 @@ DECLARE_EVENT_CLASS(libcfs_log_msg,
                 struct va_format *vaf),
        TP_ARGS(msg_file, msg_line, msg_fn, vaf),
        TP_STRUCT__entry(
-               __field(const char *, msg_file)
+               __string(msg_file, msg_file ? msg_file : "<nofile>")
                __field(int, msg_line)
-               __field(const char *, msg_fn)
+               __string(msg_func, msg_fn ? msg_fn : "<nofunc>")
                __dynamic_array(char, msg, CFS_TRACE_CONSOLE_BUFFER_SIZE)
        ),
        TP_fast_assign(
-               __entry->msg_file = msg_file;
+               __assign_str(msg_file, msg_file);
                __entry->msg_line = msg_line;
-               __entry->msg_fn = msg_fn;
+               __assign_str(msg_func, msg_fn);
                vsnprintf(__get_dynamic_array(msg),
                          CFS_TRACE_CONSOLE_BUFFER_SIZE, vaf->fmt, *vaf->va);
        ),
-       TP_printk("(%s:%d:%s) %s", __entry->msg_file, __entry->msg_line,
-                 __entry->msg_fn, __get_str(msg))
+       TP_printk("(%s:%d:%s) %s", __get_str(msg_file), __entry->msg_line,
+                 __get_str(msg_func), __get_str(msg))
 );
 
 DEFINE_EVENT(libcfs_log_msg, libcfs_config,
@@ -118,20 +118,20 @@ DECLARE_EVENT_CLASS(libcfs_log_msg_limited,
                 const char *msg_fn, struct va_format *vaf),
        TP_ARGS(rs, msg_file, msg_line, msg_fn, vaf),
        TP_STRUCT__entry(
-               __field(const char *, msg_file)
+               __string(msg_file, msg_file ? msg_file : "<nofile>")
                __field(int, msg_line)
-               __field(const char *, msg_fn)
+               __string(msg_func, msg_fn ? msg_fn : "<nofunc>")
                __dynamic_array(char, msg, CFS_TRACE_CONSOLE_BUFFER_SIZE)
        ),
        TP_fast_assign(
-               __entry->msg_file = msg_file;
+               __assign_str(msg_file, msg_file);
                __entry->msg_line = msg_line;
-               __entry->msg_fn = msg_fn;
+               __assign_str(msg_func, msg_fn);
                vsnprintf(__get_dynamic_array(msg),
                          CFS_TRACE_CONSOLE_BUFFER_SIZE, vaf->fmt, *vaf->va);
        ),
-       TP_printk("(%s:%d:%s) %s", __entry->msg_file, __entry->msg_line,
-                 __entry->msg_fn, __get_str(msg))
+       TP_printk("(%s:%d:%s) %s", __get_str(msg_file), __entry->msg_line,
+                 __get_str(msg_func), __get_str(msg))
 );
 
 DEFINE_EVENT_CONDITION(libcfs_log_msg_limited, libcfs_console,

Probably with this patch the implementation will be suitable enough to replace current logging system.

Comment by James A Simmons [ 10/Feb/17 ]

Oh now I understand what you mean by your filtering. Thanks for making it clearer. BTW its really  awesome to be able to filter that way.

Comment by James A Simmons [ 13/Feb/17 ]

Dmitry how does it look to you now. BTW does anyone even know what the error codes for LCONSOLE_ERROR mean anymore?

Comment by James A Simmons [ 31/Mar/17 ]

I pushed a new patch that has been updated and I fixed the build issue on RHEL6. Only the CERROR need to be ported but I like to have feedback on the patch before I complete  the remaining work.

Comment by James A Simmons [ 04/Apr/17 ]

It lives. I have completed a functional patch with a complete port of all the trace point events. Please give it a review.

Comment by Gerrit Updater [ 28/Oct/23 ]

"Timothy Day <timday@amazon.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/52875
Subject: LU-8980 debug: redefine CDEBUG to use tracing
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 7be979ac832dcafde22ca68737b4c37459dca83f

Generated at Sat Feb 10 02:22:12 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.