Details

    • Technical task
    • Resolution: Unresolved
    • Critical
    • Lustre 2.11.0
    • Lustre 2.10.0
    • None
    • 9223372036854775807

    Description

      One of the major obstacles to the movement of the upstream client out of staging is our debug logging system. The libcfs debugging infrastructure was developed long before trace point existed.
      Today trace point + perf can provide far more functionally than what is offered by lctl debug. This ticket covers the work need to implement this functionality.

      Attachments

        Issue Links

          Activity

            [LU-8980] Add tracepoint support to Lustre

            "Timothy Day <timday@amazon.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/52875
            Subject: LU-8980 debug: redefine CDEBUG to use tracing
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 7be979ac832dcafde22ca68737b4c37459dca83f

            gerrit Gerrit Updater added a comment - "Timothy Day <timday@amazon.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/52875 Subject: LU-8980 debug: redefine CDEBUG to use tracing Project: fs/lustre-release Branch: master Current Patch Set: 1 Commit: 7be979ac832dcafde22ca68737b4c37459dca83f

            It lives. I have completed a functional patch with a complete port of all the trace point events. Please give it a review.

            simmonsja James A Simmons added a comment - It lives. I have completed a functional patch with a complete port of all the trace point events. Please give it a review.

            I pushed a new patch that has been updated and I fixed the build issue on RHEL6. Only the CERROR need to be ported but I like to have feedback on the patch before I complete  the remaining work.

            simmonsja James A Simmons added a comment - I pushed a new patch that has been updated and I fixed the build issue on RHEL6. Only the CERROR need to be ported but I like to have feedback on the patch before I complete  the remaining work.
            simmonsja James A Simmons added a comment - - edited

            Dmitry how does it look to you now. BTW does anyone even know what the error codes for LCONSOLE_ERROR mean anymore?

            simmonsja James A Simmons added a comment - - edited Dmitry how does it look to you now. BTW does anyone even know what the error codes for LCONSOLE_ERROR mean anymore?
            simmonsja James A Simmons added a comment - - edited

            Oh now I understand what you mean by your filtering. Thanks for making it clearer. BTW its really  awesome to be able to filter that way.

            simmonsja James A Simmons added a comment - - edited Oh now I understand what you mean by your filtering. Thanks for making it clearer. BTW its really  awesome to be able to filter that way.

            This patch allow string filtering by file and function name.

            diff --git a/libcfs/include/libcfs/libcfs_trace.h b/libcfs/include/libcfs/libcfs_trace.h
            index c2aecdb..5d210cb 100644
            --- a/libcfs/include/libcfs/libcfs_trace.h
            +++ b/libcfs/include/libcfs/libcfs_trace.h
            @@ -67,20 +67,20 @@ DECLARE_EVENT_CLASS(libcfs_log_msg,
                             struct va_format *vaf),
                    TP_ARGS(msg_file, msg_line, msg_fn, vaf),
                    TP_STRUCT__entry(
            -               __field(const char *, msg_file)
            +               __string(msg_file, msg_file ? msg_file : "<nofile>")
                            __field(int, msg_line)
            -               __field(const char *, msg_fn)
            +               __string(msg_func, msg_fn ? msg_fn : "<nofunc>")
                            __dynamic_array(char, msg, CFS_TRACE_CONSOLE_BUFFER_SIZE)
                    ),
                    TP_fast_assign(
            -               __entry->msg_file = msg_file;
            +               __assign_str(msg_file, msg_file);
                            __entry->msg_line = msg_line;
            -               __entry->msg_fn = msg_fn;
            +               __assign_str(msg_func, msg_fn);
                            vsnprintf(__get_dynamic_array(msg),
                                      CFS_TRACE_CONSOLE_BUFFER_SIZE, vaf->fmt, *vaf->va);
                    ),
            -       TP_printk("(%s:%d:%s) %s", __entry->msg_file, __entry->msg_line,
            -                 __entry->msg_fn, __get_str(msg))
            +       TP_printk("(%s:%d:%s) %s", __get_str(msg_file), __entry->msg_line,
            +                 __get_str(msg_func), __get_str(msg))
             );
             
             DEFINE_EVENT(libcfs_log_msg, libcfs_config,
            @@ -118,20 +118,20 @@ DECLARE_EVENT_CLASS(libcfs_log_msg_limited,
                             const char *msg_fn, struct va_format *vaf),
                    TP_ARGS(rs, msg_file, msg_line, msg_fn, vaf),
                    TP_STRUCT__entry(
            -               __field(const char *, msg_file)
            +               __string(msg_file, msg_file ? msg_file : "<nofile>")
                            __field(int, msg_line)
            -               __field(const char *, msg_fn)
            +               __string(msg_func, msg_fn ? msg_fn : "<nofunc>")
                            __dynamic_array(char, msg, CFS_TRACE_CONSOLE_BUFFER_SIZE)
                    ),
                    TP_fast_assign(
            -               __entry->msg_file = msg_file;
            +               __assign_str(msg_file, msg_file);
                            __entry->msg_line = msg_line;
            -               __entry->msg_fn = msg_fn;
            +               __assign_str(msg_func, msg_fn);
                            vsnprintf(__get_dynamic_array(msg),
                                      CFS_TRACE_CONSOLE_BUFFER_SIZE, vaf->fmt, *vaf->va);
                    ),
            -       TP_printk("(%s:%d:%s) %s", __entry->msg_file, __entry->msg_line,
            -                 __entry->msg_fn, __get_str(msg))
            +       TP_printk("(%s:%d:%s) %s", __get_str(msg_file), __entry->msg_line,
            +                 __get_str(msg_func), __get_str(msg))
             );
             
             DEFINE_EVENT_CONDITION(libcfs_log_msg_limited, libcfs_console,
            
            

            Probably with this patch the implementation will be suitable enough to replace current logging system.

            dmiter Dmitry Eremin (Inactive) added a comment - This patch allow string filtering by file and function name. diff --git a/libcfs/include/libcfs/libcfs_trace.h b/libcfs/include/libcfs/libcfs_trace.h index c2aecdb..5d210cb 100644 --- a/libcfs/include/libcfs/libcfs_trace.h +++ b/libcfs/include/libcfs/libcfs_trace.h @@ -67,20 +67,20 @@ DECLARE_EVENT_CLASS(libcfs_log_msg, struct va_format *vaf), TP_ARGS(msg_file, msg_line, msg_fn, vaf), TP_STRUCT__entry( - __field(const char *, msg_file) + __string(msg_file, msg_file ? msg_file : "<nofile>") __field(int, msg_line) - __field(const char *, msg_fn) + __string(msg_func, msg_fn ? msg_fn : "<nofunc>") __dynamic_array(char, msg, CFS_TRACE_CONSOLE_BUFFER_SIZE) ), TP_fast_assign( - __entry->msg_file = msg_file; + __assign_str(msg_file, msg_file); __entry->msg_line = msg_line; - __entry->msg_fn = msg_fn; + __assign_str(msg_func, msg_fn); vsnprintf(__get_dynamic_array(msg), CFS_TRACE_CONSOLE_BUFFER_SIZE, vaf->fmt, *vaf->va); ), - TP_printk("(%s:%d:%s) %s", __entry->msg_file, __entry->msg_line, - __entry->msg_fn, __get_str(msg)) + TP_printk("(%s:%d:%s) %s", __get_str(msg_file), __entry->msg_line, + __get_str(msg_func), __get_str(msg)) ); DEFINE_EVENT(libcfs_log_msg, libcfs_config, @@ -118,20 +118,20 @@ DECLARE_EVENT_CLASS(libcfs_log_msg_limited, const char *msg_fn, struct va_format *vaf), TP_ARGS(rs, msg_file, msg_line, msg_fn, vaf), TP_STRUCT__entry( - __field(const char *, msg_file) + __string(msg_file, msg_file ? msg_file : "<nofile>") __field(int, msg_line) - __field(const char *, msg_fn) + __string(msg_func, msg_fn ? msg_fn : "<nofunc>") __dynamic_array(char, msg, CFS_TRACE_CONSOLE_BUFFER_SIZE) ), TP_fast_assign( - __entry->msg_file = msg_file; + __assign_str(msg_file, msg_file); __entry->msg_line = msg_line; - __entry->msg_fn = msg_fn; + __assign_str(msg_func, msg_fn); vsnprintf(__get_dynamic_array(msg), CFS_TRACE_CONSOLE_BUFFER_SIZE, vaf->fmt, *vaf->va); ), - TP_printk("(%s:%d:%s) %s", __entry->msg_file, __entry->msg_line, - __entry->msg_fn, __get_str(msg)) + TP_printk("(%s:%d:%s) %s", __get_str(msg_file), __entry->msg_line, + __get_str(msg_func), __get_str(msg)) ); DEFINE_EVENT_CONDITION(libcfs_log_msg_limited, libcfs_console, Probably with this patch the implementation will be suitable enough to replace current logging system.

            Unfortunately tracepoints don't allow to filter by "const char *" fields. I was not able to specify filter by file or function name because of it's interpreted as integer.

            Can we change the type to fixed array of char or dynamic array of char? I encountered that only such types allow to specify filter with wild card characters (*,?).

             

            dmiter Dmitry Eremin (Inactive) added a comment - Unfortunately tracepoints don't allow to filter by "const char *" fields. I was not able to specify filter by file or function name because of it's interpreted as integer. Can we change the type to fixed array of char or dynamic array of char? I encountered that only such types allow to specify filter with wild card characters (*,?).  

            Function in this case means the name the compiler generates from the TRACE_EVENT macro. For example DEFINE_EVENT(lnet_log_msg, lnet_config, ... --> becomes lnet_config(...) in the source file. So they are not regular function but more like event markers. One of major goals of this project is to replace the current lctl debugging infrastructure since this is a requirement upstream. So I worked the code to map to the behavior lctl debug is doing which is category based. You will see CDEBUG(D_HA,...) maps to lnet_ha(...) etc. By the way the reason I used names like lnet_info, lnet_ioctl is to see clearly in the logs which subsys the event is showing up in. Perhaps

            ptlrpcd_00_01-14398 [002] .... 189383.848857: info: lnet:(linux-crypto.c:119:cfs_crypto_hash_alloc) Using crypto hash: ...

            I have to look at the logs perf creates to see if it makes sense to the user.

            As for limiting debugging to an exact function you can do that with filtering.
            perf ... -filter cfs_hash_*. You can do this by module as well "-filter :mod:libcfs".
             

            simmonsja James A Simmons added a comment - Function in this case means the name the compiler generates from the TRACE_EVENT macro. For example DEFINE_EVENT(lnet_log_msg, lnet_config, ... --> becomes lnet_config(...) in the source file. So they are not regular function but more like event markers. One of major goals of this project is to replace the current lctl debugging infrastructure since this is a requirement upstream. So I worked the code to map to the behavior lctl debug is doing which is category based. You will see CDEBUG(D_HA,...) maps to lnet_ha(...) etc. By the way the reason I used names like lnet_info, lnet_ioctl is to see clearly in the logs which subsys the event is showing up in. Perhaps ptlrpcd_00_01-14398 [002] .... 189383.848857: info: lnet:(linux-crypto.c:119:cfs_crypto_hash_alloc) Using crypto hash: ... I have to look at the logs perf creates to see if it makes sense to the user. As for limiting debugging to an exact function you can do that with filtering. perf ... -filter cfs_hash_*. You can do this by module as well "-filter :mod:libcfs".  

            People

              simmonsja James A Simmons
              simmonsja James A Simmons
              Votes:
              2 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

                Created:
                Updated: