Lustre / LU-17420

JobID specific debug logging


    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Minor

      When analyzing a problem specific to one client or application on a large cluster that is in active use by many applications, we need to capture debug logs on multiple servers, each of which may be processing thousands of RPCs per second that are (likely) unrelated to the problem at hand.

      It might be possible to set debug_jobid and debug_jobid_mask parameters on the client(s) and server(s) and then run the job with that specific JobID on the clients. Then, when the servers process RPC requests, they would apply the elevated debug_jobid_mask only while handling RPCs from that job.

      There are some potential implementation issues with this. The existing libcfs_debug mask is currently global to the node, so some changes would be needed to e.g. CDEBUG() to allow the mask to be checked on a per-thread basis (hopefully without changing the arguments to this widely-used macro).
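      As a rough illustration of the per-thread check, here is a minimal user-space sketch in C. It is not Lustre code: the D_* flag values, thread_debug_extra, debug_effective_mask(), request_debug_begin(), request_debug_end() and handle_request() are all hypothetical names invented for this example, and debug_jobid / debug_jobid_mask are only the tunables proposed above. It assumes a C11 compiler with thread-local storage and that one service thread handles one request at a time. The point is only that CDEBUG() keeps its (mask, format, ...) arguments while the mask it tests becomes the node-wide mask ORed with bits set for the current thread.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical debug flags; the real Lustre D_* values differ. */
#define D_ERROR    0x1u
#define D_RPCTRACE 0x2u
#define D_DLMTRACE 0x4u

/* Node-wide mask, standing in for libcfs_debug. */
static unsigned int global_debug_mask = D_ERROR;

/* Tunables proposed in this ticket (they do not exist today). */
static char debug_jobid[64] = "";
static unsigned int debug_jobid_mask = 0;

/* Hypothetical: extra debug bits enabled for the current thread only. */
static _Thread_local unsigned int thread_debug_extra = 0;

/* Number of messages that passed the mask check (for demonstration). */
static unsigned int debug_emitted = 0;

/* Mask seen by the calling thread: global bits plus per-thread bits. */
static inline unsigned int debug_effective_mask(void)
{
    return global_debug_mask | thread_debug_extra;
}

/* Same call shape as before; only the mask being tested has changed. */
#define CDEBUG(mask, ...)                                  \
    do {                                                   \
        if ((mask) & debug_effective_mask()) {             \
            debug_emitted++;                               \
            fprintf(stderr, __VA_ARGS__);                  \
        }                                                  \
    } while (0)

/* Called when a service thread starts handling an RPC. */
static void request_debug_begin(const char *rpc_jobid)
{
    if (debug_jobid[0] != '\0' && strcmp(rpc_jobid, debug_jobid) == 0)
        thread_debug_extra = debug_jobid_mask;
    else
        thread_debug_extra = 0;
}

/* Called when the thread is done with the RPC. */
static void request_debug_end(void)
{
    thread_debug_extra = 0;
}

/* Toy request handler; returns how many debug messages it emitted. */
static unsigned int handle_request(const char *rpc_jobid)
{
    unsigned int before = debug_emitted;

    request_debug_begin(rpc_jobid);
    CDEBUG(D_RPCTRACE, "handling RPC from job %s\n", rpc_jobid);
    request_debug_end();

    return debug_emitted - before;
}
```

      With debug_jobid set to a job's ID and debug_jobid_mask set to D_RPCTRACE, handle_request() logs only for requests carrying that JobID; requests from every other job still see just the global mask. The cost in this sketch is one extra thread-local load and OR per CDEBUG() check.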

      Also, this approach risks missing important information from other threads that may be running at the same time (e.g. threads taking conflicting locks), so it is open for discussion whether it would actually be useful in practice.

            wc-triage WC Triage
            adilger Andreas Dilger