Details

    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Minor
    • None
    • None
    • 3

    Description

      While trying to analyze a problem specific to one client/application on a large cluster that is in active use by many applications, we face the challenge of capturing debug logs on multiple servers that may be processing thousands of RPCs per second, most of which are (likely) unrelated to the problem at hand.

      It might be possible to set debug_jobid and debug_jobid_mask parameters on the client(s) and server(s), and then run the job with that specific JobID on the clients. When the servers process RPC requests, they would apply the elevated debug_jobid_mask only while handling RPCs from that job.
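
      The selection step could look roughly like the minimal userspace sketch below. It assumes an exact string match on one JobID and a mask that is ORed into the node-wide mask. Only the debug_jobid and debug_jobid_mask names come from the proposal; node_debug_mask, set_debug_jobid(), rpc_debug_mask() and JOBID_LEN are hypothetical names used for illustration, not existing Lustre interfaces.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical sketch: none of these are real Lustre/libcfs symbols. */
#define JOBID_LEN 32

static unsigned int node_debug_mask = 0x1;  /* stand-in for the node-wide libcfs_debug */
static char debug_jobid[JOBID_LEN] = "";    /* proposed debug_jobid; empty = disabled */
static unsigned int debug_jobid_mask;       /* proposed debug_jobid_mask (extra bits) */

/* Configure the JobID to trace and the extra mask bits to enable for it. */
int set_debug_jobid(const char *jobid, unsigned int mask)
{
	assert(jobid != NULL);              /* caller bug, not bad user input */
	if (strlen(jobid) >= JOBID_LEN)
		return -EINVAL;             /* too long to store: reject, don't truncate */
	strcpy(debug_jobid, jobid);
	debug_jobid_mask = mask;
	return 0;
}

/* Mask to use while handling one RPC tagged with rpc_jobid (may be NULL). */
unsigned int rpc_debug_mask(const char *rpc_jobid)
{
	if (debug_jobid[0] != '\0' && rpc_jobid != NULL &&
	    strcmp(rpc_jobid, debug_jobid) == 0)
		return node_debug_mask | debug_jobid_mask;  /* elevated for this job */
	return node_debug_mask;                             /* everyone else unchanged */
}
```

      Matching a single exact JobID keeps the per-RPC cost to one strcmp(); whether wildcards or several JobIDs would be needed is left open here.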

      There are some potential implementation issues with this. Namely, the existing libcfs_debug mask is currently global to the node, so macros such as CDEBUG() would need changes to allow the mask to be checked on a per-thread basis (hopefully without changing the arguments to this widely used macro).
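
      One way to keep the macro's arguments unchanged is sketched below, assuming the extra mask lives in C11 thread-local storage (a userspace stand-in; kernel code would need some per-task storage instead). SKETCH_CDEBUG, node_debug_mask, thread_debug_mask, rpc_handler_enter(), rpc_handler_exit() and handle_rpc() are hypothetical names, not libcfs APIs. Call sites keep the same (mask, format, ...) shape as CDEBUG(); only the macro body changes to OR in the calling thread's mask.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical sketch: none of these are real libcfs symbols. */
static unsigned int node_debug_mask = 0x1;          /* stand-in for libcfs_debug */
static _Thread_local unsigned int thread_debug_mask; /* extra bits, 0 when idle */
static unsigned int debug_lines_emitted;            /* counts messages, for illustration */

/* CDEBUG-like macro: called as SKETCH_CDEBUG(mask, format, ...), but the
 * check also consults the calling thread's extra mask. */
#define SKETCH_CDEBUG(mask, ...)                                          \
	do {                                                              \
		if ((mask) & (node_debug_mask | thread_debug_mask)) {     \
			debug_lines_emitted++;                            \
			fprintf(stderr, __VA_ARGS__);                     \
		}                                                         \
	} while (0)

/* Set the thread's extra mask when it starts handling an RPC. */
void rpc_handler_enter(unsigned int extra_mask)
{
	assert(thread_debug_mask == 0);  /* handlers are assumed not to nest */
	thread_debug_mask = extra_mask;
}

/* Clear it afterwards so the next RPC on this thread is unaffected. */
void rpc_handler_exit(void)
{
	thread_debug_mask = 0;
}

/* Example handler; the debug call site is unchanged from the caller's view. */
void handle_rpc(unsigned int extra_mask)
{
	rpc_handler_enter(extra_mask);
	SKETCH_CDEBUG(0x100, "dlm: granted lock %d\n", 7);
	rpc_handler_exit();
}
```

      Because the extra mask is cleared when the handler exits, an unrelated RPC served next by the same thread falls back to the node-wide mask.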

      This approach also risks missing important information from other threads that may be running at the same time (e.g. threads acquiring conflicting locks), so it is open for discussion whether it will actually be useful in practice.


        Activity

          [LU-17420] JobID specific debug logging
          There are no comments yet on this issue.

          People

            Assignee: WC Triage (wc-triage)
            Reporter: Andreas Dilger (adilger)
            Votes: 0
            Watchers: 2
