  1. Lustre
  2. LU-16357

A mechanism to tell other nodes to dump their debug logs

Details

    • New Feature
    • Resolution: Unresolved
    • Minor

    Description

      Currently, libcfs_debug_dumplog() can be called in the code to dump the debug log of the local node. Often, however, the debug logs of other nodes are needed as well, because the symptom appears on the local node but may be caused by a bug on another node.

      Having a mechanism to trigger debug log dumping on remote nodes would greatly simplify cross-node debugging. It should be possible for a user to run something like "lctl dk --client[=NID[,NID]]" to trigger a debug log dump locally and/or on the specified remote NIDs. It should also be possible to run "lctl dk --mds[=IDX[,IDX]]" to dump logs on all or some MDS nodes, and "lctl dk --oss[=IDX[,IDX]]" to do the same on all or some OSS nodes. This would be a powerful debugging feature for isolating issues that span multiple remote nodes, especially when those nodes are not directly accessible from the server or client, or when SSH into the server is not allowed from the client.
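
      As a rough illustration of how the optional IDX[,IDX] argument could be handled, the sketch below parses a comma-separated list of target indices. The function name parse_idx_list() is hypothetical, and treating indices as decimal and an absent list as "all targets" are assumptions made for this example, not part of the proposal or of lctl today.

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse "IDX[,IDX...]" into idx[].
 * Returns the number of indices parsed, 0 if no list was given
 * (assumed here to mean "all targets"), or -1 on malformed input
 * or when more than max indices are supplied.
 */
int parse_idx_list(const char *list, unsigned int *idx, int max)
{
	int n = 0;

	if (list == NULL || *list == '\0')
		return 0;

	while (n < max) {
		char *end;
		unsigned long v;

		/* reject empty items, signs and leading whitespace */
		if (!isdigit((unsigned char)*list))
			return -1;

		errno = 0;
		v = strtoul(list, &end, 10);
		if (errno != 0 || v > UINT_MAX)
			return -1;

		idx[n++] = (unsigned int)v;
		if (*end == '\0')
			return n;
		if (*end != ',')
			return -1;
		list = end + 1;
	}

	return -1;	/* more indices than the caller can hold */
}
```

      With this sketch, the value of a hypothetical "--oss=0,3" would yield the two indices 0 and 3, while a bare "--oss" (no list) would return 0 and leave the "all OSS nodes" decision to the caller.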

      There should be a parameter to control log dumping on the server, such as debug_enable_remote, to prevent malicious users from dumping the server logs and filling up local storage. For the same reason, there should be a mechanism to avoid multiple log dumps for a single request: a unique ID in the RPC would be useful, and clients would cache this ID for a minute and drop any log dump RPCs with the same ID.
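
      The duplicate suppression described above might look roughly like the following sketch: a small fixed-size cache of recently seen request IDs, each remembered for 60 seconds. The structure, the function name dump_request_accept(), the 64-bit ID type and the slot count are all hypothetical choices for illustration; the proposal only specifies a unique ID and a one-minute cache.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

#define DUMP_ID_TTL_SEC	60	/* "cache this for a minute" */
#define DUMP_ID_SLOTS	16	/* arbitrary size for this sketch */

struct dump_id_cache {
	struct {
		uint64_t id;
		time_t	 seen;
		bool	 used;
	} slot[DUMP_ID_SLOTS];
};

/*
 * Hypothetical check run when a log dump RPC arrives.
 * Returns true if the request should be acted on, false if the
 * same ID was already accepted within the last DUMP_ID_TTL_SEC.
 * A zero-initialized cache is empty.
 */
bool dump_request_accept(struct dump_id_cache *c, uint64_t id, time_t now)
{
	size_t i, victim = 0;
	bool have_free = false;

	for (i = 0; i < DUMP_ID_SLOTS; i++) {
		bool live = c->slot[i].used &&
			    now - c->slot[i].seen < DUMP_ID_TTL_SEC;

		if (live && c->slot[i].id == id)
			return false;	/* duplicate: drop the RPC */
		if (!live && !have_free) {
			victim = i;	/* first unused or expired slot */
			have_free = true;
		}
	}

	if (!have_free) {
		/* every slot is still live: overwrite the oldest entry */
		for (i = 1; i < DUMP_ID_SLOTS; i++)
			if (c->slot[i].seen < c->slot[victim].seen)
				victim = i;
	}

	c->slot[victim].id = id;
	c->slot[victim].seen = now;
	c->slot[victim].used = true;
	return true;
}
```

      The current time is passed in as an argument so the logic can be exercised without waiting a real minute. A real implementation would also need locking and would check debug_enable_remote before consulting the cache; both are omitted here.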

      There should also be a tunable parameter that allows non-root users to trigger the debug log dump, though not to access the log file itself, for security reasons. This allows log dumping to be triggered directly by the application or job scheduler in case of an application-level error, without the need to run as root or wait for an admin to become available. One option would be something like debug_gid=0 by default for root-only log dumping, debug_gid=GID for an administrative group, or debug_gid=-1 to allow all users to do this.
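
      One possible reading of the debug_gid semantics is sketched below as a pure permission check. The function name may_trigger_dump() and its signature are hypothetical. The sketch also assumes that root (UID 0) is always allowed and that debug_gid=0 means "UID 0 only" rather than "members of group 0"; the proposal leaves both points open.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

#define DEBUG_GID_ROOT_ONLY	0	/* proposed default */
#define DEBUG_GID_ANY		(-1)	/* proposed "all users" value */

/*
 * Hypothetical check: may a caller with the given UID and group
 * list trigger a debug log dump under the debug_gid setting?
 * This only gates triggering the dump, not reading the log file.
 */
bool may_trigger_dump(long debug_gid, uid_t uid,
		      const gid_t *groups, size_t ngroups)
{
	size_t i;

	if (uid == 0)
		return true;		/* assumption: root is always allowed */
	if (debug_gid == DEBUG_GID_ANY)
		return true;
	if (debug_gid == DEBUG_GID_ROOT_ONLY)
		return false;		/* assumption: 0 means UID 0 only */

	/* otherwise the caller must belong to the administrative group */
	for (i = 0; i < ngroups; i++)
		if ((long)groups[i] == debug_gid)
			return true;

	return false;
}
```

      Under these assumptions, a job scheduler running as an unprivileged user in group 500 could trigger a dump when debug_gid=500 or debug_gid=-1, but not with the default debug_gid=0.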

            People

              laisiyao Lai Siyao
              laisiyao Lai Siyao