sync and delay before LBUG() calls panic()

Details

    • Improvement
    • Resolution: Fixed
    • Minor

    Description

      It would be useful to have a pause of a few seconds between the LBUG() call and the resulting panic(), so that the stack trace can be written to the serial console, and ideally also to /var/log/messages if no serial console is available.

      The code currently calls panic() immediately after dumping the stack:

      void lbug_with_loc(struct libcfs_debug_msg_data *msgdata)
      {
              libcfs_catastrophe = 1;
              libcfs_debug_msg(msgdata, "LBUG\n");
      
              if (in_interrupt()) {
                      panic("LBUG in interrupt.\n");
                      /* not reached */
              }
      
              libcfs_debug_dumpstack(NULL);
              if (libcfs_panic_on_lbug)
                      panic("LBUG");
              else
                      libcfs_debug_dumplog();
              set_current_state(TASK_UNINTERRUPTIBLE);
              while (1)
                      schedule();
      }
      

      It would be reasonable to allow the libcfs_panic_on_lbug tunable to store the number of seconds (or milliseconds?) to delay before calling panic(), probably busy-waiting with mdelay() rather than sleeping with msleep(), so that the thread is not scheduled out. In the meantime, a task could be dispatched to a work queue to attempt a sync-and-flush of whatever can be written during this delay (if the system is not locked up), equivalent to "sysrq-w" and "sysrq-s".
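
      The ordering proposed above (dump the stack, start a flush, wait, then panic) can be modelled in userspace. The following is only a sketch under stated assumptions and is not Lustre or kernel code: the model_* names, the millisecond unit, and the 10 second cap are all hypothetical, and fsync() plus nanosleep() stand in for the work-queue sync and the in-kernel delay.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* for nanosleep(), fileno(), fsync() */
#endif

#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

/* Arbitrary cap for this sketch so a bad setting cannot stall for long. */
#define MODEL_MAX_DELAY_MS 10000L

/*
 * Assumption: the tunable holds 0 for "do not panic" and otherwise the
 * delay in milliseconds. The issue leaves the unit open.
 */
static long model_clamp_delay_ms(long panic_on_lbug_ms)
{
	if (panic_on_lbug_ms < 0)
		return 0;
	if (panic_on_lbug_ms > MODEL_MAX_DELAY_MS)
		return MODEL_MAX_DELAY_MS;
	return panic_on_lbug_ms;
}

/* Sleep for ms milliseconds, resuming if a signal interrupts the sleep. */
static void model_delay_ms(long ms)
{
	struct timespec ts = {
		.tv_sec = ms / 1000,
		.tv_nsec = (ms % 1000) * 1000000L,
	};

	while (nanosleep(&ts, &ts) == -1 && errno == EINTR)
		;
}

/*
 * Model of the proposed ordering. Returns 1 where the kernel would call
 * panic("LBUG"), 0 where it would dump the debug log and park the thread.
 */
static int model_lbug(long panic_on_lbug_ms, FILE *console)
{
	long delay_ms = model_clamp_delay_ms(panic_on_lbug_ms);

	assert(console != NULL);
	fputs("LBUG\n<stack trace>\n", console); /* stands in for the stack dump */

	if (delay_ms == 0)
		return 0;

	/* Start the flush first so the write overlaps the delay. */
	fflush(console);
	(void)fsync(fileno(console)); /* may fail on a tty or pipe; ignored */
	model_delay_ms(delay_ms);
	return 1;
}
```

      Calling model_lbug(2000, stderr) would print the placeholder trace, attempt the flush, wait about two seconds, and then return at the point where panic() would be called. Starting the flush before the delay is the point of the sketch: the wait only helps if the write is already in progress.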

          Activity

            [LU-17540] sync and delay before LBUG() calls panic()

            adilger Andreas Dilger added a comment -

            Patch https://review.whamcloud.com/55505 "LU-17793 libcfs: fix objtool warning in lbug_with_loc()" added a delay between the LBUG() call and the panic() that follows it, so this may allow the stack trace to be saved before the node is rebooted.

            Otherwise we might need to add a sync before the sleep to start the write.
            timday Tim Day added a comment -

            We could use BUG() and BUG_ON() within the LBUG() definition. Those macros dump a stack trace and panic. Plus, the traces should reliably get flushed out to console without us having to add delays.

            People

              yujian Jian Yu
              adilger Andreas Dilger