Quiet log spam while waiting for initial client connect in recovery

Details

    • Improvement
    • Resolution: Not a Bug
    • Blocker
    • None
    • None
    • Lustre 2.1.3
    • 6043

    Description

      After a cluster was bounced during a partner's Chroma evaluation, the filesystem came back up in recovery. The site admin did not realize that the client needed to remount the filesystem for recovery to proceed, so the MDT stayed blocked in tgt_recov for over 12 hours.

      Ultimately, the site admin was instructed to use lctl abort_recov on the MDT device to move forward, and advised that in the future their client(s) should remount the filesystem instead of using lctl.

      During the investigation, it was observed that dmesg on the MDS was full of repeating messages about tgt_recov (log excerpt attached). At adilger's request, I created this ticket to track quieting these repeated messages, to reduce the chance that important messages are lost in the spam.

        Activity

          [LU-2593] Quiet log spam while waiting for initial client connect in recovery

          tappro Mikhail Pershin added a comment - not Lustre bug

          tappro Mikhail Pershin added a comment - As far as I can see, this is a kernel message rather than a Lustre one, and it includes instructions for disabling it:

          Jan  7 15:28:58 chroma-mds0 kernel: INFO: task tgt_recov:6363 blocked for more than 120 seconds.
          Jan  7 15:28:58 chroma-mds0 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

          I don't know what we can do here other than follow that advice.
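
          For anyone reading a saved log where these watchdog lines crowd out everything else, one option is to collapse them after the fact rather than disable them on the node. The following is a minimal Python sketch, not part of Lustre or the kernel: the function name collapse_hung_task_spam is hypothetical, and the two patterns assume the message shapes quoted in this ticket. It does not handle the stack trace lines the kernel may print after each warning.

```python
import re
from collections import Counter

# Assumed shapes, taken from the two kernel lines quoted in this ticket.
HUNG_TASK = re.compile(
    r"INFO: task (?P<task>[^:\s]+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)
HINT = re.compile(r"hung_task_timeout_secs.* disables this message")


def collapse_hung_task_spam(lines):
    """Split log lines into non-watchdog lines and per-task warning counts.

    Returns (kept, counts): kept preserves the original order of all lines
    that are not hung-task warnings or their accompanying hint line; counts
    maps each blocked task name to the number of warnings seen.
    """
    kept = []
    counts = Counter()
    for line in lines:
        match = HUNG_TASK.search(line)
        if match:
            counts[match.group("task")] += 1
        elif HINT.search(line):
            # Drop the "echo 0 > ..." hint that follows each warning.
            continue
        else:
            kept.append(line)
    return kept, counts
```

          Applied to the excerpt above, this would report a single warning for tgt_recov and pass through any other kernel lines unchanged, so the count is still visible without the repetition.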

          People

            tappro Mikhail Pershin
            mjmac Michael MacDonald (Inactive)