Details
-
Improvement
-
Resolution: Not a Bug
-
Blocker
-
None
-
None
-
Lustre 2.1.3
-
6043
Description
After a cluster was bounced during a partner's Chroma evaluation, the filesystem came back up in recovery. The site admin did not realize that they needed to remount the filesystem on their client in order to move forward, so the MDT remained blocked on tgt_recov for over 12 hours.
Ultimately, the site admin was instructed to use lctl abort_recov on the MDT device to move forward, and was advised that in the future their client(s) should remount the filesystem rather than relying on lctl to end recovery.
During the investigation, it was observed that dmesg on the MDS was full of repeating messages about tgt_recov (log excerpt attached). At adilger's request, I created this ticket to ask that these repeating messages be quieted, reducing the chance that important messages are lost in the spam.
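To make the request concrete, here is a minimal sketch of one way repeated messages could be quieted: emit a given message at most once per interval and report how many repeats were swallowed in between. This is only an illustration in Python, not Lustre code and not a description of how Lustre's console logging works. The `RepeatSuppressor` class, its `filter` method, the `interval` default, and the message text are all hypothetical.

```python
import time


class RepeatSuppressor:
    """Pass a message through at most once per `interval` seconds per key.

    Hypothetical helper for illustration only; not part of Lustre.
    """

    def __init__(self, interval=600.0, clock=time.monotonic):
        self.interval = interval
        self.clock = clock
        self._last = {}        # key -> time the message was last emitted
        self._suppressed = {}  # key -> repeats swallowed since then

    def filter(self, key, message):
        """Return the text to log, or None if the message is suppressed."""
        now = self.clock()
        last = self._last.get(key)
        if last is not None and now - last < self.interval:
            # Still inside the quiet window: count the repeat and drop it.
            self._suppressed[key] = self._suppressed.get(key, 0) + 1
            return None
        # Window has passed (or first occurrence): emit and reset the count.
        skipped = self._suppressed.pop(key, 0)
        self._last[key] = now
        if skipped:
            return f"{message} ({skipped} similar messages suppressed)"
        return message
```

With something along these lines, a recovery thread that is still waiting on clients would show up in the log once per interval, with a count of the repeats that were dropped, so unrelated messages are easier to spot.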