Details
Type: Improvement
Resolution: Fixed
Priority: Minor
Description
Cray debugged an issue in which we identified a very large number of what appeared to be duplicate requests to the MDS from various clients. This is highly abnormal. Lustre can tolerate a certain amount of it, but this traffic seems to expose a bug that occurs when many requests are being replayed or restored.
The problem looked very much like a network issue, and it went away after some fabric maintenance.
Although we were not able to root-cause the issue (we suspect LU-2827 may have been involved), Patrick Farrell observed that we would have identified the problem much sooner if a debug message had been printed to the console and the dk log under the default debug settings.
Restoring or replaying a request is (A) relatively rare and (B), even when handled without incident, a strong indication that something is wrong, so it merits a warning. Had this debug message been in place, the problem would have been solved much more quickly.
I've opened this ticket to track the change to the debug message.
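As a rough illustration of the idea (not the actual patch), the C sketch below flags a request whose XID is not newer than the last one seen from the same client and logs it at a level that is enabled by default, so it reaches the console without any debug-mask changes. Everything here is hypothetical: the DBG_INFO/DBG_WARNING bits, debug_log(), handle_request(), dup_request_level, and the per-client last_xid table are stand-ins, not Lustre's real D_* macros or MDS request-handling code, and it assumes XIDs are nonzero and increase per client.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical debug-mask bits for illustration; NOT Lustre's D_* values. */
#define DBG_INFO    0x1u  /* verbose; not enabled under the assumed default */
#define DBG_WARNING 0x2u  /* enabled under the assumed default */

static unsigned int debug_mask = DBG_WARNING;        /* assumed default settings */
static unsigned int dup_request_level = DBG_WARNING; /* level used for duplicates */
static unsigned int messages_printed;                /* messages that passed the mask */

/* Print the message only if its level is enabled in debug_mask. */
static void debug_log(unsigned int level, const char *msg, uint64_t xid)
{
    if (debug_mask & level) {
        fprintf(stderr, "%s: xid %llu\n", msg, (unsigned long long)xid);
        messages_printed++;
    }
}

/* Hypothetical per-client record of the highest request XID seen so far. */
#define MAX_CLIENTS 16
static uint64_t last_xid[MAX_CLIENTS];

/* Returns true if the request looks like a resend/replay, i.e. its XID is
 * not newer than the last one seen from this client (XIDs assumed > 0). */
static bool handle_request(unsigned int client, uint64_t xid)
{
    assert(client < MAX_CLIENTS);
    if (xid <= last_xid[client]) {
        /* The point of the change: log at a level visible by default. */
        debug_log(dup_request_level, "duplicate/replayed request", xid);
        return true;
    }
    last_xid[client] = xid;
    return false;
}
```

With dup_request_level set to DBG_INFO instead, the same duplicate would still be detected but the message would be filtered out by the default mask, which is the kind of silent behavior the ticket argues against.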