Details
Improvement
Resolution: Unresolved
Major
None
Lustre 2.4.0
None
5735
Description
Currently, it is very difficult to confirm whether Lustre timeouts are caused by packets dropped in LNet. There are two reasons for this:
1- neterror logging is off by default, so the logs do not show dropped packets.
2- The LNet errors counter is never incremented (see LU-2223).
My understanding is that neterror logging is off by default because it produces too much "noise" when enabled. That raises the question: how can messages that are logged that frequently be considered errors?
I think this issue can be addressed in one of two ways:
1- Clean up the neterror messages so they are no longer noisy, then enable neterror logging by default.
2- Add a set of new LNet counters that track dropped packets by reason.
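To make option 2 more concrete, here is a minimal userspace sketch of per-reason drop counters. Everything in it is hypothetical: the enum values, struct, and function names (`lnet_drop_reason`, `lnet_count_drop()`, etc.) are illustrative only and are not existing LNet APIs, and the actual list of reasons would have to come from auditing every place LNet discards a message. It uses C11 `<stdatomic.h>`; in-kernel code would use the kernel's own atomic types instead.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdio.h>

/* Hypothetical drop reasons, for illustration only. */
enum lnet_drop_reason {
	LNET_DROP_NO_ROUTE,    /* no route to the destination */
	LNET_DROP_NO_BUFFERS,  /* no receive buffers or credits available */
	LNET_DROP_PEER_DOWN,   /* peer considered dead */
	LNET_DROP_BAD_HEADER,  /* malformed or unexpected message header */
	LNET_DROP_NREASONS     /* number of reasons; must stay last */
};

static const char *const lnet_drop_reason_names[LNET_DROP_NREASONS] = {
	[LNET_DROP_NO_ROUTE]   = "no_route",
	[LNET_DROP_NO_BUFFERS] = "no_buffers",
	[LNET_DROP_PEER_DOWN]  = "peer_down",
	[LNET_DROP_BAD_HEADER] = "bad_header",
};

/* One atomic counter per drop reason. */
struct lnet_drop_counters {
	atomic_ulong by_reason[LNET_DROP_NREASONS];
};

static void lnet_drop_counters_init(struct lnet_drop_counters *c)
{
	for (int i = 0; i < LNET_DROP_NREASONS; i++)
		atomic_init(&c->by_reason[i], 0);
}

/* Called at each site that drops a message (hypothetical hook). */
static void lnet_count_drop(struct lnet_drop_counters *c,
			    enum lnet_drop_reason reason)
{
	assert(reason < LNET_DROP_NREASONS);
	/* Relaxed ordering is enough: these are statistics only. */
	atomic_fetch_add_explicit(&c->by_reason[reason], 1,
				  memory_order_relaxed);
}

static unsigned long lnet_drop_count(struct lnet_drop_counters *c,
				     enum lnet_drop_reason reason)
{
	return atomic_load_explicit(&c->by_reason[reason],
				    memory_order_relaxed);
}

/* Print "reason: count" lines, e.g. for a stats file. */
static void lnet_drop_counters_print(struct lnet_drop_counters *c, FILE *out)
{
	for (int i = 0; i < LNET_DROP_NREASONS; i++)
		fprintf(out, "%s: %lu\n", lnet_drop_reason_names[i],
			lnet_drop_count(c, (enum lnet_drop_reason)i));
}
```

With counters like these, an administrator could compare the values before and after a timeout to see whether LNet dropped anything, and why, without having to enable the noisy neterror logging.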
Issue Links
- duplicates: LU-8223 De-Noise LNet neterr logs so they can be ON by default (Open)