[LU-2418] Add Way to Detect Dropped Packets on Production Systems - Whamcloud Community JIRA

Details

Type: Improvement
Resolution: Unresolved
Priority: Major
Fix Version/s: None
Affects Version/s: Lustre 2.4.0
Labels:
- lnet
- support

Rank (Obsolete): 5735

Description

Today, it is very difficult to confirm whether timeouts in Lustre are caused by packets dropped in LNet. There are two reasons for this:

1- neterror messages are off by default, so the logs do not show dropped packets.
2- the errors counter is never incremented (see LU-2223).

My understanding is that neterror messages are off by default because they generate too much "noise" when enabled. That raises the question: how can messages that are logged so frequently be considered errors?

I think this issue can be addressed in one of two ways:

1- Clean up the neterror logs so they are not noisy and then leave neterrors on by default.
2- Add a set of new counters to LNet to count the reasons for dropped packets.
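Option 2 could look roughly like the following user-space C sketch of per-reason drop counters. Everything in it is hypothetical and not part of the LNet API: the reason list (LNET_DROP_NO_ROUTE, LNET_DROP_NO_BUFFER, LNET_DROP_TIMEOUT, LNET_DROP_PEER_DOWN) and the functions lnet_count_drop(), lnet_drop_count() and lnet_drop_stats_print() are illustrative names. It uses C11 atomics so it compiles stand-alone. A real implementation would instead use the kernel's own atomic types and expose the counts through LNet's existing statistics interface.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdio.h>

/* Hypothetical drop reasons; a real list would be derived from the
 * LNet/LND code paths that currently discard messages silently. */
enum lnet_drop_reason {
	LNET_DROP_NO_ROUTE,   /* no route to the destination NID */
	LNET_DROP_NO_BUFFER,  /* no receive buffer or credit available */
	LNET_DROP_TIMEOUT,    /* message expired before it could be sent */
	LNET_DROP_PEER_DOWN,  /* peer considered down */
	LNET_DROP_NR_REASONS  /* number of reasons; must stay last */
};

static const char *const lnet_drop_reason_names[LNET_DROP_NR_REASONS] = {
	[LNET_DROP_NO_ROUTE]  = "no_route",
	[LNET_DROP_NO_BUFFER] = "no_buffer",
	[LNET_DROP_TIMEOUT]   = "timeout",
	[LNET_DROP_PEER_DOWN] = "peer_down",
};

/* One counter per reason (zero-initialized, static storage), updated
 * without locks from any thread. */
static atomic_ulong lnet_drop_counters[LNET_DROP_NR_REASONS];

/* Hypothetical hook: call at every point where a packet is discarded. */
static void lnet_count_drop(enum lnet_drop_reason reason)
{
	assert((unsigned)reason < LNET_DROP_NR_REASONS);
	atomic_fetch_add_explicit(&lnet_drop_counters[reason], 1,
				  memory_order_relaxed);
}

/* Read the current count for one reason. */
static unsigned long lnet_drop_count(enum lnet_drop_reason reason)
{
	assert((unsigned)reason < LNET_DROP_NR_REASONS);
	return atomic_load_explicit(&lnet_drop_counters[reason],
				    memory_order_relaxed);
}

/* Dump all counters as "name count" lines, e.g. for a stats file. */
static void lnet_drop_stats_print(FILE *out)
{
	for (unsigned i = 0; i < LNET_DROP_NR_REASONS; i++)
		fprintf(out, "%-10s %lu\n", lnet_drop_reason_names[i],
			lnet_drop_count((enum lnet_drop_reason)i));
}
```

Each discard site would call lnet_count_drop() with its reason. After a timeout, an administrator could read the per-reason counts to confirm or rule out packet loss without enabling noisy logging. Relaxed atomics are enough here because the counters only need to be eventually accurate, not ordered with respect to other memory operations.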

Issue Links

duplicates: LU-8223 De-Noise LNet neterr logs so they can be ON by default (Open)

People

Assignee: WC Triage

Reporter: Doug Oucharek (Inactive)

Votes: 0

Watchers: 2

Dates

Created: 30/Nov/12 7:57 PM

Updated: 18/Mar/25 7:51 AM