[LU-8223] De-Noise LNet neterr logs so they can be ON by default Created: 31/May/16 Updated: 20/Jun/17 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major |
| Reporter: | Doug Oucharek (Inactive) | Assignee: | Amir Shehata (Inactive) |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | lnet |
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
LNet's neterr logs are turned off by default. I have been told this is because they are very noisy. Logically, if messages are logged very frequently, they are not really errors but normal operations; if they were errors, we should be fixing them. The big problem is that when a networking issue happens in the field, we have little to nothing in the logs to go on. Debugging either requires that the problem be easy to reproduce with neterr turned on (not usually the case for production errors), or becomes a discipline of the mind (i.e. guesswork). This ticket is for cleaning up the neterr logs so that they report only true errors, which would let us have neterr on by default. |
| Comments |
| Comment by Doug Oucharek (Inactive) [ 01/Jun/16 ] |
|
I actually consider this a bug and not an improvement. The neterr logs should never have gotten into this unusable state in the first place. Outside of development, they are useless because they are off. |
| Comment by Gerrit Updater [ 13/Jun/16 ] |
|
Doug Oucharek (doug.s.oucharek@intel.com) uploaded a new patch: http://review.whamcloud.com/20769 |
| Comment by James A Simmons [ 05/Jan/17 ] |
|
Do you need to backport this to earlier Lustre versions, or is this a Lustre 2.10 thing? The reason I ask is that the Lustre debugging code is being migrated to tracepoints. With tracepoints this can be addressed, but it wouldn't be backportable. |
| Comment by Doug Oucharek (Inactive) [ 05/Jan/17 ] |
|
This is really just a 2.10 thing. I don't expect anyone will want this backported. When will the migration to tracepoints take place? |
| Comment by James A Simmons [ 05/Jan/17 ] |
|
I already started the tracepoint work. See LU-8980. |
| Comment by Doug Oucharek (Inactive) [ 06/Jan/17 ] |
|
Ok, I can delay this patch until LU-8980 lands and then update it with any additional changes required. |