Lustre / LU-8223: De-Noise LNet neterr logs so they can be ON by default

Details

    • Improvement
    • Resolution: Unresolved
    • Major

    Description

      LNet's neterr logs are turned off by default. I have been told this is because they are very noisy. Logically, if a log message fires very frequently, it is reporting normal operation rather than an error. If these were real errors, we should be fixing them.

      The big problem is that when a networking issue occurs in the field, we have little to nothing in the logs to go on. Debugging then requires that the problem be easy to reproduce with neterr turned on (not usually the case for production errors), or it becomes an exercise of the mind (i.e. guesswork).

      This ticket is for cleaning up the neterr logs so that they report only true errors, allowing neterr logging to be on by default.
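To make the goal concrete, here is a minimal C sketch of the kind of reclassification the description asks for: routine, frequent events are demoted to a trace class that is off by default, while genuine failures stay in an error class that is on by default. All names here (`DBG_NETERR`, `DBG_NETTRACE`, `enum net_event`, `classify`, `net_log`) and the choice of which events count as benign are hypothetical illustrations, not Lustre's actual debug macros or LNet's real event set.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical debug-mask bits (illustrative, not Lustre's real flags). */
enum {
    DBG_NETERR   = 1 << 0, /* genuine network errors: on by default */
    DBG_NETTRACE = 1 << 1  /* expected, routine events: off by default */
};

/* Default mask: only true errors are emitted. */
unsigned int debug_mask = DBG_NETERR;

/* Hypothetical events a network driver might observe. */
enum net_event {
    NET_EV_PEER_RECONNECT, /* assumed routine and frequent: not an error */
    NET_EV_IDLE_CLOSE,     /* assumed routine: not an error */
    NET_EV_TX_TIMEOUT,     /* a send really failed */
    NET_EV_HW_FAILURE      /* genuine fault */
};

/* Map each event to the class it should be logged under. Only events
 * that represent real failures land in the error class. */
unsigned int classify(enum net_event ev)
{
    switch (ev) {
    case NET_EV_TX_TIMEOUT:
    case NET_EV_HW_FAILURE:
        return DBG_NETERR;
    default:
        return DBG_NETTRACE;
    }
}

/* Emit the message if its class is enabled; return whether it was printed. */
bool net_log(enum net_event ev, const char *msg)
{
    unsigned int cls = classify(ev);

    if (!(debug_mask & cls))
        return false;
    fprintf(stderr, "%s: %s\n", cls == DBG_NETERR ? "NETERR" : "NET", msg);
    return true;
}
```

With the default mask, a transmit timeout is printed while a routine reconnect is suppressed; OR-ing `DBG_NETTRACE` into `debug_mask` brings the routine events back when someone is actively debugging. The design point is that the "on by default" class stays quiet in healthy operation, so anything it reports is worth reading.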

          Activity

            Doug Oucharek (Inactive) added a comment - OK, I can delay this patch until LU-8980 lands and then update it with any additional changes required.

            James A Simmons added a comment - I already started the tracepoint work. See LU-8980.

            Doug Oucharek (Inactive) added a comment - This is really just a 2.10 thing. I don't expect anyone will want this backported. When will the migration to tracepoints take place?

            James A Simmons added a comment - Do you need to backport this to earlier Lustre versions, or is this a Lustre 2.10 thing? I ask because the Lustre debugging code is being migrated to tracepoints. With tracepoints this can be addressed, but the change wouldn't be backportable.

            Gerrit Updater added a comment - Doug Oucharek (doug.s.oucharek@intel.com) uploaded a new patch: http://review.whamcloud.com/20769
            Subject: LU-8223 lnet: Fix use of NETERR logging
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 155086c0edeb1b938994f1aba7c243ad4d57133d

            Doug Oucharek (Inactive) added a comment - I actually consider this a bug and not an improvement. Neterr logs should never have gotten into this unusable state in the first place. Outside of development, they are useless because they are off.

            People

              Amir Shehata (Inactive)
              Doug Oucharek (Inactive)
              Votes: 0
              Watchers: 6
