Lustre / LU-12886

A lot of LNetError: lnet_peer_ni_add_to_recoveryq_locked() messages

Details

    • Bug
    • Resolution: Fixed
    • Minor
    • None
    • Lustre 2.12.3
    • 2.12.3 RC1 (vanilla) on servers, CentOS 7.6, patched kernel; 2.12.0 + patches on clients
    • 4

    Description

      After upgrading our servers on Fir (Sherlock's /scratch) to Lustre 2.12.3 RC1, we are noticing a lot of these messages on all Lustre servers:

      LNetError: 49537:0:(peer.c:3451:lnet_peer_ni_add_to_recoveryq_locked()) lpni 10.0.10.201@o2ib7 added to recovery queue. Health = 900
      

      The NIDs reported are those of our Lustre routers, which are still running 2.12.0 + patches (they are on the client clusters).

      Attaching logs from all servers as lnet_recoveryq.log
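For anyone triaging similar logs, the following is a minimal sketch of tallying these messages per NID. It assumes every line follows the exact format of the sample message quoted above; the regular expression and the helper name `summarize_recoveryq` are illustrative and not part of Lustre or LNet.

```python
import re
from collections import Counter

# Pattern modelled on the single sample line quoted in this report.
# Assumption: other Lustre versions may word the message differently.
RECOVERY_RE = re.compile(
    r"LNetError: \d+:\d+:"
    r"\(peer\.c:\d+:lnet_peer_ni_add_to_recoveryq_locked\(\)\) "
    r"lpni (?P<nid>\S+) added to recovery queue\. "
    r"Health = (?P<health>\d+)"
)


def summarize_recoveryq(lines):
    """Return (message count per NID, lowest health value seen per NID)."""
    counts = Counter()
    min_health = {}
    for line in lines:
        match = RECOVERY_RE.search(line)
        if match is None:
            continue  # not a recovery-queue message
        nid = match.group("nid")
        health = int(match.group("health"))
        counts[nid] += 1
        min_health[nid] = min(health, min_health.get(nid, health))
    return counts, min_health
```

Passing an open file handle for lnet_recoveryq.log as `lines` should show which router NIDs are reported most often and the lowest health value logged for each.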

      This doesn't seem to have an impact on production and so far 2.12.3 RC1 has been just great for us (and we run it without additional patches now!). Thanks!

      Stéphane

      Attachments

        1. fir-io1-s1.log
          932 kB
          Stephane Thiell
        2. lnet_recoveryq.log
          20 kB
          Stephane Thiell

            People

              ashehata Amir Shehata (Inactive)
              sthiell Stephane Thiell
              Votes: 1
              Watchers: 6
