o2iblnd: connections rejected before lnd startup is complete

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version/s: Lustre 2.16.0

    Description

      Before lnd startup is complete, there is a window during which o2iblnd can reject incoming connection requests, logging errors similar to the following:

      Nov 16 08:24:18 ai400x2vm-008 kernel: LNetError: 7758:0:(o2iblnd_cb.c:2480:kiblnd_passive_connect()) Can't accept conn from 172.16.16.12@o2ib on NA (ib0:0:172.16.0.192): bad dst nid 172.16.0.192@o2ib
      Nov 16 08:24:19 ai400x2vm-008 kernel: LNetError: 7758:0:(o2iblnd_cb.c:2480:kiblnd_passive_connect()) Can't accept conn from 172.16.16.187@o2ib on NA (ib0:0:172.16.0.192): bad dst nid 172.16.0.192@o2ib
      Nov 16 08:24:19 ai400x2vm-008 kernel: LNetError: 7758:0:(o2iblnd_cb.c:2480:kiblnd_passive_connect()) Skipped 54 previous similar messages
      Nov 16 08:24:19 ai400x2vm-008 kernel: LNet: Added LNI 172.16.0.192@o2ib [32/5120/0/180]
      Nov 16 08:24:19 ai400x2vm-008 kernel: LNet: Using FastReg for registration
      Nov 16 08:24:20 ai400x2vm-008 kernel: LNetError: 7758:0:(o2iblnd_cb.c:2480:kiblnd_passive_connect()) Can't accept conn from 172.16.0.58@o2ib on NA (ib0:1:172.16.0.192): bad dst nid 172.16.0.192@o2ib
      Nov 16 08:24:20 ai400x2vm-008 kernel: LNetError: 7758:0:(o2iblnd_cb.c:2480:kiblnd_passive_connect()) Skipped 180 previous similar messages
      Nov 16 08:24:20 ai400x2vm-008 kernel: LNet: Added LNI 172.16.16.192@o2ib [32/5120/0/180]

      Look into getting rid of this race condition.
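
The window described above can be pictured with a small toy model. This is only a sketch under stated assumptions, not the o2iblnd code and not the merged patch: `toy_net`, `toy_passive_connect()` and the `TOY_*` values are hypothetical names invented for illustration. It assumes the listener can receive requests before the local NID has been registered, and shows one way a passive side could tell "not ready yet" apart from "destination NID really is wrong".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_NIDS 4

/* Hypothetical verdicts for this sketch; these are not o2iblnd wire values. */
enum toy_verdict {
    TOY_ACCEPT,
    TOY_REJECT_FATAL,  /* startup finished and the NID is still unknown */
    TOY_REJECT_EARLY,  /* NID unknown, but startup has not finished yet */
};

/* Hypothetical model of a node whose interfaces are registered one by one. */
struct toy_net {
    bool startup_complete;          /* set once every interface is registered */
    const char *nids[TOY_MAX_NIDS]; /* NIDs registered so far */
    size_t nnids;
};

static bool toy_nid_is_local(const struct toy_net *net, const char *nid)
{
    for (size_t i = 0; i < net->nnids; i++)
        if (strcmp(net->nids[i], nid) == 0)
            return true;
    return false;
}

/* Decide what to do with an incoming request addressed to dst_nid. */
enum toy_verdict toy_passive_connect(const struct toy_net *net,
                                     const char *dst_nid)
{
    assert(net != NULL && dst_nid != NULL);

    if (toy_nid_is_local(net, dst_nid))
        return TOY_ACCEPT;

    /* During the startup window, an unknown NID may simply not be
     * registered yet, so report that separately from a hard failure. */
    return net->startup_complete ? TOY_REJECT_FATAL : TOY_REJECT_EARLY;
}
```

In this model, a request for 172.16.0.192@o2ib that arrives before the NID is registered gets `TOY_REJECT_EARLY`, and the same request is accepted once the NID has been added.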

          Activity

            [LU-16393] o2iblnd: connections rejected before lnd startup is complete
            pjones Peter Jones added a comment -

            Landed for 2.16

            gerrit Gerrit Updater added a comment -

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/51651/
            Subject: LU-16393 o2iblnd: add IBLND_REJECT_EARLY reject reason
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 673ff86a84ad5d11cde24aa7411c45385ad1c633

            gerrit Gerrit Updater added a comment -

            "Serguei Smirnov <ssmirnov@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/51651
            Subject: LU-16393 o2iblnd: add IBLND_REJECT_EARLY reject reason
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 93fe169eef88e8ab31acd01b8c5b3084f1de93ad

            ssmirnov Serguei Smirnov added a comment - edited

            Hi Nathan,

            Yes, my understanding is that this is correct.

            nathand Nathan Dauchy added a comment -

            Is it correct that a client with a rejected connection (during this race window on a server) would report an error message like the following?

            LNetError: 353407:0:(o2iblnd_cb.c:2951:kiblnd_rejected()) 192.168.23.45@o2ib rejected: o2iblnd fatal error
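
The exchange above confirms that a client rejected during this window logs "o2iblnd fatal error". As a rough illustration of why a separate reject reason can help, the sketch below shows how an active side might treat an "early" reject as retryable while still failing fast on a fatal one. Everything here is an assumption for illustration: `toy_reject`, `toy_conn_attempt`, `toy_should_retry()`, the doubling backoff and the 8000 ms cap are hypothetical and are not taken from o2iblnd or from the merged patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical reject reasons; not the o2iblnd wire values. */
enum toy_reject {
    TOY_REJ_FATAL, /* peer reports an unrecoverable problem */
    TOY_REJ_EARLY, /* peer is still starting up */
};

/* Hypothetical per-attempt state kept by the connecting side. */
struct toy_conn_attempt {
    int retries_left;
    unsigned int delay_ms; /* wait before the next attempt */
};

#define TOY_MAX_DELAY_MS 8000u

/* Return true if the caller should retry after attempt->delay_ms. */
bool toy_should_retry(enum toy_reject why, struct toy_conn_attempt *attempt)
{
    assert(attempt != NULL);

    /* Only a "peer not ready yet" reject is worth retrying. */
    if (why != TOY_REJ_EARLY)
        return false;
    if (attempt->retries_left <= 0)
        return false;

    attempt->retries_left--;

    /* Double the delay on each retry, up to a fixed cap. */
    attempt->delay_ms *= 2;
    if (attempt->delay_ms > TOY_MAX_DELAY_MS)
        attempt->delay_ms = TOY_MAX_DELAY_MS;
    return true;
}
```

With a single fatal reason, the connecting side cannot distinguish a server that is still starting from one that is misconfigured; a dedicated reason makes that distinction possible, whatever policy is then applied to it.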

            People

              Assignee:
              ssmirnov Serguei Smirnov
              Reporter:
              ssmirnov Serguei Smirnov