Lustre / LU-16393

o2iblnd: connections rejected before lnd startup is complete

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version/s: Lustre 2.16.0
    • Severity: 3

    Description

      Before lnd startup is complete, there is a window of time during which o2iblnd can reject incoming connection requests, logging errors similar to the following:

      Nov 16 08:24:18 ai400x2vm-008 kernel: LNetError: 7758:0:(o2iblnd_cb.c:2480:kiblnd_passive_connect()) Can't accept conn from 172.16.16.12@o2ib on NA (ib0:0:172.16.0.192): bad dst nid 172.16.0.192@o2ib
      Nov 16 08:24:19 ai400x2vm-008 kernel: LNetError: 7758:0:(o2iblnd_cb.c:2480:kiblnd_passive_connect()) Can't accept conn from 172.16.16.187@o2ib on NA (ib0:0:172.16.0.192): bad dst nid 172.16.0.192@o2ib
      Nov 16 08:24:19 ai400x2vm-008 kernel: LNetError: 7758:0:(o2iblnd_cb.c:2480:kiblnd_passive_connect()) Skipped 54 previous similar messages
      Nov 16 08:24:19 ai400x2vm-008 kernel: LNet: Added LNI 172.16.0.192@o2ib [32/5120/0/180]
      Nov 16 08:24:19 ai400x2vm-008 kernel: LNet: Using FastReg for registration
      Nov 16 08:24:20 ai400x2vm-008 kernel: LNetError: 7758:0:(o2iblnd_cb.c:2480:kiblnd_passive_connect()) Can't accept conn from 172.16.0.58@o2ib on NA (ib0:1:172.16.0.192): bad dst nid 172.16.0.192@o2ib
      Nov 16 08:24:20 ai400x2vm-008 kernel: LNetError: 7758:0:(o2iblnd_cb.c:2480:kiblnd_passive_connect()) Skipped 180 previous similar messages
      Nov 16 08:24:20 ai400x2vm-008 kernel: LNet: Added LNI 172.16.16.192@o2ib [32/5120/0/180]

      Look into getting rid of this race condition.
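
      To gauge how many connection attempts fall inside this window, a log excerpt like the one above can be summarized with a short script. The following is a minimal sketch in Python, assuming the message wording matches the lines quoted in this report; the `summarize` function and the regex names are hypothetical helpers for illustration and are not part of Lustre or LNet.

```python
import re

# Patterns match the message shapes quoted in this report; other LNet
# versions may word these messages differently (assumption).
REJECT_RE = re.compile(
    r"kiblnd_passive_connect\(\)\) Can't accept conn from (?P<peer>\S+) "
    r"on \S+ \((?P<dev>[^)]*)\): bad dst nid (?P<dst>\S+)"
)
SKIPPED_RE = re.compile(
    r"kiblnd_passive_connect\(\)\) Skipped (?P<count>\d+) previous similar messages"
)
ADDED_RE = re.compile(r"LNet: Added LNI (?P<nid>\S+)")


def summarize(lines):
    """Return (rejections, skipped_total, added_nids) for an iterable of log lines."""
    rejections = []  # (peer NID, destination NID) for each logged rejection
    skipped = 0      # rejections suppressed by kernel message rate limiting
    added = []       # NIDs in the order their "Added LNI" lines appear
    for line in lines:
        m = REJECT_RE.search(line)
        if m:
            rejections.append((m.group("peer"), m.group("dst")))
            continue
        m = SKIPPED_RE.search(line)
        if m:
            skipped += int(m.group("count"))
            continue
        m = ADDED_RE.search(line)
        if m:
            added.append(m.group("nid"))
    return rejections, skipped, added
```

      Calling `summarize(open(path))` on a saved log (where `path` is whatever file holds the kernel messages) returns the explicitly logged rejections, the total count hidden behind "Skipped N previous similar messages" lines, and the NIDs reported by "Added LNI". Comparing the rejected destination NIDs against the order of the "Added LNI" entries gives a rough picture of how long the rejection window lasts on a given node.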

            People

              Assignee: Serguei Smirnov (ssmirnov)
              Reporter: Serguei Smirnov (ssmirnov)
              Votes: 0
              Watchers: 5
