Lustre / LU-13806

LNetError: 7229:0:(peer.c:529:lnet_peer_del_nid()) ASSERTION( lpni2 ) failed:

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Fix Version/s: Lustre 2.15.0
    • Severity: 3

    Description

      sles15s01 config:

      sles15s01:~ # start.sh
      debug=+net
      sles15s01:~ # lctl list_nids
      192.168.2.30@tcp1
      sles15s01:~ # lctl show_route
      net               tcp2 hops 4294967295 gw                192.168.2.20@tcp1 up pri 0
      sles15s01:~ #
      

      sles15c01 config:

      sles15c01:~ # start.sh
      debug=+net
      sles15c01:~ # lctl list_nids
      192.168.2.38@tcp2
      sles15c01:~ # lctl show_route
      net               tcp1 hops 4294967295 gw                192.168.2.50@tcp2 up pri 0
      sles15c01:~ #
      

      sles15build01 config:

      sles15build01:~ # start.sh
      debug=+net
      sles15build01:~ # lctl list_nids
      192.168.2.20@tcp1
      192.168.2.50@tcp2
      sles15build01:~ #
      
      Reproducer, run on sles15c01:

      sles15c01:~ # date; lnetctl peer show; lctl show_route; ssh sles15build01 'lnetctl net del --net tcp1 --if eth0'; sleep 1; lnetctl peer show; lctl show_route
      

      The assertion was tripped after the lnetctl net del command was executed on sles15build01.

      I think the bug is probably in the initial router discovery. When we create the lpni for the primary NID, 192.168.2.20@tcp1, and attach it to the peer, it ends up at the end of the list, but I think the primary NID is expected to be the first entry.
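
The suspected ordering problem can be illustrated with a toy model. This is only a sketch of the hypothesis, not the code in peer.c: struct toy_peer and the toy_peer_* helpers are hypothetical names invented for the example, and it assumes sles15c01 first learns the gateway as 192.168.2.50@tcp2 (from its route) and only later sees the primary NID 192.168.2.20@tcp1 during discovery.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_NIDS 8

/* Hypothetical stand-in for a peer and its ordered NID list.
 * This is NOT the real LNet peer structure. */
struct toy_peer {
	const char *primary_nid;          /* NID the peer is identified by */
	const char *nids[TOY_MAX_NIDS];   /* ordered list of the peer's NIDs */
	size_t count;
};

/* Append a NID at the tail of the list (the suspected behavior). */
static bool toy_peer_add_tail(struct toy_peer *p, const char *nid)
{
	if (p->count == TOY_MAX_NIDS)
		return false;
	p->nids[p->count++] = nid;
	return true;
}

/* Insert a NID at the head of the list, shifting existing entries down. */
static bool toy_peer_add_head(struct toy_peer *p, const char *nid)
{
	if (p->count == TOY_MAX_NIDS)
		return false;
	memmove(&p->nids[1], &p->nids[0], p->count * sizeof(p->nids[0]));
	p->nids[0] = nid;
	p->count++;
	return true;
}

/* The assumed invariant: the first list entry is the primary NID. */
static bool toy_peer_primary_is_first(const struct toy_peer *p)
{
	return p->count > 0 && strcmp(p->nids[0], p->primary_nid) == 0;
}
```

In this model, adding 192.168.2.50@tcp2 and then appending 192.168.2.20@tcp1 with toy_peer_add_tail() leaves the primary NID last, so toy_peer_primary_is_first() returns false. Any code that assumes the first entry is the primary NID would then operate on the wrong entry. Adding the primary with toy_peer_add_head() keeps the invariant.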

          People

            Assignee: hornc Chris Horn
            Reporter: hornc Chris Horn