  1. Lustre
  2. LU-5364

Lustre Router connection hangs one side of fabric


Details

    • Bug
    • Resolution: Not a Bug
    • Major
    • None
    • Lustre 2.4.3
    • None
    • 3
    • 14961

    Description

      We have two IB fabrics connected by two Lustre routers. On one side, the fabric reaches the routers through Obsidian Longbows; on the other side, the fabric is directly connected to the routers via a QDR switch.

      Fabric1_o2ib233 <--->LONGBOW1<----<ROUTER1>---->QDR<--Fabric2_o2ib
      Fabric1_o2ib233 <--->LONGBOW2<----<ROUTER2>----->QDR<--Fabric2_o2ib
      

      We get router disconnects on the Fabric2_o2ib side, with errors like these on the routers:

      LNet: 1310:0:(o2iblnd_cb.c:2360:kiblnd_passive_connect()) Conn race 10.151.27.74@o2ib
      LNet: 1308:0:(o2iblnd_cb.c:2360:kiblnd_passive_connect()) Conn race 10.151.27.86@o2ib
      LNet: 1312:0:(o2iblnd_cb.c:2360:kiblnd_passive_connect()) Conn race 10.151.25.242@o2ib
      LNet: 1312:0:(o2iblnd_cb.c:2360:kiblnd_passive_connect()) Conn race 10.151.25.156@o2ib
      LNet: 1314:0:(o2iblnd_cb.c:2360:kiblnd_passive_connect()) Conn race 10.151.27.80@o2ib
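 
When many of these messages accumulate, it can help to tally them per peer NID to see whether the races cluster on a few hosts. The following is a minimal sketch, not a Lustre tool: the regular expression is inferred only from the five sample lines above, and the names `CONN_RACE_RE` and `count_conn_races` are hypothetical.

```python
import re
from collections import Counter

# Field layout inferred from the sample lines in this report only (assumption):
# "LNet: <pid>:<n>:(<file>:<line>:<func>()) Conn race <nid>"
CONN_RACE_RE = re.compile(
    r"LNet: (?P<pid>\d+):\d+:\((?P<file>[\w.]+):(?P<line>\d+):"
    r"(?P<func>\w+)\(\)\) Conn race (?P<nid>\S+)"
)


def count_conn_races(log_text):
    """Return a Counter mapping peer NID -> number of 'Conn race' lines."""
    counts = Counter()
    for line in log_text.splitlines():
        match = CONN_RACE_RE.search(line)
        if match:
            counts[match.group("nid")] += 1
    return counts


# The five lines quoted in the description above.
SAMPLE_LOG = """\
LNet: 1310:0:(o2iblnd_cb.c:2360:kiblnd_passive_connect()) Conn race 10.151.27.74@o2ib
LNet: 1308:0:(o2iblnd_cb.c:2360:kiblnd_passive_connect()) Conn race 10.151.27.86@o2ib
LNet: 1312:0:(o2iblnd_cb.c:2360:kiblnd_passive_connect()) Conn race 10.151.25.242@o2ib
LNet: 1312:0:(o2iblnd_cb.c:2360:kiblnd_passive_connect()) Conn race 10.151.25.156@o2ib
LNet: 1314:0:(o2iblnd_cb.c:2360:kiblnd_passive_connect()) Conn race 10.151.27.80@o2ib
"""

if __name__ == "__main__":
    for nid, count in count_conn_races(SAMPLE_LOG).most_common():
        print(f"{nid}: {count}")
```

In the sample, every peer appears once and all of them are on the o2ib network, which matches the statement that the disconnects are on the Fabric2_o2ib side.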
      

      ROUTER MODULE SETTINGS

      options lnet networks="o2ib(ib1),o2ib233(ib0)" forwarding=enabled
      options ko2iblnd require_privileged_port=0
      options ko2iblnd use_privileged_port=0
      options ko2iblnd timeout=150
      options ko2iblnd retry_count=7
      options ko2iblnd peer_timeout=0
      options ptlrpc at_min=100
      

      SERVER SETTINGS

      options ko2iblnd require_privileged_port=0
      options ko2iblnd use_privileged_port=0
      options lnet networks=o2ib(ib1),o2ib100(ib1) routes="o2ib233 10.151.27.[58,93]@o2ib" dead_router_check_interval=60 live_router_check_interval=60
      # Get rid of messages for missing, special-purpose hardware (LU-1599)
      blacklist padlock-sha
      options ko2iblnd timeout=150
      options ko2iblnd retry_count=7
      options ko2iblnd peer_timeout=0
      options ptlrpc at_min=100
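 
The `routes` string above names two gateways with a bracket expression. To check which router NIDs a given `routes` entry refers to, the expression can be expanded by hand or with a small helper. The sketch below is illustrative only: `expand_nid_pattern` and `parse_route` are hypothetical names, the parser handles just the simple "network gateway" form used in this report, and it assumes a single bracket group containing comma-separated values or `a-b` ranges.

```python
import re

# One bracket group surrounded by literal text, e.g. 10.151.27.[58,93]@o2ib
_BRACKET_RE = re.compile(
    r"(?P<head>[^\[\]]*)\[(?P<body>[^\[\]]+)\](?P<tail>[^\[\]]*)"
)


def expand_nid_pattern(pattern):
    """Expand a single [..] group into a list of NIDs.

    Patterns without exactly one bracket group are returned unchanged,
    as a one-element list.
    """
    match = _BRACKET_RE.fullmatch(pattern)
    if match is None:
        return [pattern]
    values = []
    for item in match.group("body").split(","):
        item = item.strip()
        if "-" in item:
            # Simple inclusive range such as 58-60 (assumed syntax).
            low, high = (int(part) for part in item.split("-", 1))
            values.extend(range(low, high + 1))
        else:
            values.append(int(item))
    head, tail = match.group("head"), match.group("tail")
    return [f"{head}{value}{tail}" for value in values]


def parse_route(route):
    """Split 'remote_net gateway_pattern' into (remote_net, [gateway NIDs])."""
    remote_net, gateway_pattern = route.split()
    return remote_net, expand_nid_pattern(gateway_pattern)


if __name__ == "__main__":
    # The server-side route from this report.
    print(parse_route("o2ib233 10.151.27.[58,93]@o2ib"))
```

For the server entry, this yields the two gateways 10.151.27.58@o2ib and 10.151.27.93@o2ib as the path to o2ib233.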
      

      CLIENT SETTINGS

      options ko2iblnd require_privileged_port=0
      options ko2iblnd use_privileged_port=0
      options lnet networks=o2ib233(ib1) routes="o2ib 10.153.27.[58,93]@o2ib233" dead_router_check_interval=60 live_router_check_interval=60
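 
With three sets of module options to compare, a quick diff of the `ko2iblnd` parameters per role can show where the listings differ. This is a rough sketch under stated assumptions: `parse_modprobe_options` is a hypothetical helper that reads only `options <module> key=value ...` lines, and the two input strings are copied from the router and client listings as posted, which may not be the complete files on those nodes.

```python
import shlex


def parse_modprobe_options(text):
    """Parse 'options <module> key=value ...' lines into {module: {key: value}}.

    Lines that do not start with 'options' (comments, blacklist entries)
    are ignored.
    """
    result = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("options "):
            continue
        # shlex keeps quoted values such as routes="net gateway" together.
        tokens = shlex.split(line)
        module = tokens[1]
        params = result.setdefault(module, {})
        for token in tokens[2:]:
            key, _, value = token.partition("=")
            params[key] = value
    return result


# Copied from the ROUTER MODULE SETTINGS listing in this report.
ROUTER_CONF = """\
options lnet networks="o2ib(ib1),o2ib233(ib0)" forwarding=enabled
options ko2iblnd require_privileged_port=0
options ko2iblnd use_privileged_port=0
options ko2iblnd timeout=150
options ko2iblnd retry_count=7
options ko2iblnd peer_timeout=0
options ptlrpc at_min=100
"""

# Copied from the client listing in this report.
CLIENT_CONF = """\
options ko2iblnd require_privileged_port=0
options ko2iblnd use_privileged_port=0
options lnet networks=o2ib233(ib1) routes="o2ib 10.153.27.[58,93]@o2ib233" dead_router_check_interval=60 live_router_check_interval=60
"""

router = parse_modprobe_options(ROUTER_CONF)["ko2iblnd"]
client = parse_modprobe_options(CLIENT_CONF)["ko2iblnd"]

# Parameters set on the router but absent from the client listing.
missing_on_client = sorted(set(router) - set(client))

if __name__ == "__main__":
    print("set on router only:", missing_on_client)
```

The comparison only reports what the posted listings contain; it does not indicate whether any difference is related to the disconnects.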
      

          People

            ashehata Amir Shehata (Inactive)
            mhanafi Mahmoud Hanafi