  1. Lustre
  2. LU-11257

RHEL/CentOS 3.10.0-862.11.6.el7.x86_64 kernel breaks LNet

Details

    • Bug
    • Resolution: Duplicate
    • Major
    • None
    • Lustre 2.10.4
    • None
    • CentOS 7.5, x86_64
    • 3

    Description

      It looks like the latest kernel update from CentOS/RedHat prevents LNet from working on InfiniBand interfaces (mlx5).

      Symptoms

      No LNet communication is possible; even a self-ping fails:

      # lctl list_nids
      10.9.101.60@o2ib4
      # lctl ping 10.9.101.60@o2ib4
      failed to ping 10.9.101.60@o2ib4: Input/output error

      Communicating with other nodes is impossible, as is mounting filesystems.
      The exact same node with the exact same configuration works flawlessly with kernel 3.10.0-862.9.1.el7.x86_64.
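
The self-ping symptom above can be turned into a quick health check. This is a minimal sketch, not an official Lustre tool: it uses only the `lctl list_nids` and `lctl ping` commands shown in this report, and it assumes that `lctl ping` exits with a non-zero status when the ping fails. The function name `lnet_self_ping` and the `LCTL` override variable are hypothetical names introduced for illustration.

```shell
#!/bin/sh
# Assumption: "lctl ping" exits non-zero on failure.
# LCTL can be overridden so the function can be exercised without Lustre.
LCTL="${LCTL:-lctl}"

# Ping every local NID; prints one OK/FAIL line per NID.
# Returns 0 if all pings succeed, 1 if any fails, 2 if no NIDs are found.
lnet_self_ping() {
    nids=$("$LCTL" list_nids) || return 2
    [ -n "$nids" ] || return 2
    failed=0
    for nid in $nids; do
        if "$LCTL" ping "$nid" >/dev/null 2>&1; then
            echo "OK   $nid"
        else
            echo "FAIL $nid"
            failed=1
        fi
    done
    return "$failed"
}
```

After booting a new kernel, something like `lnet_self_ping || echo "LNet self-ping failed"` could be run before attempting to mount any filesystem.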

      Versions

      # uname -r
      3.10.0-862.11.6.el7.x86_64
      # cat /sys/fs/lustre/version
      2.10.4
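
A node can be checked against the two kernel builds named in this report before it is put into service. This is only a sketch: the two version strings are the ones reported here for Lustre 2.10.4, nothing is known from this issue about other kernel builds, and the function name `check_kernel` is a hypothetical name used for illustration.

```shell
#!/bin/sh
# Kernel versions taken from this report only; other builds are not covered.
BAD_KERNEL="3.10.0-862.11.6.el7.x86_64"
GOOD_KERNEL="3.10.0-862.9.1.el7.x86_64"

# Classify a kernel release string (defaults to the running kernel).
# Returns 0 for the known-good build, 1 for the affected build, 2 otherwise.
check_kernel() {
    running="${1:-$(uname -r)}"
    case "$running" in
        "$BAD_KERNEL")  echo "affected: $running"; return 1 ;;
        "$GOOD_KERNEL") echo "known good: $running"; return 0 ;;
        *)              echo "unknown: $running"; return 2 ;;
    esac
}
```

Calling `check_kernel` with no argument inspects the running kernel via `uname -r`; passing a release string checks a kernel that is installed but not yet booted.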

      Hardware

      # ibstat
      CA 'mlx5_0'
              CA type: MT4115
              Number of ports: 1
              Firmware version: 12.21.3012
              Hardware version: 0
              Node GUID: 0x7cfe900300268c04
              System image GUID: 0x7cfe900300268c04
              Port 1:
                      State: Active
                      Physical state: LinkUp
                      Rate: 100
                      Base lid: 72
                      LMC: 0
                      SM lid: 6
                      Capability mask: 0x2651e848
                      Port GUID: 0x7cfe900300268c04
                      Link layer: InfiniBand

      Kernel logs

      [ 1185.337098] LNetError: 22109:0:(o2iblnd_cb.c:2513:kiblnd_passive_connect()) Can't accept 10.9.101.60@o2ib4: -22 
      [ 1185.348376] LNet: 22109:0:(o2iblnd_cb.c:2212:kiblnd_reject()) Error -22 sending reject 
      [ 1185.357473] LNetError: 22109:0:(o2iblnd_cb.c:2721:kiblnd_rejected()) 10.9.101.60@o2ib4 rejected: consumer defined fatal error
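
The numeric codes in these messages are negative Linux errno values: -22 is EINVAL (Invalid argument), while the "Input/output error" printed by `lctl ping` corresponds to EIO (5). The sketch below pulls the first such code out of a log line and names it. The table covers only the two values seen in this report, the helper names `errno_name` and `decode_lnet_line` are hypothetical, and `grep -o` is assumed to be available (it is in GNU and BSD grep).

```shell
#!/bin/sh
# Map the error numbers seen in this report to their Linux errno names.
errno_name() {
    case "${1#-}" in
        5)  echo "EIO (Input/output error)" ;;
        22) echo "EINVAL (Invalid argument)" ;;
        *)  echo "errno ${1#-} (not in this table)" ;;
    esac
}

# Print the first negative error code found in a log line, with its name.
# Returns 1 if the line contains no such code.
decode_lnet_line() {
    code=$(printf '%s\n' "$1" | grep -oE -- ' -[0-9]+' | head -n 1 | tr -d ' ')
    [ -n "$code" ] || return 1
    echo "$code => $(errno_name "$code")"
}
```

For example, feeding the `kiblnd_passive_connect()` line above to `decode_lnet_line` reports the code as EINVAL, which suggests that a request was rejected as invalid rather than that the link itself was down (the port is Active/LinkUp in the `ibstat` output).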

            People

              pjones Peter Jones
              srcc Stanford Research Computing Center