Details

    • Bug
    • Resolution: Fixed
    • Critical
    • Lustre 2.10.0
    • Lustre 2.7.0, Lustre 2.8.0, Lustre 2.9.0
    • 3
    • 16043

    Description

      Got an IOR failure on the soak cluster with the following errors:

      Oct  7 21:54:01 lola-23 kernel: LNetError: 3613:0:(o2iblnd_cb.c:1134:kiblnd_init_rdma()) RDMA too fragmented for 192.168.1.115@o2ib100 (256): 128/256 src 128/256 dst frags
      Oct  7 21:54:01 lola-23 kernel: LNetError: 3618:0:(o2iblnd_cb.c:428:kiblnd_handle_rx()) Can't setup rdma for PUT to 192.168.1.114@o2ib100: -90
      Oct  7 21:54:01 lola-23 kernel: LNetError: 3618:0:(o2iblnd_cb.c:428:kiblnd_handle_rx()) Skipped 7 previous similar messages
      

      Liang told me that this is a known issue with routing. That said, the IOR process is not killable and the only option is to reboot the client node. We should at least fail "gracefully" by returning the error to the application.
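For triage, the three log lines quoted above can be picked apart with a small script. This is a minimal sketch, not part of Lustre or LNet: the function name `parse_lnet_error`, the regular expressions, and the field names are hypothetical and were written against the quoted lines only. The numbers in the "too fragmented" message are reported as they appear; their exact meaning should be checked against `kiblnd_init_rdma()` in the LNet source. The one firm fact used is that errno 90 on Linux is `EMSGSIZE` ("Message too long"), which is the `-90` in the second line.

```python
import re

# On Linux, errno 90 is EMSGSIZE ("Message too long"). Hardcoded on purpose,
# because errno numbers differ between operating systems.
LINUX_EMSGSIZE = 90

# Hypothetical patterns, derived only from the log lines quoted in this issue.
LNET_ERROR = re.compile(
    r"LNetError: (?P<pid>\d+):\d+:"
    r"\((?P<file>[\w.]+):(?P<line>\d+):(?P<func>\w+)\(\)\) "
    r"(?P<msg>.*)$"
)
FRAGS = re.compile(
    r"RDMA too fragmented for (?P<peer>\S+) \((?P<paren>\d+)\): "
    r"(?P<src_a>\d+)/(?P<src_b>\d+) src (?P<dst_a>\d+)/(?P<dst_b>\d+) dst frags"
)
RC = re.compile(r"Can't setup rdma for (?P<op>\w+) to (?P<peer>\S+): (?P<rc>-?\d+)")


def parse_lnet_error(line):
    """Return a dict of fields from one LNetError line, or None if it does not match."""
    m = LNET_ERROR.search(line)
    if m is None:
        return None
    out = {
        "pid": int(m.group("pid")),
        "file": m.group("file"),
        "line": int(m.group("line")),
        "func": m.group("func"),
        "msg": m.group("msg"),
    }
    frags = FRAGS.search(out["msg"])
    if frags:
        # Raw values as printed; no interpretation is attached to them here.
        out["peer"] = frags.group("peer")
        out["paren"] = int(frags.group("paren"))
        out["src"] = (int(frags.group("src_a")), int(frags.group("src_b")))
        out["dst"] = (int(frags.group("dst_a")), int(frags.group("dst_b")))
    rc = RC.search(out["msg"])
    if rc:
        out["op"] = rc.group("op")
        out["peer"] = rc.group("peer")
        out["rc"] = int(rc.group("rc"))
        out["is_emsgsize"] = (-out["rc"] == LINUX_EMSGSIZE)
    return out


if __name__ == "__main__":
    sample = (
        "Oct  7 21:54:01 lola-23 kernel: LNetError: 3618:0:"
        "(o2iblnd_cb.c:428:kiblnd_handle_rx()) "
        "Can't setup rdma for PUT to 192.168.1.114@o2ib100: -90"
    )
    print(parse_lnet_error(sample))
```

Grouping the parsed records by `peer` and `func` would show which NIDs hit the fragmentation error, without relying on the rate-limited "Skipped N previous similar messages" lines.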

          Activity

            [LU-5718] RDMA too fragmented with router
            alex.ku Alex Kulyavtsev made changes -
            Link New: This issue is related to LU-12419 [ LU-12419 ]
            adilger Andreas Dilger made changes -
            Link New: This issue is related to DDN-547 [ DDN-547 ]
            mdiep Minh Diep made changes -
            Link New: This issue is related to LDEV-342 [ LDEV-342 ]
            mdiep Minh Diep made changes -
            Link Original: This issue is related to LDEV-341 [ LDEV-341 ]
            mdiep Minh Diep made changes -
            Link New: This issue is related to LDEV-341 [ LDEV-341 ]
            pjones Peter Jones made changes -
            Link New: This issue is related to LU-10252 [ LU-10252 ]
            mdiep Minh Diep made changes -
            Link New: This issue is related to JFC-20 [ JFC-20 ]
            mdiep Minh Diep made changes -
            Link Original: This issue is related to JFC-17 [ JFC-17 ]
            mdiep Minh Diep made changes -
            Link New: This issue is related to DDN-453 [ DDN-453 ]
            pjones Peter Jones made changes -
            Link New: This issue is duplicated by SEA-464 [ SEA-464 ]

            People

              doug Doug Oucharek (Inactive)
              johann Johann Lombardi (Inactive)
              Votes: 0
              Watchers: 39
