Lustre / LU-5718

RDMA too fragmented with router


Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version/s: Lustre 2.10.0
    • Affects Version/s: Lustre 2.7.0, Lustre 2.8.0, Lustre 2.9.0
    • 3
    • 16043

    Description

      An IOR run on the soak cluster failed with the following errors:

      Oct  7 21:54:01 lola-23 kernel: LNetError: 3613:0:(o2iblnd_cb.c:1134:kiblnd_init_rdma()) RDMA too fragmented for 192.168.1.115@o2ib100 (256): 128/256 src 128/256 dst frags
      Oct  7 21:54:01 lola-23 kernel: LNetError: 3618:0:(o2iblnd_cb.c:428:kiblnd_handle_rx()) Can't setup rdma for PUT to 192.168.1.114@o2ib100: -90
      Oct  7 21:54:01 lola-23 kernel: LNetError: 3618:0:(o2iblnd_cb.c:428:kiblnd_handle_rx()) Skipped 7 previous similar messages
      

      Liang told me that this is a known issue with routing. However, the IOR process cannot be killed, and the only way to recover is to reboot the client node. At a minimum, we should fail "gracefully" by returning the error to the application.


            People

              Doug Oucharek (Inactive)
              Johann Lombardi (Inactive)
              Votes: 0
              Watchers: 39
