Lustre / LU-5718

RDMA too fragmented with router


    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version/s: Lustre 2.10.0
    • Affects Version/s: Lustre 2.7.0, Lustre 2.8.0, Lustre 2.9.0
    • 3
    • 16043

      Got an IOR failure on the soak cluster with the following errors:

      Oct  7 21:54:01 lola-23 kernel: LNetError: 3613:0:(o2iblnd_cb.c:1134:kiblnd_init_rdma()) RDMA too fragmented for 192.168.1.115@o2ib100 (256): 128/256 src 128/256 dst frags
      Oct  7 21:54:01 lola-23 kernel: LNetError: 3618:0:(o2iblnd_cb.c:428:kiblnd_handle_rx()) Can't setup rdma for PUT to 192.168.1.114@o2ib100: -90
      Oct  7 21:54:01 lola-23 kernel: LNetError: 3618:0:(o2iblnd_cb.c:428:kiblnd_handle_rx()) Skipped 7 previous similar messages
      

      Liang told me that this is a known issue with routing. That said, the IOR process is not killable and the only option is to reboot the client node. We should at least fail "gracefully" by returning the error to the application.
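
For readers unfamiliar with the error: the `-90` in the log is errno 90, which is `EMSGSIZE` ("Message too long") on Linux, and the first message reports that the RDMA setup stopped at fragment 128 of 256 on both the source and destination side. The sketch below is a toy model, not the o2iblnd code: it assumes that one work request is emitted per overlapping piece of a source and a destination fragment, and that setup fails once a fixed work-request limit is reached. The function name, the 4096-byte page size, the 2048-byte offset, and the limit of 256 are all illustrative assumptions. Under those assumptions it shows how two fragment lists that do not line up can need roughly twice as many work requests as aligned ones.

```python
def simulate_rdma_setup(src_frags, dst_frags, max_wrs):
    """Toy model (hypothetical, not the real kiblnd_init_rdma()).

    Walks the source and destination fragment lists together and counts one
    work request per overlapping piece. Returns (wrs, src_idx, dst_idx, ok),
    where ok is False if the work-request limit was reached before all
    fragments were consumed.
    """
    src_idx = dst_idx = 0
    src_left = src_frags[0]
    dst_left = dst_frags[0]
    wrs = 0
    while src_idx < len(src_frags) and dst_idx < len(dst_frags):
        if wrs >= max_wrs:
            # Assumed failure mode: too many pieces for a single transfer.
            return wrs, src_idx, dst_idx, False
        chunk = min(src_left, dst_left)  # bytes covered by this work request
        wrs += 1
        src_left -= chunk
        dst_left -= chunk
        if src_left == 0:
            src_idx += 1
            if src_idx < len(src_frags):
                src_left = src_frags[src_idx]
        if dst_left == 0:
            dst_idx += 1
            if dst_idx < len(dst_frags):
                dst_left = dst_frags[dst_idx]
    return wrs, src_idx, dst_idx, True


PAGE = 4096  # assumed page size
# 1 MiB described as 256 page-sized fragments.
aligned = [PAGE] * 256
# The same 1 MiB, but starting half a page into the first fragment.
shifted = [PAGE // 2] + [PAGE] * 255 + [PAGE // 2]

print(simulate_rdma_setup(aligned, aligned, 256))
print(simulate_rdma_setup(aligned, shifted, 256))
print(simulate_rdma_setup(aligned, shifted, 1024))
```

In this model the aligned case fits in exactly 256 work requests, while the shifted case hits the limit halfway through both lists and would need 512 work requests to complete. Whether this is the actual mechanism behind the routed failure reported here is not established by the ticket text.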

            Assignee: Doug Oucharek (Inactive)
            Reporter: Johann Lombardi (Inactive)
            Votes: 0
            Watchers: 39
