Details
Type: Bug
Resolution: Unresolved
Priority: Minor
Fix Version/s: None
Affects Version/s: Lustre 2.16.0
Labels: None
Environment: Rocky Linux 9.2 – 5.14.0-284.25.1.el9_2.x86_64 - Broadcom BCM57414
Severity: 4
Rank (Obsolete): 9223372036854775807
Description
During testing with master (2.15.57_130_g40c4041 / 40c404129b8ee51af5da7ec422672cc1eba74cbe) on EL9.2 over a RoCE network, we noticed that LNet appears to depend on RoCE v1 being enabled. If only RoCE v2 is enabled and v1 is not, the IB layer seems to work well (ib_write_bw, ibv_rc_pingpong, etc.), but LNet doesn't work. Attached is a debug trace of an lctl ping of the node's own NID (using @o2ib), which does not succeed.
In our case, enabling RoCE v1 on the hardware (that is, setting the disable_roce_v1 option to 0) fixes the issue with LNet:
# bnxtnvm -dev=$ROCEIF setoption=disable_roce_v1#0
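To confirm which RoCE versions a port actually exposes before and after changing the option, the GID table types can be read from sysfs. The following is only a sketch: the helper name list_gid_types and the SYSFS_IB override are made up for illustration, and the device name (for example bnxt_re0) is an assumption that should be replaced with whatever ibv_devices reports on the node.

```shell
#!/bin/sh
# Print "<gid index> <type>" for every populated GID entry of an RDMA device.
# The kernel reports the type as "IB/RoCE v1" or "RoCE v2".
# list_gid_types and SYSFS_IB are hypothetical names used for this sketch;
# SYSFS_IB only exists so the sysfs root can be overridden.
list_gid_types() {
    dev=$1
    port=${2:-1}
    root=${SYSFS_IB:-/sys/class/infiniband}
    for f in "$root/$dev/ports/$port/gid_attrs/types"/*; do
        # Skip the unexpanded glob when the directory is missing or empty.
        [ -e "$f" ] || continue
        # Reading an unpopulated GID index fails; ignore those entries.
        t=$(cat "$f" 2>/dev/null) || continue
        if [ -n "$t" ]; then
            printf '%s %s\n' "${f##*/}" "$t"
        fi
    done
}
```

Calling `list_gid_types bnxt_re0` on the affected node should, under these assumptions, list only "RoCE v2" entries while RoCE v1 is disabled, and both types once it is enabled again.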