Details
Type: Bug
Resolution: Fixed
Priority: Critical
None
Environment: lustre-2.12.6_9.llnl client, kernel-4.18.0-305.0.0.1toss.t4.x86_64, RHEL 8.4
3
Description
lnet_selftest fails between two nodes over Omni-Path. Bulk READ RPCs fail with -103 (ECONNABORTED):
dk.opal63.llnl.gov.7:00000001:00020000:43.0:1622598261.714620:0:129525:0:(brw_test.c:415:brw_bulk_ready()) BRW bulk READ failed for RPC from 12345-192.168.128.126@o2ib18: -103
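The following minimal Python sketch may help when reading records like the one above. It splits a libcfs debug-log record into fields and decodes the trailing return code. It assumes the record layout subsystem:mask:cpu:seconds.usecs:stack:pid:extern_pid:(file:line:function()) message, and that a dump-file prefix such as "dk.opal63.llnl.gov.7:" can simply be skipped. The field names and the helpers parse_record and describe_rc are hypothetical. The errno decoding uses the host's errno table, which matches the Linux kernel numbering (103 is ECONNABORTED) only when run on Linux.

```python
import errno
import os
import re

# Assumed layout of one libcfs debug-log record (hypothetical field names):
# subsys:mask:cpu:seconds.usecs:stack:pid:extern_pid:(file:line:func()) message
RECORD_RE = re.compile(
    r"(?P<subsys>[0-9a-f]{8}):(?P<mask>[0-9a-f]{8}):(?P<cpu>[0-9.]+):"
    r"(?P<time>\d+\.\d+):(?P<stack>\d+):(?P<pid>\d+):(?P<ext_pid>\d+):"
    r"\((?P<file>[^:]+):(?P<line>\d+):(?P<func>\w+)\(\)\) (?P<msg>.*)"
)
# In this message the failure code is a negated errno after the final colon.
RC_RE = re.compile(r":\s*(-\d+)\s*$")


def parse_record(text):
    """Parse one debug-log record; search() skips any leading dump-file prefix."""
    m = RECORD_RE.search(text)
    if m is None:
        raise ValueError("not a libcfs debug record")
    rec = m.groupdict()
    rec["pid"] = int(rec["pid"])
    rec["line"] = int(rec["line"])
    rc = RC_RE.search(rec["msg"])
    rec["rc"] = int(rc.group(1)) if rc else None
    return rec


def describe_rc(rc):
    """Map a negative return code to (errno name, message) using the host's errno table."""
    code = -rc
    return errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)


if __name__ == "__main__":
    line = ("dk.opal63.llnl.gov.7:00000001:00020000:43.0:1622598261.714620:0:129525:0:"
            "(brw_test.c:415:brw_bulk_ready()) BRW bulk READ failed for RPC from "
            "12345-192.168.128.126@o2ib18: -103")
    rec = parse_record(line)
    print(rec["file"], rec["line"], rec["func"], rec["pid"])
    print(rec["rc"], *describe_rc(rec["rc"]))
```

Using re.search rather than re.match is what lets the same parser accept both raw "lctl dk" output and grep output that carries a file-name prefix.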
Bulk transfers work over InfiniBand, although in that test one of the nodes was running RHEL 7.9 with an earlier Lustre patch stack. Bulk transfers also work over TCP using ksocklnd.
lctl ping succeeds between the same two nodes.
mpibench and other MPI applications also run correctly over Omni-Path between two nodes.
See https://github.com/LLNL/lustre/releases/tag/2.12.6_9.llnl for the patch stack.