Details
- Bug
- Resolution: Fixed
- Critical
- Lustre 2.7.0, Lustre 2.8.0, Lustre 2.9.0
- 3
- 16043
Description
Got an IOR failure on the soak cluster with the following errors:
Oct 7 21:54:01 lola-23 kernel: LNetError: 3613:0:(o2iblnd_cb.c:1134:kiblnd_init_rdma()) RDMA too fragmented for 192.168.1.115@o2ib100 (256): 128/256 src 128/256 dst frags
Oct 7 21:54:01 lola-23 kernel: LNetError: 3618:0:(o2iblnd_cb.c:428:kiblnd_handle_rx()) Can't setup rdma for PUT to 192.168.1.114@o2ib100: -90
Oct 7 21:54:01 lola-23 kernel: LNetError: 3618:0:(o2iblnd_cb.c:428:kiblnd_handle_rx()) Skipped 7 previous similar messages
Liang told me that this is a known issue with routing. That said, the IOR process is not killable and the only option is to reboot the client node. We should at least fail "gracefully" by returning the error to the application.
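For triage, the errors above can be pulled out of a console log with a small script. The following is only a sketch, not part of Lustre: the function name, the field names, and the regular expressions are hypothetical and are derived solely from the message text quoted in this ticket. The meaning of the individual fragment counters is not stated here, so the script reports them as raw pairs. The one external fact it relies on is that errno 90 on Linux is EMSGSIZE ("Message too long"), which matches the "too fragmented" message.

```python
import re

# Linux errno value seen in the log (asm-generic/errno.h). Hardcoded so the
# sketch gives the same answer on non-Linux hosts, where errno numbers differ.
LINUX_ERRNO_NAMES = {90: "EMSGSIZE"}

# Hypothetical patterns, built only from the message text quoted in this ticket.
FRAG_RE = re.compile(
    r"RDMA too fragmented for (?P<peer>\S+) \((?P<limit>\d+)\): "
    r"(?P<src_a>\d+)/(?P<src_b>\d+) src (?P<dst_a>\d+)/(?P<dst_b>\d+) dst frags"
)
RC_RE = re.compile(
    r"Can't setup rdma for (?P<op>\w+) to (?P<peer>\S+): (?P<rc>-\d+)"
)


def parse_lnet_errors(log_text):
    """Return (fragment_events, rdma_failures) found in log_text."""
    frags = [
        {
            "peer": m["peer"],
            "limit": int(m["limit"]),
            # Raw counter pairs exactly as printed; their meaning is not
            # interpreted here.
            "src": (int(m["src_a"]), int(m["src_b"])),
            "dst": (int(m["dst_a"]), int(m["dst_b"])),
        }
        for m in FRAG_RE.finditer(log_text)
    ]
    failures = [
        {
            "op": m["op"],
            "peer": m["peer"],
            "rc": int(m["rc"]),
            # Map the negative kernel return code to a Linux errno name.
            "errno_name": LINUX_ERRNO_NAMES.get(-int(m["rc"]), "unknown"),
        }
        for m in RC_RE.finditer(log_text)
    ]
    return frags, failures


SAMPLE = (
    "Oct 7 21:54:01 lola-23 kernel: LNetError: 3613:0:(o2iblnd_cb.c:1134:"
    "kiblnd_init_rdma()) RDMA too fragmented for 192.168.1.115@o2ib100 (256): "
    "128/256 src 128/256 dst frags\n"
    "Oct 7 21:54:01 lola-23 kernel: LNetError: 3618:0:(o2iblnd_cb.c:428:"
    "kiblnd_handle_rx()) Can't setup rdma for PUT to 192.168.1.114@o2ib100: -90\n"
)

if __name__ == "__main__":
    frag_events, rdma_failures = parse_lnet_errors(SAMPLE)
    print(frag_events)
    print(rdma_failures)
```

Running it against a full syslog capture would show which peers hit the fragment limit and which operations failed as a result.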
Issue Links
- duplicates
  - LU-7385 Bulk IO write error (Resolved)
  - LU-3322 ko2iblnd support for different map_on_demand and peer_credits between systems (Resolved)
- is related to
  - LU-9420 Bad Check slipped into repo (Resolved)
  - LU-10252 backport change LU-5718 change 12451/12 to b2_8_fe (Resolved)
  - LU-7401 OOM after LNet initialization with not default peer_creadits on mlx5 (Resolved)
  - LUDOC-378 Document wrq_sge as an o2iblnd parameter (Resolved)
  - LU-12419 ppc64le: "LNetError: RDMA has too many fragments for peer_ni" when reading two files (Closed)
- is related to
  - LU-7210 ASSERTION( peer->ibp_connecting == 0 ) (Resolved)
  - LU-7569 IB leaf switch caused LNet routers to crash (Resolved)
  - LU-7650 ko2iblnd map_on_demand can't negotitate when page sizes are different between nodes. (Resolved)