Details
Type: Bug
Resolution: Fixed
Priority: Minor
Description
On larger clusters, the current default transaction timeout of 10 seconds appears to be too short and causes RDMA timeouts.
The proposal is to increase the transaction timeout to 150s. With a retry count of 3, that brings the LND timeout to 50s (150s / 3), which was its initial value before the health feature was introduced.
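The arithmetic behind the proposed values can be sketched as follows. This is only an illustration: the helper name `lnd_timeout` is hypothetical, and the relation "LND timeout = transaction timeout / retry count" is inferred from the numbers in this ticket (150s and 3 giving 50s). The exact computation inside LNet may differ between Lustre versions.

```python
def lnd_timeout(transaction_timeout_s: int, retry_count: int) -> float:
    """Per-attempt LND timeout implied by the ticket's numbers.

    Assumed relation (inferred from the ticket, not taken from the LNet source):
    LND timeout = transaction timeout / retry count.
    """
    if retry_count <= 0:
        # Assumption: with retries disabled, a single attempt may use
        # the whole transaction timeout.
        return float(transaction_timeout_s)
    return transaction_timeout_s / retry_count


# Proposed values from the ticket: 150s transaction timeout, retry count 3.
print(lnd_timeout(150, 3))  # 50.0
```

Under this assumed relation, raising the transaction timeout without also considering the retry count changes the per-attempt LND timeout, which is why the two values are discussed together.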
From LU-13020, the workaround to get equivalent behavior on systems without this patch is to run the following commands on all of the 2.12.3 nodes, in the order shown:
This changes the values only temporarily; they can be set permanently by adding the following line to /etc/modprobe.d/lnet.conf: