Details
Type: Bug
Resolution: Fixed
Priority: Minor
None
Environment: CentOS 7.9 VM, kernel 3.10.0-1160.25.1.el7_lustre.x86_64 (could not reproduce on CentOS 8.2)
Severity: 3
Description
The issue can be reproduced by adding an o2ib NI and then interrupting the corresponding link, either by pulling the cable or by shutting down the switch connection or the whole switch.
Alternatively, adding the o2ib NI while the corresponding link is already down (cable pulled) has the same effect.
Bringing the whole interface down with "ifdown" does not reproduce the problem.
I could reproduce this on a CentOS 7.9 VM, but not on a CentOS 8.2 system.
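The report hinges on the difference between a link that loses carrier (cable or switch failure) and an interface that is administratively down ("ifdown"). Before adding the NI, it may help to confirm which state the test node is actually in. Below is a minimal Python sketch, assuming a Linux sysfs layout under /sys/class/net and an IPoIB interface name such as "ib0" (hypothetical here). It checks the IFF_UP bit in the interface flags, which "ifdown" clears, separately from the kernel's operational state, which drops when carrier is lost even though the interface stays configured up. It is a diagnostic aid only, not part of LNet.

```python
from pathlib import Path

IFF_UP = 0x1  # administrative "up" bit in /sys/class/net/<iface>/flags

# operstate values that mean "configured up, but no usable link"
LINK_DOWN_STATES = {"down", "lowerlayerdown"}


def classify_link(flags: int, operstate: str) -> str:
    """Classify an interface as 'admin-down', 'link-down', or 'up'.

    flags: integer value of the sysfs 'flags' attribute.
    operstate: raw text of the sysfs 'operstate' attribute.
    """
    if not flags & IFF_UP:
        # Interface disabled administratively, e.g. after "ifdown".
        return "admin-down"
    if operstate.strip() in LINK_DOWN_STATES:
        # Interface still configured up, but carrier is gone
        # (cable pulled, switch connection or switch shut down).
        return "link-down"
    return "up"


def read_link_state(iface: str, sysfs: Path = Path("/sys/class/net")) -> str:
    """Read flags and operstate for iface from sysfs and classify them."""
    base = sysfs / iface
    # 'flags' is printed as hex with a 0x prefix, e.g. "0x1003";
    # int(..., 16) accepts that prefix.
    flags = int((base / "flags").read_text().strip(), 16)
    operstate = (base / "operstate").read_text()
    return classify_link(flags, operstate)
```

On a test node one would call, for example, `read_link_state("ib0")` before running the NI add. Per the report, the "link-down" case reproduces the problem and the "admin-down" case does not.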
The issue was introduced by commit da230373bd14306cb97fb48748ebce205f09d468 (Author: Serguei Smirnov <ssmirnov@whamcloud.com>, Date: Thu Feb 16 10:34:03 2023 -0800): "LU-16563 lnet: use discovered ni status to set initial health".
It was then masked by a separate issue that made adding an o2ib NI fail altogether, starting from commit cc5594df3e70d1924f34ccdf4c3178654d277ad0 (Author: Shaun Tancheff <shaun.tancheff@hpe.com>, Date: Sun Apr 23 07:19:11 2023 -0500): "LU-16759 o2ib: MOFED 5.5+ ib_dma_virt_map_sg".
A later commit, which I did not identify, made adding o2iblnd NIs possible again. The latest master behaves as described above on CentOS 7.9.