Details
-
Bug
-
Resolution: Unresolved
-
Minor
-
None
-
Lustre 2.15.4
-
None
-
server 2.12.9
server side
lnet networks=o2ib2(op0),tcp1(ens6)
client side
lnet networks=o2ib2(op0)
-
2
Description
After upgrading the client from 2.15.3 to 2.15.4, we see that for 2 HA pairs,
half of the OSTs cannot be accessed.
They do not show up in lfs df,
but they do show up as UP in lctl dl.
In lnetctl peer we see:
- primary nid: 10.84.200.32@tcp1 <<<<<<<
  Multi-Rail: True
  peer ni:
    - nid: 10.85.200.32@o2ib2
      state: NA
    - nid: 10.84.200.32@tcp1
      state: NA
- primary nid: 10.85.200.33@o2ib2
  Multi-Rail: True
  peer ni:
    - nid: 10.85.200.33@o2ib2
      state: NA
    - nid: 10.84.200.33@tcp1
      state: NA
The client cannot reach the tcp1 network of the server, but that NID is selected as the primary NID.
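To spot affected peers on other clients, a check along these lines may help. It is only a rough sketch: the function name `unreachable_primary_nids` is hypothetical, the sample text is trimmed from the listing in this ticket, and it assumes the peer listing keeps the `primary nid: <address>@<network>` form shown here.

```python
import re


def unreachable_primary_nids(peer_text, local_nets):
    """Return primary NIDs whose network is not configured on this client."""
    bad = []
    for match in re.finditer(r"primary nid:\s*(\S+)", peer_text):
        nid = match.group(1)
        # A NID has the form <address>@<network>, e.g. 10.84.200.32@tcp1.
        net = nid.rpartition("@")[2]
        if net not in local_nets:
            bad.append(nid)
    return bad


# Sample trimmed from the peer listing in this ticket.
peer_output = """\
- primary nid: 10.84.200.32@tcp1
  Multi-Rail: True
  peer ni:
    - nid: 10.85.200.32@o2ib2
      state: NA
    - nid: 10.84.200.32@tcp1
      state: NA
- primary nid: 10.85.200.33@o2ib2
  Multi-Rail: True
  peer ni:
    - nid: 10.85.200.33@o2ib2
      state: NA
    - nid: 10.84.200.33@tcp1
      state: NA
"""

# The client in this report only has o2ib2 configured.
client_nets = {"o2ib2"}
print(unreachable_primary_nids(peer_output, client_nets))
```

With the client networks from this report, only the first peer is flagged, which matches the peer marked above.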
I can either delete the NID with lnetctl peer del to make lfs df show all OSTs,
or I can set
lnet lock_prim_nid=0
to make it work.
That hints at LU-14668; I would also verify that with a git bisect to 6cfc8e55a2e77c9c91b81a8842e2cbd886025298.
It seems strange that an unreachable NID can be the primary NID. Is that intended?
Issue Links
- is related to: LU-14668 LNet: do discovery in the background (Resolved)