Details
-
Bug
-
Resolution: Fixed
-
Minor
-
None
-
Lustre 1.8.6
-
None
-
3
-
10177
Description
I'm testing Lustre N-hop routing (e.g. o2ib0 <> tcp <> o2ib1) with the setup below.
MDS/OSS <-- IB (o2ib0) --> Router1 <-- TCP (tcp0) --> Router2 <-- IB (o2ib1) --> Client

- Network configuration -
There are two IB fabrics and 1GbE connects both fabrics with LNET routers.

MDS/OSS
IP address: 192.168.100.120@o2ib0
options lnet networks=o2ib0 routes="tcp0 192.168.100.121@o2ib0; o2ib1 192.168.100.121@o2ib0"

Router1
IP address: 192.168.100.121@o2ib0, 192.168.10.121@tcp0
options lnet ip2nets="tcp0 192.168.20.*; o2ib0(ib0) 192.168.100.*" routes="o2ib1 192.168.20.122@tcp0" forwarding="enabled"

Router2
IP address: 192.168.200.122@o2ib1, 192.168.10.122@tcp0
options lnet ip2nets="tcp0 192.168.20.*; o2ib1(ib0) 192.168.200.*" routes="o2ib0 192.168.20.121@tcp0" forwarding="enabled"

Client
IP address: 192.168.200.123@o2ib1
options lnet networks=o2ib1(ib0) routes="o2ib0 192.168.200.122@o2ib1"
It works with the above configuration, but there seems to be an issue if Router2 goes down (e.g. 'lctl net down') and is then restarted. The problem is that the client can't recover unless the filesystem is unmounted and remounted on it.
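As a sanity check on the configuration listed above, the following is a minimal Python sketch (not part of Lustre or LNET) that compares each route's gateway NID against the addresses given for the four nodes. The names `NIDS`, `ROUTES`, `parse_routes` and `unmatched_gateways` are made up for illustration, and the parser only handles the simple "net gateway; net gateway" form used in this report, not the full LNET route syntax. It is purely a string comparison and says nothing about the recovery problem itself.

```python
# Addresses as listed in the report, one set of NIDs per node.
NIDS = {
    "MDS/OSS": {"192.168.100.120@o2ib0"},
    "Router1": {"192.168.100.121@o2ib0", "192.168.10.121@tcp0"},
    "Router2": {"192.168.200.122@o2ib1", "192.168.10.122@tcp0"},
    "Client": {"192.168.200.123@o2ib1"},
}

# routes= strings copied verbatim from each node's "options lnet" line.
ROUTES = {
    "MDS/OSS": "tcp0 192.168.100.121@o2ib0; o2ib1 192.168.100.121@o2ib0",
    "Router1": "o2ib1 192.168.20.122@tcp0",
    "Router2": "o2ib0 192.168.20.121@tcp0",
    "Client": "o2ib0 192.168.200.122@o2ib1",
}


def parse_routes(spec):
    """Split 'net gateway; net gateway' into (net, gateway) pairs.

    Simplified on purpose: entries that are not exactly two fields
    (e.g. with hop counts) are skipped.
    """
    pairs = []
    for entry in spec.split(";"):
        fields = entry.split()
        if len(fields) == 2:
            pairs.append((fields[0], fields[1]))
    return pairs


def unmatched_gateways(nids, routes):
    """Return (node, remote net, gateway) for gateways no listed node owns."""
    known = set().union(*nids.values())
    missing = []
    for node, spec in routes.items():
        for net, gateway in parse_routes(spec):
            if gateway not in known:
                missing.append((node, net, gateway))
    return missing


if __name__ == "__main__":
    for node, net, gateway in unmatched_gateways(NIDS, ROUTES):
        print(f"{node}: route to {net} via {gateway} matches no listed address")
```

With the values as written in this report, the check flags the two tcp0 gateways (192.168.20.121 and 192.168.20.122), since the routers' tcp0 addresses are listed as 192.168.10.x. That may simply be a typo in the report rather than in the running configuration.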