Details
Bug
Resolution: Fixed
Minor
Lustre 2.12.3
CentOS 7.6
3
Description
After removing a few LNet routes with lnetctl, we keep seeing these messages on all Lustre servers on Fir:
Mar 17 13:59:02 fir-io7-s1 kernel: LNetError: 115948:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error
Mar 17 14:09:02 fir-io7-s1 kernel: LNetError: 4245:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error
Mar 17 14:19:02 fir-io7-s1 kernel: LNetError: 5152:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error
Mar 17 14:29:02 fir-io7-s1 kernel: LNetError: 5152:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error
Mar 17 14:39:02 fir-io7-s1 kernel: LNetError: 5966:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error
All of these come from routers that we removed from the LNet route configuration on the servers.
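To see at a glance which peers are being rejected and how often, the kernel log can be summarized per NID. The following is only a sketch: `rejection_counts` is a hypothetical helper name, and it assumes one kernel message per line in the format shown above.

```shell
#!/bin/sh
# rejection_counts (hypothetical helper): read kernel log text on stdin and
# print "<nid> <count>" for every peer reported by kiblnd_rejected().
rejection_counts() {
    sed -n 's/.*kiblnd_rejected()) \([^ ]*\) rejected:.*/\1/p' |
        sort | uniq -c | awk '{print $2, $1}'
}
```

For example, `rejection_counts < fir-io7-s1_kern.log` would summarize the attached kernel log.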
We removed the routes using commands like:
# clush -w@mds,@oss 'lnetctl route del --net o2ib4 --gateway 10.0.10.212@o2ib7'
The remaining active routes are:
[root@fir-io7-s1 lnet_consumer]# lnetctl route show -v
route:
    - net: o2ib1
      gateway: 10.0.10.216@o2ib7
      hop: -1
      priority: 0
      state: up
    - net: o2ib1
      gateway: 10.0.10.218@o2ib7
      hop: -1
      priority: 0
      state: up
    - net: o2ib1
      gateway: 10.0.10.219@o2ib7
      hop: -1
      priority: 0
      state: up
    - net: o2ib1
      gateway: 10.0.10.217@o2ib7
      hop: -1
      priority: 0
      state: up
    - net: o2ib2
      gateway: 10.0.10.227@o2ib7
      hop: -1
      priority: 0
      state: up
    - net: o2ib2
      gateway: 10.0.10.226@o2ib7
      hop: -1
      priority: 0
      state: up
    - net: o2ib2
      gateway: 10.0.10.225@o2ib7
      hop: -1
      priority: 0
      state: up
    - net: o2ib2
      gateway: 10.0.10.224@o2ib7
      hop: -1
      priority: 0
      state: up
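A quick way to confirm that a rejected NID is no longer listed as a gateway is to check it against a saved copy of the route listing. This is a minimal sketch: `stale_peers` is a hypothetical helper, and it assumes the `gateway: <nid>` line layout that `lnetctl route show -v` prints above.

```shell
#!/bin/sh
# stale_peers ROUTES_FILE NID... (hypothetical helper)
# Print each NID argument that does not appear as a gateway in ROUTES_FILE,
# a saved copy of `lnetctl route show -v` output.
stale_peers() {
    routes_file=$1
    shift
    for nid in "$@"; do
        # awk exits 0 only if some "gateway:" line carries exactly this NID.
        if ! awk -v nid="$nid" \
            '$1 == "gateway:" && $2 == nid {found = 1} END {exit !found}' \
            "$routes_file"; then
            printf '%s\n' "$nid"
        fi
    done
}
```

Usage, with an arbitrary file name: `lnetctl route show -v > routes.txt; stale_peers routes.txt 10.0.10.212@o2ib7 10.0.10.209@o2ib7 10.0.10.202@o2ib7`. Any NID it prints is not a configured gateway in that listing.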
Why is lnet trying to use the old routers?
lctl dk shows:
00000800:00020000:16.0:1584479347.555117:0:4883:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error
00000800:00000200:16.0:1584479347.555118:0:4883:0:(o2iblnd_cb.c:2307:kiblnd_connreq_done()) 10.0.10.212@o2ib7: active(1), version(12), status(-111)
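On Linux, errno 111 is ECONNREFUSED, so status(-111) here appears to be the local record of the refused connection attempt. To list the peer and status for every such line in a larger debug log, something like the following could be used. It is a sketch only: `connreq_status` is a hypothetical helper and assumes the kiblnd_connreq_done() line format shown above.

```shell
#!/bin/sh
# connreq_status (hypothetical helper): read `lctl dk` output on stdin and
# print "<nid> <status>" for every kiblnd_connreq_done() line.
connreq_status() {
    sed -n 's/.*kiblnd_connreq_done()) \([^:]*\): .*status(\(-\{0,1\}[0-9]\{1,\}\)).*/\1 \2/p'
}
```

For example, `zcat fir-io7-s1_dk.log.gz | connreq_status | sort | uniq -c` would count the status values seen per peer in the attached log.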
I'm attaching a debug log (lctl dk) taken with +net enabled as fir-io7-s1_dk.log.gz.
Also attaching the kernel logs as fir-io7-s1_kern.log and the output of lnetctl stats show as fir-io7-s1_lnetctl_stats.txt.