[LU-13912] lnet_check_routes sends pings too frequently Created: 19/Aug/20  Updated: 12/May/21  Resolved: 12/May/21

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.15.0

Type: Bug Priority: Minor
Reporter: Chris Horn Assignee: Chris Horn
Resolution: Fixed Votes: 0
Labels: None

Severity: 3

 Description   

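lnet_check_routers() attempts to discover a router every alive_router_check_interval / (# local nets) seconds instead of once every alive_router_check_interval. For example, the test node has four NIDs on three local nets (tcp, tcp10 and tcp11):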

sles15s01:~ # lctl list_nids
192.168.2.30@tcp
192.168.2.31@tcp
192.168.2.30@tcp10
192.168.2.31@tcp11
sles15s01:~ #

The default alive_router_check_interval is 60 seconds:

sles15s01:~ # cat /sys/module/lnet/parameters/alive_router_check_interval
60
sles15s01:~ #

However, each local net on the router (tcp10 and tcp11 in the log below) is being discovered roughly every 15 seconds:

00000400:00000200:3.0:1597692791.996167:0:10126:0:(router.c:1233:lnet_check_routers()) 192.168.2.32@tcp99(ffff93a57174e200) tcp10(ffff93a5a43ea500) cpt = 1
00000400:00000200:3.0:1597692793.020062:0:10126:0:(router.c:1233:lnet_check_routers()) 192.168.2.32@tcp99(ffff93a57174e200) tcp11(ffff93a53c96fe00) cpt = 1
00000400:00000200:3.0:1597692806.332061:0:10126:0:(router.c:1233:lnet_check_routers()) 192.168.2.32@tcp99(ffff93a57174e200) tcp10(ffff93a5a43ea500) cpt = 1
00000400:00000200:3.0:1597692808.380070:0:10126:0:(router.c:1233:lnet_check_routers()) 192.168.2.32@tcp99(ffff93a57174e200) tcp11(ffff93a53c96fe00) cpt = 1
00000400:00000200:3.0:1597692821.692041:0:10126:0:(router.c:1233:lnet_check_routers()) 192.168.2.32@tcp99(ffff93a57174e200) tcp10(ffff93a5a43ea500) cpt = 1
00000400:00000200:3.0:1597692823.740048:0:10126:0:(router.c:1233:lnet_check_routers()) 192.168.2.32@tcp99(ffff93a57174e200) tcp11(ffff93a53c96fe00) cpt = 1
00000400:00000200:3.0:1597692836.028075:0:10126:0:(router.c:1233:lnet_check_routers()) 192.168.2.32@tcp99(ffff93a57174e200) tcp10(ffff93a5a43ea500) cpt = 1
00000400:00000200:3.0:1597692838.076090:0:10126:0:(router.c:1233:lnet_check_routers()) 192.168.2.32@tcp99(ffff93a57174e200) tcp11(ffff93a53c96fe00) cpt = 1
00000400:00000200:3.0:1597692851.388127:0:10126:0:(router.c:1233:lnet_check_routers()) 192.168.2.32@tcp99(ffff93a57174e200) tcp10(ffff93a5a43ea500) cpt = 1
00000400:00000200:3.0:1597692853.436089:0:10126:0:(router.c:1233:lnet_check_routers()) 192.168.2.32@tcp99(ffff93a57174e200) tcp11(ffff93a53c96fe00) cpt = 1
00000400:00000200:3.0:1597692866.748080:0:10126:0:(router.c:1233:lnet_check_routers()) 192.168.2.32@tcp99(ffff93a57174e200) tcp10(ffff93a5a43ea500) cpt = 1
00000400:00000200:3.0:1597692868.796055:0:10126:0:(router.c:1233:lnet_check_routers()) 192.168.2.32@tcp99(ffff93a57174e200) tcp11(ffff93a53c96fe00) cpt = 1
00000400:00000200:3.0:1597692881.084093:0:10126:0:(router.c:1233:lnet_check_routers()) 192.168.2.32@tcp99(ffff93a57174e200) tcp10(ffff93a5a43ea500) cpt = 1
00000400:00000200:3.0:1597692883.132121:0:10126:0:(router.c:1233:lnet_check_routers()) 192.168.2.32@tcp99(ffff93a57174e200) tcp11(ffff93a53c96fe00) cpt = 1
00000400:00000200:3.0:1597692896.444129:0:10126:0:(router.c:1233:lnet_check_routers()) 192.168.2.32@tcp99(ffff93a57174e200) tcp10(ffff93a5a43ea500) cpt = 1
00000400:00000200:3.0:1597692898.492085:0:10126:0:(router.c:1233:lnet_check_routers()) 192.168.2.32@tcp99(ffff93a57174e200) tcp11(ffff93a53c96fe00) cpt = 1
00000400:00000200:3.0:1597692911.804178:0:10126:0:(router.c:1233:lnet_check_routers()) 192.168.2.32@tcp99(ffff93a57174e200) tcp10(ffff93a5a43ea500) cpt = 1
00000400:00000200:3.0:1597692913.852211:0:10126:0:(router.c:1233:lnet_check_routers()) 192.168.2.32@tcp99(ffff93a57174e200) tcp11(ffff93a53c96fe00) cpt = 1
00000400:00000200:3.0:1597692926.140084:0:10126:0:(router.c:1233:lnet_check_routers()) 192.168.2.32@tcp99(ffff93a57174e200) tcp10(ffff93a5a43ea500) cpt = 1
00000400:00000200:3.0:1597692928.188112:0:10126:0:(router.c:1233:lnet_check_routers()) 192.168.2.32@tcp99(ffff93a57174e200) tcp11(ffff93a53c96fe00) cpt = 1
00000400:00000200:3.0:1597692941.500096:0:10126:0:(router.c:1233:lnet_check_routers()) 192.168.2.32@tcp99(ffff93a57174e200) tcp10(ffff93a5a43ea500) cpt = 1
00000400:00000200:3.0:1597692943.548098:0:10126:0:(router.c:1233:lnet_check_routers()) 192.168.2.32@tcp99(ffff93a57174e200) tcp11(ffff93a53c96fe00) cpt = 1
00000400:00000200:3.0:1597692956.860151:0:10126:0:(router.c:1233:lnet_check_routers()) 192.168.2.32@tcp99(ffff93a57174e200) tcp10(ffff93a5a43ea500) cpt = 1
00000400:00000200:3.0:1597692958.908200:0:10126:0:(router.c:1233:lnet_check_routers()) 192.168.2.32@tcp99(ffff93a57174e200) tcp11(ffff93a53c96fe00) cpt = 1
00000400:00000200:3.0:1597692971.196694:0:10126:0:(router.c:1233:lnet_check_routers()) 192.168.2.32@tcp99(ffff93a57174e200) tcp10(ffff93a5a43ea500) cpt = 1
00000400:00000200:3.0:1597692973.244082:0:10126:0:(router.c:1233:lnet_check_routers()) 192.168.2.32@tcp99(ffff93a57174e200) tcp11(ffff93a53c96fe00) cpt = 1
00000400:00000200:3.0:1597692986.556074:0:10126:0:(router.c:1233:lnet_check_routers()) 192.168.2.32@tcp99(ffff93a57174e200) tcp10(ffff93a5a43ea500) cpt = 1
00000400:00000200:3.0:1597692988.604117:0:10126:0:(router.c:1233:lnet_check_routers()) 192.168.2.32@tcp99(ffff93a57174e200) tcp11(ffff93a53c96fe00) cpt = 1
00000400:00000200:3.0:1597693001.916333:0:10126:0:(router.c:1233:lnet_check_routers()) 192.168.2.32@tcp99(ffff93a57174e200) tcp10(ffff93a5a43ea500) cpt = 1
00000400:00000200:3.0:1597693003.964361:0:10126:0:(router.c:1233:lnet_check_routers()) 192.168.2.32@tcp99(ffff93a57174e200) tcp11(ffff93a53c96fe00) cpt = 1
00000400:00000200:3.0:1597693016.252092:0:10126:0:(router.c:1233:lnet_check_routers()) 192.168.2.32@tcp99(ffff93a57174e200) tcp10(ffff93a5a43ea500) cpt = 1
00000400:00000200:3.0:1597693018.300111:0:10126:0:(router.c:1233:lnet_check_routers()) 192.168.2.32@tcp99(ffff93a57174e200) tcp11(ffff93a53c96fe00) cpt = 1
00000400:00000200:3.0:1597693031.612116:0:10126:0:(router.c:1233:lnet_check_routers()) 192.168.2.32@tcp99(ffff93a57174e200) tcp10(ffff93a5a43ea500) cpt = 1
00000400:00000200:3.0:1597693033.660105:0:10126:0:(router.c:1233:lnet_check_routers()) 192.168.2.32@tcp99(ffff93a57174e200) tcp11(ffff93a53c96fe00) cpt = 1
00000400:00000200:3.0:1597693046.972061:0:10126:0:(router.c:1233:lnet_check_routers()) 192.168.2.32@tcp99(ffff93a57174e200) tcp10(ffff93a5a43ea500) cpt = 1
00000400:00000200:3.0:1597693049.020096:0:10126:0:(router.c:1233:lnet_check_routers()) 192.168.2.32@tcp99(ffff93a57174e200) tcp11(ffff93a53c96fe00) cpt = 1
00000400:00000200:3.0:1597693061.308071:0:10126:0:(router.c:1233:lnet_check_routers()) 192.168.2.32@tcp99(ffff93a57174e200) tcp10(ffff93a5a43ea500) cpt = 1
00000400:00000200:3.0:1597693064.380103:0:10126:0:(router.c:1233:lnet_check_routers()) 192.168.2.32@tcp99(ffff93a57174e200) tcp11(ffff93a53c96fe00) cpt = 1
00000400:00000200:3.0:1597693076.668114:0:10126:0:(router.c:1233:lnet_check_routers()) 192.168.2.32@tcp99(ffff93a57174e200) tcp10(ffff93a5a43ea500) cpt = 1
00000400:00000200:3.0:1597693079.740148:0:10126:0:(router.c:1233:lnet_check_routers()) 192.168.2.32@tcp99(ffff93a57174e200) tcp11(ffff93a53c96fe00) cpt = 1
00000400:00000200:3.0:1597693091.004132:0:10126:0:(router.c:1233:lnet_check_routers()) 192.168.2.32@tcp99(ffff93a57174e200) tcp10(ffff93a5a43ea500) cpt = 1
00000400:00000200:3.0:1597693094.076132:0:10126:0:(router.c:1233:lnet_check_routers()) 192.168.2.32@tcp99(ffff93a57174e200) tcp11(ffff93a53c96fe00) cpt = 1
00000400:00000200:3.0:1597693106.364052:0:10126:0:(router.c:1233:lnet_check_routers()) 192.168.2.32@tcp99(ffff93a57174e200) tcp10(ffff93a5a43ea500) cpt = 1
00000400:00000200:3.0:1597693109.436064:0:10126:0:(router.c:1233:lnet_check_routers()) 192.168.2.32@tcp99(ffff93a57174e200) tcp11(ffff93a53c96fe00) cpt = 1
00000400:00000200:3.0:1597693121.724125:0:10126:0:(router.c:1233:lnet_check_routers()) 192.168.2.32@tcp99(ffff93a57174e200) tcp10(ffff93a5a43ea500) cpt = 1
00000400:00000200:3.0:1597693124.796137:0:10126:0:(router.c:1233:lnet_check_routers()) 192.168.2.32@tcp99(ffff93a57174e200) tcp11(ffff93a53c96fe00) cpt = 1
00000400:00000200:3.0:1597693136.060160:0:10126:0:(router.c:1233:lnet_check_routers()) 192.168.2.32@tcp99(ffff93a57174e200) tcp10(ffff93a5a43ea500) cpt = 1
00000400:00000200:3.0:1597693139.132125:0:10126:0:(router.c:1233:lnet_check_routers()) 192.168.2.32@tcp99(ffff93a57174e200) tcp11(ffff93a53c96fe00) cpt = 1
00000400:00000200:3.0:1597693151.420063:0:10126:0:(router.c:1233:lnet_check_routers()) 192.168.2.32@tcp99(ffff93a57174e200) tcp10(ffff93a5a43ea500) cpt = 1
00000400:00000200:3.0:1597693154.492093:0:10126:0:(router.c:1233:lnet_check_routers()) 192.168.2.32@tcp99(ffff93a57174e200) tcp11(ffff93a53c96fe00) cpt = 1
00000400:00000200:3.0:1597693166.780101:0:10126:0:(router.c:1233:lnet_check_routers()) 192.168.2.32@tcp99(ffff93a57174e200) tcp10(ffff93a5a43ea500) cpt = 1
00000400:00000200:3.0:1597693169.852103:0:10126:0:(router.c:1233:lnet_check_routers()) 192.168.2.32@tcp99(ffff93a57174e200) tcp11(ffff93a53c96fe00) cpt = 1
00000400:00000200:3.0:1597693181.116105:0:10126:0:(router.c:1233:lnet_check_routers()) 192.168.2.32@tcp99(ffff93a57174e200) tcp10(ffff93a5a43ea500) cpt = 1
00000400:00000200:3.0:1597693184.188104:0:10126:0:(router.c:1233:lnet_check_routers()) 192.168.2.32@tcp99(ffff93a57174e200) tcp11(ffff93a53c96fe00) cpt = 1


 Comments   
Comment by Gerrit Updater [ 19/Aug/20 ]

Chris Horn (chris.horn@hpe.com) uploaded a new patch: https://review.whamcloud.com/39694
Subject: LU-13912 lnet: Correct the router ping interval calculation
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: bbfaa7b23f5c1fcae6737087c848b49fda59bf7f

Comment by Gerrit Updater [ 11/May/21 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/39694/
Subject: LU-13912 lnet: Correct the router ping interval calculation
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 0131d39a622f1efc07dc49df7bceed1bbe16357d

Comment by Peter Jones [ 12/May/21 ]

Landed for 2.15

Generated at Sat Feb 10 03:05:16 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.