[LU-15538] DLC doesn't initialize default LND tunables correctly Created: 08/Feb/22 Updated: 23/Sep/22 Resolved: 11/Jun/22 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.16.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Chris Horn | Assignee: | Chris Horn |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Severity: | 3 |
| Description |
|
LND tunables are not initialized correctly when any net or LND tunable parameter is specified via either the CLI or the YAML config. I noticed this with map_on_demand, but the bug potentially affects any LND tunable where '0' is an accepted value:
/home/hornc/lustre-wc-rel/lnet/utils/lnetctl lnet configure
/home/hornc/lustre-wc-rel/lnet/utils/lnetctl net add --net o2ib --if enp137s0f0
/home/hornc/lustre-wc-rel/lnet/utils/lnetctl net show -v 5 | egrep -e o2ib -e map
- net type: o2ib
- nid: 172.18.4.4@o2ib
map_on_demand: 1
/home/hornc/lustre-wc-rel/lnet/utils/lnetctl lnet configure
/home/hornc/lustre-wc-rel/lnet/utils/lnetctl net add --net o2ib --if enp137s0f0 --peer-credits=32
/home/hornc/lustre-wc-rel/lnet/utils/lnetctl net show -v 5 | egrep -e o2ib -e map
- net type: o2ib
- nid: 172.18.4.4@o2ib
map_on_demand: 0
The issue is that when any NET or LND tunable is specified via the CLI or YAML, the whole tunables struct gets memset to 0; in the YAML case, 0 is assigned to any tunable that isn't specified in the YAML. The LND then assumes that '0' was specified by the user and uses that value (if it is valid). |
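The failure mode described above can be illustrated with a minimal C sketch. Everything in it is hypothetical: `struct demo_tunables`, `DEMO_UNSET`, and the `demo_*` helpers are stand-ins invented for illustration, not the Lustre structures or the change in the uploaded patch. It assumes a sentinel value of -1 can mark "not specified by the user", which only works for tunables where -1 is never a legal setting.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed sentinel meaning "the user did not specify this tunable". */
#define DEMO_UNSET (-1)

/* Hypothetical stand-in for an LND tunables struct. */
struct demo_tunables {
	int peer_credits;
	int map_on_demand;
};

/* Problematic pattern: zero the whole struct before filling in the
 * one tunable the user gave. Every other field now reads as 0. */
static void demo_init_zeroed(struct demo_tunables *t)
{
	assert(t != NULL);
	memset(t, 0, sizeof(*t));
}

/* Alternative pattern: mark every field as unset so the consumer can
 * tell "user asked for 0" apart from "user said nothing". */
static void demo_init_unset(struct demo_tunables *t)
{
	assert(t != NULL);
	t->peer_credits = DEMO_UNSET;
	t->map_on_demand = DEMO_UNSET;
}

/* Consumer side: fall back to the default only when the field is
 * unset; any other value, including 0, is taken as a user request. */
static int demo_resolve(int requested, int fallback)
{
	return requested == DEMO_UNSET ? fallback : requested;
}
```

With `demo_init_zeroed()` followed by setting only `peer_credits`, `demo_resolve(t.map_on_demand, 1)` yields 0, which mirrors the symptom in the transcript above. With `demo_init_unset()` the same call yields the fallback of 1, while an explicit 0 is still honored.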
| Comments |
| Comment by Gerrit Updater [ 10/Feb/22 ] |
|
"Chris Horn <chris.horn@hpe.com>" uploaded a new patch: https://review.whamcloud.com/46492 |
| Comment by Chris Horn [ 10/Feb/22 ] |
|
Test notes:
[hornc@s-lmo-gaz38b lustre-wc-rel]$ git reset --hard origin/master
HEAD is now at 450d10c362 New RC 2.15.0-RC2
[hornc@s-lmo-gaz38b lustre-wc-rel]$ make -j 32
...
[root@s-lmo-gaz38b hornc]# bash start.sh
/home/hornc/lustre-wc-rel/lnet/utils/lnetctl lnet configure
/home/hornc/lustre-wc-rel/lnet/utils/lnetctl net add --net o2ib --if enp137s0f0
- net type: o2ib
- nid: 172.18.4.4@o2ib
map_on_demand: 1
[root@s-lmo-gaz38b hornc]# bash clean.sh
[root@s-lmo-gaz38b hornc]# bash start.sh map_on_demand=0
/home/hornc/lustre-wc-rel/lnet/utils/lnetctl lnet configure
/home/hornc/lustre-wc-rel/lnet/utils/lnetctl net add --net o2ib --if enp137s0f0
- net type: o2ib
- nid: 172.18.4.4@o2ib
map_on_demand: 0
[root@s-lmo-gaz38b hornc]# bash clean.sh
[root@s-lmo-gaz38b hornc]# bash start.sh map_on_demand=1
/home/hornc/lustre-wc-rel/lnet/utils/lnetctl lnet configure
/home/hornc/lustre-wc-rel/lnet/utils/lnetctl net add --net o2ib --if enp137s0f0
- net type: o2ib
- nid: 172.18.4.4@o2ib
map_on_demand: 1
[root@s-lmo-gaz38b hornc]#
We can see map_on_demand has the expected value in each case (note that the default is 1). Repeat the test, but specify '--peer-credits=32':
[root@s-lmo-gaz38b hornc]# bash start.sh2
debug=+net
/home/hornc/lustre-wc-rel/lnet/utils/lnetctl lnet configure
/home/hornc/lustre-wc-rel/lnet/utils/lnetctl net add --net o2ib --if enp137s0f0 --peer-credits=32
- net type: o2ib
- nid: 172.18.4.4@o2ib
map_on_demand: 0
[root@s-lmo-gaz38b hornc]# bash clean.sh
[root@s-lmo-gaz38b hornc]# bash start.sh2 map_on_demand=0
debug=+net
/home/hornc/lustre-wc-rel/lnet/utils/lnetctl lnet configure
/home/hornc/lustre-wc-rel/lnet/utils/lnetctl net add --net o2ib --if enp137s0f0 --peer-credits=32
- net type: o2ib
- nid: 172.18.4.4@o2ib
map_on_demand: 0
[root@s-lmo-gaz38b hornc]# bash clean.sh
[root@s-lmo-gaz38b hornc]# bash start.sh2 map_on_demand=1
debug=+net
/home/hornc/lustre-wc-rel/lnet/utils/lnetctl lnet configure
/home/hornc/lustre-wc-rel/lnet/utils/lnetctl net add --net o2ib --if enp137s0f0 --peer-credits=32
- net type: o2ib
- nid: 172.18.4.4@o2ib
map_on_demand: 0
[root@s-lmo-gaz38b hornc]#
We can see the wrong value is used in 2 of 3 cases. Apply the fix:
[hornc@s-lmo-gaz38b lustre-wc-rel]$ git fetch https://review.whamcloud.com/fs/lustre-release refs/changes/92/46492/1 && git cherry-pick FETCH_HEAD
remote: Counting objects: 2010, done
remote: Finding sources: 100% (10/10)
remote: Total 10 (delta 9), reused 10 (delta 9)
Unpacking objects: 100% (10/10), 1.30 KiB | 665.00 KiB/s, done.
From https://review.whamcloud.com/fs/lustre-release
 * branch refs/changes/92/46492/1 -> FETCH_HEAD
[tmp4 c2c7abe9b6] LU-15538 lnet: DLC sets map_on_demand incorrectly
 Date: Sat Feb 5 23:15:30 2022 +0000
 3 files changed, 7 insertions(+), 1 deletion(-)
[hornc@s-lmo-gaz38b lustre-wc-rel]$ make -j 32
...
Repeat the test cases:
[root@s-lmo-gaz38b hornc]# bash start.sh
/home/hornc/lustre-wc-rel/lnet/utils/lnetctl lnet configure
/home/hornc/lustre-wc-rel/lnet/utils/lnetctl net add --net o2ib --if enp137s0f0
- net type: o2ib
- nid: 172.18.4.4@o2ib
map_on_demand: 1
[root@s-lmo-gaz38b hornc]# bash clean.sh
[root@s-lmo-gaz38b hornc]# bash start.sh map_on_demand=0
/home/hornc/lustre-wc-rel/lnet/utils/lnetctl lnet configure
/home/hornc/lustre-wc-rel/lnet/utils/lnetctl net add --net o2ib --if enp137s0f0
- net type: o2ib
- nid: 172.18.4.4@o2ib
map_on_demand: 0
[root@s-lmo-gaz38b hornc]# bash clean.sh
[root@s-lmo-gaz38b hornc]# bash start.sh map_on_demand=1
/home/hornc/lustre-wc-rel/lnet/utils/lnetctl lnet configure
/home/hornc/lustre-wc-rel/lnet/utils/lnetctl net add --net o2ib --if enp137s0f0
- net type: o2ib
- nid: 172.18.4.4@o2ib
map_on_demand: 1
[root@s-lmo-gaz38b hornc]# bash clean.sh
[root@s-lmo-gaz38b hornc]# bash start.sh2
debug=+net
/home/hornc/lustre-wc-rel/lnet/utils/lnetctl lnet configure
/home/hornc/lustre-wc-rel/lnet/utils/lnetctl net add --net o2ib --if enp137s0f0 --peer-credits=32
- net type: o2ib
- nid: 172.18.4.4@o2ib
map_on_demand: 1
[root@s-lmo-gaz38b hornc]# bash clean.sh
[root@s-lmo-gaz38b hornc]# bash start.sh2 map_on_demand=0
debug=+net
/home/hornc/lustre-wc-rel/lnet/utils/lnetctl lnet configure
/home/hornc/lustre-wc-rel/lnet/utils/lnetctl net add --net o2ib --if enp137s0f0 --peer-credits=32
- net type: o2ib
- nid: 172.18.4.4@o2ib
map_on_demand: 0
[root@s-lmo-gaz38b hornc]# bash clean.sh
[root@s-lmo-gaz38b hornc]# bash start.sh2 map_on_demand=1
debug=+net
/home/hornc/lustre-wc-rel/lnet/utils/lnetctl lnet configure
/home/hornc/lustre-wc-rel/lnet/utils/lnetctl net add --net o2ib --if enp137s0f0 --peer-credits=32
- net type: o2ib
- nid: 172.18.4.4@o2ib
map_on_demand: 1
[root@s-lmo-gaz38b hornc]#
We can see map_on_demand gets the correct value in every case. |
| Comment by Gerrit Updater [ 11/Jun/22 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/46492/ |
| Comment by Peter Jones [ 11/Jun/22 ] |
|
Landed for 2.16 |