[LU-15538] DLC doesn't initialize default LND tunables correctly
Created: 08/Feb/22  Updated: 23/Sep/22  Resolved: 11/Jun/22

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.16.0

Type: Bug
Priority: Minor
Reporter: Chris Horn
Assignee: Chris Horn
Resolution: Fixed
Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

LND tunables are not initialized correctly when any net or LND tunable parameter is specified, whether via the CLI or via the YAML config.

I noticed this with map_on_demand, but the bug potentially impacts any LND tunable where '0' is an accepted value:

/home/hornc/lustre-wc-rel/lnet/utils/lnetctl lnet configure
/home/hornc/lustre-wc-rel/lnet/utils/lnetctl net add --net o2ib --if enp137s0f0
/home/hornc/lustre-wc-rel/lnet/utils/lnetctl net show -v 5 | egrep -e o2ib -e  map
    - net type: o2ib
        - nid: 172.18.4.4@o2ib
              map_on_demand: 1

/home/hornc/lustre-wc-rel/lnet/utils/lnetctl lnet configure
/home/hornc/lustre-wc-rel/lnet/utils/lnetctl net add --net o2ib --if enp137s0f0 --peer-credits=32
/home/hornc/lustre-wc-rel/lnet/utils/lnetctl net show -v 5 | egrep -e o2ib -e  map
    - net type: o2ib
        - nid: 172.18.4.4@o2ib
              map_on_demand: 0

The issue is that when any net or LND tunable is specified via the CLI, the whole tunables struct gets memset to 0; in the case of the YAML config, 0 gets assigned to any tunable that isn't specified in the YAML. The LND then thinks that '0' was specified by the user and will use that value (if it is valid).
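
To illustrate the failure mode described above, here is a minimal sketch in C. It is not the code from the patch and does not use the real Lustre structures: the struct demo_lnd_tunables, the DEMO_UNSET sentinel, and all demo_* functions are hypothetical names invented for this example. It only shows why zero-filling makes "not specified" indistinguishable from "explicitly 0", and one possible way to keep the two apart.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an LND tunables struct (not the Lustre layout). */
struct demo_lnd_tunables {
	int map_on_demand;
	int peer_credits;
};

/* Hypothetical sentinel meaning "the user did not specify this tunable". */
#define DEMO_UNSET (-1)

/* Buggy pattern: zero the whole struct, then set only what the user gave. */
void demo_fill_buggy(struct demo_lnd_tunables *t, int peer_credits)
{
	assert(t != NULL);
	memset(t, 0, sizeof(*t));	/* map_on_demand silently becomes 0 */
	t->peer_credits = peer_credits;
}

/* One possible alternative: mark unspecified tunables with a sentinel. */
void demo_fill_sentinel(struct demo_lnd_tunables *t, int peer_credits)
{
	assert(t != NULL);
	t->map_on_demand = DEMO_UNSET;
	t->peer_credits = peer_credits;
}

/*
 * LND side of the sketch: use the value from the struct unless it is
 * marked unset, in which case fall back to the module parameter.
 * An explicit 0 is returned as-is because 0 is an accepted value.
 */
int demo_effective_map_on_demand(const struct demo_lnd_tunables *t,
				 int module_param)
{
	assert(t != NULL);
	if (t->map_on_demand == DEMO_UNSET)
		return module_param;
	return t->map_on_demand;
}
```

With the buggy fill, demo_effective_map_on_demand() returns 0 even when the module parameter is 1, which matches the "--peer-credits=32" output shown above. With the sentinel fill, the module parameter is used unless the caller sets map_on_demand explicitly.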



 Comments   
Comment by Gerrit Updater [ 10/Feb/22 ]

"Chris Horn <chris.horn@hpe.com>" uploaded a new patch: https://review.whamcloud.com/46492
Subject: LU-15538 lnet: DLC sets map_on_demand incorrectly
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 6d5b0308cae0db67daa7df626a60fd0825d252cc

Comment by Chris Horn [ 10/Feb/22 ]

Test notes for LU-15538:

[hornc@s-lmo-gaz38b lustre-wc-rel]$ git reset --hard origin/master
HEAD is now at 450d10c362 New RC 2.15.0-RC2
[hornc@s-lmo-gaz38b lustre-wc-rel]$ make -j 32
...
[root@s-lmo-gaz38b hornc]# bash start.sh
/home/hornc/lustre-wc-rel/lnet/utils/lnetctl lnet configure
/home/hornc/lustre-wc-rel/lnet/utils/lnetctl net add --net o2ib --if enp137s0f0
    - net type: o2ib
        - nid: 172.18.4.4@o2ib
              map_on_demand: 1
[root@s-lmo-gaz38b hornc]# bash clean.sh
[root@s-lmo-gaz38b hornc]# bash start.sh map_on_demand=0
/home/hornc/lustre-wc-rel/lnet/utils/lnetctl lnet configure
/home/hornc/lustre-wc-rel/lnet/utils/lnetctl net add --net o2ib --if enp137s0f0
    - net type: o2ib
        - nid: 172.18.4.4@o2ib
              map_on_demand: 0
[root@s-lmo-gaz38b hornc]# bash clean.sh
[root@s-lmo-gaz38b hornc]# bash start.sh map_on_demand=1
/home/hornc/lustre-wc-rel/lnet/utils/lnetctl lnet configure
/home/hornc/lustre-wc-rel/lnet/utils/lnetctl net add --net o2ib --if enp137s0f0
    - net type: o2ib
        - nid: 172.18.4.4@o2ib
              map_on_demand: 1
[root@s-lmo-gaz38b hornc]#

We can see map_on_demand has the expected value in each case (note that the default is 1).

Repeat the test, but specify '--peer-credits=32':

[root@s-lmo-gaz38b hornc]# bash start.sh2
debug=+net
/home/hornc/lustre-wc-rel/lnet/utils/lnetctl lnet configure
/home/hornc/lustre-wc-rel/lnet/utils/lnetctl net add --net o2ib --if enp137s0f0 --peer-credits=32
    - net type: o2ib
        - nid: 172.18.4.4@o2ib
              map_on_demand: 0
[root@s-lmo-gaz38b hornc]# bash clean.sh
[root@s-lmo-gaz38b hornc]# bash start.sh2 map_on_demand=0
debug=+net
/home/hornc/lustre-wc-rel/lnet/utils/lnetctl lnet configure
/home/hornc/lustre-wc-rel/lnet/utils/lnetctl net add --net o2ib --if enp137s0f0 --peer-credits=32
    - net type: o2ib
        - nid: 172.18.4.4@o2ib
              map_on_demand: 0
[root@s-lmo-gaz38b hornc]# bash clean.sh
[root@s-lmo-gaz38b hornc]# bash start.sh2 map_on_demand=1
debug=+net
/home/hornc/lustre-wc-rel/lnet/utils/lnetctl lnet configure
/home/hornc/lustre-wc-rel/lnet/utils/lnetctl net add --net o2ib --if enp137s0f0 --peer-credits=32
    - net type: o2ib
        - nid: 172.18.4.4@o2ib
              map_on_demand: 0
[root@s-lmo-gaz38b hornc]#

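We can see the wrong value is used in 2 of the 3 cases: with no module parameter and with map_on_demand=1, the reported value is 0 instead of 1. The map_on_demand=0 case matches only because the zeroed value happens to equal the requested one.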

Apply fix:

[hornc@s-lmo-gaz38b lustre-wc-rel]$ git fetch https://review.whamcloud.com/fs/lustre-release refs/changes/92/46492/1 && git cherry-pick FETCH_HEAD
remote: Counting objects: 2010, done
remote: Finding sources: 100% (10/10)
remote: Total 10 (delta 9), reused 10 (delta 9)
Unpacking objects: 100% (10/10), 1.30 KiB | 665.00 KiB/s, done.
From https://review.whamcloud.com/fs/lustre-release
 * branch                  refs/changes/92/46492/1 -> FETCH_HEAD
[tmp4 c2c7abe9b6] LU-15538 lnet: DLC sets map_on_demand incorrectly
 Date: Sat Feb 5 23:15:30 2022 +0000
 3 files changed, 7 insertions(+), 1 deletion(-)
[hornc@s-lmo-gaz38b lustre-wc-rel]$ make -j 32
...

Repeat the test cases with the fix applied:

[root@s-lmo-gaz38b hornc]# bash start.sh
/home/hornc/lustre-wc-rel/lnet/utils/lnetctl lnet configure
/home/hornc/lustre-wc-rel/lnet/utils/lnetctl net add --net o2ib --if enp137s0f0
    - net type: o2ib
        - nid: 172.18.4.4@o2ib
              map_on_demand: 1
[root@s-lmo-gaz38b hornc]# bash clean.sh
[root@s-lmo-gaz38b hornc]# bash start.sh map_on_demand=0
/home/hornc/lustre-wc-rel/lnet/utils/lnetctl lnet configure
/home/hornc/lustre-wc-rel/lnet/utils/lnetctl net add --net o2ib --if enp137s0f0
    - net type: o2ib
        - nid: 172.18.4.4@o2ib
              map_on_demand: 0
[root@s-lmo-gaz38b hornc]# bash clean.sh
[root@s-lmo-gaz38b hornc]# bash start.sh map_on_demand=1
/home/hornc/lustre-wc-rel/lnet/utils/lnetctl lnet configure
/home/hornc/lustre-wc-rel/lnet/utils/lnetctl net add --net o2ib --if enp137s0f0
    - net type: o2ib
        - nid: 172.18.4.4@o2ib
              map_on_demand: 1
[root@s-lmo-gaz38b hornc]# bash clean.sh
[root@s-lmo-gaz38b hornc]# bash start.sh2
debug=+net
/home/hornc/lustre-wc-rel/lnet/utils/lnetctl lnet configure
/home/hornc/lustre-wc-rel/lnet/utils/lnetctl net add --net o2ib --if enp137s0f0 --peer-credits=32
    - net type: o2ib
        - nid: 172.18.4.4@o2ib
              map_on_demand: 1
[root@s-lmo-gaz38b hornc]# bash clean.sh
[root@s-lmo-gaz38b hornc]# bash start.sh2 map_on_demand=0
debug=+net
/home/hornc/lustre-wc-rel/lnet/utils/lnetctl lnet configure
/home/hornc/lustre-wc-rel/lnet/utils/lnetctl net add --net o2ib --if enp137s0f0 --peer-credits=32
    - net type: o2ib
        - nid: 172.18.4.4@o2ib
              map_on_demand: 0
[root@s-lmo-gaz38b hornc]# bash clean.sh
[root@s-lmo-gaz38b hornc]# bash start.sh2 map_on_demand=1
debug=+net
/home/hornc/lustre-wc-rel/lnet/utils/lnetctl lnet configure
/home/hornc/lustre-wc-rel/lnet/utils/lnetctl net add --net o2ib --if enp137s0f0 --peer-credits=32
    - net type: o2ib
        - nid: 172.18.4.4@o2ib
              map_on_demand: 1
[root@s-lmo-gaz38b hornc]#

We can see map_on_demand gets the correct value in every case.

Comment by Gerrit Updater [ 11/Jun/22 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/46492/
Subject: LU-15538 lnet: DLC sets map_on_demand incorrectly
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 896f4a082b93453f5e7168f685faff4fba594ff3

Comment by Peter Jones [ 11/Jun/22 ]

Landed for 2.16
