[LU-17475] sanity test_432 fails with "mgs and active mismatch, 10 attempts" with IPv6
Created: 26/Jan/24  Updated: 26/Jan/24

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Chris Horn Assignee: Chris Horn
Resolution: Unresolved Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   
== sanity test 432: mv dir from outside Lustre =========== 05:03:34 (1706007814)
On MGS 2601:8c1:c180:2000::cbdd, active = nodemap.active=1
On el8-mds2 2601:8c1:c180:2000::cbde, active =
On el8-mds2 2601:8c1:c180:2000::cbde, active =
On el8-mds2 2601:8c1:c180:2000::cbde, active =
On el8-mds2 2601:8c1:c180:2000::cbde, active =
On el8-mds2 2601:8c1:c180:2000::cbde, active =
On el8-mds2 2601:8c1:c180:2000::cbde, active =
On el8-mds2 2601:8c1:c180:2000::cbde, active =
On el8-mds2 2601:8c1:c180:2000::cbde, active =
On el8-mds2 2601:8c1:c180:2000::cbde, active =
On el8-mds2 2601:8c1:c180:2000::cbde, active =
MGS
nodemap.active=1
OTHER - IP: 2601:8c1:c180:2000::cbde

 sanity test_432: @@@@@@ FAIL: mgs and active  mismatch, 10 attempts

wait_nm_sync() in test-framework.sh passes the node's IP address, rather than its hostname, as the first argument to do_node():

        # wait up to 10 seconds for other servers to sync with mgs
        for i in $(seq 1 10); do
                for node in $(all_server_nodes); do
                        local node_ip=$(host_nids_address $node $NETTYPE |
                                        cut -d' ' -f1)

                        is_sync=true
                        if [ -z "$value" ]; then
                                [ $node_ip == $mgs_ip ] && continue
                        fi

                        out2=$(do_node $node_ip $LCTL get_param $opt \
                               nodemap.$proc_param 2>/dev/null)
                        echo "On $node ${node_ip}, ${proc_param} = $out2"
                        [ "$out1" != "$out2" ] && is_sync=false && break
                done
                $is_sync && break
                sleep 1
        done

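If do_node resolves to pdsh (likely?), this will not work with IPv6, because pdsh misinterprets the ':' in an IPv6 address as the separator of an rcmd type prefix. The remote command then fails, and since stderr is discarded by 2>/dev/null, that would explain the empty "active =" values in the log above. From the pdsh man page: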

A list of hosts may also be preceded by ... "rcmd_type:" to specify an alternate rcmd connection type for these hosts.
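
To make the ambiguity concrete, here is a minimal sketch of what splitting a host specification at its first ':' does to an IPv6 address. split_rcmd_prefix is a hypothetical helper written for this illustration; it is not pdsh's actual parser and only mimics the "rcmd_type:" syntax quoted above.

```shell
# Hypothetical helper (not part of pdsh or test-framework.sh): split a host
# spec at the first ':' as the documented "rcmd_type:hosts" syntax implies.
split_rcmd_prefix() {
        case "$1" in
        *:*) echo "rcmd_type='${1%%:*}' hosts='${1#*:}'" ;;
        *)   echo "rcmd_type='' hosts='$1'" ;;
        esac
}

# Intended use of the prefix: select ssh for one host.
split_rcmd_prefix "ssh:el8-mds2"

# An IPv6 address: everything before the first ':' looks like an rcmd type.
split_rcmd_prefix "2601:8c1:c180:2000::cbde"

# A plain hostname contains no ':' and is left alone.
split_rcmd_prefix "el8-mds2"
```

Under this reading, the address from the log would be taken as rcmd type "2601" with host list "8c1:c180:2000::cbde", neither of which is valid, whereas the hostname el8-mds2 passes through unchanged.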



 Comments   
Comment by Gerrit Updater [ 26/Jan/24 ]

"Chris Horn <chris.horn@hpe.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/53838
Subject: LU-17475 tests: Do not pass IP to do_node in wait_nm_sync
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: b0d6032e697acbc0287209ef93fb218334a79f5d
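
The patch subject suggests addressing the remote node by name instead of by IP. The following is a minimal sketch of that idea only; it is an assumption about the approach, not the contents of the change. all_server_nodes, host_nids_address and do_node are local stubs standing in for the test-framework.sh helpers, the node name el8-mds1 for the MGS is hypothetical, and the MGS check is simplified (the original skips the MGS only when $value is empty).

```shell
# Stubs standing in for test-framework.sh helpers (behavior is assumed).
all_server_nodes() { echo "el8-mds1 el8-mds2"; }
host_nids_address() {   # $1 = node name; prints that node's address
        case "$1" in
        el8-mds1) echo "2601:8c1:c180:2000::cbdd" ;;   # hypothetical MGS node name
        el8-mds2) echo "2601:8c1:c180:2000::cbde" ;;
        esac
}
do_node() {             # $1 = target node, rest = command; only reports the target
        echo "ran on $1"
}

mgs_ip="2601:8c1:c180:2000::cbdd"
targets=""
for node in $(all_server_nodes); do
        node_ip=$(host_nids_address "$node" | cut -d' ' -f1)
        # The IP is still useful for the "is this the MGS?" comparison.
        [ "$node_ip" = "$mgs_ip" ] && continue
        # Address the remote shell by node name, which contains no ':'.
        out=$(do_node "$node" lctl get_param nodemap.active)
        targets="$targets $node"
        echo "On $node ${node_ip}: $out"
done
```

With these stubs the loop skips the MGS and runs the command once, targeting el8-mds2 by name while still printing its IPv6 address in the log line.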
