[LU-15342] recovery-mds-scale test_failover_mds: sh: IDLE: command not found
Created: 08/Dec/21  Updated: 25/Apr/23  Resolved: 23/Dec/21

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.15.0
Fix Version/s: Lustre 2.15.0

Type: Bug
Priority: Major
Reporter: Maloo
Assignee: Elena Gryaznova
Resolution: Fixed
Votes: 0
Labels: None

Issue Links:
Related
is related to LU-12857 recovery-mds-scale test_failover_ost ... Resolved
Severity: 3

 Description   

This issue was created by maloo for Elena <elena.gryaznova@hpe.com>

This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/ec8cab41-6587-4369-815b-b53f9684d272

test_failover_mds failed with the following error:

Checking clients are in FULL|IDLE state before next failover
CMD: onyx-33vm3,onyx-33vm4 PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/utils/gss:/usr/lib64/lustre/utils:/usr/lib64/openmpi/bin:/usr/share/Modules/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/usr/sbin:/sbin:/bin::/sbin:/bin:/usr/sbin: NAME=autotest_config bash rpc.sh wait_import_state_mount FULL|IDLE mdc.lustre-MDT0000-mdc-*.mds_server_uuid 
onyx-33vm4: sh: IDLE: command not found
onyx-33vm3: sh: IDLE: command not found
onyx-33vm4: CMD: onyx-64vm13 /usr/sbin/lctl get_param -n version 2>/dev/null
onyx-33vm3: CMD: onyx-64vm13 /usr/sbin/lctl get_param -n version 2>/dev/null
onyx-33vm4: CMD: onyx-64vm13 /usr/sbin/lctl get_param -n version 2>/dev/null
onyx-33vm3: CMD: onyx-64vm13 /usr/sbin/lctl get_param -n version 2>/dev/null
onyx-33vm4: CMD: onyx-24vm5 /usr/sbin/lctl get_param -n version 2>/dev/null
onyx-33vm3: CMD: onyx-24vm5 /usr/sbin/lctl get_param -n version 2>/dev/null
onyx-33vm3: CMD: onyx-33vm3.onyx.whamcloud.com /usr/sbin/lctl get_param -n version 2>/dev/null
onyx-33vm3: onyx-33vm3.onyx.whamcloud.com: executing wait_import_state_mount FULL
onyx-33vm4: CMD: onyx-33vm4.onyx.whamcloud.com /usr/sbin/lctl get_param -n version 2>/dev/null
onyx-33vm4: onyx-33vm4.onyx.whamcloud.com: executing wait_import_state_mount FULL
onyx-33vm3: CMD: onyx-33vm3.onyx.whamcloud.com lctl get_param -n at_max
pdsh@onyx-33vm1: onyx-33vm3: ssh exited with exit code 127
onyx-33vm4: CMD: onyx-33vm4.onyx.whamcloud.com lctl get_param -n at_max
pdsh@onyx-33vm1: onyx-33vm4: ssh exited with exit code 127
 recovery-mds-scale test_failover_mds: @@@@@@ FAIL: import is not in FULL|IDLE state 
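
The "sh: IDLE: command not found" lines and exit code 127 in the log above are consistent with the unescaped "|" in FULL|IDLE being parsed as a pipe by the shell on the remote node. A minimal sketch of that behaviour follows. It is not the actual patch: run_remote is a hypothetical stand-in for the pdsh/ssh path, assumed here to join its arguments into one string that a second shell parses again.

```shell
#!/bin/sh
# run_remote is a hypothetical stand-in for the pdsh/ssh path: it joins its
# arguments into one string that a second shell parses again.
run_remote() {
    sh -c "$*" 2>&1
}

state='FULL|IDLE'

# Unescaped: the second shell parses "echo FULL|IDLE" as the pipeline
# "echo FULL | IDLE" and fails to find a command named IDLE.
rc_unescaped=0
out_unescaped=$(run_remote echo "$state") || rc_unescaped=$?

# Escaped: the backslash is consumed by the second shell, which then
# passes a literal "|" to echo.
rc_escaped=0
out_escaped=$(run_remote echo 'FULL\|IDLE') || rc_escaped=$?

echo "unescaped: rc=$rc_unescaped"
echo "escaped:   rc=$rc_escaped out=$out_escaped"
```

In this sketch the unescaped call fails in the second shell, while the escaped call delivers the literal string FULL|IDLE to the command.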

This is a regression caused by the following commit:

commit af666bef058c5b7997527fc851a84a89375912fb
Author:     Andreas Dilger <adilger@whamcloud.com>
AuthorDate: Wed Oct 20 19:47:25 2021 -0600
Commit:     Oleg Drokin <green@whamcloud.com>
CommitDate: Tue Nov 30 03:52:10 2021 +0000

    LU-12857 tests: allow clients to be IDLE after recovery
    
    If clients are not connected to an OST when it fails (connection
    is IDLE), they do not need to be involved in recovery, so this
    should not be considered an error when checking the client state.
    
    Test-Parameters: trivial testlist=recovery-mds-scale env=SLOW=no
    Test-Parameters: testlist=conf-sanity
    Test-Parameters: testlist=replay-dual,replay-single
    Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
    Change-Id: I6cfeb718acd233378ed1608f22061bc15c3ebbe5
    Reviewed-on: https://review.whamcloud.com/45318
    Tested-by: jenkins <devops@whamcloud.com>
    Tested-by: Maloo <maloo@whamcloud.com>
    Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
    Reviewed-by: James Nunez <jnunez@whamcloud.com>
    Reviewed-by: Oleg Drokin <green@whamcloud.com>

VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
recovery-mds-scale test_failover_mds - import is not in FULL|IDLE state



 Comments   
Comment by Gerrit Updater [ 08/Dec/21 ]

"Elena Gryaznova <elena.gryaznova@hpe.com>" uploaded a new patch: https://review.whamcloud.com/45788
Subject: LU-15342 tests: escape "|"
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 999811b11b83e4b5b3f01d9a15b4835e87fcfa1e

Comment by Gerrit Updater [ 23/Dec/21 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/45788/
Subject: LU-15342 tests: escape "|"
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 25606a2ce19e94c13694d46c3f15e9a10df40a91

Comment by Peter Jones [ 23/Dec/21 ]

Landed for 2.15
