[LU-11990] conf-sanity test_66: replace nids fail alone MGS Created: 22/Feb/19  Updated: 01/Feb/24

Status: Reopened
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.12.1
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Alexander Boyko Assignee: Alexander Boyko
Resolution: Unresolved Votes: 0
Labels: always_except
Environment:

separate MGS


Issue Links:
Related
is related to LU-13356 lctl conf_param hung on the MGS node Resolved
is related to LU-3056 conf-sanity test_66 - replace nids fa... Resolved
is related to LU-3793 conf-sanity, subtest test_66 fails du... Resolved
is related to LU-4200 Test failure on test suite conf-sanit... Resolved
is related to LU-5137 Test failure conf-sanity test_66: rep... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

https://testing.whamcloud.com/sub_tests/fe25b856-2e5e-11e9-be61-52540065bddc

Started lustre-MDT0000
start ost1 service on trevis-26vm3
CMD: trevis-26vm3 mkdir -p /mnt/lustre-ost1
CMD: trevis-26vm3 dmsetup status /dev/mapper/ost1_flakey >/dev/null 2>&1
CMD: trevis-26vm3 dmsetup status /dev/mapper/ost1_flakey 2>&1
CMD: trevis-26vm3 test -b /dev/mapper/ost1_flakey
CMD: trevis-26vm3 e2label /dev/mapper/ost1_flakey
Starting ost1:   /dev/mapper/ost1_flakey /mnt/lustre-ost1
CMD: trevis-26vm3 mkdir -p /mnt/lustre-ost1; mount -t lustre   /dev/mapper/ost1_flakey /mnt/lustre-ost1
CMD: trevis-26vm3 /usr/sbin/lctl get_param -n health_check
CMD: trevis-26vm3 PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/usr/lib64/lustre/tests//usr/lib64/lustre/tests:/usr/lib64/lustre/tests:/usr/lib64/lustre/tests/../utils:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/utils/gss:/usr/lib64/lustre/utils:/usr/lib64/qt-3.3/bin:/usr/lib64/compat-openmpi16/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/usr/sbin:/sbin:/bin::/sbin:/bin:/usr/sbin: NAME=autotest_config bash rpc.sh set_default_debug \"-1\" \"all\" 4 
trevis-26vm3: == rpc test complete, duration -o sec ================================================================ 23:29:02 (1549927742)
trevis-26vm3: trevis-26vm3.trevis.whamcloud.com: executing set_default_debug -1 all 4
CMD: trevis-26vm3 e2label /dev/mapper/ost1_flakey 				2>/dev/null | grep -E ':[a-zA-Z]{3}[0-9]{4}'
CMD: trevis-26vm3 e2label /dev/mapper/ost1_flakey 				2>/dev/null | grep -E ':[a-zA-Z]{3}[0-9]{4}'
CMD: trevis-26vm3 e2label /dev/mapper/ost1_flakey 2>/dev/null
Started lustre-OST0000
mount lustre on /mnt/lustre.....
Starting client: trevis-26vm1.trevis.whamcloud.com:  -o user_xattr,flock trevis-26vm10@tcp:/lustre /mnt/lustre
CMD: trevis-26vm1.trevis.whamcloud.com mkdir -p /mnt/lustre
CMD: trevis-26vm1.trevis.whamcloud.com mount -t lustre -o user_xattr,flock trevis-26vm10@tcp:/lustre /mnt/lustre
CMD: trevis-26vm3 /usr/sbin/lctl list_nids
CMD: trevis-26vm4 /usr/sbin/lctl list_nids
CMD: trevis-26vm4 /usr/sbin/lctl get_param -n osc.lustre-OST0000-osc-MDT0000.active
Setting lustre-OST0000.osc.active from 1 to 0
CMD: trevis-26vm10 /usr/sbin/lctl conf_param lustre-OST0000.osc.active='0'
CMD: trevis-26vm4 /usr/sbin/lctl get_param -n osc.lustre-OST0000-osc-MDT0000.active
CMD: trevis-26vm4 /usr/sbin/lctl get_param -n osc.lustre-OST0000-osc-MDT0000.active
Waiting 90 secs for update
CMD: trevis-26vm4 /usr/sbin/lctl get_param -n osc.lustre-OST0000-osc-MDT0000.active
CMD: trevis-26vm4 /usr/sbin/lctl get_param -n osc.lustre-OST0000-osc-MDT0000.active
CMD: trevis-26vm4 /usr/sbin/lctl get_param -n osc.lustre-OST0000-osc-MDT0000.active
CMD: trevis-26vm4 /usr/sbin/lctl get_param -n osc.lustre-OST0000-osc-MDT0000.active
CMD: trevis-26vm4 /usr/sbin/lctl get_param -n osc.lustre-OST0000-osc-MDT0000.active
Updated after 5s: wanted '0' got '0'
replace_nids should fail if MDS, OSTs and clients are UP
CMD: trevis-26vm10 /usr/sbin/lctl replace_nids lustre-OST0000 10.9.5.64@tcp
 conf-sanity test_66: @@@@@@ FAIL: replace_nids fail 
  Trace dump:
  = /usr/lib64/lustre/tests/test-framework.sh:5837:error()
  = /usr/lib64/lustre/tests/conf-sanity.sh:4739:test_66()
  = /usr/lib64/lustre/tests/test-framework.sh:6118:run_one()
  = /usr/lib64/lustre/tests/test-framework.sh:6157:run_one_logged()
  = /usr/lib64/lustre/tests/test-framework.sh:6004:run_test()
  = /usr/lib64/lustre/tests/conf-sanity.sh:4814:main() 
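
Note on reading the failure above: the log says replace_nids "should fail" while the MDS, OSTs and clients are up, and the test then reports FAIL. That suggests the command returned success on the separate MGS instead of being refused. A minimal sketch of this kind of "command must fail" check follows. The function names replace_nids_stub and expect_failure are hypothetical stand-ins, not the actual conf-sanity.sh code, and the stub's behaviour is an assumption.

```shell
#!/bin/bash
# Sketch of a "command must fail" check; not the actual conf-sanity.sh code.

# Hypothetical stand-in for `lctl replace_nids` run on the MGS.
# Assumption: the real command refuses (non-zero) while targets are mounted.
replace_nids_stub() {
	local targets_up=$1

	if [ "$targets_up" = "yes" ]; then
		return 1	# refused: MDS, OSTs or clients are still up
	fi
	return 0		# accepted: nothing is mounted
}

# Run the given command and report FAIL if it unexpectedly succeeds.
expect_failure() {
	if "$@"; then
		echo "FAIL: command succeeded but was expected to fail"
		return 1
	fi
	echo "PASS: command failed as expected"
	return 0
}

# Targets are up, so the stub refuses and the check passes.
expect_failure replace_nids_stub yes
```

With the stub called as `replace_nids_stub no`, the check reports FAIL, which is the shape of the result seen in this ticket.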


 Comments   
Comment by Artem Blagodarenko (Inactive) [ 16/Mar/20 ]

This issue is addressed by https://review.whamcloud.com/37880

Comment by Andreas Dilger [ 20/Jan/24 ]

Reopening because test_66 is still skipped with always_except due to this ticket.

if ! combined_mgs_mds; then
        # bug number for skipped test: LU-11991         LU-11990
        ALWAYS_EXCEPT="$ALWAYS_EXCEPT  32a 32b 32c 32d 32e      66"
        # bug number for skipped test: LU-9897  LU-12032
        ALWAYS_EXCEPT="$ALWAYS_EXCEPT  84       123F"
fi

If test_66 is never expected to work with a separate MGS (the ! combined_mgs_mds case above), then it should be moved from ALWAYS_EXCEPT to a skip check directly in test_66 itself.

If test_66 is expected to work (I now see that there is an embedded check for combined_mgs_mds), then the exception should just be removed.

aboyko, could you please submit the trivial patch to fix this one way or the other, as I'm not sure of the details.
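
For illustration, the first option (a skip check inside the test rather than an ALWAYS_EXCEPT entry) might look roughly like the sketch below. It mirrors the `! combined_mgs_mds` condition of the snippet above. The combined_mgs_mds and skip functions here are simplified stand-ins for the test-framework.sh helpers so that the sketch runs on its own. The COMBINED variable, the test_66_sketch name and the skip message are hypothetical.

```shell
#!/bin/bash
# Sketch only: simplified stand-ins for test-framework.sh helpers.

# Stand-in for combined_mgs_mds. The hypothetical COMBINED variable decides
# the answer here; the real helper inspects the test configuration.
combined_mgs_mds() {
	[ "${COMBINED:-yes}" = "yes" ]
}

# Stand-in for the framework's skip helper: only prints the reason.
skip() {
	echo "SKIP: $*"
}

test_66_sketch() {
	# Skip from inside the test instead of listing 66 in ALWAYS_EXCEPT.
	if ! combined_mgs_mds; then
		skip "needs combined MGS/MDS, see LU-11990"
		return 0
	fi
	echo "running test_66 body"
}

COMBINED=no test_66_sketch	# separate MGS: test is skipped
COMBINED=yes test_66_sketch	# combined MGS/MDS: test body runs
```

The second option needs no new code: the 66 entry is simply dropped from the ALWAYS_EXCEPT line.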

Comment by Gerrit Updater [ 01/Feb/24 ]

"Alexander Boyko <alexander.boyko@hpe.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/53877
Subject: LU-11990 tests: enable conf-sanity 66
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 55d8dcdc3973c1082a7660a99cb8b59683ea56fc

Generated at Sat Feb 10 02:48:43 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.