Details
Bug
Resolution: Fixed
Critical
Lustre 2.13.0
Description
parallel-scale test_rr_alloc fails when getting or setting ‘lov.lustre-MDT0000*.qos_threshold_rr’. These failures started around 29 OCT 2019 and may be related to the landing of the ‘striped directory allocate stripes by QoS’ patches from LU-12624.
Looking at the suite_log for https://testing.whamcloud.com/test_sets/c0dad8f4-fd58-11e9-8e77-52540065bddc, we see errors both getting and setting qos_threshold_rr on the MDS:
CMD: trevis-20vm12 params=\$(/usr/sbin/lctl get_param lov.lustre-MDT0000*.qos_threshold_rr); [[ -z \"lustre-MDT0000\" ]] && param= || param=\$(grep lustre-MDT0000 <<< \"\$params\"); [[ -z \$param ]] && param=\"\$params\"; while read s; do echo mds1 \$s; done <<< \"\$param\"
trevis-20vm12: error: get_param: param_path 'lov/lustre-MDT0000*/qos_threshold_rr': No such file or directory
CMD: trevis-20vm12 params=\$(/usr/sbin/lctl get_param osp.lustre-OST*-osc-MDT0000.create_count); [[ -z \"lustre-MDT0000\" ]] && param= || param=\$(grep lustre-MDT0000 <<< \"\$params\"); [[ -z \$param ]] && param=\"\$params\"; while read s; do echo mds1 \$s; done <<< \"\$param\"
CMD: trevis-20vm12 /usr/sbin/lctl set_param -n lov.lustre-MDT0000*.qos_threshold_rr 100 osp.lustre-OST*-osc-MDT0000.create_count 3488
trevis-20vm12: error: set_param: param_path 'lov/lustre-MDT0000*/qos_threshold_rr': No such file or directory
parallel-scale test_rr_alloc: @@@@@@ FAIL: failed while setting qos_threshold_rr & creat_count
Trace dump:
= /usr/lib64/lustre/tests/test-framework.sh:6108:error()
= /usr/lib64/lustre/tests/functions.sh:1004:run_rr_alloc()
= /usr/lib64/lustre/tests/parallel-scale.sh:165:test_rr_alloc()
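One way a test could tolerate a parameter that has moved is to probe a list of candidate patterns and use the first one that resolves. The following is only a sketch under assumptions: first_existing_param is a hypothetical helper that does not exist in test-framework.sh, the candidate patterns are supplied by the caller, and it relies on lctl get_param returning non-zero when the path does not exist, as the errors above suggest.

```shell
# Hypothetical helper, not part of test-framework.sh.
# LCTL may be overridden, e.g. to run against a stub.
LCTL=${LCTL:-lctl}

# Print the first candidate pattern that get_param can resolve.
# Returns 1 if none of the candidates resolves.
first_existing_param() {
	local pattern

	for pattern in "$@"; do
		if $LCTL get_param -n "$pattern" >/dev/null 2>&1; then
			echo "$pattern"
			return 0
		fi
	done
	return 1
}
```

A caller would pass the patterns it is willing to accept, in order of preference, and fail or skip the test if the helper returns 1.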
Looking at the sanity suite_log at https://testing.whamcloud.com/test_sets/9ee6cf78-fd58-11e9-8e77-52540065bddc, we see failures getting the qos_threshold_rr parameter. Tests 116a and 116b are skipped rather than failed:
== sanity test 116a: stripe QOS: free space balance ================================================== 00:49:17 (1572569357)
Free space priority CMD: trevis-20vm12 lctl get_param -n lo[vd].*-mdtlov.qos_prio_free
91%
CMD: trevis-20vm12 /usr/sbin/lctl set_param -n os[cd]*.*MD*.force_sync 1
CMD: trevis-20vm12 /usr/sbin/lctl get_param -n osc.*MDT*.sync_*
CMD: trevis-20vm12 /usr/sbin/lctl get_param -n osc.*MDT*.sync_*
CMD: trevis-20vm12 /usr/sbin/lctl get_param -n osc.*MDT*.sync_*
CMD: trevis-20vm12 /usr/sbin/lctl get_param -n osc.*MDT*.sync_*
CMD: trevis-20vm12 /usr/sbin/lctl get_param -n osc.*MDT*.sync_*
CMD: trevis-20vm12 /usr/sbin/lctl get_param -n osc.*MDT*.sync_*
CMD: trevis-20vm12 /usr/sbin/lctl get_param -n osc.*MDT*.sync_*
sleep 5 for ZFS zfs
sleep 5 for ZFS zfs
Waiting for local destroys to complete
OST kbytes available: 1878016 1901568 1912832 1904640 1900544 1911808 1909760
Min free space: OST 0: 1878016
Max free space: OST 2: 1912832
CMD: trevis-20vm12 lctl get_param -n *.*MDT0000-mdtlov.qos_threshold_rr
trevis-20vm12: error: get_param: param_path '*/*MDT0000-mdtlov/qos_threshold_rr': No such file or directory
Check for uneven OSTs: diff=34816KB (1%) must be > % ...ok
Don't need to fill OST0
diff=34816=1% must be > % for QOS mode.../usr/lib64/lustre/tests/sanity.sh: line 10107: [: 1: unary operator expected
failed - QOS mode won't be used
sleep 5 for ZFS zfs
Waiting for local destroys to complete
cleanup time 6
SKIP: sanity test_116a QOS imbalance criteria not met
SKIP 116a (29s)
== sanity test 116b: QoS shouldn't LBUG if not enough OSTs found on the 2nd pass ===================== 00:49:46 (1572569386)
CMD: trevis-20vm12 lctl get_param -n lo[vd].lustre-MDT0000-mdtlov.qos_threshold_rr
trevis-20vm12: error: get_param: param_path 'lo[vd]/lustre-MDT0000-mdtlov/qos_threshold_rr': No such file or directory
SKIP: sanity test_116b no QOS
SKIP 116b (1s)
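The "[: 1: unary operator expected" message is what bash prints when a numeric test such as [ 1 -gt $threshold ] runs with an empty, unquoted right-hand side, which matches the blank threshold in "must be > %" above. The exact code at sanity.sh line 10107 is not shown in this log, so the following is only a sketch of a guarded comparison; exceeds_threshold is a hypothetical helper, not something taken from sanity.sh.

```shell
# Hypothetical helper, not taken from sanity.sh.
# Returns 0 when diff_pct is greater than threshold_pct, 1 when it is not,
# and 2 when either value is empty or not a non-negative integer, so the
# caller can report the missing threshold instead of tripping over "[".
exceeds_threshold() {
	local diff_pct=$1
	local threshold_pct=$2

	case "$diff_pct" in
	''|*[!0-9]*) return 2 ;;
	esac
	case "$threshold_pct" in
	''|*[!0-9]*) return 2 ;;
	esac

	[ "$diff_pct" -gt "$threshold_pct" ]
}
```

With the values from this run, exceeds_threshold 1 "" returns 2, which a caller could turn into an explicit error about the missing qos_threshold_rr value rather than a skip.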
In sanityn, https://testing.whamcloud.com/test_sets/ab3b11a8-fd58-11e9-8e77-52540065bddc, we see similar failures in test 93:
== sanityn test 93: alloc_rr should not allocate on same ost ========================================= 08:34:06 (1572597246)
CMD: trevis-20vm12 lctl get_param -n lod.lustre-MDT*/qos_threshold_rr
trevis-20vm12: error: get_param: param_path 'lod/lustre-MDT*/qos_threshold_rr': No such file or directory
CMD: trevis-20vm12 lctl set_param -n lod.lustre-MDT*/qos_threshold_rr 100
trevis-20vm12: error: set_param: param_path 'lod/lustre-MDT*/qos_threshold_rr': No such file or directory
CMD: trevis-20vm12 lctl set_param fail_loc=0x00000163
fail_loc=0x00000163
CMD: trevis-20vm12 lctl set_param fail_loc=0x0
fail_loc=0x0
CMD: trevis-20vm12 lctl set_param -n 'lod.lustre-MDT*/qos_threshold_rr'
trevis-20vm12: error: set_param: setting lod.lustre-MDT*/qos_threshold_rr: no value
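The final "no value" error looks like a follow-on effect: the initial get_param failed, so there was presumably no saved value to pass to the restoring set_param. A minimal sketch of a save/restore pair that skips the restore in that case is shown below. save_param and restore_param are hypothetical names, not functions from test-framework.sh, and the set_param call uses the same "name value" form that appears in the log.

```shell
# Hypothetical helpers, not part of test-framework.sh.
# LCTL may be overridden, e.g. to run against a stub.
LCTL=${LCTL:-lctl}

# Print the current value of a parameter, or nothing if it cannot be read.
save_param() {
	$LCTL get_param -n "$1" 2>/dev/null || true
}

# Restore a previously saved value; do nothing when no value was saved.
restore_param() {
	local name=$1
	local saved=$2

	if [ -z "$saved" ]; then
		echo "skip restore of $name: no saved value" >&2
		return 0
	fi
	$LCTL set_param -n "$name" "$saved"
}
```

This only hides the secondary error. The missing parameter itself still has to be addressed in the tests or in the code that registers it.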
Other failures are at:
https://testing.whamcloud.com/test_sets/1447e4bc-fce3-11e9-b934-52540065bddc
https://testing.whamcloud.com/test_sets/25c770d8-fcff-11e9-8e77-52540065bddc
https://testing.whamcloud.com/test_sets/53262b58-fd06-11e9-bbc3-52540065bddc