Lustre / LU-12932

parallel-scale test rr_alloc fails with ‘failed while setting qos_threshold_rr & creat_count’


Details

    • Bug
    • Resolution: Fixed
    • Critical
    • Lustre 2.13.0
    • Lustre 2.13.0
    • None
    • 3
    • 9223372036854775807

    Description

      parallel-scale test_rr_alloc fails when getting and setting ‘lov.lustre-MDT0000*.qos_threshold_rr’. These failures started around 29 Oct 2019 and may be related to the landing of the LU-12624 patches (‘striped directory allocate stripes by QoS’).

      The suite_log for https://testing.whamcloud.com/test_sets/c0dad8f4-fd58-11e9-8e77-52540065bddc shows the errors getting and setting qos_threshold_rr on the MDS:

      CMD: trevis-20vm12 params=\$(/usr/sbin/lctl get_param lov.lustre-MDT0000*.qos_threshold_rr);
      			 [[ -z \"lustre-MDT0000\" ]] && param= ||
      			 param=\$(grep lustre-MDT0000 <<< \"\$params\");
      			 [[ -z \$param ]] && param=\"\$params\";
      			 while read s; do echo mds1 \$s;
      			 done <<< \"\$param\"
      trevis-20vm12: error: get_param: param_path 'lov/lustre-MDT0000*/qos_threshold_rr': No such file or directory
      CMD: trevis-20vm12 params=\$(/usr/sbin/lctl get_param osp.lustre-OST*-osc-MDT0000.create_count);
      			 [[ -z \"lustre-MDT0000\" ]] && param= ||
      			 param=\$(grep lustre-MDT0000 <<< \"\$params\");
      			 [[ -z \$param ]] && param=\"\$params\";
      			 while read s; do echo mds1 \$s;
      			 done <<< \"\$param\"
      CMD: trevis-20vm12 /usr/sbin/lctl set_param -n 		lov.lustre-MDT0000*.qos_threshold_rr 100 		osp.lustre-OST*-osc-MDT0000.create_count 3488
      trevis-20vm12: error: set_param: param_path 'lov/lustre-MDT0000*/qos_threshold_rr': No such file or directory
       parallel-scale test_rr_alloc: @@@@@@ FAIL: failed while setting qos_threshold_rr & creat_count 
        Trace dump:
        = /usr/lib64/lustre/tests/test-framework.sh:6108:error()
        = /usr/lib64/lustre/tests/functions.sh:1004:run_rr_alloc()
        = /usr/lib64/lustre/tests/parallel-scale.sh:165:test_rr_alloc()
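
      The rr_alloc log shows the test going on to set qos_threshold_rr even though the preceding get_param already reported that the path does not exist. As an illustration only, a hypothetical helper `find_param` (not part of test-framework.sh) could probe candidate parameter patterns and print the first one that resolves, so a missing tunable is reported before anything is changed. It assumes that `lctl get_param` either exits non-zero or prints nothing when a pattern matches no parameter; `LCTL` is an assumed override hook so a stub can stand in for lctl. The candidate names in the usage comment are taken from the logs in this ticket and do not indicate which name the eventual fix used.

```shell
#!/bin/bash
# Hypothetical helper, not part of Lustre's test-framework.sh.
# Prints the first candidate parameter pattern that lctl resolves to a
# non-empty value; returns 1 if none of them do.
# LCTL may name a stub for testing; it defaults to the lctl binary.
find_param() {
	local candidate value
	for candidate in "$@"; do
		# A missing path makes get_param print "No such file or directory"
		# on stderr; treat a non-zero exit or empty output as "not found".
		if value=$(${LCTL:-lctl} get_param -n "$candidate" 2>/dev/null) &&
		   [ -n "$value" ]; then
			echo "$candidate"
			return 0
		fi
	done
	return 1
}

# Example use on the MDS, before changing any tunables:
# param=$(find_param 'lov.lustre-MDT0000*.qos_threshold_rr' \
#                    'lod.lustre-MDT0000*.qos_threshold_rr') ||
#	echo "qos_threshold_rr not found on this MDS" >&2
```

      In the real test this would need to run on the MDS node (for example through test-framework's do_facet mds1), and the test could then skip with an explicit reason rather than fail inside set_param.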
      

      The sanity suite_log at https://testing.whamcloud.com/test_sets/9ee6cf78-fd58-11e9-8e77-52540065bddc shows the same failure reading qos_threshold_rr, and test_116a and test_116b end up skipped:

      == sanity test 116a: stripe QOS: free space balance ================================================== 00:49:17 (1572569357)
      Free space priority CMD: trevis-20vm12 lctl get_param -n lo[vd].*-mdtlov.qos_prio_free
      91%
      CMD: trevis-20vm12 /usr/sbin/lctl set_param -n os[cd]*.*MD*.force_sync 1
      CMD: trevis-20vm12 /usr/sbin/lctl get_param -n osc.*MDT*.sync_*
      CMD: trevis-20vm12 /usr/sbin/lctl get_param -n osc.*MDT*.sync_*
      CMD: trevis-20vm12 /usr/sbin/lctl get_param -n osc.*MDT*.sync_*
      CMD: trevis-20vm12 /usr/sbin/lctl get_param -n osc.*MDT*.sync_*
      CMD: trevis-20vm12 /usr/sbin/lctl get_param -n osc.*MDT*.sync_*
      CMD: trevis-20vm12 /usr/sbin/lctl get_param -n osc.*MDT*.sync_*
      CMD: trevis-20vm12 /usr/sbin/lctl get_param -n osc.*MDT*.sync_*
      sleep 5 for ZFS zfs
      sleep 5 for ZFS zfs
      Waiting for local destroys to complete
      OST kbytes available: 1878016 1901568 1912832 1904640 1900544 1911808 1909760
      Min free space: OST 0: 1878016
      Max free space: OST 2: 1912832
      CMD: trevis-20vm12 lctl get_param -n *.*MDT0000-mdtlov.qos_threshold_rr
      trevis-20vm12: error: get_param: param_path '*/*MDT0000-mdtlov/qos_threshold_rr': No such file or directory
      Check for uneven OSTs: diff=34816KB (1%) must be > % ...ok
      Don't need to fill OST0
      diff=34816=1% must be > % for QOS mode.../usr/lib64/lustre/tests/sanity.sh: line 10107: [: 1: unary operator expected
      failed - QOS mode won't be used
      sleep 5 for ZFS zfs
      Waiting for local destroys to complete
      cleanup time 6
      
       SKIP: sanity test_116a QOS imbalance criteria not met
      SKIP 116a (29s)
      
      == sanity test 116b: QoS shouldn't LBUG if not enough OSTs found on the 2nd pass ===================== 00:49:46 (1572569386)
      CMD: trevis-20vm12 lctl get_param -n lo[vd].lustre-MDT0000-mdtlov.qos_threshold_rr
      trevis-20vm12: error: get_param: param_path 'lo[vd]/lustre-MDT0000-mdtlov/qos_threshold_rr': No such file or directory
      
       SKIP: sanity test_116b no QOS
      SKIP 116b (1s)
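
      In the test_116a output, the read of qos_threshold_rr fails, the threshold is printed as empty (`must be > %`), and sanity.sh line 10107 reports `[: 1: unary operator expected`. That is the message bash gives for a comparison such as `[ 1 -gt ]`, which is what an unquoted, empty threshold produces. A guarded comparison, sketched here with a hypothetical function `qos_mode_possible` (not sanity.sh code), separates "threshold unknown" from "imbalance too small". Stripping a trailing percent sign is an assumption based on qos_prio_free being printed as `91%` in the same log.

```shell
#!/bin/bash
# Hypothetical helper, not the sanity.sh implementation.
# Usage: qos_mode_possible <diff_percent> <qos_threshold_rr value>
# Returns 0 if the imbalance exceeds the threshold, 1 if it does not,
# and 2 if the threshold could not be read (empty value).
qos_mode_possible() {
	local diff_pct=$1 threshold=$2

	# Assumption: the tunable may be printed with a trailing "%",
	# as qos_prio_free is ("91%") in the same log.
	threshold=${threshold%\%}

	# Unquoted, an empty $threshold turns the comparison into
	# [ 1 -gt ], which bash rejects with "unary operator expected".
	if [ -z "$threshold" ]; then
		echo "qos_threshold_rr unavailable, cannot judge QOS mode" >&2
		return 2
	fi
	[ "$diff_pct" -gt "$threshold" ]
}
```

      A caller can treat return code 2 as a reason to skip explicitly, instead of hitting a shell syntax error in the middle of the comparison.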
      

      The sanityn run at https://testing.whamcloud.com/test_sets/ab3b11a8-fd58-11e9-8e77-52540065bddc shows similar failures, this time with the lod.* parameter path:

      == sanityn test 93: alloc_rr should not allocate on same ost ========================================= 08:34:06 (1572597246)
      CMD: trevis-20vm12 lctl get_param -n lod.lustre-MDT*/qos_threshold_rr
      trevis-20vm12: error: get_param: param_path 'lod/lustre-MDT*/qos_threshold_rr': No such file or directory
      CMD: trevis-20vm12 lctl set_param -n lod.lustre-MDT*/qos_threshold_rr 100
      trevis-20vm12: error: set_param: param_path 'lod/lustre-MDT*/qos_threshold_rr': No such file or directory
      CMD: trevis-20vm12 lctl set_param fail_loc=0x00000163
      fail_loc=0x00000163
      CMD: trevis-20vm12 lctl set_param fail_loc=0x0
      fail_loc=0x0
      CMD: trevis-20vm12 lctl set_param -n 		'lod.lustre-MDT*/qos_threshold_rr' 
      
      trevis-20vm12: error: set_param: setting lod.lustre-MDT*/qos_threshold_rr: no value
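
      In sanityn test_93, the get_param that should record the old qos_threshold_rr value fails, yet cleanup still calls set_param with an empty value, producing `set_param: ... no value`. One way to avoid that second error is to record a value only after a successful read and to make the restore a no-op when nothing was recorded. The helpers `set_param_saved` and `restore_param` are hypothetical, not sanityn.sh code; the space-separated `set_param -n <param> <value>` form mirrors the commands in the logs above, and `LCTL` is an assumed override hook for substituting a stub.

```shell
#!/bin/bash
# Hypothetical save/restore helpers; not the sanityn.sh implementation.
# LCTL may name a stub for testing; it defaults to the lctl binary.

saved_param=""
saved_value=""

# Record the current value of $1, then set it to $2.
# Fails without changing anything if the parameter cannot be read.
set_param_saved() {
	local param=$1 new=$2 old
	old=$(${LCTL:-lctl} get_param -n "$param" 2>/dev/null) || return 1
	[ -n "$old" ] || return 1
	saved_param=$param
	saved_value=$old
	${LCTL:-lctl} set_param -n "$param" "$new"
}

# Restore only what was actually saved, so an earlier lookup failure
# does not turn into "set_param: ... no value" during cleanup.
restore_param() {
	[ -n "$saved_param" ] || return 0
	${LCTL:-lctl} set_param -n "$saved_param" "$saved_value"
	saved_param=""
	saved_value=""
}
```

      Tracking a single parameter keeps the sketch short; a test that changes several tunables would need a list of saved name/value pairs.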
      

      Other failing test sessions:
      https://testing.whamcloud.com/test_sets/1447e4bc-fce3-11e9-b934-52540065bddc
      https://testing.whamcloud.com/test_sets/25c770d8-fcff-11e9-8e77-52540065bddc
      https://testing.whamcloud.com/test_sets/53262b58-fd06-11e9-bbc3-52540065bddc

            People

              laisiyao Lai Siyao
              jamesanunez James Nunez (Inactive)