
parallel-scale test rr_alloc fails with ‘failed while setting qos_threshold_rr & creat_count’

Details

    • Bug
    • Resolution: Fixed
    • Critical
    • Lustre 2.13.0
    • Lustre 2.13.0
    • None
    • 3

    Description

      parallel-scale test_rr_alloc fails getting and setting ‘lov.lustre-MDT0000*.qos_threshold_rr’. These failures started around 29 OCT 2019 and may be related to the landing of the patches for LU-12624, ‘striped directory allocate stripes by QoS’.

      Looking at the suite_log for https://testing.whamcloud.com/test_sets/c0dad8f4-fd58-11e9-8e77-52540065bddc, we see errors getting and setting qos_threshold_rr on the MDS:

      CMD: trevis-20vm12 params=\$(/usr/sbin/lctl get_param lov.lustre-MDT0000*.qos_threshold_rr);
      			 [[ -z \"lustre-MDT0000\" ]] && param= ||
      			 param=\$(grep lustre-MDT0000 <<< \"\$params\");
      			 [[ -z \$param ]] && param=\"\$params\";
      			 while read s; do echo mds1 \$s;
      			 done <<< \"\$param\"
      trevis-20vm12: error: get_param: param_path 'lov/lustre-MDT0000*/qos_threshold_rr': No such file or directory
      CMD: trevis-20vm12 params=\$(/usr/sbin/lctl get_param osp.lustre-OST*-osc-MDT0000.create_count);
      			 [[ -z \"lustre-MDT0000\" ]] && param= ||
      			 param=\$(grep lustre-MDT0000 <<< \"\$params\");
      			 [[ -z \$param ]] && param=\"\$params\";
      			 while read s; do echo mds1 \$s;
      			 done <<< \"\$param\"
      CMD: trevis-20vm12 /usr/sbin/lctl set_param -n 		lov.lustre-MDT0000*.qos_threshold_rr 100 		osp.lustre-OST*-osc-MDT0000.create_count 3488
      trevis-20vm12: error: set_param: param_path 'lov/lustre-MDT0000*/qos_threshold_rr': No such file or directory
       parallel-scale test_rr_alloc: @@@@@@ FAIL: failed while setting qos_threshold_rr & creat_count 
        Trace dump:
        = /usr/lib64/lustre/tests/test-framework.sh:6108:error()
        = /usr/lib64/lustre/tests/functions.sh:1004:run_rr_alloc()
        = /usr/lib64/lustre/tests/parallel-scale.sh:165:test_rr_alloc()
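
      The error text reports the pattern with dots turned into slashes ('lov/lustre-MDT0000*/qos_threshold_rr'), which suggests the lookup is a glob over a parameter tree. A minimal sketch of that kind of lookup follows. PARAM_ROOT and param_exists are hypothetical names used for illustration; they are not part of lctl or test-framework.sh, and the real lctl lookup may differ.

```shell
#!/bin/bash
# Hypothetical helper, not part of lctl or test-framework.sh.
# PARAM_ROOT is an assumed stand-in for the real parameter tree.
PARAM_ROOT=${PARAM_ROOT:-/sys/fs/lustre}

# Succeeds when at least one file matches the dotted parameter pattern.
param_exists() {
    # Turn every '.' into '/', e.g. lov.X.qos_threshold_rr -> lov/X/qos_threshold_rr
    local pattern=${1//.//}

    # compgen -G expands the glob and returns non-zero when nothing matches
    compgen -G "$PARAM_ROOT/$pattern" > /dev/null
}
```

      A test could call param_exists 'lov.lustre-MDT0000*.qos_threshold_rr' before trying to set the value, and skip or report a clearer error when the parameter is absent.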
      

      Looking at the sanity suite_log at https://testing.whamcloud.com/test_sets/9ee6cf78-fd58-11e9-8e77-52540065bddc, we see failures getting the qos_threshold_rr parameter:

      == sanity test 116a: stripe QOS: free space balance ================================================== 00:49:17 (1572569357)
      Free space priority CMD: trevis-20vm12 lctl get_param -n lo[vd].*-mdtlov.qos_prio_free
      91%
      CMD: trevis-20vm12 /usr/sbin/lctl set_param -n os[cd]*.*MD*.force_sync 1
      CMD: trevis-20vm12 /usr/sbin/lctl get_param -n osc.*MDT*.sync_*
      CMD: trevis-20vm12 /usr/sbin/lctl get_param -n osc.*MDT*.sync_*
      CMD: trevis-20vm12 /usr/sbin/lctl get_param -n osc.*MDT*.sync_*
      CMD: trevis-20vm12 /usr/sbin/lctl get_param -n osc.*MDT*.sync_*
      CMD: trevis-20vm12 /usr/sbin/lctl get_param -n osc.*MDT*.sync_*
      CMD: trevis-20vm12 /usr/sbin/lctl get_param -n osc.*MDT*.sync_*
      CMD: trevis-20vm12 /usr/sbin/lctl get_param -n osc.*MDT*.sync_*
      sleep 5 for ZFS zfs
      sleep 5 for ZFS zfs
      Waiting for local destroys to complete
      OST kbytes available: 1878016 1901568 1912832 1904640 1900544 1911808 1909760
      Min free space: OST 0: 1878016
      Max free space: OST 2: 1912832
      CMD: trevis-20vm12 lctl get_param -n *.*MDT0000-mdtlov.qos_threshold_rr
      trevis-20vm12: error: get_param: param_path '*/*MDT0000-mdtlov/qos_threshold_rr': No such file or directory
      Check for uneven OSTs: diff=34816KB (1%) must be > % ...ok
      Don't need to fill OST0
      diff=34816=1% must be > % for QOS mode.../usr/lib64/lustre/tests/sanity.sh: line 10107: [: 1: unary operator expected
      failed - QOS mode won't be used
      sleep 5 for ZFS zfs
      Waiting for local destroys to complete
      cleanup time 6
      
       SKIP: sanity test_116a QOS imbalance criteria not met
      SKIP 116a (29s)
      
      == sanity test 116b: QoS shouldn't LBUG if not enough OSTs found on the 2nd pass ===================== 00:49:46 (1572569386)
      CMD: trevis-20vm12 lctl get_param -n lo[vd].lustre-MDT0000-mdtlov.qos_threshold_rr
      trevis-20vm12: error: get_param: param_path 'lo[vd]/lustre-MDT0000-mdtlov/qos_threshold_rr': No such file or directory
      
       SKIP: sanity test_116b no QOS
      SKIP 116b (1s)
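
      The '[: 1: unary operator expected' message in the test 116a log is what bash prints when one operand of a '[ ... ]' comparison expands to nothing, here the threshold that get_param failed to return (note the empty value in 'must be > %'). A sketch of a guarded comparison follows. The function name qos_imbalance_met is hypothetical and this is not the actual sanity.sh code.

```shell
#!/bin/bash
# Hypothetical sketch, not the actual sanity.sh code.
# Returns 0 when diff_pct exceeds the threshold, 1 when it does not,
# and 2 when the threshold is missing.
qos_imbalance_met() {
    local diff_pct=$1
    local threshold=${2%\%}    # strip a trailing '%', e.g. "17%" -> "17"

    # An empty threshold means the parameter could not be read
    if [[ -z $threshold ]]; then
        echo "qos threshold unavailable" >&2
        return 2
    fi
    [[ $diff_pct -gt $threshold ]]
}
```

      Distinguishing "threshold missing" from "imbalance not met" would let the test fail on a missing parameter instead of reporting a skip.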
      

      In sanityn, https://testing.whamcloud.com/test_sets/ab3b11a8-fd58-11e9-8e77-52540065bddc, we see similar failures:

      == sanityn test 93: alloc_rr should not allocate on same ost ========================================= 08:34:06 (1572597246)
      CMD: trevis-20vm12 lctl get_param -n lod.lustre-MDT*/qos_threshold_rr
      trevis-20vm12: error: get_param: param_path 'lod/lustre-MDT*/qos_threshold_rr': No such file or directory
      CMD: trevis-20vm12 lctl set_param -n lod.lustre-MDT*/qos_threshold_rr 100
      trevis-20vm12: error: set_param: param_path 'lod/lustre-MDT*/qos_threshold_rr': No such file or directory
      CMD: trevis-20vm12 lctl set_param fail_loc=0x00000163
      fail_loc=0x00000163
      CMD: trevis-20vm12 lctl set_param fail_loc=0x0
      fail_loc=0x0
      CMD: trevis-20vm12 lctl set_param -n 		'lod.lustre-MDT*/qos_threshold_rr' 
      
      trevis-20vm12: error: set_param: setting lod.lustre-MDT*/qos_threshold_rr: no value
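
      The final set_param above is issued with no value, which is consistent with a restore step replaying a saved value that was never read. A sketch of a save/restore wrapper that refuses to touch the parameter when the initial read fails follows. get_val, set_val and with_param are hypothetical names; get_val and set_val stand in for the lctl get_param/set_param calls and operate on a plain file so the sketch is self-contained.

```shell
#!/bin/bash
# Hypothetical sketch; get_val/set_val stand in for lctl get_param -n and
# lctl set_param -n, and work on a plain file so the example runs anywhere.
get_val() { cat "$1" 2>/dev/null; }
set_val() { [[ -e $1 ]] && echo "$2" > "$1"; }

# Run a command with a temporary parameter value, then restore the old one.
with_param() {
    local path=$1 new=$2 old rc
    shift 2

    old=$(get_val "$path")
    if [[ -z $old ]]; then
        echo "cannot read $path, not changing it" >&2
        return 1
    fi
    set_val "$path" "$new" || return 1
    "$@"
    rc=$?
    set_val "$path" "$old"    # restore only a value that was really read
    return $rc
}
```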
      

      Other failures are at
      https://testing.whamcloud.com/test_sets/1447e4bc-fce3-11e9-b934-52540065bddc
      https://testing.whamcloud.com/test_sets/25c770d8-fcff-11e9-8e77-52540065bddc
      https://testing.whamcloud.com/test_sets/53262b58-fd06-11e9-bbc3-52540065bddc

          Activity

            [LU-12932] parallel-scale test rr_alloc fails with ‘failed while setting qos_threshold_rr & creat_count’
            pjones Peter Jones added a comment -

            Landed for 2.13


            gerrit Gerrit Updater added a comment -

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36686/
            Subject: LU-12932 lod: rename qos_threshold_rr parameter
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: aa4269f5c2e3c834cdff63dc32d7a7183f32374a

            gerrit Gerrit Updater added a comment -

            James Simmons (jsimmons@infradead.org) uploaded a new patch: https://review.whamcloud.com/36686
            Subject: LU-12932 lod: rename qos_threshold_rr parameter
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 5b4cbf2478783de790e7f00ec92c9d226f842897

            gerrit Gerrit Updater added a comment -

            Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/36683
            Subject: LU-12932 lod: rename qos_threshold_rr parameter
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 12992baae51be17271ee0656fab8d236a80cb4d1

            adilger Andreas Dilger added a comment -

            Sigh, the previously existing parameter was named "qos_threshold_rr" and not "qos_thresholdrr", so another patch is needed.
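
            Because the parameter appears under two spellings in this ticket (qos_threshold_rr and qos_thresholdrr), a test script could probe for whichever name is present instead of hard-coding one. The following is only a sketch of that idea, not what the patches here do. first_existing_param and PARAM_ROOT are hypothetical names, with PARAM_ROOT standing in for the real parameter tree.

```shell
#!/bin/bash
# Hypothetical helper, not part of test-framework.sh.
# PARAM_ROOT is an assumed stand-in for the real parameter tree.
PARAM_ROOT=${PARAM_ROOT:-/sys/fs/lustre}

# Print the first candidate name that exists under the device directory glob.
# Usage: first_existing_param DEVGLOB NAME...
first_existing_param() {
    local devglob=$1 name
    shift
    for name in "$@"; do
        # compgen -G returns non-zero when the glob matches nothing
        if compgen -G "$PARAM_ROOT/$devglob/$name" > /dev/null; then
            echo "$name"
            return 0
        fi
    done
    return 1
}
```

            For example, first_existing_param 'lod/lustre-MDT*' qos_threshold_rr qos_thresholdrr prints the first spelling found, or returns non-zero when neither exists.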

            jamesanunez James Nunez (Inactive) added a comment -

            I don't think this is fixed. Looking at the results for the patch https://review.whamcloud.com/36667/ at https://testing.whamcloud.com/test_sets/59e536a8-ff8d-11e9-8e77-52540065bddc, we see that qos_threshold_rr is missing:

            == sanityn test 93: alloc_rr should not allocate on same ost ========================================= 03:24:52 (1572924292)
            CMD: trevis-18vm4 lctl get_param -n lod.lustre-MDT*/qos_threshold_rr
            trevis-18vm4: error: get_param: param_path 'lod/lustre-MDT*/qos_threshold_rr': No such file or directory
            CMD: trevis-18vm4 lctl set_param -n lod.lustre-MDT*/qos_threshold_rr 100
            trevis-18vm4: error: set_param: param_path 'lod/lustre-MDT*/qos_threshold_rr': No such file or directory
            CMD: trevis-18vm4 lctl set_param fail_loc=0x00000163
            fail_loc=0x00000163
            CMD: trevis-18vm4 lctl set_param fail_loc=0x0
            fail_loc=0x0
            CMD: trevis-18vm4 lctl set_param -n 		'lod.lustre-MDT*/qos_threshold_rr' 
            trevis-18vm4: error: set_param: setting lod.lustre-MDT*/qos_threshold_rr: no value
            /mnt/lustre/f93.sanityn-1/file1

            Maybe the parameter changed to qos_thresholdrr?
            pjones Peter Jones added a comment -

            Landed for 2.13


            gerrit Gerrit Updater added a comment -

            Andreas Dilger (adilger@whamcloud.com) merged in patch https://review.whamcloud.com/36666/
            Subject: LU-12932 tests: remove obsolete qos.sh test script
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 15443866cc98b1c81551ce8d2172b2902c51eebd

            gerrit Gerrit Updater added a comment -

            Andreas Dilger (adilger@whamcloud.com) merged in patch https://review.whamcloud.com/36667/
            Subject: LU-12932 lod: restore qos_thresholdrr sysfs file
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 9ad541aa34002eb0f3d19ba9512b713ffcaf77bc

            gerrit Gerrit Updater added a comment -

            James Simmons (jsimmons@infradead.org) uploaded a new patch: https://review.whamcloud.com/36667
            Subject: LU-12932 lod: restore qos_thresholdrr sysfs file
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 88a518e7bdada1ed92278c6bd50b9b37b0ac6ca1

            People

              laisiyao Lai Siyao
              jamesanunez James Nunez (Inactive)
              Votes: 0
              Watchers: 5
