Lustre / LU-16330

recovery-small test_152: QoS allocation slower than RR, killable semaphore doesn't work

Details

    • Bug
    • Resolution: Unresolved
    • Minor

    Description

      This issue was created by maloo for Dongyang Li <dongyangli@ddn.com>

      This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/6423df40-8be8-4ceb-9efc-e35a41855158

      test_152 failed with the following error:

      QoS allocation slower than RR, killable semaphore doesn't work
      
      == recovery-small test 152: QoS object allocation could be awakened in case of OST failover ========================================================== 08:52:19 (1668761539)
      CMD: trevis-60vm4 uname -r
      striped dir -i0 -c1 -H crush2 /mnt/lustre/d152.recovery-small
      striped dir -i0 -c1 -H fnv_1a_64 /mnt/lustre/d152.recovery-small/rr
      striped dir -i0 -c1 -H crush2 /mnt/lustre/d152.recovery-small/qos
      CMD: trevis-60vm4 /usr/sbin/lctl set_param fail_loc=0x80000173 fail_val=20
      fail_loc=0x80000173
      fail_val=20
      CMD: trevis-60vm4 /usr/sbin/lctl get_param -n lov.*0000*.qos_threshold_rr
      CMD: trevis-60vm4 /usr/sbin/lctl set_param lov.*.qos_threshold_rr=0
      lov.lustre-MDT0000-mdtlov.qos_threshold_rr=0
      lov.lustre-MDT0002-mdtlov.qos_threshold_rr=0
      QoS allocation took 21 seconds
       recovery-small test_152: @@@@@@ FAIL: QoS allocation slower than RR, killable semaphore doesn't work 
        Trace dump:
        = /usr/lib64/lustre/tests/test-framework.sh:6532:error()
        = /usr/lib64/lustre/tests/recovery-small.sh:3403:test_152()
        = /usr/lib64/lustre/tests/test-framework.sh:6868:run_one()
        = /usr/lib64/lustre/tests/test-framework.sh:6918:run_one_logged()
        = /usr/lib64/lustre/tests/test-framework.sh:6755:run_test()
        = /usr/lib64/lustre/tests/recovery-small.sh:3405:main()
      Dumping lctl log to /autotest/autotest-2/2022-11-18/lustre-reviews_review-dne-zfs-part-5_90735_16_7d42ba00-0350-47d9-88c8-13488e0d2b82//recovery-small.test_152.*.1668761581.log
      CMD: trevis-60vm1.trevis.whamcloud.com,trevis-60vm2,trevis-60vm3,trevis-60vm4,trevis-60vm5 /usr/sbin/lctl dk > /autotest/autotest-2/2022-11-18/lustre-reviews_review-dne-zfs-part-5_90735_16_7d42ba00-0350-47d9-88c8-13488e0d2b82//recovery-small.test_152.debug_log.\$(hostname -s).1668761581.log;
      		dmesg > /autotest/autotest-2/2022-11-18/lustre-reviews_review-dne-zfs-part-5_90735_16_7d42ba00-0350-47d9-88c8-13488e0d2b82//recovery-small.test_152.dmesg.\$(hostname -s).1668761581.log
      CMD: trevis-60vm4 /usr/sbin/lctl set_param lov.*.qos_threshold_rr=17%
      lov.lustre-MDT0000-mdtlov.qos_threshold_rr=17%
      lov.lustre-MDT0002-mdtlov.qos_threshold_rr=17%
      

      Checked the logs: no seq rollover is happening, and the seq is not used up for sync creation. Could not reproduce with the same environment on a local box. Possibly a slow disk?
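      As a quick sanity check on logs like the one above, the reported QoS time can be compared against the injected fail_val. The sketch below is only an illustration: the helper name summarize_test_152 is hypothetical, and it assumes fail_val is a delay in seconds, which the log itself does not state.

```python
import re


def summarize_test_152(log_text):
    """Pull the injected fail_val and the reported QoS time out of a test log.

    Hypothetical helper, not part of the Lustre test framework.
    Assumes fail_val is a delay in seconds (an assumption, not confirmed here).
    """
    # Match only the standalone "fail_val=N" echo line, not the CMD line.
    fail_val = re.search(r"^\s*fail_val=(\d+)\s*$", log_text, re.MULTILINE)
    took = re.search(r"QoS allocation took (\d+) seconds", log_text)
    if not fail_val or not took:
        return None
    delay = int(fail_val.group(1))
    elapsed = int(took.group(1))
    return {
        "fail_val": delay,
        "elapsed": elapsed,
        # True when the allocation took at least as long as the injected value.
        "waited_full_delay": elapsed >= delay,
    }


sample = """\
fail_loc=0x80000173
fail_val=20
QoS allocation took 21 seconds
"""
print(summarize_test_152(sample))
```

      For the log in this report, the values are fail_val=20 and 21 seconds, so under the stated assumption the QoS allocation appears to have waited for the whole injected delay instead of being woken early.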

      VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
      recovery-small test_152 - QoS allocation slower than RR, killable semaphore doesn't work

            People

              wc-triage WC Triage
              maloo Maloo