Details
Bug
Resolution: Fixed
Minor
Lustre 2.14.0
None
3
9223372036854775807
Description
parallel-scale test_rr_alloc fails with 'failed while setting qos_threshold_rr & creat_count', which accurately describes what happened. This looks a lot like LU-12932, but that ticket was closed over a year ago. The last parallel-scale test_rr_alloc failure with this error was probably on 20 Jan 2020. The test started failing with the same error again on 27 Sep 2020 for Lustre 2.13.56 at https://testing.whamcloud.com/test_sets/23e9ef62-55b0-4bc9-bc7c-d7f10c079221.
Looking at a recent failure at https://testing.whamcloud.com/test_sets/502ab60d-78a5-4bd5-bac9-a895f4e9d631, we see
CMD: trevis-200vm4 /usr/sbin/lctl set_param -n lod.lustre-MDT0000*.qos_threshold_rr=100 osp.lustre-OST*-osc-MDT0000.create_count=3052
CMD: trevis-200vm5 /usr/sbin/lctl set_param -n lod.lustre-MDT0000*.qos_threshold_rr=100 osp.lustre-OST*-osc-MDT0000.create_count=3052
trevis-200vm5: error: set_param: param_path 'lod/lustre-MDT0000*/qos_threshold_rr': No such file or directory
trevis-200vm5: error: set_param: param_path 'osp/lustre-OST*-osc-MDT0000/create_count': No such file or directory
pdsh@trevis-200vm1: trevis-200vm5: ssh exited with exit code 2
parallel-scale test_rr_alloc: @@@@@@ FAIL: failed while setting qos_threshold_rr & creat_count
Trace dump:
= /usr/lib64/lustre/tests/test-framework.sh:6273:error()
= /usr/lib64/lustre/tests/functions.sh:1072:run_rr_alloc()
= /usr/lib64/lustre/tests/parallel-scale.sh:163:test_rr_alloc()
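The same set_param command runs on both trevis-200vm4 and trevis-200vm5, but only trevis-200vm5 reports errors. One plausible reading, which this ticket does not confirm, is that the lod.lustre-MDT0000* and osp.lustre-OST*-osc-MDT0000 parameters exist only on the node that hosts MDT0000, so the glob matches nothing on any other MDS node. The sketch below is a simplified model of that failure mode, not lctl's actual implementation: it assumes every '.' in the parameter name becomes '/', and the per-node parameter lists (vm4 hosting MDT0000, vm5 hosting only MDT0001) are hypothetical.

```python
from fnmatch import fnmatchcase


def to_param_path(name: str) -> str:
    # Simplified assumption: turn every '.' in a dotted lctl parameter name into '/'.
    # This reproduces the param_path strings in the log, but the real lctl
    # parser handles more cases than this.
    return name.replace(".", "/")


def unmatched_params(node_params: list[str], assignments: list[str]) -> list[str]:
    """Return the param_path patterns that match no parameter on a node.

    node_params: parameter paths that exist on the node (hypothetical data).
    assignments: 'name=value' strings as passed to `lctl set_param`.
    """
    missing = []
    for assignment in assignments:
        name, _, _value = assignment.partition("=")
        pattern = to_param_path(name)
        # fnmatchcase lets '*' match any run of characters, including '/'.
        if not any(fnmatchcase(p, pattern) for p in node_params):
            missing.append(pattern)
    return missing


# Hypothetical parameter lists: vm4 hosts MDT0000, vm5 hosts only MDT0001.
NODES = {
    "trevis-200vm4": [
        "lod/lustre-MDT0000-mdtlov/qos_threshold_rr",
        "osp/lustre-OST0000-osc-MDT0000/create_count",
    ],
    "trevis-200vm5": [
        "lod/lustre-MDT0001-mdtlov/qos_threshold_rr",
        "osp/lustre-OST0000-osc-MDT0001/create_count",
    ],
}

ASSIGNMENTS = [
    "lod.lustre-MDT0000*.qos_threshold_rr=100",
    "osp.lustre-OST*-osc-MDT0000.create_count=3052",
]

if __name__ == "__main__":
    for node, params in NODES.items():
        for path in unmatched_params(params, ASSIGNMENTS):
            print(f"{node}: error: set_param: param_path '{path}': No such file or directory")
```

Under these assumptions the script prints the two trevis-200vm5 error lines and nothing for trevis-200vm4, mirroring the log.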
There have been many failures since September 2020. Here are links to a few of them:
https://testing.whamcloud.com/test_sets/5209ec71-aefe-49d8-8e23-65d9ca333e1f
https://testing.whamcloud.com/test_sets/02fe011f-4f52-425b-a960-da6c022819c1
https://testing.whamcloud.com/test_sets/83259a8b-cc30-4e8b-ab0f-599774f5f7e8