[LU-14303] parallel-scale test rr_alloc fails with 'failed while setting qos_threshold_rr & creat_count' Created: 07/Jan/21  Updated: 22/Jan/21  Resolved: 22/Jan/21

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.14.0
Fix Version/s: Lustre 2.14.0

Type: Bug Priority: Minor
Reporter: James Nunez (Inactive) Assignee: Yang Sheng
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
is related to LU-12932 parallel-scale test rr_alloc fails wi... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

parallel-scale test_rr_alloc fails with 'failed while setting qos_threshold_rr & creat_count', which is a good description of what happened. This looks a lot like LU-12932, but that ticket was closed over a year ago. Before the current run of failures, the last parallel-scale test_rr_alloc failure with this error was probably on 20 JAN 2020. The test started failing with this error again on 27 SEPT 2020 for Lustre 2.13.56 at https://testing.whamcloud.com/test_sets/23e9ef62-55b0-4bc9-bc7c-d7f10c079221.

Looking at a recent failure at https://testing.whamcloud.com/test_sets/502ab60d-78a5-4bd5-bac9-a895f4e9d631, we see:

CMD: trevis-200vm4 /usr/sbin/lctl set_param -n 			lod.lustre-MDT0000*.qos_threshold_rr=100 			osp.lustre-OST*-osc-MDT0000.create_count=3052
CMD: trevis-200vm5 /usr/sbin/lctl set_param -n 			lod.lustre-MDT0000*.qos_threshold_rr=100 			osp.lustre-OST*-osc-MDT0000.create_count=3052
trevis-200vm5: error: set_param: param_path 'lod/lustre-MDT0000*/qos_threshold_rr': No such file or directory
trevis-200vm5: error: set_param: param_path 'osp/lustre-OST*-osc-MDT0000/create_count': No such file or directory
pdsh@trevis-200vm1: trevis-200vm5: ssh exited with exit code 2
 parallel-scale test_rr_alloc: @@@@@@ FAIL: failed while setting qos_threshold_rr & creat_count 
  Trace dump:
  = /usr/lib64/lustre/tests/test-framework.sh:6273:error()
  = /usr/lib64/lustre/tests/functions.sh:1072:run_rr_alloc()
  = /usr/lib64/lustre/tests/parallel-scale.sh:163:test_rr_alloc()
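
The log above suggests that the same set_param command was sent to both trevis-200vm4 and trevis-200vm5, and that only trevis-200vm5, which apparently has no lod/osp devices for MDT0000, rejected it. The following is a minimal shell sketch of one way a script could tolerate that situation by checking that a parameter pattern resolves before setting it. It is not the change made in patch 41192, whose contents are not shown in this ticket. The helper name set_param_if_present is hypothetical; the sketch relies only on `lctl list_param` and `lctl set_param -n`, and assumes that `list_param` exits non-zero when nothing matches the pattern.

```shell
#!/bin/bash
# Hypothetical helper (not part of the Lustre test framework): apply each
# "pattern=value" setting only if the pattern resolves on the local node.
# LCTL defaults to "lctl"; override it to point at a specific binary.
LCTL=${LCTL:-lctl}

set_param_if_present() {
	local rc=0 setting pattern

	for setting in "$@"; do
		# Strip "=value" to get the parameter pattern alone.
		pattern=${setting%%=*}
		# Assumption: list_param fails when no parameter matches,
		# as would happen on a node that does not host MDT0000.
		if $LCTL list_param "$pattern" >/dev/null 2>&1; then
			$LCTL set_param -n "$setting" || rc=$?
		else
			echo "skip: $pattern not present on this node" >&2
		fi
	done
	return $rc
}
```

With this helper, a call such as `set_param_if_present "lod.lustre-MDT0000*.qos_threshold_rr=100" "osp.lustre-OST*-osc-MDT0000.create_count=3052"` would print a skip message instead of failing on a node where those paths do not exist. The trade-off is that a genuinely missing parameter is no longer reported as an error, so a caller would still need to confirm that at least one node applied the settings.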

There have been many failures since September 2020. Here are links to a few of them:
https://testing.whamcloud.com/test_sets/5209ec71-aefe-49d8-8e23-65d9ca333e1f
https://testing.whamcloud.com/test_sets/02fe011f-4f52-425b-a960-da6c022819c1
https://testing.whamcloud.com/test_sets/83259a8b-cc30-4e8b-ab0f-599774f5f7e8



 Comments   
Comment by Peter Jones [ 07/Jan/21 ]

Yang Sheng

Can you please advise?

Thanks

Peter

Comment by Gerrit Updater [ 11/Jan/21 ]

Yang Sheng (ys@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/41192
Subject: LU-14303 tests: parallel-scale test rr_alloc fails
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 5401776ffb1383217d30fc6025ca825d8383a287

Comment by Gerrit Updater [ 22/Jan/21 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/41192/
Subject: LU-14303 tests: parallel-scale test rr_alloc fails
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 2a8ffe262a035dacba38b2ea29776b76a083acb9

Comment by Peter Jones [ 22/Jan/21 ]

Landed for 2.14
