[LU-16330] recovery-small test_152: QoS allocation slower than RR, killable semaphore doesn't work Created: 21/Nov/22  Updated: 21/Dec/23

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Maloo Assignee: WC Triage
Resolution: Unresolved Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 9223372036854775807

Description

This issue was created by maloo for Dongyang Li <dongyangli@ddn.com>

This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/6423df40-8be8-4ceb-9efc-e35a41855158

test_152 failed with the following error:

QoS allocation slower than RR, killable semaphore doesn't work
== recovery-small test 152: QoS object allocation could be awakened in case of OST failover ========================================================== 08:52:19 (1668761539)
CMD: trevis-60vm4 uname -r
striped dir -i0 -c1 -H crush2 /mnt/lustre/d152.recovery-small
striped dir -i0 -c1 -H fnv_1a_64 /mnt/lustre/d152.recovery-small/rr
striped dir -i0 -c1 -H crush2 /mnt/lustre/d152.recovery-small/qos
CMD: trevis-60vm4 /usr/sbin/lctl set_param fail_loc=0x80000173 fail_val=20
fail_loc=0x80000173
fail_val=20
CMD: trevis-60vm4 /usr/sbin/lctl get_param -n lov.*0000*.qos_threshold_rr
CMD: trevis-60vm4 /usr/sbin/lctl set_param lov.*.qos_threshold_rr=0
lov.lustre-MDT0000-mdtlov.qos_threshold_rr=0
lov.lustre-MDT0002-mdtlov.qos_threshold_rr=0
QoS allocation took 21 seconds
 recovery-small test_152: @@@@@@ FAIL: QoS allocation slower than RR, killable semaphore doesn't work 
  Trace dump:
  = /usr/lib64/lustre/tests/test-framework.sh:6532:error()
  = /usr/lib64/lustre/tests/recovery-small.sh:3403:test_152()
  = /usr/lib64/lustre/tests/test-framework.sh:6868:run_one()
  = /usr/lib64/lustre/tests/test-framework.sh:6918:run_one_logged()
  = /usr/lib64/lustre/tests/test-framework.sh:6755:run_test()
  = /usr/lib64/lustre/tests/recovery-small.sh:3405:main()
Dumping lctl log to /autotest/autotest-2/2022-11-18/lustre-reviews_review-dne-zfs-part-5_90735_16_7d42ba00-0350-47d9-88c8-13488e0d2b82//recovery-small.test_152.*.1668761581.log
CMD: trevis-60vm1.trevis.whamcloud.com,trevis-60vm2,trevis-60vm3,trevis-60vm4,trevis-60vm5 /usr/sbin/lctl dk > /autotest/autotest-2/2022-11-18/lustre-reviews_review-dne-zfs-part-5_90735_16_7d42ba00-0350-47d9-88c8-13488e0d2b82//recovery-small.test_152.debug_log.\$(hostname -s).1668761581.log;
		dmesg > /autotest/autotest-2/2022-11-18/lustre-reviews_review-dne-zfs-part-5_90735_16_7d42ba00-0350-47d9-88c8-13488e0d2b82//recovery-small.test_152.dmesg.\$(hostname -s).1668761581.log
CMD: trevis-60vm4 /usr/sbin/lctl set_param lov.*.qos_threshold_rr=17%
lov.lustre-MDT0000-mdtlov.qos_threshold_rr=17%
lov.lustre-MDT0002-mdtlov.qos_threshold_rr=17%

Checked the log: no sequence rollover occurred, and the sequence was not used up by synchronous creation. Could not reproduce with the same environment on a local box. Possibly a slow disk?
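To show how narrow the failure margin was, here is a minimal bash sketch of the kind of timing check the error message implies. It assumes that test_152 fails when the QoS allocation takes at least as long as the injected delay (fail_val=20). That rule is not confirmed from recovery-small.sh, and the helper name check_qos_time is hypothetical. Under that assumption, the 21-second run misses by about one second. A margin that small is consistent with the slow-disk hypothesis above, although it does not confirm it.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the timing rule assumed for test_152.
# ASSUMPTION: the run fails when QoS allocation takes at least as long as
# the injected delay (fail_val). The real check in recovery-small.sh may differ.

# check_qos_time ELAPSED FAIL_VAL
# Prints PASS or FAIL and returns 0 on pass, 1 on fail.
check_qos_time() {
	local elapsed=$1 fail_val=$2

	if (( elapsed >= fail_val )); then
		echo "FAIL: QoS allocation took $elapsed seconds (limit $fail_val)"
		return 1
	fi
	echo "PASS: QoS allocation took $elapsed seconds (limit $fail_val)"
	return 0
}

# Values from the failed run above: fail_val=20, QoS allocation took 21 seconds.
check_qos_time 21 20 || echo "exceeded the injected delay by $((21 - 20)) second(s)"
```

Under the same assumption, a run finishing in 5 seconds would print the PASS line and return 0.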

VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
recovery-small test_152 - QoS allocation slower than RR, killable semaphore doesn't work


Generated at Sat Feb 10 03:26:02 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.