[LU-3764] sanity test_116a: stripe QOS didn't balance free space Created: 15/Aug/13  Updated: 07/Jul/17  Resolved: 23/Dec/13

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.5.0, Lustre 2.6.0
Fix Version/s: Lustre 2.6.0, Lustre 2.5.1, Lustre 2.10.0

Type: Bug
Priority: Minor
Reporter: Maloo
Assignee: James Nunez (Inactive)
Resolution: Fixed
Votes: 0
Labels: mn4

Issue Links:
Related
is related to LU-3880 Make error_ignore accept a general st... Resolved
Severity: 3
Rank (Obsolete): 9696

 Description   

This issue was created by maloo for girish <gshilamkar@ddn.com>

This issue relates to the following test suite run: http://maloo.whamcloud.com/test_sets/d8c6d5b4-0537-11e3-925a-52540035b04c.

The sub-test test_116a failed with the following error:

== sanity test 116a: stripe QOS: free space balance ===================== 22:07:35 (1376456855)
Free space priority error: get_param: /proc/{fs,sys}/{lnet,lustre}/lov/clilov/qos_prio_free: Found no match
CMD: client-26vm7 lctl set_param -n osd*.MD.force_sync 1
CMD: client-26vm7 lctl get_param -n osc.MDT.sync_*
CMD: client-26vm7 lctl get_param -n osc.MDT.sync_*
CMD: client-26vm7 lctl get_param -n osc.MDT.sync_*
CMD: client-26vm7 lctl get_param -n osc.MDT.sync_*
CMD: client-26vm7 lctl get_param -n osc.MDT.sync_*
CMD: client-26vm7 lctl get_param -n osc.MDT.sync_*
CMD: client-26vm7 lctl get_param -n osc.MDT.sync_*
CMD: client-26vm7 lctl get_param -n osc.MDT.sync_*
CMD: client-26vm7 lctl get_param -n osc.MDT.sync_*
Waiting for local destroys to complete
OST kbytes available: 163812 172220 172220 163816 172240 161968 172000
Min free space: OST 5: 161968
Max free space: OST 4: 172240
Filling 25% remaining space in OST5 with 40492Kb
....................CMD: client-26vm7 lctl get_param -n lov.*.qos_maxage
Waiting for local destroys to complete
OST kbytes available: 164036 172220 172000 164036 172240 112592 172224
Min free space: OST 5: 112592
Max free space: OST 4: 172240
diff=59648=52% must be > 20% for QOS mode...ok
writing a bunch of files to QOS-assigned OSTs
...........................................................................................................................................................................................................wrote 203 200k files
CMD: client-26vm7 lctl get_param -n lov.*.qos_maxage
Note: free space may not be updated, so measurements might be off
Waiting for local destroys to complete
OST kbytes available: 155036 164800 163420 158036 166440 113608 164624
Min free space: OST 5: 113608
Max free space: OST 4: 166440
free space delta: orig 59648 final 52832
Wrote -1016 to smaller OST 5
Wrote 5800 to larger OST 4
lustre-OST0005_UUID
435 files created on smaller OST 5
lustre-OST0004_UUID
371 files created on larger OST 4
Wrote -15% more files to larger OST 4
sanity test_116a: @@@@@@ IGNORE (bzstripe QOS didn't balance free space):
Trace dump:
= /usr/lib64/lustre/tests/test-framework.sh:4202:error_noexit()
= /usr/lib64/lustre/tests/test-framework.sh:4243:error_ignore()
= /usr/lib64/lustre/tests/sanity.sh:6659:test_116a()
= /usr/lib64/lustre/tests/test-framework.sh:4483:run_one()
= /usr/lib64/lustre/tests/test-framework.sh:4516:run_one_logged()
= /usr/lib64/lustre/tests/test-framework.sh:4371:run_test()
= /usr/lib64/lustre/tests/sanity.sh:6663:main()
Dumping lctl log to /logdir/test_logs/2013-08-13/lustre-reviews-el6-x86_64-review-2_4_1_17301_-70153027810520-204146/sanity.test_116a.*.1376456900.log
CMD: client-26vm1,client-26vm2.lab.whamcloud.com,client-26vm7,client-26vm8 /usr/sbin/lctl dk > /logdir/test_logs/2013-08-13/lustre-reviews-el6-x86_64-review-2_4_1_17301_-70153027810520-204146/sanity.test_116a.debug_log.\$(hostname -s).1376456900.log;
dmesg > /logdir/test_logs/2013-08-13/lustre-reviews-el6-x86_64-review-2_4_1_17301_-70153027810520-204146/sanity.test_116a.dmesg.\$(hostname -s).1376456900.log
Resetting fail_loc on all nodes...CMD: client-26vm1,client-26vm2.lab.whamcloud.com,client-26vm7,client-26vm8 lctl set_param -n fail_loc=0 2>/dev/null || true
done.
CMD: client-26vm1,client-26vm7,client-26vm8 rc=\$([ -f /proc/sys/lnet/catastrophe ] &&
echo \$(< /proc/sys/lnet/catastrophe) || echo 0);
if [ \$rc -ne 0 ]; then echo \$(hostname): \$rc; fi
exit \$rc

Info required for matching: sanity 116a
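
The figures printed in the log above can be reproduced with plain integer arithmetic. The sketch below is only a reconstruction: the input values are copied from the log, but the formulas and variable names are assumptions chosen because they match the printed numbers, not code taken from sanity.sh.

```shell
#!/bin/bash
# Values copied from the log above (kbytes available, file counts).
min_before=112592   # OST 5 after the fill step
max_before=172240   # OST 4 after the fill step
min_after=113608    # OST 5 after writing the 200k files
max_after=166440    # OST 4 after writing the 200k files
files_small=435     # files created on smaller OST 5
files_large=371     # files created on larger OST 4

# Assumed formulas (integer division, as in shell arithmetic).
diff_before=$((max_before - min_before))              # 59648
diff_pct=$((diff_before * 100 / min_before))          # 52
diff_after=$((max_after - min_after))                 # 52832
wrote_small=$((min_before - min_after))               # -1016
wrote_large=$((max_before - max_after))               # 5800
files_pct=$((files_large * 100 / files_small - 100))  # -15

echo "diff=$diff_before=$diff_pct%"
echo "free space delta: orig $diff_before final $diff_after"
echo "Wrote $wrote_small to smaller OST 5"
echo "Wrote $wrote_large to larger OST 4"
echo "Wrote $files_pct% more files to larger OST 4"
```

Read this way, the negative value for OST 5 means its reported free space grew during the run, which is consistent with the log's note that free space may not be updated, and the negative percentage means fewer files landed on the larger OST than on the smaller one.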



 Comments   
Comment by Jian Yu [ 29/Aug/13 ]

Another instance:
https://maloo.whamcloud.com/test_sets/d39202b8-0f81-11e3-9bce-52540035b04c

Comment by John Hammond [ 11/Sep/13 ]

This test uses error_ignore, so according to the test author's wishes it shouldn't be counted as a failure. Note that it uses error_ignore incorrectly, since it passes only one argument. But used correctly or not, calling error_ignore causes the test framework to interpret the test as failed.
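
To illustrate the one-argument problem, here is a minimal sketch. It assumes error_ignore treats its first argument as a bug number and its second as a comment, and that the bug number is prefixed with "bz" in the message. sketch_error_ignore and the bug number 12345 are hypothetical stand-ins, not the real test-framework.sh code.

```shell
#!/bin/bash
# Hypothetical stand-in for error_ignore (NOT the real implementation).
# Assumption: $1 is a bug number, $2 is a comment, and the bug number
# is printed with a "bz" prefix.
sketch_error_ignore() {
    local bug=$1
    local msg=$2
    echo "IGNORE (bz$bug): $msg"
}

# One argument: the description lands in the bug-number slot and the
# comment is empty.
one_arg=$(sketch_error_ignore "stripe QOS didn't balance free space")

# Two arguments: placeholder bug number plus the description.
two_arg=$(sketch_error_ignore 12345 "stripe QOS didn't balance free space")

echo "$one_arg"
echo "$two_arg"
```

Under these assumptions the one-argument call yields the same "IGNORE (bzstripe QOS didn't balance free space):" text that appears in the test log.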

Comment by James Nunez (Inactive) [ 11/Sep/13 ]

The incorrect call to error_ignore is fixed, with a fake bugzilla bug number, in the patch for LU-3640 at http://review.whamcloud.com/#/c/7132/. LU-3880 will allow error_ignore to take a general string for a bug/ticket number, i.e. get rid of the bugzilla assumption. I think adding the bug number should fix this error_ignore triggering an error, but I'll make sure.

Comment by James Nunez (Inactive) [ 14/Sep/13 ]

Calling error_ignore with two arguments, interpreted as a bug number and a comment, stops the error message "error() without useful message, please fix", but it does not stop the test framework from classifying the test status as a failure.

The problem is that error_noexit writes the comment sent to error_ignore to the LOGDIR/err file. If there is an err file, the pass routine assumes the test failed and reports the test status as FAIL. So, for errors that should be ignored, we probably don't want to be writing the comment out to the err file. Maybe the ignore message should be written out to an ignore file?
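
The mechanism described above can be modeled roughly as follows. This is a simplified sketch with hypothetical function names (sketch_error_noexit, sketch_error_ignore, sketch_pass) and a temporary directory standing in for LOGDIR. It only illustrates the idea of a separate ignore file and is not the code from test-framework.sh or from any patch.

```shell
#!/bin/bash
# Temporary directory standing in for the framework's LOGDIR.
LOGDIR=$(mktemp -d)

# Model of the current behaviour: any reported error is appended to
# $LOGDIR/err.
sketch_error_noexit() {
    echo "$*" >> "$LOGDIR/err"
}

# Model of the suggested alternative: ignored errors go to a separate
# file that the pass check does not look at.
sketch_error_ignore() {
    echo "$*" >> "$LOGDIR/ignore"
}

# Model of the pass routine: the mere existence of the err file means
# the test is reported as FAIL.
sketch_pass() {
    if [ -f "$LOGDIR/err" ]; then
        echo FAIL
    else
        echo PASS
    fi
}

sketch_error_ignore "stripe QOS didn't balance free space"
status_ignored=$(sketch_pass)   # only the ignore file exists

sketch_error_noexit "a real failure"
status_failed=$(sketch_pass)    # the err file now exists

echo "after ignored error: $status_ignored"
echo "after real error:    $status_failed"
rm -rf "$LOGDIR"
```

In this model an ignored error leaves the status at PASS because nothing is written to the err file, while a real error still flips it to FAIL.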

Comment by James Nunez (Inactive) [ 26/Sep/13 ]

Proposed patch at: http://review.whamcloud.com/7782

Comment by Andreas Dilger [ 08/Oct/13 ]

Another minor patch to clean up the code style of test_116a: http://review.whamcloud.com/7882

Comment by Bob Glossman (Inactive) [ 11/Oct/13 ]

another:
https://maloo.whamcloud.com/test_sets/23af00fc-320f-11e3-905d-52540035b04c

Comment by James Nunez (Inactive) [ 25/Oct/13 ]

Landed to master

Comment by Dmitry Eremin (Inactive) [ 22/Nov/13 ]

This bug prevents testing in b2_5. We need it in b2_5.

https://maloo.whamcloud.com/sub_tests/846cc906-4817-11e3-9f6d-52540035b04c

Comment by James Nunez (Inactive) [ 07/Dec/13 ]

Thanks to Dmitry, patch for b2_5 at http://review.whamcloud.com/#/c/8395/

Comment by Bob Glossman (Inactive) [ 17/Apr/14 ]

backport to b2_4
http://review.whamcloud.com/9996

Comment by Gerrit Updater [ 24/Jan/17 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/7882/
Subject: LU-3764 tests: clean up sanity test_116a code style
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: f4af6c845b836154e3791d28f02709ea53a4e841
