[LU-798] Test failure on test suite performance-sanity, subtest test_5 Created: 27/Oct/11 Updated: 13/Jul/16 Resolved: 13/Jul/16 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.1.2, Lustre 2.1.3, Lustre 2.1.4, Lustre 1.8.8 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Maloo | Assignee: | Zhenyu Xu |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | None |
| Issue Links: | |
| Severity: | 3 |
| Rank (Obsolete): | 5794 |
| Description |
|
This issue was created by maloo for sarah <sarah@whamcloud.com>. This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/6924c914-00ed-11e1-bb4f-52540025f9af. |
| Comments |
| Comment by Peter Jones [ 27/Oct/11 ] |
|
Bobijam will look into this one |
| Comment by Zhenyu Xu [ 27/Oct/11 ] |
|
From the dmesg log I don't see any errors during the test, except that test_5 took almost an hour; I suspect the 3600-second test limit interrupted it. Client dmesg:
Lustre: DEBUG MARKER: == performance-sanity test 5: lookup rate 10M file dir =============================================== 09:45:59 (1319733959)
Lustre: DEBUG MARKER: ===== mdsrate-lookup-1dir.sh ======
Lustre: DEBUG MARKER: Using TIMEOUT=20
Lustre: DEBUG MARKER: ===== mdsrate-lookup-1dir.sh Test preparation: creating 1000000 files.
Lustre: DEBUG MARKER: ===== mdsrate-lookup-1dir.sh
Lustre: DEBUG MARKER: ===== mdsrate-lookup-1dir.sh
Lustre: DEBUG MARKER: == mdsrate-lookup-1dir mdsrate-lookup-1dir.sh: test complete, cleaning up ============================ 10:33:31 (1319736811)
SysRq : Show State
sibling
task PC pid father child younger older
init S ffff81007e3567e0 0 1 0 2 (NOTLB)
ffff81007ff95a28 0000000000000082 00000000000280d2 ffff810000017c10
ffff81007ff857a0 000000000000000a ffff81007ff857a0 ffff81007e3567e0
00000aabaa90b8db 0000000000006e20 ffff81007ff85988 0000000088031cb7
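For reference, the gap between the two DEBUG MARKER timestamps above already accounts for most of a 3600-second limit before cleanup even starts (a quick check using the epoch values quoted in the markers):
# Elapsed time from the test 5 start marker to the "test complete, cleaning up"
# marker, computed from the epoch timestamps quoted above.
echo $((1319736811 - 1319733959))    # 2852 seconds (~47.5 minutes)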
|
| Comment by Zhenyu Xu [ 28/Oct/11 ] |
|
I checked Maloo for some passed performance-sanity test_5 runs to get a rough idea of how much time a successful run needs. This test run https://maloo.whamcloud.com/sub_tests/9e94a664-f710-11e0-a451-52540025f9af took 2638s, but it only tested with 433075 files (from the test script, I believe this is set by the NUM_FILES environment variable). This test run https://maloo.whamcloud.com/sub_tests/116bdd9a-e467-11e0-9909-52540025f9af took 2493s with 452393 files. The test runs with 1M files all timed out in the cleanup phase. |
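A rough extrapolation from those passing runs (my own back-of-the-envelope estimate, assuming wall time scales roughly linearly with the number of files):
# If ~433075 files take ~2638s end to end, estimate the time for 1000000 files.
echo $((1000000 * 2638 / 433075))    # ~6091 seconds, well beyond a 3600s limit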
| Comment by Jian Yu [ 01/Nov/11 ] |
|
performance-sanity test 5 passed on v1_8_7_WC1_RC1 in 2692s with 438805 files:

UUID                 Inodes  IUsed    IFree IUse% Mounted on
lustre-MDT0000_UUID 5000040     52  4999988    0% /mnt/lustre[MDT:0]
lustre-OST0000_UUID  445440   2202   443238    0% /mnt/lustre[OST:0]
lustre-OST0001_UUID  445440     88   445352    0% /mnt/lustre[OST:1]
lustre-OST0002_UUID  445440     89   445351    0% /mnt/lustre[OST:2]
lustre-OST0003_UUID  445440     89   445351    0% /mnt/lustre[OST:3]
lustre-OST0004_UUID  445440     89   445351    0% /mnt/lustre[OST:4]
lustre-OST0005_UUID  445440     89   445351    0% /mnt/lustre[OST:5]
lustre-OST0006_UUID  445440     89   445351    0% /mnt/lustre[OST:6]
filesystem summary: 5000040     52  4999988    0% /mnt/lustre

===== mdsrate-lookup-1dir.sh Test preparation: creating 438805 files.

The number of files was calculated as follows.

mdsrate-lookup-1dir.sh:
NUM_FILES=${NUM_FILES:-1000000}
IFree=$(mdsrate_inodes_available)
if [ $IFree -lt $NUM_FILES ]; then
NUM_FILES=$IFree
fi
test-framework.sh:
inodes_available () {
local IFree=$($LFS df -i $MOUNT | grep ^$FSNAME | awk '{print $4}' | sort -un | head -1) || return 1
echo $IFree
}
mdsrate_inodes_available () {
local min_inodes=$(inodes_available)
echo $((min_inodes * 99 / 100))
}
So, the number of files depends on the minimum number of free inodes across the MDT and OSTs. For RC1, OSTSIZE was about 30G and the minimum IFree was about 443238; for RC2, OSTSIZE on the VM cluster was about 150G and the minimum IFree was about 2246430:

UUID                 Inodes  IUsed    IFree IUse% Mounted on
lustre-MDT0000_UUID 5000040     52  4999988    0% /mnt/lustre[MDT:0]
lustre-OST0000_UUID 2257920  11490  2246430    1% /mnt/lustre[OST:0]
lustre-OST0001_UUID 2257920     89  2257831    0% /mnt/lustre[OST:1]
lustre-OST0002_UUID 2257920     89  2257831    0% /mnt/lustre[OST:2]
lustre-OST0003_UUID 2257920     88  2257832    0% /mnt/lustre[OST:3]
lustre-OST0004_UUID 2257920     89  2257831    0% /mnt/lustre[OST:4]
lustre-OST0005_UUID 2257920     89  2257831    0% /mnt/lustre[OST:5]
lustre-OST0006_UUID 2257920     89  2257831    0% /mnt/lustre[OST:6]
filesystem summary: 5000040     52  4999988    0% /mnt/lustre

===== mdsrate-lookup-1dir.sh Test preparation: creating 1000000 files.

So, this is a similar issue. Chris, could you please take a look at this issue and make some adjustment on the OSTSIZE or timeout value? Thanks! |
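For illustration, a minimal standalone sketch (not part of the test scripts; the calc_num_files helper is made up for this example) that reproduces the NUM_FILES calculation above with the two observed minimum IFree values:
#!/bin/bash
# Sketch: reproduce the NUM_FILES calculation from mdsrate-lookup-1dir.sh
# using the minimum IFree values observed on RC1 and RC2 above.
calc_num_files() {                            # hypothetical helper, illustration only
    local min_ifree=$1
    local num_files=${NUM_FILES:-1000000}
    local avail=$((min_ifree * 99 / 100))     # mdsrate_inodes_available()
    [ "$avail" -lt "$num_files" ] && num_files=$avail
    echo $num_files
}
calc_num_files 443238     # RC1,  ~30G OSTs -> 438805 files  (run passed in ~2692s)
calc_num_files 2246430    # RC2, ~150G OSTs -> 1000000 files (run timed out)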
| Comment by Peter Jones [ 03/Nov/11 ] |
|
> Chris, could you please take a look at this issue and make some adjustment on the OSTSIZE or timeout value? Thanks!

Bobijam, it would be better to make such requests via a TT JIRA ticket. |
| Comment by Sarah Liu [ 03/Nov/11 ] |
|
The remaining performance-sanity tests passed: https://maloo.whamcloud.com/test_sets/7d4cb962-0668-11e1-9433-52540025f9af |
| Comment by Jian Yu [ 11/May/12 ] |
|
Lustre Build: http://build.whamcloud.com/job/lustre-b1_8/194/
The same issue occurred: https://maloo.whamcloud.com/test_sets/b8d82eb2-9b50-11e1-a0a0-52540035b04c |
| Comment by Jian Yu [ 31/May/12 ] |
|
Lustre Tag: v2_1_2_RC2
performance-sanity test 6 timed out: https://maloo.whamcloud.com/test_sets/9b7b720a-aa34-11e1-971d-52540035b04c |
| Comment by Jian Yu [ 21/Aug/12 ] |
|
More instances occurred while testing Lustre 2.1.3 RC2: |
| Comment by Jian Yu [ 10/Dec/12 ] |
|
Lustre Branch: b2_1
The same issue occurred: |
| Comment by Zhenyu Xu [ 13/Dec/12 ] |
|
The b2_1 patch is tracked at http://review.whamcloud.com/4830. Commit message:
LU-798 test: reduce performance test file number
The performance test takes too much time deleting all the test files
which timedout the test suite.
Test-Parameters: envdefinitions=SLOW=yes,ENABLE_QUOTA=yes testlist=performance-sanity
|
| Comment by Jian Yu [ 17/Dec/12 ] |
|
Lustre Tag: v2_1_4_RC1
https://maloo.whamcloud.com/test_sets/07afddb8-4831-11e2-8cdc-52540035b04c |
| Comment by Jian Yu [ 26/Dec/12 ] |
|
From the historical Maloo reports, I found that all of the passed performance-sanity test runs used about 450000 files, while all of the timed-out runs used 1000000 files because of the larger OSTSIZE. |
| Comment by Jian Yu [ 26/Dec/12 ] |
|
Hello Chris, Oleg suggested adjusting the length of time allocated to the performance-sanity test so that it has time to complete. Could you please increase the autotest timeout value for the performance-sanity test? Thanks. |
| Comment by Chris Gearing (Inactive) [ 02/Jan/13 ] |
|
performance-sanity is set to 2 hours for test_4, test_5 and test_7, so I fear the code might be broken in autotest. I need to figure out how to investigate. |
| Comment by Chris Gearing (Inactive) [ 03/Jan/13 ] |
|
OK. What confused me here is that this bug dates back to 2011, so many of the examples were fixed a long time ago. I've increased the timeout for test_7 to 7200, which makes it the same as tests 5, 6 and 8. |
| Comment by Chris Gearing (Inactive) [ 03/Jan/13 ] |
|
Looking at the recent examples, this issue should now be fixed. |
| Comment by Jian Yu [ 13/Jan/13 ] |
|
Hi Chris, the performance-sanity test timed out on the latest b1_8 build #241. This time it is sub-test 8: https://maloo.whamcloud.com/test_sets/79fc558c-5ce9-11e2-afaf-52540035b04c I did not find any Lustre error messages in the logs, so it is still due to low performance and needs more time. |
| Comment by Andreas Dilger [ 13/Jul/16 ] |
|
Closing this old bug. |