[LU-797] Test failure on test suite ost-pools, subtest test_14, test_18, test_23 Created: 27/Oct/11  Updated: 22/Feb/13  Resolved: 19/Dec/12

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.2.0, Lustre 2.1.2, Lustre 2.1.3, Lustre 2.1.4, Lustre 1.8.8
Fix Version/s: Lustre 2.4.0, Lustre 2.1.4, Lustre 1.8.9

Type: Bug Priority: Blocker
Reporter: Maloo Assignee: Niu Yawei (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Duplicate
is duplicated by LU-1001 Test failure on test suite ost-pools,... Resolved
is duplicated by LU-876 Improvement to ost-pools.sh Resolved
Related
Severity: 3
Rank (Obsolete): 4734

 Description   

This issue was created by maloo for sarah <sarah@whamcloud.com>

This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/f6c809ee-00ec-11e1-bb4f-52540025f9af.

The sub-test test_14 failed with the following error:

test failed to respond and timed out

Info required for matching: ost-pools 14



 Comments   
Comment by Peter Jones [ 27/Oct/11 ]

Niu will look into this one

Comment by Niu Yawei (Inactive) [ 29/Oct/11 ]

Look at the client test log:

Filling up OST0
14188+0 records in
14188+0 records out
14877196288 bytes (15 GB) copied, 836.591 seconds, 17.8 MB/s
14188+0 records in
14188+0 records out
14877196288 bytes (15 GB) copied, 908.419 seconds, 16.4 MB/s
14188+0 records in
14188+0 records out
14877196288 bytes (15 GB) copied, 922.504 seconds, 16.1 MB/s

It's filling up OST0 (which is ~150G), and it has already used (836 + 908 + 922)=2666 seconds for only 45G, however the timeout of this test is 3600, so it just doesn't have enough time to fill up the OST0.

I think this is a test issue, should we adjust the timeout according to the OST size or just skip the test if the OST is too big?

Comment by Mikhail Pershin [ 31/Oct/11 ]

This is similar to test 220 issue in sanity.sh, LU-748. Test must be moved to SLOW group at least, but ideally all such tests must use smaller OST specially created for testing or special OBD_FAIL_LOC which would simulate OST small size.

Comment by Jian Yu [ 01/Nov/11 ]

ost-pools test 14 passed on Lustre v1_8_7_WC1_RC1 in 1850s:
https://maloo.whamcloud.com/test_sets/a1a53668-f4cf-11e0-908b-52540025f9af

The test output showed that:

Filling up OST0
2756+0 records in
2756+0 records out
2889875456 bytes (2.9 GB) copied, 200.484 seconds, 14.4 MB/s
2756+0 records in
2756+0 records out
2889875456 bytes (2.9 GB) copied, 179.428 seconds, 16.1 MB/s
2756+0 records in
2756+0 records out
2889875456 bytes (2.9 GB) copied, 160.244 seconds, 18.0 MB/s
2756+0 records in
2756+0 records out
2889875456 bytes (2.9 GB) copied, 166.279 seconds, 17.4 MB/s
2756+0 records in
2756+0 records out
2889875456 bytes (2.9 GB) copied, 159.178 seconds, 18.2 MB/s
2756+0 records in
2756+0 records out
2889875456 bytes (2.9 GB) copied, 160.844 seconds, 18.0 MB/s
2756+0 records in
2756+0 records out
2889875456 bytes (2.9 GB) copied, 177.44 seconds, 16.3 MB/s
2756+0 records in
2756+0 records out
2889875456 bytes (2.9 GB) copied, 187.56 seconds, 15.4 MB/s
2756+0 records in
2756+0 records out
2889875456 bytes (2.9 GB) copied, 170.918 seconds, 16.9 MB/s
UUID                   1K-blocks        Used   Available Use% Mounted on
lustre-MDT0000_UUID      7498624      443048     6555576   6% /mnt/lustre[MDT:0]
lustre-OST0000_UUID     30192956    25817228     2846080  90% /mnt/lustre[OST:0]
lustre-OST0001_UUID     30192956      454760    28222820   2% /mnt/lustre[OST:1]
lustre-OST0002_UUID     30192956      454760    28222820   2% /mnt/lustre[OST:2]
lustre-OST0003_UUID     30192956      454760    28222820   2% /mnt/lustre[OST:3]
lustre-OST0004_UUID     30192956      454760    28222820   2% /mnt/lustre[OST:4]
lustre-OST0005_UUID     30192956      454760    28222820   2% /mnt/lustre[OST:5]
lustre-OST0006_UUID     30192956      454760    28222820   2% /mnt/lustre[OST:6]

filesystem summary:    211350692    28545788   172183000  14% /mnt/lustre

So, the OST size was about 30G.

Chris, does the "OSTSIZE" variable has different values on different vm clusters? Could you please adjust the size so that the test would not timeout and skipped?

FYI, due to the timeout of test 14, the remaining ost-pools subtests were not run.

Comment by Johann Lombardi (Inactive) [ 03/Nov/11 ]

Has anybody rerun the missing tests?

Comment by Sarah Liu [ 03/Nov/11 ]

I will rerun the missing tests for both lu-797 and lu-798 and keep you updated.

Comment by Sarah Liu [ 03/Nov/11 ]

the remaining tests of ost-pools pass: https://maloo.whamcloud.com/test_sets/caa784fe-0653-11e1-b578-52540025f9af

Comment by Andreas Dilger [ 12/Dec/11 ]

It looks like the failing tests on master are creating 10x 14GB files (consistent with 150GB OST size) and the passing 1.8.7 tests are successfully creating 10x 2.7GB files in 1850s. That means the tests on master would need at least 9500s to pass. That is completely unreasonable for a subtest like this, which should complete in a minute or less (given its low importance compared to other tests).

ost-pools.sh test_14 is testing the QOS space balancing for OST pools, which is similar to sanity.sh test_116 is doing for QOS space balance without pools. It seems that test_116 is just being skipped when the OST is too large.

Probably the best solution is to have an OBD_FAIL_CHECK() that can change the os_bfree number at the fsfilt or OSD level for the OST index as given in fail_val. Even better might be fail_val = "(free_fraction) << 16 | ost_idx" where free_fraction is the 1/256 of the free space (which gives about 0.5% granularity) so that the fail_loc can be useful in other tests.

That will avoid all of the IO in these tests and make them MUCH faster. It would be worthwhile to investigate other tests that need to fill up the filesystem to see if they can similarly be sped up.

Comment by nasf (Inactive) [ 26/Dec/11 ]

Another failure:

https://maloo.whamcloud.com/test_sets/bdf73f6c-2dc5-11e1-819b-5254004bbbd3

Comment by Li Wei (Inactive) [ 18/Jan/12 ]

https://maloo.whamcloud.com/test_sets/ecef3f7a-41db-11e1-9650-5254004bbbd3 (Orion LDiskFS)

Comment by Andreas Dilger [ 27/Jan/12 ]

I submitted change http://review.whamcloud.com/2028 to speed up the ost-pools script:

For test_14, since it is already reformatting the filesystem, it is just going to format with a smaller OST size. I'd rather prefer that we don't reformat the filesystem at all for these tests, and instead implement an OBD_FAIL_CHECK to allow it to run on any configuration, but this is a simple workaround for now that will reduce the time to run ost-pools by 1.5h on large systems.

Also, test_18 was taking a long time because it was trying to create and unlink 3 * 3 * 30000 files, which was also taking a very long time for slow VM systems. Instead, use the createmany feature to run for a fixed time instead of for a specific number of files, so that it can properly assess performance without taking too long or to short on different hardware configurations. This should also reduce the test time by 0.5h or so.

Comment by Build Master (Inactive) [ 02/Feb/12 ]

Integrated in lustre-master » x86_64,client,el5,ofa #449
LU-797 tests: speed up ost-pools tests (Revision eea698c944283b755882d8f504d2fcc8ea371bd8)

Result = SUCCESS
Oleg Drokin : eea698c944283b755882d8f504d2fcc8ea371bd8
Files :

  • lustre/tests/ost-pools.sh
  • lustre/tests/test-framework.sh
Comment by Build Master (Inactive) [ 02/Feb/12 ]

Integrated in lustre-master » i686,client,el6,inkernel #449
LU-797 tests: speed up ost-pools tests (Revision eea698c944283b755882d8f504d2fcc8ea371bd8)

Result = SUCCESS
Oleg Drokin : eea698c944283b755882d8f504d2fcc8ea371bd8
Files :

  • lustre/tests/ost-pools.sh
  • lustre/tests/test-framework.sh
Comment by Build Master (Inactive) [ 02/Feb/12 ]

Integrated in lustre-master » x86_64,client,el5,inkernel #449
LU-797 tests: speed up ost-pools tests (Revision eea698c944283b755882d8f504d2fcc8ea371bd8)

Result = SUCCESS
Oleg Drokin : eea698c944283b755882d8f504d2fcc8ea371bd8
Files :

  • lustre/tests/test-framework.sh
  • lustre/tests/ost-pools.sh
Comment by Build Master (Inactive) [ 02/Feb/12 ]

Integrated in lustre-master » x86_64,server,el6,inkernel #449
LU-797 tests: speed up ost-pools tests (Revision eea698c944283b755882d8f504d2fcc8ea371bd8)

Result = SUCCESS
Oleg Drokin : eea698c944283b755882d8f504d2fcc8ea371bd8
Files :

  • lustre/tests/ost-pools.sh
  • lustre/tests/test-framework.sh
Comment by Build Master (Inactive) [ 02/Feb/12 ]

Integrated in lustre-master » x86_64,server,el5,inkernel #449
LU-797 tests: speed up ost-pools tests (Revision eea698c944283b755882d8f504d2fcc8ea371bd8)

Result = SUCCESS
Oleg Drokin : eea698c944283b755882d8f504d2fcc8ea371bd8
Files :

  • lustre/tests/test-framework.sh
  • lustre/tests/ost-pools.sh
Comment by Build Master (Inactive) [ 02/Feb/12 ]

Integrated in lustre-master » x86_64,client,el6,inkernel #449
LU-797 tests: speed up ost-pools tests (Revision eea698c944283b755882d8f504d2fcc8ea371bd8)

Result = SUCCESS
Oleg Drokin : eea698c944283b755882d8f504d2fcc8ea371bd8
Files :

  • lustre/tests/ost-pools.sh
  • lustre/tests/test-framework.sh
Comment by Build Master (Inactive) [ 02/Feb/12 ]

Integrated in lustre-master » x86_64,client,ubuntu1004,inkernel #449
LU-797 tests: speed up ost-pools tests (Revision eea698c944283b755882d8f504d2fcc8ea371bd8)

Result = SUCCESS
Oleg Drokin : eea698c944283b755882d8f504d2fcc8ea371bd8
Files :

  • lustre/tests/test-framework.sh
  • lustre/tests/ost-pools.sh
Comment by Build Master (Inactive) [ 02/Feb/12 ]

Integrated in lustre-master » x86_64,server,el5,ofa #449
LU-797 tests: speed up ost-pools tests (Revision eea698c944283b755882d8f504d2fcc8ea371bd8)

Result = SUCCESS
Oleg Drokin : eea698c944283b755882d8f504d2fcc8ea371bd8
Files :

  • lustre/tests/test-framework.sh
  • lustre/tests/ost-pools.sh
Comment by Build Master (Inactive) [ 02/Feb/12 ]

Integrated in lustre-master » i686,server,el5,ofa #449
LU-797 tests: speed up ost-pools tests (Revision eea698c944283b755882d8f504d2fcc8ea371bd8)

Result = SUCCESS
Oleg Drokin : eea698c944283b755882d8f504d2fcc8ea371bd8
Files :

  • lustre/tests/ost-pools.sh
  • lustre/tests/test-framework.sh
Comment by Build Master (Inactive) [ 02/Feb/12 ]

Integrated in lustre-master » i686,server,el5,inkernel #449
LU-797 tests: speed up ost-pools tests (Revision eea698c944283b755882d8f504d2fcc8ea371bd8)

Result = SUCCESS
Oleg Drokin : eea698c944283b755882d8f504d2fcc8ea371bd8
Files :

  • lustre/tests/test-framework.sh
  • lustre/tests/ost-pools.sh
Comment by Build Master (Inactive) [ 02/Feb/12 ]

Integrated in lustre-master » i686,server,el6,inkernel #449
LU-797 tests: speed up ost-pools tests (Revision eea698c944283b755882d8f504d2fcc8ea371bd8)

Result = SUCCESS
Oleg Drokin : eea698c944283b755882d8f504d2fcc8ea371bd8
Files :

  • lustre/tests/test-framework.sh
  • lustre/tests/ost-pools.sh
Comment by Build Master (Inactive) [ 02/Feb/12 ]

Integrated in lustre-master » x86_64,client,sles11,inkernel #449
LU-797 tests: speed up ost-pools tests (Revision eea698c944283b755882d8f504d2fcc8ea371bd8)

Result = SUCCESS
Oleg Drokin : eea698c944283b755882d8f504d2fcc8ea371bd8
Files :

  • lustre/tests/ost-pools.sh
  • lustre/tests/test-framework.sh
Comment by Build Master (Inactive) [ 02/Feb/12 ]

Integrated in lustre-master » i686,client,el5,inkernel #449
LU-797 tests: speed up ost-pools tests (Revision eea698c944283b755882d8f504d2fcc8ea371bd8)

Result = SUCCESS
Oleg Drokin : eea698c944283b755882d8f504d2fcc8ea371bd8
Files :

  • lustre/tests/ost-pools.sh
  • lustre/tests/test-framework.sh
Comment by Build Master (Inactive) [ 02/Feb/12 ]

Integrated in lustre-master » i686,client,el5,ofa #449
LU-797 tests: speed up ost-pools tests (Revision eea698c944283b755882d8f504d2fcc8ea371bd8)

Result = SUCCESS
Oleg Drokin : eea698c944283b755882d8f504d2fcc8ea371bd8
Files :

  • lustre/tests/ost-pools.sh
  • lustre/tests/test-framework.sh
Comment by Peter Jones [ 02/Feb/12 ]

Landed for 2.2

Comment by Niu Yawei (Inactive) [ 08/Feb/12 ]

Looks the test_23b is still timeout: https://maloo.whamcloud.com/test_sets/8bc6156c-4fef-11e1-9ebc-5254004bbbd3

== ost-pools test 23b: OST pools and OOS ============================================================= 21:02:13 (1328418133)
running as uid/gid/euid/egid 500/500/500/500, groups:
[touch] [/mnt/lustre/d0_runas_test/f5213]
client-29vm3: Pool lustre.testpool created
client-29vm3: OST lustre-OST0000_UUID added to pool lustre.testpool
client-29vm3: OST lustre-OST0003_UUID added to pool lustre.testpool
client-29vm3: OST lustre-OST0006_UUID added to pool lustre.testpool
1: 32768+0 records in
32768+0 records out
34359738368 bytes (34 GB) copied, 1560.75 seconds, 22.0 MB/s
2: 32768+0 records in
32768+0 records out
34359738368 bytes (34 GB) copied, 1700.81 seconds, 20.2 MB/s

This test (test_23b) adds 3 osts into a pool, and checks if the available space of the pool is less than $MAXFREE, however, it writes file over all osts at the end. We missed creating file in pool?

Comment by Niu Yawei (Inactive) [ 08/Feb/12 ]

Andreas, could you take a look?

Comment by Andreas Dilger [ 09/Feb/12 ]

You are correct, it appears the output files are not being created in the pool. However, looking back at the original test_23, that was true at that time also, so I'm not sure what the original intent of this test was. Looking further back into Git/CVS to determine if the test ran differently when it was introduced, I didn't see any significant changes to test_23 since 1.8.5

Note also that I have a second patch which is skipping test_23b() when SLOW=no, because I don't think this is a very useful test (http://review.whamcloud.com/2091).

Comment by Build Master (Inactive) [ 15/Feb/12 ]

Integrated in lustre-master » x86_64,server,el5,inkernel #472
LU-797 tests: skip ost-pools.sh 23b when SLOW=no (Revision f7b4054cfc1d30fbbfd56acfe4b5a7a334de8212)

Result = SUCCESS
Oleg Drokin : f7b4054cfc1d30fbbfd56acfe4b5a7a334de8212
Files :

  • lustre/tests/ost-pools.sh
Comment by Build Master (Inactive) [ 15/Feb/12 ]

Integrated in lustre-master » x86_64,client,el5,inkernel #472
LU-797 tests: skip ost-pools.sh 23b when SLOW=no (Revision f7b4054cfc1d30fbbfd56acfe4b5a7a334de8212)

Result = SUCCESS
Oleg Drokin : f7b4054cfc1d30fbbfd56acfe4b5a7a334de8212
Files :

  • lustre/tests/ost-pools.sh
Comment by Build Master (Inactive) [ 15/Feb/12 ]

Integrated in lustre-master » x86_64,client,el5,ofa #472
LU-797 tests: skip ost-pools.sh 23b when SLOW=no (Revision f7b4054cfc1d30fbbfd56acfe4b5a7a334de8212)

Result = SUCCESS
Oleg Drokin : f7b4054cfc1d30fbbfd56acfe4b5a7a334de8212
Files :

  • lustre/tests/ost-pools.sh
Comment by Build Master (Inactive) [ 15/Feb/12 ]

Integrated in lustre-master » x86_64,client,ubuntu1004,inkernel #472
LU-797 tests: skip ost-pools.sh 23b when SLOW=no (Revision f7b4054cfc1d30fbbfd56acfe4b5a7a334de8212)

Result = SUCCESS
Oleg Drokin : f7b4054cfc1d30fbbfd56acfe4b5a7a334de8212
Files :

  • lustre/tests/ost-pools.sh
Comment by Build Master (Inactive) [ 15/Feb/12 ]

Integrated in lustre-master » x86_64,client,sles11,inkernel #472
LU-797 tests: skip ost-pools.sh 23b when SLOW=no (Revision f7b4054cfc1d30fbbfd56acfe4b5a7a334de8212)

Result = SUCCESS
Oleg Drokin : f7b4054cfc1d30fbbfd56acfe4b5a7a334de8212
Files :

  • lustre/tests/ost-pools.sh
Comment by Build Master (Inactive) [ 15/Feb/12 ]

Integrated in lustre-master » x86_64,client,el6,inkernel #472
LU-797 tests: skip ost-pools.sh 23b when SLOW=no (Revision f7b4054cfc1d30fbbfd56acfe4b5a7a334de8212)

Result = SUCCESS
Oleg Drokin : f7b4054cfc1d30fbbfd56acfe4b5a7a334de8212
Files :

  • lustre/tests/ost-pools.sh
Comment by Build Master (Inactive) [ 15/Feb/12 ]

Integrated in lustre-master » i686,client,el5,ofa #472
LU-797 tests: skip ost-pools.sh 23b when SLOW=no (Revision f7b4054cfc1d30fbbfd56acfe4b5a7a334de8212)

Result = SUCCESS
Oleg Drokin : f7b4054cfc1d30fbbfd56acfe4b5a7a334de8212
Files :

  • lustre/tests/ost-pools.sh
Comment by Build Master (Inactive) [ 15/Feb/12 ]

Integrated in lustre-master » i686,client,el6,inkernel #472
LU-797 tests: skip ost-pools.sh 23b when SLOW=no (Revision f7b4054cfc1d30fbbfd56acfe4b5a7a334de8212)

Result = SUCCESS
Oleg Drokin : f7b4054cfc1d30fbbfd56acfe4b5a7a334de8212
Files :

  • lustre/tests/ost-pools.sh
Comment by Build Master (Inactive) [ 15/Feb/12 ]

Integrated in lustre-master » x86_64,server,el5,ofa #472
LU-797 tests: skip ost-pools.sh 23b when SLOW=no (Revision f7b4054cfc1d30fbbfd56acfe4b5a7a334de8212)

Result = SUCCESS
Oleg Drokin : f7b4054cfc1d30fbbfd56acfe4b5a7a334de8212
Files :

  • lustre/tests/ost-pools.sh
Comment by Build Master (Inactive) [ 15/Feb/12 ]

Integrated in lustre-master » x86_64,server,el6,inkernel #472
LU-797 tests: skip ost-pools.sh 23b when SLOW=no (Revision f7b4054cfc1d30fbbfd56acfe4b5a7a334de8212)

Result = SUCCESS
Oleg Drokin : f7b4054cfc1d30fbbfd56acfe4b5a7a334de8212
Files :

  • lustre/tests/ost-pools.sh
Comment by Build Master (Inactive) [ 15/Feb/12 ]

Integrated in lustre-master » i686,server,el6,inkernel #472
LU-797 tests: skip ost-pools.sh 23b when SLOW=no (Revision f7b4054cfc1d30fbbfd56acfe4b5a7a334de8212)

Result = SUCCESS
Oleg Drokin : f7b4054cfc1d30fbbfd56acfe4b5a7a334de8212
Files :

  • lustre/tests/ost-pools.sh
Comment by Build Master (Inactive) [ 15/Feb/12 ]

Integrated in lustre-master » i686,client,el5,inkernel #472
LU-797 tests: skip ost-pools.sh 23b when SLOW=no (Revision f7b4054cfc1d30fbbfd56acfe4b5a7a334de8212)

Result = SUCCESS
Oleg Drokin : f7b4054cfc1d30fbbfd56acfe4b5a7a334de8212
Files :

  • lustre/tests/ost-pools.sh
Comment by Build Master (Inactive) [ 15/Feb/12 ]

Integrated in lustre-master » i686,server,el5,ofa #472
LU-797 tests: skip ost-pools.sh 23b when SLOW=no (Revision f7b4054cfc1d30fbbfd56acfe4b5a7a334de8212)

Result = SUCCESS
Oleg Drokin : f7b4054cfc1d30fbbfd56acfe4b5a7a334de8212
Files :

  • lustre/tests/ost-pools.sh
Comment by Build Master (Inactive) [ 15/Feb/12 ]

Integrated in lustre-master » i686,server,el5,inkernel #472
LU-797 tests: skip ost-pools.sh 23b when SLOW=no (Revision f7b4054cfc1d30fbbfd56acfe4b5a7a334de8212)

Result = SUCCESS
Oleg Drokin : f7b4054cfc1d30fbbfd56acfe4b5a7a334de8212
Files :

  • lustre/tests/ost-pools.sh
Comment by Build Master (Inactive) [ 17/Feb/12 ]

Integrated in lustre-master » x86_64,server,el6,ofa #480
LU-797 tests: speed up ost-pools tests (Revision eea698c944283b755882d8f504d2fcc8ea371bd8)
LU-797 tests: skip ost-pools.sh 23b when SLOW=no (Revision f7b4054cfc1d30fbbfd56acfe4b5a7a334de8212)

Result = FAILURE
Oleg Drokin : eea698c944283b755882d8f504d2fcc8ea371bd8
Files :

  • lustre/tests/test-framework.sh
  • lustre/tests/ost-pools.sh

Oleg Drokin : f7b4054cfc1d30fbbfd56acfe4b5a7a334de8212
Files :

  • lustre/tests/ost-pools.sh
Comment by Build Master (Inactive) [ 17/Feb/12 ]

Integrated in lustre-master » x86_64,client,el6,ofa #480
LU-797 tests: speed up ost-pools tests (Revision eea698c944283b755882d8f504d2fcc8ea371bd8)
LU-797 tests: skip ost-pools.sh 23b when SLOW=no (Revision f7b4054cfc1d30fbbfd56acfe4b5a7a334de8212)

Result = FAILURE
Oleg Drokin : eea698c944283b755882d8f504d2fcc8ea371bd8
Files :

  • lustre/tests/ost-pools.sh
  • lustre/tests/test-framework.sh

Oleg Drokin : f7b4054cfc1d30fbbfd56acfe4b5a7a334de8212
Files :

  • lustre/tests/ost-pools.sh
Comment by Build Master (Inactive) [ 17/Feb/12 ]

Integrated in lustre-master » i686,client,el6,ofa #480
LU-797 tests: speed up ost-pools tests (Revision eea698c944283b755882d8f504d2fcc8ea371bd8)
LU-797 tests: skip ost-pools.sh 23b when SLOW=no (Revision f7b4054cfc1d30fbbfd56acfe4b5a7a334de8212)

Result = ABORTED
Oleg Drokin : eea698c944283b755882d8f504d2fcc8ea371bd8
Files :

  • lustre/tests/ost-pools.sh
  • lustre/tests/test-framework.sh

Oleg Drokin : f7b4054cfc1d30fbbfd56acfe4b5a7a334de8212
Files :

  • lustre/tests/ost-pools.sh
Comment by Jian Yu [ 20/May/12 ]

Lustre Tag: v1_8_8_WC1_RC2
Lustre Build: http://build.whamcloud.com/job/lustre-b1_8/197/
Distro/Arch: RHEL5.8/x86_64 (kernel version: 2.6.18-308.4.1.el5)
Network: TCP (1GigE)
ENABLE_QUOTA=yes

ost-pools test 14 timed out: https://maloo.whamcloud.com/test_sets/23fdf678-a258-11e1-a6cd-52540035b04c

Comment by Jian Yu [ 31/May/12 ]

Lustre Tag: v2_1_2_RC2
Lustre Build: http://build.whamcloud.com/job/lustre-b2_1/86/
Distro/Arch: RHEL6.2/x86_64(server), RHEL6.2/i686(client)
Network: TCP (1GigE)
ENABLE_QUOTA=yes

ost-pools test 23b timed out: https://maloo.whamcloud.com/test_sets/7d305f1e-aa33-11e1-971d-52540035b04c

Comment by Jian Yu [ 10/Jun/12 ]

Note also that I have a second patch which is skipping test_23b() when SLOW=no, because I don't think this is a very useful test (http://review.whamcloud.com/2091).

During the build/release testing cycle, all of the auster tests are run with SLOW=yes, so the ost-pools test 23b timeout issue still exists with large size OSTs:
http://tinyurl.com/c4gxcbn

Comment by Jian Yu [ 13/Aug/12 ]

More instances occurred during Lustre 2.1.3 RC1 testing:
https://maloo.whamcloud.com/test_sets/b66fac84-e490-11e1-af05-52540035b04c
https://maloo.whamcloud.com/test_sets/759997b4-e4be-11e1-af05-52540035b04c
https://maloo.whamcloud.com/test_sets/615958d4-e3ec-11e1-b6d3-52540035b04c

Comment by Jian Yu [ 21/Aug/12 ]

More instances occurred during Lustre 2.1.3 RC2 testing:
https://maloo.whamcloud.com/test_sets/9c6295ce-eb32-11e1-ba73-52540035b04c
https://maloo.whamcloud.com/test_sets/68c13076-eb3d-11e1-ba73-52540035b04c

Comment by Niu Yawei (Inactive) [ 05/Nov/12 ]

Look at the recent failure:

== ost-pools test 23b: OST pools and OOS ============================================================= 21:49:32 (1345438172)
running as uid/gid/euid/egid 500/500/500/500, groups:
 [touch] [/mnt/lustre/d0_runas_test/f13763]
CMD: client-30vm3 lctl pool_new lustre.testpool
client-30vm3: Pool lustre.testpool created
CMD: client-30vm6.lab.whamcloud.com lctl get_param -n lov.lustre-*.pools.testpool         2>/dev/null || echo foo
CMD: client-30vm3 lctl pool_add lustre.testpool lustre-OST[0000-0006/3]
client-30vm3: OST lustre-OST0000_UUID added to pool lustre.testpool
client-30vm3: OST lustre-OST0003_UUID added to pool lustre.testpool
client-30vm3: OST lustre-OST0006_UUID added to pool lustre.testpool
CMD: client-30vm6.lab.whamcloud.com lctl get_param -n lov.lustre-*.pools.testpool |
                           sort -u | tr '\n' ' ' 
1: 32768+0 records in
32768+0 records out
34359738368 bytes (34 GB) copied, 1656.82 seconds, 20.7 MB/s
2: 32768+0 records in
32768+0 records out
34359738368 bytes (34 GB) copied, 1761.27 seconds, 19.5 MB/s
3: 32768+0 records in
32768+0 records out
34359738368 bytes (34 GB) copied, 1921.64 seconds, 17.9 MB/s
4: 32768+0 records in
32768+0 records out
34359738368 bytes (34 GB) copied, 1941.35 seconds, 17.7 MB/s
5: 32768+0 records in
32768+0 records out
34359738368 bytes (34 GB) copied, 2098.63 seconds, 16.4 MB/s
6: 32768+0 records in
32768+0 records out
34359738368 bytes (34 GB) copied, 2030.07 seconds, 16.9 MB/s

The MAXFREE should be (2000000(KB) * 7(osts)) / (1024 * 1024) = 13GB, which means the available space should be less than 13GB in this pool, however, tests writes much more than the it and didn't fail with ENOSPC. I think we'd add more debug information in the tests, and also break the writting whenever written bytes is greater than available space.

Comment by Niu Yawei (Inactive) [ 05/Nov/12 ]

http://review.whamcloud.com/#change,4474

Comment by Niu Yawei (Inactive) [ 29/Nov/12 ]

remove the space of "lfs df" output interllay: http://review.whamcloud.com/4705

Comment by Emoly Liu [ 10/Dec/12 ]

remove the space of "lfs df" output interllay: http://review.whamcloud.com/4705

b2_1 port is at http://review.whamcloud.com/#change,4784

Comment by Jian Yu [ 10/Dec/12 ]

Lustre Branch: b2_1
Lustre Build: http://build.whamcloud.com/job/lustre-b2_1/148
Distro/Arch: RHEL6.3/x86_64 (kernel version: 2.6.32-279.14.1.el6)
Network: TCP (1GigE)

The same issue occurred:
https://maloo.whamcloud.com/test_sets/9b37618e-41db-11e2-adcf-52540035b04c

Comment by Emoly Liu [ 14/Dec/12 ]

http://review.whamcloud.com/#change,4474

b2_1 port is at http://review.whamcloud.com/4831

Comment by Peter Jones [ 19/Dec/12 ]

Fix landed for 2.1.4 and 2.4

Comment by Emoly Liu [ 28/Dec/12 ]

The related patches for b1_8 are merged into http://review.whamcloud.com/4898

Comment by Jian Yu [ 30/Dec/12 ]

Lustre Branch: b1_8
Lustre Build: http://build.whamcloud.com/job/lustre-b1_8/236/
Distro/Arch: RHEL5.8/x86_64
Network: TCP (1GigE)
ENABLE_QUOTA=yes
OSTSIZE=149723171

The ost-pools test 14 timed out: https://maloo.whamcloud.com/test_sets/f8ded3ce-5187-11e2-bbc3-52540035b04c

Generated at Sat Feb 10 01:10:30 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.