[LU-1776] sanity test_220: open(/mnt/lustre/d0.sanity/d220/f1055867) error: No space left on device
Created: 21/Aug/12  Updated: 22/Dec/12  Resolved: 16/Sep/12

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.1.3, Lustre 2.1.4
Fix Version/s: None

Type: Bug
Priority: Blocker
Reporter: Maloo
Assignee: Hongchao Zhang
Resolution: Fixed
Votes: 0
Labels: None

Issue Links:
Duplicate
duplicates LU-748 sanity test 220 must be placed in SLO... Resolved
Severity: 3
Rank (Obsolete): 4353

 Description   

This issue was created by maloo for yujian <yujian@whamcloud.com>

This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/60788e20-eb2c-11e1-ba73-52540035b04c.

The sub-test test_220 failed with the following error:

 - created 1030000 (time 1345384521.87 total 2189.78 last 33.07)
 - created 1040000 (time 1345384541.54 total 2209.45 last 19.67)
open(/mnt/lustre/d0.sanity/d220/f1055867) error: No space left on device
total: 1048039 creates in 2234.40 seconds: 469.05 creates/second
 sanity test_220: @@@@@@ FAIL: test_220 failed with 3

Info required for matching: sanity 220

More instances:
https://maloo.whamcloud.com/test_sets/7fccd814-e823-11e1-9f19-52540035b04c
https://maloo.whamcloud.com/test_sets/24e9800c-e795-11e1-a0f9-52540035b04c
https://maloo.whamcloud.com/test_sets/f0f78744-e69f-11e1-80c3-52540035b04c
https://maloo.whamcloud.com/test_sets/6588d858-e77a-11e1-94fa-52540035b04c
https://maloo.whamcloud.com/test_sets/2314d086-e59e-11e1-ae4e-52540035b04c
https://maloo.whamcloud.com/test_sets/36d9b0b8-e4eb-11e1-af05-52540035b04c
https://maloo.whamcloud.com/test_sets/36220096-e449-11e1-af05-52540035b04c



 Comments   
Comment by Peter Jones [ 21/Aug/12 ]

Hongchao

Could you please look into this one?

Thanks

Peter

Comment by Hongchao Zhang [ 22/Aug/12 ]

This problem is caused by the MDT having too few inodes (objects):

lustre-MDT0000_UUID  1048576  537  1048039  0%  /mnt/lustre[MDT:0]
lustre-OST0000_UUID  1981440  164  1981276  0%  /mnt/lustre[OST:0]
lustre-OST0001_UUID  1981440  155  1981285  0%  /mnt/lustre[OST:1]
lustre-OST0002_UUID  1981440  159  1981281  0%  /mnt/lustre[OST:2]
lustre-OST0003_UUID  1981440  158  1981282  0%  /mnt/lustre[OST:3]
lustre-OST0004_UUID  1981440  157  1981283  0%  /mnt/lustre[OST:4]
lustre-OST0005_UUID  1981440  157  1981283  0%  /mnt/lustre[OST:5]
lustre-OST0006_UUID  1981440  156  1981284  0%  /mnt/lustre[OST:6]

filesystem summary:  1048576  537  1048039  0%  /mnt/lustre

The whole filesystem can hold only 1048576 files (limited by the MDT), while the OSTs have far more objects available, even with "stripe_count=1".
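
As a rough cross-check of the figures above, the following Python sketch parses the listing and computes how many more files could be created. It assumes the simplified model that, with stripe_count=1, each new file consumes one MDT inode and one OST object; the variable and function names are illustrative and not part of any Lustre tool.

```python
# Inode figures copied from the listing in this comment:
# target name, total inodes, used inodes, free inodes
ROWS = """\
lustre-MDT0000_UUID 1048576 537 1048039
lustre-OST0000_UUID 1981440 164 1981276
lustre-OST0001_UUID 1981440 155 1981285
lustre-OST0002_UUID 1981440 159 1981281
lustre-OST0003_UUID 1981440 158 1981282
lustre-OST0004_UUID 1981440 157 1981283
lustre-OST0005_UUID 1981440 157 1981283
lustre-OST0006_UUID 1981440 156 1981284
"""


def parse(rows):
    """Map each target name to a (total, used, free) tuple of ints."""
    targets = {}
    for line in rows.splitlines():
        name, total, used, free = line.split()
        targets[name] = (int(total), int(used), int(free))
    return targets


targets = parse(ROWS)
mdt_free = sum(free for name, (_, _, free) in targets.items() if "MDT" in name)
ost_free = sum(free for name, (_, _, free) in targets.items() if "OST" in name)

# Assumption: one MDT inode plus one OST object per file (stripe_count=1),
# so the smaller of the two free counts bounds the number of new files.
max_new_files = min(mdt_free, ost_free)
print(max_new_files)  # 1048039
```

The result matches the "total: 1048039 creates" line in the failure log, which is consistent with the MDT, not the OSTs, running out first.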

Comment by Oleg Drokin [ 22/Aug/12 ]

indeed, yet another victim of our 2G MDTs.

Comment by Jian Yu [ 25/Aug/12 ]

This is blocking the review testing on b2_1 patches now:
https://maloo.whamcloud.com/test_sets/bb7e1664-eed8-11e1-8e98-52540035b04c
https://maloo.whamcloud.com/test_sets/64df0a36-ee81-11e1-8e98-52540035b04c
https://maloo.whamcloud.com/test_sets/f34e5c58-ee0a-11e1-b95b-52540035b04c
https://maloo.whamcloud.com/test_sets/ae15f4ac-eedc-11e1-9426-52540035b04c
https://maloo.whamcloud.com/test_sets/6698bba8-ed30-11e1-8e13-52540035b04c
https://maloo.whamcloud.com/test_sets/ddb30780-ed43-11e1-8e13-52540035b04c
https://maloo.whamcloud.com/test_sets/8643b268-ecf5-11e1-b788-52540035b04c
https://maloo.whamcloud.com/test_sets/592a37b6-ecc3-11e1-ba25-52540035b04c

Comment by Peter Jones [ 25/Aug/12 ]

So is this something that Chris will need to adjust? Should this ticket be moved to the TT project?

Comment by Hongchao Zhang [ 27/Aug/12 ]

This should be moved to TT, and the test should be skipped when the MDT has too few inodes.
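
A minimal sketch of the skip condition proposed here, in Python for illustration only (the real test suite is not written this way). The function name should_skip and the choice of "free objects on one OST" as the number of files the test needs are assumptions, not taken from the test itself.

```python
def should_skip(mdt_free_inodes: int, files_needed: int) -> bool:
    """Return True when the MDT cannot hold the files the test must create."""
    return mdt_free_inodes < files_needed


# Figures from the inode listing earlier in this ticket.
mdt_free = 1048039
ost0_free = 1981276  # assumed requirement: one file per free object on OST0000

if should_skip(mdt_free, ost0_free):
    print("SKIP: MDT has fewer free inodes than the test needs")
```

With the numbers from this run the check fires, so the test would be skipped rather than failing with ENOSPC after creating about a million files.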

Comment by Andreas Dilger [ 29/Aug/12 ]

To be honest, having test_220 create 1M files for an hour just to test OST pools out of space is a huge waste of testing time. Instead, we should cherry-pick commit d0efb7dea9b8a5c571d63a0e019f16e75d16131f (http://review.whamcloud.com/1676 "LU-748 test: shorten the runtime of sanity subtest_220") from master to avoid this test environment dependency and time sink entirely.

Comment by Andreas Dilger [ 16/Sep/12 ]

Oleg has cherry-picked LU-748 (commit d0efb7dea9b8a5c571d63a0e019f16e75d16131f) to b2_1, so this problem should be gone for future 2.1 testing.
