[LU-3782] Division by zero in ost-pools 18 Created: 20/Aug/13  Updated: 17/May/16  Resolved: 14/Mar/16

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.9.0

Type: Bug Priority: Minor
Reporter: Alexander Lezhoev Assignee: WC Triage
Resolution: Fixed Votes: 0
Labels: patch
Environment:

4-node virtual cluster, 2 OSTs of 700 MB each.


Severity: 3
Rank (Obsolete): 9784

 Description   

ost-pools: create_perf() does not check whether any files were actually created on each iteration:

create_perf() {
...

    stat=$(createmany -o $cdir/${tfile} -$numsec | tail -1)
    files=$(echo $stat | cut -f 2 -d ' ')
    echo $stat 1>&2
...
}

$numsec is fixed at 15 seconds.
As a result, the test fails when the filesystem runs out of inodes and no files are created:

== ost-pools test 18: File create in a directory which references a deleted pool == 15:47:58 (1371152878)
Create performance, iteration 1, 15 seconds x 3
total: 40940 creates in 14.28 seconds: 2867.34 creates/second
iter 1: 40940 creates without pool
mft51: Pool lustre.testpool created
mft51: OST lustre-OST0000_UUID added to pool lustre.testpool
mft51: OST lustre-OST0001_UUID added to pool lustre.testpool
total: 38563 creates in 14.42 seconds: 2674.14 creates/second
iter 1: 38563 creates with pool
mft51: OST lustre-OST0000_UUID removed from pool lustre.testpool
mft51: OST lustre-OST0001_UUID removed from pool lustre.testpool
mft51: Pool lustre.testpool destroyed
total: 43721 creates in 13.57 seconds: 3221.95 creates/second
iter 1: 43721 creates with missing pool

Create performance, iteration 2, 15 seconds x 3
total: 0 creates in 0.00 seconds: 0.00 creates/second
iter 2: 0 creates without pool
mft51: Pool lustre.testpool created
mft51: OST lustre-OST0000_UUID added to pool lustre.testpool
mft51: OST lustre-OST0001_UUID added to pool lustre.testpool
total: 0 creates in 0.00 seconds: 0.00 creates/second
iter 2: 0 creates with pool
mft51: OST lustre-OST0000_UUID removed from pool lustre.testpool
mft51: OST lustre-OST0001_UUID removed from pool lustre.testpool
mft51: Pool lustre.testpool destroyed
total: 0 creates in 0.00 seconds: 0.00 creates/second
iter 2: 0 creates with missing pool

Create performance, iteration 3, 15 seconds x 3
total: 0 creates in 0.00 seconds: 0.00 creates/second
iter 3: 0 creates without pool
mft51: Pool lustre.testpool created
mft51: OST lustre-OST0000_UUID added to pool lustre.testpool
mft51: OST lustre-OST0001_UUID added to pool lustre.testpool
total: 0 creates in 0.00 seconds: 0.00 creates/second
iter 3: 0 creates with pool
mft51: OST lustre-OST0000_UUID removed from pool lustre.testpool
mft51: OST lustre-OST0001_UUID removed from pool lustre.testpool
mft51: Pool lustre.testpool destroyed
total: 0 creates in 0.00 seconds: 0.00 creates/second
iter 3: 0 creates with missing pool

Avg files created in 15 seconds without pool: 0
Avg files created in 15 seconds with pool: 0
Avg files created in 15 seconds missing pool: 0
/usr/lib64/lustre/tests/ost-pools.sh: line 1000: (0 - 0) * 100 / 0: division by 0 (error token is "0")
test_18 returned 1
  • The createmany return code must be checked.
  • The number of files should be calculated more flexibly.
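A minimal sketch of the first point, assuming the summary line format shown in the log above ("total: N creates in S seconds: R creates/second"). In create_perf, createmany is piped into tail -1, so without pipefail the pipeline's exit status is tail's and a createmany failure goes unnoticed. The sketch therefore captures the output first, checks the command's own exit status, and only then parses the count. The function name create_perf_checked is hypothetical; it takes the create command as its arguments, so it could be called as create_perf_checked createmany -o $cdir/${tfile} -$numsec.

```shell
#!/bin/bash
# Hypothetical helper, not the actual ost-pools.sh code.
# Usage: create_perf_checked CMD [ARGS...]
# Runs CMD, fails if CMD fails or reports zero creates, and otherwise
# prints the number of files created (field 2 of the last output line).
create_perf_checked() {
    local out rc stat files

    # Capture output first so $? is CMD's status, not tail's.
    out=$("$@")
    rc=$?
    if (( rc != 0 )); then
        echo "create command failed with rc=$rc" >&2
        return 1
    fi

    # Last line is assumed to be "total: N creates in S seconds: ..."
    stat=$(tail -n 1 <<< "$out")
    echo "$stat" >&2
    files=$(cut -d ' ' -f 2 <<< "$stat")

    # Treat a missing, non-numeric, or zero count as a test error.
    if ! [[ $files =~ ^[0-9]+$ ]] || (( files == 0 )); then
        echo "no files created: '$stat'" >&2
        return 1
    fi
    echo "$files"
}
```

Returning non-zero on zero creates lets the caller stop with a clear message instead of carrying a 0 into the later averaging and percentage arithmetic.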


 Comments   
Comment by Keith Mannthey (Inactive) [ 20/Aug/13 ]

Are you intending to submit a patch or just reporting the issue?

Comment by Alexander Lezhoev [ 21/Aug/13 ]

Keith, the current design (a fixed time limit instead of a fixed number of files) was introduced by LU-797, so my patch would look like a revert to the old version. I'm not sure which solution would be most acceptable to you.

Comment by Andreas Dilger [ 21/Aug/13 ]

Probably the best solution is to change createmany to accept both a time limit and a maximum file count, and to exit when either condition is hit. This would best be done by parsing named options instead of adding more confusing positional-parameter combinations.
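A hypothetical sketch of the interface described above, written as a small bash function rather than a change to the real createmany (whose actual options are not reproduced here). The -t and -n option letters and the name create_limited are illustrative assumptions; the point is the named-option parsing with getopts and stopping at whichever limit is reached first.

```shell
#!/bin/bash
# Hypothetical stand-in for createmany, NOT its real interface:
#   create_limited [-t SECONDS] [-n COUNT] PREFIX
# Creates empty files PREFIX0, PREFIX1, ... and stops at whichever
# limit is hit first. At least one of -t or -n must be given.
create_limited() {
    local secs=0 max=0 opt OPTIND=1
    while getopts "t:n:" opt; do
        case $opt in
            t) secs=$OPTARG ;;
            n) max=$OPTARG ;;
            *) echo "usage: create_limited [-t secs] [-n count] prefix" >&2
               return 2 ;;
        esac
    done
    shift $((OPTIND - 1))
    local prefix=${1:-}
    if [[ -z $prefix ]] || (( secs <= 0 && max <= 0 )); then
        echo "need a prefix and at least one of -t or -n" >&2
        return 2
    fi

    # SECONDS is bash's built-in elapsed-time counter (1 s resolution).
    local count=0 end=$((SECONDS + secs))
    while :; do
        (( max > 0 && count >= max )) && break
        (( secs > 0 && SECONDS >= end )) && break
        # Stop early if a create fails (e.g. the filesystem is out of inodes).
        if ! touch "${prefix}${count}"; then
            echo "total: $count creates"
            return 1
        fi
        count=$((count + 1))
    done
    echo "total: $count creates"
}
```

As noted in the next comment, a count limit alone does not prevent the zero-creates case; it only bounds the number of files on a fast MDS.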

Comment by Kirtan Shetty (Inactive) [ 07/Sep/15 ]

Andreas Dilger, I have added the maximum file count option to the test, but can you please elaborate on how this will help with this issue?

Comment by Andreas Dilger [ 01/Oct/15 ]

You are correct - I don't think my suggestion will help in this case, because the test would still exit after 15s even if a (maximum) number of files was specified. I was thinking of the more normal case where we want to run a test workload for a maximum amount of time, but not create too many files if the MDS is very fast.

I guess the next question is what was wrong with the system that it couldn't create any files in 90s? Is the filesystem out of inodes? Is the MDS or OSS down or in recovery? I'd probably consider it a test error if createmany wasn't able to create any files at all for this test.

Comment by Kirtan Shetty (Inactive) [ 05/Oct/15 ]

OK, got it. So for now, just a check for zero files created should be enough, right?
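A minimal sketch of such a zero check at the point where the test actually failed: line 1000 of ost-pools.sh evaluated an expression of the form (a - b) * 100 / c with c equal to 0. The helper name pct_change and its argument order are hypothetical and are not taken from ost-pools.sh.

```shell
#!/bin/bash
# Hypothetical helper: print the integer percentage change from BASE to
# VALUE, i.e. (VALUE - BASE) * 100 / BASE, using bash integer arithmetic
# (the result is truncated toward zero). Refuses to divide when BASE is 0,
# which is what happens when no files were created.
pct_change() {
    local base=$1 value=$2
    if (( base == 0 )); then
        echo "baseline file count is 0; cannot compute percentage" >&2
        return 1
    fi
    echo $(( (value - base) * 100 / base ))
}
```

With the iteration 1 counts from the log, pct_change 40940 38563 prints -5, while pct_change 0 0 returns 1 with a readable message instead of bash's "division by 0" error.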

Comment by Gerrit Updater [ 26/Oct/15 ]

kirtan.shetty (kirtan.shetty@seagate.com) uploaded a new patch: http://review.whamcloud.com/16939
Subject: LU-3782 test: Fix for faliure when no file are created.
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 5c7298fb807446cc66112886f7d430908148c9d2

Comment by Gerrit Updater [ 14/Mar/16 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/16939/
Subject: LU-3782 test: Fix for faliure when no file are created.
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 475083a145b3b02156ede309fbaefd17b3786228

Generated at Sat Feb 10 01:36:51 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.