[LU-14898] sanity test_413a: (max - min) * 100 / min: division by 0 (error token is "min") Created: 30/Jul/21  Updated: 18/Sep/21  Resolved: 18/Sep/21

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Maloo Assignee: Lai Siyao
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
is related to LU-14659 sanity test_413a: subdirs shouldn't b... Resolved
is related to LU-13417 DNE3: mkdir() automatically create re... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This issue was created by maloo for Andreas Dilger <adilger@whamcloud.com>

This issue relates to the following test suite runs on master:
https://testing.whamcloud.com/test_sets/d6500a87-362a-4b97-a17c-cf951e79589a
https://testing.whamcloud.com/test_sets/3fd97831-6f6e-4b8e-b8de-54bf5bbe417d

test_413a failed with the following error:

Check for uneven MDTs: 
weight diff=2% must be > 100% ...Fill MDT0 with 100 files: loop 0
weight diff=2% must be > 100% ...Fill MDT0 with 100 files: loop 1
weight diff=72% must be > 100% ...Fill MDT0 with 100 files: loop 2
weight diff=72% must be > 100% ...Fill MDT0 with 100 files: loop 3
weight diff=72% must be > 100% ...Fill MDT0 with 100 files: loop 4
sanity.sh: line 24557: (max - min) * 100 / min: division by 0 (error token is "min")
test_413a returned 1

It looks like something is wrong with filling MDT0000: the test runs multiple loops and "diff" does not change until MDT0000 is totally full (min = 0). Either the statfs data is cached and not updated between loops (if the DoM writes are very fast), or the writes are not going to MDT0000 for some reason (I think this is less likely, but possible).
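For illustration, the failing expression can be reproduced and guarded in plain bash. This is only a sketch: weight_diff_pct is a hypothetical helper name, not the code in sanity.sh or the actual fix, and it assumes max and min are non-negative integers.

```shell
#!/bin/bash
# Hypothetical helper (not from sanity.sh): percentage difference between
# the largest and smallest value, using the same integer expression that
# appears in the error message, but guarding against min == 0.
weight_diff_pct() {
	local max=$1
	local min=$2

	if (( min == 0 )); then
		# With min == 0 the ratio is undefined; return an error instead
		# of letting bash abort with "division by 0".
		echo "min is 0, cannot compute weight diff" >&2
		return 1
	fi
	echo $(( (max - min) * 100 / min ))
}
```

With this guard, a caller can stop the fill loop or skip the test when min reaches 0, rather than failing on the arithmetic itself.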

Another possibility is that the default MDT space balance is keeping MDT usage much more balanced, so that it is not possible to hit 100% imbalance just by writing DoM files to MDT0000. It may be that the loop also needs to create more, smaller files (e.g. 1000x64KB) so that both blocks and inodes are used.

VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
sanity test_413a - test_413a returned 1



 Comments   
Comment by Andreas Dilger [ 30/Jul/21 ]

Lai, could you please take a look? This was hit with two recent patches, and may be caused by the recent "default MDT balance" patch making MDT usage more even than before.

Comment by Andreas Dilger [ 11/Aug/21 ]

+1 on master: https://testing.whamcloud.com/test_sets/1032039a-c54c-456d-9fcf-5ab446241111

Comment by Lai Siyao [ 11/Aug/21 ]

This happens when the system is full. I have hit this before on my test system, but I didn't know it could happen in autotest. I'll look into it later.

Comment by Andreas Dilger [ 12/Aug/21 ]

I think part of the problem here is that the current generate_uneven_mdts() mostly consumes blocks, which counts for only a fraction of the imbalance compared to consuming inodes. As mentioned in LU-14659, it probably makes sense to create smaller files (e.g. 64KB instead of 1MB) so that 16x as many files are created, and the MDTs also become imbalanced for inodes before the blocks run out.
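As a rough sketch of the idea (not the change in review 44384), the same byte budget can be split into many small files so that inodes are consumed along with blocks. fill_small_files is a hypothetical helper; it writes plain files with dd and assumes the target directory is already placed on MDT0000 with a DoM layout, which is not shown here.

```shell
#!/bin/bash
# Hypothetical helper: create COUNT files of SIZE_KB kilobytes each in DIR.
# Assumption: DIR already lives on the MDT to be filled (DoM layout set
# elsewhere); this sketch only shows the file-size/file-count trade-off.
fill_small_files() {
	local dir=$1
	local count=$2
	local size_kb=$3
	local i

	mkdir -p "$dir" || return 1
	for ((i = 0; i < count; i++)); do
		dd if=/dev/zero of="$dir/f$i" bs=1024 count="$size_kb" \
			2>/dev/null || return 1
	done
}

# For the same space, 64KB files use 16x as many inodes as 1MB files.
files_per_mb=$(( 1024 / 64 ))
```

For example, replacing 100 files of 1MB with 100 * files_per_mb files of 64KB writes the same amount of data while consuming 1600 inodes instead of 100.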

Comment by Andreas Dilger [ 18/Sep/21 ]

Fixed by https://review.whamcloud.com/44384

Generated at Sat Feb 10 03:13:43 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.