[LU-8940] tests: df is not responsive enough Created: 15/Dec/16 Updated: 29/Mar/22 Resolved: 29/Mar/22 |
|
| Status: | Closed |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | CEA | Assignee: | John Hammond |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
In sanity-hsm, df is used to check whether there is enough space on the file system before creating a rather large file (with make_custom_file_for_progress()). If there is not enough space, cleanup_large_files() is called and then df is used once again to check that there is now enough space. The problem is that df takes time to see the update in free space, longer than du for example (even with the --sync option):

    dd if=/dev/zero of=/mnt/lustre/file count=100 bs=1M
    df -h
    find /mnt/lustre -size +10M -delete && df -h [--sync] && du -s /mnt/lustre
    # Wait around 3 to 5 seconds
    df -h

Compare the output of the first two df commands: the "Used" column is the same, even though the large file has just been deleted. du, on the other hand, immediately reflects the actual size. |
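For context, the space check described above boils down to roughly the following pattern (a minimal sketch only; the exact logic in sanity-hsm.sh differs, and the 200MB threshold, the $MOUNT variable, and the awk parsing are illustrative assumptions):

    # Sketch: df-based free-space check around a large-file test
    need_mb=200                                   # illustrative threshold
    free_mb=$(df -P $MOUNT | awk 'NR==2 { print int($4 / 1024) }')
    if [ $free_mb -lt $need_mb ]; then
        cleanup_large_files                       # frees the space...
        free_mb=$(df -P $MOUNT | awk 'NR==2 { print int($4 / 1024) }')
        # ...but df may still report the old value here, so the test gets
        # skipped with "not enough space" even though space was just freed
        [ $free_mb -lt $need_mb ] && skip "not enough space"
    fi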
| Comments |
| Comment by Quentin Bouget [ 15/Dec/16 ] |
|
Would it be satisfactory to use "du" instead? |
| Comment by John Hammond [ 15/Dec/16 ] |
|
Quentin, could you clarify: is the problem that the test should be calling cleanup_large_files() but isn't? Or something else? Is it causing a spurious test failure? If so, could you post the logs? |
| Comment by Quentin Bouget [ 16/Dec/16 ] |
|
cleanup_large_files() is called at the right times, but we check how much space it freed too soon afterwards and end up thinking there is not enough space when there actually is. The consequence is that some tests are skipped with the message "not enough space" when they should not be. Overall this is not too problematic, but it would still be nice to fix. |
| Comment by John Hammond [ 16/Dec/16 ] |
|
Quentin, could you try adding a call to wait_destroy_complete after the find and see if that works? |
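Presumably something along these lines (a sketch, not tested; wait_destroy_complete is the helper from test-framework.sh, applied here to the reproducer from the description):

    dd if=/dev/zero of=/mnt/lustre/file count=100 bs=1M
    find /mnt/lustre -size +10M -delete
    wait_destroy_complete      # wait for the object destroys to be processed
    df -h                      # should now reflect the freed space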
| Comment by Quentin Bouget [ 27/Dec/16 ] |
|
The timeout of wait_destroy_complete() is 5 seconds. When I run my test, that is sometimes enough and sometimes not. I think it would be better to use a non-sleeping, more deterministic method. |
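A bounded polling loop would be one way to avoid relying on a fixed timeout, for example (purely illustrative; this helper does not exist in the test framework):

    # Hypothetical helper: poll df until at least $1 MB is free on mount $2
    wait_for_free_space() {
        local need_mb=$1 mnt=$2 i
        for i in $(seq 1 30); do
            local free_mb=$(df -P "$mnt" | awk 'NR==2 { print int($4 / 1024) }')
            [ "$free_mb" -ge "$need_mb" ] && return 0
            sleep 1
        done
        return 1
    }

This still sleeps between polls, but it returns as soon as df catches up instead of always waiting the full fixed delay.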
| Comment by Quentin Bouget [ 20/Feb/17 ] |
|
A workaround in sanity-hsm.sh is to avoid make_custom_file_for_progress() and use empty files (or very small files) as much as possible. |
| Comment by John Hammond [ 23/Feb/17 ] |
|
In general we should not use empty files in sanity-hsm, since this will silently reduce the code coverage in many places. (BTW, we should also avoid using copies of /etc/{hosts,passwd,...} as small files, since these may be empty on some distros. It's better to use dd if=/dev/urandom ... to create the small files.) |
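For instance, a small non-empty test file can be created along these lines (the size and the usual $DIR/$tfile variables from the test framework are assumptions here):

    # 37KB of random data instead of a copy of /etc/hosts
    dd if=/dev/urandom of=$DIR/$tfile bs=1k count=37 conv=fsync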
| Comment by Quentin Bouget [ 23/Feb/17 ] |
|
I am not sure I agree with that. Using non-empty files pretty much only tests how the copytool behaves, and very few tests are "copytool oriented". The main problem with using big files is that the copytool's bandwidth is artificially capped at 1MB/s, which unnecessarily slows down many tests in sanity-hsm. Overall, I am not asking for the complete removal of big files, I am just asking for each use to be documented and rationalized (which is what motivates LU-8950). Anyway, using small or empty files is just a workaround; we still have to find a solution. I personally think that it is OK for df to report outdated information as long as the "--sync" option forces an update. |
| Comment by Quentin Bouget [ 23/May/17 ] |
|
We were actually pretty close to finding a general fix for this: we tried wait_destroy_complete(), and it would seem wait_delete_complete() was what we needed (credit to Andreas Dilger). |
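Presumably the check would then look something like this (a sketch only; wait_delete_complete() is the test-framework.sh helper named above, the surrounding lines are illustrative):

    cleanup_large_files
    wait_delete_complete       # wait until the deletions show up in the space stats
    free_mb=$(df -P $MOUNT | awk 'NR==2 { print int($4 / 1024) }')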
| Comment by John Hammond [ 29/Mar/22 ] |
|
Fixed by |