[LU-3522] sanity-benchmark test_iozone: "no space left on device" on ZFS Created: 27/Jun/13 Updated: 09/Oct/17 Resolved: 09/Oct/17 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.4.0, Lustre 2.4.1, Lustre 2.5.0, Lustre 2.8.0 |
| Fix Version/s: | Lustre 2.10.0 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Maloo | Assignee: | Nathaniel Clark |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Environment: |
server and client: lustre-master build #1536, zfs |
| Severity: | 3 |
| Rank (Obsolete): | 8858 |
| Description |
|
This issue was created by maloo for sarah <sarah@whamcloud.com>.
This issue relates to the following test suite run: http://maloo.whamcloud.com/test_sets/903643c8-ddca-11e2-85a3-52540035b04c.
The sub-test test_iozone failed with the following error:
The test log shows:
Error writing block 7269, fd= 3
write: No space left on device
iozone: interrupted
exiting iozone
sanity-benchmark test_iozone: @@@@@@ FAIL: iozone (2) failed |
| Comments |
| Comment by Andreas Dilger [ 02/Jul/13 ] |
|
The test script should be fixed to check the available space before starting, and either stripe the file over all OSTs or create multiple output files. |
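A minimal sketch of the kind of pre-flight check suggested above, written in Python rather than the test suite's own shell. The function name, the even-striping assumption, and the sizes in the usage note are all hypothetical; this is not the sanity-benchmark code, and it ignores grant and metadata overhead.

```python
def fits_when_striped(file_kb, ost_avail_kb):
    """Return True if a file of file_kb kB fits when striped evenly
    over every OST listed in ost_avail_kb (free space per OST, in kB).

    Hypothetical helper: assumes perfectly even striping and ignores
    any per-OST overhead such as grant reservations.
    """
    if not ost_avail_kb:
        raise ValueError("at least one OST is required")
    stripe_count = len(ost_avail_kb)
    # Each OST stores roughly file_kb / stripe_count (rounded up).
    per_ost_kb = -(-file_kb // stripe_count)
    # The OST with the least free space limits the whole file.
    return per_ost_kb <= min(ost_avail_kb)


if __name__ == "__main__":
    # Illustrative sizes only: a 3000 kB file and OSTs with 2000 kB free.
    print(fits_when_striped(3000, [2000]))        # single OST
    print(fits_when_striped(3000, [2000, 2000]))  # striped over two OSTs
```

With these illustrative numbers the file does not fit on a single OST but does fit once it is striped over two, which is the situation the striping suggestion is meant to address.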
| Comment by Jian Yu [ 09/Aug/13 ] |
|
Lustre Branch: b2_4 The failure occurred regularly on Lustre b2_4 branch with ZFS backend filesystem: |
| Comment by Peter Jones [ 10/Aug/13 ] |
|
Nathaniel, could you please look into this one? Thanks, Peter |
| Comment by Nathaniel Clark [ 12/Aug/13 ] |
|
It looks like the issue is the O_DIRECT nature of the iozone run. |
| Comment by Sarah Liu [ 13/Aug/13 ] |
|
SLES11 SP2 client also hit this issue with ldiskfs: https://maloo.whamcloud.com/test_sets/4ee7f69a-029c-11e3-b384-52540035b04c |
| Comment by Nathaniel Clark [ 14/Aug/13 ] |
|
It appears that when doing O_DIRECT, the transfer runs out of grant space before completing and thus gets an ENOSPC error back (this is zfs only). |
| Comment by Nathaniel Clark [ 19/Aug/13 ] |
|
The difference between ldiskfs and zfs seems to be in the calculated grant request space: the conversion factor in ofd_grant_from_cli() is 1 for ldiskfs but 32 for zfs. This is because ofd_grant_compat() assumes that if ofd_blockbits (set from statfs, which gives the largest available block size in zfs) corresponds to a block size greater than 4KB (12 bits), it should assume the worst case and shift by the excess (5 bits in this case, hence the factor of 32). |
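The arithmetic in this comment can be sketched as follows. This is a simplified Python model of the described behaviour, not the Lustre source: the constant and function names are hypothetical, and the zfs value of 17 block bits (128 KB) is inferred from the 5-bit shift quoted above.

```python
# Hypothetical name: the 4 KB (12-bit) block size the comment says is assumed.
COMPAT_BLOCKBITS = 12


def grant_shift(blockbits):
    """Number of bits to shift by: the excess over 12 bits, never negative."""
    return max(blockbits - COMPAT_BLOCKBITS, 0)


def grant_from_client(client_bytes, blockbits):
    """Scale a client-side byte count to the worst-case server-side amount."""
    return client_bytes << grant_shift(blockbits)


if __name__ == "__main__":
    one_mib = 1 << 20
    for name, bits in (("ldiskfs", 12), ("zfs", 17)):
        factor = 1 << grant_shift(bits)
        print(name, "factor:", factor,
              "grant for 1 MiB:", grant_from_client(one_mib, bits))
```

Under this model, a 1 MiB write is accounted as 1 MiB on ldiskfs but as 32 MiB on zfs, which is consistent with the grant running out well before the OST itself is full.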
| Comment by Nathaniel Clark [ 20/Aug/13 ] |
| Comment by Andreas Dilger [ 23/Aug/13 ] |
|
I put a brief description of what needs to be done into |
| Comment by Jian Yu [ 04/Sep/13 ] |
|
Lustre build: http://build.whamcloud.com/job/lustre-b2_4/44/ (2.4.1 RC1) sanity-benchmark test iozone hit the same failure: |
| Comment by Jian Yu [ 05/Sep/13 ] |
|
Lustre build: http://build.whamcloud.com/job/lustre-b2_4/44/ (2.4.1 RC1) sanity-benchmark test iozone also hit the same failure: |
| Comment by Sarah Liu [ 08/Sep/13 ] |
|
lustre-master build #1652 also hit this issue on ldiskfs: https://maloo.whamcloud.com/test_sets/c30260ee-15b4-11e3-8938-52540035b04c |
| Comment by Jian Yu [ 09/Sep/13 ] |
|
Lustre build: http://build.whamcloud.com/job/lustre-b2_4/45/ (2.4.1 RC2) sanity-benchmark test iozone also hit the same failure: |
| Comment by Nathaniel Clark [ 18/Sep/13 ] |
|
The current state of this bug seems to be: Behaving as expected, wait for OBD_CONNECT_GRANT_PARAM to be supported on the client ( |
| Comment by Peter Jones [ 19/Sep/13 ] |
|
Lai, are you able to assist with this one? Peter |
| Comment by Jian Yu [ 02/Nov/13 ] |
|
Lustre build: http://build.whamcloud.com/job/lustre-b2_4/47/ The same failure occurred: |
| Comment by Jian Yu [ 02/Nov/13 ] |
|
Lustre build: http://build.whamcloud.com/job/lustre-b2_4/47/ The same failure occurred: |
| Comment by Jian Yu [ 04/Nov/13 ] |
|
Lustre build: http://build.whamcloud.com/job/lustre-b2_4/47/
FSTYPE=zfs
sanity-benchmark test iozone failed as follows:
random random bkwd record stride
KB reclen write rewrite read reread read write read rewrite read fwrite frewrite fread freread
3844920 512
Error writing block 7508, fd= 3
iozone: interrupted
exiting iozone
sanity-benchmark test_iozone: @@@@@@ FAIL: iozone (1) failed
Dmesg on client node showed that:
Lustre: DEBUG MARKER: min OST has 2014336kB available, using 3844920kB file size
LustreError: 17272:0:(vvp_io.c:1088:vvp_io_commit_write()) Write page 961062 of inode ffff88007af981f8 failed -28
Maloo report: https://maloo.whamcloud.com/test_sets/51ec0c9c-444f-11e3-8472-52540035b04c |
| Comment by Jian Yu [ 26/Nov/13 ] |
|
Lustre Build: http://build.whamcloud.com/job/lustre-b2_4/58/ FSTYPE=zfs sanity-benchmark test iozone failed again: |
| Comment by Jian Yu [ 13/Dec/13 ] |
|
More instances on Lustre b2_4 branch: |
| Comment by Sarah Liu [ 08/Jul/15 ] |
|
I hit a similar failure in interop testing with the master branch on ldiskfs: https://testing.hpdd.intel.com/test_sets/b4f55f66-250b-11e5-8009-5254006e85c2