[LU-3522] sanity-benchmark test_iozone: "no space left on device" on ZFS Created: 27/Jun/13  Updated: 09/Oct/17  Resolved: 09/Oct/17

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.0, Lustre 2.4.1, Lustre 2.5.0, Lustre 2.8.0
Fix Version/s: Lustre 2.10.0

Type: Bug Priority: Critical
Reporter: Maloo Assignee: Nathaniel Clark
Resolution: Fixed Votes: 0
Labels: None
Environment:

server and client: lustre-master build #1536, zfs


Issue Links:
Blocker
is blocked by LU-2049 add support for OBD_CONNECT_GRANT_PARAM Resolved
Duplicate
is duplicated by LU-3906 Failure on test suite parallel-scale ... Resolved
is duplicated by LU-4042 Failure on test suite replay-ost-sing... Resolved
Sub-Tasks:
Key      Summary                                   Type            Status    Assignee
LU-2049  add support for OBD_CONNECT_GRANT_PARAM  Technical task  Resolved  Nathaniel Clark
Severity: 3
Rank (Obsolete): 8858

 Description   

This issue was created by maloo for sarah <sarah@whamcloud.com>

This issue relates to the following test suite run: http://maloo.whamcloud.com/test_sets/903643c8-ddca-11e2-85a3-52540035b04c.

The sub-test test_iozone failed with the following error:

iozone (2) failed

test log shows:

Error writing block 7269, fd= 3
write: No space left on device

iozone: interrupted

exiting iozone

 sanity-benchmark test_iozone: @@@@@@ FAIL: iozone (2) failed 


 Comments   
Comment by Andreas Dilger [ 02/Jul/13 ]

Either way, the test script should be fixed to check the available space before starting, and to either stripe the file over all OSTs or create multiple output files.
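
For illustration only, a minimal sketch of such a pre-flight check (this is not the actual sanity-benchmark code; the mount point, file name, and size threshold are placeholders): it compares the space reported for the mount point against the intended file size and pre-creates the output file striped across all OSTs with llapi_file_create(). Building it requires liblustreapi (-llustreapi).

#include <stdio.h>
#include <sys/statvfs.h>
#include <lustre/lustreapi.h>

int main(void)
{
	const char *testfile = "/mnt/lustre/d0.iozone/iozone.out"; /* placeholder path */
	unsigned long long need_kb = 4ULL * 1024 * 1024;           /* e.g. a 4GB file */
	struct statvfs vfs;
	int rc;

	if (statvfs("/mnt/lustre", &vfs) != 0) {
		perror("statvfs");
		return 1;
	}

	unsigned long long avail_kb =
		(unsigned long long)vfs.f_bavail * vfs.f_frsize / 1024;
	if (avail_kb < need_kb) {
		fprintf(stderr, "only %llukB free, skipping test\n", avail_kb);
		return 0;
	}

	/* stripe_size 0 = default, stripe_offset -1 = any, stripe_count -1 = all OSTs */
	rc = llapi_file_create(testfile, 0, -1, -1, 0);
	if (rc != 0) {
		fprintf(stderr, "llapi_file_create failed: %d\n", rc);
		return 1;
	}
	return 0;
}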

Comment by Jian Yu [ 09/Aug/13 ]

Lustre Branch: b2_4
Lustre Build: http://build.whamcloud.com/job/lustre-b2_4/27/
Distro/Arch: RHEL6.4/x86_64

The failure occurred regularly on Lustre b2_4 branch with ZFS backend filesystem:
https://maloo.whamcloud.com/test_sets/c39c4b36-fd73-11e2-9fdb-52540035b04c
https://maloo.whamcloud.com/test_sets/9a3acd76-fd0a-11e2-9fdb-52540035b04c
https://maloo.whamcloud.com/test_sets/6cb6014e-ed89-11e2-8e3a-52540035b04c
https://maloo.whamcloud.com/test_sets/191be2e6-ce18-11e2-96ef-52540035b04c
https://maloo.whamcloud.com/test_sets/f29840d6-cb67-11e2-a1fe-52540035b04c

Comment by Peter Jones [ 10/Aug/13 ]

Nathaniel

Could you please look into this one?

Thanks

Peter

Comment by Nathaniel Clark [ 12/Aug/13 ]

It looks like the issue is the O_DIRECT nature of the iozone run.

Comment by Sarah Liu [ 13/Aug/13 ]

SLES11 SP2 client also hit this issue with ldiskfs:

https://maloo.whamcloud.com/test_sets/4ee7f69a-029c-11e3-b384-52540035b04c

Comment by Nathaniel Clark [ 14/Aug/13 ]

It appears that when doing O_DIRECT, the transfer runs out of grant space before completing and thus gets an ENOSPC error back (this is zfs only).
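
A minimal standalone reproducer sketch of that failure mode (not part of the test suite; the path and sizes below are placeholders): aligned O_DIRECT writes in a loop, stopping when a write returns an error, which is where iozone reports "No space left on device".

#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	const char *path = "/mnt/lustre/odirect_grant_test";	/* placeholder path */
	const size_t blksz = 1 << 20;				/* 1MB aligned writes */
	const size_t nblocks = 4096;				/* 4GB total */
	void *buf;
	int fd;

	if (posix_memalign(&buf, blksz, blksz) != 0)
		return 1;
	memset(buf, 0xab, blksz);

	fd = open(path, O_WRONLY | O_CREAT | O_DIRECT, 0644);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	for (size_t i = 0; i < nblocks; i++) {
		if (write(fd, buf, blksz) < 0) {
			/* on zfs this is where the ENOSPC from grant
			 * exhaustion shows up, before the transfer completes */
			fprintf(stderr, "write of block %zu failed: %s\n",
				i, strerror(errno));
			break;
		}
	}
	close(fd);
	free(buf);
	return 0;
}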

Comment by Nathaniel Clark [ 19/Aug/13 ]

The difference between ldiskfs and zfs seems to be in the calculated grant request space: ofd_grant_from_cli()'s conversion factor for ldiskfs is 1, whereas for zfs it is 32. This is because ofd_grant_compat() assumes that if ofd_blockbits (set from statfs, which on zfs reports the largest available block size) is greater than 12 bits (4KB), it should assume the worst case and scale the grant by the excess, which in this case is 5 bits (a factor of 32).
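
A simplified sketch of that arithmetic (this is not the actual ofd_grant_from_cli()/ofd_grant_compat() code): clients without OBD_CONNECT_GRANT_PARAM account grant in 4KB (12-bit) blocks, so the server scales the requested bytes by 2^(ofd_blockbits - 12) to cover the worst case, giving a factor of 1 on ldiskfs (4KB blocks) and 32 on zfs (128KB blocks, 17 bits).

#include <stdio.h>

#define CLIENT_BLOCKBITS 12	/* clients account grant in 4KB units */

/* worst-case inflation of a client grant request, as described above */
static unsigned long long grant_worst_case(unsigned long long client_bytes,
					   int ofd_blockbits)
{
	if (ofd_blockbits <= CLIENT_BLOCKBITS)
		return client_bytes;		/* ldiskfs: factor 1 */
	return client_bytes << (ofd_blockbits - CLIENT_BLOCKBITS);
}

int main(void)
{
	printf("ldiskfs (12-bit blocks): %llu bytes\n",
	       grant_worst_case(1 << 20, 12));	/* 1MB stays 1MB    */
	printf("zfs     (17-bit blocks): %llu bytes\n",
	       grant_worst_case(1 << 20, 17));	/* 1MB becomes 32MB */
	return 0;
}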

Comment by Nathaniel Clark [ 20/Aug/13 ]

http://review.whamcloud.com/7402

Comment by Andreas Dilger [ 23/Aug/13 ]

I put a brief description of what needs to be done into LU-2049. I thought that Jinshan and Johann had worked up a more detailed design for this, but I can't find it.

Comment by Jian Yu [ 04/Sep/13 ]

Lustre build: http://build.whamcloud.com/job/lustre-b2_4/44/ (2.4.1 RC1)
Distro/Arch: RHEL6.4/x86_64 + FC18/x86_64 (Server + Client)
FSTYPE=ldiskfs

sanity-benchmark test iozone hit the same failure:
https://maloo.whamcloud.com/test_sets/becb9218-14ef-11e3-ac48-52540035b04c

Comment by Jian Yu [ 05/Sep/13 ]

Lustre build: http://build.whamcloud.com/job/lustre-b2_4/44/ (2.4.1 RC1)
Distro/Arch: RHEL6.4/x86_64
FSTYPE=zfs

sanity-benchmark test iozone also hit the same failure:
https://maloo.whamcloud.com/test_sets/e004601a-1556-11e3-8938-52540035b04c

Comment by Sarah Liu [ 08/Sep/13 ]

lustre-master build #1652 also hit this issue on ldiskfs:

https://maloo.whamcloud.com/test_sets/c30260ee-15b4-11e3-8938-52540035b04c

Comment by Jian Yu [ 09/Sep/13 ]

Lustre build: http://build.whamcloud.com/job/lustre-b2_4/45/ (2.4.1 RC2)
Distro/Arch: RHEL6.4/x86_64
FSTYPE=ldiskfs

sanity-benchmark test iozone also hit the same failure:
https://maloo.whamcloud.com/test_sets/497a00b4-182b-11e3-b39a-52540035b04c

Comment by Nathaniel Clark [ 18/Sep/13 ]

The current state of this bug seems to be: behaving as expected; waiting for OBD_CONNECT_GRANT_PARAM to be supported on the client (LU-2049).

Comment by Peter Jones [ 19/Sep/13 ]

Lai

Are you able to assist with this one?

Peter

Comment by Jian Yu [ 02/Nov/13 ]

Lustre build: http://build.whamcloud.com/job/lustre-b2_4/47/
Distro/Arch: RHEL6.4/x86_64(server), SLES11SP2/x86_64(client)
FSTYPE=ldiskfs

The same failure occurred:
https://maloo.whamcloud.com/test_sets/e5731cd8-432f-11e3-8676-52540035b04c

Comment by Jian Yu [ 02/Nov/13 ]

Lustre build: http://build.whamcloud.com/job/lustre-b2_4/47/
Distro/Arch: RHEL6.4/x86_64
FSTYPE=zfs

The same failure occurred:
https://maloo.whamcloud.com/test_sets/5d2a22d4-43a9-11e3-942a-52540035b04c

Comment by Jian Yu [ 04/Nov/13 ]

Lustre build: http://build.whamcloud.com/job/lustre-b2_4/47/
Distro/Arch: RHEL6.4/x86_64

FSTYPE=zfs
MDSCOUNT=1
MDSSIZE=2097152
OSTCOUNT=2
OSTSIZE=2097152

sanity-benchmark test iozone failed as follows:

                                                            random  random    bkwd   record   stride                                   
              KB  reclen   write rewrite    read    reread    read   write    read  rewrite     read   fwrite frewrite   fread  freread
         3844920     512
Error writing block 7508, fd= 3

iozone: interrupted

exiting iozone

 sanity-benchmark test_iozone: @@@@@@ FAIL: iozone (1) failed

Dmesg on client node showed that:

Lustre: DEBUG MARKER: min OST has 2014336kB available, using 3844920kB file size
LustreError: 17272:0:(vvp_io.c:1088:vvp_io_commit_write()) Write page 961062 of inode ffff88007af981f8 failed -28

Maloo report: https://maloo.whamcloud.com/test_sets/51ec0c9c-444f-11e3-8472-52540035b04c

Comment by Jian Yu [ 26/Nov/13 ]

Lustre Build: http://build.whamcloud.com/job/lustre-b2_4/58/
Distro/Arch: RHEL6.4/x86_64

FSTYPE=zfs
MDSCOUNT=1
MDSSIZE=2097152
OSTCOUNT=2
OSTSIZE=2097152

sanity-benchmark test iozone failed again:
https://maloo.whamcloud.com/test_sets/f6ed7dd6-5604-11e3-8e94-52540035b04c

Comment by Jian Yu [ 13/Dec/13 ]

More instances on Lustre b2_4 branch:
https://maloo.whamcloud.com/test_sets/bc739fb2-6353-11e3-8c76-52540035b04c
https://maloo.whamcloud.com/test_sets/75b2103c-6366-11e3-8ae4-52540035b04c
https://maloo.whamcloud.com/test_sets/06fb5b2e-627d-11e3-a8fd-52540035b04c

Comment by Sarah Liu [ 08/Jul/15 ]

I hit a similar failure in interop testing with the master branch on ldiskfs:

https://testing.hpdd.intel.com/test_sets/b4f55f66-250b-11e5-8009-5254006e85c2
