[LU-289] script error in oos.sh and oos2.sh Created: 06/May/11  Updated: 22/Feb/13  Resolved: 17/Jan/13

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 1.8.8, Lustre 1.8.6
Fix Version/s: None

Type: Bug Priority: Trivial
Reporter: Niu Yawei (Inactive) Assignee: Jian Yu
Resolution: Fixed Votes: 0
Labels: None

Severity: 4
Rank (Obsolete): 5679

 Description   

When I run sanityn.sh in my virtual machine, I found that the space is not freed after test_15 (which calls oos2.sh to fill up OST), that result in some of the following test failed for ENOSPC.

The reason of oos2.sh/oos.sh doesn't free space is that the output of 'dd if=/dev/zero of=$OOS count=$(($ORIGFREE + 100)) bs=1k 2> $LOG' is:

dd: writing `/mnt/lustre/oos': No space left on device
dd: closing output file `/mnt/lustre/oos': Input/output error

but not not the scprit exepected:

dd: writing `/mnt/lustre/oos': No space left on device
247330+0 records in
247329+0 records out
253264896 bytes (253 MB) copied, 121.715 seconds, 2.1 MB/s

so the following line of the script is broken, and the script exit without cleanup the test file.

RECORDSOUT=$((`grep "records out" $LOG | cut -d+ -f 1` + \
`grep "records out" $LOG2 | cut -d+ -f 1`))

I tried the dd command with 'oflag=direct', then the output will same as what we expected, but direct io will make the test slower, not sure if it's a good solution.

I think because the test_15 is always skipped (for too much free space on testing system), we never hit this issue in out daily test.



 Comments   
Comment by Jian Yu [ 16/Jun/11 ]

Lustre Branch: v1_8_6_RC2
Lustre Build: http://newbuild.whamcloud.com/job/lustre-b1_8/80/
e2fsprogs Build: http://newbuild.whamcloud.com/job/e2fsprogs-master/40/
Distro/Arch: RHEL6/x86_64(patchless client, in-kernel OFED, kernel version: 2.6.32-131.2.1.el6)
        RHEL5/x86_64(server, OFED 1.5.3.1, kernel version: 2.6.18-238.12.1.el5_lustre)

While running sanity test 64b, the same failure also occurred:

== test 64b: check out-of-space detection on client =========== == 13:15:39
STRIPECOUNT=6 ORIGFREE=1496256 MAXFREE=2400000
BEFORE dd started
lustre-OST0000-osc-ffff8803056d6400 avl=249376 grnt=2048 diff=247328 
lustre-OST0001-osc-ffff8803056d6400 avl=249376 grnt=2048 diff=247328 
lustre-OST0002-osc-ffff8803056d6400 avl=249376 grnt=2048 diff=247328 
lustre-OST0003-osc-ffff8803056d6400 avl=249376 grnt=2048 diff=247328 
lustre-OST0004-osc-ffff8803056d6400 avl=249376 grnt=2048 diff=247328 
lustre-OST0005-osc-ffff8803056d6400 avl=249376 grnt=2048 diff=247328 
AFTER dd
lustre-OST0000-osc-ffff8803056d6400 avl=2056 grnt=0 diff=2056 
lustre-OST0001-osc-ffff8803056d6400 avl=2592 grnt=512 diff=2080 
lustre-OST0002-osc-ffff8803056d6400 avl=2592 grnt=512 diff=2080 
lustre-OST0003-osc-ffff8803056d6400 avl=2592 grnt=512 diff=2080 
lustre-OST0004-osc-ffff8803056d6400 avl=2592 grnt=512 diff=2080 
lustre-OST0005-osc-ffff8803056d6400 avl=2588 grnt=508 diff=2080 
no OSTs are close to full
osc.lustre-OST0000-osc-ffff8803056d6400.kbytesavail=2056
osc.lustre-OST0001-osc-ffff8803056d6400.kbytesavail=2592
osc.lustre-OST0002-osc-ffff8803056d6400.kbytesavail=2592
osc.lustre-OST0003-osc-ffff8803056d6400.kbytesavail=2592
osc.lustre-OST0004-osc-ffff8803056d6400.kbytesavail=2592
osc.lustre-OST0005-osc-ffff8803056d6400.kbytesavail=2588
osc.lustre-OST0000-osc-ffff8803056d6400.cur_dirty_bytes=0
osc.lustre-OST0000-osc-ffff8803056d6400.cur_grant_bytes=0
osc.lustre-OST0000-osc-ffff8803056d6400.cur_lost_grant_bytes=0
osc.lustre-OST0001-osc-ffff8803056d6400.cur_dirty_bytes=0
osc.lustre-OST0001-osc-ffff8803056d6400.cur_grant_bytes=524288
osc.lustre-OST0001-osc-ffff8803056d6400.cur_lost_grant_bytes=0
osc.lustre-OST0002-osc-ffff8803056d6400.cur_dirty_bytes=0
osc.lustre-OST0002-osc-ffff8803056d6400.cur_grant_bytes=524288
osc.lustre-OST0002-osc-ffff8803056d6400.cur_lost_grant_bytes=0
osc.lustre-OST0003-osc-ffff8803056d6400.cur_dirty_bytes=0
osc.lustre-OST0003-osc-ffff8803056d6400.cur_grant_bytes=524288
osc.lustre-OST0003-osc-ffff8803056d6400.cur_lost_grant_bytes=0
osc.lustre-OST0004-osc-ffff8803056d6400.cur_dirty_bytes=0
osc.lustre-OST0004-osc-ffff8803056d6400.cur_grant_bytes=524288
osc.lustre-OST0004-osc-ffff8803056d6400.cur_lost_grant_bytes=0
osc.lustre-OST0005-osc-ffff8803056d6400.cur_dirty_bytes=0
osc.lustre-OST0005-osc-ffff8803056d6400.cur_grant_bytes=520192
osc.lustre-OST0005-osc-ffff8803056d6400.cur_lost_grant_bytes=0
ERROR: no blocks written by dd?
LOG: dd: writing `/mnt/lustre/oosfile': No space left on device
LOG: dd: closing output file `/mnt/lustre/oosfile': Input/output error
LOG file
dd: writing `/mnt/lustre/oosfile': No space left on device
dd: closing output file `/mnt/lustre/oosfile': Input/output error
 sanity test_64b: @@@@@@ FAIL: test_64b failed with 1

There are two failures in the above output. The first one is "no OSTs are close to full", and the second one is "dd: closing output file `/mnt/lustre/oosfile': Input/output error".

Maloo report: https://maloo.whamcloud.com/test_sets/252b94d6-97ca-11e0-9a27-52540025f9af

Comment by Jian Yu [ 16/Jun/11 ]

sanityn test 15 also hit the above two failures:

== test 15: test out-of-space with multiple writers =========== == 14:22:04
PATH=/opt/iozone/bin:/usr/lib64/lustre/tests//usr/lib64/lustre/tests:/usr/lib64/lustre/tests:/usr/lib64/lustre/tests/../utils:/opt/iozone/bin:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/tests:/usr/lib64/openmpi/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin:/usr/lib64/lustre/utils:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey::/usr/lib64/lustre/tests/mpi
STRIPECOUNT=6 ORIGFREE=1496256 MAXFREE=2400000
lustre-OST0000-osc-ffff88032475dc00 avl=4632 grnt=224 diff=4408 
lustre-OST0000-osc-ffff880329e9b800 avl=4632 grnt=44 diff=4588 
lustre-OST0001-osc-ffff88032475dc00 avl=5188 grnt=1076 diff=4112 
lustre-OST0001-osc-ffff880329e9b800 avl=5188 grnt=0 diff=5188 
lustre-OST0002-osc-ffff88032475dc00 avl=5656 grnt=0 diff=5656 
lustre-OST0002-osc-ffff880329e9b800 avl=5656 grnt=1148 diff=4508 
lustre-OST0003-osc-ffff88032475dc00 avl=5656 grnt=204 diff=5452 
lustre-OST0003-osc-ffff880329e9b800 avl=5656 grnt=376 diff=5280 
lustre-OST0004-osc-ffff88032475dc00 avl=4632 grnt=68 diff=4564 
lustre-OST0004-osc-ffff880329e9b800 avl=4632 grnt=244 diff=4388 
lustre-OST0005-osc-ffff88032475dc00 avl=4112 grnt=0 diff=4112 
lustre-OST0005-osc-ffff880329e9b800 avl=4112 grnt=0 diff=4112 
no OSTs are close to full
osc.lustre-OST0000-osc-ffff88032475dc00.kbytesavail=4632
osc.lustre-OST0000-osc-ffff880329e9b800.kbytesavail=4632
osc.lustre-OST0001-osc-ffff88032475dc00.kbytesavail=5188
osc.lustre-OST0001-osc-ffff880329e9b800.kbytesavail=5188
osc.lustre-OST0002-osc-ffff88032475dc00.kbytesavail=5656
osc.lustre-OST0002-osc-ffff880329e9b800.kbytesavail=5656
osc.lustre-OST0003-osc-ffff88032475dc00.kbytesavail=5656
osc.lustre-OST0003-osc-ffff880329e9b800.kbytesavail=5656
osc.lustre-OST0004-osc-ffff88032475dc00.kbytesavail=4632
osc.lustre-OST0004-osc-ffff880329e9b800.kbytesavail=4632
osc.lustre-OST0005-osc-ffff88032475dc00.kbytesavail=4112
osc.lustre-OST0005-osc-ffff880329e9b800.kbytesavail=4112
osc.lustre-OST0000-osc-ffff88032475dc00.cur_dirty_bytes=0
osc.lustre-OST0000-osc-ffff88032475dc00.cur_grant_bytes=229376
osc.lustre-OST0000-osc-ffff88032475dc00.cur_lost_grant_bytes=0
osc.lustre-OST0000-osc-ffff880329e9b800.cur_dirty_bytes=0
osc.lustre-OST0000-osc-ffff880329e9b800.cur_grant_bytes=45056
osc.lustre-OST0000-osc-ffff880329e9b800.cur_lost_grant_bytes=0
osc.lustre-OST0001-osc-ffff88032475dc00.cur_dirty_bytes=0
osc.lustre-OST0001-osc-ffff88032475dc00.cur_grant_bytes=1101824
osc.lustre-OST0001-osc-ffff88032475dc00.cur_lost_grant_bytes=0
osc.lustre-OST0001-osc-ffff880329e9b800.cur_dirty_bytes=0
osc.lustre-OST0001-osc-ffff880329e9b800.cur_grant_bytes=0
osc.lustre-OST0001-osc-ffff880329e9b800.cur_lost_grant_bytes=0
osc.lustre-OST0002-osc-ffff88032475dc00.cur_dirty_bytes=0
osc.lustre-OST0002-osc-ffff88032475dc00.cur_grant_bytes=0
osc.lustre-OST0002-osc-ffff88032475dc00.cur_lost_grant_bytes=0
osc.lustre-OST0002-osc-ffff880329e9b800.cur_dirty_bytes=0
osc.lustre-OST0002-osc-ffff880329e9b800.cur_grant_bytes=1175552
osc.lustre-OST0002-osc-ffff880329e9b800.cur_lost_grant_bytes=0
osc.lustre-OST0003-osc-ffff88032475dc00.cur_dirty_bytes=0
osc.lustre-OST0003-osc-ffff88032475dc00.cur_grant_bytes=208896
osc.lustre-OST0003-osc-ffff88032475dc00.cur_lost_grant_bytes=0
osc.lustre-OST0003-osc-ffff880329e9b800.cur_dirty_bytes=0
osc.lustre-OST0003-osc-ffff880329e9b800.cur_grant_bytes=385024
osc.lustre-OST0003-osc-ffff880329e9b800.cur_lost_grant_bytes=0
osc.lustre-OST0004-osc-ffff88032475dc00.cur_dirty_bytes=0
osc.lustre-OST0004-osc-ffff88032475dc00.cur_grant_bytes=69632
osc.lustre-OST0004-osc-ffff88032475dc00.cur_lost_grant_bytes=0
osc.lustre-OST0004-osc-ffff880329e9b800.cur_dirty_bytes=0
osc.lustre-OST0004-osc-ffff880329e9b800.cur_grant_bytes=249856
osc.lustre-OST0004-osc-ffff880329e9b800.cur_lost_grant_bytes=0
osc.lustre-OST0005-osc-ffff88032475dc00.cur_dirty_bytes=0
osc.lustre-OST0005-osc-ffff88032475dc00.cur_grant_bytes=0
osc.lustre-OST0005-osc-ffff88032475dc00.cur_lost_grant_bytes=0
osc.lustre-OST0005-osc-ffff880329e9b800.cur_dirty_bytes=0
osc.lustre-OST0005-osc-ffff880329e9b800.cur_grant_bytes=0
osc.lustre-OST0005-osc-ffff880329e9b800.cur_lost_grant_bytes=0
oos2.sh: line 86: +               : syntax error: operand expected (error token is "+               ")
+ SUCCESS=0
++ grep 'records out' /tmp/oos2.log
++ cut -d+ -f 1
++ grep 'records out' /tmp/oos2.log2
++ cut -d+ -f 1
oos2.sh: line 87: +               : syntax error: operand expected (error token is "+               ")

# cat /tmp/oos2.log
dd: writing `/mnt/lustre/oosfile': No space left on device
dd: closing output file `/mnt/lustre/oosfile': Input/output error

# cat /tmp/oos2.log2
dd: writing `/mnt/lustre2/oosfile2': No space left on device
dd: closing output file `/mnt/lustre2/oosfile2': Input/output error

Maloo report: https://maloo.whamcloud.com/test_sets/410f8fb2-97d0-11e0-9a27-52540025f9af

Comment by Jian Yu [ 15/May/12 ]

Lustre Tag: v1_8_8_WC1_RC1
Lustre Build: http://build.whamcloud.com/job/lustre-b1_8/195/
Distro/Arch: RHEL5.8/x86_64 (kernel version: 2.6.18-308.4.1.el5)
Network: TCP (1GigE)
ENABLE_QUOTA=yes

sanity test 64b and sanityn test 15 failed with the same issue:
https://maloo.whamcloud.com/test_sets/41ad5fd4-9dc5-11e1-a1d8-52540035b04c
https://maloo.whamcloud.com/test_sets/4bd8fb46-9e59-11e1-9080-52540035b04c

Comment by Jian Yu [ 10/Jan/13 ]

With patch http://review.whamcloud.com/4986, sanity test 64b and sanityn test 15 passed.

Lustre b1_8 client and server:
sanity test 64b passed: https://maloo.whamcloud.com/test_sets/13339fc4-5b11-11e2-8985-52540035b04c
sanityn test 15 passed: https://maloo.whamcloud.com/test_sets/169f7b22-5b0e-11e2-b205-52540035b04c

Lustre b1_8 client and master server:
sanity test 64b passed: https://maloo.whamcloud.com/test_sets/cfd341de-5b07-11e2-b205-52540035b04c
sanityn test 15 passed: https://maloo.whamcloud.com/test_sets/156fb564-5aff-11e2-b205-52540035b04c

Comment by Jian Yu [ 17/Jan/13 ]

Patch was landed on Lustre b1_8 branch.

Comment by Peter Jones [ 22/Feb/13 ]

It looks like this was fixed by the LU-1526 landing

Generated at Sat Feb 10 01:05:35 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.