[LU-289] script error in oos.sh and oos2.sh Created: 06/May/11 Updated: 22/Feb/13 Resolved: 17/Jan/13 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 1.8.8, Lustre 1.8.6 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Trivial |
| Reporter: | Niu Yawei (Inactive) | Assignee: | Jian Yu |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Severity: | 4 |
| Rank (Obsolete): | 5679 |
| Description |
|
When I run sanityn.sh in my virtual machine, I found that the space is not freed after test_15 (which calls oos2.sh to fill up OST), that result in some of the following test failed for ENOSPC. The reason of oos2.sh/oos.sh doesn't free space is that the output of 'dd if=/dev/zero of=$OOS count=$(($ORIGFREE + 100)) bs=1k 2> $LOG' is: dd: writing `/mnt/lustre/oos': No space left on device but not not the scprit exepected: dd: writing `/mnt/lustre/oos': No space left on device so the following line of the script is broken, and the script exit without cleanup the test file. RECORDSOUT=$((`grep "records out" $LOG | cut -d+ -f 1` + \ I tried the dd command with 'oflag=direct', then the output will same as what we expected, but direct io will make the test slower, not sure if it's a good solution. I think because the test_15 is always skipped (for too much free space on testing system), we never hit this issue in out daily test. |
| Comments |
| Comment by Jian Yu [ 16/Jun/11 ] |
|
Lustre Branch: v1_8_6_RC2 While running sanity test 64b, the same failure also occurred: == test 64b: check out-of-space detection on client =========== == 13:15:39 STRIPECOUNT=6 ORIGFREE=1496256 MAXFREE=2400000 BEFORE dd started lustre-OST0000-osc-ffff8803056d6400 avl=249376 grnt=2048 diff=247328 lustre-OST0001-osc-ffff8803056d6400 avl=249376 grnt=2048 diff=247328 lustre-OST0002-osc-ffff8803056d6400 avl=249376 grnt=2048 diff=247328 lustre-OST0003-osc-ffff8803056d6400 avl=249376 grnt=2048 diff=247328 lustre-OST0004-osc-ffff8803056d6400 avl=249376 grnt=2048 diff=247328 lustre-OST0005-osc-ffff8803056d6400 avl=249376 grnt=2048 diff=247328 AFTER dd lustre-OST0000-osc-ffff8803056d6400 avl=2056 grnt=0 diff=2056 lustre-OST0001-osc-ffff8803056d6400 avl=2592 grnt=512 diff=2080 lustre-OST0002-osc-ffff8803056d6400 avl=2592 grnt=512 diff=2080 lustre-OST0003-osc-ffff8803056d6400 avl=2592 grnt=512 diff=2080 lustre-OST0004-osc-ffff8803056d6400 avl=2592 grnt=512 diff=2080 lustre-OST0005-osc-ffff8803056d6400 avl=2588 grnt=508 diff=2080 no OSTs are close to full osc.lustre-OST0000-osc-ffff8803056d6400.kbytesavail=2056 osc.lustre-OST0001-osc-ffff8803056d6400.kbytesavail=2592 osc.lustre-OST0002-osc-ffff8803056d6400.kbytesavail=2592 osc.lustre-OST0003-osc-ffff8803056d6400.kbytesavail=2592 osc.lustre-OST0004-osc-ffff8803056d6400.kbytesavail=2592 osc.lustre-OST0005-osc-ffff8803056d6400.kbytesavail=2588 osc.lustre-OST0000-osc-ffff8803056d6400.cur_dirty_bytes=0 osc.lustre-OST0000-osc-ffff8803056d6400.cur_grant_bytes=0 osc.lustre-OST0000-osc-ffff8803056d6400.cur_lost_grant_bytes=0 osc.lustre-OST0001-osc-ffff8803056d6400.cur_dirty_bytes=0 osc.lustre-OST0001-osc-ffff8803056d6400.cur_grant_bytes=524288 osc.lustre-OST0001-osc-ffff8803056d6400.cur_lost_grant_bytes=0 osc.lustre-OST0002-osc-ffff8803056d6400.cur_dirty_bytes=0 osc.lustre-OST0002-osc-ffff8803056d6400.cur_grant_bytes=524288 osc.lustre-OST0002-osc-ffff8803056d6400.cur_lost_grant_bytes=0 osc.lustre-OST0003-osc-ffff8803056d6400.cur_dirty_bytes=0 osc.lustre-OST0003-osc-ffff8803056d6400.cur_grant_bytes=524288 osc.lustre-OST0003-osc-ffff8803056d6400.cur_lost_grant_bytes=0 osc.lustre-OST0004-osc-ffff8803056d6400.cur_dirty_bytes=0 osc.lustre-OST0004-osc-ffff8803056d6400.cur_grant_bytes=524288 osc.lustre-OST0004-osc-ffff8803056d6400.cur_lost_grant_bytes=0 osc.lustre-OST0005-osc-ffff8803056d6400.cur_dirty_bytes=0 osc.lustre-OST0005-osc-ffff8803056d6400.cur_grant_bytes=520192 osc.lustre-OST0005-osc-ffff8803056d6400.cur_lost_grant_bytes=0 ERROR: no blocks written by dd? LOG: dd: writing `/mnt/lustre/oosfile': No space left on device LOG: dd: closing output file `/mnt/lustre/oosfile': Input/output error LOG file dd: writing `/mnt/lustre/oosfile': No space left on device dd: closing output file `/mnt/lustre/oosfile': Input/output error sanity test_64b: @@@@@@ FAIL: test_64b failed with 1 There are two failures in the above output. The first one is "no OSTs are close to full", and the second one is "dd: closing output file `/mnt/lustre/oosfile': Input/output error". Maloo report: https://maloo.whamcloud.com/test_sets/252b94d6-97ca-11e0-9a27-52540025f9af |
| Comment by Jian Yu [ 16/Jun/11 ] |
|
sanityn test 15 also hit the above two failures: == test 15: test out-of-space with multiple writers =========== == 14:22:04 PATH=/opt/iozone/bin:/usr/lib64/lustre/tests//usr/lib64/lustre/tests:/usr/lib64/lustre/tests:/usr/lib64/lustre/tests/../utils:/opt/iozone/bin:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/tests:/usr/lib64/openmpi/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin:/usr/lib64/lustre/utils:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey::/usr/lib64/lustre/tests/mpi STRIPECOUNT=6 ORIGFREE=1496256 MAXFREE=2400000 lustre-OST0000-osc-ffff88032475dc00 avl=4632 grnt=224 diff=4408 lustre-OST0000-osc-ffff880329e9b800 avl=4632 grnt=44 diff=4588 lustre-OST0001-osc-ffff88032475dc00 avl=5188 grnt=1076 diff=4112 lustre-OST0001-osc-ffff880329e9b800 avl=5188 grnt=0 diff=5188 lustre-OST0002-osc-ffff88032475dc00 avl=5656 grnt=0 diff=5656 lustre-OST0002-osc-ffff880329e9b800 avl=5656 grnt=1148 diff=4508 lustre-OST0003-osc-ffff88032475dc00 avl=5656 grnt=204 diff=5452 lustre-OST0003-osc-ffff880329e9b800 avl=5656 grnt=376 diff=5280 lustre-OST0004-osc-ffff88032475dc00 avl=4632 grnt=68 diff=4564 lustre-OST0004-osc-ffff880329e9b800 avl=4632 grnt=244 diff=4388 lustre-OST0005-osc-ffff88032475dc00 avl=4112 grnt=0 diff=4112 lustre-OST0005-osc-ffff880329e9b800 avl=4112 grnt=0 diff=4112 no OSTs are close to full osc.lustre-OST0000-osc-ffff88032475dc00.kbytesavail=4632 osc.lustre-OST0000-osc-ffff880329e9b800.kbytesavail=4632 osc.lustre-OST0001-osc-ffff88032475dc00.kbytesavail=5188 osc.lustre-OST0001-osc-ffff880329e9b800.kbytesavail=5188 osc.lustre-OST0002-osc-ffff88032475dc00.kbytesavail=5656 osc.lustre-OST0002-osc-ffff880329e9b800.kbytesavail=5656 osc.lustre-OST0003-osc-ffff88032475dc00.kbytesavail=5656 osc.lustre-OST0003-osc-ffff880329e9b800.kbytesavail=5656 osc.lustre-OST0004-osc-ffff88032475dc00.kbytesavail=4632 osc.lustre-OST0004-osc-ffff880329e9b800.kbytesavail=4632 osc.lustre-OST0005-osc-ffff88032475dc00.kbytesavail=4112 osc.lustre-OST0005-osc-ffff880329e9b800.kbytesavail=4112 osc.lustre-OST0000-osc-ffff88032475dc00.cur_dirty_bytes=0 osc.lustre-OST0000-osc-ffff88032475dc00.cur_grant_bytes=229376 osc.lustre-OST0000-osc-ffff88032475dc00.cur_lost_grant_bytes=0 osc.lustre-OST0000-osc-ffff880329e9b800.cur_dirty_bytes=0 osc.lustre-OST0000-osc-ffff880329e9b800.cur_grant_bytes=45056 osc.lustre-OST0000-osc-ffff880329e9b800.cur_lost_grant_bytes=0 osc.lustre-OST0001-osc-ffff88032475dc00.cur_dirty_bytes=0 osc.lustre-OST0001-osc-ffff88032475dc00.cur_grant_bytes=1101824 osc.lustre-OST0001-osc-ffff88032475dc00.cur_lost_grant_bytes=0 osc.lustre-OST0001-osc-ffff880329e9b800.cur_dirty_bytes=0 osc.lustre-OST0001-osc-ffff880329e9b800.cur_grant_bytes=0 osc.lustre-OST0001-osc-ffff880329e9b800.cur_lost_grant_bytes=0 osc.lustre-OST0002-osc-ffff88032475dc00.cur_dirty_bytes=0 osc.lustre-OST0002-osc-ffff88032475dc00.cur_grant_bytes=0 osc.lustre-OST0002-osc-ffff88032475dc00.cur_lost_grant_bytes=0 osc.lustre-OST0002-osc-ffff880329e9b800.cur_dirty_bytes=0 osc.lustre-OST0002-osc-ffff880329e9b800.cur_grant_bytes=1175552 osc.lustre-OST0002-osc-ffff880329e9b800.cur_lost_grant_bytes=0 osc.lustre-OST0003-osc-ffff88032475dc00.cur_dirty_bytes=0 osc.lustre-OST0003-osc-ffff88032475dc00.cur_grant_bytes=208896 osc.lustre-OST0003-osc-ffff88032475dc00.cur_lost_grant_bytes=0 osc.lustre-OST0003-osc-ffff880329e9b800.cur_dirty_bytes=0 osc.lustre-OST0003-osc-ffff880329e9b800.cur_grant_bytes=385024 osc.lustre-OST0003-osc-ffff880329e9b800.cur_lost_grant_bytes=0 osc.lustre-OST0004-osc-ffff88032475dc00.cur_dirty_bytes=0 osc.lustre-OST0004-osc-ffff88032475dc00.cur_grant_bytes=69632 osc.lustre-OST0004-osc-ffff88032475dc00.cur_lost_grant_bytes=0 osc.lustre-OST0004-osc-ffff880329e9b800.cur_dirty_bytes=0 osc.lustre-OST0004-osc-ffff880329e9b800.cur_grant_bytes=249856 osc.lustre-OST0004-osc-ffff880329e9b800.cur_lost_grant_bytes=0 osc.lustre-OST0005-osc-ffff88032475dc00.cur_dirty_bytes=0 osc.lustre-OST0005-osc-ffff88032475dc00.cur_grant_bytes=0 osc.lustre-OST0005-osc-ffff88032475dc00.cur_lost_grant_bytes=0 osc.lustre-OST0005-osc-ffff880329e9b800.cur_dirty_bytes=0 osc.lustre-OST0005-osc-ffff880329e9b800.cur_grant_bytes=0 osc.lustre-OST0005-osc-ffff880329e9b800.cur_lost_grant_bytes=0 oos2.sh: line 86: + : syntax error: operand expected (error token is "+ ") + SUCCESS=0 ++ grep 'records out' /tmp/oos2.log ++ cut -d+ -f 1 ++ grep 'records out' /tmp/oos2.log2 ++ cut -d+ -f 1 oos2.sh: line 87: + : syntax error: operand expected (error token is "+ ") # cat /tmp/oos2.log dd: writing `/mnt/lustre/oosfile': No space left on device dd: closing output file `/mnt/lustre/oosfile': Input/output error # cat /tmp/oos2.log2 dd: writing `/mnt/lustre2/oosfile2': No space left on device dd: closing output file `/mnt/lustre2/oosfile2': Input/output error Maloo report: https://maloo.whamcloud.com/test_sets/410f8fb2-97d0-11e0-9a27-52540025f9af |
| Comment by Jian Yu [ 15/May/12 ] |
|
Lustre Tag: v1_8_8_WC1_RC1 sanity test 64b and sanityn test 15 failed with the same issue: |
| Comment by Jian Yu [ 10/Jan/13 ] |
|
With patch http://review.whamcloud.com/4986, sanity test 64b and sanityn test 15 passed. Lustre b1_8 client and server: Lustre b1_8 client and master server: |
| Comment by Jian Yu [ 17/Jan/13 ] |
|
Patch was landed on Lustre b1_8 branch. |
| Comment by Peter Jones [ 22/Feb/13 ] |
|
It looks like this was fixed by the |