[LU-1400] obdfilter-survey test_1c failed but still green in report Created: 13/May/12 Updated: 17/Apr/13 Resolved: 29/Aug/12 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.4.0 |
| Type: | Bug | Priority: | Major |
| Reporter: | Mikhail Pershin | Assignee: | Keith Mannthey (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
|
||||||||
| Severity: | 3 | ||||||||
| Rank (Obsolete): | 4479 | ||||||||
| Description |
|
Looking through the obdfilter-survey test 1c results, I noticed that a run usually takes more than 3000s to complete, but often takes only about 1700s. All of the short runs actually ended with errors but were still reported as green, so the failure goes unnoticed:

== obdfilter-survey test 1c: Object Storage Targets survey, big batch ================================ 22:23:42 (1336800222)
+ NETTYPE=tcp thrlo=128 nobjhi=1 thrhi=128 size=8192 case=disk rslt_loc=/tmp targets="10.10.4.77:lustre-OST0000 10.10.4.77:lustre-OST0001 10.10.4.77:lustre-OST0002 10.10.4.77:lustre-OST0003 10.10.4.77:lustre-OST0004 10.10.4.77:lustre-OST0005 10.10.4.77:lustre-OST0006" /usr/bin/obdfilter-survey
Fri May 11 22:23:48 PDT 2012 Obdfilter-survey for case=disk from fat-intel-1vm2
ost 7 sz 58720256K rsz 1024K obj 7 thr 896 write 248.99 ERROR rewrite 119900.58 ERROR read 176.78 [ 0.00, 239.23]
done!

The same applies to test 2a, which lasts only 2s:

== obdfilter-survey test 2a: Stripe F/S over the Network == 23:11:25 (1336025485)
+ NETTYPE=tcp thrlo=8 nobjhi=1 thrhi=16 size=1024 case=netdisk rslt_loc=/tmp targets="172.29.3.12:lustre-OST0000 172.29.3.12:lustre-OST0001 172.29.3.12:lustre-OST0002 172.29.3.12:lustre-OST0003 172.29.3.12:lustre-OST0004 172.29.3.12:lustre-OST0005 172.29.3.12:lustre-OST0006" /usr/bin/obdfilter-survey
Wed May 2 23:11:25 PDT 2012 Obdfilter-survey for case=netdisk from iu-3vm1.lab.whamcloud.com
ost 7 sz 7340032K rsz 1024K obj 7 thr 56 write 260954.00 ERROR rewrite 266418.29 ERROR read 269207.02 ERROR
ost 7 sz 7340032K rsz 1024K obj 7 thr 112 write 162454.32 ERROR rewrite 162440.28 ERROR read 162512.28 ERROR
done!

https://maloo.whamcloud.com/sub_tests/ccc4725c-9c20-11e1-8837-52540035b04c

Only interop runs with a 2.1 client work normally (check the 2a results, as test 1c is not in 2.1 yet):

Therefore we have two issues here:
1) an obdfilter-survey reporting issue: the test always stays green
2) an echo client issue causing the errors |
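For the reporting issue, one possible check is to scan the captured survey output for the ERROR marker seen in the logs above and fail the test when it is present. This is only a minimal sketch: the function name survey_output_ok and the idea of saving the output to a file are assumptions for illustration, and are not taken from obdfilter-survey or the Lustre test framework.

```shell
# Hypothetical helper (not part of obdfilter-survey or the Lustre test scripts).
# $1: path to a file holding captured obdfilter-survey output.
# Returns 0 if no ERROR marker is present, 1 otherwise.
survey_output_ok() {
    # -q: no output, only an exit status; -w: match ERROR as a whole word
    if grep -qw 'ERROR' "$1"; then
        return 1
    fi
    return 0
}
```

A caller could then run something like `survey_output_ok /tmp/survey.out || echo "survey reported errors"`, where /tmp/survey.out is again only an illustrative path.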
| Comments |
| Comment by Peter Jones [ 18/May/12 ] |
|
Keith, could you please look into this one? Thanks, Peter |
| Comment by Keith Mannthey (Inactive) [ 26/Jul/12 ] |
|
Sorry for the delay. I don't quite understand the second issue, "2) echo client issue causing errors". Can you clarify what you mean? |
| Comment by Keith Mannthey (Inactive) [ 09/Aug/12 ] |
|
It looks as though the test itself is working correctly, but the /usr/bin/ part is not handling an ENOSPC error correctly. The systems are running out of disk space, but the /usr/bin/obdfilter-survey script does not pass the failure along. I have added a flag to make the survey itself stop on an error, and I am working on testing the change. |
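The general idea of passing a failure along can be sketched as a wrapper that preserves a non-zero exit status and also treats an ERROR marker in the output as a failure. This is not the flag described in this comment, whose name and behavior are not given here; run_checked is a hypothetical name used only for illustration, and the sketch assumes bash (for `local`).

```shell
# Hypothetical wrapper (illustrative only; not the change under test).
# Runs the given command, echoes its output, and returns non-zero if the
# command failed or if it printed an ERROR marker while exiting 0.
run_checked() {
    local out rc
    out=$("$@" 2>&1)      # capture stdout and stderr of the command
    rc=$?                 # exit status of the command itself
    printf '%s\n' "$out"
    if [ "$rc" -ne 0 ]; then
        return "$rc"      # pass the original failure along unchanged
    fi
    case "$out" in
        *ERROR*) return 1 ;;   # command exited 0 but reported an error
    esac
    return 0
}
```

With this in place, a call such as `run_checked /usr/bin/obdfilter-survey || exit 1` would stop the calling script as soon as the survey reports a problem.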
| Comment by Keith Mannthey (Inactive) [ 21/Aug/12 ] |
|
http://review.whamcloud.com/#change,3591 is looking good for acceptance. |
| Comment by Keith Mannthey (Inactive) [ 29/Aug/12 ] |
|
A patch that fixes the "all is green" issue is in master. Please reopen if there is still an error. |