[LU-1940] Test failure on test suite sanity, subtest test_118c Created: 14/Sep/12 Updated: 15/Feb/13 Resolved: 15/Feb/13 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.4.0 |
| Fix Version/s: | Lustre 2.4.0 |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Maloo | Assignee: | Hongchao Zhang |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | MB | ||
| Severity: | 3 | ||||||||
| Rank (Obsolete): | 4199 | ||||||||
| Description |
|
This issue was created by maloo for Oleg Drokin <green@whamcloud.com>.
This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/d98c0f7a-fe88-11e1-a707-52540035b04c.
The sub-test test_118c failed with the following error:
Info required for matching: sanity 118c |
| Comments |
| Comment by Ian Colle (Inactive) [ 19/Sep/12 ] |
|
https://maloo.whamcloud.com/test_sets/807ed662-0273-11e2-ab94-52540035b04c |
| Comment by Ian Colle (Inactive) [ 04/Oct/12 ] |
|
https://maloo.whamcloud.com/test_sets/ffd3d9f0-0e0b-11e2-bf2b-52540035b04c |
| Comment by Ian Colle (Inactive) [ 04/Oct/12 ] |
|
Hit this failure three times last night on three different patches. |
| Comment by Ian Colle (Inactive) [ 04/Oct/12 ] |
|
https://maloo.whamcloud.com/test_sets/8cf491dc-0e0a-11e2-91a3-52540035b04c |
| Comment by Ian Colle (Inactive) [ 04/Oct/12 ] |
|
https://maloo.whamcloud.com/test_sets/b5a029e6-0e07-11e2-bf2b-52540035b04c |
| Comment by Andreas Dilger [ 04/Oct/12 ] |
|
https://maloo.whamcloud.com/test_sets/8cf491dc-0e0a-11e2-91a3-52540035b04c |
| Comment by Andreas Dilger [ 19/Oct/12 ] |
|
https://maloo.whamcloud.com/sub_tests/594d6ba4-17f8-11e2-a41f-52540035b04c |
| Comment by Peng Tao [ 25/Oct/12 ] |
|
https://maloo.whamcloud.com/test_sets/38aa57a4-1ea1-11e2-8b41-52540035b04c |
| Comment by Keith Mannthey (Inactive) [ 26/Oct/12 ] |
|
Note that 30 is EROFS:

22:40:54:Lustre: DEBUG MARKER: == sanity test 118c: Fsync blocks on EROFS until dirty pages are flushed ============ 22:40:50 (1351143650)
22:41:05:LustreError: 11-0: an error occurred while communicating with 10.10.4.161@tcp. The ost_write operation failed with -30
22:41:05:LustreError: 2990:0:(osc_request.c:1689:osc_brw_redo_request()) @@@ redo for recoverable error -30 req@ffff880079ad4c00 x1416774108278040/t0(0) o4->lustre-OST0003-osc-ffff88007a8f6000@10.10.4.161@tcp:6/4 lens 488/192 e 0 to 0 dl 1351143702 ref 2 fl Interpret:R/0/0 rc -30/-30
22:41:05:LustreError: 11-0: an error occurred while communicating with 10.10.4.161@tcp. The ost_write operation failed with -30
22:41:05:LustreError: 11-0: an error occurred while communicating with 10.10.4.161@tcp. The ost_write operation failed with -30
22:41:05:LustreError: 11-0: an error occurred while communicating with 10.10.4.161@tcp. The ost_write operation failed with -30
22:41:06:LustreError: 11-0: an error occurred while communicating with 10.10.4.161@tcp. The ost_write operation failed with -30
22:41:06:LustreError: 2990:0:(osc_request.c:1931:brw_interpret()) lustre-OST0003-osc-ffff88007a8f6000: too many resent retries for object: 1118:0, rc = -30.
22:41:17:Lustre: DEBUG MARKER: /usr/sbin/lctl mark sanity test_118c: @@@@@@ FAIL: Multiop fsync failed, rc=30
22:41:17:Lustre: DEBUG MARKER: sanity test_118c: @@@@@@ FAIL: Multiop fsync failed, rc=30

We seem to be giving up (too many retries) before the system has become writable again. EROFS is the error returned when trying to write to a read-only filesystem. It seems this test is meant to make sure we properly block in this condition. More investigation is needed. |
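For readers who want to check the mapping from the return code in the log to an error name, here is a minimal Python sketch. It is not part of the Lustre test suite. The helper name describe_rc is hypothetical, and the numbering assumes Linux errno values, where 30 is EROFS.

```python
import errno
import os


def describe_rc(rc: int) -> str:
    """Map a kernel-style return code (negative errno) to its name and message.

    Hypothetical helper for reading logs; assumes Linux errno numbering.
    """
    code = abs(rc)
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name} ({os.strerror(code)})"


# rc = -30 is the value reported by the ost_write failures in the log.
print(describe_rc(-30))
```

The same helper accepts the positive rc=30 reported by the multiop failure line, since only the absolute value is used.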
| Comment by Hongchao Zhang [ 12/Nov/12 ] |
|
status update: it is still under investigation. |
| Comment by Hongchao Zhang [ 14/Nov/12 ] |
|
This bug is caused by the resend limit in OSC (the default value is 10) for recoverable errors (EIO, EROFS, ENOMEM,
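The effect of a fixed resend limit can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the OSC code: the function write_with_resend and its parameters are hypothetical, the limit of 10 is the default quoted above, and the set of recoverable errors contains only the three names that survive in the comment above (the original list is cut off).

```python
RESEND_LIMIT = 10  # default value quoted in the comment above
# Only the errors named in the comment; the original list is truncated.
RECOVERABLE = {"EIO", "EROFS", "ENOMEM"}


def write_with_resend(fails_before_success, error="EROFS", limit=RESEND_LIMIT):
    """Simulate a write that the server rejects a fixed number of times.

    Returns ("ok", resends) if the write eventually succeeds, or
    (error, resends) if the error is not recoverable or the limit is hit.
    Hypothetical model; the exact comparison used by the real client
    is an assumption.
    """
    resends = 0
    attempts = 0
    while True:
        attempts += 1
        if attempts > fails_before_success:
            return "ok", resends  # server has become writable again
        if error not in RECOVERABLE or resends >= limit:
            return error, resends  # give up and surface the error
        resends += 1


# The read-only condition clears quickly: the write succeeds after 3 resends.
print(write_with_resend(3))
# The read-only condition outlasts the limit: the caller sees EROFS.
print(write_with_resend(50))
```

In this model the second call corresponds to the failure seen in test_118c: the error is surfaced to fsync because the retries run out while the target is still read-only.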
| Comment by Hongchao Zhang [ 20/Nov/12 ] |
|
the patch is tracked at http://review.whamcloud.com/#change,4622 |
| Comment by Peter Jones [ 26/Nov/12 ] |
|
Landed for 2.4 |
| Comment by Nathaniel Clark [ 27/Nov/12 ] |
|
https://maloo.whamcloud.com/test_sets/df9e51ca-3899-11e2-8c55-52540035b04c |
| Comment by Hongchao Zhang [ 28/Nov/12 ] |
|
The new occurrence is still caused by the resend count. Normally there is a 1s interval between the fail_loc=OBD_FAIL_OST_EROFS
The extra patch is tracked at http://review.whamcloud.com/#change,4694 |
| Comment by Keith Mannthey (Inactive) [ 03/Jan/13 ] |
|
From December 30: 1 error out of the last 100 runs. https://maloo.whamcloud.com/test_sets/90b395d0-5319-11e2-908e-52540035b04c |
| Comment by Andreas Dilger [ 15/Feb/13 ] |
|
Patch 4694 was landed for 2.4.0 |