[LU-4331] Test timeout on recovery-small test_59 Created: 02/Dec/13 Updated: 11/Aug/15 Resolved: 11/Aug/15 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.6.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Maloo | Assignee: | WC Triage |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | zfs | ||
| Severity: | 3 |
| Rank (Obsolete): | 11852 |
| Description |
|
This issue was created by maloo for Nathaniel Clark <nathaniel.l.clark@intel.com>

This issue relates to the following test suite run:

The sub-test test_59 failed with the following error:
Info required for matching: recovery-small 59 |
| Comments |
| Comment by nasf (Inactive) [ 11/Jan/14 ] |
|
Another failure instance: https://maloo.whamcloud.com/test_sets/b9ca3f64-7aaf-11e3-a11f-52540035b04c |
| Comment by Bobbie Lind (Inactive) [ 02/May/14 ] |
|
Most tests seem to time out during a mount or unmount operation, as can be seen at http://review.whamcloud.com/#/c/9205/. I need to update and re-run the test timeouts, since it has been a while since I last ran this test. |
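For reference, a minimal sketch of re-running just this sub-test to check the mount/unmount timing, assuming a standard lustre/tests checkout (the path and any timeout tuning are assumptions; adjust for the local test configuration):

```sh
# Hypothetical reproduction sketch for recovery-small test_59.
cd lustre/tests            # path assumed; may be /usr/lib64/lustre/tests on installed nodes

# ONLY= restricts the suite to a single sub-test.
ONLY=59 sh recovery-small.sh

# If the mount/umount steps are slow (e.g. on ZFS-backed targets), the
# per-test timeout can be raised via the test-framework configuration
# before re-running; the exact variable depends on the local cfg files.
```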
| Comment by nasf (Inactive) [ 28/Jun/15 ] |
|
Another failure instance:

Test output:

== recovery-small test 59: Read cancel race on client eviction == 15:17:50 (1435443470)
Starting client: onyx-41vm6.onyx.hpdd.intel.com: -o user_xattr,flock onyx-41vm3@tcp:/lustre /mnt/lustre2
CMD: onyx-41vm6.onyx.hpdd.intel.com mkdir -p /mnt/lustre2
CMD: onyx-41vm6.onyx.hpdd.intel.com mount -t lustre -o user_xattr,flock onyx-41vm3@tcp:/lustre /mnt/lustre2
fail_loc=0x311
fail_loc=0

MDS console log:

23:19:05:Lustre: DEBUG MARKER: == recovery-small test 59: Read cancel race on client eviction == 15:17:50 (1435443470)
23:19:05:LustreError: 11-0: lustre-OST0005-osc-MDT0000: operation ost_connect to node 10.2.4.195@tcp failed: rc = -114
23:19:05:LustreError: Skipped 51 previous similar messages

OSS console log:

22:39:52:Lustre: DEBUG MARKER: == recovery-small test 59: Read cancel race on client eviction == 15:17:50 (1435443470)
22:39:52:Lustre: lustre-OST0005: Export ffff880065304000 already connecting from 10.2.4.197@tcp
22:39:52:Lustre: Skipped 71 previous similar messages
22:39:52:Lustre: 5060:0:(service.c:1336:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply
22:39:52: req@ffff8800440d3980 x1505169144091928/t0(0) o4->512c404f-3996-b0b6-e5c0-201d629dc366@10.2.4.197@tcp:591/0 lens 608/448 e 23 to 0 dl 1435443586 ref 2 fl Interpret:H/0/0 rc 0/0
22:39:52:Lustre: lustre-OST0005: Client 512c404f-3996-b0b6-e5c0-201d629dc366 (at 10.2.4.197@tcp) reconnecting
22:39:52:Lustre: 5061:0:(service.c:1336:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply
22:39:52: req@ffff880068f9e680 x1505169144119492/t0(0) o4->512c404f-3996-b0b6-e5c0-201d629dc366@10.2.4.197@tcp:643/0 lens 608/448 e 8 to 0 dl 1435443638 ref 2 fl Interpret:/0/0 rc 0/0
22:39:52:Lustre: 5061:0:(service.c:1336:ptlrpc_at_send_early_reply()) Skipped 3 previous similar messages
22:39:52:Lustre: lustre-OST0005: Export ffff880065304000 already connecting from 10.2.4.197@tcp
22:39:52:Lustre: Skipped 203 previous similar messages |
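The `fail_loc=0x311` / `fail_loc=0` lines in the test output are Lustre's fault-injection mask being armed and cleared around the read-cancel race. A minimal sketch of how that knob is driven outside the test script, assuming `lctl set_param` is available on the node (the surrounding steps are placeholders, not the actual test_59 logic):

```sh
# Arm the fault point used by test_59 before triggering the read/eviction race.
lctl set_param fail_loc=0x311

# ... start the racing read and evict the client here (test-specific steps) ...

# Disarm the fault point again so later tests are unaffected.
lctl set_param fail_loc=0
```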
| Comment by Andreas Dilger [ 11/Aug/15 ] |
|
This hasn't really been failing since 2014-06-27, except for possibly one hit every three months, which might be totally unrelated. It is no longer possible to determine whether these failures are related, because the old test logs are not available and the bug contains no description of the failure beyond the fact that test_59 timed out. Closing this; we can re-open it or create a new one if this test starts failing again. |