[LU-801] Test failure on test suite liblustre, subtest test_1 Created: 28/Oct/11 Updated: 19/Jan/12 Resolved: 19/Jan/12 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Maloo | Assignee: | Chris Gearing (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Issue Links: |
| Severity: | 3 |
| Rank (Obsolete): | 6534 |
| Description |
|
This issue was created by maloo for Chris Gearing <chris@whamcloud.com>.
This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/b2aec26e-00e9-11e1-bd0b-52540025f9af.
See errors in dmesg for client vm1.
The sub-test test_1 failed with the following error:
Info required for matching: liblustre 1 |
| Comments |
| Comment by Johann Lombardi (Inactive) [ 24/Nov/11 ] |
|
I looked at the logs and it seems that Lustre was not running on the MDS, which is why liblustre could not connect to the MGS.

Lustre: DEBUG MARKER: == lfsck lfsck.sh test complete, duration 131 sec ==================================================== 19:20:53 (1319682053)
Lustre: Modifying parameter lustre-MDT0000.mdd.quota_type in log lustre-MDT0000
Lustre: Skipped 7 previous similar messages
LustreError: 3562:0:(ldlm_request.c:1173:ldlm_cli_cancel_req()) Got rc -108 from cancel RPC: canceling anyway
LustreError: 3562:0:(ldlm_request.c:1173:ldlm_cli_cancel_req()) Skipped 1 previous similar message
LustreError: 3562:0:(ldlm_request.c:1800:ldlm_cli_cancel_list()) ldlm_cli_cancel_list: -108
LustreError: 3562:0:(ldlm_request.c:1800:ldlm_cli_cancel_list()) Skipped 1 previous similar message
LustreError: 2249:0:(mgs_handler.c:804:mgs_handle()) lustre_mgs: operation 251 on unconnected MGS
LustreError: 2249:0:(ldlm_lib.c:2163:target_send_reply_msg()) @@@ processing error (-107) req@ffff81003ac8e050 x1383767884708251/t0(0) o-1-><?>@<?>:0/0 lens 192/0 e 0 to 0 dl 1319682066 ref 1 fl Interpret:/ffffffff/ffffffff rc -107/-1
LustreError: 11-0: an error occurred while communicating with 0@lo. The mgs_disconnect operation failed with -107
LustreError: Skipped 2 previous similar messages
Lustre: MGS has stopped.
Lustre: server umount lustre-MDT0000 complete
Lustre: DEBUG MARKER: -----============= acceptance-small: liblustre ============----- Wed Oct 26 19:21:50 PDT 2011
Lustre: DEBUG MARKER: == liblustre test 1: liblustre sanity ================================================================ 19:21:55 (1319682115)
SysRq : Show State

So the MGS/MDS was not remounted before the liblustre tests ran. I also checked the sysrq-t output and there are indeed no MDS threads running. liblustre sanity definitely requires the filesystem to be up and running before it is invoked. Is it an autotest issue? |
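The check described in the comment above (a server umount logged just before the liblustre marker) could be automated roughly as follows. This is only a sketch: the function name `umounts_before_markers` and the `SAMPLE` list are hypothetical, and the two regular expressions are derived solely from the console lines quoted in this ticket. It does not try to detect a later remount, because no mount message appears in the excerpt to model one on.

```python
import re

# Patterns derived only from the console lines quoted in this ticket (assumption:
# other Lustre versions may word these messages differently).
MARKER_RE = re.compile(r"^Lustre: DEBUG MARKER: (.*)$")
UMOUNT_RE = re.compile(r"^Lustre: server umount (\S+) complete$")


def umounts_before_markers(lines):
    """Pair each server umount with the first DEBUG MARKER that follows it.

    Returns a list of (target, marker_text) tuples; marker_text is None when
    the log ends before another marker is seen.
    """
    pending = []   # targets unmounted since the last marker
    results = []
    for raw in lines:
        line = raw.strip()
        umount = UMOUNT_RE.match(line)
        if umount:
            pending.append(umount.group(1))
            continue
        marker = MARKER_RE.match(line)
        if marker:
            results.extend((target, marker.group(1)) for target in pending)
            pending = []
    results.extend((target, None) for target in pending)
    return results


# Hypothetical input: a few lines copied from the excerpt in this ticket.
SAMPLE = [
    "Lustre: DEBUG MARKER: == lfsck lfsck.sh test complete, duration 131 sec "
    "==================================================== 19:20:53 (1319682053)",
    "Lustre: MGS has stopped.",
    "Lustre: server umount lustre-MDT0000 complete",
    "Lustre: DEBUG MARKER: -----============= acceptance-small: liblustre "
    "============----- Wed Oct 26 19:21:50 PDT 2011",
    "Lustre: DEBUG MARKER: == liblustre test 1: liblustre sanity "
    "================================================================ "
    "19:21:55 (1319682115)",
]

for target, marker in umounts_before_markers(SAMPLE):
    print(f"{target} was unmounted before marker: {marker}")
```

A hit from this scan only shows that a target was unmounted ahead of a suite start; whether it was remounted afterwards still has to be confirmed from the rest of the log or the sysrq-t output.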
| Comment by Johann Lombardi (Inactive) [ 24/Nov/11 ] |
|
Chris, Yujian, any thoughts? |
| Comment by Chris Gearing (Inactive) [ 19/Dec/11 ] |
|
I don't have any useful input, I'm afraid. I did look at the logs before raising the bug. |
| Comment by Peter Jones [ 04/Jan/12 ] |
|
Minh, are you able to comment on this one? Thanks, Peter |
| Comment by Minh Diep [ 09/Jan/12 ] |
|
I have looked at the log and the past lfsck log. We don't see a Lustre unmount at the end of lfsck. I don't know when this log showed that. I even ran it manually and saw no umount at all. Reassigning this back to Chris, since this might relate to how autotest runs lfsck. |
| Comment by Andreas Dilger [ 19/Jan/12 ] |
|
Closing as a duplicate of |