[LU-6915] sanity-lfsck test 31h fail: “(3) unexpected status” Created: 27/Jul/15 Updated: 11/Feb/16 Resolved: 11/Feb/16 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.8.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | James Nunez (Inactive) | Assignee: | nasf (Inactive) |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | lfsck | ||
| Environment: |
review-dne-part-2 in autotest |
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
sanity-lfsck test 31h fails with “(3) unexpected status”. Logs are at: https://testing.hpdd.intel.com/test_sets/ef98233e-3293-11e5-8214-5254006e85c2

From the LFSCK namespace output, we see:

20:24:57:status: partial
20:24:57:flags: incomplete
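For reference, the status the test asserts on can be read directly from the MDT with lctl; a minimal sketch (the device name lustre-MDT0000 is illustrative):

    # Print the status/flags lines of the namespace LFSCK that the test checks.
    lctl get_param -n mdd.lustre-MDT0000.lfsck_namespace | grep -E '^(status|flags):'
    # A clean run reports "status: completed"; the failing run above instead
    # shows "status: partial" with "flags: incomplete".
|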
| Comments |
| Comment by James Nunez (Inactive) [ 02/Nov/15 ] |
|
Another failure on master:

Another failure on master for sanity-lfsck test_31g: |
| Comment by Jian Yu [ 02/Dec/15 ] |
|
More instances on master: |
| Comment by nasf (Inactive) [ 10/Feb/16 ] |
|
Another failure instance: |
| Comment by nasf (Inactive) [ 11/Feb/16 ] |
00000020:00000080:0.0:1448881260.165609:0:29625:0:(class_obd.c:229:class_handle_ioctl()) cmd = c00866e6
00000004:00000080:0.0:1448881260.165616:0:29625:0:(mdt_handler.c:5587:mdt_iocontrol()) handling ioctl cmd 0xc00866e6
00100000:10000000:0.0:1448881260.166859:0:29625:0:(lfsck_namespace.c:3798:lfsck_namespace_reset()) lustre-MDT0000-osd: namespace LFSCK reset: rc = 0
00100000:10000000:1.0:1448881260.167039:0:29627:0:(osd_scrub.c:652:osd_scrub_prep()) lustre-MDT0000: OI scrub prep, flags = 0x46
00100000:10000000:1.0:1448881260.167043:0:29627:0:(osd_scrub.c:278:osd_scrub_file_reset()) lustre-MDT0000: reset OI scrub file, old flags = 0x0, add flags = 0x0
00100000:10000000:1.0:1448881260.167157:0:29628:0:(lfsck_engine.c:1562:lfsck_assistant_engine()) lustre-MDT0000-osd: lfsck_namespace LFSCK assistant thread start
00100000:10000000:1.0:1448881260.167179:0:29626:0:(lfsck_namespace.c:4041:lfsck_namespace_prep()) lustre-MDT0000-osd: namespace LFSCK prep done, start pos [1, [0x0:0x0:0x0], 0x0]: rc = 0
00100000:10000000:1.0:1448881260.167185:0:29627:0:(osd_scrub.c:1498:osd_scrub_main()) lustre-MDT0000: OI scrub start, flags = 0x46, pos = 12
00100000:10000000:1.0:1448881260.167673:0:29626:0:(lfsck_namespace.c:3940:lfsck_namespace_checkpoint()) lustre-MDT0000-osd: namespace LFSCK checkpoint at the pos [12, [0x0:0x0:0x0], 0x0]: rc = 0
00100000:10000000:1.0:1448881260.167676:0:29626:0:(lfsck_engine.c:1046:lfsck_master_engine()) LFSCK entry: oit_flags = 0x60000, dir_flags = 0x8006, oit_cookie = 12, dir_cookie = 0x0, parent = [0x0:0x0:0x0], pid = 29626
00000100:00100000:0.0:1448881260.167737:0:29625:0:(client.c:1530:ptlrpc_send_new_req()) Sending RPC pname:cluuid:pid:xid:nid:opc lctl:lustre-MDT0000-mdtlov_UUID:29625:1519247244731748:10.2.4.167@tcp:1101
00000100:00100000:0.0:1448881260.167775:0:29625:0:(client.c:1530:ptlrpc_send_new_req()) Sending RPC pname:cluuid:pid:xid:nid:opc lctl:lustre-MDT0000-mdtlov_UUID:29625:1519247244731752:10.2.4.167@tcp:1101
00000100:00100000:0.0:1448881260.167784:0:29625:0:(client.c:1530:ptlrpc_send_new_req()) Sending RPC pname:cluuid:pid:xid:nid:opc lctl:lustre-MDT0000-mdtlov_UUID:29625:1519247244731756:10.2.4.167@tcp:1101
00000100:00100000:0.0:1448881260.167790:0:29625:0:(client.c:2210:ptlrpc_set_wait()) set ffff880059e146c0 going to sleep for 6 seconds
00100000:10000000:0.0:1448881260.170033:0:29625:0:(lfsck_lib.c:2031:lfsck_async_interpret_common()) lustre-MDT0000-osd: fail to notify MDT 3 for lfsck_namespace start: rc = -114
...

The logs show that a former LFSCK instance had not yet finished when the new LFSCK started (rc = -114 is -EALREADY: MDT0003 still had an LFSCK instance running), so only some of the MDTs joined the current LFSCK run, and the final LFSCK status was therefore "partial", not "completed". We should make sure all LFSCK instances have completed before the next LFSCK run starts. We already have a solution in the patch http://review.whamcloud.com/#/c/17406/
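The fix direction can be sketched at the test-script level as follows; this is a minimal illustration of "wait for completion before the next run", not the actual content of patch 17406 (the function name, MDT list, and timeout are all illustrative):

    # Poll every MDT until its namespace LFSCK reports "completed", so that a
    # new LFSCK run is never started while a previous one is still in flight.
    wait_all_lfsck_completed() {
        local timeout=${1:-90} dev status elapsed
        for dev in lustre-MDT0000 lustre-MDT0001 lustre-MDT0002 lustre-MDT0003; do
            status=""
            elapsed=0
            while [ "$elapsed" -lt "$timeout" ]; do
                status=$(lctl get_param -n mdd.$dev.lfsck_namespace |
                         awk '/^status:/ { print $2 }')
                [ "$status" = "completed" ] && break
                sleep 1
                elapsed=$((elapsed + 1))
            done
            [ "$status" = "completed" ] || return 1
        done
        return 0
    }
|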
| Comment by nasf (Inactive) [ 11/Feb/16 ] |
|
It is another failure instance of |