LU-6915

sanity-lfsck test 31h fail: “(3) unexpected status”

Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Minor
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.8.0
    • Environment: review-dne-part-2 in autotest
    • Severity: 3

    Description

      sanity-lfsck test 31h fails with “(3) unexpected status”. Logs are at: https://testing.hpdd.intel.com/test_sets/ef98233e-3293-11e5-8214-5254006e85c2

      From the LFSCK namespace output, we see:

      20:24:57:status: partial
      20:24:57:flags: incomplete
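The test's check essentially boils down to parsing this `status:` line and comparing it against the expected value. A minimal sketch of that parsing step, assuming output of the form shown above (console timestamps stripped) — the variable names here are illustrative, not taken from the test script:

```shell
#!/bin/bash
# Sample LFSCK namespace output, as captured in the failing run above.
output='status: partial
flags: incomplete'

# Extract the value of each field. Test 31h fails with "(3) unexpected
# status" when status is not the expected value (e.g. "completed").
status=$(printf '%s\n' "$output" | awk -F': ' '/^status:/ { print $2 }')
flags=$(printf '%s\n' "$output" | awk -F': ' '/^flags:/ { print $2 }')
echo "status=$status flags=$flags"
```

In the real test suite the output would come from something like `lctl get_param -n mdd.<fsname>-MDT0000.lfsck_namespace` rather than a literal string.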
      

            Attachments

            Issue Links

            Activity


            yong.fan nasf (Inactive) added a comment - It is another failure instance of LU-7256.
            00000020:00000080:0.0:1448881260.165609:0:29625:0:(class_obd.c:229:class_handle_ioctl()) cmd = c00866e6
            00000004:00000080:0.0:1448881260.165616:0:29625:0:(mdt_handler.c:5587:mdt_iocontrol()) handling ioctl cmd 0xc00866e6
            00100000:10000000:0.0:1448881260.166859:0:29625:0:(lfsck_namespace.c:3798:lfsck_namespace_reset()) lustre-MDT0000-osd: namespace LFSCK reset: rc = 0
            00100000:10000000:1.0:1448881260.167039:0:29627:0:(osd_scrub.c:652:osd_scrub_prep()) lustre-MDT0000: OI scrub prep, flags = 0x46
            00100000:10000000:1.0:1448881260.167043:0:29627:0:(osd_scrub.c:278:osd_scrub_file_reset()) lustre-MDT0000: reset OI scrub file, old flags = 0x0, add flags = 0x0
            00100000:10000000:1.0:1448881260.167157:0:29628:0:(lfsck_engine.c:1562:lfsck_assistant_engine()) lustre-MDT0000-osd: lfsck_namespace LFSCK assistant thread start
            00100000:10000000:1.0:1448881260.167179:0:29626:0:(lfsck_namespace.c:4041:lfsck_namespace_prep()) lustre-MDT0000-osd: namespace LFSCK prep done, start pos [1, [0x0:0x0:0x0], 0x0]: rc = 0
            00100000:10000000:1.0:1448881260.167185:0:29627:0:(osd_scrub.c:1498:osd_scrub_main()) lustre-MDT0000: OI scrub start, flags = 0x46, pos = 12
            00100000:10000000:1.0:1448881260.167673:0:29626:0:(lfsck_namespace.c:3940:lfsck_namespace_checkpoint()) lustre-MDT0000-osd: namespace LFSCK checkpoint at the pos [12, [0x0:0x0:0x0], 0x0]: rc = 0
            00100000:10000000:1.0:1448881260.167676:0:29626:0:(lfsck_engine.c:1046:lfsck_master_engine()) LFSCK entry: oit_flags = 0x60000, dir_flags = 0x8006, oit_cookie = 12, dir_cookie = 0x0, parent = [0x0:0x0:0x0], pid = 29626
            00000100:00100000:0.0:1448881260.167737:0:29625:0:(client.c:1530:ptlrpc_send_new_req()) Sending RPC pname:cluuid:pid:xid:nid:opc lctl:lustre-MDT0000-mdtlov_UUID:29625:1519247244731748:10.2.4.167@tcp:1101
            00000100:00100000:0.0:1448881260.167775:0:29625:0:(client.c:1530:ptlrpc_send_new_req()) Sending RPC pname:cluuid:pid:xid:nid:opc lctl:lustre-MDT0000-mdtlov_UUID:29625:1519247244731752:10.2.4.167@tcp:1101
            00000100:00100000:0.0:1448881260.167784:0:29625:0:(client.c:1530:ptlrpc_send_new_req()) Sending RPC pname:cluuid:pid:xid:nid:opc lctl:lustre-MDT0000-mdtlov_UUID:29625:1519247244731756:10.2.4.167@tcp:1101
            00000100:00100000:0.0:1448881260.167790:0:29625:0:(client.c:2210:ptlrpc_set_wait()) set ffff880059e146c0 going to sleep for 6 seconds
            00100000:10000000:0.0:1448881260.170033:0:29625:0:(lfsck_lib.c:2031:lfsck_async_interpret_common()) lustre-MDT0000-osd: fail to notify MDT 3 for lfsck_namespace start: rc = -114
            ...
            

            The logs show that a former LFSCK instance had not yet finished when the new LFSCK started: the notification to MDT 3 failed with rc = -114, which on Linux is -EALREADY ("operation already in progress"). As a result, only some of the MDTs joined the current LFSCK run, so the final LFSCK status was "partial" rather than "completed".

            We should ensure that all LFSCK instances have completed before the next LFSCK run starts. We already have a solution in the patch http://review.whamcloud.com/#/c/17406/
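The fix amounts to waiting for any previous LFSCK instance to report a terminal status before launching the next run. A minimal sketch of such a wait loop — `wait_lfsck_status` is a hypothetical helper, not the code from the patch above; the real status source would be something like `lctl get_param -n mdd.<fsname>-MDT0000.lfsck_namespace`:

```shell
#!/bin/bash
# Poll a status command until the LFSCK status line reaches the expected
# value, or give up after $timeout seconds. $status_cmd is any command
# printing "status: <value>" output in the format shown in this ticket.
wait_lfsck_status() {
    local status_cmd="$1"    # e.g. "lctl get_param -n mdd.<tgt>.lfsck_namespace"
    local expected="$2"      # e.g. "completed"
    local timeout="${3:-32}"
    local i status

    for ((i = 0; i < timeout; i++)); do
        status=$($status_cmd 2>/dev/null |
                 awk '/^status:/ { print $2; exit }')
        [ "$status" = "$expected" ] && return 0
        sleep 1
    done
    echo "lfsck status is '$status', expected '$expected'" >&2
    return 1
}
```

Running such a wait against every MDT before starting the next LFSCK would avoid the -EALREADY failure seen in the log, since no MDT would still have an earlier instance in flight.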

            yong.fan nasf (Inactive) added a comment -
            yong.fan nasf (Inactive) added a comment - Another failure instance: https://testing.hpdd.intel.com/test_sets/fdb5d7b8-cb18-11e5-be8d-5254006e85c2
            yujian Jian Yu added a comment - More instance on master: https://testing.hpdd.intel.com/test_sets/79ea4116-9784-11e5-b72a-5254006e85c2
            jamesanunez James Nunez (Inactive) added a comment - edited

            Another failure on master:
            2015-10-31 04:03:03 - https://testing.hpdd.intel.com/test_sets/0248405c-7fbc-11e5-bf12-5254006e85c2

            Another failure on master for sanity-lfsck test_31g:
            2015-11-02 19:16:01 - https://testing.hpdd.intel.com/test_sets/fdb229ae-81cd-11e5-af7b-5254006e85c2


            People

              yong.fan nasf (Inactive)
              jamesanunez James Nunez (Inactive)
              Votes: 0
              Watchers: 3
