Lustre / LU-5724

IR recovery doesn't behave properly with Lustre 2.5

Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Critical
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.5.3
    • Environment: MDS server running RHEL 6.5 with the ORNL 2.5.3 branch plus about 12 patches.
    • Severity: 2
    • 16076

    Description

      Today we experienced a hardware failure with our MDS. The MDS rebooted and then came back. We restarted the MDS, but IR behaved strangely. Four clients got evicted, but when the recovery timer counted down to zero, IR restarted all over again. Then, once the timer got into the 700-second range, it started to go up. It did this a few times before the timer was allowed to run out. Once the timer did finally reach zero, the recovery state was still reported as being in recovery. It remained this way for several more minutes before finally reaching a recovered state. In all, it took 54 minutes to recover.


          Activity

            yujian Jian Yu added a comment -

            Closing this ticket as a duplicate of LU-4119.

            yujian Jian Yu added a comment -

            Large-scale testing passed.

            Hi James, can we close this ticket now?

            yujian Jian Yu added a comment -

            With the patch for LU-4119, the recovery issue did not occur in small-scale testing at ORNL. Large-scale testing will be performed.

            hongchao.zhang Hongchao Zhang added a comment - - edited

            Is the failover mode the same for both tests, on Dec 31, 2014 and on Jan 14, 2015? That is, a separate node running only the MGS, to which the MDS, OSSs, and client nodes connect, with the MDS and OSSs failed over together?


            simmonsja James A Simmons added a comment -

            No, the MGS is left up. We failed over the MDS and OSS together.

            hongchao.zhang Hongchao Zhang added a comment -

            Are both the MGS and MDS failed over in this test?

            The IR status will be set to IR_STARTUP after the MGS is started, and will change to IR_FULL after "ir_timeout" seconds
            (the default is OBD_IR_MGS_TIMEOUT = 4 * obd_timeout). A target (MDT or OST) registered with the MGS will only be marked
            "LDD_F_IR_CAPABLE" if the IR status is IR_FULL; otherwise "IR" will be printed as "DISABLED".

            On the client side, imperative_recovery will be marked "Enabled" if the connection with the server supports recovery
            (i.e., imp->imp_connect_data.ocd_connect_flags & OBD_CONNECT_IMP_RECOV is set).
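
            These states can also be checked from the command line; a minimal sketch follows (mgc.*.ir_state is the same file that is cat'ed from /proc later in this thread, while the fsname "atlastds" and the mgs.MGS.ir_timeout parameter name are assumptions to adjust for your setup):

            # On the MGS: per-filesystem status, including imperative recovery state
            # ("atlastds" stands in for the real fsname).
            lctl get_param mgs.MGS.live.atlastds
            # On any node with an MGC (MDS, OSS, or client): the current IR state.
            lctl get_param mgc.*.ir_state
            # ir_timeout (assumed parameter name; the default is 4 * obd_timeout,
            # and obd_timeout itself is exposed as the top-level "timeout" parameter).
            lctl get_param mgs.MGS.ir_timeout
            lctl get_param timeout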

            simmonsja James A Simmons added a comment - - edited

            Testing failover, we see the following with just the MDS being failed over:

            Every 1.0s: cat recovery_status    Wed Jan 14 14:11:11 2015

            status: COMPLETE
            recovery_start: 1421262582
            recovery_duration: 60
            completed_clients: 82/82
            replayed_requests: 0
            last_transno: 90194315377
            VBR: DISABLED
            IR: DISABLED

            All clients are 2.5+, so there should be no reason for IR to be disabled. Can you reproduce this problem on your side?

            On the MGS we see the following during failover:

            [root@atlas-tds-mds1 MGC10.36.226.79@o2ib]# cat ir_state
            imperative_recovery: ENABLED
            client_state:

            { client: atlastds-MDT0000, nidtbl_version: 957 }
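            For reference, the "Every 1.0s: cat recovery_status" header above is watch(1) output. A minimal sketch of the same kind of monitoring through lctl (the wildcards resolve to the actual target names, which will differ per system):

            # On the MDS: watch MDT recovery progress once per second.
            watch -n 1 'lctl get_param mdt.*.recovery_status'
            # On an OSS: the equivalent for OST targets.
            watch -n 1 'lctl get_param obdfilter.*.recovery_status'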

            hongchao.zhang Hongchao Zhang added a comment -

            Sorry for the misunderstanding! Yes, Lustre supports failing over both the MDT and OSS.
            yujian Jian Yu added a comment -

            In the Lustre test suite, the following sub-tests in insanity.sh test failing the MDS and OSS at the same time:

            run_test 2 "Second Failure Mode: MDS/OST `date`"
            run_test 4 "Fourth Failure Mode: OST/MDS `date`"
            

            The basic test steps for sub-test 2 are:

            fail MDS
            fail OSS
            start OSS
            start MDS
            

            And the basic test steps for sub-test 4 are:

            fail OSS
            fail MDS
            start OSS
            start MDS
            

            Here is the insanity test report for Lustre b2_5 build #107: https://testing.hpdd.intel.com/test_sets/bfd812b0-8a4d-11e4-a10b-5254006e85c2
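
            For anyone reproducing these failure modes, a minimal sketch (assuming a lustre/tests checkout with cfg/local.sh already describing the MDS, OSS, and client nodes):

            # Run only insanity.sh sub-tests 2 and 4, the combined MDS/OSS failure modes.
            cd lustre/tests
            ONLY="2 4" sh insanity.sh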

            green Oleg Drokin added a comment -

            I think what Hongchao is trying to say is that when the MDS and OST both go down, then since the MDS is a client of the OST, the OST recovery can never complete because of the missing client.

            But on the other hand, we took two steps to help with this. First, the MDS client UUID should always be the same, so even after a restart it should still be allowed to reconnect as the old known client (this assumes it actually got up and into a reconnecting state in time for OST recovery; if your MDS takes ages to reboot, for example, it might miss this window, especially if it's a shortened window thanks to IR).

            Second, we have VBR to deal with missing clients during recovery, which is especially easy with the MDT client, since it never has any outstanding uncommitted transactions to replay.


            People

              Assignee: hongchao.zhang Hongchao Zhang
              Reporter: simmonsja James A Simmons
              Votes: 0
              Watchers: 16
