Lustre / LU-3929

2.1.6->2.4.1 rolling upgrade: lustre-MDT0000: recovery is timed out, evict stale exports

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.5.1
    • Affects Version/s: Lustre 2.4.0, Lustre 2.4.1, Lustre 2.5.0
    • 3
    • 10379

    Description

      While performing a rolling upgrade from Lustre 2.1.6 to 2.4.1 RC2, upgrading one node type at a time along the path OSS->MDS->Client, the test failed after the MDS was upgraded:

      Starting the MDS service on fat-amd-3...
      ----------------
      fat-amd-3
      ----------------
      debug=-1
      subsystem_debug=all -lnet -lnd -pinger
      debug_mb=100
      pdsh -l root -t 100 -S -w fat-amd-3 "mkdir -p /mnt/mds1 && mount -t lustre -o user_xattr /dev/sdc1 /mnt/mds1"
      Waiting 895 secs for fat-amd-3 recovery done. status: RECOVERING
      <~snip~>
      Waiting 5 secs for fat-amd-3 recovery done. status: RECOVERING
      Waiting 0 secs for fat-amd-3 recovery done. status: RECOVERING
      fat-amd-3 recovery not done in 900 sec. status: RECOVERING
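      The harness output above comes from a fixed-budget polling loop. It re-reads the MDT recovery status every 5 seconds and gives up once 900 seconds have been used. The Python sketch below reproduces that loop under stated assumptions and is not the harness's own code. `get_status` is a hypothetical stand-in for reading the `status:` field of the MDT recovery_status on the MDS. The sketch also assumes that a finished recovery reports `COMPLETE`.

```python
import time
from typing import Callable


def wait_recovery_done(
    node: str,
    get_status: Callable[[], str],
    max_wait: int = 900,
    interval: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll get_status() until it returns "COMPLETE" or max_wait seconds pass.

    get_status is a hypothetical callable returning the 'status:' field of the
    MDT recovery_status (e.g. "RECOVERING"). Treating "COMPLETE" as the
    finished state is an assumption of this sketch.
    """
    status = get_status()
    if status == "COMPLETE":
        return True
    waited = 0
    while waited < max_wait:
        sleep(interval)
        waited += interval
        status = get_status()
        if status == "COMPLETE":
            return True
        # Same shape as the harness messages: "Waiting 895 secs ... status: RECOVERING"
        print(f"Waiting {max_wait - waited} secs for {node} recovery done. status: {status}")
    print(f"{node} recovery not done in {max_wait} sec. status: {status}")
    return False


if __name__ == "__main__":
    # Simulate this ticket's failure: the MDT never leaves RECOVERING.
    ok = wait_recovery_done("fat-amd-3", lambda: "RECOVERING", max_wait=15, sleep=lambda s: None)
    print("recovered" if ok else "timed out")
```

      Because `sleep` is injected, the loop can be exercised without waiting in real time.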
      

      On the MDS fat-amd-3, "lctl get_param -n ..recovery_status" showed:

      ----------------
      fat-amd-3
      ----------------
      status: RECOVERING
      recovery_start: 1378874775
      time_remaining: 0
      connected_clients: 2/4
      req_replay_clients: 0
      lock_repay_clients: 0
      completed_clients: 2
      evicted_clients: 0
      replayed_requests: 0
      queued_requests: 0
      next_transno: 4294967297
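      Reading these fields together shows where recovery is stuck. The MDT expected 4 clients, but only 2 connected and completed, and none were evicted. `time_remaining` is already 0 while `status` is still RECOVERING. The Python sketch below parses the `key: value` text shown above and flags that pattern. The field names come from this output. The `diagnose` heuristic is a hypothetical illustration for this report, not a Lustre tool.

```python
from typing import Dict


def parse_recovery_status(text: str) -> Dict[str, str]:
    """Split 'key: value' lines into a dict; lines without a colon are skipped."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def diagnose(fields: Dict[str, str]) -> str:
    """Hypothetical heuristic: recovery still running after its timer ran out."""
    connected, expected = (int(n) for n in fields["connected_clients"].split("/"))
    missing = expected - connected
    if (
        fields.get("status") == "RECOVERING"
        and fields.get("time_remaining") == "0"
        and missing > 0
    ):
        return (
            f"stalled: timer expired, {missing} of {expected} clients never "
            f"reconnected, evicted_clients={fields.get('evicted_clients', '?')}"
        )
    return "no stalled-recovery pattern detected"


if __name__ == "__main__":
    report = """\
status: RECOVERING
recovery_start: 1378874775
time_remaining: 0
connected_clients: 2/4
completed_clients: 2
evicted_clients: 0
"""
    print(diagnose(parse_recovery_status(report)))
```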
      

      The console log on the MDS showed:

      Lustre: lustre-MDT0000: Will be in recovery for at least 1:00, or until 4 clients reconnect
      Lustre: lustre-MDT0000: recovery is timed out, evict stale exports
      Lustre: lustre-MDT0000: disconnecting 2 stale clients
      Lustre: lustre-MDT0000: recovery is timed out, evict stale exports
      

      Maloo reports:
      https://maloo.whamcloud.com/test_sets/d91a2b68-1aa1-11e3-88ff-52540035b04c
      https://maloo.whamcloud.com/test_sets/dae9450c-1a86-11e3-8ceb-52540035b04c

      The same failure also occurred during a rolling upgrade from Lustre 2.1.6 to 2.4.0:
      https://maloo.whamcloud.com/test_sets/c70af506-1ab5-11e3-8898-52540035b04c

          Activity

            pjones Peter Jones added a comment -

            Landed for 2.5.1


            sebastien.buisson Sebastien Buisson (Inactive) added a comment -

            Hi,

            Here is the test I carried out:

              • full file system installed with stock 2.1.6
              • upgrade to 2.4.1 + patch http://review.whamcloud.com/8328 with the path OSS->MDS->Client

            It went off smoothly. So I confirm that the master patch is enough. And, as Oleg explained, having the patch in the target version simplifies the upgrade.

            Cheers,
            Sebastien.
            green Oleg Drokin added a comment -

            Yes, the master patch alone should be enough to allow upgrades from unpatched 2.1 OSTs (i.e. those that do not have the 8086 patch applied). Can you give such a combination a try, please?

            We believe it's a better way, since it saves you the extra step of upgrading all your OSTs to 2.1.6+patch before you can update your MDS to 2.4+, and then updating your OSTs again to 2.4+ (which is kind of overkill).
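            The upgrade trade-off discussed in these comments can be written down as a small model. The Python sketch below encodes only what this ticket reports. A rolling OSS->MDS->Client upgrade from 2.1.6 recovered cleanly when the old OSTs carried change 8086 (the b2_1 patch), or when the 2.4-based target carried change 8328 (the master patch). It is a hypothetical planning aid, not Lustre logic. The function names and step labels are illustrative.

```python
from typing import List, Set


def mds_recovery_expected_ok(old_ost_patches: Set[str], target_patches: Set[str]) -> bool:
    """True if this ticket reports a clean MDS recovery for the combination.

    old_ost_patches: Gerrit change numbers applied to the 2.1 OSTs.
    target_patches: change numbers included in the 2.4-based target build.
    """
    return "8086" in old_ost_patches or "8328" in target_patches


def upgrade_plan(target_has_master_patch: bool) -> List[str]:
    """Upgrade steps for a 2.1.6 system, one node type at a time."""
    steps: List[str] = []
    if not target_has_master_patch:
        # Without the fix in the target, the OSTs are first patched in place on 2.1.6.
        steps.append("OSS: 2.1.6 -> 2.1.6 + 8086")
    # Every plan then upgrades OSS, MDS, and clients to the target in turn.
    steps += ["OSS: -> target", "MDS: -> target", "Client: -> target"]
    return steps


if __name__ == "__main__":
    for with_master in (False, True):
        plan = upgrade_plan(with_master)
        print(f"master patch in target: {with_master}, {len(plan)} steps")
        for step in plan:
            print("  " + step)
```

            The extra first step in the b2_1-only plan is the OSS upgrade that the master patch makes unnecessary.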

            sebastien.buisson Sebastien Buisson (Inactive) added a comment -

            Do you mean the master patch alone would be enough to successfully upgrade from 2.1 with the path OSS->MDS->Client?

            hongchao.zhang Hongchao Zhang added a comment -

            The patch against master could allow a wider range of older Lustre versions to upgrade to the new version.

            sebastien.buisson Sebastien Buisson (Inactive) added a comment -

            Hi, I have just tested patch http://review.whamcloud.com/#/c/8086/ for b2_1, and it works fine: a rolling upgrade from Lustre 2.1.6 plus this patch to 2.4.1 went off smoothly.

            So now I am wondering what the purpose of the new patch http://review.whamcloud.com/#/c/8328/ for master is.

            Sebastien.

            hongchao.zhang Hongchao Zhang added a comment -

            The patch against master is tracked at http://review.whamcloud.com/#/c/8328/

            hongchao.zhang Hongchao Zhang added a comment -

            The patch is against b2_1 and is tracked at http://review.whamcloud.com/#/c/8086/

            hongchao.zhang Hongchao Zhang added a comment -

            Status update: the patch is under testing and will be pushed to Gerrit soon. Thanks

            sebastien.buisson Sebastien Buisson (Inactive) added a comment -

            Hi,

            We are suffering from this error, which is very annoying when a customer wants to upgrade its OSSes first and then its MDSes and clients.

            This ticket was opened a month ago but has not made any progress since. This is surprising, as I would consider it a major issue on the upgrade path from 2.1 to 2.4 (and 2.5 too). Am I missing something?

            Sebastien.

            People

              Assignee: hongchao.zhang Hongchao Zhang
              Reporter: yujian Jian Yu
              Votes: 0
              Watchers: 7
