Lustre / LU-3929

2.1.6->2.4.1 rolling upgrade: lustre-MDT0000: recovery is timed out, evict stale exports


Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.5.1
    • Affects Version/s: Lustre 2.4.0, Lustre 2.4.1, Lustre 2.5.0
    • Severity: 3
    • Rank (Obsolete): 10379

    Description

      While performing a rolling upgrade from Lustre 2.1.6 to 2.4.1 RC2 in the order OSS -> MDS -> client, one node at a time, the test failed after the MDS was upgraded:

      Starting the MDS service on fat-amd-3...
      ----------------
      fat-amd-3
      ----------------
      debug=-1
      subsystem_debug=all -lnet -lnd -pinger
      debug_mb=100
      pdsh -l root -t 100 -S -w fat-amd-3 "mkdir -p /mnt/mds1 && mount -t lustre -o user_xattr /dev/sdc1 /mnt/mds1"
      Waiting 895 secs for fat-amd-3 recovery done. status: RECOVERING
      <~snip~>
      Waiting 5 secs for fat-amd-3 recovery done. status: RECOVERING
      Waiting 0 secs for fat-amd-3 recovery done. status: RECOVERING
      fat-amd-3 recovery not done in 900 sec. status: RECOVERING
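
The wait loop in the log above can be approximated with a small polling helper. This is only a sketch, not the test framework's actual code: the function name `wait_recovery_done` and its arguments are hypothetical, and it assumes the status command prints `key: value` lines with a `status:` field that becomes `COMPLETE` when recovery finishes.

```shell
#!/bin/bash
# Hypothetical helper (not from the Lustre test framework): poll a command
# that prints recovery_status-style "key: value" lines until the status is
# COMPLETE or the timeout expires.
#   $1 = timeout in seconds, $2 = poll interval in seconds,
#   remaining arguments = command that prints the status text.
wait_recovery_done() {
    local timeout=$1 interval=$2
    shift 2
    local waited=0 status
    while :; do
        # Extract the value of the "status:" line from the command output.
        status=$("$@" | awk '$1 == "status:" { print $2; exit }')
        if [ "$status" = "COMPLETE" ]; then
            echo "recovery done after ${waited}s"
            return 0
        fi
        if [ "$waited" -ge "$timeout" ]; then
            echo "recovery not done in ${timeout} sec. status: ${status}"
            return 1
        fi
        sleep "$interval"
        waited=$((waited + interval))
    done
}
```

A caller would pass whatever command fetches the status text from the MDS, for example an `ssh` invocation of `lctl get_param -n` with the appropriate `recovery_status` parameter path. In the failure reported here such a loop would run to the full 900 seconds and return non-zero.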
      

      On the MDS fat-amd-3, "lctl get_param -n ..recovery_status" showed:

      ----------------
      fat-amd-3
      ----------------
      status: RECOVERING
      recovery_start: 1378874775
      time_remaining: 0
      connected_clients: 2/4
      req_replay_clients: 0
      lock_repay_clients: 0
      completed_clients: 2
      evicted_clients: 0
      replayed_requests: 0
      queued_requests: 0
      next_transno: 4294967297
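
The notable combination in this output is that the status is still RECOVERING although time_remaining is 0 and only 2 of 4 expected clients are connected. A minimal sketch of a check for that condition follows; the function name `recovery_stalled` is hypothetical, and it assumes the field names and the `connected/expected` layout shown above.

```shell
#!/bin/bash
# Hypothetical check: read recovery_status-style text on stdin and succeed
# (exit 0) only when recovery looks stuck, i.e. the status is RECOVERING,
# the recovery timer has run out, and fewer clients than expected connected.
recovery_stalled() {
    awk '
        BEGIN { remaining = -1; connected = 0; expected = 0 }
        $1 == "status:"            { status = $2 }
        $1 == "time_remaining:"    { remaining = $2 + 0 }
        $1 == "connected_clients:" {
            # Field looks like "2/4": connected/expected.
            split($2, c, "/")
            connected = c[1] + 0
            expected  = c[2] + 0
        }
        END {
            if (status == "RECOVERING" && remaining == 0 && connected < expected) {
                printf "stalled: %d of %d clients connected, timer expired\n",
                       connected, expected
                exit 0
            }
            exit 1
        }'
}
```

Feeding it the status text shown above reports a stall. A target that has finished recovery, or one whose timer is still running, makes the function return non-zero.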
      

      The console log on the MDS showed:

      Lustre: lustre-MDT0000: Will be in recovery for at least 1:00, or until 4 clients reconnect
      Lustre: lustre-MDT0000: recovery is timed out, evict stale exports
      Lustre: lustre-MDT0000: disconnecting 2 stale clients
      Lustre: lustre-MDT0000: recovery is timed out, evict stale exports
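
The repeated "recovery is timed out" message is the visible symptom here: the timer fires again after the stale clients have already been disconnected. As a rough illustration, a console log can be scanned for repeated timeouts per target like this; the function name `count_recovery_timeouts` is hypothetical, and the sketch assumes the `Lustre: <target>: <message>` line layout shown above.

```shell
#!/bin/bash
# Hypothetical helper: read console log text on stdin and print, per target,
# how many times the recovery timeout message was logged.
count_recovery_timeouts() {
    awk '
        /recovery is timed out, evict stale exports/ {
            target = $2
            sub(/:$/, "", target)   # "lustre-MDT0000:" -> "lustre-MDT0000"
            n[target]++
        }
        END { for (t in n) print t, n[t] }'
}
```

A count greater than one for a single recovery window, as in the log above, suggests the recovery timer is expiring repeatedly without recovery completing.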
      

      Maloo reports:
      https://maloo.whamcloud.com/test_sets/d91a2b68-1aa1-11e3-88ff-52540035b04c
      https://maloo.whamcloud.com/test_sets/dae9450c-1a86-11e3-8ceb-52540035b04c

      The same failure also occurred during a rolling upgrade from Lustre 2.1.6 to 2.4.0:
      https://maloo.whamcloud.com/test_sets/c70af506-1ab5-11e3-8898-52540035b04c

            People

              Assignee: Hongchao Zhang (hongchao.zhang)
              Reporter: Jian Yu (yujian)
              Votes: 0
              Watchers: 7
