Details

    • Bug
    • Resolution: Duplicate
    • Critical
    • None
    • Lustre 2.4.0
    • Older iteration of orion branch, lustre 2.3.49.54-68chaos
    • 3
    • 5251

    Description

      We keep getting OSS nodes stuck in recovery on Sequoia's filesystem. The recovery_status file reports:

      $ cat /proc/fs/lustre/obdfilter/ls1-OST0005/recovery_status 
      status: RECOVERING
      recovery_start: 1350515102
      time_remaining: 0
      connected_clients: 357/787
      req_replay_clients: 32
      lock_repay_clients: 131
      completed_clients: 226
      evicted_clients: 430
      replayed_requests: 23
      queued_requests: 32
      next_transno: 12885381895
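
      A dump like the one above can be checked mechanically. The following is a minimal sketch, not a Lustre tool: the helper names (parse_recovery_status, looks_stuck) are hypothetical, and the "stuck" test (status still RECOVERING while time_remaining has reached zero) is an assumed heuristic based on this report rather than a documented rule.

```python
def parse_recovery_status(text):
    """Parse 'key: value' lines from a recovery_status dump into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines that contain no colon
            fields[key.strip()] = value.strip()
    return fields


def looks_stuck(fields):
    """Assumed heuristic: still RECOVERING although time_remaining is <= 0."""
    return (fields.get("status") == "RECOVERING"
            and int(fields.get("time_remaining", "1")) <= 0)


# Sample copied from the report; field names are kept exactly as printed.
SAMPLE = """\
status: RECOVERING
recovery_start: 1350515102
time_remaining: 0
connected_clients: 357/787
req_replay_clients: 32
lock_repay_clients: 131
completed_clients: 226
evicted_clients: 430
replayed_requests: 23
queued_requests: 32
next_transno: 12885381895
"""

fields = parse_recovery_status(SAMPLE)
connected, expected = (int(n) for n in fields["connected_clients"].split("/"))
evicted = int(fields["evicted_clients"])
stuck = looks_stuck(fields)
print(f"stuck={stuck} connected={connected}/{expected} evicted={evicted}")
```

      For this dump, the 357 connected and 430 evicted clients together account for all 787 expected clients, yet the status is still RECOVERING.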
      

      On the console we're seeing clearly bad messages like:

      Lustre: ls1-OST0005: Client 6307b568-c73d-a978-48ac-1fc11c345ba7 (at 172.20.11.30@o2ib500) reconnecting, waiting for 787 clients in recovery for -64:-32
      Lustre: Skipped 2006 previous similar messages
      Lustre: ls1-OST0005: Client 6307b568-c73d-a978-48ac-1fc11c345ba7 (at 172.20.11.30@o2ib500) refused reconnection, still busy with 1 active RPCs
      Lustre: Skipped 1908 previous similar messages
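
      The negative value at the end of the "reconnecting" message is what makes it look wrong. As a rough sketch for spotting such lines in a console log, the pattern below extracts that field; reading it as minutes:seconds is an assumption, and the names RECOVERY_WAIT and remaining_seconds are hypothetical.

```python
import re

# Matches the tail of the "reconnecting, waiting for N clients" console line.
# Treating the final field as minutes:seconds is an assumption.
RECOVERY_WAIT = re.compile(
    r"waiting for (?P<clients>\d+) clients in recovery for "
    r"(?P<minutes>-?\d+):(?P<seconds>-?\d+)"
)


def remaining_seconds(line):
    """Return the reported remaining recovery time in seconds, or None."""
    match = RECOVERY_WAIT.search(line)
    if match is None:
        return None
    return int(match.group("minutes")) * 60 + int(match.group("seconds"))


LINE = ("Lustre: ls1-OST0005: Client 6307b568-c73d-a978-48ac-1fc11c345ba7 "
        "(at 172.20.11.30@o2ib500) reconnecting, waiting for 787 clients "
        "in recovery for -64:-32")

secs = remaining_seconds(LINE)
if secs is not None and secs < 0:
    print(f"negative recovery time reported: {secs} s")
```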
      

      Keep in mind, this is still the older orion branch code, our version 2.3.49.54-68chaos.

          Activity

            [LU-2206] OSS stuck in recovery

            adilger Andreas Dilger added a comment - I'm closing this as a duplicate of LU-2104, since that bug has more debug info, and this one has almost nothing.

            morrone Christopher Morrone (Inactive) added a comment - Similar negative recovery times in LU-2104.
            pjones Peter Jones added a comment -

            Alex

            Another Sequoia issue to review....

            Peter


            People

              bzzz Alex Zhuravlev
              morrone Christopher Morrone (Inactive)