[LU-5266] LBUG on failover in ldlm_process_extent_lock(): ASSERTION( lock->l_granted_mode != lock->l_req_mode )

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Blocker
    • Fix Version/s: Lustre 2.6.0, Lustre 2.5.4
    • Affects Version/s: Lustre 2.6.0
    • Environment: Hyperion - 2.5.60 build 2538
    • Severity: 3
    • Rank: 14694

    Description

      After a hard failover of devices to server iws19, the server wedged and then hit an LBUG.
      Services never complete recovery, and the recovery timer appears to restart:

      Jun 27 10:08:53 iws19 kernel: Lustre: lustre-OST000c: Will be in recovery for at least 2:30, or until 316 clients reconnect
      Jun 27 10:11:53 iws19 kernel: Lustre: lustre-OST000c: recovery is timed out, evict stale exports
      Jun 27 10:18:30 iws19 kernel: Lustre: lustre-OST000c: Client c52d4856-d1df-b87b-911c-f1bfbc23a24d (at 192.168.124.182@o2ib) reconnecting, waiting for 316 clients in recovery for 2:27
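
      The three timestamps above make the stall concrete. The following is a minimal sketch, assuming only the console lines quoted here (the function and variable names are illustrative and not part of Lustre), that computes how long recovery had been running when the server still reported 2:27 remaining:

```python
from datetime import datetime, timedelta

SYSLOG_FORMAT = "%b %d %H:%M:%S"  # syslog timestamps carry no year


def parse_ts(text):
    """Parse a syslog-style timestamp such as 'Jun 27 10:08:53'."""
    return datetime.strptime(text, SYSLOG_FORMAT)


def parse_mmss(text):
    """Parse an 'M:SS' duration such as '2:30' into a timedelta."""
    minutes, seconds = text.split(":")
    return timedelta(minutes=int(minutes), seconds=int(seconds))


# Values copied from the three console lines quoted in the description.
recovery_start = parse_ts("Jun 27 10:08:53")  # "Will be in recovery for at least 2:30"
timed_out = parse_ts("Jun 27 10:11:53")       # "recovery is timed out"
reconnect = parse_ts("Jun 27 10:18:30")       # "in recovery for 2:27"

window = parse_mmss("2:30")
remaining_at_reconnect = parse_mmss("2:27")

elapsed_to_timeout = timed_out - recovery_start
elapsed_to_reconnect = reconnect - recovery_start

# If recovery has run longer than the announced window and the server
# still reports time remaining, the timer cannot be the original one.
timer_restarted = (elapsed_to_reconnect > window
                   and remaining_at_reconnect > timedelta(0))

print(elapsed_to_timeout)    # 0:03:00
print(elapsed_to_reconnect)  # 0:09:37
print(timer_restarted)       # True
```

      Recovery was declared timed out 3:00 after it began, yet 9:37 after the start the server was still waiting on all 316 clients with nearly the full window left, which is consistent with the timer being restarted rather than recovery completing.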
      

      The server reported being CPU-bound prior to the failure.
      The console log is attached; unfortunately, the crash dump after the LBUG failed.

      Attachments

        1. iws19.crash.txt
          24 kB
        2. iws23.dmesg
          29 kB
        3. iws29.lustre-log.1405540543.8118.txt
          0.3 kB
        4. iws29.lustre-log.1405540562.8181.txt
          316 kB
        5. iws29.messages.txt
          37 kB
        6. lustre-log.1404916388.10827.txt
          0.2 kB
        7. lustre-log.1404916402.10701.txt
          145 kB
        8. lustre-log.1404916421.10764.txt
          210 kB

          Activity

            pjones Peter Jones added a comment -

            Vitaly,
            Could you please open a new JIRA ticket to track this additional change? The original fix was in the already-GA 2.6 release.
            Thanks,
            Peter

            vitaly_fertman Vitaly Fertman added a comment -

            The previous fix was not exactly correct: http://review.whamcloud.com/11469

            adilger Andreas Dilger added a comment -

            This should probably go into a new bug, since the LASSERT is fixed.

            cliffw Cliff White (Inactive) added a comment -

            While testing this patch, I still see some client evictions and log dumps. The message log and Lustre log from the latest run are attached.
            pjones Peter Jones added a comment -

            Landed for 2.6.

            cliffw Cliff White (Inactive) added a comment -

            I tested the patch on Hyperion: no more LBUG, but there were a few evictions. Lustre logs and console logs from one run are attached.

            adilger Andreas Dilger added a comment -

            Was this introduced by http://review.whamcloud.com/5978 ?

            hongchao.zhang Hongchao Zhang added a comment -

            Yes, the issue could be triggered by the resent lock request.

            vitaly_fertman Vitaly Fertman added a comment -

            I am not sure whether this failure is the same as the fixed one, but since it was caught by the same assertion, I am putting it here rather than creating another ticket: http://review.whamcloud.com/10903

            jlevi Jodi Levi (Inactive) added a comment -

            Hongchao,
            Could you please look into this one?
            Thank you!

            People

              Assignee: hongchao.zhang Hongchao Zhang
              Reporter: cliffw Cliff White (Inactive)
              Votes: 0
              Watchers: 8
