LBUG on Failover -ldlm_process_extent_lock()) ASSERTION( lock->l_granted_mode != lock->l_req_mode )

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Blocker
    • Fix Version/s: Lustre 2.6.0, Lustre 2.5.4
    • Affects Version/s: Lustre 2.6.0
    • Environment: Hyperion - 2.5.60 build 2538
    • Severity: 3
    • Rank (Obsolete): 14694

    Description

      After a hard failover of devices to server iws19, the server wedged and then hit an LBUG.
      Services never complete recovery and appear to restart the recovery timer (or otherwise re-enter recovery):

      Jun 27 10:08:53 iws19 kernel: Lustre: lustre-OST000c: Will be in recovery for at least 2:30, or until 316 clients reconnect
      Jun 27 10:11:53 iws19 kernel: Lustre: lustre-OST000c: recovery is timed out, evict stale exports
      Jun 27 10:18:30 iws19 kernel: Lustre: lustre-OST000c: Client c52d4856-d1df-b87b-911c-f1bfbc23a24d (at 192.168.124.182@o2ib) reconnecting, waiting for 316 clients in recovery for 2:27
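
      To make the timing in the excerpt above easier to check, here is a minimal Python sketch that parses the three quoted lines and compares their timestamps. It assumes the M:SS values are minutes:seconds of recovery window, and it uses 2014 only as a placeholder year because syslog lines carry none. The names LOG, STAMP, WINDOW and parse are hypothetical helpers, not part of any Lustre tool.

```python
import re
from datetime import datetime

# The three console lines quoted in the description, verbatim.
LOG = """\
Jun 27 10:08:53 iws19 kernel: Lustre: lustre-OST000c: Will be in recovery for at least 2:30, or until 316 clients reconnect
Jun 27 10:11:53 iws19 kernel: Lustre: lustre-OST000c: recovery is timed out, evict stale exports
Jun 27 10:18:30 iws19 kernel: Lustre: lustre-OST000c: Client c52d4856-d1df-b87b-911c-f1bfbc23a24d (at 192.168.124.182@o2ib) reconnecting, waiting for 316 clients in recovery for 2:27
"""

STAMP = re.compile(r"^(\w{3} +\d+ \d{2}:\d{2}:\d{2}) ")
# Assumption: "M:SS" after "in recovery for" is minutes:seconds.
WINDOW = re.compile(r"in recovery for (?:at least )?(\d+):(\d{2})")


def parse(log):
    """Return a list of (timestamp, window_seconds or None), one per log line."""
    events = []
    for line in log.splitlines():
        m = STAMP.match(line)
        if not m:
            continue
        # Placeholder year (assumption); only time differences are used.
        ts = datetime.strptime("2014 " + m.group(1), "%Y %b %d %H:%M:%S")
        w = WINDOW.search(line)
        secs = int(w.group(1)) * 60 + int(w.group(2)) if w else None
        events.append((ts, secs))
    return events


if __name__ == "__main__":
    start, timeout, reconnect = parse(LOG)
    print("announced window (s):", start[1])
    print("start -> timeout (s):", (timeout[0] - start[0]).total_seconds())
    print("timeout -> reconnect (s):", (reconnect[0] - timeout[0]).total_seconds())
    print("window reported at reconnect (s):", reconnect[1])
```

      Under those assumptions, the reconnect message arrives several minutes after the timeout yet still reports a window close to the one first announced, which is consistent with the recovery timer having been restarted.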
      

      The server reports being CPU-bound prior to the failure.
      Console log attached; unfortunately, the dump after the LBUG failed.

      Attachments

        1. iws19.crash.txt
          24 kB
        2. iws23.dmesg
          29 kB
        3. iws29.lustre-log.1405540543.8118.txt
          0.3 kB
        4. iws29.lustre-log.1405540562.8181.txt
          316 kB
        5. iws29.messages.txt
          37 kB
        6. lustre-log.1404916388.10827.txt
          0.2 kB
        7. lustre-log.1404916402.10701.txt
          145 kB
        8. lustre-log.1404916421.10764.txt
          210 kB

          Activity

            [LU-5266] LBUG on Failover -ldlm_process_extent_lock()) ASSERTION( lock->l_granted_mode != lock->l_req_mode )
            pjones Peter Jones added a comment -

            Vitaly
            Could you please open a new JIRA ticket to track this additional change. The original fix was in the already GA 2.6 release
            Thanks
            Peter


            vitaly_fertman Vitaly Fertman added a comment -

            the previous fix was not exactly correct: http://review.whamcloud.com/11469

            adilger Andreas Dilger added a comment -

            This should probably go into a new bug, since the LASSERT is fixed.

            cliffw Cliff White (Inactive) added a comment -

            While testing this patch, still have some client evictions and log dumps. Message log and lustre log from latest attached.

            pjones Peter Jones added a comment -

            Landed for 2.6


            cliffw Cliff White (Inactive) added a comment -

            I tested the patch on Hyperion - no more LBUG, but did have a few evictions. lustre-logs and console logs from one run attached.

            adilger Andreas Dilger added a comment -

            Was this introduced by http://review.whamcloud.com/5978 ?

            People

              Assignee: hongchao.zhang Hongchao Zhang
              Reporter: cliffw Cliff White (Inactive)
              Votes: 0
              Watchers: 8
