Lustre / LU-5266

LBUG on failover: ldlm_process_extent_lock()) ASSERTION( lock->l_granted_mode != lock->l_req_mode )


Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Blocker
    • Lustre 2.6.0, Lustre 2.5.4
    • Lustre 2.6.0
    • Environment: Hyperion - 2.5.60 build 2538
    • 3
    • 14694

    Description

      After a hard failover of devices to server iws19, the server wedged and then hit an LBUG.
      Services never complete recovery and appear to restart the recovery timer:

      Jun 27 10:08:53 iws19 kernel: Lustre: lustre-OST000c: Will be in recovery for at least 2:30, or until 316 clients reconnect
      Jun 27 10:11:53 iws19 kernel: Lustre: lustre-OST000c: recovery is timed out, evict stale exports
      Jun 27 10:18:30 iws19 kernel: Lustre: lustre-OST000c: Client c52d4856-d1df-b87b-911c-f1bfbc23a24d (at 192.168.124.182@o2ib) reconnecting, waiting for 316 clients in recovery for 2:27
      

      The server reported being CPU-bound prior to the failure.
      The console log is attached; unfortunately, the crash dump taken after the LBUG failed.
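The assertion in the title encodes a precondition of extent-lock processing: as we understand the lock manager, granting a lock copies its requested mode into its granted mode, so a lock whose l_granted_mode already equals its l_req_mode has been granted and should not be processed for grant again. The sketch below is a toy model of that invariant only, under those assumptions. The field names come from the assertion; Mode, ModelLock, is_granted and process_lock are hypothetical names for illustration and are not Lustre code.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    """Hypothetical, simplified lock modes; NONE stands for 'not yet granted'."""
    NONE = "none"
    PR = "pr"  # protected read
    PW = "pw"  # protected write


@dataclass
class ModelLock:
    """Toy stand-in for the two lock fields named in the assertion."""
    l_req_mode: Mode                  # mode the client asked for
    l_granted_mode: Mode = Mode.NONE  # mode actually held; NONE until granted

    def is_granted(self) -> bool:
        # Granting copies the requested mode into the granted mode, so the
        # two fields are equal exactly when this model lock has been granted.
        return self.l_granted_mode == self.l_req_mode


def process_lock(lock: ModelLock) -> None:
    """Grant a pending lock, enforcing the precondition from the ticket title.

    Mirrors ASSERTION( lock->l_granted_mode != lock->l_req_mode ): processing
    an already-granted lock is treated as a fatal bug. In Lustre a failed
    LASSERT triggers an LBUG; this model raises AssertionError instead.
    """
    if lock.l_granted_mode == lock.l_req_mode:
        raise AssertionError("lock->l_granted_mode != lock->l_req_mode")
    lock.l_granted_mode = lock.l_req_mode
```

In this model, calling process_lock twice on the same lock raises AssertionError on the second call, which is the analogue of the LBUG reported here: a lock that is already granted reaches the grant path again.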

      Attachments

        1. lustre-log.1404916421.10764.txt
          210 kB
        2. lustre-log.1404916402.10701.txt
          145 kB
        3. lustre-log.1404916388.10827.txt
          0.2 kB
        4. iws29.messages.txt
          37 kB
        5. iws29.lustre-log.1405540562.8181.txt
          316 kB
        6. iws29.lustre-log.1405540543.8118.txt
          0.3 kB
        7. iws23.dmesg
          29 kB
        8. iws19.crash.txt
          24 kB


            People

              Assignee: Hongchao Zhang
              Reporter: Cliff White (Inactive)
              Votes: 0
              Watchers: 8
