  1. Lustre
  2. LU-11386

insanity test 0 hangs on failing over MDT

Details

    • Bug
    • Resolution: Unresolved
    • Minor
    • None
    • Lustre 2.12.0
    • DNE
    • 3
    • 9223372036854775807

    Description

      insanity test 0 started hanging on July 19, 2018 in review-dne-part-4; see https://testing.whamcloud.com/test_sets/7b3d1b66-8af1-11e8-9028-52540065bddc .

      insanity test 0 fails over every MDT and every OST. For each successful failover, the log shows the target being failed, rebooted, and started again, like this:

      Failing mds1 on onyx-41vm9
      …
      reboot facets: mds1
      Failover mds1 to onyx-41vm9
      …
      mount facets: mds1
      …
      Starting mds1:   /dev/mapper/mds1_flakey /mnt/lustre-mds1
      …
      Started lustre-MDT0000
      

      We should see this sequence for each MDT and for each OST. Yet in the tests that hang, for example https://testing.whamcloud.com/test_sets/3f2ce6e0-ba91-11e8-8c12-52540065bddc, the first three MDTs fail, reboot, and mount, but the fourth MDT does not. The last thing in the suite_log is the third MDT starting:

      CMD: trevis-37vm4 e2label /dev/mapper/mds3_flakey 				2>/dev/null | grep -E ':[a-zA-Z]{3}[0-9]{4}'
      CMD: trevis-37vm4 e2label /dev/mapper/mds3_flakey 				2>/dev/null | grep -E ':[a-zA-Z]{3}[0-9]{4}'
      CMD: trevis-37vm4 e2label /dev/mapper/mds3_flakey 2>/dev/null
      Started lustre-MDT0002
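
The `grep -E` filter in the commands above accepts any label containing a colon followed by three letters and four digits. A minimal sketch of that match in Python follows; the sample labels are made up for illustration, and the function name is hypothetical. What a colon-form label means to the test framework is not stated in this ticket.

```python
import re

# Same extended-regex pattern that the logged commands pass to grep -E.
LABEL_PATTERN = re.compile(r":[a-zA-Z]{3}[0-9]{4}")


def label_matches(label):
    """Return True if the label contains ':' + three letters + four digits."""
    return LABEL_PATTERN.search(label) is not None


# Hypothetical labels, not taken from the logs in this ticket.
samples = ["lustre:MDT0002", "lustre-MDT0002"]
results = {label: label_matches(label) for label in samples}
```

With these samples, only the colon-separated form matches the pattern.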
      

      Looking at the console log of the fourth MDT (vm5), we don’t see anything that indicates there is a problem with MDT0003, the fourth MDT. In fact, nothing is obviously wrong on any of the nodes, judging by their console logs.

      In all of these failures, the failover of one of the MDTs hangs. Here are links to logs for more hung runs of insanity test 0:
      https://testing.whamcloud.com/test_sets/f7b26fc6-b9b7-11e8-8c12-52540065bddc
      https://testing.whamcloud.com/test_sets/3dc54fa2-adfa-11e8-bbd1-52540065bddc
      https://testing.whamcloud.com/test_sets/e0b3ea6e-8f5c-11e8-b0aa-52540065bddc
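
To find which facet never finished failing over in a given suite_log, the marker lines quoted earlier can be scanned mechanically. The following is a rough sketch, not part of the Lustre test framework: the function names are hypothetical, the marker strings are copied from the excerpt in this ticket, the list of expected facets must be supplied by the caller, and it assumes each "Started" line follows the "Starting" line of the same facet.

```python
import re


def failover_stages(log_text):
    """Map each facet name to the failover stages seen for it, in log order."""
    stages = {}
    current = None  # facet whose "Starting" line was seen most recently
    for raw in log_text.splitlines():
        line = raw.strip()
        if m := re.match(r"Failing (\S+) on \S+", line):
            stages.setdefault(m.group(1), []).append("failing")
        elif m := re.match(r"Failover (\S+) to \S+", line):
            stages.setdefault(m.group(1), []).append("failover")
        elif m := re.match(r"Starting (\S+):", line):
            current = m.group(1)
            stages.setdefault(current, []).append("starting")
        elif line.startswith("Started ") and current is not None:
            # Assumption: this "Started" line belongs to the last "Starting" facet.
            stages[current].append("started")
            current = None
    return stages


def missing_failovers(log_text, expected_facets):
    """Return the expected facets that never reached the 'started' stage."""
    stages = failover_stages(log_text)
    return [f for f in expected_facets if "started" not in stages.get(f, [])]
```

For a four-MDT run, calling `missing_failovers(text, ["mds1", "mds2", "mds3", "mds4"])` on the suite_log text would list the facets whose failover did not complete. The facet names passed in here are an assumption based on the `mds1` and `mds3` names visible in the excerpts.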

          People

            wc-triage WC Triage
            jamesanunez James Nunez (Inactive)