  1. Lustre
  2. LU-12927

OSS hit Kernel panic - not syncing: Fatal machine check

Details

    • Bug
    • Resolution: Unresolved
    • Minor
    • None
    • Lustre 2.13.0
    • lustre-master-ib build #334 EL7.7
    • 3

    Description

      During OSS failover testing, one OSS hit the following error, which caused the system to hang:

      soak-6

      [2019-10-28T23:49:44+00:00] INFO: Running report handlers
      [2019-10-28T23:49:44+00:00] INFO: Creating JSON run report
      [2019-10-28T23:49:44+00:00] INFO: Report handlers complete
      [  130.975016] LNet: HW NUMA nodes: 2, HW CPU cores: 32, npartitions: 2
      [  130.985785] alg: No test for adler32 (adler32-zlib)
      [  131.829968] Lustre: Lustre: Build Version: 2.12.58_160_g2b90574
      [  132.018974] LNet: Using FMR for registration
      [  132.035678] LNet: Added LNI 192.168.1.106@o2ib [8/256/0/180]
      [  133.724994] Lustre: soaked-OST0002: Imperative Recovery enabled, recovery window shrunk from 300-900 down to 150-900
      [  138.968619] Lustre: soaked-OST0002: Will be in recovery for at least 2:30, or until 27 clients reconnect
      [  138.980123] Lustre: soaked-OST0002: Connection restored to fab0c63f-3fdb-4 (at 192.168.1.138@o2ib)
      [  139.647610] Lustre: soaked-OST0002: Connection restored to 0e08c972-f5eb-4 (at 192.168.1.120@o2ib)
      [  139.657651] Lustre: Skipped 3 previous similar messages
      [  140.934492] Lustre: soaked-OST0002: Connection restored to f5344847-d291-4 (at 192.168.1.135@o2ib)
      [  140.944523] Lustre: Skipped 7 previous similar messages
      [  141.497059] Lustre: soaked-OST000a: Imperative Recovery enabled, recovery window shrunk from 300-900 down to 150-900
      [  141.525919] Lustre: soaked-OST000a: Will be in recovery for at least 2:30, or until 27 clients reconnect
      [  143.049075] Lustre: soaked-OST000a: Connection restored to 1a036cd1-6dcf-4 (at 192.168.1.141@o2ib)
      [  143.059107] Lustre: Skipped 21 previous similar messages
      [  143.713996] Lustre: soaked-OST0002: Recovery over after 0:05, of 27 clients 27 recovered and 0 were evicted.
      [  143.733171] Lustre: soaked-OST0002: deleting orphan objects from 0x0:6964042 to 0x0:6964083
      [  143.735241] Lustre: soaked-OST0002: deleting orphan objects from 0x380000401:5635234 to 0x380000401:5647269
      [  143.753817] Lustre: soaked-OST0002: deleting orphan objects from 0x380000400:5074927 to 0x380000400:5080690
      [  143.820779] Lustre: soaked-OST0002: deleting orphan objects from 0x380000402:8806871 to 0x380000402:8812296
      [  147.362231] Lustre: soaked-OST000a: Connection restored to 3b6c98a5-fe70-4 (at 192.168.1.131@o2ib)
      [  147.372271] Lustre: Skipped 5 previous similar messages
      [  148.926072] Lustre: soaked-OST0006: Imperative Recovery enabled, recovery window shrunk from 300-900 down to 150-900
      [  149.800735] Lustre: soaked-OST0006: Will be in recovery for at least 2:30, or until 27 clients reconnect
      [  151.643017] Lustre: soaked-OST000a: Recovery over after 0:10, of 27 clients 27 recovered and 0 were evicted.
      [  151.651808] Lustre: soaked-OST000a: deleting orphan objects from 0x580000400:5654332 to 0x580000400:5656923
      [  151.653781] Lustre: soaked-OST000a: deleting orphan objects from 0x0:6949857 to 0x0:6949898
      [  151.663992] Lustre: soaked-OST000a: deleting orphan objects from 0x580000402:8821479 to 0x580000402:8827114
      [  151.665251] Lustre: soaked-OST000a: deleting orphan objects from 0x580000401:5099258 to 0x580000401:5105344
      [  155.393063] Lustre: soaked-OST0006: Connection restored to soaked-MDT0002-mdtlov_UUID (at 192.168.1.110@o2ib)
      [  155.404202] Lustre: Skipped 26 previous similar messages
      [  157.016144] Lustre: soaked-OST000e: Imperative Recovery enabled, recovery window shrunk from 300-900 down to 150-900
      [  157.129006] Lustre: soaked-OST000e: Will be in recovery for at least 2:30, or until 27 clients reconnect
      [  158.836265] Lustre: soaked-OST0006: Recovery over after 0:09, of 27 clients 27 recovered and 0 were evicted.
      [  158.847384] Lustre: soaked-OST0006: deleting orphan objects from 0x0:6965153 to 0x0:6965199
      [  158.866283] Lustre: soaked-OST0006: deleting orphan objects from 0x480000402:5102899 to 0x480000402:5104678
      [  158.866936] Lustre: soaked-OST0006: deleting orphan objects from 0x480000401:5643903 to 0x480000401:5651951
      [  158.874936] Lustre: soaked-OST0006: deleting orphan objects from 0x480000400:8787734 to 0x480000400:8793891
      [  167.036317] Lustre: soaked-OST000e: Recovery over after 0:10, of 27 clients 27 recovered and 0 were evicted.
      [  167.051945] Lustre: soaked-OST000e: deleting orphan objects from 0x680000402:4916845 to 0x680000402:4918647
      [  167.052271] Lustre: soaked-OST000e: deleting orphan objects from 0x0:6939032 to 0x0:6939072
      [  167.055485] Lustre: soaked-OST000e: deleting orphan objects from 0x680000401:8720221 to 0x680000401:8723771
      [  167.062501] Lustre: soaked-OST000e: deleting orphan objects from 0x680000400:5548226 to 0x680000400:5552635
      [  271.398262] Lustre: soaked-OST000a: Connection restored to 4270d3b8-8785-4 (at 192.168.1.122@o2ib)
      [  271.408347] Lustre: Skipped 42 previous similar messages
      [  355.688632] Lustre: soaked-OST0006: Connection restored to 4270d3b8-8785-4 (at 192.168.1.122@o2ib)
      [  355.698685] Lustre: Skipped 6 previous similar messages
      [  487.617829] Lustre: soaked-OST000e: Connection restored to 0a14b91b-c6a9-4 (at 192.168.1.119@o2ib)
      [  487.627863] Lustre: Skipped 1 previous similar message
      [  871.326165] Lustre: soaked-OST000a: Connection restored to 667ea088-477b-4 (at 192.168.1.118@o2ib)
      [  871.336185] Lustre: Skipped 15 previous similar messages
      [ 1194.625969] Lustre: soaked-OST0006: Connection restored to 4270d3b8-8785-4 (at 192.168.1.122@o2ib)
      [ 1194.635991] Lustre: Skipped 21 previous similar messages
      [ 1742.196450] Lustre: soaked-OST0006: Connection restored to 4270d3b8-8785-4 (at 192.168.1.122@o2ib)
      [ 1742.206486] Lustre: Skipped 168 previous similar messages
      [ 2512.885378] Lustre: soaked-OST000a: Connection restored to 0e6b88eb-ca9a-4 (at 192.168.1.117@o2ib)
      [ 2512.885380] Lustre: soaked-OST0002: Connection restored to 0e6b88eb-ca9a-4 (at 192.168.1.117@o2ib)
      [ 2512.885383] Lustre: soaked-OST000e: Connection restored to 0e6b88eb-ca9a-4 (at 192.168.1.117@o2ib)
      [ 2512.885385] Lustre: Skipped 64 previous similar messages
      [ 2512.885392] Lustre: Skipped 65 previous similar messages
      [ 3141.275984] Lustre: soaked-OST0002: Connection restored to 2f1eb1c4-6276-4 (at 192.168.1.126@o2ib)
      [ 3141.275986] Lustre: soaked-OST000e: Connection restored to 2f1eb1c4-6276-4 (at 192.168.1.126@o2ib)
      [ 3141.275992] Lustre: Skipped 128 previous similar messages
      [ 3141.302076] Lustre: Skipped 1 previous similar message
      [ 3738.703359] LustreError: 137-5: soaked-OST0007_UUID: not available for connect from 192.168.1.110@o2ib (no target). If you are running an HA pair check that the target is mounted on the other server.
      [ 3738.723185] LustreError: Skipped 3 previous similar messages
      [ 3740.041350] LustreError: 137-5: soaked-OST000f_UUID: not available for connect from 192.168.1.142@o2ib (no target). If you are running an HA pair check that the target is mounted on the other server.
      [ 3741.149040] LustreError: 137-5: soaked-OST000f_UUID: not available for connect from 192.168.1.127@o2ib (no target). If you are running an HA pair check that the target is mounted on the other server.
      [ 3741.168874] LustreError: Skipped 3 previous similar messages
      [ 3743.506904] LustreError: 137-5: soaked-OST0003_UUID: not available for connect from 192.168.1.111@o2ib (no target). If you are running an HA pair check that the target is mounted on the other server.
      [ 3743.526763] LustreError: Skipped 7 previous similar messages
      [ 3749.244322] LustreError: 137-5: soaked-OST000f_UUID: not available for connect from 192.168.1.120@o2ib (no target). If you are running an HA pair check that the target is mounted on the other server.
      [ 3749.264152] LustreError: Skipped 9 previous similar messages
      [ 3757.891545] LustreError: 137-5: soaked-OST000f_UUID: not available for connect from 192.168.1.122@o2ib (no target). If you are running an HA pair check that the target is mounted on the other server.
      [ 3757.911389] LustreError: Skipped 3 previous similar messages
      [ 3788.883107] LustreError: 137-5: soaked-OST0003_UUID: not available for connect from 192.168.1.110@o2ib (no target). If you are running an HA pair check that the target is mounted on the other server.
      [ 3788.883110] LustreError: 137-5: soaked-OST0007_UUID: not available for connect from 192.168.1.110@o2ib (no target). If you are running an HA pair check that the target is mounted on the other server.
      [ 3788.883116] LustreError: Skipped 5 previous similar messages
      [ 3789.539742] mce: [Hardware Error]: CPU 0: Machine Check Exception: 5 Bank 5: be00000000010093
      [ 3789.539748] mce: [Hardware Error]: Machine check events logged
      [ 3789.555773] mce: [Hardware Error]: RIP !INEXACT! 10:<ffffffff877815b4> {intel_idle+0xd4/0x225}
      [ 3789.565421] mce: [Hardware Error]: TSC 9ce54e28818 ADDR 42ec5acc0 MISC 14076f686 
      [ 3789.573817] mce: [Hardware Error]: PROCESSOR 0:206d7 TIME 1572310251 SOCKET 0 APIC 0 microcode 718
      [ 3789.583829] mce: [Hardware Error]: Run the above through 'mcelog --ascii'
      [ 3789.597693] mce: [Hardware Error]: Machine check: Processor context corrupt
      [ 3789.605480] Kernel panic - not syncing: Fatal machine check
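
      The log suggests running the MCE record through 'mcelog --ascii'. As a rough first look before doing that, the status word printed on the "Bank 5" line (be00000000010093) can be split into its architectural fields. The sketch below is only an illustration: it assumes the generic IA32_MCi_STATUS bit layout from the Intel SDM, the function name decode_mci_status is hypothetical, and it does not interpret the model-specific bits for this CPU.

```python
# Minimal sketch: split an IA32_MCi_STATUS value into its architectural fields.
# Assumption: generic bit layout from the Intel SDM; model-specific
# interpretation for this processor (0:206d7) is NOT attempted here.

# Architectural flag bits in the top of the 64-bit status word.
FLAGS = {
    63: "VAL",    # register contents are valid
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # uncorrected error
    60: "EN",     # error reporting was enabled
    59: "MISCV",  # MISC register is valid
    58: "ADDRV",  # ADDR register is valid
    57: "PCC",    # processor context corrupt
}


def decode_mci_status(status: int) -> dict:
    """Return the set flags and the two 16-bit error code fields."""
    set_flags = [name for bit, name in FLAGS.items() if (status >> bit) & 1]
    return {
        "flags": set_flags,
        "mca_error_code": status & 0xFFFF,              # bits 15:0
        "model_specific_code": (status >> 16) & 0xFFFF,  # bits 31:16
    }


if __name__ == "__main__":
    # Status value copied from the "Machine Check Exception: 5 Bank 5" line.
    decoded = decode_mci_status(0xBE00000000010093)
    print("flags:", ", ".join(decoded["flags"]))
    print("MCA error code:", hex(decoded["mca_error_code"]))
    print("model-specific code:", hex(decoded["model_specific_code"]))
```

      For this status value the UC and PCC bits are set, which agrees with the "Processor context corrupt" message, and MISCV/ADDRV are set, which agrees with the ADDR and MISC values printed in the log. A full decode should still come from mcelog on the affected node.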
      

          People

            wc-triage WC Triage
            sarah Sarah Liu