Details
-
Bug
-
Resolution: Unresolved
-
Minor
-
None
-
Lustre 2.13.0
-
lustre-master-ib build #334 EL7.7
-
3
-
9223372036854775807
Description
During OSS failover testing, one OSS hit the following error, which caused the system to hang:
soak-6
[2019-10-28T23:49:44+00:00] INFO: Running report handlers
[2019-10-28T23:49:44+00:00] INFO: Creating JSON run report
[2019-10-28T23:49:44+00:00] INFO: Report handlers complete
[ 130.975016] LNet: HW NUMA nodes: 2, HW CPU cores: 32, npartitions: 2
[ 130.985785] alg: No test for adler32 (adler32-zlib)
[ 131.829968] Lustre: Lustre: Build Version: 2.12.58_160_g2b90574
[ 132.018974] LNet: Using FMR for registration
[ 132.035678] LNet: Added LNI 192.168.1.106@o2ib [8/256/0/180]
[ 133.724994] Lustre: soaked-OST0002: Imperative Recovery enabled, recovery window shrunk from 300-900 down to 150-900
[ 138.968619] Lustre: soaked-OST0002: Will be in recovery for at least 2:30, or until 27 clients reconnect
[ 138.980123] Lustre: soaked-OST0002: Connection restored to fab0c63f-3fdb-4 (at 192.168.1.138@o2ib)
[ 139.647610] Lustre: soaked-OST0002: Connection restored to 0e08c972-f5eb-4 (at 192.168.1.120@o2ib)
[ 139.657651] Lustre: Skipped 3 previous similar messages
[ 140.934492] Lustre: soaked-OST0002: Connection restored to f5344847-d291-4 (at 192.168.1.135@o2ib)
[ 140.944523] Lustre: Skipped 7 previous similar messages
[ 141.497059] Lustre: soaked-OST000a: Imperative Recovery enabled, recovery window shrunk from 300-900 down to 150-900
[ 141.525919] Lustre: soaked-OST000a: Will be in recovery for at least 2:30, or until 27 clients reconnect
[ 143.049075] Lustre: soaked-OST000a: Connection restored to 1a036cd1-6dcf-4 (at 192.168.1.141@o2ib)
[ 143.059107] Lustre: Skipped 21 previous similar messages
[ 143.713996] Lustre: soaked-OST0002: Recovery over after 0:05, of 27 clients 27 recovered and 0 were evicted.
[ 143.733171] Lustre: soaked-OST0002: deleting orphan objects from 0x0:6964042 to 0x0:6964083
[ 143.735241] Lustre: soaked-OST0002: deleting orphan objects from 0x380000401:5635234 to 0x380000401:5647269
[ 143.753817] Lustre: soaked-OST0002: deleting orphan objects from 0x380000400:5074927 to 0x380000400:5080690
[ 143.820779] Lustre: soaked-OST0002: deleting orphan objects from 0x380000402:8806871 to 0x380000402:8812296
[ 147.362231] Lustre: soaked-OST000a: Connection restored to 3b6c98a5-fe70-4 (at 192.168.1.131@o2ib)
[ 147.372271] Lustre: Skipped 5 previous similar messages
[ 148.926072] Lustre: soaked-OST0006: Imperative Recovery enabled, recovery window shrunk from 300-900 down to 150-900
[ 149.800735] Lustre: soaked-OST0006: Will be in recovery for at least 2:30, or until 27 clients reconnect
[ 151.643017] Lustre: soaked-OST000a: Recovery over after 0:10, of 27 clients 27 recovered and 0 were evicted.
[ 151.651808] Lustre: soaked-OST000a: deleting orphan objects from 0x580000400:5654332 to 0x580000400:5656923
[ 151.653781] Lustre: soaked-OST000a: deleting orphan objects from 0x0:6949857 to 0x0:6949898
[ 151.663992] Lustre: soaked-OST000a: deleting orphan objects from 0x580000402:8821479 to 0x580000402:8827114
[ 151.665251] Lustre: soaked-OST000a: deleting orphan objects from 0x580000401:5099258 to 0x580000401:5105344
[ 155.393063] Lustre: soaked-OST0006: Connection restored to soaked-MDT0002-mdtlov_UUID (at 192.168.1.110@o2ib)
[ 155.404202] Lustre: Skipped 26 previous similar messages
[ 157.016144] Lustre: soaked-OST000e: Imperative Recovery enabled, recovery window shrunk from 300-900 down to 150-900
[ 157.129006] Lustre: soaked-OST000e: Will be in recovery for at least 2:30, or until 27 clients reconnect
[ 158.836265] Lustre: soaked-OST0006: Recovery over after 0:09, of 27 clients 27 recovered and 0 were evicted.
[ 158.847384] Lustre: soaked-OST0006: deleting orphan objects from 0x0:6965153 to 0x0:6965199
[ 158.866283] Lustre: soaked-OST0006: deleting orphan objects from 0x480000402:5102899 to 0x480000402:5104678
[ 158.866936] Lustre: soaked-OST0006: deleting orphan objects from 0x480000401:5643903 to 0x480000401:5651951
[ 158.874936] Lustre: soaked-OST0006: deleting orphan objects from 0x480000400:8787734 to 0x480000400:8793891
[ 167.036317] Lustre: soaked-OST000e: Recovery over after 0:10, of 27 clients 27 recovered and 0 were evicted.
[ 167.051945] Lustre: soaked-OST000e: deleting orphan objects from 0x680000402:4916845 to 0x680000402:4918647
[ 167.052271] Lustre: soaked-OST000e: deleting orphan objects from 0x0:6939032 to 0x0:6939072
[ 167.055485] Lustre: soaked-OST000e: deleting orphan objects from 0x680000401:8720221 to 0x680000401:8723771
[ 167.062501] Lustre: soaked-OST000e: deleting orphan objects from 0x680000400:5548226 to 0x680000400:5552635
[ 271.398262] Lustre: soaked-OST000a: Connection restored to 4270d3b8-8785-4 (at 192.168.1.122@o2ib)
[ 271.408347] Lustre: Skipped 42 previous similar messages
[ 355.688632] Lustre: soaked-OST0006: Connection restored to 4270d3b8-8785-4 (at 192.168.1.122@o2ib)
[ 355.698685] Lustre: Skipped 6 previous similar messages
[ 487.617829] Lustre: soaked-OST000e: Connection restored to 0a14b91b-c6a9-4 (at 192.168.1.119@o2ib)
[ 487.627863] Lustre: Skipped 1 previous similar message
[ 871.326165] Lustre: soaked-OST000a: Connection restored to 667ea088-477b-4 (at 192.168.1.118@o2ib)
[ 871.336185] Lustre: Skipped 15 previous similar messages
[ 1194.625969] Lustre: soaked-OST0006: Connection restored to 4270d3b8-8785-4 (at 192.168.1.122@o2ib)
[ 1194.635991] Lustre: Skipped 21 previous similar messages
[ 1742.196450] Lustre: soaked-OST0006: Connection restored to 4270d3b8-8785-4 (at 192.168.1.122@o2ib)
[ 1742.206486] Lustre: Skipped 168 previous similar messages
[ 2512.885378] Lustre: soaked-OST000a: Connection restored to 0e6b88eb-ca9a-4 (at 192.168.1.117@o2ib)
[ 2512.885380] Lustre: soaked-OST0002: Connection restored to 0e6b88eb-ca9a-4 (at 192.168.1.117@o2ib)
[ 2512.885383] Lustre: soaked-OST000e: Connection restored to 0e6b88eb-ca9a-4 (at 192.168.1.117@o2ib)
[ 2512.885385] Lustre: Skipped 64 previous similar messages
[ 2512.885392] Lustre: Skipped 65 previous similar messages
[ 3141.275984] Lustre: soaked-OST0002: Connection restored to 2f1eb1c4-6276-4 (at 192.168.1.126@o2ib)
[ 3141.275986] Lustre: soaked-OST000e: Connection restored to 2f1eb1c4-6276-4 (at 192.168.1.126@o2ib)
[ 3141.275992] Lustre: Skipped 128 previous similar messages
[ 3141.302076] Lustre: Skipped 1 previous similar message
[ 3738.703359] LustreError: 137-5: soaked-OST0007_UUID: not available for connect from 192.168.1.110@o2ib (no target). If you are running an HA pair check that the target is mounted on the other server.
[ 3738.723185] LustreError: Skipped 3 previous similar messages
[ 3740.041350] LustreError: 137-5: soaked-OST000f_UUID: not available for connect from 192.168.1.142@o2ib (no target). If you are running an HA pair check that the target is mounted on the other server.
[ 3741.149040] LustreError: 137-5: soaked-OST000f_UUID: not available for connect from 192.168.1.127@o2ib (no target). If you are running an HA pair check that the target is mounted on the other server.
[ 3741.168874] LustreError: Skipped 3 previous similar messages
[ 3743.506904] LustreError: 137-5: soaked-OST0003_UUID: not available for connect from 192.168.1.111@o2ib (no target). If you are running an HA pair check that the target is mounted on the other server.
[ 3743.526763] LustreError: Skipped 7 previous similar messages
[ 3749.244322] LustreError: 137-5: soaked-OST000f_UUID: not available for connect from 192.168.1.120@o2ib (no target). If you are running an HA pair check that the target is mounted on the other server.
[ 3749.264152] LustreError: Skipped 9 previous similar messages
[ 3757.891545] LustreError: 137-5: soaked-OST000f_UUID: not available for connect from 192.168.1.122@o2ib (no target). If you are running an HA pair check that the target is mounted on the other server.
[ 3757.911389] LustreError: Skipped 3 previous similar messages
[ 3788.883107] LustreError: 137-5: soaked-OST0003_UUID: not available for connect from 192.168.1.110@o2ib (no target). If you are running an HA pair check that the target is mounted on the other server.
[ 3788.883110] LustreError: 137-5: soaked-OST0007_UUID: not available for connect from 192.168.1.110@o2ib (no target). If you are running an HA pair check that the target is mounted on the other server.
[ 3788.883116] LustreError: Skipped 5 previous similar messages
[ 3789.539742] mce: [Hardware Error]: CPU 0: Machine Check Exception: 5 Bank 5: be00000000010093
[ 3789.539748] mce: [Hardware Error]: Machine check events logged
[ 3789.555773] mce: [Hardware Error]: RIP !INEXACT! 10:<ffffffff877815b4> {intel_idle+0xd4/0x225}
[ 3789.565421] mce: [Hardware Error]: TSC 9ce54e28818 ADDR 42ec5acc0 MISC 14076f686
[ 3789.573817] mce: [Hardware Error]: PROCESSOR 0:206d7 TIME 1572310251 SOCKET 0 APIC 0 microcode 718
[ 3789.583829] mce: [Hardware Error]: Run the above through 'mcelog --ascii'
[ 3789.597693] mce: [Hardware Error]: Machine check: Processor context corrupt
[ 3789.605480] Kernel panic - not syncing: Fatal machine check
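
For triage, the Bank 5 status word in the machine check above can be split into its architectural fields. The following is a minimal sketch, assuming the standard IA32_MCi_STATUS bit layout from the Intel SDM; the function name `decode_mci_status` is hypothetical, and this is not a substitute for running the log through `mcelog --ascii` as the kernel suggests.

```python
# Architectural flag bits of IA32_MCi_STATUS (assumed layout, per the Intel SDM).
FLAGS = {
    63: "VAL",    # register contents are valid
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # error was not corrected by hardware
    60: "EN",     # error reporting was enabled
    59: "MISCV",  # MISC register holds valid data
    58: "ADDRV",  # ADDR register holds a valid address
    57: "PCC",    # processor context corrupt
}


def decode_mci_status(status: int) -> dict:
    """Split a 64-bit machine check status word into flags and error codes."""
    return {
        # Names of the flag bits that are set, highest bit first.
        "flags": [name for bit, name in sorted(FLAGS.items(), reverse=True)
                  if (status >> bit) & 1],
        # Bits 31:16 carry the model-specific error code.
        "model_specific_code": (status >> 16) & 0xFFFF,
        # Bits 15:0 carry the architectural MCA error code.
        "mca_error_code": status & 0xFFFF,
    }


# Status value copied from the "Bank 5" line of the console log.
decoded = decode_mci_status(0xBE00000000010093)
print(decoded["flags"])
print(hex(decoded["model_specific_code"]), hex(decoded["mca_error_code"]))
```

The PCC and UC bits being set is consistent with the "Processor context corrupt" message and the fatal panic, which points to a hardware fault on soak-6 rather than a Lustre software error.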