soak-7 survived several failovers, the last at 2017-05-07 07:41:31.
The soak cluster failed over soak-10 at 2017-05-07 18:23:22.
Immediately after finishing recovery, soak-7 crashed.
The OSS reconnected to the recently failed-over MDT on soak-10/11:
May 7 18:22:39 soak-7 kernel: LustreError: 11-0: soaked-MDT0003-lwp-OST0011: operation obd_ping to node 192.168.1.110@o2ib10 failed: rc = -107
May 7 18:22:39 soak-7 kernel: Lustre: soaked-MDT0003-lwp-OST0005: Connection to soaked-MDT0003 (at 192.168.1.110@o2ib10) was lost; in progress operations using this service will wait for recovery to complete
May 7 18:22:39 soak-7 kernel: Lustre: Skipped 2 previous similar messages
May 7 18:22:39 soak-7 kernel: LustreError: Skipped 3 previous similar messages
May 7 18:23:21 soak-7 kernel: LNet: 228:0:(o2iblnd_cb.c:2421:kiblnd_passive_connect()) Conn stale 192.168.1.111@o2ib10 version 12/12 incarnation 1494181401470091/1494181401470091
May 7 18:23:21 soak-7 kernel: Lustre: soaked-OST0005: Connection restored to (at 192.168.1.111@o2ib10)
May 7 18:23:21 soak-7 kernel: Lustre: Skipped 2 previous similar messages
May 7 18:23:22 soak-7 kernel: LNet: 7422:0:(o2iblnd_cb.c:1377:kiblnd_reconnect_peer()) Abort reconnection of 192.168.1.111@o2ib10: connected
May 7 18:23:29 soak-7 kernel: LustreError: 167-0: soaked-MDT0003-lwp-OST0011: This client was evicted by soaked-MDT0003; in progress operations using this service will fail.
May 7 18:23:29 soak-7 kernel: LustreError: Skipped 1 previous similar message
May 7 18:23:43 soak-7 kernel: Lustre: soaked-OST0005: deleting orphan objects from 0x440000401:26279429 to 0x440000401:26291121
May 7 18:23:43 soak-7 kernel: Lustre: soaked-OST0011: deleting orphan objects from 0x780000401:26209136 to 0x780000401:26218273
May 7 18:23:43 soak-7 kernel: Lustre: soaked-OST000b: deleting orphan objects from 0x5c0000400:26329949 to 0x5c0000400:26339745
May 7 18:23:43 soak-7 kernel: Lustre: soaked-OST0017: deleting orphan objects from 0x8c0000401:26229632 to 0x8c0000401:26238017
May 7 18:23:54 soak-7 kernel: LustreError: 167-0: soaked-MDT0003-lwp-OST000b: This client was evicted by soaked-MDT0003; in progress operations using this service will fail.
May 7 18:23:54 soak-7 kernel: Lustre: soaked-MDT0003-lwp-OST0017: Connection restored to 192.168.1.111@o2ib10 (at 192.168.1.111@o2ib10)
Then, a hard crash:
[38854.133273] Lustre: soaked-MDT0003-lwp-OST0017: Connection restored to 192.168.1.111@o2ib10 (at 192.168.1.111@o2ib10)
[38854.147850] Lustre: Skipped 3 previous similar messages
[55622.538966] perf: interrupt took too long (5010 > 5007), lowering kernel.perf_event_max_sample_rate to 39000
[60371.183844] LustreError: 16407:0:(osd_object.c:427:osd_object_init()) soaked-OST0005: lookup [0x440000401:0x195026b:0x0]/0x920ea8 failed: rc = 17
[60371.201275] BUG: unable to handle kernel NULL pointer dereference at 0000000000000011
[60371.211442] IP: [<ffffffffa0a0d328>] lu_object_find_try+0x178/0x2b0 [obdclass]
[60371.221570] PGD 0
[60371.225825] Oops: 0000 [#1] SMP
A crash dump is available on the node; vmcore-dmesg is attached.