Details
- Bug
- Resolution: Fixed
- Critical
- None
- Lustre 2.10.0
- Soak stress cluster
- 3
- 9,394
- 9223372036854775807
Description
soak-7 survived several failovers, the last at 2017-05-07 07:41:31.
The soak cluster failed over soak-10 at 2017-05-07 18:23:22.
Immediately after finishing recovery, soak-7 crashed.
The OSS reconnected to the recently failed-over MDT on soak-10/11:
May 7 18:22:39 soak-7 kernel: LustreError: 11-0: soaked-MDT0003-lwp-OST0011: operation obd_ping to node 192.168.1.110@o2ib10 failed: rc = -107
May 7 18:22:39 soak-7 kernel: Lustre: soaked-MDT0003-lwp-OST0005: Connection to soaked-MDT0003 (at 192.168.1.110@o2ib10) was lost; in progress operations using this service will wait for recovery to complete
May 7 18:22:39 soak-7 kernel: Lustre: Skipped 2 previous similar messages
May 7 18:22:39 soak-7 kernel: LustreError: Skipped 3 previous similar messages
May 7 18:23:21 soak-7 kernel: LNet: 228:0:(o2iblnd_cb.c:2421:kiblnd_passive_connect()) Conn stale 192.168.1.111@o2ib10 version 12/12 incarnation 1494181401470091/1494181401470091
May 7 18:23:21 soak-7 kernel: Lustre: soaked-OST0005: Connection restored to (at 192.168.1.111@o2ib10)
May 7 18:23:21 soak-7 kernel: Lustre: Skipped 2 previous similar messages
May 7 18:23:22 soak-7 kernel: LNet: 7422:0:(o2iblnd_cb.c:1377:kiblnd_reconnect_peer()) Abort reconnection of 192.168.1.111@o2ib10: connected
May 7 18:23:29 soak-7 kernel: LustreError: 167-0: soaked-MDT0003-lwp-OST0011: This client was evicted by soaked-MDT0003; in progress operations using this service will fail.
May 7 18:23:29 soak-7 kernel: LustreError: Skipped 1 previous similar message
May 7 18:23:43 soak-7 kernel: Lustre: soaked-OST0005: deleting orphan objects from 0x440000401:26279429 to 0x440000401:26291121
May 7 18:23:43 soak-7 kernel: Lustre: soaked-OST0011: deleting orphan objects from 0x780000401:26209136 to 0x780000401:26218273
May 7 18:23:43 soak-7 kernel: Lustre: soaked-OST000b: deleting orphan objects from 0x5c0000400:26329949 to 0x5c0000400:26339745
May 7 18:23:43 soak-7 kernel: Lustre: soaked-OST0017: deleting orphan objects from 0x8c0000401:26229632 to 0x8c0000401:26238017
May 7 18:23:54 soak-7 kernel: LustreError: 167-0: soaked-MDT0003-lwp-OST000b: This client was evicted by soaked-MDT0003; in progress operations using this service will fail.
May 7 18:23:54 soak-7 kernel: Lustre: soaked-MDT0003-lwp-OST0017: Connection restored to 192.168.1.111@o2ib10 (at 192.168.1.111@o2ib10)
Then soak-7 crashed hard:
[38854.133273] Lustre: soaked-MDT0003-lwp-OST0017: Connection restored to 192.168.1.111@o2ib10 (at 192.168.1.111@o2ib10)
[38854.147850] Lustre: Skipped 3 previous similar messages
[55622.538966] perf: interrupt took too long (5010 > 5007), lowering kernel.perf_event_max_sample_rate to 39000
[60371.183844] LustreError: 16407:0:(osd_object.c:427:osd_object_init()) soaked-OST0005: lookup [0x440000401:0x195026b:0x0]/0x920ea8 failed: rc = 17
[60371.201275] BUG: unable to handle kernel NULL pointer dereference at 0000000000000011
[60371.211442] IP: [<ffffffffa0a0d328>] lu_object_find_try+0x178/0x2b0 [obdclass]
[60371.221570] PGD 0
[60371.225825] Oops: 0000 [#1] SMP
A crash dump is available on the node; vmcore-dmesg is attached.
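For anyone triaging the dump, the numbers in the failing lookup can be put side by side with the orphan-cleanup range logged for the same OST. The following is only a sketch: it assumes the bracketed value is a FID in `[sequence:object id:version]` form with all fields in hex, and that the orphan messages print the sequence in hex and the object id in decimal, as they appear above. The helper name `parse_fid` is made up for this note, and the errno mapping assumes a Linux host.

```python
import errno
import re

# Hypothetical helper for this note: pull a [seq:oid:ver] FID out of a log line.
FID_RE = re.compile(r"\[(0x[0-9a-fA-F]+):(0x[0-9a-fA-F]+):(0x[0-9a-fA-F]+)\]")


def parse_fid(text):
    """Return (seq, oid, ver) as integers from the first FID found in text."""
    match = FID_RE.search(text)
    if match is None:
        raise ValueError("no FID found in: %r" % text)
    return tuple(int(group, 16) for group in match.groups())


# Failing lookup, copied from the console log above.
line = ("soaked-OST0005: lookup [0x440000401:0x195026b:0x0]/0x920ea8 "
        "failed: rc = 17")
seq, oid, ver = parse_fid(line)

# Orphan cleanup range logged for soaked-OST0005 at 18:23:43.
orphan_seq, orphan_first, orphan_last = 0x440000401, 26279429, 26291121

same_sequence = seq == orphan_seq
in_orphan_range = orphan_first <= oid <= orphan_last

# rc = 17 printed as a positive value; look up its symbolic errno name.
rc_name = errno.errorcode.get(17, "unknown")

print("seq=%#x oid=%d ver=%d" % (seq, oid, ver))
print("same sequence as orphan cleanup:", same_sequence)
print("object id inside deleted orphan range:", in_orphan_range)
print("rc 17 is", rc_name)
```

Under those assumptions the object id decodes to a value above the end of the deleted orphan range in the same sequence, and rc 17 maps to EEXIST. The faulting address in the oops, 0x11, is also 17 in decimal; whether that is related to the returned rc is something the crash dump would have to confirm.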
Attachments
Issue Links
- is related to: LU-9394 lu_object_find_try - kernel NULL pointer dereference (Resolved)