Details
Bug
Resolution: Cannot Reproduce
Critical
None
Lustre 2.4.3
None
RHEL 6.5
2
14965
Description
At 16:34 today, one of our MDS nodes hit an LBUG that appears to be LU-5294.
Jul 17 16:34:09 atlas-mds3.ccs.ornl.gov kernel: [794253.417021] LustreError: 15235:0:(lu_object.h:867:lu_object_attr()) ASSERTION( ((o)->lo_header->loh_attr & LOHA_EXISTS) != 0 ) failed:
Jul 17 16:34:09 atlas-mds3.ccs.ornl.gov kernel: [794253.430991] LustreError: 15235:0:(lu_object.h:867:lu_object_attr()) LBUG
We took a crash dump and the MDS rebooted. It entered recovery at 17:54. At 19:24 the recovery time remaining reached 0, but the target was still in Recovering status. Since then we have been getting these messages from the MDS:
[ 8689.325886] LustreError: 19309:0:(ldlm_lockd.c:878:ldlm_server_blocking_ast()) ### BUG 6063: lock collide during recovery ns: mdt-atlas2-MDT0000_UUID lock: ffff881d06e30900/0xf35a0587ba982321 lrc: 3/0,0 m0
[ 8689.364338] LustreError: 19309:0:(ldlm_lockd.c:878:ldlm_server_blocking_ast()) Skipped 2 previous similar messages
[ 8739.604058] Lustre: atlas2-MDT0000: Denying connection for new client 7c9cecb7-6c21-5cab-c1b6-5ab153ff6158 (at 8173@gni100), waiting for all 20156 known clients (19103 recovered, 1032 in progress, and 21 6
[ 8739.627327] Lustre: Skipped 18 previous similar messages
[ 8973.664327] Lustre: atlas2-MDT0000: Client 1d186399-bf63-fee6-2b63-8fddd9e7fba3 (at 83@gni2) reconnecting, waiting for 20156 clients in recovery for 0:32
[ 8973.679915] Lustre: Skipped 374 previous similar messages
[ 8973.685115] Lustre: atlas2-MDT0000: Client f27002de-ab6c-80a4-91a7-0d2824607322 (at 80@gni2) refused reconnection, still busy with 1 active RPCs
[ 8973.685117] Lustre: Skipped 374 previous similar messages
[ 9065.583029] Lustre: atlas2-MDT0000: recovery is timed out, evict stale exports
[ 9413.230945] Lustre: atlas2-MDT0000: Denying connection for new client 1cbe0fe6-3a30-b20e-2504-2e3665d3e188 (at 11386@gni100), waiting for all 20156 known clients (19126 recovered, 1008 in progress, and 220
[ 9413.254413] Lustre: Skipped 24 previous similar messages
[ 9441.943231] LustreError: 0:0:(ldlm_lockd.c:402:waiting_locks_callback()) ### lock callback timer expired after 376s: evicting client at 11681@gni100 ns: mdt-atlas2-MDT0000_UUID lock: ffff881d07c7f240/0xf0
[ 9441.985270] LustreError: 0:0:(ldlm_lockd.c:402:waiting_locks_callback()) Skipped 2 previous similar messages
[ 9442.001611] Lustre: atlas2-MDT0000: recovery is timed out, evict stale exports
[ 9442.024632] LustreError: 19309:0:(ldlm_lockd.c:878:ldlm_server_blocking_ast()) ### BUG 6063: lock collide during recovery ns: mdt-atlas2-MDT0000_UUID lock: ffff883fc7b21240/0xf35a0587ba988598 lrc: 3/0,0 m0
[ 9442.062793] LustreError: 19309:0:(ldlm_lockd.c:878:ldlm_server_blocking_ast()) Skipped 1 previous similar message
[ 9574.246608] Lustre: atlas2-MDT0000: Client f27002de-ab6c-80a4-91a7-0d2824607322 (at 80@gni2) reconnecting, waiting for 20156 clients in recovery for 3:04
[ 9574.262195] Lustre: Skipped 368 previous similar messages
[ 9574.268355] Lustre: atlas2-MDT0000: Client f27002de-ab6c-80a4-91a7-0d2824607322 (at 80@gni2) refused reconnection, still busy with 1 active RPCs
[ 9574.283062] Lustre: Skipped 368 previous similar messages
[ 9818.311480] Lustre: atlas2-MDT0000: recovery is timed out, evict stale exports
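To make the stall in the messages above easier to see, here is a rough Python sketch that pulls the client counts out of the "Denying connection" lines and records each "recovery is timed out" event. The function name `summarize_recovery` and the regular expressions are assumptions written against the (truncated) lines quoted in this ticket, not part of any Lustre tooling, and they may need adjusting for other message variants.

```python
import re

# Assumed pattern for the MDT status line quoted above, e.g.
# "... waiting for all 20156 known clients (19103 recovered, 1032 in progress, ..."
STATUS_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+Lustre: (?P<target>\S+): "
    r"Denying connection for new client .*?"
    r"waiting for all (?P<known>\d+) known clients "
    r"\((?P<recovered>\d+) recovered, (?P<in_progress>\d+) in progress"
)

# Assumed pattern for the repeated recovery timeout message.
TIMEOUT_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+Lustre: (?P<target>\S+): recovery is timed out"
)


def summarize_recovery(lines):
    """Return (snapshots, timeouts) extracted from kernel log lines.

    snapshots: one dict per status line with the client counts.
    timeouts:  kernel timestamps (seconds since boot) of timeout messages.
    """
    snapshots = []
    timeouts = []
    for line in lines:
        m = STATUS_RE.search(line)
        if m:
            known = int(m["known"])
            recovered = int(m["recovered"])
            in_progress = int(m["in_progress"])
            snapshots.append({
                "ts": float(m["ts"]),
                "target": m["target"],
                "known": known,
                "recovered": recovered,
                "in_progress": in_progress,
                # Clients that are neither recovered nor in progress.
                "other": known - recovered - in_progress,
            })
            continue
        m = TIMEOUT_RE.search(line)
        if m:
            timeouts.append(float(m["ts"]))
    return snapshots, timeouts
```

Fed the excerpt above, this should show the recovered count moving only from 19103 to 19126 between the two status lines while the timeout message keeps repeating, which is the behaviour we are reporting.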