  Lustre / LU-5366

BUG 6063: lock collide during recovery


Details

    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: Critical
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.4.3
    • Labels: None
    • Environment: RHEL 6.5
    • Severity: 2
    • Rank (Obsolete): 14965

    Description

      At 16:34 today, one of our MDS nodes hit an LBUG that appears to be LU-5294.

      Jul 17 16:34:09 atlas-mds3.ccs.ornl.gov kernel: [794253.417021] LustreError: 15235:0:(lu_object.h:867:lu_object_attr()) ASSERTION( ((o)->lo_header->loh_attr & LOHA_EXISTS) != 0 ) failed:
      Jul 17 16:34:09 atlas-mds3.ccs.ornl.gov kernel: [794253.430991] LustreError: 15235:0:(lu_object.h:867:lu_object_attr()) LBUG

      We took a crash dump and the MDS rebooted. The MDT entered recovery at 17:54; at 19:24 the recovery time remaining reached 0, but the MDT was still in RECOVERING status. Since then the MDS has been logging the following messages:

      [ 8689.325886] LustreError: 19309:0:(ldlm_lockd.c:878:ldlm_server_blocking_ast()) ### BUG 6063: lock collide during recovery ns: mdt-atlas2-MDT0000_UUID lock: ffff881d06e30900/0xf35a0587ba982321 lrc: 3/0,0 m0
      [ 8689.364338] LustreError: 19309:0:(ldlm_lockd.c:878:ldlm_server_blocking_ast()) Skipped 2 previous similar messages
      [ 8739.604058] Lustre: atlas2-MDT0000: Denying connection for new client 7c9cecb7-6c21-5cab-c1b6-5ab153ff6158 (at 8173@gni100), waiting for all 20156 known clients (19103 recovered, 1032 in progress, and 21 6
      [ 8739.627327] Lustre: Skipped 18 previous similar messages
      [ 8973.664327] Lustre: atlas2-MDT0000: Client 1d186399-bf63-fee6-2b63-8fddd9e7fba3 (at 83@gni2) reconnecting, waiting for 20156 clients in recovery for 0:32
      [ 8973.679915] Lustre: Skipped 374 previous similar messages
      [ 8973.685115] Lustre: atlas2-MDT0000: Client f27002de-ab6c-80a4-91a7-0d2824607322 (at 80@gni2) refused reconnection, still busy with 1 active RPCs
      [ 8973.685117] Lustre: Skipped 374 previous similar messages
      [ 9065.583029] Lustre: atlas2-MDT0000: recovery is timed out, evict stale exports
      [ 9413.230945] Lustre: atlas2-MDT0000: Denying connection for new client 1cbe0fe6-3a30-b20e-2504-2e3665d3e188 (at 11386@gni100), waiting for all 20156 known clients (19126 recovered, 1008 in progress, and 220
      [ 9413.254413] Lustre: Skipped 24 previous similar messages
      [ 9441.943231] LustreError: 0:0:(ldlm_lockd.c:402:waiting_locks_callback()) ### lock callback timer expired after 376s: evicting client at 11681@gni100 ns: mdt-atlas2-MDT0000_UUID lock: ffff881d07c7f240/0xf0
      [ 9441.985270] LustreError: 0:0:(ldlm_lockd.c:402:waiting_locks_callback()) Skipped 2 previous similar messages
      [ 9442.001611] Lustre: atlas2-MDT0000: recovery is timed out, evict stale exports
      [ 9442.024632] LustreError: 19309:0:(ldlm_lockd.c:878:ldlm_server_blocking_ast()) ### BUG 6063: lock collide during recovery ns: mdt-atlas2-MDT0000_UUID lock: ffff883fc7b21240/0xf35a0587ba988598 lrc: 3/0,0 m0
      [ 9442.062793] LustreError: 19309:0:(ldlm_lockd.c:878:ldlm_server_blocking_ast()) Skipped 1 previous similar message
      [ 9574.246608] Lustre: atlas2-MDT0000: Client f27002de-ab6c-80a4-91a7-0d2824607322 (at 80@gni2) reconnecting, waiting for 20156 clients in recovery for 3:04
      [ 9574.262195] Lustre: Skipped 368 previous similar messages
      [ 9574.268355] Lustre: atlas2-MDT0000: Client f27002de-ab6c-80a4-91a7-0d2824607322 (at 80@gni2) refused reconnection, still busy with 1 active RPCs
      [ 9574.283062] Lustre: Skipped 368 previous similar messages
      [ 9818.311480] Lustre: atlas2-MDT0000: recovery is timed out, evict stale exports
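      To see where recovery stalled, the "Denying connection" lines can be tallied: known clients minus recovered clients minus clients in progress gives the clients still unaccounted for. The following is a minimal Python sketch for reading such logs, not a Lustre tool; the helper names recovery_progress and count_events are hypothetical, and the regular expression assumes the lines have exactly the (truncated) shape shown in the excerpt above.

```python
import re

# Matches the fixed prefix of the "Denying connection ... waiting for all N
# known clients (R recovered, P in progress" message. The report's log lines
# are truncated after this point, so only these counts are extracted.
PROGRESS_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\].*waiting for all (?P<known>\d+) known clients "
    r"\((?P<recovered>\d+) recovered, (?P<in_progress>\d+) in progress"
)


def recovery_progress(lines):
    """Hypothetical helper: return one (timestamp, known, recovered,
    in_progress, unaccounted) tuple per matching line, where
    unaccounted = known - recovered - in_progress."""
    results = []
    for line in lines:
        m = PROGRESS_RE.search(line)
        if m is None:
            continue  # not a recovery-progress line
        known = int(m["known"])
        recovered = int(m["recovered"])
        in_progress = int(m["in_progress"])
        results.append(
            (float(m["ts"]), known, recovered, in_progress,
             known - recovered - in_progress)
        )
    return results


def count_events(lines):
    """Hypothetical helper: count lock-collision and recovery-timeout lines."""
    return {
        "lock_collide": sum("lock collide during recovery" in line for line in lines),
        "recovery_timeout": sum("recovery is timed out" in line for line in lines),
    }


if __name__ == "__main__":
    # Two "Denying connection" lines copied from the log excerpt.
    sample = [
        "[ 8739.604058] Lustre: atlas2-MDT0000: Denying connection for new client "
        "7c9cecb7-6c21-5cab-c1b6-5ab153ff6158 (at 8173@gni100), waiting for all "
        "20156 known clients (19103 recovered, 1032 in progress, and 21 6",
        "[ 9413.230945] Lustre: atlas2-MDT0000: Denying connection for new client "
        "1cbe0fe6-3a30-b20e-2504-2e3665d3e188 (at 11386@gni100), waiting for all "
        "20156 known clients (19126 recovered, 1008 in progress, and 220",
    ]
    for row in recovery_progress(sample):
        print(row)
```

      Applied to the two "Denying connection" lines in the log, this gives 21 and then 22 clients that are neither recovered nor reconnecting, so the set of missing clients did not shrink between those two messages.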

          People

            Assignee: Zhenyu Xu (bobijam)
            Reporter: Philip B Curtis (curtispb)