  Lustre / LU-11967

MDS LBUG ASSERTION( o->opo_reserved == 0 ) failed

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version/s: Lustre 2.13.0, Lustre 2.12.4
    • Affects Version/s: Lustre 2.12.0
    • Labels: None
    • Environment: CentOS 7.6, Kernel 3.10.0-957.1.3.el7_lustre.x86_64, all clients are 2.12.0
    • Severity: 3

    Description

      We just hit the following MDS crash on Fir (2.12), server fir-md1-s1:

      [497493.075367] Lustre: fir-MDT0000: Client 691c85d2-0e39-9e6d-1bfd-ecbaccae5366 (at 10.8.2.27@o2ib6) reconnecting
      [497594.956880] LustreError: 12324:0:(osp_object.c:1458:osp_declare_create()) ASSERTION( o->opo_reserved == 0 ) failed: 
      [497594.967490] LustreError: 12324:0:(osp_object.c:1458:osp_declare_create()) LBUG
      [497594.974807] Pid: 12324, comm: mdt01_074 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018
      [497594.984636] Call Trace:
      [497594.987187]  [<ffffffffc0c5e7cc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
      [497594.993859]  [<ffffffffc0c5e87c>] lbug_with_loc+0x4c/0xa0 [libcfs]
      [497595.000177]  [<ffffffffc17cdcc5>] osp_declare_create+0x5a5/0x5b0 [osp]
      [497595.006833]  [<ffffffffc171539f>] lod_sub_declare_create+0xdf/0x210 [lod]
      [497595.013748]  [<ffffffffc1714904>] lod_qos_prep_create+0x15d4/0x1890 [lod]
      [497595.020662]  [<ffffffffc16f5bba>] lod_declare_instantiate_components+0x9a/0x1d0 [lod]
      [497595.028614]  [<ffffffffc17084d5>] lod_declare_layout_change+0xb65/0x10f0 [lod]
      [497595.035988]  [<ffffffffc177a102>] mdd_declare_layout_change+0x62/0x120 [mdd]
      [497595.043172]  [<ffffffffc1782e52>] mdd_layout_change+0x882/0x1000 [mdd]
      [497595.049830]  [<ffffffffc15ea317>] mdt_layout_change+0x337/0x430 [mdt]
      [497595.056398]  [<ffffffffc15f242e>] mdt_intent_layout+0x7ee/0xcc0 [mdt]
      [497595.062968]  [<ffffffffc15efa18>] mdt_intent_policy+0x2e8/0xd00 [mdt]
      [497595.069549]  [<ffffffffc0f41ec6>] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc]
      [497595.076400]  [<ffffffffc0f6a8a7>] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc]
      [497595.083597]  [<ffffffffc0ff1302>] tgt_enqueue+0x62/0x210 [ptlrpc]
      [497595.089851]  [<ffffffffc0ff835a>] tgt_request_handle+0xaea/0x1580 [ptlrpc]
      [497595.096881]  [<ffffffffc0f9c92b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
      [497595.104679]  [<ffffffffc0fa025c>] ptlrpc_main+0xafc/0x1fc0 [ptlrpc]
      [497595.111095]  [<ffffffff9d6c1c31>] kthread+0xd1/0xe0
      [497595.116096]  [<ffffffff9dd74c24>] ret_from_fork_nospec_begin+0xe/0x21
      [497595.122657]  [<ffffffffffffffff>] 0xffffffffffffffff
      [497595.127761] Kernel panic - not syncing: LBUG
      [497595.132122] CPU: 41 PID: 12324 Comm: mdt01_074 Kdump: loaded Tainted: G           OEL ------------   3.10.0-957.1.3.el7_lustre.x86_64 #1
      [497595.144451] Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.6.7 10/29/2018
      [497595.152106] Call Trace:
      [497595.154649]  [<ffffffff9dd61e41>] dump_stack+0x19/0x1b
      [497595.159882]  [<ffffffff9dd5b550>] panic+0xe8/0x21f
      [497595.164763]  [<ffffffffc0c5e8cb>] lbug_with_loc+0x9b/0xa0 [libcfs]
      [497595.171040]  [<ffffffffc17cdcc5>] osp_declare_create+0x5a5/0x5b0 [osp]
      [497595.177668]  [<ffffffffc171539f>] lod_sub_declare_create+0xdf/0x210 [lod]
      [497595.184541]  [<ffffffff9d994d0d>] ? list_del+0xd/0x30
      [497595.189693]  [<ffffffffc1714904>] lod_qos_prep_create+0x15d4/0x1890 [lod]
      [497595.196569]  [<ffffffff9d81a849>] ? ___slab_alloc+0x209/0x4f0
      [497595.202421]  [<ffffffffc0d87f7b>] ? class_handle_hash+0xab/0x2f0 [obdclass]
      [497595.209474]  [<ffffffff9d6d67b0>] ? wake_up_state+0x20/0x20
      [497595.215152]  [<ffffffffc0da7138>] ? lu_buf_alloc+0x48/0x320 [obdclass]
      [497595.221803]  [<ffffffffc0f5be0d>] ? ldlm_cli_enqueue_local+0x27d/0x870 [ptlrpc]
      [497595.229208]  [<ffffffffc16f5bba>] lod_declare_instantiate_components+0x9a/0x1d0 [lod]
      [497595.237131]  [<ffffffffc17084d5>] lod_declare_layout_change+0xb65/0x10f0 [lod]
      [497595.244442]  [<ffffffffc177a102>] mdd_declare_layout_change+0x62/0x120 [mdd]
      [497595.251584]  [<ffffffffc1782e52>] mdd_layout_change+0x882/0x1000 [mdd]
      [497595.258213]  [<ffffffffc15e9b30>] ? mdt_object_lock_internal+0x70/0x3e0 [mdt]
      [497595.265444]  [<ffffffffc15ea317>] mdt_layout_change+0x337/0x430 [mdt]
      [497595.271978]  [<ffffffffc15f242e>] mdt_intent_layout+0x7ee/0xcc0 [mdt]
      [497595.278543]  [<ffffffffc0f8e2f7>] ? lustre_msg_buf+0x17/0x60 [ptlrpc]
      [497595.285083]  [<ffffffffc15efa18>] mdt_intent_policy+0x2e8/0xd00 [mdt]
      [497595.291637]  [<ffffffffc0f40524>] ? ldlm_lock_create+0xa4/0xa40 [ptlrpc]
      [497595.298442]  [<ffffffffc15f1c40>] ? mdt_intent_open+0x350/0x350 [mdt]
      [497595.304999]  [<ffffffffc0f41ec6>] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc]
      [497595.311794]  [<ffffffffc0c69fa3>] ? cfs_hash_bd_add_locked+0x63/0x80 [libcfs]
      [497595.319018]  [<ffffffffc0c6d72e>] ? cfs_hash_add+0xbe/0x1a0 [libcfs]
      [497595.325490]  [<ffffffffc0f6a8a7>] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc]
      [497595.332662]  [<ffffffffc0f927f0>] ? lustre_swab_ldlm_lock_desc+0x30/0x30 [ptlrpc]
      [497595.340268]  [<ffffffffc0ff1302>] tgt_enqueue+0x62/0x210 [ptlrpc]
      [497595.346488]  [<ffffffffc0ff835a>] tgt_request_handle+0xaea/0x1580 [ptlrpc]
      [497595.353481]  [<ffffffffc0fd1a51>] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc]
      [497595.361139]  [<ffffffffc0c5ebde>] ? ktime_get_real_seconds+0xe/0x10 [libcfs]
      [497595.368309]  [<ffffffffc0f9c92b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
      [497595.376083]  [<ffffffffc0f997b5>] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc]
      [497595.382959]  [<ffffffff9d6d67c2>] ? default_wake_function+0x12/0x20
      [497595.389311]  [<ffffffff9d6cba9b>] ? __wake_up_common+0x5b/0x90
      [497595.395263]  [<ffffffffc0fa025c>] ptlrpc_main+0xafc/0x1fc0 [ptlrpc]
      [497595.401650]  [<ffffffffc0f9f760>] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc]
      [497595.409132]  [<ffffffff9d6c1c31>] kthread+0xd1/0xe0
      [497595.414097]  [<ffffffff9d6c1b60>] ? insert_kthread_work+0x40/0x40
      [497595.420279]  [<ffffffff9dd74c24>] ret_from_fork_nospec_begin+0xe/0x21
      [497595.426803]  [<ffffffff9d6c1b60>] ? insert_kthread_work+0x40/0x40
      

      I do have a vmcore.

      Fir has two MDS servers: fir-md1-s1 with MDT0 and MDT2, and fir-md1-s2 with MDT1 and MDT3.

      DOM and PFL are enabled and used.

      Please let me know if you have any idea how to avoid this.

      Thanks
      Stephane

      Attachments

        Activity

          [LU-11967] MDS LBUG ASSERTION( o->opo_reserved == 0 ) failed

          Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36376/
          Subject: LU-11967 mdt: reint layout_change in standard way
          Project: fs/lustre-release
          Branch: b2_12
          Current Patch Set:
          Commit: 6a806133a3a53987dbd9c207e0ed82dcd4035bbd


          Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/36376
          Subject: LU-11967 mdt: reint layout_change in standard way
          Project: fs/lustre-release
          Branch: b2_12
          Current Patch Set: 1
          Commit: cf47eb6b28174d280b82154981ff15b80517a8c9

          pjones Peter Jones added a comment -

          Landed for 2.13


          Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/35465/
          Subject: LU-11967 mdt: reint layout_change in standard way
          Project: fs/lustre-release
          Branch: master
          Current Patch Set:
          Commit: e1bebbdf53c8490a3be35793070f45f1a68721b1


          Lai Siyao (lai.siyao@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/35465
          Subject: LU-11967 mdt: reint layout_change in standard way
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: bce42b9cebb70f99d4e90bca8e6c99c4e7290794

          laisiyao Lai Siyao added a comment -

          I didn't find what race triggered this assert, but I found that layout_change is not handled the same way as other reint operations, so I'm afraid there are hidden races in layout_change resend/replay. I'll commit a patch to reint layout_change in the standard way.
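
          As a rough illustration of this class of race (a standalone C sketch, not the actual mdt code; all names here are hypothetical), a resent request that bypasses the standard reint resend detection executes twice, and the second execution trips the same kind of reservation assert:

              /*
               * Hypothetical sketch: without the standard resend check, the
               * same operation runs twice against the same in-memory object.
               */
              #include <assert.h>
              #include <stdbool.h>

              struct obj {
                      int reserved;           /* stands in for opo_reserved */
              };

              /* the standard reint path recognizes a resend by its xid */
              static bool already_executed(unsigned long long xid,
                                           unsigned long long last_xid)
              {
                      return xid <= last_xid;
              }

              static void layout_change(struct obj *o)
              {
                      assert(o->reserved == 0);       /* the LASSERT that fired */
                      o->reserved = 1;
              }

              static void handle(struct obj *o, unsigned long long xid,
                                 unsigned long long *last_xid, bool standard_path)
              {
                      if (standard_path && already_executed(xid, *last_xid))
                              return;         /* reconstruct reply, don't re-execute */
                      layout_change(o);
                      *last_xid = xid;
              }

              int main(void)
              {
                      struct obj o = { 0 };
                      unsigned long long last_xid = 0;

                      handle(&o, 7, &last_xid, false);        /* original execution */
                      handle(&o, 7, &last_xid, false);        /* resend: assert fires */
                      return 0;
              }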


          Lai Siyao (lai.siyao@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/35439
          Subject: LU-11967 obdclass: add barrier() before checking dying object
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: 26c774c44d09be0a03c079d5c707800ffc79932d


          It's great that you found this! Thanks for the explanation!

          laisiyao Lai Siyao added a comment -

          Thanks. I found that layout_change replay differs from normal create replay: upon replay it reads the on-disk LOV EA to decide whether the object is initialized. However, this LOV EA may be stale and inconsistent with the object stripes in memory (which are initialized from the LOV in the replay request); that is, the on-disk LOV says a stripe is not initialized while that stripe has already been initialized in memory, which triggers this assert.

                          if (!replay) {
                                  if (lod_comp_inited(lod_comp))
                                          continue;
                          } else {
                                  /**
                                   * In replay path, lod_comp is the EA passed by
                                   * client replay buffer,  comp_v1 is the pre-recovery
                                   * on-disk EA, we'd sift out those components which
                                   * were init-ed in the on-disk EA.
                                   */
                                  if (le32_to_cpu(comp_v1->lcm_entries[i].lcme_flags) &
                                      LCME_FL_INIT)
                                          continue;
                          }
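
          To make the inconsistency concrete, here is a minimal standalone C sketch (not Lustre code; apart from the LCME_FL_INIT name taken from the snippet above, all names are hypothetical and the flag value is illustrative). The replay decision consults the stale on-disk flags while the in-memory stripe already holds a reservation:

              /*
               * Hypothetical sketch: replay decides from stale on-disk flags
               * while the in-memory object already holds a reservation.
               */
              #include <assert.h>
              #include <stdint.h>

              #define LCME_FL_INIT 0x10       /* value illustrative */

              struct stripe {
                      int opo_reserved;       /* in-memory: already instantiated */
              };

              static void osp_declare_create_sim(struct stripe *s)
              {
                      assert(s->opo_reserved == 0);   /* the LASSERT that fired */
                      s->opo_reserved = 1;
              }

              int main(void)
              {
                      struct stripe s = { .opo_reserved = 1 };  /* init-ed in memory */
                      uint32_t ondisk_flags = 0;      /* stale EA: FL_INIT not set */

                      /* replay path: stale on-disk EA says "not initialized" ... */
                      if (!(ondisk_flags & LCME_FL_INIT))
                              osp_declare_create_sim(&s);     /* ... assert fires */
                      return 0;
              }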
          

          IMO layout_change can be replayed the same way as a normal create, i.e., use the LOV in the replay request to create stripes and never read the on-disk LOV, which may be stale. I'll write a patch and verify this.
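
          A minimal sketch of that proposed decision, assuming the component flags carried in the replay request are authoritative (the helper name is hypothetical):

              /*
               * Hypothetical sketch: decide from the replay request's component
               * flags in both paths, never from the possibly stale on-disk EA.
               */
              #include <stdbool.h>
              #include <stdint.h>

              #define LCME_FL_INIT 0x10       /* value illustrative */

              struct comp {
                      uint32_t lcme_flags;    /* flags as carried in the request */
              };

              /* true when the component's stripes still need to be created */
              static bool needs_instantiate(const struct comp *req_comp)
              {
                      return !(req_comp->lcme_flags & LCME_FL_INIT);
              }

              int main(void)
              {
                      struct comp initialized = { .lcme_flags = LCME_FL_INIT };

                      return needs_instantiate(&initialized) ? 1 : 0;  /* 0: skip */
              }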

          sthiell Stephane Thiell added a comment - edited

          The filesystem had been running fine for about two weeks until a new occurrence of this MDS crash last night. Additionally, it took us several hours this morning to bring the filesystem back online after the crash, because the MDT recovery was stuck at 0 sec and wouldn't finish. Even a manual abort_recovery didn't work; only disconnecting all clients resolved the issue. This has already been reported as LU-12360 (and I posted an update there too).

          So for the MDS LBUG, this time I do have a proper vmcore. First, the LBUG:

          [Sat Jun 15 00:46:30 2019][1411481.220356] LustreError: 21484:0:(tgt_grant.c:742:tgt_grant_check()) fir-MDT0001: cli e70978d5-4079-f244-8878-d79e9ac6d1e1 claims 155648 GRANT, real grant 0
          [Sat Jun 15 00:46:30 2019][1411481.234534] LustreError: 21484:0:(tgt_grant.c:742:tgt_grant_check()) Skipped 9824 previous similar messages
          [Sat Jun 15 00:49:13 2019][1411644.396226] Lustre: 23817:0:(llog_cat.c:874:llog_cat_process_or_fork()) fir-MDD0001: catlog [0x5:0xa:0x0] crosses index zero
          [Sat Jun 15 00:49:13 2019][1411644.407621] Lustre: 23817:0:(llog_cat.c:874:llog_cat_process_or_fork()) Skipped 77 previous similar messages
          [Sat Jun 15 00:56:30 2019][1412081.314545] LustreError: 21403:0:(tgt_grant.c:742:tgt_grant_check()) fir-MDT0001: cli e70978d5-4079-f244-8878-d79e9ac6d1e1 claims 155648 GRANT, real grant 0
          [Sat Jun 15 00:56:30 2019][1412081.328707] LustreError: 21403:0:(tgt_grant.c:742:tgt_grant_check()) Skipped 9658 previous similar messages
          [Sat Jun 15 00:59:14 2019][1412245.465371] Lustre: 23731:0:(llog_cat.c:874:llog_cat_process_or_fork()) fir-MDD0001: catlog [0x5:0xa:0x0] crosses index zero
          [Sat Jun 15 00:59:14 2019][1412245.476773] Lustre: 23731:0:(llog_cat.c:874:llog_cat_process_or_fork()) Skipped 52 previous similar messages
          [Sat Jun 15 01:06:30 2019][1412681.380333] LustreError: 73335:0:(tgt_grant.c:742:tgt_grant_check()) fir-MDT0001: cli e70978d5-4079-f244-8878-d79e9ac6d1e1 claims 155648 GRANT, real grant 0
          [Sat Jun 15 01:06:30 2019][1412681.394502] LustreError: 73335:0:(tgt_grant.c:742:tgt_grant_check()) Skipped 9471 previous similar messages
          [Sat Jun 15 01:09:30 2019][1412861.615161] Lustre: 23727:0:(llog_cat.c:874:llog_cat_process_or_fork()) fir-MDD0001: catlog [0x5:0xa:0x0] crosses index zero
          [Sat Jun 15 01:09:30 2019][1412861.626555] Lustre: 23727:0:(llog_cat.c:874:llog_cat_process_or_fork()) Skipped 80 previous similar messages
          [Sat Jun 15 01:16:30 2019][1413281.448578] LustreError: 73327:0:(tgt_grant.c:742:tgt_grant_check()) fir-MDT0001: cli e70978d5-4079-f244-8878-d79e9ac6d1e1 claims 155648 GRANT, real grant 0
          [Sat Jun 15 01:16:30 2019][1413281.462751] LustreError: 73327:0:(tgt_grant.c:742:tgt_grant_check()) Skipped 9698 previous similar messages
          [Sat Jun 15 01:19:30 2019][1413461.904197] Lustre: 21344:0:(llog_cat.c:874:llog_cat_process_or_fork()) fir-MDD0001: catlog [0x5:0xa:0x0] crosses index zero
          [Sat Jun 15 01:19:31 2019][1413461.915593] Lustre: 21344:0:(llog_cat.c:874:llog_cat_process_or_fork()) Skipped 125 previous similar messages
          [Sat Jun 15 01:19:41 2019][1413472.771519] LustreError: 49210:0:(osp_object.c:1458:osp_declare_create()) ASSERTION( o->opo_reserved == 0 ) failed: 
          [Sat Jun 15 01:19:41 2019][1413472.782226] LustreError: 49210:0:(osp_object.c:1458:osp_declare_create()) LBUG
          [Sat Jun 15 01:19:41 2019][1413472.789629] Pid: 49210, comm: mdt03_089 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018
          [Sat Jun 15 01:19:41 2019][1413472.799538] Call Trace:
          [Sat Jun 15 01:19:41 2019][1413472.802173]  [<ffffffffc09587cc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
          [Sat Jun 15 01:19:41 2019][1413472.808916]  [<ffffffffc095887c>] lbug_with_loc+0x4c/0xa0 [libcfs]
          [Sat Jun 15 01:19:41 2019][1413472.815311]  [<ffffffffc154ecc5>] osp_declare_create+0x5a5/0x5b0 [osp]
          [Sat Jun 15 01:19:41 2019][1413472.822046]  [<ffffffffc149639f>] lod_sub_declare_create+0xdf/0x210 [lod]
          [Sat Jun 15 01:19:41 2019][1413472.829065]  [<ffffffffc1495904>] lod_qos_prep_create+0x15d4/0x1890 [lod]
          [Sat Jun 15 01:19:41 2019][1413472.836069]  [<ffffffffc1476bba>] lod_declare_instantiate_components+0x9a/0x1d0 [lod]
          [Sat Jun 15 01:19:41 2019][1413472.844103]  [<ffffffffc14894d5>] lod_declare_layout_change+0xb65/0x10f0 [lod]
          [Sat Jun 15 01:19:41 2019][1413472.851557]  [<ffffffffc14fb102>] mdd_declare_layout_change+0x62/0x120 [mdd]
          [Sat Jun 15 01:19:41 2019][1413472.858838]  [<ffffffffc1503e52>] mdd_layout_change+0x882/0x1000 [mdd]
          [Sat Jun 15 01:19:41 2019][1413472.865569]  [<ffffffffc136a327>] mdt_layout_change+0x337/0x430 [mdt]
          [Sat Jun 15 01:19:41 2019][1413472.872234]  [<ffffffffc137243e>] mdt_intent_layout+0x7ee/0xcc0 [mdt]
          [Sat Jun 15 01:19:41 2019][1413472.878907]  [<ffffffffc136fa28>] mdt_intent_policy+0x2e8/0xd00 [mdt]
          [Sat Jun 15 01:19:41 2019][1413472.885563]  [<ffffffffc0cc6d46>] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc]
          [Sat Jun 15 01:19:41 2019][1413472.892512]  [<ffffffffc0cef707>] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc]
          [Sat Jun 15 01:19:41 2019][1413472.899795]  [<ffffffffc0d766e2>] tgt_enqueue+0x62/0x210 [ptlrpc]
          [Sat Jun 15 01:19:42 2019][1413472.906131]  [<ffffffffc0d7d73a>] tgt_request_handle+0xaea/0x1580 [ptlrpc]
          [Sat Jun 15 01:19:42 2019][1413472.913245]  [<ffffffffc0d21d0b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
          [Sat Jun 15 01:19:42 2019][1413472.921141]  [<ffffffffc0d2563c>] ptlrpc_main+0xafc/0x1fc0 [ptlrpc]
          [Sat Jun 15 01:19:42 2019][1413472.927653]  [<ffffffff950c1c31>] kthread+0xd1/0xe0
          [Sat Jun 15 01:19:42 2019][1413472.932747]  [<ffffffff95774c24>] ret_from_fork_nospec_begin+0xe/0x21
          [Sat Jun 15 01:19:42 2019][1413472.939394]  [<ffffffffffffffff>] 0xffffffffffffffff
          [Sat Jun 15 01:19:42 2019][1413472.944595] Kernel panic - not syncing: LBUG
          

          Files provided for this occurrence:

          • uploaded vmcore to the FTP as fir-md1-s2-vmcore-2019-06-15-01-19-53 (our kernel is 3.10.0-957.1.3.el7_lustre.x86_64 and the debuginfo package has already been provided)
          • attached vmcore-dmesg-fir-md1-s2-2019-06-15-01-19-53.txt
          • attached fir-md1-s2-crash-foreach-bt-2019-06-15-01-19-53.log
          • attached fir-md1-s2-crash-ps-2019-06-15-01-19-53.log

          Thanks

          People

            Assignee: Lai Siyao
            Reporter: Stephane Thiell
            Votes: 0
            Watchers: 6

            Dates

              Created:
              Updated:
              Resolved: