Lustre / LU-11967

MDS LBUG ASSERTION( o->opo_reserved == 0 ) failed

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version/s: Lustre 2.13.0, Lustre 2.12.4
    • Affects Version/s: Lustre 2.12.0
    • Labels: None
    • Environment: CentOS 7.6, Kernel 3.10.0-957.1.3.el7_lustre.x86_64, all clients are 2.12.0
    • Severity: 3

    Description

      We just hit the following MDS crash on Fir (2.12), server fir-md1-s1:

      [497493.075367] Lustre: fir-MDT0000: Client 691c85d2-0e39-9e6d-1bfd-ecbaccae5366 (at 10.8.2.27@o2ib6) reconnecting
      [497594.956880] LustreError: 12324:0:(osp_object.c:1458:osp_declare_create()) ASSERTION( o->opo_reserved == 0 ) failed: 
      [497594.967490] LustreError: 12324:0:(osp_object.c:1458:osp_declare_create()) LBUG
      [497594.974807] Pid: 12324, comm: mdt01_074 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018
      [497594.984636] Call Trace:
      [497594.987187]  [<ffffffffc0c5e7cc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
      [497594.993859]  [<ffffffffc0c5e87c>] lbug_with_loc+0x4c/0xa0 [libcfs]
      [497595.000177]  [<ffffffffc17cdcc5>] osp_declare_create+0x5a5/0x5b0 [osp]
      [497595.006833]  [<ffffffffc171539f>] lod_sub_declare_create+0xdf/0x210 [lod]
      [497595.013748]  [<ffffffffc1714904>] lod_qos_prep_create+0x15d4/0x1890 [lod]
      [497595.020662]  [<ffffffffc16f5bba>] lod_declare_instantiate_components+0x9a/0x1d0 [lod]
      [497595.028614]  [<ffffffffc17084d5>] lod_declare_layout_change+0xb65/0x10f0 [lod]
      [497595.035988]  [<ffffffffc177a102>] mdd_declare_layout_change+0x62/0x120 [mdd]
      [497595.043172]  [<ffffffffc1782e52>] mdd_layout_change+0x882/0x1000 [mdd]
      [497595.049830]  [<ffffffffc15ea317>] mdt_layout_change+0x337/0x430 [mdt]
      [497595.056398]  [<ffffffffc15f242e>] mdt_intent_layout+0x7ee/0xcc0 [mdt]
      [497595.062968]  [<ffffffffc15efa18>] mdt_intent_policy+0x2e8/0xd00 [mdt]
      [497595.069549]  [<ffffffffc0f41ec6>] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc]
      [497595.076400]  [<ffffffffc0f6a8a7>] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc]
      [497595.083597]  [<ffffffffc0ff1302>] tgt_enqueue+0x62/0x210 [ptlrpc]
      [497595.089851]  [<ffffffffc0ff835a>] tgt_request_handle+0xaea/0x1580 [ptlrpc]
      [497595.096881]  [<ffffffffc0f9c92b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
      [497595.104679]  [<ffffffffc0fa025c>] ptlrpc_main+0xafc/0x1fc0 [ptlrpc]
      [497595.111095]  [<ffffffff9d6c1c31>] kthread+0xd1/0xe0
      [497595.116096]  [<ffffffff9dd74c24>] ret_from_fork_nospec_begin+0xe/0x21
      [497595.122657]  [<ffffffffffffffff>] 0xffffffffffffffff
      [497595.127761] Kernel panic - not syncing: LBUG
      [497595.132122] CPU: 41 PID: 12324 Comm: mdt01_074 Kdump: loaded Tainted: G           OEL ------------   3.10.0-957.1.3.el7_lustre.x86_64 #1
      [497595.144451] Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.6.7 10/29/2018
      [497595.152106] Call Trace:
      [497595.154649]  [<ffffffff9dd61e41>] dump_stack+0x19/0x1b
      [497595.159882]  [<ffffffff9dd5b550>] panic+0xe8/0x21f
      [497595.164763]  [<ffffffffc0c5e8cb>] lbug_with_loc+0x9b/0xa0 [libcfs]
      [497595.171040]  [<ffffffffc17cdcc5>] osp_declare_create+0x5a5/0x5b0 [osp]
      [497595.177668]  [<ffffffffc171539f>] lod_sub_declare_create+0xdf/0x210 [lod]
      [497595.184541]  [<ffffffff9d994d0d>] ? list_del+0xd/0x30
      [497595.189693]  [<ffffffffc1714904>] lod_qos_prep_create+0x15d4/0x1890 [lod]
      [497595.196569]  [<ffffffff9d81a849>] ? ___slab_alloc+0x209/0x4f0
      [497595.202421]  [<ffffffffc0d87f7b>] ? class_handle_hash+0xab/0x2f0 [obdclass]
      [497595.209474]  [<ffffffff9d6d67b0>] ? wake_up_state+0x20/0x20
      [497595.215152]  [<ffffffffc0da7138>] ? lu_buf_alloc+0x48/0x320 [obdclass]
      [497595.221803]  [<ffffffffc0f5be0d>] ? ldlm_cli_enqueue_local+0x27d/0x870 [ptlrpc]
      [497595.229208]  [<ffffffffc16f5bba>] lod_declare_instantiate_components+0x9a/0x1d0 [lod]
      [497595.237131]  [<ffffffffc17084d5>] lod_declare_layout_change+0xb65/0x10f0 [lod]
      [497595.244442]  [<ffffffffc177a102>] mdd_declare_layout_change+0x62/0x120 [mdd]
      [497595.251584]  [<ffffffffc1782e52>] mdd_layout_change+0x882/0x1000 [mdd]
      [497595.258213]  [<ffffffffc15e9b30>] ? mdt_object_lock_internal+0x70/0x3e0 [mdt]
      [497595.265444]  [<ffffffffc15ea317>] mdt_layout_change+0x337/0x430 [mdt]
      [497595.271978]  [<ffffffffc15f242e>] mdt_intent_layout+0x7ee/0xcc0 [mdt]
      [497595.278543]  [<ffffffffc0f8e2f7>] ? lustre_msg_buf+0x17/0x60 [ptlrpc]
      [497595.285083]  [<ffffffffc15efa18>] mdt_intent_policy+0x2e8/0xd00 [mdt]
      [497595.291637]  [<ffffffffc0f40524>] ? ldlm_lock_create+0xa4/0xa40 [ptlrpc]
      [497595.298442]  [<ffffffffc15f1c40>] ? mdt_intent_open+0x350/0x350 [mdt]
      [497595.304999]  [<ffffffffc0f41ec6>] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc]
      [497595.311794]  [<ffffffffc0c69fa3>] ? cfs_hash_bd_add_locked+0x63/0x80 [libcfs]
      [497595.319018]  [<ffffffffc0c6d72e>] ? cfs_hash_add+0xbe/0x1a0 [libcfs]
      [497595.325490]  [<ffffffffc0f6a8a7>] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc]
      [497595.332662]  [<ffffffffc0f927f0>] ? lustre_swab_ldlm_lock_desc+0x30/0x30 [ptlrpc]
      [497595.340268]  [<ffffffffc0ff1302>] tgt_enqueue+0x62/0x210 [ptlrpc]
      [497595.346488]  [<ffffffffc0ff835a>] tgt_request_handle+0xaea/0x1580 [ptlrpc]
      [497595.353481]  [<ffffffffc0fd1a51>] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc]
      [497595.361139]  [<ffffffffc0c5ebde>] ? ktime_get_real_seconds+0xe/0x10 [libcfs]
      [497595.368309]  [<ffffffffc0f9c92b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
      [497595.376083]  [<ffffffffc0f997b5>] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc]
      [497595.382959]  [<ffffffff9d6d67c2>] ? default_wake_function+0x12/0x20
      [497595.389311]  [<ffffffff9d6cba9b>] ? __wake_up_common+0x5b/0x90
      [497595.395263]  [<ffffffffc0fa025c>] ptlrpc_main+0xafc/0x1fc0 [ptlrpc]
      [497595.401650]  [<ffffffffc0f9f760>] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc]
      [497595.409132]  [<ffffffff9d6c1c31>] kthread+0xd1/0xe0
      [497595.414097]  [<ffffffff9d6c1b60>] ? insert_kthread_work+0x40/0x40
      [497595.420279]  [<ffffffff9dd74c24>] ret_from_fork_nospec_begin+0xe/0x21
      [497595.426803]  [<ffffffff9d6c1b60>] ? insert_kthread_work+0x40/0x40
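
For triage, it can help to reduce a trace like the one above to its reliable frames only. Lines marked with `?` are addresses the unwinder found on the stack but could not confirm as part of the call chain. The following is a minimal Python sketch under stated assumptions: the regular expression and the helper name `reliable_frames` are illustrative and not part of any Lustre or kernel tooling, and the pattern is only fitted to the frame format shown in this report.

```python
import re

# Matches one kernel stack frame line as it appears in this report, e.g.
# "[497595.171040]  [<ffffffffc17cdcc5>] osp_declare_create+0x5a5/0x5b0 [osp]"
# Assumption: frames follow the "[<addr>] [? ]func+0xoff/0xsize [module]" layout.
FRAME_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+"                   # dmesg timestamp
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+"                 # return address
    r"(?P<unreliable>\?\s+)?"                       # "? " marks an unconfirmed frame
    r"(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"    # function+offset/size
    r"(?:\s+\[(?P<module>\w+)\])?"                  # module name, absent for core kernel
)


def reliable_frames(log_text):
    """Return (function, module) pairs for frames not marked with '?'."""
    frames = []
    for line in log_text.splitlines():
        m = FRAME_RE.search(line)
        if m and not m.group("unreliable"):
            frames.append((m.group("func"), m.group("module") or "kernel"))
    return frames


# A few lines copied from the panic trace in this report.
SAMPLE = """\
[497595.171040]  [<ffffffffc17cdcc5>] osp_declare_create+0x5a5/0x5b0 [osp]
[497595.177668]  [<ffffffffc171539f>] lod_sub_declare_create+0xdf/0x210 [lod]
[497595.184541]  [<ffffffff9d994d0d>] ? list_del+0xd/0x30
[497595.189693]  [<ffffffffc1714904>] lod_qos_prep_create+0x15d4/0x1890 [lod]
"""

for func, module in reliable_frames(SAMPLE):
    print(f"{module:8s} {func}")
```

Run against the full `vmcore-dmesg` attachment, the same helper would list the layout-change path (mdt, mdd, lod, osp) without the interleaved `?` entries.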
      

      I do have a vmcore.

      Fir has two MDS nodes: fir-md1-s1 serves MDT0 and MDT2, and fir-md1-s2 serves MDT1 and MDT3.

      DoM and PFL are both enabled and in use.

      Please let me know if you have any idea how to avoid this.

      Thanks
      Stephane

      Attachments

        1. fir-md1-s1-vmcore-dmesg.txt
          997 kB
          Stephane Thiell
        2. fir-md1-s2-crash-foreach-bt-2019-06-15-01-19-53.log
          916 kB
          Stephane Thiell
        3. fir-md1-s2-crash-ps-2019-06-15-01-19-53.log
          14 kB
          Stephane Thiell
        4. vmcore-dmesg-fir-md1-s2-2019-06-15-01-19-53.txt
          1.00 MB
          Stephane Thiell

          People

            Assignee: Lai Siyao (laisiyao)
            Reporter: Stephane Thiell (sthiell)