LU-5069

Hit LBUG in DNE racer test: (lu_object.h:852:lu_object_attr()) ASSERTION( ((o)->lo_header->loh_attr & LOHA_EXISTS) != 0 ) failed


    Description

      MDS2 hit the following LBUG:

      LustreError: 2958:0:(mdd_dir.c:3954:mdd_migrate()) Skipped 15 previous similar messages
      LustreError: 2830:0:(lu_object.h:852:lu_object_attr()) ASSERTION( ((o)->lo_header->loh_attr & LOHA_EXISTS) != 0 ) failed: 
      LustreError: 2830:0:(lu_object.h:852:lu_object_attr()) LBUG
      Pid: 2830, comm: mdt00_003
      
      Call Trace:
       [<ffffffffa0399895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
       [<ffffffffa0399e97>] lbug_with_loc+0x47/0xb0 [libcfs]
       [<ffffffffa0f3ae97>] mdd_is_subdir+0x277/0x280 [mdd]
       [<ffffffffa0e0f2ef>] mdt_rename_sanity+0xff/0x4a0 [mdt]
       [<ffffffffa0e1321c>] mdt_reint_rename_internal+0xdc/0x1a80 [mdt]
       [<ffffffffa06e46f8>] ? ldlm_lock_enqueue+0x1c8/0x930 [ptlrpc]
       [<ffffffffa0703edb>] ? ldlm_cli_enqueue_local+0x28b/0x5e0 [ptlrpc]
       [<ffffffffa0e14e04>] mdt_reint_rename_or_migrate+0x244/0x660 [mdt]
       [<ffffffffa0702bc0>] ? ldlm_blocking_ast+0x0/0x180 [ptlrpc]
       [<ffffffffa0704230>] ? ldlm_completion_ast+0x0/0x930 [ptlrpc]
       [<ffffffffa0e15250>] mdt_reint_rename+0x10/0x20 [mdt]
       [<ffffffffa0e0d881>] mdt_reint_rec+0x41/0xe0 [mdt]
       [<ffffffffa0df2e93>] mdt_reint_internal+0x4c3/0x7c0 [mdt]
       [<ffffffffa0df371b>] mdt_reint+0x6b/0x120 [mdt]
       [<ffffffffa078fe5c>] tgt_request_handle+0x23c/0xac0 [ptlrpc]
       [<ffffffffa073faea>] ptlrpc_main+0xd1a/0x1980 [ptlrpc]
       [<ffffffffa073edd0>] ? ptlrpc_main+0x0/0x1980 [ptlrpc]
       [<ffffffff8109ab56>] kthread+0x96/0xa0
       [<ffffffff8100c20a>] child_rip+0xa/0x20
       [<ffffffff8109aac0>] ? kthread+0x0/0xa0
       [<ffffffff8100c200>] ? child_rip+0x0/0x20

      LustreError: dumping log to /tmp/lustre-log.1400188459.2830

      Message from syslogd@client-18 at May 15 14:14:19 ...
       kernel:LustreError: 2830:0:(lu_object.h:852:lu_object_attr()) ASSERTION( ((o)->lo_header->loh_attr & LOHA_EXISTS) != 0 ) failed: 

      Message from syslogd@client-18 at May 15 14:14:19 ...
       kernel:LustreError: 2830:0:(lu_object.h:852:lu_object_attr()) LBUG
      

        Activity

          pjones Peter Jones added a comment -

          Landed for 2.6

          di.wang Di Wang added a comment - http://review.whamcloud.com/10340
          di.wang Di Wang added a comment -

          This seems to be caused by the following change:

          LU-4725 mdt: child-parent lock ordering in rename

          change rename so that it always has parent-child lock ordering,
          otherwise it may deadlock with other operations.

          Signed-off-by: Vitaly Fertman <vitaly_fertman@xyratex.com>
          Signed-off-by: Hongchao Zhang <hongchao.zhang@intel.com>
          Change-Id: If676da82ca50a20a4bb3aadef0f81c9c5ed3cbcb
          Xyratex-bug-id: MRP-1700
          Reviewed-on: http://review.whamcloud.com/9538
          Tested-by: Jenkins
          Tested-by: Maloo <hpdd-maloo@intel.com>
          Reviewed-by: wangdi <di.wang@intel.com>
          Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

          Hmm, doing the sanity check in mdt_rename_sanity without LDLM lock protection seems a bit risky. At the very least, it needs to check whether the object exists before calling mdo_is_subdir:

          static int mdt_rename_sanity(struct mdt_thread_info *info,
                                       const struct lu_fid *dir_fid,
                                       const struct lu_fid *fid)
          {
                  ...............

                  /* <-- check whether the object (dst) exists here */
                  rc = mdo_is_subdir(info->mti_env,
                                     mdt_object_child(dst), fid,
                                     &dst_fid);
                  mdt_object_put(info->mti_env, dst);
          }
          
          

          I will cook a patch.
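
          To illustrate the idea only, here is a minimal, self-contained sketch of an existence check placed in front of an asserting accessor. It is a toy model, not Lustre code and not the landed patch: every `toy_*` name, the `TOY_LOHA_EXISTS` value, and the `-ENOENT` / `-EINVAL` return codes are assumptions made for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy stand-ins for illustration only; these are NOT the Lustre definitions. */
#define TOY_LOHA_EXISTS 0x40u          /* hypothetical "object exists" flag bit */

struct toy_object {
        unsigned int       to_attr;    /* flag bits, may include TOY_LOHA_EXISTS */
        struct toy_object *to_parent;  /* NULL at the root of the tree */
};

/* Non-asserting existence test; safe to call on an object that may be gone. */
static int toy_object_exists(const struct toy_object *o)
{
        return (o->to_attr & TOY_LOHA_EXISTS) != 0;
}

/* Asserting accessor: the caller must already know that the object exists. */
static unsigned int toy_object_attr(const struct toy_object *o)
{
        assert(toy_object_exists(o));
        return o->to_attr;
}

/* Return 1 if 'ancestor' is 'dst' or lies on its parent chain, else 0. */
static int toy_is_subdir(const struct toy_object *dst,
                         const struct toy_object *ancestor)
{
        const struct toy_object *cur;

        (void)toy_object_attr(dst);    /* would trip the assertion if dst is gone */
        for (cur = dst; cur != NULL; cur = cur->to_parent) {
                if (cur == ancestor)
                        return 1;
        }
        return 0;
}

/*
 * Guarded sanity check: bail out before the asserting path when 'dst' no
 * longer exists (for example, removed by a racing operation).
 * Returns -ENOENT if dst is gone, -EINVAL if dst lies under src, 0 otherwise.
 */
static int toy_rename_sanity(const struct toy_object *src,
                             const struct toy_object *dst)
{
        if (!toy_object_exists(dst))
                return -ENOENT;
        return toy_is_subdir(dst, src) ? -EINVAL : 0;
}
```

          In this model, calling `toy_rename_sanity()` on a target whose exists flag has been cleared returns an error instead of reaching the assertion in `toy_object_attr()`. Without LDLM lock protection a check like this only narrows the race window; it does not close it.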


          People

            di.wang Di Wang
            sarah Sarah Liu