Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-3474

MDS LBUG on unlink?

    XMLWordPrintable

Details

    • Bug
    • Resolution: Fixed
    • Blocker
    • Lustre 2.4.1, Lustre 2.5.0
    • Lustre 2.4.0, Lustre 2.5.0
    • 3
    • 8703

    Description

      Hi,

      We have been testing v2.4 and have hit this LBUG which we have never experienced in v1.8.x for similar workloads. It looks like it is related to do an rm/unlink on certain files. I had to abort recovery and stop the ongoing file deletion in order to keep the MDS from repeatedly crashing with the same LBUG. We can supply more debug info should you need it.

      Cheers,

      Daire

      <0>LustreError: 6274:0:(linkea.c:169:linkea_links_find()) ASSERTION( ldata->ld_leh != ((void *)0) ) failed:
      <0>LustreError: 6274:0:(linkea.c:169:linkea_links_find()) LBUG
      <4>Pid: 6274, comm: mdt01_004
      <4>
      <4>Call Trace:
      <4> [<ffffffffa044b895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
      <4> [<ffffffffa044be97>] lbug_with_loc+0x47/0xb0 [libcfs]
      <4> [<ffffffffa05b47d6>] linkea_links_find+0x186/0x190 [obdclass]
      <4> [<ffffffffa0b87206>] ? mdo_xattr_get+0x26/0x30 [mdd]
      <4> [<ffffffffa0b8a645>] mdd_linkea_prepare+0x95/0x430 [mdd]
      <4> [<ffffffffa0b8ab01>] mdd_links_rename+0x121/0x540 [mdd]
      <4> [<ffffffffa0b8eae6>] mdd_unlink+0xb86/0xe30 [mdd]
      <4> [<ffffffffa0e0db98>] mdo_unlink+0x18/0x50 [mdt]
      <4> [<ffffffffa0e10f40>] mdt_reint_unlink+0x820/0x1010 [mdt]
      <4> [<ffffffffa0e0d891>] mdt_reint_rec+0x41/0xe0 [mdt]
      <4> [<ffffffffa0df2b03>] mdt_reint_internal+0x4c3/0x780 [mdt]
      <4> [<ffffffffa0df2e04>] mdt_reint+0x44/0xe0 [mdt]
      <4> [<ffffffffa0df7ab8>] mdt_handle_common+0x648/0x1660 [mdt]
      <4> [<ffffffffa0e31165>] mds_regular_handle+0x15/0x20 [mdt]
      <4> [<ffffffffa0730388>] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
      <4> [<ffffffffa044c5de>] ? cfs_timer_arm+0xe/0x10 [libcfs]
      <4> [<ffffffffa045dd8f>] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
      <4> [<ffffffffa07276e9>] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
      <4> [<ffffffff81055ab3>] ? __wake_up+0x53/0x70
      <4> [<ffffffffa073171e>] ptlrpc_main+0xace/0x1700 [ptlrpc]
      <4> [<ffffffffa0730c50>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
      <4> [<ffffffff8100c0ca>] child_rip+0xa/0x20
      <4> [<ffffffffa0730c50>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
      <4> [<ffffffffa0730c50>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
      <4> [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
      <4>
      <0>Kernel panic - not syncing: LBUG
      <4>Pid: 6274, comm: mdt01_004 Tainted: G --------------- T 2.6.32-358.6.2.el6_lustre.g230b174.x86_64 #1
      <4>Call Trace:
      <4> [<ffffffff8150d878>] ? panic+0xa7/0x16f
      <4> [<ffffffffa044beeb>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
      <4> [<ffffffffa05b47d6>] ? linkea_links_find+0x186/0x190 [obdclass]
      <4> [<ffffffffa0b87206>] ? mdo_xattr_get+0x26/0x30 [mdd]
      <4> [<ffffffffa0b8a645>] ? mdd_linkea_prepare+0x95/0x430 [mdd]
      <4> [<ffffffffa0b8ab01>] ? mdd_links_rename+0x121/0x540 [mdd]
      <4> [<ffffffffa0b8eae6>] ? mdd_unlink+0xb86/0xe30 [mdd]
      <4> [<ffffffffa0e0db98>] ? mdo_unlink+0x18/0x50 [mdt]
      <4> [<ffffffffa0e10f40>] ? mdt_reint_unlink+0x820/0x1010 [mdt]
      <4> [<ffffffffa0e0d891>] ? mdt_reint_rec+0x41/0xe0 [mdt]
      <4> [<ffffffffa0df2b03>] ? mdt_reint_internal+0x4c3/0x780 [mdt]
      <4> [<ffffffffa0df2e04>] ? mdt_reint+0x44/0xe0 [mdt]
      <4> [<ffffffffa0df7ab8>] ? mdt_handle_common+0x648/0x1660 [mdt]
      <4> [<ffffffffa0e31165>] ? mds_regular_handle+0x15/0x20 [mdt]
      <4> [<ffffffffa0730388>] ? ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
      <4> [<ffffffffa044c5de>] ? cfs_timer_arm+0xe/0x10 [libcfs]
      <4> [<ffffffffa045dd8f>] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
      <4> [<ffffffffa07276e9>] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
      <4> [<ffffffff81055ab3>] ? __wake_up+0x53/0x70
      <4> [<ffffffffa073171e>] ? ptlrpc_main+0xace/0x1700 [ptlrpc]
      <4> [<ffffffffa0730c50>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
      <4> [<ffffffff8100c0ca>] ? child_rip+0xa/0x20
      <4> [<ffffffffa0730c50>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
      <4> [<ffffffffa0730c50>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
      <4> [<ffffffff8100c0c0>] ? child_rip+0x0/0x20

      Attachments

        Issue Links

          Activity

            People

              bfaccini Bruno Faccini (Inactive)
              daire Daire Byrne (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              16 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: