Details

    • Bug
    • Resolution: Fixed
    • Major
    • Lustre 2.8.0
    • Lustre 2.5.3
    • None
    • 2.5.3-2.6.32_431.29.2.el6.atlas.x86_64_g57d5785
    • 3

    Description

      This morning a production MDS hit an assertion:

      <0>[2551157.740086] LustreError: 14993:0:(osd_handler.c:4071:osd_ea_lookup_rec()) ASSERTION( dir->i_op != ((void *)0) && dir->i_op->lookup != ((void *)0) ) failed: 
      <0>[2551157.756253] LustreError: 14993:0:(osd_handler.c:4071:osd_ea_lookup_rec()) LBUG
      <4>[2551157.764766] Pid: 14993, comm: mdt01_094
      <4>[2551157.769360] 
      <4>[2551157.769361] Call Trace:
      <4>[2551157.774374]  [<ffffffffa0409895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
      <4>[2551157.782474]  [<ffffffffa0409e97>] lbug_with_loc+0x47/0xb0 [libcfs]
      <4>[2551157.789707]  [<ffffffffa0ca8fcf>] osd_index_ea_lookup+0x6ff/0x8a0 [osd_ldiskfs]
      <4>[2551157.798308]  [<ffffffffa0d0dde0>] ? mdt_blocking_ast+0x0/0x2a0 [mdt]
      <4>[2551157.805733]  [<ffffffffa088c7c0>] ? lod_index_lookup+0x0/0x30 [lod]
      <4>[2551157.813056]  [<ffffffffa088c7e5>] lod_index_lookup+0x25/0x30 [lod]
      <4>[2551157.820291]  [<ffffffffa0dd0daa>] __mdd_lookup+0x24a/0x440 [mdd]
      <4>[2551157.827325]  [<ffffffffa0dd1599>] mdd_lookup+0x39/0xe0 [mdd]
      <4>[2551157.833977]  [<ffffffffa0d3bee5>] ? mdt_name+0x35/0xc0 [mdt]
      <4>[2551157.840629]  [<ffffffffa0d44b09>] mdt_reint_open+0xb69/0x21a0 [mdt]
      <4>[2551157.847959]  [<ffffffffa0426376>] ? upcall_cache_get_entry+0x296/0x880 [libcfs]
      <4>[2551157.856570]  [<ffffffffa05c7a80>] ? lu_ucred+0x20/0x30 [obdclass]
      <4>[2551157.863705]  [<ffffffffa0d2d481>] mdt_reint_rec+0x41/0xe0 [mdt]
      <4>[2551157.870643]  [<ffffffffa0d12ed3>] mdt_reint_internal+0x4c3/0x780 [mdt]
      <4>[2551157.878254]  [<ffffffffa0d1345e>] mdt_intent_reint+0x1ee/0x410 [mdt]
      <4>[2551157.885669]  [<ffffffffa0d10c3e>] mdt_intent_policy+0x3ae/0x770 [mdt]
      <4>[2551157.893212]  [<ffffffffa06e42e5>] ldlm_lock_enqueue+0x135/0x980 [ptlrpc]
      <4>[2551157.901044]  [<ffffffffa070de2b>] ldlm_handle_enqueue0+0x51b/0x10c0 [ptlrpc]
      <4>[2551157.909336]  [<ffffffffa0d11106>] mdt_enqueue+0x46/0xe0 [mdt]
      <4>[2551157.916083]  [<ffffffffa0d15ada>] mdt_handle_common+0x52a/0x1470 [mdt]
      <4>[2551157.923701]  [<ffffffffa0d52595>] mds_regular_handle+0x15/0x20 [mdt]
      <4>[2551157.931144]  [<ffffffffa073cf25>] ptlrpc_server_handle_request+0x385/0xc00 [ptlrpc]
      <4>[2551157.940128]  [<ffffffffa040a4ce>] ? cfs_timer_arm+0xe/0x10 [libcfs]
      <4>[2551157.947452]  [<ffffffffa041b7c5>] ? lc_watchdog_touch+0x65/0x170 [libcfs]
      <4>[2551157.955380]  [<ffffffffa07358f9>] ? ptlrpc_wait_event+0xa9/0x2d0 [ptlrpc]
      <4>[2551157.963287]  [<ffffffff810546b9>] ? __wake_up_common+0x59/0x90
      <4>[2551157.970142]  [<ffffffffa073f6ed>] ptlrpc_main+0xaed/0x1930 [ptlrpc]
      <4>[2551157.977487]  [<ffffffffa073ec00>] ? ptlrpc_main+0x0/0x1930 [ptlrpc]
      <4>[2551157.984809]  [<ffffffff8109abf6>] kthread+0x96/0xa0
      <4>[2551157.990580]  [<ffffffff8100c20a>] child_rip+0xa/0x20
      <4>[2551157.998367]  [<ffffffff8109ab60>] ? kthread+0x0/0xa0
      <4>[2551158.004229]  [<ffffffff8100c200>] ? child_rip+0x0/0x20
      <4>[2551158.010272] 
      <0>[2551158.012746] Kernel panic - not syncing: LBUG
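
The failed assertion checks that the parent object handed to the lookup has an inode operations table with a `lookup` method, which in Linux is normally set only for directory inodes. The merged patch subject mentions a stale OI mapping cache, so one plausible reading is that a cached mapping resolved to an inode that was no longer a directory; the root cause is not confirmed in this ticket. The following is a minimal user-space sketch of that condition, not the Lustre code and not the landed patch. Every `mock_*` name, `dir_can_lookup`, `lookup_strict` and `lookup_checked` is hypothetical, and returning `-ENOTDIR` is an assumption chosen only for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins for the kernel inode types. */
struct mock_inode;

struct mock_inode_ops {
    int (*lookup)(struct mock_inode *dir, const char *name);
};

struct mock_inode {
    const struct mock_inode_ops *i_op;
};

/* The condition from the failed ASSERTION in the log above. */
static int dir_can_lookup(const struct mock_inode *dir)
{
    return dir->i_op != NULL && dir->i_op->lookup != NULL;
}

/* Strict variant: aborts the process, loosely modelling an LBUG. */
static int lookup_strict(struct mock_inode *dir, const char *name)
{
    assert(dir_can_lookup(dir));
    return dir->i_op->lookup(dir, name);
}

/* Defensive variant (assumption): report an error instead of aborting. */
static int lookup_checked(struct mock_inode *dir, const char *name)
{
    if (!dir_can_lookup(dir))
        return -ENOTDIR;
    return dir->i_op->lookup(dir, name);
}

/* Trivial lookup used by the mock directory; always succeeds. */
static int mock_dir_lookup(struct mock_inode *dir, const char *name)
{
    (void)dir;
    (void)name;
    return 0;
}

/* A directory-like ops table and a file-like one without lookup. */
static const struct mock_inode_ops mock_dir_ops = { .lookup = mock_dir_lookup };
static const struct mock_inode_ops mock_file_ops = { .lookup = NULL };
```

With an inode whose `i_op` points at `mock_file_ops`, `lookup_strict` would abort in the same spirit as the panic above, while `lookup_checked` returns an error the caller can handle.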
      

        Activity

          [LU-6996] osd_ea_lookup_rec assertion
          pjones Peter Jones made changes -
          Link New: This issue is related to LDEV-37 [ LDEV-37 ]
          pjones Peter Jones made changes -
          Link Original: This issue is related to LDEV-142 [ LDEV-142 ]
          pjones Peter Jones made changes -
          Link New: This issue is related to LDEV-143 [ LDEV-143 ]
          pjones Peter Jones made changes -
          Fix Version/s New: Lustre 2.8.0 [ 11113 ]
          Resolution New: Fixed [ 1 ]
          Status Original: Open [ 1 ] New: Resolved [ 5 ]
          pjones Peter Jones added a comment -

          Landed for 2.8


          gerrit Gerrit Updater added a comment -

          Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/16157/
          Subject: LU-6996 osd-ldiskfs: handle stale OI mapping cache
          Project: fs/lustre-release
          Branch: master
          Current Patch Set:
          Commit: 7aaa680b7f22e7dfaac8af38b78d89164a94e842

          pjones Peter Jones made changes -
          Link Original: This issue is related to JFC-10 [ JFC-10 ]
          pjones Peter Jones made changes -
          Link New: This issue is related to LDEV-123 [ LDEV-123 ]
          pjones Peter Jones made changes -
          Link New: This issue is related to LDEV-142 [ LDEV-142 ]
          yujian Jian Yu added a comment -

          Hi Alex,

          Nasf is working on the patches to handle the stale OI mapping cache, but he was unsure of the root cause of the original issue in this ticket. Could you please give some more suggestions here?


          People

            bzzz Alex Zhuravlev
            ezell Matt Ezell