Lustre / LU-7825

ldlm_lock.c:810:ldlm_lock_decref_internal_nolock()) ASSERTION( lock->l_writers > 0


    Description

      The error occurred during soak testing of build '20160224' (b2_8 RC2) (see:
      https://wiki.hpdd.intel.com/pages/viewpage.action?title=Soak+Testing+on+Lola&spaceKey=Releases#SoakTestingonLola-20150224). DNE is enabled.
      The MDSes were formatted with ldiskfs, the OSTs with zfs. The MDSes are configured in an active-active HA failover configuration.

      Sequence of events:

      • 2016-02-27 02:04:02,121:fsmgmt.fsmgmt:INFO mds_failover just completed (lola-10 ---> lola-11)
      • Feb 27 02:06:44 lola-10 kernel: Lustre: soaked-MDT0005: Recovery over after 2:42, of 16 clients 14 recovered and 2 were evicted.
      • Feb 27 02:12:06 lola-10 kernel: Lustre: soaked-MDT0004: Recovery over after 8:02, of 16 clients 11 recovered and 5 were evicted.
      • 2016-02-27 02:12:58 lola-9 (different HA pair) crashed

      The error reads:

      <0>LustreError: 5003:0:(ldlm_lock.c:810:ldlm_lock_decref_internal_nolock()) ASSERTION( lock->l_writers > 0 ) failed: 
      <0>LustreError: 5003:0:(ldlm_lock.c:810:ldlm_lock_decref_internal_nolock()) LBUG
      <4>Pid: 5003, comm: mdt02_007
      <4>
      <4>Call Trace:
      <4> [<ffffffffa0748875>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
      <4> [<ffffffffa0748e77>] lbug_with_loc+0x47/0xb0 [libcfs]
      <4> [<ffffffffa0a2ef0f>] ldlm_lock_decref_internal_nolock+0x17f/0x180 [ptlrpc]
      <4> [<ffffffffa0a3102d>] ldlm_lock_decref_internal+0x4d/0xa80 [ptlrpc]
      <4> [<ffffffffa083f935>] ? class_handle2object+0x95/0x190 [obdclass]
      <4> [<ffffffffa0a325a0>] ldlm_lock_decref_and_cancel+0x80/0x150 [ptlrpc]
      <4> [<ffffffffa1164c67>] mdt_object_unlock+0xa7/0x2e0 [mdt]
      <4> [<ffffffffa11867ca>] mdt_reint_rename_or_migrate+0xf3a/0x2600 [mdt]
      <4> [<ffffffffa0ab7bdd>] ? null_alloc_rs+0xcd/0x320 [ptlrpc]
      <4> [<ffffffffa0876cbc>] ? upcall_cache_get_entry+0x29c/0x880 [obdclass]
      <4> [<ffffffffa087bbf0>] ? lu_ucred+0x20/0x30 [obdclass]
      <4> [<ffffffffa0a7d100>] ? lustre_pack_reply_v2+0x180/0x280 [ptlrpc]
      <4> [<ffffffffa117d50f>] ? ucred_set_jobid+0x5f/0x70 [mdt]
      <4> [<ffffffffa1187ec3>] mdt_reint_rename+0x13/0x20 [mdt]
      <4> [<ffffffffa118118d>] mdt_reint_rec+0x5d/0x200 [mdt]
      <4> [<ffffffffa116cddb>] mdt_reint_internal+0x62b/0x9f0 [mdt]
      <4> [<ffffffffa116d63b>] mdt_reint+0x6b/0x120 [mdt]
      <4> [<ffffffffa0ae0c2c>] tgt_request_handle+0x8ec/0x1440 [ptlrpc]
      <4> [<ffffffffa0a8dc61>] ptlrpc_main+0xd21/0x1800 [ptlrpc]
      <4> [<ffffffff8152a39e>] ? thread_return+0x4e/0x7d0
      <4> [<ffffffffa0a8cf40>] ? ptlrpc_main+0x0/0x1800 [ptlrpc]
      <4> [<ffffffff8109e78e>] kthread+0x9e/0xc0
      <4> [<ffffffff8100c28a>] child_rip+0xa/0x20
      <4> [<ffffffff8109e6f0>] ? kthread+0x0/0xc0
      <4> [<ffffffff8100c280>] ? child_rip+0x0/0x20
      <4>
      <0>Kernel panic - not syncing: LBUG
      <4>Pid: 5003, comm: mdt02_007 Tainted: P           ---------------    2.6.32-504.30.3.el6_lustre.x86_64 #1
      <4>Call Trace:
      <4> [<ffffffff81529c9c>] ? panic+0xa7/0x16f
      <4> [<ffffffffa0748ecb>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
      <4> [<ffffffffa0a2ef0f>] ? ldlm_lock_decref_internal_nolock+0x17f/0x180 [ptlrpc]
      <4> [<ffffffffa0a3102d>] ? ldlm_lock_decref_internal+0x4d/0xa80 [ptlrpc]
      <4> [<ffffffffa083f935>] ? class_handle2object+0x95/0x190 [obdclass]
      <4> [<ffffffffa0a325a0>] ? ldlm_lock_decref_and_cancel+0x80/0x150 [ptlrpc]
      <4> [<ffffffffa1164c67>] ? mdt_object_unlock+0xa7/0x2e0 [mdt]
      <4> [<ffffffffa11867ca>] ? mdt_reint_rename_or_migrate+0xf3a/0x2600 [mdt]
      <4> [<ffffffffa0ab7bdd>] ? null_alloc_rs+0xcd/0x320 [ptlrpc]
      <4> [<ffffffffa0876cbc>] ? upcall_cache_get_entry+0x29c/0x880 [obdclass]
      <4> [<ffffffffa087bbf0>] ? lu_ucred+0x20/0x30 [obdclass]
      <4> [<ffffffffa0a7d100>] ? lustre_pack_reply_v2+0x180/0x280 [ptlrpc]
      <4> [<ffffffffa117d50f>] ? ucred_set_jobid+0x5f/0x70 [mdt]
      <4> [<ffffffffa1187ec3>] ? mdt_reint_rename+0x13/0x20 [mdt]
      <4> [<ffffffffa118118d>] ? mdt_reint_rec+0x5d/0x200 [mdt]
      <4> [<ffffffffa116cddb>] ? mdt_reint_internal+0x62b/0x9f0 [mdt]
      <4> [<ffffffffa116d63b>] ? mdt_reint+0x6b/0x120 [mdt]
      <4> [<ffffffffa0ae0c2c>] ? tgt_request_handle+0x8ec/0x1440 [ptlrpc]
      <4> [<ffffffffa0a8dc61>] ? ptlrpc_main+0xd21/0x1800 [ptlrpc]
      <4> [<ffffffff8152a39e>] ? thread_return+0x4e/0x7d0
      <4> [<ffffffffa0a8cf40>] ? ptlrpc_main+0x0/0x1800 [ptlrpc]
      <4> [<ffffffff8109e78e>] ? kthread+0x9e/0xc0
      <4> [<ffffffff8100c28a>] ? child_rip+0xa/0x20
      <4> [<ffffffff8109e6f0>] ? kthread+0x0/0xc0
      <4> [<ffffffff8100c280>] ? child_rip+0x0/0x20
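
      To make the failed check concrete, here is a toy model of per-lock reader/writer reference counting. It is not Lustre code: the class LockRefs, its methods, and the "read"/"write" mode strings are hypothetical names chosen for illustration. Reading the LBUG as an unbalanced write-mode release is an interpretation of the assertion text, not a confirmed root cause.

```python
class LockRefs:
    """Toy model of reader/writer reference counts on one lock (not Lustre code)."""

    def __init__(self):
        self.readers = 0
        self.writers = 0

    def addref(self, mode):
        # Taking a reference bumps the counter that matches the mode.
        if mode == "write":
            self.writers += 1
        else:
            self.readers += 1

    def decref(self, mode):
        # Dropping a reference must find a matching reference to drop;
        # this mirrors the spirit of the check "lock->l_writers > 0".
        if mode == "write":
            if self.writers <= 0:
                raise RuntimeError("ASSERTION( writers > 0 ) failed")
            self.writers -= 1
        else:
            if self.readers <= 0:
                raise RuntimeError("ASSERTION( readers > 0 ) failed")
            self.readers -= 1


lock = LockRefs()
lock.addref("write")
lock.decref("write")        # balanced release: writers goes from 1 to 0
try:
    lock.decref("write")    # second release with no write reference left
except RuntimeError as err:
    print("would LBUG:", err)
```

      In this model the second release fails because the count is already zero. The trace above only shows that the release was reached through mdt_object_unlock during a rename; it does not show why the count was zero.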
      

      Attached are the messages and console logs of MDS nodes lola-9 and lola-10, as well as vmcore-dmesg.txt.
      The crash file will be saved separately.
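
      When going through the attached logs, it can help to reduce a call trace to the frames the kernel did not mark with '?' (frames marked '?' are addresses found on the stack that the unwinder could not verify). The following is a minimal sketch, assuming the frame format shown in the trace above; the function name reliable_frames and the regular expression are illustrative and not part of any Lustre tooling.

```python
import re

# Matches one frame of a kernel call trace, for example:
# "<4> [<ffffffffa1164c67>] mdt_object_unlock+0xa7/0x2e0 [mdt]"
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<unreliable>\?\s+)?"
    r"(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(?P<module>\w+)\])?"
)


def reliable_frames(trace_text):
    """Return (function, module) pairs for frames not marked with '?'."""
    frames = []
    for line in trace_text.splitlines():
        m = FRAME_RE.search(line)
        if m and not m.group("unreliable"):
            frames.append((m.group("func"), m.group("module")))
    return frames


sample = """\
<4> [<ffffffffa0a3102d>] ldlm_lock_decref_internal+0x4d/0xa80 [ptlrpc]
<4> [<ffffffffa083f935>] ? class_handle2object+0x95/0x190 [obdclass]
<4> [<ffffffffa0a325a0>] ldlm_lock_decref_and_cancel+0x80/0x150 [ptlrpc]
<4> [<ffffffffa1164c67>] mdt_object_unlock+0xa7/0x2e0 [mdt]
<4> [<ffffffff8109e78e>] kthread+0x9e/0xc0
"""

for func, module in reliable_frames(sample):
    # Frames without a "[module]" suffix belong to the core kernel.
    print(f"{func} [{module or 'kernel'}]")
```

      Note that in the second trace above (printed after the kernel panic) every frame carries a '?', so this filter is only useful on the first trace.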

      Attachments

        1. messages-lola-9.log.bz2
          270 kB
        2. messages-lola-10.log.bz2
          310 kB
        3. lola-9-vmcore-dmesg.txt.bz2
          34 kB
        4. console-lola-9.log.bz2
          608 kB
        5. console-lola-10.log.bz2
          392 kB

          People

            di.wang Di Wang
            heckes Frank Heckes (Inactive)