Lustre / LU-5233

2.6 DNE stress testing: (lod_object.c:930:lod_declare_attr_set()) ASSERTION( lo->ldo_stripe ) failed

Details

    • Bug
    • Resolution: Fixed
    • Critical
    • Lustre 2.6.0
    • None
    • 3
    • 14584

    Description

      On the same system as LU-5204 (with OST38/0026 still not reachable from MDS1/MDT0), we hit this LBUG on MDS1 during stress testing:

      <0>LustreError: 26714:0:(lod_object.c:930:lod_declare_attr_set()) ASSERTION( lo->ldo_stripe ) failed:
      <0>LustreError: 26714:0:(lod_object.c:930:lod_declare_attr_set()) LBUG
      <4>Pid: 26714, comm: mdt02_089
      <4>
      <4>Call Trace:
      <4> [<ffffffffa0c55895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
      <4> [<ffffffffa0c55e97>] lbug_with_loc+0x47/0xb0 [libcfs]
      <4> [<ffffffffa15d70e0>] lod_declare_attr_set+0x600/0x660 [lod]
      <4> [<ffffffffa16338b8>] mdd_declare_object_initialize+0xa8/0x290 [mdd]
      <4> [<ffffffffa1635018>] mdd_create+0xb88/0x1870 [mdd]
      <4> [<ffffffffa1506217>] mdt_reint_create+0xcf7/0xed0 [mdt]
      <4> [<ffffffffa1500a81>] mdt_reint_rec+0x41/0xe0 [mdt]
      <4> [<ffffffffa14e5e93>] mdt_reint_internal+0x4c3/0x7c0 [mdt]
      <4> [<ffffffffa14e671b>] mdt_reint+0x6b/0x120 [mdt]
      <4> [<ffffffffa103a2ac>] tgt_request_handle+0x23c/0xac0 [ptlrpc]
      <4> [<ffffffffa0fe9d1a>] ptlrpc_main+0xd1a/0x1980 [ptlrpc]
      <4> [<ffffffffa0fe9000>] ? ptlrpc_main+0x0/0x1980 [ptlrpc]
      <4> [<ffffffff8109aee6>] kthread+0x96/0xa0
      <4> [<ffffffff8100c20a>] child_rip+0xa/0x20
      <4> [<ffffffff8109ae50>] ? kthread+0x0/0xa0
      <4> [<ffffffff8100c200>] ? child_rip+0x0/0x20

      Additionally, we had the following stuck thread:
      <3>INFO: task mdt01_020:26426 blocked for more than 120 seconds.
      <3> Not tainted 2.6.32-431.5.1.el6.x86_64 #1
      <3>"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      <6>mdt01_020 D 000000000000000a 0 26426 2 0x00000000
      <4> ffff880ffa4d7af0 0000000000000046 0000000000000000 ffffffffa0c6bd75
      <4> 0000000100000000 ffffc9003aa25030 0000000000000246 0000000000000246
      <4> ffff88100aaae638 ffff880ffa4d7fd8 000000000000fbc8 ffff88100aaae638
      <4>Call Trace:
      <4> [<ffffffffa0c6bd75>] ? cfs_hash_bd_lookup_intent+0x65/0x130 [libcfs]
      <4> [<ffffffffa0d225db>] lu_object_find_at+0xab/0x350 [obdclass]
      <4> [<ffffffff81065df0>] ? default_wake_function+0x0/0x20
      <4> [<ffffffffa0d22896>] lu_object_find+0x16/0x20 [obdclass]
      <4> [<ffffffffa14e2ea6>] mdt_object_find+0x56/0x170 [mdt]
      <4> [<ffffffffa14e4d2b>] mdt_intent_policy+0x75b/0xca0 [mdt]
      <4> [<ffffffffa0f8e899>] ldlm_lock_enqueue+0x369/0x930 [ptlrpc]
      <4> [<ffffffffa0fb7d8f>] ldlm_handle_enqueue0+0x4ef/0x10b0 [ptlrpc]
      <4> [<ffffffffa1039f02>] tgt_enqueue+0x62/0x1d0 [ptlrpc]
      <4> [<ffffffffa103a2ac>] tgt_request_handle+0x23c/0xac0 [ptlrpc]
      <4> [<ffffffffa0fe9d1a>] ptlrpc_main+0xd1a/0x1980 [ptlrpc]
      <4> [<ffffffffa0fe9000>] ? ptlrpc_main+0x0/0x1980 [ptlrpc]
      <4> [<ffffffff8109aee6>] kthread+0x96/0xa0
      <4> [<ffffffff8100c20a>] child_rip+0xa/0x20
      <4> [<ffffffff8109ae50>] ? kthread+0x0/0xa0
      <4> [<ffffffff8100c200>] ? child_rip+0x0/0x20

      This thread had been stuck for some time before the LBUG. In every instance, it is reported as stuck at a rather odd spot in cfs_hash_bd_lookup_intent:
              match = intent_add ? NULL : hnode;
              hlist_for_each(ehnode, hhead) {
                      if (!cfs_hash_keycmp(hs, key, ehnode))
                              continue;

      Specifically, the dump reports the thread as stuck on the cfs_hash_keycmp line. It's not clear to me how a thread could block there, though I may be missing some operation performed as part of that call.
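
      One point that may help when reading the trace: in Linux kernel stack dumps, a leading "?" marks an address that was found on the stack but that the unwinder could not confirm as part of the active call chain. Under that reading, the cfs_hash_bd_lookup_intent entry may be a stale value, and the first verified frame of the blocked thread is lu_object_find_at. The following is a minimal sketch in C for picking out the first verified frame from lines in the format shown in this ticket. The helper names frame_is_reliable and first_reliable_frame are hypothetical and are not part of Lustre or the kernel.

```c
#include <stddef.h>
#include <string.h>

/*
 * Return 1 if `line` is a call-trace frame the unwinder verified,
 * 0 if it is marked with "?" or is not a frame line at all.
 * Assumes the "[<address>] symbol+off/len [module]" layout of this dump.
 */
static int frame_is_reliable(const char *line)
{
	/* The address field ends with ">] "; the symbol (or "?") follows. */
	const char *end = strstr(line, ">] ");

	if (end == NULL)
		return 0;
	return end[3] != '?' && end[3] != '\0';
}

/*
 * Return the index of the first verified frame in `lines`,
 * or -1 if no line qualifies.
 */
static int first_reliable_frame(const char *const *lines, size_t count)
{
	for (size_t i = 0; i < count; i++) {
		if (frame_is_reliable(lines[i]))
			return (int)i;
	}
	return -1;
}
```

      Applied to the blocked-task trace above, first_reliable_frame would skip the "Call Trace:" header and the "?" entry and land on the lu_object_find_at line, which is probably a better starting point for the crash-dump analysis than the hash lookup.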

      I'll make the dump available shortly.

            People

              di.wang Di Wang (Inactive)
              paf Patrick Farrell (Inactive)