Lustre LU-5344

ldlm/ifind deadlock for striped directory

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.8.0
    • Affects Version/s: Lustre 2.6.0, Lustre 2.7.0
    • Severity: 3
    • Rank (Obsolete): 14905

    Description

      To reproduce:

      export MDSCOUNT=4
      export MOUNT_2=y
      llmount.sh
      
      cd /mnt/lustre
      while true; do lfs mkdir -c4 d0; touch d0/f{0..3}; done &
      
      cd /mnt/lustre2
      while true; do rm -rf d0; done
      

      After about 10 iterations of the rm loop, everything is stuck with the following stacks:

      7185 touch
      [<ffffffffa068376a>] ptlrpc_set_wait+0x2ea/0x830 [ptlrpc]
      [<ffffffffa0683d37>] ptlrpc_queue_wait+0x87/0x220 [ptlrpc]
      [<ffffffffa065f13e>] ldlm_cli_enqueue+0x36e/0x860 [ptlrpc]
      [<ffffffffa09105ae>] mdc_enqueue+0x2be/0x1ab0 [mdc]
      [<ffffffffa0911f82>] mdc_intent_lock+0x1e2/0x52f [mdc]
      [<ffffffffa08cbd2b>] lmv_intent_open+0x31b/0x9f0 [lmv]
      [<ffffffffa08cc6e0>] lmv_intent_lock+0x2e0/0x1180 [lmv]
      [<ffffffffa0e81faa>] ll_lookup_it+0x25a/0xad0 [lustre]
      [<ffffffffa0e828ac>] ll_lookup_nd+0x8c/0x4a0 [lustre]
      [<ffffffff811b0442>] __lookup_hash+0x102/0x160
      [<ffffffff811b0b7a>] lookup_hash+0x3a/0x50
      [<ffffffff811b5250>] do_filp_open+0x2e0/0xd30
      [<ffffffff8119f809>] do_sys_open+0x69/0x140
      [<ffffffff8119f920>] sys_open+0x20/0x30
      [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
      [<ffffffffffffffff>] 0xffffffffffffffff
      
      4792 mdt01_004
      [<ffffffffa06643c9>] ldlm_completion_ast+0x4c9/0x930 [ptlrpc]
      [<ffffffffa0663b23>] ldlm_cli_enqueue_local+0x1f3/0x5d0 [ptlrpc]
      [<ffffffffa0c9e264>] mdt_object_local_lock+0x394/0xa60 [mdt]
      [<ffffffffa0c9e995>] mdt_object_lock_internal+0x65/0x360 [mdt]
      [<ffffffffa0c9ed54>] mdt_object_lock+0x14/0x20 [mdt]
      [<ffffffffa0c9ef11>] mdt_object_find_lock+0x61/0x170 [mdt]
      [<ffffffffa0cc8926>] mdt_reint_open+0x5c6/0x20b0 [mdt]
      [<ffffffffa0cb07a1>] mdt_reint_rec+0x41/0xe0 [mdt]
      [<ffffffffa0c9baf3>] mdt_reint_internal+0x4c3/0x7c0 [mdt]
      [<ffffffffa0c9bfe6>] mdt_intent_reint+0x1f6/0x520 [mdt]
      [<ffffffffa0c9a6c9>] mdt_intent_policy+0x499/0xca0 [mdt]
      [<ffffffffa0645422>] ldlm_lock_enqueue+0x302/0x920 [ptlrpc]
      [<ffffffffa066d651>] ldlm_handle_enqueue0+0x341/0x11e0 [ptlrpc]
      [<ffffffffa06ec9a2>] tgt_enqueue+0x62/0x1d0 [ptlrpc]
      [<ffffffffa06ebc35>] tgt_request_handle+0x245/0xad0 [ptlrpc]
      [<ffffffffa069ed91>] ptlrpc_main+0xcf1/0x1880 [ptlrpc]
      [<ffffffff8109eab6>] kthread+0x96/0xa0
      [<ffffffff8100c30a>] child_rip+0xa/0x20
      [<ffffffffffffffff>] 0xffffffffffffffff
      
      7186 rm
      [<ffffffffa068376a>] ptlrpc_set_wait+0x2ea/0x830 [ptlrpc]
      [<ffffffffa0683d37>] ptlrpc_queue_wait+0x87/0x220 [ptlrpc]
      [<ffffffffa065f13e>] ldlm_cli_enqueue+0x36e/0x860 [ptlrpc]
      [<ffffffffa09105ae>] mdc_enqueue+0x2be/0x1ab0 [mdc]
      [<ffffffffa0911f82>] mdc_intent_lock+0x1e2/0x52f [mdc]
      [<ffffffffa08cae7e>] lmv_revalidate_slaves+0x49e/0x1030 [lmv]
      [<ffffffffa08b25ba>] lmv_update_lsm_md+0x1a/0x20 [lmv]
      [<ffffffffa0e63ac0>] ll_update_inode+0x1370/0x1e90 [lustre]
      [<ffffffffa0e64668>] ll_read_inode2+0x88/0x480 [lustre]
      [<ffffffffa0e7e62b>] ll_iget+0x13b/0x3c0 [lustre]
      [<ffffffffa0e71740>] ll_prep_inode+0x6c0/0xe80 [lustre]
      [<ffffffffa0e80e91>] ll_lookup_it_finish+0x2f1/0x11b0 [lustre]
      [<ffffffffa0e82007>] ll_lookup_it+0x2b7/0xad0 [lustre]
      [<ffffffffa0e828ac>] ll_lookup_nd+0x8c/0x4a0 [lustre]
      [<ffffffff811b29b5>] do_lookup+0x1a5/0x230
      [<ffffffff811b2fc4>] __link_path_walk+0x584/0x840
      [<ffffffff811b398a>] path_walk+0x6a/0xe0
      [<ffffffff811b3b9b>] filename_lookup+0x6b/0xc0
      [<ffffffff811b4cc7>] user_path_at+0x57/0xa0
      [<ffffffff811a8790>] vfs_fstatat+0x50/0xa0
      [<ffffffff811a8804>] sys_newfstatat+0x24/0x50
      [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
      [<ffffffffffffffff>] 0xffffffffffffffff
      
      4799 mdt00_005
      [<ffffffffa06643c9>] ldlm_completion_ast+0x4c9/0x930 [ptlrpc]
      [<ffffffffa0663b23>] ldlm_cli_enqueue_local+0x1f3/0x5d0 [ptlrpc]
      [<ffffffffa0c9e085>] mdt_object_local_lock+0x1b5/0xa60 [mdt]
      [<ffffffffa0c9e995>] mdt_object_lock_internal+0x65/0x360 [mdt]
      [<ffffffffa0c9ed54>] mdt_object_lock+0x14/0x20 [mdt]
      [<ffffffffa0ca3f1c>] mdt_getattr_name_lock+0xd4c/0x1a60 [mdt]
      [<ffffffffa0ca5152>] mdt_intent_getattr+0x292/0x470 [mdt]
      [<ffffffffa0c9a6c9>] mdt_intent_policy+0x499/0xca0 [mdt]
      [<ffffffffa0645422>] ldlm_lock_enqueue+0x302/0x920 [ptlrpc]
      [<ffffffffa066d651>] ldlm_handle_enqueue0+0x341/0x11e0 [ptlrpc]
      [<ffffffffa06ec9a2>] tgt_enqueue+0x62/0x1d0 [ptlrpc]
      [<ffffffffa06ebc35>] tgt_request_handle+0x245/0xad0 [ptlrpc]
      [<ffffffffa069ed91>] ptlrpc_main+0xcf1/0x1880 [ptlrpc]
      [<ffffffff8109eab6>] kthread+0x96/0xa0
      [<ffffffff8100c30a>] child_rip+0xa/0x20
      [<ffffffffffffffff>] 0xffffffffffffffff
      
      
      3831 ldlm_bl_00
      [<ffffffff811bf08e>] inode_wait+0xe/0x20
      [<ffffffff811c0c0c>] ifind+0xac/0xe0
      [<ffffffff811c0c8a>] ilookup5+0x4a/0x60
      [<ffffffffa0e80a5d>] ll_md_blocking_ast+0x6bd/0x800 [lustre]
      [<ffffffffa063fe6f>] ldlm_cancel_callback+0x6f/0x160 [ptlrpc]
      [<ffffffffa065d6aa>] ldlm_cli_cancel_local+0x8a/0x480 [ptlrpc]
      [<ffffffffa0662280>] ldlm_cli_cancel+0x60/0x360 [ptlrpc]
      [<ffffffffa0e80487>] ll_md_blocking_ast+0xe7/0x800 [lustre]
      [<ffffffffa0666060>] ldlm_handle_bl_callback+0x130/0x400 [ptlrpc]
      [<ffffffffa0668161>] ldlm_bl_thread_main+0x281/0x400 [ptlrpc]
      [<ffffffff8109eab6>] kthread+0x96/0xa0
      [<ffffffff8100c30a>] child_rip+0xa/0x20
      [<ffffffffffffffff>] 0xffffffffffffffff
      
      
      
      u:lustre-release# xddr2line ll_md_blocking_ast+0x6bd/0x800 [lustre]
      ll_md_blocking_ast
      /root/lustre-release/lustre/llite/namei.c:322
      
              master_inode = ilookup5(inode->i_sb, hash,
                                                      ll_test_inode_by_fid,
                                                      (void *)&lli->lli_pfid);
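
The traces above have the shape of /proc/<pid>/stack output. The report does not say how they were gathered, so the following is only a minimal sketch of one way to collect similar traces after the reproducer hangs. The function names and the PROC_ROOT override are hypothetical and not part of Lustre or its test scripts; reading /proc/<pid>/stack normally requires root.

```shell
#!/bin/sh
# Sketch only: print "<pid> <comm>" followed by the kernel stack of a task,
# mirroring the layout of the traces in this report.
# PROC_ROOT is a hypothetical override (default /proc) so the helpers can be
# pointed at a fake tree for testing.
PROC_ROOT="${PROC_ROOT:-/proc}"

dump_stack() {
    task_dir="$PROC_ROOT/$1"
    # Skip tasks that have exited or whose stack is not readable.
    [ -r "$task_dir/stack" ] || return 1
    printf '%s %s\n' "$1" "$(cat "$task_dir/comm" 2>/dev/null)"
    cat "$task_dir/stack"
    echo
}

dump_stacks_matching() {
    # $1 is an extended regex matched against each task's command name,
    # for example 'touch|rm|mdt|ldlm_bl'.
    for d in "$PROC_ROOT"/[0-9]*; do
        [ -r "$d/comm" ] || continue
        if grep -Eq "$1" "$d/comm"; then
            dump_stack "${d##*/}"
        fi
    done
    return 0
}
```

For example, running `dump_stacks_matching 'touch|rm|mdt|ldlm_bl'` as root on a single-node llmount.sh setup would print the stacks of the client processes, the MDT service threads, and the ldlm blocking-callback threads in one pass.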
      

            People

              Assignee: Di Wang (di.wang)
              Reporter: John Hammond (jhammond)
              Votes: 0
              Watchers: 7
