Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-6904

linkea prepare in mdt_reint_rename cause deadlock in 24 hours failover

    XMLWordPrintable

Details

    • Bug
    • Resolution: Fixed
    • Major
    • Lustre 2.8.0
    • Lustre 2.8.0
    • None
    • 3
    • 9223372036854775807

    Description

      During 24 hours failover test. MDT stuck at

      LNet: Service thread pid 3156 was inactive for 200.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
      Pid: 3156, comm: mdt01_005
      
      Call Trace:
       [<ffffffff81083e1c>] ? lock_timer_base+0x3c/0x70
       [<ffffffff8152a512>] schedule_timeout+0x192/0x2e0
       [<ffffffff81083f30>] ? process_timeout+0x0/0x10
       [<ffffffffa07d9fa9>] ptlrpc_set_wait+0x319/0xa30 [ptlrpc]
       [<ffffffffa07cf520>] ? ptlrpc_interrupted_set+0x0/0x110 [ptlrpc]
       [<ffffffff81061d00>] ? default_wake_function+0x0/0x20
       [<ffffffffa07e61b5>] ? lustre_msg_set_jobid+0xf5/0x130 [ptlrpc]
       [<ffffffffa07da741>] ptlrpc_queue_wait+0x81/0x220 [ptlrpc]
       [<ffffffffa113f271>] osp_remote_sync+0x121/0x190 [osp]
       [<ffffffffa1125b47>] osp_xattr_get+0x737/0x10e0 [osp]
       [<ffffffffa0d2c378>] ? __ldiskfs_journal_stop+0x68/0xa0 [ldiskfs]
       [<ffffffffa104ab25>] lod_xattr_get+0x185/0x760 [lod]
       [<ffffffffa10b8825>] mdd_links_read+0xf5/0x2d0 [mdd]
       [<ffffffffa10bbc4f>] mdd_linkea_prepare+0x39f/0x4d0 [mdd]
       [<ffffffffa10bbea1>] mdd_links_rename+0x121/0x540 [mdd]
       [<ffffffff8152b3a6>] ? down_write+0x16/0x40
       [<ffffffffa10c28ba>] mdd_rename+0x13aa/0x1d50 [mdd]
       [<ffffffffa0f7a105>] mdt_reint_rename_internal+0x1305/0x1a50 [mdt]
       [<ffffffffa0798246>] ? ldlm_lock_enqueue+0x2c6/0x8e0 [ptlrpc]
       [<ffffffffa0f7aa4d>] mdt_reint_rename_or_migrate+0x1fd/0x7e0 [mdt]
       [<ffffffffa07b7160>] ? ldlm_blocking_ast+0x0/0x180 [ptlrpc]
       [<ffffffffa07b8ad0>] ? ldlm_completion_ast+0x0/0x9b0 [ptlrpc]
       [<ffffffffa080cea2>] ? __req_capsule_get+0x162/0x6e0 [ptlrpc]
       [<ffffffff8128b45a>] ? strlcpy+0x4a/0x60
       [<ffffffffa0f7b063>] mdt_reint_rename+0x13/0x20 [mdt]
       [<ffffffffa0f7386d>] mdt_reint_rec+0x5d/0x200 [mdt]
       [<ffffffffa0f5f78b>] mdt_reint_internal+0x62b/0xb80 [mdt]
       [<ffffffffa0f6017b>] mdt_reint+0x6b/0x120 [mdt]
       [<ffffffffa084ef42>] tgt_request_handle+0xa42/0x1230 [ptlrpc]
       [<ffffffffa07f73a1>] ptlrpc_main+0xe41/0x1920 [ptlrpc]
       [<ffffffff810096f0>] ? __switch_to+0xd0/0x320
       [<ffffffff810623a9>] ? find_busiest_queue+0x69/0x150
       [<ffffffff815296ee>] ? thread_return+0x4e/0x770
       [<ffffffffa07f6560>] ? ptlrpc_main+0x0/0x1920 [ptlrpc]
       [<ffffffff8109abf6>] kthread+0x96/0xa0
       [<ffffffff8100c20a>] child_rip+0xa/0x20
       [<ffffffff8109ab60>] ? kthread+0x0/0xa0
       [<ffffffff8100c200>] ? child_rip+0x0/0x20
      

      The thread stuck here because the remote MDT restarts. But it also holds the journal handle, which cause a lot journal thread stuck

      INFO: task jbd2/sdf1-8:2813 blocked for more than 120 seconds.
            Not tainted 2.6.32-431.29.2.el6_lustre.g2382eb0.x86_64 #1
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      jbd2/sdf1-8   D 0000000000000001     0  2813      2 0x00000080
       ffff880fe9511d20 0000000000000046 0000000000000000 0000000000016880
       ffff880fe9511cd0 ffffffff810534d7 0000000000016880 ffff88102e42c080
       ffff88102e42c638 ffff880fe9511fd8 000000000000fbc8 ffff88102e42c638
      Call Trace:
       [<ffffffff810534d7>] ? walk_tg_tree_from+0x67/0xc0
       [<ffffffffa0cd580f>] jbd2_journal_commit_transaction+0x19f/0x1500 [jbd2]
       [<ffffffff810096f0>] ? __switch_to+0xd0/0x320
       [<ffffffff81083e1c>] ? lock_timer_base+0x3c/0x70
       [<ffffffff8109afa0>] ? autoremove_wake_function+0x0/0x40
       [<ffffffffa0cdba58>] kjournald2+0xb8/0x220 [jbd2]
       [<ffffffff8109afa0>] ? autoremove_wake_function+0x0/0x40
       [<ffffffffa0cdb9a0>] ? kjournald2+0x0/0x220 [jbd2]
       [<ffffffff8109abf6>] kthread+0x96/0xa0
       [<ffffffff8100c20a>] child_rip+0xa/0x20
       [<ffffffff8109ab60>] ? kthread+0x0/0xa0
       [<ffffffff8100c200>] ? child_rip+0x0/0x20
      
      INFO: task tdtd-0:2919 blocked for more than 120 seconds.
            Not tainted 2.6.32-431.29.2.el6_lustre.g2382eb0.x86_64 #1
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      tdtd-0        D 0000000000000000     0  2919      2 0x00000080
       ffff880ffbe2dc20 0000000000000046 0000004cffffffff ffffffffa046b27b
       0000000055b31da1 ffff88103152a000 ffff880ffbe2dc40 ffffffffa0c816cf
       ffff880ffbe1b098 ffff880ffbe2dfd8 000000000000fbc8 ffff880ffbe1b098
      Call Trace:
       [<ffffffffa046b27b>] ? cfs_set_ptldebug_header+0x2b/0xc0 [libcfs]
       [<ffffffffa0c816cf>] ? qsd_op_begin+0x5f/0xb70 [lquota]
       [<ffffffff8109b2ce>] ? prepare_to_wait+0x4e/0x80
       [<ffffffffa0cd408a>] start_this_handle+0x25a/0x480 [jbd2]
       [<ffffffff8109afa0>] ? autoremove_wake_function+0x0/0x40
       [<ffffffffa0cd4495>] jbd2_journal_start+0xb5/0x100 [jbd2]
       [<ffffffffa0d2c406>] ldiskfs_journal_start_sb+0x56/0xe0 [ldiskfs]
       [<ffffffffa0d7dbef>] osd_trans_start+0x1df/0x410 [osd_ldiskfs]
       [<ffffffffa0860d6e>] distribute_txn_commit_thread+0xa1e/0x1700 [ptlrpc]
       [<ffffffff81061d00>] ? default_wake_function+0x0/0x20
       [<ffffffffa0860350>] ? distribute_txn_commit_thread+0x0/0x1700 [ptlrpc]
       [<ffffffff8109abf6>] kthread+0x96/0xa0
       [<ffffffff8100c20a>] child_rip+0xa/0x20
       [<ffffffff8109ab60>] ? kthread+0x0/0xa0
       [<ffffffff8100c200>] ? child_rip+0x0/0x20
      INFO: task mdt_out03_003:3164 blocked for more than 120 seconds.
            Not tainted 2.6.32-431.29.2.el6_lustre.g2382eb0.x86_64 #1
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      mdt_out03_003 D 000000000000000f     0  3164      2 0x00000080
       ffff88081a6ada60 0000000000000046 0000000000000000 ffff881029e64000
       ffff88081a6ada20 ffffffffa0db3e79 ffff88081a6ada90 ffffffff815296ee
       ffff880831563058 ffff88081a6adfd8 000000000000fbc8 ffff880831563058
      Call Trace:
       [<ffffffffa0db3e79>] ? osd_declare_qid+0x289/0x480 [osd_ldiskfs]
       [<ffffffff815296ee>] ? thread_return+0x4e/0x770
       [<ffffffffa0cd408a>] start_this_handle+0x25a/0x480 [jbd2]
       [<ffffffff8109afa0>] ? autoremove_wake_function+0x0/0x40
       [<ffffffffa0cd4495>] jbd2_journal_start+0xb5/0x100 [jbd2]
       [<ffffffffa0d2c406>] ldiskfs_journal_start_sb+0x56/0xe0 [ldiskfs]
       [<ffffffffa0d7dbef>] osd_trans_start+0x1df/0x410 [osd_ldiskfs]
       [<ffffffffa0851ffc>] out_tx_end+0x9c/0x5d0 [ptlrpc]
       [<ffffffffa0856bab>] out_handle+0xd9b/0x17e0 [ptlrpc]
       [<ffffffffa079ffb0>] ? target_bulk_timeout+0x0/0xc0 [ptlrpc]
       [<ffffffffa084ef42>] tgt_request_handle+0xa42/0x1230 [ptlrpc]
       [<ffffffffa07f73a1>] ptlrpc_main+0xe41/0x1920 [ptlrpc]
       [<ffffffff81069290>] ? pick_next_task_fair+0xd0/0x130
       [<ffffffff81529466>] ? schedule+0x176/0x3b0
       [<ffffffffa07f6560>] ? ptlrpc_main+0x0/0x1920 [ptlrpc]
       [<ffffffff8109abf6>] kthread+0x96/0xa0
       [<ffffffff8100c20a>] child_rip+0xa/0x20
       [<ffffffff8109ab60>] ? kthread+0x0/0xa0
       [<ffffffff8100c200>] ? child_rip+0x0/0x20
      

      So in rename, it should prepare the linkEA before journal start to avoid holding transaction, while sending RPC.

      Attachments

        Issue Links

          Activity

            People

              di.wang Di Wang
              di.wang Di Wang
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: