Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-2700

mdt deadlock in rename(?)

    XMLWordPrintable

Details

    • Bug
    • Resolution: Fixed
    • Blocker
    • Lustre 2.4.0
    • Lustre 2.4.0
    • 3
    • 6293

    Description

      It seems a deadlock in rename was introduced this Friday 25th.
      My racer runs are hittign it reliably.

      [77051.148056] LNet: Service thread pid 6227 was inactive for 62.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
      [77051.148872] Pid: 6227, comm: mdt01_008
      [77051.149108] 
      [77051.149108] Call Trace:
      [77051.149529]  [<ffffffff814fb054>] ? _spin_lock_irqsave+0x24/0x30
      [77051.149810]  [<ffffffff814f8ad1>] schedule_timeout+0x191/0x2e0
      [77051.150071]  [<ffffffff8107bcd0>] ? process_timeout+0x0/0x10
      [77051.150820]  [<ffffffffa0bbf771>] cfs_waitq_timedwait+0x11/0x20 [libcfs]
      [77051.151116]  [<ffffffffa0f7ed8d>] ldlm_completion_ast+0x4dd/0x950 [ptlrpc]
      [77051.151398]  [<ffffffffa0f7a6a0>] ? ldlm_expired_completion_wait+0x0/0x390 [ptlrpc]
      [77051.152764]  [<ffffffff81057d60>] ? default_wake_function+0x0/0x20
      [77051.153306]  [<ffffffffa0f7e4d8>] ldlm_cli_enqueue_local+0x1f8/0x5d0 [ptlrpc]
      [77051.153618]  [<ffffffffa0f7e8b0>] ? ldlm_completion_ast+0x0/0x950 [ptlrpc]
      [77051.153917]  [<ffffffffa07e8830>] ? mdt_blocking_ast+0x0/0x2a0 [mdt]
      [77051.154214]  [<ffffffffa080984a>] mdt_reint_rename+0x7ba/0x1a70 [mdt]
      [77051.154580]  [<ffffffffa07e8830>] ? mdt_blocking_ast+0x0/0x2a0 [mdt]
      [77051.154895]  [<ffffffffa0f7e8b0>] ? ldlm_completion_ast+0x0/0x950 [ptlrpc]
      [77051.155220]  [<ffffffffa0e29700>] ? lu_ucred+0x20/0x30 [obdclass]
      [77051.156752]  [<ffffffffa07e7ff5>] ? mdt_ucred+0x15/0x20 [mdt]
      [77051.157068]  [<ffffffffa080229c>] ? mdt_root_squash+0x2c/0x410 [mdt]
      [77051.157515]  [<ffffffffa0fd16f6>] ? __req_capsule_get+0x166/0x700 [ptlrpc]
      [77051.157839]  [<ffffffffa0e29700>] ? lu_ucred+0x20/0x30 [obdclass]
      [77051.158785]  [<ffffffffa0806a51>] mdt_reint_rec+0x41/0xe0 [mdt]
      [77051.159066]  [<ffffffffa08000f3>] mdt_reint_internal+0x4f3/0x7e0 [mdt]
      [77051.159358]  [<ffffffffa0800424>] mdt_reint+0x44/0xe0 [mdt]
      [77051.159609]  [<ffffffffa07f0732>] mdt_handle_common+0x8e2/0x1690 [mdt]
      [77051.163421]  [<ffffffffa0825e95>] mds_regular_handle+0x15/0x20 [mdt]
      [77051.163763]  [<ffffffffa0fb5af3>] ptlrpc_server_handle_request+0x463/0xe60 [ptlrpc]
      [77051.164304]  [<ffffffffa0bbf65e>] ? cfs_timer_arm+0xe/0x10 [libcfs]
      [77051.164635]  [<ffffffffa0fae871>] ? ptlrpc_wait_event+0xb1/0x2a0 [ptlrpc]
      [77051.164923]  [<ffffffff81051f73>] ? __wake_up+0x53/0x70
      [77051.165209]  [<ffffffffa0fb85fd>] ptlrpc_main+0xb3d/0x18c0 [ptlrpc]
      [77051.165478]  [<ffffffffa0fb7ac0>] ? ptlrpc_main+0x0/0x18c0 [ptlrpc]
      [77051.166098]  [<ffffffff8100c14a>] child_rip+0xa/0x20
      [77051.166279]  [<ffffffffa0fb7ac0>] ? ptlrpc_main+0x0/0x18c0 [ptlrpc]
      [77051.166470]  [<ffffffffa0fb7ac0>] ? ptlrpc_main+0x0/0x18c0 [ptlrpc]
      [77051.166643]  [<ffffffff8100c140>] ? child_rip+0x0/0x20
      [77051.166802] 
      [77051.166922] LustreError: dumping log to /tmp/lustre-log.1359282032.6227
      [77051.270779] LNet: Service thread pid 30749 was inactive for 62.12s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
      [77051.271595] Pid: 30749, comm: mdt00_004
      [77051.271845] 
      [77051.271846] Call Trace:
      [77051.272326]  [<ffffffff814fb054>] ? _spin_lock_irqsave+0x24/0x30
      [77051.272606]  [<ffffffff814f8ad1>] schedule_timeout+0x191/0x2e0
      [77051.272926]  [<ffffffff8107bcd0>] ? process_timeout+0x0/0x10
      [77051.273228]  [<ffffffffa0bbf771>] cfs_waitq_timedwait+0x11/0x20 [libcfs]
      [77051.273570]  [<ffffffffa0f7ed8d>] ldlm_completion_ast+0x4dd/0x950 [ptlrpc]
      [77051.273900]  [<ffffffffa0f7a6a0>] ? ldlm_expired_completion_wait+0x0/0x390 [ptlrpc]
      [77051.274404]  [<ffffffff81057d60>] ? default_wake_function+0x0/0x20
      [77051.274723]  [<ffffffffa0f7e4d8>] ldlm_cli_enqueue_local+0x1f8/0x5d0 [ptlrpc]
      [77051.275057]  [<ffffffffa0f7e8b0>] ? ldlm_completion_ast+0x0/0x950 [ptlrpc]
      [77051.275395]  [<ffffffffa07e8830>] ? mdt_blocking_ast+0x0/0x2a0 [mdt]
      [77051.275709]  [<ffffffffa080984a>] mdt_reint_rename+0x7ba/0x1a70 [mdt]
      [77051.276044]  [<ffffffffa07e8830>] ? mdt_blocking_ast+0x0/0x2a0 [mdt]
      [77051.276376]  [<ffffffffa0f7e8b0>] ? ldlm_completion_ast+0x0/0x950 [ptlrpc]
      [77051.276723]  [<ffffffffa0e29700>] ? lu_ucred+0x20/0x30 [obdclass]
      [77051.277061]  [<ffffffffa07e7ff5>] ? mdt_ucred+0x15/0x20 [mdt]
      [77051.277363]  [<ffffffffa080229c>] ? mdt_root_squash+0x2c/0x410 [mdt]
      [77051.277695]  [<ffffffffa0fd16f6>] ? __req_capsule_get+0x166/0x700 [ptlrpc]
      [77051.278010]  [<ffffffffa0e29700>] ? lu_ucred+0x20/0x30 [obdclass]
      [77051.278299]  [<ffffffffa0806a51>] mdt_reint_rec+0x41/0xe0 [mdt]
      [77051.278606]  [<ffffffffa08000f3>] mdt_reint_internal+0x4f3/0x7e0 [mdt]
      [77051.278916]  [<ffffffffa0800424>] mdt_reint+0x44/0xe0 [mdt]
      [77051.279202]  [<ffffffffa07f0732>] mdt_handle_common+0x8e2/0x1690 [mdt]
      [77051.279526]  [<ffffffffa0825e95>] mds_regular_handle+0x15/0x20 [mdt]
      [77051.279864]  [<ffffffffa0fb5af3>] ptlrpc_server_handle_request+0x463/0xe60 [ptlrpc]
      [77051.280418]  [<ffffffffa0bbf65e>] ? cfs_timer_arm+0xe/0x10 [libcfs]
      [77051.280734]  [<ffffffffa0fae871>] ? ptlrpc_wait_event+0xb1/0x2a0 [ptlrpc]
      [77051.281920]  [<ffffffff81057d60>] ? default_wake_function+0x0/0x20
      [77051.282248]  [<ffffffffa0fb85fd>] ptlrpc_main+0xb3d/0x18c0 [ptlrpc]
      [77051.282606]  [<ffffffffa0fb7ac0>] ? ptlrpc_main+0x0/0x18c0 [ptlrpc]
      [77051.282906]  [<ffffffff8100c14a>] child_rip+0xa/0x20
      [77051.283203]  [<ffffffffa0fb7ac0>] ? ptlrpc_main+0x0/0x18c0 [ptlrpc]
      [77051.283529]  [<ffffffffa0fb7ac0>] ? ptlrpc_main+0x0/0x18c0 [ptlrpc]
      [77051.283830]  [<ffffffff8100c140>] ? child_rip+0x0/0x20
      [77051.284145] 
      [77051.304078] Pid: 819, comm: mdt01_005
      [77051.304351] 
      [77051.304352] Call Trace:
      [77051.304821]  [<ffffffff814fb054>] ? _spin_lock_irqsave+0x24/0x30
      [77051.305120]  [<ffffffff814f8ad1>] schedule_timeout+0x191/0x2e0
      [77051.305424]  [<ffffffff8107bcd0>] ? process_timeout+0x0/0x10
      [77051.305728]  [<ffffffffa0bbf771>] cfs_waitq_timedwait+0x11/0x20 [libcfs]
      [77051.306062]  [<ffffffffa0f7ed8d>] ldlm_completion_ast+0x4dd/0x950 [ptlrpc]
      [77051.306400]  [<ffffffffa0f7a6a0>] ? ldlm_expired_completion_wait+0x0/0x390 [ptlrpc]
      [77051.306905]  [<ffffffff81057d60>] ? default_wake_function+0x0/0x20
      [77051.307224]  [<ffffffffa0f7e4d8>] ldlm_cli_enqueue_local+0x1f8/0x5d0 [ptlrpc]
      [77051.307562]  [<ffffffffa0f7e8b0>] ? ldlm_completion_ast+0x0/0x950 [ptlrpc]
      [77051.307892]  [<ffffffffa07e8830>] ? mdt_blocking_ast+0x0/0x2a0 [mdt]
      [77051.308233]  [<ffffffffa07eadc0>] mdt_object_lock0+0x360/0xb10 [mdt]
      [77051.308525]  [<ffffffffa07e8830>] ? mdt_blocking_ast+0x0/0x2a0 [mdt]
      [77051.309085]  [<ffffffffa0f7e8b0>] ? ldlm_completion_ast+0x0/0x950 [ptlrpc]
      [77051.309409]  [<ffffffffa07eb614>] mdt_object_lock+0x14/0x20 [mdt]
      [77051.309714]  [<ffffffffa07fd9c5>] mdt_getattr_name_lock+0xee5/0x1980 [mdt]
      [77051.310049]  [<ffffffffa0fa6275>] ? lustre_msg_buf+0x55/0x60 [ptlrpc]
      [77051.310377]  [<ffffffffa0fd16f6>] ? __req_capsule_get+0x166/0x700 [ptlrpc]
      [77051.310710]  [<ffffffffa0fa8474>] ? lustre_msg_get_flags+0x34/0xb0 [ptlrpc]
      [77051.311017]  [<ffffffffa07fe6fd>] mdt_intent_getattr+0x29d/0x490 [mdt]
      [77051.311280]  [<ffffffffa07fc26e>] mdt_intent_policy+0x3ae/0x750 [mdt]
      [77051.311568]  [<ffffffffa0f5f70a>] ldlm_lock_enqueue+0x2ea/0x870 [ptlrpc]
      [77051.311843]  [<ffffffffa0f85f07>] ldlm_handle_enqueue0+0x4f7/0x1090 [ptlrpc]
      [77051.312143]  [<ffffffffa07fc746>] mdt_enqueue+0x46/0x110 [mdt]
      [77051.312420]  [<ffffffffa07f0732>] mdt_handle_common+0x8e2/0x1690 [mdt]
      [77051.312690]  [<ffffffffa0825e95>] mds_regular_handle+0x15/0x20 [mdt]
      [77051.312986]  [<ffffffffa0fb5af3>] ptlrpc_server_handle_request+0x463/0xe60 [ptlrpc]
      [77051.313421]  [<ffffffffa0bbf65e>] ? cfs_timer_arm+0xe/0x10 [libcfs]
      [77051.313690]  [<ffffffffa0fae871>] ? ptlrpc_wait_event+0xb1/0x2a0 [ptlrpc]
      [77051.313952]  [<ffffffff81051f73>] ? __wake_up+0x53/0x70
      [77051.314203]  [<ffffffffa0fb85fd>] ptlrpc_main+0xb3d/0x18c0 [ptlrpc]
      [77051.314477]  [<ffffffffa0fb7ac0>] ? ptlrpc_main+0x0/0x18c0 [ptlrpc]
      [77051.314730]  [<ffffffff8100c14a>] child_rip+0xa/0x20
      [77051.314977]  [<ffffffffa0fb7ac0>] ? ptlrpc_main+0x0/0x18c0 [ptlrpc]
      [77051.315274]  [<ffffffffa0fb7ac0>] ? ptlrpc_main+0x0/0x18c0 [ptlrpc]
      [77051.315605]  [<ffffffff8100c140>] ? child_rip+0x0/0x20
      [77051.315836] 
      [77051.316029] LustreError: dumping log to /tmp/lustre-log.1359282032.819
      [77051.372055] Pid: 29786, comm: mdt00_001
      [77051.372306] 
      [77051.372307] Call Trace:
      [77051.372715]  [<ffffffff814fb054>] ? _spin_lock_irqsave+0x24/0x30
      [77051.372988]  [<ffffffff814f8ad1>] schedule_timeout+0x191/0x2e0
      [77051.373257]  [<ffffffff8107bcd0>] ? process_timeout+0x0/0x10
      [77051.373537]  [<ffffffffa0bbf771>] cfs_waitq_timedwait+0x11/0x20 [libcfs]
      [77051.373850]  [<ffffffffa0f7ed8d>] ldlm_completion_ast+0x4dd/0x950 [ptlrpc]
      [77051.374153]  [<ffffffffa0f7a6a0>] ? ldlm_expired_completion_wait+0x0/0x390 [ptlrpc]
      [77051.374620]  [<ffffffff81057d60>] ? default_wake_function+0x0/0x20
      [77051.374910]  [<ffffffffa0f7e4d8>] ldlm_cli_enqueue_local+0x1f8/0x5d0 [ptlrpc]
      [77051.375217]  [<ffffffffa0f7e8b0>] ? ldlm_completion_ast+0x0/0x950 [ptlrpc]
      [77051.375532]  [<ffffffffa07e8830>] ? mdt_blocking_ast+0x0/0x2a0 [mdt]
      [77051.375816]  [<ffffffffa07eadc0>] mdt_object_lock0+0x360/0xb10 [mdt]
      [77051.376135]  [<ffffffffa07e8830>] ? mdt_blocking_ast+0x0/0x2a0 [mdt]
      [77051.376447]  [<ffffffffa0f7e8b0>] ? ldlm_completion_ast+0x0/0x950 [ptlrpc]
      [77051.376746]  [<ffffffffa07eb614>] mdt_object_lock+0x14/0x20 [mdt]
      [77051.377025]  [<ffffffffa07fd9c5>] mdt_getattr_name_lock+0xee5/0x1980 [mdt]
      [77051.377339]  [<ffffffffa0fa6275>] ? lustre_msg_buf+0x55/0x60 [ptlrpc]
      [77051.377641]  [<ffffffffa0fd16f6>] ? __req_capsule_get+0x166/0x700 [ptlrpc]
      [77051.377977]  [<ffffffffa0fa8474>] ? lustre_msg_get_flags+0x34/0xb0 [ptlrpc]
      [77051.378274]  [<ffffffffa07fe6fd>] mdt_intent_getattr+0x29d/0x490 [mdt]
      [77051.378564]  [<ffffffffa07fc26e>] mdt_intent_policy+0x3ae/0x750 [mdt]
      [77051.378858]  [<ffffffffa0f5f70a>] ldlm_lock_enqueue+0x2ea/0x870 [ptlrpc]
      [77051.379163]  [<ffffffffa0f85f07>] ldlm_handle_enqueue0+0x4f7/0x1090 [ptlrpc]
      [77051.379460]  [<ffffffffa07fc746>] mdt_enqueue+0x46/0x110 [mdt]
      [77051.379735]  [<ffffffffa07f0732>] mdt_handle_common+0x8e2/0x1690 [mdt]
      [77051.380053]  [<ffffffffa0825e95>] mds_regular_handle+0x15/0x20 [mdt]
      [77051.380358]  [<ffffffffa0fb5af3>] ptlrpc_server_handle_request+0x463/0xe60 [ptlrpc]
      [77051.380835]  [<ffffffffa0bbf65e>] ? cfs_timer_arm+0xe/0x10 [libcfs]
      [77051.381128]  [<ffffffffa0fae871>] ? ptlrpc_wait_event+0xb1/0x2a0 [ptlrpc]
      [77051.381421]  [<ffffffff81051f73>] ? __wake_up+0x53/0x70
      [77051.381696]  [<ffffffffa0fb85fd>] ptlrpc_main+0xb3d/0x18c0 [ptlrpc]
      [77051.381992]  [<ffffffffa0fb7ac0>] ? ptlrpc_main+0x0/0x18c0 [ptlrpc]
      [77051.382269]  [<ffffffff8100c14a>] child_rip+0xa/0x20
      [77051.382545]  [<ffffffffa0fb7ac0>] ? ptlrpc_main+0x0/0x18c0 [ptlrpc]
      [77051.382839]  [<ffffffffa0fb7ac0>] ? ptlrpc_main+0x0/0x18c0 [ptlrpc]
      [77051.383114]  [<ffffffff8100c140>] ? child_rip+0x0/0x20
      [77051.383373] 
      [77051.383565] LustreError: dumping log to /tmp/lustre-log.1359282032.29786
      [77052.220055] LNet: Service thread pid 10930 was inactive for 62.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
      [77052.220826] LNet: Skipped 2 previous similar messages
      [77052.221069] Pid: 10930, comm: mdt00_010
      [77052.221281] 
      [77052.221281] Call Trace:
      [77052.221656]  [<ffffffff814fb054>] ? _spin_lock_irqsave+0x24/0x30
      [77052.221907]  [<ffffffff814f8ad1>] schedule_timeout+0x191/0x2e0
      [77052.222159]  [<ffffffff8107bcd0>] ? process_timeout+0x0/0x10
      [77052.222416]  [<ffffffffa0bbf771>] cfs_waitq_timedwait+0x11/0x20 [libcfs]
      [77052.222703]  [<ffffffffa0f7ed8d>] ldlm_completion_ast+0x4dd/0x950 [ptlrpc]
      [77052.222984]  [<ffffffffa0f7a6a0>] ? ldlm_expired_completion_wait+0x0/0x390 [ptlrpc]
      [77052.223414]  [<ffffffff81057d60>] ? default_wake_function+0x0/0x20
      [77052.223690]  [<ffffffffa0f7e4d8>] ldlm_cli_enqueue_local+0x1f8/0x5d0 [ptlrpc]
      [77052.223974]  [<ffffffffa0f7e8b0>] ? ldlm_completion_ast+0x0/0x950 [ptlrpc]
      [77052.224247]  [<ffffffffa07e8830>] ? mdt_blocking_ast+0x0/0x2a0 [mdt]
      [77052.224511]  [<ffffffffa07eadc0>] mdt_object_lock0+0x360/0xb10 [mdt]
      [77052.224799]  [<ffffffffa07e8830>] ? mdt_blocking_ast+0x0/0x2a0 [mdt]
      [77052.225083]  [<ffffffffa0f7e8b0>] ? ldlm_completion_ast+0x0/0x950 [ptlrpc]
      [77052.225502]  [<ffffffffa07eb614>] mdt_object_lock+0x14/0x20 [mdt]
      [77052.225781]  [<ffffffffa07fd9c5>] mdt_getattr_name_lock+0xee5/0x1980 [mdt]
      [77052.226094]  [<ffffffffa0fa6275>] ? lustre_msg_buf+0x55/0x60 [ptlrpc]
      [77052.226413]  [<ffffffffa0fd16f6>] ? __req_capsule_get+0x166/0x700 [ptlrpc]
      [77052.226712]  [<ffffffffa0fa8474>] ? lustre_msg_get_flags+0x34/0xb0 [ptlrpc]
      [77052.227012]  [<ffffffffa07fe6fd>] mdt_intent_getattr+0x29d/0x490 [mdt]
      [77052.227298]  [<ffffffffa07fc26e>] mdt_intent_policy+0x3ae/0x750 [mdt]
      [77052.227577]  [<ffffffffa0f5f70a>] ldlm_lock_enqueue+0x2ea/0x870 [ptlrpc]
      [77052.227855]  [<ffffffffa0f85f07>] ldlm_handle_enqueue0+0x4f7/0x1090 [ptlrpc]
      [77052.228229]  [<ffffffffa07fc746>] mdt_enqueue+0x46/0x110 [mdt]
      [77052.228503]  [<ffffffffa07f0732>] mdt_handle_common+0x8e2/0x1690 [mdt]
      [77052.228830]  [<ffffffffa0825e95>] mds_regular_handle+0x15/0x20 [mdt]
      [77052.229154]  [<ffffffffa0fb5af3>] ptlrpc_server_handle_request+0x463/0xe60 [ptlrpc]
      [77052.229758]  [<ffffffffa0bbf65e>] ? cfs_timer_arm+0xe/0x10 [libcfs]
      [77052.230033]  [<ffffffffa0fae871>] ? ptlrpc_wait_event+0xb1/0x2a0 [ptlrpc]
      [77052.230322]  [<ffffffff81051f73>] ? __wake_up+0x53/0x70
      [77052.230615]  [<ffffffffa0fb85fd>] ptlrpc_main+0xb3d/0x18c0 [ptlrpc]
      [77052.230880]  [<ffffffffa01979c0>] ? rpc_wait_bit_killable+0x0/0xa0 [sunrpc]
      [77052.231181]  [<ffffffffa0fb7ac0>] ? ptlrpc_main+0x0/0x18c0 [ptlrpc]
      [77052.231489]  [<ffffffff8100c14a>] child_rip+0xa/0x20
      [77052.231737]  [<ffffffffa0fb7ac0>] ? ptlrpc_main+0x0/0x18c0 [ptlrpc]
      [77052.232006]  [<ffffffffa0fb7ac0>] ? ptlrpc_main+0x0/0x18c0 [ptlrpc]
      [77052.232260]  [<ffffffff8100c140>] ? child_rip+0x0/0x20
      [77052.232498] 
      [77052.232673] LustreError: dumping log to /tmp/lustre-log.1359282033.10930
      [77052.235465] LNet: Service thread pid 29790 was inactive for 62.00s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one.
      [77052.252077] LNet: Service thread pid 29788 was inactive for 62.00s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one.
      [77052.252623] LNet: Skipped 2 previous similar messages
      [77052.252883] LustreError: dumping log to /tmp/lustre-log.1359282033.29788
      [77052.268049] LustreError: dumping log to /tmp/lustre-log.1359282033.29785
      ...
      [77189.148054] Lustre: lock timed out (enqueued at 1359281970, 200s ago)
      [77189.148058] Lustre: lock timed out (enqueued at 1359281970, 200s ago)
      [77189.168469] LustreError: dumping log to /tmp/lustre-log.1359282170.30749
      [77189.304162] Lustre: lock timed out (enqueued at 1359281970, 200s ago)
      [77190.220049] Lustre: lock timed out (enqueued at 1359281971, 200s ago)
      [77190.220048] Lustre: lock timed out (enqueued at 1359281971, 200s ago)
      [77190.220048] Lustre: lock timed out (enqueued at 1359281971, 200s ago)
      [77190.220048] Lustre: Skipped 1 previous similar message
      [77190.220051] Lustre: Skipped 1 previous similar message
      

      Attachments

        Activity

          People

            di.wang Di Wang
            green Oleg Drokin
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: