Details

Type: Bug
Resolution: Fixed
Priority: Major
Affects Version: Lustre 2.5.2
Labels: None
Severity: 3
Rank: 15395
Description
While testing 2.5 servers for a possible upcoming test shot, I hit this bug running simul.
<6>[ 3724.190222] mdt00_003 D 0000000000000010 0 15957 2 0x00000000
<4>[ 3724.197292] ffff8807e1f118b8 0000000000000046 0000000000000000 ffffffffa0f5b6eb
<4>[ 3724.205018] ffff8810021cf190 ffff8810021cf138 ffff8808326a0538 ffffffffa0f5b6eb
<4>[ 3724.212759] ffff8807f846fab8 ffff8807e1f11fd8 000000000000fbc8 ffff8807f846fab8
<4>[ 3724.220491] Call Trace:
<4>[ 3724.223032] [<ffffffff8152a6d5>] rwsem_down_failed_common+0x95/0x1d0
<4>[ 3724.229597] [<ffffffffa0d9d3fb>] ? ldiskfs_xattr_trusted_get+0x2b/0x30 [ldiskfs]
<4>[ 3724.237247] [<ffffffff811ae017>] ? generic_getxattr+0x87/0x90
<4>[ 3724.243199] [<ffffffff8152a866>] rwsem_down_read_failed+0x26/0x30
<4>[ 3724.249499] [<ffffffffa0fe8083>] ? lod_xattr_get+0x153/0x420 [lod]
<4>[ 3724.255867] [<ffffffff8128eab4>] call_rwsem_down_read_failed+0x14/0x30
<4>[ 3724.262580] [<ffffffff81529d64>] ? down_read+0x24/0x30
<4>[ 3724.267923] [<ffffffffa0f2569d>] mdt_object_open_lock+0x1ed/0x9d0 [mdt]
<4>[ 3724.274736] [<ffffffffa0f077e0>] ? mdt_attr_get_complex+0x520/0x7f0 [mdt]
<4>[ 3724.281720] [<ffffffffa0f2dcc7>] mdt_reint_open+0x15b7/0x2150 [mdt]
<4>[ 3724.288187] [<ffffffffa05e9f76>] ? upcall_cache_get_entry+0x296/0x880 [libcfs]
<4>[ 3724.295688] [<ffffffffa073fc10>] ? lu_ucred+0x20/0x30 [obdclass]
<4>[ 3724.301900] [<ffffffffa0f16611>] mdt_reint_rec+0x41/0xe0 [mdt]
<4>[ 3724.307916] [<ffffffffa0efbe63>] mdt_reint_internal+0x4c3/0x780 [mdt]
<4>[ 3724.314551] [<ffffffffa0efc3ee>] mdt_intent_reint+0x1ee/0x520 [mdt]
<4>[ 3724.321023] [<ffffffffa0ef9bce>] mdt_intent_policy+0x3ae/0x770 [mdt]
<4>[ 3724.327619] [<ffffffffa085a2e5>] ldlm_lock_enqueue+0x135/0x950 [ptlrpc]
<4>[ 3724.334445] [<ffffffffa0883ccf>] ldlm_handle_enqueue0+0x50f/0x10c0 [ptlrpc]
<4>[ 3724.341608] [<ffffffffa0efa096>] mdt_enqueue+0x46/0xe0 [mdt]
<4>[ 3724.347484] [<ffffffffa0efec5a>] mdt_handle_common+0x52a/0x1470 [mdt]
<4>[ 3724.354147] [<ffffffffa0f3b945>] mds_regular_handle+0x15/0x20 [mdt]
<4>[ 3724.360670] [<ffffffffa08b4015>] ptlrpc_server_handle_request+0x385/0xc00 [ptlrpc]
<4>[ 3724.368508] [<ffffffffa05ce4ce>] ? cfs_timer_arm+0xe/0x10 [libcfs]
<4>[ 3724.374864] [<ffffffffa05df3cf>] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
<4>[ 3724.381791] [<ffffffffa08ab699>] ? ptlrpc_wait_event+0xa9/0x2d0 [ptlrpc]
<4>[ 3724.388691] [<ffffffff810546b9>] ? __wake_up_common+0x59/0x90
<4>[ 3724.394666] [<ffffffffa08b537d>] ptlrpc_main+0xaed/0x1920 [ptlrpc]
<4>[ 3724.402340] [<ffffffffa08b4890>] ? ptlrpc_main+0x0/0x1920 [ptlrpc]
<4>[ 3724.408702] [<ffffffff8109ab56>] kthread+0x96/0xa0
<4>[ 3724.413684] [<ffffffff8100c20a>] child_rip+0xa/0x20
<4>[ 3724.418754] [<ffffffff8109aac0>] ? kthread+0x0/0xa0
<4>[ 3724.423841] [<ffffffff8100c200>] ? child_rip+0x0/0x20
So I looked in the core file and checked all MDS threads.
There is no obvious culprit for the observed deadlock: nothing is sleeping or otherwise blocked with this lock held.
So it appears that the semaphore is somehow leaked.
I inspected all the callers and users of the open semaphore, and there does not seem to be any possible way for it to leak; the kind of bug I was hunting for is sketched below.
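For reference, a minimal illustrative sketch of that kind of leak, not actual Lustre code: do_open() is a made-up stand-in, and mot_open_sem stands for the open semaphore taken by mdt_object_open_lock().

static int open_with_leak(struct mdt_object *obj)
{
        int rc;

        down_read(&obj->mot_open_sem);
        rc = do_open(obj);      /* made-up stand-in for the real open path */
        if (rc != 0)
                return rc;      /* BUG: error path skips up_read(), so the
                                 * semaphore stays held forever and the next
                                 * down_read()/down_write() in
                                 * mdt_object_open_lock() blocks exactly as
                                 * in the trace above */
        up_read(&obj->mot_open_sem);
        return 0;
}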
I double-checked against the ORNL-patched 2.5 tree and it seems to be the case there as well.
As such, I am switching approach now.
The plan is to force client requests to be resent every time because the reply buffer is too small, either by making the server always send huge replies or by having the client always provide too-small buffers; a sketch of the client-side variant follows.
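A rough sketch of the client-side variant, under stated assumptions: OBD_FAIL_CHECK() and rq_replen are the real Lustre fault-injection macro and reply buffer length, but OBD_FAIL_MDC_TINY_REPBUF is an invented fail_loc name and the hook point is illustrative.

static void mdc_maybe_shrink_repbuf(struct ptlrpc_request *req)
{
        /* Hypothetical fault injection: shrink the preallocated reply
         * buffer so no real reply fits; every reply then comes back
         * truncated and the client must enlarge the buffer and resend
         * the request, exercising the resend path on every RPC. */
        if (OBD_FAIL_CHECK(OBD_FAIL_MDC_TINY_REPBUF))
                req->rq_replen = sizeof(struct lustre_msg);
}

Once a value is assigned to the fail_loc, this would be enabled in the usual way with lctl set_param fail_loc=<value> on the client.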
I am already getting some interesting crashes this way, though possibly they are a bit self-inflicted.
This is also useful because it lets us exercise this code path even on our small test clusters during normal operation.