LU-5530

MDS thread lockup with patched 2.5 server

Details


    Description

      While testing 2.5 servers for a possible upcoming test shot, I hit this bug running simul.

      <6>[ 3724.190222] mdt00_003 D 0000000000000010 0 15957 2 0x00000000
      <4>[ 3724.197292] ffff8807e1f118b8 0000000000000046 0000000000000000 ffffffffa0f5b6eb
      <4>[ 3724.205018] ffff8810021cf190 ffff8810021cf138 ffff8808326a0538 ffffffffa0f5b6eb
      <4>[ 3724.212759] ffff8807f846fab8 ffff8807e1f11fd8 000000000000fbc8 ffff8807f846fab8
      <4>[ 3724.220491] Call Trace:
      <4>[ 3724.223032] [<ffffffff8152a6d5>] rwsem_down_failed_common+0x95/0x1d0
      <4>[ 3724.229597] [<ffffffffa0d9d3fb>] ? ldiskfs_xattr_trusted_get+0x2b/0x30 [ldiskfs]
      <4>[ 3724.237247] [<ffffffff811ae017>] ? generic_getxattr+0x87/0x90
      <4>[ 3724.243199] [<ffffffff8152a866>] rwsem_down_read_failed+0x26/0x30
      <4>[ 3724.249499] [<ffffffffa0fe8083>] ? lod_xattr_get+0x153/0x420 [lod]
      <4>[ 3724.255867] [<ffffffff8128eab4>] call_rwsem_down_read_failed+0x14/0x30
      <4>[ 3724.262580] [<ffffffff81529d64>] ? down_read+0x24/0x30
      <4>[ 3724.267923] [<ffffffffa0f2569d>] mdt_object_open_lock+0x1ed/0x9d0 [mdt]
      <4>[ 3724.274736] [<ffffffffa0f077e0>] ? mdt_attr_get_complex+0x520/0x7f0 [mdt]
      <4>[ 3724.281720] [<ffffffffa0f2dcc7>] mdt_reint_open+0x15b7/0x2150 [mdt]
      <4>[ 3724.288187] [<ffffffffa05e9f76>] ? upcall_cache_get_entry+0x296/0x880 [libcfs]
      <4>[ 3724.295688] [<ffffffffa073fc10>] ? lu_ucred+0x20/0x30 [obdclass]
      <4>[ 3724.301900] [<ffffffffa0f16611>] mdt_reint_rec+0x41/0xe0 [mdt]
      <4>[ 3724.307916] [<ffffffffa0efbe63>] mdt_reint_internal+0x4c3/0x780 [mdt]
      <4>[ 3724.314551] [<ffffffffa0efc3ee>] mdt_intent_reint+0x1ee/0x520 [mdt]
      <4>[ 3724.321023] [<ffffffffa0ef9bce>] mdt_intent_policy+0x3ae/0x770 [mdt]
      <4>[ 3724.327619] [<ffffffffa085a2e5>] ldlm_lock_enqueue+0x135/0x950 [ptlrpc]
      <4>[ 3724.334445] [<ffffffffa0883ccf>] ldlm_handle_enqueue0+0x50f/0x10c0 [ptlrpc]
      <4>[ 3724.341608] [<ffffffffa0efa096>] mdt_enqueue+0x46/0xe0 [mdt]
      <4>[ 3724.347484] [<ffffffffa0efec5a>] mdt_handle_common+0x52a/0x1470 [mdt]
      <4>[ 3724.354147] [<ffffffffa0f3b945>] mds_regular_handle+0x15/0x20 [mdt]
      <4>[ 3724.360670] [<ffffffffa08b4015>] ptlrpc_server_handle_request+0x385/0xc00 [ptlrpc]
      <4>[ 3724.368508] [<ffffffffa05ce4ce>] ? cfs_timer_arm+0xe/0x10 [libcfs]
      <4>[ 3724.374864] [<ffffffffa05df3cf>] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
      <4>[ 3724.381791] [<ffffffffa08ab699>] ? ptlrpc_wait_event+0xa9/0x2d0 [ptlrpc]
      <4>[ 3724.388691] [<ffffffff810546b9>] ? __wake_up_common+0x59/0x90
      <4>[ 3724.394666] [<ffffffffa08b537d>] ptlrpc_main+0xaed/0x1920 [ptlrpc]
      <4>[ 3724.402340] [<ffffffffa08b4890>] ? ptlrpc_main+0x0/0x1920 [ptlrpc]
      <4>[ 3724.408702] [<ffffffff8109ab56>] kthread+0x96/0xa0
      <4>[ 3724.413684] [<ffffffff8100c20a>] child_rip+0xa/0x20
      <4>[ 3724.418754] [<ffffffff8109aac0>] ? kthread+0x0/0xa0
      <4>[ 3724.423841] [<ffffffff8100c200>] ? child_rip+0x0/0x20


          Activity

            [LU-5530] MDS thread lockup with patched 2.5 server
            pjones Peter Jones added a comment -

            Landed for 2.5.4 and 2.7

            green Oleg Drokin added a comment -

            My "force frequent resends to facilitate lu2827-like issues" patch reallyworked as expected ( http://review.whamcloud.com/11842 ).

            Now I was able to reproduce this problem and found out that it is LU-2827 related too: a mismatched release of a semaphore (it was released more times than it was taken) in mdt_reint_open.
            Proposed patch for this issue: http://review.whamcloud.com/11841 (also applies to b2_5)
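            (A minimal sketch of the kind of imbalance described above; this is not the actual mdt_reint_open() code, and the object type and process_open() helper are made up. It only illustrates how a resend/error path that releases a read semaphore it never acquired skews the rwsem count, so a later down_read(), such as the one in mdt_object_open_lock(), blocks forever.)

            #include <linux/types.h>
            #include <linux/rwsem.h>

            struct obj {
                    struct rw_semaphore open_sem;
            };

            /* stand-in for the real open processing; assume it can fail */
            static int process_open(struct obj *o)
            {
                    return 0;
            }

            static int open_it(struct obj *o, bool resend)
            {
                    int rc;

                    if (!resend)
                            down_read(&o->open_sem);  /* taken only on the first attempt */

                    rc = process_open(o);

                    /*
                     * BUG: unconditional release.  On a resend nothing was taken above,
                     * so this is one up_read() too many, and a later down_read() or
                     * down_write() on open_sem can then block forever.
                     */
                    up_read(&o->open_sem);
                    return rc;
            }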

            In addition, after fixing this I also hit LU-5579 while testing with racer. LU-5496 is also a bugfix in a related area, so both of those patches should be adopted too.

            green Oleg Drokin added a comment -

            So I have looked at the core file and checked all MDS threads.
            There is no obvious culprit for the observed deadlock: nothing is sleeping or otherwise blocked with this lock held.
            So it appears that the semaphore is somehow leaked.
            I inspected all the callers and users of the open semaphore and there does not seem to be any possible path for the leak.
            I double-checked using the ORNL-patched 2.5 tree and it seems to be the case there as well.
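            (For reference, a sketch of the sort of crash-utility commands this kind of inspection involves; the addresses are placeholders, not values from this dump.)

            crash> foreach bt                          # backtraces of all tasks, to look for a holder
            crash> ps | grep mdt                       # list the MDT service threads
            crash> bt <task_address>                   # backtrace of a single suspect thread
            crash> struct rw_semaphore <sem_address>   # dump the semaphore count and wait list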

            As such, I am switching approach now.
            The plan is to trigger resends of client requests at all times due to a too-small reply buffer, either by making the server always send huge replies or by having the client always provide buffers that are too small.

            I am getting some interesting crashes this way already, though possibly this is a bit self-inflicted.
            This is also useful because it lets us exercise this code path even on our small test clusters during normal operations.


            simmonsja James A Simmons added a comment -

            crash> bt ffff8807e2630080
            PID: 15771 TASK: ffff8807e2630080 CPU: 20 COMMAND: "mdt01_000"
            #0 [ffff8807e26817f8] schedule at ffffffff81528020
            #1 [ffff8807e26818c0] rwsem_down_failed_common at ffffffff8152a6d5
            #2 [ffff8807e2681920] rwsem_down_read_failed at ffffffff8152a866
            #3 [ffff8807e2681960] call_rwsem_down_read_failed at ffffffff8128eab4
            #4 [ffff8807e26819c8] mdt_object_open_lock at ffffffffa0f2569d [mdt]
            #5 [ffff8807e2681a38] mdt_reint_open at ffffffffa0f2dcc7 [mdt]
            #6 [ffff8807e2681b18] mdt_reint_rec at ffffffffa0f16611 [mdt]
            #7 [ffff8807e2681b38] mdt_reint_internal at ffffffffa0efbe63 [mdt]
            #8 [ffff8807e2681b78] mdt_intent_reint at ffffffffa0efc3ee [mdt]
            #9 [ffff8807e2681bc8] mdt_intent_policy at ffffffffa0ef9bce [mdt]
            #10 [ffff8807e2681c08] ldlm_lock_enqueue at ffffffffa085a2e5 [ptlrpc]
            #11 [ffff8807e2681c78] ldlm_handle_enqueue0 at ffffffffa0883ccf [ptlrpc]
            #12 [ffff8807e2681ce8] mdt_enqueue at ffffffffa0efa096 [mdt]
            #13 [ffff8807e2681d08] mdt_handle_common at ffffffffa0efec5a [mdt]
            #14 [ffff8807e2681d58] mds_regular_handle at ffffffffa0f3b945 [mdt]
            #15 [ffff8807e2681d68] ptlrpc_server_handle_request at ffffffffa08b4015 [ptlrpc]
            #16 [ffff8807e2681e48] ptlrpc_main at ffffffffa08b537d [ptlrpc]
            #17 [ffff8807e2681ee8] kthread at ffffffff8109ab56
            #18 [ffff8807e2681f48] kernel_thread at ffffffff8100c20a

            info

            [ 3724.166886] INFO: task mdt00_003:15957 blocked for more than 120 seconds.
            [ 3724.173775] Tainted: P --------------- 2.6.32-431.17.1.el6.wc.x86_64 #1
            [ 3724.182236] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
            [ 3724.190222] mdt00_003 D 0000000000000010 0 15957 2 0x00000000
            [ 3724.197292] ffff8807e1f118b8 0000000000000046 0000000000000000 ffffffffa0f5b6eb
            [ 3724.205018] ffff8810021cf190 ffff8810021cf138 ffff8808326a0538 ffffffffa0f5b6eb
            [ 3724.212759] ffff8807f846fab8 ffff8807e1f11fd8 000000000000fbc8 ffff8807f846fab8
            [ 3724.220491] Call Trace:
            [ 3724.223032] [<ffffffff8152a6d5>] rwsem_down_failed_common+0x95/0x1d0
            [ 3724.229597] [<ffffffffa0d9d3fb>] ? ldiskfs_xattr_trusted_get+0x2b/0x30 [ldiskfs]
            [ 3724.237247] [<ffffffff811ae017>] ? generic_getxattr+0x87/0x90
            [ 3724.243199] [<ffffffff8152a866>] rwsem_down_read_failed+0x26/0x30
            [ 3724.249499] [<ffffffffa0fe8083>] ? lod_xattr_get+0x153/0x420 [lod]
            [ 3724.255867] [<ffffffff8128eab4>] call_rwsem_down_read_failed+0x14/0x30
            [ 3724.262580] [<ffffffff81529d64>] ? down_read+0x24/0x30
            [ 3724.267923] [<ffffffffa0f2569d>] mdt_object_open_lock+0x1ed/0x9d0 [mdt]
            [ 3724.274736] [<ffffffffa0f077e0>] ? mdt_attr_get_complex+0x520/0x7f0 [mdt]
            [ 3724.281720] [<ffffffffa0f2dcc7>] mdt_reint_open+0x15b7/0x2150 [mdt]
            [ 3724.288187] [<ffffffffa05e9f76>] ? upcall_cache_get_entry+0x296/0x880 [libcfs]
            [ 3724.295688] [<ffffffffa073fc10>] ? lu_ucred+0x20/0x30 [obdclass]
            [ 3724.301900] [<ffffffffa0f16611>] mdt_reint_rec+0x41/0xe0 [mdt]
            [ 3724.307916] [<ffffffffa0efbe63>] mdt_reint_internal+0x4c3/0x780 [mdt]
            [ 3724.314551] [<ffffffffa0efc3ee>] mdt_intent_reint+0x1ee/0x520 [mdt]
            [ 3724.321023] [<ffffffffa0ef9bce>] mdt_intent_policy+0x3ae/0x770 [mdt]
            [ 3724.327619] [<ffffffffa085a2e5>] ldlm_lock_enqueue+0x135/0x950 [ptlrpc]
            [ 3724.334445] [<ffffffffa0883ccf>] ldlm_handle_enqueue0+0x50f/0x10c0 [ptlrpc]
            [ 3724.341608] [<ffffffffa0efa096>] mdt_enqueue+0x46/0xe0 [mdt]
            [ 3724.347484] [<ffffffffa0efec5a>] mdt_handle_common+0x52a/0x1470 [mdt]
            [ 3724.354147] [<ffffffffa0f3b945>] mds_regular_handle+0x15/0x20 [mdt]
            [ 3724.360670] [<ffffffffa08b4015>] ptlrpc_server_handle_request+0x385/0xc00 [ptlrpc]
            [ 3724.368508] [<ffffffffa05ce4ce>] ? cfs_timer_arm+0xe/0x10 [libcfs]
            [ 3724.374864] [<ffffffffa05df3cf>] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
            [ 3724.381791] [<ffffffffa08ab699>] ? ptlrpc_wait_event+0xa9/0x2d0 [ptlrpc]
            [ 3724.388691] [<ffffffff810546b9>] ? __wake_up_common+0x59/0x90
            [ 3724.394666] [<ffffffffa08b537d>] ptlrpc_main+0xaed/0x1920 [ptlrpc]
            [ 3724.402340] [<ffffffffa08b4890>] ? ptlrpc_main+0x0/0x1920 [ptlrpc]
            [ 3724.408702] [<ffffffff8109ab56>] kthread+0x96/0xa0
            [ 3724.413684] [<ffffffff8100c20a>] child_rip+0xa/0x20
            [ 3724.418754] [<ffffffff8109aac0>] ? kthread+0x0/0xa0
            [ 3724.423841] [<ffffffff8100c200>] ? child_rip+0x0/0x20


            niu Niu Yawei (Inactive) added a comment -

            Is it possible that the messages in vmcore-dmesg.txt were truncated? I think the output of "bt -a" and "log" could still be valuable. I never used lockdep, so I'm afraid I can't give any advice on that.


            simmonsja James A Simmons added a comment -

            I did get the "bt -a" and "log" data, but they didn't reveal much; the "bt -a" output looked like the dmesg output. Perhaps we should approach this with kernel lockdep?
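            (If we go the lockdep route, the debug kernel would need lock validation compiled in; roughly the following options, as a sketch, with exact dependencies varying by kernel version.)

            CONFIG_DEBUG_KERNEL=y
            CONFIG_PROVE_LOCKING=y      # lock dependency validator (lockdep)
            CONFIG_DEBUG_LOCK_ALLOC=y   # detect freeing/reinitialization of held locks
            CONFIG_LOCK_STAT=y          # optional: lock contention statistics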


            niu Niu Yawei (Inactive) added a comment -

            Hi, James

            I downloaded the vmcore-dmesg.txt; it looks like many service threads are blocked on the inode->xattr_sem while trying to read xattrs. Unfortunately, downloading those files is extremely slow for me; it could take more than 10 hours to download a core file, so I tried a few times and gave up in the end.

            Could you send me the output of the "bt -a" and "log" crash commands? Thanks in advance.

            pjones Peter Jones added a comment -

            Niu

            Could you please advise on this one?

            Thanks

            Peter


            simmonsja James A Simmons added a comment -

            I uploaded a crash dump to ftp.whamcloud.com/uploads/LU-5530

            You will also find the debuginfo RPMs there.


            People

              green Oleg Drokin
              simmonsja James A Simmons
              Votes: 0
              Watchers: 11
