
MDT Slow with ptlrpcd using 100% cpu.

Details

    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: Major
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.1.5
    • Components: None
    • Environment: server running 2.1.5-2nas
      our source is at git://github.com/jlan/lustre-nas.git
    • Severity: 3
    • Rank: 11369

    Description

      MDT response was very slow. top showed ptlrpcd running at 100% CPU. The console showed errors. We were able to run a debug trace; see the attached files.

      Lustre: Service thread pid 7065 was inactive for 200.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
      Pid: 7065, comm: mdt_93

      Call Trace:
      [<ffffffff810539b2>] ? enqueue_task_fair+0x142/0x490
      [<ffffffff81096f5d>] ? sched_clock_cpu+0xcd/0x110
      [<ffffffffa04f960e>] cfs_waitq_wait+0xe/0x10 [libcfs]
      [<ffffffffa0a9f6de>] qos_statfs_update+0x7fe/0xa70 [lov]
      [<ffffffff8105fab0>] ? default_wake_function+0x0/0x20
      [<ffffffffa0aa00fd>] alloc_qos+0x1ad/0x21a0 [lov]
      [<ffffffffa0aa306c>] qos_prep_create+0x1ec/0x2380 [lov]
      [<ffffffffa04f98be>] ? cfs_free+0xe/0x10 [libcfs]
      [<ffffffffa0a9c63a>] lov_prep_create_set+0xea/0x390 [lov]
      [<ffffffffa0a84b0c>] lov_create+0x1ac/0x1400 [lov]
      [<ffffffffa0d8a50f>] ? obd_iocontrol+0xef/0x390 [mdd]
      [<ffffffffa0d8f90e>] mdd_lov_create+0x9ee/0x1ba0 [mdd]
      [<ffffffffa0da1871>] mdd_create+0xf81/0x1a90 [mdd]
      [<ffffffffa0ea9df3>] ? osd_oi_lookup+0x83/0x110 [osd_ldiskfs]
      [<ffffffffa0ea456c>] ? osd_object_init+0xdc/0x3e0 [osd_ldiskfs]
      [<ffffffffa0eda3f7>] cml_create+0x97/0x250 [cmm]
      [<ffffffffa0e165e1>] ? mdt_version_get_save+0x91/0xd0 [mdt]
      [<ffffffffa0e2c06e>] mdt_reint_open+0x1aae/0x28a0 [mdt]
      [<ffffffffa078f724>] ? lustre_msg_add_version+0x74/0xd0 [ptlrpc]
      [<ffffffffa0da456e>] ? md_ucred+0x1e/0x60 [mdd]
      [<ffffffffa0e14c81>] mdt_reint_rec+0x41/0xe0 [mdt]
      [<ffffffffa0e0bed4>] mdt_reint_internal+0x544/0x8e0 [mdt]
      [<ffffffffa0e0c53d>] mdt_intent_reint+0x1ed/0x530 [mdt]
      [<ffffffffa0e0ac09>] mdt_intent_policy+0x379/0x690 [mdt]
      [<ffffffffa074b351>] ldlm_lock_enqueue+0x361/0x8f0 [ptlrpc]
      [<ffffffffa07711ad>] ldlm_handle_enqueue0+0x48d/0xf50 [ptlrpc]
      [<ffffffffa0e0b586>] mdt_enqueue+0x46/0x130 [mdt]
      [<ffffffffa0e00772>] mdt_handle_common+0x932/0x1750 [mdt]
      [<ffffffffa0e01665>] mdt_regular_handle+0x15/0x20 [mdt]
      [<ffffffffa079fb4e>] ptlrpc_main+0xc4e/0x1a40 [ptlrpc]
      [<ffffffffa079ef00>] ? ptlrpc_main+0x0/0x1a40 [ptlrpc]
      [<ffffffff8100c0ca>] child_rip+0xa/0x20
      [<ffffffffa079ef00>] ? ptlrpc_main+0x0/0x1a40 [ptlrpc]
      [<ffffffffa079ef00>] ? ptlrpc_main+0x0/0x1a40 [ptlrpc]
      [<ffffffff8100c0c0>] ? child_rip+0x0/0x20

      Attachments

        1. bt.LU4195
          1.07 MB
        2. meminfo.LU4195
          33 kB

        Activity

          [LU-4195] MDT Slow with ptlrpcd using 100% cpu.

          Hello again Mahmoud,
          Please advise if you want us to keep this ticket open?

          If we don't hear back we'll mark it as resolved, with no fix, and we can re-open it if requested to do that.

          Thanks,
          ~ jfc.

          jfc John Fuchs-Chesney (Inactive) added a comment -

          Hello Mahmoud,
          Just want to check in to see if this is still a 'live' issue for you?
          If the problem has been dealt with, can we mark this issue as 'resolved'?
          Thanks,
          ~ jfc.

          jfc John Fuchs-Chesney (Inactive) added a comment -
          ashehata Amir Shehata (Inactive) added a comment - edited

          Would it be possible to grab the route configuration from one of the nodes that has the problem? Please also highlight the problematic routes.

          As a basic sanity check, please make sure that the routers are actually configured as routers, i.e. forwarding="enabled". If not, the node will drop all messages not destined for itself.

          This error message is hit whenever LNET tries to send a message to a final destination on a net that has no route from the net the current node is on, so no appropriate route can be chosen.

          Furthermore, the routes can exist but be down because the NID specified is not reachable. Could you try an lctl ping <nid> from the node that's having the problem? That should return a list of NIDs on the target router and their status (up/down). If one of these NIDs is down and avoid_asym_router_failure is 1 (which it is by default), then the entire router is considered down, and when sending messages we'll hit the above error.

          NOTE: if you have a router with multiple NIDs, but one of the NIDs is "unused" (i.e. it sends/receives no messages), that NID will be considered down, which leads to the scenario described above.
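
          For reference, the routing pieces discussed above are set through lnet module options. A hypothetical fragment follows; the net names, NIDs, and file path are illustrative, not taken from this ticket:

```
# /etc/modprobe.d/lustre.conf -- hypothetical example
# On a node that reaches o2ib233 through a router on o2ib:
options lnet networks="o2ib(ib0)"
options lnet routes="o2ib233 10.151.27.60@o2ib"

# On the router node itself, forwarding must be enabled, or it will
# silently drop messages not destined for itself:
options lnet networks="o2ib(ib0),o2ib233(ib1)"
options lnet forwarding="enabled"

# Default behavior: one down NID marks the whole router down.
options lnet avoid_asym_router_failure=1
```

          After changing these options the lnet module must be reloaded for them to take effect.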


          Amir,
          Could you take a look at this one?
          Thank you!

          jlevi Jodi Levi (Inactive) added a comment -

          I agree with Lai: the log shows there is no router from o2ib to o2ib233, and the stack trace shows all ptlrpcd threads were busy acquiring LNET_LOCK (which I think is caused by the router problem).

          Mahmoud, I didn't see anything related to quota in the log; what evidence indicates it was triggered by the 'lfs setquota' command?

          niu Niu Yawei (Inactive) added a comment -

          We were not able to recover from this and had to dump the system.
          See the attached files for memory and stack trace info.

          We have 7 other filesystems that use the same routers and clients, and they were not experiencing this issue.

          There is "some" evidence that this may have been triggered by running an "lfs setquota" command.
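
          For context, the suspected trigger command has this general shape; the user, limits, and mount point below are generic illustrations, not values recorded in this ticket:

```
# generic form of 'lfs setquota' (all values illustrative)
# -b/-B: block soft/hard limits (kbytes); -i/-I: inode soft/hard limits
lfs setquota -u someuser -b 10000000 -B 11000000 -i 100000 -I 110000 /mnt/lustre
```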

          mhanafi Mahmoud Hanafi added a comment -
          laisiyao Lai Siyao added a comment -

          In debug logs there are lots of messages like this:

          00000400:02000400:2.0:1383201992.510338:0:7620:0:(lib-move.c:1454:lnet_send()) No route to 12345-10.153.1.199@o2ib233 via 10.151.27.60@o2ib (all routers down)
          

          It looks like the routers are down, and all connections are down too; clients kept reconnecting to the MDS (and failing), so the ptlrpcd threads are always 100% busy.


          Could you provide the full stack trace (for all tasks on all CPUs) as well? What about the memory usage? Can this situation be recovered?

          niu Niu Yawei (Inactive) added a comment -

          Hope this helps

          crash> bt 5560
          PID: 5560 TASK: ffff881fb5e87540 CPU: 8 COMMAND: "ptlrpcd"
          #0 [ffff8820b0c07e90] crash_nmi_callback at ffffffff81029956
          #1 [ffff8820b0c07ea0] notifier_call_chain at ffffffff815222f5
          #2 [ffff8820b0c07ee0] atomic_notifier_call_chain at ffffffff8152235a
          #3 [ffff8820b0c07ef0] notify_die at ffffffff81095f1e
          #4 [ffff8820b0c07f20] do_nmi at ffffffff8151ff2f
          #5 [ffff8820b0c07f50] nmi at ffffffff8151f7e0
          [exception RIP: _spin_lock+33]
          RIP: ffffffff8151f051 RSP: ffff881fac2b3bf0 RFLAGS: 00000297
          RAX: 0000000000000d28 RBX: 0000000000000004 RCX: 0000000000000000
          RDX: 0000000000000d27 RSI: 0000000000000007 RDI: ffffffffa05ada40
          RBP: ffff881fac2b3bf0 R8: 0000000000000246 R9: 0000000000000001
          R10: ffff883343e44cc0 R11: 0000000000000000 R12: 000500000a971b57
          R13: 0000000000003039 R14: 0000000000000000 R15: 000527015f9078c7
          ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
          --- <NMI exception stack> ---
          #6 [ffff881fac2b3bf0] _spin_lock at ffffffff8151f051
          #7 [ffff881fac2b3bf8] LNetMEAttach at ffffffffa057f99f [lnet]
          #8 [ffff881fac2b3c68] ptl_send_rpc at ffffffffa078b532 [ptlrpc]
          #9 [ffff881fac2b3d28] ptlrpc_send_new_req at ffffffffa0781993 [ptlrpc]
          #10 [ffff881fac2b3d98] ptlrpc_check_set at ffffffffa07840a8 [ptlrpc]
          #11 [ffff881fac2b3e38] ptlrpcd_check at ffffffffa07b4ed0 [ptlrpc]
          #12 [ffff881fac2b3e68] ptlrpcd at ffffffffa07b516e [ptlrpc]
          #13 [ffff881fac2b3f48] kernel_thread at ffffffff8100c0ca

          crash> bt 5560 -l
          PID: 5560 TASK: ffff881fb5e87540 CPU: 8 COMMAND: "ptlrpcd"
          #0 [ffff8820b0c07e90] crash_nmi_callback at ffffffff81029956
          /usr/src/debug/kernel-lustre215-2.6.32-279.19.1.el6/linux-2.6.32-279.19.1.el6.x86_64/arch/x86/include/asm/paravirt.h: 115
          #1 [ffff8820b0c07ea0] notifier_call_chain at ffffffff815222f5
          /usr/src/debug/kernel-lustre215-2.6.32-279.19.1.el6/linux-2.6.32-279.19.1.el6.x86_64/kernel/notifier.c: 95
          #2 [ffff8820b0c07ee0] atomic_notifier_call_chain at ffffffff8152235a
          /usr/src/debug/kernel-lustre215-2.6.32-279.19.1.el6/linux-2.6.32-279.19.1.el6.x86_64/kernel/notifier.c: 192
          #3 [ffff8820b0c07ef0] notify_die at ffffffff81095f1e
          /usr/src/debug/kernel-lustre215-2.6.32-279.19.1.el6/linux-2.6.32-279.19.1.el6.x86_64/kernel/notifier.c: 573
          #4 [ffff8820b0c07f20] do_nmi at ffffffff8151ff2f
          /usr/src/debug/kernel-lustre215-2.6.32-279.19.1.el6/linux-2.6.32-279.19.1.el6.x86_64/arch/x86/kernel/traps.c: 513
          #5 [ffff8820b0c07f50] nmi at ffffffff8151f7e0
          /usr/src/debug/kernel-lustre215-2.6.32-279.19.1.el6/linux-2.6.32-279.19.1.el6.x86_64/arch/x86_64/kernel/entry.S
          [exception RIP: _spin_lock+33]
          RIP: ffffffff8151f051 RSP: ffff881fac2b3bf0 RFLAGS: 00000297
          RAX: 0000000000000d28 RBX: 0000000000000004 RCX: 0000000000000000
          RDX: 0000000000000d27 RSI: 0000000000000007 RDI: ffffffffa05ada40
          RBP: ffff881fac2b3bf0 R8: 0000000000000246 R9: 0000000000000001
          R10: ffff883343e44cc0 R11: 0000000000000000 R12: 000500000a971b57
          R13: 0000000000003039 R14: 0000000000000000 R15: 000527015f9078c7
          ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
          --- <NMI exception stack> ---
          #6 [ffff881fac2b3bf0] _spin_lock at ffffffff8151f051
          /usr/src/debug/kernel-lustre215-2.6.32-279.19.1.el6/linux-2.6.32-279.19.1.el6.x86_64/arch/x86/include/asm/spinlock.h: 127
          #7 [ffff881fac2b3bf8] LNetMEAttach at ffffffffa057f99f [lnet]
          /usr/src/redhat/BUILD/lustre-2.1.5/lnet/lnet/lib-me.c: 158
          #8 [ffff881fac2b3c68] ptl_send_rpc at ffffffffa078b532 [ptlrpc]
          /usr/src/redhat/BUILD/lustre-2.1.5/lustre/ptlrpc/niobuf.c: 614
          #9 [ffff881fac2b3d28] ptlrpc_send_new_req at ffffffffa0781993 [ptlrpc]
          /usr/src/redhat/BUILD/lustre-2.1.5/lustre/ptlrpc/client.c: 1437
          #10 [ffff881fac2b3d98] ptlrpc_check_set at ffffffffa07840a8 [ptlrpc]
          /usr/src/redhat/BUILD/lustre-2.1.5/lustre/ptlrpc/client.c: 1468
          #11 [ffff881fac2b3e38] ptlrpcd_check at ffffffffa07b4ed0 [ptlrpc]
          /usr/src/redhat/BUILD/lustre-2.1.5/lustre/ptlrpc/ptlrpcd.c: 240
          #12 [ffff881fac2b3e68] ptlrpcd at ffffffffa07b516e [ptlrpc]
          /usr/src/redhat/BUILD/lustre-2.1.5/lustre/ptlrpc/ptlrpcd.c: 326
          #13 [ffff881fac2b3f48] kernel_thread at ffffffff8100c0ca
          /usr/src/debug////////kernel-lustre215-2.6.32-279.19.1.el6/linux-2.6.32-279.19.1.el6.x86_64/arch/x86/kernel/entry_64.S: 1213

          mhanafi Mahmoud Hanafi added a comment -

          Mahmoud, do you know what kind of application/operation caused the problem? I see quite a lot of transaction commits in the log; it looks like there was heavy load on the MDS.

          Is it possible to get a full stack trace (especially the stack trace for ptlrpcd) by sysrq when this happened again? Thanks.

          niu Niu Yawei (Inactive) added a comment -
          pjones Peter Jones added a comment -

          Lai

          What do you advise here?

          Peter


          People

            Assignee: ashehata Amir Shehata (Inactive)
            Reporter: mhanafi Mahmoud Hanafi
            Votes: 0
            Watchers: 8
