  1. Lustre
  2. LU-7808

Service threads hung at ptlrpc_abort_bulk

Details

    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: Critical
    • None
    • Affects Version: Lustre 2.5.3
    • None
    • 4

    Description

      Server threads are getting hung. Sometimes they clear by themselves, but not always.

      <3>LustreError: 26311:0:(ldlm_lib.c:2715:target_bulk_io()) @@@ timeout on bulk PUT after 100+0s  req@ffff881cd7d35800 x1526640544175816/t0(0) o3->d13c09b7-cb83-8238-fde5-d86e0a048f3a@10.151.63.50@o2ib:0/0 lens 488/432 e 1 to 0 dl 1456265702 ref 1 fl Interpret:/0/0 rc 0/0
      <3>LustreError: 26311:0:(ldlm_lib.c:2715:target_bulk_io()) Skipped 62 previous similar messages
      <6>LNet: 2454:0:(o2iblnd_cb.c:1937:kiblnd_close_conn_locked()) Closing conn to 10.151.27.36@o2ib: error 0(waiting)
      <6>LNet: 2454:0:(o2iblnd_cb.c:1937:kiblnd_close_conn_locked()) Skipped 3 previous similar messages
      <3>LustreError: 0:0:(ldlm_lockd.c:346:waiting_locks_callback()) ### lock callback timer expired after 225s: evicting client at 10.151.55.224@o2ib  ns: filter-nbp8-OST00e0_UUID lock: ffff880b6d477100/0xc9935e7dda65f272 lrc: 4/0,0 mode: PW/PW res: [0x1fff69f:0x0:0x0].0 rrc: 32 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 0x60000000000020 nid: 10.151.55.224@o2ib remote: 0x18f2b8c9e47dad96 expref: 4 pid: 2516 timeout: 4704175646 lvb_type: 0
      <3>LustreError: 2513:0:(ldlm_lockd.c:664:ldlm_handle_ast_error()) ### client (nid 10.151.55.224@o2ib) returned 0 from completion AST ns: filter-nbp8-OST00e0_UUID lock: ffff880b6d477100/0xc9935e7dda65f272 lrc: 6/0,0 mode: PW/PW res: [0x1fff69f:0x0:0x0].0 rrc: 33 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 0x60000000000020 nid: 10.151.55.224@o2ib remote: 0x18f2b8c9e47dad96 expref: 4 pid: 2516 timeout: 4704175646 lvb_type: 0
      <3>LustreError: 0:0:(ldlm_lockd.c:346:waiting_locks_callback()) ### lock callback timer expired after 225s: evicting client at 10.151.17.216@o2ib  ns: filter-nbp8-OST012e_UUID lock: ffff8805c4f78180/0xc9935e7dda65e30d lrc: 4/0,0 mode: PW/PW res: [0x1ff164b:0x0:0x0].0 rrc: 2 type: EXT [0->18446744073709551615] (req 0->4095) flags: 0x60000000010020 nid: 10.151.17.216@o2ib remote: 0x179b912da7369126 expref: 4 pid: 2472 timeout: 4704191266 lvb_type: 0
      <6>LNet: 2454:0:(o2iblnd_cb.c:1937:kiblnd_close_conn_locked()) Closing conn to 10.151.27.24@o2ib: error 0(waiting)
      <6>LNet: 2454:0:(o2iblnd_cb.c:1937:kiblnd_close_conn_locked()) Skipped 5 previous similar messages
      <4>LNet: Service thread pid 28275 was inactive for 300.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
      <4>Pid: 28275, comm: ll_ost_io02_091
      <4>
      <4>Call Trace:
      <4> [<ffffffff810867ec>] ? lock_timer_base+0x3c/0x70
      <4> [<ffffffff81566572>] schedule_timeout+0x192/0x2e0
      <4> [<ffffffff81086900>] ? process_timeout+0x0/0x10
      <4> [<ffffffffa0798c58>] ptlrpc_abort_bulk+0x158/0x2e0 [ptlrpc]
      <4> [<ffffffff81064f90>] ? default_wake_function+0x0/0x20
      <4> [<ffffffffa0765694>] target_bulk_io+0x474/0x980 [ptlrpc]
      <4> [<ffffffffa04ce200>] ? cfs_crypto_hash_final+0x30/0x80 [libcfs]
      <4> [<ffffffff81064f90>] ? default_wake_function+0x0/0x20
      <4> [<ffffffffa150287c>] ost_brw_read+0x103c/0x1350 [ost]
      <4> [<ffffffff812b9076>] ? vsnprintf+0x336/0x5e0
      <4> [<ffffffffa075e500>] ? target_bulk_timeout+0x0/0xc0 [ptlrpc]
      <4> [<ffffffffa04d47c3>] ? libcfs_debug_vmsg2+0x5d3/0xbd0 [libcfs]
      <4> [<ffffffffa079ff8c>] ? lustre_msg_get_version+0x8c/0x100 [ptlrpc]
      <4> [<ffffffffa07a00e8>] ? lustre_msg_check_version+0xe8/0x100 [ptlrpc]
      <4> [<ffffffffa15096d8>] ost_handle+0x24a8/0x44d0 [ost]
      <4> [<ffffffffa04d0a44>] ? libcfs_id2str+0x74/0xb0 [libcfs]
      <4> [<ffffffffa07ae0c5>] ptlrpc_server_handle_request+0x385/0xc00 [ptlrpc]
      <4> [<ffffffffa04d68d5>] ? lc_watchdog_touch+0x65/0x170 [libcfs]
      <4> [<ffffffffa07a6a69>] ? ptlrpc_wait_event+0xa9/0x2d0 [ptlrpc]
      <4> [<ffffffffa07b089d>] ptlrpc_main+0xafd/0x1780 [ptlrpc]
      <4> [<ffffffff8100c28a>] child_rip+0xa/0x20
      <4> [<ffffffffa07afda0>] ? ptlrpc_main+0x0/0x1780 [ptlrpc]
      <4> [<ffffffff8100c280>] ? child_rip+0x0/0x20
      <4>
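
      The call trace above mixes verified frames with entries marked "?", which the kernel prints for addresses it found on the stack but could not confirm as part of the call chain. The following is a minimal sketch, not part of the original report, that keeps only the unmarked frames so the blocking path is easier to read. The names FRAME_RE, reliable_frames and SAMPLE are hypothetical, and the regular expression assumes the frame layout shown in this dump.

```python
import re

# Matches one frame of a kernel call trace as printed in this report, e.g.
#   <4> [<ffffffffa0798c58>] ptlrpc_abort_bulk+0x158/0x2e0 [ptlrpc]
# The optional "guess" group captures the "?" that marks an unverified frame.
# Assumption: frames follow the "[<addr>] func+0xoff/0xsize [module]" layout.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(?P<guess>\?\s+)?(?P<func>[\w.]+)"
    r"\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(?P<module>\w+)\])?"
)


def reliable_frames(trace_text):
    """Return (function, module) pairs for frames not marked with '?'."""
    frames = []
    for line in trace_text.splitlines():
        match = FRAME_RE.search(line)
        if match and not match.group("guess"):
            # Frames without a module suffix belong to the core kernel.
            frames.append((match.group("func"), match.group("module") or "kernel"))
    return frames


# A few lines copied from the trace in this report.
SAMPLE = """\
<4> [<ffffffff810867ec>] ? lock_timer_base+0x3c/0x70
<4> [<ffffffff81566572>] schedule_timeout+0x192/0x2e0
<4> [<ffffffffa0798c58>] ptlrpc_abort_bulk+0x158/0x2e0 [ptlrpc]
<4> [<ffffffff81064f90>] ? default_wake_function+0x0/0x20
<4> [<ffffffffa0765694>] target_bulk_io+0x474/0x980 [ptlrpc]
<4> [<ffffffffa150287c>] ost_brw_read+0x103c/0x1350 [ost]
"""

if __name__ == "__main__":
    for func, module in reliable_frames(SAMPLE):
        print(f"{func} [{module}]")
```

      Run on the sample lines, the sketch keeps schedule_timeout, ptlrpc_abort_bulk, target_bulk_io and ost_brw_read, which matches the reading that the ll_ost_io thread is sleeping inside ptlrpc_abort_bulk after the bulk PUT timed out.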
      

      I am attaching a backtrace and a Lustre debug dump.

          People

            delbaryg DELBARY Gael
            mhanafi Mahmoud Hanafi