Lustre / LU-1578

ll_ost_io_xx threads hang and OST status goes "DOWN"

Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Minor
    • Fix Version/s: None
    • Affects Version/s: Lustre 1.8.7
    • Labels: None
    • Environment: RHEL5.7
    • Severity: 3
    • Rank: 6370

    Description

      At our customer's site, some ll_ost_io_xx threads hang and the OST status shows "DOWN" in "lctl dl" output.
      We saw the following call traces on the affected OSSs at the time.

      Jun 28 16:22:18 nos011i kernel: Pid: 32708, comm: ll_ost_io_188
      Jun 28 16:22:18 nos011i kernel: 
      Jun 28 16:22:18 nos011i kernel: Call Trace:
      Jun 28 16:22:18 nos011i kernel:  [<ffffffff888a914d>] ldlm_cli_enqueue_local+0x4fd/0x520 [ptlrpc]
      Jun 28 16:22:18 nos011i kernel:  [<ffffffff800645e3>] __down_write_nested+0x7a/0x92
      Jun 28 16:22:18 nos011i kernel:  [<ffffffff88bd7b29>] filter_destroy+0x969/0x1f90 [obdfilter]
      Jun 28 16:22:18 nos011i kernel:  [<ffffffff8876ecfd>] libcfs_debug_vmsg2+0x70d/0x970 [libcfs]
      Jun 28 16:22:18 nos011i kernel:  [<ffffffff888d2cd2>] lustre_pack_reply_flags+0x8e2/0x950 [ptlrpc]
      Jun 28 16:22:18 nos011i kernel:  [<ffffffff888d2d69>] lustre_pack_reply+0x29/0xb0 [ptlrpc]
      Jun 28 16:22:18 nos011i kernel:  [<ffffffff88b89070>] ost_destroy+0x660/0x790 [ost]
      Jun 28 16:22:18 nos011i kernel:  [<ffffffff888ceef5>] lustre_msg_get_opc+0x35/0xf0 [ptlrpc]
      Jun 28 16:22:18 nos011i kernel:  [<ffffffff88b92a46>] ost_handle+0x1556/0x55b0 [ost]
      Jun 28 16:22:18 nos011i kernel:  [<ffffffff888de6d9>] ptlrpc_server_handle_request+0x989/0xe00 [ptlrpc]
      Jun 28 16:22:18 nos011i kernel:  [<ffffffff8008cc1e>] __wake_up_common+0x3e/0x68
      Jun 28 16:22:18 nos011i kernel:  [<ffffffff888dfdc6>] ptlrpc_main+0xf66/0x1120 [ptlrpc]
      Jun 28 16:22:18 nos011i kernel:  [<ffffffff8005dfb1>] child_rip+0xa/0x11
      Jun 28 16:22:18 nos011i kernel:  [<ffffffff888dee60>] ptlrpc_main+0x0/0x1120 [ptlrpc]
      Jun 28 16:22:18 nos011i kernel:  [<ffffffff8005dfa7>] child_rip+0x0/0x11
      Jun 28 16:22:18 nos011i kernel: 
      Jun 28 16:22:18 nos011i kernel: Lustre: Service thread pid 8870 was inactive for 200.00s. The thread might be hung, or it might
       only be slow and will resume later. Dumping the stack trace for debugging purposes:
      Jun 28 16:22:18 nos011i kernel: LustreError: dumping log to /tmp/lustre-log.1340868138.32708
      Jun 28 16:22:18 nos011i kernel: Pid: 8870, comm: ll_ost_io_37
      Jun 28 16:22:20 nos011i kernel: 
      Jun 28 16:22:20 nos011i kernel: Call Trace:
      Jun 28 16:22:21 nos011i kernel:  [<ffffffff800645e3>] __down_write_nested+0x7a/0x92
      Jun 28 16:22:21 nos011i kernel:  [<ffffffff88bd7b29>] filter_destroy+0x969/0x1f90 [obdfilter]
      Jun 28 16:22:21 nos011i kernel:  [<ffffffff8876ecfd>] libcfs_debug_vmsg2+0x70d/0x970 [libcfs]
      Jun 28 16:22:21 nos011i kernel:  [<ffffffff888d2cd2>] lustre_pack_reply_flags+0x8e2/0x950 [ptlrpc]
      Jun 28 16:22:21 nos011i kernel:  [<ffffffff888d2d69>] lustre_pack_reply+0x29/0xb0 [ptlrpc]
      Jun 28 16:22:21 nos011i kernel:  [<ffffffff88b89070>] ost_destroy+0x660/0x790 [ost]
      Jun 28 16:22:21 nos011i kernel:  [<ffffffff888ceef5>] lustre_msg_get_opc+0x35/0xf0 [ptlrpc]
      Jun 28 16:22:21 nos011i kernel:  [<ffffffff88b92a46>] ost_handle+0x1556/0x55b0 [ost]
      Jun 28 16:22:21 nos011i kernel:  [<ffffffff888de6d9>] ptlrpc_server_handle_request+0x989/0xe00 [ptlrpc]
      Jun 28 16:22:21 nos011i kernel:  [<ffffffff8008cc1e>] __wake_up_common+0x3e/0x68
      Jun 28 16:22:21 nos011i kernel:  [<ffffffff888dfdc6>] ptlrpc_main+0xf66/0x1120 [ptlrpc]
      Jun 28 16:22:21 nos011i kernel:  [<ffffffff8005dfb1>] child_rip+0xa/0x11
      Jun 28 16:22:21 nos011i kernel:  [<ffffffff888dee60>] ptlrpc_main+0x0/0x1120 [ptlrpc]
      Jun 28 16:22:21 nos011i kernel:  [<ffffffff8005dfa7>] child_rip+0x0/0x11
      Jun 28 16:22:21 nos011i kernel: 
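      Both traces show the service thread blocked in __down_write_nested() called
      from filter_destroy(), i.e. waiting to take a rw-semaphore for write while
      another holder keeps it. As a minimal userspace sketch of that pattern
      (illustrative only, not Lustre code; all names below are made up), a
      long-lived holder takes a pthread rwlock for read and a would-be destroyer
      then blocks in the write acquire, just as ll_ost_io blocks in
      __down_write_nested():

      /* Userspace analogue of the hang above: the "destroyer" blocks in the
       * rwlock write acquire while a long-lived holder keeps the lock, the
       * same shape as ll_ost_io blocking in __down_write_nested() from
       * filter_destroy().  Build with: cc demo.c -o demo -lpthread */
      #include <pthread.h>
      #include <stdio.h>
      #include <unistd.h>

      static pthread_rwlock_t obj_sem = PTHREAD_RWLOCK_INITIALIZER;

      /* Stands in for whatever holds the semaphore too long on the OSS. */
      static void *long_holder(void *arg)
      {
          pthread_rwlock_rdlock(&obj_sem);
          printf("holder: took read lock, sleeping\n");
          sleep(5);                        /* simulate the stuck holder */
          pthread_rwlock_unlock(&obj_sem);
          return NULL;
      }

      /* Stands in for filter_destroy(): needs the semaphore for write. */
      static void *destroyer(void *arg)
      {
          printf("destroyer: waiting for write lock\n");
          pthread_rwlock_wrlock(&obj_sem); /* blocks until holder releases */
          printf("destroyer: got write lock, destroying object\n");
          pthread_rwlock_unlock(&obj_sem);
          return NULL;
      }

      int main(void)
      {
          pthread_t h, d;

          pthread_create(&h, NULL, long_holder, NULL);
          sleep(1);                        /* let the holder win the race */
          pthread_create(&d, NULL, destroyer, NULL);
          pthread_join(h, NULL);
          pthread_join(d, NULL);
          return 0;
      }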
      

People

    Assignee: Niu Yawei (Inactive)
    Reporter: Shuichi Ihara (Inactive)
    Votes: 0
    Watchers: 4
