  1. Lustre
  2. LU-10402

Service thread hung at jbd2_journal_start

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.7.0
    • Labels: None
    • Environment: Lustre 2.7.3 fe
    • Severity: 2
    • Rank: 9223372036854775807

    Description

      The OSS started to become unresponsive and logged many stack traces.

      The first stack trace was:

      <4>LNet: Service thread pid 30365 was inactive for 962.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
      <4>LNet: Skipped 4 previous similar messages
      <4>Pid: 30365, comm: ll_ost_io00_100
      <4>
      <4>Call Trace:
      <4> [<ffffffff810a3f5e>] ? prepare_to_wait+0x4e/0x80
      <4> [<ffffffffa0df0fca>] start_this_handle+0x25a/0x480 [jbd2]
      <4> [<ffffffff810a3c30>] ? autoremove_wake_function+0x0/0x40
      <4> [<ffffffffa0df13d5>] jbd2_journal_start+0xb5/0x100 [jbd2]
      <4> [<ffffffffa0e49b86>] ldiskfs_journal_start_sb+0x56/0xe0 [ldiskfs]
      <4> [<ffffffffa0f08ebf>] osd_trans_start+0x1df/0x660 [osd_ldiskfs]
      <4> [<ffffffffa10ac4e5>] ofd_write_attr_set+0x2c5/0x8c0 [ofd]
      <4> [<ffffffffa10ad4c6>] ofd_commitrw_write+0x256/0x11a0 [ofd]
      <4> [<ffffffffa10b47ad>] ? ofd_fmd_find_nolock+0xad/0xd0 [ofd]
      <4> [<ffffffffa10ae9c3>] ofd_commitrw+0x5b3/0xba0 [ofd]
      <4> [<ffffffffa07045a1>] ? lprocfs_counter_add+0x151/0x1c0 [obdclass]
      <4> [<ffffffffa09b438d>] obd_commitrw.clone.0+0x11d/0x390 [ptlrpc]
      <4> [<ffffffffa09bc299>] tgt_brw_write+0xc69/0x1520 [ptlrpc]
      <4> [<ffffffffa090dd10>] ? target_bulk_timeout+0x0/0xc0 [ptlrpc]
      <4> [<ffffffffa09baece>] tgt_request_handle+0x8be/0x1020 [ptlrpc]
      <4> [<ffffffffa0964ca1>] ptlrpc_main+0xf41/0x1a80 [ptlrpc]
      <4> [<ffffffffa0963d60>] ? ptlrpc_main+0x0/0x1a80 [ptlrpc]
      <4> [<ffffffff810a379e>] kthread+0x9e/0xc0
      <4> [<ffffffff8100c28a>] child_rip+0xa/0x20
      <4> [<ffffffff810a3700>] ? kthread+0x0/0xc0
      <4> [<ffffffff8100c280>] ? child_rip+0x0/0x20
      

      I will attach backtraces for all threads.
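      When many service threads dump traces like the one above, it can help to tally where each thread is blocked. The following is a minimal Python sketch, assuming the console log uses the same `<4>` frame format shown in this report. The names `parse_traces` and `top_frames` are illustrative and are not part of Lustre or any existing tool.

```python
import re
from collections import Counter

# Matches kernel stack frame lines such as:
#   <4> [<ffffffffa0df13d5>] jbd2_journal_start+0xb5/0x100 [jbd2]
# A leading "?" marks a frame the unwinder considers unreliable.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(\?\s+)?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)
# Matches the per-thread header, e.g. "Pid: 30365, comm: ll_ost_io00_100".
PID_RE = re.compile(r"Pid:\s+(\d+),\s+comm:\s+(\S+)")


def parse_traces(log_text):
    """Return one (pid, comm, frames) tuple per dumped thread.

    frames lists only the reliable frames (no "?"), innermost first.
    """
    traces = []
    current = None
    for line in log_text.splitlines():
        pid_match = PID_RE.search(line)
        if pid_match:
            current = (int(pid_match.group(1)), pid_match.group(2), [])
            traces.append(current)
            continue
        frame_match = FRAME_RE.search(line)
        if frame_match and current is not None and not frame_match.group(1):
            current[2].append(frame_match.group(2))
    return traces


def top_frames(traces):
    """Count the innermost reliable frame of each dumped thread."""
    return Counter(frames[0] for _, _, frames in traces if frames)
```

      Running `top_frames(parse_traces(log_text))` over a saved console log gives a count of threads per innermost function, which makes it easier to see whether most threads are waiting in the same place.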

      Is this a dup of LU-6918?

          People

            ys Yang Sheng
            mhanafi Mahmoud Hanafi