Details
-
Bug
-
Resolution: Fixed
-
Major
-
None
-
Lustre 2.7.0
-
None
-
Lustre 2.7.3 fe
-
2
-
Description
The OSS started to become unresponsive, logging many stack traces.
The first stack trace was:
<4>LNet: Service thread pid 30365 was inactive for 962.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
<4>LNet: Skipped 4 previous similar messages
<4>Pid: 30365, comm: ll_ost_io00_100
<4>
<4>Call Trace:
<4> [<ffffffff810a3f5e>] ? prepare_to_wait+0x4e/0x80
<4> [<ffffffffa0df0fca>] start_this_handle+0x25a/0x480 [jbd2]
<4> [<ffffffff810a3c30>] ? autoremove_wake_function+0x0/0x40
<4> [<ffffffffa0df13d5>] jbd2_journal_start+0xb5/0x100 [jbd2]
<4> [<ffffffffa0e49b86>] ldiskfs_journal_start_sb+0x56/0xe0 [ldiskfs]
<4> [<ffffffffa0f08ebf>] osd_trans_start+0x1df/0x660 [osd_ldiskfs]
<4> [<ffffffffa10ac4e5>] ofd_write_attr_set+0x2c5/0x8c0 [ofd]
<4> [<ffffffffa10ad4c6>] ofd_commitrw_write+0x256/0x11a0 [ofd]
<4> [<ffffffffa10b47ad>] ? ofd_fmd_find_nolock+0xad/0xd0 [ofd]
<4> [<ffffffffa10ae9c3>] ofd_commitrw+0x5b3/0xba0 [ofd]
<4> [<ffffffffa07045a1>] ? lprocfs_counter_add+0x151/0x1c0 [obdclass]
<4> [<ffffffffa09b438d>] obd_commitrw.clone.0+0x11d/0x390 [ptlrpc]
<4> [<ffffffffa09bc299>] tgt_brw_write+0xc69/0x1520 [ptlrpc]
<4> [<ffffffffa090dd10>] ? target_bulk_timeout+0x0/0xc0 [ptlrpc]
<4> [<ffffffffa09baece>] tgt_request_handle+0x8be/0x1020 [ptlrpc]
<4> [<ffffffffa0964ca1>] ptlrpc_main+0xf41/0x1a80 [ptlrpc]
<4> [<ffffffffa0963d60>] ? ptlrpc_main+0x0/0x1a80 [ptlrpc]
<4> [<ffffffff810a379e>] kthread+0x9e/0xc0
<4> [<ffffffff8100c28a>] child_rip+0xa/0x20
<4> [<ffffffff810a3700>] ? kthread+0x0/0xc0
<4> [<ffffffff8100c280>] ? child_rip+0x0/0x20
I will attach backtraces (bt) for all threads.
Is this a duplicate of LU-6918?