[LU-10402] Service thread hung at jbd2_journal_start Created: 15/Dec/17 Updated: 16/Jun/18 Resolved: 16/Jun/18 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.7.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major |
| Reporter: | Mahmoud Hanafi | Assignee: | Yang Sheng |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Environment: |
Lustre 2.7.3 fe |
||
| Attachments: |
|
| Severity: | 2 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
OSS started to become unresponsive with lots of stack traces. The first stack trace was:

<4>LNet: Service thread pid 30365 was inactive for 962.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
<4>LNet: Skipped 4 previous similar messages
<4>Pid: 30365, comm: ll_ost_io00_100
<4>
<4>Call Trace:
<4> [<ffffffff810a3f5e>] ? prepare_to_wait+0x4e/0x80
<4> [<ffffffffa0df0fca>] start_this_handle+0x25a/0x480 [jbd2]
<4> [<ffffffff810a3c30>] ? autoremove_wake_function+0x0/0x40
<4> [<ffffffffa0df13d5>] jbd2_journal_start+0xb5/0x100 [jbd2]
<4> [<ffffffffa0e49b86>] ldiskfs_journal_start_sb+0x56/0xe0 [ldiskfs]
<4> [<ffffffffa0f08ebf>] osd_trans_start+0x1df/0x660 [osd_ldiskfs]
<4> [<ffffffffa10ac4e5>] ofd_write_attr_set+0x2c5/0x8c0 [ofd]
<4> [<ffffffffa10ad4c6>] ofd_commitrw_write+0x256/0x11a0 [ofd]
<4> [<ffffffffa10b47ad>] ? ofd_fmd_find_nolock+0xad/0xd0 [ofd]
<4> [<ffffffffa10ae9c3>] ofd_commitrw+0x5b3/0xba0 [ofd]
<4> [<ffffffffa07045a1>] ? lprocfs_counter_add+0x151/0x1c0 [obdclass]
<4> [<ffffffffa09b438d>] obd_commitrw.clone.0+0x11d/0x390 [ptlrpc]
<4> [<ffffffffa09bc299>] tgt_brw_write+0xc69/0x1520 [ptlrpc]
<4> [<ffffffffa090dd10>] ? target_bulk_timeout+0x0/0xc0 [ptlrpc]
<4> [<ffffffffa09baece>] tgt_request_handle+0x8be/0x1020 [ptlrpc]
<4> [<ffffffffa0964ca1>] ptlrpc_main+0xf41/0x1a80 [ptlrpc]
<4> [<ffffffffa0963d60>] ? ptlrpc_main+0x0/0x1a80 [ptlrpc]
<4> [<ffffffff810a379e>] kthread+0x9e/0xc0
<4> [<ffffffff8100c28a>] child_rip+0xa/0x20
<4> [<ffffffff810a3700>] ? kthread+0x0/0xc0
<4> [<ffffffff8100c280>] ? child_rip+0x0/0x20

I will attach a bt for all threads. Is this a dup of |
| Comments |
| Comment by Peter Jones [ 18/Dec/17 ] |
|
Yang Sheng, can you please look into this one? Thanks. Peter |
| Comment by Yang Sheng [ 19/Dec/17 ] |
|
From the stack traces:

PID: 22328  TASK: ffff881b2cbed520  CPU: 5  COMMAND: "ll_ost00_003"
 #0 [ffff881b2cbf3630] schedule at ffffffff81581292
 #1 [ffff881b2cbf3708] __wait_on_freeing_inode at ffffffff811b1f78
 #2 [ffff881b2cbf3778] find_inode_fast at ffffffff811b1ff8
 #3 [ffff881b2cbf37a8] ifind_fast at ffffffff811b315c
 #4 [ffff881b2cbf37d8] iget_locked at ffffffff811b33f9
 #5 [ffff881b2cbf3818] ldiskfs_iget at ffffffffa0e247b7 [ldiskfs]
 #6 [ffff881b2cbf3888] osd_iget at ffffffffa0f04d4e [osd_ldiskfs]
 #7 [ffff881b2cbf38b8] osd_obj_map_lookup at ffffffffa0f34743 [osd_ldiskfs]
 #8 [ffff881b2cbf3938] osd_oi_lookup at ffffffffa0f2117a [osd_ldiskfs]
 #9 [ffff881b2cbf3968] osd_object_init at ffffffffa0f16499 [osd_ldiskfs]
#10 [ffff881b2cbf3a48] lu_object_alloc at ffffffffa0729e18 [obdclass]
#11 [ffff881b2cbf3aa8] lu_object_find_try at ffffffffa072b361 [obdclass]
#12 [ffff881b2cbf3b38] lu_object_find_at at ffffffffa072b521 [obdclass]
#13 [ffff881b2cbf3bc8] lu_object_find at ffffffffa072b566 [obdclass]
#14 [ffff881b2cbf3bd8] ofd_object_find at ffffffffa10a5aa5 [ofd]
#15 [ffff881b2cbf3c08] ofd_lvbo_init at ffffffffa10b977f [ofd]
#16 [ffff881b2cbf3cb8] ldlm_handle_enqueue0 at ffffffffa09308dd [ptlrpc]
#17 [ffff881b2cbf3d28] tgt_enqueue at ffffffffa09b9eb1 [ptlrpc]
#18 [ffff881b2cbf3d48] tgt_request_handle at ffffffffa09baece [ptlrpc]
#19 [ffff881b2cbf3da8] ptlrpc_main at ffffffffa0964ca1 [ptlrpc]
#20 [ffff881b2cbf3ee8] kthread at ffffffff810a379e
#21 [ffff881b2cbf3f48] kernel_thread at ffffffff8...............
PID: 61556  TASK: ffff8804b009a040  CPU: 8  COMMAND: "perfquery"
 #0 [ffff8804b009f5a8] schedule at ffffffff81581292
 #1 [ffff8804b009f680] start_this_handle at ffffffffa0df0fca [jbd2]
 #2 [ffff8804b009f740] jbd2_journal_start at ffffffffa0df13d5 [jbd2]
 #3 [ffff8804b009f780] ldiskfs_journal_start_sb at ffffffffa0e49b86 [ldiskfs]
 #4 [ffff8804b009f7a0] ldiskfs_dquot_drop at ffffffffa0e49f15 [ldiskfs]
 #5 [ffff8804b009f7d0] vfs_dq_drop at ffffffff811f7ca2
 #6 [ffff8804b009f7e0] clear_inode at ffffffff811b2623
 #7 [ffff8804b009f800] dispose_list at ffffffff811b2710
 #8 [ffff8804b009f840] shrink_icache_memory at ffffffff811b2a64
 #9 [ffff8804b009f8a0] shrink_slab at ffffffff8114253a
#10 [ffff8804b009f900] do_try_to_free_pages at ffffffff811448df
#11 [ffff8804b009f9a0] try_to_free_pages at ffffffff81144d85
#12 [ffff8804b009fa50] __alloc_pages_nodemask at ffffffff81138d8d
#13 [ffff8804b009fba0] alloc_pages_current at ffffffff8117255a
#14 [ffff8804b009fbd0] __get_free_pages at ffffffff8113655e
#15 [ffff8804b009fbe0] get_zeroed_page at ffffffff811365b6
#16 [ffff8804b009fbf0] sysfs_follow_link at ffffffff81215136
#17 [ffff8804b009fc50] __link_path_walk at ffffffff811a4f26
#18 [ffff8804b009fd30] path_walk at ffffffff811a5e0a
#19 [ffff8804b009fd70] filename_lookup at ffffffff811a601b
#20 [ffff8804b009fdb0] do_filp_open at ffffffff811a74f4
#21 [ffff8804b009ff20] do_sys_open at ffffffff81191607
#22 [ffff8804b009ff70] sys_open at ffffffff81191710
#23 [ffff8804b009ff80] system_call_fastpath at ffffffff8100b0d2

It really is a duplicate of

Thanks, |
| Comment by Peter Jones [ 19/Dec/17 ] |
|
As per Alex, |
| Comment by Mahmoud Hanafi [ 14/Jun/18 ] |
|
This can be closed |
| Comment by Peter Jones [ 16/Jun/18 ] |
|
Thanks Mahmoud |