Details
-
Bug
-
Resolution: Incomplete
-
Critical
-
None
-
Lustre 2.10.4, Lustre 2.10.5, Lustre 2.10.6
-
None
-
CentOS 7, 3.10.0-862.2.3.el7_lustre.x86_64, 1 MDS (+1 HA pair), 4 OSS
-
3
-
9223372036854775807
Description
After an upgrade of Lustre servers from Centos 6 and IEEL/DDN version of Lustre 2.5 to Centos 7 and Lustre 2.10.5 we are experiencing stability issues related to and MDS/OSS deadlock, most likely caused by hung OST thread. The issue is visible on one of two filesystems, the smaller one, both are working on identical HW and SW stack, so the issue is most likely related to a specific workload on the filesystem. This is mostly an midterm storage filesystem, so more metadata ops happen on it than on the scratch one. We initially thought that the issue might be caused by our try to enable project quotas (tune2fs -O project on all targets), so we turned it off (tune2fs -O ^project), which didn't change a thing. Also all combinations of MDT/OST quota enforcement (initally it was 'g', than 'gp', now 'none') doesn't change a thing, so probably it is not related to the problem. In terms of relevant information, we run on sync_journal=1 as a workaround to a problem with memory accounting on the clients suggested by DDN some time ago. The crash happens at least once per few hours, sometimes sequentially one after another, on different OSS nodes. I also tagged 2.10.6 as affected, as we tried everything from 2.10.4 to current b2_10.
On OSS we get:
[Tue Oct 2 11:56:44 2018] LNet: Service thread pid 11401 was inactive for 200.27s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: [Tue Oct 2 11:56:44 2018] Pid: 11401, comm: ll_ost_out00_00 3.10.0-862.2.3.el7_lustre.x86_64 #1 SMP Tue May 22 17:36:23 UTC 2018 [Tue Oct 2 11:56:44 2018] Call Trace: [Tue Oct 2 11:56:44 2018] [<ffffffff8395aa77>] call_rwsem_down_write_failed+0x17/0x30 [Tue Oct 2 11:56:44 2018] [<ffffffffc0eeaa8c>] osd_write_lock+0x5c/0xe0 [osd_ldiskfs] [Tue Oct 2 11:56:44 2018] [<ffffffffc0b5b119>] out_tx_attr_set_exec+0x69/0x3f0 [ptlrpc] [Tue Oct 2 11:56:44 2018] [<ffffffffc0b55591>] out_tx_end+0xe1/0x5c0 [ptlrpc] [Tue Oct 2 11:56:44 2018] [<ffffffffc0b596d3>] out_handle+0x1453/0x1bc0 [ptlrpc] [Tue Oct 2 11:56:44 2018] [<ffffffffc0b4f38a>] tgt_request_handle+0x92a/0x1370 [ptlrpc] [Tue Oct 2 11:56:44 2018] [<ffffffffc0af7e4b>] ptlrpc_server_handle_request+0x23b/0xaa0 [ptlrpc] [Tue Oct 2 11:56:44 2018] [<ffffffffc0afb592>] ptlrpc_main+0xa92/0x1e40 [ptlrpc] [Tue Oct 2 11:56:44 2018] [<ffffffff836bae31>] kthread+0xd1/0xe0 [Tue Oct 2 11:56:44 2018] [<ffffffff83d1f5f7>] ret_from_fork_nospec_end+0x0/0x39 [Tue Oct 2 11:56:44 2018] [<ffffffffffffffff>] 0xffffffffffffffff [Tue Oct 2 11:56:44 2018] LustreError: dumping log to /tmp/lustre-log.1538474204.11401 [Tue Oct 2 11:56:44 2018] Pid: 35664, comm: ll_ost_io00_098 3.10.0-862.2.3.el7_lustre.x86_64 #1 SMP Tue May 22 17:36:23 UTC 2018 [Tue Oct 2 11:56:44 2018] Call Trace: [Tue Oct 2 11:56:44 2018] [<ffffffffc0e7e495>] jbd2_log_wait_commit+0xc5/0x140 [jbd2] [Tue Oct 2 11:56:44 2018] [<ffffffffc0e76533>] jbd2_journal_stop+0x343/0x3d0 [jbd2] [Tue Oct 2 11:56:44 2018] [<ffffffffc112ab4c>] __ldiskfs_journal_stop+0x3c/0xb0 [ldiskfs] [Tue Oct 2 11:56:44 2018] [<ffffffffc0efb783>] osd_trans_stop+0x183/0x850 [osd_ldiskfs] [Tue Oct 2 11:56:44 2018] [<ffffffffc107d352>] ofd_trans_stop+0x22/0x60 [ofd] [Tue Oct 2 11:56:44 2018] [<ffffffffc10835f4>] ofd_commitrw_write+0x7e4/0x1c90 [ofd] [Tue Oct 2 11:56:44 2018] [<ffffffffc10877a9>] ofd_commitrw+0x4c9/0xae0 [ofd] [Tue Oct 2 11:56:44 2018] [<ffffffffc0b80864>] obd_commitrw+0x2f3/0x336 [ptlrpc] [Tue Oct 2 11:56:44 2018] [<ffffffffc0b5338d>] tgt_brw_write+0xffd/0x17d0 [ptlrpc] [Tue Oct 2 11:56:44 2018] [<ffffffffc0b4f38a>] tgt_request_handle+0x92a/0x1370 [ptlrpc] [Tue Oct 2 11:56:44 2018] [<ffffffffc0af7e4b>] ptlrpc_server_handle_request+0x23b/0xaa0 [ptlrpc] [Tue Oct 2 11:56:44 2018] [<ffffffffc0afb592>] ptlrpc_main+0xa92/0x1e40 [ptlrpc] [Tue Oct 2 11:56:44 2018] [<ffffffff836bae31>] kthread+0xd1/0xe0 [Tue Oct 2 11:56:45 2018] [<ffffffff83d1f5f7>] ret_from_fork_nospec_end+0x0/0x39 [Tue Oct 2 11:56:45 2018] [<ffffffffffffffff>] 0xffffffffffffffff [Tue Oct 2 11:56:45 2018] Pid: 35678, comm: ll_ost_io00_112 3.10.0-862.2.3.el7_lustre.x86_64 #1 SMP Tue May 22 17:36:23 UTC 2018 [Tue Oct 2 11:56:45 2018] Call Trace: [Tue Oct 2 11:56:45 2018] [<ffffffffc0e7e495>] jbd2_log_wait_commit+0xc5/0x140 [jbd2] [Tue Oct 2 11:56:45 2018] [<ffffffffc0e76533>] jbd2_journal_stop+0x343/0x3d0 [jbd2] [Tue Oct 2 11:56:45 2018] [<ffffffffc112ab4c>] __ldiskfs_journal_stop+0x3c/0xb0 [ldiskfs] [Tue Oct 2 11:56:45 2018] [<ffffffffc0efb783>] osd_trans_stop+0x183/0x850 [osd_ldiskfs] [Tue Oct 2 11:56:45 2018] [<ffffffffc107d352>] ofd_trans_stop+0x22/0x60 [ofd] [Tue Oct 2 11:56:45 2018] [<ffffffffc10835f4>] ofd_commitrw_write+0x7e4/0x1c90 [ofd] [Tue Oct 2 11:56:45 2018] [<ffffffffc10877a9>] ofd_commitrw+0x4c9/0xae0 [ofd] [Tue Oct 2 11:56:45 2018] [<ffffffffc0b80864>] obd_commitrw+0x2f3/0x336 [ptlrpc] [Tue Oct 2 11:56:45 2018] [<ffffffffc0b5338d>] tgt_brw_write+0xffd/0x17d0 [ptlrpc] [Tue Oct 2 11:56:45 2018] [<ffffffffc0b4f38a>] tgt_request_handle+0x92a/0x1370 [ptlrpc] [Tue Oct 2 11:56:45 2018] [<ffffffffc0af7e4b>] ptlrpc_server_handle_request+0x23b/0xaa0 [ptlrpc] [Tue Oct 2 11:56:45 2018] [<ffffffffc0afb592>] ptlrpc_main+0xa92/0x1e40 [ptlrpc] [Tue Oct 2 11:56:45 2018] [<ffffffff836bae31>] kthread+0xd1/0xe0 [Tue Oct 2 11:56:45 2018] [<ffffffff83d1f5f7>] ret_from_fork_nospec_end+0x0/0x39 [Tue Oct 2 11:56:45 2018] [<ffffffffffffffff>] 0xffffffffffffffff
On MDS:
[Tue Oct 2 11:56:44 2018] LNet: Service thread pid 69588 was inactive for 200.49s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: [Tue Oct 2 11:56:44 2018] Pid: 69588, comm: mdt00_095 3.10.0-862.2.3.el7_lustre.x86_64 #1 SMP Tue May 22 17:36:23 UTC 2018 [Tue Oct 2 11:56:44 2018] Call Trace: [Tue Oct 2 11:56:44 2018] [<ffffffffc0f7e140>] ptlrpc_set_wait+0x4c0/0x920 [ptlrpc] [Tue Oct 2 11:56:44 2018] [<ffffffffc0f7e61d>] ptlrpc_queue_wait+0x7d/0x220 [ptlrpc] [Tue Oct 2 11:56:44 2018] [<ffffffffc14de463>] osp_remote_sync+0xd3/0x200 [osp] [Tue Oct 2 11:56:44 2018] [<ffffffffc14c6dd0>] osp_attr_set+0x4c0/0x5d0 [osp] [Tue Oct 2 11:56:44 2018] [<ffffffffc1421c6b>] lod_sub_attr_set+0x1cb/0x460 [lod] [Tue Oct 2 11:56:44 2018] [<ffffffffc1403be6>] lod_obj_stripe_attr_set_cb+0x16/0x30 [lod] [Tue Oct 2 11:56:44 2018] [<ffffffffc140fa96>] lod_obj_for_each_stripe+0xb6/0x230 [lod] [Tue Oct 2 11:56:44 2018] [<ffffffffc1411043>] lod_attr_set+0x2f3/0x9a0 [lod] [Tue Oct 2 11:56:44 2018] [<ffffffffc14810a0>] mdd_attr_set_internal+0x120/0x2a0 [mdd] [Tue Oct 2 11:56:44 2018] [<ffffffffc1481e8d>] mdd_attr_set+0x8ad/0xce0 [mdd] [Tue Oct 2 11:56:44 2018] [<ffffffffc13645f5>] mdt_reint_setattr+0xba5/0x1060 [mdt] [Tue Oct 2 11:56:44 2018] [<ffffffffc1364b33>] mdt_reint_rec+0x83/0x210 [mdt] [Tue Oct 2 11:56:44 2018] [<ffffffffc134636b>] mdt_reint_internal+0x5fb/0x9c0 [mdt] [Tue Oct 2 11:56:44 2018] [<ffffffffc1351f07>] mdt_reint+0x67/0x140 [mdt] [Tue Oct 2 11:56:44 2018] [<ffffffffc0fee38a>] tgt_request_handle+0x92a/0x1370 [ptlrpc] [Tue Oct 2 11:56:44 2018] [<ffffffffc0f96e4b>] ptlrpc_server_handle_request+0x23b/0xaa0 [ptlrpc] [Tue Oct 2 11:56:44 2018] [<ffffffffc0f9a592>] ptlrpc_main+0xa92/0x1e40 [ptlrpc] [Tue Oct 2 11:56:44 2018] [<ffffffff946bae31>] kthread+0xd1/0xe0 [Tue Oct 2 11:56:44 2018] [<ffffffff94d1f5f7>] ret_from_fork_nospec_end+0x0/0x39 [Tue Oct 2 11:56:44 2018] [<ffffffffffffffff>] 0xffffffffffffffff [Tue Oct 2 11:56:44 2018] LustreError: dumping log to /tmp/lustre-log.1538474204.69588 [Tue Oct 2 11:56:49 2018] LNet: Service thread pid 7634 was inactive for 200.51s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: [Tue Oct 2 11:56:49 2018] Pid: 7634, comm: mdt_rdpg00_000 3.10.0-862.2.3.el7_lustre.x86_64 #1 SMP Tue May 22 17:36:23 UTC 2018 [Tue Oct 2 11:56:49 2018] Call Trace: [Tue Oct 2 11:56:49 2018] [<ffffffffc06ef085>] wait_transaction_locked+0x85/0xd0 [jbd2] [Tue Oct 2 11:56:49 2018] [<ffffffffc06ef368>] add_transaction_credits+0x268/0x2f0 [jbd2] [Tue Oct 2 11:56:49 2018] [<ffffffffc06ef5e1>] start_this_handle+0x1a1/0x430 [jbd2] [Tue Oct 2 11:56:49 2018] [<ffffffffc06efa93>] jbd2__journal_start+0xf3/0x1f0 [jbd2] [Tue Oct 2 11:56:49 2018] [<ffffffffc118aa99>] __ldiskfs_journal_start_sb+0x69/0xe0 [ldiskfs] [Tue Oct 2 11:56:49 2018] [<ffffffffc12cac8e>] osd_trans_start+0x1ae/0x460 [osd_ldiskfs] [Tue Oct 2 11:56:49 2018] [<ffffffffc1002512>] top_trans_start+0x702/0x940 [ptlrpc] [Tue Oct 2 11:56:49 2018] [<ffffffffc13ec3f1>] lod_trans_start+0x31/0x40 [lod] [Tue Oct 2 11:56:49 2018] [<ffffffffc148c1ba>] mdd_trans_start+0x1a/0x20 [mdd] [Tue Oct 2 11:56:49 2018] [<ffffffffc1481b69>] mdd_attr_set+0x589/0xce0 [mdd] [Tue Oct 2 11:56:49 2018] [<ffffffffc136c1d6>] mdt_mfd_close+0x1a6/0x610 [mdt] [Tue Oct 2 11:56:49 2018] [<ffffffffc1371951>] mdt_close_internal+0x121/0x220 [mdt] [Tue Oct 2 11:56:49 2018] [<ffffffffc1371c70>] mdt_close+0x220/0x780 [mdt] [Tue Oct 2 11:56:49 2018] [<ffffffffc0fee38a>] tgt_request_handle+0x92a/0x1370 [ptlrpc] [Tue Oct 2 11:56:49 2018] [<ffffffffc0f96e4b>] ptlrpc_server_handle_request+0x23b/0xaa0 [ptlrpc] [Tue Oct 2 11:56:49 2018] [<ffffffffc0f9a592>] ptlrpc_main+0xa92/0x1e40 [ptlrpc] [Tue Oct 2 11:56:49 2018] [<ffffffff946bae31>] kthread+0xd1/0xe0 [Tue Oct 2 11:56:49 2018] [<ffffffff94d1f5f7>] ret_from_fork_nospec_end+0x0/0x39 [Tue Oct 2 11:56:49 2018] [<ffffffffffffffff>] 0xffffffffffffffff [Tue Oct 2 11:56:49 2018] LustreError: dumping log to /tmp/lustre-log.1538474210.7634 [Tue Oct 2 11:56:49 2018] Pid: 69508, comm: mdt00_067 3.10.0-862.2.3.el7_lustre.x86_64 #1 SMP Tue May 22 17:36:23 UTC 2018 [Tue Oct 2 11:56:49 2018] Call Trace: [Tue Oct 2 11:56:49 2018] [<ffffffffc06ef085>] wait_transaction_locked+0x85/0xd0 [jbd2] [Tue Oct 2 11:56:49 2018] [<ffffffffc06ef368>] add_transaction_credits+0x268/0x2f0 [jbd2] [Tue Oct 2 11:56:49 2018] [<ffffffffc06ef5e1>] start_this_handle+0x1a1/0x430 [jbd2] [Tue Oct 2 11:56:49 2018] [<ffffffffc06efa93>] jbd2__journal_start+0xf3/0x1f0 [jbd2] [Tue Oct 2 11:56:49 2018] [<ffffffffc118aa99>] __ldiskfs_journal_start_sb+0x69/0xe0 [ldiskfs] [Tue Oct 2 11:56:49 2018] [<ffffffffc12cac8e>] osd_trans_start+0x1ae/0x460 [osd_ldiskfs] [Tue Oct 2 11:56:49 2018] [<ffffffffc1369de7>] mdt_empty_transno+0xf7/0x840 [mdt] [Tue Oct 2 11:56:49 2018] [<ffffffffc136cf1e>] mdt_mfd_open+0x8de/0xe70 [mdt] [Tue Oct 2 11:56:49 2018] [<ffffffffc136da2b>] mdt_finish_open+0x57b/0x690 [mdt] [Tue Oct 2 11:56:49 2018] [<ffffffffc136f308>] mdt_reint_open+0x17c8/0x3190 [mdt] [Tue Oct 2 11:56:49 2018] [<ffffffffc1364b33>] mdt_reint_rec+0x83/0x210 [mdt] [Tue Oct 2 11:56:49 2018] [<ffffffffc134636b>] mdt_reint_internal+0x5fb/0x9c0 [mdt] [Tue Oct 2 11:56:49 2018] [<ffffffffc1346892>] mdt_intent_reint+0x162/0x430 [mdt] [Tue Oct 2 11:56:49 2018] [<ffffffffc1351671>] mdt_intent_policy+0x441/0xc70 [mdt] [Tue Oct 2 11:56:49 2018] [<ffffffffc0f3b2ba>] ldlm_lock_enqueue+0x38a/0x980 [ptlrpc] [Tue Oct 2 11:56:49 2018] [<ffffffffc0f64b53>] ldlm_handle_enqueue0+0x9d3/0x16a0 [ptlrpc] [Tue Oct 2 11:56:49 2018] [<ffffffffc0fea452>] tgt_enqueue+0x62/0x210 [ptlrpc] [Tue Oct 2 11:56:49 2018] [<ffffffffc0fee38a>] tgt_request_handle+0x92a/0x1370 [ptlrpc] [Tue Oct 2 11:56:49 2018] [<ffffffffc0f96e4b>] ptlrpc_server_handle_request+0x23b/0xaa0 [ptlrpc] [Tue Oct 2 11:56:49 2018] [<ffffffffc0f9a592>] ptlrpc_main+0xa92/0x1e40 [ptlrpc] [Tue Oct 2 11:56:49 2018] [<ffffffff946bae31>] kthread+0xd1/0xe0 [Tue Oct 2 11:56:49 2018] [<ffffffff94d1f5f7>] ret_from_fork_nospec_end+0x0/0x39 [Tue Oct 2 11:56:49 2018] [<ffffffffffffffff>] 0xffffffffffffffff [Tue Oct 2 11:56:49 2018] Pid: 69607, comm: mdt00_105 3.10.0-862.2.3.el7_lustre.x86_64 #1 SMP Tue May 22 17:36:23 UTC 2018 [Tue Oct 2 11:56:49 2018] Call Trace: [Tue Oct 2 11:56:49 2018] [<ffffffffc06ef085>] wait_transaction_locked+0x85/0xd0 [jbd2] [Tue Oct 2 11:56:49 2018] [<ffffffffc06ef368>] add_transaction_credits+0x268/0x2f0 [jbd2] [Tue Oct 2 11:56:49 2018] [<ffffffffc06ef5e1>] start_this_handle+0x1a1/0x430 [jbd2] [Tue Oct 2 11:56:49 2018] [<ffffffffc06efa93>] jbd2__journal_start+0xf3/0x1f0 [jbd2] [Tue Oct 2 11:56:49 2018] [<ffffffffc118aa99>] __ldiskfs_journal_start_sb+0x69/0xe0 [ldiskfs] [Tue Oct 2 11:56:49 2018] [<ffffffffc12cac8e>] osd_trans_start+0x1ae/0x460 [osd_ldiskfs] [Tue Oct 2 11:56:49 2018] [<ffffffffc1369de7>] mdt_empty_transno+0xf7/0x840 [mdt] [Tue Oct 2 11:56:49 2018] [<ffffffffc136cf1e>] mdt_mfd_open+0x8de/0xe70 [mdt] [Tue Oct 2 11:56:49 2018] [<ffffffffc136da2b>] mdt_finish_open+0x57b/0x690 [mdt] [Tue Oct 2 11:56:49 2018] [<ffffffffc136f308>] mdt_reint_open+0x17c8/0x3190 [mdt] [Tue Oct 2 11:56:49 2018] [<ffffffffc1364b33>] mdt_reint_rec+0x83/0x210 [mdt] [Tue Oct 2 11:56:49 2018] [<ffffffffc134636b>] mdt_reint_internal+0x5fb/0x9c0 [mdt] [Tue Oct 2 11:56:49 2018] [<ffffffffc1346892>] mdt_intent_reint+0x162/0x430 [mdt] [Tue Oct 2 11:56:49 2018] [<ffffffffc1351671>] mdt_intent_policy+0x441/0xc70 [mdt] [Tue Oct 2 11:56:49 2018] [<ffffffffc0f3b2ba>] ldlm_lock_enqueue+0x38a/0x980 [ptlrpc] [Tue Oct 2 11:56:49 2018] [<ffffffffc0f64b53>] ldlm_handle_enqueue0+0x9d3/0x16a0 [ptlrpc] [Tue Oct 2 11:56:49 2018] [<ffffffffc0fea452>] tgt_enqueue+0x62/0x210 [ptlrpc] [Tue Oct 2 11:56:49 2018] [<ffffffffc0fee38a>] tgt_request_handle+0x92a/0x1370 [ptlrpc] [Tue Oct 2 11:56:49 2018] [<ffffffffc0f96e4b>] ptlrpc_server_handle_request+0x23b/0xaa0 [ptlrpc] [Tue Oct 2 11:56:49 2018] [<ffffffffc0f9a592>] ptlrpc_main+0xa92/0x1e40 [ptlrpc] [Tue Oct 2 11:56:49 2018] [<ffffffff946bae31>] kthread+0xd1/0xe0 [Tue Oct 2 11:56:49 2018] [<ffffffff94d1f5f7>] ret_from_fork_nospec_end+0x0/0x39 [Tue Oct 2 11:56:49 2018] [<ffffffffffffffff>] 0xffffffffffffffff [Tue Oct 2 11:56:49 2018] Pid: 69594, comm: mdt00_098 3.10.0-862.2.3.el7_lustre.x86_64 #1 SMP Tue May 22 17:36:23 UTC 2018 [Tue Oct 2 11:56:49 2018] Call Trace: [Tue Oct 2 11:56:49 2018] [<ffffffffc06ef085>] wait_transaction_locked+0x85/0xd0 [jbd2] [Tue Oct 2 11:56:49 2018] [<ffffffffc06ef368>] add_transaction_credits+0x268/0x2f0 [jbd2] [Tue Oct 2 11:56:49 2018] [<ffffffffc06ef5e1>] start_this_handle+0x1a1/0x430 [jbd2] [Tue Oct 2 11:56:49 2018] [<ffffffffc06efa93>] jbd2__journal_start+0xf3/0x1f0 [jbd2] [Tue Oct 2 11:56:49 2018] [<ffffffffc118aa99>] __ldiskfs_journal_start_sb+0x69/0xe0 [ldiskfs] [Tue Oct 2 11:56:49 2018] [<ffffffffc12cac8e>] osd_trans_start+0x1ae/0x460 [osd_ldiskfs] [Tue Oct 2 11:56:49 2018] [<ffffffffc1002512>] top_trans_start+0x702/0x940 [ptlrpc] [Tue Oct 2 11:56:49 2018] [<ffffffffc13ec3f1>] lod_trans_start+0x31/0x40 [lod] [Tue Oct 2 11:56:49 2018] [<ffffffffc148c1ba>] mdd_trans_start+0x1a/0x20 [mdd] [Tue Oct 2 11:56:49 2018] [<ffffffffc14763c0>] mdd_create+0xbe0/0x1400 [mdd] [Tue Oct 2 11:56:49 2018] [<ffffffffc136fcb5>] mdt_reint_open+0x2175/0x3190 [mdt] [Tue Oct 2 11:56:49 2018] [<ffffffffc1364b33>] mdt_reint_rec+0x83/0x210 [mdt] [Tue Oct 2 11:56:49 2018] [<ffffffffc134636b>] mdt_reint_internal+0x5fb/0x9c0 [mdt] [Tue Oct 2 11:56:49 2018] [<ffffffffc1346892>] mdt_intent_reint+0x162/0x430 [mdt] [Tue Oct 2 11:56:49 2018] [<ffffffffc1351671>] mdt_intent_policy+0x441/0xc70 [mdt] [Tue Oct 2 11:56:49 2018] [<ffffffffc0f3b2ba>] ldlm_lock_enqueue+0x38a/0x980 [ptlrpc] [Tue Oct 2 11:56:49 2018] [<ffffffffc0f64b53>] ldlm_handle_enqueue0+0x9d3/0x16a0 [ptlrpc] [Tue Oct 2 11:56:49 2018] [<ffffffffc0fea452>] tgt_enqueue+0x62/0x210 [ptlrpc] [Tue Oct 2 11:56:49 2018] [<ffffffffc0fee38a>] tgt_request_handle+0x92a/0x1370 [ptlrpc] [Tue Oct 2 11:56:49 2018] [<ffffffffc0f96e4b>] ptlrpc_server_handle_request+0x23b/0xaa0 [ptlrpc] [Tue Oct 2 11:56:49 2018] [<ffffffffc0f9a592>] ptlrpc_main+0xa92/0x1e40 [ptlrpc] [Tue Oct 2 11:56:49 2018] [<ffffffff946bae31>] kthread+0xd1/0xe0 [Tue Oct 2 11:56:49 2018] [<ffffffff94d1f5f7>] ret_from_fork_nospec_end+0x0/0x39 [Tue Oct 2 11:56:49 2018] [<ffffffffffffffff>] 0xffffffffffffffff
Please have a look at above, full dmesg and lustre log files are attached as an archive.