Lustre / LU-11465

OSS/MDS deadlock in 2.10.5

Details

    • Type: Bug
    • Resolution: Incomplete
    • Priority: Critical
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.10.4, Lustre 2.10.5, Lustre 2.10.6
    • Labels: None
    • Environment: CentOS 7, 3.10.0-862.2.3.el7_lustre.x86_64, 1 MDS (+1 HA pair), 4 OSS
    • Severity: 3
    • Rank (Obsolete): 9223372036854775807

    Description

      After upgrading our Lustre servers from CentOS 6 with the IEEL/DDN version of Lustre 2.5 to CentOS 7 with Lustre 2.10.5, we are experiencing stability issues related to an MDS/OSS deadlock, most likely caused by a hung OST thread. The issue is visible on only one of our two filesystems, the smaller one. Both run on an identical HW and SW stack, so the issue is most likely related to a specific workload on that filesystem. It is mostly a mid-term storage filesystem, so more metadata operations happen on it than on the scratch one.

      We initially thought that the issue might be caused by our attempt to enable project quotas (tune2fs -O project on all targets), so we turned the feature off (tune2fs -O ^project), which didn't change anything. Trying all combinations of MDT/OST quota enforcement (initially it was 'g', then 'gp', now 'none') didn't change anything either, so quota is probably not related to the problem. As potentially relevant information, we run with sync_journal=1 as a workaround, suggested by DDN some time ago, for a problem with memory accounting on the clients. The crash happens at least once every few hours, sometimes several times in a row, on different OSS nodes. I also tagged 2.10.6 as affected, as we tried everything from 2.10.4 to the current b2_10.

      On OSS we get:

       

      [Tue Oct  2 11:56:44 2018] LNet: Service thread pid 11401 was inactive for 200.27s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
      [Tue Oct  2 11:56:44 2018] Pid: 11401, comm: ll_ost_out00_00 3.10.0-862.2.3.el7_lustre.x86_64 #1 SMP Tue May 22 17:36:23 UTC 2018
      [Tue Oct  2 11:56:44 2018] Call Trace:
      [Tue Oct  2 11:56:44 2018]  [<ffffffff8395aa77>] call_rwsem_down_write_failed+0x17/0x30
      [Tue Oct  2 11:56:44 2018]  [<ffffffffc0eeaa8c>] osd_write_lock+0x5c/0xe0 [osd_ldiskfs]
      [Tue Oct  2 11:56:44 2018]  [<ffffffffc0b5b119>] out_tx_attr_set_exec+0x69/0x3f0 [ptlrpc]
      [Tue Oct  2 11:56:44 2018]  [<ffffffffc0b55591>] out_tx_end+0xe1/0x5c0 [ptlrpc]
      [Tue Oct  2 11:56:44 2018]  [<ffffffffc0b596d3>] out_handle+0x1453/0x1bc0 [ptlrpc]
      [Tue Oct  2 11:56:44 2018]  [<ffffffffc0b4f38a>] tgt_request_handle+0x92a/0x1370 [ptlrpc]
      [Tue Oct  2 11:56:44 2018]  [<ffffffffc0af7e4b>] ptlrpc_server_handle_request+0x23b/0xaa0 [ptlrpc]
      [Tue Oct  2 11:56:44 2018]  [<ffffffffc0afb592>] ptlrpc_main+0xa92/0x1e40 [ptlrpc]
      [Tue Oct  2 11:56:44 2018]  [<ffffffff836bae31>] kthread+0xd1/0xe0
      [Tue Oct  2 11:56:44 2018]  [<ffffffff83d1f5f7>] ret_from_fork_nospec_end+0x0/0x39
      [Tue Oct  2 11:56:44 2018]  [<ffffffffffffffff>] 0xffffffffffffffff
      [Tue Oct  2 11:56:44 2018] LustreError: dumping log to /tmp/lustre-log.1538474204.11401
      [Tue Oct  2 11:56:44 2018] Pid: 35664, comm: ll_ost_io00_098 3.10.0-862.2.3.el7_lustre.x86_64 #1 SMP Tue May 22 17:36:23 UTC 2018
      [Tue Oct  2 11:56:44 2018] Call Trace:
      [Tue Oct  2 11:56:44 2018]  [<ffffffffc0e7e495>] jbd2_log_wait_commit+0xc5/0x140 [jbd2]
      [Tue Oct  2 11:56:44 2018]  [<ffffffffc0e76533>] jbd2_journal_stop+0x343/0x3d0 [jbd2]
      [Tue Oct  2 11:56:44 2018]  [<ffffffffc112ab4c>] __ldiskfs_journal_stop+0x3c/0xb0 [ldiskfs]
      [Tue Oct  2 11:56:44 2018]  [<ffffffffc0efb783>] osd_trans_stop+0x183/0x850 [osd_ldiskfs]
      [Tue Oct  2 11:56:44 2018]  [<ffffffffc107d352>] ofd_trans_stop+0x22/0x60 [ofd]
      [Tue Oct  2 11:56:44 2018]  [<ffffffffc10835f4>] ofd_commitrw_write+0x7e4/0x1c90 [ofd]
      [Tue Oct  2 11:56:44 2018]  [<ffffffffc10877a9>] ofd_commitrw+0x4c9/0xae0 [ofd]
      [Tue Oct  2 11:56:44 2018]  [<ffffffffc0b80864>] obd_commitrw+0x2f3/0x336 [ptlrpc]
      [Tue Oct  2 11:56:44 2018]  [<ffffffffc0b5338d>] tgt_brw_write+0xffd/0x17d0 [ptlrpc]
      [Tue Oct  2 11:56:44 2018]  [<ffffffffc0b4f38a>] tgt_request_handle+0x92a/0x1370 [ptlrpc]
      [Tue Oct  2 11:56:44 2018]  [<ffffffffc0af7e4b>] ptlrpc_server_handle_request+0x23b/0xaa0 [ptlrpc]
      [Tue Oct  2 11:56:44 2018]  [<ffffffffc0afb592>] ptlrpc_main+0xa92/0x1e40 [ptlrpc]
      [Tue Oct  2 11:56:44 2018]  [<ffffffff836bae31>] kthread+0xd1/0xe0
      [Tue Oct  2 11:56:45 2018]  [<ffffffff83d1f5f7>] ret_from_fork_nospec_end+0x0/0x39
      [Tue Oct  2 11:56:45 2018]  [<ffffffffffffffff>] 0xffffffffffffffff
      [Tue Oct  2 11:56:45 2018] Pid: 35678, comm: ll_ost_io00_112 3.10.0-862.2.3.el7_lustre.x86_64 #1 SMP Tue May 22 17:36:23 UTC 2018
      [Tue Oct  2 11:56:45 2018] Call Trace:
      [Tue Oct  2 11:56:45 2018]  [<ffffffffc0e7e495>] jbd2_log_wait_commit+0xc5/0x140 [jbd2]
      [Tue Oct  2 11:56:45 2018]  [<ffffffffc0e76533>] jbd2_journal_stop+0x343/0x3d0 [jbd2]
      [Tue Oct  2 11:56:45 2018]  [<ffffffffc112ab4c>] __ldiskfs_journal_stop+0x3c/0xb0 [ldiskfs]
      [Tue Oct  2 11:56:45 2018]  [<ffffffffc0efb783>] osd_trans_stop+0x183/0x850 [osd_ldiskfs]
      [Tue Oct  2 11:56:45 2018]  [<ffffffffc107d352>] ofd_trans_stop+0x22/0x60 [ofd]
      [Tue Oct  2 11:56:45 2018]  [<ffffffffc10835f4>] ofd_commitrw_write+0x7e4/0x1c90 [ofd]
      [Tue Oct  2 11:56:45 2018]  [<ffffffffc10877a9>] ofd_commitrw+0x4c9/0xae0 [ofd]
      [Tue Oct  2 11:56:45 2018]  [<ffffffffc0b80864>] obd_commitrw+0x2f3/0x336 [ptlrpc]
      [Tue Oct  2 11:56:45 2018]  [<ffffffffc0b5338d>] tgt_brw_write+0xffd/0x17d0 [ptlrpc]
      [Tue Oct  2 11:56:45 2018]  [<ffffffffc0b4f38a>] tgt_request_handle+0x92a/0x1370 [ptlrpc]
      [Tue Oct  2 11:56:45 2018]  [<ffffffffc0af7e4b>] ptlrpc_server_handle_request+0x23b/0xaa0 [ptlrpc]
      [Tue Oct  2 11:56:45 2018]  [<ffffffffc0afb592>] ptlrpc_main+0xa92/0x1e40 [ptlrpc]
      [Tue Oct  2 11:56:45 2018]  [<ffffffff836bae31>] kthread+0xd1/0xe0
      [Tue Oct  2 11:56:45 2018]  [<ffffffff83d1f5f7>] ret_from_fork_nospec_end+0x0/0x39
      [Tue Oct  2 11:56:45 2018]  [<ffffffffffffffff>] 0xffffffffffffffff
      

       

       

      On MDS:

      [Tue Oct  2 11:56:44 2018] LNet: Service thread pid 69588 was inactive for 200.49s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
      [Tue Oct  2 11:56:44 2018] Pid: 69588, comm: mdt00_095 3.10.0-862.2.3.el7_lustre.x86_64 #1 SMP Tue May 22 17:36:23 UTC 2018
      [Tue Oct  2 11:56:44 2018] Call Trace:
      [Tue Oct  2 11:56:44 2018]  [<ffffffffc0f7e140>] ptlrpc_set_wait+0x4c0/0x920 [ptlrpc]
      [Tue Oct  2 11:56:44 2018]  [<ffffffffc0f7e61d>] ptlrpc_queue_wait+0x7d/0x220 [ptlrpc]
      [Tue Oct  2 11:56:44 2018]  [<ffffffffc14de463>] osp_remote_sync+0xd3/0x200 [osp]
      [Tue Oct  2 11:56:44 2018]  [<ffffffffc14c6dd0>] osp_attr_set+0x4c0/0x5d0 [osp]
      [Tue Oct  2 11:56:44 2018]  [<ffffffffc1421c6b>] lod_sub_attr_set+0x1cb/0x460 [lod]
      [Tue Oct  2 11:56:44 2018]  [<ffffffffc1403be6>] lod_obj_stripe_attr_set_cb+0x16/0x30 [lod]
      [Tue Oct  2 11:56:44 2018]  [<ffffffffc140fa96>] lod_obj_for_each_stripe+0xb6/0x230 [lod]
      [Tue Oct  2 11:56:44 2018]  [<ffffffffc1411043>] lod_attr_set+0x2f3/0x9a0 [lod]
      [Tue Oct  2 11:56:44 2018]  [<ffffffffc14810a0>] mdd_attr_set_internal+0x120/0x2a0 [mdd]
      [Tue Oct  2 11:56:44 2018]  [<ffffffffc1481e8d>] mdd_attr_set+0x8ad/0xce0 [mdd]
      [Tue Oct  2 11:56:44 2018]  [<ffffffffc13645f5>] mdt_reint_setattr+0xba5/0x1060 [mdt]
      [Tue Oct  2 11:56:44 2018]  [<ffffffffc1364b33>] mdt_reint_rec+0x83/0x210 [mdt]
      [Tue Oct  2 11:56:44 2018]  [<ffffffffc134636b>] mdt_reint_internal+0x5fb/0x9c0 [mdt]
      [Tue Oct  2 11:56:44 2018]  [<ffffffffc1351f07>] mdt_reint+0x67/0x140 [mdt]
      [Tue Oct  2 11:56:44 2018]  [<ffffffffc0fee38a>] tgt_request_handle+0x92a/0x1370 [ptlrpc]
      [Tue Oct  2 11:56:44 2018]  [<ffffffffc0f96e4b>] ptlrpc_server_handle_request+0x23b/0xaa0 [ptlrpc]
      [Tue Oct  2 11:56:44 2018]  [<ffffffffc0f9a592>] ptlrpc_main+0xa92/0x1e40 [ptlrpc]
      [Tue Oct  2 11:56:44 2018]  [<ffffffff946bae31>] kthread+0xd1/0xe0
      [Tue Oct  2 11:56:44 2018]  [<ffffffff94d1f5f7>] ret_from_fork_nospec_end+0x0/0x39
      [Tue Oct  2 11:56:44 2018]  [<ffffffffffffffff>] 0xffffffffffffffff
      [Tue Oct  2 11:56:44 2018] LustreError: dumping log to /tmp/lustre-log.1538474204.69588
      [Tue Oct  2 11:56:49 2018] LNet: Service thread pid 7634 was inactive for 200.51s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
      [Tue Oct  2 11:56:49 2018] Pid: 7634, comm: mdt_rdpg00_000 3.10.0-862.2.3.el7_lustre.x86_64 #1 SMP Tue May 22 17:36:23 UTC 2018
      [Tue Oct  2 11:56:49 2018] Call Trace:
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc06ef085>] wait_transaction_locked+0x85/0xd0 [jbd2]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc06ef368>] add_transaction_credits+0x268/0x2f0 [jbd2]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc06ef5e1>] start_this_handle+0x1a1/0x430 [jbd2]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc06efa93>] jbd2__journal_start+0xf3/0x1f0 [jbd2]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc118aa99>] __ldiskfs_journal_start_sb+0x69/0xe0 [ldiskfs]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc12cac8e>] osd_trans_start+0x1ae/0x460 [osd_ldiskfs]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc1002512>] top_trans_start+0x702/0x940 [ptlrpc]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc13ec3f1>] lod_trans_start+0x31/0x40 [lod]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc148c1ba>] mdd_trans_start+0x1a/0x20 [mdd]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc1481b69>] mdd_attr_set+0x589/0xce0 [mdd]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc136c1d6>] mdt_mfd_close+0x1a6/0x610 [mdt]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc1371951>] mdt_close_internal+0x121/0x220 [mdt]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc1371c70>] mdt_close+0x220/0x780 [mdt]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc0fee38a>] tgt_request_handle+0x92a/0x1370 [ptlrpc]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc0f96e4b>] ptlrpc_server_handle_request+0x23b/0xaa0 [ptlrpc]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc0f9a592>] ptlrpc_main+0xa92/0x1e40 [ptlrpc]
      [Tue Oct  2 11:56:49 2018]  [<ffffffff946bae31>] kthread+0xd1/0xe0
      [Tue Oct  2 11:56:49 2018]  [<ffffffff94d1f5f7>] ret_from_fork_nospec_end+0x0/0x39
      [Tue Oct  2 11:56:49 2018]  [<ffffffffffffffff>] 0xffffffffffffffff
      [Tue Oct  2 11:56:49 2018] LustreError: dumping log to /tmp/lustre-log.1538474210.7634
      [Tue Oct  2 11:56:49 2018] Pid: 69508, comm: mdt00_067 3.10.0-862.2.3.el7_lustre.x86_64 #1 SMP Tue May 22 17:36:23 UTC 2018
      [Tue Oct  2 11:56:49 2018] Call Trace:
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc06ef085>] wait_transaction_locked+0x85/0xd0 [jbd2]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc06ef368>] add_transaction_credits+0x268/0x2f0 [jbd2]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc06ef5e1>] start_this_handle+0x1a1/0x430 [jbd2]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc06efa93>] jbd2__journal_start+0xf3/0x1f0 [jbd2]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc118aa99>] __ldiskfs_journal_start_sb+0x69/0xe0 [ldiskfs]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc12cac8e>] osd_trans_start+0x1ae/0x460 [osd_ldiskfs]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc1369de7>] mdt_empty_transno+0xf7/0x840 [mdt]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc136cf1e>] mdt_mfd_open+0x8de/0xe70 [mdt]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc136da2b>] mdt_finish_open+0x57b/0x690 [mdt]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc136f308>] mdt_reint_open+0x17c8/0x3190 [mdt]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc1364b33>] mdt_reint_rec+0x83/0x210 [mdt]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc134636b>] mdt_reint_internal+0x5fb/0x9c0 [mdt]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc1346892>] mdt_intent_reint+0x162/0x430 [mdt]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc1351671>] mdt_intent_policy+0x441/0xc70 [mdt]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc0f3b2ba>] ldlm_lock_enqueue+0x38a/0x980 [ptlrpc]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc0f64b53>] ldlm_handle_enqueue0+0x9d3/0x16a0 [ptlrpc]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc0fea452>] tgt_enqueue+0x62/0x210 [ptlrpc]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc0fee38a>] tgt_request_handle+0x92a/0x1370 [ptlrpc]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc0f96e4b>] ptlrpc_server_handle_request+0x23b/0xaa0 [ptlrpc]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc0f9a592>] ptlrpc_main+0xa92/0x1e40 [ptlrpc]
      [Tue Oct  2 11:56:49 2018]  [<ffffffff946bae31>] kthread+0xd1/0xe0
      [Tue Oct  2 11:56:49 2018]  [<ffffffff94d1f5f7>] ret_from_fork_nospec_end+0x0/0x39
      [Tue Oct  2 11:56:49 2018]  [<ffffffffffffffff>] 0xffffffffffffffff
      [Tue Oct  2 11:56:49 2018] Pid: 69607, comm: mdt00_105 3.10.0-862.2.3.el7_lustre.x86_64 #1 SMP Tue May 22 17:36:23 UTC 2018
      [Tue Oct  2 11:56:49 2018] Call Trace:
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc06ef085>] wait_transaction_locked+0x85/0xd0 [jbd2]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc06ef368>] add_transaction_credits+0x268/0x2f0 [jbd2]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc06ef5e1>] start_this_handle+0x1a1/0x430 [jbd2]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc06efa93>] jbd2__journal_start+0xf3/0x1f0 [jbd2]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc118aa99>] __ldiskfs_journal_start_sb+0x69/0xe0 [ldiskfs]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc12cac8e>] osd_trans_start+0x1ae/0x460 [osd_ldiskfs]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc1369de7>] mdt_empty_transno+0xf7/0x840 [mdt]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc136cf1e>] mdt_mfd_open+0x8de/0xe70 [mdt]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc136da2b>] mdt_finish_open+0x57b/0x690 [mdt]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc136f308>] mdt_reint_open+0x17c8/0x3190 [mdt]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc1364b33>] mdt_reint_rec+0x83/0x210 [mdt]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc134636b>] mdt_reint_internal+0x5fb/0x9c0 [mdt]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc1346892>] mdt_intent_reint+0x162/0x430 [mdt]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc1351671>] mdt_intent_policy+0x441/0xc70 [mdt]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc0f3b2ba>] ldlm_lock_enqueue+0x38a/0x980 [ptlrpc]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc0f64b53>] ldlm_handle_enqueue0+0x9d3/0x16a0 [ptlrpc]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc0fea452>] tgt_enqueue+0x62/0x210 [ptlrpc]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc0fee38a>] tgt_request_handle+0x92a/0x1370 [ptlrpc]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc0f96e4b>] ptlrpc_server_handle_request+0x23b/0xaa0 [ptlrpc]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc0f9a592>] ptlrpc_main+0xa92/0x1e40 [ptlrpc]
      [Tue Oct  2 11:56:49 2018]  [<ffffffff946bae31>] kthread+0xd1/0xe0
      [Tue Oct  2 11:56:49 2018]  [<ffffffff94d1f5f7>] ret_from_fork_nospec_end+0x0/0x39
      [Tue Oct  2 11:56:49 2018]  [<ffffffffffffffff>] 0xffffffffffffffff
      [Tue Oct  2 11:56:49 2018] Pid: 69594, comm: mdt00_098 3.10.0-862.2.3.el7_lustre.x86_64 #1 SMP Tue May 22 17:36:23 UTC 2018
      [Tue Oct  2 11:56:49 2018] Call Trace:
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc06ef085>] wait_transaction_locked+0x85/0xd0 [jbd2]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc06ef368>] add_transaction_credits+0x268/0x2f0 [jbd2]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc06ef5e1>] start_this_handle+0x1a1/0x430 [jbd2]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc06efa93>] jbd2__journal_start+0xf3/0x1f0 [jbd2]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc118aa99>] __ldiskfs_journal_start_sb+0x69/0xe0 [ldiskfs]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc12cac8e>] osd_trans_start+0x1ae/0x460 [osd_ldiskfs]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc1002512>] top_trans_start+0x702/0x940 [ptlrpc]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc13ec3f1>] lod_trans_start+0x31/0x40 [lod]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc148c1ba>] mdd_trans_start+0x1a/0x20 [mdd]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc14763c0>] mdd_create+0xbe0/0x1400 [mdd]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc136fcb5>] mdt_reint_open+0x2175/0x3190 [mdt]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc1364b33>] mdt_reint_rec+0x83/0x210 [mdt]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc134636b>] mdt_reint_internal+0x5fb/0x9c0 [mdt]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc1346892>] mdt_intent_reint+0x162/0x430 [mdt]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc1351671>] mdt_intent_policy+0x441/0xc70 [mdt]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc0f3b2ba>] ldlm_lock_enqueue+0x38a/0x980 [ptlrpc]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc0f64b53>] ldlm_handle_enqueue0+0x9d3/0x16a0 [ptlrpc]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc0fea452>] tgt_enqueue+0x62/0x210 [ptlrpc]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc0fee38a>] tgt_request_handle+0x92a/0x1370 [ptlrpc]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc0f96e4b>] ptlrpc_server_handle_request+0x23b/0xaa0 [ptlrpc]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc0f9a592>] ptlrpc_main+0xa92/0x1e40 [ptlrpc]
      [Tue Oct  2 11:56:49 2018]  [<ffffffff946bae31>] kthread+0xd1/0xe0
      [Tue Oct  2 11:56:49 2018]  [<ffffffff94d1f5f7>] ret_from_fork_nospec_end+0x0/0x39
      [Tue Oct  2 11:56:49 2018]  [<ffffffffffffffff>] 0xffffffffffffffff
      

      Please have a look at the traces above; the full dmesg and Lustre log files are attached as an archive.
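
When going through long dumps like the ones above, it can help to group the stacks by thread name and innermost frame, which shows at a glance which threads wait on the journal and which wait on a lock or a remote RPC. Below is a minimal Python sketch, assuming the dmesg lines keep the format shown above. The function name `summarize_stacks` and the `SAMPLE` excerpt (a shortened copy of two OSS stacks from this report) are illustrative only and not part of any Lustre tooling.

```python
import re
from collections import Counter

# Header of a stack dump, e.g. "Pid: 11401, comm: ll_ost_out00_00 3.10.0-..."
PID_RE = re.compile(r"Pid: (\d+), comm: (\S+)")
# A stack frame, e.g. "[<ffffffffc0eeaa8c>] osd_write_lock+0x5c/0xe0 [osd_ldiskfs]"
FRAME_RE = re.compile(r"\[<[0-9a-f]+>\]\s+([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def summarize_stacks(lines):
    """Return one (pid, comm, innermost_frame) tuple per stack dump found in lines."""
    stacks = []
    current = None  # mutable record: [pid, comm, innermost_frame or None]
    for line in lines:
        header = PID_RE.search(line)
        if header:
            current = [int(header.group(1)), header.group(2), None]
            stacks.append(current)
            continue
        frame = FRAME_RE.search(line)
        # Only the first frame after a header is recorded: it is the innermost
        # function, i.e. where the thread is actually blocked.
        if frame and current is not None and current[2] is None:
            current[2] = frame.group(1)
    return [tuple(s) for s in stacks]


# Shortened excerpt of the OSS dmesg output quoted in this report (illustrative input).
SAMPLE = """\
[Tue Oct  2 11:56:44 2018] Pid: 11401, comm: ll_ost_out00_00 3.10.0-862.2.3.el7_lustre.x86_64 #1 SMP Tue May 22 17:36:23 UTC 2018
[Tue Oct  2 11:56:44 2018] Call Trace:
[Tue Oct  2 11:56:44 2018]  [<ffffffff8395aa77>] call_rwsem_down_write_failed+0x17/0x30
[Tue Oct  2 11:56:44 2018]  [<ffffffffc0eeaa8c>] osd_write_lock+0x5c/0xe0 [osd_ldiskfs]
[Tue Oct  2 11:56:44 2018] Pid: 35664, comm: ll_ost_io00_098 3.10.0-862.2.3.el7_lustre.x86_64 #1 SMP Tue May 22 17:36:23 UTC 2018
[Tue Oct  2 11:56:44 2018] Call Trace:
[Tue Oct  2 11:56:44 2018]  [<ffffffffc0e7e495>] jbd2_log_wait_commit+0xc5/0x140 [jbd2]
[Tue Oct  2 11:56:44 2018]  [<ffffffffc0e76533>] jbd2_journal_stop+0x343/0x3d0 [jbd2]
"""

if __name__ == "__main__":
    stacks = summarize_stacks(SAMPLE.splitlines())
    for pid, comm, top in stacks:
        print(f"{pid:>6} {comm:<18} {top}")
    # Count how many threads are blocked in each innermost function.
    print(Counter(top for _, _, top in stacks).most_common())
```

To run it against a full log, pass the lines of the file instead of `SAMPLE`, for example `summarize_stacks(open(path))`, where `path` is whichever dmesg file from the attached archive is being inspected.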

       

      Attachments

        1. logs.zip
          15.44 MB
        2. cyf_dec10_server_stacks.log
          117 kB
        3. cyf_dec10_client_stacks.log
          28 kB

            People

              Assignee: bzzz Alex Zhuravlev
              Reporter: m.magrys Marek Magrys
              Votes: 3
              Watchers: 19

              Dates

                Created:
                Updated:
                Resolved: